On the Value of Knowing a z• Ion for What It Is

Jian Liu, Xiaorong Liang, and Scott A. McLuckey*
Department of Chemistry, Purdue University, West Lafayette, Indiana 47907-2084
Received June 26, 2007

Computer simulation of database searches of electron transfer dissociation (ETD) spectra using both “bottom up” and “top down” approaches was performed to evaluate the utility of knowing a priori which product ions contain the C-terminus (i.e., the z• ions). In this work, knowledge of the identities of the z• ions was used to exclude putative identifications that are based solely on the mass matching of undifferentiated product ions derived from an experiment with those derived from in silico fragmentation. The benefit from knowing which ions are z• ions was found to be heavily dependent on the quality of the ETD spectra, in terms of sequence coverage afforded by the product ions, the amount of noise in the spectra (i.e., extraneous peaks that do not directly reflect primary structure), and mass measurement accuracy. Under conditions in which the likelihood of misidentification is high without a priori knowledge of ion types (e.g., b-, y-, c-, or z-ions), knowledge of which product ions are z• ions allows discrimination against false-positive identifications. Relatively little benefit from knowing which ions are z• ions was noted when product spectra reflected relatively high sequence coverage and when a low fraction of the product ions were due to extraneous peaks (i.e., spectra with relatively little noise). In all cases, specificity is higher with higher mass measurement accuracy, with a consequent reduction in the benefit from knowledge of which ions are z• ions.

Keywords: Protein database search • electron transfer dissociation • protein identification • top down proteomics • bottom up proteomics

* To whom correspondence should be addressed. Dr. Scott A. McLuckey, Department of Chemistry, Purdue University, West Lafayette, IN, USA 47907-2084. Phone: (765) 494-5270. Fax: (765) 494-0239. E-mail: mcluckey@purdue.edu.

Journal of Proteome Research 2008, 7, 130–137. Published on Web 11/13/2007. DOI: 10.1021/pr0703977. © 2008 American Chemical Society.

Introduction

Because of its speed, specificity, and sensitivity, mass spectrometry has become a dominant technique in proteomics as a tool for rapid protein identification and characterization. Tandem mass spectrometry (MS/MS), in particular, is used when mixtures of gene products are present. Tandem mass spectrometry-based proteomics approaches are generally placed in one of two categories: the “bottom up” approach, which involves the analysis of peptides derived from enzymatic or chemical cleavage of whole proteins, and the “top down” approach, which involves the direct analysis of intact protein ions.1–8 In a typical MS/MS experiment, an isolated precursor ion derived from a protein/peptide of interest is dissociated either by an activation method, such as collision-induced dissociation (CID),9 or by an ion-electron reaction via either direct electron capture by, or ion/ion electron transfer to, a multiply charged peptide or protein. Fragmentation resulting from electron capture is referred to as electron capture dissociation (ECD),10 and that resulting from electron transfer is referred to as electron transfer dissociation (ETD).11,12 Complementary to the traditional CID approach, ECD/ETD has been shown to lead to the formation of c and z• ions through the cleavage of N–Cα bonds of the polypeptide backbone, often providing higher sequence coverage than CID and with preservation of post-translational modification information. (Note that throughout this paper the presence of an additional proton to provide a charge is implied in referring to a c or z• ion.)

Recently, ETD has been gaining popularity because of its compatibility with widely used 3-D and linear ion trap instruments. Radical chemistry associated with the z• ions formed from ETD has been observed in ion traps, such as the formation of oxygen adducts to z• ions from the trace oxygen in the background gas.13 No such adduct formation has been noted for c ions or any other even-electron peptide product ions. This unique chemistry associated with z• ions allows facile identification of z• species in an ETD spectrum by identifying the doublet peaks of the z• ion and its corresponding oxygen adduct ion (z• + 31.9898 Da, the monoisotopic mass of O2). Knowledge of which product ions resulting from ETD are z• ions might be expected to allow improved specificity in database searching relative to approaches in which no independent distinction between product ion types can be made.

Unknown protein identification using tandem mass spectrometry data, which may be interpreted or uninterpreted, is generally accomplished through a search against a protein or translated genomic database. In an interpreted MS/MS spectrum, a contiguous series of product ions corresponding to fragmentation at adjacent residues might be identified and used as a “sequence tag”14–17 for protein database identification, which is optionally combined with other information. In the case of the “bottom up” approach, for example, the enzyme specificity and the masses of the N-terminal and C-terminal “flanking” regions of the peptide may be used, whereas in a “top down” approach, the mass of the intact protein might be used. However, the most widely used approach for database identification using peptide or protein ion MS/MS spectra has been to directly subject the uninterpreted product ion spectra to database analysis by comparing the experimental product ion spectra to the theoretical product ion spectra created in silico from database candidate proteins.18–22 The protein candidates are ranked by various scoring schemes, based either mainly on the number of peak matches22 or on peak matches together with other information such as cleavages at specific residue sites18 or product ion abundance information.23

Sequence database searching has proved to be a powerful and essential technique in proteomics for unknown protein identification, but misidentifications still pose a challenge. Identifications based on poor-quality spectra, which can arise, for example, from poor product ion signal-to-noise ratios, poor sequence coverage, major contributions from uninformative fragmentation channels, mixtures of precursor ions, and so forth, are particularly suspect. When no additional chemical information (e.g., fragment ion types) is available, the simple “mass-to-mass” matching scheme used in most algorithms might randomly match some noise peaks in the experimental spectra with sequence ions in the theoretical spectra. To mitigate the problem due to the “blindness” associated with this matching scheme, chemical information such as the product ion fragmentation pattern24 may be combined into the database search process, or a presearch or postsearch filter may be used to remove some or all unlikely candidate proteins.
For example, the product ion type (i.e., b-, y-, c-, and/or z-type ion) might be used as such a filter: a misidentified protein, no matter how well it matches the rest of the spectrum, can be excluded if it has an incorrect match or no match to any of the product ions whose type is independently known. However, the value of knowing the type of a fragment ion remains largely unexplored in the context of a database search, although many efforts have been made to differentiate between the b and y or c and z• ions in the spectra to improve peptide/protein de novo sequencing.25–31 In this study, the importance of using information about one specific fragment ion type, the z• ion, as a postsearch filter to remove interfering proteins in a database search was studied through computer simulation.
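The oxygen-adduct doublet mentioned in the Introduction suggests a simple way to flag candidate z• ions in a deconvoluted, singly charged peak list. The following is only a minimal sketch under stated assumptions: the function name `find_z_radical_candidates`, the ppm tolerance, and the use of the monoisotopic O2 mass (31.9898 Da) are illustrative choices, not the authors' implementation.

```python
O2_MASS = 31.9898  # assumed monoisotopic mass of O2 in Da (2 x 15.9949)


def find_z_radical_candidates(peaks, tol_ppm=100.0):
    """Return peak masses that have a partner peak one O2 mass higher.

    peaks: iterable of product ion masses (Da) from a deconvoluted spectrum.
    tol_ppm: hypothetical mass tolerance applied to the expected adduct mass.
    """
    peaks = sorted(peaks)
    hits = []
    for mass in peaks:
        target = mass + O2_MASS               # expected oxygen adduct mass
        tol = target * tol_ppm * 1e-6         # ppm tolerance converted to Da
        if any(abs(p - target) <= tol for p in peaks):
            hits.append(mass)
    return hits


peaks = [500.0, 531.9898, 700.0, 900.0, 931.99]
candidates = find_z_radical_candidates(peaks)
```

In this toy peak list, only the peaks at 500.0 and 900.0 have a partner one O2 mass higher, so only they are flagged as candidate z• ions.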

Methods

A computer program written in Perl was used to simulate ETD spectra and served as the engine for searching against the Swiss-Prot database. ETD spectra were generated from proteins randomly selected from a yeast database (6359 entries) in both “bottom up” and “top down” simulations. In the latter case, the randomly selected protein was fragmented in silico into c and z• ions, a specified number of which were randomly selected as the product ions which, combined with a specified amount of random noise (i.e., product masses generated randomly), constitute a deconvoluted intact protein ETD spectrum consisting solely of singly charged ions. In the “bottom up” approach, by contrast, the randomly selected yeast protein was first digested in silico into tryptic peptides with no missed cleavages. Tryptic peptides were randomly selected, and if the peptide contained more than two amino acids, it was further subjected to in silico fragmentation into c and z• ions, a specified number of which were randomly selected to simulate a peptide ETD spectrum when mixed with a specified amount of random noise.

To randomly select a specified number of c ions for a tryptic peptide in the “bottom up” approach or for a protein in the “top down” approach, that peptide or protein was subjected to complete in silico fragmentation, which formed a product ion pool consisting of all possible c ions. A Perl built-in random function was used to select a c ion with uniform probability from that pool, after which that c ion was removed from the pool. This process was repeated until the specified number of c ions was obtained. An analogous procedure was employed for the random selection of a specified number of z• ions.

The percentage of c or z• ion peaks in a simulated spectrum was based on the maximum number of c or z• ions possible for a protein/peptide (i.e., N − 1 for a polypeptide of N residues). For example, in a spectrum consisting of x% c ion peaks and y% z• ion peaks, the number of c ions ($n_c$) and z• ions ($n_z$) in the spectrum for a peptide/protein of N amino acid residues was defined as $n_c = x\% \times (N - 1)$ and $n_z = y\% \times (N - 1)$, respectively. Note that the benefit from knowing which product ions were z• ions increased with the number of z• ions in the spectrum. A 50:50 split between c ions and z• ions (x equal to y in the above equations) was used in our study, representing an intermediate case. The maximum number of random noise peaks was arbitrarily set to 2(N − 1), such that the number of noise peaks ($n_n$) is related to the percentage of noise peaks, w%, by $n_n = w\% \times [2(N - 1)]$. All calculated numbers were rounded up to the next integer. It was assumed in this study that all z• ions in the experimental ETD spectra could be identified as such via, for example, the formation of oxygen adduct ions.

To search the simulated ETD spectrum of an intact protein against the database, proteins within a specified mass window of the target protein (±100 Da in this study) were extracted from the Swiss-Prot database (release 52).
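The peak-count rules and the sampling-without-replacement procedure described above can be sketched in a few lines. This is a hedged illustration in Python rather than the Perl program used in the study; the names `n_peaks` and `simulate_spectrum`, the uniform noise distribution, and the `mass_range` parameter are assumptions introduced here, since the text states only that noise masses were generated randomly.

```python
import math
import random
from fractions import Fraction


def n_peaks(percent, maximum):
    """Peaks for a given percentage of the maximum, rounded up to an integer.

    Fraction avoids floating-point artifacts before the ceiling is taken.
    """
    return math.ceil(Fraction(str(percent)) * maximum / 100)


def simulate_spectrum(c_pool, z_pool, pct_c, pct_z, pct_noise, mass_range, rng):
    """Draw c ions, z ions, and noise peaks for one simulated spectrum.

    c_pool, z_pool: all N - 1 theoretical c and z ion masses of the polypeptide.
    mass_range: (low, high) bounds for noise masses (an assumption of this sketch).
    """
    n_max = len(c_pool)                                  # N - 1
    c_ions = rng.sample(c_pool, n_peaks(pct_c, n_max))   # without replacement
    z_ions = rng.sample(z_pool, n_peaks(pct_z, n_max))
    n_noise = n_peaks(pct_noise, 2 * n_max)              # maximum is 2(N - 1)
    noise = [rng.uniform(*mass_range) for _ in range(n_noise)]
    return c_ions, z_ions, noise


# A 10-residue peptide has 9 possible c ions and 9 possible z ions.
c_pool = [100.0 + 110.0 * i for i in range(9)]
z_pool = [120.0 + 110.0 * i for i in range(9)]
c_ions, z_ions, noise = simulate_spectrum(
    c_pool, z_pool, 5, 5, 95, (50.0, 1200.0), random.Random(0)
)
```

With 5% c ions, 5% z• ions, and 95% noise, a 10-residue peptide yields a single c ion, a single z• ion, and 18 noise peaks after rounding up, which illustrates why short peptides carry so little sequence information in the noisiest simulations.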
However, during the searching of a peptide spectrum, all proteins in the Swiss-Prot database were first digested in silico into tryptic peptides; only those which were within a specific mass window (±2 Da) of the target tryptic peptide were selected. The mass-selected proteins/peptides were subjected to in silico fragmentation into c and z• ions and compared to the simulated ETD spectrum. Candidate proteins were ranked by the number of matches. Incorrect protein identifications with an equal or higher number of matches than the real target yeast protein were defined herein as interfering proteins, and those interfering proteins with a z• ion series that does not match all the z• ions in the simulated ETD spectrum were defined as removable interfering proteins. The value of knowing a z• ion in the database search for its use in removing interfering proteins can be expressed quantitatively via a term defined herein as protein discriminating power (DP):

$$\mathrm{DP} = \frac{\text{Number of removable interfering proteins}}{\text{Total number of interfering proteins}} \qquad (1)$$

An arbitrary value of −0.2 was assigned to DP in cases where target proteins can be correctly identified without any interfering proteins. This permits differentiation from cases in which a DP value equal to 0 is found, which corresponds to the case where interfering proteins are present but cannot be excluded on the basis of knowing which ions in the spectrum are z• ions.

Figure 1. Results of “bottom up” database search of 100 simulated ETD spectra (5% c ions, 5% z• ions, and 95% noise peaks) of tryptic peptides from 100 yeast proteins with a mass accuracy of 100 ppm.

Results and Discussions

“Bottom Up” Database Search of ETD Spectra. Figure 1 summarizes the results of a database search of 100 simulated ETD spectra from a “bottom up” approach, in which the masses of the peptides for which the ETD spectra were simulated are plotted from low mass to high on the abscissa. Each spectrum was simulated from a randomly selected peptide with at least 3 amino acid residues, formed by in silico trypsin digestion of one of 100 yeast proteins also randomly selected from the yeast database. All simulated spectra in this case consist of 5% c ions, 5% z• ions, and 95% noise peaks, with a mass accuracy of 100 ppm. The green bars in Figure 1 represent the total number of proteins in the Swiss-Prot database that contain at least one in silico trypsin-digested peptide with a mass within the mass tolerance window (±2 Da) of the specific yeast probe peptide whose ETD spectrum was subjected to database search. Among these proteins, defined herein as mass matched proteins, some may have an in silico trypsin-digested peptide with the same or a higher number of matches to the ETD spectrum than does the yeast probe peptide; these are defined herein as interfering proteins and are represented by the red bars in Figure 1. The number of mass matched proteins is inversely related to the mass of the yeast probe peptide, as is the number of interfering proteins. The larger number of interfering proteins associated with probe peptides of lower mass does not arise because the same probe peptide sequence exists in these interfering proteins. Rather, this result is due to the smaller number of c and z• sequence ions present in the simulated spectra, as the fixed %c and %z ions used in the simulation lead to smaller numbers of c and z• ions as the number of residues in the probe sequence decreases.
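The search step described above, mass-window selection followed by ranking on the number of peak matches, can be sketched as follows. This is an illustrative sketch only: the function names, the tuple layout of `candidates`, and the decision to count each observed peak at most once are assumptions, since the text does not specify how matches were tallied in the Perl program.

```python
def within_ppm(observed, theoretical, ppm):
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6


def count_matches(spectrum, theoretical_ions, ppm=100.0):
    """Count observed peaks that match at least one theoretical ion."""
    return sum(
        any(within_ppm(peak, ion, ppm) for ion in theoretical_ions)
        for peak in spectrum
    )


def rank_candidates(spectrum, target_mass, candidates, window_da=2.0, ppm=100.0):
    """Keep candidates inside the precursor mass window and rank them by matches.

    candidates: iterable of (name, precursor_mass, theoretical_ion_masses).
    """
    scored = [
        (name, count_matches(spectrum, ions, ppm))
        for name, mass, ions in candidates
        if abs(mass - target_mass) <= window_da
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


spectrum = [300.0, 450.0, 612.3]
candidates = [
    ("A", 1000.5, [300.01, 450.0, 700.0]),
    ("B", 1001.0, [612.3]),
    ("C", 1010.0, [300.0, 450.0, 612.3]),  # outside the +/- 2 Da window
]
ranking = rank_candidates(spectrum, 1000.0, candidates)
```

Candidate C would match every peak but is never scored because its precursor mass falls outside the window, whereas A and B are ranked by their two and one matches, respectively.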
If the z• ions in the ETD spectrum can be identified, interfering proteins that do not match all of these z• ions can be readily excluded, even if they were ranked high based on the total number of matches to the spectrum, because a real target peptide should match all these z• ions. The number of interfering proteins that could be excluded on this basis is represented by the blue bars in Figure 1. A clear trend in Figure 1 is that more blue bars exist in the high mass region, which indicates that more interfering proteins can be excluded on the basis of knowing which fragments are z• ions for higher mass peptides than for low mass peptides. In the particular simulation associated with Figure 1, all interfering proteins associated with two of the probe peptides can be completely removed (3 interfering proteins for the probe peptide with a mass of 1661.7 Da and 38 interfering proteins for the probe peptide with a mass of 1639.9 Da). Green bars in Figure 1 with no overlapping red bars represent the cases where the target protein can be directly identified without any interfering proteins, that is, the target protein has the largest number of matches to the experimental ETD spectrum compared to any other protein in the Swiss-Prot database.

The advantage of knowing a z• ion in an ETD spectrum lies in its ability to remove some or all of the interfering proteins in a database search conducted with a commonly used algorithm based on peak matching between spectra from ETD experiments and in silico fragmentation. Therefore, a figure of merit defined as discriminating power (DP) (see eq 1) was used to evaluate quantitatively the benefit of knowing the z• ions in the ETD spectrum by examining the ratio of the number of removable interfering proteins to the total number of interfering proteins from the database search. To facilitate a schematic description of the results, an arbitrary value of DP equal to −0.2 is assigned to cases where the target protein can be directly identified without any interfering proteins. The discriminating power afforded by knowing which products are z• ions in the ETD spectra depends on several factors, including the sequence coverage reflected by the c and z• ions and the contribution of extraneous peaks (noise) to the ETD spectra. Figure 2 summarizes the variation of the DP with various extents of noise and sequence coverage simulated with a mass accuracy of 100 ppm.
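Equation 1 and the −0.2 convention translate directly into code. The sketch below follows the Methods definition, in which an interfering protein is removable when its theoretical z• ion series fails to match all of the known z• ions; the function names and the ppm matching rule are assumptions made for illustration.

```python
NO_INTERFERENCE = -0.2  # arbitrary sentinel from the text: no interfering proteins


def matches_all_z(known_z, candidate_z, ppm=100.0):
    """True if every known z ion mass is matched by some candidate z ion mass."""
    return all(
        any(abs(z - t) <= t * ppm * 1e-6 for t in candidate_z)
        for z in known_z
    )


def discriminating_power(known_z, interfering_z_series, ppm=100.0):
    """DP = removable interfering proteins / total interfering proteins (eq 1).

    interfering_z_series: one list of theoretical z ion masses per interfering
    protein (or peptide). Returns the sentinel when the list is empty.
    """
    if not interfering_z_series:
        return NO_INTERFERENCE
    removable = sum(
        not matches_all_z(known_z, series, ppm)
        for series in interfering_z_series
    )
    return removable / len(interfering_z_series)


known_z = [400.0, 800.0]
interfering = [
    [400.0, 800.0],    # matches both z ions: cannot be removed
    [400.0, 750.0],    # misses the ion at 800.0: removable
    [100.0],           # misses both: removable
    [399.99, 800.05],  # both within 100 ppm: cannot be removed
]
dp = discriminating_power(known_z, interfering)
```

Here two of the four interfering candidates are removable, giving DP = 0.5; an empty list of interfering proteins returns the sentinel value −0.2 instead of dividing by zero.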
Figure 2. Effects of noise levels on the DP in the “bottom up” database search of 100 ETD spectra of tryptic peptides from 100 yeast proteins with a mass accuracy of 100 ppm.

Figure 3. Effects of the instrument mass accuracy ((a) 300 ppm, (b) 100 ppm, and (c) 30 ppm) on the DP in the “bottom up” database search of 100 ETD spectra (5% c ions, 5% z• ions, and 95% noise) from tryptic peptides of 100 yeast proteins.

The data in Figure 2 suggest that spectra with high sequence coverage and relatively little noise benefit very little from knowing which ions are z• ions. In the case summarized in Figure 2d, for example, 53 of the 100 proteins could be identified without any interfering proteins (DP = −0.2). As noise levels in the spectra increase and sequence coverage decreases, some proteins can no longer be identified directly (Figure 2a–c). The DP values associated with the searches of these spectra increase with the noise level and with decreased sequence coverage. For example, of the 53 probe proteins that can be directly identified in spectra consisting of 75% c ions, 75% z• ions, and 25% noise (Figure 2d), 46 become unidentifiable when the spectrum composition changes to 5% c ions, 5% z• ions, and 95% noise (Figure 2a) because of the existence of interfering proteins. Of these 46 proteins, 7 can be identified by excluding all of the interfering proteins on the basis of knowing the z• ions (i.e., DP = 1, Figure 2a). For 35 of the 46 proteins, many, but not all, of the interfering proteins can be excluded (i.e., 0 < DP < 1, Figure 2a), while 4 of the 46 proteins yield DP values equal to 0 (i.e., no benefit to knowing the z• ions).
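The four outcome classes used in this discussion (direct identification, DP = 1, 0 < DP < 1, and DP = 0) can be tallied from a list of per-search DP values. The sketch below is illustrative; the class labels and function names are hypothetical, and the example DP values are invented solely to exercise the code.

```python
from collections import Counter


def classify(dp):
    """Map a DP value to one of the four outcome classes used in the text."""
    if dp < 0:
        return "direct"           # DP = -0.2 sentinel: no interfering proteins
    if dp == 1:
        return "all removed"      # every interfering protein can be excluded
    if dp == 0:
        return "none removed"     # no benefit from knowing the z ions
    return "partly removed"       # 0 < DP < 1


def tally(dp_values):
    """Count how many searches fall into each outcome class."""
    return Counter(classify(dp) for dp in dp_values)


# Invented DP values for six hypothetical searches.
counts = tally([-0.2, 1.0, 0.5, 0.0, 1.0, 0.97])

# Consistency check on the Figure 2a breakdown quoted in the text.
figure_2a_breakdown = 7 + 35 + 4
```

The breakdown quoted for Figure 2a is internally consistent: the 7 searches with DP = 1, the 35 with 0 < DP < 1, and the 4 with DP = 0 account for all 46 proteins that lose their direct identification.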

Sequence coverage and mass measurement accuracy32,33 are major factors in determining the overall specificity associated with the MS/MS experiment. Knowledge of which product ions are z• ions can be used to improve the specificity of the approach. The role of mass accuracy is illustrated in Figure 3, which summarizes the DP values obtained with simulations of noisy spectra with low sequence coverage at mass measurement accuracies of 300, 100, and 30 ppm. As the specificity of the approach is improved via higher mass accuracies, it is clear that the benefit of using knowledge of the z• ions decreases because the specificity of the approach is sufficient without this knowledge.

Figure 4. Results of database search of “top down” ETD spectra (1% c ions, 1% z• ions, and 10% noise) from 100 yeast proteins with a mass accuracy of 100 ppm.

Figure 5. Effects of the coverage of c and z• sequence ions in the ETD spectrum on the DP in the “top down” database search of ETD spectra of 100 yeast proteins with a mass accuracy of 100 ppm.

Figure 6. Effects of the instrument mass accuracy ((a) 30 ppm, (b) 100 ppm, and (c) 300 ppm) on the DP in the “top down” database search of ETD spectra (1% c ions, 1% z• ions, and 10% noise) of 100 yeast proteins.

“Top Down” Database Search of ETD Spectra. Figure 4 summarizes the results from a database search of 100 simulated ETD spectra from a set of 100 intact proteins randomly selected from the yeast database. The ion mixture consists of 1% c ions, 1% z• ions, and 10% noise in the spectra simulated with a mass accuracy of 100 ppm. As in Figure 1, the red bars stand for the interfering proteins and the blue bars indicate the removable interfering proteins, while the green bars represent the mass matched proteins (i.e., proteins in the Swiss-Prot database with masses within the mass window (±100 Da) of the intact yeast probe protein). A salient feature shown in Figure 4 is that the number of removable interfering proteins is very close to or equal to the number of interfering proteins in almost all searches in which interfering proteins were observed, which indicates that most of the interfering proteins can be identified and removed by knowing the z• ions in the spectrum. Among the 100 database searches shown in Figure 4, only 11 gave direct identifications of the target yeast proteins (green bars with no overlapping red bars), while the remaining 89 searches were all complicated by the existence of interfering proteins. However, 61 of these 89 searches have a DP value equal to 1, which suggests that if the z• ions can be recognized in the spectrum, the number of proteins that can be identified increases from 11 to 72. Furthermore, 18 of the remaining searches with interfering proteins have a DP value larger than 0.95 and only one gives a DP value equal to 0. The results from this particular simulation suggest that a database search of “top down” ETD spectra of intact proteins might be significantly improved, in cases with low sequence coverage and high relative contributions of noise, if the z• ions can be identified in the spectra.

As shown in the “bottom up” database search simulations, both the sequence coverage associated with the product ions and the contribution of extraneous (noise) peaks have a significant effect on the degree to which a database search can benefit from knowing the z• ions in the spectra (see Figure 2). The role of sequence coverage was investigated in the “top down” approach by varying the coverage of sequence ions in the spectra while keeping the amount of noise constant. The change of DP as a function of the coverage of c and z• ions in the spectra is illustrated in Figure 5.
Figure 5c clearly suggests that when there are enough sequence ions in the spectra, algorithms based mainly on the number of matches perform very well for identification. For example, 96 out of 100 target proteins can be directly identified (DP = −0.2) when the product ions consist of 2% c ions, 2% z• ions, and 10% noise. However, the number of direct identifications decreases dramatically with a decrease in the number of sequence ions in the spectra. For example, only 11 and 39 proteins can be directly identified in the database searches shown in Figure 5a,b, in which the percentages of c/z• ions are 0.5% and 1.0%, respectively. On the other hand, it is evident in Figure 5 that a majority of database searches that did not give a correct identification have very high DP values, and most of them have a DP value equal to 1, which means all interfering proteins can be excluded. Specifically, the number of searches with a DP value equal to 1 is 62, 53, and 2 in panels a, b, and c of Figure 5, respectively. As a result, the total correct identifications, from both direct identification based on the number of matches and identification by knowing which product ions are z• ions, are 73, 92, and 98, respectively, for searches shown in panels a, b, and c of Figure 5. The above results strongly suggest that recognizing the z• ions in the spectrum can be very helpful in the database search of ETD spectra in cases where algorithms based only on the number of matches are complicated by interfering proteins.

The effect of mass measurement accuracy in the top down simulations is illustrated in Figure 6, where ETD spectra consisting of 1% c ions, 1% z• ions, and 10% noise were simulated using mass accuracies of 30, 100, and 300 ppm. As expected, the number of proteins that can be identified directly increases with better mass measurement accuracy. For instance, the numbers of yeast probe proteins that can be directly identified are 91, 41, and 8, respectively, for searches of spectra with mass accuracies of 30 ppm (Figure 6a), 100 ppm (Figure 6b), and 300 ppm (Figure 6c).
However, if the z• ions can be identified in the spectra obtained with relatively poor mass accuracy, the ability to correctly identify an unknown protein can be improved significantly. Figure 6b shows, for example, that, in the database search of 100 spectra simulated with 100 ppm mass accuracy, an additional 54 target proteins can be identified (DP = 1) by knowing the z• ions in the spectrum, in addition to the 41 proteins from direct identification. In the case of the simulation at 300 ppm (Figure 6c), the number of additional identifications increases to 79 (to reach a total of 87 identifications including those made directly). It is also noteworthy that even with an instrument of moderate mass accuracy, for example, 30 ppm, database searches can still be improved by knowing the z• ions in the spectrum, as indicated in Figure 6a, where 4 additional identifications were obtained. Overall, the total number of identifications from both direct identification and indirect identification by knowing a z• ion is 95, 95, and 87, respectively, for the spectra simulated with mass accuracies of 30 ppm (Figure 6a), 100 ppm (Figure 6b), and 300 ppm (Figure 6c).

As suggested by the results from both “bottom up” and “top down” database searches of ETD spectra, obtaining a correct identification with an algorithm based on the number of matches to the spectrum requires an ETD spectrum of high quality, that is, high sequence coverage, few extraneous peaks (noise), and relatively high mass accuracy (Figures 2d, 3c, 5c, and 6a). However, low-quality ETD spectra are often encountered, especially in high-throughput experiments, due to multiple factors that include insufficient HPLC separation of highly complex mixtures, large precursor isolation windows for dissociation, low product ion signal-to-noise ratios, use of an instrument of low mass accuracy, an inefficient ETD process, and so forth. Production of c and z• sequence ions in ion/ion electron transfer to a polypeptide cation is only one of several competing channels.
Others include proton transfer (PT) from the multiply charged peptide/protein cation to the reagent radical anion, electron transfer with no dissociation (ET,noD), and electron transfer with neutral side-chain losses. The competition between these channels has been noted to be dependent upon the anionic reagent,34 the bath gas temperature,35 the charge site identities and their locations in the peptide,36 and the charge state or mass-to-charge ratio of the precursor ion.37 For example, when ETD is performed on a precursor of relatively low charge state (high mass-to-charge ratio), PT and ET,noD dominate, and consequently, a low-quality spectrum with few sequence ions is often observed. A database search based on low-quality spectra usually leads to ambiguous identification complicated by interfering proteins (Figures 2a–c, Figures 3a and 3b, Figures 5a and 5b, and Figures 6b and 6c). The simulations performed here clearly show that it is this scenario that can benefit most from the ability to recognize z• ions. Therefore, by recognizing z• ions in the spectra, effective protein identification through an algorithm based on the number of matches is no longer limited to high-quality ETD spectra and can be extended to spectra of relatively low quality without significant loss of performance.
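The identification totals quoted for Figures 5 and 6 follow from adding the direct identifications (DP = −0.2) to the searches rescued by the z• filter (DP = 1). The short check below uses only the counts reported in the text; the dictionary layout and variable names are illustrative.

```python
# (direct identifications, additional identifications with DP = 1) per panel,
# as quoted in the text for the "top down" simulations.
figure_5 = {"a": (11, 62), "b": (39, 53), "c": (96, 2)}   # 0.5%, 1.0%, 2% c/z ions
figure_6 = {"a": (91, 4), "b": (41, 54), "c": (8, 79)}    # 30, 100, 300 ppm


def totals(panels):
    """Total identifications per panel: direct plus those rescued by the filter."""
    return {name: direct + rescued for name, (direct, rescued) in panels.items()}


totals_5 = totals(figure_5)
totals_6 = totals(figure_6)
```

The sums reproduce the totals stated in the text: 73, 92, and 98 for Figure 5a–c, and 95, 95, and 87 for Figure 6a–c.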

Conclusions

The likelihood of correct protein identification, based either on peptides or whole proteins, depends both upon the specificity of the approach and the required level of discrimination. In tandem mass spectrometry, the specificity of the approach is largely determined by the information content of the product ion spectrum and the mass measurement accuracy applied to the product ions. The required level of discriminatory power is largely determined by the complexity of the mixture and the relative contribution of products that do not provide accurate sequence information. When ETD spectra yield high sequence coverage with relatively little contribution from extraneous peaks and are collected with relatively high mass accuracies, algorithms that rely on matching of masses derived from uninterpreted product ion spectra with in silico fragmented species enable high-confidence protein identification. The additional specificity afforded by the knowledge of which product ions are z• ions is unnecessary under such favorable conditions. However, low-quality spectra (i.e., low sequence coverage and relatively high contributions from extraneous peaks) can give rise to misidentifications. The situation is exacerbated when data are collected with relatively poor mass accuracy. Under such conditions, the additional specificity that results from knowing which ions are z• ions can significantly improve the likelihood of making correct identifications by flagging putative identifications that can be excluded. In the simulations conducted here, the “top down” approach appears to show a greater potential benefit from knowing which product ions are z• ions, particularly when data are collected with modest mass accuracy.

Acknowledgment. This work was supported by the National Institute of General Medical Sciences under Grant GM 45372.

References

(1) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Identifying proteins from 2-dimensional gels by molecular mass searching of peptide-fragments in protein-sequence databases. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011–5015.
(2) Mann, M.; Hojrup, P.; Roepstorff, P. Use of mass-spectrometric molecular-weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 1993, 22, 338–345.
(3) Pappin, D. J. C.; Hojrup, P.; Bleasby, A. J. Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 1993, 3, 487–487.
(4) Yates, J. R.; Speicher, S.; Griffin, P. R.; Hunkapiller, T. Peptide mass maps: a highly informative approach to protein identification. Anal. Biochem. 1993, 214, 397–408.
(5) Zhang, W. Z.; Chait, B. T. ProFound: An expert system for protein identification using mass spectrometric peptide mapping information. Anal. Chem. 2000, 72, 2482–2489.
(6) Clauser, K. R.; Baker, P.; Burlingame, A. L. Role of accurate mass measurement (±10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 1999, 71, 2871–2882.
(7) Chorush, R. A.; Little, D. P.; Beu, S. C.; Wood, T. D.; McLafferty, F. W. Surface-induced dissociation of multiply protonated proteins. Anal. Chem. 1995, 67, 1042–1046.
(8) Schaaff, T. G.; Cargile, B. J.; Stephenson, J. L.; McLuckey, S. A. Ion trap collisional activation of the (M + 2H)2+ - (M + 17H)17+ ions of human hemoglobin β-chain. Anal. Chem. 2000, 72, 899–907.
(9) McLuckey, S. A. Principles of collisional activation in analytical mass-spectrometry. J. Am. Soc. Mass Spectrom. 1992, 3, 599–614.
(10) Zubarev, R. A.; Kelleher, N. L.; McLafferty, F. W. Electron capture dissociation of multiply charged protein cations. A nonergodic process. J. Am. Chem. Soc. 1998, 120, 3265–3266.
(11) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 9528–9533.
(12) Pitteri, S. J.; Chrisman, P. A.; Hogan, J. M.; McLuckey, S. A. Electron transfer ion/ion reactions in a three-dimensional quadrupole ion trap: Reactions of doubly and triply protonated peptides with SO2•. Anal. Chem. 2005, 77, 1831–1839.
(13) Xia, Y.; Chrisman, P. A.; Pitteri, S. J.; Erickson, D. E.; McLuckey, S. A. Ion/molecule reactions of cation radicals formed from protonated polypeptides via gas-phase ion/ion electron transfer. J. Am. Chem. Soc. 2006, 128, 11792–11798.
(14) Mann, M.; Wilm, M. Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 1994, 66, 4390–4399.


(15) Cargile, B. J.; McLuckey, S. A.; Stephenson, J. L. Identification of bacteriophage MS2 coat protein from E. coli lysates via ion trap collisional activation of intact protein ions. Anal. Chem. 2001, 73, 1277–1285.
(16) Ge, Y.; Lawhorn, B. G.; ElNaggar, M.; Strauss, E.; Park, J. H.; Begley, T. P.; McLafferty, F. W. Top down characterization of larger proteins (45 kDa) by electron capture dissociation mass spectrometry. J. Am. Chem. Soc. 2002, 124, 672–678.
(17) Demirev, P. A.; Ramirez, J.; Fenselau, C. Tandem mass spectrometry of intact proteins for characterization of biomarkers from Bacillus cereus T spores. Anal. Chem. 2001, 73, 5725–5731.
(18) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(19) Yates, J. R.; Eng, J. K.; McCormack, A. L.; Schieltz, D. Method to correlate tandem mass-spectra of modified peptides to amino-acid-sequences in the protein database. Anal. Chem. 1995, 67, 1426–1436.
(20) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551–3567.
(21) MacCoss, M. J.; Wu, C. C.; Yates, J. R. Probability-based validation of protein identifications using a modified SEQUEST algorithm. Anal. Chem. 2002, 74, 5593–5599.
(22) Meng, F. Y.; Cargile, B. J.; Miller, L. M.; Forbes, A. J.; Johnson, J. R.; Kelleher, N. L. Informatics and multiplexing of intact protein identification in bacteria and the archaea. Nat. Biotechnol. 2001, 19, 952–957.
(23) Reid, G. E.; Shang, H.; Hogan, J. M.; Lee, G. U.; McLuckey, S. A. Gas-phase concentration, purification, and identification of whole proteins from complex mixtures. J. Am. Chem. Soc. 2002, 124, 7353–7362.
(24) Huang, Y. Y.; Triscari, J. M.; Pasa-Tolic, L.; Anderson, G. A.; Lipton, M. S.; Smith, R. D.; Wysocki, V. H. Dissociation behavior of doubly-charged tryptic peptides: Correlation of gas-phase cleavage abundance with Ramachandran plots. J. Am. Chem. Soc. 2004, 126, 3034–3035.
(25) Shevchenko, A.; Chernushevich, I.; Ens, W.; Standing, K. G.; Thomson, B.; Wilm, M.; Mann, M. Rapid ‘de novo’ peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer. Rapid Commun. Mass Spectrom. 1997, 11, 1015–1024.

(26) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 10313–10317.
(27) Roth, K. D. W.; Huang, Z. H.; Sadagopan, N.; Watson, J. T. Charge derivatization of peptides for analysis by mass spectrometry. Mass Spectrom. Rev. 1998, 17, 255–274.
(28) Keough, T.; Youngquist, R. S.; Lacey, M. P. A method for high-sensitivity peptide sequencing using postsource decay matrix-assisted laser desorption ionization mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 7131–7136.
(29) Yan, B.; Pan, C.; Olman, V. N.; Hettich, R. L.; Xu, Y. A graph-theoretic approach for the separation of b and y ions in tandem mass spectra. Bioinformatics 2005, 21, 563–574.
(30) Nielsen, M. L.; Savitski, M. M.; Zubarev, R. A. Improving protein identification using complementary fragmentation techniques in Fourier transform mass spectrometry. Mol. Cell. Proteomics 2005, 4, 835–845.
(31) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Zubarev, R. A. Complementary sequence preferences of electron-capture dissociation and vibrational excitation in fragmentation of polypeptide polycations. Angew. Chem., Int. Ed. 2006, 45, 5301–5303.
(32) He, F.; Emmett, M. R.; Hakansson, K.; Hendrickson, C. L.; Marshall, A. G. Theoretical and experimental prospects for protein identification based solely on accurate mass measurement. J. Proteome Res. 2004, 3, 61–67.
(33) Olsen, J. V.; de Godoy, L. M. F.; Li, G. Q.; Macek, B.; Mortensen, P.; Pesch, R.; Makarov, A.; Lange, O.; Horning, S.; Mann, M. Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap. Mol. Cell. Proteomics 2005, 4, 2010–2021.
(34) Gunawardena, H. P.; He, M.; Chrisman, P. A.; Pitteri, S. J.; Hogan, J. M.; Hodges, B. D. M.; McLuckey, S. A. Electron transfer versus proton transfer in gas-phase ion/ion reactions of polyprotonated peptides. J. Am. Chem. Soc. 2005, 127, 12627–12639.
(35) Pitteri, S. J.; Chrisman, P. A.; McLuckey, S. A. Electron-transfer ion/ion reactions of doubly protonated peptides: Effect of elevated bath gas temperature. Anal. Chem. 2005, 77, 5662–5669.
(36) Xia, Y.; Gunawardena, H. P.; Erickson, D. E.; McLuckey, S. A. Effects of cation charge-site identity and position on electron-transfer dissociation of polypeptide cations. J. Am. Chem. Soc. 2007, 129, 12232–12243.
(37) Liu, J.; Xia, Y.; Liang, X.; McLuckey, S. A. Charge state dependence of proton transfer versus electron transfer in a gas-phase ion/ion electron transfer dissociation process, 55th American Society for Mass Spectrometry, Indianapolis, IN, June 3, 2007.

PR0703977

Journal of Proteome Research • Vol. 7, No. 01, 2008 137