Anal. Chem. 2010, 82, 1990–2005

Prediction of Electron-Transfer/Capture Dissociation Spectra of Peptides

Zhongqi Zhang*
Process and Product Development, Amgen Inc., One Amgen Center Drive, Thousand Oaks, California 91320

An empirical model, based on classic kinetics, was developed for quantitative prediction of electron-transfer dissociation (ETD) and electron-capture dissociation (ECD) spectra of peptides. The model includes most fragmentation pathways described in the literature plus some additional pathways based on the author's assumptions and observations. The ETD model was trained with more than 7000 ETD spectra, with and without supplemental activation. The ECD model was trained with more than 6000 ECD spectra. The trained ETD and ECD models are able to predict ETD and ECD spectra with reasonable accuracy in ion intensities for peptide precursors up to 4000 u in mass.

Peptide tandem mass spectra generated from collision-induced dissociation (CID) have been widely used in the identification of peptides and post-translational modifications. However, not all peptides and post-translational modifications can be identified by CID due to the lack of sequence information in some spectra. Alternative fragmentation techniques such as electron-capture dissociation (ECD) [1-5] and electron-transfer dissociation (ETD) [6,7] provide sequence information complementary to that obtained from CID by cleaving a peptide backbone in a less selective fashion than CID [8], therefore potentially providing more peptide sequence information. Additionally, ECD and ETD are ideal in characterizing labile post-translational modifications [9], such as glycosylation [10-12], phosphorylation [13,14], and glycation [15].

To confirm a peptide by its tandem mass spectrum, a spectrum predicted from the sequence of the peptide is usually generated and compared to the experimental spectrum. The predictions are usually performed without much differentiation of the intensity of each ion.
To take full advantage of all the information provided in a fragmentation spectrum, we need to predict, to a reasonable accuracy, the relative intensities of product ions. Taking advantage of the intensity information, more reliable peptide identification may be achieved. An empirical kinetic model has been reported previously for quantitative prediction of CID spectra of peptides [16,17] and has been used for full characterization of therapeutic proteins [18,19].

Although backbone cleavages generated by ETD/ECD are less selective, fragment efficiency is a complicated function of peptide length, charge distribution, etc. [20] The abundances of different charge states of the resulting c and z ions are also not uniformly distributed. Therefore, peptides can be identified with more confidence if the intensities of the c and z ions can be predicted accurately. In addition, secondary fragmentation pathways such as hydrogen atom (H·) transfer [21] and neutral losses [22-27] may complicate the spectrum, and an incorrect conclusion may be reached if care is not taken during spectral interpretation.

This article describes an empirical model for quantitative prediction of ETD/ECD spectra of peptides. Similar to the CID model described previously [16,17], the model is based largely on classic kinetics and is able to predict peptide ETD/ECD spectra, achieving reasonable prediction accuracy in fragment ion intensities for peptide ions up to 4000 u in mass. In the following sections the empirical model will be described. Keep in mind that the goal of the work is to develop a simple and working model for predicting peptide ETD/ECD spectra, and it is not the author's intention to describe all fragmentation pathways accurately.

* To whom correspondence should be addressed. Fax: (805)376-2354. E-mail: [email protected].

(1) Zubarev, R. A.; Kelleher, N. L.; McLafferty, F. W. J. Am. Chem. Soc. 1998, 120, 3265–3266.
(2) Zubarev, R. A.; Haselmann, K. F.; Budnik, B.; Kjeldsen, F.; Jensen, F. Eur. J. Mass Spectrom. 2002, 8, 337–349.
(3) Zubarev, R. A. Curr. Opin. Biotechnol. 2004, 15, 12–16.
(4) Cooper, H. J.; Hakansson, K.; Marshall, A. G. Mass Spectrom. Rev. 2005, 24, 201–222.
(5) Zubarev, R. Expert Rev. Proteomics 2006, 3, 251–261.
(6) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 9528–9533.
(7) Mikesh, L. M.; Ueberheide, B.; Chi, A.; Coon, J. J.; Syka, J. E. P.; Shabanowitz, J.; Hunt, D. F. Biochim. Biophys. Acta, Proteins Proteomics 2006, 1764, 1811–1822.
(8) Kruger, N. A.; Zubarev, R. A.; Carpenter, B. K.; Kelleher, N. L.; Horn, D. M.; McLafferty, F. W. Int. J. Mass Spectrom. 1999, 183, 1–5.
(9) Kelleher, N. L.; Zubarev, R. A.; Bush, K.; Furie, B.; Furie, B. C.; McLafferty, F. W.; Walsh, C. T. Anal. Chem. 1999, 71, 4250–4253.
(10) Mirgorodskaya, E.; Roepstorff, P.; Zubarev, R. A. Anal. Chem. 1999, 71, 4431–4436.
(11) Hakansson, K.; Cooper, H. J.; Emmett, M. R.; Costello, C. E.; Marshall, A. G.; Nilsson, C. L. Anal. Chem. 2001, 73, 4530–4536.
(12) Hogan, J. M.; Pitteri, S. J.; Chrisman, P. A.; McLuckey, S. A. J. Proteome Res. 2005, 4, 628–632.
(13) Stensballe, A.; Jensen, O. N.; Olsen, J. V.; Haselmann, K. F.; Zubarev, R. A. Rapid Commun. Mass Spectrom. 2000, 14, 1793–1800.
(14) Chalmers, M. J.; Kolch, W.; Emmett, M. R.; Marshall, A. G.; Mischak, H. J. Chromatogr., B 2004, 803, 111–120.
(15) Zhang, Q. B.; Frolov, A.; Tang, N.; Hoffmann, R.; van de Goor, T.; Metz, T. O.; Smith, R. D. Rapid Commun. Mass Spectrom. 2007, 21, 661–666.
(16) Zhang, Z. Anal. Chem. 2004, 76, 3908–3922.
(17) Zhang, Z. Anal. Chem. 2005, 77, 6364–6373.
(18) Zhang, Z.; Pan, H.; Chen, X. Mass Spectrom. Rev. 2009, 28, 147–176.
(19) Zhang, Z. Anal. Chem. 2009, 81, 8354–8364.
(20) Good, D. M.; Wirtala, M.; McAlister, G. C.; Coon, J. J. Mol. Cell. Proteomics 2007, 6, 1942–1951.
(21) Savitski, M. M.; Kjeldsen, F.; Nielsen, M. L.; Zubarev, R. A. J. Am. Soc. Mass Spectrom. 2007, 18, 113–120.
(22) Cooper, H. J.; Hudgins, R. R.; Hakansson, K.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2002, 13, 241–249.
(23) Haselmann, K. F.; Budnik, B. A.; Kjeldsen, F.; Polfer, N. C.; Zubarev, R. A. Eur. J. Mass Spectrom. 2002, 8, 461–469.
(24) Fung, Y. M. E.; Chan, T. W. D. J. Am. Soc. Mass Spectrom. 2005, 16, 1523–1535.
(25) Chalkley, R. J.; Brinkworth, C. S.; Burlingame, A. L. J. Am. Soc. Mass Spectrom. 2006, 17, 1271–1274.
(26) Savitski, M. M.; Nielsen, M. L.; Zubarev, R. A. Anal. Chem. 2007, 79, 2296–2302.
(27) Falth, M.; Savitski, M. M.; Nielsen, M. L.; Kjeldsen, F.; Andren, P. E.; Zubarev, R. A. Anal. Chem. 2008, 80, 8089–8094.

10.1021/ac902733z © 2010 American Chemical Society. Published on Web 02/11/2010.
Analytical Chemistry, Vol. 82, No. 5, March 1, 2010

THEORY AND METHOD

The Kinetics. For a unimolecular reaction involving many competing pathways, such as the case of peptide fragmentation, the reaction kinetics can be described as follows. Assuming a precursor ion P has n competing fragmentation pathways to form fragments F1, F2, ..., Fn, with rate constants k1, k2, ..., kn, respectively, the kinetics of fragmentation can be described as

$$[P]_t = [P]_0 \exp(-k_{\mathrm{total}} t) \tag{1}$$

where [P]0 and [P]t are abundances of the precursor ion at time zero and time t, respectively, and ktotal is the sum of rate constants for all pathways, $k_{\mathrm{total}} = \sum_{i=1}^{n} k_i$. The abundance of each fragment at time t can be expressed as

$$[F_i]_t = \int_0^t k_i [P]_t \, dt \tag{2}$$

Combining eqs 1 and 2, we have

$$[F_i]_t = k_i [P]_0 \int_0^t \exp(-k_{\mathrm{total}} t) \, dt = \frac{k_i [P]_0 \left[1 - \exp(-k_{\mathrm{total}} t)\right]}{k_{\mathrm{total}}} \tag{3}$$
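As an illustration of eqs 1 and 3, the following minimal Python sketch evaluates the precursor and fragment abundances for a set of competing first-order pathways. It is not the author's implementation; the function name `fragment_abundances` and the example rate constants are assumptions made only for this example.

```python
import math


def fragment_abundances(rate_constants, p0, t):
    """Evaluate eqs 1 and 3 for competing first-order pathways.

    rate_constants: k_i for each pathway (hypothetical values, in 1/time).
    p0: precursor abundance at time zero, [P]0.
    t: reaction time, in units consistent with the rate constants.
    Returns ([P]t, [[F_1]t, ..., [F_n]t]).
    """
    k_total = sum(rate_constants)
    if k_total == 0:
        # No open pathway: the precursor is unchanged.
        return p0, [0.0 for _ in rate_constants]
    remaining = p0 * math.exp(-k_total * t)   # eq 1
    consumed = p0 - remaining                 # [P]0 [1 - exp(-k_total t)]
    # eq 3: each pathway receives the share k_i / k_total of what was consumed.
    fragments = [k / k_total * consumed for k in rate_constants]
    return remaining, fragments


# Illustrative values only: three pathways competing for 0.1 time units.
precursor_left, products = fragment_abundances([1.0, 3.0, 6.0], p0=100.0, t=0.1)
```

Because every pathway draws on the same precursor population, the ratio between any two fragment abundances equals the ratio of their rate constants at all times; only the total amount consumed depends on t.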

When rate constants of all fragmentation pathways ki are known, the abundance of each fragment at time t can be derived from eq 3.

Major Assumptions. During the ETD or ECD process, an electron is attached to a multiply protonated peptide ion, followed by bond cleavage. However, the cleavage products are often associated in a noncovalent complex, in which secondary processes such as H· transfer [21] may occur. To simplify the peptide fragmentation model, the following major assumptions are made when calculating the rate constant for each ETD/ECD fragmentation pathway.

1. Peptide ETD/ECD pathways include an intramolecular proton transfer step, followed by electron attachment, and then by bond cleavage and fragmentation. Intramolecular proton transfer is a fast process compared to electron attachment.
2. After electron attachment, radical-induced backbone and side-chain cleavages happen immediately [28]. All other fragmentation pathways, including dissociation of c- and z-type fragments as well as H· transfer from c to z·, are slow processes as compared to intramolecular energy exchange and energy exchange caused by collision in the case of quadrupole ion traps. The rate constants of slow processes are calculated by the Arrhenius equation

$$k = A \exp\left(-\frac{E_a}{R T_{\mathrm{eff}}}\right) \tag{4}$$

where A is the frequency factor (or A factor), Ea is the activation energy, R is the gas constant, and Teff is the effective temperature of the ion.
3. The side chain of each residue has a constant hydrogen affinity (HA), which is independent of its environment.
4. Gas-phase basicity (GB) and HA of each backbone amide site depend on its neighboring residues, and the effects are additive.
5. The likelihood of H·-induced cleavage is related to the likelihood of the site to capture an H·, which can be estimated by a Boltzmann distribution based on the HA of each site.
6. The frequency factors A are the same for each fragmentation pathway.
7. H· transfer from c to z· or y to b· only happens before the noncovalent complex is dissociated.
8. Activation energy (Ea) of H· transfer depends on the nature of the donor residue and receiver residue, and the effects are additive.

(28) Cerda, B. A.; Horn, D. M.; Breuker, K.; Carpenter, B. K.; McLafferty, F. W. Eur. Mass Spectrom. 1999, 5, 335–338.

Fragment Ion Notations. Peptide ETD and ECD processes produce mainly c-type, z-type, y-type, b-type, and a-type ions. However, due to different hydrogen rearrangement events [21], many subtypes of these ions are observed. Below are the definitions of the subtype ions used in this article. Radical-induced backbone cleavage produces c and z· ions as well as b· and y ions. H· transfer from c to z· produces c· and z′ fragments. H· transfer from y to b· produces y· and b′ fragments. Loss of an H· from z· or b· forms z and b, respectively. Loss of H2 from a z· ion forms a z·′ ion. Loss of CO from a b· ion produces an a· ion. The above notations for b, c and y, z ions are consistent with the common definitions of these ions during CID. Figure 1 shows the pathways, as implemented in the model, to form these different subtypes of fragment ions.

Fragmentation Pathways Included in the Model. Based on the work in the literature and some observations/assumptions made by the author, the following competing pathways

(illustrated in Figure 2) are considered important for ETD/ECD of a protonated peptide and are therefore included in the model.

1. Transfer of a proton (H+) from the precursor ion to the reagent ion [29,30].
2. Attachment of an electron (e-) to the precursor ion.
3. Radical-induced cleavages immediately after electron attachment, including backbone cleavages to form a noncovalent c/z· complex [31], radical and charge-directed cleavages [32,33] to form a y/b· complex, NH3 loss from the NH2-terminus [23,34,35], CO loss from the COOH-terminus (author's observation/assumption, see Supporting Information Figure S-2F for an example), and side-chain even-electron neutral losses [22-24,27].
4. H· transfer from c to z· [21] or y to b· in a noncovalent complex (author's assumption to explain observations of y-1 and b+2 ions; see Supporting Information Figure S-1 for examples).
5. Neutral losses including H·, H2 [21], and side-chain radical losses [24-26] from z· species, and H· and CO losses from b· species, regardless of whether the z· or b· species are in their free form or in a complex. To simplify the model, it is assumed that all even-electron side-chain neutral losses (see item 3 above) are a fast process after electron attachment, and all odd-electron (or radical) side-chain neutral losses are a slow process from a z· species. Odd-electron neutral losses from charge-reduced species [27] are interpreted as losses from z· in the c/z· complex.
6. Dissociation of the noncovalent complex to form c- and z-type fragments or b- (or a-) and y-type fragments.

Please note that it is not the author's intention to get every pathway correct. The goal is to include pathways that can be simply modeled mathematically and can explain most ions observed in the spectra. For example, although many people believe that b ions are formed from vibrationally excited nonradical species followed by H· loss [36], the author favors the assumed y/b· complex model because of its mathematical simplicity and its ability to straightforwardly explain the formation of a· ions (see Figure 4 for some examples) and the occasional observation of b+1 (b·), y-1 (y·), and b+2 (b′) ions (see Supporting Information Figure S-1 for some examples).

Figure 1. Schematics, as implemented in the model, of the formation of different subtypes of c, z, b, a, and y fragments. Note that each species on the pathway can also act as a precursor and undergo the entire pathway by capturing another electron. "-R·" represents a radical neutral loss from a side chain of a z· species.

Figure 2. Competing pathways included in the model.

(29) Coon, J. J.; Syka, J. E. P.; Schwartz, J. C.; Shabanowitz, J.; Hunt, D. F. Int. J. Mass Spectrom. 2004, 236, 33–42.
(30) Gunawardena, H. P.; He, M.; Chrisman, P. A.; Pitteri, S. J.; Hogan, J. M.; Hodges, B. D. M.; McLuckey, S. A. J. Am. Chem. Soc. 2005, 127, 12627–12639.
(31) McLafferty, F. W.; Horn, D. M.; Breuker, K.; Ge, Y.; Lewis, M. A.; Cerda, B.; Zubarev, R. A.; Carpenter, B. K. J. Am. Soc. Mass Spectrom. 2001, 12, 245–249.
(32) Lee, S.; Chung, G.; Kim, J.; Oh, H. B. Rapid Commun. Mass Spectrom. 2006, 20, 3167–3175.
(33) Liu, H. C.; Hakansson, K. J. Am. Soc. Mass Spectrom. 2007, 18, 2007–2013.
(34) Turecek, F.; Syrstad, E. A. J. Am. Chem. Soc. 2003, 125, 3353–3369.
(35) Chakraborty, T.; Holm, A. I. S.; Hvelplund, P.; Nielsen, S. B.; Poully, J. C.; Worm, E. S.; Williams, E. R. J. Am. Soc. Mass Spectrom. 2006, 17, 1675–1680.

Parameters Used in the Model. Among many parameters tested, a total of 280 parameters were determined to be important and were included in the ETD model, and 266 parameters were included in the ECD model. Tables 1-5 show a complete list of these parameters. Many of these parameters reflect the properties of different amino acids (Tables 1-4). The amino acid residues considered in the model include the 20 common amino acids plus carboxymethylated cysteine (CmC), carbamidomethylated cysteine (CamC), and oxidized methionine (OxM). Different sets of parameters were used for ETD and ECD processes. During refinement of the model, these parameters were varied until a best match was obtained between the predicted spectra and the experimental spectra in the training data sets. Details of these parameters are described below. Please note that the symbols used in the tables for these parameters are the same as those shown below as well as in all equations.

1. Side-chain gas-phase basicities (GB_0^SC, Tables 1 and 2) of four relatively basic amino acid residues including histidine (H), lysine (K), arginine (R), and tryptophan (W). For side chains of less basic residues including C, CmC, CamC, D, E, M, OxM, N, Q, S, T, and Y, their GB_0^SC values are too small to be of any importance in calculating the proton distribution. Therefore, the GB_0^SC values of these less basic residues are estimated based on GB values of other compounds with similar functional groups in the unit of kJ/mol.
GB values of the remaining side chains (shown as blanks in Tables 1 and 2) are so small that their basicities are not considered in calculation of proton distribution (side chain never protonated).

2. It is assumed that the GB of a backbone amide is determined by the residue to the left (GB_L^BB) and right (∆GB_R^BB) of the amide position. "∆" indicates that the corresponding value is a small adjustment. It was found that the slight differences in backbone amide GB values do not change the fragmentation behavior very much except for the N-terminal side of a proline residue. Therefore, the GB_L^BB values for all residues are assumed to be the same, and the values of ∆GB_R^BB are assumed to be zero for all residues except proline (Tables 1 and 2). Terminal functional groups including the N-terminal amine (NH2-) and C-terminal carboxylic acid group (-COOH), however, are considered backbone with very different GB_L^BB and ∆GB_R^BB values.

(36) Cooper, H. J. J. Am. Soc. Mass Spectrom. 2005, 16, 1932–1940.

Table 1. Residue-Specific Parameters Involving Proton/Hydrogen Distributions and Radical-Induced Fast Cleavages for ETD

| residue | side-chain GB, GB_0^SC | backbone GB, GB_L^BB | backbone GB, ∆GB_R^BB | side-chain HA, HA^SC | backbone HA, HA_L^BB | backbone HA, ∆HA_R^BB | first side-chain neutral loss fraction, f_NL1^M | second side-chain neutral loss fraction, f_NL2^M |
|---|---|---|---|---|---|---|---|---|
| A | | 898.1 | 0 | 53.9 | 68.8 | 0.0 | 0.990 | |
| C | 770 | 898.1 | 0 | 68.4 | 68.0 | -1.2 | 0.460 | 0.111 |
| CmC (a) | 830 | 898.1 | 0 | 71.6 | 68.8 | 0.2 | 0.800 | 0.085 |
| CamC (b) | 860 | 898.1 | 0 | 66.3 | 68.9 | -0.2 | 0.980 | 0.019 |
| D | 784 | 898.1 | 0 | 66.8 | 69.3 | 0.6 | 0.788 | 0.212 |
| E | 790 | 898.1 | 0 | 60.0 | 69.9 | 0.8 | 0.714 | 0.285 |
| F | | 898.1 | 0 | 58.2 | 69.7 | 0.3 | 0.594 | 0.380 |
| G | | 898.1 | 0 | 58.0 | 69.5 | -0.9 | 0.989 | |
| H | 905.7 | 898.1 | 0 | 71.0 | 70.2 | 0.2 | 0.030 | 0.000 |
| I | | 898.1 | 0 | 57.0 | 69.1 | -0.9 | 0.771 | 0.210 |
| K | 905.5 | 898.1 | 0 | 60.9 | 69.6 | 1.1 | 0.700 | 0.299 |
| L | | 898.1 | 0 | 58.4 | 68.9 | 0.0 | 0.910 | 0.089 |
| M | 830 | 898.1 | 0 | 64.1 | 68.5 | -0.1 | 0.709 | 0.288 |
| OxM (c) | 854 | 898.1 | 0 | 64.8 | 69.1 | -0.3 | 0.889 | 0.098 |
| N | 863.4 | 898.1 | 0 | 68.9 | 69.5 | 0.7 | 0.609 | 0.391 |
| P | | 898.1 | 3.9 | 63.3 | 67.7 | -1.6 | | |
| Q | 864.6 | 898.1 | 0 | 66.2 | 69.9 | 0.4 | 0.970 | 0.030 |
| R | 928.0 | 898.1 | 0 | 66.8 | 69.6 | 0.5 | 0.516 | 0.300 |
| S | 775 | 898.1 | 0 | 65.6 | 69.0 | 0.2 | 0.886 | 0.114 |
| T | 780 | 898.1 | 0 | 65.8 | 69.1 | 0.2 | 0.820 | 0.180 |
| V | | 898.1 | 0 | 62.4 | 69.0 | -1.1 | 0.903 | 0.029 |
| W | 899.0 | 898.1 | 0 | 63.5 | 69.3 | -0.1 | 0.920 | 0.080 |
| Y | 790 | 898.1 | 0 | 58.0 | 69.4 | 0.2 | 0.890 | 0.108 |
| NH2- | | 902.1 | | | 70.9 | | 1.000 | |
| -COOH | | | -96.5 | | | -12.5 | 1.000 | |

(a) CmC, carboxymethylated cysteine. (b) CamC, carbamidomethylated cysteine. (c) OxM, oxidized methionine.

Table 2. Residue-Specific Parameters Involving Proton/Hydrogen Distributions and Radical-Induced Fast Cleavages for ECD

| residue | side-chain GB, GB_0^SC | backbone GB, GB_L^BB | backbone GB, ∆GB_R^BB | side-chain HA, HA^SC | backbone HA, HA_L^BB | backbone HA, ∆HA_R^BB | first side-chain neutral loss fraction, f_NL1^M | second side-chain neutral loss fraction, f_NL2^M |
|---|---|---|---|---|---|---|---|---|
| A | | 897.9 | 0 | 57.6 | 67.8 | 0.0 | 0.796 | |
| C | 770 | 897.9 | 0 | 66.1 | 66.0 | -0.3 | 0.011 | 0.935 |
| CmC | 830 | 897.9 | 0 | 70.3 | 70.6 | 7.6 | 1.000 | 0.000 |
| CamC | 860 | 897.9 | 0 | 61.8 | 68.6 | 3.3 | 0.700 | 0.245 |
| D | 784 | 897.9 | 0 | 68.4 | 70.6 | 0.4 | 0.863 | 0.009 |
| E | 790 | 897.9 | 0 | 51.7 | 71.6 | 1.4 | 0.910 | 0.087 |
| F | | 897.9 | 0 | 49.2 | 69.8 | 0.2 | 0.729 | 0.230 |
| G | | 897.9 | 0 | 33.0 | 69.7 | -2.6 | 1.000 | |
| H | 914.7 | 897.9 | 0 | 76.6 | 70.9 | -0.3 | 0.009 | 0.002 |
| I | | 897.9 | 0 | 42.5 | 68.9 | -2.4 | 0.630 | 0.370 |
| K | 908.0 | 897.9 | 0 | 52.7 | 70.0 | 2.7 | 0.051 | 0.923 |
| L | | 897.9 | 0 | 51.8 | 67.9 | 0.2 | 0.699 | 0.239 |
| M | 830 | 897.9 | 0 | 55.4 | 68.7 | -0.1 | 0.351 | 0.609 |
| OxM | 854 | 897.9 | 0 | 54.7 | 69.2 | -0.6 | 0.580 | 0.420 |
| N | 863.4 | 897.9 | 0 | 69.7 | 69.9 | 1.0 | 0.397 | 0.519 |
| P | | 897.9 | 12.3 | 69.2 | 65.7 | -4.9 | | |
| Q | 864.6 | 897.9 | 0 | 65.7 | 71.0 | 0.8 | 0.828 | 0.010 |
| R | 919.9 | 897.9 | 0 | 61.4 | 70.4 | 2.9 | 0.300 | 0.606 |
| S | 775 | 897.9 | 0 | 58.8 | 69.3 | -0.2 | 0.792 | 0.000 |
| T | 780 | 897.9 | 0 | 60.6 | 69.5 | 1.4 | 0.776 | 0.040 |
| V | | 897.9 | 0 | 52.6 | 68.7 | -2.4 | 0.600 | 0.140 |
| W | 899.0 | 897.9 | 0 | 57.0 | 69.9 | -0.3 | 0.940 | 0.000 |
| Y | 790 | 897.9 | 0 | 52.9 | 69.7 | 0.3 | 0.989 | 0.011 |
| NH2- | | 878.2 | | | 69.2 | | 1.000 | |
| -COOH | | | -96.5 | | | -20.4 | 1.000 | |
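The additive treatment of backbone basicity described in item 2 can be sketched as follows. This is only an illustration under the assumption that the GB of an amide is the sum of the left-residue value GB_L^BB and the right-residue adjustment ∆GB_R^BB; the function names are hypothetical, and the numbers are the ETD values quoted in Table 1.

```python
# Values taken from Table 1 (ETD), in kJ/mol; only the entries needed here.
GB_L_BB = 898.1                 # identical for every residue to the left of the amide
DELTA_GB_R_BB = {"P": 3.9}      # zero for every residue except proline


def backbone_amide_gb(right_residue):
    """GB of a backbone amide, given the residue on its right (C-terminal) side.

    Assumes GB = GB_L^BB + dGB_R^BB, i.e. the additive scheme of assumption 4.
    """
    return GB_L_BB + DELTA_GB_R_BB.get(right_residue, 0.0)


def backbone_amide_gbs(sequence):
    """GB of each internal amide of a peptide given as a list of residue codes."""
    return [backbone_amide_gb(right) for right in sequence[1:]]


# Amides A-G, G-P, and P-K: only the amide N-terminal to proline is shifted.
gbs = backbone_amide_gbs(["A", "G", "P", "K"])
```

Terminal groups (NH2- and -COOH) are deliberately left out of this sketch, since the tables give them their own, very different values.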

(37) Zubarev, R. A.; Kruger, N. A.; Fridriksson, E. K.; Lewis, M. A.; Horn, D. M.; Carpenter, B. K.; McLafferty, F. W. J. Am. Chem. Soc. 1999, 121, 2857–2862.

3. The likelihood of cleavage at a specific site depends on the H· affinity (HA) of that particular site [37] (or other properties such as electron affinity, depending on the mechanism of ETD and ECD). HA for a side chain is determined by the nature of the side chain (HA^SC). HA for a backbone site is determined by

Table 3. Residue-Specific Parameters (Activation Energies and A Factors) Involving Slow ETD Pathways

| residue | H transfer (c to z·), E_HT^z· | H transfer (c to z·), ∆E_HT^c | H transfer (y to b·), E_HT^b· | H loss (z·), E_HLoss^z· | H loss (b·), E_HLoss^b· | CO loss (b·), E_COLoss^b· | H2 loss (z·), E_H2Loss^z· | radical loss (z·), E_RLoss^z· |
|---|---|---|---|---|---|---|---|---|
| A | 20.1 | 0.0 | 26.6 | 36.2 | 10.2 | 9.2 | 25.1 | |
| C | 19.3 | -3.8 | 26.6 | 28.4 | 10.2 | 9.2 | 16.1 | 16.2 |
| CmC | 24.8 | 0.1 | 26.6 | 31.0 | 10.2 | 9.2 | 25.3 | 15.7 |
| CamC | 32.8 | 0.8 | 26.6 | 28.7 | 10.2 | 9.2 | 22.1 | 15.9 |
| D | 18.5 | 0.5 | 26.6 | 27.6 | 10.2 | 9.2 | 26.3 | 25.1 |
| E | 19.6 | 0.0 | 26.6 | 36.4 | 10.2 | 9.2 | 22.2 | 27.3 |
| F | 19.4 | 0.5 | 26.6 | 33.9 | 10.2 | 9.2 | 26.2 | 51.7 |
| G | 17.4 | -1.1 | 26.6 | 29.9 | 10.2 | 9.2 | 24.3 | |
| H | 18.8 | -0.2 | 26.6 | 29.7 | 10.2 | 9.2 | 24.6 | 50.7 |
| I | 21.3 | 0.9 | 26.6 | 33.1 | 10.2 | 9.2 | 24.5 | 25.5 |
| K | 19.3 | -0.9 | 26.6 | 29.2 | 10.2 | 9.2 | 26.1 | 32.0 |
| L | 18.6 | 0.0 | 26.6 | 31.2 | 10.2 | 9.2 | 24.3 | 26.4 |
| M | 20.4 | 0.0 | 26.6 | 31.6 | 10.2 | 9.2 | 24.3 | 25.3 |
| OxM | 19.0 | -1.2 | 26.6 | 31.7 | 10.2 | 9.2 | 26.0 | 51.0 |
| N | 17.6 | -0.3 | 26.6 | 30.1 | 10.2 | 9.2 | 25.1 | 28.5 |
| P | 45.6 | 0.7 | 26.6 | 21.6 | 10.2 | 9.2 | 28.5 | |
| Q | 19.1 | -0.1 | 26.6 | 28.3 | 10.2 | 9.2 | 24.0 | 28.5 |
| R | 17.3 | -2.2 | 26.6 | 29.3 | 10.2 | 9.2 | 25.2 | 33.7 |
| S | 17.5 | 0.1 | 26.6 | 28.0 | 10.2 | 9.2 | 22.9 | 28.8 |
| T | 18.1 | 0.4 | 26.6 | 27.1 | 10.2 | 9.2 | 21.6 | 28.2 |
| V | 20.7 | 0.0 | 26.6 | 34.1 | 10.2 | 9.2 | 24.5 | 58.3 |
| W | 19.2 | -1.1 | 26.6 | 32.2 | 10.2 | 9.2 | 23.4 | 31.2 |
| Y | 21.5 | 0.0 | 26.6 | 44.9 | 10.2 | 9.2 | 25.1 | 54.7 |
| A factor | 1.33 × 10^4 | | 1.33 × 10^4 | 1.81 × 10^4 | 1.81 × 10^4 | 2.15 × 10^4 | 1.81 × 10^3 | 1.18 × 10^4 |
| typical k (s^-1) (a) | 5.5 | | 0.43 | 0.10 | 342 | 599 | 0.14 | 0.035 |

(a) Typical rate constant k is calculated based on the average activation energy and effective temperature of 309.1 K.
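The "typical k" entries for the residue-independent columns of Table 3 can be reproduced from eq 4. The sketch below is an illustration, not the author's code; it assumes that the activation energies are expressed in kJ/mol (the table does not state the unit, but this choice reproduces the tabulated rate constants), and the function name `arrhenius_rate` is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K); assumes Ea is given in kJ/mol


def arrhenius_rate(a_factor, ea_kj_per_mol, t_eff):
    """Rate constant from eq 4: k = A exp(-Ea / (R Teff))."""
    return a_factor * math.exp(-ea_kj_per_mol / (R_KJ * t_eff))


T_EFF_ETD = 309.1  # K, effective temperature quoted in the Table 3 footnote

# Residue-independent ETD pathways from Table 3: (A factor, activation energy).
k_h_transfer_y_to_b = arrhenius_rate(1.33e4, 26.6, T_EFF_ETD)   # tabulated: 0.43
k_h_loss_b = arrhenius_rate(1.81e4, 10.2, T_EFF_ETD)            # tabulated: 342
k_co_loss_b = arrhenius_rate(2.15e4, 9.2, T_EFF_ETD)            # tabulated: 599
```

For the residue-dependent columns the footnote states that the average activation energy is used, so the same function applies once the column average is supplied.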

Table 4. Residue-Specific Parameters (Activation Energies and A Factors) Involving Slow ECD Pathways

| residue | H transfer (c to z·), E_HT^z· | H transfer (c to z·), ∆E_HT^c | H transfer (y to b·), E_HT^b· | H loss (z·), E_HLoss^z· | H loss (b·), E_HLoss^b· | CO loss (b·), E_COLoss^b· | H2 loss (z·), E_H2Loss^z· | radical loss (z·), E_RLoss^z· |
|---|---|---|---|---|---|---|---|---|
| A | 19.8 | 0.0 | 18.5 | 53.7 | 10.5 | 11.6 | 42.2 | |
| C | 29.9 | -3.4 | 18.5 | 48.9 | 10.5 | 11.6 | 40.3 | 19.2 |
| CmC | 25.6 | 0.3 | 18.5 | 48.9 | 10.5 | 11.6 | 40.3 | 12.9 |
| CamC | 35.3 | 0.5 | 18.5 | 49.0 | 10.5 | 11.6 | 40.3 | 14.3 |
| D | 19.2 | 0.6 | 18.5 | 49.8 | 10.5 | 11.6 | 40.6 | 31.0 |
| E | 18.3 | -1.3 | 18.5 | 52.4 | 10.5 | 11.6 | 42.7 | 28.5 |
| F | 18.7 | 0.2 | 18.5 | 49.6 | 10.5 | 11.6 | 42.3 | 58.8 |
| G | 11.3 | -1.3 | 18.5 | 49.9 | 10.5 | 11.6 | 39.9 | |
| H | 20.9 | 0.7 | 18.5 | 48.9 | 10.5 | 11.6 | 40.7 | 47.2 |
| I | 23.3 | -0.6 | 18.5 | 49.6 | 10.5 | 11.6 | 41.3 | 31.7 |
| K | 24.8 | 1.5 | 18.5 | 52.4 | 10.5 | 11.6 | 40.3 | 44.6 |
| L | 19.1 | -0.2 | 18.5 | 51.5 | 10.5 | 11.6 | 41.5 | 27.7 |
| M | 17.4 | -0.2 | 18.5 | 48.9 | 10.5 | 11.6 | 40.4 | 24.7 |
| OxM | 19.2 | 0.5 | 18.5 | 49.4 | 10.5 | 11.6 | 40.3 | 44.6 |
| N | 18.1 | -0.3 | 18.5 | 54.4 | 10.5 | 11.6 | 42.5 | 36.6 |
| P | 20.5 | -0.1 | 18.5 | 48.8 | 10.5 | 11.6 | 40.4 | |
| Q | 18.5 | -1.3 | 18.5 | 49.2 | 10.5 | 11.6 | 42.5 | 31.6 |
| R | 6.5 | 4.5 | 18.5 | 51.3 | 10.5 | 11.6 | 43.5 | 49.8 |
| S | 16.0 | -0.5 | 18.5 | 38.1 | 10.5 | 11.6 | 31.5 | 40.9 |
| T | 20.1 | -0.6 | 18.5 | 29.4 | 10.5 | 11.6 | 26.6 | 46.3 |
| V | 22.7 | -1.4 | 18.5 | 52.3 | 10.5 | 11.6 | 41.8 | 39.0 |
| W | 18.6 | -1.1 | 18.5 | 47.8 | 10.5 | 11.6 | 41.8 | 43.8 |
| Y | 20.8 | -0.7 | 18.5 | 49.2 | 10.5 | 11.6 | 42.7 | 60.4 |
| A factor | 2.62 × 10^4 | | 2.62 × 10^4 | 3.29 × 10^4 | 3.29 × 10^4 | 1.22 × 10^4 | 1.48 × 10^3 | 1.18 × 10^4 |
| typical k (s^-1) (a) | 93 | | 149 | 0.04 | 1750 | 477 | 0.02 | 0.42 |

(a) Typical rate constant k is calculated based on the average activation energy and effective temperature of 430.5 K.

the residue to the left (HA_L^BB) and right (∆HA_R^BB) of the site. The value for ∆HA_R^BB is set to zero for alanine (Tables 1-2). Because the nature of these parameters is unknown, the values of HA are purely relative, as opposed to GB values, which are close to their real physical values with a unit of kJ/mol. The relative nature of the HA values does not affect the mathematics in the model.

4. Attachment of an H· to the side chain may cause instant even-electron neutral loss from the side chain. Two neutral losses from each side chain were included in the model (Table 6). These neutral losses are largely taken from the literature [22-24,27]. For neutral losses that are not reported in the literature, the author made several guesses based on the structure of each side chain, and the two guesses (for each residue) that match the experimental data the best (maximum average similarity after model optimization) were implemented in the model. See Supporting Information Figure S-2 for examples of some possible side-chain neutral losses derived this way. Please note that extra work may be needed to

Table 5. Optimized Values of Other Miscellaneous Parameters for ETD and ECD

| used for calculating | ETD | ECD | eq |
|---|---|---|---|
| effective temperature | T_0 = 309.1, c_r = 0.24 | T_0 = 430.5, c_r = 167.2 | 5 |
| supplemental activation effective temperature | T_SA = 236.5, c_SA^T = 240.4, p_M^T = 0.48, p_z^T = -0.51 | | 6 |
| proton distribution | F_Coulomb^GB = 5.3, F_SaltBridge^GB = 0.61 | F_Coulomb^GB = 5.82, F_SaltBridge^GB = 1.38 | 7 |
| proton transfer | A_PT = 4.58, p_r^PT = 0.1, p_z^PT = 2.5, p_M^PT = 0.39 | A_PT = 2.29, p_r^PT = 3.7, p_z^PT = 2.15, p_M^PT = -2.0 | 8 |
| electron attachment | A_EA = 27.7, p_z^EA = 1.94, p_M^EA = -0.40, p_r0^EA = 0.33, p_zr^EA = -0.62 | A_EA = 4.16, p_z^EA = 0.67, p_M^EA = 0.74, p_r0^EA = 5.7, p_zr^EA = 0.44 | 9-10 |
| backbone cleavage | f_yb = 1.62 | f_yb = 0.50 | 14-15 |
| neutral loss | r_RLoss^z· = 0.10 | r_RLoss^z· = 0.26 | 22 |
| complex dissociation | A_d = 2.43 × 10^4, E_d^0 = -14.7, E_d^M = 77.5, p_d^M1 = 0.67, p_d^M2 = 0.30, E_d^Coulomb = 454.9, p_Coulomb^M = 0.30 | A_d = 2.89 × 10^4, E_d^0 = -18.3, E_d^M = 67.9, p_d^M1 = 0.75, p_d^M2 = 0.45, E_d^Coulomb = 364.3, p_Coulomb^M = 0.32 | 26-28 |
| trapping | E_0^trap = 0.007, p_E^trap = -0.17, f_1^trap = 2.86, f_2^trap = 67.4, p_q1^trap = 5.12, p_q2^trap = 1.25, p_z^trap = 0.81 | E_0^trap = 11.1, p_E^trap = 0.66, f_1^trap = 0.014, p_z^trap = 1.98 | 30-33 |
| supplemental activation trapping | E_0^trap = 0.14, p_E^trap = -0.87, f_1^trap = 24.9, f_2^trap = 6.18, p_q1^trap = 8.9, p_q2^trap = 1.76, p_z^trap = -0.50 | | 30 |
| isotope distribution | ∆w1 = 0.34, ∆w2 = 1.31, σ = 0.60 | ∆w1 = 0.00, ∆w2 = 1.56, σ = 0.40 | 34-36 |

Table 6. Neutral Losses (with References) from the Side Chain of Each Residue (Both ETD and ECD) Implemented in the Model (a)

| residue | first side-chain neutral loss (fast) | mass | second side-chain neutral loss (fast) | mass | radical loss from z· (slow) | mass |
|---|---|---|---|---|---|---|
| A | CH4 | 16.0313 | N/A | N/A | N/A | N/A |
| C | H2S | 33.9877 | CH2=S [24] | 45.9877 | ·SH [24] | 32.9799 |
| CmC | CH3COOH | 60.0211 | HSCH2COOH | 91.9932 | ·SCH2COOH [25] | 90.9854 |
| CamC | HSCH2CONH2 | 91.0092 | CH3SCH2CONH2 | 105.0248 | ·SCH2CONH2 [25] | 90.0014 |
| D | CH3COOH [23] | 60.0211 | CO [27] | 27.9949 | ·COOH [26] | 44.9977 |
| E | CO2 | 43.9898 | CH2=CHCOOH [24] | 72.0211 | ·CH2COOH [24] | 59.0133 |
| F | C6H6 | 78.0470 | C6H5-CH3 | 92.0626 | ·C6H5 [26] | 77.0391 |
| G | H2 | 2.0157 | N/A | N/A | N/A | N/A |
| H | C3H3N2-CH3 [22] | 82.0531 | C3H2N2=CH2 | 80.0374 | ·C3H3N2 [26] | 67.0296 |
| I | CH3CH3 | 30.0470 | CH3CH=CHCH3 [24] | 56.0626 | ·CH2CH3 [24] | 29.0391 |
| K | CH2=NH | 29.0266 | CH3(CH2)3NH2 [22] | 73.0891 | ·CH2(CH2)2NH2 [24] | 58.0657 |
| L | CH2=C(CH3)2 [24] | 56.0626 | CH3CH=CH2 | 42.0470 | ·CH(CH3)2 [24] | 43.0548 |
| M | S=CH2 | 45.9877 | CH2=CHSCH3 [22] | 74.0190 | ·CH2SCH3 [24] | 61.0112 |
| OxM | HSOCH3 | 63.9983 | CH3CH2SOCH3 | 92.0296 | ·CH2SOCH3 | 77.0061 |
| N | HCONH2 [22] | 45.0215 | CH3CONH2 [23] | 59.0371 | ·CONH2 [24] | 44.0136 |
| P | N/A | N/A | N/A | N/A | N/A | N/A |
| Q | HCONH2 [22] | 45.0215 | CH2=CHCONH2 [24] | 71.0371 | ·CH2CONH2 [24] | 58.0293 |
| R | HN=CHNH2 [22] | 44.0374 | HN=C(NH2)2 [22] | 59.0483 | ·CH2CH2NHC(NH)NH2 | 86.0718 |
| S | H2O [23] | 18.0106 | CH2=O | 30.0106 | ·OH [26] | 17.0027 |
| T | H2O [23] | 18.0106 | CH3CH2OH [23] | 46.0419 | ·OH [26] | 17.0027 |
| V | CH4 | 16.0313 | CH2(CH3)2 | 44.0626 | ·CH3 [24] | 15.0235 |
| W | C8H6N-CH3 [23] | 131.0735 | C8H7N | 117.0578 | ·C8H6N [24] | 116.0500 |
| Y | CH3-C6H4OH [23] | 108.0575 | C6H5OH | 94.0419 | ·C6H4OH [26] | 93.0340 |
| NH2- | NH3 [23] | 17.0265 | N/A | N/A | N/A | N/A |
| -COOH | CO | 27.9949 | N/A | N/A | N/A | N/A |

(a) Neutral losses without references are author's assumptions.
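The masses listed in Table 6 are monoisotopic masses of the lost neutrals. As a quick consistency check, a few of them can be recomputed from elemental compositions. The sketch below is illustrative only: the parser `monoisotopic_mass` is a hypothetical helper that accepts flat formulas (no parentheses or bond symbols), and the isotope masses are standard values rounded to seven or eight decimals.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "H": 1.00782503,
    "C": 12.0,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}


def monoisotopic_mass(formula):
    """Mass of a flat elemental formula such as 'C2H4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# Flat formulas for selected Table 6 entries and the masses tabulated there.
table6_checks = {
    "CH4": 16.0313,     # first loss from A and V
    "H2S": 33.9877,     # first loss from C
    "C2H4O2": 60.0211,  # CH3COOH, first loss from CmC and D
    "NH3": 17.0265,     # loss from the NH2- terminus
    "CO": 27.9949,      # loss from the -COOH terminus
    "HS": 32.9799,      # radical loss from C
}
deviations = {f: monoisotopic_mass(f) - m for f, m in table6_checks.items()}
```

All deviations computed this way stay below the rounding of the table (four decimals), which supports reading the "mass" columns as monoisotopic values.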

confirm these neutral losses. The fractions (or probabilities, ranging from 0 to 1) of the two side-chain neutral losses are denoted as f_NL1^M and f_NL2^M, respectively (Tables 1 and 2). The remaining fraction (1 - f_NL1^M - f_NL2^M), if not zero, means that nothing happens after H· attachment to the side chain. The superscript M (which stands for molecular ion) is used to distinguish these fractions from radical losses from z· species. Keep in mind that some neutral losses implemented in the model are not observable experimentally, in which case the optimized parameter f_NL^M of the corresponding neutral loss will be close to zero. Examples include the loss of C3H2N2=CH2 (80 u) from the histidine side chain and the loss of C8H7N (117 u) from the tryptophan side chain, which are not observed experimentally.

5. Contribution of any residue to the activation energy of H· transfer from c to z· when the residue is at the z· side (E_HT^z·) or c side (∆E_HT^c) of the cleavage site. ∆E_HT^c is set to zero for alanine. The activation energy of H· transfer from y to b· is assumed to be residue-independent (E_HT^b·) (Tables 3 and 4).
6. Activation energies for H· loss from z· species (E_HLoss^z·) and b· species (E_HLoss^b·). E_HLoss^b· is assumed to be residue-independent (Tables 3 and 4).
7. Activation energies for H2 loss from z· species (E_H2Loss^z·) (Tables 3 and 4).
8. Activation energies for radical losses from z· species (E_RLoss^z·) (Tables 3 and 4). These radical losses are shown in Table 6 (rightmost column) and are largely taken from the literature [24-27] as well as from the author's assumptions.
9. Frequency factors (A) for H· transfer, H· loss, H2 loss, CO loss, and radical losses (Tables 3 and 4).

where TSA is a parameter describing the typical change in Teff due to supplemental activation. M and z are the mass and charge of the ion, respectively. All others are parameters describing the relationship between Teff and M or z. To avoid confusion between power functions and superscripts, pow(x, y) is used to represent xy (x to the yth power) in this article. Proton Distribution. Each protonation site, including backbone amides, side chains, and terminal functional groups, has a GB value. The GB of the backbone (GBBB) is assumed to be independent of its nearby residues, except for the two termini and the N-terminal side of a proline. The GB values of basic side chains (GBSC), however, depend on the number of acidic and basic residues in the precursor, due to possible formation of salt-bridges between the basic residues and acidic residues.39 Figure 3. Flowchart of the ECD/ETD model for spectral prediction.

10. Miscellaneous parameters used for calculating effective temperature, proton distribution, proton transfer rate, electron attachment rate, rate of y/b · -type of cleavage, rate for neutral losses, rate of complex dissociation, trapping efficiency, and isotope pattern (Table 5). The Mathematical Model. Figure 3 shows the simplified flow diagram of the fragmentation model, which is very similar to the model for predicting peptide CID spectra.16 The procedure starts by calculating the proton distribution in the precursor ion, followed by calculating rate constant for each competing pathway, and then by calculating the abundance of each product ion (from eq 3). The procedure is an iterative process in that any “product ion” will become “precursor ion” and undergo further fragmentation if the reaction time allows (tnext > 0). To avoid confusion, we will use the term “precursor ion” for any ions that undergo fragmentation, and the term “parent ion” for the original protonated peptide. The details of the mathematical model are described below. The parameters in the model are denoted using the same symbols as those shown in Tables 1-5. The Effective Temperature. The effective temperature (Teff) for any precursor ion is assumed to relate to the reagent ion concentration by

(

Teff ) T0 + cr

)

r -1 r0

(5)

For ETD, r stands for the concentration of reagent ions, as represented by the AGC target value of the reagent ion in the Thermo Scientific LTQ. For ECD, r stands for the “electron energy” setting in the Thermo Scientific LTQ-FT. The term r0 is a typical value of r (2.0E5 for ETD on LTQ and 5.0 V for ECD on LTQ-FT). The term cr is a parameter representing the dependence of Teff on r. T0 is a parameter representing the effective temperature of an ion under typical conditions (r = r0). During supplemental activation in ETD,38 small m/z ranges covering the charge-reduced species are collisionally activated. The Teff of any ions located within these small m/z ranges is calculated by

$$T_{\mathrm{eff}} = T_0 + T_{SA} + c^{T}_{SA}\cdot\mathrm{pow}\left(\frac{M}{1000},\, p^{T}_{M}\right)\cdot\mathrm{pow}\left(z,\, p^{T}_{z}\right) \tag{6}$$
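As an illustration of eqs 5 and 6, the following minimal Python sketch evaluates the effective temperature. The function and argument names are hypothetical, the numerical values in the usage note are placeholders rather than the optimized values of Table 5, and the form of eq 6 follows the reading given above.

```python
def effective_temperature(r, r0, t0, c_r):
    """Eq 5: Teff = T0 + c_r * (r / r0 - 1)."""
    return t0 + c_r * (r / r0 - 1.0)


def effective_temperature_sa(mass, z, t0, t_sa, c_sa, p_m, p_z):
    """Eq 6 (as read here): Teff = T0 + T_SA + c_SA * (M/1000)^p_M * z^p_z.

    Applies only to ions inside the m/z windows that are collisionally
    activated during supplemental activation.
    """
    return t0 + t_sa + c_sa * (mass / 1000.0) ** p_m * z ** p_z
```

At the typical setting r = r0 the correction term of eq 5 vanishes, so `effective_temperature(2.0e5, 2.0e5, 400.0, 50.0)` simply returns the placeholder T0 of 400.0.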

(38) Swaney, D. L.; McAlister, G. C.; Wirtala, M.; Schwartz, J. C.; Syka, J. E. P.; Coon, J. J. Anal. Chem. 2007, 79, 477–485.


Analytical Chemistry, Vol. 82, No. 5, March 1, 2010

$$GB_{SC} = GB^{0}_{SC} - F^{GB}_{SaltBridge}\cdot\frac{\text{Number of Acidic Residues}}{\text{Number of Basic Residues}} \tag{7}$$

$F^{GB}_{SaltBridge}$ is a parameter representing the decrease of side-chain GB due to the presence of acidic residues. Once the GB values of all groups are known, the distribution of protons across a peptide ion is calculated according to the Boltzmann distribution as described previously, after considering Coulomb repulsion (with factor $F^{GB}_{Coulomb}$).16,17

Proton Transfer and Electron Attachment. The following equation is used to calculate the rate constant of proton transfer (PT) (transfer of a proton from the precursor ion to a reagent ion)

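$$k_{PT} = A_{PT}\cdot\mathrm{pow}\left(\frac{r}{r_0},\, p^{PT}_{r}\right)\cdot\mathrm{pow}\left(\frac{z}{3},\, p^{PT}_{z}\right)\cdot\mathrm{pow}\left(\frac{M}{1000},\, p^{PT}_{M}\right) \tag{8}$$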

where r and r0 are reagent ion concentrations as described in eq 5, z and M are charge and mass of the precursor ion. Others are parameters relating proton transfer rate to the reagent ion concentration as well as precursor ion properties. Proton transfer from a precursor ion produces the same species with one less charge state. The following equation is used to calculate the rate constant of electron attachment (EA)

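$$k_{EA} = A_{EA}\cdot\mathrm{pow}\left(\frac{r}{r_0},\, p^{EA}_{r}\right)\cdot\mathrm{pow}\left(\frac{z}{3},\, p^{EA}_{z}\right)\cdot\mathrm{pow}\left(\frac{M}{1000},\, p^{EA}_{M}\right) \tag{9}$$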

where prEA is a function of z by the following relationship:

$$p^{EA}_{r} = p^{EA}_{r0}\cdot\mathrm{pow}\left(\frac{z}{3},\, p^{EA}_{zr}\right) \tag{10}$$
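Equations 8-10 share one power-law form, which can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and all parameter values must be supplied by the caller (for example, from Table 5).

```python
def power_law_rate(a, r, r0, z, mass, p_r, p_z, p_m):
    """Common form of eqs 8 and 9:
    k = A * (r/r0)^p_r * (z/3)^p_z * (M/1000)^p_M."""
    return a * (r / r0) ** p_r * (z / 3.0) ** p_z * (mass / 1000.0) ** p_m


def electron_attachment_exponent(z, p_r0, p_zr):
    """Eq 10: the reagent exponent for electron attachment depends on z."""
    return p_r0 * (z / 3.0) ** p_zr


def electron_attachment_rate(a_ea, r, r0, z, mass, p_r0, p_zr, p_z, p_m):
    """Eq 9 with the z-dependent exponent of eq 10 substituted in."""
    p_r = electron_attachment_exponent(z, p_r0, p_zr)
    return power_law_rate(a_ea, r, r0, z, mass, p_r, p_z, p_m)
```

With this normalization, a triply charged precursor of mass 1000 u at r = r0 has a rate constant equal to the prefactor A, which gives the prefactors a direct physical reading.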

Radical-Induced Bond Cleavages after Electron Attachment. Attachment of an electron to a precursor ion neutralizes a proton and produces a hydrogen radical H·. The distribution of this H· is possibly far from reaching equilibrium before the cleavage happens, but for the sake of simplicity, the distribution is assumed to be directly related to the hydrogen affinity (HA) of each site and estimated by Boltzmann distribution as shown below. The possible locations of the H· include backbone sites, side chains, and terminal groups. Each of these sites has an HA value. The HA value of each side chain ($HA_{SC}$, Tables 1 and 2) is assumed to be independent of its nearby residues. The HA value of each backbone site, however, is calculated by adding the contributions of the residues to the left ($HA^{BB}_{L}$) and right ($\Delta HA^{BB}_{R}$) of the backbone site (Tables 1 and 2), i.e.,

$$HA^{BB}(A_i) = HA^{BB}_{L}(R_i) + \Delta HA^{BB}_{R}(R_{i+1}) \tag{11}$$

(39) Tsybin, Y. O.; Haselmann, K. F.; Emmett, M. R.; Hendrickson, C. L.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2006, 17, 1704–1711.

In eq 11 and later, Ai stands for the ith backbone amide site counting from the N-terminus, Ri stands for the ith residue counting from the N-terminus, and “∆” indicates that the corresponding value is a small adjustment. The functional groups at the N-terminus and C-terminus are treated the same way as a backbone site when calculating their HA values. With HA values for all sites available, the probability of having a H · on site j, or H · density on site j (hj), is determined by the following equation

$$h_j = \frac{\exp\left(\dfrac{HA_j}{RT_{\mathrm{eff}}}\right)}{\sum_i \exp\left(\dfrac{HA_i}{RT_{\mathrm{eff}}}\right)} \tag{12}$$

Radical-induced cleavages happen immediately after H· attachment. Therefore, the rate constant for radical-induced cleavage of site j is calculated by

$$k_j = k_{EA}\cdot h_j \tag{13}$$

Radical-induced backbone cleavages are either c/z·-type or y/b·-type cleavages. The y/b·-type cleavage is charge-directed and thus also proportional to the charge density of the cleavage site. Therefore, rate constants for backbone cleavages are calculated by

$$k^{cz}_{j} = \frac{k_j}{1 + f_{yb}\cdot p_j} \quad \text{(for c/z cleavages)} \tag{14}$$

$$k^{yb}_{j} = \frac{k_j\cdot f_{yb}\cdot p_j}{1 + f_{yb}\cdot p_j} \quad \text{(for y/b cleavages)} \tag{15}$$

where pj is the proton density at the cleavage site, and fyb is a parameter reflecting the rate of y/b· cleavages. Backbone cleavages generate a noncovalent complex of the two fragments. For cleavage of side chains, two possible even-electron neutral losses may occur for each residue (Table 6), with their respective fractions $f^{M}_{NL1}$ and $f^{M}_{NL2}$ (Tables 1 and 2). The rate constants for neutral losses from the side chain of residue Rj are calculated by

$$k^{M}_{NL1}(R_j) = k_j\cdot f^{M}_{NL1} \tag{16}$$

$$k^{M}_{NL2}(R_j) = k_j\cdot f^{M}_{NL2} \tag{17}$$

For side-chain neutral losses, the noncovalent complex formed by side-chain cleavage is assumed to be unstable and to dissociate immediately.

H· Transfer within a Complex. Radical-induced backbone cleavage produces a noncovalent complex of the two resulting fragments, which can be either a c/z· pair or a y/b· pair. Before the complex dissociates, a hydrogen atom (H·) will transfer from the c part of the complex to the z· part.21 Similarly, an H· will also transfer from the y part to the b· part in a y/b· complex. After hydrogen transfer, the c/z· complex becomes a c·/z′ complex and the y/b· complex becomes a y·/b′ complex. The rate constant of this hydrogen transfer process is calculated by the following Arrhenius equations

$$k^{cz}_{HT} = A^{cz}_{HT}\cdot\exp\left(-\frac{E^{cz}_{HT}}{RT_{\mathrm{eff}}}\right) \tag{18}$$

$$k^{yb}_{HT} = A^{yb}_{HT}\cdot\exp\left(-\frac{E^{yb}_{HT}}{RT_{\mathrm{eff}}}\right) \tag{19}$$

For hydrogen transfer from c to z·, the activation energy $E^{cz}_{HT}$ depends on the residues on the left and right sides of the cleavage site, as calculated by the following equation for a cleavage site between residues Ri and Ri+1

$$E^{cz}_{HT} = E^{z\cdot}_{HT}(R_{i+1}) + \Delta E^{c}_{HT}(R_i) \tag{20}$$

For hydrogen transfer from y to b·, however, the activation energy $E^{yb}_{HT}$ is assumed to be residue independent.

$$E^{yb}_{HT} = E^{b\cdot}_{HT} \tag{21}$$
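The H· density of eq 12 and the branching between c/z· and y/b· cleavages in eqs 13-15 can be illustrated with a short Python sketch. It is not the author's implementation: the function names are hypothetical, and the gas constant is given in kJ/(mol·K) on the assumption that HA values are supplied in kJ/mol.

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ/(mol K); assumes HA values are in kJ/mol


def hydrogen_density(ha_values, t_eff, gas_constant=GAS_CONSTANT):
    """Eq 12: Boltzmann-weighted probability of finding H. on each site."""
    # Subtracting the maximum HA avoids overflow and leaves the ratios unchanged.
    ha_max = max(ha_values)
    weights = [math.exp((ha - ha_max) / (gas_constant * t_eff)) for ha in ha_values]
    total = sum(weights)
    return [w / total for w in weights]


def backbone_cleavage_rates(k_ea, h_j, p_j, f_yb):
    """Eqs 13-15: split the cleavage rate of site j into c/z. and y/b. channels."""
    k_j = k_ea * h_j                      # eq 13
    denom = 1.0 + f_yb * p_j
    k_cz = k_j / denom                    # eq 14
    k_yb = k_j * f_yb * p_j / denom       # eq 15
    return k_cz, k_yb
```

By construction the two channels sum to k_j, so a larger proton density p_j shifts flux toward the charge-directed y/b· channel without changing the total cleavage rate of the site.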

Neutral Losses from z· and b· Species. Radical-induced neutral losses can happen from z· or b· species, regardless of whether they are free fragments or in a complex. These neutral losses include loss of H· and H2 from z·,21 loss of H· and CO from b· (to form b and a·, respectively), and loss of side-chain radicals from z· (see Table 6).24-26 These neutral losses are slow processes, and their rate constants can be calculated from Arrhenius equations. The rate constants for the side-chain radical losses also depend on the distance of the side chain to the N-terminus of the z· species. For each residue away from the N-terminus, the rate decreases by a fixed relative amount $r^{z\cdot}_{RLoss}$. Therefore, the rate constant for the radical loss from the jth residue of a z· species can be expressed as

$$k^{z\cdot}_{RLoss}(R_j) = A^{z\cdot}_{RLoss}\exp\left(-\frac{E^{z\cdot}_{RLoss}}{RT_{\mathrm{eff}}}\right)\cdot\mathrm{pow}\left(r^{z\cdot}_{RLoss},\, j - 1\right) \tag{22}$$
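A minimal Python sketch of eq 22 follows, showing how the Arrhenius term is attenuated geometrically with distance from the N-terminus of the z· species. The function name is hypothetical, and the units (kJ/mol for the activation energy) are an assumption made only for this illustration.

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ/(mol K); assumes activation energies in kJ/mol


def radical_loss_rate(a_rloss, e_rloss, r_rloss, j, t_eff,
                      gas_constant=GAS_CONSTANT):
    """Eq 22: side-chain radical loss from the jth residue of a z. species.

    j = 1 is the N-terminal residue; each further residue multiplies the
    rate by the fixed relative factor r_rloss.
    """
    arrhenius = a_rloss * math.exp(-e_rloss / (gas_constant * t_eff))
    return arrhenius * r_rloss ** (j - 1)
```

For any fixed residue type, the ratio of the rates at positions j + 1 and j equals `r_rloss`, which is the only place where position enters the expression.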

where $E^{z\cdot}_{RLoss}$ is the activation energy of radical loss from residue Rj, and j is the residue number counting from the N-terminus of the z· species (j = 1 for the N-terminal residue).

Dissociation of the Noncovalent Complex. The activation energy for dissociation of the noncovalent complex depends on the masses of the two fragments due to noncovalent interactions and the number of charges in each of the two fragments due to Coulomb repulsion. Therefore, before calculating the rate constant of complex dissociation, we need to calculate the distribution of charges between the two fragments of the complex. The charge distribution is calculated the same way as described previously.17 Briefly, it is assumed that each fragment (c or z·, for example) has no more than four charge states. Let us name the four charge states as z0, z0 + 1, z0 + 2, and z0 + 3 (z0 is an integer) and the abundances (or probabilities) for the fragment to have these charges as A0, A1, A2, and A3, respectively. For any fragment, its average charge state $\bar{z}$ can be calculated by adding proton densities (pi) for all sites in the fragment (calculated previously). The centroid (abundance-weighted average) of the four charge states must also equal the average charge state of the fragment, i.e.

$$\bar{z} = \sum_i p_i = z_0A_0 + (z_0 + 1)A_1 + (z_0 + 2)A_2 + (z_0 + 3)A_3 \tag{23}$$

It is also assumed that $\bar{z}$ is always between z0 + 1 and z0 + 2; therefore, with $\bar{z}$ known from the calculated charge distribution pi, z0 is also known. Please note that the four charge states must be within 0 and z (z is the charge state of the complex). If either z0 or z0 + 3 is outside this range, its abundance (A0 or A3) is automatically set to zero. Otherwise the abundances are determined as follows. First we need to calculate A0, the probability of a fragment to have z0 charges. Let us take a c fragment for example. Since this c fragment has a minimum of z0 charges, at least z0 sites are fully occupied with protons. Therefore, we first find the z0 sites with the largest proton densities and assume a proton density of 1 for all these sites. A0 then corresponds to the probability of zero protons in all remaining sites in the fragment. A0 is estimated from the following equation, assuming the remaining protons are distributed in a way that is proportional to the previously calculated proton distribution.

$$A_0 = \prod_{k=1}^{N_c}\left(1 - \frac{p_k\,(z - z_0)}{\sum_{i=k}^{N} p_i}\right) \tag{24}$$

where Nc represents the total number of remaining sites in this c fragment, N represents the total number of remaining sites in the complex, z - z0 is the total number of remaining charges in the complex, and pi represents the previously calculated proton density on a remaining site i of the precursor. A0 of the complementary z· fragment can be calculated the same way. Please note that A0 of a c fragment equals A3 of its complementary z· fragment and A0 of a z· fragment equals A3 of its complementary c fragment, and, as a result, A0 and A3 of both c and z· fragments are determined from the above calculation. With A0 and A3 known, considering eq 23 and that

$$A_0 + A_1 + A_2 + A_3 = 1 \tag{25}$$

A1 and A2 can be obtained by solving the two equations. Due to Coulomb repulsion in the complex, each of the four possible charge distributions has a different dissociation activation energy and therefore a different dissociation rate constant. For each charge distribution (e.g., zn charges on the N-terminal fragment and zc charges on the C-terminal fragment, where both zn and zc are integers and zn + zc = z), the kinetic energy release (KER) after dissociation can be estimated as

$$KER = \frac{E^{Coulomb}_{d}\cdot z_n\cdot z_c}{\mathrm{pow}\left(M_n,\, p^{M}_{Coulomb}\right) + \mathrm{pow}\left(M_c,\, p^{M}_{Coulomb}\right)} \tag{26}$$

where Mn and Mc are masses of the N-terminal and C-terminal fragments, respectively, and $E^{Coulomb}_{d}$ and $p^{M}_{Coulomb}$ are parameters related to Coulomb repulsion. The activation energy of complex dissociation is related to the masses of the two fragments and KER by

$$E_d = E^{0}_{d} + E^{M}_{d}\cdot\mathrm{pow}\left(\frac{M_1}{1000},\, p^{M1}_{d}\right)\cdot\mathrm{pow}\left(\frac{M_2}{1000},\, p^{M2}_{d}\right) - KER \tag{27}$$

where M1 is the mass of the smaller fragment, and M2 is the mass of the larger fragment. $E^{0}_{d}$, $E^{M}_{d}$, $p^{M1}_{d}$, and $p^{M2}_{d}$ are parameters reflecting the relationship of the activation energy and the fragment masses as well as KER. The rate constant of complex dissociation for that particular backbone cleavage and charge distribution i is therefore calculated by

$$k^{i}_{d} = A_d\cdot A_i\cdot\exp\left(-\frac{E_d}{RT_{\mathrm{eff}}}\right) \tag{28}$$
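Solving eqs 23 and 25 for A1 and A2, and evaluating eqs 26-28, can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, energies are assumed to be in kJ/mol, and the closed form for A1 and A2 is simply the solution of the two linear equations once A0 and A3 are known.

```python
import math

GAS_CONSTANT = 8.314e-3  # kJ/(mol K); assumes energies in kJ/mol


def middle_abundances(z_bar, z0, a0, a3):
    """Solve eqs 23 and 25 for A1 and A2 given A0, A3, and the mean charge."""
    remaining = 1.0 - a0 - a3          # A1 + A2, from eq 25
    centroid = z_bar - z0              # A1 + 2*A2 + 3*A3, from eq 23
    a2 = centroid - 3.0 * a3 - remaining
    a1 = remaining - a2
    return a1, a2


def kinetic_energy_release(e_coulomb, z_n, z_c, m_n, m_c, p_m):
    """Eq 26: zero whenever one of the two fragments carries no charge."""
    return e_coulomb * z_n * z_c / (m_n ** p_m + m_c ** p_m)


def dissociation_rate(a_d, a_i, e0, e_m, m1, m2, p_m1, p_m2, ker, t_eff,
                      gas_constant=GAS_CONSTANT):
    """Eqs 27 and 28: rate constant for one charge distribution i."""
    e_d = e0 + e_m * (m1 / 1000.0) ** p_m1 * (m2 / 1000.0) ** p_m2 - ker
    return a_d * a_i * math.exp(-e_d / (gas_constant * t_eff))
```

Because KER lowers the activation energy in eq 27, a charge distribution that places charges on both fragments dissociates faster than one that leaves a fragment neutral, all else being equal.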

where Ad is the frequency factor for complex dissociation, and Ai is the abundance of charge distribution i. Note that when either zn or zc is zero, KER is zero, and thus Ed is large. Therefore, doubly charged parent ions (z = 1 after electron attachment) have low fragmentation efficiency. Additionally, backbone cleavages generating fragment ions with the same number of charges as the precursor are usually not seen, unless the cleavage site is very close to one of the termini (small M1). For example, for a triply charged parent ion (charge of the complex z = 2), complex dissociation is much faster when both fragments have one charge than when both charges are on a single fragment.

Further Fragmentation. Once rate constants for all pathways are calculated, the abundance of each product ion at reaction time t can be calculated from eq 3, and the abundance of the remaining precursor ion can be calculated from eq 1. All product ions are subjected to further calculation with decreased fragmentation time. Teff remains unchanged, except during the supplemental activation process, in which the Teff of an ion is calculated from eq 6 if it is activated. The average fragmentation time left for the next process (designated as tnext) is assumed to start when half of the precursor ions in the current process are fragmented. The equation is derived to be (derivation not shown)

$$t_{next} = t - \frac{1}{k_{total}}\ln\frac{2}{1 + \exp(-k_{total}\,t)} \tag{29}$$

where t is the fragmentation time for the current process. With time available for the next round of fragmentation, each ion is subject to further calculation to generate more product ions. The process continues until the activation time runs out (tnext ∼ 0).
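Equation 29 can be checked numerically with the short Python sketch below. The function name is hypothetical. The check assumes first-order decay of the precursor, so that the fraction fragmented after time t′ is 1 − exp(−k_total·t′); under that assumption, the time subtracted in eq 29 is the time at which half of the ions that will fragment within t have done so.

```python
import math


def time_for_next_round(t, k_total):
    """Eq 29: reaction time left for the products of the current round."""
    t_half = math.log(2.0 / (1.0 + math.exp(-k_total * t))) / k_total
    return t - t_half


# Consistency check: at t_half, half of the ions that fragment within t
# have already fragmented (first-order kinetics assumed).
t, k_total = 0.5, 2.0
t_half = t - time_for_next_round(t, k_total)
fragmented_at_half = 1.0 - math.exp(-k_total * t_half)
fragmented_at_end = 1.0 - math.exp(-k_total * t)
```

For a fast reaction (k_total·t much greater than 1) the subtracted time approaches ln 2 / k_total, the usual half-life, so almost the full reaction time remains available to the product ions.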

Ion Loss during Fragmentation. After each fragmentation event, not all fragment ions can be successfully trapped and eventually analyzed. Ion recoveries after each fragmentation event are assumed to depend on the kinetic energy of the fragment ions as well as the q values in the case of a quadrupole ion trap. The ion recovery after a fragmentation event in ETD and supplemental activation after ETD is estimated by

$$\begin{aligned} P = P_0 &\cdot\left\{1 - \exp\left[-f^{trap}_{1}\cdot\mathrm{pow}\left(z,\, p^{trap}_{z}\right)\cdot\mathrm{pow}\left(KER + E^{trap}_{0},\, p^{trap}_{E}\right)\cdot\mathrm{pow}\left(0.908 - q_z,\, p^{trap}_{q1}\right)\right]\right\} \\ &\cdot\left\{1 - \exp\left[-f^{trap}_{2}\cdot\mathrm{pow}\left(z,\, p^{trap}_{z}\right)\cdot\mathrm{pow}\left(KER + E^{trap}_{0},\, p^{trap}_{E}\right)\cdot\mathrm{pow}\left(q_z,\, p^{trap}_{q2}\right)\right]\right\} \end{aligned} \tag{30}$$

where KER is the kinetic energy release calculated from eq 26, $E^{trap}_{0}$ is a parameter reflecting the average kinetic energy of trapped ions, qz is the q value of the ion as shown below, and z is the charge state of the fragment ion. All other terms are parameters related to charge, KER, and q value of the fragment ion for calculating ion recovery. Please note that parameters shown in the equations have different values for the ETD process and the supplemental activation process. In this work, all ETD data were collected on Thermo Scientific LTQ or LTQ-Orbitrap systems, in which the ETD process happens at the q value of 0.4 for the reagent ion (fluoranthene) at m/z = 202, and the supplemental activation process happens at the q value of 0.15 for the charge-reduced species (Jae Schwartz of Thermo Scientific, personal communication). Therefore, the qz value of a fragment ion with m/z value of m is calculated by

$$q_z = \frac{0.4\times 202}{m} \quad \text{during ETD} \tag{31}$$

$$q_z = \frac{0.15\, m_{reduced}}{m} \quad \text{during supplemental activation after ETD} \tag{32}$$
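The q values of eqs 31 and 32 and the recovery expression of eq 30 can be sketched in Python as follows. The function names are hypothetical, all trapping parameters are passed in by the caller rather than taken from Table 5, and returning zero recovery for q_z at or above 0.908 is an assumption added here to keep the power term real-valued; the text does not state how that case is handled.

```python
import math


def q_during_etd(mz):
    """Eq 31: q value of a fragment at m/z = mz during ETD."""
    return 0.4 * 202.0 / mz


def q_during_supplemental_activation(mz, mz_reduced):
    """Eq 32: q value during supplemental activation after ETD."""
    return 0.15 * mz_reduced / mz


def ion_recovery(p0, z, ker, q_z, f1, f2, p_z, e0, p_e, p_q1, p_q2):
    """Eq 30: fraction of fragment ions that remain trapped."""
    if q_z >= 0.908:
        return 0.0  # assumption: treat ions beyond the 0.908 term's range as lost
    common = z ** p_z * (ker + e0) ** p_e
    low_mass_term = 1.0 - math.exp(-f1 * common * (0.908 - q_z) ** p_q1)
    high_mass_term = 1.0 - math.exp(-f2 * common * q_z ** p_q2)
    return p0 * low_mass_term * high_mass_term
```

The two bracketed factors act at opposite ends of the mass range: the first falls toward zero as q_z approaches 0.908 (low m/z), and the second falls toward zero as q_z approaches 0 (high m/z).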

where mreduced is the m/z value of the charge-reduced species. During ECD, the ion recovery after fragmentation is calculated by

$$P = P_0\cdot\exp\left[-f^{trap}_{1}\cdot\mathrm{pow}\left(z,\, p^{trap}_{z}\right)\cdot\mathrm{pow}\left(KER + E^{trap}_{0},\, p^{trap}_{E}\right)\right] \tag{33}$$

Again, these parameters have different values from those of the ETD process.

Adding Ions to the Simulated Spectrum. After the simulation is complete, all product ions as well as the remaining parent ions are added to the simulated spectrum. The isotope pattern of each ion must be calculated before adding its isotope envelope to the spectrum. The details of the calculation have been described previously.17 Briefly, the parent-ion isotope pattern is first estimated based on the mass of the ion, followed by calculation of the truncated isotope distribution (due to isolation) of the parent ion using the isolation efficiency calculated from the following equations

$$\mathit{eff}_{iso}(m) = \exp\left[-\frac{1}{2}\left(\frac{m - m_0 + w/2 - \Delta w_1}{\sigma}\right)^{2}\right], \quad \text{when } m < m_0 - w/2 + \Delta w_1 \tag{34}$$

$$\mathit{eff}_{iso}(m) = 1, \quad \text{when } m \ge m_0 - w/2 + \Delta w_1 \text{ and } m \le m_0 + w/2 - \Delta w_2 \tag{35}$$

$$\mathit{eff}_{iso}(m) = \exp\left[-\frac{1}{2}\left(\frac{m - m_0 - w/2 + \Delta w_2}{\sigma}\right)^{2}\right], \quad \text{when } m > m_0 + w/2 - \Delta w_2 \tag{36}$$

where effiso is the isolation efficiency at a specified m/z value, m is the m/z value of any isotope, m0 is the m/z value of the most abundant isotope, w is the isolation width used in the experiment, and ∆w1, ∆w2, and σ are three parameters in the model. With isolation efficiency available, the abundance of each isotope of the parent ion can be calculated. The isotope distribution of each fragment ion is calculated by assuming that all isotopes are distributed evenly across the entire molecule.

Training of the Model. The model is trained using a large number of ETD/ECD spectra of known peptides by examining the similarity scores between the predicted and experimental spectra. The similarity score between two spectra is defined as16

$$\mathrm{Similarity} = \frac{\sum_m \sqrt{I_{1m}\, I_{2m}}}{\sqrt{\left(\sum_m I_{1m}\right)\left(\sum_m I_{2m}\right)}} \tag{37}$$

The above definition of spectral similarity represents the dot product of the two spectra (cosine of spectral angle),40 after a square-root transformation of the intensity of each signal. Equation 37 is preferred over the conventional dot product because the conventional dot product does not put enough emphasis on low-intensity ions, which often contain a large amount of sequence information. A logarithm transformation, however, was found to cause too much interference from noise. A square-root transformation before taking the dot product is a good compromise between the two extremes. The correlation coefficient of two spectra has also been used for evaluating spectral similarities.41 The correlation coefficient of two spectra represents the dot product of the two spectra after shifting the origin of each spectral vector from zero to the average value of all signals in the spectrum, and, therefore, has the same problem as the conventional dot product. The best match between the simulated and experimental spectra is obtained when the average similarity of all spectra in the training data set is maximized. To avoid overfitting the model, the deviations of some parameters among different residues are minimized at the same time as the spectral similarities are maximized. Specifically, the best set of parameters is obtained when the following function is maximized

$$\begin{aligned} f = \bar{s} &- \frac{w_1}{N}\sqrt{\sum\left(HA^{BB}_{L} - \overline{HA^{BB}_{L}}\right)^{2}} - \frac{w_2}{N}\sqrt{\sum\left(\Delta HA^{BB}_{R}\right)^{2}} - \frac{w_3}{N}\sqrt{\sum\left(\Delta E^{c}_{HT}\right)^{2}} \\ &- \frac{w_4}{N}\sqrt{\sum\left(E^{z\cdot}_{HT} - \overline{E^{z\cdot}_{HT}}\right)^{2}} - \frac{w_5}{N}\sqrt{\sum\left(E^{z\cdot}_{HLoss} - \overline{E^{z\cdot}_{HLoss}}\right)^{2}} \\ &- \frac{w_6}{N}\sqrt{\sum\left(E^{z\cdot}_{H2Loss} - \overline{E^{z\cdot}_{H2Loss}}\right)^{2}} - \frac{w_7}{N}\sqrt{\sum\left(f^{M}_{NL1} + f^{M}_{NL2} - 1\right)^{2}} \end{aligned} \tag{38}$$

where $\bar{s}$ is the average similarity score for all spectra in the training data set, N is the total number of spectra in the training data set, and w1–w7 are weight factors. The weight factors are selected to be the largest numbers that do not decrease the optimized average similarity $\bar{s}$ significantly. In the work described here, w1–w6 are set at 0.001 and w7 is set at 0.2.

EXPERIMENTAL SECTION

To train the described mathematical model for predicting peptide ETD/ECD spectra, a training data set containing 7494 peptide ETD spectra and a training set containing 6905 ECD spectra were used. The ETD training set was generated from 4804 doubly charged and 2690 multiply charged (3+ to 9+) peptide ions (3-69 residues, less than 7600 u in mass) with or without supplemental activation. ETD spectra of the same ion acquired at different conditions (reaction time, reagent AGC target value, etc.) were treated as different spectra. ETD spectra were collected on a Thermo Scientific LTQ XL mass spectrometer and a Thermo Scientific LTQ-Orbitrap XL in centroid mode with isolation width of 1-5, reaction time of 10-500 ms, and reagent target value of 1E5 to 5E5, using singly charged fluoranthene anions as the reagent. Most peptides were generated from proteolytic digestion of recombinant proteins made at Amgen (Thousand Oaks, CA) and commercial proteins purchased from Sigma-Aldrich (Saint Louis, MO). Each peptide was identified from its tandem mass spectra, including both CID and ETD, using the custom-written software MassAnalyzer,19 and validated manually. The sequences of these peptides are virtually random due to the different specificities of the proteases used in the digestion. Proteases used for digestion include trypsin, endoproteinases Lys-C, Glu-C, and Asp-N, and pepsin. Most spectra were collected with reversed-phase liquid chromatography/tandem mass spectrometry (LC/MS/MS) at ∼200 µL/min flow rate using an acetonitrile gradient and either 0.02-0.1% TFA or 0.1-0.2% formic acid in the mobile phase.
A testing ETD data set generated from tryptic digestion as well as combined trypsin/Glu-C digestion of bovine catalase and rabbit phosphorylase (Sigma-Aldrich) was used to validate the accuracy of the prediction. The testing data set contains 591 ETD spectra of 428 doubly charged and 163 multiply charged (3+ to 5+) peptide ions (4-37 residues in length, mass 500-4000 u) acquired on a Thermo Scientific LTQ-XL instrument in centroid mode with isolation width of 3 or 4, reaction time of 100 or 125 ms, and reagent target value of 1E5 or 2E5. Other instrument parameters for ETD are similar to those described above for acquiring spectra in the training data set.

(40) Stein, S. E.; Scott, D. R. J. Am. Soc. Mass Spectrom. 1994, 5, 859–866.
(41) Budnik, B. A.; Nielsen, M. L.; Olsen, J. V.; Haselmann, K. F.; Horth, P.; Haehnel, W.; Zubarev, R. A. Int. J. Mass Spectrom. 2002, 219, 283–294.

An ECD data set was obtained from the following five major sources. All data were acquired on Thermo Scientific LTQ-FT instruments.

1. A collection of ECD spectra of doubly and triply charged peptides derived from tryptic protein digests was kindly provided by Roman Zubarev.27 From the collection, 4155 doubly and 1470 triply charged high-quality spectra were selected based on the absence of interferences near the parent ions.
2. Tryptic digestion of proteins extracted from human lymphocytes and analyzed by two-dimensional LC/MS/MS (data provided by Shenheng Guan of University of California-San Francisco). The data were searched against a human database using Sequest (Thermo Scientific), and the confidently identified major proteins were used for further peptide identification using both CID and ECD data by MassAnalyzer. Peptide identifications were validated manually.
3. Tryptic digestion of human serum (data provided by Paul Auger/Mike Davis of Amgen), searched with Sequest against a human database and then analyzed by MassAnalyzer using CID and ECD for peptide identification. All spectra were validated manually.
4. Tryptic and Lys-C digestion of several commercial protein standards as well as recombinant proteins made by Amgen (analyzed by Paul Auger of Amgen). The data were analyzed by MassAnalyzer and validated manually.
5. Direct infusion or LC/MS/MS analysis of several commercially available peptides.

The resulting ECD data set contains 7673 ECD spectra of 4919 doubly charged and 2754 multiply charged (3+ to 6+) peptide ions (4-44 residues in length, mass 500-4500 u). The “electron energy” setting was 4-5.5 V, the reaction time 25-400 ms, and the isolation width 2-10. From the ECD data set, ∼10% of the spectra were randomly selected as the testing set, and the remaining 90% were used as the training set.

A computer program, written in Microsoft C++, was developed for simulating ETD/ECD spectra and refining the model. The program is incorporated into MassAnalyzer,19 a program for fully automated protein and peptide LC/MS/MS data analyses, through a dynamically linked library (DLL). A static library for Linux is used on a Linux cluster to perform the training of the model. ETD/ECD spectra in the training data set were simulated with varied parameters until function f described in eq 38 was maximized.
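The similarity score of eq 37, whose average the training maximizes, can be sketched in a few lines of Python. This is an illustration rather than the MassAnalyzer code: the function names are hypothetical, and the two intensity lists are assumed to be already aligned, so that position m in both lists refers to the same ion (unmatched ions carry an intensity of zero in the other spectrum).

```python
import math


def similarity(intensities_1, intensities_2):
    """Eq 37: dot product of the two spectra after square-root transformation."""
    numerator = sum(math.sqrt(a * b) for a, b in zip(intensities_1, intensities_2))
    denominator = math.sqrt(sum(intensities_1) * sum(intensities_2))
    return numerator / denominator if denominator > 0.0 else 0.0


def dot_product(intensities_1, intensities_2):
    """Conventional normalized dot product (cosine of the spectral angle)."""
    numerator = sum(a * b for a, b in zip(intensities_1, intensities_2))
    norm_1 = math.sqrt(sum(a * a for a in intensities_1))
    norm_2 = math.sqrt(sum(b * b for b in intensities_2))
    return numerator / (norm_1 * norm_2) if norm_1 > 0.0 and norm_2 > 0.0 else 0.0
```

Both scores equal 1 for spectra that differ only by a scale factor and 0 for spectra with no ions in common. They differ in how strongly a missing weak ion is penalized, which is the reason given in the text for preferring the square-root form.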
When simulation was performed on a desktop computer with a 3.33 GHz Intel Core2Duo processor (only one CPU was used), for all the spectra with precursor mass less than 4000 u in the training data sets, an average simulation speed of 260 spectra/s was achieved for ETD spectra and 860 spectra/s for ECD spectra, with larger peptides much slower than smaller peptides. A function optimization routine was developed on a Linux cluster and was used to optimize parameters in the model. The routine is an iterative process in which each parameter in the model is varied until the function described in eq 38 is maximized. The process repeats until no further optimization can be achieved.

RESULTS

Many parameters were tested for their effects on the peptide fragmentation pattern. Those parameters determined to have significant effects were included in the model, and their optimized values are presented in Tables 1-5. Attempts to develop simpler models with fewer parameters usually generated a significantly poorer fit to the experimental spectra. The developed ETD and ECD models, with their respective sets of optimized parameters, were used to simulate the spectra in the training and testing data sets. The predicted spectra were compared to the experimental spectra in each data set, and the distributions of similarity scores are shown in Table 7. For fair comparison, peptides larger than 4000 u are not included in the calculation of similarity score distributions. For readers more familiar with other similarity scores, distributions of spectral similarities calculated using dot product (DP) and correlation coefficient (R) are also listed in Table 7. It is seen from Table 7 that the similarity score distribution for each testing set is very similar to the distribution in the corresponding training set, demonstrating the validity of the ETD and ECD models.

Table 7. Distribution (Average ± Standard Deviation) of Similarity Scores of Predicted Spectra vs Experimental Spectra in the Training Data Sets and Testing Data Sets for Different Fragmentation Techniques and Charge States (footnotes a, b)

| Technique | Score | 2+ training | 2+ testing | ≥3+ training | ≥3+ testing |
|---|---|---|---|---|---|
| ETD | SIM | 0.903 ± 0.055 | 0.905 ± 0.050 | 0.805 ± 0.070 | 0.825 ± 0.078 |
| ETD | DP | 0.946 ± 0.073 | 0.948 ± 0.054 | 0.815 ± 0.113 | 0.834 ± 0.104 |
| ETD | R | 0.908 ± 0.101 | 0.923 ± 0.064 | 0.779 ± 0.127 | 0.802 ± 0.120 |
| ETD | n | 2523 | 255 | 2199 | 163 |
| ETD with supplemental activation | SIM | 0.870 ± 0.057 | 0.870 ± 0.051 | 0.776 ± 0.087 | - |
| ETD with supplemental activation | DP | 0.936 ± 0.072 | 0.939 ± 0.059 | 0.816 ± 0.160 | - |
| ETD with supplemental activation | R | 0.914 ± 0.082 | 0.921 ± 0.065 | 0.775 ± 0.183 | - |
| ETD with supplemental activation | n | 2281 | 173 | 179 | 0 |
| ECD | SIM | 0.879 ± 0.071 | 0.878 ± 0.079 | 0.836 ± 0.090 | 0.838 ± 0.090 |
| ECD | DP | 0.976 ± 0.040 | 0.977 ± 0.034 | 0.941 ± 0.117 | 0.948 ± 0.101 |
| ECD | R | 0.949 ± 0.040 | 0.947 ± 0.034 | 0.904 ± 0.139 | 0.913 ± 0.123 |
| ECD | n | 4405 | 514 | 2494 | 254 |

a Ions are considered the same when their m/z values differ by less than 0.4 u for LTQ data and less than 0.01 u for LTQ-FT or LTQ-Orbitrap data. b SIM: similarity score (eq 37). DP: dot product. R: correlation coefficient.

Figures 4-6 show some representative simulated ETD, ETD with supplemental activation, and ECD spectra of protonated peptides in the testing data set, as compared to their experimental spectra. The left side of each figure shows the entire mass range of the fragmentation spectra, and the right side shows a small mass range of the same spectrum to illustrate the isotope details of each fragment ion. In order to make weak fragment ions more visible, the intensities of ions in the full-range spectra (on the left) are in square-root scale. As demonstrated in the full-range spectra, the general fragmentation efficiencies and patterns are accurately predicted for different charge states and fragmentation techniques. The details of each fragment subtype are shown in the zoom-view spectra on the right. It can be seen that most fragment ions, including c, c·, z·, z′, y, and a· ions and some neutral losses from these ions, as well as charge-reduced species, are predicted accurately. Most H· transfer, H· and H2 losses, and isotope patterns are also predicted well.

DISCUSSION

Although the detailed mechanism of ETD/ECD is not clear, an empirical mathematical model for predicting peptide ETD/ECD spectra is possible because the model is relatively insensitive to minor differences in the fragmentation mechanism. For example, backbone fragmentation depends on the hydrogen affinity (HA) of each backbone site.
If the true mechanism of backbone fragmentation were demonstrated to be otherwise, those optimized HA values would most likely represent some other physical properties; the mathematics inside the model would remain the same. In fact, all side-chain-specific properties of a residue that may affect the likelihood of cleavage near it are reflected in the HA of the residue.

Unlike CID, in ETD and ECD, the precursor ions are not heated; therefore, gas-phase peptides may exist in relatively stable secondary and tertiary structures.35,42-45 As a result, the gas-phase conformation of a peptide ion plays an important role in its fragmentation behavior. Although the training and testing data sets contain peptides as large as 7 kDa, reasonable accuracy is achieved only for peptides up to 4 kDa. Prediction of ETD/ECD spectra of larger peptides is less accurate, presumably due to the stable conformation of larger peptides in the gas phase.35,46 Effects of secondary structure on the ETD/ECD process43-45 are not considered in the model, due to the difficulty of predicting the secondary structure of gas-phase peptides. Additionally, due to the low internal energy of the precursor ions, the protons tend to stay on the more basic sites instead of redistributing among other less basic sites. Therefore, when a backbone cleavage occurs in a singly charged ion, one often observes either c- or z-type fragment ions and not both.

Analyzing the optimized parameters (Tables 1-6) may yield useful information that advances our understanding of the peptide ETD/ECD process. Compared to CID, the likelihood of backbone cleavages in ETD and ECD is not drastically influenced by the nature of the neighboring residues, except for proline, as exemplified by the similar backbone HA value for each residue (see Tables 1 and 2). Backbone sites on the C-terminal side of a proline or cysteine residue are least likely to fragment due to their smallest HA values. The finding is consistent with previous observations on cysteine residues.8 In contrast, cleavages on the C-terminal sides of histidine, glutamine, and glutamic acid are favored, as reflected by their higher HA values. Similarly, backbone cleavages on the N-terminal sides of cysteine, valine, glycine, isoleucine, and oxidized methionine are less favored than those of other residues. The likelihood for a z· species to receive an H· from a c species depends highly on the N-terminal residue of the z· species and the C-terminal residue of the c species.
Notably, when the N-terminus of the z· species is an arginine, glycine, or serine, or the C-terminal residue of the c species is a cysteine, the z· species is more likely to receive an H· (low Ea). However, when the N-terminus of the z· species is a cysteine or one of its two alkylated forms, it is very difficult for the z· species to receive an H· (high Ea). Another property of cysteine and its two alkylated forms is that they lose their side chains as radicals very easily25 (low Ea). In fact, z· ions with cysteine, carboxymethylated cysteine, or carbamidomethylated cysteine at their N-terminus are usually not observed in an ETD or ECD spectrum, because they are extremely likely to lose their side chains as radicals. As another example, b· ions lose H· or CO easily, as indicated by the small activation energies (Tables 3 and 4). In fact, b· ions are rarely seen,32 as compared to the more abundant a· ions and b ions.36 The histidine side chain has a very high hydrogen affinity. However, it is not lost easily, indicating that histidine is a radical (H·) trap, which is consistent with the finding that histidine-containing peptides have more “ET-no-D” species.47

(42) Mihalca, R.; Kleinnijenhuis, A. J.; McDonnell, L. A.; Heck, A. J. R.; Heeren, R. M. A. J. Am. Soc. Mass Spectrom. 2004, 15, 1869–1873.
(43) Polfer, N. C.; Haselmann, K. F.; Langridge-Smith, P. R. R.; Barran, P. E. Mol. Phys. 2005, 103, 1481–1489.
(44) Lin, C.; O’Connor, P. B.; Cournoyer, J. J. J. Am. Soc. Mass Spectrom. 2006, 17, 1605–1615.
(45) Ben Hamidane, H.; He, H.; Tsybin, O. Y.; Emmett, M. R.; Hendrickson, C. L.; Marshall, A. G.; Tsybin, Y. O. J. Am. Soc. Mass Spectrom. 2009, 20, 1182–1192.
(46) Robinson, E. W.; Leib, R. D.; Williams, E. R. J. Am. Soc. Mass Spectrom. 2006, 17, 1469–1479.

Figure 4. Representative simulated ETD spectra of protonated peptides in the testing data set, as compared to their experimental spectra. Shown on the left are full-range spectra with intensities in square-root scale. Shown on the right are the same spectra, with narrower mass range, to illustrate the isotope details of each fragment ion.

Figure 5. Representative simulated ETD spectra, with supplemental activation, of protonated peptides in the testing data set, as compared to their experimental spectra. Shown on the left are full-range spectra with intensities in square-root scale. Shown on the right are the same spectra, with narrower mass range, to illustrate the isotope details of each fragment ion.

After electron attachment and backbone cleavage, the resulting fragments often do not dissociate immediately, possibly because they are held together by noncovalent interactions. The rate of dissociation depends primarily on the Coulomb repulsion between the positive charges that may reside on the two fragments. For a doubly charged precursor, however, the charge-reduced species contains only one charge after electron attachment. In the absence of Coulomb repulsion, the dissociation rate is extremely slow, and a large amount of charge-reduced species is observed. Supplemental activation increases the effective temperature of the charge-reduced species, forcing the fragments to dissociate without the help of Coulomb repulsion. For the same reason, triply charged parent ions generate very few doubly charged product ions: the complementary fragment is then neutral, so there is no Coulomb repulsion in the dissociation process.

The reason that ETD/ECD fragmentation efficiency depends highly on the charge density of the precursor ion is twofold. First, the rate of electron attachment depends highly on the number of charges in the precursor ion (power of 1.94 for ETD and power of 0.67 for ECD, based on the optimized parameters in Table 5). Second, precursors with high charge densities exhibit stronger Coulomb repulsion, facilitating separation of the two fragments after backbone cleavage.

It is frequently observed that the z· ion has an especially large +1 isotope peak. This is caused by an H· transfer from the c to the z· species. H· transfer from the c to the z· species is the major secondary process in ETD/ECD and may often affect spectral interpretation if not fully understood. H· transfer happens while the c and z· species are held together by noncovalent interactions. The extent of H· transfer observed in a spectrum depends on the competition between the H· transfer process and the fragment dissociation process. More H· transfer is observed when its rate is faster than the dissociation rate. Therefore, more H· transfer is often observed for doubly charged precursor ions, for which the rate of dissociation is slow. The effect of supplemental activation (or other means of increasing ion internal energy) on the extent of hydrogen transfer,48 however, depends on the ion effective temperature as well as the activation energies of the two processes, because an increase in effective temperature increases the rates of both processes. To complicate the issue further, loss of H· from the z· ion (to form a z ion) is also frequently observed, especially when the N-terminus of the z· ion is a serine or threonine residue21 (Tables 3 and 4).
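As a rough illustration of the two kinetic arguments above, the following Python sketch evaluates (i) the relative electron-attachment rate of two charge states, assuming the rate scales as the charge raised to the quoted power (1.94 for ETD, 0.67 for ECD), and (ii) the fraction of c/z· complexes that undergo H· transfer before separating, treated as two competing first-order processes. The Arrhenius form, the shared prefactor, the activation energies, and all function names are assumptions made for illustration only; they are not the rate equations or optimized parameters of the model.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def attachment_rate_ratio(z_a, z_b, power):
    """Relative electron-attachment rate of charge state z_a versus z_b,
    assuming the rate scales as z**power (exponent quoted in the text)."""
    return (z_a / z_b) ** power


def arrhenius(prefactor, ea_kj_mol, temp_k):
    """Generic Arrhenius rate constant k = A * exp(-Ea / (R * T)).
    This functional form is an assumption of the sketch."""
    return prefactor * math.exp(-ea_kj_mol / (R_KJ * temp_k))


def h_transfer_fraction(k_transfer, k_dissociation):
    """Fraction of complexes that undergo H· transfer when transfer and
    fragment separation compete as parallel first-order processes."""
    return k_transfer / (k_transfer + k_dissociation)


def demo_fraction(ea_transfer, ea_dissociation, temp_k, prefactor=1.0e10):
    """H· transfer fraction for hypothetical activation energies (kJ/mol)."""
    k_t = arrhenius(prefactor, ea_transfer, temp_k)
    k_d = arrhenius(prefactor, ea_dissociation, temp_k)
    return h_transfer_fraction(k_t, k_d)


# Charge dependence of electron attachment: 3+ versus 2+ precursors.
print("ETD 3+/2+ attachment ratio:", attachment_rate_ratio(3, 2, 1.94))
print("ECD 3+/2+ attachment ratio:", attachment_rate_ratio(3, 2, 0.67))

# Effect of raising the effective temperature for two hypothetical cases.
for ea_t, ea_d in [(30.0, 40.0), (40.0, 30.0)]:
    for temp in (300.0, 400.0):
        frac = demo_fraction(ea_t, ea_d, temp)
        print(f"Ea(transfer)={ea_t}, Ea(dissoc)={ea_d}, T={temp} K: {frac:.3f}")
```

Under these assumptions, going from a 2+ to a 3+ precursor raises the attachment rate by roughly a factor of 2.2 in ETD but only about 1.3 in ECD. Heating lowers the H· transfer fraction when dissociation has the higher activation energy and raises it in the opposite case, which is why the effect of supplemental activation on hydrogen transfer cannot be stated in general.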

(47) Xia, Y.; Gunawardena, H. P.; Erickson, D. E.; McLuckey, S. A. J. Am. Chem. Soc. 2007, 129, 12232–12243.

(48) Tsybin, Y. O.; He, H.; Emmett, M. R.; Hendrickson, C. L.; Marshall, A. G. Anal. Chem. 2007, 79, 7596–7602.


Figure 6. Representative simulated ECD spectra of protonated peptides in the testing data set, as compared to their experimental spectra. Shown on the left are full-range spectra with intensities in square-root scale. Shown on the right are the same spectra, with narrower mass range, to illustrate the isotope details of each fragment ion.

Comparing ECD to ETD, ECD ions are hotter49 (typical effective temperature of 431 K as compared to 309 K for ETD) due to the lack of collisional cooling in the high vacuum of an ion-cyclotron resonance (ICR) cell. As a result, ECD of doubly charged peptides is usually richer in sequence information than ETD, similar to ETD with supplemental activation. Additionally, ECD usually shows fewer charge-reduced species than ETD, due to less proton transfer (loss of a proton from the precursor ion).30 Some residue-specific behaviors are also different for ECD as compared to ETD. For example, z· ions with N-terminal serine or threonine lose H· and H2 extremely easily in ECD; this behavior is not as significant during ETD.

(49) Ben Hamidane, H.; Chiappe, D.; Hartmer, R.; Vorobyev, A.; Moniatte, M.; Tsybin, Y. O. J. Am. Soc. Mass Spectrom. 2009, 20, 567–575.

Although the ETD/ECD spectra are predicted with reasonable accuracy, the similarity score itself is not a good indicator for identifying a peptide from random peptides with the same mass. The reason is that the most intense ions in an ETD/ECD spectrum are usually the precursor ions and their charge-reduced species. Although the intensity of these ions is accurately predicted, which produces very high similarity scores, they do not contain sequence information and thus are not useful for distinguishing true and false identifications of peptides with the same mass. A simple, but not necessarily ideal, score for the purpose of peptide identification is obtained by removing the precursor ions as well as charge-reduced species from each experimental and predicted spectrum before calculating the similarity score using eq 37. To illustrate the capability of this sequence-ion similarity score to distinguish true and false identifications, ETD/ECD spectra of random peptides were predicted and compared to the experimental spectra of peptides of the same

Table 8. Distribution (Average ± Standard Deviation) of Sequence-Ion Similarity Scores of Predicted Spectra vs Experimental Spectra in the Training Data Sets and Testing Data Sets for Different Fragmentation Techniques and Charge States (a)

                    2+, training               2+, testing                ≥3+, training              ≥3+, testing
ETD
  correct sequence  0.658 ± 0.235 (n = 2523)   0.657 ± 0.197 (n = 255)    0.677 ± 0.200 (n = 2199)   0.697 ± 0.230 (n = 163)
  random sequence   0.145 ± 0.132 (n = 8196)   0.141 ± 0.123 (n = 657)    0.178 ± 0.095 (n = 5484)   0.148 ± 0.078 (n = 370)
  Δ                 0.513                      0.516                      0.499                      0.549
ETD with supplemental activation
  correct sequence  0.732 ± 0.182 (n = 2281)   0.724 ± 0.139 (n = 173)    0.591 ± 0.188 (n = 179)    (n = 0)
  random sequence   0.144 ± 0.103 (n = 7354)   0.139 ± 0.101 (n = 507)    0.127 ± 0.096 (n = 554)
  Δ                 0.588                      0.585                      0.464
ECD
  correct sequence  0.531 ± 0.134 (n = 4405)   0.527 ± 0.141 (n = 514)    0.494 ± 0.175 (n = 2494)   0.511 ± 0.166 (n = 254)
  random sequence   0.022 ± 0.041 (n = 9608)   0.024 ± 0.044 (n = 1141)   0.021 ± 0.031 (n = 5206)   0.022 ± 0.031 (n = 554)
  Δ                 0.509                      0.503                      0.473                      0.489

(a) Each table cell contains the sequence-ion similarity score distribution for the peptides with correct sequence, followed by the sequence-ion similarity score distribution for peptides with random sequences but correct mass (within 10 ppm), followed by the difference (Δ) of the means of the two distributions.
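To make the construction of the sequence-ion similarity score behind Table 8 concrete, a minimal Python sketch follows. Eq 37 is not reproduced in this part of the paper, so the sketch substitutes a normalized dot product of square-root intensities as a stand-in similarity measure; that choice, the integer m/z keys, the function names, and the toy spectra are all assumptions made for illustration and are not taken from the model.

```python
import math


def similarity(pred, expt):
    """Stand-in similarity score (NOT necessarily eq 37): normalized dot
    product of square-root intensities. Spectra map m/z bin -> intensity."""
    bins = set(pred) | set(expt)
    dot = sum(math.sqrt(pred.get(b, 0.0) * expt.get(b, 0.0)) for b in bins)
    norm = math.sqrt(sum(pred.values()) * sum(expt.values()))
    return dot / norm if norm > 0 else 0.0


def strip_non_sequence_ions(spectrum, non_sequence_bins):
    """Remove precursor and charge-reduced peaks from a spectrum."""
    return {b: i for b, i in spectrum.items() if b not in non_sequence_bins}


def sequence_ion_similarity(pred, expt, non_sequence_bins):
    """Similarity computed after removing precursor and charge-reduced
    species from both the predicted and the experimental spectrum."""
    return similarity(
        strip_non_sequence_ions(pred, non_sequence_bins),
        strip_non_sequence_ions(expt, non_sequence_bins),
    )


# Hypothetical 2+ precursor at m/z 500 with its charge-reduced species at 1000.
NON_SEQUENCE = {500, 1000}
experimental = {500: 1000.0, 1000: 400.0, 320: 20.0, 433: 35.0, 580: 15.0, 681: 30.0}
predicted_true = {500: 950.0, 1000: 420.0, 320: 25.0, 433: 30.0, 580: 10.0, 681: 35.0}
# Random sequence of the same mass: same precursor peaks, mostly different fragments.
predicted_random = {500: 950.0, 1000: 420.0, 305: 25.0, 433: 30.0, 566: 10.0, 695: 35.0}

for name, pred in [("true", predicted_true), ("random", predicted_random)]:
    full = similarity(pred, experimental)
    seq = sequence_ion_similarity(pred, experimental, NON_SEQUENCE)
    print(f"{name:6s} full-spectrum = {full:.3f}  sequence-ion = {seq:.3f}")
```

In this toy case the full-spectrum score stays high even for the random sequence, because the precursor and charge-reduced peaks dominate, whereas the sequence-ion score separates the correct sequence from the random one. The Δ values in Table 8 measure the same kind of separation, averaged over each data set.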

mass (within 10 ppm) in each data set. Table 8 shows the distributions of the sequence-ion similarity scores of true and false identifications in each data set, as well as the separation (represented by the difference of the means) of the two distributions. For confident identification of a peptide, the similarity score distribution of true identifications is ideally far separated from that of false identifications. It is seen from Table 8 that supplemental activation after ETD significantly increased the separation of the two distributions for doubly charged peptides. For multiply charged (≥3+) peptides, however, supplemental activation does not seem to offer any help.

Like the CID model,16,17 the ETD/ECD model may be used for more reliable peptide/protein identification. There are generally two ways of using computed theoretical peptide tandem mass spectra for large-scale peptide/protein identification. First, the computed theoretical spectra can be used to validate peptide identifications made by conventional search programs.50 Second, the theoretical spectra can be used as reference spectra in a spectral library for spectrum-to-spectrum searching.51 The spectral library approach, in which reference spectra are obtained experimentally, has been used for large-scale protein identification.52-56 However, spectral libraries generated from experimental spectra have their problems. One major problem comes from the fact that most reference spectra are from large-scale protein identification experiments, in which spectral assignments are made by conventional database searching algorithms. To ensure correct spectral assignments, these spectra have to be identified with high confidence by conventional search algorithms. As a result, reference spectra usually contain a large amount of sequence information and can often be easily identified by any conventional algorithm, without using the spectral library. Therefore, searching an experimental spectral library may not offer a significant advantage over conventional search methods. Theoretically computed spectra, however, do not suffer from this problem and therefore have an advantage over experimental spectra. Additionally, it takes virtually no time to build a spectral library with almost unlimited entries, and all these spectra are noise free.

Due to the complementary nature of ECD/ETD as compared to CID, the ETD/ECD model described here, when combined with CID, can be used in most cases for reliable identification of peptides and post-translational modifications, especially labile modifications such as phosphorylation, glycation, and glycosylation. The model will be implemented in the custom program MassAnalyzer for more reliable identification/quantification of modifications in therapeutic proteins.19

ACKNOWLEDGMENT
The author thanks Professor Roman Zubarev (Karolinska Institute, Stockholm, Sweden) for his generous contribution of his SwedECD database as well as helpful discussions. The author also thanks Shengheng Guan (University of California-San Francisco) for the initial discussion that started this work and for his contribution of ECD data from lymphocyte protein digest. Thanks also go to Jae Schwartz of Thermo Scientific for his help in understanding the supplemental activation process in the Thermo LTQ instrument, Paul Auger/Mike Davis of Amgen for their contribution of ECD data from tryptic digestion of human serum, Bhavana Shah/Jason Richardson of Amgen for their help in collecting ETD data, and Wen Yu/James Eubank/Harold South of Amgen for their help in getting the author started on the Linux cluster for training of the model.

SUPPORTING INFORMATION AVAILABLE
Figures S-1 and S-2. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review December 1, 2009. Accepted January 25, 2010. AC902733Z

(50) Sun, S. J.; Meyer-Arendt, K.; Eichelberger, B.; Brown, R.; Yen, C. Y.; Old, W. M.; Pierce, K.; Cios, K. J.; Ahn, N. G.; Resing, K. A. Mol. Cell. Proteomics 2007, 6, 1–17.
(51) Yen, C. Y.; Meyer-Arendt, K.; Eichelberger, B.; Sun, S. J.; Houel, S.; Old, W. M.; Knight, R.; Ahn, N. G.; Hunter, L. E.; Resing, K. A. Mol. Cell. Proteomics 2009, 8, 857–869.
(52) Craig, R.; Cortens, J. P.; Beavis, R. C. Rapid Commun. Mass Spectrom. 2005, 19, 1844–1850.
(53) Craig, R.; Cortens, J. C.; Fenyo, D.; Beavis, R. C. J. Proteome Res. 2006, 5, 1843–1849.
(54) Frewen, B. E.; Merrihew, G. E.; Wu, C. C.; Noble, W. S.; MacCoss, M. J. Anal. Chem. 2006, 78, 5678–5684.
(55) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Proteomics 2007, 7, 655–667.
(56) Ahrne, E.; Masselot, A.; Binz, P. A.; Muller, M.; Lisacek, F. Proteomics 2009, 9, 1731–1736.
