Prediction of Low-Energy Collision-Induced Dissociation Spectra of

their tandem mass spectra using custom-written software Mas-. sAnalyzer16 and validated manually. The sequences of these peptides are virtually random...
9 downloads 0 Views 279KB Size
Anal. Chem. 2005, 77, 6364-6373

Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides with Three or More Charges Zhongqi Zhang* Analytical Sciences, Amgen Inc., One Amgen Center Drive, Thousand Oaks, California 91320

A kinetic model, based on the “mobile proton” model of peptide fragmentation, has been reported previously for quantitative prediction of low-energy collision-induced dissociation (CID) spectra of singly or doubly charged peptides. For peptides with three or more charges, however, the simulation process is complex and timeconsuming. This paper describes a simplified model for quantitative prediction of CID spectra of peptide ions with three or more charges. Improvements on other aspects of the model were also made to accommodate large peptides. The performance of the simplified model was evaluated by generating predictions for many known highly charged peptides that were not included in the training data set. It was shown that the model is able to predict peptide CID spectra with reasonable accuracy in fragment ion intensities for highly charged peptide ions up to 5000 u in mass. Peptide tandem mass spectra generated from a low-energy collision-induced dissociation (CID) process have been widely used in peptide sequence confirmation and protein identification.1-6 To confirm a peptide by its tandem mass spectrum, a spectrum predicted from the sequence of the peptide is usually generated and compared to the experimental spectrum. For example, SEQUEST identifies a protein by calculating cross-correlation scores between the experimental spectra and the theoretical spectra predicted from peptide sequences.7 The predictions are usually performed without much differentiation of the intensity of each ion. If the relative intensities of product ions fragmented from a peptide can be predicted more accurately, more reliable peptide identification may be achieved. A kinetic model, based on the “mobile proton” model of peptide fragmentation,8,9 has been reported previously for quantitative prediction of low-energy CID spectra of singly or doubly charged * E-mail: [email protected]. Fax: (805)376-2354. (1) Yates, J. R. Electrophoresis 1998, 19, 893-900. (2) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269-295. (3) Steen, H.; Mann, M. Nat. Rev. Mol. Cell Biol. 2004, 5, 699-711. (4) Sadygov, R. G.; Cociorva, D.; Yates, J. R. Nat. Methods 2004, 1, 195-202. (5) Resing, K. A.; Ahn, N. G. FEBS Lett. 2005, 579, 885-889. (6) Wysocki, V. H.; Resing, K. A.; Zhang, Q. F.; Cheng, G. L. Methods 2005, 35, 211-222. (7) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989. (8) Dongre, A. R.; Jones, J. L.; Somogyi, A.; Wysocki, V. H. J. Am. Chem. Soc. 1996, 118, 8365-8374. (9) Wysocki, V. H.; Tsaprailis, G.; Smith, L. L.; Breci, L. A. J. Mass Spectrom. 2000, 35, 1399-1406.

6364 Analytical Chemistry, Vol. 77, No. 19, October 1, 2005

peptides acquired on a quadrupole ion trap mass spectrometer.10 The model has been proved quite accurate in predicting tandem mass spectra of small peptides and has been successfully used for de novo peptide sequencing.11 For peptides with three or more charges, however, the process for simulating tandem spectra is very tedious and time-consuming due to the complexity in calculating the proton distribution of a peptide and charge distribution of b and y ions generated from a backbone cleavage. This paper describes a model for quantitative prediction of CID spectra of peptide ions with any number of charges. While spectra of singly and doubly charged peptide ions are simulated precisely as described earlier, a simplified model is used for simulating spectra of peptide ions with more than two charges. Improvements on other aspects of the model were also made to accommodate large peptides. The simplified model is able to predict peptide CID spectra, achieving reasonable accuracy for the fragment ion intensities for highly charged (charge g3) peptide ions up to 5000 u in mass. THEORY AND METHODS Kinetic Model for Singly and Doubly Charged Peptides. The kinetic model used for simulating tandem spectra of singly or doubly charged peptides is briefly described below. The detailed model has been described previously.10 1. The effective temperature (Teff) for the parent ion is estimated from the normalized collision energy as well as the charge and mass of the peptide. 2. The distribution of charges (protons) among different protonation sites is calculated according to Boltzmann distribution based on the gas-phase basicity (GB) of each site. 3. A fragmentation rate constant for each pathway is calculated using the Arrhenius equation based on the activation energy (Ea) and frequency factor (A) of each pathway. Charge density at the cleavage site is considered, depending on whether the pathway is a charge-directed or charge-remote process. 4. When rate constants of all competing pathways are available, the extent of fragmentation for each pathway after a short step time (∆t) is then calculated. 5. One pathway may generate more than one fragment ion. For example, a backbone cleavage at a particular backbone amide site of a doubly charged precursor ion may generate b+, b2+, y+, and y2+ ions. After the extent of fragmentation for a certain pathway is calculated, the abundance for each fragment ion can be calculated based on the calculated charge distribution. (10) Zhang, Z. Anal. Chem. 2004, 76, 3908-3922. (11) Zhang, Z. Anal. Chem. 2004, 76, 6374-6383. 10.1021/ac050857k CCC: $30.25

© 2005 American Chemical Society Published on Web 08/27/2005

6. All product ions, as well as all precursor ions remaining unfragmented after the short step time ∆t, are subject to further calculation (Go back to step 2.) with decreased effective temperature and fragmentation time. The process continues until the fragmentation time runs out, or the rate for further fragmentation is negligible. 7. All product ions are added to the simulated spectrum, after considering the isotope distribution and trapping efficiencies. Simplified Model for Protonated Peptides with Three or More Charges. For protonated peptide ions with three or more charges, the simulation process is complex and computationally challenging. The main complexity arises from calculating proton distribution (step 2, see above) and abundance of each fragment ion (step 5) after the extent of fragmentation for each pathway is calculated. For a compromise between computation speed and accuracy in prediction, the following simplifications were made for parent peptide ions with three or more charges. 1. Proton Distribution. The precise way of calculating proton distribution is through Boltzmann distribution.10 However, for ions with three or more charges, this approach becomes computationally impractical due to the large number of microstates and complicated contribution of Coulomb repulsion to the energy level of each microstate. To resolve this issue, when the parent peptide ion has three or more charges, the following simplified model is used to calculate proton distribution. To the first approximation, the protons in a peptide ion are assumed to distribute proportionally among different protonation sites according to the proton association constants K of these sites, which are calculated by

Ki ) exp(GBi/RTeff)

(1)

where Ki is the proton association constant at site i, GBi is the gas-phase basicity of site i, R is the gas constant, and Teff is the effective temperature of the ion. Therefore, the proton density on site k is calculated by

pk ) zKk/

∑K

i

(2)

i

where z is the number of charges on the ion. If the proton densities on any sites calculated from eq 2 are greater than 1, these sites are assumed to be fully occupied with a proton density of 1. Then the apparent proton association constants for all other sites are calculated according to eq 3, where GBi is the gas-phase basicity

Kapp(i) ) exp{[GBi -

∑(F

GB Coulomb/rij)]/RTeff}

(3)

j

GB of site i, FCoulomb is the factor reflecting the contribution of Coulomb repulsion to the gas-phase basicity, rij represent the distance between sites i and j, in a unit of residue length, and j represent those sites that are fully occupied. After Kapp are calculated for all remaining sites, the remaining protons are distributed in the remaining sites according to eq 2, using Kapp as the proton association constants and remaining protons as z. This process repeats until proton densities calculated from eq 2 are less than 1 for all protonation sites. This method of

calculating the proton distribution for highly charged ions is much faster than the method based on Boltzmann distribution but provides reasonably accurate results. 2. Abundance of Each Fragment Ion. The other difficulty for simulating spectra of highly charged peptide ions involves calculating the abundance of each fragment ion after the extent of fragmentation for a certain backbone cleavage is calculated; i.e., how much of those b or y fragments are neutral, singly charged, doubly charged, or triply charged? To simply the problem, it is assumed that each fragment (b or y) has no more than four charge states. This assumption was found true even for very large peptides (up to at least 10 000 u). Let us name the four charge states as c, c + 1, c + 2, and c + 3 and the charge state of the precursor ion as z. The abundances (or probabilities) for the ion to have these charges are designated as A0, A1, A2, and A3, respectively. For any fragment ion, the total charge density (or average charge state) jz can be calculated by adding proton densities for all sites in the ion. It is obvious that the centroid (abundance weighted average) of the four charge states should equal the total charge density of the ion.

jz )

∑p ) cA i

0

+ (c + 1)A1 + (c + 2)A2 + (c + 3)A3 (4)

i

It is also assumed that jz is always between c + 1 and c + 2; therefore, with jz known from calculated charge distribution, c is also known. Please note that the four charge states must be within 0 and z. If either c or c + 3 is outside this range, their abundances (A0 or A3) are automatically determined as zero. Otherwise their abundances are determined as follows. It is assumed that a backbone cleavage does not change the distribution of protons, which is already calculated previously. First we need to calculate A0, the probability of a fragment ion to have c charges. Let us take a b ion for example. Since the fragment ion has minimum of c charges, at least c sites are fully occupied with protons. Therefore, we first find out the c sites with largest proton densities and assume the proton density of 1 for all these sites. A0 then corresponds to the probability of zero protons in all remaining sites in the fragment ion, which is estimated from the following equation. Nb

A0 )

N

∏(1 - p (z - c)/∑p ) k

k)1

i

(5)

i)k

where Nb represent the total number of remaining site in this b ion and N represent the total number of remaining sites in the precursor ion, z - c is the total number of remaining charges in the precursor, and pi represents the previously calculated proton density on a remaining site i of the precursor. A0 of the complimentary y ion can be calculated the same way. Please note that A0 of a b ion equals A3 of the complimentary y ion and A0 of a y ion equals A3 of its complimentary b ion; as a result, A0 and A3 of both b and y ions are determined from the above calculation. With A0 and A3 known, considering eq 4 and that

A0 + A1 + A2 + A3 ) 1

(6)

A2 and A3 can be obtained by solving the two equations. In Analytical Chemistry, Vol. 77, No. 19, October 1, 2005

6365

practice, when the charge and mass of the parent peptide are both small ( m0 + w/2 - ∆w2 (18)

Aparent ) aieffiso(mi) i

(19)

(c) Isotope Pattern of Fragment Ions. Let us assume that all isotopes are distributed evenly across the entire molecule, that is, when a parent ion dissociates into two fragments, the amount of isotopes distributed into the two fragments will be proportional to their masses. Thus, the isotope distribution of any fragment ion can be calculated by the following equation.

Afrag ) j

∑ igj

[

Aparent C(i,j) i

( )( Mfrag

Mparent

j

1-

Mfrag

)] i-j

Mparent

where C(i,j) )

, i!

j!(i - j)!

(20)

where Afrag is the abundance of the jth isotope (j g 0) in the j fragment ion and Mfrag and Mparent are the mass of the fragment ion and parent ion, respectively. C(i, j) represents the number of possible ways of selecting j objects from i. 2. Effect of Peptide Size to Fragmentation. When the fragment spectra of large peptides were examined, it was found that large peptides tend to fragment at the C-terminal side of acidic or basic residues, even if the number of charges on the parent ion is greater than the number of arginine residues in the peptide. The author ascribes this phenomenon to the decreased gas-phase basicities of backbone amides for large peptides. To accommodate

this observation, an extra parameter fGB is introduced to describe the decrease of backbone GB. The gas-phase basicity of backbone amide is then calculated by BB GBBB(Ai) ) GBBB L (Ri) + ∆GBR (Ri+1) - fGB ln(M/1000)

(21) BB where GBBB L and ∆GBR are contributions of the residues to the left and right of the backbone amide10 and M is the mass of the ion. 3. Effective Temperature of the Parent Ion. When the effective temperature of the parent ion is calculated, the original assumption that the effective temperature is linearly correlated to the charge state and mass of the parent ion does not suffice for highly charged or large peptides. Instead, a logarithm was taken for the charge and mass of the parent ion when calculating the effective temperature as shown below.

T0eff ) T + cE(Ec - 35) + cz ln(z) + cM ln(M/1000)

(22)

where cE, cz, and cM are constants representing the dependence of T0eff on normalized collision energy Ec (%), charge of the parent ion z, and the mass of the peptide M. T is the effective temperature of a parent ion with z ) 1, Ec ) 35%, and M ) 1000 u. 4. Length of Side Chains. When the contribution of Coulomb repulsion to the gas-phase basicities was calculated, it was originally assumed that the distance from the side-chain charge center to the peptide backbone is one residue length. To better estimate the effect of charge repulsion to backbone fragmentation, especially those peptide bonds near arginine residues, a more accurate estimation of side-chain length, based on the number of bonds in the side chains, is used. 5. Activation Window. Ideally, one expects that only the isolated parent ions are activated during collisional activation in an ion trap; i.e., the activation window is the same as the isolation window. It was found, however, that the activation window is larger than the isolation window on both Finnigan LCQ DECA and LTQ instruments. As a result, some fragment ions with their m/z lying within this enlarged range (e.g., water or ammonia loss from highly charged parent ions) are also activated and fragmented. An extra parameter was introduced to take this phenomenon into consideration. It was found that, for both LCQ DECA and LTQ, the higher m/z end of the activation window matches the isolation window quite well. However, the lower m/z end of the activation window is significantly lower than the one determined by the isolation window. Therefore, the model calculates the lower end (wlow a ) and higher end (whigh ) of the activation window from the a following formulas.

wlow a ) m - w/2 - fAWm

(23)

) m + w/2 whigh a

(24)

where m is the m/z value of the parent ion, w is the isolation

width used in the experiment, and fAW is a factor for calculating the low m/z end of the activation window. When any fragment ion lies within this window, it will be activated and its effective temperature calculated from eq 22. 6. N-Terminal Neutral Losses. Additional fragmentation pathways were included in the model. These pathways are neutral losses of N-terminal one or two residues when the second or third residue is a proline. Although N-terminal neutral losses were found in many situations for up to four residues,12 as additional pathways, the author found the pathways are only significant when the cleavage happens on the first or second amide bond with a proline residue located on the right side of the amide bond. These pathways are a charge-directed process.13 To calculate the rate constants for these pathways, two frequency factors, ANtermNL1 and ANtermNL2 for one-residue loss and two-residue loss, respectively (see Table 1), were introduced as two extra parameters in the model. The activation energies are calculated by adding EAL of the residue to the left of the cleavage site and ∆EAR of proline (Table 1). 7. bx f bx-1 Mechanism. For charge-directed backbone cleavages (mechanism A),10 the bx f bx-1 reaction goes through a mechanism different from other backbone cleavages due to the oxazolone structure at the C-terminal end of a b ion.14,15 To take (Table 1) this fact into account, an extra parameter ∆EOxazolone R was added to the activation energy, as calculated previously by adding contributions of neighboring residues together, for the charge-directed bx f bx-1 fragmentation. Training of the Model. The number of fragment ions increases rapidly with the size of the peptides. As a result, the chance of coincidental match of ions between the experimental spectrum and predicted spectrum increases significantly with increased peptide length. To accommodate for large peptides, the following function is maximized when optimizing the parameters in the model.

f)

1

N

∑(s - s ) N i

ij

i)1

x∑(GB

0.2 N

0.2

x∑(∆GB N

BB 2 R )

-

0.005 N

A L

x∑ (∆E )

C 2 L

A 2 L

-

2 - GBBB L ) -

0.02 N

x∑(E - E ) N

0.02

BB L

-

x∑(∆E

0.02 N

x∑ (∆E )

0.005 N

x∑(E

-

A 2 R

x∑(E

0.005 N

A 2 L-1)

C R

-

- ECR )2 -

CO-

- ECO-)2 (25)

where si is the similarity of experimental spectrum and simulated spectrum (see below)10 for peptide i and sij is the average (12) Martin, D. B.; Eng, J. K.; Nesvizhskii, A. I.; Gemmill, A.; Aebersold, R. Anal. Chem. 2005, 77, 4870-4882. (13) Cordero, M. M.; Houser, J. J.; Wesdemiotis, C. Anal. Chem. 1993, 65, 15941601. (14) Yalcin T.; Harrison A. G. J. Mass Spectrom. 1996, 31, 1237-1243. (15) Vachet, R. W.; Ray K. L.; Glish, G. L. J. Am. Soc. Mass Spectrom. 1998, 9, 341-344.

Analytical Chemistry, Vol. 77, No. 19, October 1, 2005

6367

Table 1. Optimized Values of Parameters Used in Calculation of Backbone Cleavages residue

side chain

backbone

GBSC A C CmCa D E F G H I K L M N P Q R S T V W Y NH2COOH b ion a ion

G

830 784 790 928.72 926.78 830 862.99 863.77 1000 775 780 899.05 790

mechanism A (charge-directed)

BB BBB L ∆GBR

882.19 881.61 880.80 879.81 880.02 880.71 881.81 881.88 880.39 880.42 882.25 881.60 881.09 882.31 881.97 885.54 879.55 881.11 882.46 881.52 881.49 912.58

0.00 -1.36 -0.49 -1.18 -0.72 -0.06 1.17 0.26 -1.56 -1.65 -0.32 -0.57 2.38 15.05 6.04 9.73 1.62 1.33 -2.17 0.63 -0.80

A ∆EL-1

EAL

∆EAR

0.00 5.01 3.76 4.44 2.82 0.78 2.57 -3.37 -0.92 1.71 -1.44 1.25 1.99 4.83 2.91 7.99 2.87 0.68 -0.27 -0.69 0.54 21.93

168.13 172.76 168.72 171.11 166.60 164.90 181.18 164.54 160.02 165.61 164.06 165.85 172.55 183.55 168.41 169.50 171.31 169.86 162.50 164.57 166.74

0.00 -0.71 -1.07 0.19 -0.36 -2.96 0.83 -3.12 -2.92 -5.22 -1.40 -2.61 3.26 -0.47 7.20 8.57 0.60 1.74 -3.04 -3.24 -2.73

-96.19 37.17 -73.17

EBL

153.45 168.30 165.14 159.92

177.58

C-term rearrangement

a ion formation

ECR

∆ECL

Ebfa

185.59 188.14 182.74 173.16 183.24 185.93 205.88 170.38 175.68 169.44 180.79 185.01 189.50 173.78 189.62 169.26 188.80 189.31 177.49 182.82 181.11

0.00 -2.39 -0.38 1.75 -0.92 1.02 -0.23 0.62 4.73 -0.25 0.67 -2.58 0.60 2.03 -0.42 -5.66 -3.20 -0.13 3.32 3.04 1.86

129.00 132.99 124.68 145.13 128.93 122.64 132.30 119.90 121.50 163.97 123.65 130.41 137.90 140.48 138.13 131.85 131.29 132.69 124.43 123.38 121.80

168.03

A

5.81 × 1013 s-1

ANtermNL1 ANtermNL2 ∆EOxazolone R

8.94 × 1011 s-1 8.46 × 1013 s-1 -11.26

a

mechanism B

3.05 × 1010 s-1

1.62 × 1011 s-1

1.56 × 109 s-1 (chargedirected) 3.94 × 108 s-1 (chargeremote)

CmC, carboxymethylated cysteine.

similarity between the simulated spectrum of peptide i and experimental spectra of some peptides j unrelated to peptide i but with similar masses. N is the total number of spectra in the training data set. All other symbols are parameters representing contributions of different residues to the gas-phase basicities of backbone amide and activation energies of different pathways.10 Comparing to the original function, sij is applied to reduce undesired effects caused by coincidental matches. The similarity of the two spectra is defined by the following equation

∑xI I x(∑ I )(∑ I 1

s)

m

2

m

(26)

1

m

m

2

)

where I1m and I2m are intensities of an ion at m/z ) m for the two spectra. Two ions are considered the same ion if their m/z ratios differ by less than 0.4 u. EXPERIMENTAL SECTION To train the improved model described in this work for predicting CID spectra of peptide ions with any number of charges, a training data set containing 8876 peptide CID spectra was used. These 8876 CID spectra were generated from 7735 ions (3464 singly charged, 3310 doubly charged, 727 triply charged, and 234 with 4-14 charges) of 6090 different peptides (3-87 6368

Analytical Chemistry, Vol. 77, No. 19, October 1, 2005

residues, 300-9600 u in mass). CID spectra of the same parent ion with different collision energies or different isolation windows were treated as different spectra. Spectra were collected on four Thermo Finnigan LCQ DECA mass spectrometers (6507 spectra) and one Thermo Finnigan LTQ mass spectrometer (2369 spectra) in centroid mode with isolation width of 2.0-4.0 u, activation q of 0.25, activation time of 30 ms, and normalized collision energy of 15-50%. The average and standard deviation of normalized collision energy were 33.8 and 3.0%, respectively. Spectra of the same parent ion collected on the LTQ were treated as different spectra from the spectra collected on the LCQ DECAs. Different instrument-specific parameters (see Table 4) were used for LTQ as compared to LCQ DECA. Most peptides were generated from proteolytic digestion of well-characterized proteins made in Amgen (Thousand Oaks, CA) and commercial proteins purchased from Sigma-Aldrich (Saint Louis, MO). Each peptide was identified from their tandem mass spectra using custom-written software MassAnalyzer16 and validated manually. The sequences of these peptides are virtually random due to different specificities of proteases used in the digestion. Proteases used for digestion include trypsin, endoproteinases Lys-C, Glu-C, Asp-N, and Arg-C from Roche (Mannheim, Germany), trypsin and pepsin from Sigma-Aldrich, and endoproteinase Lys-C from Wako (Osaka, (16) Zhang, Z.; McElvain, J. S. Proc. 49th ASMS Conf. Mass Spectrom. Allied Topics, Chicago, IL, 2001.

Table 2. Optimized Values of Parameters Used in Calculation of Side-Chain and Terminal Neutral Losses Ea (H2O loss) residue D E K N Q R S T NH2 COOH A (s-1)

charge-directed 94.96 68.40

charge-remote

Ea (pyroglutamic acid formation)

Ea (NH3 loss) charge-directed

charge-remote

113.68 133.36

37.13 38.60

125.82 127.00

54.70 1.43 × 1012

128.20 4.36 × 107

205.61 127.63 136.23 198.39

459.09 112.44 132.64 291.23

590.63

127.47

1.43 × 1012

Japan). Most spectra were collected with reversed-phase LC/MS at ∼200 µL/min flow rate using acetonitrile gradient and 0.1% TFA in the mobile phase. A testing data set generated from enzymatic digestion of many commercial proteins (Sigma-Aldrich) was used to validate the accuracy of the prediction. The testing data set contains 191 CID spectra of 191 ions with three or more charges of 176 different peptides (8-47 residues in length, mass