ARTICLE pubs.acs.org/ac
Prediction of Collision-Induced-Dissociation Spectra of Peptides with Post-translational or Process-Induced Modifications
Zhongqi Zhang
Process and Product Development, Amgen Inc., One Amgen Center Drive, Thousand Oaks, California 91320, United States
Supporting Information
ABSTRACT: Mass spectrometry, combined with collision-induced dissociation (CID), has become the method of choice for analyzing protein post-translational and process-induced modifications. However, confident and automated identification of modifications and modification sites is often challenged by the diversity of modifications and their labile nature under typical CID conditions. An accurate prediction of the CID spectra of modified peptides will improve the reliability of automated determination of modifications and modification sites. In this article, the kinetic model for the prediction of peptide CID spectra is extended to the prediction of the CID spectra of modified peptides. The mathematical model for predicting CID spectra of peptides with enzymatic and chemical modifications such as (1) phosphorylation of serine, threonine, and tyrosine, (2) S-carboxymethylation and carbamidomethylation of cysteine, (3) different stages of oxidation of methionine, tryptophan, and cysteine, (4) glycation of lysine, (5) O-mannosylation of serine, (6) hydroxylation of lysine, and (7) N-monomethylation and N-dimethylation of lysine is described. The mathematical model, once established with CID spectra of peptides with known modifications and modification sites, is able to predict CID spectra with excellent accuracy in ion intensities, facilitating more reliable identification of modifications and modification sites.
Mass spectrometry, combined with collision-induced dissociation (CID), has become the method of choice for characterizing protein post-translational and process-induced modifications, in both proteomics settings1–3 and therapeutic protein characterizations.4,5 Notable post-translational modifications include glycosylation,6 phosphorylation,7 methylation,8 mannosylation,9,10 hydroxylation,11 etc. Process-induced chemical modifications such as oxidation, glycation, etc.12 are also a concern in therapeutic protein development. However, automated and reliable identification of protein modifications and modification sites is often challenged by the diversity of modifications and their labile nature under typical CID conditions. For example, phosphorylated,13,14 glycated,15 mannosylated, and glycosylated16 peptides tend to lose the modified side chains during CID, giving few sequence ions to determine the peptide sequence and the site of modification. If the CID spectrum of a modified peptide can be predicted accurately, however, a more reliable identification of the modification and modification site may be achieved based on the characteristic fragmentation pattern of the labile side chain and limited sequence information from peptide backbone cleavages.
The peptide fragmentation process during CID can be largely explained by the mobile proton model.17,18 The mobile proton model states that most peptide backbone fragmentation pathways are “charge directed”; i.e., they involve a proton at the cleavage site. In order for a backbone amide bond to cleave into the commonly observed b and y ions, an input of energy is required to move the proton from the more basic sites to the peptide backbone. “Charge-remote” pathways, however, do not require a proton at the cleavage site.
© 2011 American Chemical Society
Based on the mobile proton model and current understanding of peptide fragmentation pathways,19 an empirical kinetic model has been developed for quantitative prediction of CID spectra of unmodified peptides.20,21 The model has been used for de novo peptide sequencing,22 full characterization of therapeutic proteins,23 validation of peptide identifications,24,25 and generating a theoretical spectral library for more reliable protein identification.26,27 This article describes the extension of the kinetic model to the prediction of CID spectra of modified peptides. Enzymatic and chemical modifications including phosphorylation, oxidation, glycation, O-mannosylation, hydroxylation, and N-methylation will be discussed. Protein N-glycosylation, one of the most common post-translational modifications, has been described previously in a separate article.28
Received: August 10, 2011 Accepted: October 13, 2011 Published: October 13, 2011
■ COMPUTATIONAL METHOD The model predicts CID spectra of modified peptides the same way as unmodified peptides20,21 by treating each modified amino acid residue as a distinct residue with distinct parameters, including its side-chain gas-phase basicity (GB), contribution to the GB and activation energies (Ea) of nearby backbone cleavage sites, etc. This empirical kinetic model is based on the “mobile proton” model of peptide fragmentation.17,18 The procedure starts by calculating the proton distribution in the precursor ion based on the Boltzmann distribution, followed by calculating the
dx.doi.org/10.1021/ac2020917 | Anal. Chem. 2011, 83, 8642–8651
Analytical Chemistry
Table 1. Modifications and Their Neutral Losses Incorporated in the Current Model
modification | modified residue | abbreviation | backbone fragmentation mechanisms^a | side-chain neutral losses
phosphorylation | Ser | pS | A | H3PO4 (refs 13, 31)
phosphorylation | Thr | pT | A | H3PO4 (refs 13, 31)
phosphorylation | Tyr | pY | A | HPO3, HPO3 + C-terminal H2O (refs 13, 31)
S-carboxymethylation | Cys | Cm-C | A | HSCH2COOH
S-carbamidomethylation | Cys | Cam-C | A | HSCH2CONH2
oxidation | Met | Ox-M | A | CH3SOH (refs 32, 33)
oxidation | Trp | Ox-W | A | H2O (ref 34)
double oxidation | Trp | Ox2-W | A and B | H2O
double oxidation | Cys | Ox2-C | A and B (ref 29) | H2O and H2SO2 (ref 29)
N-methylation | Lys | Me-K | A and B | none
N-dimethylation | Lys | Me2-K | A and B | none
glycation | Lys | Glyc-K | A | H2O, 2H2O, 3H2O, 4H2O, 3H2O + HCHO, C6H10O5 (ref 15)
O-mannosylation | Ser | Man-S | A | H2O, 2H2O, 3H2O, 4H2O, 3H2O + HCHO, C6H10O5
hydroxylation | Lys | Hyl | A and B | none
a Backbone fragmentation includes two mechanisms: mechanism A is backbone charge directed, and mechanism B is facilitated by a side-chain proton.20
rate constant for each competing pathway using the Arrhenius equation and then by calculating the abundance of each product ion using first-order kinetics. The procedure is an iterative process in that any “product ion” will become a “precursor ion” and undergo further fragmentation if the reaction time allows. If a modified side chain involves neutral losses during CID, each neutral loss is assumed to take place through both charge-directed and charge-remote pathways, and parameters describing these pathways are included in the model for each modified residue.
Some modifications do not significantly change the chemical properties of the modified residue. As a result, these modifications generally have a limited effect on the fragmentation pattern of the peptide. These modified residues are assumed to have the same parameters as the unmodified residues (except for the absence of neutral losses from the modified side chain). For example, hydroxylation of a proline residue was found not to change the fragmentation behavior significantly. Although hydroxylation and N-methylation of a lysine residue are built into the current model, these modifications often do not significantly change the peptide fragmentation behavior either, because they do not significantly change the GB of the lysine side chain.
Many other modifications do not contribute to the general peptide fragmentation patterns to a significant extent. Examples include most nonbasic, gas-phase-stable modifications that do not introduce charge-remote selective cleavages. These modified residues are considered by the model to have the parameters of an average amino acid residue. An average amino acid residue is defined as a residue that has a stable and nonbasic side chain and does not cause selective (mechanism B) backbone cleavages; all other parameters are the average values of all unmodified amino acid residues.
With this simple approach, CID spectra of most modified peptides can be predicted with reasonable accuracy. Examples of these modifications include dehydration of serine, threonine, or aspartic acid, succinimide formation (loss of NH3) from asparagine, double oxidation of methionine, triple oxidation of cysteine, carbamylation of lysine, etc. The important and challenging cases are those special modifications that do change the overall peptide fragmentation patterns, either through introduction of additional fragmentation pathways such as oxidation of cysteine to cysteine sulfinic acid,29,30 or
through introduction of labile side chains such as phosphorylation,13,31 glycosylation, glycation,15 methionine oxidation,32,33 and, to a lesser extent, tryptophan oxidations.34 To accurately predict the CID spectra of peptides with these modifications, the prediction model needs to be expanded to include the additional fragmentation pathways introduced by these modifications. Table 1 lists these special modifications incorporated into the model. The neutral losses for these modified residues are either obtained from the literature or based on the author’s observations. Cysteine S-carboxymethylation and carbamidomethylation are included as they are very common alkylation reactions employed during protein digestions. It was found that S-carboxymethylated and carbamidomethylated cysteine side chains may undergo neutral loss of HSCH2COOH or HSCH2CONH2, respectively. Loss of 98 Da from phosphotyrosine is assumed to be a simultaneous loss of HPO3 from phosphotyrosine side chain and H2O from the peptide C-terminus.31 Neutral losses from glycated lysine and O-mannosylated serine were assumed to be the same due to the similarity of the two modifications (both modifications add a C6H10O5 sugar moiety to the side chain). It was also assumed that doubly oxidized tryptophan side chain may facilitate charge-remote backbone fragmentation (mechanism B). Both singly and doubly oxidized tryptophan side chains undergo water losses, with a different mechanism when the oxidized tryptophan residue is on the N-terminus of a peptide or fragment. More special modifications will be added to the model in the future, if they are found to change the peptide fragmentation behavior significantly, and when enough training spectra are collected. The model was trained using CID spectra of known peptides by examining the similarity scores20 between the predicted and experimental spectra, as described in the Supporting Information. 
The best match between the simulated and experimental spectra was obtained when the average similarity of all spectra in the training data set was maximized. Cysteine S-carboxymethylation and carbamidomethylation, methionine oxidation, and serine, threonine, and tyrosine phosphorylation are commonly observed. Therefore, these modified residues were treated exactly the same way as unmodified amino acids with all their parameters optimized. Model training for these modifications was performed together with ∼18 000 spectra of unmodified peptides acquired on Thermo Scientific LTQ linear ion-trap instruments. For all other modifications listed in Table 1, only a few parameters that are deemed to significantly affect the overall fragmentation pattern were optimized, leaving other less significant parameters the same as for the unmodified amino acid. Parameters related to these modifications were optimized by using only spectra corresponding to the modified peptides, leaving parameters unrelated to these modifications constant. The source of experimental spectra used for training and testing of the model is described in the Supporting Information.
Compared to the model described previously,21 some additional pathways were added to the current model. These additional pathways include ax to ax−1 pathways, a1–yn−1 pathways, and diketopiperazine yn−2 pathways.19 The diketopiperazine pathway was included in the earlier model version21 only when the third residue is a proline. Details of these additional pathways are described in the Supporting Information.
A computer program, written in Microsoft Visual C++, was developed for simulating CID spectra and refining the model. The program was incorporated into MassAnalyzer,23 a program for fully automated protein and peptide LC/MS/MS data analyses, through a dynamically linked library (DLL). A static library for Linux was also developed for simulating peptide CID spectra and was used on a Linux cluster for optimizing the model. CID spectra in the training data set were simulated with varied parameters until the average similarity score was maximized. A function optimization routine was developed and was used to optimize parameters in the model. The routine was an iterative process in which each parameter in the model was varied until the average similarity score was maximized. The process was repeated until no further optimization could be achieved.
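The parameter refinement described above, in which one parameter at a time is varied and the cycle is repeated until nothing improves, can be illustrated with a minimal coordinate-search sketch in Python. This is not the author's routine: the function name `coordinate_search`, the step-halving rule, and the toy score function are assumptions made purely for illustration. In the actual work the score would be the average similarity between predicted and experimental spectra in the training set.

```python
from typing import Callable, List


def coordinate_search(score: Callable[[List[float]], float],
                      params: List[float],
                      step: float = 1.0,
                      min_step: float = 1e-4) -> List[float]:
    """Vary one parameter at a time and keep any change that raises the score.

    Hypothetical helper: the step size is halved whenever a full pass over
    all parameters brings no improvement, and the search stops once the
    step falls below min_step.
    """
    best = list(params)
    best_score = score(best)
    while step >= min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score = trial, trial_score
                    improved = True
                    break  # accept the move and go on to the next parameter
        if not improved:
            step /= 2.0  # no single-parameter move helps at this step size
    return best


def toy_score(p: List[float]) -> float:
    """Stand-in for the average similarity score; its maximum is at (3, -1)."""
    return -(p[0] - 3.0) ** 2 - (p[1] + 1.0) ** 2


optimum = coordinate_search(toy_score, [0.0, 0.0])
```

A search of this kind needs only score evaluations, not derivatives, which suits an objective that is obtained by re-simulating every spectrum in the training set.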
For a test of MassAnalyzer or the spectral prediction libraries for noncommercial research purposes, please contact the author directly.
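To make the first steps of the procedure outlined at the start of this section more concrete, the following Python sketch combines a Boltzmann-type proton distribution, an Arrhenius rate constant, and an apparent rate constant scaled by the charge density at the reacting site. It is a simplified illustration, not the published model: the function names, the assumption that site populations scale as exp(GB/RT), the effective temperature, and every numerical value are hypothetical. The actual functional forms and parameters are given in refs 20 and 21 and in the Supporting Information.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def proton_populations(site_gb_kj, temp_k):
    """Boltzmann-weighted probability of finding the proton at each site.

    Assumption: sites with a higher gas-phase basicity (GB, kJ/mol) are
    favored by a factor exp(GB / RT).
    """
    top = max(site_gb_kj)  # subtract the maximum for numerical stability
    weights = [math.exp((gb - top) / (R_KJ * temp_k)) for gb in site_gb_kj]
    total = sum(weights)
    return [w / total for w in weights]


def arrhenius_rate(a_factor, ea_kj, temp_k):
    """Arrhenius rate constant k = A * exp(-Ea / RT)."""
    return a_factor * math.exp(-ea_kj / (R_KJ * temp_k))


def apparent_rate(a_factor, ea_kj, temp_k, charge_density):
    """Rate constant of a charge-directed pathway scaled by charge density."""
    return charge_density * arrhenius_rate(a_factor, ea_kj, temp_k)


# Hypothetical example: one basic side chain and two backbone amide sites.
gb_values = [1000.0, 920.0, 900.0]   # kJ/mol, illustrative only
temperature = 600.0                  # K, illustrative effective temperature
populations = proton_populations(gb_values, temperature)
k_backbone = apparent_rate(1.0e12, 100.0, temperature, populations[1])
```

In this picture a charge-remote pathway would use `arrhenius_rate` directly, whereas a charge-directed pathway is slowed in proportion to how rarely the proton resides at the reacting site.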
■ RESULTS Many modifications do not significantly affect the overall fragmentation pattern of the modified peptides. Therefore, the modified residues are considered to have the parameters of an average amino acid residue. Examples include doubly oxidized methionine, triply oxidized cysteine, dehydrated serine, threonine, and aspartic acid, succinimide formed from asparagine, etc. Figure S-1 in the Supporting Information shows predicted spectra of peptides with several of these modifications. Some modifications do not change the chemical properties of the residue significantly, such as in the case of hydroxylated proline, in which case the modified residue is considered to have the same parameters as the unmodified residue. For modifications that significantly affect the peptide fragmentation behavior, extra parameters are introduced to describe the properties related to the modified residues. The optimized parameters related to backbone fragmentation for all modified residues considered in this article are shown in Table S-1 in the Supporting Information, together with parameters of all other residues. Parameters related to S-carboxymethylated cysteine, S-carbamidomethylated cysteine, oxidized methionine, phosphoserine, phosphothreonine, and phosphotyrosine were optimized together with all parameters related to unmodified residues, using a data set containing a total of 25 480 CID spectra. Parameters for other modifications were optimized separately using spectra of only modified peptides, with unrelated parameters unchanged.
The entire list of optimized parameters for all residues is also updated in the Supporting Information (Tables S-1 to S-4). Many modified side chains are labile in the gas phase. These side chains produce neutral losses during CID. Each neutral loss is assumed to take place through both charge-directed and charge-remote pathways. The optimized activation energies and A factors for each of these neutral-loss pathways are shown in Table S-2 in the Supporting Information. Glycated lysine and O-mannosylated serine side chains have many possible neutral losses. The activation energies and A factors for each of these neutral losses are shown in Table S-4 in the Supporting Information. Because glycation and mannosylation are similar (both modify a residue by adding a sugar moiety of C6H10O5), the same mathematical model is used for both for convenience, although some neutral losses observed from glycated lysine are not observed in mannosylated serine. Table S-4 in the Supporting Information clearly shows that a glycated lysine side chain loses two to three water molecules easily (high A factors), while an O-mannosylated serine side chain tends to lose the entire sugar moiety (C6H10O5). Table 2 shows the average values and standard deviations of similarity scores when the optimized model is used to predict spectra in each training set. The average similarity scores are generally close to but slightly lower than the prediction accuracy for unmodified peptides (0.71 ± 0.12, see Supporting Information), because the CID spectra of modified peptides generally have lower quality due to their lower signal intensities. Also shown in Table 2 are similarity scores when the modified residues are assumed to have the parameters of an average amino acid residue, or assumed to have the same parameters as the unmodified residue.
The optimized model greatly improves the predictions for some modifications (phosphorylation, methionine oxidation, glycation, mannosylation, cysteine double oxidation), but only slightly for others (cysteine S-carboxymethylation and carbamidomethylation, tryptophan oxidation and double oxidation, and lysine hydroxylation, N-methylation, and N-dimethylation). To aid comparison of these similarity score distributions, the percent of predicted spectra with similarity scores greater than the median score predicted with the optimized model is also shown in Table 2. The optimized model is used to predict the 284 spectra of phosphopeptides, 100 spectra of peptides containing oxidized methionine, and 24 spectra of peptides containing glycated lysine, in the testing data sets. The spectra in the testing sets were randomly selected from the collection of experimental spectra (see Supporting Information). Table 3 shows the comparison of similarity score distribution in each training data set and testing data set. The similarities of the two distributions demonstrate the validity of the prediction model.
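The "(s > median)" comparison reported in Table 2 can be expressed in a few lines of Python. This is a minimal sketch with a hypothetical function name and made-up scores. It assumes only what the text states: the reference is the median similarity score obtained with the optimized model, and the reported figure is the percentage of spectra from another model whose scores exceed that median.

```python
from statistics import median


def percent_above_reference_median(reference_scores, other_scores):
    """Percent of other_scores greater than the median of reference_scores."""
    reference_median = median(reference_scores)
    above = sum(1 for s in other_scores if s > reference_median)
    return 100.0 * above / len(other_scores)


# Made-up similarity scores, for illustration only.
optimized = [0.60, 0.70, 0.80]
average_residue = [0.50, 0.65, 0.75, 0.90]
share = percent_above_reference_median(optimized, average_residue)
```

A value near 50% indicates that a simpler model performs about as well as the optimized one, while a value near 0% indicates that almost every spectrum is predicted worse.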
■ DISCUSSION “Mobile Proton” Model of Peptide Fragmentation. The well-established “mobile proton” model17,18 explains most peptide fragmentation phenomena under low-energy CID, including peptides with various modifications. Most peptide backbone fragmentation pathways are charge-directed, which require a charging proton on the cleavage site. The exceptions are cleavages of peptide bonds on the C-terminal side of a few acidic residues, which can undergo charge-remote fragmentation pathways.
Table 2. Distribution of Similarity Scores (s) for Modified Peptides in the Training Data Set, Using the Prediction Model with Optimized Parameters, Parameters of an Average Amino Acid Residue, and Parameters of the Unmodified Residue
modified peptides | no. of spectra | optimized model: avg s ± std dev^a | optimized model: median | “average residue” model: avg s ± std dev^a | “average residue” model: (s > median)^b (%) | “unmodified residue” model: avg s ± std dev^a | “unmodified residue” model: (s > median)^b (%)
Ser/Thr/Tyr phosphorylation | 2248 | 0.70 ± 0.10 | 0.71 | (0.47 ± 0.13) | 4 | (0.47 ± 0.14) | 4
Met oxidation | 887 | 0.71 ± 0.12 | 0.72 | (0.61 ± 0.16) | 29 | (0.61 ± 0.16) | 27
Cys S-carboxymethylation | 2602 | 0.65 ± 0.13 | 0.65 | 0.64 ± 0.13 | 48 | 0.64 ± 0.12 | 48
Cys S-carbamidomethylation | 1894 | 0.70 ± 0.12 | 0.71 | 0.69 ± 0.12 | 49 | 0.69 ± 0.12 | 49
Trp oxidation | 97 | 0.69 ± 0.10 | 0.70 | 0.68 ± 0.10 | 44 | 0.68 ± 0.10 | 43
Trp double oxidation | 80 | 0.69 ± 0.11 | 0.69 | 0.67 ± 0.11 | 46 | 0.67 ± 0.11 | 45
Cys double oxidation | 201 | 0.70 ± 0.10 | 0.71 | (0.47 ± 0.13) | 4 | (0.47 ± 0.13) | 4
Lys glycation | 215 | 0.75 ± 0.12 | 0.77 | (0.35 ± 0.15) | 0 | (0.35 ± 0.15) | 0
Ser O-mannosylation | 10 | 0.72 ± 0.15 | 0.76 | (0.36 ± 0.17) | 0 | (0.36 ± 0.16) | 0
Lys hydroxylation | 64 | 0.68 ± 0.08 | 0.68 | (0.60 ± 0.09) | 20 | 0.67 ± 0.08 | 44
Lys N-methylation | 124 | 0.69 ± 0.11 | 0.69 | (0.57 ± 0.14) | 22 | 0.68 ± 0.11 | 52
Lys N-dimethylation | 193 | 0.69 ± 0.10 | 0.69 | (0.59 ± 0.14) | 26 | 0.67 ± 0.11 | 46
Cys triple oxidation | 28 | N/A^c | N/A | 0.68 ± 0.11 | N/A | 0.68 ± 0.11 | N/A
a Average similarity scores and their standard deviations. Unacceptable results are enclosed in parentheses.
b Percent of predicted spectra with similarity scores greater than the median score predicted with the optimized model.
c Cys triple oxidation model was not built because the simple approaches are acceptable.
Table 3. Distribution of Similarity Scores (Average ± Standard Deviation) in the Training Set and Testing Set for Peptides Containing Phosphorylated Residues, Oxidized Methionine Residues, and Glycated Lysine Residues
modified peptides | training set | testing set
Ser/Thr/Tyr phosphorylation | 0.70 ± 0.10 (n = 2248) | 0.71 ± 0.09 (n = 284)
Met oxidation | 0.71 ± 0.12 (n = 887) | 0.70 ± 0.13 (n = 100)
Lys glycation | 0.75 ± 0.12 (n = 215) | 0.71 ± 0.11 (n = 24)
Charge-remote pathways do not need a charging proton on the cleavage site. Neutral losses from amino acid side chains, however, often undergo both charge-directed and charge-remote pathways. In the mathematical model described here, both charge-directed and charge-remote pathways are considered for each side-chain neutral loss. That is, no assumption is made regarding whether a neutral loss is charge-directed or charge-remote; the nature of the pathway is derived from the model optimization process.
The extent of proton mobility in a protonated peptide precursor ion depends on its charge state and number of basic residues. Due to the extremely high gas-phase basicity of the arginine side chain, the number of arginine residues in a peptide plays a crucial role. When the number of charges is no more than the number of arginine residues, all the charging protons are sequestered on the arginine side chains, leaving no mobile protons on the backbone and other potential cleavage sites. At this extreme, only charge-remote pathways are usually observed. When the number of charges is greater than the number of arginine residues but no more than the number of basic residues (including histidine and lysine, but not the N-terminal amine due to its relatively lower GB), the charging protons are partially mobile. When the number of charges is greater than the number of basic residues, the protons are fully mobile and charge-directed backbone cleavages usually dominate.
Although many post-translational or process-induced modifications do not affect the general peptide fragmentation patterns, some modifications do affect the patterns significantly. An amino acid side chain modification usually affects the peptide fragmentation pattern in two ways: (1) introduction of a labile side chain, which usually means a fast charge-remote neutral loss process, or (2) introduction of an acidic side chain that facilitates the selective charge-remote backbone cleavage. Comparison of the improvements made by the model for different types of modifications (Table 2) reveals that the predictions improve significantly for phosphorylation, methionine oxidation, glycation, and O-mannosylation, all of which introduce labile side chains. The prediction for cysteine double oxidation, which introduces both a labile side chain and the selective charge-remote backbone cleavage, also improves significantly. Improvements for other modifications are relatively small.
The kinetic model described previously20,21,28 and here is largely based on the mobile proton model and some other formerly established fragmentation rules. Due to the complexity of the fragmentation process, it is often difficult to understand the fragmentation behavior of a peptide ion without help from a computer model. Factors to consider include the gas-phase basicity of each side chain, charge density on each fragmentation site, charge repulsion effect, competition of hundreds of fragmentation pathways, etc. Since the model can be conveniently used to check whether a fragmentation behavior can be explained by previously established rules implemented in the model, it can be used to help researchers understand various peptide fragmentation behaviors.
Phosphopeptides.
Based on the optimized parameters shown in Table S-2 in the Supporting Information, neutral loss of H3PO4 from phosphoserine and phosphothreonine has low activation energies for both charge-directed and charge-remote pathways. The apparent rate constant (rate constant after taking the charge density into account; see Supporting Information) of a charge-directed pathway, however, depends on the charge density on the cleavage site. To illustrate the actual rates of neutral losses through the two pathways, the initial apparent rate constants of neutral losses from several phosphopeptides are calculated and shown in Table 4. (The table also shows peptides with other modifications.) It is seen that the neutral loss of H3PO4 from phosphoserine and phosphothreonine can proceed through both charge-directed and charge-remote pathways. The rates of the charge-directed pathways, however, depend strongly on the mobility of the charging protons. Neutral losses of HPO3 and HPO3 + H2O from phosphotyrosine, however, proceed through only charge-remote pathways.
Neutral losses from phosphorylated side chains can take place through charge-remote pathways. Backbone fragmentations, on the other hand, are mostly charge directed. Therefore, one expects that phosphopeptides produce large amounts of sequence information with high charge density on the backbone (charge > number of basic sites) and produce predominantly neutral losses with low charge density on the backbone (charge ≤ number of basic sites). Figure 1 shows the predicted CID spectra of these phosphorylated peptides, compared to their experimental spectra. For phosphoserine- and phosphothreonine-containing peptides, when there are no mobile protons available such as in the case of Figure 1B (number of charges ≤ number of Arg), few backbone fragments are observed. With partially mobile protons (number of basic residues ≥ number of charges > number of Arg), such as in the case of Figure 1D, a small number of backbone fragments are observed. Many more backbone fragments are observed, as in the case of Figure 1C, when the protons are fully mobile (number of charges > number of basic residues).
Table 4. Predicted Initial Apparent Rate Constants of Charge-Directed and Charge-Remote Pathways of Some Neutral Losses from Labile Modified Side Chains (Underlined) of Peptides Shown in Figures 1–3
peptide ions | proton mobility | neutral loss | apparent rate constant, charge directed (s^-1) | apparent rate constant, charge remote (s^-1)
AAEpSpSDEDpSFEEKR (2+) | partially mobile | H3PO4 | 2.19 × 10^3 | 3.21 × 10^3
AAPAApSERNDR (2+) | nonmobile | H3PO4 | 13.5 | 3.22 × 10^3
ADDApSDDEDVKVK (3+) | fully mobile | H3PO4 | 2.76 × 10^3 (exponent sign not recoverable from source) | 3.22 × 10^3
AEEAHPApTPVK (2+) | partially mobile | H3PO4 | 579 | 1.86 × 10^3
ALVHGETLMPNNVIpYR (2+) | partially mobile | HPO3 | 1.0 × 10^-7 | 60.3
 | | HPO3 + H2O | 3.3 × 10^-8 | 45.1
pYGILYPTILR (2+) | fully mobile | HPO3 | 9.1 × 10^-6 | 60.4
 | | HPO3 + H2O | 3.0 × 10^-6 | 45.2
SRWQQGNVFSCSV(Ox-M)HEALHNHYTQK (3+) | partially mobile | CH3SOH | 4.1 × 10^-5 | 1.13 × 10^3
SR(Ox-W)QQGNVFSCSVMHEALHNHYTQK (3+) | partially mobile | H2O | 4.13 | 36.2
SRWQQGNVFS(Ox2-C)SVMHEALHNHYTQK (3+) | partially mobile | H2SO2 | 7.12 | 1.63 × 10^3
 | | H2O | 432 | 637
DTL(Ox-M)ISRTPEVTCVVV (2+) | fully mobile | CH3SOH | 7.0 × 10^-4 | 1.13 × 10^3
AA(Ox-W)GK (2+) | fully mobile | H2O | 1.03 × 10^3 | 36.1
AA(Ox2-W)GK (2+) | fully mobile | H2O | 2.40 × 10^3 | 7.04
DELT(Glyc-K)NQVSLT(Cm-C)LVK (3+) | fully mobile | 3 H2O | 1.04 × 10^4 | 563
IWSNQDLITVT(Man-S)VSHDTLASFGNWRE (3+) | fully mobile | C6H10O5 | 1.37 × 10^4 | 4.45 × 10^3
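The three proton-mobility classes used in the text and in Table 4 follow directly from the charge state and the counts of arginine, histidine, and lysine residues, so they can be written as a short Python function. This is a minimal sketch: the function name is hypothetical, and sequences are given as plain one-letter codes with modified residues written as their unmodified letters. Counting a modified lysine (for example, glycated lysine) as a basic residue is an assumption of this sketch.

```python
def proton_mobility(sequence: str, charge: int) -> str:
    """Classify proton mobility from charge state and basic-residue counts.

    The N-terminal amine is not counted as a basic site, following the
    rule stated in the text.
    """
    n_arg = sequence.count("R")
    n_basic = n_arg + sequence.count("H") + sequence.count("K")
    if charge <= n_arg:
        return "nonmobile"          # every proton sequestered on arginine
    if charge <= n_basic:
        return "partially mobile"   # protons held by His/Lys side chains
    return "fully mobile"           # at least one proton free for the backbone


# Peptides from Table 4, with phosphorylated residues written as S or Y.
examples = {
    ("AAESSDEDSFEEKR", 2): proton_mobility("AAESSDEDSFEEKR", 2),
    ("AAPAASERNDR", 2): proton_mobility("AAPAASERNDR", 2),
    ("ADDASDDEDVKVK", 3): proton_mobility("ADDASDDEDVKVK", 3),
    ("YGILYPTILR", 2): proton_mobility("YGILYPTILR", 2),
}
```

Applied to these four peptides, the rule gives the classes listed for them in the proton mobility column of Table 4.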
The peptide shown in Figure 1A has partially mobile protons but few backbone fragments because of the presence of three phosphoserine residues. Neutral losses from phosphotyrosine take place through only charge-remote pathways, and they are significantly slower than neutral losses from phosphoserine and phosphothreonine (Table 4). Therefore, when there is a partially mobile proton, a weak neutral loss of HPO3 + H2O (−98 Da) is observed (Figure 1E). When a proton is fully mobile as in the case of Figure 1F, there is virtually no neutral loss from the phosphotyrosine side chain. It has been found by others that basic residues are required for the neutral loss of HPO3 from phosphotyrosine.31,35,36 The phenomenon can be simply explained by suppressed charge-directed backbone fragmentation from the presence of basic residues. This explanation is supported by the observation that cationized phosphopeptides (no mobile protons) tend to have higher extents of neutral losses.37 Because neutral losses from phosphotyrosine are purely charge remote, they have more dependency on the presence of basic amino acids36 and mobile protons.37 The observation that neutral losses from phosphoserine and phosphothreonine proceed through both charge-directed and charge-remote pathways,38 whereas neutral losses from phosphotyrosine proceed through only a charge-remote pathway, allows the prediction model to differentiate phosphopeptides with different phosphorylation sites. More examples of predicted CID spectra of phosphopeptides are shown in Figure S-2 in the Supporting Information.
Peptides Containing Oxidized Methionine, Tryptophan, or Cysteine. Oxidized methionine has been reported to undergo primarily charge-remote neutral loss of CH3SOH (64 Da),39 as is also indicated by the large activation energy of the charge-directed pathway (Table S-2 in the Supporting Information) and the negligible apparent rate constants for the charge-directed pathways in the two methionine-oxidized peptides shown in Table 4. For the peptide with partially mobile protons, a high level of CH3SOH loss is observed (Figure 2A), while virtually no CH3SOH loss is observed for the peptide ion with a fully mobile proton (Figure 2B).
Notably, the rates of CH3SOH loss in the peptides shown in Figure 2A,B are equivalent (Table 4); little CH3SOH loss is observed in Figure 2B because the competing charge-directed backbone cleavages are much faster.
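The competition described above can be put in numbers with a simple first-order branching calculation. The Python sketch below is an illustration only: the charge-remote CH3SOH-loss rate constant of 1.13 × 10^3 s^-1 is taken from Table 4, whereas the summed backbone-cleavage rate constants, the reaction time, and the function name are hypothetical. The published model is also iterative, with product ions fragmenting further, which this sketch ignores.

```python
import math


def first_order_yields(rate_constants, time_s):
    """Fractional yield of each parallel first-order pathway after time_s.

    rate_constants maps a pathway name to its rate constant in s^-1.
    Each pathway receives a share k_i / sum(k) of the precursor that has
    reacted; the unreacted fraction is returned under the key "precursor".
    """
    total = sum(rate_constants.values())
    reacted = 1.0 - math.exp(-total * time_s)
    yields = {name: k / total * reacted for name, k in rate_constants.items()}
    yields["precursor"] = 1.0 - reacted
    return yields


K_LOSS = 1.13e3        # s^-1, charge-remote CH3SOH loss (Table 4)
REACTION_TIME = 0.03   # s, hypothetical activation time

# Hypothetical summed backbone rates: slow (few mobile protons) vs fast.
partially_mobile = first_order_yields(
    {"CH3SOH loss": K_LOSS, "backbone": 5.0e2}, REACTION_TIME)
fully_mobile = first_order_yields(
    {"CH3SOH loss": K_LOSS, "backbone": 5.0e4}, REACTION_TIME)
```

With the same neutral-loss rate constant in both cases, the neutral-loss product dominates when the backbone pathways are slow and nearly vanishes when they are fast, which is the behavior seen in Figure 2A,B.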
Figure 1. Representative predicted CID spectra of phosphopeptides compared to their experimental spectra.
Cysteine sulfinic acid (product of cysteine double oxidation) has a distinct neutral loss of H2SO2 (66 Da), which is largely charge remote according to the calculation shown in Table 4. Another distinct property of cysteine sulfinic acid is its strong tendency for a charge-remote selective backbone cleavage near its C-terminus29 (mechanism B) (y14 and b11 ions in Figure 2E). In fact, the tendency of the cysteine sulfinic acid side chain to facilitate mechanism B type backbone fragmentation is much stronger than that of the better-known aspartic acid side chain, based on the activation energies of mechanism B shown in Table S-1 in the Supporting Information. The distinct properties of oxidized methionine and doubly oxidized cysteine described above help in identifying oxidation sites on peptides with more than one potential oxidation site. Figure 2A,C,E shows CID spectra of the same peptide with different residues oxidized. Although the sequence ions themselves often do not pinpoint the exact modification sites, the modification sites are determined unambiguously in these cases due to the striking differences in the predicted fragmentation spectra.
Tryptophan residues with single or double oxidation tend to lose a water molecule from their side chains, especially when the oxidized residue is on the N-terminus of a peptide or fragment. These water losses are largely charge-directed based on the calculated apparent rate constants shown in Table 4. The doubly oxidized tryptophan side chain also loses a water molecule faster than the singly oxidized tryptophan side chain. Figure 2D,F shows these strong water losses from singly and doubly oxidized tryptophan side chains, especially when the oxidized tryptophan residues are on the N-terminus of a fragment (y3 ions in Figure 2D,F). More examples of predicted CID spectra of oxidized peptides are shown in Figures S-3 to S-5 in the Supporting Information.
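The nominal neutral-loss masses quoted in the text (98 Da for HPO3 + H2O, 64 Da for CH3SOH, 66 Da for H2SO2, and 162 Da for the C6H10O5 sugar moiety) can be checked from the formulas in Table 1. The Python sketch below is an illustration with a hypothetical helper name; it uses standard monoisotopic atomic masses and a deliberately simple formula parser that handles only flat formulas without parentheses.

```python
import re

# Standard monoisotopic atomic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
}


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a flat formula such as 'CH3SOH' or 'C6H10O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# Neutral losses listed in Table 1.
neutral_losses = {
    "H3PO4": formula_mass("H3PO4"),
    "HPO3 + H2O": formula_mass("HPO3") + formula_mass("H2O"),
    "CH3SOH": formula_mass("CH3SOH"),
    "H2SO2": formula_mass("H2SO2"),
    "C6H10O5": formula_mass("C6H10O5"),
}
```

Because HPO3 + H2O has the same elemental composition as H3PO4, the 98 Da loss from phosphotyrosine cannot be told apart from the H3PO4 loss of phosphoserine or phosphothreonine by mass alone, which is why the differing pathway kinetics matter for site assignment.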
Figure 2. Representative predicted CID spectra of peptides with oxidized methionine, tryptophan, and cysteine residues compared to their experimental spectra.
Peptides Containing Glycated Lysine or O-Mannosylated Serine. Neutral losses from glycated lysine and O-mannosylated
serine side chains can occur through both charge-directed and charge-remote pathways. As a result, abundant neutral losses from these modified residues are often observed regardless of the mobility of the protons. In the two examples shown in Figure 3, these neutral losses are primarily charge directed (Table 4). However, a charge-remote pathway may dominate in other peptide ions when few mobile protons are available. Glycated lysine side chains primarily lose water molecules, while O-mannosylated serine side chains primarily lose the entire sugar moiety (162 Da) (Table S-4 in the Supporting Information). These observations help in distinguishing these two modifications. More examples of predicted CID spectra of glycated and mannosylated peptides
are shown in Figures S-6 and S-7 in the Supporting Information, respectively.
Peptides Containing Hydroxylated or N-Methylated Lysine. After hydroxylation, N-methylation, and N-dimethylation, lysine side chains remain basic, with little to moderate change in their GB values (Table S-1 in the Supporting Information). Treating these modified lysine residues as an average residue is therefore not ideal, whereas giving them the same parameters as the unmodified lysine residue generated acceptable results (Table 2). The CID spectra of a peptide containing hydroxylysine and a peptide containing N-methylated lysine are predicted using the optimized model and compared to their experimental spectra, as shown in Figure 4. The two peptides shown in Figure 4 also contain S-carboxymethylated cysteine residues. More examples of predicted CID spectra of hydroxylated and
Figure 3. Representative predicted CID spectra of peptides with glycated lysine (A) and O-mannosylated serine (B) residues compared to their experimental spectra.
Figure 4. Representative predicted CID spectra of peptides with hydroxylysine (A), methylated lysine (B), and S-carboxymethylated cysteine (both) residues compared to their experimental spectra.
methylated peptides are shown in Figures S-8 and S-9, respectively, in the Supporting Information.
Characterization of Post-translational or In-Process Modifications Using the Prediction Model. Accurate prediction of the fragmentation pattern of a modified peptide does not guarantee a reliable identification of the modification type and modification site, especially for labile modifications such as phosphorylation and glycosylation, when very limited sequence information is present in the tandem mass spectrum. Because different types of labile modifications have distinct neutral losses, the type of modification can usually be reliably identified, even for isobaric modifications such as glycation and mannosylation. However, reliably identifying the location of the modification requires more peptide backbone fragmentation. With enough backbone fragmentation, the prediction model can provide sufficient information to distinguish modifications at different sites. Because a charge is not required for most side-chain neutral losses but is required for most backbone fragmentation, more charges are needed on the peptide to obtain more sequence information. This can be achieved by simple approaches such as decreasing the ion-source temperature and other in-source ion energies, as well as reducing the trifluoroacetic acid (TFA) concentration in the mobile phase. Electrospray of a peptide usually generates protonated peptide ions with several charge states. By acquiring
MS/MS on ions with different charge states, the type of modification can be identified from the distinct neutral losses of low-charge ions, and the peptide sequence and modification site can be identified from the backbone fragments of high-charge ions. For recombinant proteins, where the peptide analytes are relatively simple and the search space is relatively small, this simple approach may be more advantageous than the MS3 approach. In industrial settings where purified therapeutic proteins are characterized, after implementation of the described prediction model, modifications are often confidently identified based on the accurately determined peptide mass and fragmentation pattern, even for many labile modifications (glycosylation, glycation, mannosylation, met-oxidation, etc.) that generate very limited sequence information. In other settings, where a complex protein mixture is analyzed or where ambiguities exist, alternative fragmentation techniques such as electron-transfer dissociation (ETD) or electron-capture dissociation (ECD) have proved useful.40–42 A similar kinetic model for peptide ETD and ECD has been described in a separate article.43 Since most labile side chains described here are stable under ETD or ECD conditions, ETD and ECD spectra of most modified peptides are predicted accurately by assuming that the modified residues have the same parameters as an average amino acid residue.
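The neutral-loss signatures described in this section (glycated lysine primarily loses water, whereas O-mannosylated serine primarily loses the 162 Da sugar) suggest a simple check on a low-charge spectrum of a peptide carrying one of these isobaric modifications. The sketch below is a hypothetical heuristic, not the kinetic prediction model: the function names, the peak-list format, the 0.5 m/z tolerance, and the example peaks are all assumptions made for illustration, and only the first water loss is considered.

```python
from typing import List, Tuple

H2O = 18.010565      # monoisotopic mass of water, Da
HEXOSE = 162.052824  # monoisotopic mass of a hexose residue (C6H10O5), Da

Peak = Tuple[float, float]  # (m/z, intensity)


def summed_intensity(peaks: List[Peak], target_mz: float, tol: float) -> float:
    """Sum the intensities of all peaks within +/- tol of target_mz."""
    return sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)


def guess_hexose_modification(peaks: List[Peak], precursor_mz: float,
                              charge: int, tol: float = 0.5) -> str:
    """Compare the water-loss and sugar-loss peaks of the precursor ion.

    Returns "glycation-like" when the water loss is stronger,
    "mannosylation-like" when the 162 Da loss is stronger, and
    "undetermined" when the two are equal (including both absent).
    """
    water = summed_intensity(peaks, precursor_mz - H2O / charge, tol)
    sugar = summed_intensity(peaks, precursor_mz - HEXOSE / charge, tol)
    if water == sugar:
        return "undetermined"
    return "glycation-like" if water > sugar else "mannosylation-like"


if __name__ == "__main__":
    # Hypothetical doubly charged precursor at m/z 900.0
    spectrum = [(890.99, 100.0), (818.97, 5.0), (650.30, 12.0)]
    print(guess_hexose_modification(spectrum, 900.0, 2))
```

A comparison of full predicted and experimental spectra, as done in the article, uses far more information than this two-peak test; the sketch only illustrates why the two modifications remain distinguishable despite having the same mass.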
Although the similarity score (eq 3 in the Supporting Information) used in this work is a good indicator of the accuracy of the predicted spectrum, it may or may not be a good score function for the purpose of peptide identification. The reason is that labile neutral losses from modified peptides often dominate the spectrum. As a result, these non-sequence-specific ions can make a very large contribution to the similarity score without improving the certainty of peptide identification. An ideal score function for peptide identification is application dependent. For example, the similarity score described here is appropriate for characterization of therapeutic proteins, in which the sequence search space is small and false negatives are often a bigger problem than false positives. However, when a complex protein mixture is analyzed, as in a proteomics experiment, the sequence search space is very large and false positives are a much more severe problem, so a different score function may be necessary to put more weight on sequence ions. For example, a good score function for proteomics applications is the “Ranked SIM” described by Yen et al.27
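The effect of a dominant neutral-loss ion on a similarity score can be illustrated with a small sketch. The exact form of eq 3 is given in the Supporting Information and is not reproduced here; the code below assumes a normalized dot product of square-root intensities, a common spectral-similarity measure, and adds a hypothetical per-ion weight so that sequence ions can count for more than neutral-loss ions. It is not the Ranked SIM of Yen et al., and the ion labels, intensities, and weight of 0.1 are invented for illustration.

```python
import math
from typing import Dict, Optional

Spectrum = Dict[str, float]  # ion label -> intensity


def similarity(exp: Spectrum, pred: Spectrum,
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted normalized dot product of square-root intensities.

    Ions missing from `weights` get weight 1.0, so calling the function
    without weights gives the plain (unweighted) score.
    """
    ions = set(exp) | set(pred)
    w = {ion: (weights or {}).get(ion, 1.0) for ion in ions}
    num = sum(w[i] * math.sqrt(exp.get(i, 0.0) * pred.get(i, 0.0)) for i in ions)
    den = math.sqrt(sum(w[i] * exp.get(i, 0.0) for i in ions)
                    * sum(w[i] * pred.get(i, 0.0) for i in ions))
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical spectra: one dominant neutral-loss ion plus two sequence ions.
    experimental = {"precursor-NL": 90.0, "y5": 5.0, "b4": 5.0}
    wrong_site = {"precursor-NL": 90.0, "y6": 5.0, "b3": 5.0}

    # Unweighted: the wrong candidate still scores highly (about 0.9).
    print(similarity(experimental, wrong_site))
    # Down-weighting the neutral-loss ion exposes the sequence-ion mismatch.
    print(similarity(experimental, wrong_site, {"precursor-NL": 0.1}))
```

In this toy case the wrong candidate matches only the neutral-loss ion, yet the unweighted score stays near 0.9; with the neutral-loss ion down-weighted, the score falls below 0.5 while a fully matching prediction would still score 1.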
CONCLUSION
The kinetic model for predicting peptide CID spectra is extended to the prediction of CID spectra of modified peptides. For many modifications, the original model works well when the modified residue is assumed to have either the same properties as the unmodified residue or the properties of an average residue. Other modifications require that each modified residue is considered as a distinct residue with related parameters added into the model and optimized. With the optimized model, CID spectra of modified peptides with many different types of modifications are predicted with reasonable accuracy. Predicted spectra are used for identification of post-translational and process-induced modifications in therapeutic proteins and can be potentially used in proteomics experiments, when an appropriate score function is implemented.
ASSOCIATED CONTENT
Supporting Information. Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
E-mail: [email protected]. Fax: (805) 376-2354.
ACKNOWLEDGMENT
The author thanks Karl Clauser and Vicent Fusaro at the Broad Institute of Harvard and MIT and Kimberly Lee and Wen Yu at Amgen Washington for contributing spectra of phosphorylated peptides, Wen Yu of Amgen Washington for selecting high-quality spectra of phosphorylated peptides from the ISB data set, Bhavana Shah, Jason Richardson, Gang Xiao, Drew Nichols, and Da Ren for their help in collecting other data for training the model, Le Zhang for performing reductive methylation and tryptic digestion of the IgG2 antibody, and Pavel Bondarenko and Joseph Phillips for helpful discussions during preparation of the manuscript. This work was funded by Amgen Inc.
REFERENCES
(1) Jensen, O. N. Curr. Opin. Chem. Biol. 2004, 8, 33–41.
(2) Stults, J. T.; Arnott, D. Methods Enzymol. 2005, 402, 245–289.
(3) Ahn, N. G.; Shabb, J. B.; Old, W. M.; Resing, K. A. ACS Chem. Biol. 2007, 2, 39–52.
(4) Srebalus Barnes, C.; Lim, A. Mass Spectrom. Rev. 2007, 26, 370–388.
(5) Zhang, Z.; Pan, H.; Chen, X. Mass Spectrom. Rev. 2009, 28, 147–176.
(6) Morelle, W.; Canis, K.; Chirat, F.; Faid, V.; Michalski, J. C. Proteomics 2006, 6, 3993–4015.
(7) Paradela, A.; Albar, J. P. J. Proteome Res. 2008, 7, 1809–1818.
(8) Martin, C.; Zhang, Y. Nat. Rev. Mol. Cell Biol. 2005, 6, 838–849.
(9) Martinez, T.; Pace, D.; Brady, L.; Gerhart, M.; Balland, A. J. Chromatogr., A 2007, 1156, 183–187.
(10) Lommel, M.; Strahl, S. Glycobiology 2009, 19, 816–828.
(11) Myllyla, R.; Wang, C. G.; Heikkinen, J.; Juffer, A.; Lampela, O.; Risteli, M.; Ruotsalainen, H.; Salo, A.; Sipila, L. J. Cell. Physiol. 2007, 212, 323–329.
(12) Manning, M. C.; Chou, D. K.; Murphy, B. M.; Payne, R. W.; Katayama, D. S. Pharm. Res. 2010, 27, 544–575.
(13) Annan, R.; Carr, S. Anal. Chem. 1996, 68, 3413–3421.
(14) Boersema, P. J.; Mohammed, S.; Heck, A. J. R. J. Mass Spectrom. 2009, 44, 861–878.
(15) Frolov, A.; Hoffmann, P.; Hoffmann, R. J. Mass Spectrom. 2006, 41, 1459–1469.
(16) Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H. J. Chromatogr., B 2007, 849, 115–128.
(17) Dongre, A.; Jones, J.; Somogyi, A.; Wysocki, V. J. Am. Chem. Soc. 1996, 118, 8365–8374.
(18) Cox, K.; Gaskell, S.; Morris, M.; Whiting, A. J. Am. Soc. Mass Spectrom. 1996, 7, 522–531.
(19) Paizs, B.; Suhai, S. Mass Spectrom. Rev. 2005, 24, 508–548.
(20) Zhang, Z. Anal. Chem. 2004, 76, 3908–3922.
(21) Zhang, Z. Anal. Chem. 2005, 77, 6364–6373.
(22) Zhang, Z. Anal. Chem. 2004, 76, 6374–6383.
(23) Zhang, Z. Anal. Chem. 2009, 81, 8354–8364.
(24) Sun, S.; Meyer-Arendt, K.; Eichelberger, B.; Brown, R.; Yen, C.-Y.; Old, W. M.; Pierce, K.; Cios, K. J.; Ahn, N. G.; Resing, K. A. Mol. Cell. Proteomics 2007, 6, 1–17.
(25) Yu, W.; Taylor, J. A.; Davis, M. T.; Bonilla, L. E.; Lee, K. A.; Auger, P. L.; Farnsworth, C. C.; Welcher, A. A.; Patterson, S. D. Proteomics 2010, 10, 1172–1189.
(26) Yen, C.-Y.; Meyer-Arendt, K.; Eichelberger, B.; Sun, S.; Houel, S.; Old, W. M.; Knight, R.; Ahn, N. G.; Hunter, L. E.; Resing, K. A. Mol. Cell. Proteomics 2009, 8, 857–869.
(27) Yen, C. Y.; Houel, S.; Ahn, N. G.; Old, W. M. Mol. Cell. Proteomics 2011, 10, No. M111.007666.
(28) Zhang, Z.; Shah, B. Anal. Chem. 2010, 82, 10194–10202.
(29) Wang, Y.; Vivekananda, S.; Men, L.; Zhang, Q. J. Am. Soc. Mass Spectrom. 2004, 15, 697–702.
(30) Men, L.; Wang, Y. Rapid Commun. Mass Spectrom. 2005, 19, 23–30.
(31) Degnore, J. P.; Qin, J. J. Am. Soc. Mass Spectrom. 1998, 9, 1175–1188.
(32) Jiang, X.; Smith, J.; Abraham, E. J. Mass Spectrom. 1996, 31, 1309–1310.
(33) Lagerwerf, F.; van de Weert, M.; Heerma, W.; Haverkamp, J. Rapid Commun. Mass Spectrom. 1996, 10, 1905–1910.
(34) Lioe, H.; O’Hair, R.; Reid, G. Rapid Commun. Mass Spectrom. 2004, 18, 978–988.
(35) Metzger, S.; Hoffmann, R. J. Mass Spectrom. 2000, 35, 1165–1177.
(36) Moyer, S. C.; Cotter, R. J.; Woods, A. S. J. Am. Soc. Mass Spectrom. 2002, 13, 274–283.
(37) Moyer, S. C.; VonSeggern, C. E.; Cotter, R. J. J. Am. Soc. Mass Spectrom. 2003, 14, 581–592.
(38) Palumbo, A. M.; Tepe, J. J.; Reid, G. E. J. Proteome Res. 2008, 7, 771–779.
(39) Reid, G.; Roberts, K.; Kapp, E.; Simpson, R. J. Proteome Res. 2004, 3, 751–759.
(40) Zubarev, R. A. Curr. Opin. Biotechnol. 2004, 15, 12–16.
(41) Mikesh, L. M.; Ueberheide, B.; Chi, A.; Coon, J. J.; Syka, J. E. P.; Shabanowitz, J.; Hunt, D. F. Biochim. Biophys. Acta, Proteins Proteomics 2006, 1764, 1811–1822.
(42) Wiesner, J.; Premsler, T.; Sickmann, A. Proteomics 2008, 8, 4466–4483.
(43) Zhang, Z. Anal. Chem. 2010, 82, 1990–2005.