Comprehensive and Reliable Phosphorylation Site Mapping of

Jun 12, 2009 - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic R&A Center, Dalian Institute of Chemical P...
Anal. Chem. 2009, 81, 5794–5805

Comprehensive and Reliable Phosphorylation Site Mapping of Individual Phosphoproteins by Combination of Multiple Stage Mass Spectrometric Analysis with a Target-Decoy Database Search

Guanghui Han,† Mingliang Ye,*,† Xinning Jiang,† Rui Chen,† Jian Ren,‡ Yu Xue,‡ Fangjun Wang,† Chunxia Song,† Xuebiao Yao,‡ and Hanfa Zou*,†

CAS Key Laboratory of Separation Sciences for Analytical Chemistry, National Chromatographic R&A Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China, and Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science & Technology of China, Hefei 230027, China

Since the emergence of proteomics, much attention has been paid to the development of new technologies for phosphoproteomics analysis. Compared with large-scale phosphorylation analysis at the proteome level, comprehensive and reliable phosphorylation site mapping of individual phosphoproteins is equally important. Here, we present a modified target-decoy database search strategy for confident phosphorylation site analysis of individual phosphoproteins without manual interpretation of spectra. Instead of using all protein sequences in a proteome database of an organism to construct a target-decoy database, as is done for phosphoproteome analysis, the composite database constructed for phosphorylation site analysis of individual phosphoproteins included only the sequences of the individual target proteins and a decoy version of a small inhomogeneous protein database. It was found that the confidence of phosphopeptide identifications could be effectively controlled when the acquired MS2 and MS3 spectra were searched against this composite database, followed by data processing. Because of the small size of the composite database, the computation time for the database search is very short, which allows the adoption of low-specificity proteases for protein digestion to increase the coverage of phosphorylation site mapping. The sensitivity and comprehensiveness of phosphorylation site mapping with this approach were demonstrated using two standard phosphoprotein samples, α-casein and β-casein. The approach was further applied to analyze the phosphorylation of the cyclic AMP-dependent protein kinase (PKA), which resulted in the identification of 17 phosphorylation sites, including five novel sites, on four PKA subunits.

* To whom correspondence should be addressed: (H. Zou) Phone: +86-411-84379610. Fax: +86-411-84379620. E-mail: [email protected]. (M. Ye) Phone: +86-411-84379620. Fax: +86-411-84379620. E-mail: [email protected].
† Chinese Academy of Sciences.
‡ University of Science & Technology of China.


Analytical Chemistry, Vol. 81, No. 14, July 15, 2009

Reversible protein phosphorylation is a central cellular regulatory mechanism that modulates protein activity and propagates signals within cellular pathways and networks. Conversely, abnormal phosphorylation is a cause or consequence of multiple diseases, including cancer.1 Knowing the phosphorylated residues in proteins is fundamental for understanding the various signaling events in which they partake; therefore, much effort has been invested in identifying and characterizing phosphorylation sites. In many cases, a protein can be phosphorylated on multiple sites, which can act either independently or synergistically when phosphorylated simultaneously. Thus, improved methods to comprehensively, sensitively, and reliably detect and analyze phosphorylation sites have always been sought to understand this important modification.2-4

Traditional methods for measuring protein phosphorylation, such as mutational analysis and Edman degradation chemistry on phosphopeptides, are relatively time-consuming and laborious and require large amounts of purified protein. Although a variety of methods are available, mass spectrometry (MS) has recently become the primary choice for the study of protein phosphorylation because of its high sensitivity, selectivity, and speed.5-7 Presently, most MS-based phosphoproteomics analyses adopt the “bottom-up” approach. This approach involves enzymatic cleavage of proteins, most often by trypsin, with subsequent phosphopeptide enrichment and nano-LC-MS/MS analysis to identify phosphopeptides. Even though large-scale phosphoproteome analyses can presently identify tens of thousands of phosphorylation sites from a single biological sample,8,9 mapping of phosphorylation sites for individual phosphoproteins is not comprehensive because of the extreme complexity of the proteome sample.10,11 For example, only two phosphorylation sites on period 2 protein could be identified by large-scale phosphoproteome analysis of the sample, while detailed analysis of the individual phosphoprotein resulted in detection of more than 20 in vivo phosphorylation sites.12 Therefore, in order to comprehensively and reliably localize phosphorylation sites of individual phosphoproteins, detailed analysis of a sample containing only one or a few phosphoproteins is desirable.

(1) Hunter, T. Cell 2000, 100, 113–127.
(2) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Cell 2006, 127, 635–648.
(3) Beausoleil, S. A.; Villén, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. Nat. Biotechnol. 2006, 24, 1285–1292.
(4) Schmelzle, K.; White, F. M. Curr. Opin. Biotechnol. 2006, 17, 406–414.
(5) Mann, M.; Ong, S. E.; Gronborg, M.; Steen, H.; Jensen, O. N.; Pandey, A. Trends Biotechnol. 2002, 20, 261–268.
(6) Aebersold, R.; Mann, M. Nature 2003, 422, 198–207.
(7) Han, G. H.; Ye, M. L.; Zou, H. F. Analyst 2008, 133, 1128–1138.
(8) Zhai, B.; Villén, J.; Beausoleil, S. A.; Mintseris, J.; Gygi, S. P. J. Proteome Res. 2008, 7, 1675–1682.

10.1021/ac900702g CCC: $40.75 © 2009 American Chemical Society. Published on Web 06/12/2009

The most challenging step in mapping phosphorylation sites on individual phosphoproteins is the confident identification of phosphopeptides. Phosphopeptide identification is based on peptide fragmentation by collisionally activated tandem mass spectrometry (MS/MS or MS2). However, the MS2 spectra of phosphopeptides often lack sufficient fragment peaks because of neutral loss of H3PO4, and the assignment of phosphorylation sites is ambiguous in most instances when the peptides contain several potential phosphorylation sites.13 Therefore, manual interpretation is often used to localize the phosphorylation sites.14 However, this is a very time-consuming and labor-intensive procedure that has become impractical as data sets have grown in size. In addition, the success of this strategy strongly depends on the personal experience of the analyst. Thus, the obtained results are typically not objective, and the confidence of identification is hard to control. To circumvent these limitations, Schlosser et al.12 developed a novel scoring scheme for in-depth analysis of individual phosphoproteins. Their scoring scheme mimics the approach that an expert mass spectrometrist would use for manual interpretation of phosphopeptide MS2 spectra. It was demonstrated that their scheme was very useful in assisting phosphorylation site mapping. However, because of the low quality of MS2 spectra for phosphopeptides, their scheme still lacks sufficient sensitivity.
As a supplement to MS2, a neutral loss peak can be further fragmented to generate an MS3 spectrum, from which more fragment information can be obtained. Some phosphopeptides that could not be identified by MS2 were successfully identified by MS3.15-17 MS3 spectra were demonstrated to be beneficial for phosphoproteome analysis, especially when the peptide assignments derived from MS2 and MS3 were combined.18,19 Therefore, the combined use of MS2 and MS3 should also lead to more confident and more sensitive mapping of phosphorylation sites for individual phosphoproteins in a less complex sample.

(9) Bodenmiller, B.; Malmstrom, J.; Gerrits, B.; Campbell, D.; Lam, H.; Schmidt, A.; Rinner, O.; Mueller, L. N.; Shannon, P. T.; Pedrioli, P. G.; Panse, C.; Lee, H. K.; Schlapbach, R.; Aebersold, R. Mol. Syst. Biol. 2007, 3, 11.
(10) Graham, M. E.; Anggono, V.; Bache, N.; Larsen, M. R.; Craft, G. E.; Robinson, P. J. J. Biol. Chem. 2007, 282, 14695–14707.
(11) Craft, G. E.; Graham, M. E.; Bache, N.; Larsen, M. R.; Robinson, P. J. Mol. Cell. Proteomics 2008, 7, 1146–1161.
(12) Schlosser, A.; Vanselow, J. T.; Kramer, A. Anal. Chem. 2007, 79, 7439–7449.
(13) Edelson-Averbukh, M.; Pipkorn, R.; Lehmann, W. D. Anal. Chem. 2007, 79, 3476–3486.
(14) Schlosser, A.; Vanselow, J. T.; Kramer, A. Anal. Chem. 2005, 77, 5243–5250.
(15) Beausoleil, S. A.; Jedrychowski, M.; Schwartz, D.; Elias, J. E.; Villén, J.; Li, J. X.; Cohn, M. A.; Cantley, L. C.; Gygi, S. P. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 12130–12135.
(16) Olsen, J. V.; Mann, M. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 13417–13422.
(17) Lee, J.; Xu, Y.; Chen, Y.; Sprung, R.; Kim, S. C.; Xie, S.; Zhao, Y. Mol. Cell. Proteomics 2007, 6, 669–676.
(18) Jiang, X.; Han, G.; Feng, S.; Jiang, X.; Ye, M.; Yao, X.; Zou, H. J. Proteome Res. 2008, 7, 1640–1649.
(19) Ulintz, P. J.; Bodenmiller, B.; Andrews, P. C.; Aebersold, R.; Nesvizhskii, A. I. Mol. Cell. Proteomics 2008, 7, 71–87.

Target-decoy search is a good approach for evaluating the confidence of peptide identification in proteome analysis.3,20,21 After database searching against a composite protein database that includes target (forward) and decoy (reversed) sequences of all proteins in the proteome of an organism, a false discovery rate (FDR) can be easily determined from the number of decoy identifications. Using the target-decoy search strategy for the acquired spectra, a data set of peptide identifications with low FDR (for example, 2%) can be easily established through postsearch filtering with easily accessible criteria. To circumvent labor-intensive manual validation and control the confidence of phosphopeptide identification, the target-decoy approach has been successfully applied to phosphoproteome analysis. For large-scale analysis, a high-accuracy mass spectrometer combined with an MS2 target-decoy search strategy2,3 and a low-accuracy mass spectrometer (such as an ion trap mass spectrometer) combined with an MS2/MS3 target-decoy search strategy18,19,22 have been reported to obtain highly confident phosphopeptide identification and precise site localization without manual validation. However, to the best of our knowledge, an MS2/MS3 target-decoy search strategy for comprehensive mapping of phosphorylation sites on individual phosphoproteins has not been reported. Here, we present a methodology for confident phosphorylation site analysis of individual phosphoproteins by an MS2/MS3 target-decoy strategy.
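As a rough illustration of the target-decoy idea described above, the following Python sketch builds a composite database from forward sequences plus their reversed decoys and estimates the FDR from the decoy hits among accepted identifications. The protein names, the `REV_` prefix, the toy sequences, and the decoy/target ratio used as the FDR estimate are all assumptions made for illustration; the text does not state which FDR formula was applied.

```python
from typing import Dict, List


def reverse_decoys(proteins: Dict[str, str], prefix: str = "REV_") -> Dict[str, str]:
    """Create one decoy entry per protein by reversing its sequence."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(hit_proteins: List[str], prefix: str = "REV_") -> float:
    """Estimate FDR as (decoy hits) / (target hits); this ratio is an assumption."""
    decoy = sum(1 for name in hit_proteins if name.startswith(prefix))
    target = len(hit_proteins) - decoy
    if target == 0:
        return float("inf") if decoy else 0.0
    return decoy / target


# Hypothetical toy proteome; real sequences would come from a FASTA file.
proteome = {"P1": "MKTAYIAK", "P2": "MSSPLLK"}
composite = {**proteome, **reverse_decoys(proteome)}

# Hypothetical accepted identifications: 50 target hits and 1 decoy hit.
hits = ["P1"] * 50 + ["REV_P2"]
print(f"estimated FDR: {estimate_fdr(hits):.2%}")
```

Other conventions are also in use, such as doubling the decoy count and dividing by all accepted hits; the appropriate choice depends on how the composite database is built.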
Instead of using all protein sequences in a proteome database of an organism to construct a target-decoy database, as is done for phosphoproteome analysis, the composite database constructed for phosphorylation site analysis of individual phosphoproteins included only the sequences of the individual target protein(s) and a decoy version of a small inhomogeneous protein database. The effectiveness of using this small composite database to control the confidence of phosphopeptide identifications for the analysis of individual phosphoproteins was demonstrated by analysis of the phosphorylation sites of α-casein and β-casein. Because database searching is extremely slow when low-specificity proteases are applied, phosphoproteome analysis is limited to high-specificity proteases such as trypsin for protein digestion. However, the composite database for phosphorylation site mapping of individual proteins is much smaller, and the database search is much faster. Thus, low-specificity proteases can be applied to increase the coverage of phosphorylation site mapping. In combination with a multiprotease digestion approach, the phosphorylation sites of α-casein and β-casein could be comprehensively, sensitively, and reliably detected and located. The approach was further applied to analyze the phosphorylation of the cyclic AMP-dependent protein kinase (PKA), and 17 phosphorylation sites were confidently located on four PKA subunits. As the confidence of phosphopeptide identification can be easily controlled with the target-decoy approach, no manual interpretation of MS spectra is required, which makes this approach easier and simpler to use.

(20) Elias, J. E.; Gygi, S. P. Nat. Methods 2007, 4, 207–214.
(21) Lu, B. W.; Ruse, C.; Xu, T.; Park, S. K.; Yates, J. Anal. Chem. 2007, 79, 1301–1310.
(22) Han, G. H.; Ye, M. L.; Zhou, H. J.; Jiang, X. N.; Feng, S.; Jiang, X. G.; Tian, R. J.; Wan, D. F.; Zou, H. F.; Gu, J. R. Proteomics 2008, 8, 1346–1361.

EXPERIMENTAL SECTION

Chemicals and Materials. All water used in this experiment was prepared using a Milli-Q system (Millipore, Bedford, MA). ZipTipC18 pipet tips were purchased from Millipore. Dithiothreitol (DTT), ammonium bicarbonate (NH4HCO3), and iodoacetamide (IAA) were all purchased from Bio-Rad (Hercules, CA). Formic acid (FA) and acetonitrile (ACN) were obtained from Aldrich (Milwaukee, WI). Urea, trifluoroacetic acid (TFA), sodium chloride (NaCl), α-casein, β-casein, thermolysin, trypsin (TPCK-treated, proteomics grade), and cyclic AMP-dependent protein kinase (from bovine heart) were all purchased from Sigma (St. Louis, MO); elastase, proteinase K (PCR grade), and endoproteinase Glu-C (sequencing grade) were from Roche (Mannheim, Germany). All chemicals were of analytical grade except acetonitrile, which was of HPLC grade.

Proteolytic Cleavage. For α-casein and β-casein, a total of 25 µg of protein was diluted to 100 µL with 0.1 M NH4HCO3 (pH 8) and then divided into 5 aliquots. About 0.2 µg of each protease was used to digest one aliquot. The digestions with trypsin, elastase, proteinase K, Glu-C, and thermolysin were performed overnight (18 h) at 37 °C in 0.1 M NH4HCO3 (pH 8). All digests were dried in a vacuum concentrator, redissolved in 20 µL of 80% ACN, 6% TFA, and then subjected to phosphopeptide enrichment. For digestion of a cyclic AMP-dependent protein kinase (PKA) sample, a total of 100 µg of protein was diluted to 20 µL with a solution containing 8 M urea and 50 mM Tris-HCl at pH 8.3 and then divided into 5 aliquots. After that, 0.4 µL of 1 M DTT was added to each solution. The protein solutions were incubated at 56 °C for 45 min, and then 2 µL of 1 M IAA was added and the solutions were incubated for an additional 30 min at room temperature in darkness.
The protein solutions were diluted 10-fold with 0.1 M NH4HCO3 (pH 8) for trypsin, elastase, proteinase K, Glu-C, and thermolysin digestion. About 0.8 µg of each protease was used for digestion. The digestions with trypsin, elastase, proteinase K, Glu-C, and thermolysin were performed overnight (18 h) at 37 °C in 0.1 M NH4HCO3 (pH 8). After incubation, 2.5 µL of each digest was dispensed into a clean tube and desalted with a ZipTipC18 according to the manufacturer's instructions for protein identification by LC-MS2. Another 30 µL of each digest was dried in a vacuum concentrator, redissolved in 40 µL of 80% ACN, 6% TFA, and then subjected to phosphopeptide enrichment.

Enrichment of Phosphopeptides. Immobilized titanium ion affinity chromatography (Ti4+-IMAC) using phosphonate groups as chelating groups is a new generation of IMAC with high specificity for phosphopeptides.23 Phosphopeptides in the above peptide mixtures were separately enriched by Ti4+-IMAC as follows. The peptide mixture was first incubated with 10 µL of Ti4+-IMAC beads (homemade, 10 mg mL-1) in a loading buffer (80% ACN, 6% TFA) with vibration for 30 min. The supernatant was removed after centrifugation, and the beads with captured phosphopeptides were washed with 50 µL of two washing buffers (50% ACN, 6% TFA containing 200 mM NaCl as washing buffer 1; 30% ACN, 0.1% TFA as washing buffer 2). The bound phosphopeptides were then eluted with 20 µL of 10% NH3·H2O under sonication for 10 min. After centrifugation at 20000 g for 5 min, the supernatant was collected and lyophilized to dryness for phosphorylation analysis by LC-MS2-MS3.

(23) Yu, Z. Y.; Han, G. H.; Ye, M. L.; Sun, S. T.; Jiang, X. N.; Chen, R.; Wang, F. J.; Wu, R. A.; Zou, H. F. Anal. Chim. Acta 2009, 636, 34–41.

Mass Spectrometric Analysis. Nano-LC-MS2-MS3 was performed on a nano-RPLC-MS/MS system. A Finnigan Surveyor MS pump (Thermo Electron Finnigan, San Jose, CA) was used to deliver the mobile phase. For the capillary separation column, one end of the fused silica capillary (75 µm i.d. × 120 mm length) was manually pulled to a fine point, ∼5 µm, with a flame torch. The column was packed in-house with C18 AQ beads (5 µm, 120 Å) from Michrom BioResources (Auburn, CA) using a pneumatic pump. The nano-RPLC column was directly coupled to an LTQ linear ion trap mass spectrometer from Thermo Finnigan with a nanospray source. The mobile phase consisted of mobile phase A, 0.1% (v/v) formic acid in H2O, and mobile phase B, 0.1% (v/v) formic acid in acetonitrile. The samples were first manually loaded onto the C18 capillary column using a 75 µm i.d. × 220 mm length empty capillary as the sample loop, and then the reversed-phase gradient was run from 5% to 35% mobile phase B in 60 min at about 200 nL/min. A Finnigan LTQ linear ion trap mass spectrometer equipped with an ESI nanospray source was used for the MS experiment with an ion transfer capillary at 180 °C, and a voltage of 1.8 kV was applied to the cross. The LTQ instrument was operated in positive ion mode. The normalized collision energy was 35%. System control and data collection were done by Xcalibur software version 1.4. For protein identification of PKA samples, one microscan was set for each MS and MS2 scan. All MS and MS2 spectra were acquired in the data-dependent mode.
The mass spectrometer was set such that one full MS scan was followed by six MS2 scans on the six most intense ions. The Dynamic Exclusion was set as follows: repeat count 2, repeat duration 30 s, and exclusion duration 90 s. For phosphorylation analysis of all samples, the mass spectrometer was set so that one full MS scan was followed by three data-dependent MS2 scans and three neutral loss MS3 scans with the following Dynamic Exclusion settings: repeat count 2, repeat duration 30 s, and exclusion duration 60 s. An MS3 spectrum was automatically triggered when one of the 10 most intense peaks in the MS2 spectrum corresponded to a neutral loss of 98, 49, or 32.7 ± 1 Da from the precursor ion with a 1+, 2+, or 3+ charge state, respectively.

Database Searching and Data Analysis. The peak lists for MS2 and MS3 spectra were generated from the raw data by Bioworks 3.2 (Thermo Electron) with the following parameters: mass range, 600-3500 Da; intensity threshold, 1000; precursor ion tolerance, 1.4 Da; group scan, 1; minimum group count, 1; and minimum ion count, 10. For identification of proteins from PKA samples, the acquired MS2 spectra were searched using Sequest (version 0.27) against a composite database including a bovine protein database and its reversed version with the following parameters: precursor-ion mass tolerance, 2 Da; fragment-ion mass tolerance, 1 Da; enzyme, set as shown in Table 1; missed cleavages, 2; and static modification, Cys (+57). Dynamic modifications were set for oxidized Met (+16). The bovine database was a bovine proteome sequence database (ipi.BOVIN.v3.32.fasta) from the European Bioinformatics Institute, which included 32947 entries (ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/). For identification of proteins, the following criteria were used: cross-correlation values (Xcorr) ≥ 2.0, 2.5, and 3.8 for singly, doubly, and triply charged peptides,24 respectively, with the ∆Cn cutoff raised until FDR ≤ 2%.

Table 1. Cleavage Sites of the Proteases
enzyme name: Glu-C, trypsin, elastase, thermolysin, proteinase K
offset: after, after, after, before
cleavage sites: E, KR, ALIVGS, LFIVMA, -
sites without cleavage: P, P, P, -

For phosphorylation analysis, the MS2 and MS3 spectra were searched using Sequest (version 0.27) against a composite database including the α-S1-casein, α-S2-casein, and β-casein sequences (or, for PKA samples, the sequences of the identified background proteins or PKA subunits) and a reversed yeast database (1000 entries as the decoy database) with the following parameters: precursor-ion mass tolerance, 2 Da; fragment-ion mass tolerance, 1 Da; enzyme, set as shown in Table 1; missed cleavages, 2; and static modification, none for casein and Cys (+57) for PKA. For searching MS2 data, dynamic modifications were set for oxidized Met (+16) and phosphorylated Ser, Thr, and Tyr (+80). For searching MS3 data, in addition to the above set, dynamic modifications were also set for water loss on Ser and Thr (-18). For phosphopeptides identified by MS2, the following criteria were used: Xcorr ≥ 2.0, 2.5, and 3.8 for singly, doubly, and triply charged peptides,24 respectively, with the ∆Cn cutoff raised until FDR ≤ 2% or the minimum FDR was reached. For phosphopeptide identification by matching the assigned sequences derived from MS2 and MS3 data, homemade software named APIVASE18 (automatic phosphopeptide identification validating algorithm for Sequest) was applied to validate the identifications. APIVASE is available free for academic users from http://bioanalysis.dicp.ac.cn/proteomics/software/APIVASE.html.
This approach was termed the MS2/MS3 target-decoy database search approach, or MS2/MS3 approach for short. Briefly, there are five steps in the MS2/MS3 approach: (1) evaluating the charge state to remove invalid MS2/MS3 pairs, (2) performing MS2 and MS3 target-decoy database searches separately, (3) reassigning the peptide scores in the Sequest output to generate a list of peptide identifications for each pair of MS2/MS3 spectra, (4) filtering candidate phosphopeptides with newly defined parameters (Rank’m, ∆Cn’m, and Xcorr’s) to achieve phosphopeptide identification with a specific FDR, and (5) determining the phosphorylation site localizations by Tscore as described by Jiang et al.18 In this study, cutoff filters on Rank’m, ∆Cn’m, and Xcorr’s were used to filter the data to achieve FDR ≤ 2%.
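The MS2 filtering criteria given earlier in this section (charge-dependent Xcorr minima, then raising the ∆Cn cutoff until FDR ≤ 2%) can be sketched in Python as follows. This is only an illustrative sketch and not the APIVASE implementation: the `Psm` record, the 0.01 step for ∆Cn, the decoy/target FDR estimate, and the example scores are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Xcorr minima for 1+, 2+, and 3+ peptides, as quoted in the text.
XCORR_MIN = {1: 2.0, 2: 2.5, 3: 3.8}


@dataclass
class Psm:
    """Hypothetical peptide-spectrum match record."""
    charge: int
    xcorr: float
    delta_cn: float
    is_decoy: bool


def fdr(psms: List[Psm]) -> float:
    """Decoy/target ratio; the exact FDR formula is an assumption."""
    decoy = sum(1 for p in psms if p.is_decoy)
    target = len(psms) - decoy
    if target == 0:
        return float("inf") if decoy else 0.0
    return decoy / target


def filter_psms(psms: List[Psm], max_fdr: float = 0.02,
                step: float = 0.01) -> Tuple[float, List[Psm]]:
    """Apply Xcorr minima, then raise the delta-Cn cutoff until FDR <= max_fdr."""
    passed = [p for p in psms
              if p.charge in XCORR_MIN and p.xcorr >= XCORR_MIN[p.charge]]
    cutoff = 0.0
    while True:
        kept = [p for p in passed if p.delta_cn >= cutoff]
        if not kept or fdr(kept) <= max_fdr:
            return cutoff, kept
        cutoff = round(cutoff + step, 10)


# Hypothetical matches: two targets, one decoy, one target failing Xcorr.
example = [
    Psm(2, 3.0, 0.30, False),
    Psm(2, 2.8, 0.25, False),
    Psm(2, 2.6, 0.05, True),
    Psm(1, 1.5, 0.40, False),
]
cutoff, kept = filter_psms(example)
print(cutoff, len(kept))
```

The loop stops at the first cutoff that meets the FDR target, which keeps as many identifications as possible for a given confidence level. It does not cover the fallback to a minimum FDR mentioned for the MS2-only search.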

Table 2. Phosphorylation Sites of α-Casein Identified by Different Approaches
Approaches (columns): MS2/MS3 with trypsin, Glu-C, elastase, thermolysin, and proteinase K; MS2 with trypsin.
α-casein S1 sites: S56a,b, S61a,b, S63a,b, T64b, S79a, S81a, S82a, S83a, S90a, S103c, S130a
α-casein S2 sites: S23a, S24a,b, S25a,b, S28b, S31a,b, S46a, S71a, S72a, S73a, S76a, S144a,b, T145b, S146a,b, S150b, T153c, S158a
a Phosphorylation site information from ExPasy (http://www.expasy.org). b Phosphorylation site information from Phospho.ELM (http://phospho.elm.eu.org). c Phosphorylation sites localized in this study but not reported previously.

Table 3. Phosphorylation Sites of β-Casein Identified by Different Approaches
Approaches (columns): MS2/MS3 with trypsin, Glu-C, elastase, proteinase K, and thermolysin; MS2 with trypsin.
β-casein sites: S30a,b, S32a,b, S33a,b, S34a,b, S37b, S50a, T56b, S111c, S137c, S139b, S181c
a Phosphorylation site information from ExPasy (http://www.expasy.org). b Phosphorylation site information from Phospho.ELM (http://phospho.elm.eu.org). c Phosphorylation sites localized in this study but not reported previously.

RESULTS AND DISCUSSION

Because their phosphorylation sites are well characterized, two standard phosphoprotein samples, α-casein (P02662 and P02663) and β-casein (P02666), were chosen to test our methodology. To evaluate the performance of the phosphorylation site analysis, four standard measures, accuracy (Ac), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC), were used.25 In this work, the known phosphorylation sites of casein from ExPasy (http://www.expasy.org) and Phospho.ELM26 (http://phospho.elm.eu.org) were regarded as positive sites (see Table 2 for the phosphorylation sites of α-casein and Table 3 for the phosphorylation sites of β-casein), while all the other S, T, and Y sites in the casein sequences were regarded as negative sites. For the sites identified as positive, known phosphorylation sites were defined as true positives (TP), while the others were defined as false positives (FP). For the sites identified as negative, real positive sites were defined as false negatives (FN), while the others were defined as true negatives (TN). The four standard measures Ac, Sn, Sp, and MCC were defined as follows:25

$$\mathrm{Ac} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{Sn} = \frac{TP}{TP + FN}$$

$$\mathrm{Sp} = \frac{TN}{TN + FP}$$

$$\mathrm{MCC} = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}$$

Sn and Sp give the ratios of correctly identified positive and negative sites, respectively, and Ac gives the overall ratio of correct identifications across both positive and negative sites. Larger values of Sn, Sp, and Ac indicate more correct identifications, in other words, better performance for phosphorylation site localization. However, when the numbers of positive and negative sites differ greatly, MCC should be calculated to assess the identification performance. The value of MCC ranges from -1 to 1, and larger MCC values also indicate better identification performance.25

The MS2 spectra of phosphopeptides often lack sufficient fragment peaks because of neutral loss of H3PO4, and manual interpretation is used to verify phosphopeptide identification for mapping of phosphorylation sites in individual phosphoproteins.27 Only expert mass spectrometrists can effectively identify phosphopeptides via manual interpretation. Even worse, the confidence of the identifications is unknown, and the results are not objective. The target-decoy database search is a popular approach for controlling the confidence of peptide identification in proteome analysis.3,20,21 In this study, the target-decoy database search approach was applied to control the confidence of phosphopeptide identifications for individual phosphoproteins. In proteome analysis, the composite database for the database search is constructed by including target (forward) and decoy (reversed) sequences of the proteins present in the proteome of an organism. However, for phosphorylation analysis of individual proteins in this study, a composite database was constructed by including the sequences of the proteins present in the sample (target proteins) and a decoy version of a large enough inhomogeneous

(24) Jiang, X. N.; Jiang, X. G.; Han, G. H.; Ye, M. L.; Zou, H. F. BMC Bioinf. 2007, 8, 323.
(25) Xue, Y.; Ren, J.; Gao, X. J.; Jin, C. J.; Wen, L. P.; Yao, X. B. Mol. Cell. Proteomics 2008, 7, 1598–1608.
(26) Diella, F.; Gould, C. M.; Chica, C.; Via, A.; Gibson, T. J. Nucleic Acids Res. 2008, 36, D240–D244.
(27) Feng, S.; Ye, M. L.; Zhou, H. J.; Jiang, X. G.; Jiang, X. N.; Zou, H. F.; Gong, B. L. Mol. Cell. Proteomics 2007, 6, 1656–1665.
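The four performance measures defined in this section (Ac, Sn, Sp, and MCC) can be computed directly from the site counts, as in the Python sketch below. The counts in the example (TP = 21, FP = 2, FN = 4, TN = 50) are those listed in Table 4 for α-casein with multiple proteases; the function name and the handling of a zero denominator in MCC are assumptions made for illustration.

```python
from math import sqrt
from typing import Dict


def site_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Return Sn, Sp, Ac, and MCC for phosphorylation site identification."""
    sn = tp / (tp + fn)                      # sensitivity
    sp = tn / (tn + fp)                      # specificity
    ac = (tp + tn) / (tp + fp + tn + fn)     # accuracy
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Returning 0.0 for a zero denominator is an assumed convention.
    mcc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


metrics = site_metrics(tp=21, fp=2, fn=4, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these counts the function should reproduce, after rounding, the values reported for that column of Table 4 (Sn 84.00%, Sp 96.15%, Ac 92.21%, MCC 82.00%).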


Table 4. Comparison of Accuracy (Ac), Sensitivity (Sn), Specificity (Sp), and Matthews Correlation Coefficient (MCC) of Phosphorylation Site Identifications for α-Casein and β-Casein by Different Approachesa

α-casein   multiproteases   trypsin     trypsin
           MS2/MS3          MS2/MS3     MS2
FDR        1.26%            1.90%       8.72%
TP         21               17          5
FP         2                1           1
FN         4                8           20
TN         50               51          51
Sn         84.00%           68.00%      20.00%
Sp         96.15%           98.08%      98.08%
Ac         92.21%           88.31%      72.73%
MCC        82.00%           73.11%      31.58%

β-casein   multiproteases   trypsin     trypsin
           MS2/MS3          MS2/MS3     MS2
Rows: FDR, TP, FP, FN, TN, Sn, Sp, Ac, MCC