Anal. Chem. 2009, 81, 7778–7787

Combining Alkaline Phosphatase Treatment and Hybrid Linear Ion Trap/Orbitrap High Mass Accuracy Liquid Chromatography-Mass Spectrometry Data for the Efficient and Confident Identification of Protein Phosphorylation

Hsin-Yi Wu,† Vincent Shin-Mu Tseng,‡ Lien-Chin Chen,‡ Yu-Chen Chang,† Peipei Ping,§ Chen-Chung Liao,| Yeou-Guang Tsay,| Jau-Song Yu,⊥ and Pao-Chi Liao*,†

Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University, Tainan, Taiwan, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, Department of Physiology, David Geffen School of Medicine, University of California at Los Angeles (UCLA), Los Angeles, California 90095, Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei, Taiwan, and Department of Cell and Molecular Biology, Chang Gung University, Tao-Yuan, Taiwan

Protein phosphorylation is a vital post-translational modification that is involved in a variety of biological processes. Several mass spectrometry-based methods have been developed for phosphoprotein characterization. In our previous work, we demonstrated the capability of a computational algorithm in mining phosphopeptide signals in large LC-MS data sets by measuring the mass shifts due to phosphatase treatment (Wu, H. Y.; Tseng, V. S.; Liao, P. C. J. Proteome Res. 2007, 6, 1812-1821). Mass accuracy seems to play an important role in efficiently selecting phosphopeptide signals. In recent years, the hybrid linear ion trap (LTQ)/Orbitrap mass spectrometer, which provides high mass accuracy, has emerged as a powerful instrument in proteomic analysis. Here, we developed a process to incorporate LC-MS data generated from an LTQ/Orbitrap mass spectrometer into our strategy in order to take advantage of the accurate mass measurement. LTQ/Orbitrap raw files were converted to the open file format mzXML via the ReAdW.exe program.
To find peaks that were contained in each mzXML file, an open-source computer program, msInspect, was utilized to locate isotopes and assemble those isotopes into peptides. An in-house program, LcmsFormatConverter, was utilized for signal filtering and format transformation. A previously proposed in-house program, DeltaFinder, was modified and used for defining signals with an exact mass shift due to the dephosphorylation reaction, which generated a table that listed potential phosphopeptide signals. The retention times and m/z values of these selected LC-MS signals were used to program subsequent LC-MS/MS experiments to obtain high-confidence phosphorylation site determination. Compared with our previous work, which used a quadrupole/time-of-flight mass spectrometer, a larger number of phosphopeptides in the casein mixture were identified by using LTQ/Orbitrap data, demonstrating the merit of high mass accuracy in our strategy. In addition, the characterization of the lung cancer cell tyrosine phosphoproteome revealed that the use of alkaline phosphatase treatment combined with accurate mass measurement in this strategy increased data repeatability and confidence.

* To whom correspondence should be addressed. Dr. Pao-Chi Liao, Department of Environmental and Occupational Health, National Cheng Kung University College of Medicine, 138 Sheng-Li Road, Tainan 70428, Taiwan. Phone: 886-62353535 ext. 5566. Fax: 886-6-2743748. E-mail: [email protected].
† Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University.
‡ Department of Computer Science and Information Engineering, National Cheng Kung University.
§ University of California at Los Angeles (UCLA).
| National Yang-Ming University.
⊥ Chang Gung University.

Analytical Chemistry, Vol. 81, No. 18, September 15, 2009

10.1021/ac9013435 CCC: $40.75 © 2009 American Chemical Society. Published on Web 08/24/2009

(1) Hunter, T. Cell 2000, 100, 113–127.
(2) Cohen, P. Trends Biochem. Sci. 2000, 25, 596–601.
(3) Pawson, T.; Scott, J. D. Trends Biochem. Sci. 2005, 30, 286–290.
(4) Zhang, N.; Li, X. J.; Ye, M.; Pan, S.; Schwikowski, B.; Aebersold, R. Proteomics 2005, 5, 4096–4106.
(5) Chen, Y.; Kwon, S. W.; Kim, S. C.; Zhao, Y. J. Proteome Res. 2005, 4, 998–1005.
(6) Weatherly, D. B.; Atwood, J. A., 3rd; Minning, T. A.; Cavola, C.; Tarleton, R. L.; Orlando, R. Mol. Cell. Proteomics 2005, 4, 762–772.

Protein phosphorylation, which is a vital post-translational modification, is known to be involved in signal transduction pathways, cell cycle progression, and cell division in both normal and diseased cells.1-3 Many mass spectrometry-based proteomic techniques have been developed for protein phosphorylation analysis in the hope of providing information on these significant processes, and these techniques have rapidly evolved in recent years. However, reliable phosphopeptide identification is not a routine task, especially for large-scale phosphoproteome analysis. It has been reported that current database search tools identify large numbers of false-positive/false-negative peptide assignments.4-6 The situation can be much worse for the identification of phosphorylation that is based on only a single peptide identification.7-9 During the process of phosphopeptide enrichment, nonphosphorylated peptides usually coelute with phosphopeptides because of a lack of efficiency in phosphopeptide enrichment. Direct LC-MS/MS analysis of an enriched sample tends to fail to select most of the phosphopeptides because of the low abundance of phosphopeptides and the suppression effect in the presence of nonphosphorylated peptides. The poor quality of the MS/MS spectra could lead to ambiguous identification of phosphopeptides.9 Moreover, the MS/MS spectra of phosphopeptides are typically more complicated than those of unmodified peptides, which may also increase the chance of a random match.

To locate phosphopeptides, one or more additional methods for validation are valuable. One approach is sample treatment by enzymatic dephosphorylation, which is commonly used in matrix-assisted laser desorption ionization (MALDI)-based analysis.10-16 More recently, the combination of phosphatase treatment with LC-MS/MS detection has been reported. LC-MS/MS analysis was performed on both the phosphopeptide sample and the corresponding dephosphorylated sample. The enzymatically dephosphorylated peptides are used as a reference database. The identification of both the phosphorylated and dephosphorylated forms of a peptide increases the confidence of that phosphopeptide identification and may greatly decrease false positive data from a large-scale phosphoproteomics study.7,9,17,18 Those approaches are powerful and suitable for comprehensive phosphoproteome analysis. Still, some of these studies noted that, because of poor-quality MS/MS spectra, the positions of phosphorylated sites could not be determined in many cases. There are dephosphorylated peptides that are uniquely found after phosphatase treatment but not observed as phosphopeptides.9,17,18 To address this issue, in 2005, Torres et al.
proposed a MALDI-MS method based on the idea that more complete fragmentation ions can be obtained once the MS/MS analysis is focused on the potential phosphopeptide signals.13 In our previous work, we proposed a strategy for comprehensive LC-MS analysis of phosphopeptide signals based on a similar idea. Our strategy is coupled with a computational algorithm for mining phosphopeptide signals from LC-MS data by taking advantage of the −80n Da mass shift that is derived from alkaline phosphatase dephosphorylation. The subsequent LC-MS/MS analysis was performed on those selected signals, by which more phosphopeptide identifications can be obtained.19 However, as data sets grow larger, mass accuracy becomes an important factor in accurately selecting phosphopeptide signals. In recent years, the hybrid linear ion trap (LTQ)/Orbitrap mass spectrometer has emerged as a powerful instrument in proteomic analysis because of its high resolving power, rapid scanning rate, and high mass accuracy. Here, we developed a process to incorporate LC-MS data generated from an LTQ/Orbitrap mass spectrometer into our strategy. The data mining algorithm was modified and is provided. Its feasibility was evaluated by the characterization of casein mixture phosphopeptides and the tyrosine phosphoproteome of lung cancer cells.

(7) Imanishi, S. Y.; Kochin, V.; Ferraris, S. E.; de Thonel, A.; Pallari, H. M.; Corthals, G. L.; Eriksson, J. E. Mol. Cell. Proteomics 2007, 6, 1380–1391.
(8) Villén, J.; Beausoleil, S. A.; Gerber, S. A.; Gygi, S. P. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 1488–1493.
(9) Ishihama, Y.; Wei, F. Y.; Aoshima, K.; Sato, T.; Kuromitsu, J.; Oda, Y. J. Proteome Res. 2007, 6, 1139–1144.
(10) Zhang, X.; Herring, C. J.; Romano, P. R.; Szczepanowska, J.; Brzeska, H.; Hinnebusch, A. G.; Qin, J. Anal. Chem. 1998, 70, 2050–2059.
(11) Stensballe, A.; Anderson, S.; Jensen, O. N. Proteomics 2001, 1, 207–222.
(12) Zhou, W.; Merrick, B. A.; Khaledi, M. G.; Tomer, K. B. J. Am. Soc. Mass Spectrom. 2000, 11, 273–282.
(13) Torres, M. P.; Thapar, R.; Marzluff, W. F.; Borchers, C. H. J. Proteome Res. 2005, 4, 1628–1635.
(14) Liao, P. C.; Leykam, J.; Andrews, P. C.; Gage, D. A.; Allison, J. Anal. Biochem. 1994, 219, 9–20.
(15) Hirschberg, D.; Jagerbrink, T.; Samskog, J.; Gustafsson, M.; Stahlberg, M.; Alvelius, G.; Husman, B.; Carlquist, M.; Jornvall, H.; Bergman, T. Anal. Chem. 2004, 76, 5864–5871.
(16) Larsen, M. R.; Sørensen, G. L.; Fey, S. J.; Larsen, P. M.; Roepstorff, P. Proteomics 2001, 1, 223–238.
(17) Collins, M. O.; Yu, L.; Campuzano, I.; Grant, S. G.; Choudhary, J. S. Mol. Cell. Proteomics 2008, 7, 1331–1348.
(18) Marcantonio, M.; Trost, M.; Courcelles, M.; Desjardins, M.; Thibault, P. Mol. Cell. Proteomics 2008, 7, 645–660.

MATERIALS AND METHODS

Preparation of Tryptic In-Gel Digested α- and β-Caseins. Solutions containing 250 ng of α-casein and 250 ng of β-casein were resolved by SDS-PAGE followed by the silver staining method. Protein bands were excised, and solutions containing 50% v/v acetonitrile and 50% v/v acetonitrile/25 mM ammonium bicarbonate were used to wash the gel pieces twice. To perform the reduction and alkylation reaction, the gel fragments were placed in 10 mM dithiothreitol (DTT) and 55 mM iodoacetamide in 25 mM ammonium bicarbonate at 65 °C for 45 min. To digest proteins, 0.1 µg of trypsin (Promega, Madison, WI) was added to each tube, and the samples were incubated overnight at 37 °C. After incubation, the supernatant was transferred to an Eppendorf tube. About 20 µL of 50% v/v acetonitrile/5% v/v formic acid was used to extract the remaining peptides from the gel piece.

Cell Culture and Cell Lysate Extraction. The human lung adenocarcinoma cell line, CL1-5, was kindly provided by Dr. P.-C. Yang (Academia Sinica, Taipei, Taiwan) and was cultured in RPMI-1640 supplemented with 10% fetal bovine serum (Gibco BRL, Gaithersburg, MD) and antibiotics at 37 °C under 5% CO2. The dispersed cells were centrifuged at 550g for 10 min at 4 °C.
Cell pellets were washed three times with PBS and resuspended in ddH2O containing phosphatase inhibitors (2 mM sodium orthovanadate and 10 mM β-glycerolphosphate). Cell rupture was achieved by sonication at 20 W for 10 s × 3. After centrifuging at 25 000g for 1 h, the supernatant that contained the proteins was collected. A small aliquot was taken for the protein assay.

Immunoprecipitation and Western Blot. A sample of the total lysate containing 700 µg of cellular proteins in 300 µL of lysis buffer was incubated with 50 µL of immobilized anti-pTyr antibody (PT66)-agarose beads (Sigma, St. Louis, MO) and 50 µL of immobilized anti-pTyr antibody (4G10)-agarose beads (Upstate-Millipore, Billerica, MA) at 4 °C overnight. The agarose beads were washed twice with 500 µL of lysis buffer, followed by a wash with 500 µL of 1× PBS. Proteins were eluted by adding 40 µL of 4× sample buffer (250 mM Tris-HCl, 8% SDS, 40% glycerol, 0.04% bromphenol blue, and 400 mM DTT). The eluted proteins were resolved by gradient gels (1.0 mm × 10 well, 4-12% NuPAGE Bis-Tris; Invitrogen, Carlsbad, CA) prior to immunoblotting or visualization by the silver staining method. For the immunoblotting analysis, the proteins in the SDS-PAGE gel were transferred onto a PVDF membrane. After blocking for 1 h at room temperature, the membrane was washed three times with Tris-buffered saline with Tween 20 (TBST). The membranes were probed with monoclonal antiphosphotyrosine antibody (4G10, dilution 1:1000) (Upstate Biotechnology, Inc.) at 4 °C overnight. After washing with TBST three times, the membranes were incubated with the secondary antibody (dilution 1:2500, horseradish peroxidase-conjugated antimouse antibodies) at room temperature for 1 h and finally washed three times for 10 min with TBST. The blot was developed using enhanced chemiluminescence detection (PerkinElmer LAS Inc., Boston, MA).

(19) Wu, H. Y.; Tseng, V. S.; Liao, P. C. J. Proteome Res. 2007, 6, 1812–1821.

Enrichment of Phosphopeptides by TiO2 Microcolumns. The protocol was adapted from Wu et al.20 Briefly, the end of a GELoader tip was restricted with a small plug of C8 material, which was stamped out of a C8 solid phase extraction disk (Supelco, Bellefonte, PA) using a 1000 µL Pipetman Tip. TiO2 beads, which were suspended in methanol, were loaded into the tip. The TiO2 microcolumn was prepared with a length of approximately 3 mm and was rinsed with 20 µL of sample loading buffer (2% TFA/65% CH3CN solution saturated with glutamic acid). The peptide mixture (15 µL) was diluted with 185 µL of the sample loading buffer and loaded into the microcolumn with a syringe pump set at a flow rate of 10 µL/min. The microcolumn was washed with 20 µL each of the sample loading buffer, 65% acetonitrile/0.5% TFA, and 65% acetonitrile/0.1% TFA. A total of 30 µL of 300 mM NH4OH/50% CH3CN was used to elute the bound peptides. After acidification, the eluate was analyzed by full-scan LC-MS (the Orbitrap was used to obtain high mass accuracy measurements) or by LC-MS/MS (using the LTQ).

Alkaline Phosphatase Treatment. The TiO2-eluted sample was incubated with 0.25 U of alkaline phosphatase (Roche Applied Science, Mannheim, Germany) at 37 °C for 2 h. The pH was adjusted with 10× dephosphorylation buffer (Roche Applied Science, Mannheim, Germany) by adding one tenth the volume of the total solution.
The sample was cooled to room temperature after the dephosphorylation reaction and dried using a speed vacuum centrifuge.

Liquid Chromatography-Mass Spectrometry (LC-MS). Protein tryptic digests were fractionated on a BioBasic C18 300 Å Packed PicoFrit Column (75 µm i.d. × 10 cm, New Objective, Woburn, MA) using Finnigan Surveyor high-performance liquid chromatography (Thermo Finnigan Scientific, Bremen, Germany). The sample was loaded with 100% buffer A (5% acetonitrile/0.1% formic acid) to 10% buffer B (80% acetonitrile/0.1% formic acid) for 2 min. Peptides were eluted using the following gradients: 90% buffer A to 60% buffer B for 38 min, which was followed by raising to 100% buffer B within 1 min. Within the subsequent 9 min, the buffer condition changed to 100% buffer A and was held for another 20 min. The flow rate was set at 200 nL/min. An LTQ/Orbitrap hybrid mass spectrometer with high-resolution isolation capability (Thermo Fisher Scientific) that was equipped with an electrospray ionization source was operated in the positive ionization mode with a spray voltage of 1.8 kV. The scan range of each full MS scan was m/z 350-2000. LC-MS data were acquired in the Orbitrap, with a resolution of 30 000 (at m/z 400).

(20) Wu, J.; Shakey, Q.; Liu, W.; Schuller, A.; Follettie, M. T. J. Proteome Res. 2007, 6, 4684–4689.


Conversion of Thermo Xcalibur .raw Files to mzXML Using ReAdW and Peak Finding Using msInspect. Thermo Xcalibur native acquisition files (.raw files) were converted to the open file format mzXML via ReAdW.exe, which is available in the Trans-Proteomic Pipeline (TPP) platform (http://tools.proteomecenter.org/software.php). An open-source computer program, msInspect, was utilized to locate isotopes in the LC-MS data and assemble the isotopes into peptides. The msInspect software is distributed freely under an Apache 2.0 license and is available at http://proteomics.fhcrc.org/. LC-MS data files in the standard mzXML data format were accepted as input data. The data files encoding peak information were saved as .tsv files.

Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS) and Database Search. Both the direct LC-MS/MS analysis and the LC-MS/MS analysis in our strategy were performed on an LTQ linear ion trap (LTQ, Thermo Fisher Scientific) with a single injection. The reverse-phase separation was performed using a linear acetonitrile gradient, which was identical to the one described in the LC-MS analysis section. Each cycle of one full-scan mass spectrum (m/z 350-2000) was followed by three data-dependent tandem mass spectra with the collision energy set at 35%. In our strategy, the m/z values of the mass list generated from LC-MS (LTQ-Orbitrap) and selected by DeltaFinder were set in an inclusion list for phosphopeptide identification. Bioworks Browser 3.1 was utilized to convert the Xcalibur binary (RAW) files into peak list (DTA) files. The parameters for DTA creation were set as follows: precursor mass tolerance, 1.4 Da; maximum number of intermediate MS/MS scans, 25 between spectra that have the same precursor masses; minimum peaks, 12 per MS/MS spectrum; minimum scans per group, 1; and automatic precursor charge selection.
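As a rough illustration of the inclusion-list step mentioned in this section, the following Python sketch converts between neutral mass and m/z and collapses selected candidates into unique inclusion entries. It is only a sketch under stated assumptions: the `Candidate` record, the `build_inclusion_list` helper, the ±2 min retention-time window, and the two-decimal m/z rounding are hypothetical choices made for illustration and are not taken from DeltaFinder or from the instrument software.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass_to_mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral peptide mass M."""
    return (mass + charge * PROTON_MASS) / charge


def mz_to_neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass M recovered from an observed m/z and charge."""
    return mz * charge - charge * PROTON_MASS


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one selected phosphopeptide signal."""
    mz: float
    charge: int
    rt_min: float  # elution time in minutes


def build_inclusion_list(candidates, rt_window_min=2.0, mz_decimals=2):
    """Return sorted, de-duplicated (m/z, rt_start, rt_end) entries.

    rt_window_min and mz_decimals are assumptions for this sketch.
    """
    entries = set()
    for c in candidates:
        entries.add((
            round(c.mz, mz_decimals),
            max(0.0, round(c.rt_min - rt_window_min, 2)),
            round(c.rt_min + rt_window_min, 2),
        ))
    return sorted(entries)


if __name__ == "__main__":
    # Two charge states of one casein phosphopeptide reported in this paper.
    selected = [Candidate(687.95, 3, 21.62), Candidate(1031.43, 2, 21.62)]
    for entry in build_inclusion_list(selected):
        print(entry)
```

The conversion helpers also give a quick consistency check: signals at m/z 687.95 (3+) and m/z 1031.43 (2+) should map to nearly the same neutral mass if they belong to the same peptide.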
To concatenate the generated DTA files, merge.pl, a Perl script provided on the Matrix Science Web site, was used. The resulting peak lists were searched against the Swiss-Prot database via a Mascot search engine (http://www.matrixscience.com, Matrix Science Ltd., U.K.). The search parameters were set as follows: peptide mass tolerance, 1 Da; MS/MS ion mass tolerance, 1 Da; enzyme set as trypsin with allowance of up to two missed cleavages; variable modifications including oxidation on methionine, deamidation on asparagine and glutamine, carboxyamidomethylation on cysteine, and phosphorylation on serine, threonine, and tyrosine residues; peptide charge, 2+ and 3+; and taxonomy limited to human. Only phosphopeptides with a Mascot score larger than 39 were accepted. Manual inspection was based on the fragment ion assignment. To be accepted as a phosphotyrosine site, b and y ions on both sides of the phosphorylated site had to be detected.

RESULTS AND DISCUSSION

Processing Pipeline of LTQ/Orbitrap Data for Recognizing Potential Phosphopeptide Signals and Determining the Phosphorylation Site. The analytical strategy was developed in our previous work.19 To obtain high mass accuracy measurement, the full-scan LC-MS analysis was performed on an LTQ/Orbitrap mass spectrometer (as shown in Figure 1A). The processing of LTQ/Orbitrap data is a critical step in this strategy, and the

Figure 1. (A) Analytical strategy: A TiO2-enriched peptide mixture, which was treated or untreated with alkaline phosphatase, was analyzed by LC-MS using an LTQ/Orbitrap mass spectrometer. After the raw data were processed and the DeltaFinder computation was performed, potential phosphopeptide signals and their dephosphorylated counterparts, which had the characteristic −79.966n Da mass shift after dephosphorylation, were exported. A series of MS/MS experiments were conducted on those potential phosphopeptide signals for the determination of their phosphorylation sites. (B) Scheme of LTQ/Orbitrap raw data processing for DeltaFinder computations. LTQ/Orbitrap raw files were converted to mzXML via the ReAdW.exe program. A computer program, msInspect, was utilized to find peaks. Exported signals were filtered, and the data format was transformed into the input format for DeltaFinder. After the computation, signals with an exact mass shift between the treated and untreated samples were defined.
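The filtering and pairing steps summarized in Figure 1B can be sketched in a few lines of Python. This is not the DeltaFinder or LcmsFormatConverter source code; it is a minimal illustration that assumes each signal is already reduced to a deconvoluted mass with the quality fields this section describes (KL value, number of isotopic peaks, intensity). The `Signal` record, the function names, and the `max_n` cap on the number of phosphate groups are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

PHOSPHATE_SHIFT = 79.966  # Da per removed phosphate group, as used in the text


@dataclass(frozen=True)
class Signal:
    """Hypothetical record for one deisotoped LC-MS signal."""
    rt_min: float     # elution time (min)
    mass: float       # deconvoluted neutral mass (Da)
    intensity: float
    kl: float         # Kullback-Leibler score reported by peak finding
    n_peaks: int      # number of isotopic peaks


def filter_signals(signals, min_intensity=10_000, max_kl=1.0, min_peaks=2):
    """Keep signals with KL <= max_kl, enough isotopic peaks, and enough intensity."""
    return [
        s for s in signals
        if s.kl <= max_kl and s.n_peaks >= min_peaks and s.intensity >= min_intensity
    ]


def find_delta_pairs(untreated, treated, k=0.05, max_rt_diff=5.0, max_n=5):
    """Pair untreated/treated signals whose mass shift is 79.966n within k Da.

    Returns (phospho_signal, dephospho_signal, n, error) tuples, where
    error = delta - 79.966n. max_n is an assumption made for this sketch.
    """
    pairs = []
    for p, d in product(untreated, treated):
        if abs(p.rt_min - d.rt_min) > max_rt_diff:
            continue  # "RT Difference" criterion
        delta = p.mass - d.mass
        n = round(delta / PHOSPHATE_SHIFT)
        if 1 <= n <= max_n:
            error = delta - n * PHOSPHATE_SHIFT
            if abs(error) <= k:  # "Find delta" criterion
                pairs.append((p, d, n, error))
    return pairs


if __name__ == "__main__":
    # Made-up signals for illustration only.
    untreated = filter_signals([Signal(21.6, 2060.828, 5e4, 0.3, 4)])
    treated = filter_signals([
        Signal(21.1, 1980.864, 8e4, 0.2, 4),   # shifted by about 79.964 Da
        Signal(21.3, 1980.628, 8e4, 0.2, 4),   # shifted by about 80.200 Da
    ])
    for p, d, n, err in find_delta_pairs(untreated, treated):
        print(p.mass, d.mass, n, round(err, 4))
```

The all-against-all comparison is quadratic in the number of signals; for a few thousand filtered signals per run this is acceptable, while larger data sets would benefit from sorting by mass first.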

scheme is depicted in Figure 1B. Since mzXML21 is a common file format that is compatible with various peak-finding programs, raw files were converted into mzXML by a freely available converter called ReAdW.exe. To find peaks in each mzXML file, a noncommercial software tool, msInspect,22 was utilized to locate isotopes, perform charge deconvolution, and calculate the original peptide mass. The output data, which contained peak information such as elution time, m/z value, intensity, charge state, and mass, were saved in a .tsv file. To focus on peptides with a higher confidence level, the .tsv file was reduced by an in-house program called LcmsFormatConverter, which excluded signals with a Kullback-Leibler (KL) value > 1, fewer than two isotopic peaks, or an intensity below a user-defined threshold. The KL value is generated by the msInspect software and is a deviance score that evaluates the closeness of the observed and expected isotopic distributions. Peptides with a KL value ≤ 1 were considered to have a high confidence level.22 LcmsFormatConverter finally transformed the file into a format that is compatible with DeltaFinder. DeltaFinder, an in-house program, was used for selecting signal pairs with the mass shift.19 In the present work, DeltaFinder was modified, and some functions were added in order to identify real phosphopeptide signals more specifically. Since the peak finding, deisotoping, and calculation of original mass steps had been accomplished by msInspect, the reduced file was directly processed by "Find ∆" and a new "Retention Time (RT) Difference" function. In the process of "Find ∆", signals with a mass shift of 79.966n ± k Da were extracted from two LC-MS data sets. The tolerance, k Da, which was considered to be a systematic error, was set to a very small value because of the high mass accuracy of the LTQ/Orbitrap. The introduction of the "RT Difference" came from our observation that a phosphopeptide and its nonphosphorylated form usually eluted closely from the reverse-phase column (within 1 min),19 which is consistent with previous findings.23-25 Other groups have also reported that dephosphorylated peptides showed slightly earlier retention times than the singly phosphorylated peptides, which suggests that the retention times of a phosphorylated peptide and its dephosphorylated counterpart should be approximately similar.7,26,27 In the "RT Difference" process, only pairs with a retention time difference smaller than a specified value were considered to be potential candidates.

Evaluation of the Strategy by Analyzing the α- and β-Casein Protein Mixture. Two model phosphoproteins, α- and β-casein, were used for the demonstration of the strategy. A total of 500 ng of an α- and β-casein peptide mixture was submitted to TiO2 microcolumn enrichment. The eluted peptide mixture was

(21) Pedrioli, P. G.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. Nat. Biotechnol. 2004, 22, 1459–1466.
(22) Bellew, M.; Coram, M.; Fitzgibbon, M.; Igra, M.; Randolph, T.; Wang, P.; May, D.; Eng, J.; Fang, R.; Lin, C.; Chen, J.; Goodlett, D.; Whiteaker, J.; Paulovich, A.; McIntosh, M. Bioinformatics 2006, 22, 1902–1909.
(23) Lucas, J.; Henschen, A. J. Chromatogr. 1986, 369, 357–364.
(24) Dass, C.; Mahalakshmi, P.; Grandberry, D. J. Chromatogr., A 1994, 678, 249–257.
(25) Tsay, Y. G.; Wang, Y. H.; Chiu, C. M.; Shen, B. J.; Lee, S. C. Anal. Biochem. 2000, 287, 55–64.
(26) Kawakami, T.; Tateishi, K.; Yamano, Y.; Ishikawa, T.; Kuroki, K.; Nishimura, T. Proteomics 2005, 5, 856–864.
(27) Steen, H.; Jebanathirajah, J. A.; Rush, J.; Morrice, N.; Kirschner, M. W. Mol. Cell. Proteomics 2006, 5, 172–181.


separated into three aliquots in a 1:1:2 proportion. Alkaline phosphatase was added to one small aliquot. Both small aliquots were submitted to LC-MS analysis on the LTQ/Orbitrap. After data processing by ReAdW and msInspect, 5012 signals were detected in the TiO2-eluted sample, while 7548 signals were found in the alkaline phosphatase-treated sample. The intensity cutoff was set to 10 000 in LcmsFormatConverter in order to avoid the inclusion of noise signals. A parameter k (Da) was specified as the maximum tolerance between the measured mass shift (∆) and the theoretical value (79.966n Da, where n is the number of phosphate groups). Only signal pairs that satisfy |∆ − 79.966n| ≤ k, n = 1, 2, 3, ..., are selected. We supposed that a small k should be used when dealing with high mass accuracy data (the instrument we used has a mass accuracy of 15 ppm; for a 1500 Da peptide, 15 ppm corresponds to about 0.02 Da). To inspect this issue further, the k value was plotted versus the number of pairs output from DeltaFinder using the casein data. We first fixed the intensity cutoff at 10 000 and the RT difference at 5 min; the resulting number of signal pairs rose from 2 to 148 as k increased (0.001∼1) (Figure 2A). The plot indicated the significance of setting a proper k to exclude randomly matched pairs. After targeted LC-MS/MS analysis of those selected signals, the corresponding phosphopeptide numbers were also represented in Figure 2A. It appears that a total of 35 phosphopeptides can be identified, and most have a k value smaller than 0.05 Da. While looking into the only phosphopeptide with k larger than 0.05 Da, we determined that the counterpart of this signal appeared with a very low intensity and a broad peak shape. The equivocal mass measurement of the ion may have led to the greater observed k. However, this phosphopeptide, with charge state 4, was also identified among the 34 signals with charge states 2 and 3. Among the 35 signals, 32 signals had a k smaller than 0.02 Da, which is consistent with the instrument's mass accuracy of within 15 ppm. The deviation of mass measurement was found to be approximately 0.02-0.03 Da. We considered that k = 0.05 Da may be a suitable threshold here for selecting potential phosphopeptide signals in order to include all the possible candidates. Figure 2A implied that real phosphopeptide signals can be differentiated from other signals by using a small k value, which can be predicted from the mass accuracy of the instrument.

Figure 2. (A) The k value used in DeltaFinder versus the resulting pair numbers in the casein sample. The segment of k = 0.001–1 is shown, and the total number of phosphopeptides identified from the corresponding signal pairs is also shown in the plot. (B) The allowed RT difference set within each pair versus the resulting pair numbers in the casein sample. The corresponding number of identified phosphopeptides is also illustrated. The data point, k = 0.05 Da and RT difference lower than 5 min, is indicated by arrows and was used for casein data analysis.

Figure 3. Three-dimensional representation of the LC-MS data that was obtained from (A) the tryptic digest of a mixture containing α- and β-casein followed by TiO2 microcolumn enrichment and (B) additional treatment with alkaline phosphatase. In part C, only the phosphopeptide signals that were picked by the program are shown.

Table 1. Phosphopeptides and Phosphorylation Sites Identified from Casein Proteins by Using Our Proposed Strategy

| peptide no. | pair no. (a) | m/z of phosphopeptide | charge state (b) | n (c) | ∆ (d) − 79.966n (Da) | peptide sequence (e) | protein |
|---|---|---|---|---|---|---|---|
| P01 | D01, D02 | 687.95, 1031.43 | 3+/2+, 2+/2+ | 1 | -0.0003 ∼ -0.0032 | FQpSEEQQQTEDELQDK | β-casein |
| P02 | D03, D04 | 811.36 | 3+/2+, 3+/3+ | 1 | -0.0103 ∼ -0.0114 | IEKFQpSEEQQQTEDELQDK | β-casein |
| P03 | D05 | 1107.54 | 3+/3+ | 2 | -0.0040 | EELNASGEpTVEpSLpSpSpSEESITHISKEK | β-casein |
| P04 | D06-D09 | 554.27, 830.91 | 3+/3+, 2+/3+, 3+/2+, 2+/2+ | 1 | -0.0016 ∼ -0.0036 | VPQLEIVPNpSAEER | α-S1-casein |
| P05 | D10, D11 | 651.33 | 3+/3+, 3+/2+ | 1 | -0.0007 ∼ -0.0026 | YKVPQLEIVPNpSAEER | α-S1-casein |
| P06 | D12, D13 | 916.94 | 2+/3+, 2+/2+ | 1 | -0.0054 ∼ -0.0083 | YLGEYLIVPNpSAEER (variant) | α-S1-casein |
| P07 | D14, D15 | 924.37 | 2+/3+, 2+/2+ | 1 | -0.0055 ∼ -0.0082 | DIGpSESTEDQAMEDIK | α-S1-casein |
| P08 | D16-D18 | 763.67 | 3+/3+, 3+/2+, 3+/3+ | 1 | -0.0017 ∼ -0.0122 | DIGpSEpSTEDQAMEDAKQMK | α-S1-casein |
| P09 | D19, D20 | 1240.54 | 2+/3+, 2+/2+ | 1 | -0.0081 ∼ -0.0180 | QMEAEpSIpSSSEEIVPISVEQK | α-S1-casein |
| P10 | D21, D22 | 832.69, 1248.84 | 3+/2+, 2+/2+ | 1 | -0.0014 ∼ -0.0099 | QMEAEpSIpSSSEEIVPISVEQK | α-S1-casein |
| P11 | D23, D24 | 806.36 | 3+/3+, 3+/2+ | 1 | -0.0130 ∼ -0.0156 | QMEAESIpSSSEEIVPNSVEQK | α-S1-casein |
| P12 | D25 | 683.79 | 4+/4+ | 1 | 0.0394 | AEpSIpSSSEESVPNSVEQK | α-S1-casein |
| P13 | D26 | 733.81 | 2+/2+ | 1 | -0.0050 | TVDMEpSTEVFTK | α-S2-casein |
| P14 | D27 | 813.80 | 3+/3+ | 2 | 0.0369 | TVDMEpSpTEVFpTK | α-S2-casein |
| P15 | D28 | 837.90 | 2+/2+ | 1 | -0.0038 | KTVDMEpSpTEVFTK | α-S2-casein |
| P16 | D29-D34 | 571.95, 857.43 | 3+/2+, 2+/2+, 3+/3+, 2+/3+, 3+/4+, 2+/4+ | 1 | -0.0018 ∼ -0.0176 | LpTEEEKNRLNFLK | α-S2-casein |
| P17 | D35-D37 | 425.44, 566.92, 849.88 | 4+/3+, 3+/3+, 2+/3+ | 3 | 0.0170 ∼ 0.0192 | | |
| P18 | D38-D43 | 425.44, 566.92, 849.88 | 4+/3+, 3+/3+, 2+/3+, 4+/4+, 3+/4+, 2+/4+ | 1 | -0.0128 ∼ 0.0121 | | |
| P19 | D44 | 852.89 | 2+/3+ | 1 | 0.0018 | | |
| P20 | D45 | 570.58 | 3+/3+ | 1 | -0.0027 | | |
| P21 | D46 | 596.97 | 3+/3+ | 1 | -0.0047 | | |
| P22 | D47 | 902.41 | 3+/3+ | 2 | -0.036 | | |

(a) Potential phosphopeptide signal pairs. (b) Charge state of the pair, given as phosphopeptide/dephosphorylated form. (c) Difference of phosphate group numbers within the pair, calculated from the mass difference between the phosphorylated and dephosphorylated forms. (d) ∆ = mass shift between the phosphopeptide and its dephosphorylated form. (e) The "p" indicates a phosphate group on the following residue.

After evaluating k, we looked into how the difference between the elution times of phosphorylated peptides and their dephosphorylated forms affected the signal-mining outcome. The RT difference versus the resulting number of signal pairs and the corresponding number of identified phosphopeptides is illustrated in Figure 2B. With the intensity cutoff fixed at 10 000 and k = 0.05 Da, the number of selected signal pairs increased, as expected, when a larger RT difference was allowed. Because of the restriction imposed by the small value of k, only 73 pairs were selected even when an RT difference of 40 min was allowed. After LC-MS/MS analysis of those signals, the number of identified phosphopeptides is shown in Figure 2B. Most of the 36 identified phosphopeptides have an RT difference smaller than 5 min from their nonphosphorylated forms. Only two phosphopeptides were identified with an RT difference larger than 20 min. Inspection of the data showed that these two signals had the same m/z value of 841.87. They were selected with an RT difference of 24.2 min from their counterparts, which could be a random match because their elution time of 52 min occurred when all the bound ions had been eluted. A total of 30 phosphopeptides had an RT difference smaller than 2 min, which demonstrated the

close elution time of phosphopeptide signals and their counterparts. The criteria, k = 0.05 Da and RT difference = 5 min, which are indicated by arrows in parts A and B of Figure 2, were used for the casein phosphopeptide analysis.

Three-dimensional plots are provided to illustrate the effective mining of potential phosphopeptide signals in the casein mixture. After setting the intensity cutoff at 10 000, 192 signals were detected in the TiO2-eluted sample (Figure 3A), while 744 signals were found in the alkaline phosphatase-treated sample (Figure 3B). All the parameters set in DeltaFinder were the same as in the previous report,19 except for k = 0.05 Da and an RT difference within 5 min. From these two LC-MS data sets, 47 signal pairs (D01-D47) were exported (Figure 3C). Those 47 potential phosphopeptide signals corresponded to 22 masses. Among those candidates, 16 masses (34 signals) were identified as phosphopeptides by LC-MS/MS analysis (Table 1). The identification of a phosphopeptide is usually based upon the identification of one peptide, and additional criteria are therefore required in order to validate the results. The mass shift generated by the dephosphorylation reaction can provide evidence for the existence of those identified phosphopeptides. Rush et al.28 proposed several criteria to validate such results, and these were applicable to our data. First, coeluting ions with

(28) Rush, J.; Moritz, A.; Lee, K. A.; Guo, A.; Goss, V. L.; Spek, E. J.; Zhang, H.; Zha, X. M.; Polakiewicz, R. D.; Comb, M. J. Nat. Biotechnol. 2005, 23, 94–101.

Analytical Chemistry, Vol. 81, No. 18, September 15, 2009


Figure 4. The k value and the RT difference of the phosphopeptide and nonphosphopeptide signals. The ROC curves were plotted by setting different thresholds for the k value and the RT difference. In part A, k = 0.001-1 Da was set as the threshold value, while the RT difference was fixed at smaller than 5 min. The test results and the calculated sensitivity and specificity were derived from k = 0.05 Da. In part B, RT difference = 1-40 min was set as the threshold value, while k was fixed at 0.05 Da. The test results and the calculated sensitivity and specificity derived from RT difference ≤ 5 min are depicted.

different charge states are assigned to the same sequence.29 Taking D01 and D02 as an example, D01 was a pair that included a phosphopeptide signal (m/z 687.95, 3+) and its dephosphorylated counterpart. Another coeluting ion (21.62 min) with charge state 2 at m/z 1031.43 was also selected with the same dephosphorylated counterpart (2+) and designated as D02. Both of these potential phosphopeptide signals were identified as FQpSEEQQQTEDELQDK, aa 48-63. A pair in which both signals carried charge state 3 was not detected because the dephosphorylated signal (3+) was undetectable by ion tracing of m/z 661.28. It is expected that once signals with different charge states appear in the data, all of them will be matched to their counterparts, as was the case for D29-D34. Second, the site should be found in more than one peptide that contains overlapping sequences derived from incomplete proteolysis or from the use of a group of proteases. In our data, both phosphopeptide P01 (FQpSEEQQQTEDELQDK, aa 48-63) and P02 (IEKFQpSEEQQQTEDELQDK, aa 45-63) included phosphorylation site S50. Phosphopeptides P04 (VPQLEIVPNpSAEER, aa 21-34) and P05 (YKVPQLEIVPNpSAEER, aa 19-34) contained phosphorylation site S30. A similar result was also found for P07 versus P08 and for P13 versus P15. Last but not least, the site should be found in more than one peptide sequence due to homologous but not identical protein isoforms. For example, the sequence of P05 was YKVPQLEIVPNpSAEER with the phosphorylation site on S30 (aa 19-34), and P06 was identified as YLGEYLIVPNpSAEER (aa 20-34). In addition to the above criteria, detection of the same phosphopeptide carrying other modifications served as a further validation. For instance, P09 and P10 both contained two phosphorylation sites while the methionine

(29) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. Anal. Chem. 2003, 75, 4646–4658.



residue on P10 was oxidized. The above results indicated the efficient mining of phosphopeptide signals by DeltaFinder (detailed DeltaFinder output data are listed in Supporting Information Table A). For D21 and D22, a difference of only one phosphate group within the pair (n = 1) was calculated, but the m/z was identified as a doubly phosphorylated peptide (QMEAEpSIp(SSS)EEIVPISVEQK, aa 74-94). To solve this puzzle, we performed LC-MS/MS analysis on the m/z value of the dephosphorylated form and identified a monophosphopeptide (QMEAEpSIEEIVPISVEQK, aa 74-94). We concluded that even if incomplete dephosphorylation does occur, a pair consisting of a phosphorylated peptide signal and its partially dephosphorylated form still provides a clue for focusing on that phosphorylated peptide signal. To further investigate the k value and RT difference for selecting signal pairs, we used the receiver operating characteristic (ROC) curve, which is a graphical plot of the true positive rate (sensitivity) versus the false positive rate (1 - specificity) with a varying discrimination threshold. On the basis of the data set in Figure 2A, the k value was set as the discriminator, and the resulting ROC curve is depicted in Figure 4A. The plot had an area of 0.943 under the curve. The curve approached the upper left corner of the ROC space, which corresponds to the best possible prediction, with no false negatives and no false positives. To obtain a sensitivity of 100%, the specificity would have to be lowered to approximately 60%, which means the inclusion of more nonphosphorylated peptide signals. Here, the point k = 0.05 Da was selected as the criterion to lower the number of nonphosphorylated peptide signals. A total of 34 signals were identified as phosphopeptides, while 13 signals were not, which yielded 97.1% sensitivity and 88.5% specificity. With the use

Figure 5. (A) Venn diagrams of two analytical replicates comparing the overlap in unique tyrosine phosphorylated peptides from the direct LC-MS/MS analysis of the TiO2-eluted sample and from the application of this strategy. (B) Comparison of overall repeatability between direct LC-MS/MS analysis and this strategy.

of the same data set from Figure 2B, an ROC curve was obtained by using the RT difference as the discriminator, and an area of 0.771 under the curve was obtained (Figure 4B). A sensitivity of 100% can be obtained only with a specificity of 10%. Therefore, according to the distribution, signal pairs with an RT difference smaller than 5 min were selected, which gave a sensitivity of 94.4% and a specificity of 64.8%. Although the sensitivity can be increased by allowing a larger RT difference, this may include too many nonphosphorylated peptide signals and thereby decrease the specificity. This selection can be used in large-scale analyses, in which most of the random matches need to be excluded and the LC-MS/MS analysis should focus on the potential phosphopeptide signals in order to improve phosphopeptide identification.
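The sensitivity and specificity values quoted above follow from simple confusion-matrix arithmetic, and an ROC curve is obtained by repeating that arithmetic over a range of thresholds. The Python sketch below is illustrative only and is not part of DeltaFinder. The function names and the toy data are hypothetical. For the k = 0.05 Da case, only the 34 identified and 13 unidentified signals come from the text; the counts of 1 missed phosphopeptide signal and 100 correctly rejected signals are assumptions, back-calculated so that the result matches the reported 97.1% and 88.5%.

```python
from typing import List, Tuple


def sens_spec(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Assumes at least one true positive-class and one negative-class item,
    otherwise a division by zero occurs.
    """
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(values: List[float], is_phospho: List[bool],
               thresholds: List[float]) -> List[Tuple[float, float]]:
    """For each threshold t, call a signal pair positive when value <= t
    and return (false positive rate, true positive rate)."""
    points = []
    for t in thresholds:
        tp = sum(v <= t and p for v, p in zip(values, is_phospho))
        fp = sum(v <= t and not p for v, p in zip(values, is_phospho))
        fn = sum(v > t and p for v, p in zip(values, is_phospho))
        tn = sum(v > t and not p for v, p in zip(values, is_phospho))
        sens, spec = sens_spec(tp, fn, tn, fp)
        points.append((1 - spec, sens))
    return points


# k = 0.05 Da: tp and fp are from the text; fn and tn are ASSUMED values
# chosen only to be consistent with the reported percentages.
sens_k, spec_k = sens_spec(tp=34, fn=1, tn=100, fp=13)
print(f"sensitivity {100 * sens_k:.1f}%, specificity {100 * spec_k:.1f}%")

# Toy (made-up) mass deviations in Da and identification outcomes,
# used only to show how ROC points are generated from thresholds.
toy_dev = [0.01, 0.02, 0.03, 0.04, 0.20, 0.60]
toy_label = [True, True, False, True, False, False]
print(roc_points(toy_dev, toy_label, thresholds=[0.05, 1.0]))
```

The same `roc_points` helper could be applied with the RT difference as the discriminating value instead of k; the area under the curve would then be estimated from the resulting points, for example by the trapezoidal rule.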

Analysis of Lung Adenocarcinoma CL1-5 Cell Tyrosine Phosphoproteome. To evaluate the feasibility of our approach for characterizing the tyrosine phosphoproteome from a complex peptide mixture, we used CL1-5 cells as a model system. To enrich p-Tyr proteins by immunoprecipitation, a combination of the 4G10 and PT66 antibodies was used. The immunoprecipitated proteins were resolved on an SDS-PAGE gel and were either probed with the anti-pTyr antibodies or visualized by silver staining. Bands 1-10, regarded as phosphotyrosine-containing proteins on the basis of the Western blot results, were excised, digested with trypsin, and enriched with a TiO2 microcolumn. This analytical strategy was applied to each sample. With the combination of two replicate experiments, 67 tyrosine phosphorylated peptides, which corresponded to


Figure 6. MS/MS spectra of four identified tyrosine phosphorylated peptides, including (A) a monophosphopeptide EKKLpYANMFER (m/z 762.64, 2+) with a Mascot score of 41 and (B) a monophosphopeptide EMNDAAMFpYTNR (m/z 771.7, 2+) with a Mascot score of 51.

64 unique peptides or 64 phosphoproteins, were identified (listed in Supporting Information Table B).

Evaluation of Data Reproducibility. The reproducibility of this approach was tested by performing replicate analyses. A direct LC-MS/MS analysis of the same TiO2-eluted sample was conducted to compare the two methods. Venn diagrams of two analytical replicates, for the 10 samples, were used to display the overlap in identified tyrosine phosphorylated peptides (Figure 5A). In our hands, our strategy was more reproducible than the conventional data-dependent LC-MS/MS analysis. The summation of the identified unique


tyrosine phosphorylated peptides is shown in Figure 5B, which reveals the overall repeatability of the conventional method and of our strategy. In the direct LC-MS/MS analysis, 16-16.7% of the phosphopeptides overlapped. After application of our strategy, the repeatability rose to 69.8-72.5%. In the data-dependent acquisition mode, only the 3-5 most intense signals in each full-scan mass spectrum are selected for MS/MS fragmentation. Lower abundance phosphopeptides may not be selected as precursor ions, which could result in variations between replicate analyses.30 In contrast, phosphotyrosine-containing signals, even those of low abundance, can be selected due to the

mass shift between the alkaline phosphatase-treated and untreated samples. LC-MS/MS analysis with specified precursor ions may improve the quality of the MS/MS spectra, which would lead to repeatable results. More phosphopeptides were identified by the direct LC-MS/MS analysis than by the proposed strategy (90 vs 67 phosphopeptides). We speculate that sequencing phosphopeptides without prior precursor ion selection may lead to a higher probability of random matches or false-positive identifications than our strategy. Parts A and B of Figure 6 show MS/MS spectra of two tyrosine phosphorylated peptides that were identified by our strategy but not in the direct LC-MS/MS analysis. The serial b or y ions make unambiguous phosphorylation site determination possible. The FK506-binding protein 4 (gi|4503729) phosphopeptide EKKLpYANMFER (m/z 762.64, 2+) was identified by Mascot with a score of 41 (Figure 6A). Phosphopeptide EMNDAAMFpYTNR (m/z 771.7, 2+) was identified with a Mascot score of 51, and the y3+ and y4+ ions suggested a phosphate group on the Y9 residue (Figure 6B).

CONCLUSIONS

Our phosphopeptide signal mining strategy provides an efficient method for the identification of protein phosphorylation and is suitable for comprehensive phosphoproteome analysis. Although this procedure requires 3 LC-MS analyses, it finds phosphorylation sites with much greater confidence than a single direct LC-MS/MS analysis. High mass accuracy helps differentiate the phosphorylated peptide signals. The use of alkaline phosphatase treatment in this strategy increased data repeatability and confidence in phosphopeptide identification. The identification of phosphopeptides by the mass shift approach is also helpful in

(30) Kim, J. E.; Tannenbaum, S. R.; White, F. M. J. Proteome Res. 2005, 4, 1339–1346.

circumventing the challenge of identifying phosphotyrosine, which has generally been considered rare in samples. Processing of the LC-MS data by computational methods makes large-scale analysis of the phosphoproteome feasible. Signals with a mass shift can be sieved out by computation, a task that would be very tedious and time-consuming by manual inspection. Although DeltaFinder can currently process data only from QSTAR and LTQ/Orbitrap instruments, any mass spectrometry data will be acceptable once the data format has been transformed via a program such as the LcmsFormatConverter. The "mass shift" in DeltaFinder is a user-defined parameter, which opens the possibility of applying similar strategies to other protein modifications or molecules. DeltaFinder, along with LcmsFormatConverter, is currently available at http://binfo.csie.ncku.edu.tw:8080/DeltaFinder/.

ACKNOWLEDGMENT

This study was supported by Grant DOH98-TD-G-111-008 from the Department of Health, Executive Yuan; Grant NSC97-2113M-006-005-MY3 from the National Science Council; the Landmark Project of National Cheng Kung University; and the National Cheng Kung University Project of Promoting Academic Excellence & Developing World Class Research Centers from the Ministry of Education of Taiwan. It was also partially supported by the Sustainable Environment Research Center.

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review June 22, 2009. Accepted August 13, 2009. AC9013435

