Anal. Chem. 2003, 75, 4081-4086
Web and Database Software for Identification of Intact Proteins Using “Top Down” Mass Spectrometry

Gregory K. Taylor,† Yong-Bin Kim,† Andrew J. Forbes,‡ Fanyu Meng,‡ Ryan McCarthy,† and Neil L. Kelleher*,‡
Department of Computer Science and Department of Chemistry, University of Illinois, Urbana, Illinois 61820
For the identification and characterization of proteins harboring posttranslational modifications (PTMs), a “top down” strategy using mass spectrometry has recently been put forward but languishes without widely available, tailored software. We describe a Web-based software and database suite called ProSight PTM constructed for large-scale proteome projects involving direct fragmentation of intact protein ions. The four main components of ProSight PTM are a database retrieval algorithm (Retriever), MySQL protein databases, a file/data manager, and a project tracker. Retriever performs probability-based identifications from absolute fragment ion masses, automatically compiled sequence tags, or a combination of the two, with graphical rendering and browsing of the results. The database structure allows known and putative protein forms to be searched, with prior or predicted PTM knowledge used during each search. Initial functionality is illustrated with a 36-kDa yeast protein identified from a processed cell extract after automated data acquisition using a quadrupole-FT hybrid mass spectrometer. A +142-Da ∆m on glyceraldehyde-3-phosphate dehydrogenase was automatically localized between Asp90 and Asp192, consistent with its two cysteine residues (149 and 153) being alkylated by acrylamide (+71 Da each) during the gel-based sample preparation. ProSight PTM is the first search engine and Web environment for identification of intact proteins (https://prosightptm.scs.uiuc.edu/).

For detection of posttranslational modifications (PTMs) to proteins on a proteomic scale, mass spectrometric (MS) strategies are now under development to meet this immense measurement challenge more efficiently and reliably. With far fewer genes in mammalian genomes than once thought (1), the theme of many protein forms from each gene is highly operative in complex eukaryotic proteomes, largely due to alternative RNA splicing and PTMs.
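The mass arithmetic behind the +142-Da assignment in the abstract can be checked with a short script. The following is a minimal sketch under stated assumptions, not ProSight PTM's Retriever algorithm: the table of candidate shifts, the function names `explain_delta_mass` and `residues_in_window`, and the 0.5-Da tolerance are all introduced here for illustration. It uses monoisotopic values, so the acrylamide adduct appears as 71.037 Da rather than the nominal +71 Da quoted above.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts in Da. This small table is an illustrative
# assumption, not the PTM knowledge stored in the ProSight PTM databases.
CANDIDATE_SHIFTS = {
    "acrylamide adduct (Cys)": 71.03711,
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def explain_delta_mass(delta_m, shifts=CANDIDATE_SHIFTS, max_mods=2, tol=0.5):
    """Return (combination, total) pairs whose summed shift lies within
    tol Da of the observed intact-mass difference delta_m."""
    hits = []
    for n in range(1, max_mods + 1):
        # Combinations with replacement allow the same shift to occur twice.
        for combo in combinations_with_replacement(sorted(shifts), n):
            total = sum(shifts[name] for name in combo)
            if abs(total - delta_m) <= tol:
                hits.append((combo, total))
    return hits


def residues_in_window(positions, start, end):
    """Return the residue positions that fall inside [start, end]."""
    return [p for p in positions if start <= p <= end]


if __name__ == "__main__":
    # Observed shift and localization window as reported in the abstract.
    for combo, total in explain_delta_mass(142.0):
        print(" + ".join(combo), f"= {total:.3f} Da")
    # Both cysteines (149 and 153) lie between Asp90 and Asp192.
    print(residues_in_window([149, 153], 90, 192))
```

With this candidate table and at most two modifications, the only combination matching 142 Da within the tolerance is two acrylamide adducts, and both cysteine positions fall inside the Asp90 to Asp192 window. A real search would need a far larger modification table and a tolerance matched to the instrument's mass accuracy.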
Beyond the regulation of protein function by dynamic PTM, environmental stressors can also lead to chemical modification of proteins linked to human disease. Regardless of their mechanism of formation, the detection of PTMs presents a major opportunity for biomarker discovery (2) and understanding the fundamental regulatory mechanisms of eukaryotic cells.

The analysis of protein digests remains the most popular form of MS-based proteomics. This “bottom up” approach enables high-throughput protein identification and quantitative measurements of protein expression ratios. Using this protease-driven approach, PTM detection has been performed for many years on single proteins (3, 4) given enough sample, multiple proteases, or both to generate a peptide map approaching 100% sequence coverage (5). Extending the triple digest strategy, Yates and co-workers recently demonstrated 40% sequence coverage for 10% of the proteins studied in a mixture, with four having sequence coverage greater than 90%, using an expanded version of the SEQUEST algorithm (6). Other existing search engines for bottom up now support some type of PTM detection and localization, including Prowl at Rockefeller (7), Mascot (8), FindMOD (3), and Protein Prospector at UCSF (9).

To target PTMs directly, measurement approaches are being developed based on analysis of tryptic peptides harboring particular classes of modifications. For example, detection of phosphorylation (4, 10-12) and glycosylation has been enhanced by selective isolation (e.g., based on IMAC or biotinylation), specific PTM detection (e.g., precursor ion scanning of modified peptides (13)), or both procedures. Even determination of differential modification values between two biological samples has been reported on single (14) and multiple proteins for relative (11) or absolute PTM quantitation (15) (e.g., phosphoproteomics (4, 10-12)). While some of these techniques are being scaled up for analysis of hundreds of proteins, none are general for all types of biological events that alter the relative molecular weight (Mr) of wild-type proteins.

* To whom correspondence should be addressed. E-mail: kelleher@scs.uiuc.edu.
† Department of Computer Science.
‡ Department of Chemistry.
10.1021/ac0341721 CCC: $25.00 © 2003 American Chemical Society. Published on Web 06/28/2003
Analytical Chemistry, Vol. 75, No. 16, August 15, 2003 4081

(1) Lander, E. S.; et al. Nature 2001, 409, 860-921.
(2) Chong, B. E.; Hamler, R. L.; Lubman, D. M.; Ethier, S. P.; Rosenspire, A. J.; Miller, F. R. Anal. Chem. 2001, 73, 1219-1227.
(3) Wilkins, M. R.; Gasteiger, E.; Gooley, A. A.; Herbert, B. R.; Molloy, M. P.; Binz, P. A.; Ou, K.; Sanchez, J. C.; Bairoch, A.; Williams, K. L.; Hochstrasser, D. F. J. Mol. Biol. 1999, 289, 645-657.
(4) Ficarro, S.; McCleland, M.; Stukenberg, P.; Burke, D.; Ross, M.; Shabanowitz, J.; Hunt, D.; White, F. Nat. Biotechnol. 2002, 20, 301-305.
(5) Biemann, K.; Papayannopoulos, I. Acc. Chem. Res. 1994, 27, 370-378.
(6) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900-7905.
(7) Zhang, W.; Chait, B. Anal. Chem. 2000, 72, 1918-1924.
(8) Perkins, D.; Pappin, D.; Creasy, D.; Cottrell, J. Electrophoresis 1999, 20, 3551-3567.
(9) Clauser, K. R.; Baker, P. R.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871-2882.
(10) Zhou, H.; Watts, J. D.; Aebersold, R. Nat. Biotechnol. 2001, 19, 375-378.
(11) Goshe, M. B.; Conrads, T. P.; Panisko, E. A.; Angell, N. H.; Veenstra, T. D.; Smith, R. D. Anal. Chem. 2001, 73, 2578-2586.
(12) Oda, Y.; Nagasu, T.; Chait, B. T. Nat. Biotechnol. 2001, 19, 379-382.
(13) Steen, H.; Kuster, B.; Fernandez, M.; Pandey, A.; Mann, M. Anal. Chem. 2001, 73, 1440-1448.

In contrast, the top down approach involving tandem MS (MS/MS) of intact proteins is general for all PTMs but has yet to be scaled up beyond initial reports (typically