Molecular-Level Description of Proteins from Saccharomyces

Anal. Chem. 2004, 76, 2852-2858

Molecular-Level Description of Proteins from Saccharomyces cerevisiae Using Quadrupole FT Hybrid Mass Spectrometry for Top Down Proteomics

Fanyu Meng, Yi Du, Leah M. Miller, Steven M. Patrie, Dana E. Robinson, and Neil L. Kelleher*

Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801

For improved detection of diverse posttranslational modifications (PTMs), direct fragmentation of protein ions by top down mass spectrometry holds promise but has yet to be achieved on a large scale. Using lysate from Saccharomyces cerevisiae, 117 gene products were identified with 100% sequence coverage revealing 26 acetylations, 1 N-terminal dimethylation, 1 phosphorylation, 18 duplicate genes, and 44 proteolytic fragments. The platform for this study combined continuous-elution gel electrophoresis, reversed-phase liquid chromatography, automated nanospray coupled with a quadrupole-FT hybrid mass spectrometer, and a new search engine for querying a custom database. The proteins identified required no manual validation, ranged from 5 to 39 kDa, had codon biases from 0.93 to 0.083, and were primarily associated with glycolysis and protein synthesis. Illustrations of gene-specific identifications, PTM detection and subsequent PTM localization (using either electron capture dissociation or known PTM data stored in a database) show how larger scale proteome projects incorporating top down may proceed in the future using commercial Q-FT instruments.

Measurement approaches that provide molecular-level insight into cell dynamics at the protein level focus primarily on determining changes in abundance, half-life, intracellular localization, and posttranslational modification on a large scale. Such technology development employing separation science and mass spectrometry (MS) continues its evolution to improve dynamic range,1,2 specificity of protein identification by database retrieval,3 and measurement efficiency (a combination of throughput, sample amount, and computational requirements).
With significant advances for both two-dimensional (2D) gels and non-gel-based proteomic analyses, it is now possible to interrogate ∼25-61%4,5 of a microorganism’s predicted proteome with many low-abundance proteins yet undetectable.6

* Corresponding author: (e-mail) [email protected]; (fax) 217-244-8068.
(1) Corthals, G. L.; Wasinger, V. C.; Hochstrasser, D. F.; Sanchez, J. C. Electrophoresis 2000, 21, 1104-1115.
(2) Link, A. J. Trends Biotechnol. 2002, 20, S8-13.
(3) Rappsilber, J.; Mann, M. Trends Biochem. Sci. 2002, 27, 74-78.
(4) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.

2852 Analytical Chemistry, Vol. 76, No. 10, May 15, 2004

Beyond proteome coverage, the separate problem of determining complete primary structures of expressed proteins is being increasingly targeted by measurement scientists given the regulatory power that posttranslational modifications (PTMs) can have over protein function.7,8 Many common processes in the cell biology of higher eukaryotes are modulated by PTMs, including proteolytic processing, phosphorylation, acetylation, glycosylation, and ubiquitination to name but a few. This measurement challenge is vast, and many have forwarded approaches in three basic categories: PTM preconcentration,9-12 2D gel signatures,13,14 and the “sequence coverage” approach involving MS measurement of as many (overlapping) peptide fragments as possible.15 A combination of these will be required just to establish a basis set of PTMs expressed from a typical eukaryotic cell, much less determine PTM dynamics. Here, we forward the “top down” MS approach, touted recently for its potential to obtain 100% sequence coverage in an efficient fashion. Further, combinations of PTMs that form a type of posttranslational regulatory logic (e.g., the histone code16) are best interrogated by top down MS, given satisfactory signal-to-noise ratios. Complementary to bottom up proteomics (e.g., using tryptic peptides), top down fragmentation of intact protein ions17-19 offers informatic advantages in both protein identification and PTM

(5) Lipton, M. S.; Pasa-Tolic, L.; Anderson, G. A.; Anderson, D. J.; Auberry, D. L.; Battista, J. R.; Daly, M. J.; Fredrickson, J.; Hixson, K. K.; Kostandarithes, H.; Masselon, C.; Markillie, L. M.; Moore, R. J.; Romine, M. F.; Shen, Y.; Stritmatter, E.; Tolic, N.; Udseth, H. R.; Venkateswaran, A.; Wong, K. K.; Zhao, R.; Smith, R. D. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11049-11054.
(6) Ghaemmaghami, S.; Huh, W. K.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O’Shea, E. K.; Weissman, J. S. Nature 2003, 425, 737-741.
(7) Mann, M.; Jensen, O. N. Nat. Biotechnol. 2003, 21, 255-261.
(8) Wu, C. C.; Yates, J. R., 3rd. Nat. Biotechnol. 2003, 21, 262-267.
(9) Salomon, A. R.; Ficarro, S. B.; Brill, L. M.; Brinker, A.; Phung, Q. T.; Ericson, C.; Sauer, K.; Brock, A.; Horn, D. M.; Schultz, P. G.; Peters, E. C. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 443-448.
(10) Ficarro, S. B.; McCleland, M. L.; Stukenberg, P. T.; Burke, D. J.; Ross, M. M.; Shabanowitz, J.; Hunt, D. F.; White, F. M. Nat. Biotechnol. 2002, 20, 301-305.
(11) Zhou, H.; Watts, J. D.; Aebersold, R. Nat. Biotechnol. 2001, 19, 375-378.
(12) Oda, Y.; Nagasu, T.; Chait, B. T. Nat. Biotechnol. 2001, 19, 379-382.
(13) Larsen, M. R.; Roepstorff, P. Fresenius J. Anal. Chem. 2000, 366, 677-690.
(14) Wilkins, M. R.; Gasteiger, E.; Gooley, A. A.; Herbert, B. R.; Molloy, M. P.; Binz, P. A.; Ou, K.; Sanchez, J. C.; Bairoch, A.; Williams, K. L.; Hochstrasser, D. F. J. Mol. Biol. 1999, 289, 645-657.
(15) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., 3rd. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900-7905.
(16) Jenuwein, T.; Allis, C. D. Science 2001, 293, 1074-1080.

10.1021/ac0354903 CCC: $27.50

© 2004 American Chemical Society Published on Web 04/14/2004

characterization.20 Over the past decade, the top down approach has evolved from protein standards,21 to recombinant proteins,18 to 4:1) was added to all the fractions before they were stored at -80 °C for further processing. For each fraction analyzed, the sample was spun at 7200 rpm for 2 min to collect acetone-precipitated proteins, while bulk ALS remained in the supernatant. The pellet was then washed once with 100% ice-cold acetone. Typically, two adjacent ALS-PAGE fractions were combined. Each sample was dissolved into 0.7 mL of 6 M guanidine hydrochloride buffer, and trifluoroacetic acid (TFA) was then used to adjust the solution to pH 2 for 2 h to degrade residual ALS. Generally, processed ALS-PAGE fractions were injected onto a C4 Symmetry 300 column (4.6 × 50 mm; Waters Corp., Milford, MA) or a C18 NPS column (4.6 × 14 mm; Eichrom Technologies, Darien, IL), washed for 20 min on-column, and eluted with a linear gradient over 15-20 min with standard solvents (H2O, CH3CN) and 0.1% TFA. The RPLC fractions containing ∼0.5-10 µg of total protein were dried down before MS analysis. Some of the samples processed in this study were prepared using 20-fold less lysate through a miniprep cell (0.7-cm diameter, BioRad, Hercules, CA) and direct analysis of HPLC fractions from a 320-µm-i.d. C4 capillary column (2% formic acid instead of 0.1% TFA).

ESI/Q-FTMS. The fractionated protein mixtures were resuspended in ESI solution (50% ACN, 49% H2O, and 1% formic acid).

(33) Johnson, J. R.; Meng, F.; Forbes, A. J.; Cargile, B. J.; Kelleher, N. L. Electrophoresis 2002, 23, 3217-3223.
(34) Taylor, G. K.; Kim, Y. B.; Forbes, A. J.; Meng, F. Y.; McCarthy, R.; Kelleher, N. L. Anal. Chem. 2003, 75, 4081-4086.
(35) Nieuwint, R. T.; Molenaar, C. M.; van Bommel, J. H.; van Raamsdonk-Duin, M. M.; Mager, W. H.; Planta, R. J. Curr. Genet. 1985, 10, 1-5.
(36) Mangiarotti, G. Biochem. J. 2003, 370, 713-717.


Normally samples were spun at 14 000 rpm for 8 min, and then 10-µL samples were loaded into a 96-well plate. A nanospray robot (Advion BioSciences, Ithaca, NY) used 10 µL of solution from each well and automatically set up the spray. Typically, 10-µL samples were enough for more than 50 min of stable nanospray, providing ample time to acquire high-quality MS and MS/MS scans of two to four intact proteins per sample. With unused sample volumes archived, manual fragmentation data [e.g., better quality threshold MS/MS or electron capture dissociation (ECD)37] could be generated for further localization of PTMs or other mass discrepancies (∆m’s).34

The instrument used in this study was a custom 8.5-T Q-FTMS of the Marshall design38 akin to that recently commercialized by Bruker Daltonics. The customized Tool Command Language (TCL) script first acquired 25 or 50 broadband scans, and then a deconvolution algorithm33 was called to calculate protein Mr values and return a series of peak lists. The one to five most abundant charge states of each protein were used to generate the stored waveform inverse Fourier transform (SWIFT) isolation waveform (for higher Mr proteins, more charge states were isolated). After five scans for the isolated charge states, an IR laser was activated for 0.25-0.45 s (with a beam expander mounted in front of the laser, 40 W, 75% power). Infrared multiphoton dissociation (IRMPD) spectra were sums of 35 or 55 scans.

Several samples were analyzed using a quadrupole-based selection strategy. In short, targeted ions were selected in ∼40 m/z windows by a quadrupole filter before accumulation (1-2 s) in an external octupole and transfer to the ICR cell. Ions were fragmented while entering the external octupole (∼10 mTorr, -35 V dc offset) or in the ICR cell after a SWIFT isolation (5-10 m/z) was applied (to further clean targeted ions) prior to IRMPD; typically, 10-50 scans were used to collect such MS/MS data.
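The charge-state selection and isolation steps described above can be illustrated with a short sketch. It assumes positive-mode ESI ions formed by proton attachment, so that m/z = (Mr + z × 1.007276)/z. The function names, the intensity dictionary, and the 12 000 Da example mass are hypothetical choices made for illustration; they are not taken from the TCL acquisition script used in this study.

```python
PROTON_MASS = 1.007276  # Da; assumed charge carrier for positive-mode ESI


def mz(neutral_mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass in Da."""
    return (neutral_mass + z * PROTON_MASS) / z


def top_charge_states(intensity_by_charge: dict[int, float], n: int) -> list[int]:
    """Return the n most abundant charge states, most abundant first."""
    ranked = sorted(intensity_by_charge, key=intensity_by_charge.get, reverse=True)
    return ranked[:n]


def isolation_window(center_mz: float, width: float) -> tuple[float, float]:
    """Lower and upper m/z bounds of a window of the given total width."""
    half = width / 2.0
    return center_mz - half, center_mz + half


if __name__ == "__main__":
    # Hypothetical 12 000 Da protein, 12+ charge state, ~40 m/z quadrupole window.
    center = mz(12000.0, 12)
    low, high = isolation_window(center, 40.0)
    print(f"center {center:.3f} m/z, window {low:.3f}-{high:.3f} m/z")
```

In this sketch a narrower SWIFT window (5-10 m/z) would be obtained simply by passing a smaller width to `isolation_window`.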
The data of Figure 2D were acquired on an LTQ FT (Thermo Electron Corp., San Jose, CA) using static nanospray for sample infusion, a 5 m/z isolation window, and a normalized collision energy of 30% for the linear ion trap.39

Data Analysis and ProSight PTM. Intact protein spectra were analyzed by a deconvolution algorithm,33 and the fragmentation data were analyzed by a modified version of the THRASH algorithm.40 The resulting protein list and fragment peak list were uploaded onto ProSight PTM for further analysis (https://prosightptm.scs.uiuc.edu).34 Theoretical isotopic distributions were generated using Isopro v3.0, and the mass difference (in units of 1.002 35 Da)41 between the most abundant isotopic peak and the monoisotopic peak is denoted in italics after each Mr value. Most spectra were stored as 512K data points in the MIDAS data station,42 and the parameters used to process time domain data

(37) Zubarev, R. A.; Kelleher, N. L.; McLafferty, F. W. J. Am. Chem. Soc. 1998, 120, 3265-3266.
(38) Hendrickson, C. H.; Emmett, M. R.; Quinn, J. P.; Marshall, A. G. Proceedings of the 48th ASMS Conference on Mass Spectrometry and Allied Topics, Long Beach, CA, 2000.
(39) Horning, S.; Malek, R.; Wieghaus, A.; Senko, M. W.; Syka, J. E. P. Proceedings of the 51st ASMS Conference on Mass Spectrometry and Allied Topics, Montreal, 2003.
(40) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 2000, 11, 320-332.
(41) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229-233.
(42) Senko, M. W.; Canterbury, J. D.; Guan, S.; Marshall, A. G. Rapid Commun. Mass Spectrom. 1996, 10, 1839-1844.


were zero or one truncation, no zero fill, and Hanning apodization. All spectra were externally calibrated with an ECD data set of bovine ubiquitin or (where indicated) internally using identified fragment ions. The criteria for database searching were generally a ±1000 Da or ±2000 Da Mr window and 10-50 ppm error tolerance for fragment ions, with default PTM options selected (such as N-acetylations and known site-specific cases of PTMs such as phosphorylation and methylation).34 When an interesting ∆m or ambiguities in protein identification were encountered, further data mining using internal calibration was performed, which could reduce most fragment ion errors to the 1-10 ppm range. P-scores reported in this study are the negative log of the probability score that has been defined previously.20 For example, a protein match with a probability score of 1 × 10^-10 has a P-score of 10, so a higher P-score indicates a higher confidence of protein identification. Unless noted otherwise, protein identifiers are Swiss-Prot primary accession numbers.43

RESULTS AND DISCUSSION

Intact Protein Analysis by Sample-Dependent MS/MS. To date, more than 400 different molecular masses have been observed using ALS-PAGE/RPLC fractions containing 5-39-kDa proteins. Overall, ∼80-90% of the samples provided protein charge-state distributions after 25 scans, with 9 such mixtures ranging in complexity from 1 to 10 observable protein forms (Figure 1). After on-line deconvolution, multiple charge states of the most abundant protein were isolated and dissociated by the data acquisition software. In separate MS/MS experiments, one, two, or three more proteins of sequentially decreasing abundance were fragmented automatically. Such processing of the 11 596.70 Da protein from Figure 2A produced the isolation spectrum of Figure 2B and the MS/MS spectrum of Figure 2C containing 133 isotopic distributions.
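The P-score convention defined in the data analysis description (the negative base-10 logarithm of the probability score, so that 1 × 10^-10 corresponds to a P-score of 10) can be written out as a minimal sketch. The function names are illustrative assumptions and this is not the ProSight PTM implementation; only the logarithmic relationship itself comes from the text.

```python
import math


def p_score(probability: float) -> float:
    """Negative base-10 log of a probability score (0 < probability <= 1)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log10(probability)


def probability_from_p_score(score: float) -> float:
    """Inverse conversion: chance of a spurious hit for a given P-score."""
    return 10.0 ** (-score)


if __name__ == "__main__":
    # The example from the text: a probability score of 1e-10.
    print(p_score(1e-10))
    # A 5% chance of a random match, i.e., 95% confidence.
    print(round(p_score(0.05), 2))
```

Under this convention a 95% confidence threshold corresponds to a P-score of about 1.3, and a P-score of 3.5 corresponds to roughly a 0.03% chance of a spurious hit.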
After filtering,34 86 fragment ion masses were combined with the Mr value and uploaded into ProSight PTM to query the database. With consideration of several possible N-terminal states (Met-kept, Met-removed, or either with acetylation) during the search, the best protein candidate had 19 b- and 13 y-type ion matches with a P-score of 25.6. The protein identified was an N-terminally acetylated form of heat shock protein P22943. The theoretical Mr for this protein was 11 596.63-0 Da, within 10 ppm of the observed value. Also, seven pairs of complementary fragment ions were observed (b6/y102, b34/y74, b37/y71, b47/y61, b57/y51, b61/y47, b64/y44), providing 100% sequence coverage at both the intact protein and fragment ion levels. This protein has also been identified by top down MS/MS (P-score of 2; 100 Da Mr window) in a 3D ion trap by McLuckey and co-workers.44 Using a multichannel SWIFT for isolation of multiple ESI charge states provided quality fragmentation data during automated MS/MS, in part due to different fragmentation behaviors of proteins carrying different numbers of charges.45 For infrared dissociation used here, MS/MS spectral quality and database searching were often optimal with ∼30% of the precursor ions

(43) Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O’Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. Nucleic Acids Res. 2003, 31, 365-370.
(44) Amunugama, R.; Hogan, J. M.; Newton, K. A.; McLuckey, S. A. Anal. Chem. 2004, 76, 720-727.
(45) Reid, G. E.; Wu, J.; Chrisman, P. A.; Wells, J. M.; McLuckey, S. A. Anal. Chem. 2001, 73, 3274-3281.

Figure 1. Overview of representative spectra (25 scans) obtained from ALS-PAGE/RPLC samples processed in this study. Automatic MS/MS analysis of the top right sample is shown in Figure 2A-C.

Figure 2. Sample-dependent MS/MS analysis of a 12-kDa yeast protein, including (A) broadband (25 scan), (B) isolation (5 scans), (C) IR fragmentation (35 scans; 836-862 m/z region of the spectrum shown in the inset) on a custom Q-FTMS hybrid, and (D) manual MS/MS of the 13+ precursor (indicated by asterisk) in an LTQ FT instrument of the major component in (A).

Figure 3. Gene-specific identification with simultaneous detection of phosphorylation. (A) Broadband scan for an ALS-PAGE/RPLC sample (25 scans). (B) IRMPD of the automatically isolated 11-kDa species (35 scans). (C) Follow-up manual IRMPD of the isolated 11-kDa species using more vigorous fragmentation conditions (50 scans). (D) Graphical fragment map showing ProSight output obtained when using the (B) data as input and searching for known cases of phosphorylation (Ser100, circled) housed in the database (P-score 3.5).


Figure 4. Automated top down analysis of four proteins in a mixture. (A) Broadband scan for an ALS-PAGE/RPLC sample (25 scans). (B) Isolation of the 16-kDa species by sample-dependent deconvolution and SWIFT (5 scans). (C) IRMPD of the isolated 16-kDa species (35 scans). (D) ProSight PTM output for the database retrieval identifying the 16-kDa protein as Q01855 (RPS15), with N-terminal acetylation (P-score 7.2). (E) Two 12-kDa species selected together by the automated deconvolution and SWIFT isolation (5 scans). (F) IRMPD of the two isolated 12-kDa species (35 scans); inset, the critical c2 ion observed from ECD of the species in panel E. (G) ProSight PTM output showing that these two species are ribosomal proteins with a 28-Da mass discrepancy localized to the first amino acid, as highlighted by the box.

remaining. We are currently developing software with the capability to automatically optimize fragmentation conditions. Processing the same sample on a linear quadrupole ion trap-FT hybrid (the LTQ-FT) gave the MS/MS spectrum of Figure 2D. Collisional activation in the high-pressure ion trap39 allowed selective and complete dissociation of the isolated 13+ precursor ions, yielding 41 matching b/y ions for a P-score of 55.2.

Gene-Specific Identification of a Modified Protein. An 11 129.3-6 Da yeast protein was observed in one ALS-PAGE/RPLC fraction (Figure 3A). Following automated deconvolution, SWIFT isolation of two charge states (data not shown), and IR fragmentation, the MS/MS spectrum of Figure 3B provided 16 isotopic distributions corresponding to 14 discrete fragment ion mass values. In the default search mode, there was no valid identification above 95% confidence (i.e., P-score >1.3). In ∆m mode,32 five genes belonging to an acidic ribosomal protein family (Supporting Information Figure 1) occupied the top of the retrieval list with P-scores all >3.5 (i.e., 0.03% chance of a spurious hit). Further examination of the fragment ions showed that they were six large b-type ions and two small y-type ions, all resulting from backbone cleavages close to the C-terminus of the protein. Manual reexamination of the protein using more vigorous fragmentation conditions (i.e., higher laser power and longer irradiation time) generated small b ions (Figure 3C) that were not previously observed. With three singly charged ions assigned as b10-b12, this gene product was identified unambiguously to be acidic ribosomal protein P02400, with a total of nine b-type ions and two

y-type ions matching. The P-score associated with this revised search was 7.4. The ∆m was 79.9 Da, indicating that phosphorylation within the region Gly13-Asp103 is likely. Instead of acquiring more data to localize the ∆m, we could identify the protein in normal mode searching with the original MS/MS data set (Figure 3B) and the phosphorylation search option selected. In this case, we opened up the database search to consider known, site-specific PTMs during retrieval. In the database, there is a protein form corresponding to P02400 phosphorylated at Ser100 that had been reported previously using newly developed detection methods for phosphopeptides.10 In sum, this acidic protein was determined to have its Met retained (no acetylation), its precise gene identified, and a phosphorylation detected and putatively localized, all from sparse top down data (see ProSight output in Figure 3D). More extensive top down or phosphopeptide analysis would be required to validate the precise location of the phosphorylation also observed in yeast during stationary phase.10 However, this strategy allows a significantly faster approach to PTM profiling with the potential to detect diverse modifications that occur in combination. Thus, the synergy between bottom up and top down methods predicted some years ago is indeed coming to fruition. This illustrates a key role for placing PTMs in a database,7 particularly in combination with automated top down mass spectrometry.

Automated Top Down Processing of Mixtures with Unambiguous PTM Characterization. Processing of a prototypical ALS-PAGE/RPLC fraction is shown in Figure 4A. The on-line

deconvolution algorithm correctly recognized the four most abundant charge-state distributions within this complex protein mixture (∼6 components). The 15 902.5-0 Da component was isolated by SWIFT (Figure 4B) and dissociated first because it had a higher overall abundance considering all charge states. The resulting MS/MS spectrum (Figure 4C) generated 32 ion species corresponding to 25 unique masses. After filtering, 17 ions were submitted to ProSight PTM and the protein was identified to be Q01855 (RPS15) with its Met removed. A 42.0-Da ∆m could be localized to the 20 N-terminal residues and is attributed to an N-terminal acetylation.46 The P-score for this search was 7.2, indicating an unambiguous retrieval. Upon further inspection, three b- and six y-type ion matches were found that comprise two sequence tags (“M” and “VG[I/L]”) as indicated by the graphical fragmentation map shown in Figure 4D. The observed molecular mass matched the predicted value (15 902.51-0 Da) within 10 ppm.

With the Figure 4A sample still spraying through the NanoMate robot, the TCL script selected the next most abundant species. With a 10 m/z isolation window, two species of 11 898.8-0 and 11 928.8-0 Da (∆m = 30.01 Da; ∆m/z ∼ 2.5) were selected for this MS/MS attempt. After automated data reduction and filtering, the MS/MS data shown in Figure 4F generated 33 masses for database searching. Two proteins were identified together using the ∆m search mode (fragment mass accuracy was 0.2 Da), with P-scores of 14.0 and 9.7. The observed species emanate from duplicate ribosomal protein genes (RPS25A and RPS25B; Swiss-Prot primary accession number P07282), with only a Thr104Ala sequence difference (theoretical ∆m = 30.01 Da). The identifications were verified by three observed y-type ions from RPS25A and two y-type ions for RPS25B that were 30 Da apart.
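The mass arithmetic behind the RPS25A/RPS25B pair can be checked with a brief sketch. It uses standard monoisotopic residue masses for Thr and Ala and assumes protonated ions; the 12+ charge state is not stated in the text but is inferred here from the reported ∆m of 30.01 Da and ∆m/z of about 2.5. All function names are illustrative.

```python
PROTON_MASS = 1.007276   # Da; assumed charge carrier
THR_RESIDUE = 101.04768  # monoisotopic residue mass of Thr (C4H7NO2), Da
ALA_RESIDUE = 71.03711   # monoisotopic residue mass of Ala (C3H5NO), Da


def mz(neutral_mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass in Da."""
    return (neutral_mass + z * PROTON_MASS) / z


def implied_charge(delta_mass: float, delta_mz: float) -> int:
    """Charge state implied by a mass difference and its observed m/z spacing."""
    return round(delta_mass / delta_mz)


if __name__ == "__main__":
    # Thr -> Ala substitution: expected mass shift between the two gene products.
    print(f"Thr - Ala = {THR_RESIDUE - ALA_RESIDUE:.4f} Da")

    # Charge state inferred from the reported spacing (an assumption of this sketch).
    z = implied_charge(30.01, 2.5)
    low, high = mz(11898.8, z), mz(11928.8, z)
    print(f"z = {z}; ions at {low:.2f} and {high:.2f} m/z; spacing {high - low:.2f}")
```

At the inferred charge state the two species sit about 2.5 m/z apart, which is why a single 10 m/z isolation window captures both.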
Both protein forms had no start Met present, did not have a predicted propeptide (residues 1-14), and harbored a +28.0 Da ∆m between their observed and DNA-predicted Mr values. Upon further inspection and internal calibration of the MS/MS spectrum, the ∆m was determined to be 28.04 ± 0.01 Da. This indicates that the ∆m is far more likely to arise from two methylations (28.03 Da) than from one formylation (27.99 Da). Further, the ∆m could be localized to the first six amino acids (Figure 4G, box), with the most probable sites being either the N-terminus or Lys3. This result is consistent with either dimethylation of the N-terminal Pro (ref 47) or dimethylation of Lys3 (methylation of Gln unlikely). Further fragmentation by electron capture dissociation37 of quadrupole-enhanced ion populations generated an MS/MS spectrum with 26 matching c/z• ions, including the critical c2 ion consistent with dimethylation of the N-terminal proline (Figure 4F, inset). The c3, c4, and c5 ions were also observed and all had 6 components. Also validated in the above example was the trend that protein variants coelute through the entire ALS-PAGE/RPLC fractionation, making determination of their apparent ratios feasible by ESI-Q-FTMS. The

(46) Arnold, R. J.; Polevoda, B.; Reilly, J. P.; Sherman, F. J. Biol. Chem. 1999, 274, 37035-37040.
(47) Martinage, A.; Briand, G.; Van Dorsselaer, A.; Turner, C. H.; Sautiere, P. Eur. J. Biochem. 1985, 147, 351-359.

Table 1. Assignments of Critical c Ions for Yeast Ribosomal Protein S25 A/B

assignment   m/z        z   ion mass (exp)   ion mass (theo)   error^a (ppm)   source of charging
c5+          624.3867   1   624.3867         624.3832          5.6             dimethyl-Pro
c4+          496.3273   1   496.3273         496.3247          5.2             dimethyl-Pro
c3+          368.2663   1   368.2663         368.2661          0.5             dimethyl-Pro
c5 2+        312.6958   2   625.3916         625.3911          0.8             dimethyl-Pro and Lys3 protonation
c2+          240.1712   1   240.1712         240.1712          0.2             dimethyl-Pro

a Determined using external calibration, with the space charge of the calibrant spectrum closely matching that of the ECD spectrum.
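The ppm errors in Table 1 and the ∆m assignment discussed in the text can be reproduced with a short calculation. The sketch below assumes the table's apparent convention that the experimental ion mass is m/z × z, and uses standard monoisotopic shifts for methylation (CH2, 14.01565 Da) and formylation (CO, 27.99491 Da); the function and variable names are illustrative, not part of ProSight PTM.

```python
# (assignment, observed m/z, charge z, theoretical ion mass) taken from Table 1
TABLE_1 = [
    ("c5+", 624.3867, 1, 624.3832),
    ("c4+", 496.3273, 1, 496.3247),
    ("c3+", 368.2663, 1, 368.2661),
    ("c5 2+", 312.6958, 2, 625.3911),
    ("c2+", 240.1712, 1, 240.1712),
]

DIMETHYLATION = 2 * 14.01565  # Da; two CH2 additions
FORMYLATION = 27.99491        # Da; one CO addition


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def is_consistent(observed: float, uncertainty: float, candidate: float) -> bool:
    """True if a candidate mass shift lies within the stated uncertainty."""
    return abs(observed - candidate) <= uncertainty


if __name__ == "__main__":
    for name, observed_mz, z, theoretical in TABLE_1:
        experimental = observed_mz * z  # ion mass as tabulated (m/z times z)
        print(f"{name:6s} {ppm_error(experimental, theoretical):.1f} ppm")

    # Measured shift of 28.04 +/- 0.01 Da against the two candidate modifications.
    for label, shift in (("dimethylation", DIMETHYLATION), ("formylation", FORMYLATION)):
        print(label, is_consistent(28.04, 0.01, shift))
```

Because the tabulated masses are rounded to four decimals, the c2+ row evaluates to zero here rather than the 0.2 ppm listed, which presumably reflects unrounded values in the original calculation.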

Figure 5. Synopsis of methods used and proteins detected in this study. (A) 28 gene products were 2 residues long and 87 did not. (D) 26 acetylations, 2 methylations, 1 phosphorylation, and 2 disulfide bonds were observed. (E) 43 ribosomal proteins, 46 glycolysis proteins, 7 heat shock proteins, and 21 other proteins were observed.

biological role of this ribosomal protein has been implicated in assembly of the small subunit.35 Although the function of the methylation is still unclear in S. cerevisiae, several reports have shown that the methylation serves as a key regulatory switch for the degradation of the ribosomal protein mRNAs in Dictyostelium discoideum.36,48 Genetic alteration of the modification site along with isolation, identification, and reconstitution of the methyltransferase in vitro are currently underway.

Summary of Identified Gene Products. Shown in Figure 5 are several pie graphs depicting the 117 protein forms identified and characterized from the fractionated yeast extract. A list is also available as Supporting Information. The majority (73) were intact proteins whereas 44 were protease products (Figure 5B). Approximately one-third of the identified species generated the same

(48) Mangiarotti, G.; Giorda, R. Biochem. Cell Biol. 2002, 80, 261-270.


sequence tag information compiled from MS/MS spectra (Figure 5C). Not counting proteolytic cleavage, ∼25% of the species observed were modified, with the majority of these being N-terminal acetylations. Figure 5E illustrates that mostly proteins of higher abundance were observed (codon biases between 0.93 and 0.083), such as ribosomal proteins and those involved in glycolysis. Protein sizes ranged from 5 to 39 kDa (Figure 5A), all in accord with the Mr values expected from the size-based proteome fractionation used as the first step in lysate processing.32

Conclusions. This initial report of high-resolution top down MS on a large scale has combined front end fractionation, sample-dependent data acquisition, and back end data analysis to effectively determine covalent states of gene products. Unlike bottom up approaches, more than 99% of identifications did not require manual validation of the result. The strategy of considering known and predicted modifications during database retrieval accelerates both identification and protein characterization. Such storage of known PTM information in databases illustrates a specific synergy between top down and bottom up proteomics likely to find wider implementation in coming years. With most of the 117 gene products identified and characterized to date being highly abundant, the top down approach is progressing along a trajectory similar to that of bottom up methods during the mid-to-late 1990s. Near-term improvements to MS/MS above 10 kDa will include “smart” fragmentation involving automated optimization of MS/MS conditions and targeting proteins in the 50-70-kDa range. With recent


development of sophisticated data acquisition software and hybrid FTMS hardware commercially available, it is certain that top down MS will soon proliferate to many more laboratories.

ACKNOWLEDGMENT

The acid-labile analogue of SDS was a gift from Edward Bouvier and Reb Russell of the Waters Corp. We thank Yong-Bin Kim, Gregory Taylor, Lihua Jiang, David Whipple, and Michael Roth for technical assistance. We also thank Iain Mylchreest, Ian Jardine, Michael Senko, and Vlad Zabrouskov of Thermo Electron Corp. for access to the linear trap-FT hybrid. The laboratory of N.L.K. received support from the Searle Scholars Program, the Burroughs Wellcome Fund, the Sloan and Packard Foundations, and the National Institutes of Health (GM 067193). F.M. received a Drickamer Fellowship at the University of Illinois and an ACS DAC Graduate Fellowship sponsored by Dow.

SUPPORTING INFORMATION AVAILABLE

An Excel spreadsheet of identified and characterized protein forms. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review December 17, 2003. Accepted February 26, 2004. AC0354903