Anal. Chem. 1999, 71, 4397-4402

Technical Notes

Identification of Intact Proteins in Mixtures by Alternated Capillary Liquid Chromatography Electrospray Ionization and LC ESI Infrared Multiphoton Dissociation Fourier Transform Ion Cyclotron Resonance Mass Spectrometry

Weiqun Li,† Christopher L. Hendrickson, Mark R. Emmett, and Alan G. Marshall*,†

Center for Interdisciplinary Magnetic Resonance, National High Magnetic Field Laboratory and Department of Chemistry, Florida State University, Tallahassee, Florida 32310

Here we propose a novel method for rapidly identifying proteins in complex mixtures. A list of candidate proteins (including provision for posttranslational modifications) is obtained by searching a database within a specified mass range about the mass of the intact protein, which is measured accurately (e.g., ±0.1 Da at 10 kDa) by capillary liquid chromatography electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (LC ESI FT-ICR MS). On alternate scans, LC ESI infrared multiphoton dissociation (IRMPD) FT-ICR MS yields mostly b and y fragment ions for each protein, from which the correct candidate is identified as the one with the highest “hit” score (i.e., most b and y fragments matching the candidate database protein amino acid sequence masses) and sequence “tag” score (based on a series of fragment sequences differing in mass by 1 or 2 amino acids). The method succeeds in uniquely identifying each of a mixture of five proteins treated as unknowns (melittin, ubiquitin, GroES, myoglobin, carbonic anhydrase II), from more than 1000 possible database candidates within a ±500 Da mass window. We are also able to identify posttranslational modifications of two of the proteins (melittin and GroES). The method is simple, rapid, and definitive and is extendable to a mixture of affinity-selected proteins, to identify proteins with a common biological function.

* Corresponding author: (tel.) 850-644-0529, (fax) 850-644-1366, (e-mail) [email protected].
† Department of Chemistry.
10.1021/ac990011e CCC: $18.00 © 1999 American Chemical Society. Published on Web 09/03/1999

In proteome analyses, proteins are usually first separated by 2-D gel electrophoresis, then digested in-gel by an enzyme with cleavage specificity. For each proteolytic peptide mixture, the peptide nominal masses, together with the protein molecular mass, cleavage specificity, isoelectric point, and possible modifications during the gel procedure, are compared with possible values for

each protein in a protein sequence database.1-5 Such “peptide fingerprint mapping” may be limited by insufficient search specificity, due to incomplete digestion, sample impurities, posttranslational modifications, and/or database sequence errors. A particular proteolytic peptide may be mass-selected and dissociated in the mass spectrometer. The tandem mass spectrum (MS/MS) of the collision-induced dissociation (CID) fragments is related to the peptide sequence. A protein database can thus be searched to match the peptide fragmentation pattern represented by its MS/MS spectrum.6-10 The correctly identified protein is taken as the one that best matches the candidate peptide masses and their CID spectra. The confidence of the assignment increases greatly when more peptides from the same protein are selected for MS/MS experiments and database searching; that approach can provide high-confidence identification and is highly tolerant of posttranslational modifications or errors in the database.6 However, the overall method is relatively slow and labor-intensive, due to the requisite 2-D gel electrophoresis and enzymatic digestion steps. Another variation of the approach is to digest a simple protein mixture without prior separation and then subject the complex proteolytic peptide mixture to LC/MS/MS.11 That method avoids use of 2-D gel electrophoresis for protein separation and has been applied to affinity-extracted protein mixtures, such as those obtained by immunoprecipitation or by elution from affinity chromatography.11

(1) Henzel, W.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011-5015.
(2) Pappin, D.; Hojrup, B.; Bleasby, A. J. Curr. Biol. 1993, 3, 327.
(3) Yates, J. R., III; Speicher, S.; Griffin, P. R.; Hunkapiller, T. Anal. Biochem. 1993, 214, 397.
(4) Mann, M.; Hojrup, P.; Roepstorff, P. Biol. Mass Spectrom. 1993, 22, 338.
(5) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun. 1993, 195, 58.
(6) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-4399.
(7) Mann, M.; Wilm, M. Trends Biochem. Sci. 1995, 20, 219-223.
(8) Yates, J. R., III; Eng, J.; McCormack, A. L. Anal. Chem. 1995, 67, 3202-3210.
(9) Yates, J. R., III; Eng, J.; McCormack, A. L.; Schieltz, D. Anal. Chem. 1995, 67, 1426-1436.
(10) Qin, J.; Steenvoorden, R. J. J. M.; Chait, B. T. Anal. Chem. 1996, 68, 1784-1791.
(11) McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R., III Anal. Chem. 1997, 69, 767-776.

Analytical Chemistry, Vol. 71, No. 19, October 1, 1999

An alternative approach is to identify intact proteins, rather than their enzymatically cleaved fragments. That approach removes the need for prior 2-D gel electrophoresis and enzymatic cleavage, thereby providing advantages in speed, direct determination of protein molecular weight, and identification of posttranslational modifications. Although matrix-assisted laser desorption/ionization (MALDI) in-source decay of intact proteins with delayed extraction and linear time-of-flight mass analysis has been shown to provide partial sequence information for several pure proteins,12 Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry13 offers a more powerful and versatile method. The unequaled mass resolving power and mass accuracy of FT-ICR MS make it uniquely suitable for characterizing both intact proteins14-18 and components of complex mixtures,19,20 such as peptides. Its inherent ion-trapping ability makes FT-ICR MS ideal for MSn experiments to yield amino acid primary sequences.21-24 The dynamic range of FT-ICR MS is vastly improved by on-line chromatographic separation.25 Infrared multiphoton dissociation (IRMPD)26 is especially well-suited to LC FT-ICR MS because no gas load is needed, thereby shortening the duty cycle for successive mass analyses. IRMPD also provides for on-axis fragmentation (for more efficient observation of product ions), good control over ion excitation energy, and minimal mass discrimination. IRMPD combined with FT-ICR analysis of the resulting fragment ions has previously been shown to provide an efficient and selective fragmentation method.26 McLafferty et al.
have shown that IRMPD fragmentation of continuously infused intact proteins can yield enough sequence information to identify the proteins uniquely.22 IRMPD FT-ICR MS is fast (a few seconds) and does not require prior enzymatic degradation. Because no collision gas is needed during the IRMPD experiment, no additional pump-down delay is needed, and the LC/MS scan rate can be relatively fast (e.g., ∼5 s per scan for acquisition of 256K time-domain data on our Odyssey data station).

In this work, we perform alternated ESI MS and ESI IRMPD MS during a single HPLC run, to rapidly identify proteins in a mixture. A mixture of five known proteins is separated by on-line HPLC over a capillary C3 reversed-phase column, with the eluent fed directly into the ESI source and subsequent external ion accumulation before mass analysis with a 9.4 T FT-ICR mass spectrometer. LC/MS and LC/IRMPD/MS data are collected alternately. A computer program has been developed to search the IRMPD spectra against a protein database (GenPept or SwissProt in FASTA format). We achieve rapid identification of the constituent proteins and are able to identify several posttranslational modifications. Because a single LC/MS run takes only 30 min to 1 h and data processing can be automated, rapid protein identification becomes possible.

(12) Reiber, D. C.; Grover, T. A.; Brown, R. S. Anal. Chem. 1998, 70, 673-683.
(13) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spec. Rev. 1998, 17, 1-35.
(14) McLafferty, F. W. Acc. Chem. Res. 1994, 27, 379-386.
(15) Wu, Q.; Van Orden, S.; Cheng, X.; Bakhtiar, R.; Smith, R. D. Anal. Chem. 1995, 67, 2498-2509.
(16) Senko, M. W.; Hendrickson, C. L.; Pasa-Tolic, L.; Marto, J. A.; White, F. M.; Guan, S.; Marshall, A. G. Rapid Commun. Mass Spectrom. 1996, 10, 1824-1828.
(17) Kelleher, N. L.; Senko, M. W.; Little, D. P.; O’Connor, P. B.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 220-221.
(18) Buchanan, M. V.; Hettich, R. L. Anal. Chem. 1993, 65, 245A-259A.
(19) Guan, S.; Marshall, A. G.; Scheppele, S. E. Anal. Chem. 1996, 68, 8, 4671.
(20) Rodgers, R. P.; White, F. M.; Hendrickson, C. L.; Marshall, A. G.; Anderson, K. V. Anal. Chem. 1998, 70, 4743-4750.
(21) Huang, Y.; Pasa-Tolic, L.; Guan, S.; Marshall, A. G. Anal. Chem. 1994, 66, 4385-4389.
(22) Mortz, E.; O’Connor, P. B.; Roepstorff, P.; Kelleher, N.; Wood, T. D.; McLafferty, F. W.; Mann, M. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8264-8267.
(23) Wood, T. D.; Chen, L. H.; Kelleher, N. L.; Little, D. P.; Kenyon, G. L.; McLafferty, F. W. Biochemistry 1995, 34, 16251-16254.
(24) Solouki, T.; Pasa-Tolic, L.; Jackson, G. S.; Guan, S.; Marshall, A. G. Anal. Chem. 1996, 68, 3718-3725.
(25) Emmett, M. R.; White, F. M.; Hendrickson, C. L.; Shi, S. D.-H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 9, 333-340.
(26) Little, D. P.; Speir, J. P.; Senko, M. W.; O’Connor, P. B.; McLafferty, F. W. Anal. Chem. 1994, 66, 2809-2815.

MATERIALS AND METHODS

Protein Mixture. Ubiquitin (bovine), GroES (Escherichia coli), melittin (bee venom), myoglobin (horse heart), and carbonic anhydrase II (bovine erythrocytes) were purchased from Sigma Chemical Co. (St. Louis, MO). Myoglobin, ubiquitin, and GroES concentrations were each 50 fmol/µL, whereas melittin was 10 fmol/µL and carbonic anhydrase II was 100 fmol/µL. Total injection volume was 10 µL. The total molar amounts of myoglobin, ubiquitin, and GroES were thus 500 fmol each, compared with 100 fmol of melittin and 1 pmol of carbonic anhydrase II.

Capillary HPLC Separation. HPLC grade solvents, water and acetonitrile, were purchased from J. T. Baker (Phillipsburg, NJ); formic acid from Sigma; and a Zorbax C3 (5 µm) reversed-phase capillary column (320-µm i.d., 15-cm-long) from MicroTech Scientific (Sunnyvale, CA). Samples were injected through a Rheodyne 9725 injector (Alltech, Deerfield, IL) with a 25-µL PEEK loop. Capillary HPLC was performed with two Shimadzu LC-10AD pumps.
A microflow splitter (LC-packings, San Francisco, CA) produced a gradient at a flow rate of ∼2 µL/min from an input flow rate of 100 µL/min. The mobile phase for gradient elution consisted of (A) water/acetonitrile (90:10 v/v) and (B) water/acetonitrile (10:90 v/v), each containing 0.5% formic acid. The gradient started at 20% B, equilibrated for 10 min after sample injection, then linearly increased from 20% B to 40% B over the next 20 min, to 50% over the following 5 min, to 70% over the following 5 min, to 86% over the following 2 min, and to 90% during the last 1 min.

Continuous External Ion Accumulation. Mass analysis was performed with a home-built 9.4 T FT-ICR mass spectrometer controlled by an Odyssey data station (Finnigan FTMS, Madison, WI).16 The electrospray needle was 50 µm i.d. fused silica with a tapered end.25 Ions were accumulated in a first rf-only octopole before transfer through a second octopole of the ion injection system to the ICR cell in the center of the magnet.27 Continuous ion accumulation was achieved by applying a d.c. trapping voltage (9.75 V) to the end caps (separated by 45 cm) of the first octopole except during the ion transfer period (1-2 µs). The total mass analysis time was ∼5 s for 256K time-domain data. The duty cycle for each LC/MS analysis was thus nearly 100%.

(27) Senko, M. W.; Hendrickson, C. L.; Emmett, M. R.; Shi, S. D.-H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 8, 970-976.

Figure 1. Stages in the protein database search starting from the parent ion at 10 380.6 Da (monoisotopic mass). The IRMPD mass spectrum (top) is first transformed into a zero-charge mass spectrum (middle).29 The GenPept (release 107.0, NCBI) protein database is then searched for matches to the protein mass as well as its IRMPD fragments (bottom). A scoring system based on fragment hits (mass matches) and sequence tags (partial sequence matches; see Figure 2) provides criteria for identifying the correct protein candidate as E. coli GroES.

LC/MS and LC-IRMPD/MS by FT-ICR MS. Eluent from the capillary HPLC column was sprayed directly into the FT-ICR instrument through a tapered-end fused-silica needle at the end of the column. No union was used to connect the electrospray needle with the HPLC column, thereby effectively eliminating dead volume after the capillary column. Infrared multiphoton dissociation (IRMPD) generated fragment ions. A SYNRAD carbon dioxide IR laser (SYNRAD, Inc., Mukilteo, WA) operated at 32 W for 0.5 s during the fragmentation period. All of the parent ions in one HPLC scan were subjected to IRMPD without any prior ion isolation or ejection. LC/MS and LC-IRMPD/MS data were collected alternately.

Database Analysis of LC-IRMPD/MS Spectra. A computer program was written in LabWindows CVI (National Instruments, Austin, TX) on a Windows 95 platform. The GenPept Protein Database was obtained as ASCII text files in FASTA format from GenBank (Release 107.0, National Center for Biotechnology Information, Washington, DC). All searches were performed on a DELL OptiPlex GXpro (Pentium Pro 200) personal computer.

RESULTS AND DISCUSSION

We now apply the above procedures to test whether the method can correctly identify each component of a mixture of five known proteins. The known proteins exhibit natural-abundance isotopic distributions, so we analyze each of those proteins according to its average isotopic mass.

Protein Database Searching Algorithm. Our protein identification strategy begins with optional steps for simplification of the original m/z mass spectrum (see Figure 1, for GroES protein expressed from a 13C,15N doubly depleted28 E. coli culture). First, we deconvolve the m/z domain spectrum into a zero-charge mass spectrum (second panel from the top of Figure 1). The algorithm for this deconvolution has been described elsewhere.29 The zero-charge mass spectrum is simpler than the original one because it combines data from different charge states. Second, we limit the database search to proteins whose intact protein mass falls within a specified range (up to ±500 Da) of the experimentally determined mass. That range is designed to allow for inclusion of posttranslational modifications that may increase or decrease the protein molecular weight relative to that expected from the cDNA sequence alone. The algorithm currently recognizes several common posttranslational modifications (so far: amidation of the carboxyl terminus; methylation, acetylation, or formylation of the amino terminus; carboxylation of Cys; presence or absence of Met as the first residue; and cleavage of a signal peptide of 15-50 amino acids from the N-terminus). With those modification possibilities included, the database search can be limited to match the experimental (modified) protein to within less than 1 Da. If no known modifications yield a match, then the difference between the measured mass and the calculated mass is attributed to an unknown modification on either the N- or C-terminus. These assumptions are then tested as follows. Once a candidate protein sequence is retrieved, the computer generates a list of all possible b and y fragments. If the measured mass of the intact protein and the calculated mass differ, then the mass difference is assigned to putative posttranslational modification(s) of unknown chemical composition. That mass difference is then incorporated into all b or all y fragments.
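The retrieval and fragment-generation steps just described can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' LabWindows CVI program: the function names (`intact_mass`, `candidates_in_window`, `by_fragments`), the dictionary-based database, and the neutral-fragment convention (b = sum of residue masses, y = sum of residue masses plus water) are all hypothetical choices made here, and the residue masses are commonly tabulated average values.

```python
# Average residue masses in Da (commonly tabulated values; an assumption here).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O in Da


def intact_mass(seq):
    """Average mass of the unmodified protein: residues plus one water."""
    return sum(AVG_RESIDUE_MASS[r] for r in seq) + WATER


def candidates_in_window(database, measured_mass, window=500.0):
    """Return (name, sequence, mass_difference) for every database entry
    whose calculated mass lies within +/- window of the measured mass."""
    found = []
    for name, seq in database.items():
        diff = measured_mass - intact_mass(seq)
        if abs(diff) <= window:
            found.append((name, seq, diff))
    return found


def by_fragments(seq, delta=0.0, terminus="N"):
    """Neutral b and y fragment masses, indexed by fragment number.

    delta is the unexplained mass difference; it is added to every b
    fragment if assigned to the N-terminus, or to every y fragment if
    assigned to the C-terminus.
    """
    n = len(seq)
    prefix = [0.0]
    for r in seq:
        prefix.append(prefix[-1] + AVG_RESIDUE_MASS[r])
    total = prefix[-1]
    b_shift = delta if terminus == "N" else 0.0
    y_shift = delta if terminus == "C" else 0.0
    b = {i: prefix[i] + b_shift for i in range(1, n)}
    y = {i: total - prefix[n - i] + WATER + y_shift for i in range(1, n)}
    return b, y


if __name__ == "__main__":
    # Hypothetical two-entry database and a measured mass 48.7 Da heavier
    # than the first entry.
    db = {"target": "ACDEFGHIK", "other": "W" * 40}
    measured = intact_mass(db["target"]) + 48.7
    for name, seq, diff in candidates_in_window(db, measured, window=100.0):
        b, y = by_fragments(seq, delta=diff, terminus="N")
        print(name, round(diff, 1), len(b), len(y))
```

In practice each candidate would be tried with the mass difference on the N-terminus and again on the C-terminus, and the assignment that matches more fragments would be kept.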
The masses of those fragments are compared with the IRMPD mass spectrum, and two scores (see next section) are assigned to each protein candidate. Potential modifications may thereby be identified and located. The scoring system operates as follows.

“Hit” Score and “Tag” Score. To evaluate the match between the fragments calculated from the database sequence and those obtained from the IRMPD spectrum, we designed two scoring systems. A “hit” score simply counts the number of matched (b- and y-) fragment masses. For a specified mass spectral signal magnitude threshold, the mass of any peak above the threshold is matched against the theoretical fragment masses. If the magnitude of a matched peak is more than twice the threshold, a score of 2 points is assigned to the match; if the magnitude is between 1 and 2 times the specified threshold, a score of 1 point is assigned. A mass that corresponds to loss of H2O or NH3 from a database b or y fragment counts 1 point. A mass that matches a b fragment with glutamic acid (Glu) or aspartic acid (Asp) at the carboxyl terminus, or a y fragment with Glu or Asp at the amino terminus, is also given 1 point, in view of the tendency for IRMPD to produce cleavage from the carboxyl end of Asp and Glu.26

(28) Marshall, A. G.; Senko, M. W.; Li, W.; Li, M.; Dillon, S.; Guan, S.; Logan, T. M. J. Am. Chem. Soc. 1997, 119, 433-434.
(29) Zhang, Z.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1998, 9, 225-233.

Figure 2. Scoring system for our database search program. The hit score simply counts the number of matched b and y fragment masses. The tag score gives weight only to sequence tags (i.e., a series of two or more fragment ions differing in mass by consecutive amino acid(s)). The hit score provides a linear criterion for protein identification, whereas the tag score constitutes a logarithmic scale (see text).

The specificity of matching gas-phase dissociated fragments to the masses calculated from the protein sequence database depends on the number of theoretical fragment ions that can be generated from the database within a certain protein mass window. In some cases, specificity may be insufficient to identify a protein uniquely, relative to its closely matched alternatives; therefore, another score has been devised to decide between such alternatives. A tag score gives weight only to sequence tags, an idea first proposed by Matthias Mann.6,7 A peptide sequence tag is a continuous ion series generated by MS/MS from which a short stretch of sequence (typically 2-3 amino acids) may be deduced. Combined with the mass from the cleavage position to the amino and carboxyl termini of the protein, a peptide sequence tag has a search specificity of >10^5 (i.e., about one out of 10^5 random amino acid sequences would contain such a tag).6 The peptide sequence tag idea has been extended to the gas-phase dissociation of an intact protein.22 Figure 2 (bottom panel) shows that the IRMPD spectrum of 13C,15N doubly depleted p16 protein (the preparation of which is described elsewhere28) yields several short sequence tags (e.g., b17, b18, b19, whose successive mass differences yield the masses of amino acids nos. 18 and 19). There are ∼20 possible amino acids per sequence position; thus the “specificity” for a single amino acid position tag is 20. In the same spirit, one can show that knowledge of the mass from a given sequence position to the N-terminus, C-terminus, or the mass between different tags adds additional specificity of ∼100 (i.e., the average mass of one amino acid in a protein with average amino acid composition).6 Because sequence tag information compounds multiplicatively, we score tags on a logarithmic scale. Thus, a tag score of 3 points (i.e., ∼ln(20)) is given if one amino acid is found from a sequence tag. A tag score of 5 points (i.e., ∼ln(100)) is assigned to a matched mass between two different tags or to the mass from the cleavage position to either terminus. Thus, our final tag score represents the logarithm of the search specificity: the higher the tag score, the more specific the search identification. Because the tag score is logarithmic, even a small difference in tag score can represent a large difference in the probability that the protein identification is correct.

If the IRMPD mass spectrum exhibits a low signal-to-noise (S/N) ratio, so that only a few peaks are observed, the number of matches may be low, resulting in a low “hit” score. In such a case, the tag score can give much more reliable identification, because a sequence tag has very high specificity. However, a mass spectrum with many peaks can also yield high tag scores for several protein candidates, due to many false-positive matches between the tandem mass spectrum and the candidate sequences.

Analysis of a Standard Protein Mixture. We first apply the above methods to a mixture of five known proteins, of molecular weights ranging from 2846 to 28 508 Da. Table 1 shows hit and tag scores for three of the known proteins: ubiquitin, myoglobin, and carbonic anhydrase II. In each case, the initial search was limited to a ±100 Da mass window about the measured average mass. For purposes of display, only proteins found within 1 Da of the measured average mass are shown in Table 1. All three known proteins are correctly and uniquely identified, without posttranslational modification.

Table 2 shows the hit and tag scores for several candidates for measured masses at 2844.8 (melittin) and 10 435.7 Da (GroES).
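The hit and tag scores reported in these tables follow the rules given above. As a rough illustration only, the sketch below implements a simplified version: the magnitude-based hit points (2 or 1) and a tag score of about ln(20) points per amino acid bracketed by consecutive matched fragments. The function names, the 0.5 Da matching tolerance, and the choice to award the ln(100) points once per tag are assumptions made here; the bonus points for H2O/NH3 loss and for cleavage next to Asp or Glu are omitted.

```python
import math

AA_POINTS = round(math.log(20))     # 3 points per amino acid in a tag
MASS_POINTS = round(math.log(100))  # 5 points for a tag-to-terminus mass


def hit_score(peaks, fragments, threshold, tol=0.5):
    """Sum hit points over observed peaks.

    peaks: list of (mass, magnitude) from the zero-charge IRMPD spectrum.
    fragments: dict of fragment index -> theoretical mass.
    A matched peak scores 2 if its magnitude exceeds twice the threshold,
    1 if it lies between the threshold and twice the threshold.
    """
    score = 0
    for mass, magnitude in peaks:
        if magnitude <= threshold:
            continue
        if any(abs(mass - m) <= tol for m in fragments.values()):
            score += 2 if magnitude > 2 * threshold else 1
    return score


def matched_indices(peaks, fragments, threshold, tol=0.5):
    """Indices of theoretical fragments matched by an above-threshold peak."""
    return sorted(
        i for i, m in fragments.items()
        if any(mag > threshold and abs(pm - m) <= tol for pm, mag in peaks)
    )


def tag_score(indices):
    """Score runs of consecutive matched fragment indices (sequence tags).

    A run of k consecutive fragments defines k - 1 amino acids; each run is
    also credited once for the mass from the tag to the terminus (assumption).
    """
    idx = sorted(set(indices))
    score = 0
    run_length = 1
    for prev, cur in zip(idx, idx[1:]):
        if cur == prev + 1:
            run_length += 1
        else:
            if run_length >= 2:
                score += AA_POINTS * (run_length - 1) + MASS_POINTS
            run_length = 1
    if run_length >= 2:
        score += AA_POINTS * (run_length - 1) + MASS_POINTS
    return score
```

For example, matched b17, b18, and b19 form one tag that defines two amino acids, so under these assumptions `tag_score([17, 18, 19])` gives 2 × 3 + 5 points.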
Each IRMPD mass spectrum was deconvolved to yield a zero-charge mass spectrum. Every protein from the SwissProt protein database (release 35.0, SwissProt, Swiss) that has an average mass within 100 Da of the experimental average mass was then retrieved. Any difference between the measured mass and the calculated mass is taken to represent a possible modification on either the N-terminal or C-terminal fragment. The highest score for the protein of measured mass 2844.8 Da is for melittin (bee venom) with C-terminal amidation (-OH replaced by -NH2, for a net mass change of -0.9 Da). The highest score for the protein of measured mass 10 435.7 Da is for chaperonin 10 (Cpn 10 or GroES) (E. coli) protein with a +48.7-Da modification on the N terminus. The 48.7-Da modification on the Cpn 10 is unknown. However, if that modification is assigned (on the basis of higher hit and tag scores) to the N-terminal fragments of E. coli Cpn 10 protein, the smallest b fragment that matches the modified Cpn 10 sequence is b14; therefore the 48.7-Da modification must be located between the N terminus and Arg14.

Table 1. Mass Spectral Identification of Carbonic Anhydrase II, Myoglobin, and Ubiquitin, Treated As Unknown Proteins^a

searched average mass (Da)   protein name                                                                        hit score   tag score
29 023.5                     bovine carbonic anhydrase II                                                        55          94
                             chlorophyll A-B binding protein of LHCI                                             18          25
                             ORF III [Acidaminococcus fermentans]                                                11          26
16 951.5                     horse myoglobin                                                                     64          70
                             phosphoribosylaminoimidazole carboxylase                                            16          18
                             hypothetical 17.1 KDa protein (Assume that start codon Met is cleaved)              13          27
8564.9                       hypothetical gene 9 zinc binding protein (Assume that start codon Met is cleaved)   5           18
                             human ubiquitin                                                                     52          47

^a Matching the experimentally measured LC/MS intact protein average mass to SwissProt nonredundant protein database proteins within a mass window of ±1 Da yields the proteins listed. For each candidate, the most likely protein is confirmed as the one with the highest hit (mass matches) and tag (partial sequence matches) scores based on matching LC/IRMPD/MS peptide masses to possible b and y fragments for that protein (see text).

Table 2. Mass Spectral Identification of Melittin and GroES (Chaperonin 10 Protein), Treated As Unknown Proteins^a

protein name: melittin (bee venom); melittin minor (MEL2•ADIME); melittin (MEL•APIDO); Cpn 10 (CH10•ECOLI); Cpn 10 (CH10•LEGPN); Cpn 10 (CH10•HAEIN); Cpn 10 (CH10•AGRTU)

Measured Parent Ion Average Mass: 2 844.8 Da
assumed modifications    hit score   tag score
C-terminal -0.9 Da       30          59
N-terminal -0.9 Da       16          34
C-terminal -87.0 Da      14          34
C-terminal -0.9 Da       10          20

Measured Parent Ion Average Mass: 10 435.7 Da
assumed modifications    hit score   tag score
N-terminal 48.7 Da       20          51
C-terminal 48.7 Da       8           18
N-terminal -39.6 Da      9           18
C-terminal 95.8 Da       8           18
C-terminal 48.7 Da       11          3

^a Matching the experimentally measured LC/MS intact protein average mass (to within 0.1 Da) to the SwissProt nonredundant protein database yields the proteins listed. For each candidate, the most likely protein is confirmed as the one with the highest hit and tag scores based on matching LC/IRMPD/MS peptide masses to possible b and y fragments for that protein (see text).

It is desirable not only to establish the most probable protein but also to have some idea of the number of possible proteins with similar scores. For each of three mass “windows” (±100, ±200, and ±500 Da from a protein measured average mass of either 10 435.7 Da (top) or 16 951.5 Da (bottom)), Table 3 shows the number of candidate proteins with hit or tag scores within each of several score ranges. For example, on the basis of hit scores for a parent ion of 16 951.5 Da, 360 candidate proteins are found within ±100 Da, 630 proteins within ±200 Da, and 1641 proteins within ±500 Da. In all three cases, horse myoglobin is the only candidate protein with a hit score greater than 50 and is therefore unambiguously identified. Unsurprisingly, several myoglobins from other species also score relatively high, due to sequence homology. The top panel of Table 3 shows similar hit and tag score distributions for a parent average mass of 10 435.7 Da. There are 326, 560, and 1307 proteins within the ±100, ±200, and ±500 Da mass ranges, respectively. Chaperonin 10 (GroES) protein from E. coli again scores highest and is indeed the correct choice.

Table 3. Number of Database Proteins Found by Searching within Each of Three Mass Windows Bracketing the Experimental Intact Protein Average Mass^a

mass window      ±100 Da        ±200 Da        ±500 Da
score range      hit    tag     hit    tag     hit    tag

Parent Protein Mass = 10 435.7 Da
0-9                 309            531            1245
10-20            16     10      28     16      61     37
20-30            1      5       1      10      1      21
30-40            0      1       0      1       0      2
40-50            0      0       0      1       0      1
>50              0      1       0      1       0      1
total               326            560            1307

Parent Protein Mass = 16 951.5 Da
0-9                 257            450            1110
10-20            83     37      148    63      381    165
20-30            17     49      29     96      47     215
30-40            2      12      2      16      2      44
40-50            0      4       0      4       0      5
>50              1      1       1      1       1      1
total               360            630            1641

^a Both hit and tag scores are computed for each candidate protein (on the basis of matches between the experimental IRMPD mass spectrum and possible b and y fragments for each candidate protein). The distribution of number of proteins within each of several score ranges is shown. Although the widest mass search window (±500 Da) finds more than 1000 candidate proteins, only one candidate (the correct one in both cases) gave a hit or tag score >50.

Figure 3. LC/MS total ion chromatograms (TIC) for the separation of five known proteins. Top: LC ESI FT-ICR MS of intact proteins; the inset shows a mass window spanning the isotopic distribution for a protein of average mass 8564.9 Da. Bottom: LC ESI IRMPD FT-ICR MS experiments for the same mixture. The two TICs were detected on alternate scans during the same HPLC run.

Alternated LC/MS and LC/IRMPD/MS. Figure 3 shows total ion chromatograms for the LC/MS and LC/IRMPD/MS of five known proteins. Note that the TIC profiles are similar for both intact proteins and fragments, showing that the fragment ions remain trapped and observable after IRMPD. Also note the isotopic distribution (inset in top panel) of the intact protein at 8564.9 Da (average mass). The corresponding IRMPD mass spectrum (inset in bottom panel) shows numerous fragments. A search of the entire protein database for the monoisotopic intact


protein, along with its possible b and y fragments, yielded the highest hit and tag scores for bovine ubiquitin.

CONCLUSIONS AND FUTURE DIRECTIONS

Advantages of the present approach include sensitivity (protein recovery from the reversed-phase HPLC column should be much greater than that from a 2-D gel); simplicity (dozens of proteins can in principle be identified from a single LC run); speed (∼30 min per LC/MS experiment); and accuracy (protein mass and fragmentation pattern unambiguously identify and approximately locate the site(s) of posttranslational modification). At this stage, the LC/IRMPD FT-ICR MS method already appears suitable for identification of a mixture of as many as 30-50 proteins, and that should be useful for identifying affinity-separated proteins (see below), large protein complexes, drug-targeted proteins, etc. One improvement would be to increase sensitivity by use of a smaller capillary HPLC column (say, 180 or 75 µm) and lower flow rate.25,30 Alternatively, if a protein is too large or its posttranslational modifications are too complex, CNBr could be used to generate relatively large peptide segments from the protein mixture (CNBr cleaves at the carboxyl side of Met, and the natural occurrence of Met is relatively low). Finally, for increasingly complex mixtures, HPLC separation could be replaced by capillary electrophoresis30-32 for better resolution in the on-line separation of proteins.

ACKNOWLEDGMENT

13C,15N doubly depleted GroES was produced with the helpful advice and assistance of M. Li and T. M. Logan at Florida State University. 13C,15N doubly depleted p16 protein was kindly provided by T. Selby and M.-D. Tsai at Ohio State University. We thank John P. Quinn for maintaining the high-performance 9.4 T FT-ICR instrument. This work was supported by NSF (CHE-9413008), NIH (GM-31683 and GM-54035), Florida State University, and the National High Magnetic Field Laboratory in Tallahassee, FL.

Received for review January 8, 1999. Accepted July 19, 1999. AC990011E

(30) Valaskovic, G. A.; Kelleher, N. K.; McLafferty, F. W. Science (Washington, D.C.) 1996, 273, 1199-1202.
(31) Hofstadler, S. A.; Swanek, F. D.; Gale, D. C.; Ewing, A. G.; Smith, R. D. Anal. Chem. 1995, 67, 1477-1480.
(32) Hofstadler, S. A.; Severs, J. C.; Smith, R. D.; Swanek, F. D.; Ewing, A. G. Rapid Commun. Mass Spectrom. 1996, 10, 919-922.