Anal. Chem. 2000, 72, 1918-1924

Accurate Mass Multiplexed Tandem Mass Spectrometry for High-Throughput Polypeptide Identification from Mixtures

Christophe Masselon, Gordon A. Anderson, Richard Harkewicz, James E. Bruce, Ljiljana Pasa-Tolic, and Richard D. Smith*

Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352

We report a new tandem mass spectrometric approach for the improved identification of polypeptides from mixtures (e.g., using genomic databases). The approach involves the dissociation of several species simultaneously in a single experiment and provides both increased speed and sensitivity. The data analysis makes use of the known fragmentation pathways for polypeptides and highly accurate mass measurements for both the set of parent polypeptides and their fragments. The accurate mass information makes it possible to attribute most fragments to a specific parent species. We provide an initial demonstration of this multiplexed tandem MS approach using an FTICR mass spectrometer with a mixture of seven polypeptides dissociated using infrared irradiation from a CO2 laser. The peptides were added to, and then successfully identified from, the largest genomic database yet available (C. elegans), which is equivalent in complexity to that for a specific differentiated mammalian cell type. Additionally, since only a few enzymatic fragments are necessary to unambiguously identify a protein from an appropriate database, it is anticipated that the multiplexed MS/MS method will allow the more rapid identification of complex protein mixtures with on-line separation of their enzymatically produced polypeptides.

The advent of the “postgenomic” era has increased demands for both the throughput and sensitivity of protein identification.1 Currently, the identification of proteins from complex mixtures is generally conducted using 2-D polyacrylamide gel electrophoresis (2-D PAGE) separations followed by spot excision, digestion, and matrix-assisted laser desorption/ionization mass spectrometry2,3 (MALDI-MS) or on-line liquid chromatography (LC) separation followed by electrospray ionization (ESI) tandem MS (e.g., MS/MS).4 An alternative route to higher throughput protein identification, where 2-D PAGE separations are circumvented, has been pioneered by Yates and co-workers based upon the analysis of protein mixtures.5,6 In this approach, groups of proteins are digested to produce mixtures of polypeptides that are directly analyzed using an on-line separation (e.g., capillary LC) and MS/MS. Since it is generally the case that many polypeptides will coelute (i.e., be detected in the same spectrum), the ability to address complex protein mixtures depends on how many polypeptides can be individually subjected to MS/MS experiments and how quickly data of sufficient quality for their identification can be accumulated. Previous work has demonstrated the utility of stopping or slowing capillary electrophoresis7 or LC8 flow rates. This enables additional MS/MS experiments to be conducted during the elution time of individual polypeptides and allows additional polypeptides to be identified for a given sample quantity consumed (i.e., greater overall sensitivity). The ability to characterize multiple polypeptides simultaneously using MS/MS would enhance both sensitivity and throughput (with or without the use of reduced elution speed methods).

We report here a new tandem mass spectrometric approach for faster and more sensitive identification of polypeptides in mixtures. This approach is based on the simultaneous MS/MS of multiple polypeptides combined with the use of highly accurate mass measurements. In the present case, the high mass measurement accuracy is provided by Fourier transform ion cyclotron resonance (FTICR) mass spectrometry, although the principles are generally applicable when sufficiently high mass measurement accuracy is achieved. Previous “multiplexing” approaches using tandem FTICR mass spectrometry have involved comprehensive 2-D studies9 and Hadamard transform methods.10 These methods used multiple spectrum acquisitions in which different subsets of parent ions were simultaneously dissociated. Processing of the whole data set then allows the attribution of all fragments to the appropriate parent species. In the present approach, a set of parent species is dissociated and measured in a single mass spectrum (i.e., in exactly the same manner that a single parent species would be studied), providing both enhanced sensitivity and a gain in throughput directly proportional to the number of species that may be simultaneously addressed. We show that the present method depends crucially on high mass measurement accuracy and, given this, has the potential for polypeptide identification from the largest genomic databases yet available. The initial results presented here indicate that the multiplexed MS/MS approach can maintain the high confidence levels for polypeptide identification by the use of high-accuracy mass measurements while also providing increased throughput. The example presented also includes an initial demonstration indicating that certain types of modified peptides will be amenable to this approach.

(1) Yates, J. R. J. Mass Spectrom. 1998, 33, 1-19.
(2) Strupat, K.; Karas, M.; Hillenkamp, F.; Eckerskorn, C.; Lottspeich, F. Anal. Chem. 1994, 66, 464-470.
(3) Liang, X. L.; Bai, J.; Liu, Y. H.; Lubman, D. M. Anal. Chem. 1996, 68, 1012-1018.
(4) Shevchenko, A.; Jensen, O. N.; Podtelejnikov, A. V.; Sagliocco, F.; Wilm, M.; Vorm, O.; Mortensen, P.; Boucherie, H.; Mann, M. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14440-14445.
(5) Ducret, A.; Van Oostveen, I.; Eng, J. K.; Yates, J. R.; Aebersold, R. Protein Sci. 1998, 7, 706-719.
(6) Yates, J. R.; McCormack, A. L.; Eng, J. Anal. Chem. 1996, 68, 534A-540A. McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R. Anal. Chem. 1997, 69, 767-776.
(7) Goodlett, D. R.; Wahl, J. H.; Udseth, H. R.; Smith, R. D. J. Microcolumn Sep. 1993, 5, 57-62.
(8) Moore, R. E.; Licklider, L.; Schumann, D.; Lee, T. D. Anal. Chem. 1998, 70, 4879-4884.
(9) Ross, C. W.; Guan, S. H.; Grosshans, P. B.; Ricca, T. L.; Marshall, A. G. J. Am. Chem. Soc. 1993, 115, 7854-7861.
(10) Williams, E. R.; Loh, S. Y.; McLafferty, F. W.; Cody, R. B. Anal. Chem. 1990, 62, 698-703.

10.1021/ac991133+ CCC: $19.00
© 2000 American Chemical Society. Published on Web 03/17/2000
Analytical Chemistry, Vol. 72, No. 8, April 15, 2000

Table 1. Set of Polypeptides Analyzed with Sequence, Monoisotopic Molecular Masses, and Terminal Groups

polypeptide        sequence                Mw         N-terminus   C-terminus
renin inhibitor    HPFHLLVY                1024.549   H            H
bradykinin         RPPGFSPFR               1059.561   H            H
angiotensin        DRVYIHPFHL              1295.677   H            H
substance P        RPKPQQFFGLM-(NH2)       1346.728   H            NH2
fibrinopeptide A   ADSGEGDFLAEGGGVR        1535.685   H            H
neurotensin        (pGlu)LYENKPRRPYIL^a    1671.910   pGlu         H
γ-endorphin        YGGFMTSEKSQTPLVTL       1857.918   H            H

^a pGlu, pyroglutamic acid.

EXPERIMENTAL SECTION

Sample Preparation. All polypeptides used in this work were purchased from Sigma (St. Louis, MO) and used without further purification. Table 1 lists the names, sequences, and molecular weights of the polypeptides. Stock solutions were prepared by dissolving 1 mg of polypeptide in 1 mL of distilled water. The mixture was prepared by mixing 20-µL aliquots of each stock solution with 140 µL of a water/methanol/acetic acid mixture (49:50:1).

ESI-FTICR-IRMPD MS/MS. All experiments were performed using an 11.5-T FTICR mass spectrometer equipped with an external electrospray ion source and an elongated cylindrical open-ended cell, described elsewhere.11,12 The instrument was controlled by an Odyssey (Finnigan, Madison, WI) data station. Infrared laser dissociation was effected using a CO2 laser (model 48-2W-25W, Synrad, Mukilteo, WA) operated in the pulsed mode.
The pulse was achieved by gating the laser controller with a 5-V trigger delivered by the FTMS Odyssey workstation. An adjustable beam expander (Synrad) was mounted directly on the laser head, allowing the beam diameter to be varied from 3 to 7.5 mm. The beam passed through a ZnSe beam splitter and was directed along the magnetic field axis of the FTICR instrument by means of a silver-coated silicon mirror (TRS1.0PS-0339, Laser Power Optics, San Diego, CA). The beam entered the instrument through a BaF2 window (Bichron Harshaw, Solon, OH) mounted on a miniflange.

(11) Udseth, H. R.; Gorshkov, M. V.; Belov, M. L.; Pasa-Tolic, L.; Bruce, J. E.; Masselon, C. D.; Harkewicz, R.; Anderson, G. A.; Smith, R. D. Proceedings of the 47th ASMS Conference on Mass Spectrometry and Allied Topics, Dallas, TX, June 13-17, 1999.
(12) Harkewicz, R.; Anderson, G. A.; Bruce, J. E.; Masselon, C.; Udseth, H. R.; Smith, R. D., unpublished work.

The polypeptide solution was introduced to the ESI source at a rate of 0.3 µL/min using a Harvard Apparatus (Holliston, MA) model 22 syringe pump. A voltage of +1.8 to +2 kV was applied to the ESI emitter, and charged species were injected through a 500-µm-diameter heated metal capillary maintained at 160 °C. At the exit of the metal capillary, the ion beam was focused to the entrance of a quadrupole ion guide. The ions were accumulated for 500 ms in an external storage quadrupole before transfer to the FTICR cell. This arrangement allowed the trapping of the ions without pulsed gas introduction to the FTICR cell region and minimized the overall analysis time.12,13 The transfer to the FTICR trap was followed by stored waveform inverse Fourier transform (SWIFT)14 ejection of unwanted species, a selection mechanism that is readily extended to a data-dependent mode of operation, as demonstrated previously.15 The selected ions were then subjected to a 750-ms infrared laser pulse of ∼25-W power, and all ions were excited by a frequency chirp (100 Hz/µs, amplitude 75 Vp-p) and detected (1 Mb data points) at an acquisition frequency of 1.777 MHz. The entire sequence time for spectrum acquisition was less than 3 s, making it suitable for coupling with separation techniques such as LC.

Data Analysis and Database Searching. The data were analyzed using software developed in our laboratory. Time domain data were apodized (Hanning) and zero-filled twice before fast Fourier transform to produce a mass spectrum. The Horn mass transform algorithm16 was used to detect isotopic distributions in the mass spectrum. The threshold for peak finding was set to “5”; i.e., only peaks with intensity at least 5 times the average signal level of the noise were considered.
This threshold was selected on the basis of the mass accuracy needed (better than 2.5 ppm), since a peak position is better determined when the noise contributes little to the detected signal.17 The m/z values of the parent ions and of the fragments were then entered as two separate lists into a “Find Protein” module of the software program, which performed the database search. (The algorithm used is discussed in greater detail in the next section.) The database used consisted of predicted proteins initially reported for the full Caenorhabditis elegans genome sequence18 and contained 19 106 putative proteins, corresponding to 918 655 possible tryptic fragments (excluding partial digestion products). In the present work, we explicitly consider the possibility of incomplete digestion as well as several modifications, effectively making the size of the database much larger (see later discussion). The general approach for this initial study involved the addition of test analytes to the full database in order to evaluate the effectiveness of their identification (i.e., the ability to “find” these added peptides among all possible C. elegans tryptic peptides).

Data analysis and database search were performed on a 400-MHz Pentium II personal computer. The complete analysis of one data set using our approach took approximately 15-20 min, of which 2-3 min were spent loading the database into memory. The data analysis is done after the experiment and thus does not hinder the experimental throughput; it can readily be made faster and run in parallel, if so desired.

Simulations were also performed at various levels of mass measurement accuracy: 10 proteins from the C. elegans database were randomly selected, and tryptic fragments of these proteins within the mass range 800-2000 were randomly chosen. The parent ion masses of these tryptic fragments were computed, and a corresponding set of three “y” or “b” fragments per peptide was generated at random. The latter represented the multiplexed MS/MS species list as would be generated in actual experiments. Computer searches were then performed to identify these peptides in the entire C. elegans database. The results derived from these simulations were used to set the criteria for peptide identification, as discussed in the next section.

(13) Senko, M. W.; Hendrickson, C. L.; Emmett, M. R.; Shi, S. D. H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 8, 970-976.
(14) Marshall, A. G.; Wang, T.-C. L.; Ricca, T. L. J. Am. Chem. Soc. 1985, 107, 7893-7897.
(15) Jensen, P. K.; Pasa-Tolic, L.; Anderson, G. A.; Horner, J. A.; Lipton, M. S.; Bruce, J. E.; Smith, R. D. Anal. Chem. 1999, 71, 2076-2084.
(16) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom., in press.
(17) Chen, L.; Cottrell, C. E.; Marshall, A. G. Chemom. Intell. Lab. Syst. 1986, 1, 51-58.

RESULTS AND DISCUSSION

The objective of this work was to evaluate the feasibility of the multiplexed MS/MS approach for the more efficient identification of polypeptide mixtures.
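As an aside on the database construction described in the Experimental Section, the following is a minimal sketch of an in-silico tryptic digestion that also enumerates partial digestion products. It is not the authors' software. The cleavage rule (after K or R, but not before P), the function names, and the toy protein are assumptions made for illustration; the residue masses are standard monoisotopic values, and the 400-1900 Da window is the range quoted later in the text.

```python
# Standard monoisotopic residue masses in Da (not taken from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def cleavage_pieces(protein):
    """Split after every K or R that is not followed by P (assumed rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def digest(protein, mass_range=(400.0, 1900.0)):
    """Return (sequence, missed cleavages, mass) for complete and partial products."""
    pieces = cleavage_pieces(protein)
    products = []
    for i in range(len(pieces)):
        seq = ""
        for j in range(i, len(pieces)):
            seq += pieces[j]          # joining neighbours models a missed cleavage
            mass = peptide_mass(seq)
            if mass > mass_range[1]:
                break                 # longer joins can only be heavier
            if mass >= mass_range[0]:
                products.append((seq, j - i, mass))
    return products


# Hypothetical toy protein containing the bradykinin sequence from Table 1.
for seq, missed, mass in digest("AAKRPPGFSPFRDRVYIHPFHLK"):
    print(f"{seq:24s} missed={missed}  {mass:.3f}")
```

Under these assumptions the bradykinin piece RPPGFSPFR comes out at 1059.561 Da, in agreement with Table 1. Allowing every contiguous join of pieces is what makes the partial-digestion list much larger than the list of complete tryptic fragments.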
The goal of our efforts is the faster and/or more reliable identification of protein mixtures from genomic databases based upon the use of highly accurate mass measurements for the parent and MS/MS-produced fragments of polypeptides obtained by enzymatic digestion. This initial implementation of multiplexed MS/MS takes advantage of the ion selection and manipulation capabilities of FTICR and uses infrared multiphoton dissociation (IRMPD) as the dissociation method to further speed analysis. Since IRMPD does not require the introduction of a gas, the pressure reduction time necessary for effective MS/MS measurements by FTICR is significantly reduced.19,20 IRMPD can also be used to dissociate ions of many different m/z values simultaneously. In contrast to the conventional tandem mass spectrometric approach, where individual polypeptides are sequentially selected and dissociated,21-23 we simultaneously select and dissociate several species. In the case of the low-energy dissociation of polypeptides, a limited set of fragmentation pathways is usually favored,24 and we have found that high-accuracy mass measurements allow most of the fragment ions in a multiplexed MS/MS experiment to be attributed to a specific parent polypeptide. While some fragments may remain unassigned to a specific parent, our initial work indicates that even a small number of correctly attributed fragments can provide unambiguous identification of the parent species from an appropriate database.

A set of seven model polypeptides was selected in order to evaluate the multiplexed approach (see Table 1). The polypeptides exhibited a variety of structural features, and two incorporated simple posttranslational modifications (i.e., amidation of the C-terminus and pyroglutamic acid at the N-terminus).

(18) The C. elegans Sequencing Consortium. Science 1998, 282, 2012-2018.
(19) Little, D. P.; Speir, J. P.; Senko, M. W.; O’Connor, P. B.; McLafferty, F. W. Anal. Chem. 1994, 66, 2809-2815.
(20) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1-35.
(21) McLafferty, F. W. Tandem Mass Spectrometry; John Wiley and Sons: New York, 1983.
(22) Lin, T.; Glish, G. L. Anal. Chem. 1998, 70, 5162-5165.
(23) Price, W. D.; Schnier, P. D.; Williams, E. R. Anal. Chem. 1996, 68, 859-866.
(24) Papayannopoulos, I. A. Mass Spectrom. Rev. 1995, 14, 49-73.

Figure 1. Multiple ion selection for multiplexed MS/MS. The seven most abundant ions from the MS spectrum of the mixture (a) were selected using a SWIFT waveform (b). The masses reported in (b) are the computed neutral masses of the detected species.

Multiplexed MS/MS of Polypeptide Mixtures. Figure 1a shows the mass spectrum obtained for the mixture of seven polypeptides. In addition to the protonated or doubly protonated polypeptides, some minor species are evident. For multiplexed analysis, the most abundant species (marked by arrows in Figure 1a) were simultaneously selected using a SWIFT waveform (see Figure 1b). The waveform was generated to select the entire isotopic distribution for these species. The automation of the SWIFT selection procedure for data-dependent single-ion selection has already been demonstrated,15 and we have previously demonstrated the use of quadrupolar excitation to simultaneously select many different ionic species in a data-dependent mode.25 At present, while there are many issues related to data-dependent ion selection, methods for its implementation using FTICR are straightforward, and real application is largely constrained by the present state of software for data-dependent ion selection. We are presently developing automated data-dependent SWIFT selection for the application of multiplexed MS/MS with on-line separations. Once a subset of ions of interest had been selected, IRMPD was performed to simultaneously yield fragments for all selected species.
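The multiple-ion selection described above works because each m/z value has its own cyclotron frequency, so a waveform can excite (eject) everything except chosen frequency bands. The sketch below only computes those bands; it does not synthesize a SWIFT waveform. It assumes the unperturbed cyclotron relation f = eB/(2π·m/z), ignoring trapping-field shifts, together with singly protonated ions, a window of four isotopic peaks, and a 0.3 m/z margin. The function names and these parameters are hypothetical; only the 11.5-T field and the neutral masses come from the text and Table 1.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # C
DALTON_KG = 1.66053906660e-27           # kg
PROTON_DA = 1.007276                    # Da
C13_SPACING_DA = 1.003355               # 13C - 12C mass difference, Da


def mz(neutral_mass, z):
    """m/z of the (M + zH) ion carrying z positive charges."""
    return (neutral_mass + z * PROTON_DA) / z


def cyclotron_hz(mz_value, b_tesla=11.5):
    """Unperturbed cyclotron frequency for a given m/z at field B."""
    return ELEMENTARY_CHARGE_C * b_tesla / (2.0 * math.pi * mz_value * DALTON_KG)


def protect_band(neutral_mass, z=1, n_isotopes=4, margin=0.3, b_tesla=11.5):
    """(low, high) frequency band in Hz covering an isotopic distribution.

    Frequency falls as m/z rises, so the heaviest isotope sets the low edge.
    """
    mz_low = mz(neutral_mass, z) - margin
    mz_high = mz(neutral_mass + n_isotopes * C13_SPACING_DA, z) + margin
    return cyclotron_hz(mz_high, b_tesla), cyclotron_hz(mz_low, b_tesla)


# Neutral monoisotopic masses from Table 1; charge state 1+ is an assumption.
TABLE_1_MASSES = [1024.549, 1059.561, 1295.677, 1346.728,
                  1535.685, 1671.910, 1857.918]

for mass in TABLE_1_MASSES:
    low, high = protect_band(mass, z=1)
    print(f"{mass:9.3f} Da: keep {low / 1e3:8.3f} to {high / 1e3:8.3f} kHz")
```

With these constants an ion of m/z 1000 orbits at roughly 177 kHz in an 11.5-T field; a doubly protonated ion of the same mass has about half the m/z and therefore about twice the frequency.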
It should be noted that the dissociation threshold and dissociation efficiency of different polypeptide ions can vary somewhat depending on their structural features.24 Since low-energy fragmentation processes are primarily charge-induced, the presence of basic amino acids in the polypeptide affects the dissociation efficiency. Similarly, the presence of proline residues can increase the dissociation threshold.24 Since the laser continues to “heat up” any fragments formed, they can also be further activated and then dissociate. All polypeptides studied to date yielded measurable quantities of distinctive fragments in approximately 500 ms to 1 s irradiation time at the laser power used (data not shown). Thus, an irradiation time of 750 ms was used in the present work, but shorter irradiation periods should be feasible based upon the work of others19 and certainly with the use of higher laser power.

Figure 2a shows a multiplexed MS/MS spectrum of the seven species subjected to IRMPD. For replicate experiments using an irradiation time of 750 ms, between 92 and 105 species were present in the resulting multiplexed MS/MS spectra of the seven peptides. Importantly, the spectra showed very high similarity and led to the same search results as the example described. In the spectrum shown here, 105 distinct species (not counting isotopic peaks) were detected using the Horn mass transform algorithm, corresponding to polypeptide fragment ions as well as some residual intact parent ions. The data set used for polypeptide identification consisted of the masses of the seven parent species and the 105 detected species from the multiplexed MS/MS spectrum.

(25) Bruce, J. E.; Anderson, G. A.; Smith, R. D. Anal. Chem. 1996, 68, 534-541.

Figure 2. Multiplexed MS/MS spectrum of the seven selected species (a). Insets show expanded views of the m/z range 975-1100, containing more than 20 detected isotopic distributions (b) and of the m/z range 1347-1356 showing two overlapping isotopic distributions (c). Values reported in (b) and (c) are measured m/z values for the most abundant isotope of each distribution (b) or of each isotopic peak (c).

Polypeptide Identification. In this initial work, we have evaluated the multiplexed approach using the C. elegans full genome predicted protein database and accurate masses measured for both the parent species and their fragments. It is worth pointing out that no attempts at manual interpretation of the spectra were made. The only information used was the accurate m/z values obtained during the multiplexed MS/MS acquisition. The full 97-Mbase C. elegans genome sequence has an annotated set of 19 106 predicted proteins. An ideal tryptic digestion of these proteins would result in 918 655 tryptic polypeptides. However, since tryptic digestion may be incomplete, we actually searched against a larger database that included all possible partial digestion products from the putative proteins in the C. elegans full database; this includes 1 511 441 polypeptides in the mass range of 400-1900 Da. While this step may not be necessary, such increased complexity is likely more representative of real systems.

The peptide search algorithm is summarized in Figure 3. First, a list of all possible tryptic peptides (including the partial digestion products) was generated from the C. elegans predicted protein database, to which we added the sequences of the seven model polypeptides. The measured mass for each parent species (assuming 2 ppm mass accuracy) was then searched against all masses for species on this list of possible tryptic digestion products, resulting in a set of “candidates” for each parent species (see Table 2). The subsequent search was performed using only a set of major predicted fragment ion species originating from the possible “candidate” peptides. The list of possible fragment species for the candidates included all b and y fragments as well as product ions corresponding to the loss of water or ammonia from these same b and y fragments. Although additional fragmentation modes could be included with this approach, these modes predominate24 under the low-energy, slow-activation conditions that apply for IRMPD (less important fragmentation pathways may prove useful, however, to gain additional confidence after peptide identification, as proposed for the conventional tandem MS approach26). Thus, ions corresponding to the y, b, y - H2O, b - H2O, y - NH3, and b - NH3 species for all candidates were searched against the multiplexed MS/MS species list. When ions related to the same fragment appeared more than once (e.g., y8 and y8 - H2O), they were counted as only a single match. For all candidates, possible fragment masses were computed and compared to the list of masses from the multiplexed MS/MS spectrum (assuming 2.5 ppm mass accuracy in this case).

Figure 3. Algorithm used for the polypeptide search using multiplexed MS/MS data. In this initial work, the sequences for the set of model polypeptides selected for study were added to the full database of predicted polypeptides that could be produced by enzymatic digestion of proteins for C. elegans. The predicted polypeptides included all possible species from incomplete digestion, and studies also included the generation and use of modified polypeptide libraries. The search is performed in two steps. First, a list of the masses of all possible tryptic fragments of the proteins in the database is generated and those masses are compared with the list of masses of the parent ions. This results in a list of tryptic fragments that matched a parent mass, or candidates. The masses of all y, b, y - H2O, b - H2O, y - NH3, and b - NH3 fragments from all candidates are then generated by the program, and each candidate fragmentation is compared to the multiplexed MS/MS species list. If three or more of a candidate's fragments are in the multiplexed MS/MS spectrum, it is considered to be identified as one of the parent species.

Table 2. Search Results in the C. elegans Full Genome Database^a Unmodified or Modified To Account for Two Posttranslational Modifications (See Text)^b

               candidates^c           hits^d
parent mass    unmodfd   inc modfd    unmodfd   inc modfd    identification
1857.918       40        121          1         1            γ-endorphin
1671.910       25        89           0         1            neurotensin
1535.685       9         33           1         1            fibrinopeptide A
1346.728       48        166          0         1            substance P
1295.677       32        81           1         1            angiotensin I
1059.561       33        106          1         1            bradykinin
1024.549       16        61           1         1            renin inhibitor

^a A total of 19 106 putative proteins corresponding to 41 655 439 possible tryptic fragments, including all partial digestion products. ^b The search parameters assumed 2 ppm mass accuracy for parent species and 2.5 ppm for fragment species. ^c Based on only parent mass. ^d Based upon use of three or more distinctive fragments from the MS/MS spectrum.

The capability for peptide identification based on this information obviously depends on many different factors: the size of the parent polypeptides; the size of the database; the mass accuracy for parent and fragment species; and the “uniqueness” of the fragment masses. In this work, the extended C. elegans data set provided a level of complexity that should be similar to that encountered, e.g., for different mammalian cells (i.e., where only a subset of all possible proteins will be expressed). Since peptide (and ultimately protein) identification is based upon a limited set of accurate mass information, future efforts will be required to set well-defined criteria for identification based upon the explicit consideration of the needs for speed, sensitivity, and confidence in making assignments.

It should be noted that simulations provided an extremely useful tool in the evaluation of multiplexed MS/MS and were used extensively to test our approach prior to actual experiments. Simulations performed at the 10-peptide mixture level of complexity resulted in 99% of the tryptic peptides being uniquely identified when three MS/MS fragments were used, compared to 86% and 39% when only two fragments or a single fragment, respectively, were used. Thus, a polypeptide candidate was considered “identified” when three or more predicted fragment masses from this candidate matched masses measured in the multiplexed MS/MS spectrum. It is worth noting that the confidence level of the identification for a given peptide increases with the number of fragments retrieved in the multiplexed MS/MS spectrum; a scoring method implementing this observation is being developed.
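The two-step search and the three-fragment criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' “Find Protein” module: the function names are hypothetical, fragments are compared as deconvoluted neutral masses (b = sum of residues, y = sum of residues + H2O), only unmodified termini are handled, and the three-entry database with a reversed-sequence decoy is invented for the example. The 2 ppm and 2.5 ppm tolerances and the threshold of three fragments are those given in the text.

```python
# Standard monoisotopic masses in Da (not taken from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
AMMONIA = 17.02655


def within_ppm(measured, theoretical, ppm):
    """True if measured lies within +/- ppm of theoretical."""
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


def neutral_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def predicted_fragments(seq):
    """Map each b/y label to the masses of the ion and its H2O/NH3 losses."""
    frags = {}
    n = len(seq)
    for i in range(1, n):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i])
        y = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER
        frags[f"b{i}"] = (b, b - WATER, b - AMMONIA)
        frags[f"y{n - i}"] = (y, y - WATER, y - AMMONIA)
    return frags


def identify(parents, spectrum, database,
             parent_ppm=2.0, frag_ppm=2.5, min_frags=3):
    """Return {parent mass: [(sequence, matched fragment labels), ...]}."""
    results = {}
    for parent in parents:
        # Step 1: candidates are database peptides matching the parent mass.
        candidates = [s for s in database
                      if within_ppm(parent, neutral_mass(s), parent_ppm)]
        hits = []
        for seq in candidates:
            # Step 2: count fragment labels with any matching form, so that
            # e.g. y8 and y8 - H2O together contribute a single match.
            matched = sorted(
                label for label, masses in predicted_fragments(seq).items()
                if any(within_ppm(obs, theo, frag_ppm)
                       for theo in masses for obs in spectrum)
            )
            if len(matched) >= min_frags:
                hits.append((seq, matched))
        results[parent] = hits
    return results


# Toy example: bradykinin, its reversed sequence (isobaric decoy), angiotensin I.
database = ["RPPGFSPFR", "RFPSFGPPR", "DRVYIHPFHL"]
spectrum = [253.15387, 321.18008, 418.23284, 505.26487]  # b2, y2, y3, y4 of bradykinin
print(identify([1059.5614], spectrum, database))
```

In this toy case both RPPGFSPFR and its reversed sequence are candidates on parent mass alone, but only bradykinin reaches three matched fragments. The decoy matches a single value (its y2 - H2O coincides with the bradykinin b2 mass), which illustrates why one fragment is not sufficient and why isobaric fragments can receive duplicate assignments.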
The five unmodified polypeptides were uniquely identified using this approach, and there were no “hits” for the two modified polypeptides (see Table 2). Thus, the high mass accuracy of the approach provides (at least for this initial example) very high confidence and no “false positives”. Elimination of the set of test peptides from the C. elegans database resulted in no “hits”, providing further support for the approach.

An important practical test of the utility of the present approach, as well as its robustness, is qualitatively indicated by its ability to identify modified polypeptides. As indicated above, the two modified polypeptides were not identified from the unmodified C. elegans database (i.e., no “hits” were obtained). Modified peptides can be identified by expanding the present approach to include the appropriate mass differences resulting from the modification(s). In the present work, we expanded the database to include the two posttranslational modifications; i.e., all possible tryptic polypeptides in the database were amidated at the C-terminus and/or modified by loss of H2O at the N-terminus (as occurs in the case of pyroglutamic acid modification). The result was a database 4 times more complex than the initial one, since it contained the original polypeptides plus those same polypeptides with the N-terminus, the C-terminus, or both termini modified. In this case, all seven species were uniquely identified (see Table 2) using the accurate mass multiplexed MS/MS information with the same identification criteria as described above. Figure 4 shows the multiplexed MS/MS spectrum obtained for the seven polypeptides labeled with the assignments obtained by the search algorithm. Most of the isotopic distributions in the spectrum were attributed to one of the parents found in the database. The few duplicate assignments obtained result from actual isobaric fragments (i.e., same molecular formula), which can originate from more than one of the parent species.

(26) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-4399.

Figure 4. Multiplexed MS/MS spectrum of the seven selected species, labeled with fragment attributions obtained during the database search. Insets show expanded views.

The high confidence levels demonstrated by these results are strongly dependent upon the mass measurement accuracy achieved for both the parent and fragment species. To illustrate the need for high mass accuracy, we performed the same peptide search assuming a lower mass accuracy: instead of 2 ppm for the parent species and 2.5 ppm for the fragment species, we used 10 ppm for both. As shown in Table 3, the ability to identify a specific parent species was generally lost, and 47 “hits” were found in the modified polypeptide database for the seven parent species from among a total of 3 627 candidate polypeptides. Note also that at such a reduced mass accuracy, hits corresponding to the two modified peptides were found in the unmodified database, leading to “false positive” identifications. A detailed study of the issues involved, and the refinement of search algorithms aimed at achieving confident identification, will be the subject of a future publication.

Table 3. Search Results from C. elegans Full Genome Database Unmodified and Modified To Account for Two Posttranslational Modifications (See Text), Assuming a 10 ppm Mass Accuracy for Both Parent and Fragment Species

               candidates             hits
parent mass    unmodfd   inc modfd    unmodfd   inc modfd
1857.918       135       506          7         14
1671.910       193       690          3         13
1535.685       40        128          1         2
1346.728       195       647          1         10
1295.677       135       537          2         2
1059.561       185       680          2         5
1024.549       89        439          1         1

CONCLUSIONS

A new approach for faster and more sensitive polypeptide identification in mixtures has been developed involving the simultaneous infrared multiphoton dissociation of multiple polypeptide ions and their identification based upon accurate mass information for both the parent and fragment species. The approach exploits the understanding of low-energy polypeptide fragmentation pathways. The multiplexed MS/MS approach has been initially demonstrated using a mixture of seven polypeptides, resulting in the unambiguous identification of all (including two modified species) from the C. elegans full genome predicted protein database modified to include those peptides. Importantly, the C. elegans proteome is equivalent in complexity to that projected for a specific differentiated mammalian cell type. The present results demonstrate the high reliability of the multiplexed MS/MS approach and highlight the advantages of FTICR both for enabling multiplexed MS/MS and for providing the high mass measurement accuracy that aids confident polypeptide identification.

This approach should enable substantially increased throughput and confidence in protein identification after enzymatic digestion and decreased sample consumption (since all species are examined in a single experiment) and should be practical in conjunction with on-line separations. In the latter regard, we have developed a new approach for calibration that avoids the need for internal calibrants and further enhances the practicality of obtaining high mass measurement accuracies with FTICR during on-line separations.27 Finally, we note that multiplexed MS/MS of many more species than used in this initial demonstration should be feasible. Indeed, limited simulations involving 25-50 polypeptides indicate that more than 95% can be identified with high confidence from a single experiment using the level of mass measurement accuracy assumed in the present work. Increased mass accuracy should be achievable, and the practical constraint for the number of polypeptides addressable is expected to be primarily based upon the product spectrum sensitivity.
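To make the role of mass measurement accuracy concrete, the short sketch below converts the ppm tolerances used in this work into absolute mass windows and compares the candidate counts reported in Tables 2 and 3 for the database including modifications. The counts are copied from those tables; the helper name and the comparison itself are an illustration added here, not an analysis from the original work.

```python
def mass_window_da(mass, ppm):
    """Half-width in Da of a +/- ppm tolerance window at the given mass."""
    return mass * ppm * 1e-6


parent_masses = [1857.918, 1671.910, 1535.685, 1346.728,
                 1295.677, 1059.561, 1024.549]
# Candidates in the database including modifications (parent mass only).
candidates_2ppm = [121, 89, 33, 166, 81, 106, 61]        # Table 2
candidates_10ppm = [506, 690, 128, 647, 537, 680, 439]   # Table 3

for mass, c2, c10 in zip(parent_masses, candidates_2ppm, candidates_10ppm):
    print(f"{mass:9.3f} Da: +/-{mass_window_da(mass, 2):.4f} Da -> {c2:3d} candidates; "
          f"+/-{mass_window_da(mass, 10):.4f} Da -> {c10:3d} candidates")

ratio = sum(candidates_10ppm) / sum(candidates_2ppm)
print(f"total candidates: {sum(candidates_2ppm)} at 2 ppm, "
      f"{sum(candidates_10ppm)} at 10 ppm (ratio {ratio:.1f})")
```

The window at 10 ppm is five times wider than at 2 ppm, and the total candidate count grows by a similar factor (657 to 3 627), so the fragment-matching step has to discriminate among several times as many candidates while its own tolerance is also looser.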

(27) Bruce, J. E.; Anderson, G. A.; Brands, M. D.; Pasa-Tolic, L.; Smith, R. D. J. Am. Soc. Mass Spectrom., in press.


ACKNOWLEDGMENT

We thank the U.S. Department of Energy, Office of Biological and Environmental Research, for support of this research. Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. Department of Energy through Contract DE-AC06-76RLO 1830.

Received for review September 30, 1999. Accepted January 28, 2000. AC991133+