Anal. Chem. 1999, 71, 2595-2599

Accelerated Articles

High-Mass-Measurement Accuracy and 100% Sequence Coverage of Enzymatically Digested Bovine Serum Albumin from an ESI-FTICR Mass Spectrum

James E. Bruce, Gordon A. Anderson, Jenny Wen, Richard Harkewicz, and Richard D. Smith*

Separations and Mass Spectrometry Group, Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352

The application of Fourier transform ion cyclotron resonance (FTICR) mass spectrometry to the analysis of polypeptide mixtures resulting from proteolytic digestion is described. A new 11.5-T FTICR mass spectrometer has been applied for the analysis of tryptic digestion mixtures of the protein bovine serum albumin (BSA). The improved cyclotron frequency stability and reduced frequency shifts observed over a wide range of trapped ion population sizes provide the ability to signal average spectra without degrading mass measurement accuracy and without requiring internal calibration or advanced data processing schemes to compensate for variations in ion cyclotron signals brought about by different population sizes. A total of 100 spectra were signal-averaged, leading to the observation of a total of 123 isotope distributions with a signal-to-noise ratio greater than 3:1. From those distributions, 86 can be ascribed to tryptic fragments of BSA on the basis of mass measurement errors of 10 ppm or less. Of these, 71 were within 2 ppm error limits, corresponding to complete amino acid sequence coverage and an average error of 0.77 ppm. These results indicate that high-accuracy measurements are feasible for a large number of species detected simultaneously without the necessity for internal calibration and indicate the potential of such measurements, when combined with chromatographic separations, for facilitating more rapid identification of large numbers of proteins.

Mass spectrometry has become an important analytical method in many aspects of biological research in recent years, and perhaps most notably, for protein characterization.1-5 Both matrix-assisted laser desorption/ionization (MALDI)6 and electrospray ionization (ESI)7 have been successfully employed in conjunction with several types of mass analyzers to provide extremely useful capabilities for rapid protein identification and characterization. As a result, proteins are now commonly identified from extracted spots separated by 2D-PAGE in a serial approach involving 2D separation, extraction/digestion, and liquid chromatography/mass spectrometry (LC/MS) for mixture analysis. Given the impressive success of these approaches, there is great interest in extending these capabilities to mixtures of proteins, possibly reducing the dependence on the time-consuming 2D-PAGE stage of these analyses and greatly improving the speed of the application by processing many proteins in parallel. In this regard, Yates and co-workers have recently demonstrated the potential for the analysis of protein mixtures consisting of up to 75 proteins using LC/ESI-MS/MS.8

Mass spectrometry offers the ability to address complex mixtures of proteins and the ability to characterize the many co- and posttranslational modifications that result in a change in protein mass. However, proteome-wide characterization of proteins holds many challenges for mass spectrometry, including the identification of modifications and their extent, an area where the gene sequence is uninformative. Nevertheless, the detection of a limited set of unmodified peptides resulting from the digestion of a modified protein can provide the basis for protein identification, if the peptides can be confidently associated with a single protein. Although the process can be automated, it is still time-consuming; estimates indicate 5-10 proteins can be identified per day by a single operator.2 Furthermore, each step of sample handling can lead to sample losses. Consequently, there is currently a strong drive to couple on-line separations with mass spectrometry to provide a more sensitive and rapid method of protein identification from complex polypeptide mixtures.1

In general, high-mass-measurement accuracy greatly enhances the capability for unambiguous protein identification from the analysis of mixtures of enzymatically digested proteins. The large number of tryptic peptides resulting from the simultaneous digestion of many proteins can have nearly identical nominal masses. Fenyö et al.4 have provided a detailed discussion of the effects of mass measurement accuracy and other factors such as N-terminal amino acid sequence or the presence or absence of particular amino acids on protein identification from Saccharomyces cerevisiae (yeast). Calculations performed in our laboratory show that, for Escherichia coli, nearly all (96.3%) possible predicted proteins based upon putative open-reading frames (ORFs) have at least one unique tryptic fragment at a mass accuracy of 0.1 ppm.9 The few exceptions almost all correspond to duplicate genes or cases where predicted proteins are identical fragments of larger proteins. Similarly, 96.6% of all S. cerevisiae ORFs have unique tryptic peptides at a mass accuracy of 0.1 ppm.

(1) Figeys, D.; Aebersold, R. Electrophoresis 1998, 19, 885-892.
(2) Qin, J.; Fenyö, D.; Zhao, Y. M.; Wilson, C. J.; Young, R. A.; Chait, B. T. Anal. Chem. 1997, 69, 3339-4001.
(3) Yates, J. R. Electrophoresis 1998, 19, 893-900.
(4) Fenyö, D.; Qin, J.; Chait, B. T. Electrophoresis 1998, 19, 998-1005.
(5) Chaurand, P.; Leutzenkirchen, F.; Spengler, B. J. Am. Soc. Mass Spectrom. 1999, 10, 91-103.
(6) Hillenkamp, F.; Karas, M.; Beavis, R. C.; Chait, B. T. Anal. Chem. 1991, 63, 1193A-1203A.
(7) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64-71.
(8) McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R. Anal. Chem. 1997, 69, 767-776.

10.1021/ac990231s CCC: $18.00 © 1999 American Chemical Society. Published on Web 06/10/1999
Analytical Chemistry, Vol. 71, No. 14, July 15, 1999 2595
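The uniqueness calculation described above can be illustrated with a small in silico digest. The following is only a minimal sketch, not the program used for the E. coli and yeast calculations: the function names are hypothetical, cleavage is assumed to occur C-terminal to K or R except before P, residues are assumed to be unmodified, and masses are neutral monoisotopic masses built from standard residue masses.

```python
# Monoisotopic residue masses in Da (standard values; unmodified residues assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide for the free N- and C-termini


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(sequence, max_missed=0):
    """Cleave after K or R (assumed: not before P), allowing missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def proteins_with_unique_peptide(proteins, tol_ppm, max_missed=0):
    """Names of proteins that own at least one peptide mass which no other
    protein matches within tol_ppm (hypothetical helper)."""
    entries = sorted(
        (monoisotopic_mass(pep), name)
        for name, seq in proteins.items()
        for pep in set(tryptic_peptides(seq, max_missed))
    )
    unique = set()
    for k, (mass, name) in enumerate(entries):
        window = mass * tol_ppm * 1e-6
        clash = False
        for step in (-1, 1):  # scan the sorted neighbours below and above
            n = k + step
            while 0 <= n < len(entries) and abs(entries[n][0] - mass) <= window:
                if entries[n][1] != name:
                    clash = True
                    break
                n += step
            if clash:
                break
        if not clash:
            unique.add(name)
    return unique
```

Calling `proteins_with_unique_peptide` on a dictionary of ORF names and translated sequences with `tol_ppm=0.1` gives the kind of count quoted above. Sorting the masses once keeps the neighbour scan cheap even for a whole-genome peptide list.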
Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) provides the highest combination of simultaneous mass measurement accuracy, resolution, and sensitivity10 and thus should be unparalleled for characterization of complex mixtures of proteins and peptides. Recent work by Easterling et al. shows that, with careful consideration of trapped ion population sizes, routine parts-per-million mass accuracy can be obtained for high-mass ions with MALDI-FTICR.11 This publication describes the application of high-magnetic-field electrospray ionization-Fourier transform ion cyclotron resonance mass spectrometry to the analysis of enzymatically digested proteins.

EXPERIMENTAL SECTION

All experiments were performed with an FTICR mass spectrometer designed and constructed in this laboratory that is to be described in detail elsewhere.12 It should be noted that the spectrometer employed for the present results uses the same magnet but is a distinctly different spectrometer than that reported previously.13 Briefly, electrospray ionization7 is used to produce gas-phase ions of the molecules of interest, external to the magnetic field. After ionization, the ions are guided through the fringing fields of an 11.5-T superconducting magnet with the aid of four sets of rf-only quadrupoles. With six stages of differential pumping, the pressure is reduced from atmosphere to (2-3) × 10⁻⁹ Torr at the ICR cell. Ions are initially trapped external to the magnetic field within the quadrupole ion guide. As shown by Senko et al.,14 external trapping avoids the need for pulsed gas introduction to the ultrahigh-vacuum region and increases the maximum spectrum acquisition rate. After initial accumulation in the quadrupole ion guide, the ions are transferred to the trapped ion cell for mass analysis. For the data presented here, 128K data points were collected with an analog-to-digital converter rate of 592 kHz, corresponding to a low-mass limit of 589 Da. Thus, ICR signals were detected for 0.221 s, with a maximum theoretical resolution of about 20 000 for an ion with mass-to-charge ratio of 1000 and a cyclotron frequency of 174.27 kHz.

Enzymatic Digestion. Bovine serum albumin (BSA) was purchased from Sigma (St. Louis, MO) and was used as obtained. Sequencing grade trypsin was also obtained from Sigma and was dissolved in a 1 mM HCl solution at an initial concentration of 0.5 mg/mL. BSA with an initial concentration of 1 mg/mL in 100 mM Tris buffer with 5 mM DTT was digested with trypsin in a ratio of 1:50 (weight of trypsin:weight of BSA) at 37 °C overnight. After digestion, excess salts were removed prior to mass spectrometry analysis with the flow-through microdialysis technique developed in this laboratory.15 After dialysis, samples were directly electrosprayed into the mass spectrometer at 300 nL/min and 1800 V. Ions were accumulated in the quadrupole ion guide for 1-3 s, after which they were transferred to the FTICR cell for mass analysis.

(9) Anderson, G. A.; Bruce, J. E.; Pasa-Tolic, L.; Smith, R. D. Proceedings of the 46th ASMS Conference on Mass Spectrometry and Allied Topics, Orlando, FL, May 31-June 4, 1998; p 1270.
(10) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1-35.
(11) Easterling, M. L.; Mize, T. H.; Amster, I. J. Anal. Chem. 1999, 71, 624-632.
(12) Harkewicz, R.; Bruce, J. E.; Anderson, G. A.; Lin, C.-Y.; Gorshkov, M.; Huang, M.; Udseth, H. R.; Smith, R. D., manuscript in preparation.
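The acquisition figures quoted above (transient length, low-mass limit, and approximate resolution) follow from simple arithmetic, sketched below. Two points are assumptions rather than statements from the text: the low-mass limit is estimated from the Nyquist frequency using only the first-order calibration constant A given in the Data Analysis section, and the resolution uses the rule of thumb R ≈ fT/2, which happens to reproduce the "about 20 000" figure but is not stated as the formula used. Function names are hypothetical.

```python
N_POINTS = 128 * 1024      # "128K data points"
ADC_RATE_HZ = 592_000.0    # analog-to-digital converter rate
A_CAL = 174_548_926.0      # calibration constant A from the Data Analysis section


def transient_duration(n_points, adc_rate_hz):
    """Length of the detected time-domain signal in seconds."""
    return n_points / adc_rate_hz


def low_mz_limit(adc_rate_hz, a_cal):
    """Lowest m/z whose cyclotron frequency is still below the Nyquist
    frequency, using the first-order relation m/z ~ A/f (assumption)."""
    nyquist_hz = adc_rate_hz / 2.0
    return a_cal / nyquist_hz


def approx_resolution(freq_hz, duration_s):
    """Rule-of-thumb resolving power R ~ f*T/2 (assumed form)."""
    return freq_hz * duration_s / 2.0


duration = transient_duration(N_POINTS, ADC_RATE_HZ)
print(f"transient length: {duration:.3f} s")                              # 0.221 s
print(f"low-m/z limit:    {low_mz_limit(ADC_RATE_HZ, A_CAL):.1f}")        # 589.7
print(f"resolution:       {approx_resolution(174_270.0, duration):.0f}")  # roughly 1.9e4
```

Doubling the number of data points at the same digitization rate would double the transient length and, under the same rule of thumb, the attainable resolution.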
Typically, 50-100 transient signals were summed in this initial work for a total analysis time of 2-3 min. It is projected that an optimized external accumulation and ion-transfer arrangement currently under development will allow similar results to be obtained with two to five spectra.

Data Analysis. The analysis of mass spectra of peptides resulting from the digestion of proteins is generally straightforward, but the use of this information can be challenging, depending upon experimental details and goals. For unknown proteins, each isotope envelope must first be processed (i.e., “deconvolved”) to establish its mass and then compared with the set of possible tryptically generated peptides, with candidates ranked on the basis of the fit with specified criteria. We have developed a set of tools for the automatic analysis of ESI-FTICR mass spectra resulting from protein digests that fully exploit high-mass-measurement accuracy. First, in the sequence of events for automated data analysis, the data were transformed to the frequency domain and then to the m/z domain on the basis of the established magnetic field calibration. Magnetic field calibration was carried out using peaks from nine peptides observed in the spectrum resulting from another tryptic digest sample. For each of the observed peptides, a single isotope peak was included in the magnetic field calibration table and a second-order calibration equation of the form $m/z = A/f + B/f^{2}$ was generated, where f is the measured frequency, A = 174 548 926, and B = -464 543 263. Mass deconvolution is then accomplished by the implementation of the algorithm reported by Horn et al.,16 and the resulting masses are compared to a generated table of masses based on all possible proteolysis products for the protein. Multiple proteins or even entire databases of sequences (protein or genomic) are also amenable to search in the same manner used for single protein identification. User-defined parameters of the “search and identify” portion of this program include the following: sequence(s) to search; type of fragments expected (i.e., those resulting from trypsin, CNBr, other enzymatic proteolysis, or even typical fragments resulting from CAD); acceptable error in mass measurement; and m/z range for the search. The program presents as output a table of the experimentally measured masses and the predicted polypeptides (if any) that fall within the search criteria established by the user. Also indicated is an annotated amino acid sequence that illustrates the portions of the protein that correspond to peptides tentatively identified on the basis of the search criteria.

(13) Gorshkov, M. V.; Pasa-Tolic, L.; Udseth, H. R.; Anderson, G. A.; Huang, B. M.; Bruce, J. E.; Prior, D. C.; Hofstadler, S. A.; Tang, L.; Chen, L.; Willett, J. A.; Rockwood, A. L.; Sherman, M. S.; Smith, R. D. J. Am. Soc. Mass Spectrom. 1998, 9, 692-700.
(14) Senko, M. W.; Hendrickson, C. L.; Emmett, M. R.; Shi, S. D. H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 8, 970-976.
(15) Liu, C. L.; Muddiman, D. C.; Tang, K. Q.; Smith, R. D. J. Mass Spectrom. 1997, 32, 425-431.
(16) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. Proceedings of the 46th ASMS Conference on Mass Spectrometry and Allied Topics, Orlando, FL, May 31-June 4, 1998; p 118.

RESULTS

Figure 1 shows an ESI-FTICR spectrum of bovine serum albumin digested with trypsin. Over 1000 peaks are observed in this spectrum with greater than 3:1 signal-to-noise ratio. This spectrum represents the direct summation of 100 transient signals; no advanced data processing was performed prior to summation to remove any shot-to-shot variation that might occur in the precise cyclotron frequency determination of this large ensemble of ions. The observed frequency stability is likely an attribute of both the stability and high field of the superconducting magnet employed. The latter is particularly important since perturbations in the effective magnetic field arising from the Coulombic interactions of the ion cloud itself become less significant as the magnetic field strength increases. Also shown in Figure 1 are the isotope envelopes of three of the tryptic fragments identified by comparison with masses calculated from the amino acid sequence.

Figure 1. The 11.5-T ESI-FTICR mass spectrum of the complex polypeptide mixture resulting from tryptic digestion of bovine serum albumin. A total of 123 distinct isotope distributions were obtained and 71 assigned to tryptic digestion products ranging in mass from 788 to 12 995 Da, all within 2 ppm error limits (average error of 0.77 ppm). This corresponds to fragments containing 100% of the amino acid sequence of BSA. The insets show the measured and assigned tryptic fragment isotope distributions for three components.

Figure 2. Distribution of error values observed between +10 and -10 ppm for the data presented in Figure 1. The large majority of error values fall near zero and the distribution of error values closely resembles that of a normal error distribution with a standard deviation of 1 ppm.

Error analysis of the results arising from the peptide mass assignment with an error threshold set to ±10 ppm is presented in Figure 2, along with a normal error probability function with a standard deviation of 1 ppm. These results show a good fit between the measured error distribution and that predicted for a normal error distribution with a standard deviation of 1 ppm. Therefore, the 95% confidence limits, as defined as ±2

times the standard deviation, are established for the present results as ±2 ppm. Assignments with error values greater in magnitude than 2 ppm were not accepted based on the established 95% confidence limits, and these peaks likely resulted from well-known heterogeneity of BSA samples.17 From the more than 1000 peaks in the mass spectrum, a total of 123 isotope envelopes can be assigned that include most of the major peaks. These distributions are used to search against the sequence of BSA to identify tryptic peptides. In this analysis, all possible digestion products, including any number of missed cleavage sites, are searched. A total of 71 isotope distributions were assigned to BSA tryptic fragments with range and average mass measurement error values of 0.051-2 and 0.77 ppm, respectively. This observation corresponds to the detection of fragments containing 583 out of the 583 possible amino acids of BSA, or a coverage of 100%. The m/z, measured and calculated masses, and sequences of the identified peptides are listed in Table 1. For a large majority of the peptides reported in Table 1, the measured monoisotopic mass was determined directly from the mass spectrum. For cases where the monoisotopic peak was undetected, averagine18 was used to predict the monoisotopic mass for the observed isotope distribution.

Table 1. Identified Peptides and Measurement Error for Tryptic Digestion of Bovine Serum Albumin

m/z       M^a        C^b        err/ppm  assigned sequence
789.472   788.464    788.464    -0.158   LVTDLTK
817.489   816.482    816.482    -0.537   SLGKVGTR
820.472   1638.93    1638.93    0.245    KVPQVSTPTLVEVSR
834.91    1666.803   1666.806   1.395    MPCTEDYLSLILNR
841.46    840.452    840.453    0.482    LCVLHEK
847.504   846.497    846.496    -0.512   LSQKFPK
870.774   2608.298   2608.299   0.365    LKPDPNTLCDEFKADEKKFWGK
875.991   1749.967   1749.966   -0.165   LSQKFPKAEFVEVTK
901.42    2700.235   2700.238   0.974    CCTKPESERMPCTEDYLSLILNR
906.009   1810.003   1810.002   -0.258   LCVLHEKTPVSEKVTK
912.955   1822.893   1822.892   -0.657   RPCFSALTPDETYVPK
922.489   921.481    921.481    -0.736   AEFVEVTK
927.494   926.487    926.486    -0.785   YLYEIAR
982.48    1961.942   1961.94    -0.997   LKPDPNTLCDEFKADEK
988.569   987.562    987.56     -1.648   TPVSEKVTK
1001.59   1000.583   1000.582   -1.105   ALKAWSVAR
1011.419  1010.412   1010.413   0.823    QNCDQFEK
1023.52   2044.023   2044.021   -1.322   RHPYFYAPELLYYANK
1024.453  1023.446   1023.448   1.592    CCTESLVNR
1031.08   2059.142   2059.142   0.337    YTRKVPQVSTPTLVEVSR
1037.459  3107.35    3107.356   1.992    EYEATLEECCAKDDPHACYSTVFDKLK
1038.941  2075.868   2075.871   1.244    ECCHGDLLECADDRADLAK
1046.527  2090.036   2090.035   -0.311   LKPDPNTLCDEFKADEKK
1071.523  3210.543   3210.547   1.285    LVTDLTKVHKECCHGDLLECADDRADLAK
1086.769  4341.043   4341.048   1.164    RHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPK
1090.241  4354.928   4354.924   -0.993   VASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPK
1133.548  3396.621   3396.622   0.292    SHCIAEVEKDAIPENLPPLTADFAEDKDVCK
1141.533  3419.57    3419.572   0.553    LAKEYEATLEECCAKDDPHACYSTVFDKLK
1142.716  1141.709   1141.707   -1.566   KQTALVELLK
1151.546  2300.076   2300.075   -0.298   NYQEAKDAFLGSFLYEYSR
1159.026  2315.034   2315.038   1.879    YNGVFQECCQAEDKGACLLPK
1163.63   1162.622   1162.623   0.743    LVNELTEFAK
1173.6    3516.774   3516.775   0.21     LKPDPNTLCDEFKADEKKFWGKYLYEIAR
1177.558  1176.55    1176.552   1.269    ECCDKPLLEK
1194.625  3578.848   3578.848   0.131    GLVLIAFSQYLQQCPFDEHVKLVNELTEFAK
1218.627  2434.236   2434.235   -0.432   GLVLIAFSQYLQQCPFDEHVK
1221.555  2440.092   2440.093   0.405    VHKECCHGDLLECADDRADLAK
1237.103  2471.189   2471.19    0.651    QNCDQFEKLGEYGFQNALIVR
1249.621  1248.614   1248.614   0.253    FKDLGEEHFK
1254.306  3757.89    3757.888   -0.532   HLVDEPQNLIKQNCDQFEKLGEYGFQNALIVR
1264.266  3787.772   3787.771   -0.206   NECFLSHKDDSPDLPKLKPDPNTLCDEFKADEK
1301.223  12995.134  12995.131  -0.199   DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEFAKTCVADESHAGCEKSLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPK
1305.718  1304.711   1304.709   -1.772   HLVDEPQNLIK
1333.977  9324.773   9324.761   -1.312   AFDEKLFTFHADICTLPDTEKQIKKQTALVELLKHKPKATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPKLVVSTQTALA
1347.604  2692.191   2692.193   0.894    TCVADESHAGCEKSLHTLFGDELCK
1362.672  1361.665   1361.665   -0.159   SLHTLFGDELCK
1378.33   4129.961   4129.961   -0.104   SHCIAEVEKDAIPENLPPLTADFAEDKDVCKNYQEAK
1386.62   1385.612   1385.613   0.667    YICDNQDTISSK
1418.738  1417.73    1417.731   0.293    LKECCDKPLLEK
1426.656  5698.585   5698.578   -1.252   SLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPK
1439.812  1438.804   1438.804   0.051    RHPEYAVSVLLR
1448.689  4341.041   4341.048   1.671    RHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPK
1453.316  4354.92    4354.924   0.822    VASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPK
1479.794  1478.787   1478.788   0.859    LGEYGFQNALIVR
1567.741  1566.734   1566.735   0.926    DAFLGSFLYEYSR
1603.076  6405.266   6405.277   1.708    QTALVELLKHKPKATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPKLVVSTQTALA
1615.132  4840.368   4840.366   -0.392   RHPEYAVSVLLRLAKEYEATLEECCAKDDPHACYSTVFDKLK
1635.604  6533.374   6533.372   -0.312   KQTALVELLKHKPKATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPKLVVSTQTALA
1639.937  1638.929   1638.93    0.692    KVPQVSTPTLVEVSR
1658.796  4971.362   4971.364   0.485    RHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPKIETMR
1667.81   1666.803   1666.806   1.761    MPCTEDYLSLILNR
1711.295  3419.573   3419.572   -0.31    LAKEYEATLEECCAKDDPHACYSTVFDKLK
1760.396  3516.772   3516.775   0.736    LKPDPNTLCDEFKADEKKFWGKYLYEIAR
1791.433  3578.846   3578.848   0.717    GLVLIAFSQYLQQCPFDEHVKLVNELTEFAK
1812.01   1810       1810.002   1.414    LCVLHEKTPVSEKVTK
1824.9    1822.89    1822.892   1.218    RPCFSALTPDETYVPK
1866.96   9324.752   9324.744   -0.871   FWGKYLYEIARRHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPKIETMREKVLASSARQRLRCASIQKFGERALK
1880.955  3757.89    3757.888   -0.528   HLVDEPQNLIKQNCDQFEKLGEYGFQNALIVR
1895.895  3787.771   3787.771   0.038    NECFLSHKDDSPDLPKLKPDPNTLCDEFKADEK
1901.536  5698.579   5698.578   -0.137   SLHTLFGDELCKVASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPK
1962.946  1961.939   1961.94    0.828    LKPDPNTLCDEFKADEK

^a M, measured monoisotopic mass. ^b C, calculated monoisotopic mass.

It should be noted that the high-mass-measurement accuracy observed with a time domain signal length of only 0.2 s is likely an attribute of the signal averaging employed. As the number of spectra used for peak definition increases, the relative noise contribution to the amplitude of each spectral point decreases (by the square root of the number of spectra averaged). Since each peak is fit to a three-point quadratic function to establish the true measured cyclotron frequency and thus m/z, fluctuations in the amplitude of each point are likely to be a source of mass measurement error. Signal averaging reduces the error associated with point amplitude definition and thus increases the mass measurement precision.

(17) Bruce, J. E.; Anderson, G. A.; Udseth, H. R.; Smith, R. D. Anal. Chem. 1998, 70, 519-525.
(18) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229-233.
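The three-point quadratic peak fit mentioned above, followed by conversion of the fitted frequency to m/z, can be sketched as follows. This is an illustration under stated assumptions, not the analysis software itself: the standard parabolic-interpolation formula is assumed for the fit, frequencies are assumed to be in Hz (the units of A and B are not given in the text), the frequency axis is assumed to be evenly spaced, and the function names are hypothetical. The constants A and B are those reported in the Data Analysis section.

```python
A_CAL = 174_548_926.0    # calibration constant A (Data Analysis section)
B_CAL = -464_543_263.0   # calibration constant B (Data Analysis section)


def freq_to_mz(freq_hz):
    """Second-order calibration m/z = A/f + B/f^2 (f assumed to be in Hz)."""
    return A_CAL / freq_hz + B_CAL / freq_hz ** 2


def quadratic_peak_frequency(freqs, mags):
    """Fit a parabola through the largest point and its two neighbours and
    return the frequency of the parabola's vertex."""
    k = max(range(len(mags)), key=lambda i: mags[i])
    if k == 0 or k == len(mags) - 1:
        return freqs[k]  # no neighbour on one side; fall back to the raw bin
    left, centre, right = mags[k - 1], mags[k], mags[k + 1]
    curvature = left - 2.0 * centre + right
    if curvature == 0.0:
        return freqs[k]  # flat top; nothing to interpolate
    offset_bins = 0.5 * (left - right) / curvature
    bin_width = freqs[1] - freqs[0]  # evenly spaced axis assumed
    return freqs[k] + offset_bins * bin_width


# Synthetic check: a parabolic peak centred between two bins.
freqs = [174000.0 + i for i in range(10)]
mags = [10.0 - (f - 174004.3) ** 2 for f in freqs]
peak_hz = quadratic_peak_frequency(freqs, mags)
peak_mz = freq_to_mz(peak_hz)
```

Because only three amplitudes enter the fit, noise on any one of them shifts the interpolated frequency directly, which is why averaging more transients, and so lowering the relative noise on each point, improves the precision of the resulting m/z.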

CONCLUSIONS

High-accuracy mass measurements with errors averaging 0.77 ppm for the direct analysis of the polypeptide mixture from an enzymatic digestion of bovine serum albumin allowed the rapid identification of 71 peptides covering 100% of the protein's amino acid sequence. These high-field FTICR mass measurements were obtained without the aid of an internal calibrant, the use of which should result in even greater accuracy. The present results also demonstrate the great utility of FTICR for high-mass-measurement accuracy for a large number of components from a single spectrum. These results indicate that, in conjunction with chromatographic separations, rapid, accurate mass analysis of complex digestion mixtures for the identification of large numbers of proteins should be feasible. Importantly, the expanded capacity and improved control of the total ion population size, combined with the greater frequency stability resulting from the high-field superconducting magnet employed here, resulted in greater repeatability for the measured ion signals and low- to sub-ppm mass measurement errors.

ACKNOWLEDGMENT

This research was supported by the U.S. Department of Energy, Office of Health and Environmental Research. The Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. Department of Energy through Contract DE-AC06-76RLO 1830.

Received for review March 1, 1999. Accepted May 13, 1999. AC990231S
