Anal. Chem. 1998, 70, 1214-1222
Unknown Peptide Sequencing Using Matrix-Assisted Laser Desorption/Ionization and In-Source Decay
Duane C. Reiber and Robert S. Brown*
Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322-0300
Scot Weinberger,† James Kenny, and Jerome Bailey
The Hewlett-Packard Company, Protein Chemistry Systems, 1601 California Avenue, Palo Alto, California 94393
The results of a study to determine the utility of in-source decay fragmentation of matrix-assisted laser-desorbed ions for obtaining useful sequence information on unknown peptides are presented. Six peptides were purified by high-performance liquid chromatography and submitted as single blind unknowns. The in-source decay fragment ion data were collected on a linear time-of-flight mass spectrometer equipped with delayed extraction. These fragment ion data were manually interpreted on the basis of known fragmentation pathways to determine a proposed sequence. The proposed sequences for three of the unknowns were essentially correct, with a few minor errors. A fourth unknown had significant errors associated with its proposed sequence due to misinterpretation of the fragmentation data. Two unknowns were found to have undergone significant sample degradation prior to analysis, which compromised the results for these samples. An example of the use of protein database searching of a partial peptide sequence to aid in a sequence determination is also presented.
One area where MALDI TOF-MS has made significant strides in recent years is in the sequencing of peptides. Traditionally, peptides and proteins have been sequenced using repeated cycles of the Edman degradation.18-20 In the 1970s, the utility of mass spectrometry for the sequencing of simple peptides was demonstrated.21,22 Tandem mass spectrometry techniques (MS/MS),23,24 coupled with fast-atom bombardment ionization (FAB)25 and collisionally activated decomposition (CAD) have been widely used for peptide sequence determination.26,27 The nomenclature now widely employed to describe amide bond related peptide fragmentation was developed by Roepstorff and Fohlman28 and later modified by Biemann and co-workers.29 Utilizing this nomenclature, FAB MS/MS mass spectra typically exhibit bn, yn, and wn fragment ion series.
† Current address: Ciphergen Biosystems, Inc., 490 San Antonio Rd., Suite 201, Palo Alto, CA 94306. (1) Yamashita, M.; Fenn, J. B. J. Phys. Chem. 1984, 88, 4451. (2) Yamashita, M.; Fenn, J. B. J. Phys. Chem. 1984, 88, 4471. (3) Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299. (4) Brown, R. S.; Lennon, J. J. Anal. Chem. 1995, 67, 1998. (5) Colby, S. M.; King, T. B.; Reilly, J.P. Rapid Commun. Mass Spectrom. 1994, 8, 869-875. (6) Vestal, M. L.; Juhasz, P.; Martin, S. A. Rapid Commun. Mass Spectrom. 1990, 9, 1044.
(7) Qin, J.; Chait, B. T. Anal. Chem. 1996, 68, 2108. (8) Annan, R. S.; Carr, S. A. Anal. Chem. 1996, 68, 3413. (9) Bai, J.; Qian, M. G.; Liu, Y. Anal. Chem. 1995, 67, 1705. (10) Mortz, E.; Sareneva, T.; Julkunen, I.; Roepstorff P. J. Mass Spectrom. 1996, 31, 1109. (11) Klabunde, T.; Stahl, B.; Suerbaum, H.; Hahner, S.; Karas, M.; Hillenkamp, F.; Krebs, B.; Witzel, H. Eur. J. Biochem. 1994, 226, 369. (12) Berkenkamp, S.; Karas, M.; Hillenkamp, F. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 7003. (13) Mock, K. K.; Davey, M.; Cottrel, J. S. Biochem. Biophys. Res. Commun. 1991, 177, 644. (14) Juhasz P.; Costello, C. E. J. Am. Soc. Mass Spectrom. 1992, 3, 785. (15) Tang, W.; Nelson, C. M.; Zhu, L.; Smith, L. M. J. Am. Soc. Mass Spectrom. 1997, 8, 218. (16) Nordhoff, E.; Kirpekar, F.; Roepstorff, P. Mass Spectrom. Rev. 1997, 15, 69. (17) Bentzley, C. M.; Johnston, M. V.; Larsen, B. S.; Gutteridge, S. Anal. Chem. 1996, 68, 2141. (18) Edman, P. Arch. Biochem. Biophys. 1949, 22, 475. (19) Edman, P.; Begg, B. Eur. J. Biochem. 1967, 1, 80. (20) Niall, H. D. Methods Enzymol. 1973, 27, 942. (21) McLafferty, F. W.; Bente, P. F.; Kornfeld, R.; Tsai, S. C.; Howe, I. J. Am. Chem. Soc. 1973, 95, 2120. (22) McLafferty, F. W.; Kornfeld, R.; Haddon, W. F.; Levsen, K.; Sakai, I.; Bente, P. F.; Tsai, S. C.; Schuddemage, H. D. R. J. Am. Chem. Soc. 1973, 95, 5, 3886. (23) McLafferty, F. W., Ed. Tandem Mass Spectrometry; Wiley-Interscience: New York, 1983. (24) Biemann, K. Methods Enzymol. 1990, 193, 455. (25) Barber, M.; Bordoli, R. S.; Sedgwick, R. D.; Tyler, A. N. J. Chem. Soc., Chem. Commun. 1981, 325. (26) Fenselau, C. Annu. Rev. Biophys. Biophys. Chem. 1991, 20, 205. (27) Busch, K. L.; Glish, G. L.; McLuckey, S. A. Mass Spectrometry/Mass Spectrometry: Techniques and Applications of Tandem Mass Spectrometry; VCH: New York, 1988.
1214 Analytical Chemistry, Vol. 70, No. 6, March 15, 1998
S0003-2700(97)01158-X CCC: $15.00
During the past decade, mass spectrometry has evolved as an essential tool in the study of biological macromolecules. Along with the development of electrospray ionization,1,2 the introduction of matrix-assisted laser desorption/ionization (MALDI)3 has revolutionized the way biomolecules are analyzed. Recent advances in time-of-flight mass spectrometry (TOF-MS)4-6 instrumentation, coupled with continued improvements in the practice and understanding of the MALDI technique, have led to powerful new instruments that are especially well suited for the biochemical researcher. Recent examples of MALDI TOF-MS biochemical applications include the analysis of peptides,7-9 proteins,10-12 oligosaccharides,13 glycolipids,14 and oligonucleotides.15-17
© 1998 American Chemical Society Published on Web 02/18/1998
While MALDI was initially described as a “soft” ionization process, a significant degree of analyte ion fragmentation occurs for MALDI-generated ions.30-33 This fragmentation can be exploited to gain sequence information for peptides. Metastable ion decay occurring in the first field-free region of a reflectron TOF-MS (RE-TOF-MS)34 has been used to obtain structural information for moderate-sized peptides.30,35-37 These postsource decay (PSD)35 data are generated by successively changing the reflectron potential in discrete steps (typically 10-15) and then acquiring MALDI mass spectra. In this way, metastable ions within a specific m/z range can be brought into time focus at the detector. These mass spectra are then combined into a single composite mass spectrum. Typical fragmentation observed in PSD MALDI corresponds to an, an - 17, bn, and bn - 17 ion series. Alternatively, a curved-field reflectron-based TOF-MS instrument38,39 can be utilized to acquire PSD MALDI mass spectra without the need to step-scan the reflectron. Sequential degradation schemes based on the Edman degradation have also been developed and combined with MALDI TOF-MS for peptide sequencing. Both N-terminus40 and C-terminus41 sequencing have been demonstrated via this approach. PSD is not the only source of fragment ions that are generated in MALDI. Prompt fragmentation (i.e., fragmentation occurring on the same time frame as the ionization) has been reported for certain classes of analytes.14,42 Rapid fragmentation of MALDI-generated peptide ions has also been observed to occur at low levels43,44 after laser irradiation. This fragmentation is associated with the amide nitrogen in the peptide backbone and results in different fragment ion series than are found in PSD.
(28) Roepstorff, P.; Fohlman, J. Biomed. Mass Spectrom. 1984, 11, 601. (29) Johnson, R. S.; Martin, S. A.; Biemann, K. Int. J. Mass Spectrom. Ion Processes 1988, 86, 137-154. (30) Spengler, B.; Kirsch D.; Kaufmann, R. Rapid Commun. Mass Spectrom. 1991, 5, 198. (31) Karas, M.; Bahr, U.; Strupat, K.; Hillenkamp, F.; Tsarbopoulos, A.; Pramanik, B. N. Anal. Chem. 1995, 67, 675. (32) Hill, J. A.; Annan, R. S.; Biemann, K. Rapid Commun. Mass Spectrom. 1991, 5, 395. (33) Spengler, B.; Kaufmann, R. Analusis 1992, 20, 91. (34) Tang, X.; Ens, W.; Mayer, F.; Standing, K. G.; Westmore, J. B. Rapid Commun. Mass Spectrom. 1989, 3, 443. (35) Kaufmann, R.; Kirsch D.; Spengler, B. Int. J. Mass Spectrom. Ion Processes 1994, 131, 355. (36) Spengler, B.; Kirsch, D.; Kaufmann, R.; Jaeger, E. Rapid Commun. Mass Spectrom. 1992, 6, 105. (37) Kaufmann, R.; Spengler, B.; Lutzenkirchen, F. Rapid Commun. Mass Spectrom. 1993, 7, 902. (38) Cornish, T. J.; Cotter, R. J. Rapid Commun. Mass Spectrom. 1993, 7, 1037. (39) Cornish, T. J.; Cotter, R. J. Rapid Commun. Mass Spectrom. 1994, 8, 781. (40) Chait, B. T.; Wang, R.; Beavis, R. C.; Kent, S. B. H. Science 1993, 262, 89. (41) Patterson, D. H.; Tarr, G. E.; Regnier, F. E.; Martin, S. A. Anal. Chem. 1995, 67, 3971. (42) Talbo, G.; Roepstorff, P. Rapid Commun. Mass Spectrom. 1993, 7, 201. (43) Brown, R. S.; Lennon, J. J. Anal. Chem. 1995, 67, 3990. (44) Brown, R. S.; Carr, B. L.; Lennon, J. J. J. Am. Soc. Mass Spectrom. 1996, 7, 225. (45) Brown, R. S.; Feng, J.; Reiber, D. C. Int. J. Mass Spectrom. Ion Processes 1997, 169-170, 169.
It is generally referred to as in-source decay (ISD), although at least some of the fragmentation may not be due to a true metastable ion decay process.45 All of this in-source fragmentation appears to be complete within 170 ns after the laser event and typically requires a TOF-MS equipped with delayed extraction (DE)4-6 to be observed.43-45 Previous studies43,44 have shown that this ISD produces cn, yn, and sometimes zn series fragment ions. While
in some peptides yn series fragment ions have been observed in PSD,46 it is a common fragment ion series in ISD. The cn fragment ion series (which becomes the dominant fragment ion series in ISD above m/z 6000) has not been observed in PSD but is not uncommon in FAB MS/MS spectra.47 The extent and type of peptide fragmentation in ISD are also significantly influenced by the MALDI matrix that is employed.43,44 Typical acidic matrixes, such as 2,5-dihydroxybenzoic acid (DHB), produce a roughly equal distribution of yn and cn series fragment ions. In contrast, nonacidic matrixes such as 7-hydroxycoumarin produce mainly yn series fragment ions. Although previous studies with known peptide standards43 have demonstrated the potential applicability of ISD MALDI-generated peptide ions for obtaining sequence information, no reports of attempts to analyze true unknowns have appeared. While ISD MALDI is not capable of the low-resolution precursor ion selection of PSD (i.e., MS/MS), peptides can be isolated chromatographically prior to analysis. ISD MALDI fragment ions can potentially provide significant complementary sequence information to the more widely used PSD MALDI technique due to the different and simpler fragment ion series that are observed. The results from a study of six peptides purified by high-performance liquid chromatography (HPLC) and submitted as part of a single-blind study will be presented to demonstrate the utility and limitations of ISD MALDI data in obtaining sequence information on true unknown peptides.
EXPERIMENTAL SECTION
The delayed extraction linear TOF-MS instrument employed in these studies has been described in detail previously.5,43,44 For the present studies, ion source bias voltages of 24 kV and constant extraction delay times of 340 ns were employed for all samples.
Extraction pulse voltages were selected to optimize mass resolution (via time lag focusing) for the singly protonated molecular ion of each peptide sample, as previously determined.5 Although under optimal conditions a mass resolution as high as 3500 (fwhm) can be obtained on this instrument for smaller peptides such as melittin, the elevated laser fluences employed to generate high signal-to-noise (S/N) in-source fragmentation spectra result in a significant degradation of the obtainable mass resolution. To further improve the S/N of the in-source fragmentation spectra, a 20-MHz bandwidth filter is also employed as part of the transient digitizer (LeCroy, Chestnut Ridge, NY, model 8828A). Typical mass resolution for the fragment ions used in these studies is between 400 and 800. A nitrogen laser (model PL2300, PTI Lasers, Princeton, NJ), emitting at 337.1 nm and producing 700-ps (fwhm) laser pulses, is focused to a spot size of approximately 150 µm × 250 µm.
(46) Rouse, J. C.; Yu, W.; Martin, S. A. Proceedings of the 42nd ASMS Conference on Mass Spectrometry and Allied Topics, Chicago, IL, May 29-June 3, 1994; p 676. (47) Downard, K. M.; Biemann, K. J. Am. Soc. Mass Spectrom. 1993, 4, 874.
The unknown peptide samples employed in these studies were supplied by collaborators at the Protein Chemistry Systems Division of the Hewlett-Packard Co. and were provided as single-blind unknowns. The individual peptides were purchased from Sigma Chemical Co. (St. Louis, MO) and purified by reversed-phase HPLC prior to their submission for ISD MALDI sequence characterization. HPLC purification was performed on a Vydac
reversed-phase column (2 mm × 250 mm) packed with 5-µm C-18 modified silica particles. A gradient elution was used to effect the separation. Initial solvent conditions were 95% solvent A (95% water, 5% acetonitrile, 0.1% trifluoroacetic acid) and 5% solvent B (30% water, 70% acetonitrile, 0.1% trifluoroacetic acid). The gradient used was 2% change/min (45-min total run time), to a final condition of 5% solvent A and 95% solvent B. Only after completion of ISD MALDI sequencing and submission of the results were the true sequences provided for comparison. MALDI matrixes employed were purchased from Aldrich Chemical Co. (Milwaukee, WI) and were purified after receipt by sublimation with recrystallization from an acetonitrile/water solution (1:1 v/v). Matrix solutions were prepared as 10 mM solutions in acetonitrile/water (30:70 v/v) to which 0.1% trifluoroacetic acid was added. An estimated 20 nmol of each peptide was collected from the HPLC and provided for ISD MALDI characterization. Analyte solutions were prepared in distilled, deionized water by appropriate dilution to a final estimated concentration of 10 µM. Samples were deposited (1 µL of analyte, followed by 2 µL of matrix) via a micropipet onto a 3-mm-diameter stainless steel probe tip and allowed to evaporate to dryness in air. Assuming equal deposition of analyte across the probe tip, the laser irradiates an area containing about 50 fmol of analyte. The analyte concentrations employed ensure good S/N for the resulting low-intensity fragment ions. For the ISD MALDI mass spectra collected, the ion signals from 20-100 laser shots were signal averaged at 5 ns/point time resolution. Each unknown’s average molecular mass was initially determined experimentally by conventional MALDI analysis. Two peptides of known molecular mass (the oxidized B chain of bovine insulin and substance P purchased from Sigma Chemical Co.) were added as internal calibrants, and a MALDI mass spectrum of each unknown was acquired. 
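The sample-loading and gradient figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and treating the laser spot as a rectangle (or, optionally, an ellipse) is an assumption, since the text gives only the two spot dimensions.

```python
import math

# Values quoted in the text.
ANALYTE_VOLUME_L = 1e-6      # 1 uL of analyte solution deposited
ANALYTE_CONC_M = 10e-6       # estimated 10 uM analyte concentration
PROBE_DIAMETER_MM = 3.0      # stainless steel probe tip diameter
SPOT_MM = (0.150, 0.250)     # laser spot, about 150 um x 250 um


def analyte_in_spot_fmol(elliptical: bool = False) -> float:
    """Femtomoles of analyte under the laser spot, assuming even deposition."""
    total_mol = ANALYTE_VOLUME_L * ANALYTE_CONC_M        # 10 pmol on the probe
    probe_area = math.pi * (PROBE_DIAMETER_MM / 2) ** 2  # mm^2
    spot_area = SPOT_MM[0] * SPOT_MM[1]                  # rectangle (assumption)
    if elliptical:
        spot_area *= math.pi / 4                         # ellipse with the same axes
    return total_mol * spot_area / probe_area * 1e15


def gradient_minutes(start_b: float = 5.0, end_b: float = 95.0,
                     rate_per_min: float = 2.0) -> float:
    """Run time of a linear gradient in percent solvent B."""
    return (end_b - start_b) / rate_per_min


print(round(analyte_in_spot_fmol(), 1), round(analyte_in_spot_fmol(True), 1))
print(gradient_minutes())
```

Under the rectangular-spot assumption the result is close to the "about 50 fmol" stated in the text, and the gradient length agrees with the 45-min run time.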
The m/z of the singly protonated molecular ion of each calibrant, along with the experimentally determined flight times of each ion, was used to mass calibrate an unknown’s TOF spectrum via a simple linear fit.41 Determination of the centroid of the unknown’s singly protonated molecular ion signal in the mass-calibrated spectrum provided an unknown’s molecular mass. For the instrument used for these studies, this approach can be expected to provide ±0.1 Da mass accuracies for peptides in the 1-4-kDa mass range.
ISD MALDI unknown peptide spectra were acquired in two stages. First, for each unknown peptide, a signal-averaged TOF spectrum employing normal MALDI laser fluences (i.e., just above threshold irradiance) was collected from 20 laser shots on a particular sample surface. The TOF for the singly and doubly protonated molecular ions of the unknown peptide and its previously determined molecular mass were used to establish a calibration equation for subsequent ISD MALDI spectra of a particular unknown. The laser fluence was then increased such that the protonated molecular ion signal of the unknown increased by typically an order of magnitude. This resulted in the singly protonated molecular ion of the unknown being well outside of the dynamic range of the transient digitizer. An ISD spectrum was then acquired at this elevated laser fluence by signal averaging the subsequent laser shots on the same sample surface. This approach effectively improves the dynamic range of the experiment for the much lower intensity fragment ions.
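A two-point calibration of the kind described above might be sketched as follows. The text does not give the form of the "simple linear fit"; this sketch assumes the usual TOF relation in which the square root of m/z is linear in flight time. The function names, the proton mass constant, and the flight times in the usage lines are hypothetical.

```python
import math

PROTON = 1.00728  # Da; assumed value for the mass added per charge


def protonated_mz(molecular_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral molecular mass."""
    return (molecular_mass + charge * PROTON) / charge


def two_point_calibration(t1: float, mz1: float, t2: float, mz2: float):
    """Return a function mapping flight time to m/z.

    Assumes sqrt(m/z) = a * t + b, fixed by two ions of known m/z.
    """
    a = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    b = math.sqrt(mz1) - a * t1
    return lambda t: (a * t + b) ** 2


# Usage with the measured mass of unknown peptide 1 (1076.20 Da) and
# made-up flight times (arbitrary units) for its two molecular ions.
mz_single = protonated_mz(1076.20, 1)
mz_double = protonated_mz(1076.20, 2)
to_mz = two_point_calibration(20.0, mz_double, 28.2, mz_single)
print(round(to_mz(24.0), 2))
```

With only two calibrant ions the fit is exactly determined, so any error in either flight time propagates directly into the fragment masses; that is one reason the molecular mass is fixed first with internal calibrants.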
RESULTS AND DISCUSSION
Earlier studies43,44 of peptides of known sequence have established the fragmentation pathways to be expected in ISD MALDI and the potential analytical utility of the fragment ions that are observed for obtaining sequence information of small- to moderate-sized peptides and proteins. Use of ISD MALDI data for sequencing peptides of unknown sequence is more challenging. There are several inherent limitations to the use of ISD data for sequencing unknown peptides. Since ISD produces only amide bond cleavages and no side-chain cleavages of amino acids, amino acids of equivalent nominal molecular mass give rise to the same m/z fragment ion. For the common amino acids, leucine/isoleucine (Leu/Ile) cannot be distinguished on the basis of the resulting mass loss. Also, the mass difference for lysine/glutamine (Lys/Gln) is too small to distinguish between this residue pair with current instrumentation. For these cases, both potential residues must be reported as possibilities in any proposed sequence based on ISD data. Additional possibilities exist if nonstandard amino acids are considered. A commonly occurring modified amino acid is hydroxyproline (Hyp), which has the same nominal mass as leucine and isoleucine. Because of the complementary fragment ion series (cn, yn, and zn) that are commonly produced in ISD MALDI, it is often possible to obtain a large degree of overlapping sequence information. The cn ion series is characterized by successive losses of amino acids from the C-terminus of the peptide (charge retained on the N-terminus of the peptide). The yn ion series is characterized by successive losses of amino acids from the N-terminus of the peptide (charge retained on the C-terminus of the peptide). The zn ion series, which also occurs in ISD MALDI, involves an additional loss of -NH at the amide bond relative to the yn ion series.
The appearance of fragment ion pairs differing in m/z by 15 allows differentiation of the cn from the yn ion series. The only amino acid that systematically fails to produce a fragment ion in ISD MALDI is proline. As has been noted previously,43 proline cannot produce a cn or zn series fragment ion because of its cyclic nature. While yn series fragment ions are typically present, the lack of a cn fragment ion causes difficulty if the proline is situated close to the C-terminus of the peptide. This results from the difficulty in observing ISD MALDI fragment ions in the low m/z range of the mass spectrum. The large matrix ion background typically obscures the ISD MALDI fragments below a m/z of 400-500 (depending upon the matrix used). For this reason, it is important to have cn fragment ion data near the C-terminus of the peptide if a full sequence is to be obtained. An additional consideration in ISD MALDI for peptide sequencing is the choice of matrix that is employed. The relative intensities of the different fragment ions series vary as a function of the MALDI matrix, as has been previously noted.43,44 Ferulic acid (4-hydroxy-3-methoxycinnamic acid) and 2,5-dihydroxybenzoic acid (DHB) both typically produce a roughly equal distribution of cn and yn series ions, which provides for more overlapping of the sequence from both ends of a peptide. Less acidic matrixes, such as hydroxy-substituted coumarins or benzophenones, produce mainly yn series ions for many peptides. This fragmentation dependence on matrix can be useful for differentiating between the cn and yn series ions if a sufficient zn ion series is not present. Another factor in the choice of matrix for ISD is the intensities of
the doubly protonated molecular ion and associated matrix adduct ions that are present in the mass spectrum. Higher intensities for these ions are more likely to obscure the ISD fragment ions near the peptide’s doubly protonated molecular ion. By changing matrixes, the m/z of the matrix adduct ion species can sometimes be shifted enough to show ISD fragments that were not discernible in a different matrix. Use of matrixes that produce lower relative intensity doubly charged peptide ions (such as DHB) can also be useful. However, it is not always possible to obtain ISD MALDI sequence information near the doubly protonated ion (especially for larger, >3000 Da, peptides), and this is an inherent limitation of the technique. Based on the above general considerations, a protocol was developed for manually sequencing an unknown peptide. This involves an initial determination of each unknown’s molecular mass accurately via standard MALDI internal mass calibration techniques. This then allows the use of the singly and doubly protonated molecular ions of an unknown to mass calibrate each subsequent ISD MALDI TOF spectrum (see the Experimental Section for details). An ISD MALDI TOF spectrum is then acquired from, typically, a DHB or ferulic acid matrix and mass calibrated. Initial sequence information is most readily obtained from the middle of the ISD MALDI mass spectrum. This is done by measuring the mass differences between adjacent peaks until a partial amino acid sequence (considering standard and common nonstandard amino acid residues) can be discerned. This initial ion series is then expanded toward both ends of the peptide by searching for additional mass losses corresponding to known amino acids. Both yn and cn ion series are identified and differentiated by a corresponding zn ion series or through the use of alternative matrixes. The possibility of a terminal amino acid modification (such as a C-terminus amidation) also must be considered. 
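The step of measuring mass differences between adjacent peaks and matching them to residues can be sketched in code. This is a minimal illustration, not the procedure the authors ran (their interpretation was manual). The residue masses are standard average values, the 0.5-Da tolerance is an assumption borrowed from the mass-error criterion in this protocol, and the function names are hypothetical.

```python
# Average residue masses in Da (standard values; Hyp = hydroxyproline).
RESIDUE_MASS = {
    "Gly": 57.05, "Ala": 71.08, "Ser": 87.08, "Pro": 97.12, "Val": 99.13,
    "Thr": 101.10, "Cys": 103.14, "Leu/Ile": 113.16, "Hyp": 113.12,
    "Asn": 114.10, "Asp": 115.09, "Gln": 128.13, "Lys": 128.17,
    "Glu": 129.12, "Met": 131.19, "His": 137.14, "Phe": 147.18,
    "Arg": 156.19, "Tyr": 163.18, "Trp": 186.21,
}


def match_residues(delta: float, tol: float = 0.5) -> list:
    """Residues whose mass lies within tol of a peak spacing, best match first."""
    hits = [(abs(mass - delta), name)
            for name, mass in RESIDUE_MASS.items()
            if abs(mass - delta) <= tol]
    return [name for _, name in sorted(hits)]


def read_ladder(peaks: list, tol: float = 0.5) -> list:
    """Candidate residues for each gap between adjacent peaks of one ion series."""
    ordered = sorted(peaks, reverse=True)
    return [match_residues(hi - lo, tol) for hi, lo in zip(ordered, ordered[1:])]


# Experimental yn series values reported for unknown peptide 1 (y8 to y4).
for candidates in read_ladder([920.79, 823.83, 710.62, 653.62, 506.68]):
    print(candidates)
```

Isomass candidates such as Leu/Ile and Hyp, or Lys and Gln, come back together for a single gap, which mirrors the requirement that such residues be reported as equally probable.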
Additional matrixes are employed as needed to aid in sequence determination. Isomass amino acids are reported as being equally probable in a final sequence. Finally, a comparison of the theoretical fragment ion masses for the proposed sequence to the actual experimental data should typically not result in average absolute mass errors for all fragment ions greater than ±0.5 Da. Any individual mass error greater than 0.5 Da for a fragment ion suggests the possibility of an incorrect assignment. Once this procedure was completed for the unknowns, the true sequence was provided for comparison with the proposed sequence. To simplify the following discussion of each unknown’s ISD MALDI mass spectrum, fragment ions are referred to by cn and yn designations appropriate to the actual sequences of each peptide.
Unknown Peptide 1. The smallest of the unknown peptides submitted for ISD MALDI structural characterization was determined experimentally to have an average molecular mass of 1076.20 Da. Although low molecular mass peptides limit the number of amino acid residues that must be determined, they also pose a difficult problem for ISD MALDI sequencing. Matrix-related ion signals typically obscure low m/z ISD MALDI fragment ions, with the exact cutoff depending on the matrix employed. This typically means that the last 3-5 possible fragment ions in a series cannot be determined from an ISD MALDI mass spectrum. This requires a peptide to have a long enough sequence length so that the amino acid sequence from both ends
of the peptide can be overlapped to determine the peptide’s entire sequence. A peptide such as unknown 1, with molecular mass of about 1000 Da, typically is the smallest peptide (9 or 10 amino acid residues) that allows for enough overlap to potentially obtain a complete sequence. Unknown peptide 1 was prepared in a DHB matrix to minimize the m/z range obscured by matrix related ions (m/z < 400), and its ISD MALDI mass spectrum is shown as Figure 1. By comparing mass differences between the different ions present, two fragment ion series could be distinguished. These correspond to the expected yn and cn ion series. No corresponding zn ion series was evident. In the absence of the zn ion series, the yn ion series was distinguished from the cn ion series by differences in expected m/z of the fragment ions. For a “standard” terminated peptide, a cn fragment ion is shifted to 1 m/z unit lower than a yn fragment ion would be for equivalent amino acid losses. This approach for distinguishing yn and cn ion series can be utilized, except in the case of peptides modified on their C-terminus by amidation. While it is possible to use this approach to distinguish yn and cn ion series, it places a much more stringent requirement on accurate mass calibration, which can be more difficult to achieve for larger peptides. The identified yn and cn ion series fragments are shown in Figure 1. Unassigned ions in this and subsequent spectra are of unknown origin. They may result from unidentified bond cleavages or low-level impurities in the sample or matrix that give rise to ion signals. While fragment ions for the first five residue losses from the N-terminus of the peptide were readily identified (y8-y4), the C-terminus was more difficult to assign. The c3-c5 fragment ions were readily identified, and the c7 ion (and the lack of a c6 ion) suggested a serine followed by a proline residue in this part of the sequence. 
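The roughly 1 m/z unit offset between cn and yn ions for equivalent amino acid losses follows from their compositions: a cn ion carries the residues plus NH3 and a proton, a yn ion the residues plus H2O and a proton. The sketch below illustrates this with the residues of the actual sequence of unknown peptide 1, which has arginine at both termini. The mass constants are common average values and are not necessarily those the authors used, so the absolute m/z values may differ slightly from the published calculated values; the function names are hypothetical.

```python
WATER = 18.015    # Da, average mass of H2O
AMMONIA = 17.031  # Da, average mass of NH3
PROTON = 1.007    # Da, assumed mass of the ionizing proton


def c_ion_mz(residue_masses: list) -> float:
    """Singly charged cn ion: N-terminal residues + NH3 + proton."""
    return sum(residue_masses) + AMMONIA + PROTON


def y_ion_mz(residue_masses: list) -> float:
    """Singly charged yn ion: C-terminal residues + H2O + proton."""
    return sum(residue_masses) + WATER + PROTON


# Arg Pro Hyp Gly Phe Ser Pro Phe Arg, standard average residue masses.
PEPTIDE_1 = [156.19, 97.12, 113.12, 57.05, 147.18, 87.08, 97.12, 147.18, 156.19]

c8 = c_ion_mz(PEPTIDE_1[:8])  # loss of the C-terminal Arg
y8 = y_ion_mz(PEPTIDE_1[1:])  # loss of the N-terminal Arg
print(round(y8 - c8, 3))      # offset of about 1 m/z unit
```

Because the offset is small relative to the fragment ion peak widths at the 400-800 resolution quoted in the Experimental Section, the two ions would not be expected to resolve.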
The final portion of the sequence at the C-terminus was fit to the fragment ion data by assuming a phenylalanine followed by a terminal arginine (c7 and c8) residue. This combination of residues meant that any c8 and y8 fragment ions would be overlapped due to both ends of the peptide being terminated with an arginine residue. A small increase in the expected peak width for the single observable fragment ion is consistent with overlapping c8/y8 fragment ion signals. No other combination of residues (including a possible proline residue) for this portion of the sequence was consistent with the ISD fragment ion data. The final proposed sequence for unknown peptide 1 is shown in Table 1, along with the actual sequence of the peptide. The only error made in the proposed sequence was that the Leu/Ile residue was actually a hydroxyproline residue. This error resulted from not considering isomass nonstandard amino acids. The experimental fragment ion data provided in Table 1 also fit well to the proposed sequence in terms of expected mass accuracies.
Unknown Peptide 2. It generally is easier to obtain sequence information via ISD MALDI with moderate- to larger-sized peptides (3-5 kDa) than with small peptides. This is due both to the increase in the potential sequence overlap from both ends of the peptide and to more intense ISD MALDI fragment ion yields for larger peptides. The largest of the unknown peptides (2) provided for ISD MALDI sequence analysis was experimentally determined to have a molecular mass of 3482.85 Da. Utilizing ferulic acid as the matrix yielded an ISD MALDI mass spectrum (Figure 2)
Figure 1. ISD MALDI mass spectrum of unknown peptide 1 utilizing a 2,5-dihydroxybenzoic acid matrix: 340-ns extraction delay, 24-kV ion source bias voltage, and a 1.35-kV extraction pulse voltage.

Table 1. MALDI Fast Metastable Decay Ions Observed for Unknown Peptide 1 (Measured MW = 1076.20)a in 2,5-Dihydroxybenzoic Acid Matrix

                      N-terminal loss (yn series)        C-terminal loss (cn series)
proposed   actual     ion   calcd    exptl    error      ion   calcd    exptl    error
Arg        Arg        y8    920.90   920.79   -0.11      c1    174.16
Pro        Pro        y7    823.78   823.83   +0.05      c2    271.28
Leu/Ile    Hyp        y6    710.62   710.62    0.00      c3    384.44   384.13   -0.31
Gly        Gly        y5    653.57   653.62   +0.05      c4    441.49   441.32   -0.17
Phe        Phe        y4    506.47   506.68   +0.21      c5    588.59   588.58   -0.01
Ser        Ser        y3    419.39                       c6    675.67
Pro        Pro        y2    322.27                       c7    772.79   772.84   +0.05
Phe        Phe        y1    175.17                       c8    919.89   920.79   +0.90
Arg        Arg
av abs errorb               0.08 ± 0.18                        0.14 ± 0.10

a True MW = 1076.27. b cn error does not include c8 fragment.
exhibiting an intense and easily identified cn ion series that was nearly complete (c28-c17 and c14-c6). A much less complete ion series of lower intensity was also identified between m/z 3000 and 2000. Ions 15 m/z units lower (zn series ions) than several of the ions of this second ion series confirmed this as a yn ion series. The lower intensity observed for the yn ion series is consistent with previous observations41 that cn ion series become more dominant in ISD MALDI with larger m/z peptides and proteins. This cn ion series allowed for a complete C-terminus sequence to be proposed to within the last six residues of the N-terminus, with the exception of two adjacent residues in the middle of the peptide. The doubly protonated molecular ion and the associated matrix adducting ions are intense enough to obscure any fragment ions in this portion of the spectrum. Use of a different matrix (α-cyano-4-hydroxycinnamic acid) minimized matrix adducting to the peptide sufficiently to determine the c16 fragment ion, but the doubly protonated molecular ion still overlapped with one necessary fragment ion (c15). The resulting m/z difference (201.86) between the two observable cn series fragment ions (c16-c14) suggested four possible amino acid pairs (aspartic acid/serine; alanine/methionine; valine/cysteine; threonine/threonine). Normally, the yn ion series could be used to establish the correct amino acid pair. However, this peptide also has the yn series ion (y14) necessary to identify the correct amino acid pair obscured by the doubly protonated molecular ion. This effectively prevents assigning this portion of the sequence to other than one of the four possible amino acid pairs.
The N-terminus of this peptide was more difficult to determine. This is due to a combination of the weak yn ion series and the large number of ion signals just below the singly protonated molecular ion. These ion signals (of indeterminate origin) make it difficult to determine some of the weak yn series ions necessary to complete the sequence of the N-terminus of the peptide. The y23, y24, and y25 ions allowed for the identification of two of the six N-terminus residues (phenylalanine and threonine), but what was presumed to be the y26 ion was only a shoulder on the more intense c26 ion. Although an accurate mass assignment could not be made on the raw data, the approximate mass loss was only consistent with a glycine residue. A partially resolved y28 ion suggested an N-terminal histidine residue, but no well-resolved y27 ion could be identified to easily complete the sequence. An ion thought to be a y27 ion and consistent with a threonine residue suggested that the final two residues were a threonine and asparagine. The final proposed sequence is provided in Figure 3, along with the correct sequence. The only mistake in the sequence was the misidentification of the y27 ion. The true sequence actually consisted of a serine and a glutamine instead of the proposed threonine and asparagine (note that these are isomass residue pairs). The experimental fragment ion data used to determine the above sequence are provided in Table 2 and show that excellent mass accuracy is obtained for all of the fragment ions. Of the 29 amino acid residues in this sequence, 25 were correctly identified (including the indistinguishable isomass residues). Two additional residues were limited to one of four isomass residue pairs. The remaining two residues were incorrectly assigned.
Figure 2. ISD MALDI mass spectrum of unknown peptide 2 utilizing a ferulic acid matrix: 340-ns extraction delay, 24-kV ion source bias voltage, and a 1.51-kV extraction pulse voltage.
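The four candidate residue pairs for the 201.86 gap between the c16 and c14 ions can be reproduced by a brute-force search over residue pairs. This is a sketch under stated assumptions, not the authors' method: it uses standard average residue masses, treats Leu and Ile as one entry, and applies a hypothetical 0.5-Da tolerance; the function name is also hypothetical.

```python
from itertools import combinations_with_replacement

# Average residue masses in Da for the common amino acids (standard values).
RESIDUE_MASS = {
    "Gly": 57.05, "Ala": 71.08, "Ser": 87.08, "Pro": 97.12, "Val": 99.13,
    "Thr": 101.10, "Cys": 103.14, "Leu/Ile": 113.16, "Asn": 114.10,
    "Asp": 115.09, "Gln": 128.13, "Lys": 128.17, "Glu": 129.12,
    "Met": 131.19, "His": 137.14, "Phe": 147.18, "Arg": 156.19,
    "Tyr": 163.18, "Trp": 186.21,
}


def residue_pairs(gap: float, tol: float = 0.5) -> list:
    """Unordered residue pairs whose summed mass is within tol of gap."""
    names = sorted(RESIDUE_MASS)
    return [(a, b)
            for a, b in combinations_with_replacement(names, 2)
            if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - gap) <= tol]


for pair in residue_pairs(201.86):
    print(pair)
```

The search returns unordered pairs only, so it cannot say which residue comes first; that ordering is exactly the information the obscured c15 or y14 ion would have supplied.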
Figure 3. Comparison of the proposed sequence to the actual sequence for unknown peptide 2. Isomass residue pairs that were either incorrectly identified or could not be determined are in brackets.
Unknown Peptide 3. The final unknown peptide to be discussed in detail had an experimentally measured molecular mass of 1536.56 Da. Even though it is not a particularly large peptide, it provided intense cn, yn, and zn series fragment ions, as evidenced by the ISD mass spectrum (Figure 4) utilizing DHB
Table 2. MALDI Fast Metastable Decay Ions Observed for Unknown Peptide 2 (Measured MW = 3482.85)a in Ferulic Acid Matrix
| residue | mass | N-terminal loss (yn series): calcd | exptl | error | C-terminal loss (cn series): calcd | exptl | error |
|---|---|---|---|---|---|---|---|
| His | 137.14 | 3346.65 | | | 155.14 | | |
| Ser | 87.08 | 3259.57 | | | 242.22 | | |
| Gln | 128.13 | 3131.44 | 3131.20 | -0.24 | 370.35 | | |
| Gly | 57.05 | 3074.39 (y25) | 3074.76 | +0.37 | 427.40 | | |
| Thr | 101.11 | 2973.28 | 2973.81 | +0.53 | 528.51 (c5) | | |
| Phe | 147.18 | 2826.10 | 2826.48 | +0.38 | 675.69 | 676.06 | +0.37 |
| Thr | 101.11 | 2724.99 | 2725.09 | +0.10 | 776.80 | 776.28 | -0.52 |
| Ser | 87.08 | 2637.91 | 2638.44 | +0.53 | 863.88 | 863.67 | -0.21 |
| Asp | 115.09 | 2522.82 (y20) | 2523.11 | +0.39 | 978.97 | 978.94 | -0.03 |
| Tyr | 163.18 | 2359.64 | 2359.55 | -0.09 | 1142.15 (c10) | 1142.06 | -0.09 |
| Ser | 87.08 | 2272.56 | 2272.50 | -0.06 | 1229.23 | 1229.19 | -0.04 |
| Lys | 128.15 | 2144.41 | | | 1357.38 | 1357.34 | -0.04 |
| Tyr | 163.18 | 1981.23 | 1980.98 | -0.25 | 1520.56 | 1520.61 | +0.05 |
| Leu | 113.16 | 1868.07 (y15) | 1868.31 | +0.24 | 1633.72 | 1633.90 | +0.18 |
| Asp | 115.09 | 1752.98 | | | 1748.81 (c15) | | |
| Ser | 87.08 | 1665.90 | 1665.49 | -0.41 | 1835.89 | 1835.76 | -0.13 |
| Arg | 156.19 | 1509.71 | 1509.93 | +0.22 | 1992.08 | 1992.39 | +0.31 |
| Arg | 156.19 | 1353.52 | | | 2148.27 | 2148.63 | +0.36 |
| Ala | 71.08 | 1282.44 (y10) | | | 2219.35 | 2219.66 | +0.29 |
| Gln | 128.13 | 1154.31 | 1154.46 | +0.15 | 2347.48 (c20) | 2347.70 | +0.22 |
| Asp | 115.09 | 1039.22 | 1039.76 | +0.54 | 2462.57 | 2462.96 | +0.39 |
| Phe | 147.18 | 892.04 | | | 2609.75 | 2610.15 | +0.40 |
| Val | 99.13 | 792.91 | | | 2708.88 | 2709.16 | +0.28 |
| Gln | 128.13 | 664.78 (y5) | | | 2837.01 | 2837.15 | +0.14 |
| Trp | 186.21 | 478.57 | | | 3023.22 (c25) | 3023.35 | +0.13 |
| Leu | 113.16 | 365.41 | | | 3136.38 | 3136.20 | -0.18 |
| Met | 131.20 | 234.21 | | | 3267.58 | 3267.29 | -0.29 |
| Asn | 114.10 | 120.11 | | | 3381.68 | 3381.68 | 0.00 |
| Thr | 101.11 | | | | | | |
| av abs error | | | | 0.30 ± 0.35 | | | 0.21 ± 0.25 |
a True MW = 3482.79.
as a matrix. The zn series ions are particularly strong and easily allow identification of the corresponding yn ion series. This combination of a strong yn/zn ion series allowed easy identification of the N-terminus sequence down to the matrix overlap region of the spectrum (y15-y6). The cn ion series was also very strong and apparently easy to discern. Unfortunately, two cn series ions were not correctly identified initially (c13 and c11) in the spectrum. This error was further complicated by the fact that the molecular mass gaps produced by these missed fragments correspond to the expected molecular mass for other residues. For this reason, the missed fragment ions were not initially detected. This resulted in an initially proposed sequence (Figure 5) that had tryptophan/asparagine substituted for the correct isomass sequence of glutamic acid/glycine/glycine/glycine. This error in identifying fragment ions is an inherent potential problem with manual interpretation of fragmentation ion data. In manual interpretation, it is very difficult to consider all potential residue combinations for comparison with experimental data. It is expected that, with an automated system where software could rapidly check all possible combinations of potential sequences, the correct sequence would also have been reported as being consistent with the fragmentation data. In such a case, the presence of the c13 and c11 fragment ions in the ISD mass spectrum would have suggested the correct sequence. Prior to submission of this proposed sequence for verification of the true sequence (also see Figure 5), these data were chosen to test the potential utility of combining partial ISD sequencing
with database searching methods. Already there exist large databases of known peptide and protein sequences readily accessible via the Internet. The rapidly enlarging genomic databases currently being compiled for various organisms offer the potential of very large, searchable databases produced by translation of genomic data to corresponding protein sequences. This approach could vastly simplify sequence determination, in that only partial sequence data would be required for searching such databases. Database searching was performed by utilizing the PROWL World Wide Web (www) resource developed by Brian Chait at Rockefeller University (http://chaitsgi.rockefeller.edu) and Ron Beavis at the Skirball Institute at New York University Medical Center (http://128.122.10.5). The PROWL resource provides effective software tools for interfacing with the wide variety of Internet-based protein databases. The protein information tool at the site allows manual entering of a proposed sequence for searching versus different databases. The proposed sequence for unknown 3 was entered and searched against the OWL database.48 The OWL database is a nonredundant protein sequence database produced by combining the entries from several different large protein and translated oligonucleotide sequence databases. The search algorithm employed allows the inclusion of ambiguities in a proposed sequence, such as the inability to distinguish leucine from isoleucine. Upon searching the initially proposed sequence for unknown 3, no match was found in the database. The lack of a sequence match raised questions as to the correctness of the proposed sequence. The N-terminal data were considered the most reliable due to the presence of both yn and zn series fragment ions. The partial sequence determined by these data (ADSGEGDF[L,I]AE) was searched against the OWL database and showed a single match to a portion of the α subunit of the 95-kDa human fibrinogen protein (...GTAWTADSGEGDFLAEGGGVR...). In addition, the database also provided the information that a known active peptide, fibrinopeptide A, is produced by cleavage of the fibrinogen protein at positions 20-35 (ADSGEGDFLAEGGGVR). This additional information from the database search allowed for the identification of the missed c13 and c11 fragment ions and suggested that the correct sequence for unknown 3 was that of fibrinopeptide A. The mass accuracies of the fragment ion data based on the proposed sequence as modified by the database search are quite good and are presented as Table 3. This sequence was subsequently confirmed as the unknown’s correct sequence.
(48) Bleasby, A. J.; Wootton, J. C. Protein Eng. 1990, 3, 153.
Figure 4. ISD MALDI mass spectrum of unknown peptide 3 utilizing a 2,5-dihydroxybenzoic acid matrix: 340-ns extraction delay, 24-kV ion source bias voltage, and a 1.37-kV extraction pulse voltage. The zn ions are marked with asterisks.
Figure 5. Comparison of the initially proposed and database-modified sequences to the actual sequence for unknown peptide 3.
Unknown Peptides 4-6. The efforts to obtain sequencing information on the remaining three unknown peptides that were provided were not as successful as for unknowns 1-3. Unknown peptide 4 was determined experimentally to have a molecular mass of 2150.45 Da. This unknown provided strong cn and yn fragment ions. A major contributor to the sequencing errors made for this sample was the fact that this was the first of the unknowns for which ISD MALDI sequencing was attempted. The most significant error in assigning a sequence for this unknown was the incorrect assignment of the cn and yn ion series. This resulted in proposing a sequence that was reversed from the true sequence. This error primarily occurred from a larger than normal mass calibration error of the ISD mass spectrum and the failure to consider the identifiable zn fragment ion series to confirm the identity of the yn ion series (as was implemented for subsequent unknowns). Additional significant errors were made for this unknown in assigning the terminal amino acid residues, particularly on the N-terminus. Of the 20 total residues for this unknown, 14 of the residues were correctly identified, although they were in reversed order from the true sequence.
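The reversal reported for unknown 4 follows directly from how a fragment ladder is read. The Python sketch below is an assumed illustration, not the manual procedure used in this study. It assigns the nearest residue to each gap between adjacent ions and then orients the result according to the series the ladder is believed to belong to. Because the fragment data for unknown 4 are not tabulated here, the experimental c7-c15 ions of unknown 3 (Table 3) serve as input. The function names, the nearest-match rule, and the use of standard average residue masses are all assumptions.

```python
# Standard average residue masses (Da); "L" stands for Leu or Ile, which are isomass.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "N": 114.10, "D": 115.09,
    "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.20, "H": 137.14,
    "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}

def read_ladder(masses):
    """Assign the nearest residue to each gap between adjacent ladder ions."""
    ordered = sorted(masses)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        residues.append(min(RESIDUE_MASS, key=lambda r: abs(RESIDUE_MASS[r] - gap)))
    return "".join(residues)

def to_sequence(ladder_residues, series):
    """Orient a ladder read from low to high mass as an N-to-C sequence."""
    if series == "c":  # c ions grow from the N-terminus
        return ladder_residues
    if series == "y":  # y ions grow from the C-terminus
        return ladder_residues[::-1]
    raise ValueError("series must be 'c' or 'y'")

# Experimental c7-c15 ions of unknown 3 (Table 3).
ions = [649.54, 796.78, 909.78, 980.93, 1110.29, 1167.36, 1224.43, 1281.12, 1380.42]
ladder = read_ladder(ions)
print(to_sequence(ladder, "c"))  # FLAEGGGV
print(to_sequence(ladder, "y"))  # VGGGEALF
```

The residues recovered are the same in both cases. Only the reading direction changes, which is why a cn/yn misassignment yields a sequence that is largely correct in composition but reversed in order.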
Table 3. MALDI Fast Metastable Decay Ions Observed for Unknown Peptide 3 (Measured MW = 1536.57)a in 2,5-Dihydroxybenzoic Acid Matrix
| residue | mass | N-terminal loss (yn series): calcd | exptl | error | C-terminal loss (cn series): calcd | exptl | error |
|---|---|---|---|---|---|---|---|
| Ala | 71.08 | 1466.57 (y15) | 1466.79 | +0.22 | 89.09 | | |
| Asp | 115.09 | 1351.48 | 1351.51 | +0.03 | 204.18 | | |
| Ser | 87.08 | 1264.40 | 1264.76 | +0.36 | 291.26 | | |
| Gly | 57.05 | 1207.34 | 1207.69 | +0.35 | 348.32 | | |
| Glu | 129.12 | 1078.22 | 1078.33 | +0.11 | 477.44 (c5) | | |
| Gly | 57.05 | 1021.16 (y10) | 1021.26 | +0.10 | 534.50 | | |
| Asp | 115.09 | 906.07 | 905.98 | -0.09 | 649.59 | 649.54 | -0.05 |
| Phe | 147.18 | 758.89 | 759.12 | +0.23 | 796.77 | 796.78 | +0.01 |
| Leu/Ile | 113.16 | 645.73 | 645.36 | -0.37 | 909.93 | 909.78 | +0.15 |
| Ala | 71.08 | 574.65 | 574.59 | -0.06 | 981.01 (c10) | 980.93 | -0.08 |
| Glu | 129.12 | 445.53 (y5) | | | 1110.13 | 1110.29 | +0.16 |
| Gly | 57.05 | 388.47 | | | 1167.19 | 1167.36 | +0.17 |
| Gly | 57.05 | 331.41 | | | 1224.25 | 1224.43 | +0.18 |
| Gly | 57.05 | 274.35 | | | 1281.31 | 1281.12 | -0.19 |
| Val | 99.14 | 175.21 | | | 1380.45 (c15) | 1380.42 | -0.03 |
| Arg | 156.19 | | | | | | |
| av abs error | | | | 0.19 ± 0.24 | | | 0.11 ± 0.14 |
a True MW = 1536.64.
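The ambiguity-tolerant lookup described for unknown 3 can be sketched in a few lines of Python. This is only an assumed stand-in for the PROWL/OWL search, whose actual algorithm is not described in this paper. The partial sequence is turned into a regular expression in which every leucine or isoleucine position accepts either residue, and each database entry is scanned for matches. The function names and the two-entry toy database are hypothetical; only the fibrinogen fragment is taken from the text, and the second entry is an arbitrary placeholder.

```python
import re

def ambiguity_pattern(partial):
    """Compile a regex in which every L or I matches either Leu or Ile."""
    parts = ["[LI]" if aa in "LI" else re.escape(aa) for aa in partial]
    return re.compile("".join(parts))

def find_hits(partial, database):
    """Return (entry name, 1-based start position) for each match."""
    pattern = ambiguity_pattern(partial)
    hits = []
    for name, sequence in database.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1))
    return hits

# Toy stand-in for a protein database. Only the fibrinogen fragment comes
# from the text; the second entry is an arbitrary placeholder sequence.
database = {
    "fibrinogen_fragment": "GTAWTADSGEGDFLAEGGGVR",
    "placeholder_entry": "MKTAYIAKQRQISFVKSHFSRQ",
}

print(find_hits("ADSGEGDFLAE", database))  # [('fibrinogen_fragment', 6)]
print(find_hits("ADSGEGDFIAE", database))  # same hit: I is treated like L
```

The reported position is counted within the toy fragment, not within the full fibrinogen chain, so it does not correspond to the 20-35 numbering quoted from the database.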
The analyses of the final two unknown peptides (5 and 6) were complicated by the apparent degradation of the samples, presumably during shipping or the subsequent storage at 0 °C (3-4 weeks) prior to analysis. The ISD mass spectrum of unknown 5 exhibited an intense series of ions which seemed to correspond to a bn ion series. As this type of fragmentation has not been observed previously in ISD spectra, some form of sample degradation was suspected. This was further suggested by a significant increase in the relative abundance of the “bn” ions upon further storage of the sample (again at 0 °C) for an additional several weeks. Unknown 6 also exhibited these “bn” ions to an even greater extent when it was examined. The continuous ion extraction MALDI mass spectra also showed these same “fragment” ions, which strongly suggested that significant degradation had occurred with both of these samples. Due to this sample degradation, these remaining two samples could not be included in this study.
CONCLUSIONS
The results from unknown samples 1-3 show that very useful sequence information can be obtained from unknown peptide samples that have been first purified by HPLC. Several improvements can be made in the future to minimize potential errors in sequencing with ISD MALDI. Utilization of a reflectron-based TOF mass spectrometer for performing ISD MALDI has already been shown to afford unit mass resolution for fragment ions up to an m/z of at least 3000.49 The resulting mass accuracy of the data also is improved to better than ±0.1 Da when a reflectron is used. This mass resolution improvement would reduce the problem of potentially overlapping fragment ions. The improved mass accuracy would allow easier identification of the cn and yn ion series on the basis of expected mass differences in the fragment ion series. The development of a computer analysis program to replace the manual sequencing approach used in these studies is needed for any routine use of ISD MALDI fragment ion data for sequencing. Several of the errors in the manual sequencing approach were due to the difficulty in comparing all possible sequence fits to the ISD data. It is likely that a well-written computer analysis program would help to minimize such errors. Such a program should report all possible sequences that are consistent with the fragmentation data. Even with manual sequencing, the ISD MALDI data are particularly useful in determining the middle portions of peptides. For purified peptides, ISD fragmentation data should prove very complementary to PSD MALDI in determining unknown sequences due to the significant differences in the observed fragment ions. Sample usage could also be reduced to at least the 1-pmol level by utilizing smaller area sample targets, such as those employed on commercial MALDI TOF-MS instruments. Although not an MS/MS technique, the ease and speed of acquiring ISD fragmentation data are attractive for applications that can provide a purified analyte. Future use of partial sequence information derived by ISD MALDI for rapidly and accurately searching either protein and/or translated genome sequence databases to assist in sequencing and identifying proteins also offers considerable promise.
(49) Cornett, D. S. Presented at the 45th ASMS Conference on Mass Spectrometry and Allied Topics, Palm Springs, CA, June 1-5, 1997; Poster MPJ 275.
ACKNOWLEDGMENT
This research was supported, in part, by grants from the National Institutes of Health, Divisions of Research Resources (RR05311) and General Medical (GM47914), and with funds provided by Utah State University. Additional funds to support these studies were also provided by the Hewlett-Packard Co., Palo Alto, CA. This support is gratefully acknowledged.
Received for review October 21, 1997. Accepted January 9, 1998. AC971158D