
Anal. Chem. 2000, 72, 2337-2350

De Novo Peptide Sequencing by Two-Dimensional Fragment Correlation Mass Spectrometry

Zhongqi Zhang* and James S. McElvain

Analytical Research & Development Department, Amgen Inc., One Amgen Center Drive, Thousand Oaks, California 91320

A novel concept of two-dimensional fragment correlation mass spectrometry and its application to peptide sequencing are described. The daughter ion (MS2) spectrum of a peptide contains the sequence information of the peptide. However, deciphering the MS2 spectrum, and thus deriving the peptide sequence, is complex because of the difficulty in distinguishing the N-terminal fragments (e.g., b series) from the C-terminal fragments (e.g., y series). When a granddaughter ion (MS3) spectrum of a particular daughter ion is acquired, all fragment ions of the opposite terminus are eliminated from the MS3 spectrum. However, some internal fragments of the peptide will appear in the MS3 spectrum. Because internal fragments are rarely present in the MS2 spectrum, the intersection (a spectrum containing peaks that are present in both spectra) of the MS2 and MS3 spectra should contain only fragments of the same terminal type. A two-dimensional plot of the MS2 spectrum versus the intersection spectra (2-D fragment correlation mass spectrum) often gives enough information to derive the complete sequence of a peptide. This paper describes this novel technique and its application in sequencing cytochrome c and apomyoglobin. For a tryptic digest of cytochrome c, ∼78% of the protein sequence was determined. For the Glu-C/tryptic digest of apomyoglobin, ∼66% of the protein sequence was determined.

Tandem mass spectrometry (MS/MS), combined with computer algorithms for database searching, has been widely used as a tool for the identification of proteins.1-6 However, when the protein to be identified is not in a protein database, or when full characterization of a protein is desirable, de novo sequencing is usually required. Until recently, most de novo peptide sequencing has been performed by automated Edman degradation, which is a time-consuming procedure and requires relatively large amounts of highly purified material. Alternatively, different techniques such as C-terminal sequencing using carboxypeptidase digestion combined with mass spectrometry (MS)7,8 or peptide ladder sequencing using a ladder-generating chemistry and mass spectrometry9 have been used to derive the amino acid sequence of a peptide. Both methods, however, require purified peptides. Tandem mass spectrometry, on the other hand, is capable of sequencing peptides from complex mixtures due to the inherent resolving power of this analytical technique. Although MS/MS has been successfully used to sequence many different types of peptides,10-21 de novo peptide sequencing by MS/MS is not routinely used because the interpretation of the resulting data is so complex.

One of the major obstacles in implementing MS/MS in de novo peptide sequencing is distinguishing the C-terminal fragments from the N-terminal fragments. In most laboratories where MS/MS is used for peptide sequencing, an extra wet chemistry step is usually implemented to distinguish fragment ions of different terminal types. One approach has been to derivatize the peptide on either its N-terminus or C-terminus in order to direct the MS fragmentation, simplify the CAD spectra, or cause mass shifts to only specific fragments.10,20-27 Another procedure has been to perform the proteolysis in a buffer containing H₂¹⁸O so that the C-terminus of the peptide is labeled with ¹⁸O and is easily distinguished from the N-terminal fragments by its distinct isotope peaks.28 Cleavage of the peptide C-terminus with carboxypeptidase is also an effective means of labeling the C-terminal fragments and distinguishing them from the N-terminal fragments.29 In most of the approaches described above, extra wet chemistry steps are required, which limits the routine implementation of these techniques from the automation standpoint.

Use of multiple stages of mass spectrometry (MSn) has also been attempted when sequencing peptides. For example, Lin and Glish demonstrated that the C-terminal sequence of a peptide can be obtained from lithium- and sodium-cationized peptides by multiple stages of mass spectrometry in both quadrupole ion trap and ion cyclotron resonance mass spectrometers.30 Even before ion trap instruments became widely available, Cooks and co-workers used a hybrid BEQQ instrument and a pentaquadrupole instrument to perform MS3 experiments and obtained reaction intermediate spectra of several peptides.31,32 When performing a reaction intermediate scan of a peptide, one selects the precursor ion with the first stage of MS and selects a final product ion with the third stage of MS. Intermediate ions connecting the selected precursor ion and final product ion can then be displayed by obtaining the spectra from the second stage of MS. For example, if an N-terminal immonium ion fragment of a peptide is selected as the final product ion, only N-terminal fragments are observed in the corresponding reaction intermediate spectra. The reaction intermediate scan technique greatly simplifies the fragment spectra of the original peptide. Recently, ion-trapping instruments such as Fourier transform ion cyclotron resonance mass spectrometers and especially quadrupole ion trap mass spectrometers have become commercially available and have been widely used by many laboratories.

* Corresponding author: (e-mail) [email protected]; (fax) (805) 447-8690.
(1) Lamond, A. I.; Mann, M. Trends Cell Biol. 1997, 7, 139-142.
(2) Patterson, S. D. Biochem. Soc. Trans. 1997, 25, 255-62.
(3) Jungblut, P.; Thiede, B. Mass Spectrom. Rev. 1997, 16, 145-162.
(4) Yates, J. R. J. Mass Spectrom. 1998, 33, 1-19.
(5) Kuster, B.; Mann, M. Curr. Opin. Struct. Biol. 1998, 8, 393-400.
(6) Yates, J. R. Electrophoresis 1998, 19, 893-900.
(7) Tsugita, A.; van den Broek, R.; Przybylski, M. FEBS Lett. 1982, 137, 19-24.
(8) Bradley, C. V.; Williams, D. H.; Hanley, M. R. Biochem. Biophys. Res. Commun. 1982, 104, 1223-30.
(9) Chait, B. T.; Wang, R.; Beavis, R. C.; Kent, S. B. Science 1993, 262, 89-92.
(10) Hunt, D. F.; Yates, J. R., 3rd; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-7.
(11) Johnson, R. S.; Biemann, K. Biochemistry 1987, 26, 1209-14.
(12) Johnson, R. S.; Mathews, W. R.; Biemann, K.; Hopper, S. J. Biol. Chem. 1988, 263, 9589-97.
(13) Hopper, S.; Johnson, R. S.; Vath, J. E.; Biemann, K. J. Biol. Chem. 1989, 264, 20438-47.
(14) Biemann, K. Methods Enzymol. 1990, 193, 455-79.
(15) Hunt, D. F.; Henderson, R. A.; Shabanowitz, J.; Sakaguchi, K.; Michel, H.; Sevilir, N.; Cox, A. L.; Appella, E.; Engelhard, V. H. Science 1992, 255, 1261-3.
(16) Papayannopoulos, I. A.; Biemann, K. Protein Sci. 1992, 1, 278-88.
(17) Medzihradszky, K. F.; Gibson, B. W.; Kaur, S.; Yu, Z. H.; Medzihradszky, D.; Burlingame, A. L.; Bass, N. M. Eur. J. Biochem. 1992, 203, 327-39.
(18) Papov, V. V.; Gravina, S. A.; Mieyal, J. J.; Biemann, K. Protein Sci. 1994, 3, 428-34.
(19) Clauser, K. R.; Hall, S. C.; Smith, D. M.; Webb, J. W.; Andrews, L. E.; Tran, H. M.; Epstein, L. B.; Burlingame, A. L. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 5072-6.
(20) Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Nature 1996, 379, 466-469.
(21) Lingner, J.; Hughes, T. R.; Shevchenko, A.; Mann, M.; Lundblad, V.; Cech, T. R. Science 1997, 276, 561-7.
(22) Vath, J. E.; Biemann, K. Int. J. Mass Spectrom. Ion Processes 1990, 100, 287-299.
(23) Wetzel, R.; Halualani, R.; Stults, J. T.; Quan, C. Bioconjugate Chem. 1990, 1, 114-22.
(24) Wagner, D. S.; Salari, A.; Gage, D. A.; Leykam, J.; Fetter, J.; Hollingsworth, R.; Watson, J. T. Biol. Mass Spectrom. 1991, 20, 419-25.

10.1021/ac000226k CCC: $19.00. Published on Web 05/04/2000. © 2000 American Chemical Society.
Analytical Chemistry, Vol. 72, No. 11, June 1, 2000, 2337
One of the advantages of using an ion-trapping MS instrument to interpret the sequence of a peptide is its ability to perform MSn experiments. However, a practical algorithm has not been available to derive a peptide’s sequence from this complex set of data. In this paper, we describe an experimental design that takes advantage of an ion-trapping instrument to perform MS3 experiments and a computer algorithm for automated data analysis to facilitate de novo peptide sequencing. The method described here can perform analysis on mixtures, involves no extra steps of wet chemistry, and may potentially be used for automated de novo peptide sequencing.

(25) Stults, J. T.; Lai, J.; McCune, S.; Wetzel, R. Anal. Chem. 1993, 65, 1703-8.
(26) Zaia, J.; Biemann, K. J. Am. Soc. Mass Spectrom. 1995, 6, 428-436.
(27) Zaia, J.; Chapman, J. R. Methods Mol. B 1996, 61, 29-41.
(28) Schnoelzer, M.; Jedrzejewski, P.; Lehmann, W. D. Electrophoresis 1996, 17, 945-53.
(29) Pfeifer, T.; Rucknagel, P.; Kuellertz, G.; Schierhorn, A. Rapid Commun. Mass Spectrom. 1999, 13, 362-369.
(30) Lin, T.; Glish, G. L. Anal. Chem. 1998, 70, 5162-5.
(31) Schey, K. L.; Schwartz, J. C.; Cooks, R. G. Rapid Commun. Mass Spectrom. 1989, 9, 305-309.
(32) Schwartz, J. C.; Schey, K. L.; Cooks, R. G. Int. J. Mass Spectrom. Ion Processes 1990, 101, 1-20.

METHODS

MS2/MS3 Intersection Spectrum. When an MS/MS (MS2) experiment is performed on a particular peptide, normally a parent ion is selected, isolated, and fragmented into daughter ions. The daughter ions consist primarily of N-terminal and C-terminal fragment ions and rarely contain internal fragments of the peptide. For a low-energy fragmentation process such as in an ion trap instrument with electrospray ionization, the major N-terminal fragments are b series ions and the major C-terminal fragments are y series ions. One of the major obstacles for deriving the amino acid sequence from an MS2 spectrum of a peptide is the difficulty of differentiating the C-terminal fragments from the N-terminal fragments.

The concept of an intersection spectrum is introduced to facilitate the separation and identification of C-terminal and N-terminal fragment ions of a peptide sequence. As illustrated in the schematics in Figure 1, individual daughter ions in an MS2 spectrum of a theoretical peptide are further isolated and fragmented to produce granddaughter ion spectra (MS3 spectra). These MS3 spectra contain two predominant types of fragment ions. Type I includes fragments that have the same terminus as their precursors (the daughter ions), and type II includes the internal fragments (relative to the entire peptide sequence) that have the other terminus of the precursors. These type II internal fragments are rarely found in an MS2 spectrum. Thus, by finding peaks that are common to both MS2 and MS3 spectra, we can generate a spectrum that contains fragments of the same terminal type. We call this spectrum, which contains peaks that are common to the MS2 and MS3 spectra, an MS2/MS3 intersection spectrum. Practically, the intersection spectrum is calculated by taking the geometric mean of the MS2 and MS3 spectra, i.e., multiplying the intensities of the two peaks with the same mass, and then taking the square root of the product. If a particular mass has zero intensity in either one of the MS2 or MS3 spectra, the mass will have zero intensity in the intersection spectrum.
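The geometric-mean intersection described above can be illustrated with a minimal Python sketch. The dictionary representation of a spectrum, the function name `intersection_spectrum`, and the 0.5 m/z matching tolerance are assumptions made for this illustration; the paper does not specify how peaks of "the same mass" are matched.

```python
import math


def intersection_spectrum(ms2, ms3, tol=0.5):
    """Return the MS2/MS3 intersection spectrum as {m/z: intensity}.

    ms2, ms3: dicts mapping m/z to intensity.
    tol: hypothetical m/z tolerance for treating two peaks as the same mass.
    A peak absent from either spectrum gets zero intensity and is omitted.
    """
    result = {}
    for mz2, i2 in ms2.items():
        # Most intense MS3 peak within the tolerance window, if any.
        matches = [i3 for mz3, i3 in ms3.items() if abs(mz3 - mz2) <= tol]
        if matches and i2 > 0 and max(matches) > 0:
            # Geometric mean: square root of the product of intensities.
            result[mz2] = math.sqrt(i2 * max(matches))
    return result


if __name__ == "__main__":
    ms2 = {147.1: 100.0, 275.2: 400.0, 521.3: 900.0}
    ms3 = {147.2: 25.0, 275.1: 100.0, 228.1: 50.0}  # 228.1 mimics an internal fragment
    print(intersection_spectrum(ms2, ms3))
```

In this toy example the peak at 228.1 (present only in the MS3 spectrum) and the peak at 521.3 (present only in the MS2 spectrum) both drop out, which is the behavior the intersection is meant to provide.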
A partial sequence of the peptide can then be read directly from the resulting intersection spectrum based on the differences in mass of the major remaining ions. Performing MS3 scans of many daughter ions of a peptide will generate many intersection spectra, which in turn will generate many partial sequences of different areas of the peptide. Combination of these partial sequences will often give the complete sequence of the peptide.

Figure 1. The concept of MS2/MS3 intersection spectrum when applied to de novo sequencing of a hypothetical peptide IFVQK. The MS2 spectrum of this peptide contains both b and y daughter ions, and the MS3 spectrum of any b or y ion generates the corresponding b or y granddaughter ions as well as internal fragment ions. Calculating the geometric mean of MS2 and MS3 spectra (intersections) eliminates internal fragments and leaves ions of the same terminal type (b or y). Information from these intersection spectra can be used to derive the partial sequences that can be combined to determine the sequence of the entire peptide.

To retrieve maximum sequence information from these spectra, the MS2 and MS3 spectra are reprocessed by adding their complementary spectra to themselves before deriving their intersection spectra. For example, if the MS3 spectrum of an individual b fragment of a peptide is acquired and the charge tends to stay at the N-terminus, then one observes primarily b fragments rather than internal fragments. However, if the charge tends to stay on the C-terminus of the precursor b fragment, the MS3 spectrum will primarily contain internal fragments. As a result, the intersection of this “internal” MS3 spectrum and the MS2 spectrum will not give much useful information about the peptide’s sequence. Fortunately, though, the mass of the b fragment can be calculated from any resulting internal fragment because they are complementary. Specifically, the sum of masses of the two complementary fragments should equal the mass of their precursor with an additional proton. Therefore, to make the MS3 spectrum more informative, it can be reprocessed by adding its complementary spectrum to it. The complementary spectrum of an individual fragment spectrum is obtained by converting each fragment ion into an ion of the same intensity at its complementary mass. The complementary mass is determined by the following formula:

$$M_{\mathrm{complementary}} = M_{\mathrm{precursor}} + M_{\mathrm{proton}} - M \qquad (1)$$

where $M_{\mathrm{complementary}}$ is the complementary mass of the ion, $M_{\mathrm{precursor}}$ is the precursor mass, $M_{\mathrm{proton}}$ is the mass of a proton (1.0073 amu), and $M$ is the mass of each fragment ion. It should be noted that the mass of an extra proton is added in this calculation because each of the two complementary fragments carries one additional proton as a charge. Using this approach, any reprocessed MS3 spectrum will always include fragments that contain the terminal amino acid residue.

In a different example, if an MS2 spectrum of a peptide contains primarily b fragments and only a few y fragments, then the resulting MS3 spectrum of any y fragment will not give much useful sequence information. If the MS2 spectrum is reprocessed as described above, then the reprocessed MS2 spectrum will have both b and y fragments with similar intensities. Therefore, the reprocessed MS3 spectra of both b and y fragments will give significant information on the sequence of the peptide.

Using the experimental conditions described in this report, it is not possible to determine the individual charges of the fragment ions. As a result, when the data from an MS2 spectrum derived from a multiply charged precursor ion are processed, only ions with m/z values greater than half of their precursor mass are used to derive the complementary spectrum. This rule of thumb is used to avoid incorrectly converting multiply charged ions to their complementary masses in the MS2 spectrum. In an MS3 spectrum, however, all ions are used to derive the complementary spectrum because there is a lower probability of observing a multiply charged ion.

Two-Dimensional Fragment Correlation Mass Spectrum. If an MS3 experiment is performed on many fragment ions in an MS2 spectrum, a series of intersection spectra will be produced. These intersection spectra can be combined to generate a two-dimensional (2-D) plot of the MS2 scans versus each intersection spectrum (Figure 2). In this plot, called a 2-D fragment correlation mass spectrum, the MS2 spectrum is represented by the points on the diagonal, and the points off to the left of the diagonal represent the product ions of the ions on the diagonal. The off-diagonal points above any diagonal point represent the precursors of the ion on the diagonal. Thus, each off-diagonal point designates that the corresponding two ions on the diagonal are correlated by a precursor-product relationship. In this 2-D fragment correlation spectrum, a cross section in the horizontal direction represents a product ion spectrum and a cross section in the vertical direction represents a precursor ion spectrum. Because no internal fragments are present in the spectra in either the horizontal or the vertical direction, the connected points represent fragments with the same terminus. Therefore, the amino acid sequence can be read directly from these cross sections according to the mass differences of these points.

Figure 2. Two-dimensional fragment correlation mass spectrum of the hypothetical peptide IFVQK. The horizontal axis is the m/z value of ions in the intersection spectra, and the vertical axis is the m/z value of ions in the MS2 spectrum. Each ion is represented by a circle with its intensity represented by the size of the circle. Points on the diagonal represent fragment ions in the MS2 spectrum, points above a diagonal ion are the precursor ions of the diagonal ion, and points to the left of a diagonal ion are its product ions. Thus, any off-diagonal point indicates a precursor-product relationship between the corresponding two diagonal ions. Because all ions in the 2-D plot are either N-terminal or C-terminal fragments, the sequence of the peptide can be derived from the 2-D plot. In this hypothetical case, the sequence of the peptide is determined three times. The direction of the sequence is determined by the exact match of a lysine residue (K) on the C-terminus and an isoleucine (I) residue on the N-terminus. The two reading directions are indicated by solid lines and dashed lines, respectively.

A major advantage of displaying the data in a 2-D format is that the peptide sequence can be read not only in the horizontal cross sections but also in the vertical cross sections. The sequence of the peptide can be determined from this type of 2-D fragment correlation spectrum by reading in the vertical cross section from high-to-low mass down to the diagonal points and then in the horizontal cross section from high-to-low mass starting from the diagonal points (Figure 2). Alternatively, the low-to-high mass can be read first in the horizontal cross section and then low-to-high mass in the vertical cross section. The correct direction of reading the sequence is established by correlating an individual single-residue fragment or a single-residue-loss fragment with a specific terminus and corresponding terminal residue. The standard procedure used for determining the identity of the amino acid at the N-terminus is to evaluate the mass of a single-residue fragment minus a proton, or the mass difference between the protonated peptide and a single-residue-loss fragment.
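The terminal-residue test (the N-terminal rule just stated and the analogous C-terminal rule, which additionally subtracts a water molecule) can be sketched in Python using the nominal Figure 2 values MH+ = 634 and fragment ions at 521 and 488. The function names, the 0.5 amu tolerance, and the monoisotopic residue mass table are assumptions of this sketch rather than details given in the paper. Note that at this tolerance the isobaric pair I/L and the near-isobaric pair K/Q cannot be told apart, so both members of each pair are reported.

```python
PROTON = 1.0073   # amu, as used in eq 1
WATER = 18.0106   # amu, monoisotopic

# Monoisotopic residue masses of the 20 standard amino acids (amu).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_matching(mass, tol=0.5):
    """Residues whose mass lies within tol (an assumed tolerance) of mass."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - mass) <= tol)


def n_terminal_candidates(mh, fragment_mz, tol=0.5):
    """Test a fragment as a single-residue or single-residue-loss ion (N-terminus)."""
    values = (fragment_mz - PROTON, mh - fragment_mz)
    return sorted({aa for v in values for aa in residues_matching(v, tol)})


def c_terminal_candidates(mh, fragment_mz, tol=0.5):
    """Same test for the C-terminus, with the extra water subtracted."""
    values = (fragment_mz - WATER - PROTON, mh - WATER - fragment_mz)
    return sorted({aa for v in values for aa in residues_matching(v, tol)})


if __name__ == "__main__":
    mh = 634.0  # nominal MH+ of the hypothetical peptide in Figure 2
    print(n_terminal_candidates(mh, 521.0))  # 634 - 521 = 113
    print(c_terminal_candidates(mh, 488.0))  # 634 - 18 - 488 = 128
```

With these inputs the ion at 521 is consistent only with an N-terminal I (or L), and the ion at 488 only with a C-terminal K (or Q), which fixes the reading direction.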
A possible N-terminal residue is identified if either of the two values equals one of the 20 standard amino acid residue masses. The residue at the C-terminus can be determined similarly from the mass of a single-residue fragment minus a water molecule and a proton, or the mass difference between the dehydrated protonated peptide (loss of one water molecule) and a single-residue-loss fragment. In the example shown in Figure 2, where the mass of the protonated hypothetical peptide is 634 (MH+ = 634), the direction of the sequence is determined by the fact that the ion at mass 521 corresponds to an isoleucine (I) at the N-terminus (MH+ - 521 = 113, which is the residue mass of I) and the ion at mass 488 matches a lysine (K) residue at the C-terminus (MH+ - 18 - 488 = 128, which is the residue mass of K).

Symmetrized 2-D Fragment Correlation Spectrum. For better visualization and easier data handling, the 2-D fragment correlation mass spectrum can be replotted as shown in Figure 3. In this new plot, called a symmetrized 2-D fragment correlation spectrum, the off-diagonal points to the left of the diagonal line are symmetrically reflected to the other side of the diagonal line. The new off-diagonal points on the right side of the diagonal line indicate the same precursor-product relationship as before, but the peptide sequence is now easier to determine since only horizontal cross sections are read. Under ideal conditions, the sequence of the peptide can be derived from a single horizontal cross section. In most practical cases, however, data interpretation may be complicated by missing fragment ions or interfering ions. Fortunately, the complete peptide sequence can often be determined by combining all of the cross sections in the 2-D spectrum, which contain overlapping sequence information.

Figure 3. Symmetrized 2-D fragment correlation spectrum of the hypothetical peptide IFVQK. This plot was generated by symmetrically copying the off-diagonal ions from Figure 2 that are above the diagonal to the right of the diagonal. The sequence of the peptide is directly read from the horizontal cross sections of the plot, where the solid and dashed lines indicate the two different groups of scans.

For convenience, each horizontal cross section in the symmetrized 2-D fragment correlation spectrum is identified as a scan. Each scan contains fragments of only one terminal type. Due to the similarity among scans of the same terminal type, one can divide the scans into two groups (each corresponding to one terminal type) based on the similarities of these scans. A score representing the similarity of two spectra can be calculated according to a modified cross-correlation analysis33-35 as shown below. The cross-correlation between two mass spectra, A and B, is calculated using the following modified cross-correlation function:

$$C_{AB}(\mu) = \frac{\sum_{m} A(m)\,B(m+\mu)}{\sum_{m} A(m)\,\sum_{m} B(m)} \qquad (2)$$

where $\mu$ is a displacement mass value between the two spectra (which is varied over a range of values), $C_{AB}(\mu)$ is the cross-correlation value of the spectra A and B at a displacement mass value of $\mu$, and $m$ is the m/z value of each peak in the spectra. Note that the denominator $\sum_{m} A(m)\,\sum_{m} B(m)$ is applied so that the $C_{AB}(\mu)$ value is independent of the absolute scale of spectra A and B. If the two spectra A and B are identical (in a relative scale), the correlation function should maximize at $\mu = 0$. The cross-correlation values are then converted to a similarity score of the two spectra A and B using the following formula:

$$S_{AB} = \frac{C_{AB}(0) - \dfrac{1}{30}\left[\sum_{\mu=-16}^{-2} C_{AB}(\mu) + \sum_{\mu=2}^{16} C_{AB}(\mu)\right]}{\dfrac{C_{AA}(0) + C_{BB}(0)}{2}} \qquad (3)$$

The score, $S_{AB}$, represents the normalized difference between $C_{AB}$ at $\mu = 0$ and the mean of $C_{AB}$ over the ranges $-16 \le \mu \le -2$ and $2 \le \mu \le 16$. When spectra A and B are identical and each peak in the two spectra differs from other peaks by more than 16 amu, $S_{AB}$ equals 1. When spectra A and B are completely different (no common ions), $S_{AB}$ has a value of zero. In implementing this similarity score, the range over which the mean of $C_{AB}$ is calculated (the mean serves as a background of the cross-correlation function) is selected so that the common ammonia and water losses, as well as isotopic peaks, are not included in the background calculation.

(33) Powell, L. A.; Heiftje, G. M. Anal. Chim. Acta 1978, 100, 3-327.
(34) Owens, K. Appl. Spectrosc. Rev. 1992, 27, 1-49.
(35) Eng, J. K.; McCormack, A. L.; Yates, J. R. I. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

Once the similarity of any two scans can be evaluated, the next step is to divide the scans into the two terminal types based on the similarity analysis. For convenience of dividing the scans into the two groups, one reference spectrum was generated for each of the two terminal types to represent the major characteristics of the spectra in the corresponding group. The following procedure describes the steps used to divide scans into the two groups utilizing similarity analysis and the concept of reference spectra. First, the two most similar scans (based on their similarity scores) are identified as belonging to group 1, and the sum of the two spectra is identified as reference spectrum 1 (RS1). All of the other scans are then compared to RS1, and the scan that is most similar to RS1 is identified. If the most similar scan has a similarity score of >0.6, it is considered to belong to group 1 and its spectrum is added into the RS1 to generate a new RS1. The remaining scans are then compared to the new RS1 and the most similar scan is identified. This iterative process continues until none of the remaining scans is similar to the latest RS1 by >0.6.
One last comparison is made of the remaining scans to the latest RS1, and the one that is most different from RS1, yet has relatively large signals, is identified as reference spectrum 2 (RS2) as well as part of group 2. The remaining scans are then compared to both RS2 and RS1 to obtain two similarity scores. The scan that has the greatest difference between the two similarity scores and at the same time is similar to RS2 by >0.6 is considered as group 2 and added into RS2 to form a new RS2. Once again, this iterative process continues until none of the remaining scans is similar to RS2 by >0.6. Finally, all remaining scans are compared to both the final RS1 and RS2, and two similarity scores are calculated. If a scan has an RS1 similarity score of >0.4 and the difference of the two similarity scores is greater than 0.2, it is considered to be part of group 1. If a scan has an RS2 similarity score of >0.4 and the difference of the two similarity scores is greater than 0.2, it is considered to be part of group 2. If a scan does not meet either of the two criteria, its group is not determined and this scan is not used in the sequence determination.
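The cross-correlation function (eq 2) and the similarity score (eq 3) that drive this grouping can be sketched as follows. The sketch assumes each spectrum has already been binned to integer m/z values and stored as a dictionary; that representation and the function names are illustrative choices, not details taken from the paper.

```python
def cross_correlation(a, b, mu):
    """Eq 2: C_AB(mu) for spectra stored as {integer m/z: intensity}."""
    numerator = sum(ia * b.get(m + mu, 0.0) for m, ia in a.items())
    denominator = sum(a.values()) * sum(b.values())
    return numerator / denominator if denominator else 0.0


def similarity(a, b):
    """Eq 3: C_AB(0) minus a background, normalized by the self-correlations."""
    # Background: mean of C_AB over -16 <= mu <= -2 and 2 <= mu <= 16 (30 values).
    # Displacements of -1, 0, and +1 are excluded from the background.
    offsets = [mu for mu in range(-16, 17) if abs(mu) >= 2]
    background = sum(cross_correlation(a, b, mu) for mu in offsets) / len(offsets)
    norm = (cross_correlation(a, a, 0) + cross_correlation(b, b, 0)) / 2.0
    return (cross_correlation(a, b, 0) - background) / norm if norm else 0.0


if __name__ == "__main__":
    scan_a = {100: 1.0, 200: 2.0, 300: 1.0}
    scan_b = {m: 10.0 * i for m, i in scan_a.items()}  # same spectrum, rescaled
    scan_c = {150: 1.0, 250: 1.0}                      # no ions in common
    print(similarity(scan_a, scan_b), similarity(scan_a, scan_c))
```

A rescaled copy of a spectrum scores 1 and a spectrum with no common ions scores 0, matching the two limiting cases described for $S_{AB}$. The iterative assignment to RS1 and RS2 with the 0.6, 0.4, and 0.2 thresholds would be built on top of `similarity` and is not shown here.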


Figure 4. Regenerated 1-D spectra of the hypothetical peptide IFVQK. Scans that are similar to each other are added together to obtain two spectra (A and B, containing primarily y fragments or b fragments, respectively). The partial or complete sequence can be read from (A) or (B). (B) can be converted to its complementary spectrum (denoted as B^-1) and added to (A) to obtain spectrum C. Spectrum D is obtained by taking the difference of spectra A and B, then converting the ions with negative intensities to their complementary ions, and adding the absolute values of their intensities to the positive part of the spectrum. The full sequence of the peptide can be read from spectra C and D. Spectrum E is obtained by adding spectrum C to the reprocessed MS2 spectrum.
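The combinations summarized in this caption (spectrum C as A plus the complement of B, and spectrum D from the signed difference of A and B) can be sketched in Python using the complementary-mass formula of eq 1. The dictionary representation, the rounding of m/z to one decimal place so that peaks can be matched by key, and the function names are simplifying assumptions of this sketch.

```python
PROTON = 1.0073  # amu, as in eq 1


def complement(spectrum, precursor_mz):
    """Eq 1: move each ion to M_precursor + M_proton - M, keeping its intensity."""
    out = {}
    for mz, intensity in spectrum.items():
        comp = round(precursor_mz + PROTON - mz, 1)  # assumed 0.1 m/z binning
        if comp > 0:
            out[comp] = out.get(comp, 0.0) + intensity
    return out


def add_spectra(a, b):
    """Sum two spectra peak by peak."""
    out = dict(a)
    for mz, intensity in b.items():
        out[mz] = out.get(mz, 0.0) + intensity
    return out


def spectrum_c(a, b, precursor_mz):
    """Spectrum C: A plus the complementary spectrum of B."""
    return add_spectra(a, complement(b, precursor_mz))


def spectrum_d(a, b, precursor_mz):
    """Spectrum D: from A - B, fold negative peaks back in at their complements."""
    diff = dict(a)
    for mz, intensity in b.items():
        diff[mz] = diff.get(mz, 0.0) - intensity
    positive = {mz: i for mz, i in diff.items() if i > 0}
    negative = {mz: -i for mz, i in diff.items() if i < 0}
    return add_spectra(positive, complement(negative, precursor_mz))


if __name__ == "__main__":
    mh = 634.4                                   # MH+ of the hypothetical IFVQK
    a = {521.3: 10.0, 374.2: 5.0, 300.0: 3.0}    # y-type scans plus an interfering ion
    b = {114.1: 4.0, 300.0: 3.0}                 # b-type scans plus the same interference
    print(spectrum_c(a, b, mh))
    print(spectrum_d(a, b, mh))
```

In this toy case the interfering ion at 300.0, present in both groups, survives in spectrum C (and gains a complement at 335.4) but cancels in spectrum D, while the b1 ion at 114.1 reinforces its complementary y4 ion at 521.3 in both.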

Regenerated 1-D Spectra. After the scans are separated into two groups, several 1-D spectra, each containing only one type of fragment (either b or y), can be regenerated (Figure 4). The two simplest regenerated 1-D spectra (spectra A and B in Figure 4) are derived by summing scans within each group. The partial sequence, or sometimes the complete sequence, of the peptide can usually be read from spectra A and B. To further combine the sequence information they contain, spectra A and B can be combined into a single 1-D spectrum. Two different approaches are used to combine spectra A and B. In the first approach, spectrum B is transformed into its complementary spectrum using eq 1 and then added to spectrum A to form spectrum C (Figure 4). Another approach is to subtract spectrum B from spectrum A to obtain a new spectrum with one type of fragment pointing up and the other type pointing down (spectrum D′ in Figure 4). Ions with negative intensities are then transformed into their complementary ions and the absolute values of their intensities are added to the positive part of the spectrum to form spectrum D (Figure 4). The latter approach is helpful in reducing the interfering ions that are present in both groups of scans, such as ions caused by internal fragments in the MS2 spectrum.

Each of the four regenerated 1-D spectra has advantages and disadvantages. For example, spectra A and B are especially helpful when one of the two 1-D spectra contains many fragments other than b or y (which makes it more difficult to interpret when the two spectra are combined), but they contain the least amount of sequence information. Spectrum C contains the maximum sequence information but also has the most interfering ions. Spectrum D has fewer interfering ions than spectrum C but may lack some important sequence ions (if two ions of different terminal types have the same mass). To maximize the sequence information that can be obtained from the available data, all of the above regenerated 1-D spectra (A-D) are evaluated and a list of sequence candidates is obtained from each of the four regenerated 1-D spectra. In cases when few sequence candidates are identified from the above regenerated 1-D spectra due to missing fragment ions, the reprocessed MS2 spectrum itself may be added into spectrum C to generate a 1-D spectrum E (Figure 4), from which a list of sequence candidates can be identified. However, incorrect sequences may often be derived due to the large number of interfering ions present in the MS2 spectrum. As a result, spectrum E is used only when spectra A-D fail to provide any sequence candidates of reasonable quality.

Figure 5. Flowchart for deriving sequence candidates from a regenerated 1-D mass spectrum (see text for a complete description).

Identifying Sequence Candidates. Figure 5 shows the algorithm used to identify sequence candidates from each regenerated 1-D spectrum. The following procedures describe the steps involved:

Step 1. Sequence candidates are read in both directions (from the low-mass end and the high-mass end) of a regenerated 1-D spectrum. Starting from the low-mass end of a regenerated 1-D spectrum, the next ion that corresponds to a sequencing starting point is found. The mass of a starting point must correspond to the mass of a protonated residue or combination of residues, which represents a b ion. Once a starting point has been determined, step 2 is initiated (see below) after setting the starting point as the current position. If no more starting points are found when reading the sequence from the low-mass end, then the procedure can be repeated reading from the high-mass end. The mass difference of MH+ and a starting point on the high-mass end must correspond to the mass of a residue or a combination of residues, which represents a y ion. The procedures end when no more starting points are found.

Step 2. From the ions following the current position to the high- or low-mass end (depending upon the reading direction), all ions with ∆m (mass difference between the ion and the current reading position) corresponding to the mass of a standard amino acid residue are identified. Once an ion of this type has been identified, step 3 is initiated. If no ions of this type are found, then step 1 is repeated.

Step 3. Each ion identified in step 2 is established as the current position and its mass is compared to the mass of any possible large b ions (for a reading direction from low to high mass) or small y ions (for a reading direction from high to low mass) to determine if the ion defines a possible C-terminus. If the ion is found to define a possible C-terminus, then a sequence candidate is identified and step 4 is followed to record the sequence candidate. Note that if the large b ions and small y ions are not a single-residue-loss or single-residue fragment, the sequence candidate represents only a partial sequence of the peptide. Regardless of whether the possible C-terminus is defined by the ion or not, step 2 is always repeated to continue reading the sequence.

Step 4. When a sequence candidate is identified, the primary score of the sequence is calculated as described in the next section. If the primary score meets the criteria set by the user, then it is recorded in the sequence list for further evaluation.

Sequence Evaluation. The sequence of a peptide can often be determined directly from the regenerated 1-D spectra.
However, in some cases the peptide sequence cannot be unambiguously established. An efficient way of evaluating a sequence is to assign a score to each potential sequence by comparing the simulated spectrum to the experimentally derived spectrum. The primary scoring method used in this study is based on the fraction of ion intensities used to derive the peptide sequence in a regenerated 1-D spectrum. Specifically, the sum of intensities of ions that can be assigned either b or y fragments is divided by the sum of intensities of all ions in the regenerated 1-D spectrum. If several possible sequences can be derived from the regenerated 1-D spectrum or spectra, those sequences can also be evaluated by the following similarity analysis. The similarity score between two spectra is calculated the same way as described earlier (eq 3) for the following three similarity analyses: 1. Similarity to the Regenerated 1-D Spectrum. On the basis of the proposed sequence, a spectrum containing only b ions (and their corresponding water loss and ammonia loss, etc.) and a spectrum containing only y ions (and their corresponding water loss, ammonia loss, etc.) are simulated. Two spectra are simulated because it is not known if the regenerated 1-D spectrum contains primarily b ions or y ions. A similarity analysis between the simulated spectra and the regenerated 1-D spectrum is performed, and the similarity scores of both spectra are calculated. The larger of the two scores is used as the final score. 2. Similarity to the MS2/MS3 Intersection Spectra. On the basis of the proposed sequence, b and y fragments in the MS2 spectrum Analytical Chemistry, Vol. 72, No. 11, June 1, 2000


Table 1. Top Six Sequence Candidates Determined for Peptide 9IFVQK13 of Cytochrome c and Their Scores

sequence^a   SPrimary   S1-D    SIntersection   SMS2
LFVKK        0.818      0.852   0.928           0.593
LFKVK        0.760      0.791   0.867           0.575
LFVKAG       0.630      0.773   0.830           0.519
LFVKGA       0.630      0.772   0.829           0.519
LFKVAG       0.585      0.718   0.775           0.504
LFKVGA       0.585      0.717   0.775           0.504

a Note L is indistinguishable from I; K is indistinguishable from Q.

Table 2. Top Six Sequence Candidates for Peptide 40TGQAPGFTYTDANK53 of Cytochrome c and Their Scores

sequence^a        SPrimary   S1-D    SIntersection   SMS2
TGKAPGFTYTDANK    0.851      0.478   0.590           0.339
TGKAPGFTYTPAPY    0.778      0.492   0.602           0.361
TGKAPGFTYTDANAG   0.767      0.450   0.557           0.318
TGKAPGFTYTDANGA   0.767      0.450   0.557           0.318
TGKAPGFTYTWNK     0.752      0.528   0.654           0.396
TGKAPGFTYTDAKN    0.690      0.475   0.580           0.337

a Note K is indistinguishable from Q.
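The primary score (SPrimary) reported in Tables 1 and 2 is the fraction of ion intensity in a regenerated 1-D spectrum that can be assigned to b or y fragments of a candidate. A minimal sketch of that ratio follows, assuming nominal integer masses, singly charged ions, and exact mass matching; the names `RESIDUE`, `b_and_y_masses`, and `primary_score`, and the toy spectrum, are illustrative and not taken from the authors' program.

```python
# Standard nominal residue masses (Da); I/L and K/Q share a nominal mass.
RESIDUE = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def b_and_y_masses(sequence):
    """Nominal m/z values of singly charged b and y ions of a candidate."""
    masses = set()
    for i in range(1, len(sequence)):
        masses.add(sum(RESIDUE[r] for r in sequence[:i]) + 1)   # b ion: residues + H+
        masses.add(sum(RESIDUE[r] for r in sequence[i:]) + 19)  # y ion: residues + H2O + H+
    return masses


def primary_score(spectrum, sequence):
    """Intensity assignable to b or y ions divided by total intensity."""
    total = sum(spectrum.values())
    if total == 0:
        return 0.0
    expected = b_and_y_masses(sequence)
    assigned = sum(i for mz, i in spectrum.items() if mz in expected)
    return assigned / total


# Hypothetical regenerated 1-D spectrum {nominal m/z: intensity}, not measured data.
toy = {114: 10, 261: 40, 360: 30, 488: 10, 300: 10}
print(primary_score(toy, "LFVKK"))  # four of the five peaks are b ions of LFVKK
print(primary_score(toy, "LFKVK"))  # the peak at 360 is no longer explained
```

Water and ammonia losses are ignored here for brevity, so the numbers are not comparable with the tabulated scores.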

can be assigned. With these assignments, the grouping of the scans in the symmetrized 2-D fragment correlation spectrum is automatically determined. A 1-D spectrum that contains only N-terminal fragments can be regenerated from this grouping, and a similarity analysis can be performed to calculate a similarity score between the regenerated 1-D spectrum and a spectrum simulated according to the proposed sequence.

3. Similarity to the Reprocessed MS2 Spectrum. An MS2 spectrum is simulated from the proposed sequence (containing b, y, and loss of water and ammonia from b and y), and a similarity analysis is performed between the simulated spectrum and the reprocessed MS2 spectrum to derive a score.

When spectra were simulated from the peptide sequence in this study, the following rules or guidelines were used:
(1) b or y ions have an ion intensity of 50.
(2) Cleavage at the N-terminus of a proline residue (P) increases the ion intensity of the corresponding fragments by 50.
(3) Cleavage at the C-terminus of a proline residue (P) decreases the ion intensity of the corresponding fragments by 30.
(4) For each serine (S), threonine (T), aspartic acid (D), and glutamic acid (E) residue found in a fragment, the ion intensity for water loss increases by 10.
(5) For each asparagine (N) and glutamine (Q) residue found in the fragment, the ion intensity for ammonia loss increases by 5.

Note that because all MS2 and MS3 spectra have been reprocessed by adding their complementary spectra, it is not necessary to weight b and y ions differently according to the most probable charge location. However, water or ammonia losses become water and ammonia gains in the complementary spectra and must be accounted for when a spectrum is simulated.

EXPERIMENTAL SECTION

Horse heart cytochrome c, horse apomyoglobin, and ammonium bicarbonate were purchased from Sigma Chemical Co. (St. Louis, MO). Trypsin and endoproteinase Glu-C were purchased from Boehringer Mannheim (Chicago, IL).
Acetic acid was purchased from Fisher Scientific Co. (Pittsburgh, PA). HPLC grade methanol was purchased from Burdick & Jackson (Muskegon, MI). Water was purified with a Milli-Q water purification system (Millipore, Bedford, MA). Cytochrome c (200 µM) was digested with trypsin in 0.01 M ammonium bicarbonate (pH 7.8) at 37 °C for 18 h at an enzyme/substrate ratio of 1:100. Apomyoglobin (50 µM) was digested with endoproteinase Glu-C in 0.01 M ammonium bicarbonate (adjusted to pH ∼5 with acetic acid) at 25 °C for 18 h at an enzyme/substrate ratio of 1:100. A fraction of the Glu-C digest of apomyoglobin was further digested with trypsin at 37 °C for 18 h at an enzyme/substrate ratio of 1:100. Each peptide mixture was then diluted into a separate aqueous solution containing 1% acetic acid and 50% methanol to give a final peptide concentration of 5 µM.

The solutions were analyzed on a Finnigan-MAT LCQ “classic” quadrupole ion trap mass spectrometer (San Jose, CA) using a custom nanospray interface built at Amgen. Glass capillary nanospray needles (New Objectives, Cambridge, MA) with ∼2-µm-i.d. tips were used, and a needle voltage of 0.75 kV was supplied. The resulting flow rate for this nanospray system was estimated to be 50-100 nL/min. For each MS experiment, a singly or doubly charged peptide ion was isolated and fragmented, and then several major fragment ions were further isolated and fragmented. A typical experiment contained 5-20 scan events. The first one or two events were the MS2 scans of the parent ion, and the following events (usually 4-19 events, depending on the size of the peptide) were the MS3 scans of some major fragment ions, usually the most intense ones. The isolation window was usually set at 1.5 mass units and the relative collision energy was set at 10-30%, depending on the size and charge of the peptide.

Spectral data were collected, converted into text format using the file converter provided with the LCQ software, and then imported into a custom computer program written at Amgen to perform the functions outlined in the Methods section. The computer program was written in C++ in the Microsoft Visual C++ environment. All analyses of MS experimental data were performed using this program on a Hewlett-Packard 400 MHz Pentium II computer. Sequence determination is fully automated after the data are imported into the program and usually takes from a fraction of a second to a few seconds, depending on the complexity of the data.

RESULTS

Peptides in a cytochrome c digestion mixture were analyzed using the procedures described in the Methods section.
Figure 6 shows the resulting MS2 spectrum, symmetrized 2-D fragment correlation spectrum and one of the regenerated 1-D spectra of a short peptide 9IFVQK13. Several major fragment ions in the MS2 spectrum (indicated by arrows) were selected and fragmented to generate the 2-D spectrum. Scans were separated into two groups according to the procedures described in the Methods section. The two groups of scans were combined together to produce the regenerated 1-D spectrum, and the corresponding sequence was read. It should be noted that all masses were reduced to their nominal masses before deriving the 2-D spectrum. The nominal

Figure 6. MS2 spectrum (top panel), experimentally derived symmetrized 2-D fragment correlation spectrum (center panel), and one of the regenerated 1-D spectra (bottom panel) of the tryptic peptide 9IFVQK13 of cytochrome c. The arrows in the top panel designate the daughter ions that were selected for MS3. The two groups of scans in the 2-D spectrum are identified with black and gray circles, respectively. The open circles in the 2-D spectrum indicate those scans whose grouping has not been conclusively identified. The sequence of this peptide is read directly from the regenerated 1-D spectrum, but the residue isoleucine (I) is indistinguishable from leucine (L) and glutamine (Q) is indistinguishable from lysine (K).
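Reading a sequence from a regenerated 1-D spectrum, as in the bottom panel of Figure 6, amounts to stepping between peaks that are one residue mass apart. The sketch below is a deliberately reduced version of steps 1-3 of the Methods: it reads only from the low-mass end, accepts only single-residue starting points, and assumes nominal masses and a pure b-ion ladder. The names `RESIDUE_BY_MASS` and `read_b_ladder` are hypothetical, and L and K stand for I/L and K/Q as in the tables.

```python
# Nominal residue mass -> one-letter code (L covers I/L, K covers K/Q).
RESIDUE_BY_MASS = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T",
                   103: "C", 113: "L", 114: "N", 115: "D", 128: "K", 129: "E",
                   131: "M", 137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def read_b_ladder(peaks, mh_plus):
    """List sequences consistent with a b-ion ladder among `peaks`."""
    peaks = sorted(set(peaks))
    full = mh_plus - 18  # nominal mass of the b ion that contains every residue
    found = []

    def extend(position, prefix):
        # Step 3 analogue: would one more residue complete the peptide?
        last = RESIDUE_BY_MASS.get(full - position)
        if last is not None:
            found.append(prefix + last)
        # Step 2 analogue: move to every later peak that is one residue away.
        for mz in peaks:
            residue = RESIDUE_BY_MASS.get(mz - position)
            if mz > position and residue is not None:
                extend(mz, prefix + residue)

    # Step 1 analogue: a starting point is a protonated single residue.
    for mz in peaks:
        first = RESIDUE_BY_MASS.get(mz - 1)
        if first is not None:
            extend(mz, first)
    return found


# Idealized b ions of LFVKK (nominal MH+ = 634); not the measured spectrum.
print(read_b_ladder([114, 261, 360, 488], 634))
```

A fuller version would also read y ions from the high-mass end, allow residue combinations as starting points, and pass each candidate to the scoring step.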

mass of each compound is its mass calculated using 12 as the mass of carbon, 16 as the mass of oxygen, 1 as the mass of hydrogen, 14 as the mass of nitrogen, etc., instead of using the accurate masses of the isotopes. For large peptides (≥1000 amu), fractional masses of many atoms may add up to a value greater than 1; as a result, their nominal mass may be less than the closest integer of their accurate mass. For peptides, it was found that a more reliable way of determining the nominal mass was to divide the measured monoisotopic mass by 1.0005 before rounding off to the nearest integer. To further evaluate the reliability of the determined sequence, all possible sequences were assessed and ranked using four

Figure 7. MS2 spectrum (top panel), symmetrized 2-D fragment correlation spectrum (center panel), and one of the regenerated 1-D spectra (bottom panel) of a longer tryptic peptide of cytochrome c, 40TGQAPGFTYTDANK53. In the symmetrized 2-D mass spectrum, the number of off-diagonal peaks on the right of the diagonal is smaller than the number of off-diagonal peaks on the left of the diagonal because many low-intensity off-diagonal peaks are not copied to the right of the diagonal to simplify calculation. For each scan in the 2-D spectrum, the fragment ions whose masses differ from their precursor mass by less than 57 amu are removed because they contain no sequence information. The two groups of scans in the 2-D spectrum are identified with black and gray circles, respectively. The open circles indicate those scans whose grouping has not been conclusively identified. The sequence of this peptide is read directly from the regenerated 1-D spectrum, but the residue glutamine (Q) is indistinguishable from lysine (K).
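The nominal-mass conversion used before building the 2-D spectrum (divide the measured monoisotopic mass by 1.0005, then round to the nearest integer) can be written in a few lines. This is a sketch only; the function name `nominal_mass` and the two example masses are illustrative values, not data from the study.

```python
import math


def nominal_mass(monoisotopic_mass, scale=1.0005):
    """Nominal mass from a measured monoisotopic mass.

    Dividing by `scale` removes the average mass excess of a peptide
    (roughly 0.5 Da per 1000 Da) before rounding to the nearest integer.
    """
    return math.floor(monoisotopic_mass / scale + 0.5)


# Hypothetical measured masses chosen for illustration.
print(nominal_mass(500.27))   # small ion: plain rounding would agree
print(nominal_mass(2000.95))  # large ion: plain rounding would give 2001
```

For the larger value, rounding the raw mass would overshoot by one unit, which is the failure the scaling is meant to avoid.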

different scoring methods (Table 1). It is apparent that the correct sequence LFVKK, which is equivalent to IFVQK, is at the top of the list for all four scoring methods. Since the instrumental conditions used in this study prevent distinguishing isoleucine (I) from leucine (L) and glutamine (Q) from lysine (K), L is used to represent either I or L and K to represent either Q or K. To demonstrate that this technique can be used to sequence larger peptides, the 14-residue peptide 40TGQAPGFTYTDANK53 was subjected to this type of correlation analysis (Figure 7). For a peptide of this length, the reading of a sequence is not as

Table 3. Sequencing Results of Peptides in Tryptic Digest of Cytochrome c

actual sequence      derived sequence^a   SPrimary (rank^b)   S1-D (rank)   SIntersection (rank)   SMS2 (rank)
1acetyl-GDVEK5       VDVEK                0.80(1)             0.84(1)       0.88(1)                0.68(1)
8KIFVQK13            KLFVKK               0.54(1)             0.70(1)       0.80(1)                0.71(1)
9IFVQK13             LFVKK                0.82(1)             0.85(1)       0.93(1)                0.59(1)
26HKTGPNLHGLFGR38    HKTGNPLHGLFGR        0.82(1)             0.71(1)       0.73(1)                0.42(1)
28TGPNLHGLFGR38      TGPNLHGLFGR          0.82(1)             0.84(1)       0.85(1)                0.76(1)
28TGPNLHGLFGRK39     ...PNLHGLFGRK        0.55(1)             0.47(1)       0.56(1)                0.56(1)
40TGQAPGFTYTDANK53   TGKAPGFTYTDANK       0.85(1)             0.48(3)       0.59(3)                0.34(3)
56GITWK60            GLTWK                0.65(1)             0.69(1)       0.81(1)                0.58(1)
61EETLMEYLENPK72     ...MEYLEN...         0.90(1)             0.87(1)       0.88(1)                0.78(1)
                     ...NELYEM...         0.90(1)             0.87(1)       0.88(1)                0.78(1)
73KYIPGTK79          KYLPGTK              0.56(1)             0.73(1)       0.78(1)                0.82(1)
74YIPGTK79           YLPGTK               0.79(1)             0.93(1)       0.94(1)                0.58(1)
80MIFAGIK86          MLFAGLK              0.71(1)             0.80(1)       0.78(1)                0.44(1)
80MIFAGIKK87         MLFAGLKK             0.78(1)             0.80(1)       0.86(2)                0.44(3)
88KTER91             KTER                 0.77(1)             0.86(1)       0.89(1)                0.44(1)
88KTEREDLIAYLK99     ...EREDLLAYLK        0.58(1)             0.57(3)       0.57(3)                0.43(3)
92EDLLAYLK99         EDLLAYLK             0.69(1)             0.78(1)       0.81(1)                0.68(1)
100KATNE104          KATNE                0.73(2)             0.87(1)       0.91(1)                0.56(3)
                     AGATNE               0.75(1)             0.79(3)       0.82(3)                0.53(4)

a The residues determined incorrectly are underlined. “...” means information is not sufficient to derive sequence in those areas. When ambiguities occur, both possible sequences are shown. Note L is indistinguishable from I and K is indistinguishable from Q. b The ranks are rated among the top six sequence candidates according to the primary scores.
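The S1-D, SIntersection, and SMS2 columns compare measured spectra with spectra simulated from each candidate under the intensity rules listed in the Methods (base intensity 50, +50 for cleavage on the N-terminal side of proline, -30 on its C-terminal side, +10 water-loss intensity per S/T/D/E, +5 ammonia-loss intensity per N/Q). The sketch below applies those rules to b ions only. It assumes nominal masses, that neutral-loss intensities start from zero, and that coinciding peaks are summed; none of these details, nor the name `simulate_b_spectrum`, come from the paper, and the conversion of losses into gains in complementary spectra is not modeled.

```python
# Standard nominal residue masses (Da).
RESIDUE = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def simulate_b_spectrum(sequence):
    """Return {nominal m/z: intensity} for b ions and their neutral losses."""
    spectrum = {}
    mass = 1  # the ionizing proton
    for i in range(1, len(sequence)):
        fragment = sequence[:i]
        mass += RESIDUE[sequence[i - 1]]
        intensity = 50                    # rule 1: base intensity
        if sequence[i] == "P":
            intensity += 50               # rule 2: bond on the N-terminal side of P
        if sequence[i - 1] == "P":
            intensity -= 30               # rule 3: bond on the C-terminal side of P
        water = 10 * sum(fragment.count(r) for r in "STDE")   # rule 4
        ammonia = 5 * sum(fragment.count(r) for r in "NQ")    # rule 5
        for mz, value in ((mass, intensity), (mass - 18, water), (mass - 17, ammonia)):
            if value > 0:
                spectrum[mz] = spectrum.get(mz, 0) + value
    return spectrum


# Short illustrative sequence containing a proline.
print(simulate_b_spectrum("TGPN"))
```

A y-ion spectrum can be produced the same way from the C-terminal fragments, after which either simulated spectrum is compared with the measured one.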

straightforward as for the shorter peptide presented in Figure 6 (especially toward the C-terminal end). However, on the basis of the regenerated 1-D spectrum, the computer program was able to find all the possible sequences and then rate them by their scores with the different scoring methods. Table 2 shows the top six sequence candidates and their scores. The correct sequence is at the top of the list by primary score and in third place by the other three scoring methods. After many different peptides were evaluated, it was found that the rank of the primary score is much more important than the ranks of the other scoring methods. In most cases, the sequence with the highest primary score is the correct sequence, and the values of the other scores are used for confirmation purposes. When this technique was used to evaluate the sequences of other tryptic peptides of cytochrome c, the results were encouraging. Table 3 shows the scores and their ranks for other tryptic peptides of cytochrome c. The ranks are rated among the top six candidates according to their primary scores. Of the 17 peptides analyzed, 11 generate completely correct sequences (excluding ambiguities of the I/L and Q/K pairs). Exceptions include a relatively large peptide, 26HKTGPNLHGLFGR38, which has two incorrect residue assignments (NP for PN). The reason for these incorrect assignments is the common phenomenon that a very intense fragment ion from Pro-N cleavage is often observed, while the Pro-C cleavage is often weak or not observed. In this case, the P-N cleavage (between the proline and asparagine residues) was not observed, and as a result the correct sequence is not identified as a sequence candidate. In addition, for peptides 28TGPNLHGLFGRK39, 61EETLMEYLENPK72, and 88KTEREDLIAYLK99, only a partial sequence was determined.
The N-terminal residue of peptide 1acetyl-GDVEK5 was incorrectly assigned because the acetylated glycine (G) residue matches the mass of a valine (V) residue, and peptide 100KATNE104 has an ambiguity in the assignment of the first residue (K or AG). By combining the sequences from the candidates with the highest primary scores, 78% of the sequence of cytochrome c is covered (Figure 8). The only parts of the sequence that are not covered

Figure 8. De novo sequencing results of a tryptic digest of cytochrome c. The rectangular boxes indicate the tryptic peptides that were analyzed using this technique. Gray areas are segments of the peptide that are correctly sequenced, open areas are segments that do not have enough information to derive a sequence, and the black areas are segments that are sequenced incorrectly.

are the heme-containing peptide (C14-K22) and a few two- or three-residue peptides. If the heme-containing peptide is not included in the calculation, the sequence coverage of cytochrome c approaches 85%. Because a tryptic peptide has a charge-carrying lysine (K) or arginine (R) residue at its C-terminus, its fragmentation pattern is usually more complete and predictable than that of other peptides, which often makes its de novo sequencing easier. To demonstrate that the proposed technique applies to other common proteases besides trypsin, the 153-residue protein apomyoglobin was digested with Glu-C followed by an additional digestion with trypsin. Table 4 shows the scores and their ranks for the apomyoglobin peptides after being subjected to this type of sequence analysis. When the success rates of reading the sequences of tryptic (a lysine or arginine residue at the C-terminus) and nontryptic peptides were compared, they were

Table 4. Sequencing Results of Peptides in Glu-C/Trypsin Digest of Apomyoglobin

actual sequence       derived sequence^a   SPrimary (rank^b)   S1-D (rank)   SIntersection (rank)   SMS2 (rank)

tryptic peptides
7WQQVLNVWGK16         ...KVLNVWGK          0.44(1)             0.68(1)       0.74(1)                0.60(2)
                      ...KVLNVGEGK         0.44(1)             0.63(2)       0.71(2)                0.61(1)
28VLIR31              VLLR                 0.87(1)             0.96(1)       0.97(1)                0.60(1)
42KFDKFK47            KFDKFK               0.70(1)             0.83(1)       0.86(1)                0.65(1)
60DLKK63              DLKK                 0.67(1)             0.81(1)       0.91(1)                0.66(4)
64HGTVVLTALGGILK77    HGTVVL...            0.35(1)             0.57(1)       0.65(1)                0.77(1)
86LKPLAQSHATK96       LKKDWMNMK            0.48(1)             0.46(1)       0.59(1)                0.29(1)
137LFRNDIAAK145       LFRNDLAAK            0.74(1)             0.65(1)       0.67(1)                0.54(2)

non-tryptic peptides
1GLSDGE6              GLSDGE               0.60(1)             0.77(1)       0.76(1)                0.71(1)
19ADIAGHGQE27         ADLAGHGKE            0.93(1)             0.63(3)       0.63(4)                0.64(3)
28VLIRLFTGHPE38       ..LRLFTGHPE          0.81(1)             0.74(1)       0.76(1)                0.46(1)
32LFTGHPE38           LFTGHPE              0.71(1)             0.68(1)       0.82(1)                0.29(5)
39TLE41               TLE                  0.18(1)             0.46(3)       0.46(3)                0.59(2)
42KFDKFKHLKTE52       KFDKFKHLK..          0.74(1)             0.79(1)       0.82(1)                0.76(1)
48HLKTE52             HLKTE                0.88(1)             0.69(3)       0.77(3)                0.79(1)
48HLKTEAE54           HLKTEAE              0.63(1)             0.76(1)       0.78(1)                0.69(1)
53AEMKASE59           AEMKASE              0.76(1)             0.69(2)       0.71(2)                0.71(4)
55MKASE59             FSAKL                0.80(1)             0.82(1)       0.83(1)                0.57(1)
                      MKASE                0.80(1)             0.79(2)       0.78(2)                0.54(3)
78KKGHHE83            KKGHHE               0.55(1)             0.81(1)       0.85(1)                0.54(1)
79KGHHEAE85           KGHHEAE              0.69(1)             0.76(1)       0.85(1)                0.74(1)
97HKIPIKYLE105        HKLFAWPLE            0.69(1)             0.62(6)       0.68(6)                0.56(6)
103YLE105             YLE                  0.88(1)             0.96(1)       0.96(1)                0.62(2)
134ALE136             ALE                  0.96(1)             0.98(1)       0.97(1)                0.66(1)
137LFRNDIAAKYKE148    ..LNDLAAKYKE         0.83(1)             0.67(1)       0.72(1)                0.66(1)
                      FKYKAALDNL..         0.83(1)             0.66(2)       0.72(1)                0.66(1)
146YKE148             YKE                  0.70(1)             0.86(1)       0.94(1)                0.68(1)
149LGFQG153           LGFKG                0.90(1)             0.83(1)       0.84(1)                0.64(1)

a The residues determined incorrectly are underlined. “...” means information is not sufficient to derive sequence in those areas. When ambiguities occur, both possible sequences are given. Note L is indistinguishable from I and K is indistinguishable from Q. b The ranks are rated among the top six sequence candidates according to the primary scores.

found to be comparable. This observation is primarily attributed to the fact that the sequence information is maximized by reprocessing the MS2 and MS3 spectra as described in the Methods section. There were two long tryptic peptides of apomyoglobin (E105-K118 and H119-K133) that did not generate enough MS information to derive their sequences. Despite these large gaps, the sequence coverage for apomyoglobin using all sequenced peptides was found to be 66%. Among the Glu-C peptides (a glutamic acid residue at the C-terminus) of apomyoglobin, there are two cases where ambiguities occur (55MKASE59 and 137LFRNDIAAKYKE148) because of the difficulty in identifying the reading direction in the regenerated 1-D spectrum. Normally the reading direction is determined from the detection of a small, single-residue fragment and a large peptide fragment from which a single residue has been removed. Unfortunately, the fragment mass corresponding to a C-terminal glutamic acid (E) is the same as the mass of an N-terminal phenylalanine (F). This coincidence makes it more difficult to determine the reading direction of Glu-C peptides. If this type of ambiguity also occurs at the N-terminus of the peptide, an ambiguity of reading direction will occur, as in the cases of 55MKASE59 and 137LFRNDIAAKYKE148. Other pairs of residues that show the same behavior include M/L (M/I) and D/P (residue masses differ by 18). The fact that an N-terminal methionine (M) can be read as a C-terminal leucine (L) is the reason that MKASE can also be read as FSAKL. However, in many cases some knowledge of the terminal residues is available; in the case of peptide MKASE, for example, the C-terminal residue is most probably a glutamic acid (E) because of the Glu-C digestion.
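The reading-direction ambiguity follows from simple arithmetic: a single-residue y ion has nominal mass (residue + 19) and a single-residue b ion has nominal mass (residue + 1), so the two coincide whenever the N-terminal residue is 18 units heavier than the C-terminal one. The sketch below enumerates such pairs from the standard nominal residue masses; the function name `direction_ambiguous_pairs` is illustrative.

```python
# Standard nominal residue masses (Da).
RESIDUE = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def direction_ambiguous_pairs(residue=RESIDUE):
    """Pairs (N-terminal residue, C-terminal residue) whose b1 and y1 ions coincide."""
    pairs = []
    for n_res, n_mass in residue.items():
        for c_res, c_mass in residue.items():
            b1 = n_mass + 1    # protonated single N-terminal residue
            y1 = c_mass + 19   # single C-terminal residue + H2O + H+
            if b1 == y1:
                pairs.append((n_res, c_res))
    return sorted(pairs)


print(direction_ambiguous_pairs())
# [('D', 'P'), ('F', 'E'), ('M', 'I'), ('M', 'L')]
```

These are the E/F, M/L (M/I), and D/P pairs named in the text; for MKASE, replacing the N-terminal M by a C-terminal L and the C-terminal E by an N-terminal F gives the reversed reading FSAKL.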

Combining the sequencing results of both cytochrome c and apomyoglobin, the success rate (residues correctly assigned divided by total number of residues analyzed) is 89% among peptides 3-13 residues in length and 93% among peptides 4-11 residues in length. The false-positive rate (residues incorrectly assigned divided by total number of residues analyzed) for all of the peptides sequenced is only 6%.

DISCUSSION

Detection Limit. One of the drawbacks of the technique described in this report is its limited sensitivity, a consequence of the large number of spectra that must be acquired (usually 5-20). Using a standard Finnigan LCQ MS with a custom nanospray interface, the sequences of several tryptic peptides were obtained while consuming low-femtomole amounts of the peptide materials (3 fmol consumed for Ac-1GDVEK5, 5 fmol for 9IFVQK13, and 16 fmol for 28TGPNLHGLFGR38 of cytochrome c). As advances in MS instruments continue to enhance their sensitivity, less and less material will be required to perform this type of experiment. Preliminary sequence analyses of tryptic peptides of cytochrome c using conventional on-line LC/MS instruments with nominal flow rates of ∼200 µL/min gave unsatisfactory results (data not shown) compared to the experiments performed by nanospray MS. The primary reason is that under normal LC/MS conditions the peak widths are