Anal. Chem. 2000, 72, 4047-4057

Quantitation and Facilitated de Novo Sequencing of Proteins by Isotopic N-Terminal Labeling of Peptides with a Fragmentation-Directing Moiety

Martin Münchbach, Manfredo Quadroni, Giovanni Miotto, and Peter James*

Protein Chemistry, Swiss Federal Institute of Technology, Alte Landstrasse 83, 8803 Rüschlikon, Switzerland

We describe a method for comparative quantitation and de novo peptide sequencing of proteins separated either by standard chromatographic methods or by one- and two-dimensional polyacrylamide gel electrophoresis. The approach is based on the use of an isotopically labeled reagent to quantitate (by mass spectrometry) the ratio of peptides from digests of a protein expressed under different conditions. The method allows quantitation of the changes occurring in spots or bands that contain more than one protein and has a greater dynamic range than most staining methods. Since the reagent carries a fixed positive charge under acidic conditions and labels only the N-terminus of peptides, the interpretation of tandem mass spectra to obtain sequence information is greatly simplified. The sequences can easily be extracted for homology searches instead of relying on indirect mass spectral-based searches, and they are independent of posttranslational modifications.

The development of high-throughput DNA sequencing and of computer algorithms for the rapid assembly of random sequence fragments into large contiguous sequences has resulted in an exponential growth in the size of the sequence databases. Currently, over 26 genomes have been completed, with 107 prokaryotic and 32 eukaryotic genomes in the process of being sequenced (as of February 6, 2000). These, together with extensive expressed sequence tag (EST) partial mRNA sequence libraries,1 allow the entire potential protein complement of organisms (the proteome) to be defined. In parallel, genome-wide studies of gene expression are now possible at the mRNA level owing to techniques such as DNA microchips and arrays,2 differential display PCR,3 and serial analysis of gene expression.4 To correlate these data with the actual level of protein expression and the state of posttranslational modification, protein quantitation and identification techniques have to be developed that match the speed and accuracy of mRNA analysis.

Despite its limitations, the best-established and most widely applicable approach to studying protein expression quantitatively is densitometric analysis of protein extracts separated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). Nevertheless, this method is not always reliable because of protein-specific differences in staining intensity, the limited reproducibility of the staining procedure, and problems due to overlapping and streaking proteins. In vivo metabolic labeling with stable isotopes5 or radioisotopes6 has provided a partial answer to this problem. However, these methods are not generally applicable since they require (i) the introduction of a label into cells, which is often impossible in vivo, and (ii) the prevention of metabolic scrambling of the label, which is often hard to predict when using labeled amino acids.

Once the proteins have been at least partially separated and quantitated, one must be able to determine which gene products are being analyzed. Protein identification has been revolutionized by methods that identify proteins in databases using mass spectral data, based on either peptide masses from protein digests (reviewed in ref 7) or fragmentation spectra of individual peptides.8,9 De novo protein sequencing and its automation have been the subject of intense interest since native peptide fragmentation (MS/MS) became possible.10,11 Various methods, such as isotopic labeling and fragmentation-directing charged derivatives, have been investigated12-14 as means of simplifying spectral interpretation. However, no universal answer has been forthcoming, and even the use of raw MS/MS spectra to search databases has its limitations. The need to deal with posttranslational modifications and to work with organisms whose genomes have not been sequenced requires the development of new methods. We present a method that is analogous to the isotope-coded affinity tagging method, which was introduced recently to allow quantitative analysis of complex protein mixtures without the need for separation by gel electrophoresis.15 Our method allows relative protein quantitation even if the separation is only partial, and it facilitates de novo sequencing and automated interpretation of MS/MS fragmentation spectra.

* Corresponding author.
(1) Adams, M. D.; Kelley, J. M.; Gocayne, J. D.; Dubnick, M.; Polymeropoulos, M. H.; Xiao, H.; Merril, C. R.; Wu, A.; Olde, B.; Moreno, R. F.; Kerlavage, A. R.; McCombie, W. R.; Venter, J. C. Science 1991, 252, 1651-6.
(2) Schena, M.; Shalon, D.; Davis, R. W.; Brown, P. O. Science 1995, 270, 467-70.
(3) Liang, P.; Pardee, A. B. Science 1992, 257, 967-71.
(4) Velculescu, V. E.; Zhang, L.; Vogelstein, B.; Kinzler, K. W. Science 1995, 270, 484-7.
(5) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6.
(6) McConkey, E. H. Anal. Biochem. 1979, 96, 39-44.
(7) Cottrell, J. S. Pept. Res. 1994, 7, 115-24.
(8) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5, 976-89.
(9) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-9.
(10) Hunt, D. F.; Yates, J. R., III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-7.
(11) Johnson, R. S.; Biemann, K. Biomed. Environ. Mass Spectrom. 1989, 18, 945-57.
(12) Hunt, D. F.; Buko, A. M.; Ballard, J. M.; Shabanowitz, J.; Giordani, A. B. Biomed. Mass Spectrom. 1981, 8, 397-408.
(13) Spengler, B.; Luetzenkirchen, F.; Metzer, S.; Chaurand, P.; Kaufmann, R.; Jeffrey, W.; Bartlet-Jones, M.; Pappin, D. J. C. Int. J. Mass Spectrom. 1997, 169/170, 127-40.
(14) Stults, J. T.; Lai, J.; McCune, S.; Wetzel, R. Anal. Chem. 1993, 65, 1703-8.

10.1021/ac000265w CCC: $19.00 Published on Web 08/03/2000

© 2000 American Chemical Society

Analytical Chemistry, Vol. 72, No. 17, September 1, 2000 4047

EXPERIMENTAL SECTION

Materials. Nicotinic acid, dicyclohexylcarbodiimide, N-hydroxysuccinimide, and solvents for synthesis and HPLC were purchased from Fluka AG (Buchs, Switzerland). [D4]Nicotinic acid was purchased from Cambridge Isotope Laboratories (Andover, MA). α-Cyano-4-hydroxycinnamic acid and N-methylpiperidine were purchased from Aldrich GmbH (Buchs, Switzerland). Sequencing-grade modified trypsin and Asp(Glu)C protease were purchased from Promega (Zürich, Switzerland).

Synthesis of the 1-([H4/D4]Nicotinoyloxy)succinimide (H4/D4-Nic-NHS) Esters. Nicotinic acid (1 g) was dissolved in dry tetrahydrofuran and mixed with 1 equiv of dicyclohexylcarbodiimide under continuous stirring in a reaction flask for 2 h at room temperature. One equivalent of N-hydroxysuccinimide was added to the solution, and the resulting mixture was stirred overnight at room temperature. The precipitate was recovered by filtration and purified by recrystallization from ethyl acetate. The purity and structure were confirmed by ¹H NMR (CDCl3, 300 MHz): δ = 2.863 (s, 2 CH2), 7.40-7.50 (m, H5 (ArH)), 8.30-8.40 (dt, H4 (ArH), J = 1.86, 2.18, 8.1 Hz), 8.80-8.90 (dd, H6 (ArH), J = 1.87, 4.93 Hz), 9.27 (d, H2 (ArH), J = 2.18 Hz) ppm.

Chemical Modification and Digestion of Proteins. Escherichia coli MC4100 was obtained from the laboratory collection.16 In the carbon-limitation studies, the bacteria were cultivated in a synthetic medium with either 5 or 100 mM glucose as the sole carbon source.
The conditions used for the sulfate-starvation experiments, as well as for sample preparation and 2D-gel analysis, were as described previously.17 Gels were scanned in a personal laser densitometer (Molecular Dynamics, Sunnyvale, CA), and image analysis, spot matching, and quantification were performed using the PDQuest 2D software package (Pharmacia, Uppsala, Sweden) on a PowerMac. Either the entire spot or a 1-mm-diameter circular gel piece was cut from the center of each spot chosen for analysis and completely destained in ethanol containing 0.5% (v/v) trimethylamine. The spot was washed in water and then dehydrated in acetonitrile. Succinic anhydride (100 mM) was freshly prepared in 2 M urea and 200 mM sodium phosphate buffer, and the pH was rapidly adjusted to 8.5 with sodium hydroxide. The spot was rehydrated in 100 µL of the reagent solution, and succinylation was allowed to proceed for 2 h before a fresh aliquot of reagent was added. The gel spot was desalted by four alternating additions of pure water and acetonitrile. After the last acetonitrile wash, the spot was rehydrated with 10 µL of 2 M urea in 20 mM sodium phosphate buffer, pH 7.8, containing 0.1-1 µg of Asp(Glu)C protease, and digestion was carried out at 37 °C for 6 h. The peptide mixture was then N-terminally modified by the addition of the H4 or D4 Nic-NHS ester freshly prepared in 50 mM sodium phosphate buffer, pH 8.5. Further aliquots were added after 10 and 20 min. After 2 h, 0.5 M hydroxylamine in 50 mM sodium phosphate buffer, pH 8.5, was added, and the solution was left overnight. The reaction was stopped by the addition of 2 µL of formic acid.

¹⁶O/¹⁸O isotopic labeling was carried out as described in ref 23. The protein of interest was digested in a solution containing 50% H₂¹⁸O. Because of the high cost of the solvent ($1000/mL), the volumes were kept as small as possible; for a single spot excised from a 2D gel, a minimum volume of 20 µL is needed. For tryptic digestions, 20 µL of 100 mM NH4HCO3 buffer, pH 8.0, was added to the dried protein or dehydrated gel piece, and porcine trypsin (sequencing grade) was added to a final concentration of 2% (w/w). Digestion was performed for 24 h at 37 °C in an Eppendorf shaker and stopped by the addition of 5 µL of formic acid. The solutions were dried in a SpeedVac to remove the volatile buffer, resuspended in 100 µL of water, and dried again; finally, the peptides were redissolved in 50 µL of water and stored at -20 °C.

(15) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-9.
(16) Kertesz, M. A.; Leisinger, T.; Cook, A. M. J. Bacteriol. 1993, 175, 1187-90.
(17) Dainese, P.; Staudenmann, W.; Quadroni, M.; Korostensky, C.; Gonnet, G.; Kertesz, M.; James, P. Electrophoresis 1997, 18, 432-42.

MALDI Analysis and Quantification. A 0.5-µL sample of crude digest was desalted using a ZipTip (Millipore, Bedford, MA), eluted with 70% methanol/1% acetic acid, and cocrystallized with an equal volume of matrix solution (10 mg/mL α-cyano-4-hydroxycinnamic acid in 50% acetonitrile, 1.25% TFA in water). The dry sample-matrix mix was washed three times with ice-cold 1% TFA applied directly onto the MALDI target.
Mass spectra were recorded using a Voyager Elite MALDI-TOF mass spectrometer (PerSeptive Biosystems, Framingham, MA) operated in delayed-extraction reflector mode with an accelerating voltage of 20 kV, a pulse delay time of 150 ns, a grid voltage of 60%, and a guide wire voltage of 0.05%. Spectra were accumulated for 32 laser shots. The ratio of the D4- and H4-labeled peptides was calculated from the relative peak heights and averaged over all peptides found to be unique to an identified protein. The average error of the expression ratios was ±14% for the optical density measurements and ±2% for the isotopic determination.

MS/MS Analysis and de Novo Sequencing. MS/MS sequencing was performed on a Finnigan MAT LCQ ion trap mass spectrometer (San Jose, CA). The desalted digest in 70% methanol and 1% acetic acid was loaded into a homemade nanospray tip and electrosprayed into the mass spectrometer at a flow rate of 0.2 µL/min using a syringe pump. The peaks of interest were selected with a mass window wide enough to include the entire isotope cluster. Fragmentation was carried out using a relative collision energy of 35-60 units for [M+H]+ ions and 20-35 units for [M+2H]2+ ions.

Database Searching. Peptide mass fingerprinting searches were carried out using either the MassSearch or PeptideSearch algorithms18,19 and the databases SWISS-PROT release 38, GenBank release 115, and nrdb release 10 Jan 1999. MS/MS searches were carried out using the Sequest program and the nonredundant database.8 The spectra from D4/H4-labeled peptides were manually interpreted, and the deduced sequences were used for (T)FASTA homology searches.20

(18) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun. 1993, 195, 58-64.
(19) Mann, M.; Hojrup, P.; Roepstorff, P. Biol. Mass Spectrom. 1993, 22, 338-45.
(20) Pearson, W. R. Methods Enzymol. 1990, 183, 63-98.

Figure 1. Specificity of N-terminal nicotinylation as a function of pH. The difference in the reactivity of 1-([H4]nicotinoyloxy)succinimide (H4 Nic-NHS) toward amino groups at pH 5.5 and pH 8.5 is shown. Panel A shows the MALDI-TOF spectrum of an unmodified tryptic digest of myoglobin. The sequences and masses of the peptides corresponding to the labeled peaks are indicated below the spectrum. Panels B and C show the MALDI spectra after modification of the digest with the reagent at pH 5.5 and pH 8.5, respectively. The asterisks indicate the number of nicotinyl groups added onto a peptide.

Figure 2. Ensuring N-terminal specificity of isotopic labeling by protein succinylation. Panel A shows the MALDI MS spectrum of succinylated myoglobin after digestion with Glu(Asp)C protease (V8). Panel B shows the digest after modification with the nicotinoyloxysuccinimide reagent at pH 8.5. The asterisks indicate the number of nicotinyl groups added onto a peptide.
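The expression-ratio calculation described under MALDI Analysis and Quantification (D4/H4 peak-height ratios averaged over the peptides unique to one protein) can be sketched as below. This is an illustrative sketch, not the authors' software: the function name `protein_expression_ratio`, the input layout (a mapping from peptide to its H4 and D4 peak heights), and the use of the relative standard deviation as a spread estimate are assumptions, and the peak heights in the usage example are invented.

```python
from statistics import mean, stdev


def protein_expression_ratio(peak_heights, unique_peptides):
    """Average D4/H4 peak-height ratio over peptides unique to one protein.

    peak_heights maps a peptide identifier to (h4_height, d4_height), the
    heights of the light and heavy members of its MALDI doublet.
    Returns (mean_ratio, relative_std); relative_std is 0.0 for a single peptide.
    """
    ratios = [
        d4 / h4
        for peptide, (h4, d4) in peak_heights.items()
        if peptide in unique_peptides and h4 > 0
    ]
    if not ratios:
        raise ValueError("no usable peptides unique to this protein")
    avg = mean(ratios)
    spread = stdev(ratios) / avg if len(ratios) > 1 and avg > 0 else 0.0
    return avg, spread


if __name__ == "__main__":
    # Hypothetical peak heights for illustration only (not data from the paper).
    heights = {
        "pep1": (1200.0, 2350.0),
        "pep2": (800.0, 1660.0),
        "pep3": (450.0, 880.0),
        "shared": (300.0, 300.0),  # also present in a co-migrating protein; excluded
    }
    ratio, rsd = protein_expression_ratio(heights, {"pep1", "pep2", "pep3"})
    print(f"D4/H4 = {ratio:.2f} (relative SD {rsd:.1%})")
```

Restricting the average to peptides unique to the identified protein is what lets a spot containing more than one protein still yield a per-protein ratio. The sketch uses raw peak heights, as the text describes, and applies no correction for overlap between the natural isotope envelope of the H4-labeled peptide and the D4 peak four mass units higher.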

RESULTS AND DISCUSSION

Selectivity of Peptide Modification by Nicotinyl-N-Hydroxysuccinimide. A tryptic digest of myoglobin was used to test the labeling efficiency of the Nic-NHS reagent and its effect on the fragmentation of peptides under MS/MS conditions. To be effective, the reagent should selectively label only the N-terminal amino group of a peptide. Selectivity was monitored by carrying out the reaction at pH 5, since at this pH only the N-terminal α-amino group should be unprotonated, whereas the ε-amino groups of lysine residues should be protonated and unreactive. Figure 1A shows the MALDI-TOF spectrum of a tryptic digest of horse heart myoglobin. The digest was then modified at either pH 5.0 (Figure 1B) or pH 8.5 (Figure 1C) in sodium phosphate buffer for 1 h at 37 °C with a 100-fold molar excess of the Nic-NHS reagent. At pH 5, the reaction is very inefficient and the peptides are only partially labeled (∼30%), although MS/MS analysis showed that the reaction was specific to the N-terminus (data not shown). At the higher pH, the reaction was much more efficient; however, the specificity was lost, and lysine residues and, to a lesser extent, tyrosines were modified as well.

Increasing Labeling Efficiency. To increase the efficiency of labeling while preventing unwanted side reactions, chemical protection of lysine residues was investigated. Protein succinylation has been used as a method of solubilizing and denaturing proteins that are highly refractory to protease digestion, even after treatment with 8 M urea, 6 M guanidinium chloride, or boiling in SDS.21 The succinylation of several test proteins with succinic anhydride showed that the reaction was highly efficient. Figure 2A shows a MALDI-TOF spectrum of an Asp(Glu)C protease digest of succinylated myoglobin. All the peptides are succinylated, and there is no trace of partial modification. There is an increased tendency for metal ion adducts to form, probably because of the highly acidic nature of the peptides, and the overall sensitivity of detection drops; again, this is probably due to the reduced number of proton-acceptor sites available for efficient ionization. The derivatization with the Nic-NHS reagent could then be carried out at pH 8.5 for much longer times (2 h, with multiple additions of the labile reagent) to ensure complete and specific derivatization of the N-terminal amino groups (Figure 2B). This increased the side reaction of esterification of tyrosine residues; however, a simple incubation with 0.5 M hydroxylamine specifically reversed this. Labeling with the Nic-NHS group increased the detection efficiency of the succinylated peptides by a factor of 10-100, giving an overall increase in sensitivity over the native peptides.

(21) Blumenthal, K. M.; Kem, W. R. J. Biol. Chem. 1977, 252, 3328-31.

Figure 3. Complete sequence determination by MSn using an ion trap. Panel A shows the MS/MS spectrum of the myoglobin Asp(Glu)C peptide a (ADIAGHGQE) derivatized with a 25:75 mixture of H4 and D4 Nic-NHS (1+, m/z 1002/1006). Panel B shows the MS/MS/MS spectrum of the b4 ion pair (m/z 476/480) from panel A.

Isotopic Labeling of the N-Terminally Derived Ion Series Produced by MS/MS. Since it was now possible to introduce the nicotinyl group specifically at the N-terminus of all the peptides in a digest, we investigated isotopically labeling the peptides with a mixture of 1-([H4]nicotinoyloxy)- and 1-([D4]nicotinoyloxy)succinimide esters. This has two immediate advantages when sequencing peptides by MS/MS. First, all the fragments derived from the N-terminus of the peptide (the b-ions in the low-energy fragmentation that occurs in triple-quadrupole and quadrupole ion trap mass spectrometers) appear as doublets separated by four mass units. Second, the highly basic nature of the nicotinyl group greatly increases the relative yield of the b-ion series compared to the native peptide. Together, these two effects provide a simple and efficient way to obtain full-length sequences from peptides fragmented in an ion trap.
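To make the mass arithmetic behind these doublets concrete, the following sketch computes singly charged b-ion m/z values for a peptide carrying either the H4 or the D4 nicotinyl group. It is not taken from the paper, and the function names are hypothetical. It assumes standard monoisotopic residue masses, a nicotinyl label adding C6H3NO (105.0215 Da) to the N-terminal amine, a D4 label carrying four ring deuterium atoms (+4.0251 Da), and 'B' for succinyllysine (lysine plus C4H4O3, 100.0160 Da), following the notation of Figure 4. Under these assumptions the b4 pair of ADIAGHGQE comes out at m/z 476.21/480.24 and the b7 pair of WQQVLNVWGBVE at m/z 973.49/977.51, consistent with the values quoted for Figures 3 and 4.

```python
"""Sketch: b-ion ladders for peptides carrying an H4 or D4 nicotinyl N-terminal label."""

PROTON = 1.007276        # mass of a proton, Da
WATER = 18.010565        # monoisotopic H2O, Da
NICOTINYL = 105.021464   # net gain of C6H3NO when an amine is nicotinylated
D4_SHIFT = 4 * (2.014102 - 1.007825)  # four ring H replaced by D, about 4.0251 Da
SUCCINYL = 100.016044    # net gain of C4H4O3 when an amine is succinylated

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
RESIDUE["B"] = RESIDUE["K"] + SUCCINYL  # succinyllysine, as written in Figure 4


def b_ion_pairs(sequence):
    """Return (index, light m/z, heavy m/z) for b1..b(n-1), charge 1+."""
    pairs = []
    mz = NICOTINYL + PROTON  # N-terminal label plus the ionizing proton
    for index, residue in enumerate(sequence[:-1], start=1):
        mz += RESIDUE[residue]
        pairs.append((index, mz, mz + D4_SHIFT))
    return pairs


def precursor_pair(sequence):
    """Return the light and heavy [M+H]+ m/z of the labeled peptide."""
    mh = sum(RESIDUE[r] for r in sequence) + WATER + NICOTINYL + PROTON
    return mh, mh + D4_SHIFT


if __name__ == "__main__":
    light, heavy = precursor_pair("ADIAGHGQE")
    print(f"[M+H]+ {light:.2f}/{heavy:.2f}")
    for index, lo, hi in b_ion_pairs("ADIAGHGQE"):
        print(f"b{index}: {lo:.2f}/{hi:.2f}")
    # Lysine and glutamine differ by only ~0.036 Da; succinyllysine is ~100 Da heavier.
    print(f"K-Q = {RESIDUE['K'] - RESIDUE['Q']:.3f}, B-Q = {RESIDUE['B'] - RESIDUE['Q']:.3f}")
```

Because the label sits on every N-terminal fragment, the same 4.025 Da offset appears at every rung of the b-ladder, while y-ions (not computed here) carry no label and stay single. Note that 'B' conventionally denotes Asx; it is used for succinyllysine here only to match the paper's notation.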

Figure 4. MS/MS sequencing of a succinylated peptide. Panel A shows the MS/MS spectrum of the nicotinylated peptide e (WQQVLNVWGBVE) isolated from the V8 digest of succinylated myoglobin (1+, m/z 1692/1696). B represents succinyllysine. Panel B shows the MS/MS/MS spectrum obtained from the b7-ion pair (m/z 973/977) which together with the MS/MS spectrum obtained from the parent ion (panel A) allows the complete sequence to be read.

Facilitated Full Sequence Extraction by MSn. Figure 3A shows the MS/MS fragmentation spectrum of the myoglobin Asp(Glu)C peptide a (Figure 2A; ADIAGHGQE), which has been derivatized with a 25:75 mixture of H4 and D4 Nic-NHS. The singly charged parent ion was isolated in the trap with a mass window wide enough to contain the entire isotopic envelope. The b-ions are readily identified as doublets separated by four mass units, whereas the y-ions appear as singlets. The His residue, which is strongly basic in the gas phase, is responsible for the intensity of these y-ions. By contrast, the MS/MS spectrum of the unmodified peptide shows exclusively y-ions (data not shown). Because of the design and operation of the trap, only fragment ions down to an m/z of approximately one-third of the parent ion m/z can be effectively trapped; for the parent ion at m/z 1002, this cutoff lies near m/z 334, so the smallest b-ions are lost. However, since the b-ion series is so easily identified, the lowest-mass b-ion can be isolated for a further stage of fragmentation (for the b4 ion at m/z 476, the cutoff falls to about m/z 159). Figure 3B shows the MS/MS/MS spectrum of the b4 ion pair (m/z 476/480) from panel A. Thus, the entire sequence can easily and rapidly be deduced from the two spectra.
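The doublet-recognition step described above lends itself to automation. The following sketch, which is not the authors' procedure, scans a centroided fragment peak list for partners separated by the D4-H4 label shift divided by the charge and reports the unpaired peaks separately as candidate y-ions. The function names, the 0.3 m/z tolerance, and the intensity-ratio window are hypothetical choices for an ion-trap peak list; the expected heavy/light ratio of 3.0 corresponds to the 25:75 H4:D4 mixture used for Figure 3 and would be 1.0 for a 50:50 mixture.

```python
"""Sketch: separate label doublets (N-terminal b-type ions) from singlets."""

D4_SHIFT = 4.025108  # mass difference between D4- and H4-nicotinyl labels, Da


def find_label_doublets(peaks, tol=0.3, charges=(1, 2), expected_ratio=3.0, ratio_window=2.0):
    """Return (light_mz, heavy_mz, charge) for every light/heavy partner pair.

    peaks is a list of (m/z, intensity). A heavy partner must sit at
    light_mz + D4_SHIFT / z within tol, and its heavy/light intensity ratio
    must lie within a factor ratio_window of expected_ratio.
    """
    peaks = sorted(peaks)
    doublets = []
    for light_mz, light_int in peaks:
        if light_int <= 0:
            continue
        for z in charges:
            target = light_mz + D4_SHIFT / z
            for heavy_mz, heavy_int in peaks:
                if abs(heavy_mz - target) > tol:
                    continue
                ratio = heavy_int / light_int
                if expected_ratio / ratio_window <= ratio <= expected_ratio * ratio_window:
                    doublets.append((light_mz, heavy_mz, z))
    return doublets


def split_doublets_and_singlets(peaks, **kwargs):
    """Return (doublets, singlet m/z values) for a fragment peak list."""
    peaks = list(peaks)
    doublets = find_label_doublets(peaks, **kwargs)
    paired = {mz for light, heavy, _ in doublets for mz in (light, heavy)}
    singlets = [mz for mz, _ in sorted(peaks) if mz not in paired]
    return doublets, singlets


if __name__ == "__main__":
    # Hypothetical peak list built from calculated b4, b5 and y5 m/z values of
    # nicotinylated ADIAGHGQE (not measured data).
    spectrum = [(476.21, 25.0), (480.24, 75.0), (527.22, 60.0),
                (533.24, 20.0), (537.26, 60.0)]
    pairs, singles = split_doublets_and_singlets(spectrum)
    print("doublets:", pairs)
    print("singlets:", singles)
```

The default charges are limited to 1+ and 2+ in this sketch: the 1.006 m/z spacing expected for a 4+ doublet nearly coincides with the 1.003 m/z ¹³C isotope spacing of a singly charged ion, so allowing high charge states without independent charge-state determination invites false pairs, and the intensity-ratio test guards against this only partly.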

The Effect of Succinylation on Fragmentation. This effect was investigated for many peptides during the course of this study. Figure 4A shows an MS/MS spectrum of the nicotinylated peptide e (WQQVLNVWGBVE) isolated from the Asp(Glu)C digest of succinylated myoglobin. Two high-mass peaks, due to the loss of COOH and of succinate, respectively, from the parent ion, are diagnostic for the presence of a succinate group in the peptide. The combination of nicotinylation and succinylation has suppressed the formation of any C-terminal y-ions, and the spectrum is composed entirely of b-ions. There is a noticeable gap in the spectrum due to the absence of the b9 ion. This is caused by the adjacent tryptophan residue, which is basic in the gas phase, suppressing the next b-ion. The effect is compounded by the presence of a glycine in the next position, which usually gives a weak b-ion. In all other spectra, suppression of a b-ion next to a succinyllysine was not observed. The usefulness of the labeling strategy for peptide sequencing by ion trap mass spectrometry is illustrated in Figure 4B, which shows the MS/MS/MS spectrum obtained from the b7 ion pair (m/z 973/977). Together with the MS/MS spectrum obtained from the parent ion, this allows the complete sequence to be defined. The succinylation of lysine residues has another important effect: one can now distinguish between the otherwise isobaric residues (nominal mass 128 Da) lysine and glutamine.

Figure 5. Comparison of N-terminal D4/H4 nicotinyl and C-terminal ¹⁸O/¹⁶O labeling for MS/MS analysis. Panel A shows the MS/MS spectrum of myoglobin tryptic fragment B from Figure 1 (HGTVVLTALGGILK, 1+, m/z 1379) obtained by tryptic digestion in 50:50 (v/v) ¹⁶O/¹⁸O water. Panel B shows the MS/MS spectrum of fragment B obtained by tryptic digestion in normal aqueous solution after derivatization with a 50:50 mixture of H4 and D4 Nic-NHS at pH 5.5 (parent ion 1+, m/z 1480/1484). The b-ion series from b4 to the parent ion is clearly identifiable as a series of doublets separated by four mass units. The sequence ions are given together with the peptide sequence under the spectra in both panels.

Advantages of N-Terminal Nicotinylation. Isotopic labeling of the C-terminus of peptides can be carried out by digesting the protein in a 50:50 molar mixture of H₂¹⁶O/H₂¹⁸O. This results in peptides (except the peptide arising from the C-terminus of the protein) appearing as doublets separated by two mass units, and the C-terminally derived ions in the MS/MS spectra of these peptides also appear as doublets. Chemical derivatization of the N-terminus of peptides in a digest with the highly basic D4/H4 nicotinyl group has several advantages over C-terminal labeling using ¹⁸O.22 The yield of b-ions is greatly increased, and these ions are easily identified by their isotopic pattern (doublets separated by four mass units) even for multiply charged fragment ions (up to 4+), in contrast to the ¹⁸O method. Nor is there a problem when internal ions arise due to the presence of proline or histidine residues; when the isotopic label is on the C-terminus, such ions cause confusion by creating a second sequence series in addition to the y-ions.

(22) Takao, T.; Hori, H.; Okamoto, K.; Harada, A.; Kamachi, M.; Shimonishi, Y. Rapid Commun. Mass Spectrom. 1991, 5, 312-15.

Figure 5A shows the MS/MS spectrum of the singly charged myoglobin tryptic fragment B (Figure 1, HGTVVLTALGGILK) obtained by digestion in 50:50 (v/v) ¹⁶O/¹⁸O water. Figure 5B shows the MS/MS spectrum of singly charged fragment B obtained by tryptic digestion in normal aqueous solution after derivatization with a 50:50 mixture of H4 and D4 Nic-NHS at pH 5.5. The nicotinylation method can be applied to any proteolytic or chemical digestion, and it is far less expensive and more reproducible than the use of ¹⁸O-labeled water.23 The increase in b-ion intensity and the ease of recognizing these ions as doublets make it possible to obtain full-length sequence coverage of peptides of m/z >1000 in an ion trap, extending the usual sequencing range. Finally, protein succinylation increases the yield of b-ions by suppressing charge localization at internal lysines and also allows one to differentiate between lysine and glutamine in low-energy MS/MS fragmentation regimes. Another advantage of succinylation is that most of the ions in the MS spectrum under standard electrospray conditions (0.1-5% formic or acetic acid) appear mainly as 1+ (m/z