Anal. Chem. 2002, 74, 2072-2082
Sequencing of Argentinated Peptides by Means of Matrix-Assisted Laser Desorption/Ionization Tandem Mass Spectrometry Ivan K. Chu,†,‡,§ David M. Cox,‡,| Xu Guo,§ Inga Kireeva,†,‡,⊥ Tai-Chu Lau,# John C. McDermott,| and K. W. Michael Siu*,†,‡
Department of Chemistry, Centre for Research in Mass Spectrometry, and Department of Biology, York University, 4700 Keele Street, Ontario, Canada M3J 1P3, MDS SCIEX, 71 Four Valley Drive, Concord, Ontario, Canada L4K 4V8, and Department of Biology and Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Argentinated peptide ions are formed in abundance under matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) conditions in the presence of Ag+ ions. These argentinated peptide ions are fragmented facilely under MALDI-MS/MS conditions to yield [bn + OH + Ag]+, [bn - H + Ag]+ and [an - H + Ag]+ ions that are indicative of the C-terminal sequence. These observations parallel those made earlier under electrospray MS conditions (Chu, I. K.; Guo, X.; Lau, T.-C.; Siu, K. W. M. Anal. Chem. 1999, 71, 2364-2372). A mixed protonated and argentinated tryptic peptide map was generated from 37 fmol of bovine serum albumin (BSA) using MALDI-MS. MALDI-MS/MS data from four argentinated peptides at a protein amount of 350 fmol unambiguously identified the protein as BSA. Sequence-tag analysis of two argentinated tryptic peptides was used to identify unambiguously myocyte enhancer factor 2A, which had been recombinantly expressed in a bacterial cell line.
† Department of Chemistry, York University. ‡ Centre for Research in Mass Spectrometry, York University. § MDS SCIEX. | Department of Biology, York University. ⊥ Present address: SYN-X Pharma Inc., 6354 Viscount Road, Mississauga, Ontario, Canada L4V 1H3. # City University of Hong Kong. (1) Yates, J. R., III. Trends Genet. 2000, 16, 5-8. (2) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269-295.
Proteomics, the study of proteins and their functional interactions, is a field still in its infancy, but has already generated worldwide interest.1 Central to the study are sensitive and sophisticated tools that can handle the large number of low-abundance proteins that are being encountered. Protein MS, in combination with high-resolution separation techniques and database searching, is being applied to identify proteins via their tryptic maps and partial sequence data.2 Partial sequence data are invaluable in that they narrow down the number of possible hits and, therefore, increase the analytical confidence. In fact, it is frequently the case that a protein can be unambiguously identified
from a single tryptic peptide for which the sequence of only a few residues is known.3 For organisms whose genomes are unknown, however, protein identification is much less straightforward. The simplest cases are those in which the proteins in question are highly conserved and, thus, can be identified via the sequences of their homologous proteins in other species, because some of the tryptic peptides will have identical sequences and masses. This strategy fails when the proteins are insufficiently similar. For these proteins, identification is predicated upon accurate and extensive sequence data. Most commercial protein MS software has built-in algorithms for automated sequencing of tandem MS spectra of protonated tryptic peptides. Some of these are quite sophisticated in that the raw data in the tandem mass spectrum and the mass of the peptide are used as input for sequence database searching.4 Obviously, this approach works only for organisms whose genomes have been sequenced. In cases for which sequence data are unavailable, a common assumption is that the precursor ion produced in electrospray is doubly charged and that abundant fragment ions of larger m/z values are singly charged y ions. This assumption is typically valid and useful for a significant fraction of the tryptic peptides encountered in a given tryptic digest; however, for tryptic peptides that yield abundant b ions in admixture with y ions, automated sequencing is less straightforward. Of course, sequence interpretation can often be improved by manual inspection or, in especially challenging cases, after labeling the C-terminal hydroxyl oxygen atoms with ¹⁸O in 50% of the peptide population (by performing tryptic digestion in 1:1 H₂¹⁸O/H₂¹⁶O) and assigning fragment ions that appear as pairs separated by 2 Da as y ions.5-7 These steps, however, increase the complexity of the experiment as well as the analysis time.
2072 Analytical Chemistry, Vol. 74, No. 9, May 1, 2002
(3) Susin, S. A.; Lorenzo, H. K.; Zamzami, N.; Marzo, I.; Snow, B. E.; Brothers, G. M.; Mangion, J.; Jacotot, E.; Costantini, P.; Loeffler, M.; Larochette, N.; Goodlett, D. R.; Aebersold, R.; Siderovski, D. P.; Penninger, J. M.; Kroemer, G. Nature 1999, 397, 441-446. (4) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989. 10.1021/ac0111006 CCC: $22.00
© 2002 American Chemical Society Published on Web 04/06/2002
Figure 1. MALDI spectra of (a) protonated and (b) argentinated peptides: des-Arg1-bradykinin, 0.88 pmol; angiotensin I, 1.14 pmol; Glu1-fibrinopeptide B, 1.14 pmol; and neurotensin, 50 fmol.
(5) Takao, T.; Hori, H.; Okamoto, K.; Harada, A.; Kamachi, M.; Shimonishi, Y. Rapid Commun. Mass Spectrom. 1991, 5, 312-316. (6) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431-1440. (7) Kosaka, T.; Takazawa, T.; Nakamura, T. Anal. Chem. 2000, 72, 1179-1185. (8) Chu, I. K.; Guo, X.; Lau, T.-C.; Siu, K. W. M. Anal. Chem. 1999, 71, 2364-2372.
Figure 2. MALDI spectrum of 1 pmol of bradykinin: the ion at 904.47 Th is a bradykinin fragment; inset shows the [M + Ag]+ cluster; the bars show the theoretical ion distribution.
Recently, we proposed an alternative, and potentially more easily automatable, strategy for de novo sequencing of oligopeptides, for example, tryptic peptides.8 The oligopeptides are first converted to their Ag+ (argentinated) complexes by addition of silver(I) nitrate, electrosprayed, and then sequenced via low-energy collision-induced dissociation (CID) in tandem MS. Unlike protonated peptides, argentinated peptides (even those cleaved by trypsin) fragment predominantly to yield N-terminal ions. Furthermore, cleavage of the C-terminal residues produces at least two, if not all three, of the following types of fragment ions: [bn + OH + Ag]+, [bn - H + Ag]+, and [an - H + Ag]+. The fragment ions of a common residue length (i.e., identical n) are separated by fixed mass differences (18 Da between the [bn + OH + Ag]+ and [bn - H + Ag]+ ions and 28 Da between the [bn - H + Ag]+ and [an - H + Ag]+ ions), thus allowing these to be picked out easily at a glance. An algorithm has been developed to demonstrate the feasibility of this approach for automated sequencing. This algorithm first identifies the triplets and doublets of fragment ion peaks, computes the mass that separates neighboring triplets, and identifies the residue that is cleaved. Partial sequencing of a number of peptides, including tryptic peptides, has been demonstrated.8
A valid criticism of implementing this sequencing strategy under electrospray conditions is that the added Ag+ suppresses the peptide ion intensity and reduces analytical sensitivity; another is that tryptic peptides tend to form doubly charged ions that result from attachment of one Ag+ and one proton, thus complicating the fragmentation pattern and interpretation. Here we report the first MALDI results of argentinated oligopeptides and their fragmentation under CID conditions. It is readily apparent that abundant argentinated oligopeptide complexes can be generated from subpicomole quantities of peptides and that fragmentation of argentinated peptides yields abundant triplet fragment ions. This is demonstrated using a number of commercially available proteins and also a recombinantly expressed protein, myocyte enhancer factor (MEF) 2A.
The development of skeletal muscle tissue requires the expression of a number of proteins, and the regulation of this expression is controlled by a limited number of transcription factors. The MEF2 family is a group of transcription factors that play a critical role in the process of muscle cell differentiation. However, MEF2 is found in a variety of tissues, suggesting that its muscle-specific activity is itself regulated by posttranslational modifications and interactions with other protein partners. Identification of these modifications or interactions could explain how the activity of MEF2 is regulated. Toward this end, it was necessary to recombinantly express
His-tagged MEF2A for study. Unfortunately, MEF2 proteins do not express efficiently in a bacterial host, and even after purification on a nickel affinity column, the MEF2A preparation was contaminated with a number of nonspecific bacterial proteins. To identify MEF2A, it was necessary to separate the preparation using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), perform in-gel trypsin digestion of specific suspect MEF2A bands, and identify the proteins using MALDI tandem MS via sequencing of their argentinated tryptic peptides.
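The triplet search outlined in the introduction (find the triplets and doublets, measure the spacing between neighboring groups, assign the residue) can be sketched as follows. This is a minimal illustration and not the authors' software: the function names, the 0.02 Da tolerance, and the use of exact H2O and CO masses for the nominal 18 and 28 Da spacings are assumptions, only the 107Ag series is considered, and the input is a synthetic peak list generated from theoretical masses rather than a measured spectrum.

```python
# Illustrative sketch only; names, tolerance, and test data are assumptions.
H2O = 18.010565   # [bn + OH + Ag]+ minus [bn - H + Ag]+ (nominally 18 Da)
CO = 27.994915    # [bn - H + Ag]+ minus [an - H + Ag]+ (nominally 28 Da)
AG107 = 106.905097 - 0.000549  # 107Ag+ (atomic mass minus one electron)

# Monoisotopic residue masses (Da); L and I are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def _has_peak(peaks, target, tol):
    """True if any peak lies within tol of the target m/z."""
    return any(abs(p - target) <= tol for p in peaks)


def find_anchors(peaks, tol=0.02):
    """Return (m/z, kind) for each peak that behaves like [bn - H + Ag]+.

    A peak is kept when a partner lies CO below it and/or H2O above it:
    two partners make a "triplet", one partner a "doublet".
    """
    anchors = []
    for b in sorted(peaks):
        n_partners = _has_peak(peaks, b - CO, tol) + _has_peak(peaks, b + H2O, tol)
        if n_partners:
            anchors.append((b, "triplet" if n_partners == 2 else "doublet"))
    return anchors


def read_tag(anchors, tol=0.02):
    """Translate the spacing between neighboring anchors into residues."""
    mz = [b for b, _ in anchors]
    tag = []
    for low, high in zip(mz, mz[1:]):
        delta = high - low
        hits = [r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        tag.append("/".join(hits) if hits else "?")
    return "".join(tag)


def synthetic_peaks(sequence, n_values):
    """Theoretical 107Ag triplets (an, bn, bn*) for fragment lengths n."""
    peaks = []
    for n in n_values:
        b = sum(RESIDUE_MASS[r] for r in sequence[:n]) + AG107
        peaks += [b - CO, b, b + H2O]
    return peaks


peaks = synthetic_peaks("MGGFLR", range(2, 7))
print(read_tag(find_anchors(peaks)))  # prints GFL/IR
```

Anchoring on the [bn - H + Ag]+ peak is a simplification chosen for brevity; a fuller implementation would also need to handle groups in which that ion is weak or absent, as well as the 109Ag partner of every peak.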
EXPERIMENTAL SECTION
Experiments were conducted on a QSTAR Pulsar i quadrupole/time-of-flight (QqTOF) hybrid mass spectrometer (Applied Biosystems/MDS SCIEX, Concord, ON) equipped with an oMALDI source, and a Voyager DE-STR MALDI-TOF mass spectrometer (Applied Biosystems Perceptive, Framingham, MA). Spectra were acquired using the reflective mode with optimal acceleration voltages, laser intensities, and delays for ion extraction. CID was performed on the QSTAR using argon as the collision gas and at center-of-mass energies (Ecm) of 1-3 eV.
Oligopeptides and proteins were commercially available from Sigma (St. Louis, MO); all chemicals were from Aldrich (St. Louis, MO). Samples were typically 10 µM peptides in 50/50 water/methanol containing 2 mM silver nitrate. Silver nitrate was replaced by 0.1% trifluoroacetic acid for MALDI of protonated peptides. Sample/matrix solutions were prepared typically by mixing 1 µL of peptide solution with 9 µL of saturated 4-hydroxy-α-cyanocinnamic acid (HCCA) or 2,5-dihydroxybenzoic acid (DHB) solution. A 1-µL aliquot was pipetted onto the MALDI sample plate and air-dried prior to analysis.
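Two of the quantities quoted above lend themselves to a quick calculation: the laboratory-frame collision energy that corresponds to a given Ecm, and the peptide amount deposited on the plate. The sketch below is illustrative only. The function names are hypothetical, the conversion assumes a singly charged ion colliding with a stationary argon atom (Ecm = Elab × m_gas / (m_gas + m_ion)), the laboratory-frame values it prints are not reported in the text, and the plate amount assumes that the volumes are additive and the peptide is evenly distributed.

```python
AR_MASS = 39.948  # u, standard atomic weight of argon (the collision gas)


def lab_energy_ev(e_cm_ev, ion_mass_u, gas_mass_u=AR_MASS):
    """Lab-frame energy giving e_cm_ev for a singly charged ion.

    Rearranged from Ecm = Elab * m_gas / (m_gas + m_ion), stationary target.
    """
    return e_cm_ev * (gas_mass_u + ion_mass_u) / gas_mass_u


def on_plate_pmol(conc_um, peptide_ul, matrix_ul, spotted_ul):
    """Peptide amount (pmol) contained in the aliquot spotted on the plate."""
    total_pmol = conc_um * peptide_ul  # µM x µL = pmol
    return total_pmol * spotted_ul / (peptide_ul + matrix_ul)


# Example ion: an [M + Ag]+ species near 788 Th; Ecm window of 1-3 eV.
for e_cm in (1.0, 3.0):
    print(f"Ecm {e_cm:.0f} eV -> Elab {lab_energy_ev(e_cm, 788.25):.1f} eV")

# 10 µM peptide, 1 µL mixed with 9 µL of matrix, 1 µL spotted.
print(on_plate_pmol(10, 1, 9, 1), "pmol")
```

Under these assumptions an ion of about 788 Th needs roughly 21 to 62 eV in the laboratory frame to span the 1-3 eV Ecm window, and the typical preparation places 1 pmol of peptide on the plate.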
Figure 3. (a) MALDI spectrum of 1 pmol of MGGFLR, (b) product ion spectrum of the [M + Ag]+ cluster of MGGFLR, and (c) product ion spectrum of the [M + Ag]+ cluster of YGGFL. The following abbreviations are used for labeling: bn* = [bn + OH + Ag]+, bn = [bn - H + Ag]+, and an = [an - H + Ag]+.
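The three ion types named in the Figure 3 caption can be tabulated for any sequence, together with the 107Ag/109Ag partner of each ion. The sketch below is an assumption-laden illustration, not part of the original work: the function names are hypothetical, the masses are standard monoisotopic values, and the output is theoretical m/z values rather than peaks read from the spectra.

```python
H2O = 18.010565
CO = 27.994915
ELECTRON = 0.000549
# Masses of the two silver cations (atomic mass minus one electron).
AG_ION = {107: 106.905097 - ELECTRON, 109: 108.904755 - ELECTRON}

# Monoisotopic residue masses (Da) for the residues used below only.
RESIDUE_MASS = {"G": 57.02146, "L": 113.08406, "M": 131.04049,
                "F": 147.06841, "R": 156.10111, "Y": 163.06333}


def silver_ions(sequence, n, isotope=107):
    """m/z of an, bn and bn* (caption notation) for the first n residues."""
    b = sum(RESIDUE_MASS[r] for r in sequence[:n]) + AG_ION[isotope]
    return {"an": b - CO, "bn": b, "bn*": b + H2O}


def ion_table(sequence):
    """Rows of (n, label, m/z with 107Ag, m/z with 109Ag)."""
    rows = []
    for n in range(1, len(sequence) + 1):
        light = silver_ions(sequence, n, 107)
        heavy = silver_ions(sequence, n, 109)
        for label in ("an", "bn", "bn*"):
            rows.append((n, label, round(light[label], 2), round(heavy[label], 2)))
    return rows


for row in ion_table("MGGFLR"):
    print(row)
```

For n equal to the full peptide length, bn* is simply [M + Ag]+; for MGGFLR the 109Ag member of that pair falls at about 788.25 Th. Every row carries a pair of values about 2 Da apart, which is the doublet signature expected when the whole [M + Ag]+ cluster is selected for fragmentation.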
Figure 4. Product ion spectrum of argentinated tryptic peptide, AVPYPQR, residues 192-198, from 1 pmol of bovine β-casein precursor.
Bovine caseins and serum albumin were digested as follows: A 1-mg quantity of protein was dissolved in 500 µL of 50 mM ammonium bicarbonate and boiled for 10 min. The sample was cooled, and a 20-µL aliquot of 2 mg/mL trypsin (Worthington Biochemical, Lakewood, NJ or Promega, sequence grade, Madison, WI) was added. The mixture was incubated at 37 °C for 16 h and then diluted with water.
The MEF2A protein was expressed by transforming BL21(DE3)pLysS bacterial cells with the DNA plasmid pET15b-HisMEF2A and growing a 500-mL culture to an optical density of 0.9, followed by induction with 500 mM isopropyl-β-D-1-thiogalactopyranoside for 3 h at 30 °C. Cells were harvested by centrifugation, lysed in 20 mL of binding buffer (6 M urea, 5 mM imidazole, 500 mM NaCl, and 20 mM Tris pH 7.9), and sonicated briefly. The lysate was cleared by centrifugation at 30000g for 20 min and then loaded onto a 1-mL Ni agarose gravity-flow column (Invitrogen, Carlsbad, CA). The column was washed with 10 column volumes of binding buffer and 10 column volumes of wash buffer (6 M urea, 20 mM imidazole, 500 mM NaCl, 20 mM Tris pH 7.9). His-MEF2A was eluted using 5 mL of elution buffer (6 M urea, 500 mM imidazole, 500 mM NaCl, 20 mM Tris pH 8.0). The impure proteins were then separated using SDS-PAGE and visualized using Coomassie Blue (Gelcode Blue Stain, BioLynx, Montreal, QU). Excised bands were destained by washing three times with 50% acetonitrile/50 mM ammonium bicarbonate. Cysteine residues were reduced with 10 mM dithiothreitol for 30 min at 50 °C and then alkylated with 55 mM iodoacetamide in the dark for 20 min. The gel pieces were finally washed with acetonitrile and then rehydrated in 50 mM ammonium bicarbonate containing 12.5 ng/µL of trypsin and incubated at 37 °C overnight. The resulting peptides were extracted using 60% acetonitrile/water and concentrated by Speed-Vac for MALDI analysis as detailed above.
RESULTS AND DISCUSSION
Figure 1 shows a comparison of the sensitivity of MALDI-MS for (a) protonated and (b) argentinated oligopeptides with peptide amounts of 50 fmol to 1.14 pmol. It is apparent that the overall sensitivity was comparable in both cases (see later sections for results of tryptic digests). Although MALDI samples are inherently heterogeneous and there is much variation in intensity from spot to spot, the quality of the spectra shown is typical under routine operation. The presence of Ag+ results in formation of silver-peptide complexes. The [M + Ag]+/[M + H]+ intensity ratios (where M = peptide) vary from 0.2 to 0.6; thus, a significant fraction of the peptides is in the form of silver complexes. It is noteworthy that there is much variation in the extent of silver complexation: angiotensin I appears to have the highest affinity for Ag+, exhibiting the highest [M + Ag]+/[M + H]+ intensity ratio as well as more extensive Ag incorporation in the form of [M - H + 2Ag]+ and even [M - 2H + 3Ag]+ (1618 Th) ions.
Figure 2 shows a second example, bradykinin. The inset shows details of the [M + Ag]+ cluster, which is easily differentiated from the [M + H]+ cluster by virtue of the 107Ag and 109Ag doublet. The elemental 107Ag/109Ag ratio is 1.076; however, the third peak in an [M + Ag]+ cluster (e.g., the 1168.46 Th ion) is typically the most abundant as a result of the contribution of the [13C2-M + 107Ag]+ ion’s intensity to that of the [M + 109Ag]+ ion.
Figure 5. MALDI spectrum of the tryptic digest of 1 pmol of bovine α-casein precursor: ○, protonated peptide; ●, argentinated peptide; numbers bracketed by arrows indicate the residues.
Figure 3a and b shows, respectively, the full scan spectrum and the CID results of the [M + Ag]+ ions of MGGFLR. The entire [M + Ag]+ cluster centering at 788.25 Th was selected by Q1 for fragmentation; this had the added advantage of better sensitivity and discriminating power, because all of the argentinated product ions were present as doublets. The triplet patterns of [bn + OH + Ag]+, [bn - H + Ag]+, and [an - H + Ag]+ (abbreviated as bn*, bn, and an, respectively) are apparent, and from them a partial sequence of -GFL/IR can be read. Figure 3c shows the product ion spectrum of the [M + Ag]+ ions of leucine enkephalin, YGGFL, which is very similar to the product ion spectrum reported earlier for the electrosprayed [M + Ag]+ ion of YGGFL recorded on a triple quadrupole mass spectrometer.8 Again the triplet pattern is unmistakable. The search algorithm developed earlier is, thus, entirely applicable to the MALDI-CID spectra. Our earlier work8-12 showed that cleavage on the C-terminal side of proline produces weak N-terminal ions; because the prolyl residue lacks an N-H hydrogen, cleavage on the C-terminal side of proline cannot produce an argentinated oxazolone, which is believed to be the general structure of [bn - H + Ag]+ ions.9 The poor fragmentation yield of N-terminal ions, and especially the [bn - H + Ag]+ ions, at the C-terminal side of proline is further demonstrated in Figure 4, which
shows the product ion spectrum of the tryptic peptide, AVPYPQR, residues 192-198 of bovine β-casein precursor. It is apparent that the C-terminal residues are -PQR and that the [b5 - H + Ag]+ ion (abbreviated as b5) is weak; the presence of relatively more abundant [b5 + OH + Ag]+ (abbreviated as b5*) and [a5 - H + Ag]+ (abbreviated as a5) facilitated the [b5 - H + Ag]+ ion assignment. The -PQR sequence is confirmed by the presence of the abundant [y3 - H + Ag]+ (abbreviated as y3), which is unusually intense, even for R-terminating peptides, as a result of the presence of the basic proline residue at the N-terminal position of the tripeptide PQR. It is noteworthy that there is a second abundant P-terminating [yn - H + Ag]+ ion in Figure 4, the [y5 - H + Ag]+ ion (abbreviated as y5) having the sequence PYPQR, although it is not readily apparent that it is so because of the lack of the corresponding N-terminal triplets. The observation of abundant [yn - H + Ag]+ ions whose N-terminal residue is proline parallels that in protonated peptides.13
(9) Lee, V. W.-M.; Li, H.; Lau, T.-C.; Siu, K. W. M. J. Am. Chem. Soc. 1998, 120, 7302-7309. (10) Li, H.; Siu, K. W. M.; Guevremont, R.; Le Blanc, J. C. Y. J. Am. Soc. Mass Spectrom. 1997, 8, 781-792. (11) Chu, I. K.; Shoeib, T.; Guo, X.; Rodriquez, C. F.; Lau, T.-C.; Hopkinson, A. C.; Siu, K. W. M. J. Am. Soc. Mass Spectrom. 2001, 12, 163-175. (12) Rodriquez, C. F.; Shoeib, T.; Chu, I. K.; Siu, K. W. M.; Hopkinson, A. C. J. Phys. Chem. A 2000, 104, 5335-5342. (13) Papayannopoulos, I. A. Mass Spectrom. Rev. 1995, 14, 49-73.
The argentinated tryptic peptide map of bovine α-casein precursor is shown in Figure 5. It is apparent that argentinated tryptic peptides coexist with their protonated analogues, their coexistence being useful for confirmation of the peptides. The [M + Ag]+/[M + H]+ abundance ratios of the tryptic peptides vary from ∼0.2 to 1. Six argentinated tryptic peptides have been
identified: residues 140-147, 106-115, 38-49, 121-134, 23-37, and 119-134, constituting ∼28% of the residues in the α-casein precursor. Attempts to correlate [M + Ag]+/[M + H]+ abundance ratios with the presence or absence of specific residues are inconclusive.
Figure 6. (a) MALDI spectrum of 37 fmol of bovine serum albumin: ○, protonated peptide; ●, argentinated peptide; and product ion spectra of argentinated tryptic peptides at (b) 1035, (c) 1271, (d) 1412, and (e) 1588 Th.
Figure 6a shows the argentinated tryptic peptide map of 37 fmol of bovine serum albumin acquired under optimized conditions to establish performance benchmarks. Again, a number of pairs of argentinated and protonated tryptic peptides are evident. These are residues 161-167, 66-75, 35-44, 402-412, 360-371, 421-433, 347-359, 437-451, and 168-183, representing 107 out of a total of 607 (∼18% of the) residues in the protein. The product ion spectra of four argentinated tryptic peptides (from 350 fmol of protein) are displayed in Figure 6b-e. It is apparent that the fragmentation of these peptides yields C-terminal partial sequences of a minimum of three residues, which can be applied to sequence-tag analysis. Inputting the peptide molecular masses and the partial sequences shown in Figure 6 into Sequence Query of Mascot,14 an Internet-based protein identification program, unambiguously identifies the protein as bovine serum albumin. The three bovine serum albumin proteins that have an identical probability-based Mowse score of 307 (a score of >71 is significant) are NCBInr Accession numbers gi 418694, 1351907, and 2190337, in which residues 42, 190, and 214 differ. In gi 418694, they are respectively Q, D, and A; in gi 1351907, they are H, E, and A; and in gi 2190337, they are H, E, and T. Together, they represent a variability of 3 out of a total of 607 residues.
(14) http://matrixscience.com.
The last example is illustrated in Figure 7. The tryptic peptides of the putative MEF2A protein were derivatized with Ag+ and examined using MALDI. Part a of Figure 7 is the argentinated tryptic peptide map of the protein, showing the existence of a number of argentinated peptides. Parts b and c show sequencing of the argentinated peptides at 1545.59 and 1532.51 Th, which yields partial sequences of -GNI/LGMNSR and -PHESR, respectively. Inputting the peptide molecular masses and the partial sequences into Sequence Query of Mascot14 unambiguously identified the protein as MEF2A with a probability-based Mowse score of 165 (a score of >65 is significant). Parts d and e show product ion spectra of the protonated counterparts of the argentinated peptides; these spectra are comparatively less informative in their sequence content.
CONCLUSION
Abundant [M + Ag]+ ions are formed under MALDI conditions. Partial sequencing of argentinated tryptic peptides from
bovine serum albumin and MEF2A has been demonstrated. The sequence tags thus generated were shown to correctly identify the proteins after searches of protein databases. The greatest current utility in sequencing argentinated tryptic peptides is its potential for unattended and automated fragment assignment, thereby leading to automated sequence-tag analysis and protein identification. However, because sequencing is de novo in nature, a potential future utility may lie in sequencing peptides of proteins from organisms of unknown genome.
Figure 7. (a) Argentinated tryptic peptide map for putative MEF2A protein: ○, protonated peptide; ●, argentinated peptide; numbers on the double-headed arrows show residue numbers determined after MEF2A was confirmed; (b) product ion spectrum of the [M + Ag]+ cluster centered at 1545.59 Th; (c) product ion spectrum of the [M + Ag]+ cluster centered at 1532.51 Th; (d) product ion spectrum of the [M + H]+ analogue of the tryptic peptide shown in part b; (e) product ion spectrum of the [M + H]+ analogue of the tryptic peptide shown in part c.
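The shape of an [M + Ag]+ cluster, such as those selected for CID in the figures and compared with a theoretical distribution in Figure 2, can be estimated with a short calculation. The sketch below is a rough illustration under stated assumptions: isotope peaks are binned by nominal mass, the abundances are standard natural values, the function names are hypothetical, and the elemental formula used for bradykinin (C50H73N15O11) is supplied here for the example and does not appear in the text.

```python
# Natural isotope abundances indexed by nominal mass offset (0, +1, +2).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}
SILVER = [0.51839, 0.0, 0.48161]  # 107Ag, no 108Ag, 109Ag


def convolve(p, q, size):
    """Combine two isotope distributions, keeping the first `size` peaks."""
    out = [0.0] * size
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < size:
                out[i + j] += a * b
    return out


def silver_cluster(formula, size=6):
    """Relative heights (highest = 100) of the [M + Ag]+ isotope peaks."""
    dist = [1.0] + [0.0] * (size - 1)
    for element, count in formula.items():
        for _ in range(count):  # one convolution per atom
            dist = convolve(dist, ISOTOPES[element], size)
    dist = convolve(dist, SILVER, size)
    top = max(dist)
    return [round(100 * x / top, 1) for x in dist]


BRADYKININ = {"C": 50, "H": 73, "N": 15, "O": 11}  # assumed formula
print(silver_cluster(BRADYKININ))
```

With these inputs the third peak of the cluster comes out highest, because the 109Ag peak of the all-light peptide coincides with the 107Ag peak of peptide molecules carrying two heavy-isotope substitutions, as noted for the 1168.46 Th ion of bradykinin.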
ACKNOWLEDGMENT
We thank Ms. Zhe Wang for contributing toward software development in sequencing argentinated peptides. Financial support from the Natural Sciences and Engineering Research Council of Canada, MDS SCIEX, Canada Foundation for Innovation, Ontario Innovation Trust, and York University is gratefully acknowledged.
Received for review October 18, 2001. Accepted February 19, 2002. AC0111006