In the Laboratory
Introduction to Biological Mass Spectrometry: Determining Identity and Species of Origin of Two Proteins
Curt T. Reimann,* Axel Mie, and Carina Nilsson
Department of Analytical Chemistry, Chemical Center, Lund University, P.O. Box 124, SE-221 00 Lund, Sweden; *Curt.Reimann@analykem.lu.se
Arieh Cohen
Department of Occupational and Environmental Medicine, University Hospital, Lund University, SE-221 85 Lund, Sweden
The dramatic demonstration of mass spectrometry (MS) of arbitrarily large proteins by John Fenn, Koichi Tanaka, and others was recognized by the 2002 Nobel Prize in Chemistry (1). MS of proteins and peptides has now become a central feature of bioanalytical chemistry, with a high profile in protein identification and characterization, and it comprises the central assay in the generic experiment shown in Figure 1B. One of the new buzzwords today is “systems biology”: looking at an organism as a complex machine with a multitude of interacting parts. Since most of these interacting parts are proteins, MS is now used to identify hundreds to thousands of proteins per day, in an “industrial strength” approach (2). The goal is nothing less than solving a central biological mystery: cracking the “second code of life” (Note 1).

Undergraduates need to know basic mass spectrometry principles and skills to make them competitive in today’s job market. We have been updating our curriculum to make it more responsive to students’ needs in bioanalysis. In the last three years we have developed a two-day-long laboratory exercise instrumentally oriented around electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI) MS, and practically oriented around the issues of protein identification and characterization. We attempt to touch on the philosophical issue of the nature and value of the different kinds of information at our disposal vis-à-vis MS and the ever-growing computer databases. We do not do high-throughput proteomics, but we want to give students a background that allows them to move on to positions in which they could participate in such work. We thus focus on having students solve a small mystery: comparison, identification, and partial characterization of two proteins, including identifying their animal sources. The methodology the students employ, shown in Figure 1A, is a subset of the generic experiment outlined in Figure 1B.

The two proteins we chose to examine are cytochrome c from horse and cow. It is interesting to point out that cytochrome c is a mitochondrial protein. Mitochondria multiply by division and do not undergo sexual reproduction. Consequently, mitochondrial DNA is inherited with virtually no changes. It is passed on via the mitochondria that are inherited from the female parent organism. However, mutations do occur, and some are preserved and propagate throughout a species. The stability of mitochondrial DNA within a species makes the proteins it codes for suitable for the determination of species. In fact, cytochrome c has been exploited for generating evolutionary trees (3).

Experimental
Figure 1. (A) Synopsis of the protein identification procedure followed in this laboratory exercise. (B) Workflow of a large-scale proteomics experiment based on separation of a multitude of proteins by two-dimensional gel electrophoresis followed by selection of “spots” for in-gel digestion, peptide extraction, peptide mass mapping with MS, and tandem MS for sequence verification (if needed). In B, protein total mass is determined approximately from the gel instead of by mass spectrometry (A). Otherwise the general concept is the same as that studied in the laboratory exercise. MS: mass spectrometry; tandem MS: fragmentation MS or MS/MS; MM: molar mass (total mass); pI: isoelectric point. G, Y, T, and K are amino acid residues.
www.JCE.DivCHED.org • Vol. 82 No. 8 August 2005 • Journal of Chemical Education

To measure protein total mass, we dissolve 0.25 mg or less of a protein (Note 2) in a 1:1 (v/v) mixture of water and acetonitrile containing 1% formic acid. The mixture is infused into the ESI source of a Bruker Esquire quadrupole ion trap (QIT) MS (but any ESI-equipped MS could be employed). The result is an m/z spectrum reflecting multiple protonation. An account of ESI–MS and how to interpret spectra that display multi-charging is given elsewhere (4).

For identification purposes, it is more typical that an enzymatic digest of the subject protein is carried out and the masses of some subset of the resulting peptides are measured by MS. Enzymatic digests and “peptide mapping” have a rich history independent of MS, and many enzymes are available (5). Accurate mass-selective detection of these peptides has made identification of the digested protein much easier. One of the most popular enzymes is trypsin, which cleaves the polypeptide chain on the C-terminal side of lysine and arginine, except if the next residue is proline. We typically digest 0.25 mg of protein with 5 µg of sequencing-grade trypsin (3). We carry out digestions for four hours at 37 °C in 1 mL of buffer consisting of 0.1 M NH₄HCO₃ and 2 mM CaCl₂, pH 7.5 (6).

The digest mixture is examined with a Bruker Biflex III MALDI time-of-flight (TOF) MS. We use the MALDI matrix material 4-hydroxy-α-cyanocinnamic acid and the “fast evaporation” deposition method (6). Aliquots (0.75 µL) of acidified digest are deposited on top of the matrix crystals and allowed to dry, followed by brief washing in cold water and drying. Mass spectra are then acquired and peptide monoisotopic masses are extracted from the spectra. The procedure for identifying the protein from the peptide masses has been described (7). We suggest that the students use the Expasy Web site (8) to carry out the peptide mass mapping. After peptide mapping analysis, the students usually conclude that the protein is some form of cytochrome c. However, the relatively poor accuracy of our MALDI–TOF instrument leads to problems in distinguishing the top “hits” on the peptide mapping hit-list.

This is where tandem mass spectrometry (MS/MS) enters the picture. MS/MS can yield valuable sequence information about peptides. The technique involves selecting ions of a particular m/z value (i.e., a single kind of peptide), adding excess energy to these ions so that they fragment, and then acquiring a fragment mass spectrum. An ion trap is useful for such measurements, but a triple quadrupole instrument can also be employed. The software for our QIT enables us to dial in a selected m/z value for isolation, and after the trap is filled with only these ions, they can be excited at a variable level until a fragmentation spectrum of “reasonable” complexity is obtained (Figure 2). In these measurements, students take 100 µL of digest, combine it with 100 µL of 1:1 acetonitrile/H₂O and 2 µL of formic acid, and infuse the solution into the ESI source. By this stage, the students will have done theoretical digests of their candidate proteins (9), and they search for peptides that, on the basis of sequence, could discriminate between one candidate cytochrome c and another. If they can find those ions in the spectrum, they can isolate and fragment them and look at the MS/MS spectrum. For this course, we do not recommend that students try to interpret the fragmentation spectra de novo. Instead, they should insert the candidate sequences into a “fragmentation engine” (10), print out the expected y- and b-type ions, and investigate whether a generous selection of these ions appears in their spectrum.

This portion of the laboratory exercise is very interactive. The instructor operates the instrument, but the students specify which ions in the spectrum they would like to examine more closely by fragmentation.

Hazards
Commercial mass spectrometry instrumentation is generally hazard-free, as lasers are enclosed and high voltages are safety-interlocked. Avoid skin contact with proteins, solvents, and other chemicals by using gloves. Solutions should be prepared in a ventilated hood. In our view, this is a low-hazard-level laboratory exercise.
Figure 2. Summary of MS and MS/MS. (A) Electrospray ionization of a tryptic digest of bovine cytochrome c followed by mass analysis in a quadrupole ion trap (QIT). Circles mark tryptic peptides (many of which are doubly protonated). (B) Ions with m/z 584.8 (open circle) are isolated in the QIT. These ions, doubly protonated, are suspected to correspond to the tryptic peptide [TGPNLHGLFGR + 2H⁺]²⁺. (C) The isolated ions are excited in the QIT and an ensemble of singly-fragmented, singly-protonated peptide ions is formed. A number of fragment ions can be seen that are consistent with the hypothesized sequence, e.g., y2 is GR, y3 is FGR, y4 is LFGR, and so forth. (D) The y- and b-ions are defined for a 6-mer peptide. The spacings between, e.g., consecutive y-type ions correspond to amino acid residues (6), leading to acquisition of sequence information. The fragmentation spectrum confirms the sequence beyond any reasonable doubt. This particular sequence appears in numerous cytochrome c’s, including those of both horse and cow (11).
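The fragment assignments in panel C can be checked numerically. The following is a minimal sketch, not the fragmentation engine cited in the text (10): it assumes standard monoisotopic residue masses and singly protonated fragments, and the function names are hypothetical. It computes the doubly protonated precursor and the b- and y-ion series for TGPNLHGLFGR.

```python
# Standard monoisotopic residue masses in u (amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, monoisotopic
PROTON = 1.00728   # H+


def neutral_mass(seq):
    """Monoisotopic mass of the unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq, charge):
    """m/z of the ion [M + zH]z+."""
    return (neutral_mass(seq) + charge * PROTON) / charge


def b_ions(seq):
    """Singly protonated b1..b(n-1): N-terminal fragments."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
            for i in range(1, len(seq))]


def y_ions(seq):
    """Singly protonated y1..y(n-1): C-terminal fragments, which keep the water."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


if __name__ == "__main__":
    seq = "TGPNLHGLFGR"
    print(f"[M + 2H]2+  m/z {precursor_mz(seq, 2):.2f}")
    for i, (b, y) in enumerate(zip(b_ions(seq), y_ions(seq)), start=1):
        print(f"b{i:<2} {b:9.3f}   y{i:<2} {y:9.3f}  ({seq[-i:]})")
```

Under these assumptions the precursor comes out at m/z 584.81, in line with the ions isolated in panel B, and the gap between y2 (GR) and y3 (FGR) equals the phenylalanine residue mass, which is how the spacing between consecutive y-ions reads out the sequence.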
Results and Discussion
This set of laboratory exercises, though fairly straightforward, has enough twists and turns to stimulate a good level of discussion. The first issue to discuss is the usefulness of knowing the total mass of the protein. When doing peptide mapping, one has the option of including total mass (with a selectable range) as “prior knowledge”, and some students will be tempted to do this with a narrow total mass range, since they measured the total (neutral) mass to within 1.5 u or so. This procedure does not work. The total mass associated with cytochrome c in the database is wrong! It includes neither the extra mass due to acetylation of the N terminus nor the mass of the heme group, even though these modifications are well known.

There is an additional complication worth discussing. The students have analyzed the total mass according to the model [M + nH⁺]ⁿ⁺, where M is the parent protein, but it has been shown that the iron in heme is ferric, that is, Fe(III), meaning that the heme bears a charge
of +1 (12). The correct model for calculating the total mass is rather [M¹⁺ + (n − 1)H⁺]ⁿ⁺. The students must thus add the mass of one proton to their estimated total masses. These may seem like trivial examples, but they underscore the point that, in general, total mass measurements are much better for characterization than for identification.

We teach our students to be aware of unusual isotopic patterns. Our use of a MALDI–TOF instrument with fairly poor mass accuracy quite consistently results in a false identification with amusing consequences: a peak at m/z 1633.5 is attributed to the peptide IFVQKCAQCHTVEK, whose singly protonated ion has a calculated mass of 1633.82 u. The printout of the cytochrome c theoretical digest from the Expasy Web site (9) indicates that heme is attached to this peptide, but the mass of the heme is not included. Therefore the linking of 1633.82 u to that peptide is incorrect. The inset to Figure 3 shows the peculiar isotopic pattern of this peak. After reflection, some students realize that iron has a dramatic isotopic pattern that could account for the two little peaks at m/z = 1631.5 and 1632.5. The students check this by carrying out MS/MS on the same peptide, doubly-protonated, with the ESI–QIT instrument. The result, shown in Figure 3, is that one of the fragments is a heme ion at 616.2 u, and a peak at 1018.4 u appears, consistent with [CAQCHTVEK + H⁺]⁺! In some instances we have been able to isolate the ions at m/z = 1018.4 and do another round of MS/MS, showing that the fragment ions obtained are consistent with this shorter sequence. The conclusion is that the peptide at 1633.5 u is Heme–CAQCHTVEK, not IFVQKCAQCHTVEK. After peptide mapping analysis, verification of the peptide sequence TGPNLHGLFGR (Figure 2C), and analysis of the heme-containing peptide (Figure 3), the students definitely conclude that the protein is some form of cytochrome c.
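The arithmetic behind the heme assignment can be laid out explicitly. The sketch below is illustrative only: it assumes standard monoisotopic masses, models the heme as Fe-protoporphyrin IX (C34H32FeN4O4) added to the peptide with no loss of atoms, treats the heme as carrying one positive charge so that an n-charged ion holds only n − 1 added protons, and neglects the electron mass. The function names are hypothetical.

```python
# Standard monoisotopic residue masses in u (only the residues needed here).
RESIDUE_MASS = {
    "A": 71.03711, "V": 99.06841, "T": 101.04768, "C": 103.00919,
    "I": 113.08406, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "H": 137.05891, "F": 147.06841,
}
ELEMENT_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "Fe": 55.934936}
WATER = 18.01056
PROTON = 1.00728

# Assumption: heme taken as Fe-protoporphyrin IX, C34H32FeN4O4.
HEME_FORMULA = {"C": 34, "H": 32, "Fe": 1, "N": 4, "O": 4}
HEME = sum(ELEMENT_MASS[el] * n for el, n in HEME_FORMULA.items())


def neutral_mass(seq):
    """Monoisotopic mass of the unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz_protonated(seq, charge):
    """m/z for the ordinary model [M + zH]z+."""
    return (neutral_mass(seq) + charge * PROTON) / charge


def mz_heme_peptide(seq, charge):
    """m/z for a heme-bearing peptide, model [M(1+) + (z - 1)H]z+."""
    return (neutral_mass(seq) + HEME + (charge - 1) * PROTON) / charge


def mass_from_peak(mz, charge, fixed_charge=0):
    """Mass of species M from one peak, given its built-in charge (0 or 1)."""
    return charge * mz - (charge - fixed_charge) * PROTON


if __name__ == "__main__":
    print(f"heme                        {HEME:8.2f}")
    print(f"[IFVQKCAQCHTVEK + H]+       {mz_protonated('IFVQKCAQCHTVEK', 1):8.2f}")
    print(f"heme-CAQCHTVEK, z = 1       {mz_heme_peptide('CAQCHTVEK', 1):8.2f}")
    print(f"heme-CAQCHTVEK, z = 2       {mz_heme_peptide('CAQCHTVEK', 2):8.2f}")
    print(f"[CAQCHTVEK + H]+            {mz_protonated('CAQCHTVEK', 1):8.2f}")
```

With these assumptions the two candidate assignments fall within about 0.2 u of each other near m/z 1633.6 to 1633.8, so an instrument of modest mass accuracy cannot separate them by mass alone, whereas the heme fragment (616.18) and the bare peptide (1018.44) seen in MS/MS can. The last helper shows why the protein total mass estimate grows by one proton mass when the built-in heme charge is taken into account.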
The final goal for the students is to use tandem MS to check the sequences of tryptic peptides, which will allow them to distinguish between their top hit in peptide mass mapping and any plausible hits appearing lower on the list. As an example, we show results for a peptide that can distinguish between cow and horse cytochrome c. The tandem MS
Figure 3. The inset shows a peptide ion (m/z 1633.5) with an unusual isotopic pattern as observed by MALDI–TOF–MS. Inspection of the theoretical isotopic pattern for FeC₇₄ (the most significant atoms in roughly the right number from the standpoint of the isotopic distribution) suggests that the peptide might contain heme. The same peptide, doubly-protonated (m/z 817.3), is subjected to MS/MS and peaks corresponding to heme (m/z 616.2) and to the underlying singly-protonated peptide CAQCHTVEK (m/z 1018.4) are observed.
of a doubly-charged peptide at m/z = 792.9 is shown in Figure 4. This ion is suspected to correspond to [KTGQAPGFSYTDANK + 2H⁺]²⁺. A long series of y-ions and a more limited series of b-ions are completely consistent with this sequence, which in turn points to the cytochrome c of only a few species besides cow (dog and donkey, for example) but not horse.

Conclusions
We designed a relatively easy laboratory exercise in protein identification and characterization that has enough twists and turns to still be thought-provoking. The students enjoy solving a mystery, and they are stimulated by the interactive nature of the exercise. At the heart of this laboratory exercise is the interplay of various types of information. The relatively poor accuracy of our MALDI–TOF instrument, though frustrating, gives a pedagogic opportunity: sometimes, the “top hit” in peptide mass mapping database searching refers to the wrong animal! Moreover, on the basis of the sheer number of “matching” peptides, the next best hit may also be quite plausible.

A philosophical point that should be discussed with the students is that they have identified the “gene product” without truly identifying the protein (13). Post-translational modifications (strategic point modifications to the protein) are prevalent and often go unnoticed in peptide mass mapping because non-matching mass spectral peaks are ignored. This is a common and serious problem, even when high-quality instrumentation is employed. As a related point, total mass measurements may be seen as considerably more useful in protein characterization than in identification.
Figure 4. Tandem MS study of the doubly-charged tryptic peptide at m/z 792.9. This peptide is suspected to have the sequence KTGQAPGFSYTDANK and the extensive series of y ions observed is consistent with this sequence. The b ions and some internal ions (promoted by the presence of proline in the sequence) are also visible. This sequence distinguishes between bovine and horse cytochrome c.
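To see why this peptide discriminates between the two species, the bovine sequence can be compared with its horse counterpart. The sketch below is illustrative: the horse sequence KTGQAPGFTYTDANK (threonine in place of serine) is an assumption not stated in this article and should be checked against a sequence database; residue masses are standard monoisotopic values, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses in u (only the residues needed here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "F": 147.06841, "Y": 163.06333,
}
WATER = 18.01056
PROTON = 1.00728

BOVINE = "KTGQAPGFSYTDANK"   # sequence given in the text
HORSE = "KTGQAPGFTYTDANK"    # assumed horse counterpart (S -> T), verify before use


def precursor_mz(seq, charge):
    """m/z of [M + zH]z+ for an unmodified peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return (mass + charge * PROTON) / charge


def y_ions(seq):
    """Singly protonated y1..y(n-1), counted from the C terminus."""
    return [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
            for i in range(1, len(seq))]


def shifted_y_ions(seq_a, seq_b, tol=0.01):
    """Indices i for which y_i differs between two equal-length peptides."""
    pairs = zip(y_ions(seq_a), y_ions(seq_b))
    return [i for i, (a, b) in enumerate(pairs, start=1) if abs(a - b) > tol]


if __name__ == "__main__":
    print(f"bovine [M + 2H]2+  m/z {precursor_mz(BOVINE, 2):.2f}")
    print(f"horse  [M + 2H]2+  m/z {precursor_mz(HORSE, 2):.2f}")
    print("y-ions that differ:", shifted_y_ions(BOVINE, HORSE))
```

Under these assumptions the bovine precursor lands at m/z 792.89, matching the ion selected for Figure 4, while the assumed horse peptide would appear about 7 m/z units higher. The y1 to y6 ions are shared, and every y-ion from y7 upward is shifted by one CH₂ unit, so even a partial y-series is enough to tell the two sequences apart.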
After the students have obtained some MS/MS data, the discussion on the quality of information can be resumed with cytochrome c as a model. In general, obtaining sequence information on any one of the tryptic peptides will confirm that the protein is a variant of cytochrome c, even in the absence of any other information. Going further, a strategically selected sequence snippet alone, combined with either the total mass measurement (corrected properly to include the heme and N-terminus acetylation) or the peptide mass map, will often be overwhelmingly in favor of a particular cytochrome c. At a somewhat more advanced level, tandem MS of a small number of strategically selected peptides can distinguish between cytochrome c’s high on the student’s hit list from peptide mass mapping, even if the total mass is not known to high precision. These examples show why mass spectrometry has become a central tool in protein identification today.

After carrying out these exercises, which include some hands-on experience with sophisticated instrumentation, the students get a feel for what has become a central assay in bioanalytical chemistry: digestion of a protein by enzymes followed by mass spectrometric analysis. Computer-controlled instrumentation and computer-based analysis give somewhat of a “black box” experience, but the fact that the answer still does not come out smoothly forces the students to look more carefully at what they are doing, enhancing the learning process. This exercise focused on cytochrome c, but laboratory instructors can fashion interesting exercises involving any available proteins.

Acknowledgments
The authors are grateful for support from the Faculty of Natural Sciences at Lund University. Helpful general discussions with Jan Eriksson and Cecilia Emanuelsson are also acknowledged.

Supplemental Material
Instructions for the students and notes for the instructor are available in this issue of JCE Online.
Notes
1. The first code of life is the genome. The second code of life is all the proteins coded by the genome.
2. The proteins and other chemicals used are available from companies such as Sigma unless otherwise noted. All water is purified or deionized.
3. We use sequencing-grade modified porcine trypsin from Promega.
Literature Cited
1. Vestling, M. M. J. Chem. Educ. 2003, 80, 122–124.
2. Aebersold, R.; Mann, M. Nature 2003, 422, 198–207.
3. Creighton, T. E. Proteins. Structures and Molecular Properties, 2nd ed.; W. H. Freeman and Co.: New York, 1993.
4. Hofstadler, S. A.; Bakhtiar, R.; Smith, R. D. J. Chem. Educ. 1996, 73, A82–A88.
5. Vestling, M. M. J. Chem. Educ. 1991, 68, 958–960.
6. A good set of MS-related protocols and other important MS-related information is provided by Jensen, O. N. Protein Structure: A Practical Approach; Creighton, T. E., Ed.; IRL Press/Oxford University Press: Oxford, 1998.
7. Counterman, A. E.; Thompson, M. S.; Clemmer, D. E. J. Chem. Educ. 2003, 80, 177–180.
8. PeptIdent. http://us.expasy.org/tools/peptident.html (accessed Apr 2005). PeptIdent will soon be replaced by a more powerful tool, Aldente. http://us.expasy.org/tools/aldente/ (accessed Apr 2005). A program that is similar to PeptIdent is MS-Fit, available from ProteinProspector. http://prospector.ucsf.edu/ucsfhtml4.0/msfit.htm (accessed Apr 2005).
9. Peptide Mass. http://us.expasy.org/tools/peptide-mass.html (accessed Apr 2005).
10. MS-Product. http://prospector.ucsf.edu/ucsfhtml4.0/msprod.htm (accessed Apr 2005).
11. Peptide Search. http://www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearch/Services/PeptideSearch/FR_SequenceOnlyFormG4.html (accessed Apr 2005).
12. He, F.; Hendrickson, C. L.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2000, 11, 120–126.
13. Rappsilber, J.; Mann, M. Trends in Biochemical Sciences 2002, 27, 74–78.