Anal. Chem. 2004, 76, 1445-1452
Trace Labeling of Proteins with Stable Isotopes To Identify Fragments in Complex Mixtures
Juan Zou, A. Neil Turner, and Richard G. Phelps*
MRC Centre for Inflammation Research, Room 231, Hugh Robson Building, University of Edinburgh, 14 George Square, Edinburgh, EH8 9JZ U.K.
We describe a novel nonradioactive protein-labeling technique that permits mass spectrometric identification of fragments of labeled proteins. Proteins are labeled by modulating their content of carbon-13, and labeled fragments are identified from the distinctive isotope pattern observed on MALDI-TOF mass spectrometry. We show that carbon-13 enrichment to just 2.3% of total carbon (about twice the natural abundance of 1.1%) is sufficient for all fragments to be distinguishable from fragments of natural carbon-13-content proteins. Distinguishing labeled fragments is easily accomplished by visual inspection of spectra, but importantly, we show that labeled fragments can also be identified by computer analysis of spectra using novel parameters we have derived. The technique is demonstrated for identification of fragments of carbon-13-enriched glutathione transferase within a complex mixture of unlabeled peptides by visual and computer analysis of MALDI-TOF mass spectra, but it could be developed to mass spectrometrically identify and characterize fragments of labeled proteins recovered from biological systems.

Stable isotopes underpin mass spectrometric solutions to many biological questions. Established applications include use as metabolic and drug tracers,1 for quantification by isotope dilution (for example, of insulin2), to probe fluctuations in protein expression in proteomic analyses,3 in structure analysis by NMR4 and hydrogen exchange,5 to identify proteolytic cut sites by 18O incorporation,6 and to provide compositional information to assist in identification of DNA,7 peptides, and proteins.8-11 More recently, stable isotope incorporation has also been shown to facilitate identification of fragments of labeled proteins after enzymatic fragmentation.10,12 In the reported approaches, labeled recombinant proteins were generated by expression in Escherichia coli grown in M9 medium supplemented with natural isotope-composition amino acids and selected stable-isotope-enriched amino acids. Peptides derived from labeled proteins gave multiple signals in mass spectra, indicative of populations of ions containing natural and adjusted-isotope-composition amino acids, and the signals informed on the amino acid content of the ions. Moreover, the distinctive signals also served to identify the origin of peptides and target them for further mass spectral analysis, indicating that stable isotopes could be used to label proteins for the identification of their peptides in complex mixtures.

Labeling proteins and tracking and characterizing any fragments generated in degradative biological systems is a common requirement in biological sciences; examples include analysis of antigen processing13 and identification of antigen-derived peptides bound to MHC molecules (major histocompatibility complex).14-16 Metabolic labeling with stable isotopes is attractive because they are not radioactive and can generally be incorporated into proteins without altering their biological properties, and there is potential to combine the recognition of labeled peptides and their sequence determination in a single mass spectrometric analysis. But the reported approaches using mixtures of natural and stable-isotope-enriched amino acids for metabolic labeling have significant disadvantages for tracking fragments of a labeled protein. First, only peptides containing whichever amino acids are selected for stable isotope enrichment are distinguishable as derived from the labeled protein, a limitation that is not easily overcome by using several isotope-enriched amino acids, because very complex spectra result.10,12 Second, there is potential for amino acid interconversion or recycling that would result in unpredictable additional complexity of the spectra. And third, there are considerable reagent costs and consistency concerns over generation of media with mixtures of 20 labeled and unlabeled amino acids.

* Corresponding author. Phone: (44) 131 651 1654. Fax: (44) 131 651 1848. E-mail: [email protected].
(1) Patterson, B. W. Metabolism 1997, 46, 322-329.
(2) Stocklin, R.; Vu, L.; Vadas, L.; Cerini, F.; Kippen, A. D.; Offord, R. E.; Rose, K. Diabetes 1997, 46, 44-50.
(3) Tao, W. A.; Aebersold, R. Curr. Opin. Biotechnol. 2003, 14, 110-118.
(4) Kainosho, M. Nat. Struct. Biol. 1997, Suppl. 4, 858-861.
(5) Raschke, T. M.; Marqusee, S. Curr. Opin. Biotechnol. 1998, 9, 80-86.
(6) Stewart, I. I.; Thomson, T.; Figeys, D. Rapid Commun. Mass Spectrom. 2001, 15, 2456-2465.
(7) Chen, X.; Fei, Z.; Smith, L. M.; Bradbury, E. M.; Majidi, V. Anal. Chem. 1999, 71, 3118-3125.
(8) Marshall, A. G.; Senko, M. W.; Li, W.; Li, M.; Dillon, S.; Guan, S.; Logan, T. M. J. Am. Chem. Soc. 1997, 119, 433-434.
(9) Rodgers, R. P.; Blumer, E. N.; Hendrickson, C. L.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2000, 11, 835-840.
(10) Chen, X.; Smith, L. M.; Bradbury, E. M. Anal. Chem. 2000, 72, 1134-1143.
(11) Pan, S.; Gu, S.; Bradbury, E. M.; Chen, X. Anal. Chem. 2003, 75, 1316-1324.
(12) Engen, J. R.; Bradbury, E. M.; Chen, X. Anal. Chem. 2002, 74, 1680-1686.
(13) Watts, C. Annu. Rev. Immunol. 1997, 15, 821-850.
(14) Nelson, C. A.; Roof, R. W.; McCourt, D. W.; Unanue, E. R. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 7380-7383.
(15) Engelhard, V. H.; Brickner, A. G.; Zarling, A. L. Mol. Immunol. 2002, 39, 127-137.
(16) Peakman, M.; Stevens, E. J.; Lohmann, T.; Narendran, P.; Dromey, J.; Alexander, A.; Tomlinson, A. J.; Trucco, M.; Gorga, J. C.; Chicz, R. M. J. Clin. Invest. 1999, 104, 1449-1457.
10.1021/ac035160i CCC: $27.50 © 2004 American Chemical Society
Published on Web 01/22/2004
Analytical Chemistry, Vol. 76, No. 5, March 1, 2004 1445
Figure 1. Origin of isotope patterns of peptide ions. The left-hand panel shows part of a MALDI-TOF mass spectrum of a singly protonated peptide with monoisotopic mass 1531.76. The lowest m/z peak reflects detection of ions in which every atom is the lowest mass isotope of the respective element and is at m/z = (monoisotopic mass + mass of a proton)/1, the second peak reflects ions in which one atom of carbon, oxygen, nitrogen, or hydrogen is the respective one-unit-heavier isotope, and the heavier peaks are contributed by many possible combinations of isotopes. Note that the full mass spectral complexity arising from content of isotopes is scarcely apparent in spectra obtained at the resolution of widely used instruments. For a full discussion, see reference 25. The right-hand panel shows the predicted isotope patterns for a peptide of m/z 1532 in which all the atoms are the highest-abundance (lowest mass) isotope, except the indicated element in which the next-most-abundant isotope is allowed at its natural abundance. Note that the predicted pattern considering carbon alone is very close to that observed by MALDI-TOF. This is because of the high natural frequency of carbon-13 and the relatively high abundance of carbon atoms in proteins (∼32% of atoms compared with H 50%, O 9%, N 9%, and S 0.3%). Predicted spectra were calculated using eq 3 as described in the appendix.
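The carbon-only prediction described in this caption can be approximated with a simple binomial model in which each carbon atom is independently 13C with probability p. The sketch below is illustrative only: it is not the paper's eq 3 (which belongs to the appendix), and the carbon count of 68 for a peptide near m/z 1532 is an assumed, averagine-style estimate rather than a value taken from the paper.

```python
from math import comb


def carbon_isotope_pattern(n_carbons, p13c=0.011, n_peaks=6):
    """Relative heights of the M, M+1, ... peaks considering carbon alone.

    Assumes a binomial distribution of 13C atoms among n_carbons positions
    and scales the result so that the tallest peak equals 1.
    """
    probs = [
        comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    tallest = max(probs)
    return [x / tallest for x in probs]


# Hypothetical example: ~68 carbons (assumed) at natural 13C abundance (1.1%).
pattern = carbon_isotope_pattern(68)
for k, rel in enumerate(pattern):
    print(f"M+{k}: {rel:.3f}")
```

Raising `p13c` in the call shifts weight from the monoisotopic peak to the heavier peaks, which is the effect the labeling strategy relies on.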
We describe a different approach that uses just 13C6-glucose as the labeling agent. Since the isotope pattern of peptide ions is essentially that due to their content of 13C (Figure 1), we reasoned that peptides with adjusted 13C content should have distinct isotope patterns, as compared with peptides with natural 13C content. We first considered 13C depletion, but the much wider use and consequent lower cost of 13C-enriched materials made a powerful economic case to develop an approach that used 13C enrichment. Here, we show that even modest changes in 13C content give distinctive isotope patterns and that isotope patterns can be used to recognize fragments of 13C-content-adjusted proteins. Importantly, since carbon content is a fairly constant feature of peptide fragments, all peptides derived from 13C-labeled proteins have a consistent and distinctive isotopic signature. The isotopic signature is discernible in spectra acquired with widely available MALDI-TOF instruments. Moreover, we demonstrate that fragments of labeled proteins may be identified in complex mixtures and that the identification may be automated by computer analysis, suggesting the approach should enable the tracking and characterization of fragments of labeled proteins through biological systems.

EXPERIMENTAL PROCEDURES

Production of 13C-Labeled Proteins. Schistosomal glutathione transferase (sGT) with controlled 13C content was generated using DH5 cells transformed with the plasmid pGEX, grown overnight in 2YT broth, then diluted 3:100 into M9 minimal medium containing 3 g/L glucose. When the culture OD600 reached 0.6, isopropylthiogalactoside (IPTG) was added to 0.8 mM, and 13C6-glucose to 0-3.3% of total glucose. Cultures were continued for 12 h. Cells recovered from the cultures by centrifugation were lysed, and sGT was recovered using glutathione-4B (Pharmacia) as recommended by the manufacturer. The procedure yielded “purified sGT” in which sGT comprised >95% of the total protein, estimated by densitometry of a Coomassie Blue-stained sodium dodecyl sulfate (SDS) gel separation of dilutions of the sGT preparation. 13C-enriched (labeled) sGT was produced in an identical manner, but at the time of adding IPTG, 3-10 mg of 13C6-glucose (supplied by Cambridge Isotope Laboratories Inc., MA) per 100 mL culture was added for final 13C6-glucose contents of 1, 2.3, or 3.3% of total glucose. The 13C-enriched (labeled) NC1 domain of the α3 chain of type IV collagen (α3(IV)NC1, the Goodpasture autoantigen17) and the C fragment of tetanus toxoid (TTCF13) were produced as histidine tag fusion proteins in BL21 (DE3) cells using the same induction protocol but purified by metal chelation chromatography.

Trypsin Digestion and MALDI-TOF Analysis. Proteins were digested with trypsin either in solution or within slabs of gel after SDS-PAGE. Peptide extracts were cocrystallized with α-cyano-4-hydroxycinnamic acid by the dried droplet technique and analyzed in a Voyager DE Pro mass spectrometer (Perseptive) used with a reflector, an extraction delay of 180 ns, an accelerating voltage of 20 000 V, and a grid of 76%, examining the mass range 600-4000.

Visual and Computer Analysis of Spectra. Initial analysis employed the Data Explorer software package (Perseptive). Spectra were internally calibrated on the 842.45 and 2211.1 trypsin autolysis peaks visible in all spectra. Peak measurements were made with Data Explorer itself or with in-house routines18 accessed through the IGOR XOP mechanism of the IGOR Pro program (Wavemetrics).
The routines enhanced MALDI-TOF spectra as described by Wehofsky et al.;19 measured peaks using a Gaussian fit; grouped together peaks likely to constitute isotopes of individual peptides by recognizing their compatible m/z and relative abundance; and last, calculated the parameters of 13C content that are described below.

RESULTS

Generation of Peptides with Modified Content of Carbon-13. Initial experiments determined suitable conditions for the generation of recombinant proteins with modified 13C content such that peptides derived from the labeled proteins had distinct isotope patterns when analyzed by MALDI-TOF. Schistosomal glutathione transferase (sGT) was selected for preliminary experiments because it is readily produced and affinity-purified from DH5 E. coli containing the plasmid pGEX. Recombinant proteins were treated with trypsin to yield a predictable set of sGT-derived peptides with m/z in the range of interest for MHC-peptide analysis. Techniques established for producing stable-isotope-enriched proteins for NMR were adapted to generate proteins trace-labeled with 13C. Glucose is the only carbon source in M9 minimal medium, so the 13C content of bacterial proteins can be adjusted by controlling the proportions of natural glucose (carbon atoms presumed 1.1% 13C) and 13C6-glucose in the medium. DH5 cells proliferated slowly in M9 medium, but we obtained reasonable yields (∼2.3 mg of sGT from a 100-mL culture) by growing the bacteria in a rich broth (2YT) then diluting the cultures 33-fold into M9 medium. This process gave a small (3.3 vol %) carryover of medium with diverse and uncontrolled 13C content but proved an acceptable compromise between yield and control.

(17) Phelps, R. G.; Turner, A. N. J. Nephrol. 1996, 9, 111-117.
(18) Source code available to academic users under license from the authors.
(19) Wehofsky, M.; Hoffman, R.; Hubert, M.; Spengler, B. Eur. J. Mass Spectrom. 2001, 7, 39-46.

Figure 2. Predicted isotope pattern for peptides by mass and abundance of 13C. Predicted relative abundances of ions with m/z M, ..., M + i for four peptides with M = 770-2800 are shown at natural 13C content (1.1%, bottom panels) and at 3 degrees of 13C enrichment. The abundances were calculated without consideration of isotopes of elements other than carbon. The solid bars show the expected patterns for peptides containing the average number of carbon atoms in naturally occurring peptides of the indicated mass, and the error bars show the variation expected consequent upon the range of carbon atom contents observed in 95% of natural peptides.

Figure 3. Observed isotope patterns of carbon-13-enriched peptides. Isotope patterns of four selected peptides with m/z 770-2269 observed in MALDI-TOF mass spectra of trypsin digests of sGT made in E. coli grown in medium containing natural 13C-content glucose (bottom panel) or in 13C-enriched medium containing 13C6-glucose at the indicated proportion. Note the isotope pattern of all the peptides is influenced by 13C enrichment, but the resulting pattern depends on m/z.

Variation of Isotope Patterns of Peptides. Isotope patterns of peptides vary predictably with content of carbon-13. A range of appropriate 13C contents was first judged by inspection of the predicted isotope patterns of peptides with a range of 13C contents (Figure 2). Next, sGT was produced in M9 medium in which the glucose was made with natural glucose and 1, 2.3, or 3.3% 13C6-glucose. Trypsin digests of sGT were analyzed by MALDI-TOF mass spectrometry. The isotope patterns of sGT fragments were clearly influenced by 13C content (Figure 3) and were broadly consistent with those we had predicted (compare Figure 2). The contrast between isotope patterns of peptides with natural and enriched 13C content increased with 13C content, but so did two undesirable consequences. First, weaker ions became less distinct from baseline noise as the total number of detected ions became distributed between a greater number of peaks. Second, the broader envelopes of the isotope patterns of 13C-enriched peptides increased the frequency with which overlaps in the isotope patterns of different peptides generated complex composite isotope patterns that were difficult to interpret. These factors drove us to choose an intermediate degree of enrichment and to use 2.3% 13C6-glucose for subsequent experiments.
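The two drawbacks of heavier enrichment noted above (signal spread over more peaks, and broader envelopes) can be illustrated with a carbon-only binomial model. This is a rough sketch under stated assumptions: the peptide is taken to have 100 carbon atoms, the 13C atom fractions are arbitrary illustrative values rather than the paper's culture conditions, and a peak is counted as visible when it reaches 5% of the tallest peak, which is a hypothetical threshold.

```python
from math import comb


def carbon_pattern(n_carbons, p13c, n_peaks=12):
    """Probability of 0, 1, 2, ... 13C atoms (carbon-only binomial model)."""
    return [
        comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


def envelope_summary(n_carbons, p13c, floor=0.05):
    """Return (share of ions in the tallest peak, number of visible peaks)."""
    probs = carbon_pattern(n_carbons, p13c)
    tallest = max(probs)
    visible = sum(1 for x in probs if x >= floor * tallest)
    return tallest, visible


# Illustrative 13C atom fractions (assumed values, not measured ones).
summaries = {p: envelope_summary(100, p) for p in (0.011, 0.02, 0.03, 0.04)}
for p, (tallest, visible) in summaries.items():
    print(f"13C = {p:.1%}: tallest peak holds {tallest:.1%} of ions, "
          f"{visible} peaks above threshold")
```

As the 13C fraction rises, the share of ions in the tallest peak falls while the number of peaks above threshold grows, which matches the reasoning for settling on an intermediate enrichment.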
Figure 4. The distributions of paramλ and paramC13% are distinct for unlabeled and labeled peptides. Paramλ (left panel) and paramC13% (right panel) are shown calculated for 55 unlabeled (open circles) and 63 labeled (closed circles) peptides. Paramλ exhibits a slight dependence on m/z that was particularly marked for labeled peptides and reduces discrimination at higher m/z. The dependence was linear (fitted by 0.0595 - 2.8 × 10^-6 m/z for unlabeled and 0.101 - 1.272 × 10^-5 m/z for labeled peptides) and is attributed to a tendency of the peak picking routines to overestimate the height of small peaks, in this case the small monoisotopic peak of higher m/z peptides.

Sensitivity and Specificity of Recognizing Fragments of Carbon-13-Labeled Proteins from Their Isotope Patterns. To evaluate the utility of 13C labeling for mass spectrometric identification of fragments of a labeled protein, we first assessed the consistency of the “labeled” isotope pattern. Mass spectra of tryptic digests of sGT made in three separate experiments with bacteria grown in 2.3% 13C6-glucose were searched for all peptide ions with m/z consistent with their deriving from sGT. Twenty ions were identified with m/z between 708 and 2357 Da, representing more than 66% of the sequence of sGT. A total of 55 occurrences of the putative sGT-derived ions were identified. All 55 putative sGT-derived ions had a labeled isotope pattern that was distinct from the patterns of the same m/z ions in spectra of unlabeled sGT digests. Two other 13C-labeled recombinant proteins (TTCF and α3(IV)NC1) and labeled E. coli outer membrane proteins have been made by essentially similar techniques. In all cases, expected tryptic fragments have a labeled isotope pattern similar to that observed for sGT and distinct from the respective unlabeled fragments. Thus, the labeled isotope appearance is consistent between batches of sGT, between proteins, and between peptides representing at least two-thirds of the sequences of the proteins.

The specificity of the labeled isotope pattern was evaluated by examining in the same mass spectra the isotope patterns of other ions believed not to derive from labeled sGT (i.e., m/z inconsistent with tryptic fragments of sGT). The spectra contained 15-20 significant other ions, most of which had m/z consistent with their being tryptic fragments of trypsin itself or of keratin (in the case of gel-purified sGT), but eight peaks had m/z inconsistent with predicted fragments of sGT, trypsin, or keratin, possibly on account of mass-altering modifications or possibly derived from unknown contaminants. Almost all had unlabeled “normal” 13C-content isotope patterns, indicating that the labeled isotope pattern was highly specific. Only one ion believed not to derive from labeled sGT (m/z 1799.9) had a labeled isotope pattern. Since the labeled appearance was otherwise completely specific, we reanalyzed the sequence of sGT for predicted fragments with matching m/z, allowing less typical trypsin cleavages. The reanalysis identified the sGT peptide DVKNML (m/z 1799.95, M° is methionine sulfoxide) as a plausible tryptic fragment of sGT that could be generated by cutting after a lysine/histidine couplet with a single missed cleavage. Therefore, the labeled isotope pattern was highly or completely specific to fragments of 13C-enriched sGT.

Specificity was further evaluated by inspection of large numbers of spectra of tryptic and cathepsin D digests of many unlabeled proteins. Apparently labeled ions (false positives) were occasionally observed due to chance overlapping of the isotope patterns of two unlabeled peptide ions. An example is shown in an inset in Figure 6. We found that the composite isotope pattern of two unlabeled peptides could be indistinguishable from that of a single labeled peptide if the first (monoisotopic) peak of the heavier peptide was superimposed, within the resolution of the mass spectrometer, on the second, third, or fourth peak of the lighter peptide, and both unlabeled peptides had similar abundance. These conditions prevail in the unfractionated digests of proteins used in this study, because most of the peptides derive from the same parent protein. The highest occurrence of false positives we have observed in very crowded spectra (>100 peptides) is 3%, suggesting a lower limit for the specificity of the labeled isotope patterns of 97%.
Therefore, the results show that the labeled isotope pattern is generally completely specific, and even in very adverse crowded spectra, false positives occur no more frequently than 3% (specificity ≥ 97%). Note that crowded spectra are also undesirable, because ion suppression effects are more common, and usual practice would be to fractionate complex samples before analysis.

Figure 5. Labeled peptides are visually distinct in a complex mixture. The top panel shows a MALDI-TOF mass spectrum of a mixture of tryptic digests of unlabeled BSA and 2.3% 13C6-glucose-labeled sGT. The bottom panel shows a closeup of a small portion of the spectrum to reveal the isotope patterns of three peptide ions. The isotope patterns of the more abundant ions are typical of natural abundance carbon-13 (unlabeled) or 2.3% 13C6-glucose-labeled peptides (labeled). The less abundant ion has an intermediate pattern that is indeterminate with respect to carbon-13 content.

Parameters for Automated Identification of Labeled Peptides. Visual identification of peptides with a 13C-labeled isotope pattern was easily learned, but the analysis was time-consuming and subjective. We therefore sought ways to describe isotope patterns numerically so that a computer could be used to find labeled peptide ions in spectra. Two parameters were devised. ParamC13% is an estimate of the 13C content of a peptide calculated from the relative heights of its isotopic peaks. Paramλ is an estimate of the number of carbons per Dalton molecular weight of a peptide calculated from the relative heights of the first two or three isotopic peaks. Paramλ of natural peptides has a narrow distribution around a mean that depends on the abundance of 13C. The derivation of formulas to calculate these parameters is presented in the technical appendix.

ParamC13% and Paramλ. ParamC13% and paramλ are measures of the “labeledness” of isotope patterns. ParamC13% and paramλ were calculated for the spectra of selected sGT fragments (same as shown in Figure 3) made in E. coli grown in M9 medium supplemented with 0-3.3% 13C6-glucose in order to confirm that the proposed parameters of 13C content were sensitive to changes in 13C. As expected, the values of both parameters increased with content of 13C (Table 1).
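To make the idea of computing such parameters from peak heights concrete, here is a minimal sketch. It is not the paper's derivation (that is in the technical appendix): the two functions are hypothetical stand-ins that assume a carbon-only binomial isotope pattern, under which the ratio of the first two peaks equals n·q and the ratio of the second and third equals (n − 1)·q/2, where n is the carbon count and q = p/(1 − p) is the 13C odds.

```python
from math import comb

NATURAL_13C = 0.011  # natural 13C atom fraction (1.1%), as quoted in the paper


def carbon_pattern(n_carbons, p13c, n_peaks=3):
    """Synthetic M, M+1, M+2 peak heights from a carbon-only binomial model."""
    return [
        comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


def estimate_c13_fraction(peaks):
    """Hypothetical estimator: (13C fraction, carbon count) from three peaks."""
    i0, i1, i2 = peaks[:3]
    r1 = i1 / i0              # equals n * q under the model
    r2 = i2 / i1              # equals (n - 1) * q / 2 under the model
    odds = r1 - 2.0 * r2      # so r1 - 2 * r2 recovers q
    if odds <= 0:
        raise ValueError("peak ratios do not fit a single binomial pattern")
    return odds / (1.0 + odds), r1 / odds


def apparent_carbons_per_dalton(peaks, mz, p13c=NATURAL_13C):
    """Hypothetical estimator: carbons per Da, assuming a known 13C fraction."""
    r1 = peaks[1] / peaks[0]
    return r1 / (p13c / (1.0 - p13c)) / mz


# Round-trip check on synthetic data (68 carbons is an assumed value).
p_hat, n_hat = estimate_c13_fraction(carbon_pattern(68, 0.023))
lam_hat = apparent_carbons_per_dalton(carbon_pattern(68, NATURAL_13C), mz=1532.0)
print(p_hat, n_hat, lam_hat)
```

Real spectra also contain contributions from 15N, 2H, 17O, 18O and sulfur isotopes, so estimates made this way from measured peaks would read higher than the true carbon-only values.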
Table 1. Computed Parameters of Labeledness Correlated with Carbon-13 Content

| m/z | paramλ(a) at 0% | paramλ at 1.1% | paramλ at 2.3% | paramλ at 3.3% | paramC13%(a) at 0% | paramC13% at 1.1% | paramC13% at 2.3% | paramC13% at 3.3% |
|---|---|---|---|---|---|---|---|---|
| 770 | 0.068 | 0.077 | 0.100 | 0.111 | 1.5 | 2.3 | 2.8 | 2.9 |
| 1138 | 0.053 | 0.066 | 0.084 | 0.101 | 1.6 | 2.4 | 2.2 | 2.6 |
| 1532 | 0.054 | 0.058 | 0.076 | 0.082 | 1.5 | 2.0 | 2.3 | 2.5 |
| 2269 | 0.053 | 0.052 | 0.068 | | 1.4 | 2.0 | 2.5 | |
| mean | 0.057 | 0.063 | 0.082 | 0.098 | 1.5 | 2.1 | 2.4 | 2.7 |

55 Unlabeled and 63 2.3%-Labeled Peptides(b)

| | paramλ, unlabeled | paramλ, 2.3%-labeled | paramC13%, unlabeled | paramC13%, 2.3%-labeled |
|---|---|---|---|---|
| mean | 0.056 | 0.083 | 1.5 | 2.5 |
| σ | 0.005 | 0.012 | 0.18 | 0.29 |

(a) Paramλ and paramC13% are shown calculated for selected (compare to Figure 3) sGT ions with m/z 770-2269 proteolytically derived from sGT made in M9 medium containing natural glucose or glucose supplemented with 13C6-glucose to 1.1-3.3% of total glucose. The values of both parameters increase with carbon-13 content. (b) The lower rows summarize the distribution of the parameters determined for all the identifiable tryptic fragments of three proteins (sGT, TTCF, α3(IV)NC1) made in natural glucose or glucose supplemented to 2.3% 13C6-glucose.
Figure 6. Automated recognition of labeled peptides in a complex mixture. Automatically determined paramC13% and paramλ are shown for the major peaks of the spectrum shown in Figure 5. The unprocessed spectrum was analyzed by a computer program that first identified sets of peaks comprising the isotope patterns of individual peptides, then measured the relative abundance of the isotopes and calculated the parameters of the 13C content. Labeled peptide ions are shown as closed circles and unlabeled peptides, as open circles. The unlabeled peptides have values of both parameters distributed essentially as expected from the analysis of isotope patterns of unlabeled peptides shown in Figure 4: they are scattered around the expected mean (dashed line) and largely lower than the mean plus 2 or 3 standard deviations (dotted lines). In contrast, labeled peptides have higher values of both parameters. The inset top right shows the spectrum for the unlabeled ion at 2298 that gave rise to a “labeled” (false positive) paramC13% but “unlabeled” paramλ. The spectrum is a composite of isotope patterns for two expected tryptic fragments of an unlabeled protein with overlapping isotope patterns at m/z 2298.12 and 2301.08.
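The false positive described in the inset can be reproduced qualitatively with synthetic data: two equally abundant unlabeled peptides about 3 Da apart (as for m/z 2298.12 and 2301.08) give a composite envelope whose overall weight is shifted to heavier peaks, while its first peaks are untouched. The sketch assumes a carbon-only binomial model and 102 carbon atoms for the lighter peptide (an assumed value). The intensity-weighted mean offset used here is a hypothetical stand-in for a whole-envelope measure and is not the paper's paramC13% formula.

```python
from math import comb


def carbon_pattern(n_carbons, p13c, n_peaks=10):
    """Probability of 0, 1, 2, ... 13C atoms (carbon-only binomial model)."""
    return [
        comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


def mean_offset(pattern):
    """Intensity-weighted mean peak index (extra mass units above M)."""
    return sum(k * x for k, x in enumerate(pattern)) / sum(pattern)


n_carbons = 102   # assumed carbon count for a peptide near m/z 2298
p_natural = 0.011

light = carbon_pattern(n_carbons, p_natural)
heavy = [0.0] * 3 + light[:-3]          # same pattern, shifted by 3 Da
composite = [a + b for a, b in zip(light, heavy)]

# Whole-envelope estimate of 13C fraction: inflated for the composite.
apparent_single = mean_offset(light) / n_carbons
apparent_composite = mean_offset(composite) / n_carbons

# Ratio of the first two peaks: unchanged, because the heavier peptide
# contributes nothing below M+3.
ratio_single = light[1] / light[0]
ratio_composite = composite[1] / composite[0]

print(apparent_single, apparent_composite, ratio_single, ratio_composite)
```

This mirrors the behavior reported in the caption: a measure built on the whole envelope reads as labeled, whereas one built on the first peaks still reads as unlabeled, which is why requiring both helps.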
The capacity of the parameters to distinguish the isotope patterns of peptides with natural and 2.3% 13C6-glucose-enriched 13C content was further tested by calculating paramC13% and paramλ for 55 unlabeled and 63 2.3% 13C6-glucose-enriched peptides made by trypsin digestion of sGT, TTCF, or α3(IV)NC1. ParamC13% calculated for unlabeled peptides had a mean of 1.5 (σ = 0.177, 95% of measurements between 1.15 and 1.93). In comparison, paramC13% for 2.3% 13C6-glucose-enriched peptides had a mean of 2.5 (σ = 0.29, 95% of measurements between 1.93 and 3.07). Therefore, the distributions of paramC13% measured for natural and 2.3% 13C6-glucose-enriched peptides were distinct, and as shown in Figure 4, the distinction was consistent throughout the mass range of the peptides examined. Paramλ also distinguished natural and 2.3% 13C6-glucose-enriched peptides, but the distinction was slightly reduced at higher m/z (Table 1 and Figure 4).
Thus, both parameters distinguished labeled from unlabeled peptides in mass spectra. ParamC13% gave the greatest distinction across the mass range but could be calculated only for peptides with three identifiable isotopic peaks, whereas paramλ could be calculated from just two peaks, but gave lesser distinction at higher m/z.

13C-Labeled Protein Fragments in Complex Mixtures. To test the capability of 13C enrichment to distinguish fragments of labeled proteins in complex mixtures, we took advantage of the preceding analysis of sGT fragments to devise a test that would not require wholesale sequencing. We mixed the well-characterized tryptic digest of 2.3% 13C6-glucose-enriched sGT with similarly characterized trypsin digests of 1-5 other unlabeled proteins. MALDI-TOF analysis of the resulting mixtures generated spectra in which all significant peaks could be confidently identified as derived from labeled or unlabeled proteins by reference to the spectra of individual protein digests. MALDI-TOF mass spectra of the unfractionated mixtures were complex (Figure 5), but contained fewer peaks than the components of the mixture analyzed individually, presumably because of ion suppression effects. Scrutiny of the isotope patterns in the spectrum identified 16 peptides that were clearly “labeled”. All had m/z matching expected tryptic fragments of sGT and were observed in trypsin digests of sGT alone, indicating that the labeled isotope pattern was 100% specific to sGT peptides, even within complex mixtures. All of the 30 peptides that had clearly nonlabeled patterns had m/z’s that matched expected tryptic fragments of other proteins in the mixture, indicating that the nonlabeled appearance was also very specific. Note that although all the major ions could be assigned to one or other category, some low-abundance ions had isotope patterns that were ambiguous (an example is shown in Figure 5).
We were concerned that the results were influenced by our familiarity with the m/z’s of sGT fragments, but an observer familiar with isotope patterns but not with the m/z’s of tryptic fragments of sGT classified the 46 isotope patterns in exactly the same way. Therefore, provided classification was not attempted for low-abundance ions, visual classification of isotope patterns correctly classified all isotope patterns as labeled and sGT-derived or unlabeled and other-protein-derived.

To evaluate automated identification of 13C-enriched peptides, a computer was programmed to calculate paramC13% and paramλ for all isotope patterns in MALDI-TOF mass spectra. Figure 6 shows calculated values of paramC13% for 46 peptides identified by computer analysis of the spectrum in Figure 5. ParamC13% for fragments of unlabeled proteins (open circles) had a distribution that was very similar to that already described (compare Figure 4), but all 16 fragments that were 13C-enriched sGT (filled circles) had paramC13% that was greater than observed for 97.5% of the unlabeled peptides. Similarly, paramλ exceeded values calculated for 97.5% of natural peptides for 15/16 sGT fragments, but for only 2/30 fragments of other proteins. Examination of the small number of false positives identified different confounding factors for the two parameters, as has already been observed. We therefore tested the discrimination that could be obtained by requiring both parameters to exceed the expected range for natural peptides. The positive predictive value under this criterion was 100%, achieved at the cost of a small reduction in sensitivity (Table 2).

CONCLUSIONS

The results show that fragments of 13C-enriched proteins have sufficiently distinct isotope patterns to be reliably identified in complex mixtures by visual or computer analysis of MALDI-TOF mass spectra. This is a useful variation on the use of stable-isotope-labeled amino acids to generate stable-isotope-labeled proteins, because the spectra of fragments of labeled proteins are less complex and more predictable, and the labeled isotope pattern is consistent for all peptides, not just those that contain particular amino acids. The predictable nature of the labeled isotope pattern has the further important advantage that it is feasible to use computer analysis of spectra to identify labeled fragments.

Table 2. Sensitivity and Specificity of Automated Recognition of Fragments of Labeled sGT(a)

| labeledness \ identity | sGT | other | total |
|---|---|---|---|
| labeled | 15 | 0 | 15 |
| not labeled | 1 | 33 | 34 |
| total | 16 | 33 | 49 |

sensitivity 93.8%; specificity 100%; positive predictive value 100%; negative predictive value 97.1%

(a) The truth table shows the results of automated analysis of the spectrum in Figure 5 where peptides are identified as labeled if calculated values of both paramλ and paramC13% exceed the mean + 2 SD of the distributions observed for peptides with natural 13C content. Combining the parameters yields a highly specific measure, which is essential for computer screening of large numbers of spectra in which labeled peptides are infrequent, but at the price of a slight reduction in sensitivity.
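The combined criterion and the summary statistics of Table 2 can be written out as a short sketch. The counts come from the Table 2 truth table; using the Table 1 summary values (unlabeled mean and σ) as fixed cutoffs is an assumption made here for illustration, since the paper does not state the exact numerical thresholds, and the function names are hypothetical.

```python
def is_labeled(param_lambda, param_c13, lambda_cutoff, c13_cutoff):
    """Call a peptide labeled only if BOTH parameters exceed their cutoffs."""
    return param_lambda > lambda_cutoff and param_c13 > c13_cutoff


def truth_table_metrics(tp, fn, fp, tn):
    """Standard diagnostic statistics from a 2 x 2 truth table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed cutoffs: mean + 2 SD of the unlabeled distributions in Table 1.
lambda_cutoff = 0.056 + 2 * 0.005
c13_cutoff = 1.5 + 2 * 0.18

# Typical labeled and unlabeled values (the Table 1 means) as a quick check.
print(is_labeled(0.083, 2.5, lambda_cutoff, c13_cutoff))
print(is_labeled(0.056, 1.5, lambda_cutoff, c13_cutoff))

# Counts from Table 2: 15 sGT ions called labeled, 1 sGT ion missed,
# 0 other ions called labeled, 33 other ions called not labeled.
metrics = truth_table_metrics(tp=15, fn=1, fp=0, tn=33)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Requiring both parameters trades a little sensitivity for specificity, which is the sensible direction when labeled peptides are rare among the ions screened.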
As well as convenience, computer analysis could be used to automatically direct further analysis, for example, selecting labeled peptides eluting during LC/MS runs for fragment ion analysis. On the other hand, information on amino acid composition that is obtained with amino acid labels is not obtained with 13C enrichment. This may not be a disadvantage in many circumstances, because high confidence that an unknown peptide derives from a particular parent sequence (the labeled protein) is often sufficient to confidently infer the sequence of the peptide from its accurately determined mass.20 Last, the labeling protocol is much simpler and a great deal cheaper than protocols using combinations of labeled and natural amino acids because it uses only trace amounts of readily available 13C6-glucose to enrich standard M9 minimal medium.

Note that although the mixtures used to demonstrate the technique are not complex by the standards of biological samples, they are at and beyond the level of complexity that can be reliably examined by MALDI-TOF, as is evident from the striking ion suppression we observed. Therefore, the results demonstrate that discrimination of 13C-enriched peptides works at levels of complexity in a single spectrum at the extreme end of those that are sensibly analyzed by MALDI-TOF. Complex mixtures of peptides are usually fractionated in some way before mass spectrometry using MALDI-TOF or other types of instrument, in part to reduce ion suppression. Our analysis of the factors that reduce the specificity of visual or automated recognition of 13C-enriched peptides indicates that specificity would be increased by any such prior fractionation, but is 97% even on samples of complexity such that ion suppression is substantial, bordering on unacceptable. We therefore expect the technique to work very well under conditions usually used to examine complex biological samples, and have already found (work in progress) that we can identify 13C-labeled protein-derived fragments among peptides eluted from HLA class II molecules by MALDI-TOF analysis of peptide extracts fractionated by 2D chromatography.

The method has features that are desirable in a protein-labeling strategy.
Production of labeled proteins uses only standard recombinant expression techniques and small amounts of readily available 13C6-glucose. The only expenditure over that of producing unlabeled protein is the small cost of 13C6-glucose: at the time of writing, 13C6-glucose to make labeled sGT cost less than $1 per milligram of purified protein.21 It is necessary to grow bacteria in an M9 medium that may not suit some strains and certainly reduces yield of recombinant proteins. The dilution technique we describe gave yields in excess of 10 mg/L of proteins expressed in DH5, JM101, and BL21DE3 strains of E. coli. The method should therefore be widely applicable in bacterial expression. But 13C labeling may not be easily extended to proteins that are only satisfactorily expressed in eukaryotic cells, at least not as easily as approaches that use labeled amino acids.11 An attractive possibility is the manufacture of media compatible with eukaryotic cell growth by processing of prokaryotic cells grown in 13C-enriched minimal media, as has been described for NMR studies.22

Although we have focused on 13C enrichment, the broad strategy founded on interpretation of isotope patterns is adaptable to manipulation of other stable isotopes and to depletion as well as enrichment, provided the manipulation predictably and discernibly alters the isotope pattern. Especially attractive would be 13C depletion because the isotope signature of labeled peptides would be just as distinctive as with 13C enrichment and sensitivity would be increased (because most ions of a particular peptide would have the monoisotopic mass and be detected as a single large peak rather than distributed between many peaks). But 13C-depleted carbon sources would be required in greater quantities (50-fold more: ∼100 mg >99.5% 12C6-glucose/mg of purified sGT), and at present, they are more expensive.

Although several applications can be envisaged, the problem that has driven development of this technique is biochemical identification of antigen-derived peptides recovered from antigen-pulsed antigen-presenting cells as an approach to identifying T-cell epitopes and studying antigen processing. Here, the principal difficulty is distinguishing for further characterization the few antigen-derived peptides in a complex sea of biochemically similar peptides. Since the approach was first described in 1992,14 there have been fewer than 10 reports of successful application to identify antigen-derived peptides/T-cell epitopes, and although developments in mass spectrometric sequencing23,24 have greatly helped peptide characterization, the problem remains of choosing which peptides to sequence. Our labeling approach tackles this problem by making all peptides derived from a labeled antigen distinguishable by a characteristic isotopic pattern that is distinct from that of all peptides derived from unlabeled proteins. The strategy is proving very effective for identification of antigen-derived peptides among peptides eluted from antigen-pulsed B cells and could be very useful in other mass spectrometric analyses of biological samples.

(20) Zou, J.; Turner, A. N.; Phelps, R. G. Anal. Chem. 2003, 75, 2653-2662.
(21) Calculated for yield of 2.3 mg/100 mL medium supplemented with 6 mg of 13C6-glucose from Cambridge Isotopes, MA.
(22) Hansen, A. P.; Petros, A. M.; Mazar, A. P.; Pederson, T. M.; Rueter, A.; Fesik, S. W. Biochemistry 1992, 31, 12713-12718.

TECHNICAL APPENDIX
Parameters for Automated Identification of Labeled Peptides. Visual identification of peptides with a 13C-labeled isotope pattern was easily learned, but the analysis was time-consuming and subjective.
We therefore sought ways to describe isotope patterns numerically so that a computer could be used to find labeled peptide ions in spectra. Two parameters were devised.

First, we reasoned that for ions of a singly charged peptide with monoisotopic m/z M, the measured counts (Pi) of ions with m/z M, M + 1, ..., M + i should be proportional to the relative abundance of ions containing 0, 1, 2, etc., 13C atoms, presuming only equivalent ionization. Then the percentage proportion of 13C atoms in the peptide is given by

$$^{13}\mathrm{C}\% = 100 \cdot \frac{\sum_{i=1}^{n} i P_i}{N_c \cdot \sum_{i=0}^{n} P_i} \tag{1}$$

where P0, P1, P2, etc., are the counts of ions in the monoisotopic, first, second, etc. peaks and Nc is the number of carbon atoms in the peptide. This quantity may be estimated as a measure of 13C content by making the following reasonable approximations. First, Pn may be taken as the sum of the measured ion counts across the width of peaks. Second, Nc is approximated by the mean number of carbon atoms occurring in peptides of the measured m/z. Carbon content obviously depends on amino acid composition, but the variation for naturally occurring peptides is surprisingly small: determination of Nc for 400 peptide sequences randomly extracted from human proteins so as to have Mw of 1500 to 2500 Daltons found that Nc increases linearly with peptide molecular weight Mw according to

$$N_c = \lambda M_w \tag{2}$$

where the constant λ (the number of carbon atoms per Dalton molecular mass) is 0.0408 with normal distribution and standard deviation 0.00265 (data not shown). Thus, the average expected number of carbon atoms, λMw, may be used in place of Nc. We use paramC13% to refer to the percentage proportion of 13C atoms calculated using eq 1 and these approximations. Note that the approximation for Nc introduces only a small element of variability into the value of paramC13% (95% limits -14.5 to +11%), as compared with the 100% increases expected from the ∼2-fold increment in carbon-13 content used for labeling.

(23) Nepom, G. T.; Lippolis, J. D.; White, F. M.; Masewicz, S.; Marto, J. A.; Herman, A.; Luckey, C. J.; Falk, B.; Shabanowitz, J.; Hunt, D. F.; Engelhard, V. H.; Nepom, B. S. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 1763-1768.
(24) Kaliyaperumal, A.; Michaels, M. A.; Datta, S. K. J. Immunol. 2002, 168, 2530-2537.
(25) Werlen, R. Rapid Communications in Mass Spectrometry 1994, 8, 976-980.

Second, since the number of carbons per Dalton of molecular weight (λ) of peptides in real proteins had such a narrow range, we reasoned that formulas that expressed λ in terms of carbon-13 content-dependent parameters would enable identification of labeled peptides by comparison of a computed value, called paramλ, with the distribution of λ in natural peptides of the same mass. We therefore devised formulas that expressed λ in terms of the relative peak heights of the first three peaks. It is assumed that carbon-13 atoms are randomly distributed with abundance p such that the relative abundance of ions with 0, 1, ..., i (Pi) carbon-13 atoms is given by the binomial expansion

$$P_i = T\left(\frac{N_c!}{i!\,(N_c - i)!}\,(1 - p)^{N_c - i}\,p^{i}\right) \tag{3}$$

where T is the total number of ions of a peptide, and Nc is the number of carbon atoms in the peptide. The number of ions with 0, 1, and 2 carbon-13 atoms simplifies as $P_0 = T(1 - p)^{N_c}$, $P_1 = T N_c\,p\,(1 - p)^{N_c - 1}$, and $P_2 = (T/2)\,N_c(N_c - 1)\,p^2(1 - p)^{N_c - 2}$. Using φ = (1 - p)/p, the ratios of these expressions conveniently rearrange to give expressions for Nc and, hence, paramλ.
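For readers who want the intermediate step, the ratios of the first three peak counts can be written out as below. This is only a restatement that follows from the binomial expression (eq 3) and the definition φ = (1 - p)/p; it is not additional material from the original text.

```latex
\begin{align*}
\frac{P_1}{P_0} &= \frac{N_c\,p}{1-p} = \frac{N_c}{\varphi}, \\
\frac{P_2}{P_1} &= \frac{(N_c-1)\,p}{2(1-p)} = \frac{N_c-1}{2\varphi}, \\
\frac{P_2}{P_0} &= \frac{N_c(N_c-1)\,p^2}{2(1-p)^2} = \frac{N_c(N_c-1)}{2\varphi^2}.
\end{align*}
```

Solving each ratio for Nc (the third is a quadratic in Nc, of which the positive root is taken) and dividing by Mw gives the three estimates of paramλ.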
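$$\mathrm{param}\lambda_{0,1} = \frac{\varphi}{M_w}\,\frac{P_1}{P_0} \tag{4}$$

$$\mathrm{param}\lambda_{1,2} = \frac{1}{M_w}\left(1 + 2\varphi\,\frac{P_2}{P_1}\right) \tag{5}$$

$$\mathrm{param}\lambda_{0,2} = \frac{1}{2M_w}\left(1 + \sqrt{1 + 8\varphi^2\,\frac{P_2}{P_0}}\right) \approx \frac{\varphi}{M_w}\sqrt{\frac{2P_2}{P_0}} \tag{6}$$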
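The two parameters can be illustrated numerically with a short Python sketch. It is not the authors' software. It simulates 13C-only isotope patterns from eq 3 and evaluates eq 1 (with Nc replaced by λMw, eq 2) and eqs 4-6. Several choices here are assumptions made for illustration: the peptide (82 carbons, Mw 2000), the use of the natural abundance p = 0.011 when computing φ for screening, and the function names. Contributions from isotopes other than 13C are ignored.

```python
import math

LAMBDA = 0.0408     # carbons per dalton, value quoted with eq 2
P_NATURAL = 0.011   # natural 13C abundance (about 1.1%)


def binomial_pattern(n_c, p, n_peaks=10, total=1.0):
    """Counts P_0 .. P_(n_peaks-1) from eq 3, considering 13C only."""
    return [total * math.comb(n_c, i) * (1 - p) ** (n_c - i) * p ** i
            for i in range(n_peaks)]


def param_c13_percent(peaks, m_w, lam=LAMBDA):
    """Eq 1, with N_c approximated by lam * m_w (eq 2)."""
    n_c = lam * m_w
    weighted = sum(i * p_i for i, p_i in enumerate(peaks))
    return 100.0 * weighted / (n_c * sum(peaks))


def param_lambda(peaks, m_w, p=P_NATURAL):
    """Eqs 4-6 (exact form of eq 6); returns (l01, l12, l02).

    Assumption: phi is evaluated at the natural abundance p, so that an
    enriched peptide yields a value above the natural range of lambda.
    """
    p0, p1, p2 = peaks[:3]
    phi = (1 - p) / p
    l01 = phi * p1 / (p0 * m_w)
    l12 = (1 + 2 * phi * p2 / p1) / m_w
    l02 = (1 + math.sqrt(1 + 8 * phi ** 2 * p2 / p0)) / (2 * m_w)
    return l01, l12, l02


if __name__ == "__main__":
    n_c, m_w = 82, 2000.0  # hypothetical peptide, 0.041 carbons per dalton
    for label, p in (("natural", 0.011), ("enriched", 0.023)):
        peaks = binomial_pattern(n_c, p)
        print(label,
              round(param_c13_percent(peaks, m_w), 3),
              [round(v, 4) for v in param_lambda(peaks, m_w)])
```

For the simulated natural-abundance pattern all three paramλ estimates recover Nc/Mw, whereas for the pattern simulated at 2.3% 13C they come out at roughly twice that value, well outside the spread of λ quoted in the text. With measured spectra the three estimates would be expected to differ, since peak counts carry noise.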
Calculation of paramλ using eqs 4-6 gave broadly similar results for most ions; the exceptions were some lower-abundance ions for which measurement of the relative abundance was less accurate. Throughout this article we show paramλ calculated with eq 6.

Abbreviations. ParamC13% and paramλ are parameters describing isotope patterns and are defined in the results. BSA, bovine serum albumin; α3(IV)NC1, α3 chain of type IV collagen, the Goodpasture autoantigen; TTCF, C fragment of tetanus toxin; sGTS, schistosomal glutathione transferase.
ACKNOWLEDGMENT This work was supported by a grant from the Scottish Hospital’s Endowments Research Trust.
Received for review October 1, 2003. Accepted December 12, 2003. AC035160I