An Isotope Coding Strategy for Proteomics ... - ACS Publications

An Isotope Coding Strategy for Proteomics Involving Both Amine and Carboxyl Group Labeling

Peiran Liu and Fred E. Regnier*

Department of Chemistry, Purdue University, West Lafayette, Indiana 47907

Received May 6, 2002

This paper describes a heavy isotope coding strategy for the analysis of all types of tryptic peptides, including those that are N-terminally blocked and those from the C-terminus of proteins. The method exploits differential derivatization of amine and carboxyl groups generated during proteolysis as a means of coding. Carboxyl groups produced during proteolysis incorporate 18O from H₂¹⁸O. Peptides from the C-terminus of proteins were not labeled with 18O unless they contained a basic C-terminal amino acid. Primary amines from control and experimental samples were differentially acylated after proteolysis with either 1H3- or 2H3-N-acetoxysuccinamide. When these two types of labeling were combined, unique coding patterns were achieved for peptides arising from the C-termini and blocked N-termini of proteins. This method was used to (1) distinguish C-terminal peptides in model proteins, (2) recognize N-terminal peptides from proteins in which the amino terminus is acylated, and (3) identify primary structure variations between proteins from different sources.

Keywords: proteomics • isotope coding • C-terminal • N-terminal • GIST • 18O labeling

* To whom correspondence should be addressed. E-mail: fregnier@purdue.edu.
10.1021/pr0255304 CCC: $22.00 © 2002 American Chemical Society
Journal of Proteome Research 2002, 1, 443-450 (Vol. 1, No. 5). Published on Web 07/16/2002

Introduction

Stable isotopes have been widely used in analytical chemistry for decades to determine reaction mechanisms, trace compounds in biological systems, study metabolic pathways, and produce internal standards for quantification. The use of stable isotope ratios to track compounds in biological systems began to emerge in the early 1970s. In one early study, drugs were labeled with a mixture of heavy and light isotopes so as to produce a doublet pair in their mass spectra with a ratio of approximately 1 [1]. By virtue of the unique twin-ion pairs produced by 14N:15N and 12C:13C labeling, mass spectra of the drugs and their metabolites were easily identified. This peak pair or "doublet signature" strategy has evolved to the point that it is now being used in proteomics to isotopically code and quantify the relative concentration of many analytes simultaneously in multiple samples. The GIST [2] and ICAT [3] labeling strategies both exploit the doublet signature approach to identify isoforms of peptides.

Stable isotope quantification in proteomics is based on the fact that proteins are often characterized through their tryptic peptide fragments. By labeling proteins or peptides in different samples with isotopically distinct forms of a derivatizing agent, also known as a coding agent, it is possible to determine their relative concentration between samples [2-6]. The protocol is to (1) differentially code analytes from samples according to their sample origin with isotopic isoforms of the coding agent, (2) mix the differentially coded samples, (3) fractionate the tryptic peptide mixture chromatographically, and (4) determine the ratio of peptide isoforms in chromatographic fractions by mass spectrometry. A wide variety of coding strategies are being used, ranging from those that target cysteine [3] to amino acylation [2] and the incorporation of 18O into carboxyl groups during peptide bond hydrolysis [5].

Stable isotope coding is becoming increasingly important in comparative proteomics when the objective is to find differences in the concentration of proteins between samples. The comparison is most often made between a sample taken from an organism in a normal, or control, state and one obtained from the same organism in a stimulated, or altered, state. A disease, some external stimulus, a developmental change, or a mutation may have initiated passage into the altered state. By comparing the ratio of isotopically coded isoforms of a peptide taken from these two states, it is possible to determine those in which the concentration differs between the two samples. This strategy is now widely used in the analysis of protein expression.

Analysis of protein expression with stable isotope coding is relatively easy because every peptide derived from the protein must have changed in concentration to the same extent. Any of the coding methods referred to above can be used to examine protein expression. But studying cellular regulation goes far beyond changes in expression; a wide variety of post-translational modifications are also involved. Post-translational modifications cause a change in structure, often at a single amino acid or a small number of amino acids in the protein. The issue with stable isotope coding strategies is whether every peptide in the tryptic digest of a proteome is suitably labeled for recognizing changes in both concentration and structure. Strategies that globally code all peptides will be of greatest utility.

Two global coding strategies have been examined. One is to biosynthetically label all proteins in vivo using heavy isotope labeled nutrients [7,8]. Although this is a truly global labeling strategy that works well, it is of limited utility with human subjects. A second strategy is based on the fact that proteolysis produces amino groups and carboxyl groups that can be labeled, either during or after hydrolysis. Amines are easily acylated with activated organic acids. It has been shown with tryptic peptides that the N-hydroxysuccinimide derivatives of 1H3- and 2H3-acetic acid efficiently label primary amines in peptides and that mixtures of peptides from differentially labeled samples produce mass spectra with clusters of ions separated by 3, 6, or 9 atomic mass units (amu) [2]. Those separated by 3 amu are derived from C-terminal arginine-containing peptides, while those separated by 6 amu are C-terminal lysine-containing peptides. Clusters separated by 9 amu are also obtained occasionally through missed trypsin cleavages at internal or penultimate lysine residues. Sample coding through acylation of primary amines has now been used to examine expression [2], phosphorylation [9-10], and glycosylation [11-12]. A limitation of the acylation method is that it fails to label C-terminal arginine-containing peptides that do not have a derivatizable amino group at their N-terminus. These amino-terminally blocked peptides are relatively common in serum samples.

Carboxyl group labeling is best achieved through incorporation of 18O from H₂¹⁸O during peptide bond hydrolysis [13]. Peptides that have been labeled with 18O facilitate protein sequencing via mass spectrometry [14-16]. Proteolytic 18O labeling has also been used as a quantitative method for comparative proteomics [5,17]. One 18O is incorporated immediately during proteolysis in the case of both trypsin and chymotrypsin. But since a covalent ester linkage is formed between peptides and these enzymes during proteolysis and the reaction is slightly reversible, a second 18O is slowly incorporated during repeated, reversible esterification.
A serine residue in these enzymes forms an ester with the carboxyl groups at the C-terminus of peptide cleavage products. The rate of exchange can be structure specific [18]. The majority of peptides are doubly labeled, but there are some peptides that are primarily singly labeled. Although labeling is not universally consistent from peptide to peptide, all the tryptic peptides carrying a basic amino acid at their C-terminus were labeled with at least one 18O [18]. Peptides with a C-terminal arginine residue bind more strongly to trypsin and exchange a second 18O into the C-terminal residue much faster than those with a C-terminal lysine [19]. When urea is present at greater than 0.8 M, it inhibits 18O exchange with either C-terminal arginine or lysine residues. Chymotrypsin shows similar behavior, but the rate of exchange is at least 1 order of magnitude slower than with trypsin [19]. Using trypsin, this means that with incubation times of a few hours, exchange on C-terminal arginine residues will be complete and two 18O will have been incorporated. In contrast, exchange on C-terminal lysine residues will be incomplete and peptides containing both one and two 18O will be seen. The exchange rate with chymotryptic peptides is slower and will show incorporation of a single 18O almost exclusively. However, with incubation times of 24 h or more and high concentrations of chymotrypsin, it is expected that most peptides will contain two 18O.

A limitation of this approach is that peptides derived from the C-terminus of proteins will generally not be labeled. This means that changes in the C-terminal peptide from a protein cannot be quantified. The relative concentration of H₂¹⁶O/H₂¹⁸O in the digestion buffer also affects labeling efficiency [18]. In these studies, proteins were digested in the presence of roughly 95% H₂¹⁸O. 18O incorporated into carboxyl groups of peptides has been reported to be stable during most chemical manipulations [5,13,18]. The 18O-C bond in a carboxyl group can be broken and back-exchanged with 16O in water [5], but only under extremely acidic or basic conditions.

It is seen in the discussion above that neither the amino acylation nor the 18O labeling strategy is truly global. The amino acylation procedure cannot deal with N-terminally blocked peptides, and the 18O carboxyl labeling procedure fails to recognize peptides derived from the C-terminus of proteins. The objective of the work described in this paper is to test the hypothesis that combining the amino acylation and carboxyl labeling methods will produce a truly global labeling method for qualitative analysis of peptides that allows recognition of a larger number of structural features in peptides than either method alone.
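The blind spots of each single-label method, and the way the combination covers them, can be illustrated with nominal mass arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes +3 amu per 2H3-acetyl group on each free primary amine, +2 amu per incorporated 18O, and complete exchange (two 18O) at every C-terminus generated by proteolysis. The `Peptide` class and both function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    """Hypothetical description of a tryptic peptide (illustration only)."""
    name: str
    n_lys: int              # lysine residues (each carries a primary amine)
    n_term_blocked: bool    # True if the N-terminal amine is already acylated
    protein_c_term: bool    # True if the C-terminus was not made by proteolysis


def acyl_spacing(p: Peptide) -> int:
    """Nominal doublet spacing from 1H3/2H3-acetyl coding alone (+3 per amine)."""
    n_amines = p.n_lys + (0 if p.n_term_blocked else 1)
    return 3 * n_amines


def o18_spacing(p: Peptide, n_18o: int = 2) -> int:
    """Nominal doublet spacing from 18O coding alone (+2 per 18O).

    Assumes n_18o atoms are incorporated whenever the C-terminus was
    generated by proteolysis, and none otherwise.
    """
    return 0 if p.protein_c_term else 2 * n_18o


EXAMPLES = [
    Peptide("internal, C-terminal Arg", 0, False, False),
    Peptide("internal, C-terminal Lys", 1, False, False),
    Peptide("blocked N-terminus, C-terminal Arg", 0, True, False),
    Peptide("protein C-terminus, no Lys", 0, False, True),
]

for p in EXAMPLES:
    a, o = acyl_spacing(p), o18_spacing(p)
    # A spacing of 0 means the method produces no doublet for that peptide.
    print(f"{p.name:36s} acyl={a:2d}  18O={o:2d}  combined={a + o:2d}")
```

Under these assumptions the blocked, arginine-terminated peptide is invisible to acylation alone and the protein C-terminal peptide is invisible to 18O alone, while the combined spacing is nonzero for all four examples.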

Materials and Methods

Materials. Bovine cytochrome c, chicken lysozyme, turkey lysozyme, ammonium bicarbonate, N-hydroxylamine, iodoacetic acid, cysteine, dithiothreitol (DTT), N-tosyl-L-lysyl chloromethyl ketone (TLCK), and all the reagents for trypsin digestion were purchased from Sigma (St. Louis, MO). All peptides were obtained from Bachem (Torrance, CA). Sequence grade trypsin was purchased from Promega (Madison, WI). Oxygen-18 enriched water (95-98 atom % 18O) was obtained from Isotech (Miamisburg, OH). N-Acetoxysuccinamide and 2H3-C,1N-acetoxysuccinamide were synthesized as described by Ji [2].

Proteolysis. Bovine cytochrome c was dissolved in 0.1 M ammonium bicarbonate buffer (pH 8.2). Trypsin was added in a 1:50 (w/w) ratio and the protein allowed to digest overnight at 37 °C. Digestion was stopped by adding TLCK in slight molar excess. Cytochrome c was digested with trypsin separately in H₂¹⁶O and H₂¹⁸O buffers. Lysozymes from turkey and chicken were reduced and alkylated in 0.1 M ammonium bicarbonate buffer (pH 8.2) containing 6.2 M urea and 10 mM DTT. After 2-h incubation at 37 °C, iodoacetic acid was added to a final concentration of 20 mM and the mixture was incubated in darkness on ice for an additional 2 h. Cysteine was then added to the reaction mixture to a final concentration of 40 mM and the reaction allowed to proceed at room temperature for 30 min. After dilution with 0.1 M ammonium bicarbonate buffer to a final urea concentration of 0.8 M, sequence grade trypsin (2% w/w, enzyme to protein) was added and the mixture was incubated for 24 h at 37 °C. Digestion was stopped by adding TLCK in slight molar excess. Lysozymes from turkey and chicken were digested in H₂¹⁶O and H₂¹⁸O buffer, respectively.

Acetylation of Peptides. A 5-fold molar excess of N-acetoxysuccinamide or N-acetoxy-2H3-succinamide was added to separate 1 mg/mL peptide solutions of bovine cytochrome c. The reaction was allowed to proceed for 4-5 h. N-Hydroxylamine was then added in excess, and the pH was adjusted to 11-12. Incubation with hydroxylamine was allowed to proceed for 10 min. The function of the hydroxylamine reaction was to hydrolyze esters that might have been formed during the acylation reaction. Equal aliquots of the two samples were mixed and purified on a C18 reversed-phase chromatography column.

Figure 1. Differential isotope labeling of peptides from control and experimental samples with both 18O and 2H labeling. Details of the reaction are described in the Materials and Methods.

Reversed-Phase Chromatography of Isotopically Labeled Peptides. Isotopically labeled peptide mixtures were separated by gradient elution from a Vydac C18 column (4.6 mm × 250 mm) on an Integral Micro-Analytical Workstation (Applied Biosystems, Framingham, MA). The C18 column was equilibrated using 100% mobile phase A (1% ACN/0.01% TFA in ddI H2O) at a flow rate of 1 mL/min for two column volumes (CV). Isotopically labeled peptide mixtures were injected and eluted at a flow rate of 1 mL/min in a linear gradient ranging over 60 min from 100% mobile phase A to 60% mobile phase B (95% ACN/0.01% TFA in ddI H2O). At the end of this period, a second linear gradient was applied over 10 min from 60% B to 100% B at the same flow rate. The gradient was then held at 100% mobile phase B for an additional 10 min. Throughout the analysis, an on-line UV detector set at 214 nm was used to monitor separation of the peptide mixtures. The peptides were simultaneously monitored by ESI-MS by directing 4% of the flow into the mass spectrometer.

ESI-MS Analysis. Mass spectral analyses were performed using a QSTAR workstation (Applied Biosystems, Framingham, MA) equipped with an ionspray source. All spectra were obtained in the positive-ion TOF mode at a sampling rate of one spectrum every 2 s. During LC-MS data acquisition, masses were scanned from m/z 300 to 1800.

MALDI-TOF-MS Analysis. MALDI-TOF mass spectrometry was performed using a Voyager DE-RP BioSpectrometry Workstation (PE Biosystems, Framingham, MA). Peptides were prepared by mixing a 1 µL aliquot of a fraction with 1 µL of matrix solution. The matrix was a 10 mg/mL solution of α-cyano-4-hydroxycinnamic acid in 50% water-50% ACN with 0.1% TFA. The mixture was spotted onto a well of the MALDI sample plate and allowed to air dry before being placed in the mass spectrometer. All peptides were analyzed in the reflector, positive-ion mode with delayed extraction.
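The two-segment gradient described under Reversed-Phase Chromatography can be written as a simple piecewise-linear function, which may help when reproducing the program on another instrument. This is a minimal sketch under stated assumptions: time zero is taken as the start of the first gradient, the acetonitrile content is computed by linear mixing of mobile phases A (1% ACN) and B (95% ACN), and the function names are hypothetical.

```python
def percent_b(t_min: float) -> float:
    """Percent mobile phase B at time t_min (minutes after gradient start).

    0-60 min : linear, 0% -> 60% B
    60-70 min: linear, 60% -> 100% B
    70-80 min: hold at 100% B
    """
    if not 0 <= t_min <= 80:
        raise ValueError("gradient program is defined for 0-80 min only")
    if t_min <= 60:
        return 60.0 * t_min / 60.0
    if t_min <= 70:
        return 60.0 + 40.0 * (t_min - 60.0) / 10.0
    return 100.0


def percent_acn(t_min: float) -> float:
    """Acetonitrile content (%), assuming ideal mixing of A (1%) and B (95%)."""
    frac_b = percent_b(t_min) / 100.0
    return 1.0 * (1.0 - frac_b) + 95.0 * frac_b


def ms_flow_ul_per_min(total_ml_per_min: float = 1.0, split: float = 0.04) -> float:
    """Flow directed to the mass spectrometer for a given split fraction."""
    return total_ml_per_min * 1000.0 * split


for t in (0, 30, 60, 65, 70, 80):
    print(f"t={t:2d} min  %B={percent_b(t):5.1f}  %ACN={percent_acn(t):5.1f}")
```

With the stated 4% split of a 1 mL/min flow, `ms_flow_ul_per_min()` evaluates to 40 µL/min entering the ESI source.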

Results and Discussion

The hypothesis being tested in this research is outlined in Figure 1. Panel A in Figure 1 shows the treatment protocol used with control samples. Experimental samples were treated according to the protocol in panel B. In panel B, carboxyl groups generated during proteolysis of experimental samples are labeled with 18O from H₂¹⁸O. Also, amino groups on lysine residues and at the amino terminus of peptides are acylated with the N-hydroxysuccinimide (NHS) derivative of 2H3-acetate. An important point to note in Figure 1 is that for each acetate group incorporated into a peptide, the difference in molecular weight between the parent ions of the isoforms will increase by 3 amu. In contrast, each 18O incorporated will cause the molecular weight of a peptide to increase by 2 amu. It will be shown below that the number of amine and carboxyl groups in a peptide can be readily determined by differences in the molecular weight of peptide isoforms with this protocol. The technique of tagging amino and carboxyl groups at peptide termini for the purpose of protein identification in proteomics will be referred to as TACT in this paper. The mass shift between doublet clusters of ions arising from the TACT double labeling technique is shown in Table 1.

Table 1. Molecular Weight Shift Predicted in Peptide Isoforms

tryptic peptides                              18O label   2H label        mass shift
C-terminal peptide, ended with K/R            +4          R/(no K)  +3    +7
                                                          one K     +6    +10
                                                          two K     +9    +13
C-terminal peptide, other C-terminal          +0          R/(no K)  +3    +3
  peptides                                                one K     +6    +6
                                                          two K     +9    +9
N-terminal peptide with blocked N-terminus    +4          R/(no K)  +0    +4
                                                          one K     +3    +7
                                                          two K     +6    +10
all other peptides                            +4          R/(no K)  +3    +7
                                                          one K     +6    +10
                                                          two K     +9    +13

Bovine cytochrome c (molecular mass 12 kDa) was chosen as a model protein because it is easy to digest, the N-terminus of the protein is acetylated, and the C-terminal peptide is sufficiently large to be captured during reversed-phase chromatography. After proteolysis and derivatization, peptides were fractionated by reversed-phase chromatography before mass spectral analysis. Peaks were identified by matching measured molecular masses with theoretical values.

Peptides Derived from the C-Terminus of a Protein. The mass spectrum of one of the peptides obtained from the tryptic digest of bovine cytochrome c showed peptide isoforms with parent ions at m/z 476 and 479 (Figure 2). On the basis of the mass difference between the parent ions, the mass difference between the labeling agents used in this protocol, and the scheme illustrated in Figure 1, a doublet cluster separated by 3 amu could only be obtained from a peptide containing one amino group. Moreover, the C-terminus of the peptide could not have been generated during proteolysis because there is no evidence of 18O labeling. This means the C-terminal amino acid in this peptide could not be a basic amino acid. Without any further evidence, this is a strong indication that this is the C-terminal peptide from cytochrome c. This is in fact the C-terminal ATNE peptide derived from cytochrome c after differential acetylation with 1H3- and 2H3-acetate, with parent ions at m/z 476 and 479, respectively. But another peptide in the mixture had similar properties (Figure 3). The mass difference between the parent ions in this case is 6 amu. This means that two amine groups were derivatized by 2H3-acetate and there was no incorporation of 18O.
It is most likely that one of the amino groups would be at the N-terminus of the peptide and the other would be on a lysine residue that is not at the C-terminus of the peptide because no 18O was incorporated. The fact that the peptide potentially contains a lysine residue and there was no incorporation of 18O suggests that the peptide was derived from the C-terminus of cytochrome c and that it contains a missed trypsin cleavage. This is in fact the case. The peptide KATNE has a molecular weight of 646 after acetylation; it has a lysine derived from a missed trypsin cleavage, and it is from the C-terminus of cytochrome c.

Figure 2. Electrospray ionization (ESI) mass spectrum of C-terminal peptides differentially labeled with N-acetoxysuccinamide and N-acetoxy-2H3-succinamide. The parent ion clusters vary by 3 amu. The sequence of the peptide in panel A is ATNE.

Figure 3. Electrospray ionization (ESI) mass spectrum of C-terminal peptide KATNE with one miscleavage differentially labeled with N-acetoxysuccinamide and N-acetoxy-2H3-succinamide. The peptide isoforms are seen to vary by 6 amu in molecular weight.

Peptides Arising from the Interior of a Protein. A peptide labeling pattern characteristic of many peptides observed in tryptic digests is seen in Figure 4. The parent ions of the peptide isoforms at m/z 1210 and 1217 vary by 7 amu. The 7 amu difference could only arise from a 3 + 2 + 2 labeling pattern; i.e., the peptide is labeled with one 2H3-acetate and two 18O. Because two 18O were incorporated, there must be a basic amino acid at the C-terminus of the peptide. The most likely candidate would be a peptide with an arginine residue at the C-terminus and a free amino group at the N-terminus. However, an N-terminally blocked peptide with a C-terminal lysine residue, or an N-terminally blocked peptide with a C-terminal arginine residue and a missed cleavage at a lysine residue, would give the same results. On the basis of the molecular weight of the peptide and knowing it was derived from cytochrome c, the sequence must be TGPNLHGLFGR. This was shown to be true by MS/MS. Many peptides were seen in which the difference between the peptide isoforms was 7 amu. This is because many tryptic peptides have a C-terminal arginine residue and a free amino group at the N-terminus.

Figure 4. MALDI-TOF mass spectrum of the arginine-containing peptide TGPNLHGLFGR. Note that the peptide doublet derived from the tryptic digest using the double labeling strategy is separated by 7 amu.

Peptides in which the peptide isoforms varied by 10 amu were also commonly seen (Figure 5). A difference of 10 amu would arise from 3 + 3 + 2 + 2; i.e., the peptide is labeled with two 18O and two 2H3-acetate. The most likely structure in this case would be a peptide with lysine at the C-terminus and a free amino group at the N-terminus. But there are other possibilities as well. One would be a peptide with arginine at the C-terminus, a free amino group at the N-terminus, and a missed cleavage at a lysine residue. Still another would be an N-terminally blocked peptide with two missed cleavages at lysine residues, although this is very unlikely. The above peptide actually had the sequence IFVQK. This labeling pattern is common because many tryptic peptides have a C-terminal lysine residue and a free amino group at the N-terminus.

Figure 5. ESI mass spectrum of the peptide IFVQK derived from cytochrome c after tryptic digestion and double labeling. Note that the doublet cluster of ions is separated by 10 amu.

A labeling pattern of the type seen in Figure 6 is seen far less frequently. The fact that the parent ions differ by 13 amu indicates derivatization with groups of 3 + 3 + 3 + 2 + 2. Incorporation of two 18O means that the C-terminus of the peptide contains a basic amino acid. The fact that there are three derivatizable amino groups in the peptide makes it highly probable that the C-terminal amino acid is lysine. The most likely way to account for the other two amino groups would be if one is at the N-terminus and the other is on a lysine residue at a missed cleavage site. This is the case here. The peptide at m/z 458 arises from the addition of three acetyl groups to GKK.

Figure 6. ESI mass spectrum of differentially labeled GKK peptide after tryptic digestion and double labeling. The doublet cluster was separated by 13 amu.

Chymotryptic cleavages are also seen in trypsin digestion, due either to the presence of a chymotrypsin impurity or to nonspecific cleavage by trypsin. In either case, the rate of incorporation of a second 18O is very slow. When these peptides are examined by the protocol outlined in Figure 1, they show a unique labeling pattern. The spectrum in Figure 7 is from such a peptide. In this case, the parent ions of the peptide isoforms at m/z 719 and 724 vary by 5 amu. This indicates the incorporation of three deuterium atoms from one acetate and one 18O. Clearly, this labeling pattern arose from the incorporation of one 18O into the carboxyl of the peptide during proteolysis and a failure to incorporate the second because of low affinity for the enzyme. This is diagnostic for a peptide with a more hydrophobic amino acid at its C-terminus. Although this peptide has the structure TGQAPGF, this labeling pattern could also be obtained with an N-terminally blocked peptide with a missed cleavage at an internal lysine residue. A very similar labeling pattern with an identical explanation is seen for the peptide TGPNLH (Figure 8).

Figure 7. ESI-TOF mass spectrum of chymotryptic peptide TGQAPGF labeled with 16O and 18O at the C-termini and N-acetoxysuccinamide and N-acetoxy-2H3-succinamide at the N-termini. The labeled peaks were derived principally from the carbon-12 isotopic forms of the derivatized peptides after double labeling. Peptides appear as a doublet, separated by 5 amu.

Peptides Derived from a Blocked N-Terminus.
Acylation of proteins at their amino termini is widespread in viruses, prokaryotes, and eukaryotes [20-22]. Formyl, acetyl, pyruvoyl, α-ketobutyryl, glucuronyl, and pyroglutamyl groups are common derivatives at the N-terminus of proteins, in addition to fatty acids, glucose [20,21], and even carbon dioxide [23]. N-Terminal peptides derived from these proteins by trypsin hydrolysis cannot be acylated in vitro with a coding agent unless they contain at least one lysine residue somewhere in the peptide. When the blocked N-terminal peptide of a protein contains arginine and no lysine, it cannot be further acylated and two 18O alone will be incorporated at the C-terminus of the peptide. The N-terminal peptide from bovine cytochrome c is a good example. Because the N-terminus of this protein has the sequence Ac-GDVEK, labeled isoforms of the tryptic peptide differ by 7 amu (Figure 9). This 7 amu difference arises from a 3 + 2 + 2 addition of heavy isotopes; i.e., the peptide is labeled with two 18O and one 2H3-acetate. Because the N-terminus of the peptide is blocked, only lysine is acetylated. In contrast, tryptic peptides with a single lysine derived from the interior of a protein (Figure 5) vary by 10 amu because the amino terminus of the peptide may be derivatized as well. α-MSH is N-terminally acetylated, which produces an N-terminally blocked peptide with a C-terminal arginine residue, Ac-SYSMEHFR (Figure 10). There are no derivatizable amino groups in this peptide. Labeled isoforms of this peptide varied by 4 amu. It is seen from these two examples that N-terminally blocked peptides can be both labeled and recognized.

Figure 8. ESI mass spectrum of nonspecifically cleaved peptide from cytochrome c. The doublet was separated by 5 amu.

Figure 9. Electrospray ionization (ESI) mass spectrum of N-terminal peptide Ac-GDVEK after tryptic cleavage and double labeling. The peptides appear as a doublet, separated by 7 amu.

Figure 10. MALDI mass spectrum of N-terminal peptide Ac-SYSMEHFR. Peptides appear as a doublet, separated by 4 amu.

Figure 11. Fractionation of coded peptide isoforms in reversed-phase chromatography. The peptide sequence is IFVQK. Extracted ion chromatogram of heavily labeled peptide (■), nonheavily labeled peptide (◆), isotope ratio (▲).

Isotope Effects during Reversed-Phase Chromatography. It has been shown above that by combining the N-terminal and C-terminal derivatization methods, 18O, 2H3, or both can be incorporated into any peptide derived from a protein by trypsin digestion. In the case of lysine-containing peptides derived from the interior of a protein, it is even possible to incorporate multiple 2H3-labeled acetyl groups along with two 18O. It has been noted recently that labeled isoforms of peptides containing large numbers of deuterium atoms are partially resolved from nondeuterated species during reversed-phase chromatography [24]. The same phenomenon was observed in these studies, as seen in the case of IFVQK (Figure 11). This peptide is labeled with two 18O and two 2H3-acetyl groups, causing the isoforms to differ by 10 amu. The heavy isoform eluted approximately 6 s earlier than the nonlabeled isoform.


It was observed that both 2H and 18O contributed to this isotope fractionation. This can cause a problem when isotope labeling is being used for quantification. Recent work has shown that isotope effects arising from 2H labeling can be eliminated by going to 13C labeling with either 13C4-succinate or 13C3-propionate for quantitative analysis [25].

Structural Changes Resulting from a Mutation. Single nucleotide polymorphism is thought to be common in the genome of most organisms. For this reason, there will be some small degree of polymorphism in the proteome. The question addressed in this section of the paper is whether these structural differences will be recognizable by tagging amino and carboxyl termini (TACT) of peptides as described above. The requisite selectivity involved in producing the mass spectral "doublet signature" used to characterize peptides in TACT is obtained at three levels. One occurs during chromatographic fractionation. Isotopic isoforms of peptides of the same sequence will co-chromatograph when appropriately labeled, such as with 12C3- and 13C3-propionate on amine groups and 16O and 18O at carboxyl groups. A second level of selectivity is obtained through derivatization. It has been shown above that peptides that are identical in derivatizable functional groups will differ in their doublet signature by the same number of amu. A third level of selectivity is peptide molecular weight. Peptide isoforms will differ in molecular weight by the difference in mass contributed by the labeling agents.

Genetic differences within a population, and the protein polymorphism resulting from this genetic variation, will cause differences in the amino acid sequence of a small number of peptides in peptide-based comparative proteomics. The question is whether the selectivity of the TACT method is sufficient to recognize these peptides. Turkey and chicken lysozyme were trypsin digested individually in H₂¹⁶O and H₂¹⁸O buffer, respectively. The digest from turkey lysozyme was then acetylated with N-acetoxysuccinamide while that from chicken lysozyme was acetylated with N-acetoxy-2H3-succinamide. Isotope effects from 2H3-acetate labeling did not cause a problem during reversed-phase chromatography in this case, although it is the intent to use 13C3-propionate in future work to preclude this problem. The peptides FESNFNTQATNR (residues 34-45) from chicken and FESNFNTHATNR (residues 34-45) from turkey differ by the substitution of Q for H at residue 41. The mass spectrum of each of these peptides shows a single cluster rather than a doublet (Figure 12a,b). Although the mass difference between these two doubly labeled peptides was only 2 amu, they were completely separated by the reversed-phase column. Lysozymes from these two species vary at six sites in the primary structure. All were detected by TACT. This suggests that TACT will be of value in recognizing protein polymorphism in complex peptide mixtures.

Figure 12. Peptides from chicken and turkey lysozyme with single amino acid variation. (a) Peptide FESNFNTQATNR from chicken labeled with 2H3-acetate at the N-terminus and 18O at the C-terminus of the peptide. (b) Peptide FESNFNTHATNR from turkey labeled in the same way as (a).

General Comments. Although the TACT labeling method described here works well with model proteins, there is the question of how well it will work with complex samples having thousands of components. This is actually a valid question for the entire set of differential coding, stable isotope-labeling methods currently being used in proteomics. A fundamental assumption in all these methods is that the doublet cluster of ions necessary for either quantitative or qualitative analysis of peptide isoforms is easily identifiable in mass spectra. For this to be true there should be no overlap between ions in the doublet cluster and other peptide ions in a sample. In the case of ICAT, doublet clusters are either 8 or 16 atomic mass units (amu) apart depending on the number of cysteine residues in the peptide. Doublet clusters in most of the GIST literature vary by either 3 or 6 amu. As sample complexity increases, the possibility that an ion from another peptide will fall within this 6-18 amu window increases. A tryptic digest containing 1 million peptides that has been fractionated with a reversed-phase column having a peak capacity of 300 will theoretically have approximately 3300 peptides per peak. Although a second dimension of mass spectrometry can be used to deconvolute overlapping peaks, samples of this complexity are obviously not good candidates for any type of differential isotope coding method. There will be a large amount of component overlap in the mass spectra. Substantially more chromatographic or electrophoretic fractionation would be needed with this sample before differential coding methods would be useful. Thus, how well a differential coding method works with complex mixtures depends largely on the degree to which the sample has been fractionated before mass spectral analysis.
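The peak-capacity estimate above is a simple ratio, and the same arithmetic can be turned around to ask how much fractionation a sample would need. The sketch below assumes peptides are spread uniformly over the available peaks, which real separations will not satisfy; the function names and the target of 50 peptides per peak are hypothetical and chosen only for illustration.

```python
import math


def peptides_per_peak(n_peptides: int, peak_capacity: int) -> float:
    """Average number of co-eluting peptides, assuming uniform spreading."""
    return n_peptides / peak_capacity


def peak_capacity_needed(n_peptides: int, max_per_peak: int) -> int:
    """Smallest total peak capacity giving at most max_per_peak on average."""
    return math.ceil(n_peptides / max_per_peak)


# The case discussed in the text: 1 million peptides, peak capacity 300.
print(round(peptides_per_peak(1_000_000, 300)))   # about 3300 after rounding

# Hypothetical target: no more than 50 co-eluting peptides on average.
print(peak_capacity_needed(1_000_000, 50))
```

For multidimensional separations the total peak capacity is often approximated as the product of the capacities of the individual dimensions, which is one way to read the call for substantially more fractionation.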

A second issue is how doublet clusters of ions will be identified in complex spectra. In the analysis of complex samples it is common for 100 000 or more spectra to be taken. Each of these spectra can show from a few to more than 50 components. Examining this number of spectra manually is not practical. Instrument companies have developed algorithms that (1) recognize ion clusters varying by 8 or 16 atomic mass units in the case of ICAT, (2) measure the isotope ratio between the molecular ions of peptide isoforms, and (3) calculate the relative concentrations of peptides between samples. Current commercial software does not allow researchers to select the mass difference between the doublet clusters of ions being searched. This capability will be necessary before the TACT method can be widely used.
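The kind of search described above, with the doublet spacing supplied by the user rather than fixed at 8 or 16 amu, can be illustrated with a minimal sketch. It is not the algorithm of any commercial package. The function name `find_doublets`, its parameters, and the peak-list format are assumptions made for illustration, and the sketch treats each peptide as a single centroided peak, ignoring the natural isotope envelope that a real implementation would have to handle.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical peak-list format


def find_doublets(
    peaks: List[Peak],
    mass_diff: float,
    charge: int = 1,
    tol: float = 0.02,
) -> List[Tuple[float, float, float]]:
    """Find peak pairs separated by a user-selected mass difference.

    Returns (light m/z, heavy m/z, heavy/light intensity ratio) for every
    pair whose spacing matches mass_diff / charge within +/- tol.
    """
    spacing = mass_diff / charge
    ordered = sorted(peaks)  # ascending m/z
    doublets = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            gap = mz_heavy - mz_light
            if gap > spacing + tol:
                break  # later peaks are even farther away
            if abs(gap - spacing) <= tol and int_light > 0:
                doublets.append((mz_light, mz_heavy, int_heavy / int_light))
    return doublets


# Illustrative peak list (made-up values, not data from this work).
example_peaks = [(500.00, 100.0), (505.00, 50.0), (700.00, 80.0), (708.00, 80.0)]
print(find_doublets(example_peaks, mass_diff=5.0))
print(find_doublets(example_peaks, mass_diff=8.0))
```

Making `mass_diff` an argument is the point of the sketch: the same routine can then be pointed at any spacing a coding scheme produces, instead of only the fixed ICAT spacings.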

Conclusions On the basis of the data presented above, it is concluded that differential isotope coding using both 18O and 2H can label all the tryptic peptides derived from a protein, including C-terminal and N-terminally blocked peptides. This is achieved by combining C-terminal 18O labeling (+2; ref 13) with N-terminal acylation (+3; ref 2). Because the mass shift after double labeling depends on the position of the carboxyl group in the parent protein and on the number of lysine residues in a peptide, the method can identify C-terminal and N-terminally blocked peptides simultaneously. It is also a powerful tool for identifying protein polymorphism. After double labeling, all the peptides having the same sequence in the two proteins will appear as doublets separated by a specific mass difference, which depends on the number of lysines and the source of the peptides. Peptides that come from only one of the proteins will appear as a single cluster of ions. This allows the portion of the protein containing the polymorphism to be identified without sequencing the entire protein.
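The dependence of the doublet spacing on peptide type can be made concrete with a small worked sketch. It uses the nominal shifts quoted above (+3 per acylated amine, +2 for 18O) and assumes that the N-terminal amine and every lysine side chain are acylated. The function `doublet_spacing` and its parameters are hypothetical names introduced for illustration, not part of the published method.

```python
def doublet_spacing(
    n_lysine: int,
    n_term_blocked: bool = False,
    carries_18o: bool = True,
    acyl_shift: int = 3,  # nominal shift per 2H3-acetylated amine (assumed)
    o18_shift: int = 2,   # nominal 18O shift as quoted in the text (assumed)
) -> int:
    """Nominal mass difference (amu) between the light and heavy forms of a peptide.

    n_term_blocked: True for a peptide from a blocked protein N-terminus,
                    which has no free alpha-amine to acylate.
    carries_18o:    False for a peptide from the protein C-terminus whose
                    carboxyl group was not generated by proteolysis.
    """
    n_amines = n_lysine + (0 if n_term_blocked else 1)
    return acyl_shift * n_amines + (o18_shift if carries_18o else 0)


# Internal tryptic peptide ending in arginine, no lysine.
print(doublet_spacing(n_lysine=0))                        # 5
# Internal tryptic peptide ending in lysine.
print(doublet_spacing(n_lysine=1))                        # 8
# N-terminally blocked peptide, no lysine.
print(doublet_spacing(n_lysine=0, n_term_blocked=True))   # 2
# Protein C-terminal peptide without 18O, no lysine.
print(doublet_spacing(n_lysine=0, carries_18o=False))     # 3
```

Under these assumptions each class of peptide gives a different spacing, which is what allows C-terminal and N-terminally blocked peptides to be recognized from the doublet pattern alone.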

Acknowledgment. We gratefully acknowledge support from the National Institutes of Health (GM 59996).

Journal of Proteome Research • Vol. 1, No. 5, 2002 449

References
(1) Bush, M. T.; Sekerke, H. J., Jr.; Vore, M.; Sweetman, B. J.; Watson, J. T. Labeling with nitrogen-15 or carbon-13 for identification of mass spectra of drugs and their metabolites. In Proceedings of a Seminar on the Use of Stable Isotopes in Clinical Pharmacology; Klein, P. D., Roth, L. J., Eds.; U.S. Atomic Energy Commission: Oak Ridge, TN, 1972; pp 233-237.
(2) Ji, J.; Chakraborty, A.; Geng, M.; Zhang, X.; Amini, A.; Bina, M.; Regnier, F. E. J. Chromatogr. B 2000, 745, 197-210.
(3) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(4) Aebersold, R.; Goodlett, D. R. Mass Spectrometry in Proteomics. Chem. Rev. 2001, 101, 269-295.
(5) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P. A.; Fenselau, C. Anal. Chem. 2001, 73, 2836-2842.
(6) Mirgorodskaya, O. A.; Kozmin, Y. P.; Titov, M. I.; Korner, R.; Sonksen, C. P.; Roepstorff, P. Rapid Commun. Mass Spectrom. 2000, 14, 1226-1232.
(7) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D. B.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6596.
(8) Conrads, T. P.; Alving, K.; Veenstra, T. D.; Beolv, M. E.; Anderson, G. A.; Anderson, D. J.; Lipton, M. S.; Pasa-Tolic, L.; Udseth, H. R.; Chrisler, W. G.; Thrall, F. D.; Smith, R. D. Anal. Chem. 2001, 73, 2132-2139.
(9) Riggs, L.; Seeley, E.; Chen, H.; Regnier, F. E. J. Chromatogr. Submitted for publication.
(10) Riggs, L.; Seeley, E.; Regnier, F. E. Anal. Chem. Submitted for publication.
(11) Geng, M.; Ji, J.; Regnier, F. E. J. Chromatogr. A 2000, 870, 295-313.
(12) Geng, M.; Zhang, X.; Bina, M.; Regnier, F. E. J. Chromatogr. B Biomed. Sci. Appl. 2001, 752, 293-306.
(13) Schnolzer, M.; Jedrzejewski, P.; Lehmann, W. D. Electrophoresis 1996, 17, 945-953.
(14) Takao, T. H. H.; Okamoto, K.; Harada, A.; Kamachi, M.; Shimonishi, Y. Rapid Commun. Mass Spectrom. 1991, 5, 312-315.
(15) Shevchenko, A.; Chernushevich, I.; Ens, W.; Standing, K. G.; Thomson, B.; Wilm, M.; Mann, M. Rapid Commun. Mass Spectrom. 1997, 11, 1015-1024.
(16) Kosaka, T.; Takazawa, T.; Nakamura, T. Anal. Chem. 2000, 72, 1179-1185.
(17) Reynolds, K. J.; Yao, X.; Fenselau, C. J. Proteome Res. 2002, 27-33.
(18) Stewart, I. I.; Thomson, T.; Figeys, D. Rapid Commun. Mass Spectrom. 2001, 5, 2456-2465.
(19) Yao, X.; Reynolds, K.; Fenselau, C. The Pittsburgh Conference on Analytical Chemistry and Applied Spectroscopy, 2002.
(20) Driessen, H. P. C.; De Jong, W. W.; Tesser, G. I.; Bloemendal, H. CRC Crit. Rev. Biochem. 1985, 18, 281-325.
(21) Jornvall, H.; Pietruszko, R. Eur. J. Biochem. 1972, 25, 283-290.
(22) Berger, D.; Berger, M.; Von Wartburg, J. P. Eur. J. Biochem. 1974, 50, 215-225.
(23) Thatcher, D. R. Biochem. J. 1980, 187, 875-883.
(24) Zhang, R.; Sioma, C. S.; Thompson, R. A.; Xiong, L.; Regnier, F. E. Anal. Chem. Submitted for publication.
(25) Zhang, R.; Regnier, F. E. J. Proteome Res. 2002, 1, 139-147.

PR0255304