
Novel Approach for Peptide Quantitation and Sequencing Based on 15N and 13C Metabolic Labeling

Ambrosius P. L. Snijders, Marjon G. J. de Vos, and Phillip C. Wright*

Biological and Environmental Systems Group, Department of Chemical and Process Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD, United Kingdom

Received December 2, 2004

Here we describe a method for protein identification and quantification using stable isotopes via in vivo metabolic labeling of the hyperthermophilic crenarchaeon Sulfolobus solfataricus. Stable isotope labeling for quantitative proteomics is becoming increasingly popular; however, its usefulness in protein identification has not been fully exploited. We use both 15N and 13C labeling to create three different versions of the same peptide, corresponding to the unlabeled, 15N and 13C labeled versions. The peptide then appears as three different peaks in a TOF-MS scan and three corresponding sets of MS/MS spectra are obtained. With this information, the elemental carbon and nitrogen compositions for each peptide and each fragment can be calculated. When this is used as a constraint in database searching and/or de novo sequencing, the confidence of a match is increased (for an example intact peptide from 34 choices to 1). This makes the method a useful proteomic tool for both sequenced and unsequenced organisms. Furthermore, it allows for accurate protein quantitation (standard deviations over >4 peptides per protein were within 10%) of three phenotypes in one MS experiment. Abundances for each peptide are calculated by determining the relative areas of each of the three peaks in the TOF-MS spectrum. Keywords: mass spectrometry • proteomics • de novo sequencing • stable isotope labeling • bioinformatics

Introduction

In recent years, stable isotope labeling has become an important tool for peptide and protein quantification1,2. Traditionally, proteins are quantified on the basis of their densitometric intensities on 2-DE gels using protein-specific stains such as Coomassie blue, silver staining, or fluorescent stains. However, increased interest in gel-free proteomics in combination with improvements in mass spectrometry has increased the popularity of protein quantitation on the basis of stable isotopes. Generally, the introduction of stable isotopes does not affect the physicochemical properties of a protein. Therefore, no distinction is made between a heavy and light version of the same protein during the processes of cell growth, protein extraction, separation, proteolytic digestion, and peptide fragmentation. It is only in the last step of the process, mass spectrometry, that a distinction can be made between the heavy and light version of the same peptide. Three main methods exist for introducing stable isotopes into proteins or peptides: chemical, enzymatic, and metabolic incorporation. The chemical method uses a stable isotope coded derivative that specifically attaches to one of the active groups on a peptide. The best known example of this method is the ICAT method developed by Gygi and co-workers.3 In this method, a reagent carrying either a heavy or a light version of an isotope-coded linker reacts selectively with cysteinyl residues. To this linker, a biotin moiety is attached that allows for the affinity purification of the labeled peptides. In the older versions of the ICAT label, the linker was labeled with deuterium.3

Journal of Proteome Research 2005, 4, 578-585

Published on Web 03/04/2005

Unfortunately, this led to a chromatographic shift during peptide separation that severely complicated the quantitation process. In the newer version of ICAT, the heavy deuterium label is replaced by a 13C label in order to prevent chromatographic shifts.4 Isotope labels based on carbon, nitrogen, or oxygen generally do not lead to chromatographic shifts.1 Enzymatic labeling is achieved through enzymatic protein hydrolysis in the presence of heavy water (H218O).5,6 In this case, an 18O label is introduced at the C-terminus of each peptide. Metabolic labeling is achieved by growing an organism in the presence of a stable isotope labeled nitrogen or carbon source such as 15N ammonium sulfate or 13C labeled glucose.7,8 This technique is most applicable to unicellular organisms that can be easily cultured in the lab. However, metabolic labeling of multicellular organisms such as C. elegans, D. melanogaster,9 and R. norvegicus10 was recently achieved. A disadvantage of the metabolic labeling approach compared to chemical and enzymatic labeling is that it is not applicable to human tissue samples. Despite this, metabolic labeling has several advantages over enzymatic and chemical labeling. For example, because the label is introduced as early as possible in the process (at the cell growth stage), experimental variation is reduced. Moreover, the biological activity of the protein is retained. Therefore, metabolically labeled cell extracts can be subjected to activity assays and purification. In addition, we show here that with metabolic labeling, sequence information can be obtained that cannot be obtained using chemical or enzymatic labeling. Although the usefulness of metabolic labeling in protein identification has recently been recognized by other researchers,11 this is the first

10.1021/pr0497733 CCC: $30.25

© 2005 American Chemical Society


report in which a dual procedure based on both nitrogen and carbon labeling is used. For this study, the hyperthermophilic crenarchaeon Sulfolobus solfataricus was chosen as a model organism to demonstrate our approach. This extraordinary microorganism grows optimally at a temperature of 80 °C and a pH of 2.0.12 Under these conditions, cells were grown on normal medium, on medium containing 15N ammonium sulfate, or on medium containing 13C glucose. In this way, three proteomes are created that are identical in composition but differ in mass depending on the medium they were grown in. In the next step, the three cell cultures are mixed in equal amounts to ensure an equal abundance of unlabeled, 15N labeled, and 13C labeled proteins in the mixture (Figure 1). We show that, after protein extraction, protein separation using 2-DE, and tryptic digestion, each peptide becomes visible in the MS spectrum as three distinct peaks (unlabeled, 15N, and 13C versions). For each of these peaks, an MS/MS spectrum is acquired. The important consequence is that it becomes possible to determine unambiguously the elemental carbon and nitrogen compositions of both intact peptides and fragments without the need for highly accurate mass measurements. We demonstrate that this information can be used to apply a CN constraint in database searching and de novo sequencing that improves the confidence of peptide matches. Moreover, the relative abundance of each peptide (unlabeled vs labeled) can be calculated by determining the ratio between their peak areas. This means that, using this method, protein expression can be quantified in a single MS experiment for three different phenotypes. This reduces the experimental time considerably, since the processes of protein extraction, separation, tryptic digestion, and MS analysis only have to be performed once.

Experimental Section

Figure 1. Experimental workflow. A peptide appears as three distinct peaks in the MS spectrum. MS/MS is performed on each of the three peaks. This allows for the calculation of the number of nitrogens and carbons for both the intact peptide and its fragments. Unl. = unlabeled.

Figure 1 provides an overview of the experimental workflow as described in this section.

Cell Growth and Harvest. Sulfolobus solfataricus P2 cells were grown aerobically in 50 mL cultures at a temperature of 80 °C and pH 4.0. The composition of the medium was: (NH4)2SO4, 2.5 g/L; KH2PO4, 3.1 g/L; MgCl2·6H2O, 203.3 mg/L; Ca(NO3)2·4H2O, 70.8 mg/L; FeSO4·7H2O, 2 mg/mL; MnCl2·4H2O, 1.8 mg/L; Na2B4O7·H2O, 4.5 mg/L; ZnSO4·H2O, 0.22 mg/L; CuCl2·2H2O, 0.06 mg/L; Na2MoO4·2H2O, 0.03 mg/L; VOSO4·2H2O, 0.03 mg/L; CoCl2·6H2O, 0.01 mg/L; 25 µL of Wolfe’s vitamins;13 and 0.4% (w/v) glucose. Cell growth was followed by measuring the optical density at 530 nm (OD530) using an Ultrospec 2100 Pro (Amersham Biosciences). In the case of the 15N labeling experiment, the same protocol for cell growth was followed, but (15NH4)2SO4 was used as the nitrogen source instead. In the case of the 13C labeling experiment, normal (14NH4)2SO4 was used but fully labeled 13C glucose was used as the carbon source. To allow the 15N and the 13C label to incorporate fully, cells were incubated with the enriched media for at least eight doubling times. S. solfataricus cells that were incubated with isotopically enriched media showed the same growth characteristics as cells that were incubated with unlabeled medium. After this, three cell cultures were set up in parallel on the three different media (nine flasks). When the optical density reached a value of 1.0, the three different cultures were mixed. To ensure that equal amounts of biomass were mixed, slight corrections in volume were made in case the OD530 was not exactly 1.0. Next, cells were pelleted by centrifugation at 16 060 × g, washed twice with a 10 mM Tris/HCl buffer (pH = 7), and stored at -80 °C.

Preparation of Cell Extracts. Thawed cells were immediately resuspended in 1.5 mL of 10 mM Tris/HCl buffer (pH = 7) containing 25 µL of a protease-inhibitor cocktail (Sigma). Next, cells were disrupted by sonication for 10 min on ice (“Soniprep 150”, Sanyo). Cell walls and unbroken cells were removed by centrifugation at 16 060 × g for 10 min. The protein concentration of the supernatant was determined using the Bradford Protein Assay (Sigma). The supernatant was subsequently stored at -80 °C.

2-DE. The extract was mixed with a rehydration buffer containing 50 mM DTT (Sigma), 8 M urea (Sigma), 2% CHAPS (Sigma), 0.2% (w/v) Pharmalyte ampholytes pH 3-10 (Fluka), and Bromophenol Blue (trace) (Sigma). This mixture was designated as the sample mix. Each IPG strip (pH 3-10) (Bio-Rad) was rehydrated with 300 µL (400 µg of proteins) of the sample mix. Strips were allowed to rehydrate overnight. IEF was performed using a three-step protocol at a temperature of 20 °C using a Protean IEF cell (Bio-Rad). In the first step, the voltage was linearly ramped to 250 V over 30 min to desalt the strips. Next, the voltage was linearly ramped to 1000 V over two and a half hours. Finally, the voltage was rapidly ramped to 10 000 V for 40 000 V·h to complete the focusing. Focused strips were first incubated for 15 min in a solution containing 6 M urea, 2% SDS, 0.375 M Tris-HCl (pH 8.8), 20% glycerol, and 2% (w/v) DTT and then in a solution containing 6 M urea, 2% SDS, 0.375 M Tris-HCl (pH 8.8), 20% glycerol, and 4% iodoacetamide. After equilibration, proteins were separated in the second dimension using SDS-PAGE performed using a Protean II Multicell (Bio-Rad) apparatus on 10% T (%


Figure 2. TOF-MS spectrum of the example peptide IEQGEKPANIVLLR. Three different versions of the peptide are visible: the unlabeled version at m/z 527.38, the 15N version at m/z 534.02, and the 13C version at m/z 550.79. With this information, the number of nitrogen and carbon atoms in this peptide can be calculated. The number of nitrogen atoms follows from the difference in m/z between the unlabeled and the 15N peptide, multiplied by the peptide charge (three in this case): (534.02 - 527.38) × 3 ≈ 20. Likewise, the number of carbon atoms in this peptide is (550.79 - 527.38) × 3 ≈ 70.
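The arithmetic in the Figure 2 caption can be written as a short Python sketch. The caption treats each label as adding a nominal 1 Da per atom; the sketch below instead divides by the standard isotope mass differences (about 0.997 Da for 15N and 1.003 Da for 13C), which is an assumption added here and not part of the original calculation. The function name `atom_count` is hypothetical.

```python
# Mass gained per substituted atom, in Da (standard isotope mass differences).
# Using these instead of a nominal 1 Da is an assumption of this sketch.
DELTA_15N = 0.99703  # 15N minus 14N
DELTA_13C = 1.00335  # 13C minus 12C


def atom_count(mz_unlabeled, mz_labeled, charge, delta):
    """Number of labeled atoms implied by an m/z shift at a given charge state."""
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / delta)


# m/z values read from the Figure 2 caption; the peptide charge is three.
n_nitrogen = atom_count(527.38, 534.02, 3, DELTA_15N)
n_carbon = atom_count(527.38, 550.79, 3, DELTA_13C)
print(n_nitrogen, n_carbon)  # 20 70
```

Rounding to the nearest integer is what makes the result tolerant of modest mass errors: the count stays correct as long as the error in the mass shift is below half the per-atom mass difference.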

acrylamide + % N,N′-methylenebisacrylamide), 2.6% C (% N,N′-methylenebisacrylamide × 100/T) gels (17 cm × 17 cm × 1 mm). Electrophoresis was carried out with a constant current of 16 mA/gel for 30 min; subsequently, the current was increased to 24 mA/gel for another 7 h. Gels were stained with Coomassie Brilliant Blue G250 (Sigma). Gels were scanned using a GS-800 densitometer (Bio-Rad) at 100 micron resolution, and spot detection was performed with PDQUEST 7.1.0 (Bio-Rad). Approximately 500 spots were visualized using Coomassie blue G250 from triplicate gels.

Protein Isolation and Identification by MS. Spots of interest determined from a previously composed 2-DE map of S. solfataricus (data not shown) were excised from the gel with a scalpel and destained with 200 mM ammonium bicarbonate with 40% acetonitrile. The gel pieces were incubated overnight in a trypsin solution of 0.4 µg trypsin (Sigma) and 50 µL of 40 mM ammonium bicarbonate in 9% acetonitrile. The next day, peptides were extracted in four sequential extraction steps using 5 µL of 25 mM NH4HCO3 (10 min, room temperature), 30 µL acetonitrile (15 min, 37 °C), 50 µL of 5% formic acid (15 min, 37 °C), and finally 30 µL acetonitrile (15 min, 37 °C). All extracts were pooled and dried in a vacuum centrifuge. The lyophilised peptide mixture was resuspended in 0.1% formic acid (FA) in 3% acetonitrile (ACN). This mixture was separated on a PepMap C-18 RP capillary column (LC Packings, Amsterdam, The Netherlands) and eluted in a 30-minute gradient via an LC Packings Ultimate nanoLC directly onto the mass spectrometer. The compositions of the hydrophilic and hydrophobic solvents were 5% ACN, 0.1% FA and 95% ACN, 0.1% FA, respectively. An Applied Biosystems QStarXL electrospray ionization quadrupole time-of-flight tandem mass spectrometer (ESI qQ-TOF) was used for mass spectrometric analysis. Analyst QS software (Applied Biosystems) was used for data acquisition and data analysis. The data acquisition on the MS was performed in the positive ion mode using Information Dependent

Journal of Proteome Research • Vol. 4, No. 2, 2005

Acquisition (IDA). After each TOF-MS scan, three peaks with charge states two or three were selected for tandem mass spectrometry. Peak areas in the TOF-MS scans were calculated using the LC-MS reconstruct tool in Analyst. IDA data were submitted to Mascot 2.0 for database searching in a sequence query type of search (www.matrixscience.com). The peptide tolerance was set to 2.0 Da and the MS/MS tolerance was set to 0.8 Da. A carbamidomethyl modification of cysteine was set as a fixed modification and methionine oxidation was set as a variable modification. One missed cleavage site by trypsin was allowed. The search was performed against the Mass Spectrometry protein sequence DataBase (MSDB). In total, 313 peptides were identified corresponding to 72 different proteins.
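To make the enzyme setting concrete, the following Python sketch generates tryptic peptides with up to one missed cleavage. It is only an illustration, not the algorithm used by Mascot: the rule "cleave after K or R unless the next residue is P" is the commonly used trypsin convention and is assumed here, and both the function `tryptic_digest` and the toy protein sequence are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=1, min_length=1):
    """Return tryptic peptides of `sequence`, allowing up to `missed_cleavages`.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    """
    # Positions at which the chain is cut (index of the first residue after the cut).
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(bounds):
                break
            peptide = sequence[bounds[start]:bounds[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Hypothetical toy sequence, chosen only to show both rules at work.
toy_protein = "AAAGKAGRPQGR"
print(tryptic_digest(toy_protein, missed_cleavages=0))  # ['AAAGK', 'AGRPQGR']
print(tryptic_digest(toy_protein, missed_cleavages=1))
# ['AAAGK', 'AAAGKAGRPQGR', 'AGRPQGR']
```

In the toy sequence, the R followed by P is not cleaved, and allowing one missed cleavage adds the peptide that spans the internal K.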

Results and Discussion

TOF-MS Spectra. Figure 2 shows an example TOF-MS spectrum in which three series of peaks are visible. Mascot assigned a score of 40 to a peptide with the sequence IEQGEKPANIVLLR on the basis of the MS/MS spectrum of the peak at m/z 527.38 (corresponding to a phosphoglycerate mutase from Sulfolobus solfataricus). The elemental composition of this peptide is C70H122N20O21. On the basis of this composition, the 15N and 13C “mono-isotopic peaks” are 20 and 70 Da heavier, respectively. Taking into account that the charge of the peptide is three, the 15N and 13C peptides should appear at m/z 534.03 and 550.7, respectively. This corresponds to the two remaining series of peaks in the mass spectrum and therefore provides evidence that they represent the 15N and 13C versions of the same peptide (additional evidence is provided by their MS/MS spectra in Figure 5, discussed later). However, in the isotope series of the 15N and 13C peptides, peaks are observed that are lighter than the “mono-isotopic” peak. This is due to small quantities of “light” 14N and 12C that are always present in the media. Because of this, the labeling efficiency will never reach 100%. The result is that a peak is observed at


Figure 3. Labeling efficiency. (A) 15N labeling efficiency. (B) 13C labeling efficiency. The sum of the squared differences between theoretical and experimental areas was plotted against labeling efficiencies between 97 and 100%. The experimental labeling efficiency was calculated by determining the minimum in each plot. The labeling efficiency was 98.8% ± 0.2% for the 15N labeled peptides and 98.6% ± 0.2% for the 13C labeled peptides.
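The least-squares search summarized in Figure 3 can be sketched in Python as follows. This is a simplified stand-in for the IsoPro 3.0 calculation, not a reproduction of it: it models only the peaks at 0, -1, -2, ... Da from the fully labeled peak as a binomial distribution of residual light atoms and ignores the natural isotope abundances of all other elements. The function names `label_pattern` and `fit_efficiency` are hypothetical.

```python
from math import comb


def label_pattern(n_atoms, efficiency, n_peaks=4):
    """Relative areas of the peaks at 0, -1, -2, ... Da from the fully labeled peak.

    Peak k is the fraction of molecules in which k of the n_atoms labeled
    positions still hold the light isotope (simplified binomial model).
    """
    p_light = 1.0 - efficiency
    areas = [comb(n_atoms, k) * p_light ** k * efficiency ** (n_atoms - k)
             for k in range(n_peaks)]
    total = sum(areas)
    return [area / total for area in areas]


def fit_efficiency(observed, n_atoms, lo=0.970, hi=1.000, step=0.001):
    """Grid search for the efficiency that minimizes the sum of squared errors."""
    n_steps = round((hi - lo) / step)
    best_eff, best_sse = None, None
    for i in range(n_steps + 1):
        eff = round(lo + i * step, 6)
        model = label_pattern(n_atoms, eff, len(observed))
        sse = sum((m - o) ** 2 for m, o in zip(model, observed))
        if best_sse is None or sse < best_sse:
            best_eff, best_sse = eff, sse
    return best_eff


# Synthetic check: a pattern generated at 98.8% for 20 nitrogens is recovered.
observed = label_pattern(20, 0.988)
print(fit_efficiency(observed, 20))  # 0.988
```

With measured data, `observed` would hold the normalized experimental peak areas in the same order, and the minimum of the error curve plays the role of the minima shown in Figure 3.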

Figure 4. Carbon vs nitrogen plot for all theoretical tryptic peptides in S. solfataricus that have a mass of 1579.12 ± 1.0 Da. A number next to a data point indicates how many times that particular combination was found. In this example, 30 different combinations of carbon and nitrogen were found. Four of those combinations occurred twice.
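A CN constraint of this kind amounts to counting carbon and nitrogen atoms per candidate sequence and keeping only the candidates that match the measured counts. The Python sketch below illustrates this with the four Mascot candidates discussed in the text for the parent ion at m/z 572.41; the fourth candidate is written VIGAIIDNIGR, the spelling given in Table 2. The per-residue counts are standard values for unmodified residues (a carbamidomethylated cysteine would need its own entry), and the function names are hypothetical.

```python
# (carbon, nitrogen) atoms per unmodified amino acid residue.
RESIDUE_CN = {
    "G": (2, 1), "A": (3, 1), "S": (3, 1), "P": (5, 1), "V": (5, 1),
    "T": (4, 1), "C": (3, 1), "L": (6, 1), "I": (6, 1), "N": (4, 2),
    "D": (4, 1), "Q": (5, 2), "K": (6, 2), "E": (5, 1), "M": (5, 1),
    "H": (6, 3), "F": (9, 1), "R": (6, 4), "Y": (9, 1), "W": (11, 2),
}


def cn_composition(sequence):
    """Total (carbon, nitrogen) count of a peptide; the termini add neither."""
    carbons = sum(RESIDUE_CN[aa][0] for aa in sequence)
    nitrogens = sum(RESIDUE_CN[aa][1] for aa in sequence)
    return carbons, nitrogens


def apply_cn_constraint(candidates, carbons, nitrogens):
    """Keep only candidates whose CN composition equals the measured one."""
    return [seq for seq in candidates
            if cn_composition(seq) == (carbons, nitrogens)]


candidates = ["ANGLDLPEGVR", "AILGDDVLIGR", "AAAGKAGRPQGR", "VIGAIIDNIGR"]
print(cn_composition("IEQGEKPANIVLLR"))         # (70, 20)
print(apply_cn_constraint(candidates, 50, 15))  # ['VIGAIIDNIGR']
```

Filtering on carbon alone (50) or nitrogen alone (15) would each leave two of these candidates, so only the combined constraint yields a single match.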

-1 Da from the 15N-mono-isotopic peak. This peak mainly consists of peptides that contain one light 14N atom. A similar situation occurs for the 13C peak. In this case, abundant peaks at -1, -2, and -3 Da from the 13C-mono-isotopic peak are also observed (Figure 2). This is because a peptide always contains more carbon than nitrogen atoms, and therefore the chance that a “light” atom is incorporated into the 13C version of the peptide is greater than for the 15N version of the peptide. Another consequence of incomplete labeling is that the labeled “mono-isotopic” peaks do not represent true monoisotopic peaks, since these peaks do not consist solely of one isotope form for each element. For example, the mass of a peptide that contains one 14N atom and one 13C atom equals the mass of the “mono-isotopic” 15N peak. The correct assignment of the labeled “mono-isotopic” peaks is essential for the correct determination of the number of nitrogen and carbon atoms. The exact labeling efficiency can be calculated on the basis of the relative areas of each peak in the isotope distribution pattern. This was done for the example peptide and two other peptides using IsoPro 3.0 software (http://members.aol.com/msmssoft/). With IsoPro 3.0, theoretical mass spectra can be calculated. Moreover, this software allows for the manipulation of isotope abundances and calculates the theoretical relative abundances of each peak in an isotopic

series. First, the theoretical areas of each peak in the isotope series were calculated for incorporation between 97 and 100%. Then, the errors between the theoretical and experimental areas were determined. Finally, the squares of the errors were summed and the labeling efficiency was calculated by determining the minimum (Figure 3). The labeling efficiency was 98.8% ± 0.2% for the 15N-labeled peptides and 98.6% ± 0.2% for the 13C-labeled peptides. With these values, theoretical isotope distribution patterns of both intact peptides and fragments can be calculated. An example is given in Figure 6 for the y fragment GEKPANIVLLR. Next, the CN composition was calculated experimentally and theoretically for a set of 100 peptides, and it was found that the calculations were always in agreement. Moreover, it was found that for tryptic peptides and fragments, the monoisotopic peak generally was the most intense peak in the isotope series. This simplified the manual inspection of the spectra (Figures 2, 5, and 6).

The CN composition can now be used as a constraint in database searching, since it is now possible to calculate the CN composition of each intact peptide unambiguously. For example, Mascot retrieved four candidate peptides with scores >30 for an MS/MS spectrum corresponding to a peak with a parent m/z of 572.41. The CN composition for this peptide was calculated on the basis of the TOF-MS spectrum as C50N15. Next, the theoretical CN compositions of the four matches were calculated: C48N15 for ANGLDLPEGVR, C50N14 for AILGDDVLIGR, C46N20 for AAAGKAGRPQGR, and C50N15 for VIGAIIDNIGR. If the CN constraint of C50N15 is applied, only the latter peptide remains as a match. In this case, this is most likely the correct peptide, since it matches 2,3-bisphosphoglycerate mutase from S. solfataricus. Note that this peptide could only be assigned on the basis of the combined CN constraint; applying only a C constraint or only an N constraint did not result in a single match. The tool that is now available allows for a very accurate calculation of the number of carbons and nitrogens in a peptide without the requirement of ultrahigh quality mass spectra (another method to estimate the elemental composition is to analyze the relative abundance of the isotope peaks, but it requires highly accurate and resolved mass spectra that are difficult to obtain for real samples). To examine the usefulness of applying a CN constraint, an in silico digestion of the complete S. solfataricus proteome was performed and the elemental compositions were calculated


Figure 5. MS/MS spectra of the example peptide IEQGEKPANIVLLR. Top, middle and bottom spectra represent the MS/MS fragmentation spectra corresponding to the unlabeled, 15N and 13C versions of this peptide. With this information the number of nitrogen atoms and carbon atoms can be calculated for each of the fragments. For example the fragment at m/z 215.14 is shifted to m/z 217.14 and to m/z 225.17 in the 15N and 13C spectra, respectively. This means that this fragment contains 2 nitrogen atoms and 10 carbon atoms. * Fragments with two charges. ** Intact peptide.
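The same shift arithmetic applies to fragments, and the resulting CN composition can then be matched against residue compositions. The Python sketch below is an illustration under stated assumptions: it uses the standard isotope mass differences rather than a nominal 1 Da per atom, the per-residue counts are for unmodified residues, and the names `fragment_cn` and `residues_with_cn` are hypothetical. The m/z triplets are the ones quoted in the Figure 5 caption and in the text for the fragment at m/z 175.1.

```python
DELTA_15N = 0.99703  # 15N minus 14N, in Da
DELTA_13C = 1.00335  # 13C minus 12C, in Da

# (carbon, nitrogen) atoms per unmodified amino acid residue.
RESIDUE_CN = {
    "G": (2, 1), "A": (3, 1), "S": (3, 1), "P": (5, 1), "V": (5, 1),
    "T": (4, 1), "C": (3, 1), "L": (6, 1), "I": (6, 1), "N": (4, 2),
    "D": (4, 1), "Q": (5, 2), "K": (6, 2), "E": (5, 1), "M": (5, 1),
    "H": (6, 3), "F": (9, 1), "R": (6, 4), "Y": (9, 1), "W": (11, 2),
}


def fragment_cn(mz_unl, mz_15n, mz_13c, charge=1):
    """(carbon, nitrogen) count of a fragment from its three m/z values."""
    carbons = round((mz_13c - mz_unl) * charge / DELTA_13C)
    nitrogens = round((mz_15n - mz_unl) * charge / DELTA_15N)
    return carbons, nitrogens


def residues_with_cn(carbons, nitrogens):
    """Residues whose CN composition equals the given counts."""
    return sorted(aa for aa, cn in RESIDUE_CN.items() if cn == (carbons, nitrogens))


print(fragment_cn(215.14, 217.14, 225.17))  # (10, 2)
y1 = fragment_cn(175.1, 179.1, 181.2)
print(y1, residues_with_cn(*y1))            # (6, 4) ['R']
```

A y ion differs from the sum of its residues only by water and a proton, neither of which contains carbon or nitrogen, so the difference in CN counts between two consecutive y ions is the CN composition of the added residue. A step of C6N1, for instance, narrows the choice to `residues_with_cn(6, 1)`, that is, isoleucine or leucine.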

using Proteogest software.14 In total, 76 949 tryptic peptides were obtained from 2995 ORFs. Of these, 55 852 tryptic peptides had a unique sequence (the minimum peptide length was set to three residues). Next, these peptide sequences, masses, and carbon and nitrogen compositions were stored in a database that was used to investigate the applicability of the CN constraint for the example peptide IEQGEKPANIVLLR. The experimentally determined mass of this example peptide was 1579.12 Da. When a mass tolerance of 1.0 Da is applied, 34 candidate tryptic peptides are retrieved from the database. In Figure 4, the carbon and nitrogen compositions of each of those 34 peptides are plotted against each other. In total, 30 different combinations of carbon and nitrogen were found. Only four of them occurred twice (labeled as “2” in Figure 4). The combination of 70 carbon atoms and 20 nitrogen atoms is unique. From this it is clear that the example peptide IEQGEKPANIVLLR can be uniquely identified from a database containing all theoretical tryptic peptides for S. solfataricus using a mass tolerance of 1 Da and applying a CN constraint.

MS/MS Spectra. Despite the fact that the number of sequenced genomes is steadily increasing, there is a great need for proteomic tools that can be used for de novo sequencing. In de novo sequencing, a peptide is fragmented and the amino acid sequence can be derived from the fragmentation pattern.15,16 In the past, a number of methods have been developed that facilitate de novo sequencing.17,18 Usually, the peptide is specifically derivatized at the N or C terminus with a fragmentation directed reagent in order to identify fragments belonging to the y or b series.19,20 Researchers have also experimented with isotopically labeled derivatives that allow for quantitation of peptides on the basis of their relative abundance in the MS


spectra.17 Unfortunately, there are some disadvantages to the derivatization techniques. In general, they are laborious and require high labeling efficiencies, especially when quantitation is desired. Here, we demonstrate a method for facilitated de novo sequencing that is based on in vivo metabolic labeling rather than on chemical derivatization. It relies on the ability to determine the CN composition of peptide fragments and therefore to apply a CN constraint. Figure 5 shows the MS/MS spectra of the peaks at m/z 527.38, 534.02, and 550.79 from Figure 2. From this, it is clear that although corresponding 14N, 15N, and 13C fragments have different m/z values, the fragmentation pattern is not affected by the introduction of stable isotopes. It is now possible to calculate the CN composition for each fragment. For example, the fragment at m/z 175.1 is shifted by 4 Da to 179.1 in the 15N version and by 6 Da to 181.2 in the 13C version. This means that this fragment is composed of 4 nitrogen atoms and 6 carbon atoms, which corresponds exactly to the composition of the amino acid arginine. Next, the peptide can be confidently de novo sequenced via its y-series starting at the 175.1 arginine peak. Each additional amino acid must comply with the CN constraint as determined from the fragmentation pattern. In fact, the CN constraint can be applied to any fragment generated (e.g., b-series or immonium ions, Figure 5). In the example peptide IEQGEKPANIVLLR, all 14 fragments in the y-series were found by manual inspection of the spectrum, and the CN constraint was successfully applied in all cases (Figures 5 and 6 and Table 1). Figure 6 shows the isotope distribution pattern of the unlabeled, 15N, and 13C versions of the y-fragment GEKPANIVLLR. This clearly has a strong implication for validation of de novo sequencing efforts, since we have already shown the utility of this approach for intact peptides.

Table 1. Fragment Ion Sequences Corresponding to the Example Peptide IEQGEKPANIVLLR (a)

sequence   y         y++      C    N    confirm C   confirm N
I                             70   20   yes         yes
E          1466.83   733.92   64   19   yes         yes
Q          1337.79   669.4    59   18   yes         yes
G          1209.67   605.37   54   16   yes         yes
E          1152.71   576.86   52   15   yes         yes
K          1023.67   512.34   47   14   yes         yes
P          895.57    448.29   41   12   yes         yes
A          798.52    399.76   36   11   yes         yes
N          727.48    364.24   33   10   yes         yes
I          613.44    307.22   29   8    yes         yes
V          500.36    250.68   23   7    yes         yes
L          401.29    201.15   18   6    yes         yes
L          288.2     144.61   12   5    yes         yes
R          175.12    88.06    6    4    yes         yes

(a) Masses in bold indicate fragments that were found in the MS/MS spectrum of the unlabeled peptide from Figure 5. For each fragment, the correct number of carbon and nitrogen atoms was determined using the MS/MS spectra of the 15N labeled and 13C labeled peptides from Figure 5.

Figure 6. Experimental (left) and theoretical (right) spectra of the y-ion GEKPANIVLLR (z = 2). (A) Unlabeled. (B) 15N labeled. (C) 13C labeled. The experimental spectra correspond to zoom regions in the MS/MS spectra depicted in Figure 5. For the calculation of the theoretical spectra, labeling efficiencies of 98.8% for the 15N and 98.6% for the 13C peptides were used in IsoPro 3.0. Without any prior sequence or elemental composition information, the CN composition could already be calculated, since for tryptic peptides of this size, the most intense peak in each spectrum represents the mono-isotopic peak. C = (632.51 - 605.43) × 2 ≈ 54, N = (613.40 - 605.43) × 2 ≈ 16. This corresponds exactly to the elemental composition of the fragment GEKPANIVLLR (C54H96N16O15).

Moreover, with this approach it is easy to distinguish between a number of amino acid combinations that are very hard to distinguish on the basis of their mass. Examples of this are (I) PH (C11N4) and AY (C12N2) and (II) PF (C14N2) and LM (C11N2).

Peptide Quantitation. Quantitation in stable isotope labeling experiments occurs on the MS by comparing the relative abundances of the heavy and light version of the same peptide. Whereas protein quantitation in 2-DE based proteomics depends on protein resolution on a gel, accurate peptide

quantitation on the basis of stable isotope labeling depends on resolution of both the heavy and light peptide in mass (in the case of MALDI) or in mass and time (in the case of LC-ESI-MS/MS). A disadvantage of the stable isotope labeling technique is that the sample becomes considerably more complex by the introduction of heavy and light forms of a peptide. With regard to the MS analysis, a mixture of 14N, 15N, and 13C containing peptides will be three times as complex as an unlabeled digest. Whether a peptide will be resolved in the mass spectrum strongly depends on the number of pre-MS fractionation steps that were applied. Here, we used a 2-DE protein separation step followed by a peptide separation step on a C-18 reversed phase column. With this approach, a sufficient resolution of the peptides in the MS spectra was achieved even if multiple proteins per 2-DE spot were found. This illustrates the superiority of this approach over gel-based densitometric quantitation, even when three different phenotypes are simultaneously quantified. To integrate peak areas over time in the TOF-MS spectra, the LC-MS reconstruction tool in the Analyst software program was applied. In addition, an extracted ion chromatogram (XIC) was constructed for each peptide. The XIC is an ion chromatogram that shows the intensity values of a single mass (peptide) over a range of scans. This tool was used to check for chromatographic shifts between heavy and light versions of the same peptide. In each case, the XICs overlaid perfectly for the three different versions of the peptide, confirming that they coelute. Therefore, it can be concluded that the introduction of stable isotopes did not cause a chromatographic shift. The relative abundances of the example peptide from Figure 2 were as follows: 15N/unlabeled = 0.94 and 13C/unlabeled = 0.86. Since multiple peptides corresponding to the same protein (phosphoglycerate mutase from S. solfataricus) were found, the accuracy of the quantitation can be improved by averaging the different ratios. For this protein, the average ratios and standard deviations were 15N/unlabeled = 0.94 ± 0.028 and 13C/unlabeled = 0.83 ± 0.043. The standard deviations in this case are a measure of the experimental error, since the peptide composition of a protein is static. Table 2 summarizes the quantitation performed on five different proteins. For each protein the standard deviation was less than 10%, indicating the accuracy of the method (on average 4.7% for the 15N/unlabeled and 5.0% for the 13C/unlabeled). This compares


Table 2. Peptide Quantitation (a)

peptide                   unl. area   15N area   13C area   15N/unl.   13C/unl.

Hypothetical protein SSO1393
YLDWLIR                   10.52       9.51       8.02       0.90       0.76
ASAELDSLFSTFEK            25.94       24.11      21.92      0.93       0.85
LLGIYLPIGAQNK             21.51       20.89      17.94      0.97       0.83
VEVPDRVYFLGL              10.52       9.51       8.02       0.94       0.86
av (b)                                                      0.94       0.83
stdv (b)                                                    0.028      0.043

Proliferating cell nuclear antigen putative homologue (PCNA-like)
YVAFLMK + Oxidation M     16.57       17.06      15.34      1.03       0.93
GQVEQLTEPK                25.72       27.26      22.04      1.06       0.86
AEKGQVEQLTEPK             46.59       50.45      45.8       1.08       0.98
ATIELTETDSGLK             13.58       16.23      14.54      1.20       1.07
av (b)                                                      1.09       0.96
stdv (b)                                                    0.072      0.090

Serine-pyruvate aminotransferase (agxT)
AVEEVLFSAR                18.38       18.2       16.31      0.99       0.89
EFVEALAYSLK               13.86       14.07      12.03      1.02       0.87
ALGSAAGLGLLLLSPK          21.84       22.86      18.99      1.05       0.87
RPESYSNTVTGVILK           16.35       16.98      14.37      1.04       0.88
av (b)                                                      1.02       0.88
stdv (b)                                                    0.025      0.009

Hypothetical protein metE-2
DLVFDLAK                  23.5        26.89      21.18      1.14       0.90
IIQIDEPALHTR              160.4       167.42     144.46     1.04       0.90
IYNYKPLELLK               112.42      110.65     87.86      0.98       0.78
RDEMVEFFAER               33.71       38.38      30.51      1.14       0.91
av (b)                                                      1.08       0.87
stdv (b)                                                    0.077      0.060

Probable 2,3-bisphosphoglycerate-independent phosphoglycerate mutase
AAAVSATALIK               10.52       9.51       8.02       0.90       0.76
LPPFSSYTK                 25.94       24.11      21.92      0.93       0.85
VIGAIIDNIGR               21.51       20.89      17.94      0.97       0.83
IEQGEKPANIVLLR            15.71       14.77      13.52      0.94       0.86
av (b)                                                      0.94       0.83
stdv (b)                                                    0.028      0.043

(a) The relative abundance of each peptide was calculated by determining the ratio of their peak areas. Areas of at least four different peptides per protein were calculated. Average ratios and standard deviations are shown in order to illustrate the accuracy of the approach. Unl. = unlabeled. (b) Where av = average and stdv = standard deviation.

well with standard deviations found using ICAT.21 For the five proteins from Table 2 the average ratios and standard deviations were: 15N/unlabeled = 1.01 ± 0.083 and 13C/unlabeled = 0.87 ± 0.071. Deviation from the expected value of 1.0 can be explained by minor changes in proteome composition caused by biological variation between duplicate samples. This indicates that our method is sensitive enough to detect variation between biological replicates. From this, it is clear that with this method, for the first time, relative peptide abundances of three different phenotypes can be determined in a single MS experiment using an in vivo metabolic labeling approach.
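The per-protein averaging described above can be illustrated with a minimal Python sketch. It uses the peak areas listed in Table 2 for the PCNA-like protein; the function name `protein_ratios` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Peak areas (unlabeled, 15N, 13C) for the PCNA-like protein, copied from Table 2.
PCNA_AREAS = {
    "YVAFLMK + Oxidation M": (16.57, 17.06, 15.34),
    "GQVEQLTEPK": (25.72, 27.26, 22.04),
    "AEKGQVEQLTEPK": (46.59, 50.45, 45.80),
    "ATIELTETDSGLK": (13.58, 16.23, 14.54),
}


def protein_ratios(areas):
    """Return ((mean, sd) of 15N/unl., (mean, sd) of 13C/unl.) over all peptides.

    Assumption: sd is the sample standard deviation (n - 1 denominator).
    """
    n15 = [a15 / unl for unl, a15, _ in areas.values()]
    c13 = [a13 / unl for unl, _, a13 in areas.values()]
    return (mean(n15), stdev(n15)), (mean(c13), stdev(c13))


(n_mean, n_sd), (c_mean, c_sd) = protein_ratios(PCNA_AREAS)
print(f"15N/unl. = {n_mean:.2f} ± {n_sd:.3f}")  # 15N/unl. = 1.09 ± 0.072
print(f"13C/unl. = {c_mean:.2f} ± {c_sd:.3f}")
```

Under this assumption the 15N/unlabeled result agrees with the av and stdv rows of Table 2 for this protein; small differences in the last digit of other entries can arise from rounding of the tabulated areas.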

Concluding Remarks

Mass spectrometry has become the primary tool for protein identification and is becoming increasingly important in protein quantitation.22 In vivo labeling through metabolic incorporation of either 15N or 13C has been applied for accurate peptide quantitation in a number of unicellular and multicellular organisms.8,9,10,23 However, its usefulness in peptide sequencing has not been fully exploited. Here, we showed that it is possible to calculate the carbon and nitrogen composition of each peptide and fragment by using both 15N and 13C labeling. With a few examples, we demonstrated that this information can be used as a constraint in database searching and de novo peptide sequencing. The method is applicable to any organism that can be grown in the lab, and avoids problems that are associated with chemical and enzymatic labeling. However, it requires that peptides are reasonably well separated in the MS spectra, since the introduction of stable isotope labeled peptides adds an extra degree of complexity. We have shown that a 2-DE separation in combination with a C-18 reversed-phase peptide separation provides sufficient resolution. An alternative approach is to apply a sufficient number of orthogonal LC-based separations.

The most obvious application of the method, however, will probably not be a shotgun analysis of an organism's whole proteome but rather the selected quantitation and identification of (partially) purified proteins, obtained by either electrophoresis or any other conventional purification technique. It will be particularly powerful in cases where a protein from a mixture of unlabeled, 15N and 13C labeled proteins is purified on the basis of its biological activity. In this case, the activity will not be compromised, because all three forms are biologically active, while at the MS stage valuable sequence information can be derived using the methods described in this paper. Moreover, the activity of a protein in each of the separate forms can be connected to an expression level, thereby providing a very powerful tool to connect biological activity to protein sequence. Particularly in cases where novel biological activity is found in unsequenced organisms, the de novo sequencing approach can lead to the discovery of novel proteins that do not necessarily have homology to proteins in existing databases.


A challenge for the future is to develop software packages that can fully extract both the quantitative and the qualitative information present in the thousands of MS and MS/MS spectra that are currently generated at high speed. In the past decade, a number of software packages have been developed that are able to assist or automate the interpretation of mass spectra.24,25 To fully exploit our technique, additional software is required that can apply CN constraints in database searches for intact peptides and peptide fragments, and that allows for simultaneous quantitation.
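As an illustration of what applying a CN constraint could look like, the following Python sketch derives carbon and nitrogen counts from the m/z shifts between the unlabeled, 15N and 13C peaks and keeps only candidate sequences with that composition. It is a sketch under stated assumptions, not the software envisaged above: the function names are hypothetical, labeling is assumed to be complete, monoisotopic peaks are assumed to have been picked, peptides are assumed to be unmodified, and the m/z values in the usage example are illustrative rather than measured.

```python
# Isotope mass differences in Da (13C - 12C and 15N - 14N).
MASS_SHIFT_13C = 1.0033548
MASS_SHIFT_15N = 0.9970349

# (carbon, nitrogen) atoms per unmodified amino acid residue.
RESIDUE_CN = {
    "G": (2, 1), "A": (3, 1), "S": (3, 1), "P": (5, 1), "V": (5, 1),
    "T": (4, 1), "C": (3, 1), "L": (6, 1), "I": (6, 1), "N": (4, 2),
    "D": (4, 1), "Q": (5, 2), "K": (6, 2), "E": (5, 1), "M": (5, 1),
    "H": (6, 3), "F": (9, 1), "R": (6, 4), "Y": (9, 1), "W": (11, 2),
}


def counts_from_shifts(mz_unl, mz_15n, mz_13c, charge):
    """Estimate (C, N) atom counts from monoisotopic m/z values.

    Assumes complete labeling, so each N adds one 15N shift and
    each C adds one 13C shift to the neutral mass.
    """
    n_count = round((mz_15n - mz_unl) * charge / MASS_SHIFT_15N)
    c_count = round((mz_13c - mz_unl) * charge / MASS_SHIFT_13C)
    return c_count, n_count


def composition(sequence):
    """Return (C, N) atom counts of an unmodified peptide sequence."""
    c = sum(RESIDUE_CN[aa][0] for aa in sequence)
    n = sum(RESIDUE_CN[aa][1] for aa in sequence)
    return c, n


def filter_candidates(candidates, c_count, n_count):
    """Keep only candidates whose composition matches the CN constraint."""
    return [s for s in candidates if composition(s) == (c_count, n_count)]


# Illustrative (not measured) doubly charged m/z values.
c_count, n_count = counts_from_shifts(489.77, 495.254, 513.851, charge=2)
candidates = ["YLDWLIR", "DLVFDLAK", "AVEEVLFSAR"]
print(c_count, n_count)                                 # 48 11
print(filter_candidates(candidates, c_count, n_count))  # ['YLDWLIR']
```

Sequences that differ only in isomeric residues (for example L and I) share the same C and N counts, so a composition filter of this kind narrows the candidate list but cannot distinguish such cases on its own.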

Acknowledgment. We thank the University of Sheffield and United Kingdom's SRIF Infrastructure funds for support. A.P.L.S. thanks the University of Sheffield and the EPSRC for a scholarship. M.G.J.V. thanks the European Union's ERASMUS Program. P.C.W. thanks the United Kingdom's Engineering and Physical Sciences Research Council (EPSRC) for provision of an Advanced Research Fellowship (GR/A11311/01).

References

(1) Goshe, M. B.; Smith, R. D. Curr. Opin. Biotechnol. 2003, 14, 101-109.
(2) Julka, S.; Regnier, F. J. Proteome Res. 2004, 3, 350-363.
(3) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(4) Hansen, K. C.; Schmitt-Ulms, G.; Chalkley, R. J.; Hirsch, J.; Baldwin, M. A.; Burlingame, A. L. Mol. Cell. Proteomics 2003, 2, 299-314.
(5) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P. A.; Fenselau, C. Anal. Chem. 2001, 73, 2836-2842.
(6) Staes, A.; Demol, H.; Van Damme, J.; Martens, L.; Vandekerckhove, J.; Gevaert, K. J. Proteome Res. 2004, 3, 786-791.
(7) Washburn, M. P.; Ulaszek, R.; Deciu, C.; Schieltz, D. M.; Yates, J. R., 3rd. Anal. Chem. 2002, 74, 1650-1657.
(8) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6596.
(9) Krijgsveld, J.; Ketting, R. F.; Mahmoudi, T.; Johansen, J.; Artal-Sanz, M.; Verrijzer, C. P.; Plasterk, R. H.; Heck, A. J. Nat. Biotechnol. 2003, 21, 927-931.
(10) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Matthews, D. E.; Yates, J. R., 3rd. Anal. Chem. 2004, 76, 4951-4959.
(11) Zhong, H.; Marcus, S. L.; Li, L. J. Proteome Res. 2004, 3, 1155-1163.
(12) Brock, D. T. Arch. Mikrobiol. 1972, 84, 54-68.
(13) Atlas, R. M. Handbook of Microbiological Media; CRC Press: New York, 1997.
(14) Cagney, G.; Amiri, S.; Premawaradena, T.; Lindo, M.; Emili, A. Proteome Sci. 2003, 1, 5.
(15) Mann, M.; Wilm, M. Anal. Chem. 1994, 66, 4390-4399.
(16) Hunt, D. F.; Yates, J. R., 3rd; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-6237.
(17) Munchbach, M.; Quadroni, M.; Miotto, G.; James, P. Anal. Chem. 2000, 72, 4047-4057.
(18) Lee, Y. H.; Kim, M. S.; Choie, W. S.; Min, H. K.; Lee, S. W. Proteomics 2004, 4, 1684-1694.
(19) Hunt, D. F.; Buko, A. M.; Ballard, J. M.; Shabanowitz, J.; Giordani, A. B. Biomed. Mass Spectrom. 1981, 8, 397-408.
(20) Spengler, B.; Lützenkirchen, F.; Metzger, S.; Chaurand, P.; Kaufmann, R.; Jeffery, W.; Bartlet-Jones, M.; Pappin, D. J. C. Int. J. Mass Spectrom. 1997, 127-140.
(21) Yu, L. R.; Conrads, T. P.; Uo, T.; Issaq, H. J.; Morrison, R. S.; Veenstra, T. D. J. Proteome Res. 2004, 3, 469-477.
(22) Aebersold, R.; Mann, M. Nature 2003, 422, 198-207.
(23) Sechi, S.; Oda, Y. Curr. Opin. Chem. Biol. 2003, 7, 70-77.
(24) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., 3rd. Anal. Chem. 2003, 75, 6912-6921.
(25) Li, X. J.; Zhang, H.; Ranish, J. A.; Aebersold, R. Anal. Chem. 2003, 75, 6648-6657.

PR0497733

Journal of Proteome Research • Vol. 4, No. 2, 2005 585