Anal. Chem. 2001, 73, 3742-3750

Inverse 18O Labeling Mass Spectrometry for the Rapid Identification of Marker/Target Proteins

Y. Karen Wang,* Zhixiang Ma, Douglas F. Quinn, and Emil W. Fu

Core Technologies Area, Discovery Research, Novartis Pharmaceuticals Corporation, 556 Morris Avenue, Summit, New Jersey 07901

Systematic analysis of proteins is essential in understanding human diseases and their clinical treatments. To achieve the rapid and unambiguous identification of marker or target proteins, a new procedure termed “inverse labeling” is proposed. With this procedure, to evaluate protein expression of a diseased or a drug-treated sample in comparison with a control sample, two converse labeling experiments are performed in parallel. The perturbed sample (by disease or by drug treatment) is labeled in one experiment, whereas the control is labeled in the second experiment. When each labeled sample is mixed and analyzed with its unlabeled counterpart for differential comparison using mass spectrometry, a characteristic inverse labeling pattern of mass shift will be observed between the two parallel analyses for proteins that are differentially expressed. In this study, protein labeling is achieved through 18O incorporation into peptides by proteolysis performed in [18O]water. Once the peptides are identified with the characteristic inverse labeling pattern of 18O/16O ion intensity shift, MS data of peptide fingerprints or peptide sequence information can be used to search a protein database for protein identification. The methodology has been applied successfully to two model systems in this study. It permits quick focus on the signals of differentially expressed proteins. It eliminates the detection ambiguities that the limited dynamic range of detection causes for proteins with extreme changes in expression. It enables the detection of protein modifications responding to perturbation. This strategy can also be extended to other protein-labeling methods, such as chemical or metabolic labeling, to realize the same benefits.

It has been well established that most disease processes and disease treatments are manifest at the protein level. The mechanisms of action for most pharmaceuticals on the market are indeed mediated through proteins. Comparative analysis of protein profiles from normal and disease states, with or without drug treatment, can facilitate the systematic study of proteins involved in any biological system or disease, revealing new insights into disease mechanisms, identifying new targets, providing information on drug action mechanisms and toxicity, and identifying surrogate markers.1-4 It is believed that proteomic studies will lead to important new insights into disease mechanisms and to improved strategies for the discovery of novel therapeutics.

The most common technology platform for proteomic studies to date is the integrated use of two-dimensional (2D) gel electrophoresis and mass spectrometry.5 Protein mixtures derived from cells or tissues of normal or disease states are separated on 2D PAGE and visualized by staining. Quantitative comparisons of proteins can be made after the images of the displayed proteins are analyzed. The protein spots that are unique or differentially expressed are then identified. Following excision of the spots and in situ digestion, a variety of mass spectrometric techniques can be used to obtain peptide fingerprint and peptide sequence information, which can then be used to search a sequence database to identify the proteins. As these proteins are disease-specific, each could potentially become a new target for drug discovery or be used as a disease marker.

Although major improvements have been made in the reproducibility, resolution, and sensitivity of 2D PAGE for protein profiling, the technique still has a number of shortcomings. The chief shortcoming is its inability to display all protein components, such as membrane proteins, proteins with extreme pIs, and proteins of low copy number. Inadequate resolving power is another pitfall. As many as 40% of all gel spots may contain more than one protein, which makes quantitative comparison of protein expression and interpretation of experiments extremely difficult. Although much progress has been made over the past few years, proteomics using 2D gels is still viewed as a difficult technology in terms of automation and throughput. Alternatives to this technology, particularly ones that replace 2D gels, are being actively explored in the hope of achieving better throughput and higher sensitivity.

* To whom correspondence should be addressed: (tel) (908) 277-7022; (fax) (908) 277-4910; (e-mail) [email protected].
(1) Celis, J. E.; Wolf, H.; Østergaard, M. Electrophoresis 2000, 21, 2115-2121.

3742 Analytical Chemistry, Vol. 73, No. 15, August 1, 2001
One approach that omits 2D gels is the use of multidimensional liquid-phase separation techniques such as chromatography and solution isoelectric focusing to partially resolve mixtures of proteins or their digested peptide products.6-15 Mass spectrometry, with additional resolving power, is used to identify the simplified mixtures. Since separation occurs in the liquid phase, the automation potential is much higher than that with the gel-based platform. When running at preparative scale, sample loading is significantly larger than what is achievable with 2D PAGE. In addition, this approach reduces the protein/peptide recovery losses associated with 2D gel technology, since the final separated proteins/peptides are in solution for analysis and identification. A substantial limitation of this approach is that it cannot provide the quantitative information obtained from 2D gel imaging.

Isotope dilution has been used extensively for quantitative analysis of drugs in biological materials. An isotopically unique internal standard is added to the samples to achieve accurate quantitation of the drug. Because an internal standard is used, variables such as sample recovery, matrix effects, and detection interferences are no longer detrimental to accurate quantitation. Efforts have been made to metabolically or chemically label proteins with isotopes in order to exploit these advantages in protein quantitation and to apply the approach in comparative proteomic studies. When differential expression of proteins (e.g., a normal vs a disease state) is evaluated, the two pools of proteins are labeled separately. One is labeled with heavy isotope and the other is not (i.e., it carries the natural, light isotope). The two pools are then mixed, proteolyzed, and analyzed. Each pair of peptide signals, with and without label, acts as the internal standard for each other and enables the quantitative comparison of protein differential expression. While the label offers a means to differentiate the two populations and perform the relative quantitation on every protein, peptide fingerprint and sequence information obtained from MS analysis provides the identification of the proteins.

(2) Jungblut, P. R.; Zimny-Arndt, U.; Zeindl-Eberhart, E.; Stulik, J.; Koupilova, K.; Pleissner, K.-P.; Otto, A.; Müller, E.-C.; Sokolowska-Köhler, W.; Grabher, G.; Stöffler, G. Electrophoresis 1999, 20, 2100-2110.
(3) Anderson, N. L.; Esquer-Blasco, R.; Hofmann, J. P.; Anderson, N. G. Electrophoresis 1991, 12, 907-930.
(4) Steiner, S.; Aicher, L.; Raymackers, J.; Meheus, L.; Esquer-Blasco, R.; Anderson, N. L.; Cordier, A. Biochem. Pharmacol. 1996, 51, 253-258.
(5) Quadroni, M.; James, P. Electrophoresis 1999, 20, 664-677.
(6) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

10.1021/ac010043d CCC: $20.00 © 2001 American Chemical Society. Published on Web 07/06/2001
Protein profiling, quantification, and identification are therefore performed in a single step. Chait et al. demonstrated such an approach wherein proteins are metabolically labeled during cell culture in a 15N-enriched culture medium.16 Similar strategies can also be applied via amino acid-specific metabolic labeling of proteins during cell culture.17 Aebersold et al. developed a chemical derivatization scheme, termed isotope-coded affinity tagging (ICAT), to carry out labeling on all cysteine-containing proteins.18 In this approach, relative protein quantitation is achieved through the use of two isotopically distinct, light and heavy tags.

Although valuable and elegant, these approaches have several inherent limitations. Data analysis can be tedious. There is no intrinsic mechanism to allow subtractive analysis so that attention can be focused on the small number of proteins with altered expression levels. Rather, a procedure of identifying the light and heavy isotope pair and calculating the signal intensity ratio has to be performed on all peptide signals. Dynamic range of detection is another major limiting factor that can lead to ambiguity in the detection of extreme changes in protein expression. When only one signal of the pair is detected due to a dramatic change in expression and the limited dynamic range of detection, the signal can be mistaken for chemical background or for a nonlabeled peptide rather than recognized as coming from a protein that has been highly differentially expressed. In cases where protein modifications result from the perturbation, the changes will not be detected by these approaches since no isotopic counterpart is present for the modified peptides.

Protein 18O labeling and its application in comparative proteomic studies have been demonstrated.19 With this labeling scheme, one pool of proteins is digested in [16O]water buffer and the other pool in [18O]water buffer. The two pools are then mixed, processed, and analyzed by MS. Ratios of the relative intensities of the 16O peak and 18O peak are used to quantify the protein expression. The method shares many of the problems mentioned above, i.e., tedious data analysis, ambiguity in detection of extreme changes in expression due to the limited dynamic range of detection, and inability to detect protein modifications due to perturbation. In addition, the method introduces additional complexity due to the 13C effect and its interference with the 18O signals.

(7) McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R., III. Anal. Chem. 1997, 69, 767-776.
(8) Opiteck, G. J.; Jorgenson, J. W. Anal. Chem. 1997, 69, 2283-2291.
(9) Opiteck, G. J.; Lewis, K. C.; Jorgenson, J. W.; Anderegg, R. J. Anal. Chem. 1997, 69, 1518-1524.
(10) Opiteck, G. J.; Ramirez, S. M.; Jorgenson, J. W.; Moseley, M. A., III. Anal. Biochem. 1998, 258, 349-361.
(11) Kojima, K.; Manabe, T.; Okuyama, T.; Tomono, T.; Suzuki, T.; Tokunaga, E. J. Chromatogr. 1982, 239, 565-570.
(12) Isobe, T.; Uchida, K.; Taoka, M.; Shinkai, F.; Manabe, T.; Okuyama, T. J. Chromatogr. 1991, 588, 115-123.
(13) Wall, D. B.; Kachman, M. T.; Gong, S.; Hinderer, R.; Parus, S.; Misek, D. E.; Hanash, S. M.; Lubman, D. M. Anal. Chem. 2000, 72, 1099-1111.
(14) Jensen, P. K.; Paša-Tolić, L.; Anderson, G. A.; Horner, J. A.; Lipton, M. S.; Bruce, J. E.; Smith, R. D. Anal. Chem. 1999, 71, 2076-2084.
(15) Paša-Tolić, L.; Jensen, P. K.; Anderson, G. A.; Lipton, M. S.; Peden, K. K.; Martinović, S.; Tolić, N.; Bruce, J. E.; Smith, R. D. J. Am. Chem. Soc. 1999, 121, 7949-7950.
(16) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6596.
(17) Chen, X.; Smith, L. M.; Bradbury, E. M. Anal. Chem. 2000, 72, 1134-1143.
(18) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nature Biotechnol. 1999, 17, 994-999.
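To give a feel for the size of the 13C interference with 18O signals, one can estimate how large the M+2 and M+4 isotope peaks of an unlabeled peptide are relative to its monoisotopic peak. The following is a minimal sketch under stated assumptions, not a procedure from this paper: it considers carbon only, takes the natural 13C abundance as 1.07%, and uses a hypothetical carbon count of 70 for a mid-sized tryptic peptide. The function name is illustrative.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # assumed natural 13C fraction; other elements are ignored


def c13_peak_ratio(n_carbons: int, k: int, p: float = C13_ABUNDANCE) -> float:
    """Intensity of the peak carrying k 13C atoms relative to the all-12C peak.

    Binomial model: P(k) / P(0) = C(n, k) * (p / (1 - p)) ** k
    """
    return comb(n_carbons, k) * (p / (1.0 - p)) ** k


n = 70  # hypothetical carbon count, chosen only for illustration
m2 = c13_peak_ratio(n, 2)  # overlaps the singly 18O-labeled peptide (+2 Da)
m4 = c13_peak_ratio(n, 4)  # overlaps the doubly 18O-labeled peptide (+4 Da)
print(f"M+2 / M = {m2:.3f}")
print(f"M+4 / M = {m4:.4f}")
```

Under these assumptions the natural M+2 peak is already a sizable fraction of the monoisotopic peak, while the M+4 contribution is far smaller, which suggests why a weak 18O signal is most easily hidden at +2 Da.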
The 13C effect can make interpretation of a downregulation very difficult since weak 18O signals can be buried under the normal 13C pattern, and the signals can easily be misinterpreted as nonlabeled C-terminal peptides.

Since the goal of a protein differential analysis is to extract and identify the small number of proteins that deviate in expression level upon a perturbation, any method that enables subtractive analysis of protein signals with unaltered levels would be of great value. Emphasis could then be placed only on the signals representative of changes upon the perturbation. As a result, the effort required for peptide MS/MS analysis, protein identification, and relative quantitation would be drastically reduced, as only a very small population of the proteins is of interest. With this report, we present a scheme to extract these representative ions of interest with certainty and to overcome the limitations previously mentioned. We term this technique “inverse labeling”. With this procedure, two inverse labeling experiments are performed in parallel. The labeling is swapped between the two experiments (Figure 1); i.e., in the first experiment, the labeled proteins are derived from the perturbed pool (pool 2), and in the second experiment, the labeled proteins are derived from the control pool (pool 1). If expression of a protein has been significantly up- or downregulated by the perturbation (i.e., a shift in signal intensities of light and heavy isotopes is observed in one analysis), the inverse should be observed in the analysis of the second sample due to the inverse labeling. Thus, by comparing the results of the two experiments, one can quickly identify those differentially expressed proteins with signals showing the characteristic mass shift between the two inverse labeling experiments.

As demonstrated in the studies in this report, the strategy offers a number of merits when applied to a labeling-based proteomic strategy. It offers the rapid identification of the signals from the differentially expressed proteins through quick pattern recognition. It eliminates the ambiguity in detection of proteins with extreme changes in expression caused by the limited dynamic range of detection. It also resolves the detection problems associated with protein modifications due to perturbation, where unpaired peaks are detected. Here we report the results of our studies in which we applied this strategy to two model systems to demonstrate that inverse labeling in combination with mass spectrometric analysis provides a viable approach for the rapid identification of marker/target proteins. It is important to note that although protein proteolytic 18O labeling is used in this study, the strategy of inverse labeling may be applied to all other protein-labeling techniques to realize the same benefits.

(19) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P.; Fenselau, C. 48th ASMS Conference on Mass Spectrometry and Allied Topics; Long Beach, CA, 2000.

Figure 1. Inverse labeling method for the rapid identification of marker/target proteins.

EXPERIMENTAL SECTION

Chemicals. [18O]Water (95% atom) was purchased from Isotec Inc. (Miamisburg, OH).

Eight-Protein Model System. Commercial BSA, aldolase, carbonic anhydrase, β-casein, chicken albumin, apotransferrin, β-lactoglobulin, and cytochrome c (Sigma) were used without further purification. The eight proteins were mixed at a molar ratio of 1:1:1:1:1:1:1:1 for the “control” pool and 0.3:3:1:1:1:1:1:1 for the “treated” pool. Two identical aliquots containing 10 pmol each of the unchanged components were taken from each pool and were dried using a Speedvac. The 18O labeling was performed using two procedures, one during proteolysis and the other postproteolysis. In the during-proteolysis labeling, for both control and treated pools, one of the dried aliquots was reconstituted with 20 µL of regular water and the other with 20 µL of [18O]water, both containing 50 mM ammonium bicarbonate. Trypsin (modified, sequencing grade, Promega) at a 1:100 trypsin-to-protein ratio (w/w) was added to each solution, and digestion was allowed to proceed at 37 °C for ∼20 h. The final enrichment level of 18O was ∼84%. This figure takes into account the dilution from the addition of buffer and enzyme. For the postproteolysis labeling, trypsin digestion was performed in regular water-ammonium bicarbonate buffer at the same trypsin-to-protein ratio for ∼12 h for all aliquots. The resulting peptide mixtures were then taken to complete dryness with a Speedvac. To each aliquot of the dried peptide mixture, 10 µL of [18O]water or regular water, respectively, was added for postproteolysis 18O labeling. The labeling process was allowed to proceed at room temperature for ∼12 h. For both during-proteolysis and postproteolysis labeling, prior to analysis, the 16O control sample was mixed with the 18O treated sample and the 18O control sample was mixed with the 16O treated sample. The same MS analysis was performed on both mixtures.

Whole Cell Lysate Spiked with PTP Protein. Approximately 5 × 10^7 harvested CHO cells were lysed mechanically (freeze/thaw) using a buffer containing 10 mM Tris, 1 mM EDTA, pH 7.4. The resulting cell lysate of 2.5 mL at 0.4 mg/mL protein concentration was divided into four aliquots. Two were spiked with 10 pmol of PTP-1B protein (internally expressed, residue 1-298) (PTP10) and the other two with 30 pmol of PTP-1B (PTP30).

Trypsin was added to each solution at a 1:100 (w/w) trypsin-to-total protein ratio to initiate the digestion. Proteolysis was allowed to proceed at 37 °C for ∼12 h. The resulting solutions were centrifuged, and the solid was discarded. The solutions were then taken to complete dryness with a Speedvac. For both PTP10 and PTP30, one of the two identical aliquots was reconstituted with 10 µL of [18O]water, the other with 10 µL of regular water. The postproteolysis 18O incorporation was allowed to proceed at room temperature for ∼12 h. Prior to analysis, the [16O]PTP10 and [18O]PTP30 samples were mixed, as were the [18O]PTP10 and [16O]PTP30 samples. Each mixture was diluted with 100 µL of mobile phase A (0.1% formic acid-0.01% TFA in water) and filtered through a 0.4-µm Microcon filter. The filtrate was injected into the LC/MS system for analysis.

LC/MS and LC/MS/MS Peptide Analyses. The MS analysis of the peptide mixtures was carried out through LC/ESI MS using a Finnigan LCQ ion trap mass spectrometer. A 1.0 × 150 mm Vydac C18 column was employed for on-line peptide separation with a gradient of 2/2/20/45/98/98% B at 0/2/10/65/66/70 min, respectively. Mobile phase A was 0.1% formic acid-0.01% TFA in water and mobile phase B was 0.1% formic acid-0.01% TFA in acetonitrile. The flow rate was 50 µL/min. After elution from the LC column, the flow was split 9:1, with ∼5 µL/min going into the MS and 45 µL/min being collected for later use. The LCQ ion trap mass spectrometer was operated in a data-dependent mode, automatically performing MS/MS on the most intense ion of each scan when the signal intensity exceeded a preset threshold. When needed, the collected samples were concentrated and reanalyzed to obtain MS/MS data that were not collected automatically in the first run for the peptides of interest. The relative collision energy was set at 45%, at which most peptides fragment effectively in our experience. An 8-Da window for precursor ion selection was employed.
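The gradient and flow split described above can be checked with a few lines of arithmetic. The sketch below encodes the stated breakpoints and assumes linear ramps between them, which the text does not spell out; the function names are illustrative and not part of any instrument software.

```python
from bisect import bisect_right

# Gradient breakpoints as stated in the text: time (min) and % mobile phase B.
TIMES_MIN = [0.0, 2.0, 10.0, 65.0, 66.0, 70.0]
PERCENT_B = [2.0, 2.0, 20.0, 45.0, 98.0, 98.0]


def percent_b(t_min: float) -> float:
    """%B at time t_min, assuming linear ramps between breakpoints."""
    if t_min <= TIMES_MIN[0]:
        return PERCENT_B[0]
    if t_min >= TIMES_MIN[-1]:
        return PERCENT_B[-1]
    i = bisect_right(TIMES_MIN, t_min) - 1  # segment containing t_min
    t0, t1 = TIMES_MIN[i], TIMES_MIN[i + 1]
    b0, b1 = PERCENT_B[i], PERCENT_B[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def split_flow(total_ul_min: float, collected_parts: float, ms_parts: float):
    """Divide a total flow by a collected:MS ratio; returns (collected, to_ms)."""
    whole = collected_parts + ms_parts
    return (total_ul_min * collected_parts / whole,
            total_ul_min * ms_parts / whole)


print(percent_b(6.0))             # midway through the 2-10 min ramp
print(split_flow(50.0, 9.0, 1.0))  # 9:1 split of the 50 µL/min flow
```

With a 50 µL/min total flow, a 9:1 split reproduces the 45 µL/min collected and 5 µL/min delivered to the mass spectrometer that the text reports.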
MALDI TOF MS Peptide Analysis. The mixture samples were simply diluted 1:3 to 1:5 using the MALDI matrix solution (saturated α-cyano-4-hydroxycinnamic acid in 50% acetonitrile-0.1% TFA), and ∼1 µL of the final solution (containing ∼500 fmol each, based on the unchanged components for the eight-protein system) was loaded onto the MALDI target for analysis. The analysis was performed on a Bruker Reflex III MALDI TOF mass spectrometer operated in the reflectron mode with delayed ion extraction. When applicable, postsource decay (PSD) was also performed on the peptide ions of interest.

Database Searches. The search programs PROWL (Proteometrics, New York, NY) and MASCOT (Matrix Science, London, U.K.) were used to search the protein databases to identify proteins using peptide fingerprints, MS/MS fragments, and processed PSD spectra. For searches using peptide fingerprint information, peptide ions exhibiting the inverse labeling pattern, i.e., a mass shift of 2 or 4 Da on the most abundant isotopic ion between the two inverse labeling experiments, were sorted on the basis of the direction of mass shift (increasing or decreasing). Each list was used separately for a database search to identify the proteins. For searches using peptide sequence information, the MS/MS spectra of a peptide from the two inverse labeling experiments were compared and Y ions with a mass shift of 2 or 4 Da were identified. These ions were used alone or in combination with B ions to search protein databases to obtain identification of the proteins.
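The sorting step for peptide fingerprint searches can be sketched as follows. This is a hypothetical illustration rather than the software used in this work: the tolerance, the function names, and the example m/z values are assumptions. The observed m/z difference is multiplied by the charge state to obtain the shift in Da, which is then compared with one or two 18O-for-16O substitutions (2.004245 Da each).

```python
O18_O16_DELTA = 2.004245  # mass difference between 18O and 16O, in Da


def classify_shift(mz_first: float, mz_second: float, charge: int,
                   tol_da: float = 0.3) -> str:
    """Classify the shift of the most abundant isotopic ion between the
    first and second inverse labeling experiments.

    Returns 'increase', 'decrease', or 'none'. tol_da is an assumed tolerance.
    """
    shift_da = (mz_second - mz_first) * charge  # m/z shift converted to Da
    for n_labels in (1, 2):  # one or two 18O atoms at the C-terminus
        if abs(abs(shift_da) - n_labels * O18_O16_DELTA) <= tol_da:
            return "increase" if shift_da > 0 else "decrease"
    return "none"


def sort_by_direction(peaks):
    """peaks: iterable of (label, mz_first, mz_second, charge) tuples."""
    lists = {"increase": [], "decrease": []}
    for label, mz_a, mz_b, z in peaks:
        direction = classify_shift(mz_a, mz_b, z)
        if direction != "none":
            lists[direction].append(label)
    return lists


# Hypothetical peak pairs, for illustration only.
example = [
    ("pep1", 1567.9, 1571.9, 1),  # +4 Da on a singly charged ion
    ("pep2", 1054.6, 1052.6, 2),  # -2 m/z on a doubly charged ion = -4 Da
    ("pep3", 900.5, 900.5, 1),    # unchanged
]
print(sort_by_direction(example))
```

Each of the two resulting lists would then be submitted separately to the database search, as described above.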

An iterative search combining the data of ions with inverse labeling patterns from the peptide map and MS/MS was also performed. Any ions that demonstrated a clear inverse labeling pattern in the map and were further supported by mass shifts of fragment ions in MS/MS data were identified first using their MS/MS fragments/sequence tags. The peptides associated with the identified proteins were then removed from the list, and a second-round search was initiated using the masses of the remaining peptides showing the inverse labeling pattern. For those ions for which no convincing conclusion could be made, a second analysis was performed using the collected sample to obtain MS/MS data. The resulting data were used in the same manner to search the databases for protein identification.

RESULTS AND DISCUSSION

In this study, protein labeling is achieved through proteolysis in [18O]water. One 18O atom is incorporated into the newly formed carboxy terminus of peptides as a consequence of hydrolysis during proteolysis. An additional 18O may be incorporated into the same terminal carboxy group through a mechanism of protease-catalyzed exchange.20-23 Thus, following digestion by trypsin, all of the resulting peptides except for C-terminal peptides that lack Lys or Arg at the C-terminus are labeled with either one or two 18O atoms at the C-terminus, giving a mass increase of 2 or 4 Da. In our experience, for most peptides, the incorporation of two 18O atoms is more prominent than that of one 18O. As depicted in Figure 1, the rapid identification of differentially expressed proteins is achieved by quick identification of their peptides that exhibit the characteristic inverse labeling pattern or mass shift between the two experiments. For most proteins, the expression level remains unchanged following a perturbation, which is reflected by a similar abundance profile in pool 1 and pool 2.
Therefore, for peptides from those proteins, there will be no significant difference in the labeling pattern between the two inverse labeling experiments (i.e., similar abundance of 16O and 18O signals in both experiments), and these signals can be subtracted out, in principle, by the comparative analysis of the two data sets. The C-terminal peptides without 18O labeling are subtracted out as well. For a protein whose level of expression has been significantly up- or downregulated by the perturbation, changes in the relative intensity of 16O and 18O signals will be observed. When the control pool is not labeled and the perturbed pool is 18O labeled, the 18O signals of the resulting peptides will be of greater intensity than its 16O signals if the protein is upregulated as a consequence of perturbation. Conversely, the 16O signals of the resulting peptides will be stronger if a downregulation of protein has occurred upon perturbation. In our parallel or inverse analysis, the protein pool labeling is reversed and differentially expressed proteins will give rise to peptides that display an inverse mass shift when contrasted with the first analysis. Thus, there is a 2/4-Da shift in mass for those peptides when the most intense isotopic ions are compared between the (20) Rose, K.; Simona, M. G.; Offord, R. E.; Prior, C. P.; Otto, B.; Thatcher, D. R. Biochem. J. 1983, 215, 273-277. (21) Rose, K.; Savoy, L.-A.; Simona, M. G.; Offord, R. E.; Wingfield, P. Biochem. J. 1988, 250, 253-259. (22) Schno ¨lzer, M.; Jedrzejewski, P.; Lehmann, W. D. Electrophoresis 1996, 17, 945-953. (23) Hawke, D.; Nuwaysir, L.; Settineri, T. 48th ASMS Conference on Mass Spectrometry and Allied Topics; Long Beach, CA, 2000.

Analytical Chemistry, Vol. 73, No. 15, August 1, 2001

3745

Figure 2. MALDI TOF detection of tryptic digests of the eigh-protein mixtures. (A) 16O control - 18O treated sample; (B) 18O control - 16O treated sample; (C) monoisotopic patterns of a BSA peptide MH+ 1567.9 in (A) (upper) and (B) (lower); and (D) monoisotopic patterns of an aldolase peptide MH+ 2107.3 in (A) (upper) and (B) (lower). The mass shifts or 16O/18O intensity ratio reversal indicates differential expression of the proteins: “downregulation” of BSA and “upregulation” of aldolase (*, ions showing the inverse labeling pattern/mass shift).

two inverse labeling experiments (i.e., from 18O signal in the first experiment to 16O signal in the second experiment or vice versa). A strength of the procedure lies in the fact that instead of looking for the (2/4-Da isotope pair and quantitatively calculating the ratio of 16O to 18O signals for every peptide, one only needs to compare the two data sets and identify peptides of the characteristic mass shift, which can be achieved rapidly and potentially automatically. Only the peptides from proteins of significant differential expression will display a mass shift, including any associated with extreme changes in expression and protein modifications where unpaired isotope peaks are detected. Peptides derived from proteins with no expression deviations as well as C-terminus peptides with no incorporated label will not display a mass shift pattern. The direction of the shift (i.e., increasing or decreasing) implicates the direction of differential expression of the protein (i.e., downregulation or upregulation). Eight-Protein Model System. The inverse labeling and MS analysis were performed in the same manner as shown in Figure 1 on the eight-protein model system where BSA was “downregulated” by 3-fold and aldolase “upregulated” by 3-fold. MALDI TOF MS performed directly on the mixture without any separation resulted in a peptide map spectrum that displayed a large degree of signal overlap, which made data interpretation somewhat difficult (Figure 2A,B). To minimize interference and improve detection dynamic range, ideally, one would employ off-line, 3746 Analytical Chemistry, Vol. 73, No. 15, August 1, 2001

multidimensional fractionation followed by the MALDI analysis of the fractions.24 Nonetheless, it is still clearly demonstrated how the inverse labeling strategy helps to quickly identify the peptide signals derived from proteins of differential expression. Without the inverse labeling strategy, one would have to evaluate a single spectrum (e.g., Figure 2A) looking for the (2/4-Da pair for each peptide and performing quantitation. Utilizing the inverse labeling strategy, one only needs to overlay the two spectra (Figure 2A,B) and perform “zoom and pick” to identify the peaks that show the characteristic mass shift of 2/4 Da between the two spectra. Very quickly (a few minutes in this case) after this exercise of qualitative comparison, the peaks of the characteristic inverse labeling pattern are identified (e.g., Figure 2C,D). We performed PSD on a number of them and were able to identify the corresponding proteins using the PSD data (data not shown). It is apparent that when inverse labeling is applied, a quick qualitative comparison of the two data sets can lead to the quick identification of the peptides of interest. Quantitation and PSD or MS/MS analysis for protein identification can then be performed on those peptides. When the same samples were analyzed using an LCQ with on-line RP LC, the characteristic inverse labeling pattern or a 2/4-Da mass shift was also clearly observed on a number of peptides (Figures 3A,B and 4A,B). Since doubly charged ions are the often detected for tryptic peptides (24) Griffin, T.; Han, D.; Parker, K.; Gygi, S.; Rist, B.; Aebersold, R. 48th ASMS Conference on Mass Spectrometry and Allied Topics; Long Beach, CA, 2000.

Figure 3. LC/MS detection of a BSA tryptic peptide. (A) MS of the 16O control - 18O treated sample; (B) MS of the 18O control - 16O treated sample; (C) MS/MS of the peptide in (A); and (D) MS/MS of the peptide in (B). A 2-Da mass shift between (A) and (B) on the most abundant isotopic ions indicates a significant differential expression of the protein. The mass shift is further verified/confirmed by the 4-Da shift on the correlating singly charged ions (insets) and, in addition, in the MS/MS spectra (C) and (D) by the 4-Da shift of all Y ions. The pattern helps to identify Y ions and B ions and thus helps in the interpretation of the MS/MS spectra. The BSA protein is exclusively identified from database searching using the Y ions (those with a 4-Da shift) (*, ions showing the inverse labeling pattern/mass shift).

by electrospray MS, a 1/2-Da shift is observed on the doubly charged ions. Following data analysis, two lists of peptide masses were quickly generated that were based on the direction of the mass shift. These two lists were used to search the database. Aldolase was exclusively identified using the list of a 2/4-Da decrease in mass shift, corresponding to an upregulation of protein expression. BSA was identified using the list of increase in mass shift, corresponding to a downregulation in protein expression. MS/MS spectra were obtained automatically in data-dependent mode for a number of the peptides. To emulate a broad-spectrum situation where multiple proteins may be up- or downregulated, an iterative search scheme was also applied. In this case, we used the combined mass list of all the peptides that showed a mass shift, regardless of the direction of the shift. After a protein is identified with high confidence using either the mass list or an MS/MS spectrum (aldolase in our system), all peptides derived from the protein are removed from the mass list. The process is then repeated in order to identify the next protein displaying the mass shift (BSA in this case). It is important to note that, as a consequence of inverse labeling, MS/MS data are especially information rich. Since the label is incorporated at the C-terminus of each peptide, Y ions in an MS/MS spectrum carry the label and exhibit the characteristic inverse labeling pattern for proteins that are differentially expressed. As shown in Figures 3C,D and

4C,D, for proteins whose “expression level” has been significantly altered by “perturbation”, the inverse labeling pattern of a 2/4Da mass shift observed at the molecular ion level for the peptides is evident in the Y ions in the MS/MS spectra. Hence, MS/MS data validate the mass shift assessments made at the molecular ion level. Since most peptide fragments carry fewer charges than the precursor ion, the mass shift is more prominent and easier to recognize when compared to that from the multiply charged precursor ion. In addition, the inverse labeling pattern reflected in Y ions offers facile assignment of Y ions and B ions for the interpretation of an MS/MS spectrum. The fragments displaying the inverse labeling mass shifts are Y and Y-related ions and those without are B or B-related ions. Although interpretation is not required to search the databases using MS/MS data, the added specificity helps to increase efficiency and accuracy of protein identification via database search. In our study, BSA and aldolase were both positively identified using exclusively Y ions in the MS/ MS data (Figures 3 and 4). These advantages are of more importance when one deals with novel proteins where de novo sequencing is required. The ability to assign Y and B ions greatly facilitates “read out” of the sequence from an MS/MS spectrum. Spiked Cell Lysate System. In an attempt to emulate a complex protein mixture, PTP-1B protein was spiked at two different levels into two identical pools of whole cell lysate. The Analytical Chemistry, Vol. 73, No. 15, August 1, 2001


Figure 4. LC/MS detection of an aldolase tryptic peptide. (A) MS of the 16O control - 18O treated sample; (B) MS of the 18O control - 16O treated sample; (C) MS/MS of the peptide in (A); and (D) MS/MS of the peptide in (B). A 2-Da mass shift between (A) and (B) on the most abundant isotopic ions indicates a significant differential expression of the protein. The mass shift is further verified/confirmed by the 4-Da shift on the correlating singly charged ions (insets) and, in addition, in the MS/MS spectra (C) and (D) by the 4-Da shift of all Y ions. The pattern helps to identify Y ions and B ions and thus helps in the interpretation of the MS/MS spectra. Aldolase protein is exclusively identified from database searching using the Y ions (those with a 4-Da shift) (*, ions showing the inverse labeling pattern/mass shift).
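The Y/B assignment described in this caption can be sketched in a few lines of Python. This is an illustration only, not software used in the study: the function name, the 0.3-Da tolerance, and the assumption that each singly charged fragment has already been reduced to the m/z of its most abundant isotopic peak are all hypothetical.

```python
# Mass added when two 16O atoms are replaced by two 18O atoms (about 4.0085 Da).
LABEL_SHIFT_DA = 2 * (17.99916 - 15.99491)


def classify_fragments(spectrum_a, spectrum_b, tol=0.3):
    """Sort fragments of spectrum_a into Y-like, B-like, and unassigned.

    spectrum_a, spectrum_b: m/z values of singly charged fragments from the
    two inverse labeling experiments (hypothetical, pre-picked peak lists).
    A fragment found at the same m/z in both spectra is treated as B-like;
    a fragment whose counterpart is displaced by the label shift is Y-like.
    """
    y_like, b_like, unassigned = [], [], []
    for mz in spectrum_a:
        if any(abs(mz - other) <= tol for other in spectrum_b):
            b_like.append(mz)
        elif any(abs(abs(mz - other) - LABEL_SHIFT_DA) <= tol
                 for other in spectrum_b):
            y_like.append(mz)
        else:
            unassigned.append(mz)
    return y_like, b_like, unassigned


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    print(classify_fragments([300.2, 451.3, 600.4], [300.2, 455.3, 700.0]))
```

Only the Y-like list would then be passed to the database search, mirroring the use of the starred ions in the figure.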

inverse labeling experiment was then performed on the two pools followed by LC/MS analysis. As expected, single-dimension LC was insufficient to adequately separate the tremendously large number of peptides present. Nonetheless, when the two sets of data from inverse labeling were compared, a number of ions possessing the characteristic inverse labeling mass shift were extracted (Figure 5A,B). The split and collected samples were subjected to a second analysis to obtain MS/MS on the ions that exhibited the mass shift. The Y ions clearly exhibited the 4-Da shift, which validated the mass shift observed on the precursor peptides and, thus, the differential expression of the precursor protein (Figure 5C,D). A database search using the distinctive Y ions possessing the mass shift led to the exclusive identification of the human PTP-1B protein.

Postproteolysis 18O Incorporation. Mainly for the purpose of conserving the relatively expensive [18O]water, we have explored the process of incorporating 18O both during proteolysis and postproteolysis. According to previous studies, 18O labels may be incorporated into peptides at the C-terminal carboxy group through a protease-catalyzed exchange. This has been confirmed by our observation that the majority of the peptides were found to have incorporated more than one (i.e., two) 18O atoms when a protein was digested in [18O]water. For the majority of the peptides, we have been able to achieve this same level of 18O incorporation by adding a very small volume of [18O]water (∼10 µL) to a dried peptide mixture postproteolysis and allowing the exchange to occur at room temperature for 5-12 h. The postproteolysis labeling is especially advantageous when dealing with proteins or protein mixtures for which reduction in volume is problematic. By doing postproteolysis labeling, one can perform digestion the usual way, in a regular water buffer, on a cell lysate or on membrane proteins without worrying about protein precipitation during concentration or the use of a large quantity of the relatively expensive [18O]water to reach an overwhelming 18O environment for labeling. Once proteins are proteolyzed to peptides, concentration and precipitation are normally less of a problem, and the labeling process by protease-catalyzed exchange can be carried out using a very small amount of [18O]water. Another area where postproteolysis labeling may prove to be very useful is in the performance of 18O labeling experiments on gel-separated proteins via in-gel digestion. By carrying out 18O labeling postproteolysis, the amount of [18O]water required is substantially reduced, since the labeling is performed on the extracted and dried peptides, not on the excised gel pieces.

Inverse Labeling Pattern, the Mass Shift. The key to using the inverse labeling method to achieve rapid identification of marker/target proteins is the quick and unambiguous identification of peptides that exhibit the characteristic inverse labeling pattern or a 2/4-Da mass shift by comparing two data sets from the inverse labeling experiments. Any interference of the 18O or

Figure 5. LC/MS detection of a PTP tryptic peptide from a CHO cell lysate spiked with PTP-1B. (A) MS of the 16O PTP10 - 18O PTP30 sample; (B) MS of the 18O PTP10 - 16O PTP30 sample; (C) MS/MS of the peptide in (A) inset; and (D) MS/MS of the peptide in (B) inset, where PTP10 is a 0.25-mg CHO cell lysate spiked with 10 pmol of PTP-1B; PTP30 is a 0.25-mg CHO cell lysate spiked with 30 pmol of PTP-1B. A 4-Da mass shift (or a 2-Da shift on the doubly charged ion) between (A) and (B) (insets) on the most abundant isotopic ions indicates a significant “differential expression” of the protein. The mass shift is further verified/confirmed in the MS/MS spectra by the 4-Da shift of all Y ions, which also helps to identify Y ions and B ions and thus helps in the interpretation of the MS/MS spectra. PTP-1B protein is exclusively identified from database searching using the Y ions (those with a 4-Da shift) (*, ions showing the inverse labeling pattern/mass shift).
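The relation between the 4-Da mass shift and the 2-Da shift on the doubly charged ion noted in this caption is simple arithmetic: the observed m/z shift is the label mass divided by the charge. The short Python sketch below works this out for one or two incorporated 18O atoms; the function name and the choice of charge states are illustrative assumptions, not part of the study.

```python
# Monoisotopic mass difference between 18O and 16O (about 2.004 Da).
O18_MINUS_O16_DA = 17.99916 - 15.99491


def expected_mz_shift(n_labels, charge):
    """Expected m/z shift for a peptide carrying n_labels 18O atoms."""
    if n_labels not in (1, 2):
        raise ValueError("C-terminal labeling incorporates one or two 18O atoms")
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return n_labels * O18_MINUS_O16_DA / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        one = expected_mz_shift(1, z)
        two = expected_mz_shift(2, z)
        print(f"z={z}: one 18O -> {one:.2f}, two 18O -> {two:.2f}")
```

For triply charged and higher ions the spacing drops below 1.5 m/z units, which is one reason the singly charged Y ions give the clearest confirmation.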

16O signal of a peptide can produce misleading results or make the data interpretation difficult. One intrinsic interference comes from the natural 13C contribution of the 16O peptide to its 18O signal intensity. As a result, when the change in a protein’s expression level is not sufficient to overcome the 13C effect (e.g., 2-fold or lower), abnormalities may be observed. These may include detection of a 1/3-Da mass shift instead of a 2/4-Da shift, and detection of the inverse labeling pattern may also become somewhat unreliable. In practice, the failure to detect shifts for such proteins is not problematic, since their changes in expression fall below the level regarded as statistically significant in biology. Differential protein expression is typically considered by biologists to be statistically significant at a 3-fold or greater difference in expression levels. Often, a cutoff value such as 5-fold or greater may be applied to focus on the most important proteins (25). In our studies of the two model systems, a value of 3-fold, the low-end threshold, was chosen for use. In both cases, peptides with the characteristic inverse labeling pattern were clearly detected, and with the data, the expected proteins were exclusively identified from the databases. Although we did not include in this study an example of extreme changes in expression

(25) Page, M. J.; Amess, B.; Rohlff, C.; Stubberfield, C.; Parekh, R. Drug Discovery Today 1999, 4, 55-62.
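The size of the 13C interference discussed in this section can be estimated with a simple binomial model. The Python sketch below is illustrative only: it considers carbon alone (ignoring the isotopes of H, N, O, and S), assumes a natural 13C abundance of about 1.07%, and uses a hypothetical peptide of 50 carbon atoms. None of these numbers come from the study.

```python
from math import comb

# Approximate natural abundance of 13C (assumed value).
C13_ABUNDANCE = 0.0107


def isotope_ratio(n_carbons, k, p=C13_ABUNDANCE):
    """Intensity of the M+k peak relative to the monoisotopic peak.

    Carbon-only binomial model: k of the n_carbons atoms are 13C.
    """
    return comb(n_carbons, k) * (p / (1.0 - p)) ** k


if __name__ == "__main__":
    n = 50  # hypothetical peptide size
    for k in (2, 4):
        print(f"M+{k} / M = {isotope_ratio(n, k):.4f}")
```

Under these assumptions the unlabeled peptide alone contributes roughly 14% of its monoisotopic intensity at M+2 and well under 1% at M+4, so a small change in expression is far easier to mask in the singly labeled channel than in the doubly labeled one.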

with the associated dynamic range of detection problems, it is evident that the method will readily identify any signals of extreme changes in expression, even more so than those with less significant changes.

In the application of the method, what one looks for is the mass shift, not the isotopic pattern. Therefore, there is no stringent requirement for resolving power of the MS instrument. The mass shift can be easily recognized even though the isotopic peaks may not be resolved for higher charged peptide ions using a mass spectrometer of unit resolution. Doubly charged ions are often observed for tryptic peptides by electrospray MS. A 1/2-Da shift (2 Da for most peptides) is readily detected on the doubly charged peptides from proteins of significant differential expression, as shown in the examples in this study. This observation can be further supported by the mass shift on the correlating singly charged ion and, in addition, by the Y fragment ions in the MS/MS data. Because the Y ions are primarily singly charged and are often detected in series, their support for the inverse labeling pattern is often strong and without ambiguity.

Merits and Limitations. As is true with any 18O labeling experiment, one must consider the fact that isotope enrichment is not 100% and the incorporation of the isotope may never go to completion. A mixture of species with no label, with one 18O, and with two 18Os is often observed as the result of 18O labeling (with the species carrying two 18Os dominant for most peptides). Thus, the use of isotope labeling spreads the signals out over several channels and may reduce the sensitivity of the analysis. The reduction in sensitivity can be up to 2-fold. On the other hand, there are valuable advantages offered by 18O labeling. It is a universal protein labeling scheme. With the exception of the peptide from the C-terminus of the protein, all generated peptides incorporate as many as two 18Os at their C-terminus. Important information such as protein modification is retained. Since the labeling occurs as a result of proteolysis, no extra effort is required to introduce the label (as in, e.g., metabolic labeling) or for subsequent workup (as in, e.g., chemical labeling). With respect to the inverse labeling procedure, since we need to do two experiments and thus split the materials into two portions, there is intrinsically a 2-fold reduction in sensitivity in comparison to the single-experiment approach if the same amount of material is used. However, the advantages that the method offers, namely (1) rapid and unambiguous focus on the important signals, (2) elimination of ambiguities in detection of extreme changes due to the dynamic range of detection, (3) ability to detect protein modifications due to perturbation, and (4) potential for throughput and speed for large-scale application, in our opinion overwhelmingly outweigh the 2-fold reduction in sensitivity. (Typically in cellular studies the limiting factor is handling capacity rather than material supply.) Through combination with orthogonal protein/peptide separation schemes already established, the inverse labeling strategy is ready to be applied to any protein-labeling-based method for comparative proteomic studies of complex systems. The powerful advantages that the inverse labeling strategy provides will then become more apparent.
We believe that, regardless of the labeling mechanism(s) used, as the complexity of the system increases, so does the need for inverse labeling.

CONCLUSIONS A new procedure termed inverse labeling has been established and experimentally validated. The strategy, which employs protein 18O labeling, has been demonstrated successfully on two model systems to rapidly identify proteins that are “differentially expressed”. The method requires only the clear detection of a few peptides from a protein to achieve identification of the protein. The characteristic inverse labeling pattern or mass shift of 2 or 4 Da indicates a significant change in the expression level of a protein, and the MS/MS data can then lead to the identification of the protein. Suitable software may be developed to automate the task. The inverse labeling strategy can be extended to other labeling methods such as chemical (ICAT) or metabolic labeling of proteins to achieve the same goal of rapid identification of target/marker proteins.

The methodology described here can be used to provide a quick comparison of a healthy or normal state with a disease state (cell or tissue) for proteins that are unique or differentially expressed. The analysis leads to quick identification of the disease-specific proteins (target or marker proteins). The methodology can also be applied to elucidate drug action mechanisms and to study drug toxicity. Unique proteins and proteins that are differentially expressed upon a drug treatment are identified. These proteins can then be correlated to the drug mechanism of action or may offer new insights into its side effects. Through the implementation of the strategy in high-throughput proteomic studies, such essential information can be quickly generated to facilitate the rapid discovery of novel therapeutics.
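As a rough indication of what such automation might look like, the Python sketch below flags a peptide from the 18O/16O intensity ratios measured in the two inverse experiments. It is a minimal illustration under stated assumptions, not the authors' software: the function name and return labels are hypothetical, experiment 1 is taken to be 16O control mixed with 18O perturbed sample, experiment 2 is the converse, and the 3-fold default follows the low-end threshold used in this study.

```python
def classify_peptide(ratio_exp1, ratio_exp2, fold=3.0):
    """Classify a peptide from its 18O/16O intensity ratios.

    ratio_exp1: 18O/16O ratio with control 16O and perturbed 18O (assumed).
    ratio_exp2: 18O/16O ratio with control 18O and perturbed 16O (assumed).
    A true expression change appears as reciprocal ratios in the two runs.
    """
    high1, low1 = ratio_exp1 >= fold, ratio_exp1 <= 1.0 / fold
    high2, low2 = ratio_exp2 >= fold, ratio_exp2 <= 1.0 / fold
    if high1 and low2:
        return "up"            # perturbed sample dominates in both runs
    if low1 and high2:
        return "down"          # control sample dominates in both runs
    if high1 or low1 or high2 or low2:
        return "inconsistent"  # shift not mirrored between the two runs
    return "unchanged"


if __name__ == "__main__":
    # Made-up ratios, for illustration only.
    for r1, r2 in [(4.0, 0.2), (0.25, 5.0), (1.1, 0.9), (4.0, 4.0)]:
        print(r1, r2, classify_peptide(r1, r2))
```

Requiring the reciprocal pattern in both experiments is the design point: a ratio that is extreme in only one run, or in the same direction in both, is set aside rather than reported as a change in expression.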

ACKNOWLEDGMENT The authors thank Dr. Bryan Burkey and Ms. Mei Dong for providing the CHO cell lysate and Dr. James Koehn for the PTP1B protein used in the study. The authors also thank Michael Sabio, Gary Trakshel, and Maria Cueto for critical reading of the manuscript.

Received for review January 12, 2001. Accepted May 31, 2001. AC010043D