Anal. Chem. 2002, 74, 4741-4749
Identification and Relative Quantitation of Protein Mixtures by Enzymatic Digestion Followed by Capillary Reversed-Phase Liquid Chromatography-Tandem Mass Spectrometry
Pavel V. Bondarenko,* Dirk Chelius, and Thomas A. Shaler
ThermoFinnigan, 355 River Oaks Parkway, San Jose, California 95134-1991
In this report, we describe an approach for identification and relative quantitation of individual proteins within mixtures using LC/MS/MS analysis of protein digests. First, the proteins are automatically identified by correlating the tandem mass spectra with peptide sequences from a database. Then, peak areas of identified peptides from one protein are added together to define the total reconstructed peak area of the protein digest. The total reconstructed peak area is further normalized to the peak area of an internal standard protein digest present in the mixture at a constant level. The method was illustrated using digested mixtures of five standard proteins as follows. One protein was gradually diluted while the other four components were present in the mixtures at a constant level. This study revealed that the relative peak area of the variable protein increased linearly (trend line R² = 0.9978) with increasing amount from 10 to 1000 fmol, while the relative peak areas of the four constant proteins remained approximately the same (within 20% relative standard deviation). To further evaluate the applicability of this method for the quantitation of proteins from complex mixtures, human plasma protein digest was spiked with 200 and 400 fmol of myoglobin digest. The total peak area of myoglobin peptides was normalized to the total peak area of apolipoprotein A-I peptides from human plasma, which played the role of an internal standard. The myoglobin/apolipoprotein A-I peak area ratio was 2 times larger for the human plasma digest spiked with a double amount of myoglobin. After several repetitions, the error of the relative peak area measurements remained below 11%, suggesting that the method described here can be used for relative concentration measurements of proteins in complex biological mixtures. In the presented method, unlike isotope-coded affinity tag or similar methods, no chemical derivatization steps are needed to create an internal standard.
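The quantitation summarized above reduces to two arithmetic steps: summing the peak areas of all identified peptide ions of a protein, and dividing that total by the total of the internal standard. The following Python sketch illustrates this under stated assumptions; the function names and every peak-area number are hypothetical and chosen only for illustration, not taken from the measurements.

```python
from collections import defaultdict


def total_areas(ion_areas):
    """Sum the peak areas of identified peptide ions per protein.

    ion_areas: iterable of (protein, peak_area) pairs, one per identified ion.
    """
    totals = defaultdict(float)
    for protein, area in ion_areas:
        totals[protein] += area
    return dict(totals)


def relative_areas(totals, internal_standard):
    """Normalize every protein total to the internal standard total."""
    reference = totals[internal_standard]
    if reference <= 0:
        raise ValueError("internal standard has no measurable peak area")
    return {protein: area / reference for protein, area in totals.items()}


# Hypothetical ion peak areas (arbitrary units) for two runs that differ
# only in the spiked myoglobin amount; apolipoprotein A-I is the standard.
STANDARD = "apolipoprotein A-I"
run_200 = [("myoglobin", 4.0e6), ("myoglobin", 2.0e6),
           (STANDARD, 8.0e6), (STANDARD, 4.0e6)]
run_400 = [("myoglobin", 9.0e6), ("myoglobin", 4.2e6),
           (STANDARD, 8.4e6), (STANDARD, 4.8e6)]

ratio_200 = relative_areas(total_areas(run_200), STANDARD)["myoglobin"]
ratio_400 = relative_areas(total_areas(run_400), STANDARD)["myoglobin"]
fold_change = ratio_400 / ratio_200  # relative change between the two runs
```

Because each run is normalized to its own internal standard, run-to-run differences in injection or ionization efficiency largely cancel in `fold_change`.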
* Corresponding author: (phone) 408-965-6288; (fax) 408-965-6138; (e-mail) [email protected]. 10.1021/ac0256991 CCC: $22.00 Published on Web 08/17/2002
© 2002 American Chemical Society
For a number of years, two-dimensional gel electrophoresis (2D GE) in combination with MS or MS/MS identification of stained spots has been the standard method for proteomic
research (ref 1 for example). Quantitative determinations of protein levels in the mixture can be made by comparing the intensities of spots from one gel to the next. Spot intensities on 2D gels are typically measured by using stains such as Coomassie blue or silver or by using radioactive labeling such as [35S]methionine labeling of cultured cells.2 The technique of 2D differential imaging gel electrophoresis (2D-DIGE), which involves labeling of two samples that are to be compared with two different fluorescent dyes, e.g., Cy-3 and Cy-5, allows analysis of the two samples in the same 2D gel to obtain relative quantitative information by differentially imaging the fluorescent emission with optical filters to find those spots which exhibit changes in the two samples.3 Although the scanning of spots on 2D gels is a rather straightforward process that readily allows for the identification of differentially expressed proteins, there still remain some problems that are inherent in the method and that are very difficult to overcome. One difficulty is that many spots on a given 2D gel contain more than one protein so that it is not immediately apparent which protein in the spot has changed. It is also possible that one protein in the spot increases while another decreases, which could result in an apparent lack of intensity change for that spot. An additional difficulty is that highly abundant proteins spread out over a large area on the gel and can easily mask changes in less abundant proteins that are in the same region of the gel.
Quantitation of peptide and protein mixtures by mass spectrometry alone has been a challenging analytical problem, largely because of possible ionization suppression among coeluting species during the electrospray process4 and cocrystallized moieties during matrix-assisted laser desorption/ionization.5 It was realized early6,7 that stable isotope-incorporated peptides can be
(1) Shevchenko, A.; Jensen, O. N.; Podtelejnikov, A. V.; Sagliocco, F.; Wilm, M.; Vorm, O.; Mortensen, P.; Shevchenko, A.; Boucherie, H.; Mann, M. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14440-14445.
(2) Lustig, R. H.; Pfaff, D. W.; Mobbs, C. V. J. Neurosci. Methods 1989, 29, 17-26.
(3) Unlu, M.; Morgan, M. E.; Minden, J. S. Electrophoresis 1997, 18, 2071-2077.
(4) King, R.; Bonfiglio, R.; Fernandez-Metzler, C.; Miller-Stein, C.; Olah, T. J. Am. Soc. Mass Spectrom. 2000, 11, 942-950.
(5) Kratzer, R.; Eckerskorn, C.; Karas, M.; Lottspeich, F. Electrophoresis 1998, 19, 1910-1919.
(6) Rafter, J. J.; Ingelman-Sundberg, M.; Gustafsson, J. A. Biomed. Mass Spectrom. 1979, 6, 317-324.
(7) Desiderio, D. M.; Kai, M. Biomed. Mass Spectrom. 1983, 10, 471-479.
Analytical Chemistry, Vol. 74, No. 18, September 15, 2002 4741
employed as internal standards for mass spectrometry, because their chemical and physical properties (including chromatographic retention time) are similar to properties of their counterparts with natural isotopic distribution but the mass is different. Synthetic stable isotope-labeled peptides have been used for quantitative measurements of peptides and proteins in complex biological mixtures by LC/MS8,9 and MALDI TOF.10 Stable isotope-labeled internal standards were also produced by incorporation of oxygen-18 into the peptide molecule during the chemical or enzymatic hydrolysis in 18O-water.7,11 A recent report12 described methylation of sample and control peptide mixtures with methanol containing three deuteriums and no deuteriums. Whole-cell stable isotope labeling was used by Chait and co-workers, who were able to perform relative quantitation measurements by growing yeast cells on medium enriched in 15N (15N >96%) and mixing with yeast grown on medium containing the natural abundance of the isotopes of nitrogen ([15N] = 0.4%).13 Smith and co-workers used normal and rare isotope media to grow and study stress response in Escherichia coli14 and also to study 15N isotope labeling.15
Although a widely used technique, 2D GE-MS has limitations when dealing with very large or small proteins, proteins at the extremes of the pI scale, and membrane and low-abundant proteins. These limitations of the 2D GE-MS approach have been recently challenged by a new chromatography-based method. The method includes digestion of the protein mixtures without preliminary separation and data-dependent LC/MS/MS analysis of the resulting digest products followed by protein identification through database searching.16-20 This method was complemented by quantitative profiling of the complex protein digests based on isotope-coded affinity tags (ICAT).21-23 The ICAT reagent has two forms, heavy (containing eight deuteriums) and light (containing
(8) Fierens, C.; Thienpont, L. M.; Stockl, D.; Willekens, E.; De Leenheer, A. P. J. Chromatogr., A 2000, 896, 275-278.
(9) Stemmann, O.; Zou, H.; Gerber, S. A.; Gygi, S. P.; Kirschner, M. W. Cell 2001, 107, 715-726.
(10) Gobom, J.; Kraeuter, K. O.; Persson, R.; Steen, H.; Roepstorff, P.; Ekman, R. Anal. Chem. 2000, 72, 3320-3326.
(11) Mirgorodskaya, O. A.; Kozmin, Y. P.; Titov, M. I.; Korner, R.; Sonksen, C. P.; Roepstorff, P. Rapid Commun. Mass Spectrom. 2000, 14, 1226-1232.
(12) Goodlett, D. R.; Keller, A.; Watts, J. D.; Newitt, R.; Yi, E. C.; Purvine, S.; Eng, J. K.; von Haller, P.; Aebersold, R.; Kolker, E. Rapid Commun. Mass Spectrom. 2001, 15, 1214-1221.
(13) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6596.
(14) Pasa-Tolic, L.; Jensen, P. K.; Anderson, G. A.; Lipton, M. S.; Peden, K. K.; Martinovic, S.; Tolic, N.; Bruce, J. E.; Smith, R. D. J. Am. Chem. Soc. 1999, 121, 7949-7950.
(15) Conrads, T. P.; Alving, K.; Veenstra, T. D.; Belov, M. E.; Anderson, G. A.; Anderson, D. J.; Lipton, M. S.; Pasa-Tolic, L.; Udseth, H. R.; Chrisler, W. B.; Thrall, B. D.; Smith, R. D. Anal. Chem. 2001, 73, 2132-2139.
(16) Link, A. J.; Eng, J. K.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.
(17) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-247.
(18) Spahr, C. S.; Davis, M. T.; McGinley, M. D.; Robinson, J. H.; Bures, E. J.; Beierle, J.; Mort, J.; Courchesne, P. L.; Chen, K.; Wahl, R. C.; Yu, W.; Luethy, R.; Patterson, S. D. Proteomics 2001, 1, 93-107.
(19) Davis, M. T.; Spahr, C. S.; McGinley, M. D.; Robinson, J. H.; Bures, E. J.; Beierle, J.; Mort, J.; Yu, W.; Luethy, R.; Patterson, S. D. Proteomics 2001, 1, 108-117.
(20) Moseley, M. A.; Blackburn, K.; Burkhart, W.; Davis, R.; Moyer, M.; Schlatzer, D.; Langridge, J.; Tyldesley, R.; Schultz, G.; Henion, J.; Koc, E.; Spremulli, L. Proceedings of the 49th ASMS Conference, 2001.
(21) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, S. A.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(22) Smolka, M. B.; Zhou, H.; Purkayastha, S.; Aebersold, R. Anal. Biochem. 2001, 297, 25-31.
no deuteriums), to tag the control and experimental samples.21 An ICAT reagent kit is currently commercially available. Application of the ICAT method to a standard protein mixture indicated that accuracy and relative standard deviation (RSD) of peak area ratios were better than 12%.21 Another report24 suggested that the error can be much larger due to the chromatographic resolution of the isotopic forms of ICAT-treated peptides. Calculations were presented indicating that the peak maxima could vary by as much as 30 s.24 If one of the isotopic forms (but not the other) coelutes with another species, its ionization can be greatly suppressed, causing substantial quantitation error.24 Different isotope-coded affinity tags were proposed containing three or four deuteriums, which may have a lower degree of separation.24 Several other affinity tags were recently described, indicating that quantitation of protein digests continues to be an active field of research.25 Although the ICAT approach is sophisticated and promising, it is associated with several practical problems, such as time-consuming chemical derivatization steps and the high cost of the reagents.
An alternative quantitative approach without isotope labeling was reported for MALDI TOF analysis of HPLC fractions of peptides from human extracellular fluids.26-28 The technique was designed to create differential displays of complex peptide mixtures from different patients. In other reports,29,30 human tears, saliva, plasma, and urine were spiked with equine β-2 microglobulin followed by affinity separation of microglobulins and quantitation of intact human β-2 microglobulin by MALDI TOF.
Our report describes a method for identification and relative quantitation of unseparated protein mixtures using LC/MS/MS analysis of protein digests, which can be considered as an alternative to the ICAT technique.
CONCEPT OF THE METHOD
No addition of a control protein mixture with isotopic or chemical tags is required for quantitation. The algorithm of the proposed method includes the following steps.
1. A standard protein is added to control and experimental samples as an internal standard. Biological mixtures can be spiked with an internal protein standard that does not naturally occur in this mixture. A housekeeping protein, typically present in a biological fluid (or extract) at a constant concentration, can also be employed as an internal standard, an approach frequently used in gene expression analysis.
2. The protein mixture including the internal standard is enzymatically digested into peptides and analyzed by LC/MS/MS.
3. Proteolytic peptides are identified by correlating their fragmentation spectra with amino acid sequences from a protein database.
4. Chromatographic peaks of all identified peptide ions originating from one protein are reconstructed and used to calculate the peak area of digest products for this protein. The chromatograms are plotted using m/z values of identified ions. Only the peaks in the close vicinity of the time of identification are taken into account to avoid pseudopeaks. Pseudopeaks are generated by species that are not proteolytic products of a particular protein but have similar m/z values.
5. The peak area of every protein digest in the mixture is normalized to the peak area of the internal protein standard. Alternatively, the total peak area for every protein can be normalized to one or several housekeeping proteins, or to all identified proteins in the mixture if it is known that the majority of the proteins in the sample are present at a constant level.
6. The entire procedure is repeated several times to increase precision of the relative quantitative measurements.
Repeatability of the relative chromatographic peak area measurements and linearity of the calibration curve, being critical points of the quantitation method, have been tested in the course of this study.
(23) Griffin, T. J.; Gygi, S. P.; Rist, B.; Aebersold, R.; Loboda, A.; Jilkine, A.; Ens, W.; Standing, K. G. Anal. Chem. 2001, 73, 978-986.
(24) Zhang, R.; Sioma, C. S.; Wang, S.; Regnier, F. E. Anal. Chem. 2001, 73, 5142-5149.
(25) Regnier, F. E.; Chakraborty, A. B.; Dormady, S. J.; G’eng, M.; Ji, J.; Riggs, L. D.; Sioma, C. S.; Wang, S.; Zhang, X. International patent publication WO 01/86306 A2, 2001.
(26) Neitz, S.; Jurgens, M.; Kellmann, M.; Schulz-Knappe, P.; Schrader, M. Rapid Commun. Mass Spectrom. 2001, 15, 1586-1592.
(27) Schrader, M.; Schulz-Knappe, P. Trends Biotechnol. 2001, 19, S55-S60.
(28) Schulz-Knappe, P.; Zucht, H. D.; Heine, G.; Jurgens, M.; Hess, R.; Schrader, M. Comb. Chem. High Throughput Screen. 2001, 4, 207-217.
(29) Tubbs, K. A.; Nedelkov, D.; Nelson, R. W. Anal. Biochem. 2001, 289, 26-35.
(30) Niederkofler, E. E.; Tubbs, K. A.; Gruber, K.; Nedelkov, D.; Kiernan, U. A.; Williams, P.; Nelson, R. W. Anal. Chem. 2001, 73, 3294-3299.
EXPERIMENTAL SECTION
Chemicals. Five proteins and human plasma were purchased from Sigma (St. Louis, MO) in the form of lyophilized powder: bovine albumin, A-7638; horse hemoglobin, H-4632; horse ferritin, A-3641; horse myoglobin, M-0630; horse cytochrome c, C-7752; and human plasma, P9523. Solvents and reagents were purchased from different suppliers as follows: acetonitrile (Catalog No. 0151, Burdick & Jackson, Muskegon, MI); water (Catalog No. 421802, JT Baker, Phillipsburg, NJ); formic acid (Catalog No. 11670, EM Science, Gibbstown, NJ); ammonium bicarbonate (Catalog No.
A-6141, Sigma); urea (Catalog No. 4111-01, JT Baker); sequencing grade modified trypsin (Catalog No. V5113, Promega, Madison, WI); iodoacetic acid (Catalog No. 35603) and dithiothreitol (DTT; Catalog No. 20290), both from Pierce (Rockford, IL).
Sample Preparation. Stock solutions of protein digests were prepared as follows. Each protein was dissolved in 100 mM ammonium bicarbonate buffer and reduced by adding DTT. Stock solutions of five protein digests were further diluted and mixed together to prepare a dilution series for myoglobin including eight mixtures. Injected 4-µL aliquots of these mixtures contained 1, 5, 10, 50, 100, 200, 500, and 1000 fmol of myoglobin. Albumin, hemoglobin, ferritin, and cytochrome c were constantly present in every injected mixture at 200 fmol. The same stock solutions of five proteins were used to prepare a dilution series for cytochrome c, also including eight mixtures. In this series, the injected amount of cytochrome c was different in each mixture and equal to 1, 5, 10, 50, 100, 200, 500, and 1000 fmol. Concentrations of albumin, hemoglobin, ferritin, and myoglobin were constant, and the injected amount of each of these proteins was 200 fmol.
For the second set of experiments, human plasma digest was prepared and spiked with myoglobin digest as follows. Lyophilized
human plasma was dissolved in a reducing buffer containing 6 M urea, 100 mM ammonium bicarbonate, and 30 mM DTT, pH 8.5, to a concentration of 10 mg/mL. The solution was clear, suggesting that most of the protein was dissolved in the buffer. The reduction of disulfide bonds was performed at 60 °C for 60 min. After cooling the sample solution to room temperature, cysteine residues were carboxymethylated with iodoacetic acid as follows. A 1 M stock solution of iodoacetic acid was made up fresh in 1 M NaOH. A 7-µL aliquot of the 1 M iodoacetic acid solution was added for every milligram of protein, mixed, and incubated for 30 min in the dark. The alkylation step increased the mass of cysteine residues by 58 Da. Then the sample solution was diluted 5 times with 100 mM ammonium bicarbonate and digested with trypsin by adding 1 µg of trypsin/25 µg of human plasma. Trypsin was added to the human plasma solution containing 1.2 M urea without preliminary buffer exchange. This stock solution of human plasma digest was further diluted, divided into two fractions, and spiked with different amounts of myoglobin digest, so 15-µL injected aliquots contained 3.4 (200 fmol) or 6.8 ng (400 fmol) of myoglobin digest mixed with 500 ng of human plasma digest.
LC/MS/MS. A Surveyor HPLC system (ThermoFinnigan, San Jose, CA) included an autosampler and a high-pressure pump. Eight 4-µL aliquots of a myoglobin dilution series and eight 4-µL aliquots of a cytochrome c dilution series were placed in wells of a 96-well plate with a conical bottom (Catalog No. 249946, Nalge Nunc, Naperville, IL) covered with polyester sealing tape (Catalog No. 236366, Nalge Nunc) and inserted in the autosampler maintained at 4 °C. All 16 samples were analyzed within 1 day according to the following procedure. The same sequence was repeated on three consecutive days, so every protein mixture from the dilution series of myoglobin and cytochrome c was analyzed three times.
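The spiking amounts and the urea concentration quoted in the sample preparation above can be checked with simple unit arithmetic. The Python sketch below does so; the molar mass used for horse myoglobin (about 16 951 g/mol) is an assumption not stated in the text, and the function names are hypothetical.

```python
# Assumed average molar mass of horse myoglobin (g/mol); not given in the text.
MYOGLOBIN_MW = 16951.0


def fmol_to_ng(amount_fmol, molar_mass_g_per_mol):
    """Convert an amount in fmol to a mass in ng.

    1 fmol = 1e-15 mol and 1 g = 1e9 ng, so ng = fmol * (g/mol) * 1e-6.
    """
    return amount_fmol * molar_mass_g_per_mol * 1e-6


def diluted_concentration(concentration, fold):
    """Concentration after an n-fold dilution."""
    return concentration / fold


spike_low_ng = fmol_to_ng(200, MYOGLOBIN_MW)    # about 3.4 ng
spike_high_ng = fmol_to_ng(400, MYOGLOBIN_MW)   # about 6.8 ng
urea_during_digestion = diluted_concentration(6.0, 5)  # 6 M diluted 5 times
```

Under this assumed molar mass, 200 and 400 fmol correspond to the 3.4 and 6.8 ng quoted for the plasma spikes, and the 5-fold dilution of 6 M urea gives the 1.2 M present during digestion.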
A 4-µL aliquot of sample was aspirated from the bottom of the well into the autosampler needle and then injected into a 20-µL sample loop. The rest of the loop was filled with solvent A. In the autosampler needle and in the sample loop, the 4-µL aliquot of sample was sandwiched between two 1-µL bubbles of air. This so-called “no-waste” injection routine allowed complete injection of small amounts of sample. After injection, the autosampler valve switched and sample from the loop was loaded on a polymeric reversed-phase peptide trap (peptide cap trap, Catalog No. 004/25108/32, Michrom BioResources, Inc., Auburn, CA) in front of a 75 µm i.d. × 10 cm capillary HPLC column with a 15-µm electrospray tip, which was packed with BioBasic C18 stationary phase, 5-µm particles, 300-Å pore (Catalog No. PFC7515-BI-10, New Objective, Inc., Cambridge, MA). The peptide trap was loaded for 3 min with a 10 µL/min isocratic flow of solvent A. For gradient elution, a 50 µL/min flow from the pump was split to 0.1 µL/min flow through the peptide trap and capillary column. Peptides were eluted from the column with a linear gradient of 0-60% B over 30 min (A, 0.1% formic acid in water; B, 0.1% formic acid in acetonitrile). Eluting peptides were analyzed by an LCQ DECA ion trap mass spectrometer equipped with a nanoelectrospray ion source (both ThermoFinnigan). The mass spectrometer operated in a data-dependent MS/MS mode, in which the precursor ion was selected from the previous full-scan mass spectrum. Collision-induced dissociation was performed on the selected ion only once and then its m/z value was dynamically excluded for 1 min from further fragmentation. This feature of automated analysis
Figure 1. Typical base peak ion chromatogram features a series of spikes and valleys for MS and MS/MS scans (a). Full-scan mass spectrum of digest products eluted at 33.50 min (b) is followed by mass spectrum of fragments from selected precursor ion at m/z 585.1 (c). The latter mass spectrum is dominated by b and y types of fragments, which is a typical pattern for collision-induced dissociation in an ion trap.
Figure 2. Example of reconstructed ion chromatogram for 2+ ion of cytochrome c peptide TGPNLHGLFGR (25 in Table 1). This reconstructed ion chromatogram was automatically plotted by using only intensities of mass spectral peaks with m/z 585.1 ± 0.5. Although the true cytochrome c peptide eluted as a 0.2-min-wide peak at 33.50 min, the chromatogram also features a pseudopeak at 31.66 min (see text for more details). AA, peak area automatically calculated by Xcalibur.
provided access to a large number of peptides eluting (and often coeluting) during LC/MS/MS analysis of complex mixtures. Tandem mass spectra were correlated using TurboSequest software with a database containing 4400 sequences of horse and bovine proteins downloaded from the National Center for Biotechnology Information (NCBI) Web page at http://www.ncbi.nlm.nih.gov/Database/index.html. Output files from the correlation analysis were further summarized using the unified score of the three correlation coefficients generated by the Sequest algorithm,31 Sp, Xcorr, and DeltaCn, to produce a list of identified peptides and corresponding proteins.
A more complex mixture of human plasma spiked with myoglobin digest was separated using a linear gradient of 0-40% B over 80 min. This shallower gradient generated wider chromatographic peaks but provided more time to fragment multiple coeluting peptides. During the automated data-dependent analysis, one MS scan of the ion trap was followed by three MS/MS scans. To identify the peptides and corresponding proteins initially present in the mixture, tandem mass spectra were correlated with a human subset of the Swiss-Prot database containing 7800 human proteins and horse myoglobin. The database was downloaded from the NCBI Web page using Swiss-Prot as a keyword.
(31) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
RESULTS AND DISCUSSION
A typical ion chromatogram of the five-protein digest mixture is shown in Figure 1a. In this mixture, all proteins were present at 200-fmol levels. During the LC/MS/MS analysis, a full-scan mass spectrum of eluting peptides was followed by a tandem mass spectrum, creating a series of spikes on the chromatogram. Full-scan mass spectra contributed to the top of the spikes. Whenever a single precursor peak was isolated and MS/MS was acquired,
the ion current decreased, creating a valley between two spikes. For quantitative peak area measurements, intensities of precursor ions from the full-scan mass spectra were used; i.e., peaks on the ion chromatogram were smoothed by a line drawn through the tops of the spikes (Figure 1a). All identified digest products eluted in a 7-min interval shown in the figure. Approximately 300 mass spectra, half of them MS and the other half MS/MS, were acquired during this period of time or 1.4 s/spectrum. Figure 1b is a full-scan MS of digest products that eluted at 33.50 min. Figure 1c shows the MS/MS spectrum of the precursor ion with m/z 585.1. This tandem mass spectrum was identified as a doubly charged ion of cytochrome c peptide TGPNLHGLFGR (25 in Table 1). LC/MS/MS analysis of the entire dilution series including the equimolar mixture in Figure 1 was repeated three times. A total of 34 peptides were identified as digest products for the five-protein mixture, including the following: 16 albumin, 7 hemoglobin, 1 ferritin, 3 cytochrome c, and 5 myoglobin peptides. Many of these peptides were represented by two or more charge forms. Every acquired tandem mass spectrum was correlated with the database three times by assuming that it can be produced from singly, doubly, or triply charged precursor ion. Two charge forms of cytochrome c peptide TGPNLHGLFGR shown in Figure 1b were subjected to collision-induced dissociation during the elution time of this peptide, adding extra confidence to the identification by TurboSequest. A total of 61 ions were identified as digest products for the five-protein mixture, or ∼2 ion forms/peptide. Table 1 gives sequences of identified peptides, their charge states and m/z values, coefficients of cross correlation between each experimental MS/MS spectrum and theoretical fragmentation pattern derived from the database, and names of identified proteins with their gi numbers in NCBI database. 
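The relation between a peptide sequence and the m/z values of its charge forms can be sketched as follows. This Python example uses standard monoisotopic residue masses and assumes unmodified residues (carboxymethylated cysteine is not handled); the function names are hypothetical, and the ±0.5 tolerance simply mirrors the extraction window used for the reconstructed ion chromatograms.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (N-terminal H plus C-terminal OH)
PROTON = 1.007276   # mass of one charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def mz(sequence, charge):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def matches(sequence, charge, observed_mz, tolerance=0.5):
    """True if the calculated m/z lies within the tolerance of the observed one."""
    return abs(mz(sequence, charge) - observed_mz) <= tolerance


# Charge forms of cytochrome c peptide 25 and myoglobin peptide 31 (Table 1).
cyt_c_1plus = mz("TGPNLHGLFGR", 1)
cyt_c_2plus = mz("TGPNLHGLFGR", 2)
myo_1plus = mz("ALELFR", 1)
```

The calculated values fall within 0.5 of the tabulated 1168.6, 585.1, and 748.6. Small offsets are expected because monoisotopic masses are used here, whereas the instrument's reported peak positions need not be monoisotopic.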
Table 1. Identified Peptide Numbers, Identified Peptide Sequences, m/z Values and Corresponding Charge States, TurboSequest Cross-Correlation Coefficients for Three Runs Performed in 3 Days, and Protein Names with gi Numbers from NCBI Database
no.   peptide              charge and m/z
Protein: Albumin, gi 2190337
1     ALKAWSVAR            2+ 501.0
2     EACFAVEGPK           2+ 555.0; 1+ 1108.5
3     NECFLSHKDDSPDLPK     3+ 635.3; 2+ 952.1
4     CCAADDKEACFAVEGPK    3+ 644.8; 2+ 966.2
5     HLVDEPQNLIK          2+ 653.6; 1+ 1305.6
6     YNGVFQECCQAEDK       2+ 875.6
7     YLYEIAR              2+ 464.7; 1+ 927.5
8     DDPHACYSTVFDK        3+ 519.6; 2+ 778.7
9     KVPQVSTPTLVEVSR      3+ 547.6; 2+ 820.8
10    RHPEYAVSVLLR         3+ 481.0; 2+ 720.8
11    LKPDPNTLCDEFK        3+ 526.9
12    VPQVSTPTLVEVSR       2+ 756.7
13    KQTALVELLK           2+ 572.3; 1+ 1142.5
14    LVNELTEFAK           2+ 582.6; 1+ 1163.5
15    SLHTLFGDELCK         3+ 474.7; 2+ 711.0; 1+ 1420.5
16    QTALVELLK            2+ 508.6; 1+ 1015.5
Protein: Hemoglobin A, gi 122411, and Hemoglobin B, gi 122614
17    VGGHAGEYGAEALER      3+ 505.7; 2+ 757.8
18    DFTPELQASYQK         2+ 714.1; 1+ 1426.6
19    TYFPHFDLSHGSAQVK     3+ 612.5; 2+ 917.7
20    FLSSVSTVLTSK         2+ 635.2; 1+ 1268.6
21    AAVLALWDK            2+ 494.1; 1+ 986.5
22    MFLGFPTTK            2+ 521.2; 1+ 1041.5
23    LLGNVLVVVLAR         3+ 423.1; 2+ 633.5; 1+ 1265.9
Protein: Ferritin Light Chain, gi 1169741
24    QNYSTEVEAAVNR        2+ 741.2; 1+ 1480.7
Protein: Cytochrome c, gi 117995
25    TGPNLHGLFGR          2+ 585.1; 1+ 1168.6
26    MIFAGIK              1+ 779.5
27    EDLIAYLK             2+ 483; 1+ 964.5
Protein: Myoglobin, gi 70561
28    ELGFQG               1+ 650.2
29    YKELGFQG             2+ 471.7; 1+ 941.4
30    VEADIAGHGQEVLIR      3+ 536.8; 2+ 804.3
31    ALELFR               1+ 748.6
32    HGTVVLTALGGILKK      3+ 503.4
33    HGTVVLTALGGILK       3+ 460.6; 2+ 690.3
34    GLSDGEWQQVLNVWGK     2+ 908.9
Xcorr 1, Xcorr 2, and Xcorr 3 values, in the order extracted (blank cells for ions not identified in a given run make the assignment to individual ions unrecoverable):
1.1 2.7 1.0 3.4 4.1 4.9
1.0 2.2
3.1 1.1 4.1 2.7 2.8 2.5 4.4 2.9 4.2 2.9 3.2 3.3 2.8 3.6 2.1 3.1 3.2 2.8 2.3
Hemoglobin (values printed beside m/z 714.1, 1426.6, 612.5, 917.7, 635.2, 494.1, 986.5, 521.2, 423.1, 633.5, 1265.9): 3.6 2.0 2.6 3.6 3.1 3.4 2.0 2.7 4.2 3.8 1.2
2.1 1.1
3.5 4.4 4.5 3.4 2.3 3.8 2.3 2.7 2.9 3.9 2.3 4.1 2.3 3.5 3.0 3.2 2.0 3.3 2.1 3.1 3.1 1.2
4.4 5.4 2.1 2.8 2.7 1.5 2.8 2.4 4.0 2.9 3.8 2.9 3.3 3.7 3.5 3.5 3.5 2.2 1.3 3.1
3.4 2.5 2.8 1.6 1.5 3.6 3.3 2.5
3.4 2.2 3.4 1.4 3.5 1.5 0.9 1.6
3.3
4.1
4.3 2.0
2.3
3.2 2.1 1.7 2.1 2.0
3.2 2.1 1.5 2.1 1.8
3.0 2.0 1.6 2.3 1.9
1.0 2.7 1.8 3.4 4.4
1.1 2.5 1.7 3.7 3.6 1.0 4.2 4.0 4.7
1.2 2.7 2.0 3.5 4.3 1.1 4.2 3.6 5.1
4.0 3.8 4.4 4.8
All five proteins were unambiguously identified on three different days. Only those peptides that were identified more than once were included in Table 1. The chromatographic peak area of each identified ion (total 61) was measured after each peak was reconstructed by Xcalibur software. Figure 2 is an example of such a reconstructed ion chromatogram for the 2+ ion of cytochrome c peptide TGPNLHGLFGR. This reconstructed ion chromatogram was plotted by using only intensities of mass spectral peaks with m/z 585.1 ± 0.5. Xcalibur software automatically defined peaks by using specified peak parameters (such as noise level, minimum peak width, etc.) and calculated the chromatographic area of the peak that is bounded by the intensity trace of the ion and the baseline. The automatically calculated peak area values (AA values) are shown in Figure 2. The software reported chromatographic peak areas in arbitrary units of ion intensity times seconds.
Although the true cytochrome c peptide eluted as a 0.2-min-wide peak at 33.50 min, the chromatogram also features another, unidentified peak at 31.66 min. This pseudopeak appeared on the reconstructed ion chromatogram because its m/z value of 585.4 was close (within ±0.5 Da) to the m/z value of the identified ion of cytochrome c. This pseudopeak was excluded from consideration using the following rationale. On average, the chromatographic peaks were 0.2 min wide at the base for our gradient of 0-60% B in 30 min. Therefore, only the peaks located within ±0.2 min of the time of their identification on reconstructed ion chromatograms should be taken into account. Areas of all 61 ions were measured in the close vicinity of the time when the ions were identified to avoid pseudopeaks. This important step in our algorithm allowed removing the pseudopeaks generated by species that are not the identified tryptic digest products but have similar m/z values. The same rule was applied to other identified ions. Precision of the peak area measurements was significantly improved after this step.
Figure 3. Reconstructed chromatograms for ions of myoglobin peptide ALELFR (31 in Table 1) including m/z 748.6 and albumin peptide SLHTLFGDELCK (15 in Table 1) including m/z 474.7, 711.0, and 1420.5. Eight mixtures of five-protein digests including myoglobin and albumin were prepared as follows. Albumin, hemoglobin, ferritin, and cytochrome c were constantly present in the injected 4-µL aliquots at a 200-fmol level. Injected amounts of myoglobin were 1, 5, 10, 50, 100, 200, 500, and 1000 fmol. The injected amount of myoglobin in each mixture and the measured and expected (in parentheses) myoglobin:albumin peak area ratios are presented above each peak pair.
Figure 3 contains eight reconstructed chromatograms for ions of myoglobin peptide ALELFR (31 in Table 1) with m/z 748.6 (1+) and albumin peptide SLHTLFGDELCK (15 in Table 1) with m/z 474.7 (3+), 711.0 (2+), and 1420.5 (1+). Only a small, 1-min section of each ion chromatogram was reconstructed near the elution time of 34 min, when both peaks elute. Eight mixtures of five protein digests including albumin and myoglobin were prepared as follows. Albumin, hemoglobin, ferritin, and cytochrome c were kept constant at 200 fmol in the injected 4-µL aliquots. The injected amounts of myoglobin were 1, 5, 10, 50, 100, 200, 500, and 1000 fmol. Figure 3 illustrates the linear increase of the chromatographic peak of a myoglobin peptide with increasing myoglobin concentration, relative to the albumin peptide at constant concentration.
Panels b-f of Figure 4 are reconstructed ion chromatograms of digest products for albumin (b), hemoglobin (c), ferritin (d), cytochrome c (e), and myoglobin (f). The chromatogram of albumin peptide ions in Figure 4b, for example, was plotted by using only intensities of mass spectral peaks with m/z values (±0.5 Da) for albumin ions from Table 1. The same boldface numbers labeling peaks in Figure 4 are used in Table 1.
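The pseudopeak exclusion rule described above (keep only peaks within ±0.2 min of the identification time) can be sketched in Python as follows. The retention times come from the cytochrome c example in the text; the peak areas, the `Peak` class, and the function names are hypothetical and serve only to illustrate the filtering step.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    retention_min: float  # retention time of the peak apex (min)
    area: float           # integrated peak area (arbitrary units)


def peaks_near_identification(peaks, identified_at_min, window_min=0.2):
    """Keep only peaks within +/- window_min of the MS/MS identification time."""
    return [p for p in peaks
            if abs(p.retention_min - identified_at_min) <= window_min]


def quantified_area(peaks, identified_at_min, window_min=0.2):
    """Area attributed to the identified ion after pseudopeak removal."""
    kept = peaks_near_identification(peaks, identified_at_min, window_min)
    return sum(p.area for p in kept)


# Reconstructed ion chromatogram for m/z 585.1 +/- 0.5: the identified
# cytochrome c peptide at 33.50 min and a pseudopeak at 31.66 min.
# The areas are hypothetical.
trace = [Peak(31.66, 2.5e6), Peak(33.50, 7.0e6)]

kept_peaks = peaks_near_identification(trace, identified_at_min=33.50)
ion_area = quantified_area(trace, identified_at_min=33.50)
```

Only the peak at 33.50 min contributes to `ion_area`; without the window, the pseudopeak's area would be added to the cytochrome c total.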
A base peak ion chromatogram is shown in Figure 4a for reference. The italic numbers on the chromatograms label pseudopeaks generated by other protein digests and unknown species with m/z values within ±0.5 Da of the identified digest products of a particular protein. For example, peak (20, m/z 635.2) appeared on the reconstructed ion chromatogram of albumin because m/z 635.3 for the 3+ ion of albumin peptide NECFLSHKDDSPDLPK (3) is close to m/z 635.2 for the 2+ ion of hemoglobin peptide FLSSVSTVLTSK (20). As discussed earlier, this pseudopeak (20, m/z 635.2) was excluded from the total peak area of albumin digests, and only the reconstructed ion peaks in the close vicinity of the time of peptide identification were used for quantitation. Figure 4 was saved as a layout template in Xcalibur software and applied to 24 LC/MS/MS data files of the myoglobin dilution series and then to 24 LC/MS/MS data files of the cytochrome c dilution series.
Figure 4. Base peak ion chromatogram of protein mixture, where all components are present at the 200-fmol level (a). Reconstructed ion chromatograms of digest products from bovine albumin (b); hemoglobin (c); ferritin (d); cytochrome c (e); myoglobin (f). Boldface numbers label peaks of identified peptides. Table 1 contains amino acid sequences and other information about the peptides. Italic numbers in parentheses label the pseudopeaks. The same fixed Y-scale was used for all ion chromatograms.
Figure 5 is a calibration curve for a myoglobin digest mixed with albumin, hemoglobin, ferritin, and cytochrome c. The latter four were constantly present in the eight injected mixtures at 200-fmol levels. The amounts of myoglobin (1, 5, 10, 50, 100, 200, 500, and 1000 fmol) are shown on the X-axis. The Y-axis gives the peak area of protein digests for each protein normalized to the peak area of albumin in each LC/MS/MS data file and averaged for three measurements on different days. In Figure 5, the error bars show the standard deviation (1σ) of the measurements on three different days. RSD values for myoglobin at 1 and 5 fmol were above 60%, indicating that these measurements are at the noise level. RSD for 10 fmol was 36% and fell below 15% for higher concentrations in the dilution series. So, RSD values for the majority of data points on the plot are below 20%. R² = 0.9978 for the linear regression line of myoglobin is a good indication that the relative peak area of myoglobin digests increases linearly with increasing concentrations from 10 to 1000 fmol. For protein digests present in the mixture at a constant level,
Figure 5. Calibration curve for myoglobin digest mixed with albumin, hemoglobin, ferritin, and cytochrome c, which were present in the injected 4-µL aliquot at the 200-fmol level. The X-axis shows the amount of myoglobin in each injected mixture. The Y-axis gives the peak area of identified tryptic peptides for each protein normalized to the peak area of albumin in each run. Each data point is the average of three measurements, with the error bars representing the standard deviation (1σ). The regression line for myoglobin has parameters y = 0.0025x + 0.0128, R² = 0.9978.
Figure 6. Calibration curve for cytochrome c digest mixed with albumin, hemoglobin, ferritin, and myoglobin, which were present in the injected 4-µL aliquot at the 200-fmol level. The X-axis shows the amount of cytochrome c in each injected mixture. The Y-axis gives the peak area of identified tryptic peptides for each protein normalized to the peak area of albumin in each run. Each data point is the average of three measurements, with the error bars representing the standard deviation (1σ). The regression line for cytochrome c has parameters y = 0.0006x - 0.0094, R² = 0.9934.
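The regression lines quoted in the captions of Figures 5 and 6 can be used in either direction: to predict the albumin-normalized peak area for a given injected amount, or to estimate an amount from a measured normalized area. The Python sketch below illustrates this under stated assumptions. The function names are hypothetical, the triplicate values are invented purely to show the RSD calculation, and the use of the sample standard deviation (n - 1) is an assumption, since the text specifies only "standard deviation (1σ)".

```python
from statistics import mean, stdev

# Slope (normalized area per fmol) and intercept quoted in the captions of
# Figures 5 and 6; y is the protein peak area normalized to albumin.
CALIBRATION = {
    "myoglobin": (0.0025, 0.0128),
    "cytochrome c": (0.0006, -0.0094),
}


def predicted_relative_area(protein, fmol):
    """Normalized peak area predicted by the regression line."""
    slope, intercept = CALIBRATION[protein]
    return slope * fmol + intercept


def estimated_fmol(protein, relative_area):
    """Invert the regression line to estimate the injected amount."""
    slope, intercept = CALIBRATION[protein]
    return (relative_area - intercept) / slope


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD, n - 1; an assumption)."""
    return 100.0 * stdev(values) / mean(values)


y_200 = predicted_relative_area("myoglobin", 200)
x_back = estimated_fmol("myoglobin", y_200)

# Hypothetical triplicate of normalized areas, NOT data from this study.
triplicate = [0.50, 0.52, 0.48]
triplicate_rsd = rsd_percent(triplicate)
print(y_200, x_back, triplicate_rsd)
```

Because the 1- and 5-fmol points were at the noise level, such an inversion would only be meaningful within the 10 to 1000 fmol range over which linearity was demonstrated.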
reproducibility was also measured for eight injections within each day and was better than 20% RSD. Although it is reasonable to suggest that the charge state would increase with decreasing analyte concentration in solution, we have not observed this effect for the tryptic peptides analyzed in the course of this study. To minimize the random fluctuations in the charge-state distribution, the peak areas for all identified charge states of a given protein were summed together. The same set of 24 LC/MS/MS analyses and calculations was repeated for the five-protein mixture. This time, the amount of cytochrome c was equal to 1, 5, 10, 50, 100, 200, 500, and 1000 fmol in the eight protein mixtures, while albumin, hemoglobin, ferritin, and myoglobin digests were present at a constant level of 200 fmol in every injected aliquot. The series of eight LC/MS/
Figure 7. Base peak ion chromatograms of 500-ng human plasma digests spiked with 200 fmol (= 3.4 ng, a) and 400 fmol (= 6.8 ng, b) of myoglobin digest.
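The mass equivalents given in the Figure 7 caption follow from mass = amount × molar mass. The short Python sketch below reproduces that arithmetic. The molar mass of 17,000 g/mol is an assumption back-calculated from the caption itself (3.4 ng per 200 fmol) and is not stated in this section. The function name and the mass-percent calculation are likewise illustrative additions.

```python
def fmol_to_ng(amount_fmol, molar_mass_g_per_mol):
    """mass [ng] = amount [fmol] x 1e-15 mol/fmol x M [g/mol] x 1e9 ng/g."""
    return amount_fmol * 1e-15 * molar_mass_g_per_mol * 1e9


# Assumed molar mass of myoglobin, back-calculated from 3.4 ng per 200 fmol.
MYOGLOBIN_MOLAR_MASS = 17_000.0   # g/mol
PLASMA_DIGEST_NG = 500.0          # plasma digest per injection, from the caption

spike_ng = {fmol: fmol_to_ng(fmol, MYOGLOBIN_MOLAR_MASS) for fmol in (200, 400)}

# Spike expressed as a percentage of the plasma digest mass.
spike_mass_percent = {fmol: 100.0 * ng / PLASMA_DIGEST_NG
                      for fmol, ng in spike_ng.items()}
print(spike_ng, spike_mass_percent)
```

Under these assumptions the myoglobin spike corresponds to roughly 1% or less of the injected plasma digest by mass, which gives a sense of the dynamic range involved.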
Figure 8. Superposition of base peak ion chromatograms of human plasma digest samples spiked with 200 and 400 fmol of myoglobin digest (a). Superposition of reconstructed ion chromatograms of myoglobin peptides for the samples containing 200 and 400 fmol of myoglobin in human plasma (b). Superposition of reconstructed ion chromatograms of apolipoprotein A-I peptides for the same samples (c). Reconstructed ion chromatograms were plotted by Xcalibur software using m/z values of multiply charged peptides from myoglobin and apolipoprotein A-I shown in Table 2. Unlabeled peaks are the pseudopeaks.
MS analyses was repeated three times on different days. Figure 6 gives a calibration curve for cytochrome c. Every data point in the plot is an average of three measurements. As in the myoglobin series, the RSD for the cytochrome c data points at 1 and 5 fmol was very high, indicating that these amounts could not be measured reproducibly. The data point at 10 fmol has 33% RSD, and reproducibility then improved to below 20% RSD. The linear regression line for the cytochrome c calibration curve had R² = 0.9934. The proposed quantitation method was applied to a more complex mixture of human plasma digest spiked with two different amounts of myoglobin digest. Myoglobin played the role of an
Table 2. Identified Peptide Sequences for Human Plasma Apolipoprotein A-I and Horse Myoglobin, Charge States of the Identified Peptides, Their Experimental m/z Values, and Calculated Peak Areas for These Two Protein Digests in Samples “200” and “400” (a)

Myoglobin Tryptic Peptides in Human Plasma Digest
no.  peptide           charge  m/z “200”  m/z “400”
1    HPGDFGADAQGAMTK   2       752.3      752.2
1    HPGDFGADAQGAMTK   3       501.9      502.2
2    LFTGHPETLEK       3       425.1      425.1
2    LFTGHPETLEK       2       636.7      636.6
2    LFTGHPETLEK       1       1271.5     1271.5
3    VEADIAGHGQEVLIR   3       536.7      536.8
3    VEADIAGHGQEVLIR   2       804.4      804.7
3    VEADIAGHGQEVLIR   1       1606.5     1606.6
4    HGTVVLTALGGILK    2       690.4      690.3
4    HGTVVLTALGGILK    1       1378.4     1378.5
4    HGTVVLTALGGILK    3       460.7      460.6

Peak areas of myoglobin peptides
no.                                                                         area “200”   area “400”
1                                                                           5.78 × 10⁸   8.55 × 10⁸
2                                                                           1.22 × 10⁹   2.58 × 10⁹
3                                                                           8.23 × 10⁸   1.25 × 10⁹
4                                                                           8.62 × 10⁸   1.78 × 10⁹
total peak area of myoglobin peptides                                       3.48 × 10⁹   6.46 × 10⁹
myoglobin to apolipoprotein A-I peak area ratio                             1.15         2.23
myoglobin to apolipoprotein A-I peak area ratio normalized to “200”         1            1.93

Apolipoprotein A-I Tryptic Peptides in Human Plasma Digest
no.  peptide           charge  m/z “200”  m/z “400”
5    ATEHLSTLSEK       2       608.8      608.9
5    ATEHLSTLSEK       3       406.6      406.7
5    ATEHLSTLSEK       1       1215.5     1215.5
6    AKPALEDLR         2       507.4      507.4
7    THLAPYSDELR       3       435.3      434.9
7    THLAPYSDELR       1       1301.4     1301.5
8    VQPYLDDFQK        1       1252.5     1252.5
9    DYVSQFEGSALGK     2       701        701.2
9    DYVSQFEGSALGK     1       1402.5     1400.5
10   LLDNWDSVTSTFSK    2       807.6      807.8
11   QGLLPVLESFK       2       616.3      616.2
11   QGLLPVLESFK       1       1230.7     1230.5
12   VSFLSALEEYTK      2       693.9      694.3

Peak areas of apolipoprotein A-I peptides (entries in the order listed; peaks 5-7, 11, and 12 are blank)
no.                                                                         area “200”   area “400”
8                                                                           1.94 × 10⁸   2.02 × 10⁸
9                                                                           1.10 × 10⁸   1.33 × 10⁸
9                                                                           2.41 × 10⁹   2.22 × 10⁹
10                                                                          3.27 × 10⁸   3.33 × 10⁸
total peak area of apolipoprotein A-I peptides                              3.04 × 10⁹   2.89 × 10⁹

(a) Sample “200” and sample “400” contain the same human plasma digest but spiked with different amounts of myoglobin digest, 200 and 400 fmol, respectively. Peaks 5-7, 11, and 12 could not be defined by the software, so the cells in the table were left blank.
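The summary rows of Table 2 can be rechecked from the individual area entries, as in the following Python sketch. The area values are copied from the table; the variable names are hypothetical. The second part compares the normalized ratios reported in the accompanying text for the two analyses of each sample with the values expected from the spiked amounts (1 and 2). Treating the quoted error as the relative deviation from those expected values is an assumption about how it was computed. Because the table entries are rounded, a recomputed fold change may differ slightly in the last digit from the tabulated 1.93.

```python
# Peak-area entries from Table 2 (arbitrary units), in the order listed.
myoglobin_areas = {
    "200": [5.78e8, 1.22e9, 8.23e8, 8.62e8],
    "400": [8.55e8, 2.58e9, 1.25e9, 1.78e9],
}
apo_a1_areas = {
    "200": [1.94e8, 1.10e8, 2.41e9, 3.27e8],
    "400": [2.02e8, 1.33e8, 2.22e9, 3.33e8],
}

# Total reconstructed peak area per protein and sample.
myo_total = {s: sum(a) for s, a in myoglobin_areas.items()}
apo_total = {s: sum(a) for s, a in apo_a1_areas.items()}

# Myoglobin area normalized to the internal standard (apolipoprotein A-I).
ratio = {s: myo_total[s] / apo_total[s] for s in ("200", "400")}
fold_change = ratio["400"] / ratio["200"]

# Normalized ratios quoted in the text for the two analyses of each sample,
# compared with the values expected from the spiked amounts (assumed 1 and 2).
reported = {"200": [1.00, 0.99], "400": [1.93, 2.21]}
expected = {"200": 1.0, "400": 2.0}
percent_error = {
    s: [100.0 * abs(r - expected[s]) / expected[s] for r in runs]
    for s, runs in reported.items()
}
worst_error = max(max(errs) for errs in percent_error.values())
print(ratio, fold_change, worst_error)
```

The internal-standard total changes little between the two samples, so almost all of the change in the ratio comes from the myoglobin peptides.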
unknown protein of interest, whose concentration varies in human plasma. For this study, we chose to digest the myoglobin and the plasma separately and then combine the digested materials, because this approach was simpler and should not differ significantly from spiking the myoglobin in the form of intact protein. Figure 7 shows base peak ion chromatograms of a 500-ng human plasma digest spiked with 200 fmol (= 3.4 ng, a) and 400 fmol (= 6.8 ng, b) of myoglobin digest. These samples were designated “200” and “400”, respectively. A TurboSequest search against the protein database identified horse myoglobin and approximately 30 human plasma proteins including albumin, serotransferrin, apolipoprotein A-I, α-1 antitrypsin, IgG chains, fibrinogen chains, macroglobulin, complement 3, hemopexin, haptoglobin, prealbumin, and others. Apolipoprotein A-I was chosen as an internal standard protein. Table 2 contains identified peptide sequences for apolipoprotein A-I and myoglobin, charge states of the identified peptides, their experimental m/z values, and calculated peak areas for these two protein digests in sample “200” and sample “400”. Figure 8 shows a superposition of base peak ion chromatograms of human plasma digest spiked with 200 and 400 fmol of myoglobin digest (a), a superposition of reconstructed ion chromatograms of myoglobin peptides for these samples (b), and a superposition of reconstructed ion chromatograms of apolipoprotein A-I peptides for the same samples (c). The reconstructed ion chromatograms were plotted by Xcalibur software using m/z values of multiply charged peptides from myoglobin and apolipoprotein A-I shown in Table 2. Unlabeled peaks in the figure are the pseudopeaks, which are not due to the myoglobin or apolipoprotein A-I tryptic peptides shown in Table 2 but have similar m/z values (within ±0.5 Da). Only the ion peaks of identified peptides reconstructed close to the time of their identifying MS/MS scan (within ±1 min) were taken into consideration. In Table 2, the total peak area of apolipoprotein A-I peptides remains approximately constant for the two samples at 3.04 × 10⁹ arbitrary peak area units and 2.89 × 10⁹ units. The total peak area of myoglobin peptides increases about 2 times, from 3.48 × 10⁹ units for sample “200” to 6.46 × 10⁹ units for sample “400”. After normalizing all four peak area values to the peak area of apolipoprotein A-I in sample “200”, the myoglobin-to-apolipoprotein A-I ratios were 1.00 and 1.93 for the samples containing 200 and 400 fmol of myoglobin digest, respectively. The LC/MS/MS analysis of the same samples was repeated and produced myoglobin-to-apolipoprotein A-I ratios of 0.99 and 2.21 for the two samples. Thus, after four runs, the myoglobin-to-apolipoprotein A-I total peak area ratios were 1.00 and 0.99 for the sample containing 200 fmol of myoglobin and 1.93 and 2.21 for the sample containing the doubled amount of myoglobin.
The results presented here indicate that the method described may be useful for the relative quantitation of proteins in relatively simple mixtures such as 2D gel spots and also in much more complex samples such as a digest of total human plasma protein. The technique as illustrated may run into spectral congestion
problems when complex mixtures are analyzed, and low-abundance components may be difficult to discern and quantitate. Multidimensional chromatography (LC/LC/MS/MS) would in principle be the next logical step in improving the analytical capabilities of our method, because it produces fewer coeluting peaks and fewer pseudopeaks.
CONCLUSIONS
We investigated a new approach for relative quantitation of individual proteins within complex mixtures using LC/MS/MS analysis of unfractionated digests (shotgun analysis). The method was evaluated on a mixture of five standard proteins. Four proteins were maintained at a constant concentration while the concentration of the fifth protein was varied over a wide range. Peak areas of protein digests were normalized to the peak area of the albumin digest, which was one of the constant proteins. The entire procedure was repeated three times. The peak areas of the four protein digests always present in the mixture at the 200-fmol level remained constant to within 20% RSD over the three measurements. The relative peak area of the fifth protein increased linearly with increasing amount from 10 to 1000 fmol, with linear regression R² values of 0.9978 and 0.9934 in the two experiments. The method was also tested on two samples of human plasma digest spiked with 200 and 400 fmol of horse myoglobin digest. The experiments confirmed that, when the myoglobin amount in human plasma increased two times from 200 to 400 fmol, the total peak area of the myoglobin digest also increased two times (within 11% error) relative to a standard, “housekeeping” protein from human plasma. To illustrate the method, we used only one “unknown” protein, horse myoglobin, and one internal standard “housekeeping” protein, apolipoprotein A-I in human plasma. The described method of protein identification and relative quantitation provides good accuracy and linearity of the calibration curve. Expensive reagents and multiple chemical reaction steps are not needed to create an internal standard, as in ICAT or similar methods. The potential of the method for proteomics applications will be further evaluated.
ACKNOWLEDGMENT
We thank Fernando Maroto, Andrew Guzzetta, Trevor Hall, Antonio Piccolboni, and Chris Becker for fruitful discussions. We thank Tom McCall for developing and installing the “no-waste” injection option on the autosampler and for other improvements in HPLC.
Received for review April 11, 2002. Revised manuscript received June 14, 2002. Accepted July 8, 2002. AC0256991
Analytical Chemistry, Vol. 74, No. 18, September 15, 2002