Anal. Chem. 2003, 75, 5054-5061

Reproducibility of Quantitative Proteomic Analyses of Complex Biological Mixtures by Multidimensional Protein Identification Technology

Michael P. Washburn,*,†,‡ Ryan R. Ulaszek,†,§ and John R. Yates, III†,§,|

Proteomics, Torrey Mesa Research Institute, 3115 Merryfield Row, San Diego, California 92121

If quantitative proteomic technologies are to be of widespread use to the biological community, the reproducibility of each method must be investigated and determined. We have analyzed the reproducibility of complex quantitative proteomic analyses of metabolically labeled S. cerevisiae by multidimensional protein identification technology (MudPIT). Three independent cell growths of S. cerevisiae grown in rich and minimal media and independent MudPIT analyses of each were compared and contrasted. Quantitative MudPIT was found to be intra- and interexperimentally reproducible at both the peptide and protein levels. Proteins of potential low abundance were detected, identified, and quantified by identical peptides from three independent samples. In addition, when multiple peptides were matched to a protein, the relative abundance of each peptide was in agreement across the three samples. Despite the reproducibility, errors in the experimental determination of protein expression levels occurred, but the impact of the variation was minimized by replicate experiments. Last, quantitative MudPIT analyses will likely be improved by increasing the number of peptide hits per protein in a given analysis, which will provide for greater intraexperimental reproducibility.

In the past few years, a wide variety of quantitative proteomic methods have been described in the literature.1 The goal of every quantitative proteomic method is to modify a protein or peptide in some fashion in order to differentiate the mass of a protein or peptide from two independent cell growths and determine the protein expression changes brought about by a stimulus. There are effectively four different classes of quantitative proteomic analyses, depending on the point in an experimental protocol at which the proteins or peptides from two independent samples are differentially labeled.2 In order of earliest introduction of “heavy” and “light” labels to pursue a quantitative proteomic analysis, the four classes are metabolic labeling, postgrowth amino acid labeling, digestion labeling, and postdigestion labeling.2 In metabolic labeling, the cells used in an experimental protocol are grown in two media in which one medium contains a naturally abundant isotope of an atom, typically 14N, and the other medium contains a heavy isotope of an atom, typically 15N.3,4 Metabolic labeling with nitrogen has been demonstrated as a quantitative proteomic system in Saccharomyces cerevisiae,2-5 Deinococcus radiodurans,6,7 and mammalian tissue culture systems.6,8 In another form of metabolic labeling, cells are grown in media containing one or more individual amino acids that contain “heavy” isotopes or “light” isotopes.9-11 Single or multiple amino acid metabolic labeling has been demonstrated in S. cerevisiae9,10 and tissue culture systems.11

In every quantitative proteomic method described above, chromatographic separations are a key aspect of the analyses. Typically, the initial description of a quantitative proteomic system describes a single-dimensional analysis of a labeled sample, but as with any complex mixture, truly large-scale quantitative proteomic analyses require multidimensional separations. Multidimensional quantitative proteomic analyses have been described using ICAT12,13 and metabolic labeling via multidimensional protein identification technology (MudPIT).2,5

* To whom correspondence should be addressed. E-mail: [email protected].
† Torrey Mesa Research Institute.
‡ Current address: Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110.
§ Current address: Diversa Corp., 4955 Directors Place, San Diego, CA 92121.
| Current address: Department of Cell Biology, The Scripps Research Institute, 10550 North Torrey Pines Rd., La Jolla, CA 92037.

(1) Hamdan, M.; Righetti, P. G. Mass Spectrom. Rev. 2002, 21, 287-302.
(2) Washburn, M. P.; Ulaszek, R.; Deciu, C.; Schieltz, D. M.; Yates, J. R., III. Anal. Chem. 2002, 74, 1650-1657.
(3) Oda, Y.; Huang, K.; Cross, F. R.; Cowburn, D.; Chait, B. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6591-6596.
(4) Lahm, H. W.; Langen, H. Electrophoresis 2000, 21, 2105-2114.
(5) Washburn, M. P.; Koller, A.; Oshiro, G.; Ulaszek, R.; Plouffe, D.; Deciu, C.; Winzeler, E. A.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 3107-3112.
(6) Conrads, T. P.; Alving, K.; Veenstra, T. D.; Belov, M. E.; Anderson, G. A.; Anderson, D. J.; Lipton, M. S.; Pasa-Tolic, L.; Udseth, H. R.; Chrisler, W. B.; Thrall, B. D.; Smith, R. D. Anal. Chem. 2001, 73, 2132-2139.
(7) Lipton, M. S.; Pasa-Tolic, L.; Anderson, G. A.; Anderson, D. J.; Auberry, D. L.; Battista, J. R.; Daly, M. J.; Fredrickson, J.; Hixson, K. K.; Kostandarithes, H.; Masselon, C.; Markillie, L. M.; Moore, R. J.; Romine, M. F.; Shen, Y.; Stritmatter, E.; Tolic, N.; Udseth, H. R.; Venkateswaran, A.; Wong, K. K.; Zhao, R.; Smith, R. D. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11049-11054.
(8) Pasa-Tolic, L.; Harkewicz, R.; Anderson, G. A.; Tolic, N.; Shen, Y.; Zhao, R.; Thrall, B.; Masselon, C.; Smith, R. D. J. Am. Soc. Mass Spectrom. 2002, 13, 954-963.
(9) Berger, S. J.; Lee, S. W.; Anderson, G. A.; Pasa-Tolic, L.; Tolic, N.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2002, 74, 4994-5000.
(10) Jiang, H.; English, A. M. J. Proteome Res. 2002, 1, 345-350.
(11) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Mol. Cell Proteomics 2002, 1, 376-386.
(12) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(13) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Nat. Biotechnol. 2001, 19, 946-951.

10.1021/ac034120b CCC: $25.00

5054 Analytical Chemistry, Vol. 75, No. 19, October 1, 2003

© 2003 American Chemical Society Published on Web 08/19/2003

MudPIT was first developed by Link et al. to comprehensively analyze the S. cerevisiae ribosome.14 A series of systems improvements15 led to the ability of MudPIT to carry out large-scale qualitative analyses of the S. cerevisiae,16 Oryza sativa,17 and Plasmodium falciparum18 proteomes. Furthermore, the ability of MudPIT to rapidly analyze a metabolically labeled quantitative proteomic sample has been demonstrated.2

A major shortcoming of each quantitative proteomic method described so far has been the lack of detailed analyses of the reproducibility, and the limits of reproducibility, of each system. For the most part, each quantitative proteomic system yet described was carried out on a single cellular growth, and therefore no interexperimental reproducibility was reported. Knowledge of the limits of reproducibility of a system is essential, since follow-up biochemical and molecular biological analyses will often be based on a large-scale quantitative proteomic analysis. Last, if quantitative proteomic analyses are ever to reach the capabilities of oligonucleotide or cDNA array analyses, the reproducibility and limits of reproducibility of each potential system must be defined and understood. In the current analysis, we set out to determine the reproducibility of a quantitative proteomic analysis as carried out by MudPIT and to propose a strategy by which to improve quantitative MudPIT analyses.

EXPERIMENTAL PROCEDURES

Materials. Standard chemical reagents and ammonium acetate, Na2HPO4, KH2PO4, and Na2CO3 were obtained from Sigma (St. Louis, MO). Ammonium-15N sulfate (99 atom %) and ammonium-14N sulfate (99.99 atom %) were products of Aldrich (Milwaukee, WI). DIFCO bactopeptone, dextrose, yeast extract, and yeast nitrogen base without amino acids or ammonium sulfate were acquired from Becton Dickinson Microbiology Systems (Sparks, MD).
HPLC chemicals including glacial acetic acid, HPLC grade acetonitrile (ACN), and HPLC grade methanol were purchased from Fisher Scientific (Fair Lawn, NJ). Heptafluorobutyric acid (HFBA) was obtained from Pierce (Rockford, IL).

Labeling and Growth of S. cerevisiae. Overnight cultures in identical media of the S. cerevisiae strain S288C were prepared in YEPD (10 g of Bacto yeast extract, 20 g of Bacto peptone, and 20 g of dextrose per liter) or 15N minimal media (1.7 g of yeast nitrogen base without amino acids and ammonium sulfate, 20 g of dextrose, and 5 g of either ammonium sulfate per liter) at 30 °C. On the following day, the overnight cultures were diluted 1:200 in fresh media and were grown at 30 °C to mid log phase (OD 0.6) followed by centrifugation at 1000g. The cell pellets were washed three times with 1× phosphate-buffered saline (1.4 mM NaCl, 0.27 mM KCl, 1 mM Na2HPO4, 0.18 mM KH2PO4, pH 7.4) followed by additional centrifugation steps at 1000g. Three independent cell growths in each medium were carried out, and each of the independent cell growths was subjected to independent quantitative MudPIT analyses as described below.

(14) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.
(15) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III. Anal. Chem. 2001, 73, 5683-5690.
(16) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-247.
(17) Koller, A.; Washburn, M. P.; Lange, B. M.; Andon, N. L.; Deciu, C.; Haynes, P. A.; Hays, L.; Schieltz, D.; Ulaszek, R.; Wei, J.; Wolters, D.; Yates, J. R., III. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11969-11974.
(18) Florens, L.; Washburn, M. P.; Raine, J. D.; Anthony, R. M.; Grainger, M.; Haynes, J. D.; Moch, J. K.; Muster, N.; Sacci, J. B.; Tabb, D. L.; Witney, A. A.; Wolters, D.; Wu, Y.; Gardner, M. J.; Holder, A. A.; Sinden, R. E.; Yates, J. R.; Carucci, D. J. Nature 2002, 419, 520-526.

Preparation of Soluble S. cerevisiae Proteome Samples for Quantitative MudPIT Analysis. As described previously, the soluble portion of the proteome of S. cerevisiae was prepared by sodium carbonate extraction.2 The microBCA protein assay (Pierce, Rockford, IL) was used to determine the protein content of the supernatant from each of the three cell lysates. Equal protein amounts from the supernatant of the lysis of cells grown in 15N minimal media or YPD media were then mixed. After mixing, each of the samples was brought to 8 M urea, the pH was adjusted to pH 8.5, and then each sample was prepared for MudPIT analysis as described previously.2

Quantitative Multidimensional Protein Identification Technology Analysis. Each sample was subjected to analysis by MudPIT as described previously.2 The MudPIT systems consisted of a Finnigan DECA ion trap mass spectrometer (Finnigan MAT, San Jose, CA) that was interfaced with a quaternary HP 1100 series HPLC pump (Agilent Technologies, Palo Alto, CA). The biphasic microcapillary columns used consisted of fritless capillary fused-silica microcolumns (100 µm i.d. × 365 µm o.d.) prepared with a P-2000 laser puller (Sutter Instruments Co., Novato, CA), packed with reversed-phase (5-µm Zorbax Eclipse XDB-C18, Agilent Technologies) and strong cation exchange (5-µm Partisphere, Whatman, Clifton, NJ) packing materials.
The interfacing of the biphasic column with the mass spectrometer and HPLC pump was as described previously.16,19 A fully automated 13-cycle chromatographic run was carried out on each sample using four buffer solutions: 5% ACN/0.012% HFBA/0.5% acetic acid, 80% ACN/0.012% HFBA/0.5% acetic acid, 250 mM ammonium acetate/5% ACN/0.012% HFBA/0.5% acetic acid, and 500 mM ammonium acetate/5% ACN/0.012% HFBA/0.5% acetic acid.2

With the resulting data set collected from the mass spectrometer after the completion of the chromatographic analysis, the SEQUEST algorithm20 was used to interpret the tandem mass spectra generated as described previously.2 SEQUEST was run twice on each of the three data sets against the yeast_orfs.fasta database from the National Center for Biotechnology Information: once to detect and identify peptides in the sample from cells grown in YEPD media and once to identify peptides in the sample from cells grown in 15N minimal media. Two separate SEQUEST parameters files were prepared in order to carry out this task, one with the masses of each amino acid set to reflect the amino acid containing all 14N and the other with the masses set to reflect each amino acid containing all 15N.

(19) Gatlin, C. L.; Kleemann, G. R.; Hays, L. G.; Link, A. J.; Yates, J. R., III. Anal. Biochem. 1998, 263, 93-101.
(20) Eng, J.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

Previously described selection criteria for filtering the SEQUEST results based on the charge state of the peptide, the ∆Cn value of the SEQUEST result, the tryptic nature of the peptide, and the Xcorr value of the SEQUEST result were applied to both the 14N and 15N results files.14-16 Briefly, the following ∆Cns/Xcorrs were used: fully tryptic +1 peptides 0.1/1.9, fully or partially tryptic +2


Table 1. Intra-/Interexperimental Reproducibility of Relative Abundance of Peptide Hits

loci (name)   CAI (a)
YER004w   0.12
YDL141w (Bpl1p)   0.14
YPR004c   0.15
YNL216w (Rap1p)   0.15
YKL073w (Lhs1p)   0.15
YLR292c (Sec72p)   0.15
YBR248c (His7)   0.16
YAL062w (Gdh3p)   0.16
YCL040w (Glk1p)   0.16
YJR148w (Bat2p)   0.21
YPR191w (Qcr2p)   0.23
YPR127w   0.24
YGR234W (Yhb1p)   0.28
YDL185w (Vma1p)   0.31
YMR120c (ADE17p)   0.32
YKL182w (Fas1p)   0.37
YOR310c (Nop5p)   0.40
YLR303w (Met17p)   0.46
YPL061w (Ald6p)   0.51
YER091c (Met6p)   0.66
YBR196c (Pgi1p)   0.69
YLR075w (Rpl10p)   0.80

Relative abundance (YPD/15N-Min) (c); peptide rows follow the locus order above, with up to three peptides per locus.

peptide (b)   expt 1   expt 2   expt 3
K.HIIILRPGPLLGER.T   0.28   0.14   0.39
Y.YAVSTVNVKVLQTEPWM.S   8.38   10.13   15.86
K.LLSPLADVLHAAIGATR.A   0.37   0.12   0.29
K.IATTDLFLPLFFHFGSTR.Q   0.79   0.78   1.14
R.FPQNTLLHLKPLLGK.S   1.21   1.13   0.81
K.HLEALQDLDFLLGTGLIQPDVFVR.K   0.42   0.49   0.26
R.LILPGVGNYGHFVDNLFNR.G   0.37   0.34   0.26
R.HIGKDTDVPAGDIGVGGR.E   0.14   0.02   0.18
K.LSTNPGFHLFEK.R   1.09   0.46   0.39
R.NILVDLHSQGLLLQQYR.S   0.32   0.34   0.37
R.IC*LPTFDPEELITLIGK.L   4.94   6.43   3.18
K.ATFLKDDLPYYVNALADVLYK.T   0.27   0.21   0.12
K.TAFKPHELTESVLPAAR.Y   0.38   0.36   0.18
S.VKNSVSAIGGYIDIFEVAR.I   1.16   1.9   1.08
K.KMYEEALWPGWKPFDITAK.E   17.1   9.9   10.72
K.IIVHTDTEPLINAAFLK.E   6.06   8.07   5.09
K.FRPAPAAAFAR.E   0.47   0.32   0.39
K.FYDSNYPEFPVLR.D   0.21   0.26   0.49
R.NNLNTENPLWDAIVGLGFLK.D   0.69   0.64   0.16
K.HVSPAGAAVGIPLSDVEK.Q   0.28   0.36   0.29
K.NLTEQAIIDLTVATIAIK.Y   0.29   0.32   0.19
K.TLHPAVHGGILAR.D   0.35   0.45   0.37
K.ATHILDFGPGGASGLGVLTHR.N   0.77   0.76   0.89
R.HFASYANLPGTITHGMFSSASVR.A   0.74   0.79   1.17
R.FLPVASPFHSHLLVPASDLINK.D   0.86   1.06   0.88
K.AIAPNLTQLVGELVGAR.L   1.08   1.01   1.28
R.DLGPLMNPFASFLLLQGVETLSLR.A   0.13   0.14   0.13
K.HGSQLFGLEVPGYVYSR.F   0.07   0.12   0.11
K.WIGGHGTTIGGIIVDSGK.F   0.20   0.12   0.2
K.GYFIRPTVFYDVNEDMR.I   0.59   1.05   0.99
R.IYVQEGIYDELLAAFK.A   1.15   1.02   1.28
C.ILKPAAVTPLNALYFASLC*K.K   1.20   1.8   1.08
K.DSLDLEPLSLLEQLLPLYTEILSK.L   0.15   0.09   0.06
N.HIGLGLFDIHSPR.I   0.09   0.11   0.11
K.TQAMQLALALRDEVNDLEAAGIK.V   0.13   0.08   0.05
K.VFSGNRPTTSILAQK.I   0.70   0.64   0.91
I.PSDFILAAQSHNPIENK.L   0.73   0.63   0.95
K.VVDPETTLFLIASK.T   1.00   0.75   0.74
K.ATVDEFPLC*VHLVSNELEQLSSEALEAAR.I   3.32   2.64   2.45
K.KWGFTNLDRPEYLK.K   1.84   3.68   2.48
R.VDIGQIIFSVR.T   2.50   3.41   2.06

a CAI is an abbreviation for Codon Adaptation Index.
b The peptide identified by SEQUEST and quantified in the current analysis is bracketed by periods. The amino acid residues outside of the periods are the flanking residues at the cleavage sites that generated a given peptide. An * next to a cysteine residue indicates modification by iodoacetamide.
c The ratios provided under the experiment 1-3 columns are the relative abundance ratios of the peptides shown, determined from three separate experiments. The ratio was calculated by dividing the elution peak intensity of the peptide from cells grown in rich (YPD) media (14N peptides) by the elution peak intensity of the peptide from cells grown in 15N minimal (Min) media.

peptides 0.1/2.2, +2 peptides regardless of tryptic nature 0.1/3.0, and fully or partially tryptic +3 peptides 0.1/3.75. The relative abundances of peptides detected and identified in each MudPIT analysis were computationally determined using the raw chromatographic data from the mass spectrometer as described previously.2

RESULTS AND DISCUSSION

Three independent cultures of S. cerevisiae in YEPD and 15N minimal media were grown and analyzed via quantitative MudPIT. The biological results of this experiment and the correlation of the quantitative proteomic data set to oligonucleotide array analysis have been described.5 However, a key aspect of the data set was not discussed and analyzed in that paper. The interexperimental protein reproducibility is partially presented in Table 3 of the report,5 but no discussion of the peptide level reproducibility was presented. The current paper demonstrates the details of the reproducibility of quantitative MudPIT.
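The ∆Cn/Xcorr selection criteria listed in Experimental Procedures can be sketched as a small rule table. This is only an illustrative reading of the quoted thresholds, not the authors' software: the `SequestHit` record, its field names, and the choice of inclusive (>=) comparisons are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequestHit:
    """One SEQUEST peptide match (hypothetical record, not a SEQUEST format)."""
    peptide: str       # flanked sequence, e.g. "K.LSTNPGFHLFEK.R"
    charge: int        # precursor charge state
    xcorr: float       # SEQUEST cross-correlation score
    delta_cn: float    # SEQUEST delta-Cn score
    tryptic_ends: int  # 2 = fully tryptic, 1 = partially tryptic, 0 = neither


# (charge, minimum tryptic ends, minimum delta-Cn, minimum Xcorr),
# transcribed from the thresholds quoted in the text.
FILTER_RULES = (
    (1, 2, 0.1, 1.9),   # fully tryptic +1
    (2, 1, 0.1, 2.2),   # fully or partially tryptic +2
    (2, 0, 0.1, 3.0),   # +2 regardless of tryptic nature
    (3, 1, 0.1, 3.75),  # fully or partially tryptic +3
)


def passes_filter(hit: SequestHit) -> bool:
    """Return True if the hit satisfies at least one rule.

    Assumption: thresholds are inclusive; the text does not say.
    """
    return any(
        hit.charge == charge
        and hit.tryptic_ends >= min_ends
        and hit.delta_cn >= min_dcn
        and hit.xcorr >= min_xcorr
        for charge, min_ends, min_dcn, min_xcorr in FILTER_RULES
    )


if __name__ == "__main__":
    # Hypothetical scores, chosen only to exercise the rules.
    example = SequestHit("K.LSTNPGFHLFEK.R", charge=2, xcorr=2.5,
                         delta_cn=0.2, tryptic_ends=2)
    print(passes_filter(example))
```

Under this reading, the same filter would be applied separately to the 14N and 15N results files, since each SEQUEST run uses its own amino acid masses.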

There are effectively two types of reproducibility that should be considered in a quantitative proteomic analysis: intraexperimental and interexperimental reproducibility. In intraexperimental reproducibility, a quantitative proteomic method like MudPIT would detect, identify, and quantify multiple peptides from the same protein, and each peptide would yield a similar ratio in a single MudPIT analysis. In interexperimental reproducibility, a quantitative proteomic method like MudPIT would detect, identify, and quantify either the same peptide from the same protein in independent MudPIT analyses of independently generated samples or different peptides from the same protein in independent MudPIT analyses of independently generated samples and generate similar relative abundance ratios in either case. For quantitative proteomic methods to provide valuable data, intra- and interexperimental reproducibility must be achieved for proteins of both high and low abundance. In S. cerevisiae, the codon adaptation index (CAI) value is a positive predictor of mRNA abundance21 and is widely used as a predictor of protein abundance. On the basis of the identifications of proteins with varying CAI values, MudPIT has been demonstrated to detect, identify, and quantify2,5 proteins of high and low abundance, but the peptide level of reproducibility has yet to be analyzed.

In the experimental system described in the current article, both intra- and interexperimental reproducibilities were achieved for a large number of proteins in three separate experiments run on three independently grown and generated S. cerevisiae samples. Table 1 provides the peptide expression ratios of peptides detected, identified, and quantified in all three analyses for a variety of proteins ranging from a CAI of 0.123 for YER004w to a CAI of 0.797 for YLR075w (Rpl10p). The maximum number of peptides listed for any given protein is three, even though additional peptides were detected and identified for many loci, especially loci with high CAI values (data not shown). Several patterns emerge from this table. To begin, the reproducibility is not limited to fully tryptic peptides. For example, the partially tryptic peptide N.HIGLGLFDIHSPR.I from YER091c (Met6p, CAI 0.656) was reproducibly detected, identified, and quantified in three independent analyses. Figure 1 details the MS elution profiles of this peptide from experiments 1-3 in Table 1. The elution profile of the peptide in experiment 1 from Table 1 is shown in Figure 1A, experiment 2 in Figure 1B, and experiment 3 in Figure 1C. In each case, the 15N peak is clearly more abundant than the 14N peak. An MS/MS of the 14N peptide was not generated in any of the three experiments, but the ratios for the peptide were 0.09, 0.11, and 0.11, respectively. Several additional pieces of evidence validate the SEQUEST analysis of this peptide. First, the number of nitrogen atoms implied by the 14N and 15N forms of the peptide agrees with the interpretation. Furthermore, two additional peptides to Met6p are listed in Table 1, and over the three experiments, the relative abundance of each peptide was ∼0.1, as with that of N.HIGLGLFDIHSPR.I. Finally, in the three separate analyses, YER091c was detected, identified, and quantified by up to 30 unique peptides with an average protein expression ratio of 0.07 ± 0.017 (Table 3 from ref 5). All of these pieces of data support the quantitative analysis of this particular peptide.

The peptide N.HIGLGLFDIHSPR.I from YER091c is an example of a common case where the protein is highly overexpressed in one condition versus the other and only the 14N or 15N peptide is identified. As described previously,2 the software analyzing the relative abundances of quantitative proteomics data sets from MudPIT was designed to interpret these instances where only one of the 14N/15N peptides was detected and identified by SEQUEST. In these cases, the SEQUEST identification is used to calculate the mass of the missing partner of the peptide pair from the nitrogen content of the peptide. Then, the MS/MS of the 14N or 15N peptide actually identified by SEQUEST is used as a pointer to find the missing partner in the MS file at the peak abundance of the identified peptide. For proteins that are overexpressed by severalfold in either medium, it is commonly the case that only the highly expressed version of the peptide is detected and identified, but the relative abundance is still accurately determined based on the information content of the MS scans.

(21) Coghlan, A.; Wolfe, K. H. Yeast 2000, 16, 1131-1145.
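The calculation of the missing partner's mass from the nitrogen content of an identified peptide can be illustrated with a short sketch. It is not the authors' software: the function names are hypothetical, complete 15N incorporation is assumed (the labeled reagent is listed at 99 atom %), and the +2 charge used in the example is an assumption inferred from the m/z values reported for this peptide.

```python
# Nitrogen atoms per amino acid residue (backbone nitrogen included).
NITROGENS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1,
    "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
    "T": 1, "W": 2, "Y": 1, "V": 1,
}
N15_MINUS_N14 = 15.000109 - 14.003074  # isotope mass difference in Da


def core_sequence(flanked: str) -> str:
    """Strip flanking residues: 'N.HIGLGLFDIHSPR.I' -> 'HIGLGLFDIHSPR'."""
    parts = flanked.split(".")
    return parts[1] if len(parts) == 3 else flanked


def nitrogen_count(flanked: str) -> int:
    """Count nitrogen atoms in the peptide between the periods."""
    total = 0
    for symbol in core_sequence(flanked):
        if symbol == "*":
            total += 1  # iodoacetamide (carbamidomethyl) adds one nitrogen
        else:
            total += NITROGENS[symbol]
    return total


def partner_mz(observed_mz: float, flanked: str, charge: int,
               observed_label: str) -> float:
    """Predict the m/z of the unidentified partner of a 14N/15N pair."""
    shift = nitrogen_count(flanked) * N15_MINUS_N14 / charge
    if observed_label == "15N":
        return observed_mz - shift
    if observed_label == "14N":
        return observed_mz + shift
    raise ValueError("observed_label must be '14N' or '15N'")


if __name__ == "__main__":
    peptide = "N.HIGLGLFDIHSPR.I"
    print(nitrogen_count(peptide))  # 20 nitrogen atoms
    # 15N form reported at m/z 741.7; charge +2 is an assumption.
    print(round(partner_mz(741.7, peptide, 2, "15N"), 2))
```

For N.HIGLGLFDIHSPR.I the sketch predicts a separation of about 10 m/z units at +2, which is in line with the 14N and 15N peaks reported near m/z 731 and 741 for this peptide.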

Figure 1. Interexperimental reproducibility of independent MudPIT analyses of Met6p. The methionine biosynthetic pathway component Met6p was overexpressed in minimal media when compared to rich media by 10× and has a CAI of 0.66. In each independent sample analyzed, multiple peptides with this ratio were detected, identified, and quantified. The replicate analyses of the partially tryptic peptide N.HIGLGLFDIHSPR.I from Met6p are shown. A focused MS profile is shown for this peptide in (A-C). In each case the 15N-enriched peptide from minimal media at m/z 741.7 (A), 741.9 (B), and 741.8 (C) is more abundant than the 14N peptide from rich media seen at m/z 731.3 (A), 731.7 (B), and 732.7 (C). The MS/MS of only the 15N peptide from minimal media was detected and identified by SEQUEST in each of the three runs with Xcorrs/∆Cns of 4.59/0.54 (A), 4.17/0.51 (B), and 3.78/0.47 (C), giving complete confidence in each identification (data not shown).

In addition, Table 1 demonstrates the intra- and interexperimental reproducibility of peptide quantification from proteins with low CAI values. Typically, proteins with low CAI values are detected and identified by single peptide hits in a MudPIT analysis.

Figure 2. Interexperimental reproducibility of independent MudPIT analyses of YPR004cp. The locus YPR004c is part of a mitochondrial complex that may function as an oxidoreductase and was overexpressed in minimal media when compared to rich media by 4× and has a CAI of 0.15. In each independent sample analyzed, the peptide K.LLSPLADVLHAAIGATR.A was detected, identified, and quantified, and the replicate analyses of this peptide are shown. (A-C) The MS elution profile of the peptide from one run is shown with the 15N-enriched peptide at m/z 870.6 and the 14N peptide at m/z 859.7 (A). In this run, the MS/MS of both the 15N (B) and 14N (C) peptides were detected and identified by SEQUEST, and a portion of the y-ion series is shown. (D-F) The MS elution profile of the peptide from a second run is shown with the 15N-enriched peptide at m/z 869.2 and the 14N peptide at m/z 860.4 (D). In this run, the MS/MS of both the 15N (E) and 14N (F) peptides were also detected and identified by SEQUEST, and a portion of the y-ion series is shown. (G, H) The MS elution profile of the peptide from a third run is shown with the 15N-enriched peptide at m/z 870.6 and the 14N peptide at m/z 859.4 (G). In this instance, only the 15N peptide generated an MS/MS of sufficient quality to be interpreted by SEQUEST, and a portion of the y-ion series is shown (H).

In the current body of work, the same peptide was detected and quantified for certain proteins with low CAI values as seen in Table 1. An example of this is the detection, identification, and quantification of the peptide K.LLSPLADVLHAAIGATR.A from the protein YPR004c. This protein was overexpressed in minimal media by a factor of 3.8. The interexperimental reproducibility of this peptide is shown in Figure 2. In two of the independent runs, an MS/MS was generated that gave SEQUEST identifications of both the 14N and 15N versions of the peptide (Figure 2A-F). In the third run, only the 15N version of the peptide generated an MS/MS of sufficient quality to be detected and identified (Figure 2G,H). However, in all three cases, the relative abundance determination was similar, with ratios of expression in rich versus minimal media of 0.37, 0.29, and 0.12, respectively.

There is additional useful information in a quantitative proteomic data set generated by partial interexperimental reproducibility of peptide detection, identification, and quantification that is shown in Table 2. There were several different types of partial interexperimental reproducibility in the data set. For example, peptides to certain proteins were only detected, identified, and quantified in two of the three runs. This case especially manifested itself for proteins with low CAI values such as YOR080w (CAI = 0.12) and YNL236w (CAI = 0.13) (Table 2). In other cases, as for YPR081c, YOR061w, and YNL037c, different peptides with relatively similar ratios were detected, identified, and quantified in all three experiments, but no single peptide was detected in all three experiments (Table 2). Even in these cases of partial interexperimental reproducibility, MS/MS of both 14N and 15N peptides were commonly generated and identified by SEQUEST. For example, the partial interexperimental reproducibility of YOR061w (Cka2p) is shown in Figure 3. Cka2p is the catalytic subunit of casein kinase II and has a CAI of 0.17. The peptide

Table 2. Partial Reproducibility of Relative Abundance of Peptide Hits

Columns: loci (name), CAI (a), peptide (b), relative abundance (YPD/15N-Min) (c) for expt 1, expt 2, and expt 3.

loci (name)   CAI (a)
YOR080w (Dia2p)   0.12
YNL236w (Sin4p)   0.13
YJL132w   0.13
YNL208w   0.14
YPR081c (Grs2p)   0.14
YCR057c (Pwp2p)   0.17
YOR061w (Cka2p)   0.17
YNL014w (Hef3p)   0.18
YNR043w (Mvd1p)   0.21
YOR317w (Faa1p)   0.25
YMR297w (Prc1p)   0.26
YOL123w (Hrp1p)   0.26
YOR184w (Ser1p)   0.26
YNL037c (Idp1p)   0.28
YPR145w (Asn1p)   0.37
YPL048w (Cam1p)   0.41

peptide (b), in table order
K.LDLMGTSISGSALTR.L
K.FKNIIASPLSAGFNYGK.L
R.ANDLILNHWLGGQSNLVAML.Q
K.LSGVLGAIGGAFLANK.I
M.VENETLGYFMARVHQ.F
R.EFLMAEIEHFVDPLNK.S
Q.VDGPMLTPYDVLKTSGHVDKFTDWMC*R.N
F.AIESQHIEFNLIWIK.A
K.LAVPEVVDLIDNLLR.Y
R.LIDWGLAEFYHPGVDYNVR.V
R.WVPMMSVDNAWLPR.G
K.LHIVSENNFPTAAGLASSAAGFAALVSAIAK.L
K.TKPVGIIVPNHAPLTK.L
K.IYQSAHDAINR.I
K.GQDFHIAGESYAGHYIPVFASEILSHK.D
R.VFNGGHMVPFDVPENALSMVNEWIHGGFSL.
R.GFGFLSFEKPSSVDEVVK.T
K.IFVGGIGPDVRPK.E
K.RLHVPAEVIFNAK.D
K.NISGASDETLHELGVPITPIAFDYPTVVK.N
K.GLWHTPADQTGHGSLNVALR.K
M.ILSSTLMLNHLGLNEYATR.I
K.TRIPDIDLIVIR.E
K.AFDTTGEPDAKPYLPEEILWR.Q
R.LMAEVPYGVLLSGGLDSSLIASIAAR.E
K.HPLELLGK.S
K.YSNEDTRPVALPWFWEHYNPEEYSLWK.V

Relative abundance values in extraction order (empty cells were lost, so row and column positions are not recoverable): 10.67; 13.91 0.26 0.05; 0.29 0.51; 0.09 0.12 0.12; 0.22 0.19 0.04 1.02 1.10 1.12 0.42 0.07; 0.10 1.15 1.19 1.49 0.53 0.12 0.13; 0.47 0.83; 0.41 0.90 0.94 0.31; 0.47; 0.16 0.79 0.91 1.49 4.50 0.85 0.80; 6.69 6.95 0.79 0.82

a CAI is an abbreviation for Codon Adaptation Index.
b The peptide identified by SEQUEST and quantified in the current analysis is bracketed by periods. The amino acid residues outside of the periods are the flanking residues at the cleavage sites that generated a given peptide. An * next to a cysteine residue indicates modification by iodoacetamide.
c The ratios provided under the experiment 1-3 columns are the relative abundance ratios of the peptides shown, determined from three separate experiments. The ratio was calculated by dividing the elution peak intensity of the peptide from cells grown in rich (YPD) media (14N peptides) by the elution peak intensity of the peptide from cells grown in 15N minimal (Min) media.

K.LAVPEVVDLIDNLLR.Y was detected, identified, and quantified in experiment 1 (Figure 3A-C) and experiment 2 (Figure 3G-I) but not experiment 3. In both experiment 1 (Figure 3A) and experiment 2 (Figure 3G), the relative abundance of this peptide in S. cerevisiae cultured in rich versus minimal media was unchanged, and MS/MS of both the 14N and 15N versions of the peptides were generated and identified by SEQUEST (Figure 3B,C and H,I). The peptide R.LIDWGLAEFYHPGVDYNVR.V was also detected, identified, and quantified in experiment 1 (Figure 3D-F) and experiment 3 (Figure 3J-L) but not experiment 2. Again, the relative abundance of this peptide in S. cerevisiae cultured in rich versus minimal media was unchanged (Figure 3D and J), and MS/MS of both the 14N and 15N versions of the peptides were generated and identified by SEQUEST (Figure 3E,F and K,L).

Errors are generated in any experiment, and protein abundance will vary biologically. Experimental errors, such as limits on the accuracy and reproducibility of the protein assay, will also occur. Analyses of mRNA expression have demonstrated the biological variation that will occur naturally in any system.22,23 Having more than one data point will be essential in elucidating all of the biological results brought on by a stimulus. In the current analysis, and in the biological reporting of the analysis,5 only proteins for which at least one peptide was detected, identified, and quantified in at least two of the three independent experiments were reported. The importance of reproducibility is shown in both Tables 1 and 2. For example, in Table 1, the reproducibility of two unique peptides from each of the three analyses of the locus YCL040w is shown. The peptide K.LSTNPGFHLFEK.R from the locus YCL040w had ratios of 1.09, 0.46, and 0.39, while the peptide R.NILVDLHSQGLLLQQYR.S had ratios of 0.32, 0.34, and 0.37. Clearly, the 1.09 ratio for the first peptide was an erroneous quantitative determination, but the additional data from additional experiments and additional peptides demonstrated that this locus was 3-fold overexpressed in minimal media. Other loci such as YAL062w and YGR234w had large variations in peptide expression ratios within and between analyses (Table 1). In these cases, the ratios demonstrated ∼10-fold change in either medium, which may be pushing the limits of analysis in this system. In general, in Table 3 of Washburn et al., the standard deviation as a percentage of the average increases as the average exceeds 10-fold changes in protein expression levels.5 In each of the cases described, the impact of single erroneous quantitative peptide determinations was minimized by repetitive analyses. In Table 3 of Washburn et al., many of the loci with standard deviations in excess of 30% were detected and identified by single peptide hits in only two of the three independent quantitative MudPIT analyses of the three independent cell growths.5

(22) Elowitz, M. B.; Levine, A. J.; Siggia, E. D.; Swain, P. S. Science 2002, 297, 1183-1186.
(23) Fedoroff, N.; Fontana, W. Science 2002, 297, 1129-1131.

The protein digestion protocol used in these quantitative proteomic experiments was directed toward generating tryptic fragments of proteins. As has been seen with previous MudPIT analyses,

Figure 3. Variable peptide interexperimental reproducibility of Cka2p. The catalytic subunit of casein kinase II (YOR061p, Cka2p) was evenly expressed in both rich and minimal media and has a CAI of 0.17. In each independent sample, a different set of peptides was detected and identified. In the one sample, the peptides K.LAVPEVVDLIDNLLR.Y (A-C) and R.LIDWGLAEFYHPGVDYNVR.V (D-F) were detected, identified, and quantified. The MS elution profile of the peptide K.LAVPEVVDLIDNLLR.Y is shown in (A) with the 14N peptide at m/z 840.4 and the 15Nenriched peptide at m/z 850.0. The SEQUEST interpreted MS/MS of the 14N peptide (B) and the 15N peptide (C) is shown with partial y-ion series indicated. The MS elution profile for the peptide R.LIDWGLAEFYHPGVDYNVR.V from the same MudPIT analysis as shown in (A-C) is shown in (D) with the 14N peptide at m/z 1133.3 and the 15N peptide at m/z 1146.1. The SEQUEST interpreted MS/MS of the 14N peptide (E) and the 15N peptide (F) is shown with partial y-ion series indicated. In a second MudPIT analysis, only the peptide K.LAVPEVVDLIDNLLR.Y was detected, identified, and quantified, and the MS elution profile of this peptide is shown in (G) with the 14N peptide at m/z 840.5 and the 15N peptide at m/z 849.9. The SEQUEST interpreted MS/MS of the 14N peptide (H) and the 15N peptide (I) is shown with a partial y-ion series. In the third MudPIT analysis, only the peptide R.LIDWGLAEFYHPGVDYNVR.V was detected, identified, and quantified, and the MS elution profile of this peptide is shown in (J) with the 14N peptide at m/z 1132.6 and the 15N peptide at m/z 1146.1. The SEQUEST interpreted MS/MS of the 14N peptide (K) and the 15N peptide (L) is shown with the partial y-ion series indicated.

typically 50% of the proteins detected and identified in a MudPIT analysis are detected and identified by single peptide hits. This was generally the case in the experiments described in this report. 5060 Analytical Chemistry, Vol. 75, No. 19, October 1, 2003
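The spacing between the 14N and 15N m/z values reported in the Figure 3 caption can be sanity-checked from the peptide sequences alone. The sketch below counts the nitrogen atoms in each peptide and estimates the expected m/z shift. It assumes doubly charged ions and complete 15N incorporation, neither of which is stated in the caption, and the function names are illustrative rather than part of any software used in the study.

```python
# Expected 14N -> 15N m/z shift for a metabolically labeled peptide.
# ASSUMPTIONS (not stated in the text): charge state 2+ and 100% 15N incorporation.

# Nitrogen atoms per amino acid residue (one backbone N plus side-chain N).
NITROGENS = {
    "A": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "I": 1, "L": 1,
    "M": 1, "P": 1, "S": 1, "T": 1, "V": 1, "Y": 1,
    "K": 2, "N": 2, "Q": 2, "W": 2,
    "H": 3,
    "R": 4,
}

# Mass difference between the 15N and 14N isotopes, in daltons.
DELTA_15N = 15.000109 - 14.003074


def core_sequence(peptide):
    """Strip flanking residues from SEQUEST-style notation, e.g. K.PEPTIDE.R."""
    parts = peptide.split(".")
    return parts[1] if len(parts) == 3 else peptide


def count_nitrogens(peptide):
    """Total number of nitrogen atoms in the peptide."""
    return sum(NITROGENS[residue] for residue in core_sequence(peptide))


def mz_shift(peptide, charge=2):
    """Expected m/z difference between the 15N and 14N forms of the peptide."""
    return count_nitrogens(peptide) * DELTA_15N / charge


if __name__ == "__main__":
    for pep in ("K.LAVPEVVDLIDNLLR.Y", "R.LIDWGLAEFYHPGVDYNVR.V"):
        print(pep, count_nitrogens(pep), round(mz_shift(pep), 2))
```

Under these assumptions the two Cka2p peptides contain 19 and 26 nitrogen atoms, giving expected shifts of about 9.5 and 13.0 m/z units. The differences reported in the caption (9.4 to 9.6 and 12.8 to 13.5) are consistent with that estimate.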

Since more peptide hits per protein allow for greater intra- and interexperimental reproducibility, the focus of future quantitative proteomic analyses via MudPIT and other methods should be to maximize the number of peptide hits per protein. Protein digestion methods that obtain multiple peptide hits per protein and yield high sequence coverage have been described.24,25 These approaches will likely improve quantitative proteomic analyses via MudPIT by yielding multiple hits per protein from each experiment and therefore greater intraexperimental reproducibility. Within a single experiment, when multiple peptides are detected, identified, and quantified, the expression ratios of the peptides from any given protein should ideally be identical in the absence of any experimental error. To date this has not been achieved for any system, and it is likely an impossible goal. In the limited instances in the literature where the intraexperimental reproducibility of additional quantitative proteomic methods has been analyzed, variation at the peptide level has been seen. For example, using the MCAT quantitative labeling strategy, Cagney and Emili reported up to 2-fold variations of unique peptides from the same protein (Table 2 in ref 26). For the ICAT reagent,12 the differential resolution of the original d0- and d8-labeled reagents yielded sizable errors in relative abundance determinations of peptides.27 However, these errors could be minimized by altering the ICAT strategy from d0 and d8 reagents to 12C and 13C coded isoforms.28,29 This series of articles details improvements to the ICAT strategy and should serve as an analytical model for systematically determining sources of error in quantitative proteomic strategies and proposing solutions.

(24) Gatlin, C. L.; Eng, J. K.; Cross, S. T.; Detter, J. C.; Yates, J. R., 3rd. Anal. Chem. 2000, 72, 757-763.
(25) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., 3rd. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900-7905.
(26) Cagney, G.; Emili, A. Nat. Biotechnol. 2002, 20, 163-170.
(27) Zhang, R.; Sioma, C. S.; Wang, S.; Regnier, F. E. Anal. Chem. 2001, 73, 5142-5149.
(28) Zhang, R.; Regnier, F. E. J. Proteome Res. 2002, 1, 139-147.
(29) Zhang, R.; Sioma, C. S.; Thompson, R. A.; Xiong, L.; Regnier, F. E. Anal. Chem. 2002, 74, 3662-3669.
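As a worked illustration of the peptide-level variation discussed in this section, the YCL040w ratios quoted from Table 1 can be summarized as a standard deviation expressed as a percentage of the average. This is a minimal sketch rather than the authors' procedure: the use of the sample standard deviation and of the median as a robust pooled estimate are assumptions made here, and the helper name `percent_sd` is hypothetical.

```python
from statistics import mean, median, stdev

# Peptide ratios for locus YCL040w in the three experiments, as quoted in the text.
ratios = {
    "K.LSTNPGFHLFEK.R": [1.09, 0.46, 0.39],
    "R.NILVDLHSQGLLLQQYR.S": [0.32, 0.34, 0.37],
}


def percent_sd(values):
    """Sample standard deviation as a percentage of the mean (an assumed convention)."""
    return 100.0 * stdev(values) / mean(values)


# Per-peptide average ratio and relative spread across the three experiments.
summary = {pep: (mean(vals), percent_sd(vals)) for pep, vals in ratios.items()}

# Pool all six measurements; the median is less sensitive to the 1.09 outlier
# than the mean (choosing the median is an illustration, not the source method).
pooled = [r for vals in ratios.values() for r in vals]
pooled_mean = mean(pooled)
pooled_median = median(pooled)

if __name__ == "__main__":
    for pep, (avg, pct) in summary.items():
        print(f"{pep}: mean ratio {avg:.2f}, SD {pct:.0f}% of mean")
    print(f"pooled mean {pooled_mean:.3f}, pooled median {pooled_median:.2f}")
```

With these numbers the first peptide has a spread of roughly 60% of its mean because of the single 1.09 value, while the second peptide varies by about 7%. The mean ratio of the second peptide, about 0.34, corresponds to the roughly 3-fold change reported for this locus in the text.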

CONCLUSIONS
We have carried out quantitative MudPIT analyses of S. cerevisiae grown in rich and minimal media. In the current paper, the reproducibility at the peptide level of the independent cell growths and quantitative MudPIT analyses has been described and the limitations have been analyzed. Quantitative MudPIT analyses indeed provide biological insight into the changes in protein expression brought about by a cellular stimulus. However, the system can be improved: generating multiple peptide hits per protein through methods designed for high sequence coverage24,25 will likely provide greater intraexperimental reproducibility of analyses. Determining the reproducibility of each system will be essential to its evaluation as a potential quantitative proteomic method. In addition, methodological improvements arise from systematic analysis of sources of error, as demonstrated by the shift from d0 and d8 ICAT reagents to the 12C and 13C ICAT reagents now available from Applied Biosystems (Foster City, CA). Ideally, when multiple peptides per protein are detected, identified, and quantified in a single experiment, there should be little variation in the peptide expression ratios. However, in sample-to-sample analyses of independent cellular growths, there will likely be variation in protein expression ratios for the same loci. Stochasticity or “noise” in biological systems has been elegantly detailed at the mRNA level.22,23 If the patterns seen at the mRNA level persist at the protein level, then strategies to account for and interpret the impact of variation on a quantitative proteomics analysis will be needed.

Received for review February 6, 2003. Accepted July 17, 2003. AC034120B

Analytical Chemistry, Vol. 75, No. 19, October 1, 2003
