Anal. Chem. 2009, 81, 6317–6326
Effect of Dynamic Exclusion Duration on Spectral Count Based Quantitative Proteomics

Ying Zhang, Zhihui Wen, Michael P. Washburn, and Laurence Florens*

Stowers Institute for Medical Research, 1000 East 50th Street, Kansas City, Missouri 64110

Dynamic exclusion (DE) is a widely used tool for increasing proteome coverage. When DE is enabled, more proteins can be identified, although the total spectral counts will decrease. To investigate the effects of DE duration on spectral-counting based quantitative proteomics, we analyzed the same sample via multidimensional protein identification technology while enabling different DE durations (15, 60, 90, 300, 600 s) or turning DE off. Normalized spectral abundance factors (NSAFs) measured for abundant proteins varied little with or without DE, while enabling DE led to higher peptide counts, higher NSAFs, and better reproducibility of detection for proteins of relatively lower abundance. The optimal DE duration, which generated the maximum number of peptides, proteins, and peptides per protein, was observed to be 90 s in our settings. We developed a mathematical model for analyzing the effects of DE duration on peptide spectral counts. We found that the optimal DE duration depends on the average chromatographic peak width at the base of eluting peptides and mass spectrometry parameters, leading us to calculate an optimized DE duration of 97.9 s, in excellent agreement with our observations. In this study, we provide a systematic approach for the optimization of spectral counts for improved quantitative proteomics analysis.

Two primary sources of data are used for label-free quantitative proteomics. One source is derived from the MS scans of a mass spectrometer (reviewed in ref 1). As a peptide elutes from a column into a mass spectrometer, the intensity of the ion is measured at certain time intervals and a chromatogram can be generated from each eluting peptide.
With the use of standard HPLC approaches,2 the peak area of an eluting peptide can be used to measure the abundance of a peptide from liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) or LC/LC-MS/MS platforms. However, since in a typical proteomics analysis one MS scan is taken followed by several MS/MS scans, there will be gaps in the chromatogram. As a result, extracted or reconstructed ion chromatograms are commonly generated to determine the abundance of a peptide.1,3 The use of the information in MS scans for quantitative proteomics analysis is widespread and several computational approaches are available for this purpose.3-6

The other source of label-free quantitative information is from the MS/MS scans of a mass spectrometer and is termed spectral counting.7-14 In spectral counting, the abundance of a protein is measured as the total number of tandem mass spectra matching its peptides. Spectral counting is an increasingly implemented quantitative proteomics strategy and is primarily used for label-free quantitative proteomic analysis.15-17 In an important emerging area, spectral counting is also the basis for the application and development of statistical tools for evaluating quantitative proteomics data sets.13,18-20

* To whom correspondence should be addressed. Laurence Florens, Ph.D.; Stowers Institute for Medical Research; 1000 E. 50th Street; Kansas City, MO 64110. E-mail: [email protected]. Phone: (816) 926-4458. Fax: (816) 926-4694.
10.1021/ac9004887 CCC: $40.75. © 2009 American Chemical Society. Published on Web 07/08/2009.

(1) Listgarten, J.; Emili, A. Mol. Cell. Proteomics 2005, 4, 419–434.
(2) Snyder, L. R.; Kirkland, J. J.; Glajch, J. L. Practical HPLC Method Development, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, 1997.
(3) Mueller, L. N.; Brusniak, M. Y.; Mani, D. R.; Aebersold, R. J. Proteome Res. 2008, 7, 51–61.
(4) Li, X. J.; Yi, E. C.; Kemp, C. J.; Zhang, H.; Aebersold, R. Mol. Cell. Proteomics 2005, 4, 1328–1340.
(5) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., III Anal. Chem. 2003, 75, 6912–6921.
(6) Park, S. K.; Venable, J. D.; Xu, T.; Yates, J. R., III Nat. Methods 2008, 5, 319–322.
(7) Allet, N.; Barrillat, N.; Baussant, T.; Boiteau, C.; Botti, P.; Bougueleret, L.; Budin, N.; Canet, D.; Carraud, S.; Chiappe, D.; Christmann, N.; Colinge, J.; Cusin, I.; Dafflon, N.; Depresle, B.; Fasso, I.; Frauchiger, P.; Gaertner, H.; Gleizes, A.; Gonzalez-Couto, E.; Jeandenans, C.; Karmime, A.; Kowall, T.; Lagache, S.; Mahe, E.; Masselot, A.; Mattou, H.; Moniatte, M.; Niknejad, A.; Paolini, M.; Perret, F.; Pinaud, N.; Ranno, F.; Raimondi, S.; Reffas, S.; Regamey, P. O.; Rey, P. A.; Rodriguez-Tome, P.; Rose, K.; Rossellat, G.; Saudrais, C.; Schmidt, C.; Villain, M.; Zwahlen, C. Proteomics 2004, 4, 2333–2351.
(8) Fu, X.; Gharib, S. A.; Green, P. S.; Aitken, M. L.; Frazer, D. A.; Park, D. R.; Vaisar, T.; Heinecke, J. W. J. Proteome Res. 2008, 7, 845–854.
(9) Gao, J.; Opiteck, G. J.; Friedrichs, M. S.; Dongre, A. R.; Hefta, S. A. J. Proteome Res. 2003, 2, 643–649.
(10) Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Mol. Cell. Proteomics 2005, 4, 1265–1272.
(11) Liu, H.; Sadygov, R. G.; Yates, J. R., III Anal. Chem. 2004, 76, 4193–4201.
(12) Lu, P.; Vogel, C.; Wang, R.; Yao, X.; Marcotte, E. M. Nat. Biotechnol. 2007, 25, 117–124.
(13) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G. Mol. Cell. Proteomics 2005, 4, 1487–1502.
(14) Zybailov, B.; Coleman, M. K.; Florens, L.; Washburn, M. P. Anal. Chem. 2005, 77, 6218–6224.
(15) Blondeau, F.; Ritter, B.; Allaire, P. D.; Wasiak, S.; Girard, M.; Hussain, N. K.; Angers, A.; Legendre-Guillemin, V.; Roy, L.; Boismenu, D.; Kearney, R. E.; Bell, A. W.; Bergeron, J. J.; McPherson, P. S. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 3833–3838.
(16) Paoletti, A. C.; Parmely, T. J.; Tomomori-Sato, C.; Sato, S.; Zhu, D.; Conaway, R. C.; Conaway, J. W.; Florens, L.; Washburn, M. P. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 18928–18933.
(17) Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. J. Proteome Res. 2006, 5, 2339–2347.
(18) Choi, H.; Fermin, D.; Nesvizhskii, A. I. Mol. Cell. Proteomics 2008, 7, 2373–2385.
(19) Pavelka, N.; Fournier, M. L.; Swanson, S. K.; Pelizzola, M.; Ricciardi-Castagnoli, P.; Florens, L.; Washburn, M. P. Mol. Cell. Proteomics 2008, 7, 631–644.
Analytical Chemistry, Vol. 81, No. 15, August 1, 2009
Studies have compared the use of spectral counting and peak area for the generation of quantitative proteomics data, and in general, these studies have found the methods to correlate well with each other.13,14,21 However, the correlation has been shown to be poor when few spectral counts are found for a protein.13,21 The fact that fewer spectral counts generate less reliable quantitative data can be dealt with statistically: a statistical test has been shown to be more conservative with proteins with fewer spectral counts than with proteins with higher spectral counts.19 However, if fewer spectral counts lead to less reliable quantitative data, are there systematic methods for increasing the number of spectral counts in a proteomics analysis?

The major parameters that can affect the spectral counts of a protein are the data-dependent acquisition settings of a tandem mass spectrometry analysis. One of these parameters, dynamic exclusion (DE), is an important method for complex mixture analysis.22,23 The complexity of a proteome-scale sample can overwhelm the separation efficiency of the HPLC. When coelution occurs, the mass spectrometer will pick the n most abundant ions (five in our setting) for MS/MS scans. However, there are often more than five coeluting peptides and the weaker ions easily lose the chance to be fragmented. To increase proteome coverage, data-dependent acquisition is a widely used technology in LC-MS/MS. When DE is enabled, the user can generate a list of m/z values on which the instrument will not repeat MS/MS scans of the same precursor ion more than a user-defined number of times (two in our setting) during a particular time, otherwise called DE duration. Once a precursor ion mass is placed on the DE list, the instrument will move on to other, usually less abundant, ions in order to generate additional MS/MS scans of new precursor ions.
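The selection behavior described above can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions, not the instrument vendor's logic: the class `DynamicExclusion`, the helper `run`, the seven ions, their constant intensities, and the 3 s cycle time are all hypothetical, and precursors are matched by exact m/z value rather than within a mass tolerance.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Toy bookkeeping for dynamic exclusion (not the vendor's implementation)."""
    repeat_count: int = 2             # MS/MS events allowed before exclusion
    repeat_duration: float = 30.0     # s, window in which repeats are counted
    exclusion_duration: float = 90.0  # s, time an m/z stays on the list
    list_size: int = 500              # maximum number of excluded m/z values
    recent: dict = field(default_factory=dict)    # m/z -> recent MS/MS times
    excluded: dict = field(default_factory=dict)  # m/z -> exclusion expiry time

    def pick(self, ions, now, top_n=5):
        """Choose up to top_n precursors from one MS scan taken at time `now`."""
        # Release m/z values whose exclusion has expired.
        self.excluded = {mz: end for mz, end in self.excluded.items() if end > now}
        chosen = []
        for mz, _intensity in sorted(ions, key=lambda ion: ion[1], reverse=True):
            if len(chosen) == top_n:
                break
            if mz in self.excluded:
                continue  # skip and move on to a less abundant ion
            chosen.append(mz)
            # Keep only the events that fall inside the repeat window.
            times = [t for t in self.recent.get(mz, []) if now - t <= self.repeat_duration]
            times.append(now)
            self.recent[mz] = times
            if len(times) >= self.repeat_count and len(self.excluded) < self.list_size:
                self.excluded[mz] = now + self.exclusion_duration
                del self.recent[mz]
        return chosen


def run(de_enabled, n_scans=4, cycle_time=3.0, top_n=5):
    """Return (number of MS/MS events, number of distinct precursors fragmented)."""
    # Seven hypothetical co-eluting ions with constant intensities.
    ions = [(400.0 + 10 * i, 1000.0 - 100 * i) for i in range(7)]
    de = DynamicExclusion()
    events, distinct = 0, set()
    for scan in range(n_scans):
        now = scan * cycle_time
        if de_enabled:
            chosen = de.pick(ions, now, top_n)
        else:
            ranked = sorted(ions, key=lambda ion: ion[1], reverse=True)
            chosen = [mz for mz, _ in ranked[:top_n]]
        events += len(chosen)
        distinct.update(chosen)
    return events, len(distinct)


print(run(de_enabled=False))  # (20, 5): only the five most intense ions, repeatedly
print(run(de_enabled=True))   # (14, 7): fewer MS/MS events, but all seven ions sampled
```

In this toy run, the DE-off schedule only ever fragments the five most intense ions, whereas the DE-enabled schedule reaches all seven ions with fewer MS/MS events in total.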
As a result, both high-abundance and low-abundance coeluting ions might have a chance to undergo fragmentation. Obviously, more unique peptides can be identified when DE is enabled. On the other hand, when DE is not enabled, the instrument will only select the most abundant ions for repeated MS/MS generation. Therefore, without DE the spectra from high-abundance proteins dominate the total identified spectra, and the number of identified peptides will rely solely on the separation capabilities of the liquid chromatography system used. DE parameters can therefore have a significant impact on the peptides, proteins, and spectral counts detected and identified in any given proteomics analysis.

In the current study, we set out to determine the optimal DE duration for a multidimensional protein identification technology (MudPIT) analysis24,25 in order to maximize the number of spectral counts while simultaneously maximizing the total number of peptides, proteins, and peptides per protein. A large extract of soluble proteins from Saccharomyces cerevisiae was generated, and four technical replicate analyses were carried out at five different DE durations or with DE turned off, for a total of 24 independent MudPIT analyses. We found that the optimal DE duration was directly related to the average chromatographic peak width at the base of eluting peptides, and we provided a mathematical approach for optimizing DE duration on a quantitative proteomics platform.

(20) Zhang, B.; VerBerkmoes, N. C.; Langston, M. A.; Uberbacher, E.; Hettich, R. L.; Samatova, N. F. J. Proteome Res. 2006, 5, 2909–2918.
(21) Usaite, R.; Wohlschlegel, J.; Venable, J. D.; Park, S. K.; Nielsen, J.; Olsson, L.; Yates, J. R., III J. Proteome Res. 2008, 7, 266–275.
(22) Davis, M. T.; Spahr, C. S.; McGinley, M. D.; Robinson, J. H.; Bures, E. J.; Beierle, J.; Mort, J.; Yu, W.; Luethy, R.; Patterson, S. D. Proteomics 2001, 1, 108–117.
(23) Kohli, B. M.; Eng, J. K.; Nitsch, R. M.; Konietzko, U. Rapid Commun. Mass Spectrom. 2005, 19, 589–596.
(24) Washburn, M. P.; Wolters, D.; Yates, J. R., III Nat. Biotechnol. 2001, 19, 242–247.
(25) Florens, L.; Washburn, M. P. Methods Mol. Biol. 2006, 328, 159–175.

EXPERIMENTAL SECTION

Materials. Bacto peptone, dextrose, and Bacto yeast extract were acquired from BD Diagnostics (Sparks, MD). Urea, Tris, ammonium acetate, iodoacetamide (IAM), and trichloroacetic acid (TCA) were obtained from Sigma (St. Louis, MO). Tris(2-carboxyethyl)phosphine hydrochloride (TCEP) was obtained from Pierce (Rockford, IL). Endoproteinase LysC was from Roche Diagnostics Corp. (Indianapolis, IN). Modified sequencing grade trypsin was from Promega (Madison, WI). HPLC grade water was from EMD Chemicals Inc. (Gibbstown, NJ). HPLC grade formic acid and acetonitrile were purchased from Mallinckrodt Baker, Inc. (Phillipsburg, NJ).

Sample Preparation. Saccharomyces cerevisiae strain BY4741 was grown to late log phase (OD at 600 nm ∼1.5) in YPD/rich media (10 g of Bacto yeast extract, 20 g of Bacto peptone, and 20 g of dextrose/L). Cells were collected and washed in cold water by centrifugation for 20 min at 4000g at 4 °C. Cells were lysed by silica glass beads in lysis buffer (40 mM HEPES-KOH pH 7.5, 350 mM NaCl, 10% glycerol, 0.1% Tween-20), including 10 cycles consisting of 1 min vortexing at 2500 rpm followed by 30 s incubation at 4 °C. Unbroken cell material and glass beads were pelleted at 4000g at 4 °C for 20 min. Pooled supernatants were centrifuged for 1 h at 22000g at 4 °C. The supernatants were removed and protein concentration was determined by the BCA protein assay (Pierce). The proteins were precipitated by the addition of TCA to 20%, incubated 3 h at 4 °C, pelleted at 14000g at 4 °C, and washed twice with 500 µL of acetone.
The final pellet was dried via a SPD111V speed vacuum system (Thermo Electron, Milford, MA). The precipitated proteins were dissolved in 100 mM Tris-HCl, pH 8.5, 8 M urea, reduced in 5 mM TCEP for 30 min at room temperature, and carbamidomethylated by adding IAM to 10 mM and incubating at room temperature for 30 min in the dark. The proteins were subsequently digested with endoproteinase Lys-C followed by trypsin, and incubated at 37 °C overnight. The reaction was quenched with the addition of formic acid to 5%. The peptide mixture was aliquoted and stored at -80 °C prior to use.

MudPIT Analysis. Each sample was analyzed using a modified 11-step MudPIT analysis24 as described previously.25 Briefly, 1.4 µg of peptide mixture was pressure-loaded onto a 250 µm i.d. capillary packed first with 3.5 cm of 5 µm strong cation exchange material (Partisphere SCX, Whatman), followed by 2.5 cm of 5 µm C18 RP particles (Aqua, Phenomenex), and the biphasic column was washed with buffer A (water/acetonitrile/formic acid (95:5:0.1, v/v/v), pH 2.6) for more than 20 column volumes. After desalting, the biphasic column was connected via a 2 µm filtered union (UpChurch Scientific) to a 100 µm i.d. column, which had been pulled to a 5 µm i.d. tip, then packed with 9.5 cm of 5 µm C18 RP particles.24 The split three-phase column was placed in line with an Agilent 1100 quaternary HPLC pump (Palo Alto, CA) and LTQ mass spectrometer (Thermo Scientific). Overflow tubing was used to decrease the flow rate
from 0.1 mL/min to about 200-300 nL/min. Fully automated 11-step chromatography runs were carried out. Three different elution buffers were used: 5% acetonitrile (ACN), 0.1% formic acid (buffer A); 80% ACN, 0.1% formic acid (buffer B); and 500 mM ammonium acetate, 5% ACN, 0.1% formic acid (buffer C). Peptides were sequentially eluted from the SCX resin to the RP resin by increasing salt steps (buffer C concentration was 0, 5, 10, 15, 20, 25, 30, 50, 75, 100, and 90% in steps 1-11), followed by organic gradients (increase in buffer B concentration up to 75% over 127 min in steps 1-9, and up to 90% over 132 min in steps 10 and 11). Each full MS scan (400-1600 m/z) was followed by five data-dependent MS/MS scans, while the number of microscans was 1 for MS and MS/MS scans. The dynamic exclusion settings used were as follows: repeat count 2; repeat duration 30 s; exclusion list size 500; exclusion duration 15, 60, 90, 300, or 600 s. Four technical replicates were generated at each dynamic exclusion duration and when DE was disabled. The mass spectrometer scan functions and HPLC gradient generation were controlled by the Xcalibur data system (Thermo Scientific).

Data Analysis. RAW files were extracted into ms2 file format26 using RAW_Xtract.27 MS/MS spectra were searched with the SEQUEST algorithm28 against a database containing 14176 protein sequences combining 6911 S. cerevisiae proteins (from the National Center of Biotechnology Information (NCBI) March 3, 2006 release), 177 common contaminants, and, to estimate false discovery rates, their corresponding 7088 randomized amino acid sequences. No enzyme specificity was imposed during searches, with a mass tolerance of 3 amu for precursor ions and of ±0.5 amu for fragment ions. All cysteines were considered as fully carboxamidomethylated (+57 Da statically added), while methionine oxidation was searched as a differential modification.
SEQUEST outputs were then filtered with DTASelect (version 1.9)29 with the following criteria set: DeltCn at least 0.08, minimum XCorr of 1.5 for +1, 2.5 for +2, and 3.5 for +3 spectra, and maximum Sp rank of 10. Peptides had to be fully tryptic and at least seven amino acids long. The minimum number of peptides to identify proteins was two. Peptide hits from multiple runs were compared using DTASelect/CONTRAST.29 Proteins that were subsets of others were removed using the parsimony option in DTASelect29 on the proteins detected after merging all runs. Proteins that were identified by the same set of peptides (including at least one peptide unique to such protein group to distinguish between isoforms) were grouped together (Supporting Table 1A in the Supporting Information), then one accession number was considered as representative of each protein group such that only nonredundant proteins are reported in Supporting Table 1B in the Supporting Information. NSAF v7 (an in-house developed software) was used to create the final report on all nonredundant proteins detected across the different runs (Supporting Table 1B in the Supporting Information), estimate false discovery rates (FDR), and calculate their
respective normalized spectral abundance factor (NSAF) values.16,17,30 The spectral FDR was calculated as the number of spectra matching randomized peptides multiplied by 2 and divided by the total number of spectra, as described before,31 while the protein FDR was calculated as the number of randomized proteins divided by the total number of proteins. Under the spectra and protein selection criteria described above, the average spectral FDR was 0.009% ± 0.006% (standard error), while the average FDR at the protein level was 0.056% ± 0.044% (standard error) across the 24 runs (Supporting Table 1B in the Supporting Information).

(26) McDonald, W. H.; Tabb, D. L.; Sadygov, R. G.; MacCoss, M. J.; Venable, J.; Graumann, J.; Johnson, J. R.; Cociorva, D.; Yates, J. R., III Rapid Commun. Mass Spectrom. 2004, 18, 2162–2168.
(27) Venable, J. D.; Dong, M. Q.; Wohlschlegel, J.; Dillin, A.; Yates, J. R. Nat. Methods 2004, 1, 39–45.
(28) Eng, J.; McCormack, A. L.; Yates, J. R., III J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(29) Tabb, D. L.; McDonald, W. H.; Yates, J. R., III J. Proteome Res. 2002, 1, 21–26.
(30) Florens, L.; Carozza, M. J.; Swanson, S. K.; Fournier, M.; Coleman, M. K.; Workman, J. L.; Washburn, M. P. Methods 2006, 40, 303–311.
(31) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43–50.
(32) Kaufman, L.; Rousseeuw, P. J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley-Interscience: New York, 1990.
(33) Rousseeuw, P. J. Comput. Appl. Math. 1987, 20, 53–65.
(34) Pison, G.; Struyf, A.; Rousseeuw, P. J. Comput. Stat. Data Anal. 1999, 381–392.

Spectral counts (SpC) of each nonredundant protein were used to estimate the protein’s abundance:14

$$
(\mathrm{NSAF})_k = \frac{(\mathrm{SpC}/\mathrm{length})_k}{\sum_{i=1}^{N} (\mathrm{SpC}/\mathrm{length})_i} \tag{1}
$$
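As an illustration, eq 1 and the two FDR definitions given in the text can be written as a few short functions. This is a minimal sketch and not the NSAF v7 software: the function names and the three example proteins (A, B, C) with their spectral counts and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Eq 1: (SpC/length)_k divided by the sum of SpC/length over all N proteins."""
    saf = {name: spectral_counts[name] / lengths[name] for name in spectral_counts}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}


def spectral_fdr(decoy_spectra, total_spectra):
    """Spectra matching randomized peptides, multiplied by 2, over all spectra."""
    return 2 * decoy_spectra / total_spectra


def protein_fdr(decoy_proteins, total_proteins):
    """Randomized proteins over the total number of proteins."""
    return decoy_proteins / total_proteins


# Hypothetical example: spectral counts and lengths (amino acids) of three proteins.
counts = {"A": 100, "B": 10, "C": 5}
lengths = {"A": 500, "B": 100, "C": 50}
values = nsaf(counts, lengths)
for name, value in values.items():
    print(name, round(value, 3))
```

Because each spectral count is divided by the protein length before normalization, proteins B and C receive the same NSAF in this example even though B has twice the spectral count, and the NSAF values of a run sum to 1.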
All chromatographic and mass spectrometric items were automatically calculated with NSAF v7 coupled with XCalibur (Thermo Scientific).

Partition Clustering. Partitioning around medoids (PAM)34 is a more robust generalization of k-means clustering to arbitrary dissimilarity matrices. The algorithm consists of two steps. In the BUILD-step, a user-defined number of initial medoids are sequentially selected. In the SWAP-step, one medoid is iteratively replaced with another entry in order to minimize the sum of the dissimilarities of all objects to their nearest medoids, until convergence. PAM provides for each object the cluster to which it belongs, the closest neighbor cluster, and a silhouette width that measures the degree of fitness of an object to its cluster. The average silhouette width over all objects in the data set is a measure of the goodness of clustering32,33 and is used to select the optimal number of medoids obtained with the PAM algorithm. Silhouette values lie in the range [-1, 1], with values closer to 1 considered well clustered. Partitioning around medoids34 was performed in R. The partitioning was performed using two parameters as measures of dissimilarity between abundance levels for proteins commonly detected in two DE conditions: (a) the average NSAF measured when DE was turned off (NSAFnoDE) and (b) the ratio between the average NSAF under DE and the average NSAF under noDE (NSAFDE/NSAFnoDE). Before PAM analysis, NSAF data were preprocessed with logarithm transformation to the base 10 followed by Z-score transformation. For all of the five PAM clusterings we performed, the best averaged silhouette widths were observed with two clusters. All but the DE15 vs noDE comparison (silhouette width for two clusters measured at 0.46) had silhouette width greater than 0.51, which indicated that a reasonable structure had been found. In addition, the PAM clustering results were analyzed by principal components analysis (PCA).

Figure 1. Effect of dynamic exclusion duration on the number of acquired and qualified spectra, matched peptides, and detected proteins. The numbers (averaged across four technical replicates) of spectra (A), proteins (B), peptides (C), and peptides per protein (D) were plotted as a function of DE duration (0, 15, 60, 90, 300, and 600 s). The variation in spectral counts was satisfactorily fitted to an exponential decay (dotted line) with an adjusted R² of 0.97, while variations in protein and peptide counts were fitted using the Nelder-Mead35 simplex algorithm (dashed lines), with R² > 0.94 for all three curves.

PCA allows the visualization
of correlations in data sets by compressing information in a low number of dimensions. PCA was performed by inputting the results of the PAM function into clusplot in R. We obtained a biplot, of which the x and y axes represent the first and second principal components, respectively. These two components explained 100% of the point variability.

RESULTS AND DISCUSSION

Effect of Dynamic Exclusion Duration on the Numbers of Spectra, Peptides, and Proteins. Using the same peptide mixture analyzed by MudPIT under otherwise identical settings, we tested the effect of turning DE off (noDE) and different DE durations (15, 60, 90, 300, and 600 s), denoted from now on as DE15, DE60, DE90, DE300, and DE600, respectively. We first examined the effect of DE duration on the number of qualified spectra (i.e., acquired spectra that were matched to peptides with SEQUEST cross-correlation parameters above the selection criteria), identified peptides, and detected proteins (Figure 1). In general and as expected, enabling DE led to more proteins and peptides being identified with fewer spectra. On average, twice as many proteins were identified under DE300 and DE90 compared
to noDE. However, for DE300 and DE90, these proteins were identified from only one-fourth and two-thirds of the spectral numbers measured when DE was off. While enabling DE allowed ions of lower abundance to be selected, many MS/MS spectra generated from such ions might be scored below the selection criteria for qualified spectra. Overall, when DE was enabled, except for DE15, the number of qualified spectra was smaller than when DE was turned off, and it decreased exponentially with increasing DE duration (Figure 1). On the one hand, the underlying goal of a proteomic experiment on a complex biological sample is to identify as many proteins as possible with as high a confidence as possible. Then the average numbers of identified peptides and detected proteins are often more critical factors than the spectrum number to evaluate the success of an LC-MS/MS method. Considering these two qualitative criteria to evaluate the effects of different DE durations, both peptide and protein counts followed exactly parallel trends with a well-defined optimum (Figure 1), indicating that DE90 had the best performance, while the shortest and longest DE durations we tested, DE15 and DE600, had the worst. On the other hand, the use of spectral counts as a quantitative proteomics measure has been spreading steadily over the past few years and is based on the clear relationship between the frequency of an ion being detected and its concentration. Since spectral counts decreased exponentially with longer DE durations (Figure 1), we investigated further whether the quantitative results derived from normalized spectral counting (i.e., NSAF) were dramatically different under different DE conditions. NSAF Quantitation of Identified Proteome under Varying DE Conditions. To further dissect the quantitative effect of DE duration on NSAF values, we chose to closely benchmark the results obtained for each of the varying DE durations against the results obtained when DE was turned off. 
This led to five sets of proteins to be compared: noDE vs DE15, noDE vs DE60, noDE vs DE90, noDE vs DE300, and noDE vs DE600 (Table 1, with complete protein lists reported in Supporting Table 2A-E in the Supporting Information). For example, with the combination of the four noDE and four DE300 replicate runs, 533 proteins were identified in at least one of the eight experiments, 248 of which were detected in both DE conditions (Table 1 and Supporting Table 2D in the Supporting Information). The proteins in each of the five two-way comparisons could hence be initially sorted into three distinct groups (Table 1): proteins detected in both DE conditions (“noDE ∩ DE” group) and proteins uniquely detected in one of the two DE conditions being compared. To evaluate the reproducibility of detection within each protein group, we calculated the average number of times a protein was detected out of four replicate DE experiments (Table 1). To evaluate the quality of protein identifications, we calculated the average number of peptides mapping to proteins in each group (Table 1). To evaluate the relative abundance of each protein group, we calculated the average NSAF values for proteins in each group (Table 1). Finally, we calculated the sum of the averaged NSAF values (∑NSAF) for proteins in a group to evaluate how much a protein group was enriched compared to other protein groups30 and how such a group may influence the overall quantitative NSAF values within a particular data set (Table 1).

(35) Nelder, J. A.; Mead, R. Comput. J. 1965, 7, 308–313.
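The grouping and per-group summaries described above can be sketched as follows. This is an illustrative sketch only: the protein names and NSAF values are hypothetical, a protein is treated as detected in a replicate when its NSAF is greater than zero, and the averaged NSAF is taken over all four replicates (zeros included), which is an assumption rather than a documented detail of the original analysis.

```python
from statistics import mean


def detected(runs):
    """Proteins seen in at least one replicate (NSAF > 0) of a condition."""
    return {name for name, values in runs.items() if any(v > 0 for v in values)}


def split_groups(no_de, de):
    """Sort proteins into the three groups of a two-way comparison."""
    a, b = detected(no_de), detected(de)
    return {"both": a & b, "de_only": b - a, "no_de_only": a - b}


def summarize(group, runs):
    """Group size, mean number of replicates with detection, and sum of averaged NSAFs."""
    if not group:
        return {"n": 0, "times_detected": 0.0, "sum_nsaf": 0.0}
    times = [sum(v > 0 for v in runs[name]) for name in group]
    averaged = [mean(runs[name]) for name in group]
    return {"n": len(group), "times_detected": mean(times), "sum_nsaf": sum(averaged)}


# Hypothetical per-replicate NSAF values (four replicates per condition).
no_de = {"P1": [0.6, 0.6, 0.6, 0.6], "P2": [0.4, 0.4, 0.0, 0.0]}
de = {"P1": [0.5, 0.5, 0.5, 0.5], "P2": [0.3, 0.3, 0.3, 0.3], "P3": [0.2, 0.2, 0.0, 0.0]}

groups = split_groups(no_de, de)
both_summary = summarize(groups["both"], de)
de_only_summary = summarize(groups["de_only"], de)
print(groups)
print(both_summary)
print(de_only_summary)
```

Set operations keep the three groups mutually exclusive, so every detected protein contributes to exactly one row of a comparison.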
Table 1. Distribution and Abundance of Proteins Identified in noDE and Each of the DE Durations

| group^a | no. of proteins (percent of total^b) | averaged number of times detected out of 4 replicates^c, noDE | averaged number of times detected out of 4 replicates^c, DE | averaged number of detected peptides^c, noDE | averaged number of detected peptides^c, DE | averaged NSAFs, noDE | averaged NSAFs, DE | sum of NSAFs (×100), noDE | sum of NSAFs (×100), DE |
|---|---|---|---|---|---|---|---|---|---|
| noDE vs DE15 (n = 377) | | | | | | | | | |
| noDE ∩ DE15 | 243 (64.5%) | 3.3 ± 1.1 | 3.8 ± 0.7 | 5.0 ± 4.7 | 6.0 ± 5.2 | 0.00418 | 0.00405 | 102 | 94.4 |
| high | 136 (36.1%) | 3.8 ± 0.6 | 3.8 ± 0.6 | 6.4 ± 5.7 | 6.4 ± 5.7 | 0.00718 | 0.00646 | 97.6 | 87.9 |
| low | 107 (28.4%) | 2.7 ± 1.2 | 3.8 ± 0.7 | 3.1 ± 2.2 | 5.6 ± 4.3 | 0.00039 | 0.00098 | 4.1 | 10.5 |
| DE15 only | 123 (32.6%) | 0 | 2.5 ± 1.2 | 0 | 2.5 ± 1.3 | 0 | 0.00026 | 0 | 3.3 |
| noDE only | 11 (2.9%) | 1.9 ± 1.1 | 0 | 1.4 ± 0.6 | 0 | 0.001 | 0 | 0.7 | 0 |
| noDE vs DE60 (n = 440) | | | | | | | | | |
| noDE ∩ DE60 | 246 (55.9%) | 3.3 ± 1.1 | 3.8 ± 0.5 | 5.0 ± 4.8 | 6.9 ± 5.5 | 0.00413 | 0.00393 | 102 | 96.9 |
| high | 136 (30.9%) | 3.8 ± 0.6 | 3.8 ± 0.4 | 6.4 ± 5.7 | 7.0 ± 5.9 | 0.00717 | 0.00615 | 97.6 | 83.7 |
| low | 110 (25.0%) | 2.6 ± 1.2 | 3.7 ± 0.6 | 3.1 ± 2.1 | 6.8 ± 5.0 | 0.00037 | 0.0012 | 4.1 | 13.2 |
| DE60 only | 186 (42.3%) | 0 | 2.3 ± 1.2 | 0 | 2.9 ± 1.5 | 0 | 0.00034 | 0 | 6.3 |
| noDE only | 8 (1.8%) | 2.1 ± 1.1 | 0 | 1.3 ± 0.5 | 0 | 0.00156 | 0 | 1.2 | 0 |
| noDE vs DE90 (n = 549) | | | | | | | | | |
| noDE ∩ DE90 | 251 (45.7%) | 3.2 ± 1.1 | 3.9 ± 0.4 | 4.7 ± 4.4 | 9.0 ± 7.1 | 0.00352 | 0.00334 | 102 | 90.2 |
| high | 138 (25.1%) | 3.8 ± 0.6 | 3.9 ± 0.3 | 6.4 ± 5.7 | 8.9 ± 7.4 | 0.0071 | 0.00529 | 98 | 73.1 |
| low | 113 (20.6%) | 2.6 ± 1.2 | 3.8 ± 0.4 | 3.1 ± 2.1 | 9.5 ± 7.0 | 0.00038 | 0.00157 | 4.3 | 17.7 |
| DE90 only | 295 (53.7%) | 0 | 2.8 ± 1.2 | 0 | 3.3 ± 2.2 | 0 | 0.00042 | 0 | 12.5 |
| noDE only | 3 (0.5%) | 2.3 ± 1.5 | 0 | 1.0 ± 0.0 | 0 | 0.00219 | 0 | 0.6 | 0 |
| noDE vs DE300 (n = 533) | | | | | | | | | |
| noDE ∩ DE300 | 248 (46.5%) | 3.3 ± 1.1 | 3.7 ± 0.7 | 4.9 ± 4.7 | 6.8 ± 5.4 | 0.00411 | 0.00375 | 102 | 93.2 |
| high | 154 (28.9%) | 3.7 ± 0.7 | 3.7 ± 0.7 | 6.2 ± 5.5 | 6.7 ± 5.7 | 0.00645 | 0.00515 | 99.4 | 79.3 |
| low | 94 (17.6%) | 2.6 ± 1.2 | 3.7 ± 0.7 | 2.9 ± 1.7 | 7.0 ± 5.0 | 0.00028 | 0.00148 | 2.6 | 13.9 |
| DE300 only | 279 (52.3%) | 0 | 2.3 ± 1.1 | 0 | 3.0 ± 1.5 | 0 | 0.00064 | 0 | 17.8 |
| noDE only | 6 (1.1%) | 1.7 ± 1.2 | 0 | 1.1 ± 0.4 | 0 | 0.00157 | 0 | 0.9 | 0 |
| noDE vs DE600 (n = 485) | | | | | | | | | |
| noDE ∩ DE600 | 233 (48.0%) | 3.3 ± 1.1 | 3.4 ± 0.9 | 5.1 ± 4.8 | 4.7 ± 3.5 | 0.00424 | 0.00414 | 99 | 96.5 |
| high | 138 (28.4%) | 3.8 ± 0.6 | 3.4 ± 1.0 | 6.7 ± 5.6 | 4.8 ± 3.8 | 0.00696 | 0.00555 | 96 | 76.6 |
| low | 95 (19.6%) | 2.6 ± 1.2 | 3.5 ± 0.9 | 2.9 ± 1.7 | 4.7 ± 3.0 | 0.00031 | 0.00209 | 3 | 19.9 |
| DE600 only | 231 (47.6%) | 0 | 2.2 ± 1.2 | 0 | 2.6 ± 1.3 | 0 | 0.00107 | 0 | 24.9 |
| noDE only | 21 (4.3%) | 2.4 ± 1.3 | 0 | 1.5 ± 0.7 | 0 | 0.0019 | 0 | 4 | 0 |

a Proteins detected in both DE conditions (noDE ∩ DE) were further sorted into two groups of high and low abundances based on their averaged NSAF values measured when DE was turned off (NSAFnoDE). NSAFnoDE cutoff values were set at 0.00103, 0.00103, 0.00103, 0.00079, and 0.0009 for DE15, DE60, DE90, DE300, and DE600 proteins, respectively (Supporting Table 2 in the Supporting Information).
b Percentage of the number of proteins in each group over the total number of proteins (n value in first column).
c ± Standard deviation.
To quantitatively assess the effects of enabling DE on NSAF values, we focused on the group of proteins that contributed the most to ∑NSAF in either noDE or DE conditions. Proteins uniquely detected under noDE were ignored since they accounted for only 2.4%, on average, of the total protein number and only 2.2% of the sum of NSAF on average (Table 1). Proteins uniquely detected when DE was enabled accounted, on average, for 46.7% of the total protein number, yet the quantitative value of this group represented only 15.1% of the sum of NSAF, on average. This means that enabling an optimal DE duration almost doubled the protein number; however, the proteins that were only identified when DE was enabled were of low abundance and they actually did not contribute significantly to NSAF. On the other hand, proteins commonly detected in both DE conditions (“noDE ∩ DE” group) had a number of proteins similar to that of the DE-only group, yet their quantitative value accounted, on average, for close to 100% and 95.4% of the sum of NSAF under noDE and DE, respectively. Since the commonly detected proteins contributed most of the sum of NSAF, in the following analysis we only compared the NSAF values of these proteins.

Effect of Varying DE Conditions on Label-Free Quantitation of High- and Low-Abundance Proteins. To locally evaluate the effect of enabling DE on NSAF values of each protein belonging to the “noDE ∩ DE” group, we plotted their log-transformed averaged NSAF values measured under DE (Figure 2, open symbols) and noDE (Figure 2, closed symbols). Overall, the most abundant proteins showed little variation in NSAF whether DE was on or off. However, for the proteins with averaged
NSAF values on the low end of the range under noDE, enabling DE brought in clear differences in NSAFs. NSAFs measured for such proteins of lower abundance benefited from enabling DE, resulting in higher values compared with the noDE condition. These differences increased with longer DE durations. Enabling DE had a balancing effect on NSAF between the low- and highabundance proteins: the NSAF difference between proteins of low and high abundance decreased, and the NSAF curves when DE was enabled flattened out. The longer the DE duration was, the stronger the balancing effect became. The balancing effect was caused by DE allowing peptides from less abundant proteins to be fragmented. In such conditions, the frequency of fragmentation for peptides of higher abundance was decreased compared to the one observed when DE was turned off. To establish these observed differences in the effect of DE duration on NSAF values at the statistical level, we used partitioning around medoids analysis to further sort the proteins commonly detected in at least one DE and at least one noDE run (“noDE ∩ DE” group). In the PAM clustering statistical analysis, we chose to establish the relationship between two variables: (i) the relative abundance levels measured when DE was turned off (NSAFnoDE, i.e., the closed squares in Figure 2) and (ii) the effect of enabling DE on these abundance values as estimated by the ratio between the average NSAF under DE and the average NSAF under noDE (NSAFDE/NSAFnoDE, i.e., the ratio between open and closed symbols in Figure 2). In all five sets of proteins considered, the best averaged silhouette widths measured for the partition clustering with z-transformed log10(NSAFnoDE) and log10(NSAFDE/ Analytical Chemistry, Vol. 81, No. 15, August 1, 2009
Figure 2. Differences in NSAF values for each protein with varying DE conditions. Log-transformed NSAF values (averaged across four technical replicates) measured under noDE (closed squares) and varying DE conditions (open symbols) were plotted for each protein detected in both of the compared DE conditions. The proteins on the x-axis were ranked by decreasing NSAF values measured under noDE to readily separate the high- and low-abundance protein groups defined in Supporting Table 2 in the Supporting Information. The horizontal grid line at y = −3 represents the NSAFnoDE cutoff of 0.001 used in Table 1 and Figure 3 to define high- and low-abundance protein groups, where proteins above this line are considered high abundance and proteins below this line are considered low abundance. Panels A-E (open symbols) report log10(NSAFDE) values measured under DE15, DE60, DE90, DE300, and DE600, respectively.
NSAFnoDE) were observed with two medoids. This indicated that the proteins commonly detected in at least one DE and at least one noDE run could reasonably be separated into two clusters (Figure 3, left panels). To determine what these two clusters of proteins represented, we plotted the percentage of PAM cluster-1 and cluster-2 proteins as a function of their averaged NSAF values measured under noDE (Figure 3, right panels). For example, for the proteins commonly detected in noDE and DE300, 95.5% of cluster-1 proteins had NSAFnoDE greater than 0.00078, while 94.6% of cluster-2 proteins had NSAFnoDE lower than 0.00078 (Figure 3, right panels). Therefore, the two clusters mostly represented proteins of high (cluster-1) and low (cluster-2) abundances that could be simply sorted based on their NSAFnoDE (Table 1). For all five sets of proteins considered, we determined the NSAFnoDE thresholds that satisfactorily separated PAM cluster-1 and -2 proteins. All thresholds were centered
Figure 3. Partitioning of proteins commonly detected in noDE and each of the other DE durations. Proteins commonly identified between noDE and DE15, DE60, DE90, DE300, and DE600 were plotted in panels A/a to E/e, respectively. Left panels (A-E): PAM clustering. Proteins were partitioned based on their average NSAF values measured under noDE (NSAFnoDE) and the ratio between NSAF values under DE and under noDE (NSAFDE/NSAFnoDE). log10(NSAFnoDE) and log10(NSAFDE/NSAFnoDE) were input to the PAM function of the Cluster package in R to generate the partition clustering results. A principal components analysis (PCA) biplot is shown for each protein identified in both of the two DE conditions being compared. The proteins were classified into two clusters by partitioning around medoids. The two components on the x and y axes explain 100% of the data variability. The data in each column were z-score transformed before the partition clustering analysis. Right panels (a-e): Relative frequency (percent of total) of proteins belonging to PAM cluster-1 (○) and cluster-2 (▲) was plotted as a function of their average NSAF values measured under noDE (NSAFnoDE).
around NSAFnoDE = 0.001 (Figure 3) and were used to define low- and high-abundance protein groups commonly detected in both of the DE conditions being compared (Table 1). We next performed a t-test between the NSAF values measured under noDE and each of the DE durations for high- and low-abundance proteins (Supporting Table 2 in the Supporting Information). For example, when comparing NSAF values measured under noDE and DE300, the percentage of proteins that passed the t-test p-value cutoff of 0.90). For low-abundance proteins (Figure 5, closed symbols), acceptable correlations were observed (R² > 0.83). However, these proteins clearly benefited from DE since the linear regressions between the number of peptides observed under DE60, DE90, and DE300 compared to noDE generally followed a two to one relationship. While the longest DE duration tested (DE600) appeared detrimental to peptide recovery for both high- and low-abundance proteins (Figure 5E, inset equations), DE90 was again the best performer with an overall improvement of 2.9-fold in peptides mapping to proteins of lower abundance and even a slight improvement (×1.3) in the recovery of peptides from proteins of higher abundance (Figure 5C, inset equations). This means that by enabling a proper DE duration, twice as many peptides (based on the slope of the linear regressions, Figure 5) could be identified for proteins of lower abundance, which likely resulted in the observed increases in abundance and frequency of detection, i.e., reproducibility.

Spectral Counts Difference for Peptides With DE Enabled or Turned Off. The difference in a peptide’s spectral counts when DE was on or off was critical for a protein’s NSAF value. When DE was turned off, a given peptide’s spectral count, c, could be expressed with the equation:
$$c_{j,\mathrm{noDE}} = \gamma_{j,\mathrm{noDE}}\, p_{j,\mathrm{noDE}}\, \frac{w_j}{\lambda_{\mathrm{noDE}}} \tag{2}$$
Figure 5. Difference in number of detected peptides for high- and low-abundance proteins. For each of the proteins commonly identified under noDE and the other DE durations, the numbers of detected peptides (averaged across four technical replicates) recovered in each DE duration (panels A-E for DE15, DE60, DE90, DE300, and DE600, respectively) were plotted against the corresponding number of peptides under noDE. Proteins of high and low abundance were independently plotted with open and closed symbols, respectively. The inset equations are the solutions to the linear regression through zero for the high-abundance (solid lines) and low-abundance (dashed lines) data sets.
where the subscript j denotes the peptide’s identity, distinguished by sequence; w is the peptide’s chromatographic peak width at the base, over which its intensity is not lower than the minimum MS signal counts required for a dependent scan; λ is the cycle time for a set of whole scan events (which in our experimental setup consisted of one full MS1 followed by five MS/MS); γ is the probability of a dependent scan to be a positive peptide identification by SEQUEST; and p is the probability of the peptide to be eligible for a dependent scan at a cycle time. We defined a DE condition with three parameters: an MS1 parent ion could have R repeat counts during τ repeat duration and then be excluded for ε DE duration. The repeat duration (τ = 0.5 min in our experimental setup) should be shorter than the chromatographic peak width at the base (wj). When the minimum signal required for triggering MS/MS on an ion was set low enough, all ions could get R repeats during τ repeat duration.
Figure 6. Cycle time and DE duration. The observed average cycle time, λ, for a set of whole scan events (1 MS followed by 5 MS/MS) was plotted as a function of DE durations, ε, (in minutes). The inset equation is the solution to the linear regression through the data set (solid line), giving λ = 0.0295 + 0.001ε, with an R² of 0.99.
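Then a given peptide’s spectral count when DE was enabled could be expressed with the equation:
$$c_{j,\mathrm{DE}} = \gamma_{j,\mathrm{DE}}\, p_{j,\mathrm{DE}}\, w_j\, \mathrm{MIN}\!\left(\frac{R}{\varepsilon}, \frac{1}{\lambda}\right) \tag{3a}$$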
where MIN represents the minimum value from a data set. For a given DE condition to work efficiently, from eq 3a we must have (R/ε) < (1/λ). We calculated the average cycle times, λ, for a set of whole scan events observed in different DE conditions (Figure 6) and showed that (a) in all DE conditions, (R/ε) < (1/λ); (b) the cycle time, λ, increased with DE duration, ε, with an excellent linear correlation (R² = 0.99). One reason for such an increase in cycle time with DE duration was that longer DE durations favored ions of lower abundance that required longer time to be collected for MS/MS. This linear correlation reflected the natural linear relationship between ion counts and peptide concentration in solution. Because the condition (R/ε) < (1/λ) was fulfilled experimentally (Figure 6), a given peptide’s spectral count when DE was enabled could then be expressed as
$$c_{j,\mathrm{DE}} = \gamma_{j,\mathrm{DE}}\, p_{j,\mathrm{DE}}\, w_j\, \frac{R}{\varepsilon} \tag{3b}$$
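As a rough numerical illustration of eqs 2, 3a, and 3b, the sketch below evaluates the expected spectral count of a peptide with DE off and on, and checks the condition (R/ε) < (1/λ) for the tested DE durations. The function names are hypothetical and not part of the original analysis; only the Figure 6 regression (λ = 0.0295 + 0.001ε), the repeat count R = 2, and the tested DE durations are taken from the text. All times are in minutes.

```python
def cycle_time(de_min, a=0.001, b=0.0295):
    """Cycle time (min) from the Figure 6 regression, lambda = a*eps + b."""
    return a * de_min + b


def count_no_de(gamma, p, width_min, cycle_min):
    """Eq 2: expected spectral count when DE is turned off."""
    return gamma * p * width_min / cycle_min


def count_de(gamma, p, width_min, de_min, repeats=2):
    """Eq 3a: expected spectral count when DE is enabled."""
    # MS/MS opportunities per minute: limited either by DE (R/eps)
    # or by the instrument cycle time (1/lambda), whichever is smaller.
    per_min = min(repeats / de_min, 1.0 / cycle_time(de_min))
    return gamma * p * width_min * per_min


def de_is_limiting(de_min, repeats=2):
    """True when R/eps < 1/lambda, i.e., when eq 3a reduces to eq 3b."""
    return repeats / de_min < 1.0 / cycle_time(de_min)


# DE durations tested in this study, in seconds.
TESTED_DE_S = (15, 60, 90, 300, 600)
flags = {s: de_is_limiting(s / 60.0) for s in TESTED_DE_S}
print(flags)
```

Under these assumptions, every tested duration satisfies the condition, so the simpler eq 3b applies; a very short hypothetical ε (for example 0.01 min) would instead be limited by the cycle time.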
Assuming that all peptides had the same probability, γ, of a dependent scan to be a positive peptide identification and that the probability, p, of a peptide to be eligible for a dependent scan during a cycle time was proportional to its peak intensity I relative to its coeluted ions, we defined the parameter A as
$$A_j = \frac{I_j w_j}{2} \tag{4}$$
Applying the presumption of eq 4, which can be used to approximate peak area,2 we could obtain the difference of spectral counts between two peptides j1 and j2 in a DE condition:
$$\Delta c_{j_1,j_2,\mathrm{DE}} = \gamma_{\mathrm{DE}}\,(A_{j_1} - A_{j_2})\,\frac{R}{\varepsilon} \tag{5}$$
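Eq 5 can be explored numerically. The following sketch, which assumes two hypothetical peptides with arbitrary peak-area parameters A (these values are illustrative, not measured), shows how the spectral count difference changes across the DE durations tested here. The function name is an assumption; γ = 0.345 is the average spectral qualification percentage reported in Figure 8 and R = 2 is the repeat count used in this study.

```python
def delta_count(a1, a2, de_min, gamma=0.345, repeats=2):
    """Eq 5: spectral count difference between peptides j1 and j2 under DE."""
    return gamma * (a1 - a2) * repeats / de_min


# Hypothetical peak-area parameters (arbitrary units), for illustration only.
A_HIGH, A_LOW = 10.0, 1.0

# DE15, DE60, DE90, DE300, and DE600 expressed in minutes.
durations_min = [0.25, 1.0, 1.5, 5.0, 10.0]
deltas = [delta_count(A_HIGH, A_LOW, eps) for eps in durations_min]

for eps, d in zip(durations_min, deltas):
    print(f"DE = {eps * 60:.0f} s -> delta c = {d:.3f}")
```

Because ε appears in the denominator, the difference shrinks monotonically as the DE duration grows, whatever values are chosen for the two A parameters.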
Equation 5 could be used to explain the balancing effect of DE on NSAF observed in Figure 2. When the DE duration, ε, increased, ∆cj1,j2,DE decreased. When ε became extremely long, all peptides, either high-abundance ones or
the low-abundance ones, should have only R MS/MS chances, resulting in smaller NSAF differences between low- and high-abundance proteins, i.e., a flat NSAF curve as shown in Figure 2.

Practicable DE Durations. To effectively perform dynamic exclusion, DE durations must fall within a range. DE durations cannot be too short, i.e., they must satisfy (R/ε) < (1/λ); otherwise, the observable MS/MS will be completely determined by the cycle time required for a set of whole scan events. This situation may happen in some types of MS instruments with very slow scan rates. Since cycle time, λ, was linearly related to DE duration, ε, (Figure 6: λ = aε + b), the lower bound of ε was defined as
$$\varepsilon > \frac{Rb}{1 - Ra} \tag{6}$$
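For readers who want the intermediate steps, eq 6 follows from substituting the linear cycle-time relationship λ = aε + b into the condition (R/ε) < (1/λ). The rearrangement below assumes ε > 0, λ > 0, and Ra < 1, which holds for the regression values of Figure 6 with R = 2:

```latex
\begin{align*}
\frac{R}{\varepsilon} < \frac{1}{\lambda}
  &\iff R\lambda < \varepsilon \\
  &\iff R(a\varepsilon + b) < \varepsilon \\
  &\iff Rb < \varepsilon\,(1 - Ra) \\
  &\iff \varepsilon > \frac{Rb}{1 - Ra}
\end{align*}
```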
Using the slope (a = 0.001) and intercept (b = 0.0295) derived from the linear regression of cycle time as a function of DE duration for this data set (Figure 6), we solved eq 6 to be ε > 3.5 s. DE durations cannot be too long; otherwise, the DE list will be saturated. The DE list capacity in an LTQ instrument was 500 m/z values. To make full use of the DE list, its size was always set at 500 in our experimental setup. With the use of the linear regression obtained from Figure 6 and the assumption that the m/z difference between any two neighboring MS1 ions has a symmetrical normal distribution, then the averaged required DE list size could be calculated with
$$\left(\frac{\text{number of MS/MS} \times \varepsilon}{R(a\varepsilon + b)}\right)\left(\frac{\text{m/z difference} + \text{resolution}}{2 \times \text{exclusion mass width}}\right) \le 500 \tag{7}$$
where the number of MS/MS events in a set of whole scan events was 5, while the exclusion mass width was set at 3 amu in our experimental setup. Assuming that the resolution between two peaks in a mass spectrum was 0.5 amu, then the maximum required DE list size can be calculated when any two neighboring MS1 ions’ m/z difference is greater than 3 amu. This number must be smaller than the DE list capacity in an LTQ instrument that is 500 m/z values. Solving eq 7 gave ε ≤ 15.4 min as the longest practicable DE duration. A variation of eq 7 could be used to calculate the usage percentage of the DE list:
$$\frac{\dfrac{5\varepsilon}{2(0.001\varepsilon + 0.0295)} \times \dfrac{3 + 0.5}{(2)(3)}}{500} \times 100\% \tag{8}$$
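Eqs 7 and 8 can be evaluated directly. The sketch below is a minimal illustration, not the original calculation code: the function names are hypothetical, and the default parameters are the settings stated in the text (5 MS/MS per cycle, R = 2, a = 0.001, b = 0.0295, m/z difference of 3 amu, resolution of 0.5 amu, exclusion mass width of 3 amu, list capacity of 500). DE durations are in minutes.

```python
def de_list_size(de_min, n_msms=5, repeats=2, a=0.001, b=0.0295,
                 mz_diff=3.0, resolution=0.5, excl_width=3.0):
    """Left-hand side of eq 7: average number of DE list entries required."""
    cycles = de_min / (a * de_min + b)  # scan cycles completed in one DE window
    return (n_msms / repeats) * cycles * (mz_diff + resolution) / (2 * excl_width)


def de_list_usage_percent(de_min, capacity=500):
    """Eq 8: percentage of the DE list in use at a given DE duration."""
    return 100.0 * de_list_size(de_min) / capacity


def max_de_duration(capacity=500, n_msms=5, repeats=2, a=0.001, b=0.0295,
                    mz_diff=3.0, resolution=0.5, excl_width=3.0):
    """Solve eq 7 as an equality for eps (min); valid only if k > capacity*a."""
    k = (n_msms / repeats) * (mz_diff + resolution) / (2 * excl_width)
    # k * eps / (a*eps + b) = capacity  ->  eps = capacity*b / (k - capacity*a)
    return capacity * b / (k - capacity * a)


for de_s in (15, 60, 90, 300, 600):
    print(f"DE{de_s}: {de_list_usage_percent(de_s / 60.0):.2f}% of the DE list")
print(f"longest practicable DE duration: {max_de_duration():.1f} min")
```

The closed-form solution in `max_de_duration` is simply eq 7 rearranged for ε, so a different regression (for example from a more complex sample) can be tested by changing `a` and `b`.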
Under our experimental settings, the percentages of DE list usage were calculated to be 2.45, 9.6, 14.1, 42.3, and 73.8% for DE15, DE60, DE90, DE300, and DE600, respectively. From these results, it was then clear that all DE durations we tested, from DE15 to DE600, were in the practicable DE duration range. With ε > 15.4 min, the DE list would have been saturated and the low-abundance proteins would not have gained more benefits from DE, a phenomenon we already started to observe with the longest DE duration we tested (DE600, 10 min).

Optimization of DE Duration. To maximize peptide identification, the optimized DE duration, εoptimized, should ensure that
Figure 8. Spectral qualification percentage. To estimate the probability of a dependent scan to lead to a positive identification, all spectra that did not pass our selection criteria for positive identification were extracted and then checked to determine whether the first peptide candidate matched by SEQUEST was otherwise positively identified by another qualified spectrum. For each identified protein, its spectral qualification percentage was calculated as the total qualified spectra divided by the sum of the total qualified and failed spectra. The distributions of spectral qualification percentage were plotted for the different DE conditions. The average spectral qualification percentage adjusted for false discovery rate was 34.5% and was used to estimate the probability of a dependent scan to lead to a positive identification, γj,DE, in eq 9. The symbols for the box plot are as in Figure 7.

Figure 7. Peak widths at the base. The distributions of chromatographic peak widths at the base were plotted for each MudPIT step under different DE conditions (panels A-F for noDE, DE15, DE60, DE90, DE300, and DE600, respectively). All chromatographic items were automatically calculated with NSAF v7 coupled with XCalibur (Thermo Scientific). In these box plots, the 25th and 75th percentiles are represented by the lower and upper boundaries of the box, with the median being the line dissecting the box, and the mean being the small square symbol. The 5th and 95th percentiles are shown as error bars, the “X” represents the 1st and 99th percentiles, and the stand-alone dashes “–” represent the complete range.
one MS/MS spectrum was positively matched to a peptide for every chosen parent ion. For an ion that was selected for MS/MS, its probability to be eligible for a dependent scan, pj,DE, was hence equal to 1 and eq 3b may be used to calculate εoptimized:
$$\varepsilon_{\mathrm{optimized}} = \gamma_{j,\mathrm{DE}}\, w_j\, R \tag{9}$$
To effectively perform DE without over exclusion, the optimized DE duration should hence be considered along with chromatographic properties (wj). Ideally, MS/MS should be performed at the beginning of an ion’s elution (defined by the MS/MS threshold), then its m/z value is placed on the exclusion list until this ion is completely eluted out. Therefore, the DE duration should match the peak width at the base of the liquid chromatographic peak (w) and take into account the peak capacity of the separation. In this analysis, the repeat count (R) was set at 2, while the average base peak width (w) was measured to be 142 s based on the distribution of peak widths at the base observed in every MudPIT step under noDE and varying DE durations (Figure 7). The peak width at the base for the first and second chromatographic steps showed greater variations (Figure 7), but significantly fewer peptides were identified in these early steps, hence their influence on the total peak base width distribution was ignored. We used 11 MudPIT steps for every experiment, and the total available gradient time was 1381.9 min. Taking into consideration the total available gradient time and the average peak width at the base, we estimated that the maximum number of peptides that could be acquired in an experiment (i.e., peak capacity) was 583, without coelution. Assuming that an average-size protein of 50 kDa could
generate about 50 observable tryptic peptides (data not shown), then, when the number of proteins in a mixture was greater than 12, peptide coelution would happen. Clearly, we identified thousands of peptides in every DE condition (Figure 1); coelution therefore did happen for almost all peptides, justifying the use of dynamic exclusion. To roughly yet fairly estimate the probability of a dependent scan to lead to a positive identification, γj,DE, we treated all SEQUEST output results in the following way. First, we extracted all spectra that did not pass our selection criteria for positive identification. For each such failed spectrum, we then checked whether the first peptide candidate matched by SEQUEST was otherwise positively identified by another qualified spectrum. Then for each identified protein, we summarized all of its peptides’ failed and qualified spectra and calculated its spectral qualification percentage (the total qualified spectra divided by the sum of the total qualified and failed spectra). To account for spectra that might have been assigned incorrectly by SEQUEST, we adjusted the spectral qualification percentage taking into consideration the measured false discovery rate (the average spectral FDR was 0.009%, Supporting Table 1 in the Supporting Information). The averaged spectral qualification percentage was 34.5% (Figure 8), which we used as the probability of a dependent scan to lead to a positive identification, γj,DE, in eq 9. Solving eq 9 led to an εoptimized of 97.9 s. The calculated optimized DE duration was very close to our observed best performer of 90 s DE duration (DE90). From eq 9, we could conclude that narrower wj and lower γj and R would be best matched to shorter DE durations. An optimized DE duration was hence determined by both HPLC separation capabilities (wj) and MS instrument setup (γj, R).
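The arithmetic behind eq 9 and the peak capacity estimate can be reproduced with a few lines. This is a minimal sketch using the rounded values quoted in the text (γ = 0.345, w = 142 s, R = 2, total gradient time of 1381.9 min, about 50 observable peptides per protein); the function names are hypothetical.

```python
def optimized_de_duration(gamma, peak_width_s, repeats):
    """Eq 9: optimized DE duration (s) = gamma * w * R."""
    return gamma * peak_width_s * repeats


def peak_capacity(gradient_min, peak_width_s):
    """Number of base-width peaks that fit in the gradient without coelution."""
    return gradient_min * 60.0 / peak_width_s


eps_opt = optimized_de_duration(gamma=0.345, peak_width_s=142, repeats=2)
capacity = peak_capacity(gradient_min=1381.9, peak_width_s=142)
proteins_before_coelution = capacity / 50  # ~50 observable peptides per protein

print(f"optimized DE duration: {eps_opt:.2f} s")
print(f"peak capacity: {capacity:.1f} peptides")
print(f"proteins before coelution: {proteins_before_coelution:.1f}")
```

With these rounded inputs the optimized duration comes out just under 98 s and the peak capacity just under 584 peptides, in line with the 97.9 s and 583 reported above; the small differences presumably reflect rounding of the published inputs.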
CONCLUSIONS

Spectral counting approaches can be very accurate for high-abundance proteins, whose quantitation is based on a large number of peptide or spectrum identifications, but spectral counting becomes less reliable when few spectral counts are found for a protein.3,13,21 For example, in a study comparing spectral counting with peak
areas, Old et al. found that at least four spectral counts were needed to reliably quantify protein abundance ratios.13 DE can significantly increase the number of peptides and spectra for proteins of relatively low abundance, therefore improving the accuracy of these proteins’ NSAF or protein abundance index (PAI), which may be used for protein quantitation.10 An appropriately chosen DE duration is hence very important for label-free proteome quantitation. Here, we reported a systematic analysis of the impact of dynamic exclusion duration on spectral count based quantitative proteomics. Empirically, a DE duration of 90 s provided the most peptides, proteins, and peptides per protein while maintaining a high level of spectral counts. We developed a detailed mathematical model for analyzing the effects of DE duration on spectral counts. We found that the optimal dynamic exclusion time is proportional to the average chromatographic peak width at the base of the identified peptides. To test the applicability of our method and to evaluate the effect sample complexity might have on optimizing DE duration, we used six samples isolated from rat membranes containing complex mixtures of thousands of proteins that were first analyzed by 15 MudPIT steps under DE300 (data not shown). The measured spectral qualification percentage (32.8%) and peak width at the base (177 s) led us to calculate an optimized DE duration of 116 s. We then analyzed 10 similar samples under DE120 and found that protein counts were improved from an average of 909 under DE300 to 1389 under DE120, while peptide and spectral counts were greatly improved as well (2631 to 4877 peptide counts and 16 914 to 27 408 spectral counts under DE300 and DE120, respectively).
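As a worked check of the rat membrane example, eq 9 can be applied to the values quoted above. The repeat count R = 2 is an assumption carried over from the yeast experiments, since it is not restated for these samples; the fold changes are simple ratios of the reported averages.

```latex
\begin{align*}
\varepsilon_{\mathrm{optimized}} &= \gamma\, w\, R
  = 0.328 \times 177\ \mathrm{s} \times 2 \approx 116\ \mathrm{s} \\[4pt]
\text{proteins: } \frac{1389}{909} &\approx 1.53, \qquad
\text{peptides: } \frac{4877}{2631} \approx 1.85, \qquad
\text{spectra: } \frac{27\,408}{16\,914} \approx 1.62
\end{align*}
```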
Consequently, if appropriate HPLC conditions are chosen to account for greater sample complexity (larger number of MudPIT steps), the observed peak widths at the base should be very similar and be one of the main determinants for optimizing DE duration. However, sample complexity will affect the linear
regression of cycle time as a function of DE. With more complex samples, i.e., containing more proteins than the yeast samples we used throughout this analysis, more intense ions may be selected for fragmentation, hence decreasing the cycle time necessary to perform a set of one MS followed by five MS/MS. We measured the cycle times on the samples isolated from rat membranes analyzed at two different DE durations (120 and 300 s). While the cycle time at DE300 was the same as the one we observed with the yeast samples of lower complexity (0.035 min), the cycle time at DE120 was shorter (0.022 min) than the one we had observed with DE90 on the yeast samples, leading to a steeper slope. The practicable DE duration range (solutions to eqs 6 and 7 using the linear regression of cycle time as a function of DE duration) will hence largely depend on sample complexity. This study demonstrates a method for optimizing peptide and spectral counts in a quantitative proteomics platform. It is expected that the method and analysis presented here will facilitate the optimization of spectral count based quantitative proteomics, which will further improve the emerging statistical analysis of such spectral count based proteomics data sets.18,19

ACKNOWLEDGMENT

This work was supported by the Stowers Institute for Medical Research. The complex samples isolated from rat membranes were analyzed in collaboration with Eric Schirmer (U. Edinburgh, UK).

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review March 5, 2009. Accepted June 23, 2009. AC9004887