Relationship between Sample Loading Amount and Peptide

Anal. Chem. 2009, 81, 1307–1314

Relationship between Sample Loading Amount and Peptide Identification and Its Effects on Quantitative Proteomics

Kehui Liu,† Jiyang Zhang,†,‡ Jinglan Wang,† Liyan Zhao,† Xu Peng,† Wei Jia,† Wantao Ying,† Yunping Zhu,† Hongwei Xie,‡ Fuchu He,† and Xiaohong Qian*,†

State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, No. 33, Life Science Park Road, Changping District, Beijing, China, 102206, and School of Mechanical Engineering and Automatization, National University of Defense Technology, Changsha, China, 410073

The relationship between sample loading amount and peptide identification is crucial for the optimization of proteomics experiments, but few studies have addressed this matter. Herein, we present a systematic study using a replicate run strategy to probe the inherent influence of both peptide physicochemical properties and matrix effects on the relationship between peptide identification and sample loading amounts, as well as its applications in protein quantification. Ten replicate runs for a series of laddered loading amounts (ranging from 0.01 to 10 µg) of total digested proteins from Saccharomyces cerevisiae were performed with nanoscale liquid chromatography coupled with linear ion trap/Fourier transform ion cyclotron resonance (nanoLC-LTQ-FT) to obtain a nearly saturated peptide identification. This permitted us to differentiate the linear correlativity of peptide identification by the commonly used peptide quantitative index, the area of constructed ion chromatograms (XIC) (SA, from MS and tandem MS data) in the given experiments. The absolute loading amount of a given complex sample affected the final qualitative identification result; thus, optimization of the sample loading amount before every proteomics study was essential. Peptide physicochemical properties had little effect on the linear correlativity between SA-based peptide quantification and loading amount. The matrix effects, rather than the static physicochemical properties of individual peptides, affect peptide measurability. We also quantified the target protein by selecting peptides with good parallel linear correlativity based upon SA as signature peptides and revised the data by multiplying by the reciprocal of the slope coefficient. We found that this optimized the linear protein abundance relativity at every amount range and thus extended the linear dynamic range of label-free quantification. This empirical rule for linear peptide selection (ERLPS) can be adopted to correct comparison results in proteolytic peptide-based quantitative proteomics, such as accurate mass tag (AMT) and targeted quantitative proteomics, as well as in tag-labeled comparative proteomics.

10.1021/ac801466k CCC: $40.75 © 2009 American Chemical Society. Published on Web 01/15/2009

The optimization of sample loading is both necessary and fundamental to experimental design in mass spectrometry-based proteomics. When the loading amount (LM) changes, the absolute quantity of peptides introduced into the mass spectrometry platform also changes, thus changing the chromatographic behavior of the peptides and the probability that the peptides are observed by the mass spectrometer.1,2 As sample loading increases, it becomes difficult to obtain a stronger spectral signal intensity for one specific peptide under fixed mass spectrometric conditions. The probability of peptide identification is affected by the absolute quantity of peptides in the sample. The linear correlativity reflects the relationship between the observed and actual abundances of the peptides, which is determined by factors including physicochemical properties, protease efficiency,3 and matrix effects.4 In the absence of a technique that rapidly synthesizes peptides on a large scale, comparable to polymerase chain reaction (PCR) for nucleic acid amplification,5 it is hard to investigate the linear correlation of the observed and actual abundances of all peptides for a target protein prior to every large-scale quantitative proteomics experiment, such as accurate mass tag (AMT)6,7 for absolute quantification and multiple reaction monitoring (MRM)5,8,9 of spectral libraries for targeted quantitative proteomics,10,11 as well as tag-labeled comparative and quantitative proteomics.12,13 To date, few studies have examined the relationship between sample loading and peptide identification in terms of the inherent factors that influence linear correlativity.14,15

In proteomics research, discrepancies among methods for measuring peptides are generally accepted.16 Peptide identification is an interactive process involving the conditions of mass spectrometry and of the peptides. In recent years, numerous studies have evaluated the effects of peptide characteristics on their identification.17-20 The effects of physicochemical properties were evaluated by hydrophobicity index,17 isoelectric point,18 peptide length,20 and modification by phosphorylation.19 These studies revealed that such peptide properties significantly affect the chances of a peptide being identified by mass spectrometry. The discriminative effects due to physicochemical properties may be magnified when the amounts of digested peptide of a single target protein at distinct labeled states differ greatly, thus affecting the corresponding quantitative linear correlativity, the linear dynamic range, and the final comparative results. Another possible reason is the so-called matrix effect, an unexpected suppression or enhancement of the analyte's response induced by the coeluting matrix. The coeluting matrix may influence the signal intensity through competition for the available charges and for access to the droplet surface for gas-phase emission.4

The general principle of proteolytic peptide-based quantitative and comparative proteomics is to compare protein abundance in a sample by comparing all or proportional ratios of the summed or mean spectral signal intensity of all peptides comprising the targeted proteins.12,21,22 However, differences in peptide detection will bias quantification results.10,21 Therefore, minimizing this discrimination effect is crucial for proteolytic peptide-based quantitative proteomics. A recent paper introduced restricted maximum likelihood estimation (REML)15 into quantitative proteomics to statistically evaluate peptide measurability. However, little attention was paid to the inherent reasons for differences in peptide measurability3,4 or to the rules of optimal peptide selection for quantitative proteomics.

* Corresponding author. Phone: +86-10-80727777-1231. Fax: +86-10-8070-5155. E-mail: [email protected].
† Beijing Proteome Research Center.
‡ National University of Defense Technology.

(1) Finney, G. L.; Blackler, A. R.; Hoopmann, M. R.; Canterbury, J. D.; Wu, C. C.; MacCoss, M. J. Anal. Chem. 2008, 80 (4), 961–971.
(2) Proteome Research: Mass Spectrometry; James, P., Ed.; Springer: New York, 2001; Vol. XXI, pp 41–42.
(3) Tang, H.; Arnold, R. J.; Alves, P.; Xun, Z.; Clemmer, D. E.; Novotny, M. V.; Reilly, J. P.; Radivojac, P. Bioinformatics 2006, 22 (14), e481–488.
(4) Cappiello, A.; Famiglini, G.; Palma, P.; Pierini, E.; Termopoli, V.; Trufelli, H. Anal. Chem. 2008, 80, 9343–9348.
(5) Mirzaei, H.; McBee, J.; Watts, J.; Aebersold, R. Mol. Cell. Proteomics 2008, 7 (4), 813–823.
(6) Pasa-Tolić, L.; Masselon, C.; Barry, R. C.; Shen, Y.; Smith, R. D. Biotechniques 2004, 37 (4), 621–636.
(7) Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Mol. Cell. Proteomics 2005, 4 (9), 1265–1272.
(8) Lin, S.; Shaler, T. A.; Becker, C. H. Anal. Chem. 2006, 78 (16), 5762–5767.
(9) Anderson, L.; Hunter, C. L. Mol. Cell. Proteomics 2006, 5 (4), 573–588.
(10) Mallick, P.; Schirle, M.; Chen, S. S.; Flory, M. R.; Lee, H.; Martin, D.; Ranish, J.; Raught, B.; Schmitt, R.; Werner, T.; Kuster, B.; Aebersold, R. Nat. Biotechnol. 2007, 25 (1), 125–131.
(11) Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Proteomics 2007, 7 (5), 655–667.
(12) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994–999.
(13) Barnidge, D. R.; Hall, G. D.; Stocker, J. L.; Muddiman, D. C. J. Proteome Res. 2004, 3 (3), 658–661.
(14) Lee, H. G.; Desiderio, D. M. J. Chromatogr., B: Biomed. Appl. 1994, 662 (1), 35–45.
(15) Daly, D. S.; Anderson, K. K.; Panisko, E. A.; Purvine, S. O.; Fang, R.; Monroe, M. E.; Baker, S. E. J. Proteome Res. 2008, 7 (3), 1209–1217.
(16) Purvine, S.; Picone, A. F.; Kolker, E. OMICS 2004, 8 (1), 79–92.
(17) Cech, N. B.; Krone, J. R.; Enke, C. G. Anal. Chem. 2001, 73 (2), 208–213.
(18) Pan, P.; Gunawardena, H. P.; Xia, Y.; McLuckey, S. A. Anal. Chem. 2004, 76 (4), 1165–1174.
(19) Steen, H.; Jebanathirajah, J. A.; Rush, J.; Morrice, N.; Kirschner, M. W. Mol. Cell. Proteomics 2006, 5 (1), 172–181.
(20) Nielsen, M. L.; Savitski, M. M.; Kjeldsen, F.; Zubarev, R. A. Anal. Chem. 2004, 76 (19), 5872–5877.
(21) Mueller, L. N.; Brusniak, M. Y.; Mani, D. R.; Aebersold, R. J. Proteome Res. 2008, 7 (1), 51–61.
(22) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G. Mol. Cell. Proteomics 2005, 4 (10), 1487–1502.

Analytical Chemistry, Vol. 81, No. 4, February 15, 2009


In quantitative proteomics, the spectral signal intensity reflects the abundance of peptides identified in a single LC-MS run.21,22 General quantification features based on a label-free strategy and spectral feature analysis include spectral counting (SC, from tandem MS data) and the area of constructed ion chromatograms (SA, from MS and tandem MS data).22-24 Spectral counting counts all of the peptides mapped to a protein in one run and compares them to the count of the same protein in another run. The discrepancy in peptide measurement for one protein often leads to different sets of observable peptides for the same protein at different runs. The random sampling of the mass spectrometry platform makes spectral counting less accurate than an observation of the area of constructed ion chromatograms for protein quantification, although spectral counting might be more sensitive.22 Peptide peak area is typically compared to the peak area of that same peptide, and protein quantification is deduced from the observed ratios of changes in multiple peptides. However, averaging ratios of peptides with different measurability for one protein might seriously impair the linearity and dynamic range of protein quantification. Little attention has been paid to the optimization of the linearity and dynamic range of peptide measurability.7,21,22 The use of replicate run is a statistical strategy25 that was first introduced to check the reproducibility of an ion-exchange column26 and of the multidimensional protein identification technology (MudPIT) platform.26-28 Its importance lies in the augmentation of the number of identified proteins and peptides in complicated samples.29,30 The strategy can reveal the inherent problems of peptide identification and quantification by minimizing the random sampling effect of mass spectrometry platforms. 
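As a rough illustration of how replicate runs augment the number of identified peptides, the following minimal sketch counts the distinct peptides seen after each successive run. The function name and the peptide sequences are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Iterable, List, Set


def cumulative_unique_counts(runs: Iterable[Iterable[str]]) -> List[int]:
    """Return the number of distinct peptides seen after each successive run."""
    seen: Set[str] = set()
    counts: List[int] = []
    for peptides in runs:
        seen.update(peptides)      # add this run's identifications
        counts.append(len(seen))   # cumulative unique count so far
    return counts


# Hypothetical identifications from three replicate runs of one sample.
replicate_runs = [
    ["AAK", "LVEK", "GGR"],
    ["AAK", "TTPR", "GGR"],
    ["LVEK", "TTPR", "SSK"],
]
curve = cumulative_unique_counts(replicate_runs)
print(curve)
```

Plotting such a curve against the run number gives a trendline whose flattening suggests that identification is approaching saturation for that sample.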
However, almost no systematic analysis that uses the replicate run strategy to study differences in peptide measurability or quantitative behaviors has been reported.

In the present study, the replicate run strategy was used to investigate the relationship between large-scale peptide identification and the amount of sample loaded. From this, we hoped to deduce the potential influence of peptide physicochemical properties and matrix effects on the relationship between the loading amount and the peptide quantitative index. For a series of laddered loading amounts of tryptic digests of yeast lysate, 10 replicate runs of the same sample were analyzed by nanoscale liquid chromatography coupled with linear ion trap/Fourier transform ion cyclotron resonance (nano-LC-LTQ-FT) mass spectrometry to obtain nearly saturated peptide identification and thus to ultimately differentiate peptide identifications. The simulation of augmented trendlines for the number of unique peptides identified with replicate analyses was constructed by a systematic statistical analysis. Clustering the spectral signal intensity vectors of the coidentified unique peptides in different loading amounts differentiated them into three categories. Following the physicochemical property calculations, the diversity of peptide categories and the corresponding extracted ion chromatogram (XIC) correlation and distribution over the entire retention time helped to deduce the inherent reasons for the differences of the peptide linear correlativity, distinguishing matrix effects from static physicochemical properties. Selecting peptides with positive parallel linear correlativity and specific properties as signature peptides will lead to an optimal linear correlation and a wider linear dynamic range at a given platform for quantitative proteomics. This empirical rule for linear peptide selection (ERLPS) could be applied to quantitative proteomics based on proteolytic peptides, thus rectifying the comparison result for peptide selection in the accurate mass tag (AMT) strategy,31 or to tag-labeled comparative and quantitative proteomics.10-13 It will also contribute to the optimization of experimental designs in proteomics research.

EXPERIMENTAL SECTION

Preparation of Tryptic Digested Proteins from Saccharomyces cerevisiae. S. cerevisiae (Type II, Sigma) was directly lysed in lysis buffer (8 M urea, 1 mM NaF, 0.2 mM NaVO3, 40 mM Tris, 1 mg/mL dithiothreitol (DTT), 1/50 tablets/mL Protease Inhibitor Cocktail Tablets), disrupted by sonication, and then centrifuged at 40 000g for 1 h in a refrigerated centrifuge (10 °C). The protein content of the cleared lysate was determined using a Bradford protein assay (Bio-Rad, Hercules, CA) with BSA as an external standard. Lyophilized proteins were dissolved in sampling buffer and digested with trypsin (modified sequencing grade porcine trypsin, Promega, Madison, WI) for further analysis.

(23) States, D. J.; Omenn, G. S.; Blackwell, T. W.; Fermin, D.; Eng, J.; Speicher, D. W.; Hanash, S. M. Nat. Biotechnol. 2006, 24, 333–338.
(24) Radulovic, D.; Jelveh, S.; Ryu, S.; Hamilton, T. G.; Foss, E.; Mao, Y.; Emili, A. Mol. Cell. Proteomics 2004, 3, 984–997.
(25) Littell, R. C.; Pendergast, J.; Natarajan, R. Stat. Med. 2000, 19 (13), 1793–1819.
(26) Shen, Y.; Jacobs, J. M.; Camp, D. G., II; Fang, R.; Moore, R. J.; Smith, R. D.; Xiao, W.; Davis, R. W.; Tompkins, R. G. Anal. Chem. 2004, 76 (4), 1134–1144.
(27) Washburn, M. P.; Ulaszek, R. R.; Yates, J. R., III Anal. Chem. 2003, 75 (19), 5054–5061.
(28) Liu, H.; Sadygov, R. G.; Yates, J. R., III Anal. Chem. 2004, 76 (14), 4193–4201.
(29) Durr, E.; Yu, J.; Krasinska, K. M.; Carver, L. A.; Yates, J. R., III; Testa, J. E.; Oh, P.; Schnitzer, J. E. Nat. Biotechnol. 2004, 22 (8), 985–992.
(30) Chen, M.; Ying, W.; Song, Y.; Liu, X.; Yang, B.; Wu, S.; Jiang, Y.; Cai, Y.; He, F.; Qian, X. Proteomics 2007, (14), 2479–2688.
Analysis of Nanoscale Capillary Liquid Chromatography-Electrospray Ionization Tandem Mass Spectrometry (nanoLC-MS/MS). LC-MS/MS experiments were performed on an LTQ FT mass spectrometer (Thermo Electron, San Jose, CA) equipped with a Finnigan Nanospray II electrospray ionization source (Thermo Electron) and an Agilent 1100 series binary high-performance liquid chromatography (HPLC) pump (Agilent Technologies, Palo Alto, CA). Approximately 1-20 µL of sample solution (approximately 1, 0.1, 0.01, or 0.001 µg/µL for dilution experiments) was loaded onto a PicoFrit tip column (BioBasic C18, 5 µm, 75 µm i.d. × 10 cm, 15 µm i.d. spray tip, New Objective, Woburn, MA), and separation was achieved by using a mobile phase of 2% acetonitrile (ACN)/0.1% formic acid (FA) (buffer A) and 80% ACN/0.1% FA (buffer B) and applying a linear gradient from 5 to 40% buffer B for 90 min at a flow rate of 300 nL/min provided across a flow splitter by the HPLC pumps. An electrospray voltage of 1.8 kV was applied via a gold electrode through a PEEK junction at the inlet of the microcapillary column. The hybrid linear quadrupole ion trap/FTICR mass spectrometer (LTQ FT) provided the highest mass accuracy of present MS technologies, 1 to 2 ppm. The linear ion trap (LTQ) part of the mass spectrometer allows high-speed scanning in MS/MS experiments and thus high sensitivity for the characterization of a complex peptide mixture. With the FT10 method,32 a scan cycle was initiated with a full-scan MS followed by 10 MS/MS experiments on the 10 most abundant ions detected in the full-scan MS. The m/z range was 350-2000, and the temperature of the ion transfer capillary was set at 180 °C. Accumulation of ions for both MS and MS/MS scans was performed in the linear ion trap, and the automatic gain control (AGC) target values were set to 5 × 10^5 ions for survey MS and 1 × 10^4 ions for MS/MS experiments. Ions subjected to MS/MS were excluded from further sequencing for 30 s.

(31) Zimmer, J. S.; Monroe, M. E.; Qian, W. J.; Smith, R. D. Mass Spectrom. Rev. 2006, 25 (3), 450–482.

Database Searching and Result Validation. All tandem mass spectra were searched against the combined normal and reversed Saccharomyces cerevisiae open reading frame (ORF) protein sequence database using SEQUEST (Bioworks v.3.2) for peptide and protein identification. Raw data were converted into .dta files by the program ExtractMSn (Thermo Fisher Scientific Inc., 81 Wyman Street, Waltham, MA 02454), and the database search results were converted into .MS format by an in-house program called “MPS” and finally merged into a single plain text file by the in-house program “OutSumMS”. False-positive (FP) assignments were removed by thresholds on two parameters, Xcorr and ∆Cn (obtained using the decoy database method), at a false-discovery rate (FDR) of ~1%. The details of data processing are provided in the Supporting Information.

RESULTS AND DISCUSSION

Trendline of Identified Peptide Number with the Number of Replicate Runs and Increasing Loading Amount. To illustrate the relationship between peptide identification and loading amount, we first considered a carefully designed data set.
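The decoy-based filtering mentioned in the validation step above can be sketched as follows. This is only a minimal illustration under stated assumptions: the FDR is estimated here as decoy hits divided by target hits among the matches passing both thresholds, which is one common convention and not necessarily the formula used by the authors (their details are in the Supporting Information). The `Psm` class, the `estimate_fdr` function, and all scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Psm:
    """One peptide-spectrum match (hypothetical record layout)."""
    xcorr: float
    delta_cn: float
    is_decoy: bool  # True if the match is to a reversed (decoy) sequence


def estimate_fdr(psms: List[Psm], min_xcorr: float, min_delta_cn: float) -> float:
    """Estimate FDR as decoy hits / target hits among PSMs passing both thresholds."""
    passing = [p for p in psms if p.xcorr >= min_xcorr and p.delta_cn >= min_delta_cn]
    decoys = sum(p.is_decoy for p in passing)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical search results against a combined forward/reversed database.
psms = [
    Psm(3.5, 0.30, False),
    Psm(2.9, 0.15, False),
    Psm(2.2, 0.12, True),
    Psm(1.8, 0.05, False),
    Psm(3.1, 0.20, False),
    Psm(2.0, 0.11, False),
]
loose = estimate_fdr(psms, min_xcorr=1.5, min_delta_cn=0.0)
strict = estimate_fdr(psms, min_xcorr=2.5, min_delta_cn=0.1)
print(loose, strict)
```

In practice the thresholds would be tightened until the estimated FDR falls to the desired level, about 1% in the text above.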
Tryptic digests of whole-cell lysates of Saccharomyces cerevisiae (yeast), with loading amounts ranging from 0.01 to 10 µg, were each sequentially analyzed 10 times by nano-LC-LTQ-FT to obtain a nearly saturated peptide identification curve (Figure 1). Laddered dilutions of the original tryptic digests of yeast lysate eliminated the effects of protease efficiency for different peptides of one protein.3 The increasing number of unique identified peptides at every loading amount produced an augmented trendline simulation with approximately the same rate of change but with different degrees of saturation at the upper limits. Clearly, when the sample loading amounts are different, the total numbers of uniquely identified peptides can differ greatly even at saturated levels. The absolute quantity of an overwhelmingly complex sample will clearly affect the final identification results, emphasizing the importance of optimizing the sample loading amount.

Figure 1. Augmented trendline simulation of unique peptide identifications with repeated times at every loading amount. A tryptic digest of a whole-cell lysate of Saccharomyces cerevisiae (yeast) with sample loading amounts of 0.01, 0.1, 1, and 10 µg was separated by nanocapillary reversed-phase liquid chromatography coupled with LTQ-FT; 10 runs were performed for each sample to obtain saturated peptide identification for each analysis. The x-axis shows the number of replicate runs, while the y-axis shows the number of unique peptide identifications.

By comparison of the changes of redundant (Figure 2a) or nonredundant (Figure 2b) identified peptide numbers with increasing loading amount, we observed that the number of redundant peptides increased continuously but began to level off at load levels above 1 µg. On the other hand, the number of nonredundant peptides increased, reached a maximum at 1 µg sample load, and then began to decline. From these two trendlines, we conclude that when the identification of some peptides was scaled up, the identification of other peptides was reduced correspondingly within the fixed identification parameters of the LTQ-FT mass spectrometer. The effect of sample loading amount must therefore also be considered when optimizing targeted quantitative proteomics experiments. Preferences of instrument and platform with respect to the physicochemical properties of the peptides have been reported and used in a complementary fashion to improve protein coverage rate.33 The ability to classify physicochemical properties has been used to differentiate observed and unobserved peptides.10,34,35 Our results showed an augmented trendline simulation for each loading amount with the same rate of change but with different upper limits of saturation. Thus, the differences between observed and unobserved peptides reflected the qualitative changes of peptide measurability when only the sample loading amounts changed. The loading amounts will affect the final qualitative results of the same sample. Therefore, optimization and selection of the sample loading amount should be a significant process, complementary to the so-called unique peptide library construction10 in targeted proteomics.

(32) Haas, W.; Faherty, B. K.; Gerber, S. A.; Elias, J. E.; Beausoleil, S. A.; Bakalarski, C. E.; Li, X.; Villén, J.; Gygi, S. P. Mol. Cell. Proteomics 2006, 5 (7), 1326–1337.
(33) Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P. Nat. Methods 2005, 2 (9), 667–675.
(34) Brunner, E.; Ahrens, C. H.; Mohanty, S.; Baetschmann, H.; Loevenich, S.; Potthast, F.; Deutsch, E. W.; Panse, C.; de Lichtenberg, U.; Rinner, O.; Lee, H.; Pedrioli, P. G.; Malmstrom, J.; Koehler, K.; Schrimpf, S.; Krijgsveld, J.; Kregenow, F.; Heck, A. J.; Hafen, E.; Schlapbach, R.; Aebersold, R. Nat. Biotechnol. 2007, 25 (5), 576–583.
(35) Rinner, O.; Mueller, L. N.; Hubálek, M.; Müller, M.; Gstaiger, M.; Aebersold, R. Nat. Biotechnol. 2007, 25 (3), 345–352.
(36) Veenstra, T. D. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2007, 847 (1), 3–11.
(37) Levin, Y.; Schwarz, E.; Wang, L.; Leweke, F. M.; Bahn, S. J. Sep. Sci. 2007, 30 (14), 2198–2203.

Classification of Peptides Based on the Area of Constructed Ion Chromatograms. For biomarker discovery, all quantitative information on candidate proteins comes from peptides rather than from proteins directly. The protein quantification result is based on peptide quantification indexes, such as SC or SA, by summing or other complex processing.36,37 All peak area-based comparative proteomics need spectra of the peptides coidentified at different loading amounts while excluding spectra of randomly identified peptides. The measured discrepancies of coidentified peptides are the outcome of interactions between the peptides and the mass spectrometry system. In order to examine different behaviors of the coidentified peptides caused by different sample loading amounts, we classified peptides based on areas derived from the constructed spectral ion chromatogram (SA). On the basis of the classification of SA in Figure 3, we divided peptides into three categories: (1) peptide spectral signal intensities that attained a saturated maximal identification and then decreased with added loading amounts (Figure 3, top graphs), (2) peptide spectral signal intensities that continued to increase with increased loading amounts (middle graphs), and (3) peptide spectral signal intensities that responded in an irregular fashion with added loading amounts (lower graphs). These three categories were used as a basis for all further comparisons in the present study.

Assignment and Differentiation of Peptide Physicochemical Properties with Respect to Their Classification. The differences in the quantities and types of peptides that were identified as the sample load increased implied intrinsic variation of physicochemical properties for coidentified peptides under the fixed conditions of the mass spectrometer experiment. Using the three categories of peptides mentioned above, we analyzed the physicochemical properties of the peptides and displayed the data obtained from SA analyses using box plots (Figure 4).
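For readers unfamiliar with notched box plots such as those in Figure 4, the sketch below computes a notch interval around the median. It assumes one widely used convention, median ± 1.57 × IQR / √n; the authors' exact definition is given in their Supporting Information and may differ. The function name and the sample values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def median_notch(values: List[float]) -> Tuple[float, float]:
    """Return (lower, upper) notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width


# Hypothetical property values (for example, peptide lengths) for one category.
lengths = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
lo, hi = median_notch(lengths)
print(lo, hi)
```

Under this convention, two boxes whose notch intervals do not overlap are usually read as having different medians.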
The notches of the box plots represent a robust estimate of the uncertainty around the median values to permit box-to-box comparisons (Supporting Information).21 The analysis included the following peptide physicochemical properties: gas-phase basicity (GB) value,38,39 hydrophobicity factor (HP), isoelectric point (pI), peptide length (number of amino acids), number of alkaline amino acids, number of acidic amino acids, and predicted identified probability (a composite factor composed of 1024 physicochemical properties used to differentiate observed and unobserved peptides).10 By careful assignment of the above physicochemical properties to the three classes of the peptides, we concluded, based on SA analysis (Figure 4), that essentially none of the physicochemical properties of the peptides affected the data distinctly in terms of the scale of the applied sample load. Detailed discussions about the above attributes on peptides statistic behavior are provided in the Supporting Information. We deduced that the matrix effects,4 not the static physicochemical properties of individual peptides, might affect the peptide measurability. We illustrated the matrix effects by exhibiting the gradual changes of the extracted ion chromatogram between peptide chromatogram peak shape and loading amount as shown in Figure 5. This analysis showing the XIC of all identified peptides over the entire retention time for every loading amount provides a vivid description of peptide measurability with retention time and loading amount. Most peptides based on SA clustering showed good linear correlation, while peptides with bad linear correlation appear at retention times of regions with crowded varieties and climbing intensities. SA is the accumulation of the constructed ion chromatograms and is not directly affected by the chromatogram peak (38) Santos, I.; Balogh, D. W.; Doecke, C. W.; Marshall, A. G.; Paquette, L. A. J. Am. Chem. Soc. 1986, 108 (26), 8183–8185. (39) Carr, S. 
R.; Cassady, C. J. J. Mass Spectrom. 1997, 32 (9), 959–967.

Figure 2. Trendlines of redundant or nonredundant peptide numbers with increasing loading amounts. Panel a represents the changes of redundant peptide numbers while panel b shows the changes of nonredundant peptides. Each are displayed as a function of increasing amounts of peptide load.

Figure 3. Classification of peptides into three categories based upon peptide SA. Peptides were divided into three categories based upon spectral peak area (SA, from MS1 and MS/MS data). The distance between objects in the clustering is the correlation coefficient, and the linkage between clusters is the group average. The categories are defined as 1, peptides whose spectral intensities increase with loading amount and approach a maximum in a hyperbolic fashion; 2, peptides whose spectral intensities continue to increase as sample size increases; and 3, peptides whose spectral intensities change in an irregular manner with increased sample load (outliers).

shape, which may be affected greatly by the interaction effect of peptides in the chromatogram separation or by matrix effects.3 Although peptide physicochemical properties affect the peptide chromatogram peak shapes, it ideally does not alter the linear relationship between peptide SA and loading amount within the same mass spectrometric platform. The complex interactions among the peptides or the matrix effects in the chromatographic separation process make peptide chromatographic peak magnification difficult to increase linearly with loading amount. Thus, the complexity and composition of the sample greatly affect the quantification and identification of the peptides. It is difficult to select the peptide simply on the basis of physicochemical properties to optimize the accuracy and to

Figure 4. Box plots displaying physicochemical property discrepancies of coidentified peptides based upon SA. Analysis included the following physicochemical properties: gas-phase basicity (GB) value, hydrophobicity factor (HP), isoelectric point (pI), distribution of lengths (number of total amino acids, AANum), number of alkaline amino acids (BaaNum), number of acidic amino acids (AaaNum), predictor value (Prob), and correlation coefficient (Corr). Physicochemical properties of category 3 (outliers) did not take on statistic representatives. For a more detailed analysis, see the Supporting Information.

extend the dynamic range of peptide quantification. Selecting those peptides that exhibit good linearity by the repeated run strategy may be a better choice. Selection and Comparison of Special Peptides for Parameter Optimization in Quantitative Proteomics. Although our data indicated that physicochemical characteristics have little effect on SA-based analysis, we could not overlook the accurate linear correlativity of SA analysis. This observation led us to conclude Analytical Chemistry, Vol. 81, No. 4, February 15, 2009

1311

Figure 5. Correlating and distributing three categories of peptide during the whole retention time and corresponding quantitative information. Blue asterisk, peptide of category 1; green asterisk, peptide of category 2; yellow asterisk, irregular peptide of category 3. (a-d) Simulation and distribution of peptide XIC (extracted ion current) and peak shape, intensity summation, peptide number during the whole retention time for 0.01, 0.1, 1, and 10 µg of total digested yeast proteins. (e) SA distribution of three categories of peptides at loading amounts of 0.01, 0.1, 1, and 10 µg, respectively. (f, g) Repeated times and correlation coefficient distribution of three categories of peptides at loading amounts of 0.01, 0.1, 1, and 10 µg. For part e, most peptides based on SA clustering showed good linear correlation; peptides with bad linear correlation appear at retention times of regions with crowded varieties and climbing intensities. For part f, peptides of category 1 showed fewer times of identification during 10 replicate runs. For part g, peptides with bad linear correlation appeared at retention times of regions with crowded varieties and climbing intensities.

that peptide quantification is affected not by physicochemical properties directly but rather via the complex interactions of the peptide or via the matrix effects in the chromatography separation process with increasing loading amount. As the sample loading amounts increased, some peptides had a positive linear correlation with loading amounts, while other peptides had a saturated effect on spectral signal intensities and negative correlativity (Figure 6). These phenomena are crucial for proteolytic peptide-based quantitative and comparative pro1312

Analytical Chemistry, Vol. 81, No. 4, February 15, 2009

teomics, which compare protein abundances by comparing all or proportional ratios of spectral signal intensity summation of all peptides comprising the targeted proteins. Similarly, when we compared spectral signal intensity of two labeled or unlabeled proteins in one analysis, some peptides derived from the protein with the above physicochemical properties had positive correlativity while others with saturated effects had negative correlativity (Supplementary Figure 1 in the Supporting Information), and their summation did not reflect the actual differences in relative

Figure 6. Saturation effects of category 1; different efficiencies of peptide identification for categories 1 and 2. Category 2 had higher peptide identification efficiency than category 1. Different peptides showed different identification efficiencies.

abundance, as shown in Figure 7a (bold lines). Selecting peptides with good linearity for the actual spectral signal intensity comparison, which requires repeated runs with different loading amounts in the experimental design, will permit us to acquire a wider linear dynamic range and optimal linear correlativity for this platform. In addition, even after excluding the peptides with saturation effects and the seldom-identified peptides, the summation and comparison of peptides with positive linear correlativity is still not the optimal comparative result. Summation of peptides with parallel trends of linear correlativity, i.e., with the same slope coefficients, will diminish the measurement error, as implied by the REML model.15

As analyzed above, it is difficult to build a universal model to remove the influence of matrix effects on peptide quantification, because the sample complexity greatly affects peptide behavior in the mass spectrometry platform, although some efforts have been made toward this goal.15 Here, we propose a peptide selection method based on the repeated run strategy with different loading amounts. This empirical rule for linear peptide selection requires the following steps. First, different loading amounts for the same sample are prepared, taking into account the proper selection of the dynamic range during the experimental design. Second, after database searches and result collection, the quantitative index SA is calculated from the raw data files for all coidentified peptides at the different loading amounts. Some special peptides have no tandem MS spectrum but still have an XIC. Third, the correlation coefficient between the SA vector and the loading amounts is calculated for each peptide, using the quantification information at all loading amounts. Peptides with correlation coefficients greater than 0.95 are selected for subsequent quantitative analysis. Fourth, SA vector items outside of the required dynamic range are omitted if a broader dynamic range is needed, and the corresponding correlation coefficients are recalculated to select the peptides.
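The third and fourth steps of this empirical rule can be illustrated with a minimal sketch. It assumes that the correlation coefficient is a Pearson coefficient computed on the untransformed SA values, which the text does not specify; the function name, the peptide labels, the SA numbers, and the requirement of at least three loading amounts are hypothetical and serve only as illustration.

```python
import numpy as np

def select_linear_peptides(loading_ug, sa_by_peptide, r_min=0.95, amount_range=None):
    """Return {peptide: r} for peptides whose SA vector correlates with loading amount.

    Assumption: Pearson correlation on raw (untransformed) values.
    amount_range optionally restricts the loading amounts used (step four).
    """
    x = np.asarray(loading_ug, dtype=float)
    keep = np.ones(x.shape, dtype=bool)
    if amount_range is not None:
        low, high = amount_range
        keep = (x >= low) & (x <= high)
    # Hypothetical guard: two points always give |r| = 1, so require three.
    if keep.sum() < 3:
        raise ValueError("need at least three loading amounts to judge linearity")

    selected = {}
    for peptide, sa in sa_by_peptide.items():
        y = np.asarray(sa, dtype=float)
        # Only coidentified peptides (an SA value at every loading amount) qualify.
        if y.shape != x.shape or np.isnan(y).any():
            continue
        if np.std(y[keep]) == 0:
            continue  # a constant SA vector has no defined correlation
        r = np.corrcoef(x[keep], y[keep])[0, 1]
        if r > r_min:
            selected[peptide] = float(r)
    return selected

# Illustrative (made-up) SA vectors at the four laddered loading amounts.
loading = [0.01, 0.1, 1.0, 10.0]
sa = {
    "PEPTIDE_A": [1.0e4, 1.0e5, 1.0e6, 1.0e7],  # proportional to loading amount
    "PEPTIDE_B": [5.0e5, 9.0e5, 1.0e6, 1.0e6],  # saturating
    "PEPTIDE_C": [8.0e5, 6.0e5, 3.0e5, 1.0e5],  # decreasing
}
chosen = select_linear_peptides(loading, sa)
print(sorted(chosen))
```

In this toy input only the proportional peptide passes the 0.95 threshold; the saturating and decreasing vectors are rejected, mirroring the exclusion of peptides with saturation effects and negative correlativity.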

Figure 7. Selection of peptides for optimal linear correlativity. Panel a, selection of linear peptides as representative peptides for optimal linear correlativity, from coidentified peptides of 10 replicate runs with laddered loading amounts of exemplary protein S000000605 (number of linear peptides, 13; total number of peptides, 22). Panel b, selection of linear peptides from four single runs of laddered loading amounts for the same protein as in panel a. Their summation as an actual spectral signal intensity (thin lines only), revised by multiplying by the reciprocal of the slope coefficient, compared with all or proportional ratios of the spectral signal intensity summation of all peptides (bold lines plus thin lines) comprising the targeted protein, will rectify the linear curve as well as the corresponding comparative results to optimal linear correlativity at every amount range.
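One possible reading of the slope revision mentioned in the Figure 7 caption is sketched here. It assumes that the slope coefficient comes from an ordinary least-squares line fitted to the summed SA of the selected linear peptides against loading amount; the paper does not state the fitting procedure, and the function name and SA values are hypothetical.

```python
import numpy as np

def slope_corrected_sum(loading_ug, sa_vectors):
    """Sum the SA vectors of the selected peptides and rescale by 1/slope.

    Assumption: the slope is the first-order least-squares coefficient of
    summed SA versus loading amount.
    """
    x = np.asarray(loading_ug, dtype=float)
    total = np.sum(np.asarray(sa_vectors, dtype=float), axis=0)
    slope, _intercept = np.polyfit(x, total, 1)  # returns [slope, intercept]
    return total * (1.0 / slope), slope

# Illustrative (made-up) SA vectors of two linear peptides from one protein.
loading = [0.01, 0.1, 1.0, 10.0]
linear_peptides = [
    [2.0e4, 2.0e5, 2.0e6, 2.0e7],
    [1.0e4, 1.0e5, 1.0e6, 1.0e7],
]
corrected, slope = slope_corrected_sum(loading, linear_peptides)
print(corrected, slope)
```

After rescaling, the summed signal of ideally linear peptides tracks the loading amount with unit slope, so ratios between two conditions can be read directly as relative abundance under this assumption.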


On the basis of the inherent interpretation of peptide measurability, this empirical rule for linear peptide selection can be adopted in the optimization of peptide selection prior to every quantitative proteomics study, such as the AMT strategy7,31 for absolute quantification, targeted quantitative proteomics,10,11 and tag-labeled comparative proteomics. For all quantitative proteomic experiments, the first and most important step is to construct the optimally constituted linear curve. After clustering the diverse trends of peptide spectral signal intensities with laddered sample loading amounts, we sorted all peptides with positive linear correlativity, and thereafter, we could select target peptides with positive and parallel linear correlativity, i.e., with proximal slope coefficients, as signature intensities of the corresponding candidate biomarker proteins. Furthermore, considering the nonhomogeneous relationship between loading amount and SA, we calculated the slope coefficients and revised the final comparison result by multiplying by the reciprocal of the slope coefficient. Thus, we obtained the optimally constituted linear curve according to the required dynamic range of the linear curve. For later quantitative comparisons of candidate biomarker proteins, we compared the SA summation according to the ratio of the corresponding signature peptides and revised the comparison result by multiplying by the reciprocal of the slope coefficient. Thus, we obtained the actual relative difference of protein abundance, as compared to conventional calculations for all or proportional ratios of spectral signal intensity summations of all peptides, as shown in Figure 7b (thin lines).

CONCLUSIONS

A classic statistical strategy of replicate runs was proposed to investigate the relationship between large-scale identification of peptides and sample loading amounts and to probe the inherent factors that influence the linear correlativity of peptide identification.
Ten replicate runs for a series of laddered loading amounts of total digested proteins from Saccharomyces cerevisiae were performed with nano-LC-LTQ-FT. Augmented trendline simulations of uniquely identified peptide numbers from replicate run times were obtained by statistical analysis. The absolute loading amount of a given complex sample affected the final qualitative identification result, thus necessitating the optimization of sample loading amounts before every proteomics study. Diverse trendlines of spectral signal intensities derived from multiple sample loading amounts clustered into three categories for coidentified peptides. Peptide physicochemical properties showed little correlation with SA-based peptide quantification and loading amounts. The matrix effects, rather than the static physicochemical properties of individual peptides, affect the linear correlation between sample loading amounts and peptide quantification. This method was employed to derive an empirical rule for linear peptide selection for use in quantitative and comparative proteomics. The selection of positive and parallel linear peptides based on SA for spectral signal intensity comparisons permitted us to acquire a wider linear dynamic range and optimal linear correlativity, which appropriately reflected the actual relative differences. The empirical rule for linear peptide selection sheds light on both the optimization of experimental design and parameter selection in comparative and quantitative proteomics. It also complements unique peptide library construction. This study also indicated that the replicate run strategy could be effectively used to investigate discrepancies in peptide measurability during the optimization stage of proteomics experiments, including such factors as different mass spectrometer-based platforms, varying sample complexity, and different sample preparation processes of in-gel digestion versus total digestion. This provides an opportunity for large-scale target peptide selection and optimization.

ACKNOWLEDGMENT This work was supported by the National High Technologies R&D Program of China (Grants 2006AA02A308, 2006AA02Z341, 2006AA02A312, and 2007AA02Z326), the National Natural Science Foundation of China (Grants 30621063, 20735005, and 20635010), and China National Key Program for Basic Research (Grants 2006CB910801, 2006CB910803, and 2007CB914104).

SUPPORTING INFORMATION AVAILABLE Additional information as noted in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review April 17, 2008. Accepted December 23, 2008.

AC801466K