Anal. Chem. 2006, 78, 5752-5761

Automated Quantification Tool for High-Throughput Proteomics Using Stable Isotope Labeling and LC-MSn

Guanghui Wang,† Wells W. Wu,† Trairak Pisitkun,‡ Jason D. Hoffert,‡ Mark A. Knepper,‡ and Rong-Fong Shen*,†

Proteomics Core Facility and Laboratory of Kidney and Electrolyte Metabolism, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892

LC-MSn has become a popular option for high-throughput quantitative proteomics, thanks to the availability of stable-isotope labeling reagents. However, the vast quantity of data generated by LC-MSn continues to make postacquisition quantification analyses challenging, especially in experiments involving multiple samples per experimental condition. To facilitate data analysis, we developed a computer program, QUIL, for automated protein quantification. QUIL accounts for the dynamic nature of the spectral background and subtracts this background accordingly during ion chromatogram reconstruction. For elution profile identification, QUIL minimizes the inclusion of coeluted neighbor peaks yet tolerates imperfect peak shapes. Outlier-resistant methods have been implemented for better protein ratio estimation. The utility of QUIL was validated by quantitative analyses of a standard protein as well as complex protein mixtures, which were labeled with cICAT or 18O and analyzed using LCQ, LTQ, or FT-ICR instruments. For samples for which no prior knowledge of relative protein quantities was available, Western blotting was performed for confirmation. For the standard protein, the coefficient of variation (CV) of peptide ratio estimation was 6%. For complex mixtures, the median CV for protein ratio calculations was less than 10%. Computed protein abundance ratios exhibited a relatively high degree of correlation with those obtained from Western blot analyses. Compared with a widely used commercial software tool, QUIL showed improvement in ion chromatogram construction and peak integration and significantly reduced relative errors in abundance ratio assessment.
Conventional large-scale proteomic studies rely on two-dimensional (2D) gel separation of proteins, quantification by spot intensities, and identification with MALDI-TOF mass spectrometry.1-5 Given the limitations of the gel-based technique,6,7 quantification methods using liquid chromatography and tandem mass spectrometry (LC-MSn), such as the MudPIT technology,8-11 have become attractive alternatives. In the gel-free LC-MSn approach, contrasting samples (e.g., control versus treated) are differentially labeled with stable isotopes through chemical derivatization (e.g., cICAT,12-16 iTRAQ17-20), enzymatic reaction (digestion in 18O water21-23), or metabolic incorporation (15N-enriched nutrients24,25). Labeled isotopic peptide pairs can be differentiated by m/z values (or, in the case of iTRAQ, by the CID-induced release of signature ions), and the corresponding ion chromatogram peaks can be extracted. The ratio of integrated peak areas reflects the relative quantity of the peptide in the contrasting samples and is used to estimate the relative quantity of the corresponding protein.

A successful quantitative proteomics study using LC-MSn depends greatly on the efficiency of the labeling reaction and the resolving power of the mass spectrometer. Even though improved labeling chemistry and modern mass spectrometers have facilitated this task, postacquisition data analysis remains extremely challenging, due primarily to the huge volume of data generated and the broad dynamic range of peptide quantities in complex mixtures. Although well-documented software tools for automated and high-throughput quantitative analyses are available,26-30 each of these tools has constraints. Some can handle only a limited number of raw files simultaneously and require manual validation, making them inefficient for high-throughput analyses. Others involve a relatively complex installation process, require users to install additional software packages, or require knowledge of cross-platform (e.g., Windows and Linux) applications.

To simplify data analyses, we have developed a computer program, QUIL (quantification using isotope labeling), for automated quantitative analyses using stable-isotope labeling and LC-MSn. QUIL is a stand-alone Windows program that does not involve complicated installation procedures. It subtracts the background from the survey MS precursor intensity before constructing an ion chromatogram. It also includes a new peak detection method that tolerates nonideal peak shapes but minimizes the inclusion of neighboring peaks in the area integration. Additionally, outlier-resistant methods are employed for protein ratio estimation. Together these features greatly reduce the need for time-consuming and subjective manual validation. QUIL is capable of processing large data sets and performing statistical tests for experiments in which multiple samples are used per condition. The utility of QUIL for quantitative proteomics is demonstrated herein using samples of varied complexity, including cICAT-labeled bovine serum albumin (BSA), 18O-labeled soluble kidney proteins, and cICAT-labeled control and vasopressin-treated inner medullary collecting duct (IMCD) proteins.

* To whom correspondence should be addressed. Tel: (301) 594 1060. Fax: (301) 402 2113.
† Proteomics Core Facility.
‡ Laboratory of Kidney and Electrolyte Metabolism.

(1) Lilley, K. S.; Razzaq, A.; Dupree, P. Curr. Opin. Chem. Biol. 2002, 6, 46-50.
(2) Unlu, M.; Morgan, M. E.; Minden, J. S. Electrophoresis 1997, 18, 2071-7.
(3) Bi, X.; Lin, Q.; Foo, T. W.; Joshi, S.; You, T.; Shen, H. M.; Ong, C. N.; Cheah, P. Y.; Eu, K. W.; Hew, C. L. Mol. Cell. Proteomics, in press.
(4) Hedman, E.; Widen, C.; Asadi, A.; Dinnetz, I.; Schroder, W. P.; Gustafsson, J. A.; Wikstrom, A. C. Proteomics 2006, 6, 3114-26.
(5) Yu, K. H.; Rustgi, A. K.; Blair, I. A. J. Proteome Res. 2005, 4, 1742-51.
(6) Gygi, S. P.; Corthals, G. L.; Zhang, Y.; Rochon, Y.; Aebersold, R. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9390-5.
(7) Wu, W. W.; Wang, G.; Baek, S. J.; Shen, R. F. J. Proteome Res. 2006, 5, 651-8.
(8) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-7.
(9) Chen, E. I.; Hewel, J.; Felding-Habermann, B.; Yates, J. R., III. Mol. Cell. Proteomics 2006, 5, 53-6.
(10) Washburn, M. P.; Ulaszek, R. R.; Yates, J. R., III. Anal. Chem. 2003, 75, 5054-61.
(11) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Yates, J. R., III. Nat. Biotechnol. 2003, 21, 532-8.
(12) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-9.
(13) Hansen, K. C.; Schmitt-Ulms, G.; Chalkley, R. J.; Hirsch, J.; Baldwin, M. A.; Burlingame, A. L. Mol. Cell. Proteomics 2003, 2, 299-314.
(14) Luo, Q.; Nieves, E.; Kzhyshkowska, J.; Angeletti, R. H. Mol. Cell. Proteomics, in press.
(15) Li, J.; Steen, H.; Gygi, S. P. Mol. Cell. Proteomics 2003, 2, 1198-204.
(16) Qu, J.; Straubinger, R. M. Rapid Commun. Mass Spectrom. 2005, 19, 2857-64.
(17) Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. J. Mol. Cell. Proteomics 2004, 3, 1154-69.
(18) Hirsch, J.; Hansen, K. C.; Choi, S.; Noh, J.; Hirose, R.; Roberts, J. P.; Matthay, M. A.; Burlingame, A. L.; Maher, J. J.; Niemann, C. U. Mol. Cell. Proteomics, in press.
(19) Keshamouni, V. G.; Michailidis, G.; Grasso, C. S.; Anthwal, S.; Strahler, J. R.; Walker, A.; Arenberg, D. A.; Reddy, R. C.; Akulapalli, S.; Thannickal, V. J.; Standiford, T. J.; Andrews, P. C.; Omenn, G. S. J. Proteome Res. 2006, 5, 1143-54.
(20) Lee, J.; Cao, L.; Ow, S. Y.; Barrios-Llerena, M. E.; Chen, W.; Wood, T. K.; Wright, P. C. J. Proteome Res. 2006, 5, 1388-97.
(21) Yao, X.; Freas, A.; Ramirez, J.; Demirev, P. A.; Fenselau, C. Anal. Chem. 2001, 73, 2836-42.
(22) Nelson, C. J.; Hegeman, A. D.; Harms, A. C.; Sussman, M. R. Mol. Cell. Proteomics, in press.
(23) Patwardhan, A. J.; Strittmatter, E. F.; Camp, D. G.; Smith, R. D.; Pallavicini, M. G. Proteomics 2006, 6, 2903-15.
(24) Wu, C. C.; MacCoss, M. J.; Howell, K. E.; Matthews, D. E.; Yates, J. R., III. Anal. Chem. 2004, 76, 4951-9.
(25) Cantin, G. T.; Venable, J. D.; Cociorva, D.; Yates, J. R., III. J. Proteome Res. 2006, 5, 127-34.
(26) MacCoss, M. J.; Matthews, D. E. Anal. Chem. 2005, 77, 294A-302A.
(27) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Nat. Biotechnol. 2001, 19, 946-51.
(28) MacCoss, M. J.; Wu, C. C.; Liu, H.; Sadygov, R.; Yates, J. R., III. Anal. Chem. 2003, 75, 6912-21.
(29) Venable, J. D.; Dong, M. Q.; Wohlschlegel, J.; Dillin, A.; Yates, J. R. Nat. Methods 2004, 1, 39-45.
(30) Li, X. J.; Zhang, H.; Ranish, J. A.; Aebersold, R. Anal. Chem. 2003, 75, 6648-57.

5752 Analytical Chemistry, Vol. 78, No. 16, August 15, 2006
10.1021/ac060611v Not subject to U.S. Copyright. Publ. 2006 Am. Chem. Soc.
Published on Web 07/21/2006

EXPERIMENTAL SECTION

Standard BSA Preparation. Two aliquots (25 µg each) of a BSA sample (1 mg/mL) were labeled with light (12C) and heavy (13C) cICAT reagents, respectively. The mixture was digested with trypsin, and the labeled peptides were purified according to a protocol recommended by the manufacturer (Applied Biosystems, Foster City, CA). Two micrograms of the 1:1 (light/heavy) sample were analyzed by LC-MSn using a ProteomeX workstation (LCQ Deca XP, Thermo Electron, San Jose, CA).

Soluble Kidney Proteins. Rat kidneys were perfused with PBS prior to harvest and homogenized in 50 mM Tris-HCl buffer (pH 7.5). The sample was centrifuged at 100000g for 1 h, and the supernatant, containing soluble proteins, was collected. Proteins were denatured, reduced, alkylated, and digested with trypsin overnight as described.31 Equal amounts of the digest (50 µg each) were labeled in 16O- or 18O-enriched water using immobilized trypsin, according to the vendor's instructions (Stratagene, La Jolla, CA). Aliquots were taken from the two peptide samples and mixed to yield a 16O/18O ratio of 1:2. LC-MSn analysis of the mixture (6 µg) was carried out using an LTQ-FTICR mass spectrometer (Thermo Electron), with the FTICR used for MS survey scans and the LTQ for MS/MS scans.

IMCD Proteins. IMCD protein preparation, cICAT labeling, and immunoblotting were carried out as described.32,33 Briefly, four male Brattleboro rats (Harlan-Sprague Dawley, Indianapolis, IN) were infused with the V2R-selective vasopressin analogue dDAVP (Rhone-Poulenc Rorer, Collegeville, PA) at 5 ng/h for 3 days by subcutaneous osmotic minipumps (model 2001; Alzet, Palo Alto, CA). The control rats received vehicle treatment. IMCD proteins were extracted, and 400 µg of pooled proteins from the control and dDAVP-treated rats were labeled with the light (12C) and heavy (13C) cICAT reagents, respectively. The labeled proteins were mixed and separated by one-dimensional SDS-PAGE. The gel was stained with colloidal Coomassie blue stain (GelCode Blue Stain Reagent, G-250, Pierce Biotechnology, Rockford, IL) for 5 min and then destained in deionized H2O for 1 h. The gel was then sliced into 16 small blocks, from the top of the stacking gel down to the dye front. In-gel digestion was carried out using trypsin.
Labeled peptides were purified and analyzed by LC-MSn using an LTQ mass spectrometer (Thermo Electron). Immunoblotting for 13 selected proteins was performed using aliquots of the same protein samples used in the ICAT labeling experiment, except that they were not pooled. Proteins were first resolved by SDS-PAGE on 7.5, 10, or 12% polyacrylamide gels and transferred electrophoretically onto nitrocellulose membranes. The membranes were then blocked with 5% nonfat dry milk in immunoblot wash buffer (42 mM Na2HPO4, 8 mM NaH2PO4, 150 mM NaCl, and 0.05% Tween 20, pH 7.5), rinsed, and probed with primary antibody overnight at room temperature. After washing, blots were incubated with species-specific secondary antibodies conjugated to horseradish peroxidase. Following thorough washing, the antibody-antigen reaction was visualized by chemiluminescence (LumiGLO; KPL, Gaithersburg, MD) on light-sensitive film developed with a Kodak M35A X-OMAT Processor (Kodak, Rochester, NY). Relative protein abundance on the Western blots was measured by standard densitometry. The rabbit polyclonal antibody to the α-1 subunit of Na/K-ATPase was previously generated in our laboratory.32 The anti-myosin IIA rabbit polyclonal antibody was a gift of Dr. Robert Adelstein (NHLBI, Bethesda, MD). The other antibodies used were commercial: aldose reductase (goat polyclonal, sc-17735), annexin II (goat polyclonal, sc-1924), annexin IV (goat polyclonal, sc-1930), HSP70 (goat polyclonal, sc-1060), RhoGDI (rabbit polyclonal, sc-360), Cdc42 (rabbit polyclonal, sc-87), and 14-3-3 ζ (rabbit polyclonal, sc-1019) from Santa Cruz Biotechnology (Santa Cruz, CA); transglutaminase 2 (goat polyclonal, 06-471) from Upstate (Waltham, MA); anti-GAPDH (mouse monoclonal, NB 300-221) from Novus Biologicals (Littleton, CO); anti-14-3-3 θ (mouse monoclonal, ab10439) from Abcam (Cambridge, MA); and anti-14-3-3 (mouse monoclonal, 610542) from BD Biosciences Pharmingen (San Jose, CA).

(31) Chelius, D.; Zhang, T.; Wang, G.; Shen, R. F. Anal. Chem. 2003, 75, 6658-65.
(32) Pisitkun, T.; Bieniek, J.; Tchapyjnikov, D.; Wang, G.; Wu, W. W.; Shen, R. F.; Knepper, M. A. Physiol. Genomics, in press.
(33) DiGiovanni, S. R.; Nielsen, S.; Christensen, E. I.; Knepper, M. A. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 8984-8.

LC-MSn and Protein Identification. LC-MSn was carried out by loading the sample mixture onto a PicoFrit reversed-phase C18 column (New Objective, Woburn, MA) using a nanoflow LC system (Agilent, Palo Alto, CA). Peptides were eluted with a 0-60% acetonitrile gradient and sprayed into a mass spectrometer (LCQ Deca XP, LTQ, or LTQ-FTICR) equipped with a nanospray source. Data were acquired in data-dependent mode with dynamic exclusion. In general, one MS survey scan was followed by three MS/MS scans on the LCQ or five on the LTQ. For the LTQ-FTICR, one MS survey scan using the FTICR detector was coupled to five MS/MS scans acquired in parallel on the LTQ. Protein identification was carried out on an eight-node computer cluster using the SEQUEST algorithm34 implemented in Finnigan's BioWorks 3.1 (Thermo Electron). Protein sequence databases were downloaded from the NCBI website. For the BSA sample, the default XCorr-versus-charge filter was used: +1, >1.5; +2, >2.0; +3, >2.5. For the soluble kidney and IMCD protein mixtures, a more stringent filter was applied to reduce false positives: +1, >1.9; +2, >2.5; +3, >3.0; plus Rsp = 1 and ΔCn > 0.1.

Quantitative Data Analyses.
Quantitative analyses of the LC-MSn data were performed using QUIL, an in-house-developed software program. QUIL is a Windows program written in Visual C++. It has a graphical user interface that displays manually adjustable ion chromatograms to facilitate the validation of peak detection and ratio calculation. QUIL functions as follows:

Step 1: Gather Experimental Design Information. Quantification with QUIL starts by documenting the experimental setup, such as the number of groups (or conditions), the number of samples in each group, and the multiconsensus list of the identified proteins and peptides for each sample.

Step 2: Extract Ion Chromatogram. This step constructs the elution profile for each peptide in every sample. If a peptide with a specific charge state is identified more than once, only the identification associated with the highest precursor intensity within a defined retention time window is used. This minimizes errors resulting from low-intensity peaks and biases in protein ratio estimation from highly abundant peptides. A peak center for each peptide is then defined as the retention time at which the precursor intensity is highest within a defined time window. The peptide's ion chromatogram (i.e., precursor ion intensity in survey MS scans as a function of retention time) is then constructed within a defined retention time window around the peak center. To determine the precursor intensity in a survey MS scan, a dynamic background level is first estimated for the survey spectrum, because the background signal intensity has been observed to depend on both retention time and m/z. As shown in Figure 1, the background for a specific precursor is estimated by taking a low percentile of all sorted ion intensities within a local m/z range (e.g., ±100 mass units of the precursor ion). The precursor intensity is then defined as the sum of the background-subtracted intensities of all ions within a defined mass tolerance. Finally, the ion chromatogram is smoothed with a seven-point weighted triangular smoothing function.

Figure 1. Dynamic background estimation for survey MS spectra. Shown is an example MS spectrum from the cICAT-labeled IMCD experiment. The background level (solid line) for each precursor is estimated locally as the 30th percentile of ion intensities within 100 mass units of the precursor ion. The inset is a zoomed-in view of the background estimation for the precursor ion at m/z 787.7.

(34) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-89.

Step 3: Determine the Start and End Points of a Chromatogram Peak. The start and end points of a peak in the smoothed chromatogram are defined using the full width at half-maximum (fwhm) approach. The fwhm of a peak is identified by finding the lower (L) and upper (R) retention time points at which the ion intensity is 50% of that at the peak center (C), as defined in step 2. The middle of the fwhm, (L + R)/2, is redefined as the new peak center (NC). The "peak start" is defined as the new peak center minus the fwhm (NC - fwhm), and the "peak end" as NC + fwhm. The peak therefore spans two fwhms of retention time (see Figure 3 for an illustration). For low-quality peaks or peaks partially skewed by neighboring ones, if only L (or R) exists, R (or L) is taken as the mirror point of L (or R) about the peak center; if neither L nor R exists, as happens when the target peak is almost flat or is overwhelmed by neighboring coeluted peaks on both sides, the peak is considered unusable. The presence of neighboring peaks at either or both ends sometimes makes the calculated fwhm (i.e., R - L) greater than a defined maximal value (M, say 0.3 min).
To address this issue, the following measures are taken: if only (R - NC) [or (NC - L)] is greater than half of M, L (or R) is retained and its mirror point about the peak center is taken as R (or L); if both are greater than half of M, the peak is deemed invalid. In this way, most if not all neighboring peaks are kept out of the integrated target peak area, and poor-quality peaks (due to neighboring peak interference or noise-level peaks) are effectively excluded from ratio estimation.

Step 4: Integrate Peak Area and Calculate the Ratio. Following the determination of the start and end points, the peak area is calculated by the trapezoidal approximation method. The peak area of the isotopic partner of an identified peptide is also calculated by the same procedure. The ratio of peak areas for the peptide in the heavy and light forms is then obtained.

Step 5: Normalize Peptide Ratios and Estimate Protein Ratios. After the ratios for all peptides are calculated, a global normalization is performed either at the raw file level (ratios from a specific raw file are normalized by the median peptide ratio calculated from that raw file) or, when a sample contains multiple raw files as in a MudPIT analysis, at the sample level (ratios are normalized by the median peptide ratio calculated from all of the sample's raw files). The ratio for each protein is then estimated from its peptide ratios using either the median or the one-step biweight algorithm.35

Step 6: Perform Statistical Tests. If the experiment is a mixture of two samples, Student's t test is used to assess whether, based on its peptide ratios, the ratio obtained for each protein is significantly different from 1.0. If the experiment involves two groups of samples, a Student's t test is performed for each protein to test the significance of the observed ratio difference between groups, followed by the Westfall and Young step-down adjustment of p values to minimize false positive rates arising from multiple testing.36 If more than two groups are used, the output is formatted so that the data can easily be imported into commercial statistics packages for more thorough and advanced statistical tests (e.g., ANOVA).
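Steps 5 and 6 for a two-sample mixture can be sketched in Python as below. This is a minimal illustration under stated assumptions, not QUIL's Visual C++ code: the text names median normalization, the median or one-step biweight estimator, and a t test against a ratio of 1.0, but not their details. Here we assume that the estimator and the test operate on log2 ratios, that the biweight uses the tuning constant c = 5 with a small eps, and that only the t statistic and its degrees of freedom are reported (a two-sided p value could then come from a t distribution, e.g., 2 * scipy.stats.t.sf(abs(t), df)). All function names are hypothetical.

```python
import math
from statistics import median, stdev
from typing import Dict, List, Tuple


def normalize_by_median(ratios_by_file: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Step 5 normalization: divide each peptide ratio by the median ratio of its raw file."""
    normalized = {}
    for file_id, ratios in ratios_by_file.items():
        m = median(ratios)
        normalized[file_id] = [r / m for r in ratios]
    return normalized


def one_step_biweight(values: List[float], c: float = 5.0, eps: float = 1e-4) -> float:
    """One-step Tukey biweight location estimate (c and eps are assumed tuning constants)."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # values far from the median get weight 0
        num += w * v
        den += w
    return num / den


def protein_ratio(peptide_ratios: List[float], method: str = "biweight") -> float:
    """Combine peptide ratios into one protein ratio on the log2 scale (assumption)."""
    logs = [math.log2(r) for r in peptide_ratios]
    center = median(logs) if method == "median" else one_step_biweight(logs)
    return 2.0 ** center


def t_statistic_vs_one(peptide_ratios: List[float]) -> Tuple[float, int]:
    """One-sample t statistic for mean log2 ratio = 0, i.e., protein ratio = 1.0.
    Needs at least two peptide ratios that are not all identical."""
    logs = [math.log2(r) for r in peptide_ratios]
    n = len(logs)
    se = stdev(logs) / math.sqrt(n)
    return (sum(logs) / n) / se, n - 1


if __name__ == "__main__":
    print(normalize_by_median({"run1": [1.0, 2.0, 4.0]}))  # {'run1': [0.5, 1.0, 2.0]}
    print(protein_ratio([2.0, 2.1, 1.9, 2.0, 16.0]))  # close to 2.0; the 16.0 ratio gets weight 0
    print(t_statistic_vs_one([2.0, 2.0, 4.0]))
```

The biweight down-weights peptide ratios that sit several median absolute deviations away from the median and gives zero weight to gross outliers, which is why it is described as outlier-resistant; with very few peptides per protein, the plain median is the simpler alternative named in Step 5.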

RESULTS

An overview of QUIL's major steps in the quantitative analysis of stable-isotope-labeled samples is presented in Figure 2. Quantification starts with lists of proteins and peptides identified with a database search program (e.g., multiconsensus reports from BioWorks). The first step constructs a background-subtracted ion chromatogram for each identified peptide and its isotopic partner. The start and end points of the peaks are then determined. Areas and ratios of the isotopic pairs are computed and used to estimate the relative abundance of proteins. Finally, statistical tests are performed to identify differentially expressed proteins. To determine the start and end of an ion peak in the chromatogram, QUIL uses an approach based on the concept of fwhm. As illustrated in Figure 3A, the fwhm of a peak is first identified. One full fwhm is then extended in either direction from the center of the peak to locate the left and right retention time points, which become the start and end of the peak, respectively. This approach is more tolerant of irregular peak contours. For example, Figure 3B shows that even though a valley is present at the apex of the peak of a heavy-labeled peptide, QUIL correctly finds the start and end of the peak (dashed lines). The valley is probably not the result of two overlapping peaks, since the light/heavy ratio obtained is similar to the other peptide ratios calculated for this protein. The fwhm approach also effectively minimizes the inclusion of coeluted neighbor peaks during target peak integration. As shown in Figure 3C, an adjacent peak is present in the ion chromatogram to the left of both the light and the heavy peaks, but that peak lies outside the start and end points (dashed lines) of the target peak defined by QUIL. The maximum peak intensities in Figure 3B and C are 4 × 10⁵ and 1 × 10⁷, respectively, suggesting that the method copes with at least a 25-fold difference in dynamic range.
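The fwhm rule of Step 3 and the trapezoidal integration of Step 4 can be sketched in Python as follows. This is a minimal sketch, not QUIL's implementation: the linear interpolation of the half-maximum crossings, the function names, and the default max_fwhm = 0.3 min (the example value of M in Step 3) are assumptions. Because NC is defined as the midpoint of L and R, the distances R - NC and NC - L in Step 3 are always equal; this sketch therefore measures the two half-widths from the apex C when comparing them with M/2, which is our reading rather than a documented detail.

```python
from typing import Optional, Sequence, Tuple


def half_max_crossing(times: Sequence[float], ints: Sequence[float],
                      apex: int, step: int) -> Optional[float]:
    """Walk from the apex (step=-1: left, +1: right) to the first point at or below
    half the apex intensity; return the linearly interpolated retention time."""
    if ints[apex] <= 0:
        return None
    half = ints[apex] / 2.0
    i = apex
    while 0 <= i + step < len(ints):
        j = i + step
        if ints[j] <= half:
            frac = (ints[i] - half) / (ints[i] - ints[j])  # ints[i] > half >= ints[j]
            return times[i] + frac * (times[j] - times[i])
        i = j
    return None  # intensity never falls to half height on this side


def fwhm_bounds(times: Sequence[float], ints: Sequence[float], apex: int,
                max_fwhm: float = 0.3) -> Optional[Tuple[float, float]]:
    """Return (start, end) = (NC - fwhm, NC + fwhm), or None if the peak is unusable."""
    c = times[apex]
    left = half_max_crossing(times, ints, apex, -1)
    right = half_max_crossing(times, ints, apex, +1)
    if left is None and right is None:
        return None                      # flat peak, or buried between neighbors
    if left is None:
        left = 2 * c - right             # mirror R about the apex
    if right is None:
        right = 2 * c - left             # mirror L about the apex
    if right - left > max_fwhm:          # a neighboring peak has widened the fwhm
        too_left = c - left > max_fwhm / 2
        too_right = right - c > max_fwhm / 2
        if too_left and too_right:
            return None
        if too_right:
            right = 2 * c - left         # keep L, mirror it
        else:
            left = 2 * c - right         # keep R, mirror it
    fwhm = right - left
    nc = (left + right) / 2              # new peak center NC
    return nc - fwhm, nc + fwhm


def trapezoid_area(times: Sequence[float], ints: Sequence[float],
                   start: float, end: float, tol: float = 1e-9) -> float:
    """Trapezoidal area over samples in [start, end]; tol absorbs float round-off."""
    pts = [(t, y) for t, y in zip(times, ints) if start - tol <= t <= end + tol]
    return sum((t1 - t0) * (y0 + y1) / 2 for (t0, y0), (t1, y1) in zip(pts, pts[1:]))
```

In use, the apex index of each smoothed chromatogram could be taken as max(range(len(ints)), key=ints.__getitem__), fwhm_bounds would be applied separately to the light and heavy traces, and the light/heavy ratio would be the quotient of the two trapezoid_area values.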
Since the spectral background in LC-MS depends on both the organic solvent content (i.e., it is gradient- or retention time-dependent) and the m/z of a precursor, a dynamic background level for the precursor in each survey MS scan is estimated and subtracted from the precursor intensity before the ion chromatogram is constructed and smoothed. This improves the elution profiles of peptide peaks and the subsequent peak detection. Figure 4 shows the ion chromatograms of two peptides labeled with 18O (light/heavy = 1:2) generated by QUIL (panels a and c). For comparison, elution profiles extracted for the same peptides by a commercial software tool, XPRESS (in BioWorks 3.1), are also presented (panels b and d). QUIL yields a marked improvement in chromatogram construction and peak detection, with smoother peak shapes and correctly identified start and end points for each peak. XPRESS used only portions of the peaks for ratio calculation. This is a typical problem with XPRESS and may partly explain why manual validation is required with this tool.
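The background-subtracted precursor intensity of Step 2 (Figure 1) and the seven-point triangular smoothing can be sketched in Python as follows. The 30th percentile and the ±100 m/z window come from the Figure 1 caption; the nearest-rank percentile, the 1-2-3-4-3-2-1 weights, the truncated window at the chromatogram edges, clamping negative background-subtracted intensities to zero, and the 0.5 m/z default tolerance are assumptions, since the text does not specify them. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Spectrum = Sequence[Tuple[float, float]]  # (m/z, intensity) pairs of one survey MS scan


def local_background(spectrum: Spectrum, precursor_mz: float,
                     window: float = 100.0, pct: float = 30.0) -> float:
    """Background = pct-th percentile of intensities within +/- window m/z of the precursor."""
    local = sorted(i for mz, i in spectrum if abs(mz - precursor_mz) <= window)
    if not local:
        return 0.0
    rank = max(1, math.ceil(pct / 100.0 * len(local)))  # assumed: nearest-rank percentile
    return local[rank - 1]


def precursor_intensity(spectrum: Spectrum, precursor_mz: float, tol: float = 0.5) -> float:
    """Sum of background-subtracted intensities of ions within tol of the precursor m/z."""
    bg = local_background(spectrum, precursor_mz)
    # assumed: negative background-subtracted values are clamped to zero
    return sum(max(i - bg, 0.0) for mz, i in spectrum if abs(mz - precursor_mz) <= tol)


def triangular_smooth(values: Sequence[float], half_width: int = 3) -> List[float]:
    """Weighted triangular smoothing; half_width=3 gives 7 points weighted 1,2,3,4,3,2,1."""
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(k) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):     # assumed: truncate and renormalize at the edges
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

In this sketch, precursor_intensity would be evaluated for every survey scan in the retention time window, and the resulting trace would be passed to triangular_smooth before the fwhm step. Using a low percentile rather than the minimum keeps the background estimate stable when a few ions in the window have near-zero intensity.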

(35) Hoaglin, D. C.; Mosteller, F.; Tukey, J. W. Understanding Robust and Exploratory Data Analysis; John Wiley & Sons: New York, 2000.

(36) Ewens, W. J.; Grant, G. R. Statistical Methods in Bioinformatics: An Introduction; Springer: New York, 2001.

Figure 2. Overview of QUIL's major steps in quantitative analyses of LC-MSn data of protein mixtures labeled with stable isotopes. Labeled sample mixtures are analyzed by LC-MSn, followed by protein and peptide identification. For each peptide, the MS background level is estimated and subtracted before a smoothed ion chromatogram is constructed. The peak start and end points are determined using the fwhm method. The peak area and the ratio of each isotopic pair are calculated. A protein ratio is computed from its peptide ratios using outlier-resistant algorithms. Statistical tests are then carried out with step-down p value adjustments to identify differentially expressed proteins.


Figure 3. Fwhm approach to determining the start and end of a peak. (A) Schematic description of the fwhm approach. For a hypothetical peak, the fwhm is first calculated (upper panel). Moving one fwhm to the left and to the right of the horizontal center defines the start and end of the peak, respectively. (B) Example showing that the fwhm approach is resistant to irregularities in peak shape. The ion chromatogram shown is for the peptide R.EQEC*QQEC*QQESQQESQQESQQEQQGSS.s (+3; scan 3298; RT 20.134 min; identified in heavy form) from an IMCD sample labeled with cICAT. The X-axis represents the retention time, and the Y-axis represents intensity in arbitrary units. Dashed lines indicate the start and end of the peak calculated with the fwhm approach. The peak for the light form has a regular smooth shape (upper panel), but the heavy form has a valley at the apex (lower panel). The valley, however, does not prevent the fwhm approach from correctly identifying the start and end points. (C) Example showing that the fwhm approach minimizes the inclusion of neighboring coeluting peaks. The ion chromatogram is for the peptide R.DKPLKDVIIVDCGK.I (+3; scan 4559; RT 24.949 min; identified in light form) from an IMCD sample labeled with cICAT. The coeluting peak to the left of both the light and heavy forms is clearly visible. The fwhm approach effectively minimizes the integration of area belonging to neighboring coeluting peaks.

In Figure 4A, both QUIL and XPRESS provide ratios close to the expected value (1:2). In this case, XPRESS yielded an acceptable result by chance. In most cases, comparing partial peaks leads to a less-than-ideal ratio, as shown in Figure 4B.

Standard BSA Protein Labeled by cICAT. To validate the utility of QUIL for quantitative proteomics, we first analyzed a simple mixture of light and heavy cICAT-labeled BSA at a 1:1 ratio and compared the results with those obtained by XPRESS. A total of 15 unique peptides were identified, 10 of which were identified in both light- and heavy-labeled forms. All identified peptides contain at least one cysteine, indicating that they are likely bona fide cICAT-labeled products. The light-to-heavy ratios of the peptides are presented as a scatter dot plot in Figure 5. All but one of the ratios obtained from QUIL (Figure 5C) are distributed narrowly around the expected value (1.0). The majority cluster around the median (0.88) within a small range (25th percentile 0.82; 75th percentile 0.92). The nonparametric coefficient of variation (CV), defined as the median absolute deviation divided by the median, is a robust measure of the dispersion of the calculated ratios. The CV of peptide ratios from QUIL was 6%.
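The nonparametric CV defined above takes only a few lines to compute. The Python sketch below follows that definition literally (unscaled median absolute deviation divided by the median, with no 1.4826 consistency factor); the function name and the example values are hypothetical, not the BSA data.

```python
from statistics import median
from typing import Sequence


def nonparametric_cv(values: Sequence[float]) -> float:
    """Median absolute deviation divided by the median (no 1.4826 scale factor)."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    return mad / m


if __name__ == "__main__":
    print(nonparametric_cv([0.80, 0.85, 0.88, 0.90, 0.95]))  # 0.03 / 0.88, about 3.4%
    print(nonparametric_cv([1.0, 1.0, 1.0, 1.0, 50.0]))      # 0.0: one extreme ratio is ignored
```

Unlike the standard deviation divided by the mean, this measure is essentially unaffected by a single extreme peptide ratio, which is why it is a reasonable choice for ratio sets that may still contain an outlier.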

Figure 4. Ion chromatograms showing improved peak construction and detection in QUIL. Soluble kidney proteins were labeled with 16O or 18O and mixed at a ratio of 1:2 (16O/18O). The mixture was analyzed by LTQ-FTICR. (A) Reconstructed ion chromatograms for the light and heavy forms of R.VPTPNVSVVDLTCR.L (charge +2, scan 6892, RT 38.141 min, identified in light form) from QUIL (a) and XPRESS (b). The X-axis represents the retention time in minutes, and the Y-axis represents the intensity (in absolute arbitrary units for QUIL and in relative units for XPRESS). In panel a, dashed lines indicate the start and end points used for calculating the peak area and ratio. In panel b, the shaded areas are those used for ratio calculation. (B) Reconstructed ion chromatograms for the light and heavy forms of K.HELQANCYEEVK.D (charge +2, scan 3563, RT 23.19 min, identified in light form) from QUIL (c) and XPRESS (d).

For comparison, the same data were also quantified with XPRESS. As shown in Figure 5A, the light-to-heavy ratios from XPRESS exhibit higher variability, spreading from 0.14 to 6.67 with a median of 0.90 and a CV of 21%. After manual inspection and adjustment, the few outliers disappeared, and a slight improvement in overall ratio estimation was obtained, with a median of 0.91 and a CV of 19% (Figure 5B). These data suggest that QUIL, without manual validation, significantly improves precision in the quantitative analysis of the cICAT-labeled BSA, yielding a ∼70% reduction in relative error compared with XPRESS.

Figure 5. Distribution of peptide ratios obtained by QUIL and XPRESS from the 1:1 BSA mixture. BSA was labeled in light or heavy form with cICAT reagents and mixed at a 1:1 ratio (light/heavy). The mixture was analyzed by LCQ Deca XP. Light/heavy ratios of identified peptides from XPRESS (A, no manual adjustment; B, manually inspected and adjusted) and QUIL (C, no manual adjustment) are shown on a scatter dot plot. The horizontal line in the middle of each dot set is the median.

Soluble Kidney Proteins Labeled by 18O. We then applied QUIL to quantify labeled proteins in a more complex mixture with a predefined ratio. Soluble rat kidney proteins were labeled with 16O or 18O and mixed at a ratio of 1:2 (16O/18O). The labeling reaction with 18O was complete, as suggested by the observation that, in an LC-MSn run of the 18O-labeled sample, all identified peptides were 18O-modified at their C termini and no 16O counterpart could be detected in survey MS spectra. Because of the small mass difference between light- and heavy-labeled peptide pairs, an FTICR detector was used to acquire survey MS data for constructing the peptide elution profiles. The LTQ was then used to acquire MS/MS data for identification. A total of 170 proteins were identified, 51 of which had 3 or more peptides identified per protein. The distribution of the central 90% of all peptide ratios is shown as a box and whisker plot in Figure 6A; values below the 5th and above the 95th percentile were excluded to avoid distortion by a small number of extreme values. In theory, all calculated peptide ratios should be 0.5 (light/heavy). As can be seen, most ratios obtained from QUIL concentrate narrowly around 0.5, ranging from 0.12 to 0.76 (25th percentile 0.37; 75th percentile 0.54), with a median of 0.45 and a CV of 18%. Of the 51 proteins each identified by three or more peptides, the median protein ratio estimated by QUIL was 0.45, with 25th and 75th percentiles of 0.40 and 0.54, respectively. The distribution of the CV, calculated for each protein from its peptide ratios, is shown in Figure 6B. The median CV using QUIL was 8%, with most values below 20% (25th percentile 3%; 75th percentile 20%).

Vasopressin-Treated IMCD. Finally, we applied QUIL to analyze proteomic changes in rat IMCD induced by the vasopressin analogue dDAVP. Control and treated IMCD proteins were extracted and labeled separately with the light or heavy form of cICAT. The mixture was subjected to 1D gel separation and in-gel digestion. Sixteen digested fractions were analyzed by LC-MSn using an LTQ mass spectrometer. Since no prior knowledge about the quantitative protein changes was available, Western blotting was performed for 13 selected proteins.
Results obtained from the densitometric ratios of the immunoblots were correlated with the quantitative results obtained from LC-MSn. A total of 249 proteins, each with three or more peptide ratios determined by QUIL, were identified. The CV distribution for these proteins is shown in Figure 7. Most CV values fall below 20%, with a median of 10%, and the 25th and 75th percentiles are 6 and 16%, respectively. Note that the median CV (10%) is close to those obtained for the BSA sample (6%) and the soluble kidney proteins (8%), suggesting that the improved quantitative precision of QUIL holds not only for a standard protein or a simple mixture but also for complex samples.

Figure 6. Distribution of peptide ratios and protein CVs obtained from the 18O-labeled kidney protein mixture using QUIL. Soluble kidney proteins were labeled with 16O or 18O by enzymatic reaction. Aliquots were taken and mixed at a ratio of 1:2 (light/heavy). The mixture was analyzed by LC-MSn with LTQ-FTICR. (A) The ratios of identified peptides calculated by QUIL are shown in a box and whisker plot. Ratio values below the 5th percentile and above the 95th percentile are not included. The horizontal line inside the box represents the median. The upper and lower boundaries of the box correspond to the 75th and 25th percentiles, respectively. The ends of the whiskers indicate the minimum (bottom) and maximum (top) ratios. (B) Frequency distribution of the CV in protein ratio estimation for proteins identified with at least 3 peptides. The X-axis represents the CV, and the Y-axis represents the relative frequency.

To verify the protein abundance changes obtained from QUIL analyses, 13 selected proteins were quantified by immunoblotting. Densitometric intensities of the immunologically positive bands were determined, and the ratios were calculated for dDAVP-treated and control samples. As seen in Table 1, for 8 of 13 proteins (62%), the ratios obtained from QUIL agreed with the results obtained from Western blotting in terms of the trend of changes (up- (ratio >1.0) or downregulation (ratio