Enhanced Analysis of Metastatic Prostate Cancer Using Stable Isotopes and High Mass Accuracy Instrumentation

Patrick A. Everley,†,‡ Corey E. Bakalarski,† Joshua E. Elias,† Carol G. Waghorne,‡ Sean A. Beausoleil,† Scott A. Gerber,† Brendan K. Faherty,† Bruce R. Zetter,†,‡ and Steven P. Gygi*,†

Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, and Program in Vascular Biology, Department of Surgery, Children’s Hospital, Boston, Massachusetts 02115
The primary goal of proteomics is to gain a better understanding of biological function at the protein expression level. As the field matures, numerous technologies are being developed to aid in the identification, quantification and characterization of protein expression and post-translational modifications on a near-global scale. Stable isotope labeling by amino acids in cell culture is one such technique that has shown broad biological applications. While we have recently shown the application of this technology to a model of metastatic prostate cancer, we now report a substantial improvement in quantitative analysis using a linear ion-trap Fourier transform ion cyclotron resonance mass spectrometer (LTQ FT) and novel quantification software. This resulted in the quantification of nearly 1400 proteins, a greater than 3-fold increase in comparison to our earlier study. This dramatic increase in proteome coverage can be attributed to (1) use of a double-labeling strategy, (2) greater sensitivity, speed and mass accuracy provided by the LTQ FT mass spectrometer, and (3) more robust quantification software. Finally, by using a concatenated target/decoy protein database for our peptide searches, we now report these data in the context of an estimated false-positive rate of one percent. Keywords: mass spectrometry • SILAC • prostate cancer • quantitative proteomics
Introduction

Different peptide ionization efficiencies and run-to-run variability, especially with complex samples, make peptide quantification by mass spectrometry extremely challenging. The most accurate means of achieving quantification by mass spectrometry on a large scale requires differential labeling with stable isotopes. Chemical and metabolic labeling strategies have been developed to accomplish this task, which utilize stable isotopic variants of hydrogen, carbon or nitrogen. Notably, isotope-coded affinity tags (ICAT)1 and stable isotope labeling by amino acids in cell culture (SILAC)2 are commonly used to allow large-scale and highly quantitative proteomic comparisons among biological samples. The quantitative nature of SILAC results from the metabolic incorporation of isotopic amino acid variants which are easily distinguished by mass spectrometry. To this end, cells are cultured under different biological states with either 12C or 13C amino acids added to the growth medium of each condition. This results in “light” and “heavy” populations of proteins that can be mixed, proteolytically digested and analyzed as a single sample. Peptides that have the same amino acid sequence and are present in both light and heavy conditions are chemically identical with respect to chromatographic elution and ionization efficiency, but have distinct masses. The resolution of these light and heavy peptides by mass-to-charge ratio (m/z) yields separate peak distributions in the mass spectrum (MS scan). A comparison of the two peak intensities as a function of time provides quantitative information for each peptide pair.

Since its development, SILAC has helped explore a variety of biological areas, including B-cell3 and muscle cell differentiation,2 differences between T helper 1 and 2 cells,4 protein-protein interactions associated with EGF signaling5 and tumor metastasis.6 In addition, SILAC has been instrumental in the analysis of post-translational modifications, specifically phosphorylation7,8 and methylation.9

We recently published an application of SILAC to a model of metastatic prostate cancer (PCa) in which we reported the quantification of 444 proteins.6 However, a motivation to extend and improve this analysis resulted from further developments in mass spectrometry tools, as well as an increased interest in SILAC among the proteomics community. We also sought to create a quantitative algorithm that exploits the high mass accuracy of Fourier transform ion cyclotron resonance (FTICR) spectra and is compatible with both SEQUEST10 and Mascot11 search data. To improve the precision of peak intensities reported by the FTICR, we chose to use chromatographic peak area for a more reliable assessment of peptide quantification. Furthermore, because incorporating additional differentially labeled isotopic amino acids in SILAC experiments should increase the number of quantifiable proteins, we also wanted to employ a double-labeling strategy. Indeed, others have already shown the effectiveness of this double-encoding approach.7

There have been a number of improvements in mass spectrometry instrumentation, such as the development of the LTQ FT hybrid mass spectrometer. This instrument generates an unprecedented quantity and quality of data by combining the speed and sensitivity of a linear ion trap with high mass accuracy associated with FTICR spectra. In addition, we feel that it is important to report such large proteomic datasets in the context of an estimated false-positive (FP) rate. Accordingly, a composite target/decoy protein database was utilized to attain an estimated 1% FP rate. With these tools in hand, the goal of this current work was to achieve a significantly advanced analysis of metastatic PCa using double-labeling SILAC, high mass accuracy instrumentation and in-house developed quantification software.

* To whom correspondence should be addressed. Department of Cell Biology, 240 Longwood Avenue, Boston, MA 02115. Tel: (617) 432-3155. Fax: (617) 432-1144. E-mail: [email protected].
† Department of Cell Biology, Harvard Medical School.
‡ Program in Vascular Biology, Department of Surgery, Children’s Hospital.

Journal of Proteome Research 2006, 5, 1224-1231. Published on Web 04/21/2006. 10.1021/pr0504891 CCC: $33.50. © 2006 American Chemical Society
Methods

Cell Culture and Protein Preparation. PC3M and PC3M-LN4 cells were cultured and microsomal lysates collected as described,6 except PC3M-LN4 cells were grown in the presence of both 13C6-lysine and 13C6-arginine (98% purity; Cambridge Isotope Laboratories, Andover, MA). Additional information about SILAC can be found at http://silac.org/.

Protein Separation and Digestion. PC3M and PC3M-LN4 lysates were standardized using the BCA assay (Pierce Biotechnology, Inc., Rockford, IL) and combined at a 1:1 ratio using 150 µg protein from each sample (300 µg total), boiled in SDS-PAGE sample buffer, resolved on a 10% Tris-glycine gel and stained with Coomassie Blue. A single gel lane was excised, divided horizontally into 12 sections and each section was processed separately for tryptic digestion as described.6

LC-MS/MS Analysis. Liquid chromatography tandem mass spectrometry (LC-MS/MS) was performed using an LTQ FT hybrid linear (2-D) ion trap-Fourier transform ion cyclotron resonance (FTICR) mass spectrometer12 (ThermoElectron, San Jose, CA). Eight percent of each reconstituted sample was loaded onto a 125 µm (i.d.) by 18 cm (length) fused silica C18 (Magic C18-AQ, 200 Å pore size, 5 µm diameter, Michrom BioResources, Auburn, CA) microcapillary column using a FAMOS capillary autosampler (LC Packings, Sunnyvale, CA) and an Agilent 1100 series binary HPLC pump (Agilent Corporation, Palo Alto, CA) with an in-line flow splitter. Peptides were transferred from the autosampler directly to the resolving column for 20 min at 120 bar in Buffer A (3% acetonitrile, 0.15% formic acid), followed by gradient elution at 60 bar from 7% to 33% Buffer B (97% acetonitrile, 0.15% formic acid) over 55 min. During gradient elution, 10 ion-trap MS/MS spectra were acquired per data-dependent cycle from a high-resolution (R set at 100 000) FTICR master spectrum.
To maximize dynamic range for this quantitative study, an automatic gain control (AGC) setting of 8 × 10^6 was used for FTICR ion accumulation. A threshold of 2000 counts was required to trigger an MS/MS event and, when possible, the LTQ FT operated in a parallel processing mode in which the LTQ and ICR cells were both used for simultaneous ion detection. FTMS master spectra were recalibrated postacquisition using a simple, single-point strategy that compensates for scan-to-scan variabilities of induced electric field. For improved mass accuracy, in-house software was utilized to create dta files (for SEQUEST searches) by extracting high-resolution precursor-ion m/z values from the corresponding FTICR master spectra (Beausoleil et al., manuscript submitted).

Protein Identification and Determination of False-Positive Rate. Newly created dta files were searched with a fully tryptic enzyme constraint (two tryptic termini and up to two missed cleavages) against a concatenated target (forward orientation) and decoy (reversed orientation) human IPI protein database (ftp.ebi.ac.uk/pub/databases/IPI/current/) using the SEQUEST-SORCERER algorithm (Sage-N Research, San Jose, CA) and a mass tolerance of ±50 parts-per-million (PPM). Searches were performed with a 6.0201 Dalton (Da) dynamic modification on lysine and arginine residues to account for the presence of both light and heavy peptide species, as well as a 71.0371 Da static modification on cysteines due to an added acrylamide moiety and a 15.9949 Da dynamic modification on methionines for differential oxidation states. The composite target/decoy database13 was used to generate score criteria that yielded an estimated FP rate of 1% (precision of 0.99). A dCn of 0.16, a mass accuracy filter of ±20 PPM and XCorr values of 1.7, 1.0, 1.3, and 1.7 for peptide charge states of 1, 2, 3, and 4, respectively, were empirically determined to meet these criteria (Supporting Information Figure 1).

Peptide Quantification. Peptide quantification was performed using a novel automated method developed in-house (VISTA, Bakalarski et al., manuscript in preparation). To summarize, the sequence composition of each peptide was calculated to generate the theoretical masses of both light and heavy species, which were then used to extract ion chromatogram intensities separately for each variant from FTICR master spectra. Candidate peaks were required to have a mass accuracy of better than ±20 PPM with respect to the theoretical mass of each isotopic variant and further filtered to ensure the presence of the predicted isotopic distribution.
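The mass calculation and accuracy filter described above can be illustrated with a minimal Python sketch. This is not the VISTA implementation: it assumes standard monoisotopic residue masses, takes the 6.0201 Da label shift and the 71.0371 Da cysteine modification from the search parameters above, ignores methionine oxidation for brevity, and uses hypothetical function names (`peptide_mz`, `ppm_error`, `within_tolerance`). The example peptide TIIQNPTDQQK is evaluated at an assumed charge state of 2.

```python
# Standard monoisotopic residue masses in Da (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565          # mass of H2O added to the residue sum
PROTON = 1.007276          # mass of one charge-carrying proton
HEAVY_SHIFT = 6.0201       # 13C6 label on each K or R (from the search settings)
ACRYLAMIDE_CYS = 71.0371   # static cysteine modification (from the search settings)


def peptide_mz(sequence, charge, heavy=False):
    """Theoretical monoisotopic m/z of the light or heavy peptide species."""
    mass = WATER + sum(RESIDUE_MASS[aa] for aa in sequence)
    mass += ACRYLAMIDE_CYS * sequence.count("C")
    if heavy:
        mass += HEAVY_SHIFT * (sequence.count("K") + sequence.count("R"))
    return (mass + charge * PROTON) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass deviation in parts-per-million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """True if a candidate peak lies within the +/- tol_ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


light_mz = peptide_mz("TIIQNPTDQQK", 2)
heavy_mz = peptide_mz("TIIQNPTDQQK", 2, heavy=True)
```

For a doubly charged peptide carrying a single labeled lysine, the heavy and light species in this sketch are separated by half of the 6.0201 Da shift on the m/z axis, which is the spacing used to locate the peptide pair in the FTICR master spectrum.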
The background noise level was dynamically determined based on all peaks observed within both a ±25 m/z window around the theoretical species masses and the adjacent 40 FTICR master spectra. For the purposes of this study, we defined the signal-to-noise (S/N) ratio for each species as the ratio of the maximum peak intensity observed to the background noise baseline. Peak boundaries were established by extension from the location of the MS/MS scan outward until the intensity dropped below the background noise level. For each isotopic variant, the background-subtracted area under the curve (AUC) was separately determined as a function of elution time. The ratio of peptide abundance between heavy (PC3M-LN4) and light (PC3M) samples was calculated from these results. Quantified peptides were scored for quality using empirically determined test conditions, including sufficient S/N, an adequate number of intensity observations across the peak envelope, temporally concurrent detection of both light and heavy species and high mass accuracy of peaks with respect to their theoretical masses. Data obtained from these assessments were weighted and combined into a score ranging from 0 (worst) to 100 (best).

Protein Quantification. Peptide selection for protein identification and quantification was performed in four stages. In the first, proteins were selected from the list of peptides that matched the XCorr, dCn, and PPM thresholds corresponding to an estimated 1% FP rate (see above). Second, because a higher number of peptides ultimately correlates with better confidence in protein identification,14 additional peptides that met minimal criteria (XCorr = 1.0, dCn = 0.05, ±20 PPM) were allowed if they matched to already-identified proteins. This step also increased confidence in protein quantification by providing additional, independent ratio observations in our analysis. In the third stage, the quantitative ratios associated with each peptide were accepted if peptides met these minimal criteria, as well as a stringent confidence score (≥90) assigned by VISTA. Finally, the Grubbs test15 was used to eliminate outliers in quantification for cases in which multiple peptides were quantified for each protein. Briefly stated, the Grubbs test is a statistic that incorporates the number of items being considered and the standard deviation (SD) of those items to determine if a given data point is significantly different from the remaining population of values. Peptides were assembled into proteins from this final list using Excel. For purposes of quantification, peptides with the same amino acid sequences were considered unique if they had different charges, different states of methionine oxidation or were from different collections of the 12 fractions (Figure 1). For purposes of identification, peptides with different primary amino acid sequences (except for Ile/Leu substitutions) were considered unique. Protein ratios were calculated by averaging the log2-transformed nonredundant peptide ratios and then normalized to the median ratio value to give a population centered on 0.

Figure 1. Schematic showing the Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC) method. PC3M (low metastatic) and PC3M-LN4 (high metastatic) cells were cultured in standard media or media containing 13C6-lysine and 13C6-arginine, respectively. Following isolation of microsomal proteins, equal amounts of protein from each sample were combined, creating a single sample that was then separated by SDS-PAGE. The entire gel lane was divided into 12 regions and each section was processed for mass spectrometry. Quantitative data were determined by comparing the extracted ion chromatograms during elution of light and heavy peptides using VISTA, a quantification program developed in-house.

Determination of Isotopic Amino Acid Incorporation and Quantitative Dynamic Range. A small amount of purified protein from PC3M-LN4 was analyzed to determine the percentage of metabolic 13C6-lysine and 13C6-arginine incorporation and to establish a quantitative dynamic range for our SILAC studies. PC3M-LN4 protein (20 µg) was resolved by SDS-PAGE, and a small gel section was excised and digested with trypsin. This sample was analyzed by LC-MS/MS and subjected to quantitative analysis as described above.
Incorporation for both 13C6-lysine and 13C6-arginine was observed to be 98% after 5 population doublings, with a maximum quantitative dynamic range of approximately 50 (data not shown). Metabolic incorporation of 13C6-lysine and 13C6-arginine is illustrated in Supporting Information Figure 2.

Western Blotting. Whole cell lysates (WCL) were collected from PC3M and PC3M-LN4 cells in TBS-lysis buffer (10 mM Tris pH 7.5, 150 mM NaCl, 1% Triton X-114). Microsomal lysates were prepared as described.6 Equal microgram amounts of protein were separated by SDS-PAGE and transferred to Immobilon-P membrane (Millipore Corporation, Bedford, MA). The membranes were blocked with 8% fat-free milk and immunoblotted with E-cadherin monoclonal antibody C36 (BD Biosciences, San Jose, CA), UCHL1 polyclonal antibody (Abcam Inc., Cambridge, MA) and Actin monoclonal antibody C4 (Chemicon International, Inc., Temecula, CA). Detection was performed using HRP-conjugated secondary antibodies visualized by using Western Lightning Chemiluminescence Reagent Plus (Perkin-Elmer Life Sciences Inc., Boston, MA) exposed to HyBlot Cl film (Denville Scientific Inc., Metuchen, NJ).
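The protein-level steps described under Protein Quantification (outlier removal, averaging of log2-transformed peptide ratios, and normalization to the median) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually run in Excel: the Grubbs test is applied once per protein in log2 space, the significance level is assumed to be a two-sided 5%, the critical values are the commonly tabulated ones for 3 to 10 observations and should be checked against a statistics reference, and all names (`GRUBBS_CRIT_05`, `remove_grubbs_outlier`, `protein_log2_ratios`) are hypothetical.

```python
import math
import statistics

# Commonly tabulated two-sided Grubbs critical values at an ASSUMED alpha of 0.05,
# keyed by the number of observations. The source does not state the level used.
GRUBBS_CRIT_05 = {3: 1.154, 4: 1.481, 5: 1.715, 6: 1.887,
                  7: 2.020, 8: 2.127, 9: 2.215, 10: 2.290}


def remove_grubbs_outlier(values, crit=GRUBBS_CRIT_05):
    """Drop the single most extreme value if its Grubbs statistic exceeds the cutoff."""
    n = len(values)
    if n not in crit:
        return list(values)          # too few (or too many) points for this table
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)          # identical values, nothing to test
    suspect = max(values, key=lambda v: abs(v - mean))
    if abs(suspect - mean) / sd > crit[n]:
        kept = list(values)
        kept.remove(suspect)
        return kept
    return list(values)


def protein_log2_ratios(peptide_ratios):
    """Map {protein: [heavy/light peptide ratios]} to median-centred log2 protein ratios."""
    raw = {}
    for protein, ratios in peptide_ratios.items():
        logs = remove_grubbs_outlier([math.log2(r) for r in ratios])
        raw[protein] = statistics.mean(logs)
    centre = statistics.median(raw.values())
    return {protein: value - centre for protein, value in raw.items()}


# Hypothetical peptide ratios for three made-up proteins.
example = {"A": [2.0, 2.0, 2.0, 2.0, 64.0], "B": [1.0, 1.0], "C": [0.5, 0.5]}
result = protein_log2_ratios(example)
```

In this toy input the 64-fold peptide ratio for protein A is rejected as an outlier, so A is reported from its four concordant peptides, and subtracting the median places the population centre at 0.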
Results and Discussion

The PC3M and PC3M-LN4 lines have been shown to be excellent models of PCa metastasis.16 We therefore sought to reexamine these cells with vastly improved tools to validate our previous results and identify metastatic-specific protein expression differences. The PC3M line has been previously found to be poorly metastatic to distant sites, whereas its derivative (PC3M-LN4) exhibits a much higher metastatic potential.16 Differentially expressed membrane proteins from these cells may represent the most useful biomarkers; therefore, we chose to evaluate microsomal lysates as an enrichment strategy for this class of proteins17 and to provide an accurate comparison with our previous study.6 Figure 1 provides a schematic of the procedure used for our SILAC analysis.

The quantification in our earlier SILAC study was based solely on lysine-containing peptides, as only heavy and light variants of this essential amino acid were used.6 Since others have recently shown successful use of both arginine and lysine isotopic variants for SILAC,7 and because the specificity of trypsin used during sample digestion results in lysine- and arginine-containing peptides, we chose a double-labeling approach as a means to significantly increase the number of quantifiable proteins. Indeed, many proteins were identified only by arginine-containing peptides in our previous study and could not be quantified because no isotopically modified arginine was employed. Additional care should be taken when using nonessential amino acids in SILAC, as cells may synthesize undesired byproducts resulting from an inaccurate supply of isotopic amino acids. For example, too much heavy arginine added to the growth media can activate the arginase pathway in some cells, producing 13C variants of proline.18 On the other hand, culturing cells in too little arginine will result in production of its 12C version. An optimization of arginine concentration is required because both of these scenarios will lead to an inaccurate peptide ratio calculation. For PC3M and PC3M-LN4 cells, we found that adding one-third the recommended amount of heavy arginine to our growth media used to culture unlabeled cells6 resulted in approximately 98% incorporation with negligible 13C-proline production, as observed by analyzing MS spectra of proline-containing peptides (Supporting Information Figure 2).

Figure 2. Effect of using precursor ion mass accuracy filtering on peptide identification. More than 68 000 MS/MS spectra were collected in this study. Peptides were identified using the SEQUEST algorithm with 50 PPM search tolerance and allowing for SILAC-specific mass increases at lysine and arginine. In addition, a composite target/decoy database searching approach was used to permit the direct assessment of FP rates and the effect of mass accuracy filtering. (A) Distribution of all doubly charged peptide spectral matches (PSMs) from this study with dCn values larger than 0.16 showing XCorr and mass deviation (PPM). PSMs derived from the decoy (reversed) database (red dots) were randomly distributed throughout the ±50 PPM search space, whereas PSMs derived from the target (forward) database (blue dots) were concentrated in the region of ±20 PPM. (Note: Some peptides fall out of the ±50 PPM search range due to external recalibration). (B) Effect of filtering for ±20 PPM precursor ion tolerance after searching. An FP rate of approximately 4% was observed by applying only a dCn filter of 0.16 (dotted blue line vs dotted red line). Selecting peptides using an additional filter of ±20 PPM (solid blue line vs solid red line, dCn = 0.16) yielded an estimated 1% FP rate.

A major benefit of using an LTQ FT mass spectrometer is that it represents two instruments in one. Our previous study was performed using a quadrupole time-of-flight (QSTAR) mass spectrometer.6 In comparison to the QSTAR, LTQ-based instrumentation has a higher duty cycle and superior sensitivity for peptide detection.19 Moreover, precursor ions are detected with high mass accuracy in the FTICR cell, which resulted in a much higher confidence in correctly assigned MS/MS spectra.

We used a concatenated target/decoy database searching strategy to increase our confidence in the entire dataset and to greatly reduce the need for manual validation. The underlying assumption is that there is an equal chance of incorrectly assigning an FP peptide to the target database as there is to the decoy database, as shown in Supporting Information Figure 3. Our results indicate that approximately 50% of the false-positive hits match to the reversed database using this forward/reversed (target/decoy) searching approach, as expected. These results demonstrate the effectiveness of searching large datasets against concatenated forward/reversed protein databases to estimate FP rates.

A clear relationship was observed between SEQUEST’s XCorr score and mass accuracy with respect to peptides derived from the target/decoy database (Figure 2A). While an XCorr value of 2.0-2.5 is generally used for +2 peptides searched against a fully tryptic database, Figure 2A indicates that an application of this threshold would reduce the overall FP rate but would exclude many target-derived peptides that have low PPM values. Figure 2B shows that we can significantly reduce the FP rate, yet still retain the vast majority of peptides derived from the target database, by simply applying a threshold of ±20 PPM. Doing so allowed us to use lower XCorr threshold values for peptides in each charge state.
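The FP estimate that follows from the equal-chance assumption above can be written out as a short Python sketch. It is illustrative only: the `PSM` record, the function names, and the choice of inclusive comparisons are assumptions, while the numerical thresholds are those listed in Methods. Because a false match is assumed equally likely to land in the target or decoy half of the database, the number of accepted decoy hits is doubled to estimate the total number of false positives.

```python
from dataclasses import dataclass

# Per-charge XCorr minimums as listed in Methods.
XCORR_MIN = {1: 1.7, 2: 1.0, 3: 1.3, 4: 1.7}


@dataclass
class PSM:
    """Hypothetical record for one peptide spectral match."""
    xcorr: float
    dcn: float
    ppm: float        # signed precursor mass deviation
    charge: int
    is_decoy: bool    # True if matched to the reversed (decoy) database


def passes(psm, dcn_min=0.16, ppm_max=20.0):
    """Apply the dCn, PPM and charge-specific XCorr filters (inclusive bounds assumed)."""
    xcorr_min = XCORR_MIN.get(psm.charge, float("inf"))
    return psm.dcn >= dcn_min and abs(psm.ppm) <= ppm_max and psm.xcorr >= xcorr_min


def estimated_fp_rate(psms):
    """Estimate the FP rate among accepted PSMs as 2 x decoy hits / all accepted hits."""
    accepted = [p for p in psms if passes(p)]
    if not accepted:
        return 0.0
    decoys = sum(p.is_decoy for p in accepted)
    return 2 * decoys / len(accepted)


# Made-up data: 199 accepted target hits, 1 accepted decoy hit,
# and 50 decoy hits that fail the +/-20 PPM filter.
psms = [PSM(2.5, 0.30, 3.0, 2, False) for _ in range(199)]
psms.append(PSM(1.2, 0.20, -15.0, 2, True))
psms += [PSM(1.5, 0.20, 40.0, 2, True) for _ in range(50)]
rate = estimated_fp_rate(psms)
```

With one decoy among 200 accepted matches, the sketch returns an estimated FP rate of 1%, which corresponds to the precision of 0.99 quoted in Methods.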
Moreover, applying a PPM threshold of ±20 resulted in an FP rate of approximately 1% (Supporting Information Figure 4A) and nearly maximized the total number of peptides (Supporting Information Figure 4B).

Using XCorr, dCn, and PPM thresholds that provided an estimated 1% FP rate (Methods) resulted in the identification of 26 526 tryptic peptide spectral matches (PSMs). Only those PSMs with the highest VISTA confidence scores (≥90; Supporting Information Figure 5) were retained for quantification followed by the removal of redundant peptides (Methods). This resulted in a total of 1395 quantified proteins (Table 1 and Supporting Information Table 1). At least one peptide quantification was manually validated per protein, taking into consideration similar PPM values and elution profiles for both heavy and light species and ensuring that the quantitative analysis was based on a comparison of each monoisotopic peak. Figure 3 follows one peptide corresponding to Proliferation-associated protein 2G4 through the identification and quantification process. Up-regulation of this protein in PC3M-LN4 cells was consistent with our previously published results.6

The same cellular models and purification scheme were used in our previously published SILAC results.6 For brevity, we chose to assess quantitative reproducibility by evaluating the 25 most upregulated and 25 most down-regulated proteins from our earlier work. Of these, 72% (36 of 50) were identified in our current SILAC analyses (Supporting Information Table 1). This identification overlap is consistent when comparing data obtained by QSTAR (previous work) and LTQ (current work) instruments.19 In addition, greater than 80% (30 of 36)
Table 1. Proteins Found Exclusively in PC3M or PC3M-LN4 Samples

| found only in (a) | protein | IPI reference | entry | database | peps quant (b) | peps ID (c) | avg S/N (d) | max S/N (e) |
|---|---|---|---|---|---|---|---|---|
| PC3M | Apolipoprotein b-100 precursor | IPI00022229 | P04114 | UniProt/Swiss-Prot | 4 | 4 | 10.09 | 15.95 |
| PC3M | L-plastin | IPI00010471 | P13796 | UniProt/Swiss-Prot | 3 | 6 | 7.82 | 9.82 |
| PC3M | E-cadherin | IPI00025861 | Q9UII7 | UniProt/TrEMBL | 3 | 4 | 16.36 | 32.21 |
| PC3M | Galectin-3 binding protein precursor | IPI00023673 | Q08380 | UniProt/Swiss-Prot | 2 | 4 | 5.99 | 6.35 |
| PC3M | Laminin α3 subunit isoform 2 | IPI00003951 | Q6VU69 | UniProt/TrEMBL | 2 | 3 | 7.00 | 10.23 |
| PC3M | Suppressor of tumorigenicity 14 | IPI00001922 | Q9Y5Y6 | UniProt/Swiss-Prot | 2 | 3 | 6.62 | 7.77 |
| PC3M | Calcyphosine | IPI00465352 | Q13938 | UniProt/Swiss-Prot | 2 | 2 | 8.85 | 11.03 |
| PC3M-LN4 | Splicing factor 3b subunit 2 | IPI00221106 | NP_006833 | RefSeq NP data set | 4 | 5 | 7.48 | 11.45 |
| PC3M-LN4 | ATP-binding cassette, sub-family d, member 1 | IPI00291373 | P33897 | UniProt/Swiss-Prot | 4 | 4 | 8.36 | 12.79 |
| PC3M-LN4 | Unknown protein | IPI00411968 | ENSP00000343198 | Ensembl | 4 | 4 | 44.95 | 66.05 |
| PC3M-LN4 | Ubiquitin carboxyl-terminal hydrolase isozyme L1 (UCHL1) | IPI00018352 | P09936 | UniProt/Swiss-Prot | 4 | 4 | 12.77 | 18.42 |
| PC3M-LN4 | Splice isoform long of myosin IXb | IPI00336047 | ENSP00000314032 | Ensembl | 3 | 6 | 5.90 | 6.13 |
| PC3M-LN4 | Hypothetical protein FLJ20035 | IPI00217606 | Q8IY21 | UniProt/TrEMBL | 3 | 5 | 7.22 | 8.50 |
| PC3M-LN4 | Novel protein | IPI00166011 | Q5T3R3 | UniProt/TrEMBL | 3 | 3 | 11.02 | 17.71 |
| PC3M-LN4 | Nestin | IPI00010800 | P48681 | UniProt/Swiss-Prot | 2 | 6 | 8.42 | 11.61 |
| PC3M-LN4 | CCR4-not transcription complex, subunit 1, isoform b | IPI00410157 | Q8IWB8 | UniProt/TrEMBL | 2 | 5 | 10.17 | 14.28 |
| PC3M-LN4 | Splice isoform 1 of γ-tubulin complex component 3 | IPI00033516 | Q96CW5-1 | UniProt/Swiss-Prot | 2 | 5 | 6.12 | 6.74 |
| PC3M-LN4 | CCR4-not transcription complex, subunit 1 isoform a | IPI00166010 | NP_057368 | RefSeq NP data set | 2 | 3 | 6.41 | 6.74 |
| PC3M-LN4 | DPYSL3 protein | IPI00029111 | Q6DEN2 | UniProt/TrEMBL | 2 | 3 | 5.24 | 5.25 |
| PC3M-LN4 | Splice isoform 1 of RNA-binding region containing protein 2 | IPI00163505 | Q14498-1 | UniProt/Swiss-Prot | 2 | 3 | 10.01 | 10.46 |
| PC3M-LN4 | ATP synthase, f0 complex, subunit f isoform 2b | IPI00219291 | NP_001003713 | RefSeq NP data set | 2 | 2 | 6.89 | 7.88 |
| PC3M-LN4 | FLJ10378 protein isoform 1 | IPI00383949 | NP_060548 | RefSeq NP data set | 2 | 2 | 28.79 | 51.71 |
| PC3M-LN4 | Otthump00000022614 | IPI00385729 | Q5T5R7 | UniProt/TrEMBL | 2 | 2 | 9.25 | 10.26 |
| PC3M-LN4 | Protein associated with myc | IPI00289776 | O75592 | UniProt/TrEMBL | 2 | 2 | 7.15 | 7.74 |
| PC3M-LN4 | Splice isoform 1 of myosin Va | IPI00000807 | Q9Y4I1-1 | UniProt/Swiss-Prot | 2 | 2 | 5.19 | 5.37 |

a Sample (PC3M or PC3M-LN4) in which protein was detected and quantified; undetectable in other sample. b Peptides used for protein quantification. c Peptides used for protein identification. d Average signal-to-background noise ratio for all quantified peptides. e Signal-to-background noise ratio of most intense quantified peptide.
exhibited consistent quantifications. Consistency was defined as having at least a 1.5-fold increase or decrease for proteins found earlier to be upregulated or downregulated, respectively. Because reproducibility and linearity of stable isotope labeling experiments have typically shown internal variances of less than 15%,18 a threshold of 50% (1.5-fold) was chosen as a starting point for assessing reproducible differences. Of the six inconsistent proteins, five were previously quantified by only a single peptide. It is generally accepted that the reliability of qualitative and quantitative results is improved by increased peptide coverage. On average, three peptides per protein were used for quantification in our previous results, whereas we currently report an average of six peptides per protein in this study. Thus, this work provided improved confidence in protein identification and quantification compared to our earlier study.

One concern with large proteomic studies using mass spectrometry is how to deal with proteins identified by only a single peptide, which corresponded to approximately 20% (274 of 1395) of the proteins in this dataset. Because nearly all false-positive matches also fall in this category, we carefully evaluated these single-peptide identifications to ensure similar quantitative distributions as proteins identified by two or more peptides. The median log2-transformed SILAC ratio (heavy/light) for all proteins (Figure 4A) was 0.01 (n = 1370, SD = ±1.26), indicating an approximate 1:1 ratio. For proteins identified by only a single peptide, the median ratio was -0.03 (n = 274, SD = ±1.19), which compares to a median ratio of 0.04 for proteins identified by two or more peptides (n = 1096, SD = ±1.27). These results are reassuring in that the quantitative distribution of single-peptide identifications is similar to that of proteins identified by numerous peptides, as noted by nearly equal SD values for both populations.

An ontological analysis of proteins found differentially expressed by greater than 5-fold revealed some noteworthy differences between these metastatic variants (Figure 4B and C). More than 10% of the most highly downregulated proteins could be classified as cell adhesion molecules, whereas none of the most highly upregulated proteins fell into this category. This is consistent with other studies in which proteins associated with cellular adhesion have been shown to be decreased or even lost during metastatic progression.20 Another difference was a 4-fold increase in proteins involved in regulating cell proliferation and differentiation in the highly metastatic PC3M-LN4 cells, an expected trend as many cancers develop increased proliferative capabilities and become poorly differentiated during tumor progression.21

SILAC software available at the time of our previous study could report quantitative data only if heavy and light variants were both present. VISTA has the added advantage of expressing quantitative information relative to background noise levels when just one isotopic variant is present. Such information cannot be obtained with standard, label-free shotgun proteomics methods. Indeed, proteins that are found only in one condition may serve as the most useful biomarkers and could be particularly valuable as chemotherapeutic targets or as antigens for potential immunotherapy. However, many proteins in our preliminary list were identified by a single peptide and found to exist in only one cellular state. Great care should be
Enhanced Analysis of Metastatic Prostate Cancer
Figure 3. Example of the identification and quantification of a SILAC peptide pair from Proliferation-associated protein 2G4. (A) MS full scan from 400 to 1000 m/z with detailed view of the tryptic SILAC peptide pair for TIIQNPTDQQK (inset). K* indicates 13C6 (“heavy”) modification on lysine. Red and blue peaks correspond to light and heavy peptide isotopic envelopes, respectively. The small isotopic envelope near 651 m/z is unrelated to peptide TIIQNPTDQQK. (B) MS/MS fragment ion series of heavy peptide from Panel A used for protein identification. Green arrowhead indicates m/z ratio of precursor ion. Inset box shows fragment ion masses used to assign b- and y-type ion series. (C) Automated quantification of light and heavy peptides using VISTA. Utilizing the high mass accuracy of the hybrid LTQ FT mass spectrometer, peptide peaks from Panel A were identified in the MS full scan data using a mass tolerance of (20 PPM. Candidate peptide peaks were further filtered according to the presence of the predicted isotopic envelopes (red and blue peaks in Panel A). Light and heavy peptide peak intensities were then plotted against HPLC retention time. To determine the relative quantification of the peptide pair, the AUC was determined for both light and heavy peptides over the retention times where the peptide was observed (indicated by the gray shaded area). Relative peptide quantification was then reported as a ratio of their respective peak areas. Green line indicates MS/MS event for the heavy peptide, with fragmentation shown in detail in Panel B.
research articles
Figure 4. Protein expression ratio distribution and classification. (A) Distribution of log2-transformed ratios (heavy:light, PC3MLN4:PC3M) for all quantified proteins (Supporting Information Table 1), excluding those found in Table 1. (Note: a ratio bin of 0 signifies a log2(ratio) of 0 ( 0.5). Proteins greater than 5-fold upregulated (B) (n ) 75) and downregulated (C) (n ) 44) were submitted for gene ontology analysis using Panther, a freely available web-based program from Applied Biosystems (Framingham, MA) (https://panther.appliedbiostems.com/) and grouped according to biological process. Proteins for which no biological process could be assigned were omitted from this display. Journal of Proteome Research • Vol. 5, No. 5, 2006 1229
Figure 5. Quantitative validation of selected proteins from Table 1. E-cadherin and UCHL1 were analyzed by immunoblotting to confirm their predominant detection in low metastatic PC3M and highly metastatic PC3M-LN4 cells, respectively. Results are consistent with those found in Table 1, and similar staining patterns were observed in both whole-cell (WCL) and microsomal (ML) lysates. Actin levels were evaluated as loading controls.
taken with these proteins, as this category contains the most FP answers and offers the least reliable quantification. To increase reliability of candidate identifications, proteins retained in our final list that fell in this category (exclusive detection in one cellular state) were required to have at least two identified peptides. Furthermore, an S/N ratio greater than five was required for every quantified peptide. We omitted many proteins that were initially classified in the PC3M-only category, such as common contaminants in mass spectrometry (e.g., keratins) for which no heavy variants should be expected. After removing common contaminants, we generated a list of those proteins that were found exclusively under low or high metastatic conditions (Table 1). Detection of these proteins could have resulted from increased sensitivity by our double-encoding approach, as well as the ability of VISTA to quantify proteins found exclusively in one sample, a feature not available in our earlier analysis. As noted earlier, validation and quantitative reproducibility were assessed on a large scale by comparing our current and prior datasets. However, none of the proteins listed in Table 1 were quantified in our earlier study.6 While many of the proteins found in Table 1 are novel or poorly characterized, a few for which antibodies were readily available were analyzed by Western blotting in both whole-cell and microsomal lysates. Consistency between Table 1 SILAC results and immunoblots is shown in Figure 5. Intense expression of E-cadherin was observed in low metastatic PC3M cells, whereas UCHL1 was predominantly detected in highly metastatic PC3M-LN4 cells. This analysis validates the detection and quantification capabilities of VISTA and further points toward novel markers of metastasis.
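The acceptance criteria described above for proteins detected exclusively in one cellular state (contaminants removed, S/N greater than five for every quantified peptide, at least two identified peptides) can be expressed as a short filter. The sketch below is illustrative only: the `Peptide` and `Protein` records are hypothetical, and it assumes that peptides failing the S/N cutoff are discarded before the two-peptide requirement is checked, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peptide:
    sequence: str
    signal_to_noise: float


@dataclass
class Protein:
    name: str
    peptides: List[Peptide]
    is_contaminant: bool = False  # e.g., keratins


def passes_exclusive_filter(protein, min_peptides=2, min_sn=5.0):
    """Apply the criteria for proteins seen in only one cellular state."""
    if protein.is_contaminant:
        return False
    # Assumption: peptides at or below the S/N cutoff are dropped first.
    quantified = [p for p in protein.peptides if p.signal_to_noise > min_sn]
    return len(quantified) >= min_peptides


if __name__ == "__main__":
    # Hypothetical records; sequences are placeholders.
    candidates = [
        Protein("candidate A", [Peptide("AAAK", 12.0), Peptide("CCCK", 6.5)]),
        Protein("candidate B", [Peptide("AAAK", 12.0), Peptide("CCCK", 3.0)]),
        Protein("keratin", [Peptide("DDDK", 9.0), Peptide("EEEK", 8.0)], True),
    ]
    print([c.name for c in candidates if passes_exclusive_filter(c)])
```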
Although one of the primary goals of this work was to improve the SILAC technology as an analytical tool for cancer research, our results point to some potentially useful PCa biomarkers. The exclusive expression of proteins in either of these cell lines may indicate functions that correlate with tumor progression. For instance, proteins absent in PC3M-LN4 could be potential metastasis suppressor candidates, with several of these already shown to have implications in cancer. Decreased expression of E-cadherin, for example, is associated with aggressive tumors and poor clinical outcome.22 Another candidate we found exclusively in PC3M is Suppressor of tumorigenicity 14, a protein expressed in many normal epithelial
tissues but absent in tumor lines derived from these same sites.23 Taken together with our results, this indicates that the expression of this protein may disappear during the progression of PCa. Proteins present only in PC3M-LN4 could offer unique targets for chemotherapy as these candidates may play a direct role in the increased metastatic potential of these cells. Such proteins may also be prime candidates for prognostic markers, whose presence correlates with the potential for metastatic disease. Among these proteins detailed in our study is Nestin, which has been characterized as a marker of differentiation in the brain. Nestin is expressed in neuroepithelial stem cells, with a loss of detectable expression in mature tissues.24 Others have suggested that this protein may also be a useful marker for metastatic melanomas.25 Also exclusively detected in our analysis of PC3M-LN4 is UCHL1/PGP9.5, a protein that has been previously associated with malignant disease.26 Of particular note are many of the novel and hypothetical proteins found only in PC3M-LN4 (Table 1), several of which (IPI00411968, IPI00166011, and IPI00383949) contain conserved RNA-binding domains. Because the proteins used in this analysis were derived from microsomal lysates, additional differences in potential transcriptional regulators may be more readily apparent in other cellular fractions.
In summary, we have shown a substantial improvement of SILAC technology applied to a model of tumor metastasis. The rapid advancement of mass spectrometry instrumentation and analysis tools has allowed us to increase both our confidence in identifications and the number of proteins quantified, to nearly 1400, a greater than 3-fold increase over our earlier study.
In cooperation with the new SEQUEST-SORCERER search algorithm, the LTQ FT mass spectrometer provided us with more than 26 500 PSMs used for identification and quantification, at an estimated 1% FP rate, which corresponds to one of the largest quantitative proteomics experiments to date. Our innovative software was not only instrumental in providing accurate quantification of these peptides, but also in yielding quantitative data for proteins detected exclusively in either low or high metastatic conditions. Furthermore, analyses of proteins found in this report revealed that nearly 20% (277 of 1395) were differentially expressed by at least 3-fold, with less than 2% (23 of 1395) changing greater than 10-fold (Table 1 and Supporting Information Table 1). Whether these proteins play a direct role in metastasis or are simply regulated by upstream metastatic factors remains to be seen. Follow-up experiments are currently underway in additional cell models of tumor progression and patient samples to determine any clinical utility for these candidate proteins.
Abbreviations. 2D, two-dimensional; SILAC, stable isotope labeling by amino acids in cell culture; ICAT, isotope-coded affinity tags; MS, peptide mass spectrum; MS/MS, product ion mass spectrum; FTICR, Fourier transform ion cyclotron resonance; m/z, mass-to-charge ratio; FP, false-positive; AGC, automatic gain control; IPI, international protein index; XCorr, cross-correlation score; dCn, delta-correlation value; PPM, parts-per-million; PSMs, peptide spectral matches; S/N, signal-to-noise level; AUC, area under the curve; Da, Daltons; PCa, prostate cancer; SD, standard deviation; WCL, whole-cell lysates.
Acknowledgment. Support for this work was provided by NIH grants CA037393 (B.R.Z.) and HG003456 (S.P.G.). We gratefully acknowledge C. Pettaway (MD Anderson Cancer Center, Houston, TX) for the use of PC3M and PC3M-LN4 cell lines. We also thank members of the Zetter and Gygi labs for helpful discussions and technical assistance.
Supporting Information Available: The dCn, mass accuracy filter, and XCorr values for peptide charge states which were empirically determined to meet these criteria (Supporting Information Figure 1). Metabolic incorporation of 13C6-lysine and 13C6-arginine (Supporting Information Figure 2). Validation of the target/decoy database searching strategy (Supporting Information Figure 3). The PPM threshold of ±20 resulting in an FP rate of approximately 1% (Supporting Information Figure 4A) and the nearly maximized total number of peptides (Supporting Information Figure 4B). VISTA score distribution for peptides used in this analysis (≥90; Supporting Information Figure 5). Proteins quantified in the study, along with the number of peptides used for quantification and identification (Supporting Information Table 1). This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(2) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Mol. Cell. Proteomics 2002, 1, 376-386.
(3) Romijn, E. P.; Christis, C.; Wieffer, M.; Gouw, J. W.; Fullaondo, A.; van der Sluijs, P.; Braakman, I.; Heck, A. J. Mol. Cell. Proteomics 2005.
(4) Loyet, K. M.; Ouyang, W.; Eaton, D. L.; Stults, J. T. J. Proteome Res. 2005, 4, 400-409.
(5) Blagoev, B.; Kratchmarova, I.; Ong, S. E.; Nielsen, M.; Foster, L. J.; Mann, M. Nat. Biotechnol. 2003, 21, 315-318.
(6) Everley, P. A.; Krijgsveld, J.; Zetter, B. R.; Gygi, S. P. Mol. Cell. Proteomics 2004, 3, 729-735.
(7) Gruhler, A.; Olsen, J. V.; Mohammed, S.; Mortensen, P.; Faergeman, N. J.; Mann, M.; Jensen, O. N. Mol. Cell. Proteomics 2005, 4, 310-327.
(8) Ibarrola, N.; Kalume, D. E.; Gronborg, M.; Iwahori, A.; Pandey, A. Anal. Chem. 2003, 75, 6043-6049.
(9) Ong, S. E.; Mittler, G.; Mann, M. Nat. Methods 2004, 1, 119-126.
(10) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(11) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
(12) Syka, J. E.; Marto, J. A.; Bai, D. L.; Horning, S.; Senko, M. W.; Schwartz, J. C.; Ueberheide, B.; Garcia, B.; Busby, S.; Muratore, T.; Shabanowitz, J.; Hunt, D. F. J. Proteome Res. 2004, 3, 621-626.
(13) Elias, J. E.; Gibbons, F. D.; King, O. D.; Roth, F. P.; Gygi, S. P. Nat. Biotechnol. 2004, 22, 214-219.
(14) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.
(15) Barnett, V.; Lewis, T. Outliers in Statistical Data, 3rd ed.; John Wiley and Sons: West Sussex, England, 1994.
(16) Pettaway, C. A.; Pathak, S.; Greene, G.; Ramirez, E.; Wilson, M. R.; Killion, J. J.; Fidler, I. J. Clin. Cancer Res. 1996, 2, 1627-1636.
(17) Han, D. K.; Eng, J.; Zhou, H.; Aebersold, R. Nat. Biotechnol. 2001, 19, 946-951.
(18) Ong, S. E.; Kratchmarova, I.; Mann, M. J. Proteome Res. 2003, 2, 173-181.
(19) Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P. Nat. Methods 2005, 2.
(20) Christofori, G.; Semb, H. Trends Biochem. Sci. 1999, 24, 73-76.
(21) Minn, A. J.; Gupta, G. P.; Siegel, P. M.; Bos, P. D.; Shu, W.; Giri, D. D.; Viale, A.; Olshen, A. B.; Gerald, W. L.; Massague, J. Nature 2005, 436, 518-524.
(22) Guilford, P.; Hopkins, J.; Harraway, J.; McLeod, M.; McLeod, N.; Harawira, P.; Taite, H.; Scoular, R.; Miller, A.; Reeve, A. E. Nature 1998, 392, 402-405.
(23) Takeuchi, T.; Shuman, M. A.; Craik, C. S. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11054-11061.
(24) Ehrmann, J.; Kolar, Z.; Mokry, J. J. Clin. Pathol. 2005, 58, 222-223.
(25) Florenes, V. A.; Holm, R.; Myklebost, O.; Lendahl, U.; Fodstad, O. Cancer Res. 1994, 54, 354-356.
(26) Yamazaki, T.; Hibi, K.; Takase, T.; Tezel, E.; Nakayama, H.; Kasai, Y.; Ito, K.; Akiyama, S.; Nagasaka, T.; Nakao, A. Clin. Cancer Res. 2002, 8, 192-195.
PR0504891