Pathway-Based Biomarker Search by High-Throughput Proteomics

Feb 6, 2009 - To whom correspondence should be addressed. Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065. E-mail, villan...
Pathway-Based Biomarker Search by High-Throughput Proteomics Profiling of Secretomes

Kevin Lawlor,† Arpi Nazarian,† Lynne Lacomis,† Paul Tempst,†,‡ and Josep Villanueva*,†,‡

Protein Center and Molecular Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065

Received October 13, 2008

An efficient means for the identification of prognostic and predictive biomarkers is essential in today’s cancer management. A new approach toward biomarker discovery has therefore been proposed, where pathways instead of individual proteins would be monitored and targeted. Recently, the ‘secretome’, a biological fluid that may be enriched with secreted and/or shed proteins from adjacent disease-relevant cancer cells, has been targeted for biomarker discovery. We describe a novel method for secretome analysis using “stacking gels”, label-free relative quantitation, and pathway analysis. The protocol presented here increases the throughput of secretome analysis by approximately 1 order of magnitude compared to earlier methodologies. In the first application, six cancer cell lines from three different tissues were studied. The global secretome data sets obtained were analyzed using pathway analysis software to attempt integrating the experimental findings into a cellular signaling context. This suggested that several secretome proteins might be interconnected with intracellular canonical pathways. This, in turn, may eventually allow the use of secretomes for discovery of pathway-based biomarkers. When this strategy was applied to two breast cancer cell lines, it appeared that the IGF signaling and the plasminogen activating system may be differentially regulated in invasive breast cancer, but this remains speculative until it is verified in a clinical setting. In summary, the methodology proposed optimizes cell culture with sample fractionation and LC-MS to obtain the highest yield from cultured cell secretomes, with a focus on rational biomarker discovery through putative linkage with cancer relevant pathways. Keywords: secretome • LC-MS/MS • mass spectrometry • cancer biomarker • pathways analysis

* To whom correspondence should be addressed. Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065. E-mail, [email protected]; phone, (212) 639-6676.
† Protein Center, Memorial Sloan-Kettering Cancer Center.
‡ Molecular Biology Program, Memorial Sloan-Kettering Cancer Center.
10.1021/pr8008572 CCC: $40.75 © 2009 American Chemical Society
Journal of Proteome Research 2009, 8, 1489–1503. Published on Web 02/06/2009

Introduction

One of the biggest challenges in the management of cancer remains the lack of prognostic and predictive biomarkers that can help design a therapeutic strategy as well as monitor the tumor's response to it.1-4 The future of cancer therapy relies on the use of targeted molecular approaches. Drugs against known molecular alterations driving oncogenesis are already in the clinic and, despite tumor resistance against them, have been successful, particularly in liquid tumors.5-7 These drugs target specific proteins that are altered in cancer. However, proteins do not operate as individual units; they collaborate in interconnected pathways to achieve different biological functions.8 A new approach toward biomarker discovery has been proposed, where pathways instead of individual proteins should be monitored and targeted.4 Once the perturbed pathways are known, it should be easier to monitor different aspects of cancer progression and therapy by focusing on pathways instead of individual genes. Pathway-based biomarkers would then be one of the targets for biomarker discovery. In fact, gene expression signatures have already been defined for several signaling pathways involved in cancer, showing that molecular signatures can define the activation of such pathways.9,10 However, there are two important challenges for the success of this approach. First, biomarker discovery should be oriented toward secreted proteins, which have a better chance of entering the bloodstream and consequently being measured using a blood test. Second, the methodology used to discover such biomarkers should be high-throughput to allow the analysis of a sufficient number of samples during the discovery phase.

A desirable feature for pathway-based biomarkers is that they are secreted proteins that can be monitored in blood. Blood immerses most tissues in the body and contains cell-derived proteins that may provide information not only about biological processes, but also about changes specific to localized disease, such as tumors.11 Despite an intensive search during the past decade(s), only a small number of identified cancer biomarkers, nearly all proteins with low abundance in normal plasma (e.g., PSA, CEA, CA125), have proven to be clinically useful, often in combination with other diagnostic tools.3,12 In the past few years, the principal focus of cancer biomarker research has been on the discovery of detection markers through the analysis of plasma and/or serum proteins using mass spectrometry.13-15

However, plasma-based biomarker discovery has faced important technical limitations, such as the complexity and dynamic range of plasma and the low relative abundance of many disease-specific biomarkers.11,16 Alternative strategies to conduct cancer biomarker discovery are currently being explored. A recently proposed strategy is to interrogate tissue-proximal fluids and conditioned media of cell lines (‘secretomes’).17,18 The rationale for this approach is that these biological fluids, being much closer to the tumor cells, may be enriched with secreted and/or shed proteins relevant for the disease. The presence of growth factors and proteases indicates that secretomes might help in monitoring critical aspects of cancer progression such as invasion and metastasis.19,20 Mass spectrometry offers an important advantage over mRNA gene expression when studying secreted proteins, as there may be significant differences between transcriptional regulation and the actual levels of the corresponding secreted proteins. Indeed, such disparities have already been observed in global transcriptomic and proteomic analyses of human cancers.18

The amount and complexity of experimental data generated using “omics” technologies is overwhelming and limits the ability to generate new working hypotheses. New bioinformatics tools are being developed to integrate experimental data in a cellular signaling context. For example, pathway analysis tools allow the projection of potential pathway alterations on the basis of experimental data obtained using global profiling approaches. The deregulation of pathways, not necessarily individual proteins, is the most probable cause of cancer. Integrating experimental data in the context of our present knowledge about pathways can potentially lead to the identification of new functional modules perturbed in disease. Furthermore, the original literature detailing the findings using the pathway analysis tools can be accessed to examine and verify them.

Here, we present an optimized technology platform for the high-throughput profiling of cell line secretomes. The optimization of the cell culture protocol, minimizing the mass spectrometric signal attributed to fetal bovine serum (FBS), increases the probability of describing a catalogue of proteins highly enriched for relevant biological content in the cancer secretome. Limited gel fractionation, together with mass spectrometric label-free quantitation, increases the throughput of sample analysis by 1 order of magnitude compared to previous methods without adversely affecting the results. This throughput increase allows for the analysis of dozens of secretome samples in the discovery phase of biomarker discovery projects. With this technology, six cancer cell lines have been analyzed, generating 200 000 fragmentation spectra that gave rise to 35 000 spectral counts and more than 5000 unique peptides matching 600 proteins. The secretome data set was then analyzed using pathway analysis software to attempt integrating the findings into a cellular signaling context. The bioinformatics analysis suggested that several secretome proteins might be interconnected with intracellular canonical pathways, potentially opening the door for the use of secretomes to discover pathway-based biomarkers. These findings were verified by Western blot analysis, validating the quantitative data obtained by label-free quantitation mass spectrometry. In summary, the proposed methodology utilizes optimized cell culture, sample fractionation and LC-MS to obtain the highest yields from tissue culture cell secretomes, with a focus on rational biomarker discovery through putative linkage with cancer relevant pathways.


Experimental Procedures

Cell Culture. Cellgro DMEM lacking glutamine (#MT-10-017-CV), RPMI 1640 (#MT-10-040-CV), Ham’s F12 (#MT-10-080-CV), and MEM Media (#MT-10-010-CV) were all obtained from Fisher Scientific and supplemented with 10% Fetal Bovine Serum (Omega Scientific #FB-01) and antibiotics; the MEM was additionally supplemented with 1× Non-Essential Amino Acids (Fisher #MT-25-025-CI). Six cell lines, MDA-MB-231 and MCF-7 (breast), LNCaP and DU145 (prostate), and J82 and T24 (bladder), were ordered from the American Type Culture Collection. MDA-MB-231 and MCF-7 were grown in DMEM and 10% FBS, LNCaP and DU145 were grown in RPMI 1640 and 10% FBS, J82 was grown in MEM, NEAA and 10% FBS, and T24 was grown in Ham’s F12 and 10% FBS. Cells were grown to ∼80% confluence in 20 Costar T75 Vented Cell Culture Flasks (Fisher #10-126-37) in a 37 °C/5% CO2 incubator, washed five times with 15 mL of the relevant serum-free media per flask, and incubated for 48 h with 15 mL of the same serum-free media with antibiotics per flask (300 mL total). The conditioned media were then spun down at 200g for 5 min, collected, filtered through a Nalgene 0.2 µm pore vacuum filter (Fisher #09-741-07), and concentrated using a 10 000 MWCO Millipore Amicon Ultra (Millipore #UFC901024), spinning down 15 mL at a time at 800g for 30 min until the final concentration was 1 mg/mL (∼200- to 300-fold concentration). Protein concentration was determined with a Bio-Rad protein assay (Bio-Rad, #500-0006).

Sample Preparation. Concentrated conditioned media were run on a self-poured stacking SDS-PAGE gel. The resolving gel portion (4.2 mL of water, 2.5 mL of 1.5 M Tris/0.4% SDS/pH 8.8, 3.3 mL of Bis/Acrylamide (Bio-Rad #151-0158), 50 µL of 10% ammonium persulfate (APS), and 25 µL of TEMED) was poured and set to polymerize for 30 min. The stacking gel portion (3.1 mL of water, 1.25 mL of 1 M Tris/0.4% SDS/pH 6.8, 670 µL of Bis/Acrylamide, 25 µL of APS, and 12 µL of TEMED) was poured next; a 10-well, 50 µL comb was inserted, and the gel was set to polymerize for 30 min. A total of 12 µL of 6× Laemmli Buffer was added to 60 µL (60 µg) of conditioned media (enough for 3 replicates of 20 µg samples). The mixture was boiled for 5 min and then incubated in 0.6 M acrylamide for 30 min before loading on the gel. The gel was run at 100 V. The electrophoresis was stopped after the sample had barely passed into the resolving gel, using rainbow MW markers (GE Healthcare #28925341) as a reference. The gel was stained using Coomassie Brilliant Blue R-250 for 15 min and then destained in 45% methanol, 10% acetic acid. Bands were cut horizontally into three slivers per lane (about 3 mm per band). The gel slivers were destained with a 50% methanol wash and vortexed, followed by a 37 °C incubation for 15 min (this was repeated three times). Then, the slivers were rinsed with water four times, with vortexing between washes. The gel pieces were then diced into ∼1 mm³ pieces and dried using a Speed-Vac. Each gel sample was digested with 0.2 µg of trypsin (sequencing grade modified: Promega #V511A), adding a sufficient volume of 0.1 M ammonium bicarbonate to completely saturate the gel. The sample was incubated at 37 °C overnight. Following the digest, the sample was sonicated for 5 min and spun down for ∼1 min in an Eppendorf microfuge. The entire supernatant was then drawn up, transferred into a new Eppendorf tube, and cleaned up using Poros 50 R2 RP microtips prior to mass spectrometric analysis.21
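The loading arithmetic in the sample preparation above can be checked with a short sketch. It assumes the concentrate is at exactly 1 mg/mL (so 60 µL of concentrate carries 60 µg of protein), that the mixture is split evenly across three lanes, and that any volume added during the acrylamide incubation is ignored. The function name and the returned keys are illustrative and do not come from the original protocol.

```python
def loading_plan(sample_ul=60.0, conc_mg_per_ml=1.0, buffer_ul=12.0,
                 buffer_strength=6.0, replicates=3):
    """Volumes and protein mass per lane for one stacking-gel load.

    Hypothetical helper: assumes an even split across lanes and ignores
    any volume added after the sample buffer.
    """
    total_ul = sample_ul + buffer_ul
    # 1 mg/mL equals 1 ug/uL, so mass (ug) = volume (uL) * concentration
    protein_ug = sample_ul * conc_mg_per_ml
    return {
        # final sample-buffer strength after dilution (1.0 means 1x)
        "final_buffer_x": buffer_strength * buffer_ul / total_ul,
        "protein_ug_per_lane": protein_ug / replicates,
        "load_ul_per_lane": total_ul / replicates,
    }


plan = loading_plan()
print(plan)
```

With the stated volumes, the 6× buffer ends up at 1× and each of the three lanes receives 20 µg of protein, which matches the "3 replicates of 20 µg" given in the text.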

LC-MS/MS Analysis. For proteomics analysis of the 1D SDS ‘stacking gel’ sample fractions, extracted peptides from in-gel trypsin digestion were analyzed using a QSTAR-XL hybrid quadrupole time-of-flight mass spectrometer (Applied Biosystems/MDS Sciex) equipped with a NanoSpray ion source (Applied Biosystems/MDS Sciex, Toronto, Canada). Peptide mixtures were loaded onto a trapping guard column (µPrecolumn cartridge filled with PepMap C18, 5 µm, 100 Å, 300 µm i.d. × 5 mm (LC Packings, Sunnyvale, CA)) using a Switchos loading pump (LC Packings, Sunnyvale, CA) at a flow rate of 5 µL/min, then washed for 4 min with 5% acetonitrile (ACN) in 0.1% formic acid (FA) at the same flow rate. After washing, flow was reversed through the guard column and the peptides were eluted with a linear gradient of 5-45% buffer B (80% ACN and 0.1% FA (v/v)) in buffer A (5% ACN and 0.1% FA in water (v/v)). The gradient was delivered over 60 min by an Ultimate (LC Packings) capillary HPLC system at a flow rate of 200 nL/min, obtained by a 15:1 precolumn flow split, through a 75-µm × 15-cm fused silica capillary C18 HPLC column (LC Packings PepMap) to a 75-8 µm fused silica nanoelectrospray needle (New Objective, Woburn, MA). Electrospray ionization (ESI) was carried out with a capillary voltage setting of 1800 V applied to a precisely positioned needle, approximately 1 cm in front of the orifice and 0.5 cm off center. Typically, the optimal voltage to maintain a good plume is obtained between 1700 and 2000 V. For MS analysis, the mass spectrometer was operated in the automatic data-dependent acquisition mode, with the threshold set to 20 counts per second and doubly or triply charged precursor ions automatically selected for fragmentation scans. Survey scans of 1 s were recorded from 400-1200 m/z with an interscan time of 0.1 s. Up to 5 tandem MS scans were collected sequentially for the selected precursor ions, recording from 100 to 1600 m/z. The acquisition time for the tandem MS data of the most intense ion was 1 s; 1.3 s for the data of the ions with the second and third highest intensities, and 1.5 s for the spectra of the remaining two ions. The cycle time was approximately 7.6 s. After selection and fragmentation, precursor ions were excluded from repeated selection for 50 s after the end of the corresponding fragmentation duty cycle. The collision energy (CE) was automatically adjusted relative to the precursor ion m/z values and the charge of ions selected for MS/MS (the CE used was approximately 1/20 of the numerical value of the precursor ion m/z selected).

Database Searching and Quantitation. Raw mass spectrometric data was processed and the peak list was extracted using Analyst QS 1.0 (ABI). All MS/MS samples were searched against the IPI.HUMAN.v3.39 database (67756 entries) using Mascot (Matrix Science, London, U.K.; version 2.2.04) and X! Tandem (www.thegpm.org; version 2007.01.01.2). X! Tandem assumes tryptic digestion. The data was searched with a fragment ion mass tolerance of 100 ppm and a parent ion tolerance of 50 ppm. Oxidation of methionine and the acrylamide adduct of cysteine were specified in Mascot and X! Tandem as variable modifications. Scaffold (version Scaffold_2_01_01, Proteome Software, Inc., Portland, OR) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at a greater than 95.0% probability as specified by the PeptideProphet algorithm.22 Protein identifications were accepted if they could be established at greater than 95.0% probability and contained at least 2 identified peptides. Protein probabilities were assigned by the ProteinProphet algorithm.23 Proteins that contained similar peptides and which could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. The total number of MS/MS spectra assigned to a protein (spectral count), meeting the PeptideProphet and ProteinProphet cutoff for each of the 600 proteins in the data set, was imported from Scaffold into Microsoft Excel. To avoid dividing by zero, every spectral count (SC) was replaced by a ‘pseudocount’ (SC + 0.33). A protein that was not detected in any of the three replicates of a sample would thus get approximately one count. Normalization to correct for sample loading errors was as follows: the normalized spectral count (NSC) for a protein is the number of spectral counts identifying that protein, divided by the sum of spectral counts obtained for all proteins in the sample. Once normalized, each NSC value was multiplied by a scaling factor that is constant within the data set and is used primarily to set the NSC to a more “user-friendly” scale. Here, the scaling factor used was the average of the sums of the spectral counts obtained for all samples in the data set.

Western Blot Analysis. A total of 2 µg of conditioned media from MCF7 and MDA-MB-231 cell lines was run on 4-15% ReadyGels (Bio-Rad #161-1158). After gel electrophoresis, the resolved proteins were transferred to PVDF membranes. The membranes were blocked in 5% dry milk in phosphate buffered saline containing 0.5% Tween-20. Membranes were then incubated overnight at 4 °C with the following antibodies. Anti-IGFBP2 (catalog number ab4244), anti-IGFBP5 (catalog number ab4255), anti-IGFBP7 (catalog number ab51392), anti-Cathepsin B (catalog number ab58802), anti-Protein C Inhibitor (catalog number ab67368), and anti-PAI1 (catalog number ab20924) were purchased from Abcam. Anti-E Cadherin (catalog number 610182) was purchased from BD Biosciences. Anti-Cathepsin X/Z/P (catalog number MAB934) was purchased from R&D Systems. Anti-CSF1 (catalog number H00001435-M01) was purchased from Abnova. Anti-Urokinase Plasminogen Activator uPA (catalog number MAB7776) was purchased from Millipore. The membranes were washed, incubated in horseradish peroxidase conjugated secondary antibodies, and developed using enhanced chemiluminescence detection according to the manufacturer’s instructions (Amersham Biosciences).

Cellular Localization Prediction. Different sources of information were used in order to classify identified proteins into different cellular compartments: predictive algorithms (SignalP (3.0), SecretomeP (2.0), TMHMM (2.0)), Gene Ontology (GO) and the Uniprot database.24-26 Proteins’ localizations were classified as secreted classical, secreted nonclassical, plasma membrane and intracellular using the following rationale: to designate a protein as “Secreted Classical”, a signal peptide would have to have been predicted via SignalP or the protein would have to have been designated “extracellular” via GO. SecretomeP was used to predict secretion that does not use the classical secretory pathway. For “Plasma Membrane” designation, TMHMM would have to have predicted transmembrane domains or GO would have to have predicted “Plasma Membrane”. “Intracellular” proteins were identified via their GO and Uniprot information or via the absence of a specific designation when using the SignalP, SecretomeP, and TMHMM algorithms. All findings were verified by weighing the predictive algorithms against the GO and Uniprot databases.

Hierarchical Clustering Analyses. The spreadsheet containing the NSC for the six cell lines (6 samples total; 600 proteins) was imported into the ‘GeneSpring’ program (Agilent, Palo Alto, CA). Different “experiments” were created in ‘GeneSpring’ to represent the proteins. No normalizations were applied to the experiment since the protein signals had already been normalized. In the ‘Parameter’ section of the experiments, different parameters were created to label. In the “Experiment’s Interpretation” section, the Analysis mode was set to “Ratio (signal/control)” and all measurements were used. No Cross-Gene Error model was used. The 600 proteins were subjected to average-linkage hierarchical clustering, using Pearson correlation as a distance metric (‘GeneSpring’ program). Cell lines and proteins were organized by creating mock-phylogenetic trees (‘dendrograms’). Trees were then displayed with the samples along the X-axis and the masses along the Y-axis.

Comparison Transcriptomics-Proteomics. For the two breast cancer cell lines used in this work (MCF7 and MDA-MB-231), Affymetrix mRNA expression data was downloaded from the supplementary data of a recent publication.27 NSC for all proteins detected in MCF7 and MDA-MB-231 were log2 transformed. The log2 signals for the probes representing genes for which proteins have been measured in this work were extracted and compared to the log2 NSC. The difference log2 (MDA-MB-231) - log2 (MCF7) was calculated for every Affymetrix probe and protein (NSC) in our data set. Next, z-scores were calculated for all log2 fold-ratios (MDA-MB-231/MCF7), for both the secretome and the gene expression data sets, to put the proteomics and the gene expression data on a comparable scale. Z-scores were calculated as follows: the global mean log2 fold-ratio for the data set was subtracted from every log2 fold-ratio (MDA-MB-231/MCF7), and the result was divided by the global standard deviation. Only proteins overexpressed in either MCF7 or MDA-MB-231 that had an NSC ≥ 5 in at least one of the two cell lines were considered for the Ingenuity Pathway analysis of the breast cancer secretome.

Pathway Analysis. Proteomics data was uploaded into Ingenuity Pathways Analysis (IPA; Ingenuity Systems, Redwood City, CA) as a tab-delimited text file of IPI accession numbers. Proteins were uploaded and mapped to corresponding “gene objects” in the Ingenuity Pathways Knowledge Base (IPKB). Then, biological networks were generated using the Knowledge Base for interactions between mapped Focus Genes (user’s list) and all other gene objects stored in the knowledge base. In addition, functional analysis of the networks was done to identify the biological functions and/or diseases that were most significant to the genes in the network. Networks are displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). All edges are supported by at least one reference from the literature, textbook or canonical information stored in the IPKB. Ingenuity Pathways Analysis computes a score for each network according to the fit of the user’s set of significant genes. The significance of functional enrichment is computed by a Fisher’s exact test. The score is derived from a p-value and indicates the likelihood of the mapped genes in a network being found together due to random chance (p-score = -log10(p-value)). Finally, the Path Designer feature was used to create graphically rich network images.
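The spectral-count normalization and the z-score step described in the procedures above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' spreadsheet: it assumes the per-sample totals and the scaling factor are computed after the 0.33 pseudocount is added, and that the "global standard deviation" is the population standard deviation; the text does not specify either. The function names, sample labels and counts are made up for the example.

```python
import math
from statistics import mean, pstdev

PSEUDOCOUNT = 0.33  # value stated in the text


def normalized_spectral_counts(counts):
    """counts: {sample: {protein: raw spectral count}} -> NSC, same shape."""
    adjusted = {s: {p: c + PSEUDOCOUNT for p, c in prot.items()}
                for s, prot in counts.items()}
    totals = {s: sum(prot.values()) for s, prot in adjusted.items()}
    # Scaling factor: average of the per-sample totals, constant in the data set
    scale = mean(totals.values())
    return {s: {p: c / totals[s] * scale for p, c in prot.items()}
            for s, prot in adjusted.items()}


def fold_ratio_z_scores(nsc_a, nsc_b):
    """z-scores of log2(nsc_a / nsc_b) over the proteins present in both."""
    proteins = sorted(set(nsc_a) & set(nsc_b))
    ratios = [math.log2(nsc_a[p]) - math.log2(nsc_b[p]) for p in proteins]
    mu, sd = mean(ratios), pstdev(ratios)  # population SD is an assumption
    if sd == 0:
        raise ValueError("all fold-ratios are identical; z-score undefined")
    return {p: (r - mu) / sd for p, r in zip(proteins, ratios)}


# Made-up counts for two hypothetical samples and three hypothetical proteins
raw = {
    "line_A": {"P1": 30, "P2": 0, "P3": 10},
    "line_B": {"P1": 2, "P2": 40, "P3": 18},
}
nsc = normalized_spectral_counts(raw)
z = fold_ratio_z_scores(nsc["line_B"], nsc["line_A"])
print(z)
```

After normalization every sample sums to the same scaled total, so differences in NSC reflect relative composition rather than loading. The pseudocount keeps the log2 ratio finite for a protein such as `P2`, which has zero counts in one sample.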

Results and Discussion

Cell Culture and Sample Preparation Optimization. The goal of every proteomic analysis is to obtain the most complete and unbiased answer to the biological question asked. To accomplish this, it is crucial to work with the best possible biological sample. Since it is nearly impossible to capture all cellular proteins with current proteomics technologies (unless utilizing exhaustive fractionation) and our final goal is to discover biomarkers, we chose to work with the class of proteins secreted by cells (i.e., the secretome). A series of experiments were devised to optimize the conditioned media preparation. The two parameters used for optimization were cell viability and the proteomic signal of abundant serum proteins versus true secreted cellular proteins. Cells were grown to about 80% confluency in media containing 10% FBS, washed 2-10 times with PBS or media without FBS, and incubated for different times in a chemically defined medium. Forty-eight hours prior to harvesting the conditioned medium, the cells were extensively washed in serum-free medium to remove any “contaminants” originating from the serum in the medium. Subsequently, the conditioned medium was harvested and filtered after 24 or 48 h of conditioning. There was no need for any medium concentration normalization because variability in cell concentration among individual cell lines was negligible (less than 10%). For LC-MS/MS analysis, the medium was desalted and concentrated using a 3000-Da molecular mass-cutoff spin column (see Experimental Procedures). After concentration, the protein mixture was set to 1 mg/mL using a Bradford assay (in general, that meant a 300:1 concentration from the original medium). Successful conditioning of medium by cells was monitored by running aliquots of the medium on precast SDS-PAGE gels. As expected, the protein composition of the medium increases in complexity over time. Also, the protein concentration decreases with an increasing number of washes before starting the conditioning (Figure 1). In parallel, aliquots from the different conditions were run on homemade SDS-PAGE gels where the stacking gel is longer and has a slightly lower polyacrylamide concentration than commercial precast SDS gels.
This serves to denature and concentrate all proteins in tight bands at the top of the separation gel, which can be readily stained with Coomassie blue, excised, digested with trypsin, and processed for mass spectrometry, exactly as has been common practice over the years for individual, gel-fractionated bands. Five-microgram aliquots of J82 conditioned medium (CM), each prepared under different washing and conditioning time settings, were run through “stacking gels” to a distance of approximately 5 mm into the resolving gel. A single band containing the proteins was then cut from the gel and subjected to a standard in-gel tryptic digestion protocol (see Experimental Procedures). After LC-MS analysis, the normalized spectral counts (NSC) for all the proteins identified were tabulated, and key serum and typical cellular secreted proteins were compared (see Experimental Procedures) (Figure 1).28,29 In view of the results, the conditions chosen for the generation of the secretomes were 5 washes with serum-free media before a media conditioning of 48 h. Fewer than 5 washes left too many serum proteins and more than 5 washes affected the viability of the cells. A 48 h incubation in serum-free media was used since the secretome protein concentration went up without affecting cell viability.

Figure 1. Cell culture optimization. J82 CM was collected after different washing conditions and their SDS-PAGE profiles were monitored (left). The molecular markers lane is marked as M (marker molecular weights, in kDa, are shown on the left side of the gel); the rest of the lanes are marked with the number of washes before media conditioning. In parallel, CM from different washing conditions was analyzed by LC-MS and the NSC for all the proteins identified were tabulated. Key serum and typical cellular secreted/plasma membrane proteins are shown (right). Red bars represent the NSC from washing the cells twice before conditioning the media. Green bars represent the NSC from washing the cells five times before conditioning of media. Standard deviations for three technical replicates are shown for seven proteins in the two washing procedures. Gene names are shown for the proteins compared in the figure.

In a second optimization phase, the weight amount and the number of gel fractions were studied following a similar strategy. Different weight amounts of J82 CM were run through a stacking gel to various distances into the resolving gel. Proteins were allowed to run from 5 mm to 1 cm into the resolving gel depending on how many slivers were to be cut from the gel (Figure 2A). Once the LC-MS results were obtained for each experiment, the data from different sample fractions were merged before doing the database search. For example, the data from three LC-MS runs obtained from J82 CM that was cut into three slivers were merged before the Mascot and X! Tandem search and the ProteinProphet and PeptideProphet confidence filtering.30 The results in Figure 2A show a logical trend: the more CM used and the more fractionation performed, the more proteins and unique peptides were obtained. However, there are practical limitations on this issue. One goal of this project was to develop a methodology that could do high-throughput analyses; therefore, it was best to use the least number of fractions possible (around 3). For a small number of fractions, the amount of CM used saturated the system (the analytical column and the mass spectrometer) rather quickly. For 3 fractions, we were not able to run more than 20 µg of CM on the stacking gels without later saturating the mass spectrometer. Even though 3 fractions and 20 µg were almost at the saturation limit, the final results were best when blank runs were done in between each of the 3 fractions to keep the LC-MS system clean. The results obtained here were comparable to other publications analyzing cell line secretomes. Usually, the number of proteins obtained for each cell line varies from 100 to 600 proteins. This variance comes from the different cell lines, the different mass spectrometers, and the different stringencies with which the results have been filtered. However, all publications, until now, fractionated secretomes extensively (between 20 and 30 fractions) to obtain a few hundred proteins. Here, we show that fractionation can be substantially reduced without greatly affecting the final results. From our results, and from the previous publications, it can be assumed that the actual number of proteins from the cellular secretome is several hundred proteins and the present methodology is able to capture most of them using a minimal fractionation strategy.17,18,31,32 A reassuring fact about our protocol is that, in the final protein inventory, a very broad molecular weight range was obtained (5-500 kDa), which means that most of the protein mixture was captured and analyzed.

Once it was decided that 20 µg and 3 fractions for each sample were the optimal conditions, and given the high throughput that this setup provided, a new element was introduced that increased the confidence in the results by adding technical replicates to the methodology. Following the strategy depicted above, a series of experiments running 20 µg of J82 CM on stacking gels and taking 3 fractions was done five times (Figure 2B). The results showed that the technical replicates had high reproducibility with low coefficients of variation (CV) (in the 10% range) in the number of proteins and unique peptides obtained (Figure 2B). Also, while the number of unique peptides kept increasing during the replicates, the increase almost levels off around the third replicate. Therefore, three technical replicates were chosen for further experimentation. While the global (number of proteins and peptides) CV among the replicates was very good, it did not say much about the quantitation of individual proteins within the mixture. The replicate data were plotted for each of the top 50 proteins identified using their individual NSC variation (Figure 2C). Assuming that the data are normally distributed, the NSC value for a given protein that is significantly different between two groups of samples should have a mean that is different by more than two standard deviations (SD). Two standard deviations from the mean defines being in the 5% tails of the normal distribution, or in others words, being 95% confident that the experimentally measured means are significantly different. When we analyze our data set (partially presented in Figure 2C), the top 150 proteins show SDs small enough that, if we were to transition into a biomarker discovery project, we would be able to detect 2-fold up/down differences between samples as being statistically significant. Secretome Characterization and Quantitation. 
The sample preparation strategy described in the previous section was intended to capture the most comprehensive catalogue of proteins possible in CM at the minimal fractionation cost. The methodology was verified by characterizing the secretomes of six cancer cell lines from three different tissues: J82 and T24 (bladder), LNCAP and DU145 (prostate) and MCF7 and MDA-MB-231 (breast). After concentration, the CM of these six cell lines was run in triplicate on SDS-PAGE “stacking gels”, three fractions were excised from each lane for tryptic digestion, and digests were analyzed by LC-MS/MS for protein identification (see Experimental Procedures). A flowchart showing the final experimental and data analysis setup is shown in Figure 3. A combined total of 198 120 MS/MS spectra were searched with Mascot and the results imported into Scaffold, where the data were additionally searched with X!Tandem. Both searches were done against the human subdatabase of IPI (version v3.39). A total of 1136 proteins were identified for the six cancer cell lines combined, meeting a cutoff of 95% for PeptideProphet. However, we decided to increase the stringency of the cutoff to 95% PeptideProphet and 95% ProteinProphet with at least two peptides identified for a protein in one of the cell lines to obtain a smaller list of confidently identified proteins for further data analysis. This new cutoff resulted in 625 individual proteins that had been confidently identified (Figure 4A). To filter out proteins present
Journal of Proteome Research • Vol. 8, No. 3, 2009 1493

research articles

Lawlor et al.

Figure 2. Secretome preparation optimization. Different weight amounts of J82 CM were run through a stacking gel to various distances into the resolving gel. Proteins were allowed to run from 5 mm to 1 cm into the resolving gel depending on how many slivers were to be cut from the gel. Once the LC-MS results were obtained for each experiment, the data from one, two or three different size fractions (i.e., slivers) were merged before doing the database search and the ProteinProphet and PeptideProphet protein identification confidence filtering. (A) Optimization of the weight amount and fractions for secretome analysis. The graphs show the number of proteins and unique peptides obtained when testing different CM amount/fractionation conditions. (B) Cumulative number of protein and peptide identifications obtained over five technical replicates of J82 CM using 20 µg and 3 size fractions (left). The average number of proteins and unique peptides per replicate, from five technical replicates of J82 CM, are also shown (right). (C) Bar plot showing the quantitation of individual proteins within the J82 CM mixture. The NSC for the top 50 most abundant proteins are shown. Standard deviations for each of the 50 proteins are shown for three and five technical replicates. The average and standard deviations for the three-replicate plot were calculated by randomly taking three replicates out of the five-replicate plot 10 times and averaging the values. Gene names for all of them are shown.
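The replicate statistics described in this caption can be illustrated with a minimal sketch. It assumes that the CV is the sample standard deviation divided by the mean, and that the three-replicate values are obtained by drawing three of the five replicates without replacement, 10 times, and averaging the resulting means and SDs. The NSC values and function names are hypothetical and are not taken from the study.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Return the CV (sample SD divided by the mean) as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def subsampled_mean_sd(replicates, k=3, draws=10, seed=0):
    """Average the mean and SD over `draws` random subsets of k replicates."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means, sds = [], []
    for _ in range(draws):
        subset = rng.sample(replicates, k)  # sampling without replacement
        means.append(statistics.mean(subset))
        sds.append(statistics.stdev(subset))
    return statistics.mean(means), statistics.mean(sds)

# Hypothetical NSC values for one protein across five technical replicates.
nsc_five = [52.0, 48.0, 55.0, 50.0, 45.0]

cv_percent = coefficient_of_variation(nsc_five)
mean3, sd3 = subsampled_mean_sd(nsc_five)
print(f"CV over five replicates: {cv_percent:.1f}%")
print(f"Three-replicate estimate: mean {mean3:.1f}, SD {sd3:.1f}")
```

With real data, the same two functions would be applied to each protein in turn to obtain the per-protein error bars.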

due to trace amounts of remaining FBS in the CM, the 625 proteins were compared to the top 500 most abundant proteins found in human serum and 18 proteins were removed and not considered further in the data analysis (see Supporting Information Table S1).33 Also, eight IPI identifiers could not be mapped to a corresponding gene and they too were removed, leaving a final list containing 600 proteins. The complete list of all protein and peptide mass spectrometry information can be found in Supporting Information Tables S2 and S3. As a summary of this proteomic analysis, 300 proteins, 1438 unique peptides, and 3800 spectral counts were obtained on average for each of the six


cell lines analyzed (at peptideprophet and proteinprophet 95% and at least 2 peptides for each protein). Normalized spectral counts for all replicates of the six cell lines were analyzed by hierarchical clustering (Figure 4B, Supporting Information Table S4). The first obvious observation from this analysis is that replicates are highly similar among themselves and cluster together. In general, there is an overlap of more than 80% of proteins identified in all three replicates for each cell line (except for replicate 1 of T24). A second observation is that cell lines do not seem to cluster by any biological parameter. They do not cluster by tissue nor by invasiveness. However, there is a sizable overlap

Biomarker Search by Proteomics Profiling of Secretomes

Figure 3. Secretome sample preparation and data analysis workflow. Flowchart showing the various steps of the secretome analysis approach described in this report.

among the three tissues; 38% of the proteins in the final list are in the bladder, prostate and breast tissue at the same time. Cellular Localization Prediction. A highly stringent filtering of the six-cell-line secretome produced a final list of 600 proteins. It should be noted that not all 600 proteins identified in our study are naturally secreted. Trypan blue staining for all cell lines was done before and after 48 h of serum starvation to estimate the number of dead cells. More than 95% of cells in all 6 cell lines were viable after serum starvation and there was no difference between starved and control cells. However, the 5% dead cells are likely to contribute intracellular proteins to the final secretome inventory. Cellular compartment assignment was done in a multistep fashion where SignalP 3.0, SecretomeP 2.0, TMHMM 2.0, Gene Ontology (GO) and Uniprot were involved (see Experimental Procedures).24-26 It should be noted that assigning proteins to different compartments is not an exact science. For most proteins, there is no experimental evidence on cellular compartment location, and many cases have been documented where a protein is shuttled between different cellular compartments in different cellular contexts. In addition, a new mechanism has been recently reported by which proteins without a signal peptide are secreted through a caspase-dependent mechanism.34 The fact that secretion is a process not completely understood raises the question of how many of the proteins that are classified as intracellular are


Figure 4. Secretome analysis by LC-MS/MS for six cancer cell lines. The secretomes for six cancer cell lines from three different tissues: J82 and T24 (bladder), LNCAP and DU145 (prostate) and MCF7 and MDA-MB-231 (breast) were characterized. The CM from the cell lines was concentrated and run in triplicate on SDS-PAGE “stacking gels”; three fractions were excised from each lane for tryptic digestion, and digests were analyzed by LC-MS/MS for protein identification. MS data were searched with Mascot and X!Tandem using the human subdatabase of IPI, and filtered by the PeptideProphet and ProteinProphet algorithms. (A) Venn diagrams showing the overlap of proteins and unique peptides among the secretomes of the three tissues studied. (B) Unsupervised hierarchical clustering analysis of the three technical replicates of each of the six cancer cell lines. Unsupervised, average-linkage hierarchical clustering was done using standard correlation as the distance metric among each of the 18 data sets. Spectral counts were exported from Scaffold into Excel and normalized (see Experimental Procedures). The entire protein list (600 × 18) was used. Columns represent samples; rows are proteins. Dendrogram colors follow the color-coding scheme of panel A. The heat map scale of NSC is from 1 (green) to 4 (red), with the midpoint at 2 (yellow).
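A correlation-based distance of the kind used for the clustering in panel B can be sketched as follows. This is only an illustration: it assumes that "standard correlation" refers to the Pearson correlation coefficient and that the distance is one minus that coefficient, whereas some clustering software uses the term for an uncentered variant. The NSC profiles below are hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Distance that is 0 for perfectly correlated profiles and 2 for anticorrelated ones."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical NSC profiles (one value per protein) for three samples.
replicate_1 = [10.0, 4.0, 1.0, 7.0]
replicate_2 = [11.0, 5.0, 1.0, 6.0]
other_line = [1.0, 8.0, 9.0, 2.0]

d_close = correlation_distance(replicate_1, replicate_2)
d_far = correlation_distance(replicate_1, other_line)
print(f"replicate vs replicate: {d_close:.3f}")
print(f"replicate vs other cell line: {d_far:.3f}")
```

In average-linkage clustering, samples separated by small distances of this kind (such as technical replicates) are merged first, which is consistent with replicates grouping together in the dendrogram.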

indeed occasionally secreted. For example, several supposedly strictly intracellular proteins (like metabolic enzymes) have mass spectrometric signals 2 orders of magnitude higher than ribosomal proteins (which are universally designated as “intracellular”) in our data set, indicating a possibility of secretion. Regardless, the goal of our cellular compartment classification was to identify the most probable set of secreted (classical or nonclassical) and plasma membrane proteins for further analysis. A total of 47% of the proteins (275 of the 600 proteins) on our list were predicted to be either secreted or in the plasma membrane, while 53% were predicted to be intracellular (see Figure 5 and Supporting Information Table S5). Secreted and plasma membrane proteins (Supporting Information Table S6) were considered to be the true secretome and taken for further analysis, while predicted intracellular proteins were considered to be contaminants and left aside. The Cancer Secretome. One of the biggest challenges in the systems biology field is how to transition from obtaining a list of genes/proteins to generating a biological hypothesis and


Figure 5. Subcellular location of the secretome. Each of the 600 proteins identified after MASCOT and X!Tandem searching that passed the identification threshold set by the PeptideProphet and ProteinProphet algorithms was classified by its cellular location. Cellular compartment assignment was done in a multistep fashion where SignalP 3.0, SecretomeP 2.0, TMHMM 2.0, Gene Ontology (GO) and Uniprot were involved (see Experimental Procedures).

designing follow-up experiments. When working with a specific pathway or a protein complex, it is relatively easy to find out whether the “omics” results are helping the project. However, it is not always easy to find a biological context for the findings, particularly when the data are looked at in a broad manner. To put our proteomics data in a biological context, the secretome data obtained with six cancer cell lines from the three different tissues were uploaded into Ingenuity Pathways Analysis (IPA) software. The 274-protein list predicted as either “secreted” or “plasma membrane” was imported into IPA as a tab-delimited text file of IPI accession numbers (Supporting Information Table S6). A total of 271 out of the 274 proteins were identified, mapped to gene objects in the IPA Knowledge Base, and used to generate biological networks. The networks were built based on queries to the Ingenuity Pathways Knowledge Base for interactions between Focus Genes (user’s list) and all other gene objects stored in the knowledge database. Then, IPA generates a set of multidirectional interaction networks among the Focus Genes and the rest of the gene objects where the interactions can be protein-protein or functional (e.g., activation, inhibition, PTM). Ingenuity Pathways Analysis computes a score derived from a p-value that indicates the likelihood of the Focus Genes in a network being found together due to random chance. A score of 2 indicates that there is a 1 in 100 chance that the Focus Genes are together in a network due to random chance. Eight networks were generated from our data and the four top networks (those related to cancer) were merged to obtain a global view (Figure 6A). The merged network contains 91 secretome proteins (in red) and 47 gene objects (in yellow) connected to them by IPA (see Supporting Information Table S7). The significance score of the four individual networks was better than 24 (p-value < 10⁻²⁴). 
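The relationship between the network score and the p-value quoted above can be made explicit with a short sketch. It assumes that the score is the negative base-10 logarithm of the p-value, which is consistent with the two examples given in the text (a score of 2 for a 1 in 100 chance, and a score of 24 for a p-value of 10⁻²⁴); the function names are hypothetical and the sketch does not reproduce how IPA computes the underlying p-value.

```python
import math

def p_value_to_score(p_value):
    """Assumed relationship: score = -log10(p-value)."""
    return -math.log10(p_value)

def score_to_p_value(score):
    """Inverse of the assumed relationship: p-value = 10 ** (-score)."""
    return 10.0 ** (-score)

# The two examples quoted in the text.
p_for_score_2 = score_to_p_value(2)
score_for_merged = p_value_to_score(1e-24)
print(f"score 2 corresponds to p = {p_for_score_2:g}")
print(f"p = 1e-24 corresponds to score = {score_for_merged:g}")
```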
The complexity of Figure 6 demonstrates the enormous number of interactions between the secretome proteins and different intracellular signaling proteins. Fifteen canonical signaling pathways are populated with at least 5 molecules of the network (Figure 6A). Interestingly, there are some major hubs in the network where multiple connections arrive from or radiate to the rest of the network. The receptor tyrosine-protein kinase erbB-2 (ERBB2), which is a major oncogene driving breast cancer, is connected to several secretome molecules (Figure 6B). For example, the receptor tyrosine-protein kinase erbB-2 has been reported to physically interact with Integrin beta 1


(ITGB1), SH3GRL3, and the cell-surface antigen 4F2 (SLC3A2) in different cell types.35-37 In addition, ERBB2 decreases the expression of the Tax1-binding protein 3 (TAX1BP3) and modulates the expression of the Sushi repeat-containing protein SRPX (SRPX), the Epididymal secretory protein E1 (NPC2), and the Renin receptor (ATP6AP2), among others.38-40 Another big hub in the network is the ERK/MAP kinase protein family (Figure 6C). In LNCAP cells, interleukin-6 (IL6) increases the expression and phosphorylation of the MAP kinase 1 (MAPK1, ERK1/2).41,42 Neural cell adhesion molecule L1 (L1CAM) protein increases the activation of the MAP kinase 1, while some proteins of the MAP kinase signaling pathway interact with the Neural cell adhesion molecule L1 in the plasma membrane.43,44 The Macrophage migration inhibitory factor (MIF) and the Astrocytic phosphoprotein PEA-15 (PEA15) increase the activation of MAP kinase 1 as well as its phosphorylation.45,46 Also, seven known drug targets in cancer, Collagen alpha-1 and 2(VI) chain (COL6A1 and COL6A2), receptor tyrosine-protein kinase erbB-2 (ERBB2), interleukin-6 (IL6), Integrin alpha-V (ITGAV), Plasminogen activator inhibitor 1 (SERPINE1), and vascular endothelial growth factor A (VEGFA), are part of the network presented in Figure 6A, making the secretome methodology also attractive to monitor drug targets during therapy. A limitation of our study is that the pathway analysis done with the secretome data set has been inferred by a computer modeling approach that is based on predetermined database knowledge. It must therefore be considered as a source of hypotheses. The gene interactions in the knowledge database have been established in various physiologic and pathologic cell conditions, but do not necessarily prove that the proteins in our analysis interact the same way in cancer. 
The goal of the pathway analysis was to set up a hypothesis-generating approach to reveal new protein connections between the secretome and intracellular signaling pathways. Functional testing will be essential for the validation of individual protein interactions deduced by our approach. However, despite its limitations, we believe that this is a step toward the discovery of candidate secreted biomarkers for specific signaling pathways that eventually can be verified and validated in biological fluids in the future. Breast Cancer Secretome Pathway Analysis. Because of the resultant complexity found when taking the whole data set together, we decided to take the two breast cancer cell lines for a more in depth analysis. Recently, the development of gene-expression profiling methods has allowed for a major advance in the identification of distinctive molecular breast cancer subtypes. Original reports classified breast cancer tumors into five molecular subtypes: luminal A, luminal B, basal-like, ERBB2-positive and normal breast-like.47,48 The most useful conclusion from breast tumor subtyping is the clear distinction between luminal tumors and basal-like tumors. Luminal tumors generally express the estrogen receptor (ER) with or without coexpression of the progesterone receptor (PR).47 Basal-like tumors are defined by expression of cytokeratins (CKs) 5, 14 and 17 and a lack of ER, PR, and HER2 expression.47 The two breast cancer cell lines used in this study (MCF7 and MDA-MB-231) represent the two main breast cancer molecular subtypes described: MCF7 represents the luminal subtype and MDA-MB-231 represents the more aggressive basal subtype. We realize that using only two breast cancer cell lines is an oversimplification of the disease as there is no direct connection with breast cancer patient tissues at this time. Therefore, the analysis presented here should be


Figure 6. Pathway analysis of the global secretome for six cancer cell lines. The secretome data obtained with six cancer cell lines from the three different tissues were uploaded into Ingenuity Pathways Analysis (IPA) software. The 271 proteins out of the 274-protein list predicted as either “secreted” or “plasma membrane” were identified, mapped to gene objects in the IPA Knowledge Base, and used to generate biological networks. Eight networks were generated and the four top networks were merged to obtain a global view. (A) The network consists of 91 secreted/membrane proteins involved in cancer (red), and 47 genes highly interconnected to this group (yellow). The significance score of the network was better than 24 (p-value < 10⁻²⁴). Nodes represent genes, with their shape representing the functional class of the gene product, and edges indicate the biological relationship between the nodes (see legend). Straight lines indicate direct interactions, whereas dashed lines indicate indirect ones. The most populated canonical pathways as well as drug targets in the network are shown (right). (B) ERBB2 network interactions with secretome proteins. The network shows eight secretome proteins that interact physically or functionally with ERBB2. (C) ERK network interactions with secretome proteins. The network shows 14 secretome proteins that interact physically or functionally with ERK. Edges between ERBB2 or ERK and their interacting proteins are colored in blue.

considered as a proof of principle to show what the approach taken here is capable of doing. First, the ratios of fold change of MDA-MB-231 over MCF7 were calculated for the secretome data set (Table 1). A 2-fold change up for a particular protein means that it is overexpressed in the basal type cell line (more invasive), while a 2-fold change down means that it is overexpressed in the luminal type cell line (less invasive). Although we tried to avoid shortlisting proteins that have been released from dead (∼5% of total

cells) or dying cells by focusing on bona fide secreted and membrane proteins, we wanted to obtain an additional measure of validity before proceeding with the final analysis. Assuming that ribosomal proteins are among the most abundant proteins in the cytosol, we reasoned that any protein observed at relatively higher abundance (based on NSC) in conditioned media cannot be simply explained away as the sole result of cell rupture. The NSCs of all ‘extracellular’ proteins in Table 1 were therefore normalized for size (to 100 kDa:


Table 1. List of the Top 60 Secreted/Membrane Proteins Regulated in the Breast Cancer Secretome

| protein name | gene name | acc. number | MCF7 (a) | MDA231 (a) | log2 ratio (MDA231/MCF7) (b) | cellular localization (c) |
|---|---|---|---|---|---|---|
| Plasminogen activator inhibitor 1 | SERPINE1 | IPI00007118 | 1 | 84(16) | 6.6 | extracellular |
| Tissue-type plasminogen activator | PLAT | IPI00019590 | 1 | 73(13) | 6.4 | extracellular |
| Pentraxin-related protein PTX3 | PTX3 | IPI00029568 | 1 | 64(11) | 6.2 | extracellular |
| Macrophage colony-stimulating factor 1 | CSF1 | IPI00015881 | 1 | 58(6) | 6.1 | extracellular |
| Insulin-like growth factor-binding protein 7 | IGFBP7 | IPI00016915 | 1 | 44(9) | 5.7 | extracellular |
| Lysyl oxidase-like 2 | LOXL2 | IPI00294839 | 1 | 42(16) | 5.6 | extracellular |
| Cystatin-SN | CST1 | IPI00305477 | 1 | 39(8) | 5.5 | extracellular |
| Galectin-3-binding protein precursor | LGALS3BP | IPI00023673 | 4(2) | 94(15) | 4.9 | plasma membrane |
| Galectin-1 | LGALS1 | IPI00219219 | 1 | 23(5) | 4.7 | extracellular |
| Laminin subunit gamma-2 | LAMC2 | IPI00015117 | 1 | 22(9) | 4.7 | extracellular |
| Proprotein convertase subtilisin/kexin type 9 | PCSK9 | IPI003871686 | 1 | 19(10) | 4.7 | extracellular |
| Protein CYR61 | CYR61 | IPI00299219 | 1 | 19(5) | 4.5 | extracellular |
| AXL receptor tyrosine kinase | AXL | IPI00296992 | 1 | 18(3) | 4.4 | plasma membrane |
| N-acetylglucosamine-6-sulfatase | GNS | IPI00012102 | 1 | 18(6) | 4.4 | extracellular |
| Laminin subunit beta-1 | LAMB1 | IPI00013976 | 1 | 18(7) | 4.4 | extracellular |
| Neutrophil gelatinase-associated lipocalin | LCN2 | IPI00299547 | 1 | 18(6) | 4.4 | extracellular |
| PRSS2 protein | PRSS2 | IPI00011695 | 1 | 18(3) | 4.4 | extracellular |
| Cathepsin Z | CTSZ | IPI00002745 | 1 | 17(3) | 4.3 | extracellular |
| Collagen alpha-1(VI) chain | COL6A1 | IPI00291136 | 2(1) | 32(11) | 4.1 | extracellular |
| Urokinase-type plasminogen activator | PLAU | IPI00296180 | 1 | 13(5) | 3.9 | extracellular |
| Hepatocyte growth factor receptor | MET | IPI00029273 | 1 | 13(5) | 3.8 | plasma membrane |
| Interleukin-6 | IL6 | IPI00007793 | 1 | 12(5) | 3.8 | extracellular |
| Collagen alpha-1(V) chain | COL5A1 | IPI00477611 | 1 | 11(4) | 3.6 | extracellular |
| Laminin subunit beta-3 | LAMB3 | IPI00299404 | 1 | 10(6) | 3.5 | extracellular |
| Neural cell adhesion molecule L1 | L1CAM | IPI00027087 | 1 | 10(3) | 3.5 | plasma membrane |
| Tripeptidyl-peptidase 1 | TPP1 | IPI000298237 | 1 | 10(6) | 3.5 | extracellular |
| TGF beta-induced protein ig-h3 | TGFBI | IPI00018219 | 3(1) | 27(7) | 3.5 | extracellular |
| Follistatin-related protein 1 | FSTL1 | IPI00029723 | 1 | 9(4) | 3.4 | extracellular |
| Low-density lipoprotein receptor | LDLR | IPI00000070 | 2(1) | 17(7) | 3.3 | plasma membrane |
| Extracellular matrix protein 1 | ECM1 | IPI00003351 | 1 | 8(3) | 3.1 | extracellular |
| Receptor-type tyrosine-protein phosphatase F | PTPRF | IPI00465186 | 120(39) | 1 | -7.4 | plasma membrane |
| Cadherin EGF LAG seven-pass G-type receptor 2 | CELSR2 | IPI00015346 | 69(25) | 1 | -6.5 | plasma membrane |
| Carboxypeptidase E | CPE | IPI00031121 | 53(14) | 1 | -6.2 | plasma membrane |
| Nucleobindin-2 | NUCB2 | IPI00009123 | 40(11) | 1 | -5.8 | extracellular |
| Neogenin | NEO1 | IPI00023814 | 36(15) | 1 | -5.6 | plasma membrane |
| Laminin subunit beta-2 | LAMB2 | IPI00296922 | 90(34) | 3(2) | -5.3 | extracellular |
| Epithelial cadherin precursor | CDH1 | IPI00025861 | 29(8) | 1 | -5.3 | plasma membrane |
| Prostaglandin F2 receptor negative regulator | PTGFRN | IPI00022048 | 28(10) | 1 | -5.2 | plasma membrane |
| Insulin-like growth factor-binding protein 2 | IGFBP2 | IPI00297284 | 24(8) | 1 | -5.0 | extracellular |
| Plasma serine protease inhibitor | SERPINA5 | IPI00007221 | 24(7) | 1 | -5.0 | extracellular |
| Neutral amino acid transporter B(0) | SLC1A5 | IPI00019472 | 23(3) | 1 | -5.0 | plasma membrane |
| Neuronal cell adhesion molecule | NRCAM | IPI00333777 | 23(8) | 1 | -4.9 | plasma membrane |
| FRAS1-related extracellular matrix protein 2 | FREM2 | IPI00180707 | 21(10) | 1 | -4.8 | plasma membrane |
| BCAM Lutheran blood group glycoprotein | BCAM | IPI00002406 | 18(4) | 1 | -4.6 | plasma membrane |
| Roundabout homologue 1 | ROBO1 | IPI00418121 | 17(9) | 1 | -4.5 | plasma membrane |
| Lysosomal alpha-glucosidase | GAA | IPI00293088 | 16(6) | 1 | -4.4 | extracellular |
| Lipolysis-stimulated lipoprotein receptor | LSR | IPI00641640 | 16(3) | 1 | -4.4 | plasma membrane |
| Glypican-1 | GPC1 | IPI00015688 | 15(7) | 1 | -4.3 | extracellular |
| Sortilin-related receptor | SORL1 | IPI00022608 | 13(6) | 1 | -4.1 | plasma membrane |
| Receptor-type tyrosine-protein phosphatase S | PTPRS | IPI000289831 | 12(6) | 1 | -4.0 | plasma membrane |
| Receptor-type tyrosine-protein phosphatase eta | PTPRJ | IPI00290328 | 12(3) | 1 | -4.0 | plasma membrane |
| 4F2 cell-surface antigen heavy chain | SLC3A2 | IPI00027493 | 34(8) | 3(1) | -3.9 | plasma membrane |
| Protein S100-A16 | S100A16 | IPI00062120 | 11(5) | 1 | -3.8 | extracellular |
| Neurosecretory protein VGF | VGF | IPI00069058 | 11(7) | 1 | -3.8 | extracellular |
| SH3 domain-binding glutamic acid-rich-like protein | SH3BGRL | IPI00025318 | 10(2) | 1 | -3.7 | extracellular |
| Stanniocalcin-2 | STC2 | IPI00008780 | 10(3) | 1 | -3.7 | extracellular |
| Agrin | AGR2 | IPI00007427 | 9(4) | 1 | -3.6 | extracellular |
| Integrin alpha-V | ITGAV | IPI00555991 | 9(4) | 1 | -3.6 | plasma membrane |
| Latent-transforming growth factor beta-binding protein | LTBP1 | IPI00220249 | 24(8) | 3(1) | -3.5 | extracellular |
| Isochorismatase domain-containing protein 1 | ISOC1 | IPI00304082 | 8(5) | 1 | -3.4 | extracellular |

(a) NSC (nr unique peptides). (b) Log2(NSC/NSC). (c) Cellular localization predicted (see Experimental Procedures and Supporting Information Table S5).
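The size normalization applied to the NSC values of Table 1 (NSC_MW, expressed relative to ribosomal proteins) can be sketched as below. The sketch assumes that "normalized for size (to 100 kDa)" means scaling each NSC to the count expected for a 100 kDa protein, that is, NSC × 100 / molecular weight in kDa. All molecular weights and ribosomal values are hypothetical placeholders, not data from Supporting Information Table S8.

```python
import statistics

def nsc_per_100_kda(nsc, mw_kda):
    """Scale a spectral count to a 100 kDa protein (assumed definition of NSC_MW)."""
    return nsc * 100.0 / mw_kda

def ratio_to_ribosomal(nsc, mw_kda, ribosomal):
    """Express a protein's NSC_MW relative to the mean NSC_MW of reference proteins."""
    reference = statistics.mean([nsc_per_100_kda(n, m) for n, m in ribosomal])
    return nsc_per_100_kda(nsc, mw_kda) / reference

# Hypothetical (NSC, molecular weight in kDa) pairs for five ribosomal proteins.
ribosomal_reference = [(2.0, 20.0), (1.0, 25.0), (3.0, 30.0), (2.0, 16.0), (1.0, 12.5)]

# Hypothetical secreted protein with an NSC of 84 and a mass of 45 kDa.
ratio = ratio_to_ribosomal(84.0, 45.0, ribosomal_reference)
print(f"NSC_MW relative to ribosomal proteins: {ratio:.1f}")
```

Under this reading, a ratio of 1.0 or more means the protein is at least as abundant in the conditioned medium, per unit of mass, as the ribosomal reference set.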

“NSC_MW”) and then expressed relative to the average NSC_MW of 5 ribosomal proteins (Supporting Information Table S8). Nearly all of them have ratios ≥1.0, clearly indicating selective release from the cells, that is, secretion. We then compared the proteomics data in Table 1 with available gene expression data for the same cell lines. The mRNA expression data for MCF7 and MDA-MB-231 were downloaded from a recent publication and the ratios of fold change of MDA-MB-231 over MCF7 were calculated.27 Data for 219 secreted proteins present in the breast cancer secretome were found in the transcriptomics data set. Z-scores for all fold changes of MDA-MB-231 over MCF7, for both proteomics and gene expression data sets, were generated to do the comparison between proteomics and transcriptomics. Z-score scaling is commonly used for gene expression data.49 Shifting the values in a vector by the global mean of the data set and scaling them by the global standard deviation is a common method for making different data vectors comparable. This converts each value into a deviation from the mean, expressed in units of standard deviation. Figure 7 shows a plot comparing the z-scores between proteomics and transcriptomics for 219 proteins found in both data sets. There is obviously a poor correlation between the two technologies in our data set. Among the proteins that show bigger differences between the proteomics and transcriptomics data, there are cytokines (CSF-1 and IL6), proteases (cathepsins B, D and Z), as well as membrane receptors (the receptor tyrosine kinase AXL and prostaglandin F2 receptor), among others. These striking differences are probably due to the several regulatory steps between gene expression

Figure 7. Correlation of gene expression and protein secretion changes. The z-scores for 219 secretome proteins obtained by proteomic analyses were compared with the mRNA expression data for MCF7 versus MDA-MB-231 cells. NSC for all proteins detected in MCF7 and MDA-MB-231 were log2 transformed. The log2 signals for the Affymetrix probes representing genes for which proteins have been measured in this work were extracted and compared to the log2 NSC. The log2 (MDA-MB-231) - log2 (MCF7) was calculated for every Affymetrix probe and protein (NSC) in our data set. Next, z-scores were calculated for all log2 fold-ratios (MDA-MB-231/MCF7), for both the secretome and the gene expression data sets, to put the proteomics and the gene expression data on a comparable scale. Z-scores were calculated as follows: the global mean log2 fold-ratio for the data set was subtracted from every log2 fold-ratio (MDA-MB-231/MCF7), and the result was divided by the global standard deviation.
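The z-score scaling described in this caption can be written out as a minimal sketch. It assumes the sample standard deviation is used as the "global standard deviation" (the text does not say whether the sample or population form was applied), and every signal value below is a hypothetical placeholder rather than data from the study.

```python
import math
import statistics

def log2_fold_ratios(mda, mcf7):
    """log2(MDA-MB-231) - log2(MCF7) for each matched pair of signals."""
    return [math.log2(a) - math.log2(b) for a, b in zip(mda, mcf7)]

def z_scores(values):
    """Subtract the global mean and divide by the global (sample) SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical NSC values (proteomics) and probe signals (mRNA) for four genes.
protein_mda = [84.0, 1.0, 12.0, 24.0]
protein_mcf7 = [1.0, 120.0, 10.0, 30.0]
mrna_mda = [900.0, 150.0, 400.0, 310.0]
mrna_mcf7 = [100.0, 1200.0, 420.0, 300.0]

protein_z = z_scores(log2_fold_ratios(protein_mda, protein_mcf7))
mrna_z = z_scores(log2_fold_ratios(mrna_mda, mrna_mcf7))

for pz, mz in zip(protein_z, mrna_z):
    print(f"protein z = {pz:+.2f}, mRNA z = {mz:+.2f}")
```

Because each data set is scaled by its own mean and SD, the two sets of fold changes end up on a common scale even though spectral counts and probe intensities have very different dynamic ranges.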


Figure 8. Pathway analysis of the breast cancer secretome. The breast cancer secretome data from MCF7 and MDA-MB-231 cell lines were uploaded into Ingenuity Pathways Analysis (IPA) software. The data set was filtered to keep only the differentially regulated proteins between the two cell lines that were associated with cancer. A total of 71 proteins predicted as either “secreted” or “plasma membrane” were identified, mapped to gene objects in the IPA Knowledge Base, and used to generate biological networks. Eight networks were generated and the four top networks (containing 67 out of the 71 proteins) were merged to obtain a global view. Furthermore, the Path Designer feature was used to create a graphically rich network image. The network consists of 67 secreted/membrane proteins involved in breast cancer and 65 genes (yellow) interconnected to this group by IPA. The significance score of the network was better than 24 (p-value < 10⁻²⁴). Nodes represent genes, with their shape representing the functional class of the gene product, and edges indicate the biological relationship between the nodes. In this case, only direct interactions were selected for the analysis. The significance of the fold-change expression is represented by color intensity (red, overexpression in MDA-MB-231; green, overexpression in MCF7). The subcellular localization of the secretome proteins in the network is the one previously predicted (see Experimental Procedures and Supporting Information Table S5).
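The selection of differentially regulated proteins used for this network (a 2-fold change, with a minimum signal of NSC ≥ 5 in one of the two cell lines) can be sketched as a simple filter. The sketch assumes that undetected proteins are assigned a floor NSC of 1 before the ratio is taken, as the entries of 1 in Table 1 suggest; the protein labels and values are hypothetical.

```python
import math

def log2_ratio(nsc_mda, nsc_mcf7, floor=1.0):
    """log2(MDA-MB-231 / MCF7), with undetected proteins floored at `floor` (assumption)."""
    return math.log2(max(nsc_mda, floor) / max(nsc_mcf7, floor))

def is_regulated(nsc_mda, nsc_mcf7, min_fold=2.0, min_nsc=5.0):
    """True if the change is at least `min_fold` and one line reaches `min_nsc`."""
    strong_enough = max(nsc_mda, nsc_mcf7) >= min_nsc
    big_enough = abs(log2_ratio(nsc_mda, nsc_mcf7)) >= math.log2(min_fold)
    return strong_enough and big_enough

# Hypothetical (MDA-MB-231 NSC, MCF7 NSC) pairs.
secretome = {
    "protein_A": (40.0, 1.0),   # strongly up in MDA-MB-231
    "protein_B": (3.0, 1.0),    # 3-fold, but signal below the NSC threshold
    "protein_C": (10.0, 9.0),   # abundant, but essentially unchanged
    "protein_D": (0.0, 24.0),   # detected only in MCF7
}

regulated = [name for name, (mda, mcf7) in secretome.items() if is_regulated(mda, mcf7)]
print(regulated)
```

The minimum-signal requirement keeps low-count proteins, whose ratios are dominated by counting noise, out of the regulated list.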

and protein secretion. However, some intracellular proteins like nuclear proteins and heat-shock proteins show almost a perfect agreement between proteomics and transcriptomics. This trend has been reported earlier and supports the view that transcriptomics might not be the best way to profile secreted and membrane proteins.18 The protein list containing the log2 fold-change ratios (MDA-MB-231/MCF7) for the breast cancer secretome data was then imported into IPA. For simplicity, only regulated proteins were considered for the analysis. A cutoff of a 2-fold change (where the minimum signal for any protein has to be NSC ≥ 5 in one of the two cell lines) was applied to the list, selecting 123 proteins to generate biological networks using the Ingenuity Pathways Knowledge Base. In addition, since our goal is to find secreted proteins that are linked to signaling pathways altered in cancer, the protein list was further narrowed by only considering secreted proteins that could be involved in cancer. Biological functions associated with proteins in the biological networks are displayed in the Functional Analysis feature in the order of their significance to the network. A total of 71 proteins out of the 123 regulated proteins in the data set were associated with cancer by IPA with p-values of association between 10⁻¹⁶ and 10⁻³. A separate analysis was run using only


these 71 proteins related to cancer. This analysis generated eight networks where the top four contained 67 out of the 71 molecules. The significance score of the four individual networks was better than 24 (p-value < 10⁻²⁴). Then, the four networks were merged to obtain a global view of the proteins involved in cancer that are differentially regulated in the breast cancer secretome (Figure 8, Supporting Information Table S9). The merged network contains 67 secretome proteins and 65 gene objects (in yellow) connected to them by IPA. In Figure 8, green reflects overexpression in the less invasive cell line (MCF7), and red represents overexpression in MDA-MB-231, with the significance of that expression represented by color intensity. For simplicity, from now on, we will refer to the secreted or membrane proteins in the secretomes as being ‘overexpressed’ in one or the other of the two breast cancer cell lines. However, we are aware that the mass spectrometric signal of a secreted protein is the result of gene expression, mRNA stability, and translation to protein, as well as protein secretion. The first conclusion from these results is that, despite the differences obtained between gene expression and proteomics, the breast cancer molecular subtyping generated by gene expression seems abundantly propagated to protein secretion, since there are several differentially regulated proteins between the two


Figure 9. Verification of secretome protein expression by Western blotting. Conditioned medium from MCF7 and MDA-MB-231 cells was concentrated and 2 µg of total protein for each lane was loaded onto SDS-PAGE gels. Then, proteins were transferred onto PVDF and probed with the indicated secretome protein antibodies.

model cell line secretomes. However, while several proteins show different regulation between the luminal and the basal subtypes in the secretome, these differences do not necessarily agree between gene expression and proteomics technologies. Cathepsins are proteases that are located in lysosomes and other intracellular compartments, and participate in protein turnover.50 However, during neoplasia, their cellular localization changes to the plasma membrane or even the extracellular space, and they have been implicated in apoptosis, angiogenesis, cell proliferation, and invasion.51 Cathepsins B and Z are overexpressed in the MDA-MB-231 secretome by 4-fold over MCF7 (quadrant 2 in Figure 8). Recently, it has been reported that a cathepsin Z inhibitor is able to significantly reduce bone metastasis progression in breast cancer.52 Cathepsin B has been shown to be involved in cancer invasion through extracellular matrix degradation. One of Cathepsin B’s main proteolytic targets is the cell-adhesion protein E-cadherin (CDH1), which is reduced or absent in invasive tumors. The N-terminal extracellular domain of E-cadherin is responsible for making cell-cell junctions in epithelial cells.53 E-cadherin is overexpressed by 5-fold in MCF7 (being undetectable in the invasive cell line) (quadrant 1 in Figure 8). The overexpression of these proteins was also confirmed by Western blotting using the conditioned media from the two cell lines (Figure 9). Another secretome protein, the colony-stimulating factor 1 (CSF1), has been correlated with poor prognosis in breast cancer due to its abundant attraction of tumor-associated macrophages.54 CSF1 is one of the most overexpressed proteins in the MDA-MB-231 to MCF7 comparison (6.5-fold up) (quadrant 1 in Figure 8). 
Urokinase-type plasminogen activator (uPA, PLAU) and tissue-type plasminogen activator (tPA, PLAT), two proteases that have been implicated in tumor invasion and metastasis through degradation and remodeling of the extracellular matrix, are overexpressed in MDA-MB-231 by 4- and 7-fold over MCF7, respectively (quadrant 4 in Figure 8).55 Both uPA and tPA are inhibited by plasminogen activator inhibitors 1 and 3 (SERPINE1/PAI-1 and SERPINA5/PAI-3, respectively), among others. Paradoxically, increased levels of PAI-1 (but not PAI-3) are associated with poor prognosis, showing that the adverse effects of the plasminogen activating system in cancer cannot be attributed solely to proteolysis.56,57 The overexpression of these proteins was also confirmed by Western blotting using the conditioned media from the two cell lines (Figure 9). Both the secretome and the Western blotting results agree with findings obtained using other methodologies, showing overexpression of uPA, tPA, and PAI-1, but not PAI-3, in MDA-MB-231 (quadrant 4 in Figure 8, Figure 9). Therefore, secretome studies seem to be able to monitor most of the plasminogen activating system as a whole, making it attractive for drug development and therapeutic monitoring.

The insulin-like growth factor (IGF) signaling pathway is also differentially regulated in our data set, with overexpression of some members in the luminal type (insulin-like growth factor-binding proteins 2 and 5 (IGFBP2 and IGFBP5)) and others in the basal type (insulin-like growth factor-binding protein 7 (IGFBP7) and CYR61) (quadrant 1 in Figure 8, Figure 9). IGFBPs regulate the activity of IGFs by extending their half-life, but some studies have shown that they also have IGF-independent functions.58,59 IGFBP2 has been reported to be elevated in brain and prostate cancer, while reports on the effects of IGFBP5 in different solid tumors have been contradictory.60,61 CYR61 is also part of IGF signaling and is overexpressed in MDA-MB-231. CYR61 has been shown to promote an aggressive breast cancer phenotype.62

Despite the limitations described in the last section and the fact that we are just showing proof of principle, the results presented here demonstrate the potential of the secretome technology, when coupled with pathway analysis, for pathway-based biomarker and drug target discovery and monitoring.

Conclusions

A comprehensive method for secretome analysis using “stacking gels”, label-free relative quantitation, and pathway analysis was developed and verified using six cancer cell lines from three different tissues. In this work, we used a rational strategy to interface cell culture with SDS-PAGE gels and mass spectrometry, minimizing the abundant serum protein signal and the number of sample fractions in order to recover the maximum possible number of proteins of the cancer secretome. The protocol presented here increases the throughput of secretome analysis by approximately 1 order of magnitude compared to previously presented methodologies. We were able to identify 300 proteins and about 1500 unique peptides on average for six cell lines in triplicate, applying a very stringent confidence filter. With the use of state-of-the-art predictive algorithms and databases, we were able to assign 47% of the cancer secretome to secreted and plasma membrane proteins. Pathway analysis software was then used to attempt to integrate the secretome data into a cellular signaling context. The goal of this process was to link secreted proteins to a number of canonical signaling pathways and drug targets, to show the potential of the approach as a pathway-based biomarker discovery strategy. Finally, pathway analysis was used to study differentially regulated proteins in invasive breast cancer. The analysis suggested that IGF signaling and the plasminogen activating system may be among the most differentially regulated pathways when the secretomes of a basal and a luminal type cell line are compared. In addition, the analysis uncovered dozens of proteins that may be differentially regulated and could become potential biomarkers for breast cancer. Some of these results were verified by Western blot analysis, validating the quantitation data obtained by label-free mass spectrometry.
However, none of our tentative conclusions has direct clinical relevance at this moment, nor will they until our findings are expanded to a substantially larger number of cell lines and verified using relevant clinical specimens. Nonetheless, secretome analysis raises the prospect of a rational discovery of cancer biomarkers and another opportunity for mass spectrometry-based clinical proteomics.

Acknowledgment. We thank Alex Lash, Nicholas Socci and Larry Engel for helpful discussions.

Supporting Information Available: Supplementary Table S1, serum proteins removed from the secretome data set; Supplementary Table S2, scaffold peptide report; Supplementary Table S3, scaffold protein report; Supplementary Table S4, normalized spectral counts of secretome replicates; Supplementary Table S5, cellular compartment predictive models; Supplementary Table S6, list of secretome proteins considered secreted or plasma membrane; Supplementary Table S7, molecule list for the Ingenuity Pathways merged network (6 cell lines); Supplementary Table S8, list of MW-normalized NSCs for predicted extracellular proteins, expressed relative to the average ribosomal protein MW_NSC; Supplementary Table S9, molecule list for the Ingenuity Pathways merged network (breast cancer secretome). This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Papadopoulos, N.; Kinzler, K. W.; Vogelstein, B. The role of companion diagnostics in the development and use of mutation-targeted cancer therapies. Nat. Biotechnol. 2006, 24 (8), 985–95.
(2) Frank, R.; Hargreaves, R. Clinical biomarkers in drug discovery and development. Nat. Rev. Drug Discovery 2003, 2 (7), 566–80.
(3) Ludwig, J. A.; Weinstein, J. N. Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 2005, 5 (11), 845–56.
(4) Sawyers, C. L. The cancer biomarker problem. Nature 2008, 452 (7187), 548–52.
(5) Druker, B. J.; Talpaz, M.; Resta, D. J.; Peng, B.; Buchdunger, E.; Ford, J. M.; Lydon, N. B.; Kantarjian, H.; Capdeville, R.; Ohno-Jones, S.; Sawyers, C. L. Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl. J. Med. 2001, 344 (14), 1031–7.
(6) Daub, H.; Specht, K.; Ullrich, A. Strategies to overcome resistance to targeted protein kinase inhibitors. Nat. Rev. Drug Discovery 2004, 3 (12), 1001–10.
(7) Baselga, J. Targeting tyrosine kinases in cancer: the second wave. Science 2006, 312 (5777), 1175–8.
(8) Vogelstein, B.; Kinzler, K. W. Cancer genes and the pathways they control. Nat. Med. 2004, 10 (8), 789–99.
(9) Bild, A. H.; Yao, G.; Chang, J. T.; Wang, Q.; Potti, A.; Chasse, D.; Joshi, M. B.; Harpole, D.; Lancaster, J. M.; Berchuck, A.; Olson, J. A., Jr.; Marks, J. R.; Dressman, H. K.; West, M.; Nevins, J. R. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006, 439 (7074), 353–7.
(10) Tomlins, S. A.; Mehra, R.; Rhodes, D. R.; Cao, X.; Wang, L.; Dhanasekaran, S. M.; Kalyana-Sundaram, S.; Wei, J. T.; Rubin, M. A.; Pienta, K. J.; Shah, R. B.; Chinnaiyan, A. M. Integrative molecular concept modeling of prostate cancer progression. Nat. Genet. 2007, 39 (1), 41–51.
(11) Anderson, N. L.; Anderson, N. G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 2002, 1 (11), 845–67.
(12) Sidransky, D. Emerging molecular markers of cancer. Nat. Rev. Cancer 2002, 2 (3), 210–9.
(13) Rifai, N.; Gillette, M. A.; Carr, S. A. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 2006, 24 (8), 971–83.
(14) Villanueva, J.; Shaffer, D. R.; Philip, J.; Chaparro, C. A.; Erdjument-Bromage, H.; Olshen, A. B.; Fleisher, M.; Lilja, H.; Brogi, E.; Boyd, J.; Sanchez-Carbayo, M.; Holland, E. C.; Cordon-Cardo, C.; Scher, H. I.; Tempst, P. Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 2006, 116 (1), 271–84.
(15) Villanueva, J.; Nazarian, A.; Lawlor, K.; Yi, S. S.; Robbins, R. J.; Tempst, P. A sequence-specific exopeptidase activity test (SSEAT) for “functional” biomarker discovery. Mol. Cell. Proteomics 2008, 7 (3), 509–18.


(16) Villanueva, J.; Philip, J.; Chaparro, C. A.; Li, Y.; Toledo-Crow, R.; DeNoyer, L.; Fleisher, M.; Robbins, R. J.; Tempst, P. Correcting common errors in identifying cancer-specific serum peptide signatures. J. Proteome Res. 2005, 4 (4), 1060–72.
(17) Volmer, M. W.; Stuhler, K.; Zapatka, M.; Schoneck, A.; Klein-Scory, S.; Schmiegel, W.; Meyer, H. E.; Schwarte-Waldhoff, I. Differential proteome analysis of conditioned media to detect Smad4 regulated secreted biomarkers in colon cancer. Proteomics 2005, 5 (10), 2587–601.
(18) Gronborg, M.; Kristiansen, T. Z.; Iwahori, A.; Chang, R.; Reddy, R.; Sato, N.; Molina, H.; Jensen, O. N.; Hruban, R. H.; Goggins, M. G.; Maitra, A.; Pandey, A. Biomarker discovery from pancreatic cancer secretome using a differential proteomic approach. Mol. Cell. Proteomics 2006, 5 (1), 157–71.
(19) Allinen, M.; Beroukhim, R.; Cai, L.; Brennan, C.; Lahti-Domenici, J.; Huang, H.; Porter, D.; Hu, M.; Chin, L.; Richardson, A.; Schnitt, S.; Sellers, W. R.; Polyak, K. Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell 2004, 6 (1), 17–32.
(20) Chen, S. T.; Pan, T. L.; Juan, H. F.; Chen, T. Y.; Lin, Y. S.; Huang, C. M. Breast tumor microenvironment: proteomics highlights the treatments targeting secretome. J. Proteome Res. 2008, 7 (4), 1379–87.
(21) Erdjument-Bromage, H.; Lui, M.; Lacomis, L.; Grewal, A.; Annan, R. S.; McNulty, D. E.; Carr, S. A.; Tempst, P. Examination of microtip reversed-phase liquid chromatographic extraction of peptide pools for mass spectrometric analysis. J. Chromatogr., A 1998, 826 (2), 167–81.
(22) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74 (20), 5383–92.
(23) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75 (17), 4646–58.
(24) Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 1997, 10 (1), 1–6.
(25) Bendtsen, J. D.; Jensen, L. J.; Blom, N.; Von Heijne, G.; Brunak, S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 2004, 17 (4), 349–56.
(26) Rhee, S. Y.; Wood, V.; Dolinski, K.; Draghici, S. Use and misuse of the gene ontology annotations. Nat. Rev. Genet. 2008, 9 (7), 509–15.
(27) Neve, R. M.; Chin, K.; Fridlyand, J.; Yeh, J.; Baehner, F. L.; Fevr, T.; Clark, L.; Bayani, N.; Coppe, J. P.; Tong, F.; Speed, T.; Spellman, P. T.; DeVries, S.; Lapuk, A.; Wang, N. J.; Kuo, W. L.; Stilwell, J. L.; Pinkel, D.; Albertson, D. G.; Waldman, F. M.; McCormick, F.; Dickson, R. B.; Johnson, M. D.; Lippman, M.; Ethier, S.; Gazdar, A.; Gray, J. W. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006, 10 (6), 515–27.
(28) Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J. Proteome Res. 2006, 5 (9), 2339–47.
(29) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G. Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol. Cell. Proteomics 2005, 4 (10), 1487–502.
(30) Searle, B. C.; Turner, M.; Nesvizhskii, A. I. Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies. J. Proteome Res. 2008, 7 (1), 245–53.
(31) Kulasingam, V.; Diamandis, E. P. Proteomics analysis of conditioned media from three breast cancer cell lines: a mine for biomarkers and therapeutic targets. Mol. Cell. Proteomics 2007, 6 (11), 1997–2011.
(32) Mbeunkui, F.; Metge, B. J.; Shevde, L. A.; Pannell, L. K. Identification of differentially secreted biomarkers using LC-MS/MS in isogenic cell lines representing a progression of breast cancer. J. Proteome Res. 2007, 6 (8), 2993–3002.
(33) States, D. J.; Omenn, G. S.; Blackwell, T. W.; Fermin, D.; Eng, J.; Speicher, D. W.; Hanash, S. M. Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat. Biotechnol. 2006, 24 (3), 333–8.
(34) Keller, M.; Ruegg, A.; Werner, S.; Beer, H. D. Active caspase-1 is a regulator of unconventional protein secretion. Cell 2008, 132 (5), 818–31.


(35) Ritch, P. A.; Carroll, S. L.; Sontheimer, H. Neuregulin-1 enhances motility and migration of human astrocytic glioma cells. J. Biol. Chem. 2003, 278 (23), 20971–8.
(36) Schulze, W. X.; Deng, L.; Mann, M. Phosphotyrosine interactome of the ErbB-receptor kinase family. Mol. Syst. Biol. 2005, 1, 2005–0008.
(37) Thelemann, A.; Petti, F.; Griffin, G.; Iwata, K.; Hunt, T.; Settinari, T.; Fenyo, D.; Gibson, N.; Haley, J. D. Phosphotyrosine signaling networks in epidermal growth factor receptor overexpressing squamous carcinoma cells. Mol. Cell. Proteomics 2005, 4 (4), 356–76.
(38) Mackay, A.; Jones, C.; Dexter, T.; Silva, R. L.; Bulmer, K.; Jones, A.; Simpson, P.; Harris, R. A.; Jat, P. S.; Neville, A. M.; Reis, L. F.; Lakhani, S. R.; O’Hare, M. J. cDNA microarray analysis of genes associated with ERBB2 (HER2/neu) overexpression in human mammary luminal epithelial cells. Oncogene 2003, 22 (17), 2680–8.
(39) Landis, M. D.; Seachrist, D. D.; Montanez-Wiscovich, M. E.; Danielpour, D.; Keri, R. A. Gene expression profiling of cancer progression reveals intrinsic regulation of transforming growth factor-beta signaling in ErbB2/Neu-induced tumors from transgenic mice. Oncogene 2005, 24 (33), 5173–90.
(40) Landis, M. D.; Seachrist, D. D.; Abdul-Karim, F. W.; Keri, R. A. Sustained trophism of the mammary gland is sufficient to accelerate and synchronize development of ErbB2/Neu-induced tumors. Oncogene 2006, 25 (23), 3325–34.
(41) Ueda, T.; Bruchovsky, N.; Sadar, M. D. Activation of the androgen receptor N-terminal domain by interleukin-6 via MAPK and STAT3 signal transduction pathways. J. Biol. Chem. 2002, 277 (9), 7076–85.
(42) Steiner, H.; Godoy-Tundidor, S.; Rogatsch, H.; Berger, A. P.; Fuchs, D.; Comuzzi, B.; Bartsch, G.; Hobisch, A.; Culig, Z. Accelerated in vivo growth of prostate tumors that up-regulate interleukin-6 is associated with reduced retinoblastoma protein expression and activation of the mitogen-activated protein kinase pathway. Am. J. Pathol. 2003, 162 (2), 655–63.
(43) Silletti, S.; Yebra, M.; Perez, B.; Cirulli, V.; McMahon, M.; Montgomery, A. M. Extracellular signal-regulated kinase (ERK)-dependent gene expression contributes to L1 cell adhesion molecule-dependent motility and invasion. J. Biol. Chem. 2004, 279 (28), 28880–8.
(44) Schaefer, A. W.; Kamiguchi, H.; Wong, E. V.; Beach, C. M.; Landreth, G.; Lemmon, V. Activation of the MAPK signal cascade by the neural cell adhesion molecule L1 requires L1 internalization. J. Biol. Chem. 1999, 274 (53), 37965–73.
(45) Ren, Y.; Chan, H. M.; Li, Z.; Lin, C.; Nicholls, J.; Chen, C. F.; Lee, P. Y.; Lui, V.; Bacher, M.; Tam, P. K. Upregulation of macrophage migration inhibitory factor contributes to induced N-Myc expression by the activation of ERK signaling pathway and increased expression of interleukin-8 and VEGF in neuroblastoma. Oncogene 2004, 23 (23), 4146–54.
(46) Condorelli, G.; Trencia, A.; Vigliotta, G.; Perfetti, A.; Goglia, U.; Cassese, A.; Musti, A. M.; Miele, C.; Santopietro, S.; Formisano, P.; Beguinot, F. Multiple members of the mitogen-activated protein kinase family are necessary for PED/PEA-15 anti-apoptotic function. J. Biol. Chem. 2002, 277 (13), 11013–8.
(47) Sorlie, T.; Perou, C. M.; Tibshirani, R.; Aas, T.; Geisler, S.; Johnsen, H.; Hastie, T.; Eisen, M. B.; van de Rijn, M.; Jeffrey, S. S.; Thorsen, T.; Quist, H.; Matese, J. C.; Brown, P. O.; Botstein, D.; Eystein Lonning, P.; Borresen-Dale, A. L. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. U.S.A. 2001, 98 (19), 10869–74.
(48) Perou, C. M.; Sorlie, T.; Eisen, M. B.; van de Rijn, M.; Jeffrey, S. S.; Rees, C. A.; Pollack, J. R.; Ross, D. T.; Johnsen, H.; Akslen, L. A.; Fluge, O.; Pergamenschikov, A.; Williams, C.; Zhu, S. X.; Lonning, P. E.; Borresen-Dale, A. L.; Brown, P. O.; Botstein, D. Molecular portraits of human breast tumours. Nature 2000, 406 (6797), 747–52.
(49) Cheadle, C.; Cho-Chung, Y. S.; Becker, K. G.; Vawter, M. P. Application of z-score transformation to Affymetrix data. Appl. Bioinf. 2003, 2 (4), 209–17.
(50) Mohamed, M. M.; Sloane, B. F. Cysteine cathepsins: multifunctional enzymes in cancer. Nat. Rev. Cancer 2006, 6 (10), 764–75.
(51) Gocheva, V.; Zeng, W.; Ke, D.; Klimstra, D.; Reinheckel, T.; Peters, C.; Hanahan, D.; Joyce, J. A. Distinct roles for cysteine cathepsin genes in multistage tumorigenesis. Genes Dev. 2006, 20 (5), 543–56.
(52) Le Gall, C.; Bellahcene, A.; Bonnelye, E.; Gasser, J. A.; Castronovo, V.; Green, J.; Zimmermann, J.; Clezardin, P. A cathepsin K inhibitor reduces breast cancer induced osteolysis and skeletal tumor burden. Cancer Res. 2007, 67 (20), 9894–902.
(53) Beavon, I. R. The E-cadherin-catenin complex in tumour metastasis: structure, function and regulation. Eur. J. Cancer 2000, 36 (13 Spec. No.), 1607–20.
(54) Lin, E. Y.; Pollard, J. W. Macrophages: modulators of breast cancer progression. Novartis Found. Symp. 2004, 256, 158-68, discussion 168-72, 259-69.
(55) Sidenius, N.; Blasi, F. The urokinase plasminogen activator system in cancer: recent advances and implication for prognosis and therapy. Cancer Metastasis Rev. 2003, 22 (2-3), 205–22.
(56) Binder, B. R.; Mihaly, J. The plasminogen activator inhibitor “paradox” in cancer. Immunol. Lett. 2008, 118 (2), 116–24.
(57) Beaulieu, L. M.; Whitley, B. R.; Wiesner, T. F.; Rehault, S. M.; Palmieri, D.; Elkahloun, A. G.; Church, F. C. Breast cancer and metabolic syndrome linked through the plasminogen activator inhibitor-1 cycle. BioEssays 2007, 29 (10), 1029–38.
(58) Pollak, M. N.; Schernhammer, E. S.; Hankinson, S. E. Insulin-like growth factors and neoplasia. Nat. Rev. Cancer 2004, 4 (7), 505–18.
(59) Fang, P.; Hwa, V.; Rosenfeld, R. IGFBPs and cancer. Novartis Found. Symp. 2004, 262, 215-30, discussion 230-4, 265-8.
(60) Beattie, J.; Allan, G. J.; Lochrie, J. D.; Flint, D. J. Insulin-like growth factor-binding protein-5 (IGFBP-5): a critical member of the IGF axis. Biochem. J. 2006, 395 (1), 1–19.
(61) Mehrian-Shai, R.; Chen, C. D.; Shi, T.; Horvath, S.; Nelson, S. F.; Reichardt, J. K.; Sawyers, C. L. Insulin growth factor-binding protein 2 is a candidate biomarker for PTEN status and PI3K/Akt pathway activation in glioblastoma and prostate cancer. Proc. Natl. Acad. Sci. U.S.A. 2007, 104 (13), 5563–8.
(62) O’Kelly, J.; Chung, A.; Lemp, N.; Chumakova, K.; Yin, D.; Wang, H. J.; Said, J.; Gui, D.; Miller, C. W.; Karlan, B. Y.; Koeffler, H. P. Functional domains of CCN1 (Cyr61) regulate breast cancer progression. Int. J. Oncol. 2008, 33 (1), 59–67.

PR8008572

Journal of Proteome Research • Vol. 8, No. 3, 2009 1503