Prioritization of Candidate Protein Biomarkers from an In Vitro Model System of Breast Tumor Progression Toward Clinical Verification

Thomas Y. K. Lau,†,‡ Karen A. Power,† Sophie Dijon,§ Isabelle de Gardelle,| Susan McDonnell,⊥ Michael J. Duffy,| Stephen R. Pennington,‡,# and William M. Gallagher*,†

UCD School of Biomolecular and Biomedical Science, UCD Conway Institute, University College Dublin, Ireland, Université des Sciences et Technologies de Lille, Lille, France, St. Vincent’s University Hospital, Dublin and UCD School of Medicine and Medical Science, University College Dublin, Dublin, Ireland, UCD School of Chemical and Bioprocess Engineering, University College Dublin, Ireland, Conway Institute Proteome Research Centre, UCD Conway Institute, Dublin, Ireland, and UCD School of Medicine and Medical Science, University College Dublin, Ireland

Received October 31, 2009

The use of in vitro cell culture model systems has revealed many potential mediators and candidate biomarkers of various disease phenotypes. To be of clinical utility, the expression of these candidates must be assessed in patient samples such as tissue, urine or blood. However, typical “omic” experiments may produce candidates in such large numbers that it is usually impossible to test all of them in clinical samples. Here, we present a proteomic approach to discover and prioritize candidate biomarkers that are more likely to be found in serum. We demonstrate this approach, which combines experimental and in silico methods, using an isogenic cell culture model of breast cancer invasion. Differential proteomics (2D-DIGE) was used to discover a number of candidate biomarkers, and a subset of these were identified as “extracellular”. We tested the validity of this approach by screening serum from breast cancer patients for these candidates and verified the presence of several of these “extracellular” proteins. This workflow provides a pragmatic means of prioritizing candidates that may be most suitable for downstream assays such as multiple reaction monitoring.

Keywords: biomarker • model system • informatics • 2D-DIGE • secretion • mass spectrometry • breast cancer

Introduction

Currently, a wealth of cancer biomarker research is conducted using various in vitro cell line model systems emulating one or many phenotypic characteristics.1 These systems are typically easy to establish, maintain and manipulate, which makes them ideal for dissecting pathways and identifying mediators in cancer processes such as invasion, metastasis, angiogenesis, and drug response. These mediators may also have utility as biomarkers to aid detection, disease segregation, and monitoring in cancer. However, model systems lack the biological complexity and variability that are inherent in patient samples. Consequently, due to the cost and difficulties associated with generating clinical assays,2 these candidate biomarkers are rarely evaluated on clinical material. Thus, the clinical utility of the vast majority of these candidates, either individually or as panels, remains unknown.

The lack of clinical evaluation of most candidate markers can be attributed to the numerous difficulties associated with screening protein biomarkers in patient tissues and fluids.3 At the forefront of these issues is the need for high-throughput screening methods, as typical “omic” approaches can generate hundreds to thousands of candidates. Traditionally, proteins of interest are tested using Western blotting or ELISA assays, both of which require the generation and optimization of specific antibodies. The cost and time associated with generating individual assays are therefore considerable and make this option both prohibitively expensive and inefficient for high numbers of proteins. Recently, new methods have been developed using multiple reaction monitoring (MRM) mass spectrometry that facilitate multiplexed detection and quantification of proteins;4,5 however, the full implementation of these methods requires the generation of heavy isotope labeled peptides, which presents similar problems with regard to cost and efficiency.

* To whom correspondence should be addressed. William Gallagher, e-mail: [email protected], tel. +353-1-7166743; fax. +353-1-2837211.
† UCD School of Biomolecular and Biomedical Science, UCD Conway Institute, University College Dublin.
‡ Conway Institute Proteome Research Centre, UCD Conway Institute.
§ Université des Sciences et Technologies de Lille.
| St. Vincent’s University Hospital, Dublin and UCD School of Medicine and Medical Science, University College Dublin.
⊥ UCD School of Chemical and Bioprocess Engineering, University College Dublin.
# UCD School of Medicine and Medical Science, University College Dublin.

Journal of Proteome Research 2010, 9, 1450–1459. Published on Web 12/15/2009. 10.1021/pr900989q. © 2010 American Chemical Society


Considering the high cost of these quantification experiments, there is a need for new approaches to bridge biomarker discovery and clinical assessment. In addition to being resource intensive, clinical validation is further complicated by numerous, well-reviewed technical challenges that arise when attempting to sensitively and robustly analyze protein expression in biological fluids such as serum and plasma.3,6 To address these challenges, several efforts have been made to characterize the “plasma proteome”,7,8 the most well documented being the HUPO Plasma Proteome Project.9

Here, we present a workflow for proteomic discovery of candidate biomarkers in an in vitro cell line model system using 2D difference gel electrophoresis (2D-DIGE), followed by the use of in silico filtering and a priori data to prioritize candidates that may be present in serum. A priori data, including gene ontology, the HUPO plasma proteome database and publicly available microarray data from the analysis of tumor tissues, were incorporated to prioritize candidates likely to be detected in serum and of potential clinical utility. The aim of this workflow is to provide a pragmatic, easily applicable rationale for prioritizing candidates from discovery data. Prioritized candidates are those more likely to be detectable in clinical fluids, such as serum, especially if existing information suggests potential clinical utility. This helps in selecting the most suitable candidates for downstream assays.

This concept of biomarker “verification”, where a protein must first be robustly detected in a clinical fluid, has been previously suggested as an efficient approach to triaging unsuitable candidates.10 Verification improves efficiency by removing candidates that would have been undetectable due to the limited abundance of the protein and the sensitivity of one’s instrumentation. Filtering down to only those proteins that can be robustly detected makes designing and optimizing downstream quantification experiments, such as those using MRM,11 far more streamlined. In addition, by using a priori information, one can prioritize candidates even further by selecting those that already have an inferred role in disease (in this case, breast cancer). This can be achieved by implementing gene ontology tools and interrogating publicly available gene expression data sets. We propose that this workflow can be applied to most biomarker proteomics studies, but to demonstrate this approach, we have analyzed an in vitro model of breast tumor progression.

Breast cancer is one of the most common causes of cancer deaths in women of the developed world. Its lethality is mostly attributed to the primary tumor metastasising to distant sites. One of the key events in metastasis is the invasion of cancer cells into the surrounding stroma and tissue. This event is difficult to analyze in clinical specimens, as it is a complex process and it is often difficult to obtain sufficient numbers of high quality, well-defined specimens that represent progression of the disease. To counter these problems, there have been several proteomics studies using breast cancer cell lines of varying invasive/metastatic potential to replicate the progression of the disease.12,13 Here, we used the isogenic cell line series, Hs578T and Hs578T(i8). The parental cell line, Hs578T, is derived from a mammary epithelial carcinoma, while its derivative, Hs578T(i8), was isolated following stepwise selection for acquired invasive capacity in vitro.14,15 Previous studies have demonstrated that the Hs578T(i8) derivative displays increased invasion and migration in vitro by comparison to its parental cell line. In these studies, the Hs578T(i8) derivative was also shown to be more tumorigenic in vivo. The approach presented here could increase the transition of candidate markers from model systems, such as these, toward clinical assessment.

Materials and Methods

Cell Lines. The Hs578T cell line was purchased from the American Type Culture Collection. The invasive derivative, Hs578T(i8), was created and characterized in the McDonnell laboratory as previously described.14,15 All cell lines were cultured in Dulbecco’s Modified Eagle Medium (Gibco) supplemented with 10% fetal bovine serum (Gibco), 1% penicillin/streptomycin (50 units/mL) (Gibco), 1% L-glutamine (2 mM) (Gibco) and 1% insulin (Gibco). All cells were maintained at 37 °C with 5% CO2. Cells were regularly tested to ensure they were free of Mycoplasma contamination.

Whole Cell and Conditioned Media Protein Preparation. For the preparation of whole cell extracts, cells were grown to ∼80% confluency. Media was removed and the cells were washed 3 times with ice-cold isotonic sucrose (0.35 M). Cells were then recovered by scraping and pelleted by centrifugation. The cell pellet was then resuspended in lysis buffer (9.5 M urea, 2% CHAPS, 20 mM Tris pH 8.5, protease inhibitor cocktail) and incubated at room temperature for 15 min. For the harvesting of conditioned media, cells were grown to ∼90% confluency (10 × 10^6 cells). Media was removed and cells were washed 3 times with PBS buffer and then with serum-free Dulbecco’s Modified Eagle Medium (DMEM) (Thermo). The cells were then cultured in serum-free media for 48 h. Following this, the media was removed and the protein precipitated using a TCA-sodium lauroyl sarcosinate protocol taken from Chevallet et al.16 Protein pellets were resuspended in lysis buffer. Whole cell protein extracts were processed using a 2-D Clean Up kit (GE Healthcare) as per the manufacturer’s instructions. Protein precipitates were resuspended in lysis buffer with the aid of gentle sonication. All cell culture extracts were adjusted to ∼pH 8.8 and protein concentration was determined by Bradford assay.17

2D-Difference In-Gel Electrophoresis (DIGE). A two-dye DIGE strategy18 was used whereby each gel had 25 µg protein of experimental sample (either Hs578T or Hs578T(i8)) minimally labeled with 200 pmol Cy3 CyDye (GE Healthcare) and a 25 µg reference standard similarly labeled with Cy5 CyDye. This reference sample served as an internal standard for normalization across gels and was composed of a pool of all samples. Labeling was conducted as per the manufacturer’s instructions. For each experiment, 10 gels were electrophoresed, with 5 biological replicates of both Hs578T and Hs578T(i8). Labeled protein samples were made up to 450 µL in rehydration buffer (8 M urea, 0.5% CHAPS, 0.2% dithiothreitol (DTT), 0.2% Pharmalyte). Samples were passively hydrated into pH 4-7 Immobiline 24 cm strips (GE Healthcare) overnight. These were isoelectrically focused on an Ettan IPGphor 3 (GE Healthcare) at 3500 V up to 75 kVh, a 10 min gradient to 8000 V, 8000 V for 10 min, and then a 100 V hold. Following focusing, strips were equilibrated in equilibration buffer (6 M urea, 30% glycerol, 2% SDS, 0.05 M Tris buffer pH 8.8), supplemented with 1% DTT for 20 min and then 2.5% iodoacetamide (IAA) for 20 min. IPG strips were then placed on 12% SDS-PAGE gels, sealed with bromophenol blue stained 1% agarose, and separated overnight in a Dodeca Cell (Bio-Rad) at 1 W per gel. Fluorescence was subsequently visualized on a Typhoon 9410 scanner (Amersham) at 100 µm resolution using excitation/emission wavelengths as per the manufacturer’s instructions.

Gel Image Analysis and Protein Identification of Gel Features. 2D-DIGE images were analyzed using Progenesis Samespots 3.2 (Nonlinear Dynamics). This software detected gel features, then warped and aligned gel replicates. Normalized spot volumes and ANOVA values between samples were calculated. Gel features with fold changes of ±1.3-fold and a P value < 0.05 were deemed differentially expressed and then confirmed by manual inspection. To generate gels with sufficient protein for identification by mass spectrometry, preparative gels were run. Four hundred micrograms of pooled sample was separated by 2D electrophoresis as before and proteins were detected via PlusOne silver stain (GE Healthcare). Gel features of interest that could be matched to those on the corresponding DIGE images were excised, destained with 100 mM sodium thiosulphate and 30 mM potassium ferricyanide (in a 1:1 mix), and digested with trypsin (Promega) overnight at 37 °C using an enzyme:substrate ratio of 1:50. The resulting peptides were lyophilized for storage and resuspended in 10 µL of 0.1% formic acid prior to LC-MS/MS analysis. Eight microliters of each sample was separated using a 20 min gradient on a 150 mm × 75 µm C18 nano-LC chip coupled to an Agilent 6520 Q-ToF. Gradient elution was conducted using Buffer A (3% acetonitrile, 0.1% formic acid) and Buffer B (90% acetonitrile, 0.1% formic acid), at a flow rate of 300 nL/min under the following program: 5-35% B 0-9 min, 35-90% B 9-12 min, hold 90% B 12-15 min, 90-0% B 15-18 min, followed by column reconditioning for 2 min. The mass spectrometer was set to “Auto-MS” mode where it performed MS/MS on the 8 highest intensity precursor ions.
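The gradient program given above can be read as a piecewise schedule of Buffer B composition against time. The following is a minimal Python sketch of that reading, not the instrument method itself. It assumes linear ramps between the stated breakpoints, and it leaves out the final 2 min of column reconditioning because the text does not state the composition used for that step. The names `GRADIENT` and `percent_b` are illustrative only.

```python
# Breakpoints (minute, %B) transcribed from the gradient program in the text.
# Assumption: composition changes linearly between consecutive breakpoints.
GRADIENT = [(0, 5), (9, 35), (12, 90), (15, 90), (18, 0)]


def percent_b(minute: float) -> float:
    """Return the %B at a given time by linear interpolation between breakpoints."""
    if minute < GRADIENT[0][0] or minute > GRADIENT[-1][0]:
        raise ValueError("time outside the 0-18 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= minute <= t1:
            return b0 + (b1 - b0) * (minute - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")
```

For example, `percent_b(4.5)` falls halfway along the first ramp and `percent_b(13)` falls within the 90% B hold.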
Spectral data was searched against the NCBI IPI Human database (version 3.54) using Agilent Spectrum Mill software (RevA.03.03.082) allowing for oxidized methionine and carboxymethylated cysteine. Identifications which had a Spectrum Mill protein score of >25 (using an SPI of 60%) were accepted as valid. Where two or more proteins scored over 25, protein identifications were manually verified by pI and molecular weight using existing 2D-gel identification databases (www.uniprot.org) if possible, or otherwise declared inconclusive.

In Silico Prioritization of Candidates. Proteins were considered as being “extracellular” if they fulfilled at least one of three criteria: they (i) contained a SignalP19 site, (ii) displayed “extracellular” Gene Ontology annotation, or (iii) were present in the HPPP database. Gene Ontology and SignalP sites were screened using the data mining tools BioMart20 (version 0.7, www.biomart.org) and the DAVID Bioinformatics Resource21 (http://david.abcc.ncifcrf.gov). The HPPP database, consisting of 3020 proteins, was downloaded from the PeptideAtlas22 (www.peptideatlas.org/hupo/hppp/).

Integration of Public Gene Expression Data. Clinical and histopathological data for all 295 patients in the van de Vijver data set23 were downloaded from Rosetta Inpharmatics (www.rii.com). The log ratios of gene expression values were extracted without modification. The genes corresponding to each candidate protein biomarker were identified (where possible), and the expression of each of these genes in tumor samples was then classified (high versus low) using a previously described method.24 Kaplan-Meier survival curves were generated for each gene of interest and the log rank test was used to test for significant correlation between gene expression levels and survival outcome. Tests scoring with a P-value of [...] 50%.

Mass Spectra Data Archival. The data is available in the PRIDE database26 (www.ebi.ac.uk/pride) under accession numbers 10531 to 10536. The data was converted using PRIDE Converter27 (http://code.google.com/p/pride-converter).
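The survival analysis described under Integration of Public Gene Expression Data can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The high/low classification in the paper follows a previously described method (ref 24) that is not detailed here, so the median split in `split_high_low` is purely an assumption, and all function names are hypothetical. The Kaplan-Meier estimator and the two-group log-rank test follow their standard textbook definitions, with the P value taken from a chi-square distribution with 1 degree of freedom.

```python
import math
from statistics import median


def split_high_low(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    Assumption: a median cut-point; the paper's actual method (ref 24) may differ.
    """
    cut = median(expression)
    return ["high" if value > cut else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 (or True) for an observed event and 0 for a censored sample.
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the variance term is undefined when one sample remains
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    statistic = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return statistic, math.erfc(math.sqrt(statistic / 2))
```

In use, each gene's samples would be labeled with `split_high_low`, the follow-up times and event indicators divided by label, and the two groups passed to `log_rank`; a gene would be flagged only when the returned P value falls below the chosen significance threshold.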


Results and Discussion

Differential Protein Expression in Whole Cell Extracts and Conditioned Media from an Isogenic Model of Breast Cancer Progression. To identify proteins associated with breast cancer progression, a 2D-DIGE approach was used to separate and quantify differential protein expression within the isogenic cell line series, Hs578T (parental line) and Hs578T(i8) (invasive derivative). As our ultimate goal was to identify candidate serum biomarkers, we profiled the conditioned media in addition to analyzing the whole cell extracts of these cells. This facilitated the analysis of differential secretion in the model system. “Secretome” analysis has been a popular approach for biomarker discovery,13,28–30 as secreted or shed proteins are more likely to be released from tumors and be detectable in the bloodstream.

Whole cell protein extracts were prepared from Hs578T and Hs578T(i8) cells (5 biological replicates each). Following labeling and 2D-DIGE separation, gels were scanned and analyzed using Progenesis Samespots. The software identified 104 differentially expressed gel features using a fold change cutoff of > ±1.3-fold and a P value of ≤0.05. Posthoc statistical analysis of the gels was conducted to ascertain the statistical power of the study, which revealed a q value of >0.8 in over 80% of the data. The q value is the false discovery rate (FDR) analogue of the p value, where the q value of an individual hypothesis test is the minimum FDR at which the test may be called significant.18 Sixty-three gel features were successfully matched to silver stained preparative gels and were then excised, destained and subjected to in-gel tryptic digestion. The resulting peptides were analyzed by LC-MS/MS, and 46 samples gave unique, successful protein identifications. A similar experimental design was applied to analyze the conditioned media collected from the same isogenic model system. Using identical criteria, this resulted in 51 differentially expressed gel features, of which 25 yielded successful, unique protein identifications (see Figure 1 and Supporting Information). Combined, 88 gel features were successfully identified, of which 64 were unique proteins (see Tables 1 and 2 for details of protein identifications in the whole cell and conditioned media extracts). This figure accounts for redundancy arising from multiple gel features being identified as the same protein and from proteins that were identified in both the whole cell and secretome data sets. Of the 88 identifications, 47 represented proteins downregulated from the parental to the invasive derivative cell line and 41 represented upregulated proteins.

Figure 1. 2D-DIGE separation and identification of differentially expressed proteins in an isogenic cell line model of breast cancer progression. (Upper) Representative gel from the conditioned media experiment; (Lower) representative gel from a whole cell extract. Successfully identified gel features have been indicated with corresponding proteins. Gel features that were found to be upregulated are in green and downregulated features in red.

Gene Ontology, Literature Mining, and Corresponding Clinical Gene Expression of Candidate Markers. We reasoned that the prioritization of which candidates to progress toward clinical assessment could be assisted by a priori information about protein function and previous involvement with the cancer process. Such information may also be useful for testing the validity of one’s model system by examining whether the differentially expressed pathways match those expected of the model. Proteins of interest discovered by 2D-DIGE were reviewed using Gene Ontology (GO) analysis. This revealed that differentially expressed proteins had roles in apoptosis, motility, growth, development, proteolysis and other cellular events (Figure 2). Many of the identified differentially expressed proteins have already been associated with cancer, invasion and tumor progression in previous studies, such as the Cathepsins,31 STC2,32 NME1-2,32 CAPG,33 SPARC,34 STRAP,35,36 and PCOLCE.37 Several other identified proteins were also previously linked with stress and immune responses, such as the heat shock proteins HSPB1, HSPA8 and several chaperones. These differentially regulated processes correlate well with the expectations of a model of cancer invasion. In addition, GO analysis was used to demonstrate that successful enrichment of secreted proteins in the conditioned media was achieved: 28% of identified proteins in the conditioned media had “extracellular” gene ontology, as opposed to 6.5% in the whole cell extracts.

To assess potential clinical value, the candidates were tested for significant prognostic associations using the van de Vijver et al. data set.38 This is a large publicly available gene expression data set derived from transcriptomic analysis of tumors from 295 breast cancer patients. For each of the 64 candidates generated from the 2D-DIGE studies, the corresponding gene expression and clinical data were matched (where possible) and interrogated for correlations between gene expression and survival. Using this data set, gene expression data were matched for 48 proteins, of which 16 candidates showed significant positive or negative correlation with survival at the mRNA level. Upregulation of 4 of these genes was associated with good prognosis, while 12 were associated with poor prognosis (see Figure 3 and Tables 1 and 2). Although not always comparable to the prognostic 70-gene expression


Table 1. Protein Identifications from the 2D-DIGE Whole Cell Extract Analyses^a

protein id   fold p→i8  score   peptides  AA coverage  Anova           van de Vijver  prognosis      accession
ARHGDIA      -1.6       120.41  7         24           4.19 × 10^-4    Y              poor           IPI00794402
ARPC5        -1.8       54.22   4         25           0.003           Y              insignificant  IPI00550234
C19orf10     -1.3       104.05  2         10           0.01            N              -              IPI00056357
CAPG         -1.5       176.94  10        28           4.54 × 10^-4    N              -              IPI00848090
CAPNS1       4.1        64.58   4         13           2.29 × 10^-5    Y              insignificant  IPI00025084
CAPZB        -1.5       171.67  10        21           0.002           Y              insignificant  IPI00218782
CCT2         1.4        502.47  26        53           6.10 × 10^-4    Y              poor           IPI00297779
CLIC1        1.3        182.24  10        39           0.002           N              -              IPI00010896
CLIC1        -1.8       85.2    5         19           1.34 × 10^-4    N              -              IPI00010896
COTL1        -1.3       134.61  2         10           0.014           N              -              IPI00017704
CTSD         1.8        32.75   2         4            0.003           Y              insignificant  IPI00011229
CTSD         -1.4       28.52   2         4            4.193 × 10^-4   Y              insignificant  IPI00011230
DPYSL3       -1.3       360.71  21        42           0.034           Y              insignificant  IPI00029111
DYNLRB1      -1.3       31.91   2         18           0.037           Y              insignificant  IPI00412497
EEF1B2       -1.3       95.33   6         30           0.006           Y              poor           IPI00178440
EEF1D        -1.3       156.63  10        14           0.034           Y              insignificant  IPI00642971
EFHD2        -1.3       83.22   5         18           0.025           N              -              IPI00060181
EIF3I        -1.6       177.98  10        39           4.29 × 10^-5    N              -              IPI00012795
EIF4E        1.3        35.12   2         8            0.005           Y              insignificant  IPI00908416
EIF6         -1.3       101.55  6         33           0.006           N              -              IPI00010105
ETHE1        1.5        47.5    4         18           0.009           Y              insignificant  IPI00003766
GLO1         -1.9       48.04   4         16           4.037 × 10^-6   Y              insignificant  IPI00220766
GSTO1        1.4        140.99  8         26           0.03            Y              insignificant  IPI00019755
LGALS1       -1.7       89.62   6         35           1.27 × 10^-4    Y              insignificant  IPI00219219
LMNA         -1.6       534.7   30        36           1.63 × 10^-4    Y              poor           IPI00021405
M6PRBP1      -1.3       237.48  12        32           0.036           N              -              IPI00303882
MTPN         -1.4       69.87   5         37           3.87 × 10^-5    Y              poor           IPI00179589
NME2:NME1    -1.9       132.93  8         44           5.951 × 10^-7   Y              good           IPI00375531
NPM1         -1.4       176.54  11        29           0.002           Y              good           IPI00549248
PARK7        -1.3       83.77   5         25           0.004           Y              insignificant  IPI00298547
PFDN1        -1.4       45.43   3         21           5.66 × 10^-4    Y              insignificant  IPI00000051
PFN2         -1.4       47.43   3         12           0.002           Y              insignificant  IPI00107555
PPA1         -1.3       184.91  11        33           0.002           N              -              IPI00015018
PRDX2        -1.4       61.34   4         19           0.001           Y              good           IPI00027350
PSMB4        -1.3       112.69  6         30           0.025           Y              poor           IPI00555956
RANBP1       -1.5       77.49   4         12           5.156 × 10^-6   Y              insignificant  IPI00878611
RELA         -1.6       38.82   3         3            0.001           Y              insignificant  IPI00386448
SH3GL2       -2         178.27  11        25           5.26 × 10^-5    Y              insignificant  IPI00019169
SNRPF        -2.1       37.88   2         24           0.034           N              -              IPI00220528
SPARC        1.3        95.16   6         13           0.008           Y              insignificant  IPI00014572
TMOD3        -1.3       243.81  15        41           0.002           Y              insignificant  IPI00005087
TPM4         1.3        54.62   5         23           0.012           Y              insignificant  IPI00010779
TUBB         -1.4       366.23  19        41           1.14 × 10^-4    Y              insignificant  IPI00011654
TXN          -1.6       47.74   3         22           0.002           Y              poor           IPI00216298
TXNDC5:MUT   -1.4       150.93  9         15           6.16 × 10^-4    N              -              IPI00171438
TXNL1        -1.3       192.72  11        36           6.65 × 10^-4    Y              insignificant  IPI00642032
VIM          -1.4       395.93  24        51           0.003           Y              insignificant  IPI00418471
YWHAE        -1.7       275.27  18        56           2.70 × 10^-4    Y              insignificant  IPI00000816

^a Protein identifications for whole cell extracts. Where multiple gel feature identifications have been duplicated, the one with the most significant P value has been displayed. Unique peptides observed and amino acid coverage are also indicated. Protein scores were calculated using Spectrum Mill Rev A 03.03.082. It is also indicated whether proteins were successfully matched to corresponding gene expression data in the van de Vijver et al. data set and whether this gene expression profile showed a significant positive or negative correlation with patient survival outcome.
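The selection rule behind the entries in this table (a fold change of about ±1.3 or more together with an ANOVA P value below 0.05) can be written as a short filter. The Python sketch below is illustrative only: the names `GelFeature`, `is_differential` and `direction` are hypothetical, and treating the fold-change boundary as inclusive is an assumption made because several tabulated values are reported as exactly 1.3 after rounding.

```python
from typing import NamedTuple


class GelFeature(NamedTuple):
    protein: str
    fold_change: float  # signed: negative means lower in Hs578T(i8) than in Hs578T
    anova_p: float


def is_differential(feature: GelFeature, min_fold: float = 1.3, alpha: float = 0.05) -> bool:
    """Apply the fold-change and P value cutoffs (inclusive fold boundary assumed)."""
    return abs(feature.fold_change) >= min_fold and feature.anova_p < alpha


def direction(feature: GelFeature) -> str:
    """Report the direction of change from the parental line to the i8 derivative."""
    return "up" if feature.fold_change > 0 else "down"
```

For instance, `GelFeature("CAPNS1", 4.1, 2.29e-5)` and `GelFeature("GLO1", -1.9, 4.037e-6)`, taken from the table, both pass the filter, whereas a feature with a 1.1-fold change would not, however small its P value.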

signature generated from the original data set,23 these results were an early indicator that the identified proteins may have potential prognostic utility in breast cancer.

In Silico Serum Protein Prediction. Our ultimate goal was to confirm the presence in clinical samples of candidate biomarkers that had been identified from the in vitro system. Since blood is one of the most accessible patient specimens, our approach was to prioritize those biomarker candidates likely to be detected in serum. To achieve this, an in silico approach was used to mine for proteins that were predicted to have a signal sequence, had extracellular ontology or had been previously detected in plasma. The 64 unique candidates were split into 3 groups: those that were found in conditioned media, those found in the whole cell extract and a third group composed of those that were predicted to be extracellular. “Extracellular” proteins were defined through 3 possible methods: a predicted SignalP site, “extracellular” gene ontology or presence in the HPPP database.9 Using this approach, we prioritized a subset of 26 “extracellular” proteins to be more likely to be detectable in


Table 2. Protein Identifications from the 2D-DIGE Conditioned Media Extract Analyses^a

protein id   fold p→i8  score   peptides  AA coverage  Anova           van de Vijver  prognosis      accession
ACTG1        1.6        190.06  12        36           0.003           N              -              IPI00021440
ATP5B        1.8        213.53  13        26           0.005           N              -              IPI00303476
C17orf13     2.6        43.67   4         4            0.013           N              -              IPI00296165
CAPG         1.4        153.88  10        25           0.004           N              -              IPI00027341
CCT2         1.3        167.51  12        23           0.013           Y              poor           IPI00297779
COL1A1       -1.6       126.11  7         4            0.008           N              -              IPI00297646
COL1A2       -1.5       125.08  8         5            3.58 × 10^-4    Y              insignificant  IPI00873137
COTL1        1.7        88.99   6         26           4.758 × 10^-5   N              -              IPI00017704
CTSB         -4.4       60.73   5         12           1.775 × 10^-4   Y              insignificant  IPI00295741
CTSD         1.7        190.2   11        26           3.58 × 10^-4    Y              insignificant  IPI00011229
CTSL1        1.9        144.33  8         21           2.86 × 10^-4    Y              poor           IPI00012887
EIF5A        1.7        51.11   4         20           4.42 × 10^-4    Y              insignificant  IPI00376005
FABP5        2          108.8   7         45           0.01            Y              poor           IPI00007797
HSPA8        1.8        387.87  23        34           3.03 × 10^-7    Y              insignificant  IPI00003865
HSPB1        1.7        186.41  13        48           0.002           Y              insignificant  IPI00025512
NME2:NME1    1.5        141.72  9         46           0.002           Y              poor           IPI00375531
PCOLCE       -1.4       126.75  7         17           0.011           Y              insignificant  IPI00299738
PA2G4        1.5        174.81  12        24           0.003           Y              insignificant  IPI00807557
PSMB6        1.5        77.19   6         21           0.012           Y              poor           IPI00000811
SPARC        1.4        199.95  12        32           0.013           Y              insignificant  IPI00014572
STC2         1.5        40.03   3         8            0.009           Y              good           IPI00008780
STRAP        2.1        224.88  14        50           0.002           Y              poor           IPI00294536
TPI1         1.6        87.23   7         28           2.08 × 10^-5    Y              poor           IPI00465028
TXN          1.9        29.59   2         20           0.003           Y              poor           IPI00216298
VCP          1.6        77.57   9         7            1.38 × 10^-4    Y              insignificant  IPI00871453

^a Protein identifications for conditioned media. Where multiple gel feature identifications have been duplicated, the one with the most significant P value has been displayed. Unique peptides observed and amino acid coverage are also indicated. Protein scores were calculated using Spectrum Mill Rev A 03.03.082. It is also indicated whether proteins were successfully matched to corresponding gene expression data in the van de Vijver et al. data set and whether this gene expression profile showed a significant positive or negative correlation with patient survival outcome.
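The “extracellular” rule applied to the candidates in Tables 1 and 2 (a predicted SignalP site, “extracellular” gene ontology, or presence in the HPPP database) amounts to a logical OR over three evidence sources. A minimal Python sketch of that rule follows. It assumes the three sources have already been exported as plain sets of protein identifiers; the function and parameter names are hypothetical, and the sketch does not call BioMart, DAVID or PeptideAtlas.

```python
def classify_extracellular(candidates, signalp_hits, go_extracellular, hppp_proteins):
    """Map each prioritized candidate to the evidence that marks it 'extracellular'.

    A candidate is kept if it appears in at least one of the three sets.
    Candidates with no supporting evidence are omitted from the result.
    """
    sources = (
        ("SignalP", signalp_hits),
        ("GO extracellular", go_extracellular),
        ("HPPP", hppp_proteins),
    )
    evidence = {}
    for protein in candidates:
        supporting = [label for label, members in sources if protein in members]
        if supporting:
            evidence[protein] = supporting
    return evidence
```

Keeping the list of supporting sources, rather than a bare yes/no flag, is a design choice of this sketch: it would let candidates backed by several independent lines of evidence be ranked ahead of those backed by only one.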

Figure 2. Biological processes related to the identified proteins. Gene ontology analysis of all 64 unique proteins was conducted using DAVID tools.21 With the exception of “cellular proliferation” and “cellular homeostasis” (P value 0.07 and 0.06, respectively), all indicated fields showed a significant enrichment of P value