Secretome Analysis of Multiple Pancreatic Cancer Cell Lines Reveals Perturbations of Key Functional Networks
Silvia Schiarea,† Graziella Solinas,‡ Paola Allavena,‡ Graziana Maria Scigliuolo,† Renzo Bagnati,† Roberto Fanelli,† and Chiara Chiabrando*,†
Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche “Mario Negri”, Via La Masa 19, 20156 Milano, Italy, and Department of Immunology and Inflammation, IRCCS Istituto Clinico Humanitas, Via Manzoni 113, Rozzano 20089 Milano, Italy
Received February 8, 2010
The cancer secretome is a rich repository in which to mine useful information for both cancer biology and clinical oncology. To help understand the mechanisms underlying the progression of pancreatic cancer, we characterized the secretomes of four human pancreatic ductal adenocarcinoma (PDAC) cell lines versus a normal counterpart. To this end, we used a proteomic workflow based on high-confidence protein identification by mass spectrometry, semiquantitation by a label-free approach, and network enrichment analysis by a systems biology tool. Functional networks significantly enriched with PDAC-dysregulated proteins included not only expected alterations within key mechanisms known to be relevant for tumor progression (e.g., cell-cell/cell-matrix adhesion, extracellular matrix remodeling, and cytoskeleton rearrangement), but also other extensive, coordinated perturbations never before observed in pancreatic cancer. In particular, we highlighted perturbations possibly favoring tumor progression through immune escape (i.e., inhibition of the complement system, deficiency of selected proteasome components within the antigen-presentation machinery, and inhibition of T cell cytotoxicity), and a defective protein folding machinery. Among the proteins found concordantly oversecreted in all of our PDAC cell lines, many are reportedly overexpressed in pancreatic cancer (e.g., CD9 and Vimentin), while others (PLOD3, SH3L3, PCBP1, and SFRS1) represent novel PDAC-secreted proteins that may be worth investigating.
Keywords: proteomics • secretome • mass spectrometry • label-free quantitation • cell culture • pancreatic cancer • network enrichment analysis
* To whom correspondence should be addressed. Chiara Chiabrando, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche “Mario Negri”, Via La Masa 19, 20156 Milano, Italy. Tel: +39-0239014497. Fax: +39-0239014735. E-mail: [email protected].
† Istituto di Ricerche Farmacologiche “Mario Negri”. ‡ IRCCS Istituto Clinico Humanitas.
Journal of Proteome Research 2010, 9, 4376–4392. Published on Web 07/15/2010. DOI: 10.1021/pr1001109. © 2010 American Chemical Society
Introduction Cancer is a highly heterogeneous disease that still represents a major worldwide cause of morbidity and mortality despite the successful management of some tumor types. A better understanding of biological features that are common or peculiar to different tumors could help devise new targeted therapeutic approaches and allow the identification of specific and sensitive prognostic/predictive biomarkers for early diagnosis and tumor progression monitoring. This would be particularly relevant for pancreatic cancer, a top-ranked cause of cancer deaths in both the U.S.A. and Europe.1,2 The extremely high mortality rate (>80%) of pancreatic cancer is mainly due to early metastasis,3 resistance to conventional treatments,4 and lack of recognizable symptoms and tests for early detection.5 A recently devised strategy to gain better knowledge of the molecular actors involved in tumor progression is the investigation of the “cancer secretome”.6,7 The analysis of tissue-proximal fluids or conditioned media of cell lines makes it possible to identify proteins abnormally secreted or shed by neoplastic cells7-10 which may play a role in favoring tumor growth and spread. The signaling pathways involved in the process of cancer initiation and progression are in fact not restricted to neoplastic cells but often include the tumor-host interface.11 The concept that cancer cells secrete factors able to alter the adjacent stroma toward a permissive and supportive environment for tumor progression is now well established.12-14 The attempt to unravel mechanisms of tumor cell-stroma interactions is particularly relevant for pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, because the disease is characterized by the presence of a dense stroma.15,16 Different therapeutic strategies targeting proteins that are responsible for tumor-microenvironment interactions represent promising tools for treating pancreatic cancer. For example, some current approaches are aimed at disabling PDAC-associated fibroblasts, including a DNA vaccine against a fibroblast activation protein,17 or at blocking the TGFβ signaling cascade,18 or at inhibiting COX-2, a protein expressed by tumor and inflammatory cells which stimulates inflammatory pathways and angiogenesis.15
Mass-spectrometry-based proteomics coupled to bioinformatic tools, including innovative systems biology platforms, currently offers the most powerful means to characterize cell secretomes. The global study of secretomes is now frequently addressed with different mass spectrometry-based approaches8,9,19-23 to find proteins oversecreted by cancer cells with the potential to become valid cancer biomarkers. We present here a proteomic workflow to globally compare secretomes of multiple cancer cell lines of common histological origin, with the aim of highlighting protein networks that are consistently perturbed in relation to a normal counterpart. Cell lines were chosen as a model for cancer secretome analysis, because they represent an attractive alternative to tumor-secreted fluids that would be difficult to access and easily contaminated by stromal components. Moreover, pancreatic cancer cell lines have been shown to retain gene expression patterns similar to those of the primary tumors across time and between institutions.24 Given that cancer cell lines of common histological origin often show a great deal of biological diversity,25 likely reflecting both the interindividual host diversity of the starting tumor specimens, and the known genotypic/phenotypic heterogeneity of neoplastic cells within a tumor,26 the analysis of multiple cell lines is required to generalize the conclusions. We therefore needed to obtain a comprehensive, high-confidence, semiquantitative description of the secretomes under study, allowing us to appreciate, with sufficient accuracy and broad dynamic range, the common relevant features that are peculiar to the cancer cell lines.
When combining the various analytical and logical strategies of this workflow, we considered the high quality of mass spectral data and high confidence of protein identification as the most important features, and thus used state-of-the-art instrumentation (LTQ-Orbitrap) and strict identification parameters. Similarities and differences across the secretomes were assessed by a convenient label-free semiquantitative index (emPAI).27 This strategy is an attractive choice for an initial assessment of perturbations in the proteome, because it is simple and fast, and therefore applicable to multiple samples. Moreover, label-free strategies correlate well with protein abundance in complex samples.27-29 Another feature we considered important is the possibility to easily recover sequence information about the different protein species identified (e.g., full length vs splice variants, isoforms or fragments). This protocol in fact makes it possible to combine 1DE migration information with MS/MS data from gel bands, a type of information normally unavailable in shotgun gel-free proteomic methods. Using a network enrichment analysis tool, we finally sought to highlight the functional connections of the dysregulated proteins in PDAC cell line secretomes. The final aim of this exploratory survey was to reveal PDAC-perturbed functional networks possibly relevant for tumor progression, which could later be tested as targets for novel tumor-specific therapies and as a source of candidate biomarkers.
Materials and Methods Cell Culture. The human pancreatic ductal carcinoma cell lines AsPC1, MiaPaCa2, and PANC1 were obtained from American Type Culture Collection (ATCC, Manassas, VA). The PT45 cell line was kindly given by Professor Aldo Scarpa (University of Verona, Italy). The immortalized epithelial cell line derived from normal human pancreatic ducts, HPDE6, was obtained from Dr. Ming-Sound Tsao (University of Toronto, Toronto, Ontario, Canada). HPDE6 has been previously shown to maintain the phenotypic and genotypic characteristics of normal human pancreatic ducts.30 Cells were cultured at 37 °C in RPMI 1640 supplemented with 5% fetal bovine serum (FBS, Lonza BioWhittaker, Basel, Switzerland). Once the cells were grown to 90% confluence, the medium was discarded and flasks were rinsed seven times with sodium chloride 0.9% (Bieffe Medital, Lugano, Switzerland) to reduce the amount of FBS-derived proteins.31 Cells were then incubated at 37 °C with fresh FBS-free RPMI, and after 24 h the cell-conditioned medium (CM) was collected. Cell viability was tested after 24 h starvation of identically cultured cells (with or without the washing steps) by collecting CM and counting dead cells stained by Trypan blue. Contamination of tumor cells with Mycoplasma was rigorously ruled out by PCR throughout the study, and only Mycoplasma-free cultures were considered. Sample Preparation. The CM from three flasks per cell line (each containing about 2.5 × 10⁷ cells with 35 mL CM) was collected, pooled, supplemented with a protease inhibitor mixture (complete EDTA-free protease inhibitor cocktail tablets, Roche, Mannheim, Germany), and filtered on a 0.2 µm filter (ALBET, Barcelona, Spain). A portion (10 mL) of the pooled CM from each of the five cell lines under study was concentrated using 3 kDa MWCO Amicon Ultra centrifugal filter devices (Millipore, Billerica, MA) at 5000 × g (4 °C). Each retentate (0.2 mL) was then buffer-exchanged with water three times, transferred into a 5 kDa MWCO Ultrafree-0.5 Centrifugal Filter Unit (Millipore, Billerica, MA), and concentrated to 20 µL (12 000 × g, 4 °C). Protein concentration was measured with a modified Bradford method32 (Bio-Rad Protein microassay). One-Dimensional Gel Electrophoresis (1DE). Samples (20 µg protein) were loaded and run on a 1-mm thick, 4-12% NuPAGE Bis-Tris gel (Invitrogen, Carlsbad, CA) using NuPAGE MOPS SDS Running Buffer.
The gel was fixed and stained using Colloidal Coomassie blue (Invitrogen), according to the manufacturer’s instructions. The gel was washed overnight in water. Protein In-Gel Digestion. Each gel lane was cut manually with a sterile surgical blade into 24 bands of equal height (about 3 mm). In-gel protein digestion was performed according to Shevchenko et al.,33 with minor modifications. Briefly, each band was placed into an Eppendorf tube, crushed into very small fragments, and subjected to the following steps: (1) washing with 150 µL water and then dehydration with 150 µL acetonitrile (AcN) twice; (2) reduction of cysteine residues with 10 mM dithiothreitol in 0.1 M ammonium bicarbonate (AmBic) (30 min, 56 °C), then washing three times with AcN; (3) alkylation of reduced cysteine residues with 55 mM iodoacetamide in 0.1 M AmBic (20 min, RT, in the dark), then washing three times with 0.1 M AmBic; (4) dehydration three times with 100% AcN; (5) drying in a vacuum centrifuge. The digestion buffer consisted of 12.5 ng/µL sequencing-grade bovine trypsin (Roche, Mannheim, Germany) in 50 mM AmBic with 5 mM CaCl2. Enough digestion buffer to cover the gel pieces was added to each tube, and samples were incubated for one hour at 4 °C. The digestion buffer not absorbed by the gel plugs was removed and replaced by 20 µL of the same buffer without trypsin. Digestion was performed overnight at 37 °C. To maximize peptide recovery, after collecting the peptide-rich supernatant, the gel plugs were extracted twice (37 °C, 15 min) with 60 µL of 5% formic acid in AcN. The two peptide extracts were pooled, dried and redissolved in the supernatant previously collected (20 µL). This final sample, hereafter referred to as “digest”, thus contained all peptides recovered from the digestion of a single gel band. Liquid Chromatography-Tandem Mass Spectrometry. Aliquots (2 µL) of the 120 digests obtained from the CM of the 5 cell lines were directly analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Analyses were carried out with an LTQ Orbitrap XL (Thermo Scientific, Waltham, MA) interfaced with a 1200 series capillary pump (Agilent, Santa Clara, CA). Peptides were separated on a Thermo Scientific Biobasic 18 column (150 × 0.18 mm ID, particle size 5 µm). LC conditions were: column flow 2 µL/min; eluent A, H2O and 0.1% formic acid; eluent B, AcN and 0.1% formic acid; gradient program, from 2% of B to 60% of B in 40 min, then to 98% of B in 6 min (held for 4 min), and re-equilibration to 2% of B for 24 min. MS conditions were: source, DESI Omni Spray (Prosolia, Indianapolis, IN) used in positive-ion nanospray mode; ion spray voltage, 2100 V; interface capillary temperature, 220 °C; capillary voltage, 42 V. MS spectra (m/z 400-2000) were acquired in the Orbitrap analyzer at 60 000 resolution, in parallel with low-resolution MS/MS scans of the four most abundant precursor ions acquired in the LTQ, excluding singly charged ions. The lock-mass option was used to obtain the most accurate mass measurements in MS mode. The polydimethylcyclosiloxane ion generated in the electrospray process from ambient air (protonated (Si(CH3)2O)6, m/z 445.120025) was used for internal recalibration in real time. MS/MS analysis was performed in data-dependent mode using XCalibur software (Thermo Scientific), with dynamic exclusion and 30 s exclusion duration.34 Preparing MS/MS Data for Protein Identification. All individual MS/MS spectra in an LC run were exported into dta files by Bioworks browser 3.3.1 (Thermo Scientific, Waltham, MA).
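The lock-mass value quoted above and the parts-per-million tolerances used in this workflow can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the published workflow: the function names are hypothetical, and the isotope and proton masses are standard reference values rather than values taken from this article.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
# Standard reference values, not taken from this article.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "O": 15.99491461956,
    "Si": 27.9769265325,
}
PROTON = 1.00727646688  # mass of a proton (u)


def protonated_mz(composition, charge=1):
    """m/z of an [M + zH]z+ ion for an elemental composition {element: count}."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return (neutral + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=2.0):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# (Si(CH3)2O)6 corresponds to the elemental composition C12 H36 O6 Si6.
LOCK_MASS = protonated_mz({"C": 12, "H": 36, "O": 6, "Si": 6})
print(f"lock mass m/z = {LOCK_MASS:.6f}")
```

At m/z 445, a 2 ppm window corresponds to roughly ±0.0009 m/z units, which gives a sense of the mass accuracy this workflow relies on.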
MS/MS spectra originating from the same precursor ion were grouped together using a tolerance of 2 ppm for MS/MS spectra with at least 100-counts intensity and at least 10 ions, with automatic assignment of charge state. Dta files were merged and submitted as an “mgf” file to the search engine Mascot (in-house version 2.2, Matrix Science, Boston, MA). In a classical “1DE-centric” approach, the 120 mgf files obtained from individual gel band digests were first analyzed by separate Mascot searches. Then, a “cell line-centric” approach was also used by concatenating into a single mgf file the 24 LC-MS/MS runs derived from each single cell line, and searching it with Mascot as a whole. This simplified the comparison of results between secretomes and allowed peptides from any protein that had not migrated exactly within a single (blindly cut) gel slice to be included in the same search, thus increasing the probability of its identification. “Cell line-centric” results were those used for protein identification and emPAI assignment, while the “1DE-centric” approach served for collecting other important qualitative information about the proteins of interest. In addition, these individual Mascot searches also served to check the correspondence of actual 1DE migration with the theoretical molecular weight of identified proteins. Protein Identification. Mascot searches were performed against the Swiss-Prot database, version 56.5. Search parameters were: “Human” (20 411 sequences) or “Other Mammalia” (12 284 sequences) taxonomy; no restriction on molecular weight (MW) or isoelectric point; enzyme, trypsin (one missed cleavage allowed); fixed modification, carbamidomethylation of cysteine; variable modification, oxidation of methionine; experimental mass values, monoisotopic; peptide mass tolerance, ± 2 ppm; MS/MS mass tolerance, ± 1 Da; peptide charge, 2+, 3+, 4+; decoy search, active. After obtaining the protein identification results from the “cell line-centric” searches, several validation criteria were applied in order to obtain high-confidence protein lists, as follows. Mascot results were reformatted by changing the default settings to more stringent parameters. The default significance threshold (p < 0.05) was set to p < 0.0001 to considerably decrease the False Discovery Rate (see Results). The cutoff for peptide ion score was set to ≥31, a value ensuring an identification confidence >99.9%. Among the identified proteins listed in the reformatted results, only those identified by at least two nonredundant peptides were accepted as valid. Identification of a specific protein isoform was claimed only if at least two valid peptides were identified which specifically differentiated this isoform from other identified isoforms. Stringent criteria were also used to eliminate contaminating proteins from our secretome lists. In particular, all keratins were considered of potential environmental origin and eliminated from the protein list. Bovine proteins from residual FBS were excluded by a strategy described below. FBS-Derived Proteins. Elimination from the final protein list of residual FBS-derived bovine proteins was achieved by both experimental controls and protein identification strategies. FBS was analyzed by 1DE and LC-MS/MS, in conditions identical to those used for the various cell CM. To select for MS analysis an amount of FBS representative of the actual FBS contamination of our CMs, scalar amounts of diluted FBS were first fractionated by 1DE. An FBS lane with the albumin band of intensity similar to that of the five CM under study (as evaluated by image software Same Spots, NonLinear Dynamics, Durham, NC) was chosen for MS analysis. The amount of FBS loaded in this lane (0.04 µL FBS in loading buffer) would correspond to an approximate residual FBS in the CM of 0.004%.
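Before continuing with the FBS controls, the acceptance rules stated under Protein Identification (peptide ion score of at least 31, at least two nonredundant peptides per protein, and removal of keratins) can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `PeptideMatch` record, the `accepted_proteins` function, and the detection of keratins by a substring of the protein description are all hypothetical, and the real filtering was done within the Mascot result reports.

```python
from dataclasses import dataclass

MIN_ION_SCORE = 31  # peptide ion score cutoff stated in the text
MIN_PEPTIDES = 2    # minimum number of nonredundant peptides per protein


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum match."""
    protein_id: str
    description: str
    sequence: str
    ion_score: float


def accepted_proteins(matches):
    """Return {protein_id: set of valid peptide sequences} for accepted proteins."""
    peptides = {}
    for match in matches:
        if match.ion_score < MIN_ION_SCORE:
            continue  # peptide below the score cutoff
        if "keratin" in match.description.lower():
            continue  # assumption: keratins are recognized by their description
        # A set keeps each peptide sequence once, so repeats are not double counted.
        peptides.setdefault(match.protein_id, set()).add(match.sequence)
    return {pid: seqs for pid, seqs in peptides.items() if len(seqs) >= MIN_PEPTIDES}
```

Collecting sequences in a set is one way to express the “nonredundant” requirement: the same peptide observed in several spectra or gel bands counts only once toward the two-peptide minimum.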
LC-MS/MS data obtained from this FBS sample, analyzed by Mascot as described above, were used to resolve dubious cases where a protein identified as “Human” in a CM sample was also identified as “Bovine” with an equal number of identical peptides. Such proteins of unresolved taxonomy were deleted from the secretome list if identified in the FBS sample. Validation of Protein Lists. To further validate the protein lists generated by Mascot search, MS/MS data were also analyzed with an additional search engine based on a different search algorithm, Spectrum Mill version A.03.03.084 SR4 (Agilent, Santa Clara, CA). MS/MS data files (.raw) were extracted with Spectrum Mill Data Extractor using the following parameters: 600-4500 (min-max mass); sequence tag “on”, length >1. Where no charge state was assigned during extraction, 2+, 3+ and 4+ charge states were considered during searches. Searches were carried out against the Swiss-Prot database in both forward and reverse direction, using the following search parameters: fixed modification, carbamidomethylation of cysteine; variable modification, oxidation of methionine; enzyme, trypsin with one missed cleavage allowed; precursor mass tolerance, ± 0.01 Da; fragment ion tolerance, ± 1 Da; instrument type, ESI linear ion trap. The initial results were autovalidated using the following parameters for the “protein details” mode: SPI (Scored Peak Intensity) >70% for matches with score >7 for 2+, and >9 for 3+ and 4+ ions, and SPI >90% for matches with score >6 for 2+ ions. A second autovalidation step was done in “peptide mode” using as validation criteria a score >13 and SPI >70%. In addition, both autovalidation steps required a forward-reverse score >2 for all peptide charge states. Only proteins identified by at least 2 valid peptides were considered. A few proteins, among those identified by the stringent Mascot analysis, were not confirmed by this stringent Spectrum Mill search and were thus removed from the final protein list. This final high-confidence data set was used for all subsequent semiquantitative comparison of secretomes, and for their functional enrichment analysis by bioinformatics tools. Rapid Protein Data “Visualization”. The capabilities for comparative visualization of proteomic data of the Scaffold program (version 2_04_00, Proteome Software Inc., Portland, OR) were used (1) to rapidly check the correspondence of actual 1DE migration with the theoretical MW of an identified protein, and/or of its known splice variants, isoforms, fragments and (2) to easily verify the colocalization (within the gel bands of a given secretome) of peptides belonging to the entire protein sequence vs particular sequence regions, and thus substantiate the presence of a hypothesized protein species/fragment. Protein Localization. To distinguish secreted from intracellular proteins among those identified in the CM, we used several algorithms or knowledge databases that predict or report protein localization. The identified proteins were classified as “Secreted Classical” if they had a signal peptide predicted by SignalP 3.0 software (http://www.cbs.dtu.dk/services/SignalP/) or if they were classified as “Extracellular” by UniProtKB database (http://www.uniprot.org/); “Secreted Non classical” if so defined by SecretomeP 2.0 software (http://www.cbs.dtu.dk/services/SecretomeP/); “Plasma Membrane” if so defined by UniProtKB database; “Intracellular” if not predicted to be secreted by both SignalP and SecretomeP software, and defined as intracellular by UniProtKB database. Label-Free Estimation of Protein Amounts by emPAI.
Relative protein amounts in the secretomes were estimated by using the exponentially modified protein abundance index (emPAI),27 a label-free semiquantitative index directly obtained from the Mascot search results. The emPAI parameter takes into account the number of identified peptides, normalized against the number of identifiable peptides for a given protein. The formula is $\mathrm{emPAI} = 10^{N_{\mathrm{observed}}/N_{\mathrm{observable}}} - 1$, where $N_{\mathrm{observed}}$ is the number of experimentally observed peptides and $N_{\mathrm{observable}}$ is the calculated number of observable peptides for each protein.27 Mascot estimates the number of observable peptides on the basis of protein mass, average amino acid composition of the database, and the enzyme specificity (http://www.matrixscience.com/help/quant_empai_help.html). The semiquantitative emPAI parameter has the following characteristics: (1) good correlation with protein abundance in complex samples,27 (2) broad linear dynamic range (4 orders of magnitude),27 (3) accuracy lying in a limited error range (10. Proteins Identified in the Five Secretomes. The full list of 1854 proteins identified in the 5 cell line secretomes is reported in Supplementary Table 4S (Supporting Information), with the main elements in the Mascot results (UniProtKB Description, Accession Number and ID, molecular mass, emPAI value, number of valid peptides, Mascot protein score, and sequence coverage). Supplementary Table 5S (Supporting Information) lists all peptide sequences assigned with their precursor charge, mass/charge, mass error and modifications observed. The final protein list of 790 unique proteins (Supplementary Table 6S, Supporting Information) includes 134 proteins common to all 5 lines, 10 common to the 4 cancer cell lines only, and some restricted to single cell lines (HPDE6, 85; AsPC1, 148; MiaPaCa2, 28; PANC1, 44; PT45, 53). Interestingly, the only cell line of ascitic origin (AsPC1) has the largest number of unique proteins. A subset of proteins were similarly expressed (CV < 30%) across the five secretomes (Supplementary Figure 5S, Supporting Information), with strong correlation (R² = 0.98) between emPAI in HPDE6 and mean emPAI in the 4 cancer cell lines (Supplementary Figure 6S, Supporting Information). Using the “batch retrieval” tool and “customize display” feature of the UniProtKB database (http://www.uniprot.org), we checked if, among our proteins, some might still lack the “Evidence at protein level” qualifier. We indeed found that 10 of our proteins still have only “Evidence at transcript level” as proof of existence (Supplementary Table 7S, Supporting Information). Cellular Localization Prediction. To assess whether the proteins identified in the CM were bona fide secreted proteins, their cellular localization was investigated. A total of 66% of the proteins in our list are known or predicted to be either secreted or plasma membrane proteins (Supplementary Table 6S, Supporting Information).
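As a brief aside on the semiquantitative comparison used in this section, the emPAI formula given in Materials and Methods and the CV < 30% criterion for “similarly expressed” proteins can be sketched in Python. This is a minimal illustration, not the authors' procedure: in the study the emPAI values were taken directly from the Mascot results, the function names here are hypothetical, and the use of the sample standard deviation for the CV is an assumption, since the text does not state which estimator was used.

```python
import statistics


def empai(n_observed, n_observable):
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def coefficient_of_variation(values):
    """CV in percent; the sample standard deviation is an assumption."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


def similarly_expressed(empai_by_cell_line, max_cv=30.0):
    """True if emPAI values across cell lines have a CV below max_cv percent."""
    return coefficient_of_variation(list(empai_by_cell_line.values())) < max_cv


# Hypothetical emPAI values for one protein in the five secretomes.
example = {"HPDE6": 1.0, "AsPC1": 1.1, "MiaPaCa2": 0.9, "PANC1": 1.05, "PT45": 0.95}
print(similarly_expressed(example))
```

Because the index is exponential in the peptide ratio, a protein with all observable peptides detected has an emPAI of 9, whereas one with no peptides detected has an emPAI of 0.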
Moreover, 16% of the proteins classified as intracellular in our list were reportedly secreted via exosomes in proteomic studies.43-47 Exosomes contain a mixture of proteins that retain a large part of the protein repertoire of the producing cell, the most common belonging to the families of antigen presentation, integrins, immunoglobulins, cell surface peptidases, tetraspanins, heat shock proteins, cytoskeletal proteins, membrane transport and fusion, signal transduction and metabolic enzymes.48 It has been shown recently that tumor proteins secreted via exosomes play an important role in cancer progression and dissemination by, for example, inducing dysfunction or death of immune effector cells49,50 and promoting neo-angiogenesis.51 Considering the high percentage of proteins that were predicted/known to be secreted in our data set, the importance for tumor growth of proteins secreted by poorly characterized pathways such as exosomes, and the low probability that proteins in these secretomes derived from cell-lysis (cell mortality rate