
J. Proteome Res., 2012, 11 (6), pp 3405–3413. DOI: 10.1021/pr300212g. Publication Date (Web): April 25, 2012.
Article pubs.acs.org/jpr

Multidimensional Identification of Tissue Biomarkers of Gastric Cancer

Tiannan Guo,†,‡ Lingling Fan,§ Wai Har Ng,‡ Yi Zhu,† Mengfatt Ho,‡ Wei Keat Wan,∥ Kiat Hon Lim,∥ Whee Sze Ong,⊥ Sze Sing Lee,‡ Shiang Huang,§ Oi Lian Kon,*,‡ and Siu Kwan Sze*,†

† School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551
‡ Division of Medical Sciences and ⊥ Division of Clinical Trials and Epidemiological Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610
§ Center for Stem Cell Research & Application, Union Hospital, Huazhong University of Science and Technology, Wuhan, P.R. China 430022
∥ Pathology Department, Singapore General Hospital, Outram Road, Singapore 169608

Supporting Information

ABSTRACT: Gastric cancer remains highly fatal due to a dearth of diagnostic biomarkers for early stage disease and molecular targets for therapy. Plasma membrane proteins, including cluster of differentiation (CD) proteins and receptor tyrosine kinases (RTKs), are a rich reservoir of biomarkers. Recognizing that interrogating plasma membrane proteins individually overlooks extensive interactions among them, we have systematically investigated the membrane proteomes and transcriptomes of six gastric cancer cell lines. Our data revealed aberrantly high expression of proteins whose functions accurately reflect the clinical phenotype of gastric cancer, and prioritized critical RTKs and CD proteins in gastric cancer. Expression of selected surface proteins was confirmed by flow cytometry and immunostaining of clinical gastric cancer tissues. Close to 90% of the gastric cancer tissues in a cohort showed up-regulation of at least one of four proteins, that is, MET, EPHA2, FGFR2, and CD104/ITGB4. All intestinal type gastric cancer tumors in this cohort overexpressed at least one of a panel of three proteins, MET, FGFR2, and EPHA2. This study reports the first quantitative global landscape of the surface proteome of gastric cancer cells and provides a shortlist of gastric cancer biomarkers.

KEYWORDS: gastric cancer, plasma membrane, proteome, transcriptome, tissue microarray



Received: March 5, 2012
Published: April 25, 2012
© 2012 American Chemical Society
dx.doi.org/10.1021/pr300212g | J. Proteome Res. 2012, 11, 3405−3413

INTRODUCTION

Gastric cancer (GC) survival has not improved as much as for other common cancers, for example, breast, colorectal, and cervical cancers, despite several decades of steadily declining incidence.1 With about one million new cases each year, its troublingly high mortality rate continues to motivate much research. Among several reasons for the relatively slow progress in reducing GC’s lethality is the dearth of markers for early stage GC and curative anticancer agents. However, recent evidence of a modest survival benefit from trastuzumab treatment in advanced GC has been reported.2 Trastuzumab blocks signaling via HER2, a plasma membrane (PM) receptor tyrosine kinase (RTK) that is overexpressed in several different cancer types. Thus, the observation that a subset of GC patients benefited from HER2 blockade suggests that PM-directed targeted therapy for GC has relatively untapped potential.

PM proteins are collectively encoded by 20−30% of all genes in the human genome and make up about half the mass of the PM.3,4 They are highly heterogeneous and mediate many important functions that underpin oncogenesis, viz. signal transduction, cell proliferation, apoptosis and metabolism. Indeed, the phenotype of cancer cells is distinguished by marked changes in expression levels and post-translational modifications of PM proteins. Their location at the boundary between intracellular processes and the external milieu moreover makes PM proteins readily accessible and, thus, particularly suited as disease biomarkers and therapeutic targets. Sixty percent of anticancer agents (e.g., gefitinib, cetuximab) attenuate oncogenic cell signaling by targeting specific PM proteins.5 The combination of accessibility with their key roles in oncogenesis is exploited by individualizing cancer therapy to aberrations in specific PM proteins (HER2, EGFR) expressed by tumors. A subset of non-small cell lung cancers that express activating mutations of EGFR respond to treatment by tyrosine kinase inhibitors.6 Sensitivity to EGFR inhibitors is highly correlated with the presence of wild-type KRAS.7 In the same cancer type, chromosomal aberrations cause constitutive activation of another RTK, anaplastic lymphoma kinase, that has become a new therapeutic target.8 HER2 overexpression in some breast and gastric cancers similarly renders these tumors susceptible to the antiproliferative effects of kinase inhibition.9

Two groups of PM proteins merit special attention. The cluster of differentiation (CD) molecules are a group of 389 human proteins that mediate vital functions in many cell types. They include protein kinases, receptors, ligands and enzymes. Expression patterns of CD antigens on cancer cells have been proposed as rational, cancer-specific diagnostic and prognostic biomarkers, as well as potential molecular targets for therapeutic intervention.10 However, the expression of most CD molecules in solid tumors is largely unknown. Receptor tyrosine kinases (RTKs), the most extensively studied PM proteins, comprise 518 members classified into 20 subfamilies based on kinase domain sequences.11 RTKs are prominent diagnostic and therapeutic biomarkers of several types of cancers whose cells express mutationally activated, overexpressed or amplified RTKs that initiate and/or sustain malignant transformation.12 The inhibitors developed against EGFR, HER2, KIT, ALK, and VEGFR are exemplars of disease-specific RTK biomarkers that are both diagnostic and therapeutic.

Extensive interplay among proteins in cancer cells cautions against oversimplifying the quest for biomarkers. Single biomarkers rarely suffice as biological signatures of malignancies because cancer-associated proteins are heterogeneously expressed. Thus, a successful strategy for discovering biologically relevant GC biomarkers should not be constrained to a few empirically hypothesized proteins. For instance, extensive crosstalk among PM-initiated oncogenic signaling pathways, including the capacity of signaling networks to compensate for and overcome a single point of inhibition, is known to be a significant cause of the ultimate failure of monotherapy directed at only a single RTK. Acquired resistance is therefore common and limits the curative potential of treatment directed at a single RTK. Hence, combination treatments that concurrently inhibit multiple signaling pathways are being investigated as a strategy to subvert the onset of acquired drug resistance.13,14 Rational selection of effective combination therapies requires a global understanding of the kinome of specific cancer types.

The surface proteome of GC has not been comprehensively defined to date. Mass spectrometry (MS)-based proteomics, a proven technology for developing systematic inventories of membrane proteins, has been used for large-scale analysis of PM proteins in cancer cells.15,16 Given that a comprehensive and unbiased inventory of the GC membrane proteome has yet to be reported, here we systematically investigated PM proteins from multiple GC cell sources using high-throughput technologies and computational methods. Integrated analysis of transcriptome and LC−MS/MS-based membrane proteome data sets revealed insights into membrane proteins at both mRNA and protein levels. This study has generated the first global quantitative view of the membrane proteome in GC validated by examination of clinical GC tissues.

Japan. Each cell line was cultured as recommended by the respective sources in the presence of 100 U penicillin and 100 μg of streptomycin per milliliter (Invitrogen, Carlsbad, CA). All cell lines were authenticated by short tandem repeat (STR) genotyping (DNA Diagnostic Center, Fairfield, OH).



Membrane Protein Enrichment and Digestion

Each culture of the GC cell lines above was lysed using HES buffer (20 mM HEPES, pH 7.4, 1 mM EDTA, 250 mM sucrose) supplemented with protease inhibitors as described previously.17 Cell lysates were diluted with 0.1 M Na2CO3, pH 11, and incubated at 4 °C with gentle rotation for 1 h.18 The suspension was centrifuged for 45 min at 250 000g and 4 °C. The resulting membrane pellet was washed twice with Milli-Q water and centrifuged for 30 min at 250 000g. The washed pellet was dissolved in 2% SDS. About 0.5 mg of membrane protein was resolved in SDS-PAGE and excised into 15 slices for tryptic digestion prior to LC−MS/MS analysis.

LC−MS/MS Analysis

Membrane protein digests from each cell line were analyzed in an LTQ-FT Ultra mass spectrometer (Thermo Fisher, Waltham, MA) coupled to a Prominence HPLC unit (Shimadzu, Kyoto, Japan) as described previously.17 Briefly, peptide samples were reconstituted in 100 μL of 0.1% formic acid (FA) in HPLC water, injected from an autosampler (Shimadzu), concentrated in a Zorbax peptide trap (Agilent Technologies, Santa Clara, CA), and subsequently resolved in a capillary column (200 μm i.d. × 10 cm) packed with C18 AQ (5-μm particle size, 100-Å pore size, Michrom BioResources, Auburn, CA) at a flow rate of about 300 nL/min. Buffer A (0.1% FA in HPLC water) and buffer B (0.1% FA in acetonitrile) were used for 60-min gradients, starting at 5% acetonitrile for 5 min, followed by acetonitrile ramping from 5% to 30% over 40 min, maintaining at 80% for 5 min, and finally 5% for 10 min. The samples were ionized in an ADVANCE CaptiveSpray Source (Michrom BioResources) with an electrospray potential of 1.5 kV. The LTQ-FT Ultra was set to perform data acquisition in the positive ion mode. A full MS scan (350−1600 m/z range) was acquired in the FT-ICR cell at a resolution of 100 000 and a maximum ion accumulation time of 1000 ms. The automatic gain control target for FT was set at 1 × 10^6, and precursor ion charge state screening was activated. The linear ion trap was used to collect peptides and to measure peptide fragments generated by CID. The default automatic gain control setting was used (full MS target at 3.0 × 10^4, MSn at 1 × 10^4) in the linear ion trap. The 10 most intense ions above a 500-count threshold were selected for fragmentation in CID (MS2), which was performed concurrently with a maximum ion accumulation time of 200 ms and a dynamic exclusion of 30 s. For CID, the activation Q was set at 0.25, isolation width (m/z) was 2.0, activation time was 30 ms, and normalized collision energy was 35%.

Protein Identification by MS

The MS raw files were converted to mzXML format and mgf format using the Trans-Proteomic Pipeline. Protein database search was performed by uploading mgf files to an in-house Mascot cluster server (version 2.2.07) (Matrix Science, Boston, MA) against a concatenated target and decoy (reversed sequences of the target database) version of the nonredundant UniProt Knowledgebase protein sequence database (40 516 sequences, downloaded on October 8, 2010). The search was limited to a maximum of 2 missed trypsin cleavages; no. 13C of 2; mass

MATERIALS AND METHODS

GC Cell Lines

AGS, Kato III, SNU1, and SNU5 were from American Type Culture Collection (Manassas, VA). MKN7 and IM95 were from Japan Health Science Research Resource Bank, Osaka,


normal (N) tissues were compared. T > N and T < N denote higher or lower staining, respectively, in GC compared to matched normal gastric epithelium. T = N denotes equal staining in tumor and matched normal tissues.
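The tissue microarray scoring scheme (staining intensity on a 0−3 scale multiplied by the percentage of positively stained cells, then compared between tumor and matched normal epithelium) can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the function names are hypothetical, the example values are invented, and treating "equal staining" as exact numerical equality of the two products is an assumption, since the text does not state a tolerance or how the two pathologists' scores were combined.

```python
def staining_score(intensity: int, percent_positive: float) -> float:
    """Product of staining intensity (0-3) and percentage of positive cells (0-100)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must be between 0 and 100")
    return intensity * percent_positive


def compare_pair(tumor_score: float, normal_score: float) -> str:
    """Label a tumor/normal pair as 'T > N', 'T < N', or 'T = N'.

    Exact equality is used for 'T = N'; this is an assumption.
    """
    if tumor_score > normal_score:
        return "T > N"
    if tumor_score < normal_score:
        return "T < N"
    return "T = N"


# Invented example: strong staining in 80% of tumor cells versus
# weak staining in 20% of matched normal epithelial cells.
tumor = staining_score(3, 80.0)
normal = staining_score(1, 20.0)
label = compare_pair(tumor, normal)
```

Under this scheme the score ranges from 0 to 300, so a tumor with weak but widespread staining can score the same as one with strong but focal staining.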

tolerance of 20 ppm for peptide precursors; and 0.8 Da mass tolerance for fragment ions. Fixed modification was carbamidomethyl at Cys residues, while variable modification was oxidation at methionine residues. PeptideProphet19 and ProteinProphet20 from the Trans-Proteomic Pipeline (TPP) were employed to estimate false discovery rates at both peptide and protein levels. Only protein groups with a probability above 0.9 were considered as identifications. False discovery rate of protein identification was estimated at below 1% by receiver operating characteristic curves for each cell line. Membrane proteins were defined according to Gene Ontology (GO) annotation21 and transmembrane topology using TMHMM (version 2.0).3 Relative abundance of proteins identified in GC cell lines was estimated by the normalized spectral index (SIN) method.22
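For readers unfamiliar with the normalized spectral index, the calculation can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the commonly described form of SIN, in which the summed fragment-ion intensity of all spectra assigned to a protein is divided by the total over all proteins in the run and then by the protein's length. The function name, input layout, and example numbers are hypothetical.

```python
from typing import Dict, List


def normalized_spectral_index(
    fragment_intensities: Dict[str, List[float]],
    protein_lengths: Dict[str, int],
) -> Dict[str, float]:
    """Return an SIN-like value per protein.

    fragment_intensities maps a protein to the summed fragment-ion
    intensity of each spectrum assigned to it (hypothetical layout).
    protein_lengths maps a protein to its length in amino acids.
    """
    # Spectral index: total fragment-ion intensity per protein.
    spectral_index = {p: sum(v) for p, v in fragment_intensities.items()}
    run_total = sum(spectral_index.values())
    if run_total <= 0:
        raise ValueError("total spectral index must be positive")
    # Normalize by the run total, then by protein length.
    return {
        p: spectral_index[p] / run_total / protein_lengths[p]
        for p in spectral_index
    }


# Invented toy input: two proteins in one run.
toy_intensities = {"protein_A": [100.0, 300.0], "protein_B": [600.0]}
toy_lengths = {"protein_A": 100, "protein_B": 200}
toy_sin = normalized_spectral_index(toy_intensities, toy_lengths)
```

Dividing by the run total makes values comparable between LC−MS/MS runs, and dividing by length corrects for longer proteins yielding more peptides.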

Statistical Analyses

Statistical tests were performed in R (version 2.13.0). Two-tailed Fisher’s exact tests were performed using the function “fisher.test”. The function “cor” was used to calculate correlation coefficients.
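As an illustration of what a two-tailed Fisher's exact test computes on a 2 × 2 table, the following standard-library Python sketch sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is the usual two-sided definition. It is not the authors' R code; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table
    # are counted despite floating-point rounding.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Invented counts, for illustration only.
p_value = fisher_exact_two_tailed(3, 1, 1, 3)
```

For large counts a log-space implementation would be more robust; the direct form is kept here for readability.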



RESULTS

Gastric Cancer Surface Proteomes

Membrane proteomes of six GC cell lines (AGS, SNU1, IM95, SNU5, Kato III, and MKN7) were investigated using LC−MS/MS analysis of enriched membrane proteins. The confidence of protein identifications was ensured by matching MS spectra to the nonredundant UniProt Knowledgebase protein sequence database using the Mascot search engine, followed by PeptideProphet and ProteinProphet qualification. Membrane proteins were specified based on sequence prediction by TMHMM computation and Gene Ontology annotations. Together, the two approaches identified a total of 1473 membrane proteins, of which 86 and 479 membrane proteins were identified only by TMHMM and GO, respectively (Figure 1). Membrane proteins from various

Subcellular Classification

Term associations for each protein-encoding gene symbol were retrieved from Gene Ontology Homo sapiens annotation database updated on January 30, 2010. Plasma membrane proteins were picked out based on the annotations.

Gene Expression Analysis

Transcriptome data sets generated from histologically benign gastric epithelial tissues from noncancer human subjects and GC cell lines are from our previously published study.23 Procurement of human tissues was approved by the SingHealth Institutional Review Board.

Flow Cytometry Analysis

Fluorescence-conjugated antibodies against extracellular domains of CD9, CD13, CD14, CD36, CD38, CD44, CD49e, CD59, and CD133 were obtained from BD Biosciences (Franklin Lakes, NJ). Mouse IgG1-FITC and IgG2a-PE were isotype controls. Antibodies were incubated with GC cells (cell density (5−10) × 10^5/mL) for 0.5 h at 4 °C before analysis in a flow cytometer (FACSCalibur, BD Biosciences).

Immunostaining of Tissue Microarrays

Two or three 1.0 mm diameter disks were cored and arrayed on standard glass microscope slides (Beecher Instruments, Sun Prairie, WI) from tumor-rich paraffin blocks of 49 gastric adenocarcinomas and adjacent histologically normal gastric epithelium. Use of archived tissues from the Pathology Department, Singapore General Hospital was approved by the SingHealth Institutional Review Board. Histological evaluation and assignment of Lauren histotypes were reviewed by authors (K.H.L. and W.K.W.). Immunostaining was performed using the basic IHC Kit with Antibody Amplifier (ProHisto, Columbia, SC), EnVision Detection, Peroxidase/DAB, Rabbit/Mouse System (Dako, Denmark), and the following primary antibodies: anti-MET (C-12; sc-10; 1 μg/mL), anti-FGFR2 (ab58201; 0.2 μg/mL; Abcam, Cambridge, U.K.), anti-EPHA2 (ab5386; 1.25 μg/mL; Abcam), anti-EPHB2 (ab5418; 5 μg/mL; Abcam), and anti-ITGB4 (sc9090; 0.6 μg/mL). Except where stated, all antibodies were from Santa Cruz Biotechnology, Santa Cruz, CA. The recommended tissue for each antibody was stained as a positive control. All stained tissue microarrays were scored independently by two pathologists. Each tumor and its adjacent normal epithelium were scored by the product of staining intensity and percentage of positively stained cancer or normal epithelial cells. Staining intensity was scored on a scale of 0−3 (0, no staining; 1, weak; 2, moderate; and 3, strong staining). The scores of each pair of cancer (T) and adjacent histologically

Figure 1. Workflow for characterizing the gastric cancer membrane proteome. Six different gastric cancer cell lines from various sources were lysed and enriched for membrane proteins before LC−MS/MS analysis. Database search results from Mascot were qualified by PeptideProphet and ProteinProphet, and quantified by normalized spectral index (SIN). Membrane proteins were identified by Gene Ontology annotation and TMHMM prediction.

subcellular organelles were identified, of which about 38% were PM proteins (Supporting Information Table 1). Our results showed widely diverse expression of CD molecules on the surface of GC cells. Although 78 CD molecules were identified in total, only 6 were expressed by all six cell lines. These proteins are CD29, CD49f, CD71, CD98, CD107a, and CD298. Sixteen RTKs were identified among which EPHA2 and EGFR were expressed in all six cell lines and ERBB2 in five.
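Finding the proteins detected in every cell line, as done above for the CD molecules, amounts to intersecting the per-line identification sets. A minimal sketch follows; the function name is hypothetical and the toy input uses placeholder cell-line labels and an arbitrary assignment of markers, not the study's data.

```python
from typing import Dict, Set


def shared_across_all(detected: Dict[str, Set[str]]) -> Set[str]:
    """Return the proteins present in every cell line's identification set."""
    sets = list(detected.values())
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy input with placeholder line labels; the marker assignments are
# invented for illustration and do not reproduce the reported results.
toy_detected = {
    "line_1": {"CD29", "CD71", "marker_x"},
    "line_2": {"CD29", "CD71", "marker_y"},
    "line_3": {"CD29", "CD71", "marker_x", "marker_y"},
}
toy_shared = shared_across_all(toy_detected)
```

The same pattern extends to counting in how many lines each protein was seen, for example to list proteins found in at least five of six lines.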


Figure 2. Validation of quantitation of proteomic data using flow cytometry. (a) Nine representative surface proteins of various abundances were selected for validation using flow cytometry. The table is color coded according to the percentage of cells showing positive staining for each protein. (b) Correlation of the abundance of the nine proteins by flow cytometry (percentage of expression of each protein) with quantitation by mass spectrometry (SIN values).

Figure 3. Bivariate map of aberrantly expressed cell surface proteins by integrative analysis of proteome and transcriptome data. Cell surface proteins identified by LC−MS/MS are shown with their mRNA expression for six gastric cancer lines. Spot size is proportional to logarithmic values of protein normalized spectral indices and colors indicate log2 ratios (cancer cells to benign gastric epithelium controls) of mRNA expression. Ratios >2 are shown in red,

T (vice versa). It was uncommon for a tumor and its matched normal tissue to have comparable expression (N = T), showing the likely pathobiological significance of these proteins. In this study, we focused on proteins which were significantly more highly expressed in tumors than in matched normal tissues (N < T) because such abnormalities are more druggable. Four proteins, that is, EPHA2, FGFR2, ITGB4, and MET, were overexpressed in 57% sample pairs (Table 1). Among the 49

Details of shortlisted surface proteins by GC cell line are shown in Supporting Information Table 2.

Abundance of Surface Proteins

The abundance of surface proteins identified by MS was estimated by normalized spectral indices (SIN). To evaluate the reliability of this quantitative method, we investigated 9 surface proteins, representative of varying abundance in four GC cell lines, by flow cytometry. They were four highly expressed proteins (CD9, CD49e, CD44, CD59), two of low abundance (CD13, CD133), and three that were absent (CD14, CD36, and CD38) (Figure 2). Proteins with high SIN values were expressed by high percentages of the respective cell lines determined by flow cytometry, for example, CD9 in AGS and Kato III, while proteins undetected by MS (CD14, CD36, CD38) also generated flow cytometry signals of low frequency. The high correlation coefficients (0.83−1.00) between flow cytometric and MS data supported the reliability of the quantitative proteomic analysis.
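The comparison between flow cytometry and MS quantitation reduces to a correlation coefficient per cell line across the nine proteins. The sketch below computes a Pearson coefficient in plain Python. It assumes Pearson correlation because that is the default of R's "cor"; the text does not state which method or data transformation was used, and the function name and numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Invented values: percentage of positive cells by flow cytometry and
# an abundance value from MS for the same proteins in one cell line.
toy_flow_percent = [95.0, 80.0, 10.0, 2.0]
toy_ms_abundance = [9.5, 8.0, 1.0, 0.2]
toy_r = pearson_r(toy_flow_percent, toy_ms_abundance)
```

Because percentages saturate near 100% while spectral indices span orders of magnitude, a rank-based coefficient could be a reasonable alternative to check.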

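Table 1. Expression of Four Selected Surface Proteins by Immunostaining Microarrays of Primary GC Tissues
[Table body not recoverable. Columns: Lauren histotype; MET; FGFR2; EPHA2; ITGB4; Total, with >, =, and < subcolumns comparing tumor (T) with matched normal (N) staining.]

Integrative Analysis of GC Cell Surface Proteins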

Although mRNA expression does not always correlate with protein abundance, abnormally altered gene expression is often reflected in concordant changes in mRNA and protein levels.24−27 Retrieving mRNA expression of 976 membrane proteins identified by LC−MS/MS from our published transcriptome data sets,23 we found concordant alterations in the expression of 57 CD proteins and 16 RTKs at mRNA and protein levels in six GC lines. These proteins of potential pathological significance are shown in Figure 3, with detailed information in Supporting Information Tables 2 and 3. By selecting surface proteins that were highly expressed at both mRNA and protein levels, we compiled a shortlist of proteins including CD9, integrins, ephrins, CD107a/LAMP1, CD147, CD156b/c, CD332/FGFR2, CD340/ERBB2/HER2, EGFR, and MET (Supporting Information Table 3). The relevance of our approach was supported by the presence in the shortlist of several proteins already known to be associated with GC tissues. Thus, some clinical GC tumors overexpress MET,28,29 while HER2 overexpression in a subset of GCs has recently shown this RTK to be a therapeutic target.2 CD9 overexpression has been reported in advanced GC.30,31 High expression of FGFR2 is known not only in GC cell lines, but also in primary GC tumors.32,33 In addition to known protein markers of GC, our shortlist also revealed overexpressed genes not previously associated with GC indicating that this integrative analysis could enrich and extract pathobiologically important proteins from large data sets. For example, CD98 exhibited extraordinarily high levels in all six GC lines. This transmembrane solute carrier which mediates integrin signaling is overexpressed in several tumor types and is associated with poorer prognosis.34,35
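The shortlisting step described above, keeping surface proteins that are high at both the mRNA and protein levels, can be sketched as a simple two-criterion filter. This is a hypothetical illustration: the class, function name, cutoffs, and example records are all assumptions, and the paper's actual thresholds are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurfaceProtein:
    name: str
    sin: float         # protein-level abundance (normalized spectral index)
    mrna_ratio: float  # mRNA expression ratio, cancer cells / benign epithelium


def shortlist(
    proteins: List[SurfaceProtein],
    sin_cutoff: float,
    ratio_cutoff: float,
) -> List[str]:
    """Names of proteins that pass both the protein and mRNA cutoffs."""
    return [
        p.name
        for p in proteins
        if p.sin >= sin_cutoff and p.mrna_ratio > ratio_cutoff
    ]


# Invented records with placeholder names and arbitrary cutoffs.
toy_proteins = [
    SurfaceProtein("P1", sin=5.0e-4, mrna_ratio=6.0),  # high at both levels
    SurfaceProtein("P2", sin=5.0e-4, mrna_ratio=0.8),  # protein high, mRNA not
    SurfaceProtein("P3", sin=1.0e-6, mrna_ratio=9.0),  # mRNA high, protein low
]
toy_shortlist = shortlist(toy_proteins, sin_cutoff=1.0e-4, ratio_cutoff=2.0)
```

Requiring agreement between the two data types trades sensitivity for specificity: proteins regulated mainly post-transcriptionally would be missed by such a filter.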
