
Bioinformatics Processing of Protein and Transcript Profiles of Normal and Transformed Cell Lines Indicates Functional Impairment of Transcriptional Regulators in Buccal Carcinoma

Claudia A. Staab,† Rebecca Ceder,‡ Theres Jägerbrink,† Jan-Anders Nilsson,‡ Karin Roberg,§ Hans Jörnvall,† Jan-Olov Höög,† and Roland C. Grafström*,‡

Department of Medical Biochemistry and Biophysics, and Institute of Environmental Medicine, Karolinska Institutet, SE-171 77 Stockholm, Sweden, and Division of Otorhinolaryngology, University Hospital Linköping, SE-581 85 Linköping, Sweden

Received May 23, 2007

Normal and two transformed buccal keratinocyte lines were cultured under a standardized condition to explore mechanisms of carcinogenesis and tumor marker expression at transcript and protein levels. An approach combining three bioinformatic programs allowed coupling of abundant proteins and large-scale transcript data to low-abundance transcriptional regulators. The analysis identified previously proposed and suggested novel protein biomarkers, gene ontology categories, molecular networks, and functionally impaired key regulator genes for buccal/oral carcinoma.

Keywords: cancer biomarkers • proteomics • transcriptomics • oral cancer • gene ontology • molecular networks • transcription factors • cultured oral keratinocytes • serum-free conditions

* Corresponding author: Dr. Roland Grafström, Institute of Environmental Medicine, Box 210, Karolinska Institutet, SE-171 77 Stockholm, Sweden. Phone, +46 8 30 12 03; fax, +46 8 34 38 49; e-mail, [email protected].
† Department of Medical Biochemistry and Biophysics, Karolinska Institutet.
‡ Institute of Environmental Medicine, Karolinska Institutet.
§ Division of Otorhinolaryngology, University Hospital Linköping.
10.1021/pr070308q CCC: $37.00 © 2007 American Chemical Society
Journal of Proteome Research 2007, 6, 3705-3717
Published on Web 08/15/2007

Introduction

Squamous cell carcinoma (SCC) of the buccal mucosa is highly aggressive, recurrent, and frequently associated with the lowest survival rates among oral cancer forms.1 The etiology of this type of SCC is largely unknown, underscoring the need for molecular disease markers in diagnosis and for novel treatment regimens.2-4 Studies of oral cancer, often with tissue samples from locations other than buccal mucosa, have given unclear results.2,4-7

Permanent tumor cell lines provide unlimited sources of well-defined starting material useful to complement tissue-based tumor marker studies and to avoid the problems of heterogeneity and contaminating normal tissue that are well-documented in tumor analyses.6-9 Investigation of the functional consequences of gene expression requires consideration of the microenvironment and growth conditions.10,11 Avoiding the difficulties associated with serum standardization and batch variation, human normal buccal keratinocytes (NBK) can be reproducibly cultured and transferred without serum.12 In contrast, oral SCC lines have so far been generated exclusively in standard media with serum.13 The buccal SCC line SqCC/Y1 adapted from serum dependence to the serum-free conditions developed for NBK, whereas similar efforts with other oral SCC lines proved unsuccessful.13,14 However, transfection of NBK under serum-free conditions with the SV40 T antigen has generated the immortalized SVpgC2a line, complementing SqCC/Y1 as a serum-free model of cell transformation in buccal mucosa.12,14,15 Reflecting alterations common to many cancers, SVpgC2a and SqCC/Y1 exhibit deregulated proliferation, differentiation, apoptosis, and aberrant structural and enzymatic functions.12,14-19 Limited analyses suggest that tumor markers for buccal carcinoma, for which few cell line models exist, can be identified under well-defined conditions with these lines. For example, both lines exhibit absence of functional tumor suppressor p53, and one line (SVpgC2a) expresses considerably increased levels of cytochrome P450 1B1 transcripts.15,20,21

Microarray analyses of oral tumor specimens have indicated the existence of multiple tumor markers at the transcriptome level, but the lack of coherence in marker identification among studies is striking.2,4 Gene expression is considered to reflect protein abundance in terms of functional categories, although less commonly at the single gene level, and proteins (as effectors of function), rather than transcripts, are likely to better reflect phenotypic changes.22 Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) allows the simultaneous assessment of many proteins, yet proteins of low expression are rarely detected.23,24 Hence, combination of 2D-PAGE and microarray analysis may give both identification of a limited number of proteins and large-scale transcript profiles. To bridge the gap between transcript and protein data, bioinformatics tools are used to organize large data sets and select target gene categories. For example, the Gene Ontology Tree Machine (GOTM) (http://bioinfo.vanderbilt.edu/gotm) allows the sorting and interpretation of expressed genes and proteins, AffyAnnotator (http://www.bea.ki.se/jnlp) constitutes a recently developed Web-based tool for visualization of Affymetrix microarray data according to the gene ontology (GO), while Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com) enables generation of molecular networks and association to biological functions and/or diseases. Implementation of these bioinformatics procedures is expected to provide predictive information on single tumors in personalized medicine, and well-characterized transformed cell lines are likely to be useful for demonstrating the concept.25

The current study utilized NBK and a standardized serum-free growth condition as reference to explore SqCC/Y1 and SVpgC2a for large-scale analysis of differentially expressed genes. Differential protein signatures, identified by 2D-PAGE and matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) mass spectrometry, and differential transcript profiles, identified by microarray, were assessed with the Web-based bioinformatics tools GOTM, AffyAnnotator, and IPA. GOTM enabled identification of significantly enriched GO categories from proteins differentially expressed between the normal and transformed states. AffyAnnotator enabled identification of differentially expressed transcripts within these categories, which were then analyzed by IPA for relationships in molecular networks. Finally, immunochemical analysis supported deregulation at the protein level of low-abundance transcription factors implicated from processing of the microarray data. Overall, novel tumor markers for buccal carcinoma could be proposed from coupling of abundantly expressed proteins to molecular networks and transcriptional regulators.
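The processing chain outlined above (enriched GO categories derived from proteins, followed by selection of altered transcripts within those categories for network analysis) can be illustrated with a minimal Python sketch. The gene names, GO labels, and fold-change values are hypothetical placeholders rather than data from this study, and `select_transcripts` is an illustrative name that is not part of GOTM, AffyAnnotator, or IPA. The twofold cutoff follows the threshold stated in Materials and Methods.

```python
from typing import Dict, List, Set


def select_transcripts(
    enriched_categories: Set[str],
    annotations: Dict[str, Set[str]],
    fold_changes: Dict[str, float],
    min_abs_fold: float = 2.0,
) -> List[str]:
    """Return genes annotated to at least one enriched GO category whose
    signed fold change has a magnitude of at least min_abs_fold."""
    selected = []
    for gene, categories in annotations.items():
        in_enriched_category = bool(categories & enriched_categories)
        fold = fold_changes.get(gene)
        if in_enriched_category and fold is not None and abs(fold) >= min_abs_fold:
            selected.append(gene)
    return sorted(selected)


# Hypothetical placeholders: one enriched category, three annotated genes.
enriched = {"GO:A"}
annotations = {
    "GENE1": {"GO:A"},
    "GENE2": {"GO:B"},
    "GENE3": {"GO:A", "GO:B"},
}
fold_changes = {"GENE1": -13.0, "GENE2": 4.0, "GENE3": 1.5}

focus_genes = select_transcripts(enriched, annotations, fold_changes)
print(focus_genes)
```

In this toy input only GENE1 is both annotated to the enriched category and altered at least twofold, so it alone would be passed on to the network step.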

Materials and Methods

Processing of Cell Cultures. NBKs were obtained from healthy, non-smoking donors undergoing maxillofacial surgery (approved by the Karolinska Institutet ethical committee) and cultured as previously described.12 Primary cell cultures were derived from tissue digestion with trypsin overnight at 4 °C, and the mixture was resuspended in a serum-free epithelial medium with high levels of amino acids (EMHA) and plated on dishes precoated with fibronectin and collagen.12 Cultures were transferred on regular tissue culture plastic at about 75% confluence, and cells in passage 2 were used in the experiments. The immortalized buccal epithelial cell line SVpgC2a and the buccal carcinoma cell line SqCC/Y1 were also cultured in EMHA, and passages 64-72 and 125-129, respectively, were used in the experiments.14,15

Preparation of Crude Cell Lysates and 2D-PAGE. NBK, SVpgC2a, and SqCC/Y1 cells were washed twice with cold phosphate-buffered saline on ice, scraped into a small volume of phosphate-buffered saline, briefly spun down, and finally resuspended in 10 mM Tris/HCl, pH 8.0, 5 mM magnesium acetate, 7 M urea, 2 M thiourea, and 4% CHAPS. The suspension was subsequently sonicated in bursts for a total of 1 min followed by centrifugation at 48 000 g for at least 40 min. The resulting lysates were distributed in aliquots and frozen at -80 °C until use. Protein content of the cell lysates was determined using the Bradford method (Bio-Rad, Hercules, CA). Five hundred micrograms of total protein was used in preparative 2D gels for protein identification via MALDI-TOF mass spectrometry, while 50 µg of total protein was used in analytical gels for analysis of differential gene expression. The samples were diluted in Destreak Rehydration Buffer (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) and applied by rehydration of the strips overnight.
Isoelectric focusing was carried out according to the manufacturer's protocol using a 13 cm Immobiline DryStrip pH 3-10 NL on a Multiphor II Electrophoresis System (GE Healthcare Bio-Sciences AB) at 20 °C. After the first dimension, the strips were equilibrated in 50 mM Tris/HCl, pH 8.0, 6 M urea, 30% glycerol, and 2% SDS including 65 mM DTT for 15 min followed by 270 mM iodoacetamide for 15 min. The second dimension was performed using 10% polyacrylamide gels. Protein spots were visualized by a modified colloidal Coomassie staining method for the preparative gels and by routine protocol silver staining for the analytical gels.26 Gels were scanned using a Bio-Rad GS-710 Calibrated Imaging Densitometer, and the silver-stained gels were analyzed for differential expression using the PDQuest 6.2.1 Software (Bio-Rad). One match set containing 31 gels was created using the Gaussian model for spot detection, and gels were normalized by scaling to total intensity in valid spots. Results from PDQuest were finally transferred to the Ludesi 2D Interpreter Software (http://www.ludesi.com) where the statistical analysis using Student's t test was carried out. Four independent cell lysate preparations from each cell line were analyzed, including the assessment of at least two gels for each lysate. Only results with a fold change of ≥2 (p < 0.05) in comparison to NBK were considered. Expression of a common reference gene/protein typically applied to sample comparisons, that is, β-actin, did not detectably differ among the cell lines (results not shown).

Protein Identification by In-Gel Digestion and MALDI-TOF Mass Spectrometry. Protein spots were excised manually, and in-gel digestion was carried out in 96-well plates using a MassPREP station (Waters, Milford, MA) as described.27 The tryptic fragments were analyzed by MALDI-TOF mass spectrometry (Voyager DE-PRO, Applied Biosystems, Foster City, CA) after mixing the samples at a 1:1 (v/v) ratio with a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile, 0.1% trifluoroacetic acid, and subsequent spotting onto a 100-well MALDI target plate.
For some weak spots, samples were concentrated using a compact disc-based workstation (Gyros AB, Uppsala, Sweden).28 Spectra were calibrated by internal calibration using the peaks resulting from tryptic autodigestion. The monoisotopic masses obtained were submitted to database searching (Mascot, http://www.matrixscience.com) using the entries for Homo sapiens in the Swiss-Prot database. Scores are reported as -10 × log10(P) where P is the absolute probability that the observed match is a random event. For the selected search conditions, scores >54 are significant (p < 0.05).

Microarray Analysis. Total RNA was prepared from 3 × 10^6 cells using the RNeasy Mini Protocol (Qiagen GmbH, Hilden, Germany) from two separate experiments of NBK, SVpgC2a, and SqCC/Y1, respectively. The quality of the RNA was verified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). cRNA was synthesized and hybridized to the Human Genome Focus chip (Affymetrix, Inc., Santa Clara, CA), which contains 8400 genes, according to standard Affymetrix protocols at the Bioinformatics and Expression Analysis Core Facility (Karolinska Institutet, Sweden). The data were analyzed using Gene Chip Operating Software (GCOS, http://www.affymetrix.com) for calculation of detection calls (presence, marginal presence, or absence of transcripts) and signal values as a relative measure of transcript abundance. GCOS was also applied for target intensity scaling of each array to an identical value and quantification of the signal log ratio, further converted to a fold change value. An average fold change value was calculated for all transcripts in SVpgC2a and SqCC/Y1 relative to NBK, respectively, with a minimum fold change of 2 (p ≤ 0.02) in each of four pairwise comparisons set as a threshold for significant differential expression (see Statistical Algorithms Description Document, Affymetrix, 2002). The quality of the data was also verified by correlation analysis in the R statistical environment using Bioconductor packages (http://www.bioconductor.org).

Gene Ontology and Network Analysis. All bioinformatics processing of differentially expressed proteins and transcripts involved comparison of SVpgC2a or SqCC/Y1 versus NBK. Proteins identified as differentially expressed were assessed using GOTM, a Web-based statistical hypergeometric test used to analyze for enrichment of GO categories, that is, biological process, molecular function, and cellular component (http://bioinfo.vanderbilt.edu/gotm). The program was set to apply the human genome as reference. Alternatively, all human protein sequences annotated in the Swiss-Prot Protein Knowledgebase were used (http://expasy.org/sprot/sp-docu.html, entries for all human chromosomes). Enriched GO categories were selected on the highest hierarchical level (≥level 7) at statistical significance p < 0.01. AffyAnnotator (http://www.bea.ki.se/jnlp), a Web-based tool for visualization and sorting of transcripts on the Human Genome Focus chip according to the GO, was then applied to select altered transcripts in the GO categories derived from the GOTM analysis. An average fold change value was calculated for all transcripts with a minimum fold change of 2 (p ≤ 0.02) in each of four pairwise comparisons set as a threshold for significant differential expression. IPA (IPA 4.0, Ingenuity Systems, http://www.ingenuity.com) was in turn applied to assess the transcripts selected through AffyAnnotator for presence in known molecular networks for each cell line. Interaction networks for direct relationships were generated from information in the Ingenuity Pathway Knowledge Base by separately uploading selected transcripts (termed "Focus genes") and corresponding expression values (mean fold change). Annotation of gene products to several GO categories was corrected prior to IPA analysis by removal of overlapping transcripts. Fisher's exact test was used for ranking and significance analysis of the focus genes in the networks, using a score of 12 as threshold, including association of the data set to the most significant biological functions and/or diseases (termed "Top functions"). Selection of key regulators in the networks was based on at least three interactions with altered transcripts. All types of protein-protein interactions in the IPA, not relevant for assessment of transcript data, were excluded.

Western Blot Analysis. The analysis was carried out on cell lysates as described.29 The membranes were incubated with mouse anti-Sp1, rabbit anti-Sp3, mouse anti-c-Myc, mouse anti-p16, or rabbit anti-HIF-1α (all diluted 1:200, Santa Cruz Biotechnology, Santa Cruz, CA). The membranes were thereafter incubated with a peroxidase-conjugated anti-mouse or anti-rabbit (1:1500; Dakopatts) antibody, and the bands were

Figure 1. Crude cell lysate of NBK analyzed by 2D gel electrophoresis. The localization of proteins found to be differentially expressed in the transformed cell lines is annotated with the abbreviations used in Table 1.
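As an illustration of the microarray threshold described under Microarray Analysis, the following Python sketch converts signal log ratios to signed fold changes and applies the criterion of at least a twofold change with p ≤ 0.02 in each of four pairwise comparisons. The conversion used here (2^SLR for non-negative ratios, -2^(-SLR) otherwise) is the convention commonly associated with Affymetrix software and is an assumption of this sketch. The function names, the example values, and the requirement that all comparisons point in the same direction are likewise illustrative assumptions and are not taken from GCOS.

```python
from statistics import mean
from typing import List, Optional, Tuple


def signed_fold_change(signal_log_ratio: float) -> float:
    """Convert a log2 ratio to a signed fold change (assumed convention)."""
    if signal_log_ratio >= 0:
        return 2.0 ** signal_log_ratio
    return -(2.0 ** (-signal_log_ratio))


def average_fold_change(
    comparisons: List[Tuple[float, float]],
    min_fold: float = 2.0,
    max_p: float = 0.02,
) -> Optional[float]:
    """comparisons holds one (signal log ratio, p value) pair per pairwise
    comparison. Returns the mean signed fold change if every comparison
    passes both cutoffs in the same direction, otherwise None."""
    folds = [signed_fold_change(slr) for slr, _ in comparisons]
    passes = all(
        abs(fold) >= min_fold and p <= max_p
        for fold, (_, p) in zip(folds, comparisons)
    )
    # Assumption of this sketch: all comparisons must agree in direction.
    same_direction = all(f > 0 for f in folds) or all(f < 0 for f in folds)
    if passes and same_direction:
        return mean(folds)
    return None


# Hypothetical values for three transcripts, four comparisons each.
up = average_fold_change([(1.0, 0.01), (2.0, 0.001), (1.0, 0.02), (2.0, 0.005)])
down = average_fold_change([(-1.0, 0.01)] * 4)
weak = average_fold_change([(1.0, 0.01), (0.5, 0.01), (1.0, 0.01), (1.0, 0.01)])
print(up, down, weak)
```

The third transcript is rejected because one of its comparisons falls below the twofold cutoff, even though the other three pass.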

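The enrichment test described under Gene Ontology and Network Analysis can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration of the statistical test only, not the GOTM implementation, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Probability of observing at least k category members among n
    submitted genes, when K of the N reference genes belong to the
    GO category (sampling without replacement)."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts: 20000 reference genes, 200 of them in the category,
# 19 submitted proteins, 4 of which fall in the category.
p_example = hypergeom_enrichment_p(N=20000, K=200, n=19, k=4)
print(f"p = {p_example:.2e}")
print("enriched at p < 0.01:", p_example < 0.01)
```

Exact integer arithmetic with `math.comb` keeps the sketch free of third-party dependencies. A statistics library would normally be preferred for large reference sets.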

Table 1. Differentially Expressed Proteins and Corresponding Transcript Changes in the Transformed Buccal Keratinocyte Lines SVpgC2a and SqCC/Y1 Relative to Normal Cells^a

Columns: protein identity (abbreviation)^b; gene abbreviation; accession number; MW [kDa]/pI; coverage^b; score^b; protein level fold change (SVpgC2a, SqCC/Y1); transcript level fold change (SVpgC2a, SqCC/Y1)

protein identity (abbreviation)^b | gene abbreviation | accession number | MW [kDa]/pI
Cytokeratin 5 (CK5) | KRT5 | P13647 | 62.5/8.1
Cytokeratin 6A (CK6A)^d | KRT6A | P02538 | 60.0/8.1
Cytokeratin 7 (CK7) | KRT7 | P08729 | 51.4/5.5
Cytokeratin 14 (CK14) | KRT14 | P02533 | 51.6/5.1
Cytokeratin 17 (CK17) | KRT17 | Q04695 | 48.1/5.0
Tropomyosin α-3 chain (TPM3) | TPM3 | P06753 | 32.8/4.7
Heat shock protein 27 (HSP27) | HSPB1 | P04792 | 22.8/6.0
Heat shock protein 60 (HSP60) | HSPD1 | P10809 | 61.1/5.7
Heat shock protein 70-1 (HSP70-1) | HSPA1A | P08107 | 70.1/5.5
Elongation factor (EF-Tu) | TUFM | P49411 | 49.5/7.3
Heterogeneous nuclear ribonucleoprotein A2/B1 (hnRNP A2/B1) | HNRPA2B1 | P22626 | 37.4/9.0
Heterogeneous nuclear ribonucleoprotein C1/C2^e (hnRNP C1/C2) | HNRPC | P07910 | 33.7/5.0
Stratifin (14-3-3 protein σ) (SFN) | SFN | P31947 | 27.8/4.7
Glutamate dehydrogenase (GDH) | GLUD1 | P00367 | 61.4/7.7
α-Enolase (ENO) | ENO | P06733 | 47.2/7.0
Fumarate hydratase (FH) | FH | P07954 | 54.6/8.8
Acetyl-CoA acetyltransferase (ACAT1) | ACAT1 | P24752 | 45.2/9.0
Peroxiredoxin 3 (PRDX3)^f | PRDX3 | P30048 | 27.7/7.7
Ferritin light chain (Ferritin L)^f | FTL | P02792 | 20.0/5.5

Coverage, score, and fold change values, in the order of the source columns:
fold change: -13
fold change: n.c.
fold change: -130
fold change: n.c.
fold change: -2.5 -630 -80 n.d. n.c. n.c.
fold change: -38 n.c. -2.2 n.d. -5.1 n.c.
fold change: n.c. n.c.
fold change: -2.4 n.c.
fold change: n.c.
fold change: n.c.
coverage: 17 24 25 32 23 28 25 21 27 33 19 24 23 53 50 62 22 13 20 16 20 13 26 26 20
score: 110 128 144 147 131 187 195 133 179 219 85 110 91 190 187 221 68 78 95 70 83 112 82 111 83
fold change: -21 -5.9 -130^c -16 -19 -53 -29 -2.1 -3.8 -4.3 -27 -2.9 -25 -45 +5.3 +4.5 n.c. n.c. n.c. +2.3 +2.6 (+1.5) (+1.4) +2.5
fold change: -2.9 -2.2 -3.7 n.c. (-1.9) (-1.5) (-1.9) n.c. (+1.3) (-1.9) n.c. (-1.8) -8.4 n.c. n.c. +2.2 +2.2 n.c. n.c. +3.0 +9.2 n.c. +2.2 (+1.3) +2.1
coverage: 24
score: 83
fold change: -5.4
fold change: -4.4
fold change: n.c.
fold change: n.c.
fold change: -3.4 (+1.7) +3.8 (+1.7) n.c. +2.0 +2.1 n.c.
fold change: -2.4 +2.1 +2.4 +2.0 (+1.3) +2.6 n.c. -2.5
fold change: n.c. n.c. n.c. n.c.
score: 175 81 88 75 152 80 89 152
fold change: -9.3 n.c. n.c. n.c.
coverage: 50 34 20 13 15 10 21 43
fold change: n.c. n.a. n.a.
fold change: +2.1 n.a. n.a.
fold change: -41^c

a The respective cell lines were cultured under a standardized serum-free condition as described in Materials and Methods, using NBK as a basis for initially identifying differently expressed proteins in SVpgC2a and SqCC/Y1, respectively. Proteins, including at least one isoform or isoelectric variant, expressed at levels ≥2-fold differently to NBK in at least one of the transformed lines, are listed. Results are expressed as fold change values relative to the normal cells. For proteins identified in more than one spot, the results are shown for all spots, irrespective of significance and fold change. Fold changes