Discovery of Melanotransferrin as a Serological Marker of Colorectal

Article pubs.acs.org/jpr

Discovery of Melanotransferrin as a Serological Marker of Colorectal Cancer by Secretome Analysis and Quantitative Proteomics

Jihye Shin,†,‡ Hye-Jung Kim,† Gamin Kim,†,§ Meiying Song,§ Se Joon Woo,∥ Seung-Taek Lee,‡ Hoguen Kim,§ and Cheolju Lee*,†,⊥

† Center for Theragnosis, Korea Institute of Science and Technology, Hwarangno 14-gil 5, Seongbuk, Seoul 136-791, Korea
‡ Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
§ Department of Pathology, Yonsei University College of Medicine, Seoul 120-752, Korea
∥ Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam, Gyeonggi-do 463-707, Korea
⊥ Department of Biomolecular Science, University of Science and Technology, Daejeon 305-333, Korea

Supporting Information

ABSTRACT: To discover serological colorectal cancer (CRC) markers, we analyzed the secretome of CRC cell lines to collect proteins with a higher potential to be secreted from tissues into the circulation. A total of 898 human proteins were identified, of which 62.2% were predicted to be released or shed from cells. The identified proteins were compared with tissue proteomes to find candidate proteins whose expression was elevated in tumor tissues compared with normal tissues, as revealed by (i) quantitative proteomic analysis based on cICAT and mTRAQ or (ii) data mining of immunohistochemical images compiled in the Human Protein Atlas database. By applying various stringent criteria, 11 candidate proteins were selected. Among these, we validated a significant increase (p = 0.0018) in the plasma level of melanotransferrin (TRFM) in CRC patients through Western blotting, using 130 plasma samples comprising 30 healthy controls, 80 CRC patients, and 20 patients with other diseases. Finally, we measured the level of TRFM by ELISA in 325 plasma samples that included 77 healthy controls and 228 CRC patients (34.6 ± 4.2 ng/mL and 67.0 ± 6.4 ng/mL, p < 0.0001) and obtained an area under the receiver operating characteristic curve of 0.723 (p < 0.0001) with 92.5% specificity, 48.2% sensitivity, and a 95.7% positive predictive value. Furthermore, unlike CEA and PAI-1, TRFM was up-regulated in the pathological stage I and II groups compared with the stage III and IV groups, which leads us to expect that TRFM can be used for early-stage diagnosis of CRC. In this study, we suggest TRFM as a potential serological marker for CRC and expect our discovery strategy to help identify highly cancer-specific and body-fluid-accessible biomarkers.

KEYWORDS: melanotransferrin, secretome, ICAT, mTRAQ, serological marker, colorectal cancer
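As a reading aid, the specificity, sensitivity, and positive predictive value quoted above are all ratios taken from a 2 × 2 table of test outcomes. The minimal sketch below shows that arithmetic. The counts are hypothetical: they were chosen only so that the ratios round to the quoted percentages, and the study's actual 2 × 2 table is not given in this section. The function name `diagnostic_metrics` is likewise an illustrative assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute basic diagnostic ratios from 2 x 2 table counts.

    tp/fn: diseased subjects testing positive/negative
    tn/fp: non-diseased subjects testing negative/positive
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of patients detected
        "specificity": tn / (tn + fp),  # fraction of controls correctly negative
        "ppv": tp / (tp + fp),          # fraction of positive calls that are patients
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = diagnostic_metrics(tp=110, fn=118, tn=62, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

A high positive predictive value alongside a modest sensitivity, as here, means that a positive call is rarely wrong but many patients are missed at the chosen cutoff.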



Special Issue: Proteomics of Human Diseases: Pathogenesis, Diagnosis, Prognosis, and Treatment
Received: July 29, 2014
Published: September 12, 2014
© 2014 American Chemical Society
Journal of Proteome Research
dx.doi.org/10.1021/pr500790f | J. Proteome Res. 2014, 13, 4919−4931

INTRODUCTION

Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. It was estimated that in 2014, 136 830 new CRC cases were diagnosed and over 50 000 deaths from CRC occurred in the United States.1 In Korea, CRC is the third leading cause of cancer mortality; in 2010, over 25 000 new cases of CRC were reported and over 7600 deaths from CRC occurred.2 The stage at which CRC is first diagnosed is important for survival. The 5-year survival rate for stage I is 93% but drops to 59% for stage III. Thus, detection of CRC at an early stage is critical for a successful clinical outcome, such as improved prognosis and survival rate.3

Screening methods to diagnose CRC are currently available, which include digital rectal examination, fecal occult blood test (gFOBT and iFOBT), sigmoidoscopy, and colonoscopy.4 More recently, computed tomographic (CT) colonography, double-contrast barium enema (DCBE), and a single-specimen guaiac FOBT have sometimes been employed. However, the diagnostic value of these methods is limited in terms of their sensitivity, costs, risks, and convenience.5−7 Serological biomarkers can be analyzed relatively easily and economically and therefore have the potential to greatly enhance screening acceptance.

In 1965, Dr. Joseph Gold found a substance in the blood of patients with colon cancer that was normally found in fetal tissues and named it carcinoembryonic antigen (CEA).8 Over the past several decades, enormous efforts have been made to characterize useful biomarkers for CRC.9 Unfortunately, most biomarkers, such as CEA and CA19-9, have limited specificity, sensitivity, or both.10 The sensitivity of preoperative CEA and preoperative CA19-9 ranges from 10 to 80% depending on the stage of disease, whereas specificity is as high as ∼90%. Especially in early-stage disease without metastasis, the positive rates of CRC patients with elevated preoperative CEA (preCEA) and preoperative CA19-9 (preCA19-9) levels were 5−33% and 5−11%, respectively.11 New screening markers with high positive rates as well as high specificity and sensitivity are still required to support the early diagnosis of CRC.

Proteomics technology platforms have been promising tools for the discovery of new cancer biomarkers. A highly desirable biomarker for cancer screening and monitoring would be one that can be measured in body fluid samples.12,13 Accordingly, serum and plasma have been the major targets of proteomics studies aimed at identifying potential cancer biomarkers.14,15 However, the progress of these studies has been hampered by the complex nature of serum/plasma samples and by the large dynamic range between the concentrations of different proteins.16 Because cancer serological markers are likely to be present in low amounts in blood samples, direct isolation of these markers from plasma and serum samples requires a labor-intensive process that involves the depletion of abundant proteins and extensive protein fractionation prior to mass spectrometric analysis.17,18 Alternatively, the secretome can be analyzed to identify circulating molecules.

The term “secretome” refers to proteins released by a cell, tissue, or organism through various mechanisms including classical secretion, nonclassical secretion, and secretion via exosomes.19 In particular, the cell secretome consists of proteins that are secreted and shed from the cell surface and intracellular proteins released into the extracellular growth medium due to cell lysis, apoptosis, and necrosis.20 The secretome encompasses diverse functions ranging from immune regulation to pathological processes, which include cancer invasion and metastasis.21,22 In addition, the limited complexity of the secretome in conditioned media compared with serum and plasma enhances the identification of less abundant proteins. In theory, a cancer serological marker candidate should be a secreted protein; however, not all secreted proteins are cancer-specific, which necessitates a well-defined analytical procedure to determine whether a secreted protein is cancer-specific or not.

Here we report a new strategy to discover serological markers. We reasoned that if some proteins show quantitative changes in cancer tissues compared with normal tissues and are present in the cell secretome, those proteins would stand a better chance of being detected in plasma/serum. To identify CRC-specific tissue proteins, we used two approaches. First, we looked into our experimental CRC tissue proteome database. Quantitative proteomic methods such as cICAT (cleavable Isotope-Coded Affinity Tag)23 and mTRAQ (multiple reaction monitoring Tags for Relative and Absolute Quantitation)24 have allowed the analysis of protein profiles in cancerous versus normal tissues.25 In the second approach, we searched the Human Protein Atlas (HPA) database,26 which contains the immunohistochemical (IHC) staining profiles of numerous proteins in a variety of cancerous and noncancerous tissues. These two approaches provide a higher chance of finding CRC-specific markers, and the cell secretome analysis provides a greater chance of finding serological markers. We chose 11 biomarker candidates from the CRC tissue proteome and cell secretome by using the aforementioned strategy and validated melanotransferrin (TRFM) as a potential serological CRC marker. Western blot and ELISA of plasma samples from cancer patients and various statistical analyses demonstrated the potential diagnostic value of TRFM. This study presents an accurate and robust strategic approach to find potential serological markers for CRC screening.
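The core of the selection logic described above can be sketched as a set operation: a protein is kept when it is found in the cell secretome and is also flagged as elevated in tumor tissue by either the quantitative proteomics data or the HPA immunohistochemistry data. This is a minimal sketch only. The function name `select_candidates` and the protein labels are hypothetical placeholders, and the additional stringent criteria applied in the study are not modeled.

```python
def select_candidates(secretome, elevated_by_ms, elevated_by_ihc):
    """Return secretome proteins flagged as tumor-elevated by either approach.

    secretome:       proteins identified in conditioned media
    elevated_by_ms:  proteins elevated in tumor tissue by cICAT/mTRAQ
    elevated_by_ihc: proteins elevated in tumor tissue by HPA IHC data
    """
    tumor_elevated = set(elevated_by_ms) | set(elevated_by_ihc)
    return sorted(set(secretome) & tumor_elevated)


# Hypothetical placeholder identifiers, for illustration only.
secretome = {"PROT_A", "PROT_B", "PROT_C", "PROT_D"}
elevated_by_ms = {"PROT_B", "PROT_E"}
elevated_by_ihc = {"PROT_C", "PROT_F"}

candidates = select_candidates(secretome, elevated_by_ms, elevated_by_ihc)
print(candidates)  # ['PROT_B', 'PROT_C']
```

PROT_E and PROT_F are dropped because they are tumor-elevated but absent from the secretome, while PROT_A and PROT_D are dropped because they are secreted but not tumor-elevated.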



METHODS

Preparation of Secretomes and Cell Extracts

The CRC cell lines HCT-8 (tumorigenic, noninvasive, nonmetastatic) and HCT-116 (tumorigenic, invasive, metastatic) were obtained from the American Type Culture Collection. Cells were cultured in RPMI1640 (Gibco, Rockville, MD) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin and streptomycin (Gibco) at 37 °C in a humidified 95% air, 5% CO2 incubator. Cells were grown to ∼70% confluency (∼1.6 × 10⁷ cells) in 150 mm culture dishes (Nunc, Naperville, IL). The cell monolayer was rinsed carefully with serum-free medium (SFM) three times at RT. Then, the cells were incubated in the SFM at 37 °C for 10 h (HCT-116) or 12 h (HCT-8). After incubation, the SFM from 20 plates was carefully collected; 2 mM PMSF and 1 mM EDTA were added as protease inhibitors. Floating cells and cellular debris were removed by centrifugation (400g, 10 min, 4 °C), followed by sterile filtration (pore size: 0.22 μm, Millipore, MA). The conditioned medium was concentrated through ultrafiltration using “Amicon Ultra-15” centrifugal filter devices (Millipore, MA). Secreted proteins were precipitated with acetone at −20 °C for 1 h and then dissolved in a buffer consisting of 8 M urea, 75 mM NaCl, and 50 mM Tris (pH 8.2). Meanwhile, whole cellular proteins were isolated. Briefly, cells on the dishes were washed twice with SFM, harvested, and then lysed in the above-mentioned urea buffer through sonication on ice. The lysate was centrifuged (14 000g, 10 min, 4 °C), and the supernatants were collected. Protein concentration was determined by a standard Bradford protein assay (Bio-Rad, Richmond, CA). All protein samples were stored at −80 °C until use.

Liquid Chromatography and Tandem Mass Spectrometry (LC−MS/MS)

The protein sample (100 μg) was reduced with 10 mM DTT (Sigma, St. Louis, MO) at 36 °C for 25 min and alkylated with 14 mM iodoacetamide (Sigma) at 25 °C for 30 min. The sample was diluted 5-fold to decrease the urea concentration to less than 1.6 M, and 1 mM CaCl2 was added. The protein mixture was digested with sequencing-grade modified trypsin (Promega, Madison, WI) at 37 °C for 16 h. The ratio of enzyme to protein was 1:200. Tryptic digests were used directly or further separated based on isoelectric point by using an OFFGEL fractionator (Agilent Technology, Santa Clara, CA) prior to LC−MS/MS. The separated peptides were collected into 14 fractions. Peptide samples were desalted with C18 SPE cartridges (Waters, Milford, MA), dried in vacuo, and kept at −80 °C until use.

The peptide samples were reconstituted in 0.4% acetic acid. An aliquot (∼1 μg) was then injected into a reversed-phase Magic C18aq column (15 cm × 75 μm, 200 Å, 5U) on an Agilent 1200 HPLC system (Agilent Technology). The column was pre-equilibrated with 95% buffer A (0.1% formic acid in water) and 5% buffer B (0.1% formic acid in acetonitrile). The peptides were eluted at a flow rate of 0.4 μL/min across the analytical column with a linear gradient of 5−10% buffer B for 5 min, 10−40% buffer B for 40 min (for the OFFGEL-fractionated peptides) or 80 min (for unfractionated peptides), 40−80% buffer B for 5 min, 80% buffer B for 10 min, and 80−5% buffer B for 5 min. The HPLC system was coupled to an LTQ-XL mass spectrometer (Thermo Scientific, San Jose, CA). The ESI spray voltage was set to 1.9 kV, the capillary voltage to 30 V, and the temperature of the heated capillary to 250 °C. The MS survey scan covered 300 to 2000 m/z and was followed by three data-dependent MS/MS scans with the following options: isolation width, 1.5 m/z; normalized collision energy, 25%; and dynamic exclusion duration, 180 s. All data were acquired using Xcalibur software v2.0.7. In the case of the unfractionated peptides, we conducted triplicate runs to verify nearly identical retention times and to confirm a high level of reproducibility for the LTQ instrument. In contrast, we conducted a single run for each fraction of the OFFGEL-fractionated peptides.

Analysis of Mass Spectrometric Data

The acquired MS/MS spectra were searched using SEQUEST (TurboSequest version 27, revision 12) against a composite database composed of the human International Protein Index (IPI) database (about 80 000 entries, version 3.57, European Bioinformatics Institute, ftp://ftp.ebi.ac.uk/pub/databases/IPI) and the bovine IPI database (about 30 000 entries, version 3.42). Two trypsin-missed cleavages were allowed, and the peptide mass tolerances for MS/MS and MS were set to ±0.5 and ±3 Da, respectively. Other options used for SEQUEST searches were fixed modification of carbamidomethylation at cysteine (+57.02 Da) and variable modification of oxidation at methionine (+15.99 Da). Peptide assignment and validation were performed using the Trans-Proteomic Pipeline (TPP, version 4.0, http://www.proteomecenter.org). The SEQUEST search output was used as an input for TPP analysis. Peptides with a PeptideProphet probability greater than 0.5 and proteins with a ProteinProphet probability greater than 0.95 were retained. The raw data were deposited in the PeptideAtlas database (accession ID, PASS00565).

Bioinformatics for Data Mining

The identified proteins were analyzed using ProteinCenter (Proxeon Bioinformatics, http://www.cbs.dtu.dk/services). We submitted several proteins in one FASTA-format file to each program. We used the Signal Peptide Predictor program (SignalP, version 3.0).27 SignalP uses a hidden Markov model algorithm to predict the existence and location of signal peptide cleavage sites for given amino acid sequences. A protein is considered classically secreted if it receives a signal peptide probability ≥0.9. To predict nonclassical protein secretion, we used the SecretomeP program (version 2.0).28 SecretomeP uses a neural network that combines six protein characteristics to determine if a protein is nonclassically secreted. A protein is considered nonclassically secreted if it receives an NN-score ≥0.5. In addition, we used the TMHMM program (version 2.0) to predict transmembrane helices in integral membrane proteins.29 Ingenuity Pathway Analysis (IPA, Ingenuity Systems, http://www.ingenuity.com) was used to determine the subcellular localization. The Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.7, SAIC-Frederick, http://david.abcc.ncifcrf.gov) was used to determine the molecular function. The Human Protein Atlas database (HPA, version 7.0, http://www.proteinatlas.org), which contains protein expression profiles based on immunohistochemistry for a large number of human tissues, cancers, and cell lines, was used to compare protein expression between CRC and normal tissues. All secreted proteins were further analyzed using the Plasma Proteome Database (PPD, http://www.plasmaproteomedatabase.org) containing 3778 entries as of 2011. (The PPD site updated in 2014 contains 10 546 entries.)

Human Plasmas

For verification of marker candidates through Western blot, plasma samples from 80 CRC patients, 10 colorectal adenoma patients, 10 patients with polyps, and 30 healthy controls (including 10 healthy-colon controls ascertained through colonoscopy) were collected. The number of CRC patients was 20 for each pathological stage (I−IV). To validate markers through ELISA, plasma samples from 228 CRC patients (68 from stage I; 68, stage II; 65, stage III; 27, stage IV), 10 colorectal adenoma patients, 10 patients with polyps, and 77 healthy controls (including 10 healthy-colon controls ascertained through colonoscopy) were used. The ELISA samples incorporated all samples used for Western blot analysis. Detailed sample information is provided in Table 1. For all blood preparations, 3 mL of blood was collected in an EDTA tube, and the plasma was prepared as suggested by the HUPO (Human Proteome Organisation) Plasma Proteome Project.17 The plasma samples were collected at Yonsei Severance Hospital (Seoul, Korea) and Bundang Seoul National University Hospital (Gyeonggi-do, Korea). Authorization to use the samples for research purposes was obtained from the Institutional Review Board of each hospital. Some of the blood specimens were obtained from the Liver Cancer Specimen Bank of the National Research Resource Bank Program of the Korea Science and Engineering Foundation of the Ministry of Education, Science and Technology. In verification analysis by Western blot, 500 ng/mL of E. coli β-galactosidase protein was spiked into each plasma sample, and then the top six abundant proteins (serum albumin, immunoglobulin G, immunoglobulin A, transferrin, haptoglobin, and antitrypsin) were depleted using a MARS column (Agilent Technology).18 For this, the β-galactosidase-spiked plasmas (40 μL) were diluted 1:5 with a proprietary “Buffer A” and loaded onto the MARS affinity column on an Agilent 1100 series HPLC system (Agilent Technology). The unbound fraction was concentrated through ultrafiltration using a Microcon filter (3 kDa cutoff; Millipore).

Table 1. Information on Plasma Samples Used for Verification and Validation

Verification set

group               # of samples   ages           male   female
healthy controls    30             63.5 ± 7.5     15     15
  general healthy   20             63.7 ± 8.2     10     10
  healthy-colonᵇ    10             63.2 ± 6.3     5      5
polyps              10             56.7 ± 7.5     3      7
adenoma             10             58.5 ± 11.8    6      4
CRC patients        80             62.8 ± 8.7     43     37
  stage I           20             64.5 ± 7.8     10     10
  stage II          20             65.3 ± 8.3     10     10
  stage III         20             63.7 ± 8.7     10     10
  stage IV          20             57.7 ± 8.7     13     7

Validation set (columns AC to others give the primary siteᵃ)

group               # of samples   ages           male   female   AC   DC   TC   SC   RJ   R    others
healthy controls    77             68.2 ± 7.0     33     44
  general healthy   67             69.0 ± 6.8     28     39
  healthy-colonᵇ    10             63.2 ± 6.3     5      5
polyps              9              56.7 ± 7.9     3      6
adenoma             11             57.9 ± 11.3    6      5
CRC patients        228            62.6 ± 8.7     137    90       45   9    6    68   17   76   7
  stage I           68             63.9 ± 7.4     38     30       14   4    2    11   2    33   2
  stage II          68             64.2 ± 8.4     41     26       17   2    2    21   7    16   3
  stage III         65             62.4 ± 8.4     41     24       12   2    2    21   6    21   1
  stage IV          27             56.4 ± 10.8    17     10       2    1    0    15   2    6    1

ᵃPrimary site in CRC (AC, ascending colon; DC, descending colon; TC, transverse colon; SC, sigmoid colon; RJ, rectosigmoid junction; R, rectum; Others, cecum, hepatic flexure, splenic flexure, and annular constrictive). ᵇColon-healthy controls ascertained through colonoscopy.

Figure 1. Schematic workflow.

Western Blot

Secreted and cell lysate proteins were dissolved in a sample buffer containing 8 M urea, 75 mM NaCl, and 50 mM Tris (pH 8.2). Equal amounts of protein samples (cell lysates 2 μg, cell secretome 10 μg) were separated on SDS-PAGE gels (8 or 12% depending on the molecular size of the target proteins) and transferred to PVDF membranes (GE Healthcare, Uppsala, Sweden). For plasma proteins, immunodepleted samples (10 μL each) were fractionated on multiple mini-slab gels. After SDS-PAGE, all gels were arranged row by row on a single large PVDF membrane (20 × 15 cm), and proteins were electrotransferred. All membranes were blocked with 5% skim milk in TBST buffer (25 mM Tris, 190 mM NaCl, and 0.05% Tween 20, pH 7.5) for 1 h at RT and were incubated overnight at 4 °C with primary antibodies. After three 10 min washes with TBST, membranes were incubated with the corresponding IgG-HRP secondary antibodies at a dilution of 1:3000 for 1 h at RT, washed, and visualized with ECLplus as the chemiluminescent substrate (GE Healthcare). The primary antibodies used were rabbit monoclonal anti-TENA antibody, rabbit monoclonal anti-SERPH antibody, mouse monoclonal anti-ASNS antibody (Epitomics, Burlingame, CA), mouse polyclonal anti-PLOD3 antibody, rabbit polyclonal anti-RPESP antibody (Abcam, Cambridge, MA), mouse monoclonal anti-FBLN4 antibody, mouse monoclonal anti-IPO5 antibody, mouse monoclonal anti-PCBP2 antibody, mouse monoclonal anti-NAP1L1 antibody, mouse monoclonal anti-TRFM antibody (Abnova, Taiwan, China), and mouse monoclonal anti-PTK7 antibody (R&D Systems, Minneapolis, MN). Anti-mouse IgG-HRP secondary antibody (Millipore, Temecula, CA) and anti-rabbit IgG-HRP secondary antibody (Cell Signaling, Beverly, MA) were used as secondary antibodies.

Figure 2. Analysis of conditioned media harvested from HCT-8 and HCT-116 cell lines. (A) Proteins (10 μg) in the conditioned media (CM) and cell extracts (CE) were analyzed via Western blotting using an anti-α-tubulin antibody. (B) Number of identified proteins in the cell secretome (TPP ProteinProphet scores ≥0.95). The 368 proteins at the intersection represent ambiguous identification due to sequence homology between human and bovine proteins. These proteins were tentatively considered human proteins. (C) Subcellular locations of proteins identified in conditioned serum-free media of HCT-8 and HCT-116 colorectal cancer cell lines, as suggested by IPA. (D) Identified proteins are classified as in C according to subcellular locations after each protein is weighted by spectral count. (E) Classification of proteins based on their molecular function as suggested by DAVID.

ELISA

Plasma levels of TRFM, PAI-1, and CEA were measured using the following commercial ELISA kits according to the manufacturers’ instructions: human TRFM ELISA kit, human PAI-1 ELISA kit (Cusabio Biotech, Wuhan, China), and human CEA ELISA kit (USCN Life Science, Wuhan, China). Absorbance was read at 450 nm in a microplate reader. Absolute quantities of the corresponding antigen in the plasma samples were calculated from an eight-point standard curve (0−10 ng/mL TRFM, 0−200 ng/mL PAI-1, and 0−5000 pg/mL CEA). The lowest detectable concentrations were estimated to be 0.251 ng/mL for TRFM, 0.981 ng/mL for PAI-1, and 0.335 ng/mL for CEA. The interassay and intraassay coefficients of variation were below 12% for all assays.
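To illustrate how a concentration can be read off an eight-point standard curve, the minimal sketch below uses piecewise-linear interpolation between neighboring standards. This is a simplification chosen for brevity and is not claimed to be the curve-fitting method used in the study or recommended by the kit manufacturers. The standard concentrations, absorbance readings, dilution factor, and function name are all hypothetical.

```python
from bisect import bisect_left


def read_off_concentration(absorbance, std_abs, std_conc):
    """Interpolate a concentration from a monotonic standard curve.

    std_abs and std_conc are equal-length lists sorted by increasing
    absorbance; readings outside the curve are rejected, not extrapolated.
    """
    if not std_abs[0] <= absorbance <= std_abs[-1]:
        raise ValueError("absorbance is outside the range of the standard curve")
    i = bisect_left(std_abs, absorbance)
    if std_abs[i] == absorbance:
        return std_conc[i]
    x0, x1 = std_abs[i - 1], std_abs[i]
    y0, y1 = std_conc[i - 1], std_conc[i]
    return y0 + (absorbance - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical eight-point curve spanning 0-10 ng/mL (illustration only).
STD_CONC = [0.0, 0.156, 0.312, 0.625, 1.25, 2.5, 5.0, 10.0]  # ng/mL
STD_ABS = [0.05, 0.10, 0.15, 0.25, 0.45, 0.85, 1.60, 2.60]   # A450
DILUTION_FACTOR = 10  # hypothetical sample dilution before the assay

well_conc = read_off_concentration(0.65, STD_ABS, STD_CONC)
plasma_conc = well_conc * DILUTION_FACTOR
print(f"in-well: {well_conc:.3f} ng/mL, plasma: {plasma_conc:.2f} ng/mL")
```

Rejecting out-of-range readings rather than extrapolating reflects the usual practice of re-assaying such samples at a different dilution.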


Table 2. Predicted Secretion Pathway of Proteins Identified in Conditioned Media from CRC Cell Lines

no. of identified proteins

cell lines   human proteins   annotated proteinsᵃ   classical secretionᵇ   nonclassical secretionᶜ   membrane proteinᵈ   othersᵉ   percentage of predicted secreted proteins
total        896              880                   290                    239                       18                  333       62.2
HCT-8        725              717                   242                    188                       12                  275       61.6
HCT-116      419              409                   96                     123                       8                   182       55.5

ᵃProteins annotated in UniProt database. ᵇProteins predicted by the SignalP program to be secreted via the classical secretion pathway (SignalP probability ≥0.90). ᶜProteins predicted to be secreted by the nonclassical secretion pathway using SignalP and SecretomeP (SignalP probability