Phosphoproteome Characterization of Human Colorectal Cancer

Jul 29, 2016. Corresponding authors: *T.W.: Phone: +86-20-85225960. Fax: +86-20-85222616. *Q.-Y.H.: Phone/Fax: +86-20-85227039.

Article

Phosphoproteome Characterization of Human Colorectal Cancer SW620 Cell-Derived Exosomes and New Phosphosite Discovery for C-HPP Jiahui Guo, Yizhi Cui, Ziqi Yan, Yanzhang Luo, Wanling Zhang, Suyuan Deng, Shengquan Tang, Gong Zhang, Qing-Yu He, and Tong Wang J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b00391 • Publication Date (Web): 29 Jul 2016 Downloaded from http://pubs.acs.org on July 29, 2016


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Phosphoproteome Characterization of Human Colorectal Cancer SW620 Cell-Derived Exosomes and New Phosphosite Discovery for C-HPP

Jiahui Guo†, Yizhi Cui†, Ziqi Yan, Yanzhang Luo, Wanling Zhang, Suyuan Deng, Shengquan Tang, Gong Zhang, Qing-Yu He*, Tong Wang*

Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, 601 Huangpu Avenue West, Guangzhou 510632, China


ACS Paragon Plus Environment


ABSTRACT

Identification of all phosphorylation forms of known proteins is a major goal of the chromosome-centric human proteome project (C-HPP). Recent studies have found that certain phosphoproteins can be encapsulated in exosomes and function as key regulators in the tumor microenvironment. However, no deep coverage phosphoproteome of human exosomes has been reported to date, which makes the exosome a potential source for new phosphosite discovery. In this study, we performed highly optimized MS analyses on the exosomal and cellular proteins isolated from human colorectal cancer SW620 cells. With stringent data quality control, 313 phosphoproteins with 1091 phosphosites were confidently identified from the SW620 exosome, from which 202 new phosphosites were detected. Exosomal phosphoproteins were significantly enriched in the 11q12.1-13.5 region of chromosome 11, and had a remarkably high level of tyrosine-phosphorylated proteins (6.4%), which were functionally relevant to ephrin signaling pathway-directed cytoskeleton remodeling. In conclusion, we here report the first high coverage phosphoproteome of human cell-secreted exosomes, which leads to the identification of new phosphosites for C-HPP. Our findings provide insights into the exosomal phosphoprotein systems that help to understand the signaling language being delivered by exosomes in cell-cell communications. The mass spectrometry proteomics data have been deposited to the ProteomeXchange consortium with the data set identifier PXD004079, and to the iProX database (accession number: IPX00076800).

KEYWORDS: Exosome, phosphoproteome, new phosphosites, C-HPP, signaling pathway


INTRODUCTION

Understanding the flow of genetic information requires a definitive answer to how many coding genes indeed exist in the human genome and are expressed as proteins, which is the primary scientific question to be addressed by the chromosome-centric human proteome project (C-HPP) and the human proteome project (HPP) [1, 2]. Accordingly, the C-HPP community has been collaborating closely to provide protein evidence (PE) for the missing proteins, i.e., those coding genes that have not been confirmed at the protein level, specifically referring to the PE2-4 genes recorded by the neXtProt database [1, 3]. The number of missing proteins has been dropping swiftly, from 33% of human protein coding genes in 2012 to 18% in 2015 [4-7]. Currently missing proteins are considered to be highly specific to tissues [8-11], developmental stages [12] or compartments, which requires special enrichment strategies on plasma [13], detergent-insoluble proteins [14], transcription factors [15, 16] and membrane fractions [16, 17]. Furthermore, a certain degree of human genome mis-annotation may account for the remaining missing proteins, which requires chromosome-centric resolution for the complete answer.

Other than finding missing proteins, the characterization of protein post-translational modifications (PTMs) and single amino-acid variants (SAAVs) is an equally important goal of C-HPP [6, 18-21]. By integrating these achievements, an all-inclusive signaling network at the protein level will be established; this will fundamentally promote biology-/disease-driven investigation as C-HPP delivers a more accurate genome annotation [1]. Currently, C-HPP recommends focusing on three major classes of PTMs, i.e., phosphorylation, glycosylation and acetylation, for each gene product, including splice isoforms and variants [2]. Among all PTMs, phosphorylation represents one of the most important modifications and dominates signal transduction. It has been assumed that up to 50% of all human proteins are phosphorylated during their life cycles [22, 23]. Phosphorylation and dephosphorylation of proteins are key biological mechanisms to switch enzyme activities on and off in almost all known canonical pathways [24].

As a functionally extracellular compartment, the exosome is a type of secretory particle that is 40-150 nm in diameter [25]. In the tumor microenvironment, exosomes have been recognized as key transporters of nucleic acids and proteins, particularly phosphoproteins [26]. For example, melanoma cells can secrete exosomes that transport the MET protein to induce an inflammatory response of bone marrow cells and prepare the pre-metastatic niche [27]. Such a role of exosomes has been linked to an integrin-mediated homing mechanism in lung, liver and brain metastasis of cancer [28].

Which proteins can be encapsulated in exosomes has attracted wide interest, which warrants proteome level investigations. It is now known that exosomes typically carry ~2000 proteins. For example, an early study from Mathivanan et al. reported that the exosomal proteome of colorectal cancer (CRC) LIM1215 cells consisted of ~400 proteins with the LTQ-Orbitrap system [29], similar to a recent analysis on T cell exosomes analyzed with a 2-DE and MALDI-TOF platform [30]. In mesenchymal stem cells, Anderson et al. reported that exosomes might carry 1927 proteins, but the false discovery rate (FDR) control for protein identification was not clearly described [31].

Recent studies have found that cancer cell-derived exosomes have abundant receptor tyrosine kinases (RTKs), such as phosphorylated EGFR and HER2 [32]. This makes it more intriguing to unveil the phosphoproteome of exosomes; however, to our knowledge, no such report using a high coverage proteomics strategy has appeared to date. This suggests that the exosome could be a useful compartment for finding new phosphoprotein forms that have been missed before.

Therefore, we performed the first high coverage exosome phosphoproteome analysis with CRC SW620 cells as an example. Over 300 confident phosphoproteins with 202 new phosphosites were identified from exosomes. In addition to chromosome-centric distribution features, we found a surprisingly high level of tyrosine (Y)-phosphorylated proteins in exosomes that were mechanistically relevant to the exosome life cycle and functions. We believe that reporting more exosome phosphoproteomes can help to address numerous interesting biology-/disease-driven questions, for example, whether the protein phosphorylation patterns in exosomes are the same as those in cells, and whether phosphoproteins in exosomes are functionally directed and why.

MATERIALS AND METHODS

Cell culture

Human colorectal adenocarcinoma SW620 cells were purchased from the American Type Culture Collection (ATCC, Rockville, MD, USA), and cultured in complete Dulbecco’s modified Eagle’s medium (DMEM, ThermoFisher Scientific, Guangzhou, China) supplemented with 10% fetal bovine serum (FBS, ThermoFisher Scientific), 2 mM L-glutamine, 1 mM sodium pyruvate, 1% penicillin/streptomycin (pen/strep) and 10 µg/mL ciprofloxacin [12, 14, 33, 34]. Mycoplasma contamination was periodically monitored by using a PCR detection kit purchased from ExCell Bio, Shanghai, China. Short tandem repeat (STR) loci analysis was performed regularly to verify the purity of the SW620 cell culture.

Exosome preparation

Exosomes were purified according to Mathivanan et al. with minor modifications [29]. SW620 cells were cultured to 90% confluence, followed by in-flask PBS and serum-free DMEM washes. Cells were then cultured in 10 mL serum-free DMEM for 24 h to allow exosome secretion, and supernatants were harvested for exosome isolation. Floating cells and debris were removed by sequential centrifugation at 216 × g for 10 min and 15,000 × g for 30 min. The supernatant was next passed through a 0.22 µm filter, and concentrated to ~500 µL using an Amicon® Ultra 30K device (Merck Millipore, Guangzhou, China). After two Dulbecco's Phosphate-Buffered Saline (DPBS; ThermoFisher) washes, the concentrated supernatant was ultracentrifuged at 120,000 × g for 70 min to pellet exosomes. The pellet surface was washed with DPBS prior to subsequent analyses.

Particle size determination

The exosome pellet was resuspended in DPBS and subjected to particle size and concentration determination by nanoparticle tracking analysis (NTA) using the NanoSight NS300 analyzer (Malvern, Shanghai, China). Exosomes were diluted 20- to 200-fold prior to loading into the NanoSight so that the concentration fell within the instrumental linear range of detection, approximately 1×10⁸ to 1×10⁹ particles/mL. Video records (60 s) were taken for all events and analyzed by the NTA software (version 2.3, Malvern); the parameters included: detection threshold, 6; and camera level, 10. The particle size calculation of NTA is based on particle Brownian motion and the Stokes-Einstein equation.
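The Stokes-Einstein relation mentioned above, d = k_B·T / (3·π·η·D), can be illustrated with a minimal Python sketch. The temperature (25 °C) and viscosity (that of water, about 0.89 mPa·s) are assumptions chosen for illustration; they are not settings reported in this study, and the function names are hypothetical rather than part of the NTA software.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def diffusion_coefficient(diameter_nm, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Diffusion coefficient (m^2/s) of a sphere from the Stokes-Einstein relation.

    Defaults assume water at 25 degrees C; these are illustrative assumptions.
    """
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    diameter_m = diameter_nm * 1e-9
    return BOLTZMANN_J_PER_K * temperature_k / (3.0 * math.pi * viscosity_pa_s * diameter_m)


def hydrodynamic_diameter_nm(diffusion_m2_s, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Hydrodynamic diameter (nm) from a measured diffusion coefficient (m^2/s)."""
    if diffusion_m2_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    diameter_m = BOLTZMANN_J_PER_K * temperature_k / (3.0 * math.pi * viscosity_pa_s * diffusion_m2_s)
    return diameter_m * 1e9


if __name__ == "__main__":
    # Round trip for a particle of 83 nm, the modal SW620 exosome size reported in Results.
    d_coeff = diffusion_coefficient(83.0)
    print(d_coeff, hydrodynamic_diameter_nm(d_coeff))
```

In practice the diffusion coefficient is estimated from the tracked mean squared displacement of each particle; the sketch covers only the final conversion step.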

Protein extraction and immunoblotting analysis

Cellular and exosomal proteins were extracted separately by using the SDS lysis buffer (Beyotime, Nanjing, China), supplemented with 1 mM phenylmethanesulfonyl fluoride (PMSF), cOmplete™ Mini Protease Inhibitor Cocktail and phosphatase inhibitor (Roche, Shanghai, China) [12]. Equal amounts of proteins were subjected to immunoblotting analyses as we previously described [12, 14, 35]. Primary antibodies (Abs) included anti-CD9 mAb (1:1000; Abcam, Shanghai, China), anti-CD63 mAb (1:400; Abcam), anti-TSG101 mAb (1:200; Santa Cruz, Shanghai, China), and anti-HSP90B mAb (1:5000; Proteintech, Wuhan, China). HRP-conjugated secondary Abs included anti-mouse IgG (1:5000; Tianjin Sungene Biotech Co., Ltd., Tianjin, China) and anti-rabbit IgG (1:5000; Cell Signaling Technology, Inc., Shanghai, China).

Protein digestion and phosphopeptide enrichment

In-solution protein digestion was performed as we described previously [12]. Briefly, the cell/exosome lysate was subjected to reduction (50 mM DTT), alkylation (100 mM IAA) and in-solution tryptic digestion in a filter-aided sample preparation (FASP) manner. Peptides were collected and dried by speed vacuum. For phosphoproteome profiling, phosphopeptides were enriched by a TiO2-based approach. In detail, 12 mg of Titansphere™ TiO particles (GL Sciences, Tokyo, Japan; 5 µm in diameter) were conditioned and equilibrated with Buffer A (80% ACN, 5% TFA) and Buffer B (Buffer A with 25% lactic acid). Peptides were mixed with the Titansphere TiO particles in Buffer B for 15 min to allow sufficient binding. The peptide-loaded particles were subjected to 9 consecutive washes: Buffer A (3 times), Washing Buffer 1 (50% ACN, 0.5% TFA and 200 mM NaCl; 3 times) and Washing Buffer 2 (50% ACN, 0.1% TFA; 3 times). The bound peptides were eluted with 5% NH3·H2O for 15 min, and the eluate was loaded into a stage column tip with 1 layer of Empore™ C8 (3M, Shanghai, China) and 2 mg Durashell RP (Agela Technologies, Tianjin, China). Peptides were then fractionated by step-gradient elution with 0%, 2%, 5%, 8%, 10% and 40% ACN in 5% NH3·H2O. Peptides were desalted using Mono Tip™ C18 pipette tips (GL Sciences) prior to MS analyses.

LC-MS/MS

Peptides were analyzed by a TripleTOF® 5600 MS (5600 MS; AB SCIEX, Framingham, MA, USA), using the exact parameters as we previously described [12]. In brief, peptides were loaded onto a C18 reverse phase column (15 cm length × 75 µm ID, 3 µm C18, CMP Scientific, NY, USA). With an Eksigent nano-HPLC 425 System (AB SCIEX), Buffer A (2% ACN in 0.1% formic acid) and Buffer B (98% ACN in 0.1% formic acid) were used to generate a gradient of 5-50% ACN for eluting phosphopeptides (75 min at a flow rate of 300 nL/min). Peptides were then analyzed with the 5600 MS. The ESI spray voltage was 2.3 kV; the interface heater temperature was 120 °C; the scan range was 350−1500 m/z; and the information-dependent acquisition (IDA) mass tolerance was set to 50 mDa with resolution >30k FWHM. The maximum number of candidate ions per cycle was 40. Precursor selection was restricted to charge states 2−4 and intensities >200 cps. We applied dynamic exclusion, with the settings of co-occurrence = 1 and duration = 20 s. All MS raw data are deposited in the iProX database (accession number: IPX00076800) and ProteomeXchange (accession number: PXD004079).

Data searches


Mascot server version 2.5.1 (Matrix Science, London, UK) was used for database searches against the Swiss-Prot HUMAN fasta database (downloaded on Jan 4th, 2016, 20193 entries). Search parameters included: fragment ion mass tolerance, 0.05 Da; parent ion tolerance, 15 ppm; fixed modification, carbamidomethyl of cysteine; and variable modifications, Gln->pyro-Glu of the N-terminus, oxidation of methionine and acetyl of the N-terminus. When searching phosphorylated samples, phosphorylation of Ser, Thr and Tyr was specified as a variable modification as well. The resulting DAT file was imported into the Scaffold software version 4.5.0 (Proteome Software Inc., Portland, OR, USA) to control the peptide and protein level FDR < 0.01. The exported mzIdentML file was further analyzed by the Scaffold PTM software version 3.0 (Proteome Software) to determine the confidence of phosphosites using the Ascore algorithm [36]. Phosphosites with a localization probability > 99% were considered confident and subjected to further analyses.
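The final filtering step described above (retain only phosphosites with a localization probability > 99%) can be sketched in Python as follows. This is not the Scaffold PTM workflow itself: the record layout, the class name `PhosphositeCall` and the placeholder protein identifiers are hypothetical, and the probabilities are assumed to have already been exported as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhosphositeCall:
    """One localized phosphosite call (hypothetical record layout)."""
    protein: str                     # protein identifier (placeholder values below)
    residue: str                     # 'S', 'T' or 'Y'
    position: int                    # 1-based position in the protein sequence
    localization_probability: float  # fraction between 0 and 1


def confident_sites(calls, min_probability=0.99):
    """Keep S/T/Y sites whose localization probability is strictly above the cutoff."""
    return [
        call for call in calls
        if call.residue in ("S", "T", "Y")
        and call.localization_probability > min_probability
    ]


if __name__ == "__main__":
    example = [
        PhosphositeCall("PROT_A", "S", 10, 0.995),
        PhosphositeCall("PROT_A", "T", 12, 0.990),  # not strictly above 0.99, dropped
        PhosphositeCall("PROT_B", "Y", 7, 1.000),
    ]
    for call in confident_sites(example):
        print(call)
```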

Compilation of phosphoprotein reference database

To generate a reference database with a uniform data format for new phosphosite determination, we merged phosphosite information from three major experimental site-specific phosphorylation databases: dbPTM [37], PhosphoSitePlus® [38] and SubPhosDB [23]. The latest releases were downloaded from their respective home pages in Mar, 2016. We mapped the phosphosites to the UniProtKB database (canonical+isoform, downloaded on Mar 31st, 2016, 42143 entries). Meanwhile, an accuracy check was performed to test whether the residues recorded in the phosphorylation databases were consistent with the UniProtKB sequences. The matched phosphosites were incorporated into UniProtKB, establishing a new UniProtKB-based database with collective phosphosite information, referred to as the multiple phosphoprotein database (MPD).
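The mapping and accuracy check described above can be illustrated with a short Python sketch. It assumes each source database has already been parsed into (accession, 1-based position, residue) tuples; the function name `build_reference`, the source labels and the toy sequences are hypothetical and do not reflect the actual file formats of dbPTM, PhosphoSitePlus or SubPhosDB.

```python
def build_reference(sequences, sources):
    """Merge phosphosite records from several sources into one reference.

    sequences: dict mapping accession -> protein sequence (UniProtKB-style).
    sources:   dict mapping source name -> iterable of (accession, position, residue),
               with 1-based positions.
    Returns (reference, unmapped), where reference maps (accession, position) to the
    set of sources that recorded the site, and unmapped lists the rejected records.
    """
    reference = {}
    unmapped = []
    for source_name, records in sources.items():
        for accession, position, residue in records:
            sequence = sequences.get(accession)
            consistent = (
                sequence is not None
                and 1 <= position <= len(sequence)
                and sequence[position - 1] == residue  # accuracy check on the residue
            )
            if consistent:
                reference.setdefault((accession, position), set()).add(source_name)
            else:
                unmapped.append((source_name, accession, position, residue))
    return reference, unmapped


if __name__ == "__main__":
    toy_sequences = {"ACC1": "MSTAY", "ACC2": "MKSPT"}  # toy data, not real proteins
    toy_sources = {
        "db_one": [("ACC1", 2, "S"), ("ACC2", 3, "T")],  # second record: residue mismatch
        "db_two": [("ACC1", 2, "S"), ("ACC3", 1, "S")],  # second record: unknown accession
    }
    merged, rejected = build_reference(toy_sequences, toy_sources)
    print(merged)
    print(rejected)
```

Records that fail the check are kept in a separate list so that the fraction of successfully mapped entries per source can be reported.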

Determination of new phosphosites

We employed the BLAST+ software version 2.3.0 (National Center for Biotechnology Information, Bethesda, MD, USA) to perform the peptide alignment against the MPD that we created above. If an experimentally detected phosphosite had been recorded by MPD with a perfect match, it was deemed an exclusively old phosphosite. If a phosphosite had no perfect match, or only a dubious match (a match obtained when allowing 1 residue mismatch), it was considered a new phosphosite candidate. For those new phosphosite candidates with dubious matches, we performed manual spectrum inspection and pLabel software [39] analysis to rule out possible isobaric PTM substitutions, as we previously described [12]. In addition, per the new C-HPP criteria, only phosphopeptides (residue length ≥ 9) that were exclusively unique to a certain protein were reported and functionally analyzed in this study.
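The three-way decision described above (perfect match, dubious match with one mismatch, or no match) can be sketched as follows. This is a simplified stand-in, not the BLAST+ procedure used in the study: it assumes an ungapped sliding-window comparison, and the function name `classify_site` and the toy sequence are hypothetical.

```python
def classify_site(peptide, site_offset, sequences, known_sites):
    """Classify a detected phosphosite as 'known', 'dubious' or 'new'.

    peptide:     detected phosphopeptide sequence.
    site_offset: 0-based index of the phosphorylated residue within the peptide.
    sequences:   dict mapping accession -> reference protein sequence.
    known_sites: dict mapping accession -> set of recorded 1-based phosphosite positions.
    """
    verdict = "new"
    length = len(peptide)
    for accession, sequence in sequences.items():
        recorded = known_sites.get(accession, set())
        for start in range(len(sequence) - length + 1):
            window = sequence[start:start + length]
            mismatches = sum(1 for p, w in zip(peptide, window) if p != w)
            if mismatches > 1:
                continue
            position = start + site_offset + 1  # 1-based position in the protein
            if position not in recorded:
                continue
            if mismatches == 0:
                return "known"   # perfect match to a recorded phosphosite
            verdict = "dubious"  # one mismatch: needs manual spectrum inspection
    return verdict


if __name__ == "__main__":
    toy_sequences = {"P1": "MKTAYSPEKLLDR"}  # toy sequence, not a real protein
    toy_known = {"P1": {6}}                  # recorded site at S6
    print(classify_site("TAYSPEKLL", 3, toy_sequences, toy_known))
    print(classify_site("TAYSPEKLV", 3, toy_sequences, toy_known))
    print(classify_site("TAYSPEKLL", 0, toy_sequences, toy_known))
```

A "dubious" verdict only flags the candidate for the manual inspection step; it does not by itself decide whether the site is new.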

Chromosome enrichment analysis

The chromosomal locations of genes were obtained from NCBI (http://www.ncbi.nlm.nih.gov/gene, downloaded on Mar 14th, 2016, 60063 entries). The chromosomal distributions of exosomal proteins and cellular phosphoproteins were statistically compared with the total human protein background (Swiss-Prot, downloaded on Mar 31st, 2016, 20199 entries) by using Fisher’s exact test. For comparison, we downloaded 5 exosomal proteome datasets of CRC from Vesiclepedia [40] (http://www.microvesicles.org/, ID 20, 21, 347, 348 and 349). Fisher’s exact test was performed with the MATLAB 2015a software (MathWorks, Natick, Massachusetts). Statistical difference was accepted when P < 0.05.
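A per-chromosome enrichment test of this kind can be sketched with a two-sided Fisher’s exact test on a 2×2 table. The study used MATLAB; the pure-Python function below is only an illustrative equivalent, and both the table layout and the counts in the usage example are assumptions made for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Assumed layout for chromosome enrichment:
      a = proteins of interest on the chromosome, b = proteins of interest elsewhere,
      c = remaining background on the chromosome, d = remaining background elsewhere.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probabilities = {
        x: comb(row1, x) * comb(row2, col1 - x) / total_tables
        for x in range(lowest, highest + 1)
    }
    observed = probabilities[a]
    # Sum the tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(p for p in probabilities.values() if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(fisher_exact_two_sided(30, 283, 1200, 18600))
```

An odds ratio above or below 1 would then indicate whether a significant chromosome is over- or under-represented relative to the background.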

Gene ontology analysis

We employed ClueGO+CluePedia (version 2.2.5) [41], a plug-in of Cytoscape (version 3.2.1) [42], to perform the gene ontology (GO) analysis on exosomal phosphoproteins. The parameters included: Ontologies, molecular function (MF; date: 12.04.2016); Ontologies, biological process (BP; date: 12.04.2016); Pathways, WikiPathways (date: 14.04.2016); evidence code, All_Experimental_(EXP, IDA, IPI, IMP, IGI, IEP); GO term fusion, applied; pathways P value cutoff, P ≤ 0.01 for GO and P ≤ 0.05 for WikiPathways analysis; and P value correction algorithm, Benjamini-Hochberg.
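The Benjamini-Hochberg correction named in the parameters above can be sketched in a few lines of Python. This is the textbook step-up procedure, not ClueGO's internal code; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    count = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(count), key=lambda index: pvalues[index])
    adjusted = [0.0] * count
    running_minimum = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_minimum = min(running_minimum, pvalues[index] * count / rank)
        adjusted[index] = running_minimum
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

A term would then be reported when its adjusted P value falls at or below the chosen cutoff (0.01 for GO terms and 0.05 for WikiPathways in the settings listed above).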

Ingenuity pathway analysis (IPA)

The core analysis of IPA (www.ingenuity.com, QIAGEN, Shanghai, China) was performed as we described previously [33, 35, 43]. Specifically, top canonical pathways, top diseases and bio functions, and regulator effects of the exosome phosphoproteome were statistically computed. The P value was measured by using Fisher’s exact test provided by IPA, according to the likelihood of association of a set of genes with a pathway in Global Functional Analysis (GFA) and Global Canonical Pathways (GCP). P < 0.001 was considered statistically significant. The consistency score of a group of phosphoproteins with regulators and diseases & functions was calculated with the Regulator Effects algorithm provided in IPA.

PhosphoPath analysis

The interaction and pathway enrichment of Y-phosphorylated proteins were analyzed with the PhosphoPath plug-in [44] run in the Cytoscape environment. The imported data sources included PhosphoSitePlus for kinase-substrate interactions, BIOGRID for protein-protein interactions, and WikiPathways for pathway information. The whole human proteome was used as background. Pathways with q value < 0.01 were considered significant. One-step missing node imputation was allowed if either of the following criteria was met: 1) the imputed node(s) added connections; or 2) the imputed node(s) were kinases of a phosphosite in the dataset.

RESULTS

Phosphoproteome characterization of SW620 cell-derived exosomes

To determine the size distribution and purity of SW620 cell-derived exosomes (SW620 exosomes), we performed NanoSight NTA analyses (Fig. 1A), followed by an immunoblotting assay on exosomal biomarkers (Fig. 1B). We found that the size of SW620 exosomes was distributed around 83 nm in diameter, and the majority of exosomes were smaller than 200 nm in diameter (Fig. 1A). Such a size distribution is comparable with that reported by Muller et al. [45] for the high purity isolation of human plasma exosomes, as determined by NanoSight. With our experimental procedure, we could obtain approximately 5.8×10⁶ exosomes or 50 ng exosomal proteins per million SW620 cells. Per immunoblotting analyses, exosomes were positive for the exosomal markers CD9, CD63, HSP90 and TSG101 (Fig. 1B), and the abundances of CD9 and TSG101 were remarkably higher in exosomes than in cells (Fig. 1B). These results indicated that we had successfully isolated SW620 exosomes in high purity.

We next performed the high coverage phosphoproteome MS analysis on exosomal proteins in comparison with cellular phosphoproteins. Detailed information on the proteome/phosphoproteome analysis, including spectrum numbers and peptide and protein identifications, is summarized in Supplementary Table S1. Specifically, we found that in the cellular protein fraction, 6382 out of 10046 phosphosites had a confident localization probability > 99% (Fig. 1C). In SW620 exosomes, 1091 out of 1690 phosphosites could be confidently detected (Fig. 1D). A total of 337 phosphoproteins were detected in SW620 exosomes, and 331 (98.2%) of them had at least 2 exclusively unique peptides (Fig. 1E). As a background, 1896 total proteins were identified in the exosomal fraction, and 1801 (95.0%) of them had ≥ 2 unique peptides; in cells, 1709 out of 1726 (99.0%) phosphoproteins were detected with ≥ 2 unique peptides (Fig. 1E). We emphasize that we adopted the criterion of protein level FDR < 1% required by C-HPP for all protein MS identifications. Interestingly, 142 phosphoproteins, nearly half of the exosomal phosphoproteome, were identified only in the exosomal fraction, but not in the cellular fraction (Fig. 1F). Such a difference suggested that the exosome was a unique source for finding new phosphorylation forms of proteins.

New phosphosite discovery and quality control


To determine whether a phosphosite is novel, we need to perform comprehensive similarity comparisons with publicly available major phosphorylation databases. Here, we designed a workflow with multiple quality control steps to avoid false discovery (Fig. 2A).

First, we compiled the multiple phosphoprotein database (MPD; see Materials and Methods for details), which contained phosphosite information from 3 major phosphoprotein databases, to give an inclusive reference with a uniform data format. This greatly facilitated our subsequent BLAST analyses. As a result, 99.37%, 98.63% and 93.45% of the entries from dbPTM, PhosphoSitePlus and SubPhosDB, respectively, were successfully mapped to UniProtKB entries (Supplementary Fig. S1). The unmapped entries were largely due to common database mapping problems, including absent accession numbers and inconsistent protein sequences. The resulting MPD contained 42,143 protein entries and 184,804 phosphosite entries.

Through the workflow shown in Figure 2A, out of the 337 MS-detected exosome phosphoproteins, we confirmed 313 confident identifications [residue length ≥ 9, and with exclusively unique peptide(s)], which were used for all subsequent analyses. Meanwhile, we found 3 dubious matches; however, they had no possible isobaric residue PTM substitutions, and thus were classified as new phosphosites. As a result, 202 and 270 new phosphosites were found in the SW620 exosome and cell fractions, respectively (Fig. 2B; Supplementary Table S2). The overlap of these phosphosites with those recorded in the 3 phosphoprotein databases is shown in Supplementary Fig. S2. Interestingly, only 3 new phosphosites were identified in both the exosome and cell fractions (Fig. 2B). We found that 31.95% (100/313) of SW620 exosome phosphoproteins had new phosphosites, while in cells, only 10.52% (177/1506) had new phosphosites (Fig. 2C). Sample MS/MS spectra for phosphopeptides with new phosphosites are provided in Supplementary Fig. S3; all phosphopeptide spectra can be found in the Scaffold PTM files uploaded to the ProteomeXchange database (identifier: PXD004079).

We noted that the phosphorylation patterns of exosome phosphoproteins were considerably different from those in cells. In exosome phosphoproteins, 20.2% of phosphorylation occurred on threonine (T) residues (Fig. 2D); however, only 8.6% of phosphosites were T-phosphorylated in cellular phosphoproteins (Fig. 2E). Surprisingly, 6.4% of exosomal phosphosites were Y-phosphorylated (Fig. 2D), far more than the expected 0.6% of Y-phosphosites detected in the cellular phosphoproteins (Fig. 2E). We then asked whether this unusual Y-phosphorylation in exosomes was due to the biased detection of very low abundance pY-phosphopeptides. We analyzed the abundance distribution of all phosphopeptides and compared the distributions with the Kolmogorov-Smirnov (KS) test. We found that the abundance distributions of pS- and pT-phosphopeptides had no significant difference, with median total ion current (TIC) of 2.6×10⁴ and 2.8×10⁴, respectively; their distributions were significantly different from that of the pY-phosphopeptides (median TIC = 4.2×10⁴) (Supplementary Fig. S4). Our results indicated that exosomal pY-phosphopeptides with higher abundances tended to be preferentially detected by MS. In other words, more low abundance pY-phosphopeptides likely remained undetected in exosomes.
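The distribution comparison above rests on the two-sample KS statistic, the largest vertical distance between two empirical cumulative distribution functions. The sketch below computes only that statistic (no P value) in plain Python; the function name and the toy TIC values are hypothetical and are not data from this study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    # Toy TIC values, for illustration only.
    tic_group_1 = [1.9e4, 2.4e4, 2.6e4, 3.1e4, 3.3e4]
    tic_group_2 = [3.0e4, 3.9e4, 4.2e4, 5.0e4, 6.1e4]
    print(ks_statistic(tic_group_1, tic_group_2))
```

A statistic near 0 means the two abundance distributions are nearly indistinguishable, while a value near 1 means they barely overlap; a significance test would additionally take the sample sizes into account.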


In addition, we identified 60 phosphoproteins and 213 total proteins from the SW620 exosomes, which had not been detected by other exosomal proteomic analyses as recorded by the Exocarta database (Fig. 2F).

We next aligned our peptide identifications, using BLAST, against the Human Phosphoproteome 2015-09 dataset of PeptideAtlas, which had played a significant role in the 2016-02 enhancements of neXtProt. Interestingly, among our experimentally detected phosphopeptides with new phosphosites, 114 out of the 201 exosome phosphopeptides and 187 out of the 216 cellular phosphopeptides had already been deposited in PeptideAtlas. Although PeptideAtlas does not record phosphosite information, this comparison confirms that the PeptideAtlas Human Phosphoproteome is a very useful peptide resource for C-HPP [46].

Chromosome 11 enrichment of SW620 exosome phosphoproteins

We found that the phosphorylation pattern and phosphoprotein composition of SW620 exosomes differed from those of cells. We next addressed whether the SW620 exosome phosphoproteins and total proteins differed in their chromosome-centric distributions (Fig. 3).

By analyzing total proteins of SW620 exosomes, we found that they were significantly enriched on chromosomes 13 and 17 (P = 0.019 and 0.008, respectively), while marginally under-represented on chromosome 4 (P = 0.048) (Fig. 3A). As a cross-validation, we performed the same analysis on 5 exosomal proteome datasets of different CRC cells obtained from Vesiclepedia. Four of these datasets showed significant enrichment on chromosome 17, which was consistent with our data, although certain differences were noted, such as the chromosome 12 enrichment (Fig. 3A). Cellular phosphoproteins of SW620 cells shared a similar pattern: they were significantly enriched on chromosome 17, but significantly under-represented on chromosome 4 (Fig. 3A).

Regarding SW620 exosome phosphoproteins, we found chromosome-centric distributions very different from those above. They were marginally enriched on chromosome 11 (P = 0.037), while significantly under-represented on chromosome 4 (Fig. 3A). We then visualized the gene distribution with the Circos program, showing the chromosomal location of proteins and phosphoproteins detected at cell and exosome levels on chromosome 11 (Fig. 3B). We noted that 16 out of 30 (53.3%) SW620 exosome phosphoproteins were located in a narrow region from q12.1 to q13.5 of chromosome 11. Such a distribution feature tended to exist in cellular phosphoproteins and exosomal total proteins as well. Statistical analyses favored the above 11q12.1-13.5 enrichments (P