Subcellular Proteome Landscape of Human Embryonic Stem Cells

Sep 11, 2018 - Department of Molecular Systems Biology at Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, ...

Article

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00407 • Publication Date (Web): 11 Sep 2018



Journal of Proteome Research

Subcellular Proteome Landscape of Human Embryonic Stem Cells Revealed Missing Membrane Proteins

Mehari Muuz Weldemariam1,2,3, Chia-Li Han4*, Faezeh Shekari5, Reta Birhanu Kitata1, Ching-Yu Chuang6, Wei-Ting Hsu7, Hung-Chih Kuo7, Wai-Kok Choong8, Ting-Yi Sung8, Fu-Chu He9,10, Maxey Ching Ming Chung11, Ghasem Hosseini Salekdeh5,12,13*, Yu-Ju Chen1,2*

1 Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
2 Department of Chemistry, National Taiwan University, Taipei 112, Taiwan
3 Chemical Biology and Molecular Biophysics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan
4 Master Program in Clinical Pharmacogenomics and Pharmacoproteomics, College of Pharmacy, Taipei Medical University, Taipei 110, Taiwan
5 Department of Molecular Systems Biology at Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
6 Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
7 Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
8 Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
9 Institutes of Biomedical Sciences, Fudan University, Shanghai, China
10 State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing, China
11 Department of Biochemistry, Yong Loo Lin School of Medicine, NUS, Singapore
12 Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
13 Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education, and Extension Organization, Karaj, Iran

Corresponding Authors:

Yu-Ju Chen Institute of Chemistry, Academia Sinica, Taipei, Taiwan Tel: +886-2-2789-8660 Fax: +886-2-2783-1237 E-mail: [email protected]

Ghasem Hosseini Salekdeh Department of Molecular Systems Biology, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran Tel: +98-21-22339936 Fax: +98-21-22339958 E-mail: [email protected]

Chia-Li Han Master Program in Clinical Pharmacogenomics and Pharmacoproteomics, College of Pharmacy, Taipei Medical University, Taipei, Taiwan Tel: +886-2-2736-1661 Fax: +886-2-2739-0671 E-mail: [email protected]


Abstract

Human embryonic stem cells (hESCs) have the capacity for self-renewal and multi-lineage differentiation, which are of clinical importance for regenerative medicine. Despite significant progress in hESC research, the complete hESC proteome atlas, especially the surface protein composition, awaits delineation. According to the latest release of the neXtProt database (2018-01-17, 19,658 PE1,2,3,4 human proteins), membrane proteins represent the major category (1,047, 48%) among all 2,186 missing proteins (MPs). We conducted a deep subcellular proteomics analysis of hESCs to identify the nuclear, cytoplasmic and membrane proteins in hESCs and to mine missing membrane proteins in the very early cell status. To our knowledge, our study achieved the largest dataset, with confident identification of 11,970 unique proteins (1% FDR at peptide, protein and PSM levels), including the most comprehensive description of 6,138 annotated membrane proteins in hESCs. Following the HPP guidelines, we identified 26 gold (neXtProt PE2,3,4 MPs) and 87 silver (potential MP candidates with a single unique peptide detected) MPs, of which 69 were membrane proteins, and the expression of 21 gold MPs was further verified either by multiple reaction monitoring mass spectrometry (MRM) or by matching synthetic peptides in the PeptideAtlas database. Functional analysis of the MPs revealed their potential roles in pluripotency-related pathways and in lineage- and tissue-specific differentiation processes. Our proteome map of hESCs may provide a rich resource not only for the identification of MPs in the human proteome but also for the investigation of self-renewal and differentiation of hESCs. All mass spectrometry data were deposited in ProteomeXchange via jPOST with identifier PXD009840.

Keywords: Human embryonic stem cells, Missing proteins, Membrane proteome, Subcellular fractionation


Introduction

Human embryonic stem cells (hESCs), a fundamental cell type, have a unique capacity for self-renewal and pluripotency. Under specific stimulation, hESCs can differentiate into various types of cells, tissues and organs with specialized functions,1 which is of clinical interest for regenerative medicine to restore normal cellular functions. Currently, transcription factors including c-Myc, Oct4, Klf4, Nanog and Sox2 are well-known regulators of pluripotency.1-4 Many transcription factors regulate the reprogramming of human cells into different lineages.5 Several intracellular multi-organelle signaling pathways are also involved in the regulation of hESC pluripotency. For example, FGF-2 stimulates short-term reorganization and translocation of β-catenin into the nucleus, which then modulates the expression of pluripotency genes.6 NOTCH, a cell surface receptor, is cleaved upon activation by γ-secretase to generate the NOTCH intracellular domain, which then translocates into the nucleus to regulate the expression of target genes associated with hESC differentiation.7 The cleavage of epithelial cell adhesion molecule (EpCAM) by tumor necrosis factor alpha converting enzyme releases its intracellular domain (EpICD), which translocates into the nucleus and stimulates the expression of c-Myc.8 Despite significant progress in hESC research, the complete hESC proteome atlas, especially the composition of cell surface factors responsible for maintaining the pluripotent nature of hESCs, is still under-explored.9 According to the most recent release of the neXtProt database (2018-01-17), 19,658 proteins (PE1,2,3,4 level) are assigned to the human proteome, of which 2,186 proteins are still missing10 with no or inadequate evidence of translation (PE2: 1,660, PE3: 452, PE4: 74). These proteins await high-stringency mass spectrometry (MS) evidence to confirm their existence in the human proteome.11,12 One factor contributing to the lack of protein-level evidence for missing proteins is the physicochemical characteristics of special classes of proteins. Among these missing proteins, as many as 1,047 (48%) are annotated membrane proteins. The membrane sub-proteome is still under-explored due to its low abundance, high hydrophobicity, and poor protease accessibility,13-16 which cause low MS detectability. Because of the potential roles of membrane proteins in regulating hESC pluripotency, several groups have applied proteomic analysis to study the membrane proteome of hESCs and/or their differentiated cells.17-29 Using various proteomics strategies, these studies reported the identification of 237 to 2,910 membrane proteins, peripheral membrane proteins or proteins with signal peptides in hESCs. Prokhorova et al. analyzed SILAC-labeled hESCs and identified 811 membrane proteins.30 McQuade et al. attempted to improve the membrane proteome coverage by using IPG-IEF peptide fractionation and identified 1,279 membrane proteins in hESCs.31 Gu et al. purified cell surface proteins by biotin labeling, identifying 1,560 cell surface proteins in hESCs and germ cells.32 Gerwe et al. analyzed the membrane proteomic signatures of karyotypically normal and abnormal human embryonic stem cell lines and identified 775 membrane proteins and 720 transmembrane proteins.24 Melo-Braga et al. applied quantitative comparison of the membrane proteome, phosphoproteome, and sialiome of hESCs and neural stem cells and identified a total of 5,105 proteins, of which 57% contained transmembrane domains or signal peptides.27 Harkness et al. identified 444 transmembrane or membrane-associated proteins, including 15 CD antigens and a number of surface marker molecules not previously observed in hESCs at the proteome level.33 In a large-scale profiling study, Sarkar et al. reported that subcellular proteomic analysis of hESCs revealed 893, 2,475, and 1,185 proteins in the nuclear, cytosolic, and membrane fractions, respectively.26 Recently, Ghazizadeh and colleagues used a shotgun proteomics approach to search for surface markers of the ISL1+ cardiac progenitor cells derived from hESCs.34 ALCAM (CD166) was identified as a specific surface marker that could be used to enrich these progenitor cells. They further used this purification strategy to characterize the human ISL1+ cardiac progenitors and demonstrated their potential for cell therapy in a myocardial infarction model.34 Although optimized technology provides large-scale protein identification in hESCs, surface membrane protein markers specific to different cell types or lineages remain to be discovered.18 Delineating the membrane subproteome may reveal cell surface markers for cell sorting or isolation, as well as proteins involved in molecular exchange and the regulation of signal transduction. Transcriptome analysis alone, however, cannot fulfill this need for membrane protein identification in hESCs.


Various reasons may account for the difficulty of mass spectrometry-based identification of MPs. Membrane proteins have special physical and chemical properties, and targeting them has been reported to be an efficient strategy for identifying MPs.15 In addition to these intrinsic properties,35 some proteins have tissue- or cell-specific expression and cannot be found ubiquitously in every biological sample. Taken together, delineation of the proteome map of hESCs may provide a rich resource not only for the investigation of self-renewal and differentiation of hESCs but also for the identification of MPs in the human proteome. In this study, we aimed to conduct a deep proteome analysis of hESCs based on subcellular separation, extensive peptide fractionation, and high-precision MS analysis. To our knowledge, this study presents the largest dataset, with confident identification of a total of 11,970 proteins, including the most comprehensive description of 6,138 annotated membrane proteins from hESCs to date. This dataset allowed the identification of 26 gold (neXtProt PE2,3,4 MP) and 87 silver (potential MP candidates with a single unique peptide detected) MPs according to the Human Proteome Project (HPP) Data Interpretation Guidelines.10,36,37 Functional analysis of the MPs revealed their potential roles in GPCR-related pathways that regulate self-renewal and lineage-specific differentiation of hESCs in the very early cell status.

Experimental Section

Materials and Reagents

Triethylammonium bicarbonate (TEABC), 2-[4-(2-hydroxyethyl)-piperazin-1-yl]ethanesulfonic acid (HEPES), potassium chloride (KCl), magnesium chloride (MgCl2), sodium hydroxide (NaOH), hydrochloric acid (HCl), tris(2-carboxyethyl)phosphine hydrochloride (TCEP), methyl methanethiosulfonate (MMTS), trifluoroacetic acid (TFA), HPLC-grade ACN, phosphate buffered saline (PBS), ammonium persulfate (APS) and sodium carbonate (Na2CO3) were purchased from Sigma-Aldrich. Protease inhibitor cocktail tablets were obtained from Roche Diagnostics (Mannheim, Germany). Urea was purchased from USB Corporation (Cleveland, OH). Sodium dodecyl sulfate (SDS), sucrose, ethylenediaminetetraacetic acid (EDTA), and tetramethylethylenediamine (TEMED) were purchased from Merck (Darmstadt, Germany). Formic acid (FA) was purchased from Riedel de Haen (Seelze, Germany). C18-AQ beads (5 μm) were purchased from Dr. Maisch (Ammerbuch, Germany). Monomeric acrylamide/bisacrylamide solution (40%, 29:1) was purchased from Bio-Rad (Hercules, CA, USA). De-ionized water was obtained from a Milli-Q Ultrapure Water Purification System (Millipore, Billerica, MA).

Culture of Human Embryonic Stem Cells (H9 Cell Line)

The hESC line H9 (46, XX; WiCell Research Institute Inc., Madison, WI, USA) was grown on mitotically inactivated mouse embryonic fibroblasts in DMEM/F12 medium with 20% Knockout Serum Replacement, 1% non-essential amino acids, 2 mM L-glutamine (all from Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA), 100 mM beta-mercaptoethanol (Sigma, St. Louis, MO, USA) and basic fibroblast growth factor (5 ng/ml; Sigma). The hESCs were subcultured every 7 days.

Subcellular Fractionation, Protein Extraction and Digestion

The H9 cells were subjected to subcellular fractionation based on our previously reported protocol.38 Briefly, H9 cells were washed with PBS, collected, suspended in hypotonic buffer (10 mM HEPES, pH 7.5, 1.5 mM MgCl2, 10 mM KCl) with protease inhibitor cocktail (100:1, sample/protease inhibitor, v/v) for 15 min on ice, and homogenized with a Dounce homogenizer for 50 passes. The cell lysate was centrifuged at 1000 x g for 10 min at 4°C to obtain the nuclear pellet (nuclear fraction). The postnuclear supernatant was adjusted to contain 0.25 M sucrose and centrifuged at 13,000 rpm for 1 h at 4°C to separate the membrane protein pellet from the supernatant (cytoplasmic fraction). The membrane pellet was washed with 1 ml of ice-cold 0.1 M Na2CO3 (pH 11.5) and then centrifuged to obtain the crude membrane fraction. Protein digestion was performed using the previously described gel-assisted digestion method.38 The nuclear, cytoplasmic and membrane protein pellets were dried and separately resuspended in denaturing buffer (2% SDS, 6 M urea, 5 mM EDTA, 0.1 M TEABC, pH 8), followed by sonication at 4°C for 15 min. Proteins were reduced with 5 mM TCEP at 37°C for 30 min and then alkylated with 2 mM MMTS at room temperature for 30 min. The protein solution was polymerized into a gel by mixing with acrylamide/bisacrylamide solution (40%, 29:1), 10% APS, and TEMED solution in a microcentrifuge tube in a volume ratio of 14:0.7:0.3. After several washes with 50% ACN containing 25 mM TEABC, trypsin (trypsin:protein = 1:10) was added to the protein gels for overnight incubation at 37°C. Peptides were sequentially extracted with 25 mM TEABC, 0.1% TFA in water, 50% ACN in 0.1% TFA, and 100% ACN, and then vacuum dried in a SpeedVac (Thermo Savant SC210A, Holbrook, NY).

Peptide Fractionation

The membrane peptides were fractionated by Hp-RP StageTip39 and Hp-RP-HPLC40 using previously described protocols. In brief, for the Hp-RP StageTip method, a C8 membrane was inserted into a Gilson 200-µL pipet tip as a frit. C18-AQ beads (5 mg, 5 µm) were packed into the StageTip, then washed and conditioned by centrifugation at 1,500 x g for 2 min. Membrane peptides (20 µg) were redissolved in 200 mM NH4COOH (pH 10) and transferred into the StageTip, followed by centrifugation at 1,500 x g for 2 min to elute the peptides in 7 fractions with increasing ACN concentration. For Hp-RP-HPLC fractionation, 200 µg of membrane peptides was resuspended in 200 µl of buffer A (5 mM ammonium formate, pH 10, 2% ACN) and loaded onto a Zorbax 300 Extend-C18 HPLC column (4.5 mm x 250 mm, 5 µm) using a Waters 2650 HPLC system (monitored at 280 nm), and separated with a 120-min gradient at a flow rate of 0.5 ml/min. Buffer B was made of 90% ACN and 5 M ammonium formate. The running gradient started with 100% buffer A from 0 to 9 min, followed by linear gradients to 6% B (9 to 13 min), 28.5% B (13 to 63 min), 34% B (63 to 68.5 min) and 60% B (68.5 to 81.5 min). Buffer B was kept at 60% from 81.5 to 90 min, followed by a linear gradient to 90% B (90 to 92 min), and the column was equilibrated to 100% buffer A from 100 to 120 min. Subsequently, 96 fractions were collected and concatenated into 12 fractions, followed by desalting on StageTips packed with SDB-XC membrane and C18 beads (10 mg, 5 µm). Using the same Hp-RP-HPLC method, 300 µg each of cytoplasmic and nuclear peptides were separately fractionated to collect 24 fractions for LC-MS/MS analysis.
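The concatenation of 96 collected fractions into 12 (membrane) or 24 (cytoplasmic and nuclear) pooled fractions can be illustrated with a short sketch. The text does not state which collected fractions were combined; the interleaved pattern below (every 12th or 24th fraction pooled together) is a common practice and is assumed here purely for illustration. The function name `concatenate_fractions` is hypothetical.

```python
def concatenate_fractions(n_collected: int, n_pooled: int) -> dict[int, list[int]]:
    """Assign 1-based collected fractions to pooled fractions.

    Assumed scheme (not stated in the text): interleaved pooling, so that
    early-, mid- and late-eluting fractions end up in the same pool.
    """
    if n_collected % n_pooled != 0:
        raise ValueError("n_collected must be a multiple of n_pooled")
    pools: dict[int, list[int]] = {p: [] for p in range(1, n_pooled + 1)}
    for fraction in range(1, n_collected + 1):
        pool = (fraction - 1) % n_pooled + 1
        pools[pool].append(fraction)
    return pools


membrane_pools = concatenate_fractions(96, 12)  # 8 collected fractions per pool
soluble_pools = concatenate_fractions(96, 24)   # 4 collected fractions per pool
print(membrane_pools[1])  # [1, 13, 25, 37, 49, 61, 73, 85]
```

Interleaving keeps the peptide load and hydrophobicity range of each pooled fraction roughly comparable, which is one reason this scheme is often preferred over pooling adjacent fractions.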


LC-MS/MS Analysis

All fractions were analyzed using LTQ Orbitrap Fusion or Fusion™ Lumos™ Tribrid™ mass spectrometers (Thermo Fisher Scientific, Bremen, Germany) equipped with a Dionex Ultimate 3000 nanoLC system (Thermo Scientific). The Hp-RP fractions were dried, re-suspended in loading buffer (0.1% FA in deionized H2O) and loaded onto a 25-cm x 75-µm C18 column (Acclaim PepMap® RSLC, Thermo Scientific™) equipped with a NanoSpray interface. The peptides were eluted using a 150-min nonlinear gradient as follows: 0-1% B (0.1% FA in ACN) for 1 min, 1-2% B for 1 min, 2-10% B for 18 min, 10-25% B for 75 min, 25-35% B for 35 min, 35-90% B for 1 min, and 90% B for 5 min. The 15 most intense ions were sequentially isolated with an isolation window of 1.4 Da for higher-energy collision dissociation and detected in the Orbitrap, with previously selected ions dynamically excluded for 60 s. All mass spectrometry data of this study were deposited in ProteomeXchange41 via jPOST42 with identifier PXD009840 (Username: [email protected], Password: missingprotein2018).
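The multi-step elution program above can be written down as a list of segments, which makes it easy to look up the nominal solvent composition at any time point. The sketch below assumes linear interpolation within each listed segment; the names `SEGMENTS` and `percent_b` are illustrative. Note that the segments as listed sum to 136 min, so the remainder of the stated 150-min method is not itemized in the text and is not modeled here.

```python
# (start %B, end %B, duration in min), transcribed from the gradient description.
SEGMENTS = [
    (0, 1, 1),
    (1, 2, 1),
    (2, 10, 18),
    (10, 25, 75),
    (25, 35, 35),
    (35, 90, 1),
    (90, 90, 5),
]


def percent_b(t_min: float, segments=SEGMENTS) -> float:
    """Return nominal %B at time t_min, assuming linear ramps within segments."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in segments:
        if t_min <= elapsed + duration:
            fraction_done = (t_min - elapsed) / duration
            return start + fraction_done * (end - start)
        elapsed += duration
    # Beyond the listed segments the composition is unknown; hold the last value.
    return float(segments[-1][1])


total_listed = sum(duration for _, _, duration in SEGMENTS)
print(total_listed)     # 136
print(percent_b(57.5))  # 17.5, midway through the 10-25% B segment
```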

Database Search and Protein Identifications

The MS/MS raw files were processed with Proteome Discoverer (PD, version 2.1.0.81, Thermo Fisher) against the SwissProt human protein database (release 2015.12, 20,055 entries) using three search engines: Mascot (v2.3.2), Sequest HT (v2.1) and MS Amanda (v1.0). The mass tolerances for precursor and fragment ions were set to 10 ppm and 0.02 Da, respectively. Trypsin was specified as the enzyme, with a maximum of 2 missed cleavages. Methylthio (Cys) was set as a fixed modification, whereas oxidation (Met) and deamidation (Asn and Gln) were set as variable modifications. To ensure high confidence, identification results were filtered to 1% false discovery rate (FDR) at the peptide-spectrum match (PSM), peptide and protein levels. The search results from each engine and all selected raw files were combined to obtain a new FDR estimation. For each set of 48 raw files (24 fractions), the files were searched separately to obtain identifications per file, and the consensus workflow was then applied to these search results to obtain a combined result with controlled FDR filtering. Thus, instead of manually adding up identifications from each raw file, combined filtering based on the new combined FDR provided a more confident identification of the grouped peptide and protein lists. In this study, we combined all results from the three subcellular fractions and two batches (a total of 230 raw files) to obtain the identification result. Based on the quality of each PSM, PD assigns Percolator q-values and posterior error probabilities (PEPs) to the target and decoy PSMs generated by each search engine, and the FDR was calculated separately for PSMs using the highest-ranking (rank 1) target and decoy PSMs for each spectrum. PD calculated the peptide FDR by assigning q-values and PEPs, which were then assigned confidence scores based on the target FDR (1%). Protein FDR was also calculated, and individual scores were assigned based on the PEP values of PSMs. The total list of identified proteins in the nuclear, cytoplasmic and membrane proteomes was obtained by adopting the consensus workflow in PD to combine all the fraction data, filtered to 1% FDR at PSM, peptide and protein levels, with peptide length limited to more than 6 amino acids.
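The idea behind filtering a combined list of target and decoy PSMs to a fixed FDR can be sketched with plain target-decoy counting. This is a simplified illustration only: Proteome Discoverer with Percolator uses semi-supervised rescoring and posterior error probabilities, which are not reproduced here. The function names and the toy scores are hypothetical.

```python
def q_values(psms):
    """Compute q-values by target-decoy counting.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this threshold
    # The q-value is the lowest FDR at which a PSM would still be accepted.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


def accept_targets(psms, fdr=0.01):
    """Return scores of target PSMs passing the FDR threshold."""
    return [s for s, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr]


toy_psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(accept_targets(toy_psms, fdr=0.01))  # [10, 9]
```

Pooling all files before estimating the FDR, as the consensus workflow does, avoids the accumulation of false positives that occurs when per-file lists filtered at 1% are simply merged.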

Verification of MPs by Multiple Reaction Monitoring (MRM)-MS

Twenty-four identified unique peptides (≥ 9 amino acids, no missed cleavage) were selected from 12 gold MPs for synthesis of standard peptides (Omics Bio, New Taipei City, Taiwan). Using the standard peptides, we applied LC-MS/MS analysis (as described in the section above) to obtain the PSMs and constructed a spectral library for establishing the MRM methods. The top 10 high-intensity fragment ions per peptide precursor were selected, and the collision energy was optimized by varying it 5 steps up and down in ± 1 V increments from the default value. Every MRM analysis was conducted on a QTRAP 5500 system (AB SCIEX, Concord, ON, Canada) using a trap column (ACQUITY UPLC® Symmetry C18, 5 µm, 180 µm x 20 mm) with a BEH C18-AQ column (nanoACQUITY®, 1.7 µm, 75 µm x 250 mm). We used a 2-h LC gradient with a flow rate of 0.3 ml/min. Peptides were eluted using the following gradient: 0-1% B (ACN in 0.1% FA) for 0.5 min, 1-5% B for 0.5 min, 5-35% B for 90 min, 35-85% B for 1 min, and 85% B for 4 min. In the discovery mode, these endogenous peptides were identified in different subcellular samples and peptide fractions. Thus, the 24 peptides were analyzed in duplicate in their respective subcellular sample and fraction in nonscheduled mode over a 2-h LC-MRM-MS analysis.
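The collision-energy optimization described above (top 10 fragment ions per precursor, 5 steps up and down in 1 V increments around a default value) amounts to building a small grid of transitions. The sketch below shows one way to enumerate that grid; the peptide sequence, fragment labels, intensities and default collision energy are made-up placeholders, and the function names are hypothetical.

```python
def ce_ramp(default_ce: float, steps: int = 5, step_v: float = 1.0) -> list[float]:
    """Collision energies from default - steps*step_v to default + steps*step_v."""
    return [default_ce + k * step_v for k in range(-steps, steps + 1)]


def optimization_transitions(peptide, fragments, default_ce, top_n=10):
    """Enumerate (peptide, fragment ion, collision energy) combinations.

    fragments: list of (ion_label, intensity) pairs from the spectral library.
    Only the top_n most intense fragment ions are kept.
    """
    top = sorted(fragments, key=lambda f: f[1], reverse=True)[:top_n]
    return [(peptide, ion, ce) for ion, _ in top for ce in ce_ramp(default_ce)]


# Placeholder library entry: 12 fragment ions with arbitrary intensities.
example_fragments = [(f"y{i}", 1000 - 50 * i) for i in range(1, 13)]
transitions = optimization_transitions("PEPTIDEKR", example_fragments, default_ce=30.0)
print(len(ce_ramp(30.0)))  # 11 collision energies per fragment ion
print(len(transitions))    # 110 = 10 fragment ions x 11 collision energies
```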

Functional Analysis of Identified Proteins

Annotation of the identified proteins for subcellular localization, membrane proteins, transmembrane helix (TMH) distribution, missing proteins, protein evidence levels and chromosomal distribution was performed based on the latest version of neXtProt (2018-01). To annotate the experimental and predicted evidence for membrane proteins, we extracted three annotation codes in neXtProt for experimentally verified membrane proteins: ECO:0001053 (immunocytochemistry evidence from HPA),43 ECO:0000269 (experimental evidence used in manual assertion from UniProt)44 and ECO:0000314 (direct assay evidence used in manual assertion from Gene Ontology). Other codes were assigned as prediction evidence, including ECO:0000049 (reporter gene assay evidence), ECO:0000212 (combinatorial evidence), ECO:0000250 (sequence similarity evidence used in manual assertion), ECO:0000255 (match to sequence model evidence used in manual assertion), ECO:0000303 (non-traceable author statement used in manual assertion), ECO:0000304 (traceable author statement used in manual assertion), ECO:0000305 (curator inference used in manual assertion), ECO:0000315 (mutant phenotype evidence used in manual assertion), ECO:0000316 (genetic interaction evidence used in manual assertion), ECO:0000318 (biological aspect of ancestor evidence used in manual assertion), ECO:0000320 (phylogenetic determination of loss of key residues evidence used in manual assertion), ECO:0000353 (physical interaction evidence used in manual assertion) and ECO:0000501 (evidence used in automatic assertion). We used Protter (v1)45 to construct the possible topology of missing membrane proteins with multiple TMH domains. The in silico digestion of peptides was generated using PeptideMass in ExPASy (https://web.expasy.org/peptide_mass/). For functional annotation of identified proteins, Gene Ontology, Kyoto Encyclopedia of Genes and Genomes (KEGG)46 and Ingenuity Pathway Analysis (IPA, version: 43605602) were used to annotate pathways and involved molecules related to pluripotency and differentiation of hESCs. The mRNA expression levels of gold and silver missing proteins in human tissues and organs were extracted from the Human Protein Atlas (HPA)43 as transcripts per million (TPM) values. Missing proteins with mRNA expression levels higher than 10 TPM were assigned to the corresponding organs and tissues.
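The split between experimentally verified and predicted membrane annotations described above can be expressed as a simple lookup on ECO codes. The sketch below assumes that a protein carrying at least one of the three experimental codes is counted as experimentally verified even if it also carries prediction codes; the text does not state this precedence explicitly. The function name is hypothetical.

```python
# The three ECO codes treated as experimental evidence in the text.
EXPERIMENTAL_ECO = frozenset({"ECO:0001053", "ECO:0000269", "ECO:0000314"})


def classify_membrane_evidence(eco_codes) -> str:
    """Classify a membrane annotation from its set of ECO evidence codes.

    Assumption: any experimental code takes precedence over prediction codes.
    """
    codes = set(eco_codes)
    if not codes:
        return "unannotated"
    if codes & EXPERIMENTAL_ECO:
        return "experimental"
    return "predicted"


print(classify_membrane_evidence(["ECO:0000269", "ECO:0000250"]))  # experimental
print(classify_membrane_evidence(["ECO:0000255"]))                 # predicted
print(classify_membrane_evidence([]))                              # unannotated
```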

Results and Discussion

A deep subproteome analysis identified 11,970 proteins in human embryonic stem cells (hESCs)

To gain more insight into the proteome organization associated with hESC function, we first conducted a comprehensive proteome analysis of hESCs to dissect the molecular landscape. Because of the complexity and the spatial and dynamic expression of proteins,47,48 subcellular fractionation was applied for comprehensive proteome profiling of hESCs.49 As shown in Figure 1A, nuclear, cytoplasmic and membrane protein fractions were purified from hESCs using our reported 2-step centrifugation with sucrose, followed by gel-assisted digestion with trypsin, peptide fractionation and LC-MS/MS analysis.38,50 As the membrane subproteome is of low abundance, yielding only a few micrograms of peptide digest, we employed our sensitive high-pH reversed-phase (Hp-RP) stop-and-go extraction tip (StageTip) technique to collect 7 RP fractions. For the cytoplasmic and nuclear subproteomes, we applied off-line high-pH reversed-phase high-pressure liquid chromatography fractionation (Hp-RP-HPLC) to generate 96 fractions, which were then combined into 24 RP fractions. Each subcellular proteome was analyzed in two replicates, and all the RP fractions were analyzed in duplicate by Orbitrap Fusion™ or Fusion™ Lumos™ Tribrid™ mass spectrometers. Protein identification was performed for each RP fraction using Proteome Discoverer 2.1 (PD 2.1) with three search engines, Mascot, Sequest HT and MS Amanda, at 1% FDR at the PSM, peptide and protein levels. The total list of identified proteins in the nuclear, cytoplasmic and membrane proteomes was obtained by combining all the fraction data using the consensus workflow in PD 2.1 (1% protein FDR, Figure 1B). Although the consensus workflow reduces the number of identified proteins, a highly confident identification result is expected.37


Across all peptide fractionation, an average of 4,789 ± 164 proteins and 21,001 ± 1,836 peptides were obtained per fraction. The peptide fractionation efficiency of Hp-RP-HPLC for the three subproteomes is shown in Supplementary Figure S-1. Most fractions had < 15% overlap of identified peptides between two adjacent fractions, suggesting adequate peptide fractionation efficiency in both fractionation methods. After combining all the fraction results using the consensus workflow in PD 2.1, we identified a total of 8,742 proteins (133,802 unique peptides), 10,757 proteins (187,365 unique peptides) and 9,228 proteins (157,665 unique peptides) in the membrane, nuclear and cytoplasmic subproteomes, respectively. Taken together, our study identified 11,970 unique proteins (264,088 unique peptides) in hESCs, of which 6,874 proteins (57.4%) were commonly identified in the three subproteomes, while 620, 932, and 535 proteins were uniquely identified in one of the subproteomes, respectively (Figure 2A and Supplementary Table S-1). In comparison with the published large-scale proteomics analyses of hESCs and/or their differentiated cell types, which have reported more than 10,000 identified proteins28,29 (Figure 2B), this study, to our knowledge, presents the largest dataset of hESCs, with additional identification of 3,626 unique protein-coding genes in hESCs. In line with the goal of the C-HPP uPE1/CP50 initiative, we analyzed the potential functions of identified PE1 proteins with unknown functional annotation in neXtProt 2018-01. A total of 570 uPE1 proteins were identified in our dataset (Supplementary Table S-8a), of which 28 were located on chromosome 4 (Supplementary Table S-8b). According to our data, some uPE1 proteins on chromosome 4 showed overexpression in lung cancer tissues (unpublished). In a future study, we will verify the expression and potential functions of these proteins in lung cancer.
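The shared and subproteome-specific counts reported above are the kind of numbers obtained by partitioning three protein lists into Venn regions. The sketch below shows this with set operations on a toy example; the accession strings are placeholders rather than real identifiers, and the function name is hypothetical.

```python
def venn3_regions(membrane, nuclear, cytoplasmic):
    """Partition three protein accession lists into selected Venn regions."""
    m, n, c = set(membrane), set(nuclear), set(cytoplasmic)
    return {
        "all_three": m & n & c,
        "membrane_only": m - n - c,
        "nuclear_only": n - m - c,
        "cytoplasmic_only": c - m - n,
        "total": m | n | c,
    }


# Toy input with placeholder accessions.
regions = venn3_regions(
    membrane=["P1", "P2", "P3"],
    nuclear=["P1", "P2", "P4"],
    cytoplasmic=["P1", "P5"],
)
print({name: len(members) for name, members in regions.items()})
# {'all_three': 1, 'membrane_only': 1, 'nuclear_only': 1, 'cytoplasmic_only': 1, 'total': 5}

# Share of proteins common to all three subproteomes, using the reported counts.
print(round(100 * 6874 / 11970, 1))  # 57.4
```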
Membrane proteome profile of hESCs

The membrane proteome, particularly proteins with transmembrane helix (TMH) domains, remains underrepresented in proteomic studies because their low abundance and hydrophobic nature cause challenges in solubilization and proteolysis and lead to low MS detectability.51 In the latest neXtProt database (2018_01, 19,658 PE1,2,3,4 protein sequences), 1,047 membrane proteins still remain missing (without protein-level evidence), and experimental evidence for the molecular function, structure and localization of membrane proteins is limited.49,52 In this study, 8,742 proteins were confidently identified in the crude membrane fraction of hESCs. In previous publications focusing on membrane proteome analysis of hESCs,17-29 between 237 and 2,910 membrane proteins were observed in each work, and a total of 7,222 membrane proteins have been reported in hESCs. Compared with these accumulated data, our study revealed 2,934 newly identified proteins. Among them, 802 were membrane-associated proteins, including 163 plasma membrane proteins, 255 probable membrane proteins and 384 peripheral membrane proteins (Figure 2C). We further annotated the cellular localization and the number of TMHs for all 11,970 identified proteins using the latest version of neXtProt (2018-01). Proteins with the following annotation codes in neXtProt were considered experimentally verified membrane proteins: ECO:0001053 (immunocytochemistry evidence from HPA),43 ECO:0000269 (experimental evidence used in manual assertion from UniProt)44 and ECO:0000314 (direct assay evidence used in manual assertion from Gene Ontology). Other evidence codes were treated as predicted membrane protein annotations. Among the 11,970 identified proteins, a total of 6,138 were annotated as membrane proteins (4,733 by experimental evidence and 1,405 by prediction), including 3,033 (50%) plasma membrane proteins, 1,328 probable membrane proteins and 1,735 peripheral membrane proteins (Figure 2D, Supplementary Table S-3). In addition, several organelle-specific membrane proteins were identified, including nuclear membrane (n=358), ER membrane (n=671), mitochondrial membrane (n=400) and Golgi membrane proteins (n=546). This dataset also contained 2,626 integral membrane proteins with more than one TMH, which is nearly 43% of the annotated membrane proteins (Figure 2E). About 52% of the integral membrane proteins have more than 2 TMHs, and 90% of them have up to 12 TMHs; voltage-dependent L-type calcium channel subunit alpha-1C (CACNA1C) had the highest number, 249 TMHs. In summary, these analyses suggest that our results reveal the most comprehensive membrane proteome of hESCs to date, which provides valuable protein-level evidence for further study of the protein composition of hESCs and the functions of these proteins in regulating hESC biology.
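The evidence-code rule described in this section (three ECO codes count as experimental, everything else as predicted) can be expressed compactly. The sketch below is illustrative only: the input structure, the placeholder accessions and the function names are assumptions, and it does not query neXtProt.

```python
from collections import Counter

# Evidence codes treated as experimental verification in the text.
EXPERIMENTAL_ECO_CODES = {"ECO:0001053", "ECO:0000269", "ECO:0000314"}


def membrane_evidence_class(eco_codes):
    """Classify one protein's membrane annotation by its evidence codes.

    Returns 'experimental' if at least one code is in the experimental set,
    'predicted' if codes exist but none is experimental, and None if the
    protein carries no membrane annotation at all.
    """
    codes = set(eco_codes)
    if not codes:
        return None
    return "experimental" if codes & EXPERIMENTAL_ECO_CODES else "predicted"


def summarize(annotations):
    """Count proteins per evidence class, skipping unannotated proteins."""
    classes = (membrane_evidence_class(codes) for codes in annotations.values())
    return dict(Counter(c for c in classes if c is not None))


# Hypothetical input: accession -> evidence codes attached to a membrane annotation.
# "ECO:0000250" stands in for any code outside the experimental set.
example = {
    "PROT_A": ["ECO:0000269"],
    "PROT_B": ["ECO:0000250"],
    "PROT_C": ["ECO:0000250", "ECO:0001053"],
    "PROT_D": [],
}
print(summarize(example))
```

A protein with mixed evidence is counted as experimental here; whether the original annotation pipeline resolved mixed cases the same way is not stated in the text.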

Stepwise Filtering of Missing Proteins in hESC Proteomes

The deep hESC proteome and the newly identified proteins allow us to mine missing proteins (MPs) that might be associated with pluripotency-related functions. Following the Human Proteome Project Data Interpretation Guidelines (version 2.1.0),37 we applied stepwise filtering to the 11,970 proteins identified from the subcellular proteomes of hESCs. Table 1 shows the step-by-step criteria and the numbers of proteins and unique peptides remaining after each filter. At 1% peptide and protein FDR, we identified 87, 108 and 68 potential missing proteins with highly confident peptide spectrum matches (PSMs) in the membrane, nuclear and cytoplasmic proteomes, respectively. Only proteins with at least one unique peptide that has unambiguous PSMs and no missed cleavages were retained. We then checked whether each identified unique peptide could match single amino acid variants (SAAVs) of any protein or isobaric substitutions by using the peptide uniqueness checker (neXtProt, 20180117 release), and excluded such peptides from confident identification. To identify gold MPs, the final step extracted proteins that have at least two unique peptides of at least 9 amino acids. With the strict criteria described above, a total of 26 proteins were identified as gold MPs, including 15, 3 and 13 proteins identified in the membrane, cytoplasm and nucleus fractions, respectively. Among these, 10, 1 and 4 proteins were exclusively identified in the membrane, cytoplasm and nucleus fractions, respectively, and 11 proteins were identified in more than one fraction. The detailed identification results for the gold MPs are listed in Supplementary Table S-4. As shown in Table 2, 25 proteins have evidence at the transcript level (PE2). The gold MPs are distributed across various chromosomes. Chromosome 19 has the highest number, with 5 gold MPs, followed by chromosomes 1 and X with 2 gold MPs each. The remaining gold MPs are from chromosomes 2, 4, 5, 6, 8, 9, 12, 13, 14, 15, and 22. In addition, many of the gold MPs were identified with more than 2 peptides. For example, the top-ranking MP, AMIGO3, was identified with 9 unique peptides in the membrane and nucleus fractions, suggesting its high abundance in hESCs. Of the 26 gold MPs, 16 proteins contained TMH domains, including 5 plasma membrane proteins and 11 probable membrane proteins. This result shows that mining the subcellular proteome of hESCs is an effective strategy for finding missing membrane proteins.

In addition to gold MPs, we also identified silver MPs with a single unique peptide of length ≥ 9 amino acid residues. Following the same filtering criteria, a total of 87 silver MPs were identified, including 40, 28 and 40 silver MPs from the membrane, cytoplasm and nucleus fractions, respectively. Among them, 17 proteins were identified in more than one fraction (Supplementary Table S-5). Structural analysis of the 87 silver MPs revealed 52 membrane proteins, of which 46 (88.5%) contain TMH domains. The high percentage of integral membrane proteins among the silver membrane MPs highlights the challenge of identifying missing membrane proteins with multiple peptides. The structural features of such MPs are illustrated by the alpha-2C adrenergic receptor (ADRA2C, 49.5 kDa), an MP with 7 TMHs from chromosome 4 (Figure 3). We examined the predicted structure of ADRA2C using Protter and applied in-silico digestion to generate putative unique and detectable peptides (0 missed cleavages, length between 9 and 40 amino acids) that fulfil the current HPP guideline for missing proteins. The analysis showed that nine unique tryptic peptides (≥ 9 amino acids) could be derived from ADRA2C. Among these putative tryptic peptides, we identified the peptide AGAEGGAGGADGQGAGPGAAESGALTASR (residues 309-337; marked in red, Figure 3A), evidenced by the PSM shown in Figure 3B. Of the other peptides, MASPALAAALAVAAAAGPNASGAGER (residues 1-26) and GSGGVANASGASWGPPR (residues 27-43) in the extracellular domain contain N-linked glycosylation motifs (Asn-X-Ser/Thr) at amino acid positions 19 and 33, which are expected to lower their MS detectability. The remaining six unique peptides (marked in grey) are located within or close to the lipid bilayer. These structural features may hinder tryptic digestion53 and subsequent MS detection and protein identification,13,14,53 which makes ADRA2C difficult to identify as a gold MP. However, our dataset detected a second peptide of 8 amino acids (SVEFFLSR, residues 352-359) in ADRA2C (marked in blue, Figure 3A). Although this peptide was identified with a highly confident PSM (Figure 3C), it did not meet the criterion of at least 9 amino acids for positive identification of missing proteins. The observation of silver membrane MPs in our data suggests that relaxed criteria may be required for identification of missing membrane proteins with multiple TMHs. A multiprotease digestion strategy may be a more efficient alternative approach to promote silver MPs to gold MPs.
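The length rules discussed in this section can be illustrated with a short sketch. It assumes the common trypsin rule (cleavage C-terminal to K or R, but not before P), zero missed cleavages, and the 9 to 40 residue window quoted above; the gold/silver function reduces the guideline to peptide count and length only and ignores the uniqueness, FDR and SAAV checks described in the text. All function names are illustrative.

```python
import re


def tryptic_peptides(sequence, min_len=9, max_len=40):
    """In-silico trypsin digest with zero missed cleavages.

    Cleaves after K or R unless the next residue is P, then keeps only
    peptides whose length lies within [min_len, max_len].
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)  # zero-width split, Python 3.7+
    return [p for p in pieces if min_len <= len(p) <= max_len]


def classify_missing_protein(unique_peptides, min_len=9):
    """Simplified tiering: >= 2 qualifying unique peptides -> gold, exactly 1 -> silver."""
    qualifying = [p for p in unique_peptides if len(p) >= min_len]
    if len(qualifying) >= 2:
        return "gold"
    if len(qualifying) == 1:
        return "silver"
    return "unclassified"


# Artificial toy sequence: two ADRA2C peptides quoted in the text, concatenated
# with a trailing K purely to exercise the cleavage rule (not the real protein).
toy = "MASPALAAALAVAAAAGPNASGAGER" + "SVEFFLSR" + "K"
print(tryptic_peptides(toy))

# The two ADRA2C peptides detected in this study: 29 and 8 residues long.
adra2c_detected = ["AGAEGGAGGADGQGAGPGAAESGALTASR", "SVEFFLSR"]
print(classify_missing_protein(adra2c_detected))
```

With the 9-residue threshold, the 8-residue peptide SVEFFLSR does not count, so the two detected ADRA2C peptides yield a silver classification, matching the outcome described above.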

Validation of Gold Missing Proteins by MRM-MS

From the 26 highly confident gold missing proteins identified under the strict filtering criteria, we aimed to validate a subset using targeted MRM-MS.15 Although the MRM approach is well established for accurate and sensitive quantitation, it has also been proposed by the HPP as a suitable method for validation of missing proteins.36 To confirm the expression of gold MPs, we selected 12 gold MPs, including 6 membrane proteins with TMHs (P59025-RTP1, Q15760-GPR19, Q5XG99-LYSMD4, Q6P4F1-FUT10, Q86WK7-AMIGO3 and Q8N5D6-GBGT1) and 1 membrane protein without a TMH (Q6TDP4-KLHL17). Of the remaining gold MPs, 4 (C9JN71-ZNF878, Q6ZNA1-ZNF836, O95780-ZNF682 and O75346-ZNF253) were nuclear proteins and one (Q6ZU67-BEND4) has no subcellular localization in UniProtKB. These proteins were selected on the basis of functional information associating them with developmental stages at the mRNA level (from the HPA database), and most were selected from membrane fractions with a relatively high number of PSMs. Two unique peptides with good MS detectability per protein were selected and synthesized for the 12 MPs, and MRM assays were developed for a total of 24 peptides in the hESC line. As rigorous MRM assay development is critical in targeted proteomics, we first acquired DDA spectra of the pooled synthetic peptides and built a library using Skyline software for further precursor and transition selection.54,55 The top 10 highest-intensity fragment ions per peptide precursor were selected, and the collision energy was optimized for MRM-MS. The endogenous peptides had been identified in different subcellular fractions and peptide fractions. The current hESC data were obtained by extensive peptide fractionation (230 fractions), which takes a long MS acquisition time. For efficient verification by MRM-MS, the 24 synthetic peptides were analyzed separately in different runs. The retention time of each endogenous peptide in the same subcellular fraction and the respective 20 fractions was used as the reference. The MRM peak features of the endogenous peptides, including peak shape, signal-to-noise ratio, relative fragment intensities and coelution profiles, were compared with those of the synthetic peptides to confirm the identification of MPs. A representative example is shown in Figure 4 for alpha-(1,3)-fucosyltransferase 10 (FUT10). FUT10, a probable fucosyltransferase encoded on chromosome 8 with transcript-level evidence, is a membrane protein with a single TMH domain (479 a.a., 56.1 kDa). Using miRNA knockdown in mice, Kumar et al. reported that Fut10 is involved in the α1,3-fucosyltransferase activity required for the maintenance of stem cells and for neural development.56 In our study, FUT10 was identified with three unique peptides from both the nuclear and membrane fractions. Among them, two peptides, NPASMDADGFYR (residues 258-269; m/z 672.29, 2+) and LYEAYVEWK (residues 346-354; m/z 600.80, 2+), were chosen and synthesized for validation. As shown in Figure 4A, in addition to similar y and b fragment ions from NPASMDADGFYR, the 6 co-eluting transitions from the synthetic and endogenous peptides had similar relative intensities and profiles, with a dot-product of 0.94. Similarly, the MS/MS and MRM profiles of LYEAYVEWK (dot-product of 0.86) were highly similar between the synthetic peptide and the endogenous peptide from the 19th nuclear fraction sample. Overall, the maximum retention time shift was 1.6 min and the dot-product varied from 0.56 to 0.97 for the 24 peptides. The detailed PSM and MRM validation results for the 12 MPs are shown in Supplementary Figure S-2 and Supplementary Table S-6.

SRMAtlas is one of the largest public proteome databases, consisting of experimentally detected peptides as well as synthetic peptides.57 Yamamoto et al.58 demonstrated a strategy to match the spectra of endogenously detected peptides of missing proteins with their synthetic counterparts in SRMAtlas; 41 missing proteins have been matched from this public dataset. In our study, we applied the same approach to validate all identified gold and silver MPs against the spectra of the corresponding synthetic peptides in SRMAtlas. This strategy provided additional verification evidence for 11 gold MPs (two peptides for each gold MP). Combining both verification methods (MRM and SRMAtlas) increased the confidence of a total of 21 gold MPs (8 validated by MRM, 11 by SRMAtlas and 2 verified by a single peptide). The MRM or PSM spectra of the 21 verified gold MPs are summarized in Supplementary Figure S-2, showing high similarity among the PSMs from endogenous peptides, synthetic peptides and PeptideAtlas peptides, which confirms the high-confidence identifications. In addition, the single uniquely identified peptides of 33 silver MPs were verified against PeptideAtlas, suggesting confident identifications (Supplementary Figure S-4). All these results suggest high-quality MS analysis and confident identification of MPs in hESCs. Nevertheless, 54 silver MPs did not have matching PSMs in PeptideAtlas, which may indicate that these missing proteins were identified for the first time in our study. The predicted uniquely mapping tryptic peptides of the identified silver MPs were analyzed with PeptideMass on ExPASy (https://web.expasy.org/peptide_mass/), and their uniqueness was examined with the neXtProt Peptide Unicity Checker (https://www.nextprot.org/tools/peptide-uniqueness-checker) (Supplementary Table S-5).
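Two quantities used in this validation, the precursor m/z of a synthetic peptide and the dot-product between endogenous and synthetic transition intensities, can be sketched as follows. This is an illustration under stated assumptions: peptides are unmodified, masses are standard monoisotopic residue masses, and the dot-product is taken as plain cosine similarity, which may differ from the exact score computed by Skyline. The intensity vectors in the example are made up.

```python
import math

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of one proton per charge


def precursor_mz(peptide, charge):
    """Monoisotopic m/z of an unmodified peptide at the given positive charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def dot_product(a, b):
    """Cosine similarity between two equally ordered transition-intensity vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


# The two FUT10 peptides selected for synthesis, as doubly charged precursors.
for pep in ("NPASMDADGFYR", "LYEAYVEWK"):
    print(pep, f"{precursor_mz(pep, 2):.2f}")

# Made-up relative intensities for 6 transitions (synthetic vs endogenous).
synthetic = [100, 80, 55, 30, 20, 10]
endogenous = [95, 85, 50, 35, 15, 12]
print(round(dot_product(synthetic, endogenous), 3))
```

The computed values for the doubly charged FUT10 peptides agree with the 672.29 and 600.80 quoted in the text, which indicates that those figures are m/z values of the 2+ precursors rather than neutral masses.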

Functional analysis of missing proteins in hESCs

Subcellular localization of the gold and silver missing proteins revealed a high percentage of missing membrane proteins in hESCs (Supplementary Table S-4). Ten out of 17 gold missing proteins were localized to membrane compartments, including 6 in the plasma membrane (RTP1, KLHL17, PCDHGA10, GPR19, GABRQ and PANX2), 3 in the Golgi membrane (ZDHHC15, GBGT1, FUT10) and one in the endoplasmic reticulum membrane (DISP3), and one was secreted to the extracellular space (PRRG3). It is noted that these proteins were confidently identified in the membrane fraction (PCDHGA10, GPR19, PANX2, ZDHHC15, DISP3), the nuclear fraction (KLHL17) or both the membrane and nuclear fractions (PRRG3, RTP1, GABRQ, GBGT1, FUT10). The other 6 gold MPs were localized to nuclear compartments and identified in the nuclear fraction (ZNF253, FOXI3, ZNF836), the cytoplasmic fraction (ZNF682, ZNF730) or both the nuclear and cytoplasmic fractions (ZNF878) (Figure 5, Supplementary Table S-4). Among the 47 silver MPs classified by subcellular localization, the majority are in the plasma membrane (28 proteins), 5 are secreted to the extracellular space, 2 are in the ER membrane, 9 in the nucleus, 1 in the cytosol and 2 in the mitochondrial membrane (Figure 5, Supplementary Table S-5). The predominance of membrane proteins among the identified missing proteins suggests both the need and the promise of searching for MPs in the membrane subproteome.

Our deep proteomic profiles enabled us to map almost every molecule in the key pathways and networks related to the regulation of hESCs (Figure 6A, highlighted in pink); many of them are involved in the regulation of differentiation, self-renewal, epigenetic regulation and cellular layer development through the core transcriptional networks in hESCs. Interestingly, some identified MPs were associated with canonical pathways such as TGF-β signaling (PCDHGA10, gold MP; FRAT2, silver MP) and MAPK signaling (FRAT2, silver MP; MMD, silver MP), indicating their potential regulatory functions in hESC biology. Overexpression of FUT10 (gold MP) has been reported to play a critical role in enhancing self-renewal characteristics in embryonic stem cells.59 In our data, this protein was confidently identified in more than one fraction (membrane and nucleus) and is localized to the Golgi membrane (Figure 5, Table 2); it was associated with the STAT3 complex in the LIF-AKT-STAT3 pathway (Figure 6A). Thus, it may be involved in regulating the core transcriptional networks by interacting with key transcription factors including NANOG, SOX2 and OCT4, and may eventually modulate self-renewal or cellular development.

G protein-coupled receptor 19 (GPR19, gold MP) was reported among the 20 most positively significant genes in embryonic stem cells, which may be crucial for maintaining or differentiating hESCs.60 This gold missing protein was identified in the membrane fraction (Figure 5), consistent with its functional annotation as a plasma membrane receptor. Its role in regulating the GPCR signaling pathway and subsequently affecting self-renewal or differentiation of hESCs requires further study (Figure 6B).

Lineage analysis of identified missing proteins

To study the potential roles of these missing proteins in the lineage-specific differentiation of hESCs, the identified missing proteins were classified by lineage-specific differentiation based on biological process analysis in the Gene Ontology database. The lineage analysis of all identified proteins revealed top organ-specific proteins and MPs that are associated with lineage-specific differentiation (Figure 7). According to Gene Ontology and DAVID (version 6.8) functional analysis, 97, 53 and 31 proteins were classified in the regulation of mesoderm, endoderm and ectoderm differentiation, respectively (Figure 7A, B). Proteins were further categorized by tissue or organ. Based on the mRNA expression data in HPA, these gold and silver missing proteins were involved in different lineage-specific differentiations, suggesting their potential functional roles. Many developmental proteins were identified in each lineage. For example, three gold MPs, SLC10A3, TREM251 and TREM37, were involved in lung development. The TREM and TREM-like receptors have been reported to be related to acute inflammatory responses. The known member TREM-1 plays a key role in chronic and non-infectious inflammatory disorders, including various types of cancer.61 The potential roles of TREM251 and TREM37 in lung development and tumorigenesis may be explored in the future. Our data may provide a rich resource for an hESC protein atlas to study the potential roles of specific proteins in regulating embryonic biological processes and pluripotency.

Conclusion

Using a subcellular proteomics strategy that integrates subcellular fractionation, gel-assisted digestion, extensive peptide pre-fractionation, LC-MS/MS analysis on high-mass-accuracy, high-resolution instruments, and database searching with multiple engines, we have achieved the most comprehensive molecular landscape of hESCs to date. In addition, this large-scale proteomic dataset yielded 26 gold MPs and 87 silver MPs. Structural analysis of the silver MPs revealed bottlenecks in the identification of missing membrane proteins with multiple TMHs. Our data illustrate the potential roles of MPs in regulating pluripotency-related pathways and lineage-specific differentiation. We expect that these identification results can serve as a valuable resource for building the largest hESC proteome map and for standardizing biomarker discovery in hESCs.

SUPPORTING INFORMATION

The following supporting information is available free of charge at the ACS website, http://pubs.acs.org.

Legends for Supplementary Figures
Supplementary Figure S-1. Peptide fractionation separation efficiency of the three subcellular fractions.
Supplementary Figure S-2a. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein RTP1 (P59025).
Supplementary Figure S-2b. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein GPR19 (Q15760).
Supplementary Figure S-2c. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein GBGT1 (Q8N5D6).
Supplementary Figure S-2d. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein ZNF682 (O95780).
Supplementary Figure S-2e. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein ZNF878 (C9JN71).
Supplementary Figure S-2f. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein AMIGO3 (Q86WK7).
Supplementary Figure S-2g. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein FUT10 (Q6P4F1).
Supplementary Figure S-2h. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein LYSMD4 (Q5XG99).
Supplementary Figure S-2i. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein ZNF836 (Q6ZNA1).
Supplementary Figure S-2j. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides, SFTASSTLTTHK and NLVFLGIVVSKPDLVTCLEQGK, from Gold Missing Protein ZNF253 (O75346).
Supplementary Figure S-2k. Representative PSM (A) and MRM (B) spectra of 2 identified unique peptides from Gold Missing Protein BEND4 (Q6ZU67).
Supplementary Figure S-3a. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein SLC10A3 (P09131).
Supplementary Figure S-3b. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein PCDHGA10 (Q9Y5H3).
Supplementary Figure S-3c. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein PRRG3 (Q9BZD7).
Supplementary Figure S-3d. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein DLEU7 (Q6UYE1).
Supplementary Figure S-3e. Representative PSM spectra of 2 identified unique peptides, FFNQSTNLTTHK and AFNQSSTLTIHK, from Gold Missing Protein ZNF730 (Q6ZMV8).
Supplementary Figure S-3f. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein DISP3 (Q9P2K9).
Supplementary Figure S-3g. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein ZDHHC15 (Q96MV8).
Supplementary Figure S-3h. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein TMEM251 (Q8N6I4).
Supplementary Figure S-3i. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein PANX2 (Q96RD6).
Supplementary Figure S-3j. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein TMEM200B (Q69YZ2).
Supplementary Figure S-3k. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein GABRQ (Q9UN88).
Supplementary Figure S-3l. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein TMEM37 (Q8WXS4).
Supplementary Figure S-3m. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein FOXI3 (A8MTJ6).
Supplementary Figure S-3n. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein KLHL17 (Q6TDP4).
Supplementary Figure S-3o. Representative PSM spectra of 2 identified unique peptides from Gold Missing Protein CCDC162P (A2VCL2).
Supplementary Figure S-4. PSM spectra of silver missing proteins identified with a single unique peptide.

Supplementary Table S-1. Summary of identifications in each subcellular fraction.
Supplementary Table S-2a. Total protein identification results of the first-batch membrane fraction.
Supplementary Table S-2b. Total protein identification results of the second-batch membrane fraction.
Supplementary Table S-2c. Total protein identification results of the first-batch cytoplasm fraction.
Supplementary Table S-2d. Total protein identification results of the second-batch cytoplasm fraction.
Supplementary Table S-2e. Total protein identification results of the first-batch nucleus fraction.
Supplementary Table S-2f. Total protein identification results of the second-batch nucleus fraction.
Supplementary Table S-3. Annotated membrane proteins in the current study.
Supplementary Table S-4. Identification details of Gold Missing Proteins.
Supplementary Table S-5. Identification details of Silver Missing Proteins.
Supplementary Table S-6. Detailed summary of MRM-MS validation results.
Supplementary Table S-7a. mRNA expression levels of gold MPs.
Supplementary Table S-7b. mRNA expression levels of silver MPs.
Supplementary Table S-8a. Summary list of uPE1 proteins identified in our dataset.
Supplementary Table S-8b. Summary list of uPE1 proteins located on chromosome 4 identified in our dataset.

Acknowledgement

The proteomics data were analyzed with an LTQ-Orbitrap Fusion mass spectrometer at the Academia Sinica Common Mass Spectrometry Facilities located at the Institute of Biological Chemistry and with a Fusion Lumos mass spectrometer at the Department of Chemistry, National Taiwan University. This work was supported by the Ministry of Science and Technology of Taiwan (MOST104-2113-M-001-005-MY3, MOST106-2113-M-038-004-MY2), Taipei Medical University (TMU103-Y05-E106) and Academia Sinica (AS-105-TP-A05).

The authors declare no competing financial interest.

References

(1) Rubin, L. L.; Haston, K. M. Stem cell biology and drug discovery. BMC Biology 2011, 9, 42.
(2) Takahashi, K.; Tanabe, K.; Ohnuki, M.; Narita, M.; Ichisaka, T.; Tomoda, K.; Yamanaka, S. Induction of Pluripotent Stem Cells from Adult Human Fibroblasts by Defined Factors. Cell 2007, 131, 861-872.
(3) Lu, T.-Y.; Lu, R.-M.; Liao, M.-Y.; Yu, J.; Chung, C.-H.; Kao, C.-F.; Wu, H.-C. Epithelial Cell Adhesion Molecule Regulation Is Associated with the Maintenance of the Undifferentiated Phenotype of Human Embryonic Stem Cells. Journal of Biological Chemistry 2010, 285, 8719-8732.
(4) Hu, Q.; Rosenfeld, M. G. Epigenetic regulation of human embryonic stem cells. Frontiers in Genetics 2012, 3, 238.
(5) Ronquist, S.; Patterson, G.; Muir, L. A.; Lindsly, S.; Chen, H.; Brown, M.; Wicha, M. S.; Bloch, A.; Brockett, R.; Rajapakse, I. Algorithm for cellular reprogramming. PNAS 2017, 114(45), 11832-11837.
(6) Ding, V. M. Y.; Ling, L.; Natarajan, S.; Yap, M. G. S.; Cool, S. M.; Choo, A. B. H. FGF-2 modulates Wnt signaling in undifferentiated hESC and iPS cells through activated PI3-K/GSK3β signaling. Journal of Cellular Physiology 2010, 225, 417-428.
(7) Bray, S. J. Notch signalling: a simple pathway becomes complex. Nat Rev Mol Cell Biol 2006, 7, 11.
(8) Munz, M.; Baeuerle, P. A.; Gires, O. The Emerging Role of EpCAM in Cancer and Stem Cell Signaling. Cancer Research 2009, 69, 5627-5629.
(9) Prokhorova, T. A.; Rigbolt, K. T. G.; Johansen, P. T.; Henningsen, J.; Kratchmarova, I.; Kassem, M.; Blagoev, B. Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC) and Quantitative Comparison of the Membrane Proteomes of Self-renewing and Differentiating Human Embryonic Stem Cells. Molecular & Cellular Proteomics 2009, 8, 959-970.
(10) Omenn, G. S.; Lane, L.; Lundberg, E. K.; Overall, C. M.; Deutsch, E. W. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J Proteome Res. 2017, 16(12), 4281-4287.
(11) Meyfour, A.; Pahlavan, S.; Sobhanian, H.; Salekdeh, G. H. 17th Chromosome-Centric Human Proteome Project Symposium in Tehran. Proteomics 2018, 18, e1800012.
(12) Baker, M. S.; Ahn, S. B.; Mohamedali, A.; Islam, M. T.; Cantor, D.; Verhaert, P. D.; Fanayan, S.; Sharma, S.; Nice, E. C.; Connor, M.; Ranganathan, S. Accelerating the search for the missing proteins in the human proteome. Nature Communications 2017, 8, 14271.
(13) Paik, Y.-K.; Overall, C. M.; Deutsch, E. W.; Van Eyk, J. E.; Omenn, G. S. Progress and Future Direction of Chromosome-Centric Human Proteome Project. Journal of Proteome Research 2017, 16, 4253-4258.
(14) Chen, Y.; Li, Y.; Zhong, J.; Zhang, J.; Chen, Z.; Yang, L.; Cao, X.; He, Q.-Y.; Zhang, G.; Wang, T. Identification of Missing Proteins Defined by Chromosome-Centric Proteome Project in the Cytoplasmic Detergent-Insoluble Proteins. Journal of Proteome Research 2015, 14, 3693-3709.
(15) Kitata, R. B.; Dimayacyac-Esleta, B. R. T.; Choong, W.-K.; Tsai, C.-F.; Lin, T.-D.; Tsou, C.-C.; Weng, S.-H.; Chen, Y.-J.; Yang, P.-C.; Arco, S. D.; Nesvizhskii, A. I.; Sung, T.-Y.; Chen, Y.-J. Mining Missing Membrane Proteins by High-pH Reverse-Phase StageTip Fractionation and Multiple Reaction Monitoring Mass Spectrometry. Journal of Proteome Research 2015, 14, 3658-3669.
(16) Van Simaeys, D.; Turek, D.; Champanhac, C.; Vaizer, J.; Sefah, K.; Zhen, J.; Sutphen, R.; Tan, W. Identification of cell membrane protein stress-induced phosphoprotein 1 as a potential ovarian cancer biomarker using aptamers selected by cell systematic evolution of ligands by exponential enrichment. Analytical Chemistry 2014, 86, 4521-4527.
(17) Shekari, F.; Baharvand, H.; Salekdeh, G. H. Chapter Seven - Organellar Proteomics of Embryonic Stem Cells. In Advances in Protein Chemistry and Structural Biology; Donev, R., Ed.; Academic Press, 2014; Vol. 95; pp 215-230.
(18) Shekari, F.; Nezari, H.; Larijani, M. R.; Han, C.-L.; Baharvand, H.; Chen, Y.-J.; Salekdeh, G. H. Proteome analysis of human embryonic stem cells organelles. Journal of Proteomics 2017, 162, 108-118.
(19) Bianco, P.; Robey, P. G. Skeletal stem cells. Development (Cambridge, England) 2015, 142, 1023-1027.
(20) Dormeyer, W.; van Hoof, D.; Braam, S. R.; Heck, A. J. R.; Mummery, C. L.; Krijgsveld, J. Plasma Membrane Proteomics of Human Embryonic Stem Cells and Human Embryonal Carcinoma Cells. Journal of Proteome Research 2008, 7, 2936-2951.
(21) Kolle, G.; Ho, M.; Zhou, Q.; Chy, H. S.; Krishnan, K.; Cloonan, N.; Bertoncello, I.; Laslett, A. L.; Grimmond, S. M. Identification of human embryonic stem cell surface markers by combined membrane-polysome translation state array analysis and immunotranscriptional profiling. Stem Cells 2009, 10, 2446-2458.
(22) McQuade, L. R.; Schmidt, U.; Pascovici, D.; Stojanov, T.; Baker, M. S. Improved Membrane Proteomics Coverage of Human Embryonic Stem Cells by Peptide IPG-IEF. Journal of Proteome Research 2009, 8, 5642-5649.
(23) Pan, C.; Kumar, C.; Bohl, S.; Klingmueller, U.; Mann, M. Comparative proteomic phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions. Molecular & Cellular Proteomics 2009, 8(3), 443-450.
(24) Gerwe, B. A.; Angel, P. M.; West, F. D.; Hasneen, K.; Young, A.; Orlando, R.; Stice, S. L. Membrane proteomic signatures of karyotypically normal and abnormal human embryonic stem cell lines and derivatives. Proteomics 2011, 11(12), 2515-2527.
(25) Gu, B.; Zhang, J.; Wu, Y.; Zhang, X.; Tan, Z.; Lin, Y.; Huang, X.; Chen, L.; Yao, K.; Zhang, M. Proteomic analyses reveal common promiscuous patterns of cell surface proteins on human embryonic stem cells and sperms. PloS one 2011, 6(5), e19386.
(26) Sarkar, P.; Collier, T. S.; Randall, S. M.; Muddiman, D. C.; Rao, B. M. The subcellular proteome of undifferentiated human embryonic stem cells. Proteomics 2012, 12(3), 421-430.
(27) Melo-Braga, M.
N.; Schulz M Fau - Liu, Q.; Liu Q Fau - Swistowski, A.; Swistowski A Fau - Palmisano, G.; Palmisano G Fau - Engholm-Keller, K.; EngholmKeller K Fau - Jakobsen, L.; Jakobsen L Fau - Zeng, X.; Zeng X Fau - Larsen, M. R.; Larsen, M. R. Comprehensive quantitative comparison of the membrane proteome, phosphoproteome, and sialiome of human embryonic and neural stem cells. Molecular & Cellular Proteomics 2014, 13(1), 311-328. 27 ACS Paragon Plus Environment

(28) Singec, I.; Crain, A. M.; Hou, J.; Tobe, B. T.; Talantova, M.; Winquist, A. A.; Doctor, K. S.; Choy, J.; Huang, X.; La Monaca, E.; Horn, D. M.; Wolf, D. A.; Lipton, S. A.; Gutierrez, G. J.; Brill, L. M.; Snyder, E. Y. Quantitative Analysis of Human Pluripotency and Neural Specification by In-Depth (Phospho)Proteomic Profiling. Stem Cell Reports 2016, 7, 527-542.
(29) Phanstiel, D. H.; Brumbaugh, J.; Wenger, C. D.; Tian, S.; Probasco, M. D.; Bailey, D. J.; Swaney, D. L.; Tervo, M. A.; Bolin, J. M.; Ruotti, V.; Stewart, R.; Thomson, J. A.; Coon, J. J. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nature Methods 2011, 8, 821-827.
(30) Prokhorova, T. A.; Rigbolt, K. T.; Johansen, P. T.; Henningsen, J.; Kratchmarova, I.; Kassem, M.; Blagoev, B. Stable isotope labeling by amino acids in cell culture (SILAC) and quantitative comparison of the membrane proteomes of self-renewing and differentiating human embryonic stem cells. Molecular & Cellular Proteomics 8(5), 959-970.
(31) McQuade, L. R.; Schmidt, U.; Pascovici, D.; Stojanov, T.; Baker, M. S. Improved membrane proteomics coverage of human embryonic stem cells by peptide IPG-IEF. Journal of Proteome Research 2009, 8, 5642-5649.
(32) Gu, B.; Zhang, J.; Wu, Y.; Zhang, X.; Tan, Z.; Lin, Y.; Huang, X.; Chen, L.; Yao, K.; Zhang, M. Proteomic analyses reveal common promiscuous patterns of cell surface proteins on human embryonic stem cells and sperms. PloS One 2011, 6, e19386.
(33) Harkness, L.; Christiansen, H.; Nehlin, J.; Barington, T.; Andersen, J. S.; Kassem, M. Identification of a membrane proteomic signature for human embryonic stem cells independent of culture conditions. Stem Cell Reports 2008, 1(3), 219-227.
(34) Ghazizadeh, Z.; Fattahi, F.; Mirzaei, M.; Bayersaikhan, D.; Lee, J.; Chae, S.; Hwang, D.; Byun, K.; Tabar, M. S.; Taleahmad, S.; Mirshahvaladi, S.; Shabani, P.; Fonoudi, H.; Haynes, P. A.; Baharvand, H.; Aghdami, N.; Evans, T.; Lee, B.; Salekdeh, G. H. Prospective Isolation of ISL1+ Cardiac Progenitors from Human ESCs for Myocardial Infarction Therapy. Stem Cell Reports 2018, 10, 848-859.
(35) Wei, W.; Luo, W.; Wu, F.; Peng, X.; Zhang, Y.; Zhang, M.; Zhao, Y.; Su, N.; Qi, Y.; Chen, L.; Zhang, Y.; Wen, B.; He, F.; Xu, P. Deep Coverage Proteomics Identifies More Low-Abundance Missing Proteins in Human Testis Tissue with Q-Exactive HF Mass Spectrometer. Journal of Proteome Research 2016, 15, 3988-3997.
(36) Omenn, G. S.; Lane, L.; Lundberg, E. K.; Beavis, R. C.; Nesvizhskii, A. I.; Deutsch, E. W. Metrics for the Human Proteome Project 2015: Progress on the Human Proteome and Guidelines for High-Confidence Protein Identification. Journal of Proteome Research 2015, 14, 3452-3460.
(37) Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y. K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. Journal of Proteome Research 2016, 15, 3961-3970.
(38) Han, C. L. C., Chih Wei; Chen, Wen Cheng; Chen, Yet Ran; Wu, Chien Peng; Li, Hung; Chen, Yu Ju. A multiplexed quantitative strategy for membrane proteomics: Opportunities for mining therapeutic targets for autosomal dominant polycystic kidney disease. Molecular & Cellular Proteomics 2008, 7, 1983-1997.
(39) Dimayacyac-Esleta, B. R. T.; Tsai, C.-F.; Kitata, R. B.; Lin, P.-Y.; Choong, W.-K.; Lin, T.-D.; Wang, Y.-T.; Weng, S.-H.; Yang, P.-C.; Arco, S. D.; Sung, T.-Y.; Chen, Y.-J. Rapid High-pH Reverse Phase StageTip for Sensitive Small-Scale Membrane Proteomic Profiling. Analytical Chemistry 2015, 87, 12016-12023.
(40) Mertins, P.; Mani, D. R.; Ruggles, K. V.; Gillette, M. A.; Clauser, K. R.; Wang, P.; Wang, X.; Qiao, J. W.; Cao, S.; Petralia, F.; Kawaler, E.; Mundt, F.; Krug, K.; Tu, Z.; Lei, J. T.; Gatza, M. L.; Wilkerson, M.; Perou, C. M.; Yellapantula, V.; Huang, K. L.; Lin, C.; McLellan, M. D.; Yan, P.; Davies, S. R.; Townsend, R. R.; Skates, S. J.; Wang, J.; Zhang, B.; Kinsinger, C. R.; Mesri, M.; Rodriguez, H.; Ding, L.; Paulovich, A. G.; Fenyo, D.; Ellis, M. J.; Carr, S. A.; Nci, C. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 2016, 534, 55-62.
(41) Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P.-A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H.-J.; Albar, J. P.; Martinez-Bartolomé, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nature Biotechnology 2014, 32, 223.
(42) Okuda, S.; Watanabe, Y.; Moriya, Y.; Kawano, S.; Yamamoto, T.; Matsumoto, M.; Takami, T.; Kobayashi, D.; Araki, N.; Yoshizawa, A. C.; Tabata, T.; Sugiyama, N.; Goto, S.; Ishihama, Y. jPOSTrepo: an international standard data repository for proteomes. Nucleic Acids Research 2017, 45, D1107-D1111.
(43) Ponten, F.; Schwenk, J. M.; Asplund, A.; Edqvist, P. H. The Human Protein Atlas as a proteomic resource for biomarker discovery. Journal of Internal Medicine 2011, 270, 428-446.
(44) The UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Research 2007, 35, D193-D197.
(45) Omasits, U.; Ahrens, C. H.; Müller, S.; Wollscheid, B. Protter: interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics 2014, 30, 884-886.
(46) Kanehisa, M.; Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 2000, 28, 27-30.
(47) Dreger, M. Subcellular proteomics. Mass Spectrom Rev 2003, 22, 27-56.
(48) Satori, C. P.; Kostal, V.; Arriaga, E. A. Review on recent advances in the analysis of isolated organelles. Anal Chim Acta 2012, 753, 8-18.
(49) Shekari, F.; Baharvand, H.; Salekdeh, G. H. Organellar proteomics of embryonic stem cells. Adv Protein Chem Struct Biol 2014, 95, 215-230.
(50) Han, C. L.; Chen, J. S.; Chan, E.-C.; Wu, C.-P.; Yu, K.-H.; Chen, K.-T.; Tsou, C.-C.; Tsai, C.-F.; Chien, C.-W.; Kuo, Y.-B.; Lin, P.-Y.; Yu, J.-S.; Hsueh, C.; Chen, M.-C.; Chan, C.-C.; Chang, Y.-S.; Chen, Y.-J. An informatics-assisted label-free approach for personalized tissue membrane proteomics: case study on colorectal cancer. Molecular & Cellular Proteomics 2011, 10(4), M110.003087.
(51) Eichacker, L. A.; Granvogl, B.; Mirus, O.; Müller, B. C.; Miess, C.; Schleiff, E. Hiding behind Hydrophobicity: Transmembrane Segments in Mass Spectrometry. Journal of Biological Chemistry 2004, 279, 50915-50922.
(52) Foster, L. J.; de Hoog, C. L.; Zhang, Y.; Zhang, Y.; Xie, X.; Mootha, V. K.; Mann, M. A Mammalian Organelle Map by Protein Correlation Profiling. Cell 2006, 125, 187-199.
(53) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. R. Six alternative proteases for mass spectrometry–based proteomics beyond trypsin. Nature Protocols 2016, 11, 993.

(54) Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Molecular Systems Biology 2008, 4.
(55) Gillette, M. A.; Carr, S. A. Quantitative analysis of peptides and proteins in biomedicine by targeted mass spectrometry. Nature Methods 2013, 10, 28-34.
(56) Piontek, K.; Strittmatter, E.; Ullrich, R.; Grobe, G.; Pecyna, M. J.; Kluge, M.; Scheibner, K.; Hofrichter, M.; Plattner, D. A. Structural basis of substrate conversion in a new aromatic peroxygenase: cytochrome P450 functionality with benefits. Journal of Biological Chemistry 2013, 288, 34767-34776.
(57) Kusebauch, U.; Campbell, D. S.; Deutsch, E. W.; Chu, C. S.; Spicer, D. A.; Brusniak, M. Y.; Slagel, J.; Sun, Z.; Stevens, J.; Grimes, B.; Shteynberg, D.; Hoopmann, M. R.; Blattmann, P.; Ratushny, A. V.; Rinner, O.; Picotti, P.; Carapito, C.; Huang, C. Y.; Kapousouz, M.; Lam, H.; Tran, T.; Demir, E.; Aitchison, J. D.; Sander, C.; Hood, L.; Aebersold, R.; Moritz, R. L. Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome. Cell 2016, 166(3), 766-778.
(58) Elguoshy, A.; Hirao, Y.; Xu, B.; Saito, S.; Quadery, A. F.; Yamamoto, K.; Mitsui, T.; Yamamoto, T. Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy. Journal of Proteome Research 2017, 16(12), 4403-4414.
(59) Kumar, A.; Torii, T.; Ishino, Y.; Muraoka, D.; Yoshimura, T.; Togayachi, A.; Narimatsu, H.; Ikenaka, K.; Hitoshi, S. The Lewis X-related alpha1,3-fucosyltransferase, Fut10, is required for the maintenance of stem cell populations. Journal of Biological Chemistry 2013, 40, 28859-28868.
(60) de Oliveira Georges, J. A.; Vergani, N.; Fonseca, S. A. S.; Fraga, A. M.; de Mello, J. C. M.; Albuquerque, M. C. R. M.; Fujihara, L. S.; Pereira, L. V. Aberrant patterns of X chromosome inactivation in a new line of human embryonic stem cells established in physiological oxygen concentrations. Stem Cell Reports 2014, 10(4), 472-479.
(61) Saurer, L.; Zysset, D.; Rihs, S.; Mager, L.; Gusberti, M.; Simillion, C.; Lugli, A.; Zlobec, I.; Krebs, P.; Mueller, C. TREM-1 promotes intestinal tumorigenesis. Scientific Reports 2017, 7, 14870.

For TOC Only

[TOC graphic: the embryonic stem cell proteome (11,970 proteins) is divided into the membrane proteome (8,742 proteins), cytosolic proteome (9,228 proteins) and nuclear proteome (10,757 proteins), yielding 26 gold and 87 silver missing proteins and 570 uPE1 proteins. Gold and silver MPs are mapped to brain, heart, lung, liver, kidney, skin, testis, ovary/cervix/endometrium and placenta.]

Figure legend

Figure 1: Analytical workflow for subcellular proteome profiling of hESCs. (A) The experimental workflow integrates subcellular fractionation to generate membrane, cytoplasm and nucleus fractions, gel-assisted digestion, high-pH StageTip or HPLC fractionation of peptides, and duplicate LC-MS/MS analysis. (B) The analysis pipeline for protein identification included multiple database searches using Proteome Discoverer 2.1 with 1% FDR at the PSM, peptide and protein levels. The combined protein identifications from each subcellular fraction were obtained using the consensus workflow in PD (1% protein FDR), which identified 8,742, 9,228 and 10,757 proteins in the membrane, cytoplasm and nucleus fractions, respectively. Following the HPP guidelines, we identified 26 gold missing proteins (MPs) and 87 silver MPs, the latter each supported by a single unique peptide identification.

Figure 2: Protein identification overview. (A) Overlap of protein identifications among the three subcellular fractions. (B) Comparison of identified hESC proteins between the datasets from Singec (2016), Phanstiel (2011) and our study. (C) Comparison of identified proteins in datasets from ten published membrane proteome reports and the membrane fraction in the current study. (D) Membrane protein annotations for the 11,970 identified proteins. A total of 6,138 membrane proteins were identified in the current study. (E) The distribution of TMH among 2,648 transmembrane proteins. Membrane protein annotation was extracted from neXtProt 2018-01.

Figure 3: Example of a silver missing protein with multiple TMH domains. (A) In silico digestion analysis of alpha-2C adrenergic receptor, a plasma membrane protein encoded from chromosome, showed nine unique tryptic peptides with at least 9 amino acids, of which AGAEGGAGGADGQGAGPGAAESGALTASR (highlighted in red) was identified in our dataset. A second peptide of 8 amino acids (SVEFFLSR) was also detected (blue). The other peptide sequences (residues 1-26, 27-43, 44-80, 150-162, 170-192, 380-409, 410-420, 421-445; highlighted in gray) are mostly located within or close to the lipid bilayer or carry PTMs, which hindered their tryptic digestion and subsequent MS identification. (B) and (C) show the two identified PSMs for alpha-2C adrenergic receptor.

Figure 4. Representative PSM and MRM spectra of 2 identified unique peptides, NPASMDADGFYR and LYEAYVEWK, from the gold MP alpha-(1,3)-fucosyltransferase (FUT10, Q6P4F1). The relative intensities and retention time profiles of transition ions from the endogenous peptides are similar to those from the synthetic standard peptides.

Figure 5. Subcellular localizations of gold and silver MPs based on annotations from the GO and/or UniProt databases. The color of each rectangle indicates the subcellular fraction in which the MP was identified. Nucleus: pink rectangle, cytoplasm: green rectangle, and membrane: blue rectangle.

Figure 6: Network analysis of hESC proteins using KEGG and IPA. (A) Based on the 11,970 identified proteins, we constructed the most comprehensive pluripotency-related protein network in hESCs. Molecules highlighted in pink were identified in our dataset. The identified gold and silver MPs associated with the hESC networks are shown, indicating their potential regulatory functions in hESC biology. (B) Several gold and silver MPs were predicted to be involved in GPCR signaling, which contributes to self-renewal and differentiation of hESCs.

Figure 7: (A) Proteins were classified by lineage-specific differentiation based on biological process analysis in the Gene Ontology database. (B) The tissue- or organ-specific proteins identified in the current study. The top organ-specific proteins according to the Human Protein Atlas are indicated in black. The subcellular fraction of each identified protein is indicated by color. Gold and silver MPs with >10 TPM mRNA expression levels are shown in the different specialized tissues/organs.

Table 1: Stepwise filtering of Gold missing proteins by C-HPP criteria

Table 2: Identified gold missing proteins in 3 subcellular proteomes
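The in silico digestion described in the Figure 3 legend can be illustrated with a minimal Python sketch. It assumes the common trypsin rule (cleave C-terminal to K or R, but not before P) and no missed cleavages; the legend does not state which rule was used. The function names are hypothetical, and the toy sequence is an artificial concatenation of peptides quoted in the legends, not a real protein sequence.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when the next residue is P.

    Assumption: fully specific cleavage with no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide when the protein does not end in K/R.
        peptides.append(sequence[start:])
    return peptides


def candidate_peptides(sequence, min_length=9):
    """Keep only tryptic peptides meeting the >= 9 amino acid length criterion."""
    return [p for p in tryptic_digest(sequence) if len(p) >= min_length]


if __name__ == "__main__":
    # Toy input (hypothetical): peptides from the legends joined end to end.
    toy = "AAK" + "SVEFFLSR" + "NPASMDADGFYR" + "LYEAYVEWK"
    print(tryptic_digest(toy))
    print(candidate_peptides(toy))
```

With this toy input, SVEFFLSR (8 residues) is produced by the digest but excluded by the length filter, mirroring why the second peptide in Figure 3 does not count toward the ≥ 9 amino acid criterion.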


Table 1: Stepwise filtering of Gold missing proteins by C-HPP criteria

| No | Filtering criteria | Membrane fraction: Protein | Membrane fraction: Unique peptide | Cytoplasmic fraction: Protein | Cytoplasmic fraction: Unique peptide | Nucleus fraction: Protein | Nucleus fraction: Unique peptide |
|---|---|---|---|---|---|---|---|
| 1 | 1% FDR (Protein, peptide, PSM) | 87 | 105 | 68 | 51 | 108 | 121 |
| 2 | Unambiguous PSM, No missed cleavage | 79 | 96 | 58 | 42 | 94 | 109 |
| 3 | ≥ 1 unique peptide, No SAAV or isobaric substitution | 58 | 94 | 39 | 42 | 72 | 106 |
| 4 | ≥ 9 a.a. length | 55 | 85 | 35 | 38 | 69 | 101 |
| 5 | ≥ 2 unique peptides (Gold MPs)* | 15 | 43 | 3 | 6 | 13 | 44 |

*A total of 26 Gold MPs and 87 Silver MPs were identified in the three subproteomes.
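The stepwise filter in Table 1 can be read as a sequence of per-peptide checks followed by a per-protein count. The Python sketch below is illustrative only: the `PeptideHit` record and its field names are hypothetical and do not reflect the authors' Proteome Discoverer output. It also assumes that a protein left with exactly one qualifying unique peptide is labelled silver, as the Figure 1 legend suggests.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    """One identified peptide for a candidate missing protein (hypothetical record)."""
    protein: str
    sequence: str
    passes_fdr: bool = True          # step 1: 1% FDR at protein, peptide and PSM level
    unambiguous_psm: bool = True     # step 2: unambiguous PSM
    missed_cleavages: int = 0        # step 2: no missed cleavage
    is_unique: bool = True           # step 3: unique peptide
    saav_or_isobaric: bool = False   # step 3: no SAAV or isobaric substitution


def passes_steps_1_to_4(hit, min_length=9):
    """Apply filtering steps 1-4 of Table 1 to a single peptide."""
    return (
        hit.passes_fdr
        and hit.unambiguous_psm
        and hit.missed_cleavages == 0
        and hit.is_unique
        and not hit.saav_or_isobaric
        and len(hit.sequence) >= min_length
    )


def classify_missing_proteins(hits):
    """Step 5: >= 2 surviving unique peptides is gold; exactly one is silver (assumed)."""
    surviving = defaultdict(set)
    for hit in hits:
        if passes_steps_1_to_4(hit):
            surviving[hit.protein].add(hit.sequence)
    return {
        protein: "gold" if len(peptides) >= 2 else "silver"
        for protein, peptides in surviving.items()
    }
```

Collecting surviving peptides in a set per protein means a peptide observed in several fractions or replicates is counted once, which matches the "unique peptide" wording of the criteria.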


Table 2: Identified gold missing proteins in 3 subcellular proteomes

| No. | Entry | Gene Name | Protein Evidence | Protein Length | Chromosome No. | No. of Unique Peptides | Identified Fraction | Annotated Localization | Verification (2 Unique peptides) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Q86WK7 | AMIGO3 | PE2 | 504 | 3 | 9 | Membrane, Nucleus | Membrane | MRM, Peptide Atlas |
| 2 | C9JN71 | ZNF878 | PE3 | 531 | 19 | 6 | Nucleus, Cytoplasm | Nucleus | MRM, Peptide Atlas |
| 3 | Q6ZU67 | BEND4 | PE2 | 534 | 4 | 6 | Nucleus | NA | MRM |
| 4 | Q8N5D6 | GBGT1 | PE2 | 347 | 9 | 4 | Membrane, Nucleus | Membrane | MRM, Peptide Atlas |
| 5 | P59025 | RTP1 | PE2 | 263 | 3 | 3 | Membrane, Nucleus | Membrane | MRM, Peptide Atlas |
| 6 | Q6P4F1 | FUT10 | PE2 | 479 | 8 | 3 | Membrane, Nucleus | Membrane | MRM, Peptide Atlas |
| 7 | Q6ZNA1 | ZNF836 | PE2 | 936 | 19 | 3 | Nucleus | Nucleus | MRM, Synthetic |
| 8 | Q5XG99 | LYSMD4 | PE2 | 296 | 15 | 2 | Membrane, Nucleus | Membrane | MRM, Peptide Atlas |
| 9 | O95780 | ZNF682 | PE2 | 498 | 19 | 2 | Nucleus, Cytoplasm | Nucleus | MRM, Peptide Atlas |
| 10 | Q15760 | GPR19 | PE2 | 415 | 12 | 2 | Membrane | Membrane | MRM, Peptide Atlas |
| 11 | O75346 | ZNF253 | PE2 | 499 | 19 | 2 | Nucleus | Nucleus | MRM, Peptide Atlas |
| 12 | Q96MV8 | ZDHHC15 | PE2 | 337 | X | 4 | Membrane, Nucleus | Membrane | Peptide Atlas |
| 13 | Q96RD6 | PANX2 | PE2 | 677 | 22 | 4 | Membrane | Membrane | Peptide Atlas |
| 14 | Q9Y5H3 | PCDHGA10 | PE2 | 936 | 5 | 3 | Membrane | Membrane | Peptide Atlas |
| 15 | Q9BZD7 | PRRG3 | PE2 | 231 | X | 3 | Membrane | Membrane | Peptide Atlas |
| 16 | P09131 | SLC10A3 | PE2 | 477 | X | 2 | Membrane | Membrane | Peptide Atlas |
| 17 | Q6UYE1 | DLEU7 | PE2 | 221 | 13 | 2 | Membrane | NA | Peptide Atlas |
| 18 | Q6ZMV8 | ZNF730 | PE2 | 503 | 19 | 2 | Cytoplasm | Nucleus | Peptide Atlas |
| 19 | Q9P2K9 | DISP3 | PE2 | 1392 | 1 | 2 | Membrane | Membrane | Peptide Atlas |
| 20 | Q8WXS4 | TMEM37 | PE2 | 190 | 1 | 2 | Membrane | Membrane | Peptide Atlas |
| 21 | Q8N6I4 | TMEM251 | PE2 | 163 | 14 | 2 | Membrane | Membrane | Peptide Atlas |
| 22 | Q69YZ2 | TMEM200B | PE2 | 307 | 1 | 2 | Membrane | Membrane | Peptide Atlas |
| 23 | Q9UN88 | GABRQ | PE2 | 632 | X | 2 | Membrane, Nucleus | Membrane | Peptide Atlas |
| 24 | A8MTJ6 | FOXI3 | PE2 | 420 | 2 | 2 | Nucleus | Nucleus | Peptide Atlas |
| 25 | Q6TDP4 | KLHL17 | PE2 | 642 | 1 | 2 | Nucleus | Membrane | Peptide Atlas |
| 26 | A2VCL2 | CCDC162P | PE2 | 907 | 6 | 2 | Membrane, Cytoplasm | NA | NA |