Article pubs.acs.org/jpr

Rapid and Deep Profiling of Human Induced Pluripotent Stem Cell Proteome by One-shot NanoLC−MS/MS Analysis with Meter-scale Monolithic Silica Columns

Ryota Yamana,†,‡ Mio Iwasaki,†,‡ Masaki Wakabayashi,† Masato Nakagawa,§ Shinya Yamanaka,§ and Yasushi Ishihama*,†

† Department of Molecular & Cellular BioAnalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
§ Center for iPS Cell Research and Application, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan

ABSTRACT: Proteome analyses of human induced pluripotent stem cells (iPSC) were carried out on a liquid chromatography−tandem mass spectrometry system using meter-scale monolithic silica-C18 capillary columns without prefractionation. Tryptic peptides from five different iPSC lysates and three different fibroblast lysates (4 μg each) were directly injected onto a 200 cm long, 100 μm i.d. monolithic silica-C18 column and an 8-h gradient was applied at 500 nL/min at less than 20 MPa. We identified 98977 nonredundant tryptic peptides from 9510 proteins (corresponding to 8712 genes), including low-abundance protein groups (such as 329 protein kinases) from triplicate measurements within 10 days. The obtained proteome profiles of the eight cell lysates were categorized into two groups, iPSC and fibroblast, by hierarchical cluster analysis. Further quantitative analysis based on an exponentially modified protein abundance index approach combined with UniProt keyword enrichment analysis revealed that the iPSC group contained more “transcription regulation”-related proteins, while the fibroblast group contained more “transport”-related proteins. Our results indicate that this simplified one-shot proteomics approach with long monolithic columns is advantageous for rapid, deep, sensitive, and reproducible proteome analysis.

KEYWORDS: shotgun proteomics, monolithic silica column, iPS cell, one-shot proteomics



INTRODUCTION

Nanoscale liquid chromatography−tandem mass spectrometry (nanoLC−MS/MS) systems have been extensively used for shotgun proteome analyses of biological samples in many research fields.1,2 Because of the extremely high complexity and the wide dynamic range of the digested peptides obtained from proteome samples, it is still difficult to unveil the entire proteome of mammalian cells, although rapid progress has been made in MS instrumentation. To tackle these problems, prefractionation techniques such as ion exchange chromatography and isoelectric focusing have been developed.3−5 Recently, Nagaraj et al. reported the deepest coverage of a single human cell type, HeLa, by using three different proteases and tandem prefractionation by gel filtration at the protein level, followed by strong anion exchange chromatography at the peptide level, covering 9207 human genes (10 mg proteins, total 72 fractions, 12 days).6 Beck et al. reported similar results for a U2OS cell line by using extensive peptide fractionation, charge state fractionation and directed MS acquisition together with charge state and gas phase fractionation.7 Geiger et al. used 11 different human cell lines, raising the human proteome coverage to 11731 proteins (10216 genes) (0.1 mg per cell line, total 198 runs, >30 days).8 Although the proteome coverage obtained in these studies was close to that of RNA-seq analysis, the total nanoLC−MS/MS measurement time required for prefractionated samples was rather long.

An alternative approach is to use highly efficient one-dimensional nanoLC−MS/MS systems without prefractionation. So far, to obtain higher efficiency in LC separation, particle-packed columns of up to 2 m long or columns packed with sub-2 μm particles have been used for analyzing various proteome samples.9−15 On the other hand, monolithic silica columns have been developed for highly efficient separation.16−20 Although polymer-based monolithic columns can be used over a wider pH range than silica monolithic columns, they generally show limited separation efficiency.21 Recently Vaast et al. reported that capillary monolithic silica columns showed better kinetic performance for both fast and high-peak-capacity gradient separations of peptides than columns packed with porous 3 μm or fused-core 2.7 μm beads.22 We successfully applied a 3.5-m monolithic silica-C18 column to the Escherichia coli proteome and generated the proteome map on a microarray scale.23

Special Issue: Chromosome-centric Human Proteome Project
Received: September 2, 2012


dx.doi.org/10.1021/pr300837u | J. Proteome Res. XXXX, XXX, XXX−XXX

Journal of Proteome Research


Although this “one-shot” proteomics approach is useful to simplify the workflow, it remains a substantial challenge to uncover the complex human proteome.24

Human induced pluripotent stem cells (iPSCs) were first generated in 2007 by retroviral transduction of a set of transcription factors into fibroblast cells (FBCs).25,26 They have great potential for clinically oriented research, such as disease modeling with patient-derived iPSCs and screening for drugs to treat rare diseases.27,28 Extensive research, including system-wide genomic and transcriptomic approaches, has been performed to characterize iPSCs, but so far the molecular mechanisms remain insufficiently understood to distinguish bona fide iPSCs from partially reprogrammed cells at an earlier stage.29 Recently, deep proteome analyses of iPSC lines have been performed to compare the proteomes of embryonic stem cells (ESCs) and iPSCs.30,31 In both studies, proteins coded by approximately 8000 genes were successfully identified and the expression profiles of iPSCs, ESCs and FBCs were compared. However, the required sample amounts as well as the total nanoLC−MS/MS measurement time were rather large, owing to the need for prefractionation, compared to transcriptomics approaches. In this study, we applied our “one-shot” approach with an improved sample pretreatment protocol to achieve rapid and deep profiling of the human iPSC proteome.



EXPERIMENTAL PROCEDURES

Materials

Ammonium bicarbonate, sodium deoxycholate (SDC), sodium N-lauroylsarcosinate (SLS), dithiothreitol, iodoacetamide, mass spectrometry grade lysyl endoprotease, ethyl acetate, acetonitrile, acetic acid, methanol, and trifluoroacetic acid were purchased from Wako (Osaka, Japan). Modified trypsin was from Promega (Madison, WI). Empore disks were from 3M (St. Paul, MN). Water was purified with a Milli-Q system (Millipore, Bedford, MA).

Cell Preparation

Five human iPS cell lines (201B7−P32, 32R1−P32, 414C2−P43, 585A1−P55 and 606A1−P46) and three fibroblast cell lines (aHDF1388-P9, aHDF1419-P10 and Tig120slc-P8) were maintained as previously described.25,32,33 To obtain clear data, feeder cells were removed before sample preparation. The feeder-free iPS cells were washed with PBS(−) and collected with scrapers. The pellets were stored at −80 °C.

Figure 1. Numbers of identified (A) peptides and (B) protein groups. Individual results for five iPSC lines and three FBC lines are shown (white bars), together with merged results for iPSC lines and FBC lines, and all results (black bars).
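The merged bars in Figure 1 are unions of identifications rather than sums, so the overlap between the merged iPSC and FBC results follows from inclusion-exclusion. The sketch below applies this to the merged counts reported in the Results (76762 and 60124 peptides, 98977 in total; 8179 and 7202 protein groups, 9510 in total). Treating protein groups as simple sets is an assumption made here for illustration, since protein grouping can change when runs are merged; the function name and the toy peptide strings are hypothetical.

```python
def shared_count(n_a: int, n_b: int, n_union: int) -> int:
    """Inclusion-exclusion for two sets: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return n_a + n_b - n_union


# Merged counts reported in the Results (iPSC, FBC, all eight cell lines).
shared_peptides = shared_count(76762, 60124, 98977)
shared_protein_groups = shared_count(8179, 7202, 9510)

# Toy illustration with hypothetical peptide sequences.
ipsc_peptides = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
fbc_peptides = {"PEPTIDEB", "PEPTIDEC", "PEPTIDED"}
merged_peptides = ipsc_peptides | fbc_peptides  # union, as in the merged bars

print(shared_peptides, shared_protein_groups, len(merged_peptides))
```

Under this set-based reading, roughly 38000 peptides and 5900 protein groups would be common to both cell types.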

Sample Pretreatment

Sample pretreatment was carried out according to the phase transfer surfactant (PTS) protocol34 with some modifications as described below. Proteins were extracted from the pellets with 12 mM SDC, 12 mM SLS, and 50 mM ammonium bicarbonate, reduced with 10 mM dithiothreitol at room temperature for 30 min, and alkylated with 55 mM iodoacetamide in the dark at room temperature for 30 min. The protein mixture was 5-fold diluted with 50 mM ammonium bicarbonate and Lys-C/trypsin digestion was performed as described previously.24 An equal volume of ethyl acetate was added to the eluent solution, and the mixture was acidified with 0.5% trifluoroacetic acid (final concentration). The mixture was shaken for 1 min and centrifuged at 15700 × g for 2 min, and then the aqueous phase was collected. Tryptic peptides were desalted with reversed phase-StageTips.35

NanoLC−MS/MS System

NanoLC−MS/MS analysis was conducted using an AB Sciex TripleTOF 5600 System (Foster City, CA) equipped with a Dionex Ultimate 3000 pump (Germering, Germany) and an HTC-PAL autosampler (CTC Analytics, Zwingen, Switzerland). Monolithic silica columns (100 μm i.d., 2 m long) were prepared as described previously.24 A spray voltage of 2300 V was applied. The coiled monolithic capillary columns were connected to a self-pulled emitter (20 μm i.d., 5 μm tip) formed with a Sutter P-2000 (Novato, CA); a conductive coating was applied to its distal end with an Ion Coater Model IB-2 (Eiko Engineering, Ibaraki, Japan), and the spray voltage was applied at this coated end. Column temperature was controlled at 25 °C. The injection volume was 5 μL and the flow rate was 500 nL/min. The mobile phases consisted of (A) 0.5% acetic acid and (B) 0.5% acetic acid in 80% acetonitrile. The gradient was 5−40% B in 480 min, 40−100% B in 5 min and 100% B for 10 min. The MS scan range was m/z 300−1500. The top 10 precursor ions were selected in each MS scan for subsequent MS/MS scans. MS scans were performed for 0.25 s, and subsequently 10 MS/MS scans were performed for 0.1 s each. To minimize repeated scanning, previously scanned ions were excluded for 12 s. The CID energy was automatically adjusted by the rolling CID function of Analyst TF 1.5. Triplicate analyses were done for each sample and blank runs were inserted between different samples.
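The gradient and scan timing given in the NanoLC−MS/MS System section can be written as a small piecewise-linear program. The following is a minimal sketch, not the instrument method itself: the breakpoint table encodes 5−40% B in 480 min, 40−100% B in 5 min and 100% B for 10 min, and the acetonitrile content assumes mobile phase B is 80% acetonitrile as stated. The function names are hypothetical, and the MS/MS budget ignores any instrument overhead between scans, which is an assumption.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints taken from the gradient description.
GRADIENT = [(0.0, 5.0), (480.0, 40.0), (485.0, 100.0), (495.0, 100.0)]

# One acquisition cycle: a 0.25 s MS scan followed by ten 0.1 s MS/MS scans.
MSMS_PER_CYCLE = 10
CYCLE_S = 0.25 + MSMS_PER_CYCLE * 0.1


def percent_b(t_min: float) -> float:
    """Linear interpolation of %B at time t_min, clamped at both ends."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    if t_min >= GRADIENT[-1][0]:
        return GRADIENT[-1][1]
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def acetonitrile_percent(t_min: float) -> float:
    """Acetonitrile content of the mixed eluent; B contains 80% acetonitrile."""
    return 0.8 * percent_b(t_min)


def max_msms_events(duration_min: float) -> int:
    """Upper bound on MS/MS scans in a window, ignoring scan overhead."""
    cycles = int(duration_min * 60 // CYCLE_S)
    return cycles * MSMS_PER_CYCLE


print(percent_b(240.0), acetonitrile_percent(240.0), max_msms_events(480.0))
```

At the midpoint of the 8-h segment the eluent is 22.5% B, i.e. about 18% acetonitrile, and the 1.25 s cycle allows at most 230400 MS/MS scans over 480 min under these assumptions.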


Figure 2. Specific proteomic profiles of iPSCs and FBCs. Differential heat map of protein groups identified by hierarchical cluster analysis. Rows represent the 24 nanoLC−MS/MS runs, and lines represent the identified protein groups. Identified and unidentified protein groups are indicated in red and black, respectively.
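The heat map in Figure 2 is built from a presence/absence matrix (1 if a protein group was identified in a run, 0 otherwise) that is then clustered with a correlation similarity metric. The sketch below shows only those two ingredients, the binary matrix and the Pearson correlation between two runs; it does not reproduce Cluster 3.0, its centering step, or centroid linkage. The run names and protein identifiers are hypothetical.

```python
from math import sqrt


def presence_matrix(runs):
    """Return (sorted protein list, {run: 0/1 vector}) from {run: set of IDs}."""
    proteins = sorted(set().union(*runs.values()))
    matrix = {
        run: [1 if protein in ids else 0 for protein in proteins]
        for run, ids in runs.items()
    }
    return proteins, matrix


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical identifications for three runs.
runs = {
    "iPSC_run1": {"P1", "P2", "P3"},
    "iPSC_run2": {"P1", "P2", "P3", "P4"},
    "FBC_run1": {"P4", "P5", "P6"},
}
proteins, matrix = presence_matrix(runs)
r_same = pearson(matrix["iPSC_run1"], matrix["iPSC_run2"])
r_diff = pearson(matrix["iPSC_run1"], matrix["FBC_run1"])
print(proteins, r_same, r_diff)
```

Runs with similar identification patterns give a high correlation and end up close together in the dendrogram, while runs with complementary patterns give a negative correlation.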

Proteome Data Analysis

The raw data files were analyzed by AB SCIEX MS Data Converter to create peak lists on the basis of the recorded fragmentation spectra. Peptides and proteins were identified by Mascot v2.3 (Matrix Science, London, U.K.) against IPI human database v3.87 (91464 sequences) with a precursor mass tolerance of 20 ppm, a fragment ion mass tolerance of 0.1 Da and strict trypsin specificity allowing for up to 2 missed cleavages. Carbamidomethylation of cysteine was set as a fixed modification, and methionine oxidation was allowed as a variable modification. Peptides were initially rejected if the Mascot score was below the 95% confidence limit based on the “identity” score of each peptide and their length was less than 7 amino acid residues. After Mascot search, the matched spectra were subtracted from the peak lists36 and the residual peak lists were searched against the same databases using AB SCIEX ProteinPilot with the set parameters of “Instrument: TripleTOF 5600” and “Digestion: trypsin”. A precursor mass tolerance of 0.05 Da and a fragment ion mass tolerance of 0.1 Da were employed. Carbamidomethylation of cysteine, oxidation of methionine, phosphorylation of serine, threonine and tyrosine, deamidation of asparagine and glutamine, N-terminal pyro-glutamic acid of glutamine or glutamic acid and protein N-terminal acetylation were accepted, and peptides were rejected if the peptide confidence was below 95%, the delta mass was over 0.05 Da, the charge state was more than 5 or the number of missed cleavages was more than 2. For protein identification, peptides were grouped into “protein groups” based on the rules previously established.37 Then, at least two confidently identified peptides per protein were used for protein identification. In addition, single peptides with higher confidence (p < 0.01) were allowed for protein identification. False discovery rates (FDR) were estimated by searching against a randomized decoy

Figure 3. Distribution of emPAI ratios (iPSC/FBC). The 5 iPSCs and 3 FBCs are merged into one iPSC and one FBC respectively to calculate emPAI ratios. Proteins were divided into three groups: proteins with emPAI ratio < 0.2 (group A), 0.2 ≤ emPAI ratio ≤ 5 (group B), and emPAI ratio > 5 (group C). These groups contained 2026 (A), 3460 (B) and 2366 (C) proteins.
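The emPAI value used for quantitation, and the three ratio groups defined in the Figure 3 caption, can be sketched as follows. The formula emPAI = 10^(observed/observable) − 1 is the published definition of the index; everything else here is illustrative. In particular, how proteins detected in only one cell type were handled is not stated in the text, so mapping a zero denominator to an infinite ratio is an assumption, and the function names are hypothetical (this is not the authors' in-house Perl script).

```python
import math


def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10 ** (observed peptides / observable peptides) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def ratio_group(empai_ipsc: float, empai_fbc: float) -> str:
    """Assign group A (< 0.2), B (0.2 to 5) or C (> 5) from the iPSC/FBC ratio.

    Assumption: a protein seen only in iPSCs gets an infinite ratio (group C).
    """
    if empai_ipsc == 0 and empai_fbc == 0:
        raise ValueError("protein not observed in either group")
    ratio = math.inf if empai_fbc == 0 else empai_ipsc / empai_fbc
    if ratio < 0.2:
        return "A"
    if ratio > 5:
        return "C"
    return "B"


# Hypothetical protein: 5 of 10 observable peptides seen in iPSCs, 1 of 10 in FBCs.
e_ipsc = empai(5, 10)
e_fbc = empai(1, 10)
print(e_ipsc, e_fbc, ratio_group(e_ipsc, e_fbc))
```

Because the index is exponential in the peptide fraction, a 5-fold difference in observed peptides in this toy case corresponds to a ratio of about 8.3, which falls in group C.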


Table 1. Top 30 Keywords by UniProt Keyword Enrichment Analysis for iPSCs and FBCs

Up in iPSCs
keyword                        fold enrichment   p-value          FDR
Nucleus                        2.4               3.1 × 10^−188    4.5 × 10^−185
mRNA processing                5.4               4.6 × 10^−63     6.7 × 10^−60
RNA-binding                    3.8               1.9 × 10^−61     2.8 × 10^−58
mRNA splicing                  5.6               9.2 × 10^−54     1.3 × 10^−50
Spliceosome                    6.2               1.7 × 10^−38     2.5 × 10^−35
Cell cycle                     3.3               5.1 × 10^−37     7.5 × 10^−34
Mitosis                        4.9               1.2 × 10^−35     1.8 × 10^−32
Cell division                  4.0               9.5 × 10^−34     1.4 × 10^−30
Chromatin regulator            4.4               2.4 × 10^−33     3.5 × 10^−30
Helicase                       4.8               5.1 × 10^−25     7.4 × 10^−22
DNA replication                5.6               3.9 × 10^−23     5.8 × 10^−20
Transcription                  1.7               5.3 × 10^−23     7.7 × 10^−20
rRNA processing                6.7               2.0 × 10^−22     2.9 × 10^−19
mRNA transport                 5.9               9.7 × 10^−22     1.4 × 10^−18
ATP-binding                    1.8               3.5 × 10^−20     5.2 × 10^−17
DNA damage                     3.5               4.0 × 10^−20     5.9 × 10^−17
DNA repair                     3.6               7.7 × 10^−20     1.1 × 10^−16
Kinetochore                    5.9               1.7 × 10^−18     2.4 × 10^−15
Ribosome biogenesis            6.6               1.4 × 10^−17     2.1 × 10^−14
Transcription regulation       1.6               5.6 × 10^−15     8.1 × 10^−12
DNA-directed RNA polymerase    6.8               2.6 × 10^−14     3.8 × 10^−11
WD repeat                      2.7               1.5 × 10^−13     2.2 × 10^−10
Nucleotidyltransferase         4.5               9.1 × 10^−13     1.3 × 10^−9
Methyltransferase              3.1               1.2 × 10^−10     1.8 × 10^−7
Translocation                  3.9               3.3 × 10^−9      4.9 × 10^−6
Repressor                      2.0               4.0 × 10^−9      5.8 × 10^−6
S-adenosyl-L-methionine        3.3               5.2 × 10^−9      7.5 × 10^−6
Ribonucleoprotein              2.2               7.4 × 10^−8      1.1 × 10^−4
Chromosome                     1.7               2.0 × 10^−6      2.9 × 10^−3
Activator                      1.7               2.0 × 10^−6      2.9 × 10^−3

Up in FBCs
keyword                        fold enrichment   p-value          FDR
Actin-binding                  3.4               7.8 × 10^−19     1.2 × 10^−15
Cytoskeleton                   2.4               8.0 × 10^−19     1.2 × 10^−15
Endoplasmic reticulum          2.2               1.9 × 10^−17     2.8 × 10^−14
Protein transport              2.4               2.9 × 10^−15     4.3 × 10^−12
Lysosome                       3.9               4.5 × 10^−15     6.8 × 10^−12
Endosome                       3.2               1.4 × 10^−13     2.2 × 10^−10
Calcium                        1.9               2.8 × 10^−12     4.3 × 10^−9
Coiled coil                    1.5               7.8 × 10^−12     1.2 × 10^−8
Disease mutation               1.6               7.5 × 10^−11     1.1 × 10^−7
Trimer                         7.4               1.2 × 10^−9      1.8 × 10^−6
Calcium binding                3.6               3.4 × 10^−9      5.0 × 10^−6
Alternative splicing           1.2               7.7 × 10^−9      1.2 × 10^−5
Extracellular matrix           2.5               1.3 × 10^−8      1.9 × 10^−5
Transport                      1.5               1.9 × 10^−8      2.9 × 10^−5
Triple helix                   6.2               2.3 × 10^−8      3.4 × 10^−5
Oxidoreductase                 1.9               2.7 × 10^−8      4.1 × 10^−5
Golgi apparatus                1.8               1.9 × 10^−7      2.9 × 10^−4
Collagen                       3.4               2.0 × 10^−7      3.0 × 10^−4
ER-golgi transport             3.5               2.6 × 10^−7      3.8 × 10^−4
Duplication                    2.4               4.2 × 10^−7      6.2 × 10^−4
Host−virus interaction         2.2               8.4 × 10^−7      1.2 × 10^−3
Nucleotide-binding             1.4               9.1 × 10^−7      1.4 × 10^−3
Hydroxylysine                  5.4               1.7 × 10^−6      2.6 × 10^−3
Kinase                         1.7               2.9 × 10^−6      4.4 × 10^−3
Membrane                       1.2               3.7 × 10^−6      5.6 × 10^−3
SH3 domain                     2.3               6.4 × 10^−6      9.6 × 10^−3
Signal                         1.2               1.7 × 10^−5      2.5 × 10^−2
Cell adhesion                  1.8               3.5 × 10^−5      5.3 × 10^−2
Glycoprotein                   1.2               2.1 × 10^−4      3.2 × 10^−1
Lipoprotein                    1.5               7.7 × 10^−4      1.1
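To make the fold enrichment and p-value columns of Table 1 more concrete, the sketch below computes both for a single keyword from four counts. It uses a plain one-sided hypergeometric test as a stand-in; DAVID reports a modified Fisher exact score, so the numbers it produces will differ somewhat, and the counts in the example are hypothetical rather than taken from this study.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): keyword frequency in the group relative to the background.

    k: proteins in the group with the keyword, n: group size,
    K: background proteins with the keyword, N: background size.
    """
    return (k / n) / (K / N)


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) when drawing n proteins from N, K of them annotated."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 3 of 5 group proteins carry a keyword that annotates
# 5 of 20 background proteins.
fe = fold_enrichment(3, 5, 5, 20)
p = hypergeom_pvalue(3, 5, 5, 20)
print(fe, p)
```

With exact integer arithmetic from `math.comb`, the tail probability stays accurate even for the very small p-values seen at the top of the table, although the final division is limited by floating-point range.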

database created by Mascot and ProteinPilot. UniProt/SwissProt database release 2011_06 (20235 sequences) was used to convert peptides to GeneNames. It was also employed for protein keyword classification. For protein quantitation, exponentially modified protein abundance index (emPAI) was used.38 An in-house Perl script based on emPAICalc (ref 39) was created to calculate the emPAI values. For clustering analyses, the identified proteins were classified by using Cluster 3.0 (ref 40). The data matrix for cluster analysis was generated according to the following rule: if a protein group was identified in a given sample, the value was set to 1, and if not, to 0. Hierarchical cluster analysis was performed using correlation similarity metric and centroid linkage after centering the data by subtracting mean values for columns and arrays. The clustering data was visualized as a 2D-map with TreeView downloaded from the Web site (http://jtreeview.sourceforge.net/). For UniProt keyword enrichment analysis, DAVID bioinformatics resources 6.7 was employed.41 The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (http://www.ebi.ac.uk/pride/) with the data set identifier PXD000071.

RESULTS AND DISCUSSION

First, we employed one iPSC, 32R1-32, for proteome analysis by “one-shot” nanoLC−MS/MS measurement using the 2-m monolithic silica capillary column at less than 20 MPa without prefractionation. By applying an 8-h gradient, 37861 ± 1288 unique peptides and 6189 ± 134 protein groups on average (54458 peptides and 7223 proteins in total) were successfully identified in triplicate analysis with a FDR of 0.36% at the peptide level and 2.04% at the protein level. We compared the dynamic range of the peak response observed in this study with that in the previous study24 using the 15-cm long particle-packed column (Figure S1 in the Supporting Information). An approximately 40-fold wider dynamic range was obtained in the monolithic column. As far as we know, this is the most comprehensive human proteome analysis yet achieved with a one-shot nanoLC−MS/MS run. In addition, we examined the reproducibility of retention time among triplicate runs. The relative standard deviation (RSD) for 10211 commonly identified peptides was 0.79%, while the RSD for 2366 commonly identified peptides among 24 runs (over 9.75 days) was 2.01%, indicating that this system offers high reproducibility, comparable to that of conventional HPLC systems. In our previous study of E. coli proteomics,23 the reproducibility of retention time was more than 9% RSD. The improvement in the present case might be due to greater stability of the column temperature.

Next, we applied this system to 8 cell lines (5 iPSCs and 3 FBCs). Triplicate analysis was done for each cell line. The results are summarized in Figure 1. In total, 76762 unique peptides and 8179 protein groups were identified from 5 iPSCs, and 60124 unique peptides and 7202 protein groups from 3 FBCs. Taken together, 98977 unique peptides and 9510 protein groups were identified from these 8 cell lines (Tables S1 and S2 in the Supporting Information). The total MS/MS events were 3783765 and the number of MS/MS spectra matched to peptides was 1012918 (27%). For comparison with other large-scale iPSC proteome studies,30,31 proteins were converted to genes to avoid the influence of protein counting methods. In this study, 8712 genes were identified, whereas two other studies reported 8070 (ref 31) and 8250 (ref 30) genes. We also checked the coverage of 518 protein kinase-coding genes. Our study covered 329 kinases, whereas the previous studies found 251 (ref 31) and 291 (ref 30) kinases, supporting the conclusion that the widest proteome coverage was obtained in our study.

To compare the expression profiles of the 8 cell lines, we carried out hierarchical cluster analysis of 24 samples (8 cell lines × triplicate runs). Cell lines were clearly separated into two clusters, iPSC and FBC (Figure 2). Within each cluster, triplicate runs were generally grouped into small clusters, except 414C2 and 585A1. As regards proteins, there were four major clusters. Cluster 1 consisted of proteins expressed in both iPSCs and FBCs, whereas clusters 2 and 3 contained iPSC-specific and FBC-specific proteins, respectively. The residual proteins belonged to cluster 4. The ratio between these clusters (clusters 1:2:3:4) was approximately 40:15:15:30. To evaluate the results more precisely, we employed the quantitative scale called emPAI, which is based on the number of observed peptides divided by the number of observable peptides. The Spearman’s rank correlation coefficients of emPAI values among the 8 cell lines show that each iPSC had high similarity to other iPSCs, but was quite different from FBCs (Figures S2 and S3 in the Supporting Information). Based on these results, we merged the 5 iPSCs and the 3 FBCs into one iPSC and one FBC to calculate the emPAI ratios (iPSC/FBC). The resultant distribution of emPAI ratios is shown in Figure 3. Considering the variation in emPAI values shown in Figure S3, proteins were divided into three groups: proteins with emPAI ratio < 0.2 (group A), 0.2 ≤ emPAI ratio ≤ 5 (group B), and emPAI ratio > 5 (group C). These groups contained 2026 (A), 3460 (B) and 2366 (C) proteins. We confirmed that key pluripotency and stem cell markers, such as OCT4, NANOG, SOX2, N-MYC, POU5F1, DNMT3B, UTF1, PODXL, GRB7 and BRIX, were present in group C in this study.

To extract the protein functions enriched in groups A and C, we employed UniProt keyword analysis using 250 keywords, where the numbers of proteins having a certain keyword were compared to examine which keywords were enriched in groups A and C (Figure S4 in the Supporting Information). We also did UniProt keyword enrichment analysis using DAVID and the top 30 keywords enriched in groups A and C are listed in Table 1. Group A contains “Endoplasmic reticulum”, “Golgi apparatus” as well as “Lysosome” and “Endosome”, whereas group C contains “Nucleus”, indicating cellular localization. Protein function-related keywords such as “Protein transport”, “Calcium”, “Extracellular matrix”, “SH3 domain” and “Signal” were enriched in group A, while “Transcription”, “RNA-binding”, “Chromatin regulator”, “Cell cycle” and “DNA damage” were enriched in group C. Figure 4A shows the abundance of proteins related to transcription regulation. These proteins generally have low abundance, and are clearly more highly expressed in iPSCs. Regarding post-translational modifications, proteins with the “Glycoprotein” keyword are more abundant in FBCs (Figure 4B), whereas glycosyltransferases were not enriched in group A (Figure S4). This is presumably because glycosyltransferases generally show very low abundance and the coverage in this study may not have been sufficient. Proteins with the “Methyltransferase” keyword were more abundant in iPSCs, as shown in Figure 4C, but the enrichment analysis for “Methylation” did not support this result, again presumably owing to insufficient coverage. Large-scale methylated proteome analysis would be necessary to clarify the situation. Regarding the 329 protein kinases found in this study, we examined their abundance distribution between iPSC and FBC. In general, there was no clear tendency, but some kinases, such as cell cycle-related CDK1, CDK11A, CDK11B, as well as DNA damage-related ATM and ATR, were more highly expressed in iPSC, whereas Tyr kinases such as EPHA group, JAK1, PDGFRA, PDGFRB and EGFR were more highly expressed in FBC.

Figure 4. Comparison of emPAI values between iPSCs and FBCs. Proteins related to (A) transcription regulation, (B) glycoprotein, (C) methyltransferase and (D) kinase are shown. Significant differences between the emPAI values in iPSCs and FBCs were assessed based on the Wilcoxon signed-rank test.
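The similarity between cell lines was judged from Spearman's rank correlation coefficients of emPAI values. A minimal sketch of that statistic is given below: values are converted to ranks (ties receive their average rank) and the Pearson correlation of the ranks is returned. The function names and the example emPAI vectors are hypothetical and are not data from this study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical emPAI values for the same four proteins in three cell lines.
ipsc_a = [0.5, 2.0, 9.0, 0.1]
ipsc_b = [0.7, 1.5, 12.0, 0.2]
fbc_a = [4.0, 0.3, 0.1, 6.0]
print(spearman(ipsc_a, ipsc_b), spearman(ipsc_a, fbc_a))
```

Because only ranks enter the calculation, the coefficient is insensitive to the exponential scale of emPAI, which is one reason a rank-based measure is a natural choice here.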
Finally, we compared this study with two other studies on iPSC and FBC in terms of the number of genes coding the identified proteins, the required sample amount and the total measurement time (Figure 5). Although the gene number was almost the same among the three studies, our “one-shot” approach is advantageous in terms of minimizing sample amounts and reducing measurement time. Overall, 11237 genes were identified at the proteome level, corresponding to 52% of the total human genes in neXtProt. Furthermore, we compared this combined data with neXtProt to examine how many genes were newly found at the proteome level (Table 2). In total, the translational products of 1091 genes out of 6244 genes were newly confirmed. From the 1562 genes identified uniquely in this study, we extracted 24 genes that had the UniProt keyword “Transcription regulation” and were more than 5-fold enriched in iPSCs, and compared them with these 1091 genes. As a result, five genes (ARID3C, MSL3P1, POU2F2/POU2F3, YY2 and ZIC1) overlapped. We consider that these gene products are potential key molecules for pluripotency, since the other 19 genes include known stem marker proteins, such as SOX1/SOX21, POU3F1/POU3F2/POU3F3/POU5F1B and ZFP42, in neXtProt.

In conclusion, we have developed a one-shot nanoLC−MS/MS system using meter-scale monolithic silica-C18 capillary columns without prefractionation, and applied it to in-depth analysis of the iPSC proteome. Since this one-shot approach is rapid, simple, sensitive and reproducible, it should be particularly useful for samples available in only limited amounts, such as FACS-derived, laser capture microdissection-derived or biopsy-derived samples. Further development is ongoing.

Figure 5. Comparison of this study with two other studies on iPSC and FBC. Genes coding identified proteins, required sample amount and total measurement time are shown.

Table 2. Comparison of the Genes Identified in 3 Combined iPSC Studies with neXtProt

chromosome   gene number   gene number in    coverage   missing proteins(a)   missing proteins(a)
no.          in neXtProt   3 iPSC studies    (%)        in neXtProt           in 3 iPSC studies
1            2038          1053              51.67      636                   94
2            1223          691               56.50      328                   74
3            1060          569               53.68      295                   42
4            759           374               49.28      203                   43
5            856           488               57.01      250                   67
6            1770          973               54.97      467                   89
7            920           489               53.15      315                   68
8            703           356               50.64      222                   37
9            803           404               50.31      268                   55
10           753           408               54.18      228                   51
11           1295          590               45.56      484                   40
12           1020          561               55.00      249                   47
13           323           184               56.97      88                    18
14           622           328               52.73      174                   30
15           588           329               55.95      197                   31
16           815           469               57.55      246                   48
17           1151          609               52.91      308                   44
18           272           156               57.35      70                    12
19           1394          640               45.91      521                   103
20           548           283               51.64      161                   26
21           251           92                36.65      102                   6
22           447           244               54.59      134                   22
X            863           390               45.19      298                   44
Y            58            17                29.31
Total        20532         10697             52.10      6244                  1091

(a) Proteins without evidence at the protein level.

ASSOCIATED CONTENT

Supporting Information

Supplementary figures and tables. This material is available free of charge via the Internet at http://pubs.acs.org. The MS/MS data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (http://www.ebi.ac.uk/pride/) with the data set identifier PXD000071.


elution using columns packed with 1.0-micron particles. Anal. Chem. 1999, 71 (3), 700−8. (12) Shen, Y.; Zhang, R.; Moore, R. J.; Kim, J.; Metz, T. O.; Hixson, K. K.; Zhao, R.; Livesay, E. A.; Udseth, H. R.; Smith, R. D. Automated 20 kpsi RPLC-MS and MS/MS with chromatographic peak capacities of 1000−1500 and capabilities in proteomics and metabolomics. Anal. Chem. 2005, 77 (10), 3090−100. (13) Shen, Y.; Moore, R. J.; Zhao, R.; Blonder, J.; Auberry, D. L.; Masselon, C.; Pasa-Tolić, L.; Hixson, K. K.; Auberry, K. J.; Smith, R. D. High-efficiency on-line solid-phase extraction coupling to 15−150microm-i.d. column liquid chromatography for proteomic analysis. Anal. Chem. 2003, 75 (14), 3596−3605. (14) Thakur, S. S.; Geiger, T.; Chatterjee, B.; Bandilla, P.; Fröhlich, F.; Cox, J.; Mann, M. Deep and highly sensitive proteome coverage by LCMS/MS without prefractionation. Mol. Cell. Proteomics 2011, 10 (8), M110.003699. (15) Köcher, T.; Swart, R.; Mechtler, K. Ultra-high-pressure RPLC hyphenated to an LTQ-Orbitrap Velos reveals a linear relation between peak capacity and number of identified peptides. Anal. Chem. 2011, 83 (7), 2699−704. (16) Luo, Q.; Shen, Y.; Hixson, K. K.; Zhao, R.; Yang, F.; Moore, R. J.; Mottaz, H. M.; Smith, R. D. Preparation of 20-microm-i.d. silica-based monolithic columns and their performance for proteomics analyses. Anal. Chem. 2005, 77 (15), 5028−35. (17) Miyamoto, K.; Hara, T.; Kobayashi, H.; Morisaka, H.; Tokuda, D.; Horie, K.; Koduki, K.; Makino, S.; Núñez, O.; Yang, C.; Kawabe, T.; Ikegami, T.; Takubo, H.; Ishihama, Y.; Tanaka, N. High-efficiency liquid chromatographic separation utilizing long monolithic silica capillary columns. Anal. Chem. 2008, 80 (22), 8741−50. (18) Ishizuka, N.; Kobayashi, H.; Minakuchi, H.; Nakanishi, K.; Hirao, K.; Hosoya, K.; Ikegami, T.; Tanaka, N. Monolithic silica columns for high-efficiency separations by high-performance liquid chromatography. J. Chromatogr., A 2002, 960 (1−2), 85−96. (19) Leinweber, F. 
C.; Tallarek, U. Chromatographic performance of monolithic and particulate stationary phases. Hydrodynamics and adsorption capacity. J. Chromatogr., A 2003, 1006 (1−2), 207−28. (20) Ikegami, T.; Dicks, E.; Kobayashi, H.; Morisaka, H.; Tokuda, D.; Cabrera, K.; Hosoya, K.; Tanaka, N. How to utilize the true performance of monolithic silica columns. J. Sep. Sci. 2004, 27 (15−16), 1292−302. (21) Guiochon, G. Monolithic columns in high-performance liquid chromatography. J. Chromatogr., A 2007, 1168 (1−2), 101−68 discussion 100. (22) Vaast, A.; Broeckhoven, K.; Dolman, S.; Desmet, G.; Eeltink, S. Comparison of the gradient kinetic performance of silica monolithic capillary columns with columns packed with 3 μm porous and 2.7 μm fused-core silica particles. J. Chromatogr., A 2012, 1228, 270−5. (23) Iwasaki, M.; Miwa, S.; Ikegami, T.; Tomita, M.; Tanaka, N.; Ishihama, Y. One-dimensional capillary liquid chromatographic separation coupled with tandem mass spectrometry unveils the Escherichia coli proteome on a microarray scale. Anal. Chem. 2010, 82 (7), 2616−20. (24) Iwasaki, M.; Sugiyama, N.; Tanaka, N.; Ishihama, Y. Human proteome analysis by using reversed phase monolithic silica capillary columns with enhanced sensitivity. J. Chromatogr., A 2012, 1228, 292−7. (25) Takahashi, K.; Tanabe, K.; Ohnuki, M.; Narita, M.; Ichisaka, T.; Tomoda, K.; Yamanaka, S. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 2007, 131 (5), 861−72. (26) Yu, J.; Vodyanik, M. A.; Smuga-Otto, K.; Antosiewicz-Bourget, J.; Frane, J. L.; Tian, S.; Nie, J.; Jonsdottir, G. A.; Ruotti, V.; Stewart, R.; Slukvin, I. I.; Thomson, J. A. Induced pluripotent stem cell lines derived from human somatic cells. Science 2007, 318 (5858), 1917−20. 
(27) Egawa, N.; Kitaoka, S.; Tsukita, K.; Naitoh, M.; Takahashi, K.; Yamamoto, T.; Adachi, F.; Kondo, T.; Okita, K.; Asaka, I.; Aoi, T.; Watanabe, A.; Yamada, Y.; Morizane, A.; Takahashi, J.; Ayaki, T.; Ito, H.; Yoshikawa, K.; Yamawaki, S.; Suzuki, S.; Watanabe, D.; Hioki, H.; Kaneko, T.; Makioka, K.; Okamoto, K.; Takuma, H.; Tamaoka, A.; Hasegawa, K.; Nonaka, T.; Hasegawa, M.; Kawata, A.; Yoshida, M.; Nakahata, T.; Takahashi, R.; Marchetto, M. C.; Gage, F. H.; Yamanaka,

AUTHOR INFORMATION

Corresponding Author

*Phone: +81-75-753-4555. Fax: +81-75-753-4601. E-mail: [email protected]. Author Contributions ‡

These authors contributed equally to this work.

Notes

The authors declare no competing financial interest. S.Y. is a member without salary of the scientific advisory boards of iPierian, iPS Academia Japan, Megakaryon Corporation, and Retina Institute Japan.



ACKNOWLEDGMENTS We thank other members of our laboratories for helpful discussion. We also appreciate the PRIDE team to deposit our MS/MS data. M.I. is supported by a fellowship for young scientists from the Japan Society for the Promotion of Science (JSPS). This work was supported by grants from JSPS Grants-inAid for Scientific Research No. 24116513 and 24241062 (Y.I.).



ABBREVIATIONS FDR, false-discovery rate; iPSC, induced pluripotent stem cell; FBC, fibroblast cell; ESC, embryonic stem cell; nanoLC−MS/ MS, nanoscale liquid chromatography−tandem mass spectrometry; SDC, sodium deoxycholate; SLS, sodium N-lauroylsarcosinate; PTS, phase transfer surfactant; emPAI, exponentially modified protein abundance index



REFERENCES

(1) Ahrens, C. H.; Brunner, E.; Qeli, E.; Basler, K.; Aebersold, R. Generating and navigating proteome maps using mass spectrometry. Nat. Rev. Mol. Cell Biol. 2010, 11 (11), 789−801.
(2) Cox, J.; Mann, M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 2011, 80, 273−99.
(3) Cargile, B. J.; Sevinsky, J. R.; Essader, A. S.; Stephenson, J. L.; Bundy, J. L. Immobilized pH gradient isoelectric focusing as a first-dimension separation in shotgun proteomics. J. Biomol. Tech. 2005, 16 (3), 181−9.
(4) Gilar, M.; Olivova, P.; Daly, A. E.; Gebler, J. C. Orthogonality of separation in two-dimensional liquid chromatography. Anal. Chem. 2005, 77 (19), 6426−34.
(5) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R. Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 1999, 17 (7), 676−82.
(6) Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Pääbo, S.; Mann, M. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 2011, 7, 548.
(7) Beck, M.; Schmidt, A.; Malmstroem, J.; Claassen, M.; Ori, A.; Szymborska, A.; Herzog, F.; Rinner, O.; Ellenberg, J.; Aebersold, R. The quantitative proteome of a human cell line. Mol. Syst. Biol. 2011, 7, 549.
(8) Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol. Cell. Proteomics 2012, 11 (3), M111.014050.
(9) Eeltink, S.; Dolman, S.; Swart, R.; Ursem, M.; Schoenmakers, P. J. Optimizing the peak capacity per unit time in one-dimensional and offline two-dimensional liquid chromatography for the separation of complex peptide samples. J. Chromatogr., A 2009, 1216 (44), 7368−74.
(10) Jerkovich, A. D.; Mellors, J. S.; Jorgenson, J. W. The Use of micrometer-sized particles in ultrahigh pressure liquid chromatography. LC-GC Eur. 2003, 16, 20−23.
(11) MacNair, J. E.; Patel, K. D.; Jorgenson, J. W. Ultrahigh-pressure reversed-phase capillary liquid chromatography: isocratic and gradient


S.; Inoue, H. Drug Screening for ALS Using Patient-Specific Induced Pluripotent Stem Cells. Sci. Transl. Med. 2012, 4 (145), 145ra104.
(28) Grskovic, M.; Javaherian, A.; Strulovici, B.; Daley, G. Q. Induced pluripotent stem cells–opportunities for disease modelling and drug discovery. Nat. Rev. Drug Discovery 2011, 10 (12), 915−29.
(29) Plath, K.; Lowry, W. E. Progress in understanding reprogramming to the induced pluripotent state. Nat. Rev. Genet. 2011, 12 (4), 253−65.
(30) Phanstiel, D. H.; Brumbaugh, J.; Wenger, C. D.; Tian, S.; Probasco, M. D.; Bailey, D. J.; Swaney, D. L.; Tervo, M. A.; Bolin, J. M.; Ruotti, V.; Stewart, R.; Thomson, J. A.; Coon, J. J. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat. Methods 2011, 8 (10), 821−7.
(31) Munoz, J.; Low, T. Y.; Kok, Y. J.; Chin, A.; Frese, C. K.; Ding, V.; Choo, A.; Heck, A. J. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol. Syst. Biol. 2011, 7, 550.
(32) Okita, K.; Matsumura, Y.; Sato, Y.; Okada, A.; Morizane, A.; Okamoto, S.; Hong, H.; Nakagawa, M.; Tanabe, K.; Tezuka, K.; Shibata, T.; Kunisada, T.; Takahashi, M.; Takahashi, J.; Saji, H.; Yamanaka, S. A more efficient method to generate integration-free human iPS cells. Nat. Methods 2011, 8 (5), 409−12.
(33) Nakagawa, M.; Takizawa, N.; Narita, M.; Ichisaka, T.; Yamanaka, S. Promotion of direct reprogramming by transformation-deficient Myc. Proc. Natl. Acad. Sci. U.S.A. 2010, 107 (32), 14152−7.
(34) Masuda, T.; Sugiyama, N.; Tomita, M.; Ishihama, Y. Microscale phosphoproteome analysis of 10,000 cells from human cancer cell lines. Anal. Chem. 2011, 83 (20), 7698−703.
(35) Rappsilber, J.; Mann, M.; Ishihama, Y. Protocol for micropurification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2007, 2 (8), 1896−906.
(36) Helmy, M.; Sugiyama, N.; Tomita, M.; Ishihama, Y. Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. Genes Cells 2012, 17 (8), 633−44.
(37) Nesvizhskii, A. I.; Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell. Proteomics 2005, 4 (10), 1419−40.
(38) Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 2005, 4 (9), 1265−72.
(39) Shinoda, K.; Tomita, M.; Ishihama, Y. emPAI Calc–for the estimation of protein abundance from large-scale identification data by liquid chromatography-tandem mass spectrometry. Bioinformatics 2010, 26 (4), 576−7.
(40) Eisen, M. B.; Spellman, P. T.; Brown, P. O.; Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95 (25), 14863−8.
(41) Huang, d. W.; Sherman, B. T.; Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4 (1), 44−57.
