Proteogenomic Analyses of Cellular Lysates Using a Phenol


Proteogenomic analyses of cellular lysates using a phenol-guanidinium thiocyanate reagent. Yusuke Kawashima, Jun Miyata, Takashi Watanabe, Juri Shioya, Makoto Arita, and Osamu Ohara. J. Proteome Res., Just Accepted Manuscript. DOI: 10.1021/acs.jproteome.8b00609. Publication Date (Web): 05 Nov 2018.


Journal of Proteome Research

Proteogenomic analyses of cellular lysates using a phenol-guanidinium thiocyanate reagent

Yusuke Kawashima1,2, Jun Miyata3, Takashi Watanabe1, Juri Shioya1, Makoto Arita3,4,5, Osamu Ohara1,2,*

1 Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa 230-0045, Japan.

2 Department of Genome Research and Development, Kazusa DNA Research Institute, Kisarazu, Chiba 292-0818, Japan.

3 Laboratory for Metabolomics, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa 230-0045, Japan.

4 Graduate School of Medical Life Science, Yokohama City University, Yokohama, Kanagawa 230-0045, Japan.

5 Division of Physiological Chemistry and Metabolism, Keio University Faculty of Pharmacy, Tokyo 105-8512, Japan.

* Corresponding author: Osamu Ohara, RIKEN Center for Integrative Medical Sciences (IMS), 1-722 Suehiro-cho Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan. Tel: +81-45-503-9696, Fax: +81-45-503-9694, Email: [email protected]


Abstract

Protein and RNA profiles are highly informative when building a comprehensive understanding of the dynamics of complex biological systems. Quantitative correlations between protein and RNA profiles are not always high, yet simultaneous acquisition of both profiles remains challenging, in part due to the limited availability of samples and/or the inconvenience of separately preparing protein and RNA fractions. In a previous study, protein, DNA, and RNA fractions were simultaneously prepared from the same sample using phenol-guanidinium isothiocyanate reagent (P/GTC), although the performance of P/GTC extracts in proteogenomic analyses remains poorly understood. We therefore evaluated the performance of the P/GTC extraction method in proteogenomic analyses using standard HEK293-F cells and human peripheral blood neutrophils. The latter cell type is known for its extreme vulnerability to protein/RNA degradation, reflecting high protease and RNase activities. Our data indicate that, when RNA and protein are prepared simultaneously, the P/GTC extraction method provides superior protein profiles from neutrophil and HEK293-F cell samples as compared with conventional protein extraction methods. The P/GTC extraction method therefore provides a powerful and robust tool for a broad range of proteogenomic studies.

Keywords: P/GTC, proteogenomics, trans-omics, multi-omics, neutrophils


Introduction

The evolution of mass spectrometry and the development of next-generation sequencing technology have made it possible to perform large-scale omics analyses and to profile DNA, RNA, proteins, and metabolites.(1-4) Omics analyses have led to the identification of many biomarkers for complicated biological phenomena and to the clarification of disease mechanisms.(5-8) The integration of various omics analyses has recently attracted much attention as an approach for understanding complicated biological phenomena and disease mechanisms more deeply.(9, 10) However, integrated omics analyses are sometimes hampered by the low availability of high-quality samples, because DNA, RNA, proteins, and metabolites are generally isolated separately. This is particularly problematic in clinical applications of this approach. To address this concern, we are investigating the use of a phenol-guanidinium isothiocyanate (P/GTC) reagent, which allows the extraction of DNA, RNA, and proteins from the same lysate of a single sample.(11, 12) P/GTC reagent is widely used for RNA extraction, and it was previously demonstrated that DNA and proteins can be sequentially extracted following RNA extraction (Fig. 1). However, the utility of P/GTC reagent in proteogenomic analysis remains to be fully evaluated, especially with regard to whether proteins extracted using P/GTC are suitable for proteome analysis. In this study, we conducted a side-by-side comparison of P/GTC-based protein extraction and conventional protein extraction from cultured cells or human peripheral blood leukocytes.


Materials and methods

Reagents

Mass spectrometry grade water, acetonitrile (ACN), 0.1% formic acid (FA) in water, 0.1% FA in ACN, and TRIzol reagent (P/GTC reagent) were purchased from Thermo Fisher Scientific (Waltham, MA). Sequencing grade modified trypsin was supplied by Promega (Madison, WI). All other reagents were purchased from FUJIFILM Wako Pure Chemical Corporation (Osaka, Japan).

RNA, DNA, and Protein Extraction from Cells

FreeStyle™ HEK293-F cells (Thermo Fisher Scientific) were cultured in a shaking incubator at 37 °C with 8% CO2 in serum-free FreeStyle™ 293 Expression medium (Thermo Fisher Scientific) to a cell density of 2 × 10^6 cells/mL, washed with cold PBS, and pelleted. We used a CD16-positive immunomagnetic selection method to isolate neutrophils from the peripheral blood of healthy volunteers, as previously reported.(13) Briefly, granulocytes were isolated from heparin-anticoagulated peripheral blood and separated from peripheral blood mononuclear cells by dextran sedimentation and Percoll centrifugation, followed by hypotonic lysis of the erythrocytes. Neutrophils were purified by magnetic selection using anti-CD16 antibody-coupled magnetic beads (Miltenyi Biotec, Bergisch Gladbach, Germany) and an autoMACS Pro Separator (Miltenyi Biotec). Cell purity and viability were assessed by Diff-Quick staining and trypan blue exclusion, respectively. The purity of neutrophils collected in the positive fraction was >99%. Purified neutrophils were washed twice with PBS, then subjected to the following protocol. This study was approved by the Institutional Review Boards of the RIKEN Center for Integrative Medical Sciences. All volunteers provided written informed consent to participate.

Each cell pellet was prepared in two ways. In the first set (conventional direct extraction method), the cell pellet was stored at −80 °C until use. Proteins in the frozen cell pellets were extracted in 50 volumes (relative to the cell pellet) of phase-transfer surfactant (PTS) buffer (12 mM sodium deoxycholate, 12 mM sodium N-lauroyl sarcosinate, 100 mM Tris-HCl (pH 9.0), and protease inhibitors) or SDS buffer (2% SDS, 100 mM Tris-HCl (pH 9.0), and protease inhibitors) using a water bath-type sonicator (Bioruptor UCD-200; SONIC BIO Co., Kanagawa, Japan) with crushed ice for 15 min in 2 min on/1 min off cycles at the ‘high’ setting, followed by centrifugation at 15,000 × g for 15 min at 4 °C to remove the insoluble material.

In the second set (P/GTC extraction method), the cell pellet was mixed with 0.5 mL of P/GTC reagent, then stored at −80 °C until use. RNA and DNA were isolated from the cell samples using P/GTC reagent according to the manufacturer’s protocol. In brief, the frozen cells mixed with P/GTC reagent were thawed at room temperature and separated into a clear upper aqueous phase (containing the RNA), an interphase, and a lower organic phase (containing the DNA and proteins) by the addition of 0.1 mL of chloroform. RNA was precipitated from the aqueous phase with 0.25 mL of isopropanol, and DNA was precipitated from the interphase and organic phase with 0.15 mL of ethanol. The precipitated RNA and DNA were washed, then re-dissolved for use in downstream processing (DNA was not analyzed in this study). Proteins were precipitated from the phenol/ethanol phase by adding 0.8 mL of acetone; the samples were incubated at room temperature for 10 min, then centrifuged at 15,000 × g for 15 min at 4 °C. The protein pellet was washed twice with 0.8 mL of ACN and then re-dissolved in 50 volumes (relative to the cell pellet) of PTS buffer or SDS buffer using a water bath-type sonicator for 15 min in 1 min on/1 min off cycles at the ‘high’ setting, followed by centrifugation at 15,000 × g for 15 min at 4 °C to remove the insoluble material.

SDS-PAGE Analysis

The protein extract was analyzed using SDS-PAGE (Perfect NT Gel NE M, 7.5-15% acrylamide, 16 wells; DRC Co., Ltd., Tokyo, Japan) according to the manufacturer’s protocol. The gel was stained with Coomassie brilliant blue (SimplyBlue SafeStain; Thermo Fisher Scientific) according to the manufacturer’s protocol. The molecular weight marker used was Mark12 (Thermo Fisher Scientific).

Protein Digestion

The protein extract was treated with 10 mM dithiothreitol at 50 °C for 30 min, then subjected to alkylation with 30 mM iodoacetamide in the dark at room temperature for 30 min. The mixture was diluted 4-fold with 50 mM ammonium bicarbonate and digested using Lys-C and trypsin overnight at 37 °C. An equal volume of ethyl acetate was added to the digested samples, and the mixtures were acidified with 0.5% trifluoroacetic acid (final concentration) according to the PTS protocols.(14, 15) Each mixture was shaken for 1 min and centrifuged at 12,000 × g for 2 min for phase separation, then the aqueous phase was retrieved. The volume of the recovered digest was reduced to half or less of the original volume using a centrifugal evaporator to remove the ethyl acetate completely. The sample was then desalted with C18-StageTips and dried using a centrifugal evaporator.(16) The dried peptides were finally dissolved in 3% ACN and 0.1% FA.

Fractionation of peptides by RP-HPLC

The digested peptides were injected onto an HPLC system (Shimadzu, Kyoto, Japan) equipped with a 3.0 × 150 mm C18 RP column (Presto FF-C18; Imtakt Corp., Kyoto, Japan) and separated with a 100-min gradient (0 min, 2% B; 5 min, 2% B; 81 min, 40% B; 82 min, 95% B; 90 min, 95% B; A = 0.1% TFA in water, B = 0.1% TFA in 90% ACN) at a flow rate of 180 µL/min and a column temperature of 35 °C. The eluted peptides were collected in 1-min fractions from 10 to 82 min (72 fractions) and then combined into 8 fractions by the cyclic sample pooling method as described previously.(17)
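The pooling of 72 one-minute fractions into 8 samples can be illustrated with a short sketch. It assumes that "cyclic sample pooling" means round-robin assignment (fraction 1 to pool 1, fraction 2 to pool 2, ..., fraction 9 back to pool 1); the exact scheme is defined in reference 17 and may differ. The function names `cyclic_pool` and `fraction_window` are hypothetical and used only for illustration.

```python
def cyclic_pool(n_fractions=72, n_pools=8):
    """Assign fractions 1..n_fractions to n_pools in round-robin order.

    Assumption: round-robin assignment; the published scheme (ref. 17)
    is not reproduced here.
    """
    pools = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


def fraction_window(fraction, start_min=10, width_min=1):
    """Return the (start, end) collection time in minutes for a 1-based fraction."""
    begin = start_min + (fraction - 1) * width_min
    return begin, begin + width_min


pools = cyclic_pool()
for index, members in enumerate(pools, start=1):
    print(f"pool {index}: fractions {members}")
```

Under this assumption each pool combines 9 fractions spaced 8 min apart, so every pooled sample spans the full RP-HPLC gradient rather than a single contiguous elution window.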

LC-MS/MS and Data Analysis

Peptides were directly injected onto a 75 µm × 15 cm PicoFrit emitter (New Objective, packed to 15 cm with Hypersil GOLD C18 1.9 µm, 175 Å material) and separated at a flow rate of 300 nL/min using an Eksigent ekspert nanoLC 400 HPLC system (SCIEX, Framingham, MA). In assessments of P/GTC-extracted proteins for proteomics analyses, peptides were separated using a 120-min gradient of buffers A (0.1% FA in water) and B (0.1% FA in 90% ACN) comprising 3% B from 0 min, 7.5% B from 4 min, 36% B from 104 min, 95% B from 108 min, and 95% B from 120 min. In transcriptome and proteome analyses of HEK293-F cells and neutrophils, peptides were separated using a 180-min gradient of buffers A and B comprising 3.9% B from 0 min, 7% B from 4 min, 30% B from 134 min, 46% B from 168 min, 95% B from 169 min, and 95% B from 180 min.

Peptides eluting from the column were analyzed on a TripleTOF 5600+ (SCIEX) mass spectrometer. MS1 spectra were collected in the range 400-1200 m/z for 250 ms. The top 25 precursor ions with charge states of 2+ to 5+ that exceeded 150 counts/s were selected for fragmentation with rolling collision energy, and MS2 spectra were collected in the range 100-1500 m/z for 100 ms. The dynamic exclusion time was set to 24 s. A spray voltage of 2300 V was applied.

All MS/MS files were searched against the reviewed UniProtKB/Swiss-Prot database (March 2016 release) containing 20,198 sequences of Homo sapiens proteins using ProteinPilot software v. 4.5 with the Paragon algorithm (SCIEX) for protein identification.(18) The search parameters were as follows: cysteine alkylation by iodoacetamide, trypsin digestion, and TripleTOF 5600+. The protein confidence threshold was a ProteinPilot unused score of 1.3 with at least one peptide with 95% confidence. The global false discovery rate for both peptides and proteins was lower than 1% in this study. Protein levels were estimated by calculating the distributed normalized spectral abundance factors (dNSAFs) for each protein as described previously.(19) Cellular localizations of identified proteins were characterized by cellular component gene ontology (GO) analysis (http://geneontology.org).
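To make the dNSAF quantity concrete, the following is a minimal sketch of one common formulation: spectral counts from peptides shared between proteins are distributed in proportion to each protein's unique spectral counts, the result is divided by protein length, and the values are normalized to sum to 1. This is an illustration under stated assumptions, not the authors' pipeline; the function names, the equal-split fallback for shared peptides with no unique evidence, and the toy numbers are all hypothetical. The helper `log2_scaled` mirrors the log2(dNSAF × 10^7) scale used in the Results.

```python
import math
from collections import defaultdict


def dnsaf(unique_counts, shared_peptides, lengths):
    """Compute distributed NSAF values for a set of proteins.

    unique_counts:   {protein: spectral count from peptides unique to it}
    shared_peptides: list of (spectral_count, [proteins sharing the peptide])
    lengths:         {protein: length in residues}
    """
    distributed = defaultdict(float)
    for count, proteins in shared_peptides:
        total_unique = sum(unique_counts.get(p, 0) for p in proteins)
        for p in proteins:
            if total_unique > 0:
                weight = unique_counts.get(p, 0) / total_unique
            else:
                # Assumption: split equally when no protein has unique evidence.
                weight = 1 / len(proteins)
            distributed[p] += weight * count

    dsaf = {
        p: (unique_counts.get(p, 0) + distributed[p]) / lengths[p]
        for p in lengths
    }
    total = sum(dsaf.values())
    if total == 0:
        raise ValueError("no spectral counts were supplied")
    return {p: value / total for p, value in dsaf.items()}


def log2_scaled(value, scale=1e7):
    """Return log2(dNSAF x 10^7), the scale used for the abundance comparison."""
    return math.log2(value * scale)


# Toy data (hypothetical): one peptide with 8 spectra is shared by A and B.
values = dnsaf(
    unique_counts={"A": 30, "B": 10, "C": 5},
    shared_peptides=[(8, ["A", "B"])],
    lengths={"A": 300, "B": 200, "C": 100},
)
for protein, value in sorted(values.items()):
    print(protein, round(value, 4), round(log2_scaled(value), 2))
```

In the toy data, the 8 shared spectra are split 6:2 between A and B because A has three times as many unique spectra as B.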

RNA-Seq analysis

Total RNA was extracted from HEK293-F cells and human neutrophils using P/GTC. RNA-Seq libraries were prepared using the SureSelect Strand-Specific RNA Library Preparation Kit (Agilent Technologies, Little Falls, CA) according to the manufacturer’s protocol and were sequenced using a HiSeq1500 sequencer (Illumina, San Diego, CA). The sequence reads were mapped to the human genome (NCBI version 19) using TopHat2 version 2.0.8 and Bowtie2 version 2.1.0 with default parameters, with gene annotation provided by NCBI. The transcript abundances were estimated using Cufflinks (version 2.1.1). Cufflinks was run with the same reference annotation as TopHat2 to generate FPKM (fragments per kilobase per million mapped reads) values for known gene models. We used an FPKM cutoff of 1 to define an expressed gene.
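The FPKM unit and the expression cutoff can be illustrated with the textbook definition. This sketch is not what Cufflinks does internally (Cufflinks estimates abundances with a statistical model over transcript isoforms); it only shows the normalization the unit implies. The function names and example numbers are hypothetical, and treating the cutoff as inclusive (FPKM ≥ 1) is an assumption, since the text does not say whether the boundary is included.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def is_expressed(fpkm_value, cutoff=1.0):
    """Apply the expression cutoff; inclusive comparison is an assumption."""
    return fpkm_value >= cutoff


# Hypothetical gene: 500 fragments on a 2,000-bp model in a 25-million-fragment library.
example = fpkm(fragments=500, length_bp=2000, total_mapped_fragments=25_000_000)
print(example, is_expressed(example))
```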

Results and Discussion

Assessment of P/GTC-extracted proteins for proteomics analyses

We first used SDS-PAGE to compare the yield and quality of proteins extracted from HEK293-F cells or human peripheral blood neutrophils by the two methods, namely P/GTC-based extraction and conventional direct extraction (Fig. 2). The total density of protein bands extracted from HEK293-F cells by the two methods was almost the same, indicating that the recovery of the P/GTC extraction method is as efficient as that of the conventional direct extraction method for HEK293-F cells. In contrast, proteins extracted from neutrophils using the P/GTC method gave higher-quality bands than those obtained by the direct extraction method. Although protease inhibitor cocktails were added to the protein extraction buffer and the proteins were carefully extracted on ice, a dense band was observed in the low molecular weight region (< 6 kDa) of the direct-extracted samples, indicative of degraded proteins. In addition, neutrophil proteins were extracted by the direct and P/GTC extraction methods using SDS, a strong surfactant that suppresses protease activities, but we found almost no differences from the experiments performed using PTS buffer (Fig. S1). These results indicate that the P/GTC method is superior to the direct extraction method in terms of protein integrity. The reproducibility of the direct and P/GTC-based extraction methods was confirmed by comparing protein profiles from triplicate samples.

Multiple factors are likely responsible for the differences in protein integrity between direct and P/GTC preparations. Among these, strong protein denaturants, such as phenol and guanidine isothiocyanate, can rapidly denature endogenous proteases. Removal of nucleic acids and other small molecules may also stabilize proteins in these extracts. Alternatively, when using the P/GTC method, neutrophils are lysed immediately after isolation to ensure the production of high-integrity RNA, which cannot otherwise be obtained from neutrophils. These experimental conditions likely influenced the quality of the protein extracts, and the P/GTC method produced high-quality protein and RNA even from neutrophils. We did not, however, pursue the causes of the differing protein integrity further.


Figure 3 shows the results of LC-MS/MS-based proteome analysis of the proteins extracted by either the conventional direct or the P/GTC-based extraction method using PTS buffer. The numbers of peptides and proteins identified from HEK293-F cells were similar between the two methods, and the relative protein expression levels estimated from distributed normalized spectral abundance factors (dNSAFs) were not significantly different. In contrast, proteome analysis of neutrophil extracts obtained by the P/GTC extraction method identified 1.5 times as many peptides and twice as many proteins as the direct method. The relative protein expression levels in neutrophil extracts were not significantly different in the high-expression region (log2(dNSAF × 10^7) > 14). However, the relative protein expression levels in extracts obtained by the P/GTC extraction method were higher than those obtained by the direct method, especially in the low-expression region (log2(dNSAF × 10^7) < 14). Thus, the P/GTC extraction method clearly supports better proteome analyses of neutrophils than the direct method. Consistent with the SDS-PAGE results, these LC-MS/MS results likely reflect the rapid inactivation of endogenous proteases and the sample cleanup inherent in the P/GTC extraction method. The GO annotation analysis in Figure 4 shows the distribution of cellular components of the identified proteins. The cellular components of directly extracted proteins and P/GTC-extracted proteins had similar distributions in HEK293-F cells and neutrophils, indicating that the P/GTC extraction method does not bias the cellular components of detected proteins. These results also indicate that proteome analyses of samples following P/GTC extraction provide complete proteome coverage without the bias of the direct method.


Transcriptome and proteome analyses of HEK293-F cells and neutrophils

We next compared the RNA and protein expression profiles of the same sample following P/GTC extraction as a proteogenomic analysis. RNAs and proteins were sequentially extracted from HEK293-F cells and human peripheral blood neutrophils by the P/GTC-based extraction method as described above. The mRNA expression profiles of the extracted RNAs were analyzed by RNA-Seq, and the protein expression profiles of the extracted proteins were analyzed by 2D-LC-MS/MS (Table S1). The detected mRNAs and proteins were first compared by Venn diagram analysis (Fig. 5A). We observed mRNAs for 12,144 protein-coding genes and proteins encoded by 7,901 protein-coding genes in HEK293-F cells. Corresponding mRNAs were observed for most of the observed proteins, and only 1% (61/7901 genes) of proteins lacked corresponding mRNAs. In contrast, in neutrophils we observed mRNAs for 7,669 protein-coding genes and proteins encoded by 4,781 protein-coding genes, and the neutrophil samples lacked mRNAs corresponding to 17% (824/4781 genes) of the identified proteins.

Although we conducted deep proteome analysis by 2D-LC-MS/MS, its coverage was inferior to that of RNA-Seq: the number of mRNAs observed was more than 1.5 times the number of proteins observed in both HEK293-F cell and neutrophil extracts. NGS-assisted RNA profiling generally has superior sensitivity to that of LC-MS/MS-based protein profiling, because after reverse transcription from RNA, the resulting cDNA can be amplified using PCR. In contrast, proteins cannot currently be amplified, and the sensitivity of LC-MS/MS-based protein profiling is essentially that of LC-MS/MS systems, which remain sensitive only at the zeptomole to attomole level. Because the lower sensitivity of the 2D-LC-MS/MS approach results in fewer identified gene products than NGS-assisted RNA profiling, proteome analyses are not as comprehensive as transcriptome analyses, which limits the comprehensiveness of proteogenomic analyses. Protein profiles from proteome analyses are, however, more functionally relevant to biological systems than RNA profiles. Thus, in practice, we consider the combination of proteome and transcriptome analyses the best current approach.

It was nonetheless surprising that substantial numbers of proteins lacked the corresponding mRNAs in the neutrophil proteome analysis. We examined the overlap between the top 1000 high-expression proteins and the total mRNA in each sample (Fig. 5B). The number of genes observed for proteins lacking the corresponding mRNA was 1 in HEK293-F cells and 111 in neutrophils, showing that neutrophils contain many proteins lacking the corresponding mRNAs even among highly expressed proteins. The gene lacking the corresponding mRNA in HEK293-F cells was identified as HIST1H4A, and the mRNA transcribed from this gene is unusual in that it lacks a poly(A) tail. Consequently, the mRNA of HIST1H4A was likely not detected in our study because mRNA was isolated from the total RNA using oligo(dT) beads. As representative genes, ELANE, AZU1, CTSG, and PRTN3 lacked mRNA transcripts in neutrophils, and qPCR analyses confirmed the very low expression of these mRNAs in neutrophils (Fig. S2).
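The gene-level overlap behind the Venn diagram comparison (Fig. 5A) amounts to a set difference, and the percentages quoted above follow directly from the reported counts. The sketch below shows both; the function name `protein_only_genes` is hypothetical, and the toy gene sets are illustrative only and are not taken from Table S1.

```python
def protein_only_genes(protein_genes, mrna_genes):
    """Return genes detected as protein but not as mRNA, and their share of protein genes."""
    protein_genes = set(protein_genes)
    protein_only = protein_genes - set(mrna_genes)
    return protein_only, len(protein_only) / len(protein_genes)


# Toy gene sets (illustrative only).
toy_proteins = {"ELANE", "AZU1", "ACTB", "GAPDH"}
toy_mrnas = {"ACTB", "GAPDH", "TP53"}
missing, share = protein_only_genes(toy_proteins, toy_mrnas)
print(sorted(missing), share)

# Counts as reported in the text for each sample type.
reported = {
    "HEK293-F": {"mrna": 12144, "protein": 7901, "protein_only": 61},
    "neutrophil": {"mrna": 7669, "protein": 4781, "protein_only": 824},
}
for sample, counts in reported.items():
    percent = 100 * counts["protein_only"] / counts["protein"]
    ratio = counts["mrna"] / counts["protein"]
    print(f"{sample}: {percent:.1f}% of proteins lack mRNA; mRNA/protein ratio {ratio:.2f}")
```

The reported counts give about 0.8% (rounded to 1% in the text) for HEK293-F cells and about 17% for neutrophils, with mRNA-to-protein gene ratios above 1.5 in both samples.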


We next compared the expression levels of mRNAs and proteins encoded by the same gene (Fig. 5C). The correlation coefficient between mRNA and protein expression levels was R = 0.59 for HEK293-F cells and R = 0.3 for neutrophils. The correlation coefficient between mRNA and protein expression levels in HeLa cells was reported to be 0.6,(20) which is similar to the value for HEK293-F cells in this study. Because cultured cells proliferate continuously, they synthesize mRNAs and proteins continuously, which contributes to the high correlation between mRNA and protein expression levels. In contrast, human peripheral blood neutrophils are short-lived and do not proliferate, and thus rarely synthesize new proteins, which might contribute to the low correlation between mRNA and protein expression levels. We therefore found that the correlation between mRNA and protein expression levels differs greatly depending on the cell type. Because neutrophils show a low correlation between mRNA and protein expression levels, protein expression levels cannot be adequately estimated by transcriptome analysis alone; rather, it is important to analyze protein expression levels directly by proteome analysis.

Fig. 6 shows the results of gene ontology (GO) enrichment analysis for the 111 genes for which protein expression was observed among the top 1000 high-expression proteins of the neutrophil proteome without corresponding mRNA expression. Many of these proteins were associated with extracellular exosomes and vesicles (Table S2). It was previously reported that mRNAs corresponding to the proteins in neutrophil granules and secretory vesicles are absent in human peripheral blood neutrophils, since those proteins accumulate in neutrophil precursors and are not synthesized after differentiation into neutrophils.(21) Mature neutrophils appear to have a mechanism for accumulating granule and secretory vesicle proteins without new protein synthesis, because neutrophils must be ready to attack foreign matter by rapidly releasing granules and secretory vesicles. Although this is a plausible interpretation, we could not exclude other possibilities, such as tight binding of serum proteins to neutrophils. Conversely, many genes with detectable mRNA levels lacked corresponding proteins, and GO enrichment analysis of these genes, performed as described above, revealed no clear trends. Although it may be biologically interesting to know how the discrepancies between mRNA and protein levels emerged in neutrophils, multiple interpretations are possible and we did not explore this problem further.

These results collectively suggest the importance of integrating mRNA and protein profiles to monitor the status of human peripheral blood neutrophils and potentially other types of primary cells isolated from clinical samples. This approach is likely applicable to proteogenomic analyses of solid clinical specimens, in addition to samples from cell cultures. Because the P/GTC reagent is a de facto standard for isolating RNA from solid clinical specimens, the method presented in this report may also be easily applied to prepare protein samples with the P/GTC reagent. Proteins, however, may not easily be prepared from formalin-fixed paraffin-embedded (FFPE) samples, particularly because RNA extraction from FFPE samples usually involves a proteinase K digestion step before lysis with the P/GTC reagent. Thus, we suggest that the sequential extraction method using the P/GTC reagent is applicable to most clinical specimens, and although it may not be suitable for proteome analysis of FFPE tissues, it may be the method of choice for clinical proteogenomic analyses.
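The mRNA-protein correlation reported for Fig. 5C can be sketched as a Pearson correlation over genes detected at both levels. The text does not state whether the values were log-transformed or which correlation coefficient was used, so the log2 transform and the Pearson formula here are assumptions, as are the function names and the toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mrna_protein_correlation(fpkm_by_gene, dnsaf_by_gene):
    """Correlate log2 FPKM with log2 dNSAF for genes present in both profiles.

    Assumption: only genes with positive values at both levels are compared.
    """
    shared = sorted(
        gene
        for gene in set(fpkm_by_gene) & set(dnsaf_by_gene)
        if fpkm_by_gene[gene] > 0 and dnsaf_by_gene[gene] > 0
    )
    log_mrna = [math.log2(fpkm_by_gene[g]) for g in shared]
    log_protein = [math.log2(dnsaf_by_gene[g]) for g in shared]
    return pearson(log_mrna, log_protein)


# Toy profiles (hypothetical): protein level exactly proportional to mRNA level.
toy_fpkm = {"g1": 1.0, "g2": 2.0, "g3": 4.0, "g4": 8.0, "mrna_only": 50.0}
toy_dnsaf = {"g1": 1e-6, "g2": 2e-6, "g3": 4e-6, "g4": 8e-6, "protein_only": 3e-6}
r = mrna_protein_correlation(toy_fpkm, toy_dnsaf)
print(round(r, 3))
```

Genes detected at only one level (such as the neutrophil granule proteins without mRNA) drop out of this calculation, so a correlation coefficient alone understates the discrepancy between the two profiles.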

Conclusion

We demonstrated that proteogenomic analysis can be conducted efficiently using the P/GTC-based extraction method. In particular, a clear advantage of the P/GTC method over a conventional protein extraction method was observed in the analysis of human peripheral blood neutrophils, which are highly vulnerable to protein/RNA degradation. More importantly, compared with the separate extraction of RNA and proteins, the P/GTC-based extraction method greatly saves precious samples, time, and labor when preparing the required amounts of RNA and protein from the same specimen. Collectively, the P/GTC-based extraction method is especially advantageous when precious samples are subjected to proteogenomic analysis.

Supporting Information

Supporting Information Available

Table S1: mRNA (FPKM values) and protein expression levels (dNSAF values) estimated from transcriptomics and proteomics data, respectively.

Table S2: Extracellular exosome- and vesicle-associated proteins; the listed proteins were extracted using cellular component GO enrichment analyses of the 111 genes for which protein expression was observed.

Figure S1: CBB-stained SDS-PAGE patterns of proteins from neutrophils after extraction by direct and P/GTC methods.

Figure S2: Confirmation of relative mRNA expression levels of ELANE, AZU1, CTSG, and PRTN3 as representative genes with high protein/low mRNA levels in neutrophils.

Supplementary Method: Relative Quantification by Real-time qPCR.

MS data deposit

The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the jPOST partner repository (http://jpostdb.org) with the dataset identifier PXD011487/JPST000513.

Preview code URL: https://repository.jpostdb.org/preview/15143007645bd529eb22095

Access key: 8207

Acknowledgements

This work was partly supported by a Grant-in-Aid for Young Scientists (B) (no. 17K18360, to Y.K.) from JSPS.

References

1. Metzker, M. L., Sequencing technologies - the next generation. Nat Rev Genet 2010, 11, (1), 31-46.

2. Mortazavi, A.; Williams, B. A.; McCue, K.; Schaeffer, L.; Wold, B., Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5, (7), 621-8.

3. Meier, F.; Geyer, P. E.; Virreira Winter, S.; Cox, J.; Mann, M., BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes. Nat Methods 2018, 15, 440-448.

4. Matsumoto, M.; Matsuzaki, F.; Oshikawa, K.; Goshima, N.; Mori, M.; Kawamura, Y.; Ogawa, K.; Fukuda, E.; Nakatsumi, H.; Natsume, T.; Fukui, K.; Horimoto, K.; Nagashima, T.; Funayama, R.; Nakayama, K.; Nakayama, K. I., A large-scale targeted proteomics assay resource based on an in vitro human proteome. Nat Methods 2017, 14, (3), 251-258.

5. Dunn, W. B.; Broadhurst, D.; Begley, P.; Zelena, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J. D.; Halsall, A.; Haselden, J. N.; Nicholls, A. W.; Wilson, I. D.; Kell, D. B.; Goodacre, R.; Human Serum Metabolome, C., Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat Protoc 2011, 6, (7), 1060-83.

6. Davey, J. W.; Hohenlohe, P. A.; Etter, P. D.; Boone, J. Q.; Catchen, J. M.; Blaxter, M. L., Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 2011, 12, (7), 499-510.

7. Aebersold, R.; Mann, M., Mass-spectrometric exploration of proteome structure and function. Nature 2016, 537, (7620), 347-55.

8. Spratlin, J. L.; Serkova, N. J.; Eckhardt, S. G., Clinical applications of metabolomics in oncology: a review. Clin Cancer Res 2009, 15, (2), 431-40.

9. Yugi, K.; Kubota, H.; Toyoshima, Y.; Noguchi, R.; Kawata, K.; Komori, Y.; Uda, S.; Kunida, K.; Tomizawa, Y.; Funato, Y.; Miki, H.; Matsumoto, M.; Nakayama, K. I.; Kashikura, K.; Endo, K.; Ikeda, K.; Soga, T.; Kuroda, S., Reconstruction of insulin signal flow from phosphoproteome and metabolome data. Cell Rep 2014, 8, (4), 1171-83.

10. Yugi, K.; Kubota, H.; Hatano, A.; Kuroda, S., Trans-Omics: How To Reconstruct Biochemical Networks Across Multiple 'Omic' Layers. Trends Biotechnol 2016, 34, (4), 276-290.

11. Chomczynski, P.; Sacchi, N., The single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction: twenty-something years on. Nat Protoc 2006, 1, (2), 581-5.

12. Chomczynski, P., A reagent for the single-step simultaneous isolation of RNA, DNA and proteins from cell and tissue samples. Biotechniques 1993, 15, (3), 532-4, 536-7.

13. Miyata, J.; Fukunaga, K.; Iwamoto, R.; Isobe, Y.; Niimi, K.; Takamiya, R.; Takihara, T.; Tomomatsu, K.; Suzuki, Y.; Oguma, T.; Sayama, K.; Arai, H.; Betsuyaku, T.; Arita, M.; Asano, K., Dysregulated synthesis of protectin D1 in eosinophils from patients with severe asthma. J Allergy Clin Immunol 2013, 131, (2), 353-60 e1-2.

14. Masuda, T.; Tomita, M.; Ishihama, Y., Phase transfer surfactant-aided trypsin digestion for membrane proteome analysis. J Proteome Res 2008, 7, (2), 731-40.

15. Masuda, T.; Saito, N.; Tomita, M.; Ishihama, Y., Unbiased quantitation of Escherichia coli membrane proteome using phase transfer surfactants. Mol Cell Proteomics 2009, 8, (12), 2770-7.

16. Rappsilber, J.; Ishihama, Y.; Mann, M., Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal Chem 2003, 75, (3), 663-70.

17. Kawashima, Y.; Satoh, M.; Saito, T.; Matsui, T.; Nomura, F.; Matsumoto, H.; Kodera, Y., Cyclic sample pooling using two-dimensional liquid chromatography system enhances coverage in shotgun proteomics. Biomed Chromatogr 2013, 27, (6), 691-4.

18. Shilov, I. V.; Seymour, S. L.; Patel, A. A.; Loboda, A.; Tang, W. H.; Keating, S. P.; Hunter, C. L.; Nuwaysir, L. M.; Schaeffer, D. A., The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics 2007, 6, (9), 1638-55.

19. Zhang, Y.; Wen, Z.; Washburn, M. P.; Florens, L., Refinements to label free proteome quantitation: how to deal with peptides shared by multiple proteins. Anal Chem 2010, 82, (6), 2272-81.

20. Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Paabo, S.; Mann, M., Deep proteome and transcriptome mapping of a human cancer cell line. Mol Syst Biol 2011, 7, 548.

21. Rorvig, S.; Ostergaard, O.; Heegaard, N. H.; Borregaard, N., Proteome profiling of human neutrophil granule subsets, secretory vesicles, and cell membrane: correlation with transcriptome profiling of neutrophil precursors. J Leukoc Biol 2013, 94, (4), 711-21.

Figure legends

Fig. 1 Flowchart for sequentially isolating RNA, DNA, and proteins from a biological sample using P/GTC reagent.

Fig. 2 CBB-stained SDS-PAGE pattern of proteins from HEK293-F cells and neutrophils extracted by the conventional direct and P/GTC extraction methods. Three independent samples were prepared for each cell type using each extraction method.

Fig. 3 Comparison of conventional direct and P/GTC extraction methods by shotgun proteomics. Numbers of peptides (A) and proteins (B) identified in the HEK293-F cell and neutrophil extracts prepared using the direct and P/GTC extraction methods. Bars represent the mean numbers of identified peptides and proteins from three independent replicate experiments. (C) Correlation of expression levels of proteins identified in samples prepared using the direct and P/GTC extraction methods.


Fig. 4 Comparison of cellular component gene ontology (GO) annotations for the proteins identified in proteome analyses of HEK293-F cells and neutrophils following the conventional direct and P/GTC extraction methods. GO categories are expressed as percentages of the total number of proteins in the corresponding protein list.

Fig. 5 Comparison of proteomics and transcriptomics data for HEK293-F cells and neutrophils. (A) Venn diagram of the number of genes observed by transcriptome (mRNA level) and proteome (protein level) analyses. (B) Venn diagram of the number of genes observed by transcriptome (mRNA level) analysis and the 1000 most abundant genes in the proteome data (protein level). (C) Scatter plot of mRNA (FPKM values) versus protein expression level (dNSAF values).

Fig. 6 Cellular component GO enrichment analysis of the 111 genes that were among the 1000 most abundant neutrophil proteins but had no corresponding mRNA expression.
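The overlap comparisons described for Fig. 5A and 5B, and the gene list analyzed in Fig. 6, amount to set operations on gene identifiers. The following is a minimal sketch, assuming gene-level tables keyed by gene symbol. The function name, the FPKM detection cutoff, and the toy values are hypothetical and are not taken from the paper.

```python
def compare_layers(mrna_fpkm, protein_dnsaf, top_n=1000, fpkm_cutoff=0.0):
    """Return Venn-style counts and the abundant proteins lacking mRNA evidence."""
    # Genes counted as detected at the mRNA level (the cutoff is an assumption).
    mrna_genes = {g for g, v in mrna_fpkm.items() if v > fpkm_cutoff}
    # The top_n most abundant genes at the protein level.
    ranked = sorted(protein_dnsaf, key=protein_dnsaf.get, reverse=True)
    top_proteins = set(ranked[:top_n])
    return {
        "both": len(mrna_genes & top_proteins),
        "mrna_only": len(mrna_genes - top_proteins),
        "protein_only": len(top_proteins - mrna_genes),
        "protein_without_mrna": sorted(top_proteins - mrna_genes),
    }


# Toy data for illustration only; these are not values from Table S1.
mrna = {"GENE1": 12.0, "GENE2": 0.0, "GENE3": 3.5}
protein = {"GENE1": 0.004, "GENE2": 0.020, "GENE4": 0.001}
result = compare_layers(mrna, protein, top_n=2)
```

With real data, the `protein_without_mrna` list would correspond to the kind of gene set passed on to a cellular component GO enrichment tool.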


[Fig. 1 (flowchart): Cell or tissue in P/GTC reagent → add chloroform → RNA fraction (aqueous phase) → transcriptome; add ethanol → DNA fraction (pellet) → genome; add acetone → protein fraction (pellet) → proteome.]


[Fig. 2 (gel images): HEK293-F cells and neutrophils, each with Direct and P/GTC lanes; molecular weight markers (kDa): 200, 116, 97, 66, 55, 37, 31, 21, 14, 6.]


[Figure panels A and B: HEK293-F cells; Neutrophils; n.s.]

p