Specifying RNA-Binding Regions in Proteins by ... - ACS Publications


Article

Specifying RNA-binding regions in proteins by peptide Cross-Linking and Affinity Purification Meeli Mullari, David Lyon, Lars Juhl Jensen, and Michael L. Nielsen J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00042 • Publication Date (Web): 25 Jun 2017 Downloaded from http://pubs.acs.org on June 28, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

For consideration as a Research Article in Journal of Proteome Research

Specifying RNA-binding regions in proteins by peptide Cross-Linking and Affinity Purification

Meeli Mullari1, David Lyon2, Lars Juhl Jensen2 and Michael L. Nielsen1*

1 Department of Proteomics, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Faculty of Health Sciences, DK-2200 Copenhagen, Denmark

2 Disease Systems Biology, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Faculty of Health Sciences, DK-2200 Copenhagen, Denmark

* To whom correspondence should be addressed: e-mail: [email protected], Phone: +45 35 32 50 19


Abstract

RNA-binding proteins (RBPs) allow cells to carry out pre-RNA processing and post-transcriptional regulation of gene expression, with aberrations in RBP functions linked to many diseases including neurological disorders and cancer. Human cells encode thousands of RNA-binding proteins with unique RNA-binding properties. These properties are regulated through modularity of a large variety of RNA-binding domains, rendering RNA–protein interactions difficult to study. Recently, the introduction of proteomics methods has provided novel insights into RNA-binding proteins at a systems level. However, determining the exact protein sequence regions that interact with RNA remains challenging and laborious, especially considering that many RBPs lack canonical RNA-binding domains. Here we describe a streamlined proteomic workflow called peptide Cross-Linking and Affinity Purification (pCLAP), which allows rapid characterization of RNA-binding regions in proteins. pCLAP is based upon a combined use of UV cross-linking and enzymatic digestion of RNA-bound proteins followed by single-shot mass spectrometric analysis. To benchmark our method we identified the binding regions for poly-adenylated RNA-binding proteins in HEK293 cells, allowing us to map the mRNA interaction regions of more than 1,000 RBPs with very high reproducibility from replicate single-shot analyses. Our results show specific enrichment of many known RNA-binding regions on many known RNA-binding proteins, confirming the specificity of our approach.

Keywords: RNA-binding proteins, RNA-binding regions, Mass spectrometry, single-shot MS analysis


Introduction

Gene expression in eukaryotes is strongly regulated at the post-transcriptional level. A eukaryotic mRNA undergoes a variety of processes, including capping, splicing, polyadenylation, editing, localization, translation, translational repression, degradation and quality control1. Collectively, these processes increase the diversity of the transcriptome and allow for temporal and spatial regulation of gene expression. Many of these processes are guided by RNA-binding proteins (RBPs), which recognize RNAs through various types of RNA-binding domains (RBDs)2. Although several well-characterized RBDs have been described3, the known diversity of RBDs in eukaryotic cells is steadily increasing. For example, while more than 600 structurally distinct RBDs were characterized among previously identified RBPs, most of these RBDs were only present in one or two proteins4. One reason for this diversity is the large variety of RNAs in mammalian cells, which often requires RBPs to recognize very diverse targets4. Since most RBDs only recognize short RNA stretches, RBP modularity has emerged as an important functional aspect facilitating specificity and affinity and allowing recognition of diverse RNA targets3. Hence, the ability to identify RBPs and to specify the exact sequence regions responsible for the RNA interaction is of the utmost importance for uncovering the functional role of RBPs in cellular functions and diseases, especially since defects in RBP expression and function have been linked to diseases including neuropathies, muscular atrophies, metabolic disorders and cancer5. However, the vast diversity of RBDs along with the generally complex and dynamic nature of RNA–protein interactions renders such ribonucleoproteins difficult to study.
Recently, several landmark studies employed mass spectrometry (MS)-based proteomics for global identification of RBPs in human cells6-8, thereby making it feasible to obtain insights into RNA-binding
protein factors at a systems level. Besides providing insights into the cellular landscape of RBPs, these large-scale MS studies have identified hundreds of RBPs previously not annotated to bind RNA or take part in RNA-related processes. While these RNA-interaction screens are able to identify bona fide RBPs, they do not allow for mapping of the protein sequence regions directly responsible for binding RNA. Considering that many of the RBPs identified in these studies lack known RBDs, the ability to assess which sequence regions are responsible for the observed RNA interaction would be pertinent for investigating the RNA-binding properties of novel RBDs7-8. On the other hand, recent advances in bioinformatic analyses allow for prediction of RNA-binding sites and features9. However, with the emerging complexity of RBDs4 it may not be possible to predict novel RBDs based only upon known ones. In order to directly map RNA–protein interactions, a methodology for mass spectrometric identification of RNA cross-linking sites has been described10. In that study a total of 60 RNA–protein cross-linking sites were identified on human proteins; however, obtaining these data required several purification steps and extensive data filtering. Although a specialized software tool was developed for streamlined data analysis, the elaborate processing steps overall hampered the sensitivity of the methodology and rendered the approach less applicable for comprehensive analyses. To address these challenges, a methodology referred to as RBDmap was recently established, which allows for mapping of the specific RNA-binding regions of RBPs11. However, RBDmap entailed elaborate sample preparation, and due to the sample pre-fractionation steps the described methodology required prolonged MS acquisition time. To alleviate these limitations, we here describe a universal proteomic workflow for the rapid and sensitive characterization of RNA-binding regions in proteins.
We call this methodology
pCLAP (peptide Cross-Linking and Affinity Purification), as it takes advantage of UV cross-linking in combination with optimized sample preparation and enzymatic digestion of RNA-bound proteins prior to RNA affinity purification. Collectively, pCLAP is technically straightforward compared to contemporary methods, as it employs expeditious sample preparation using GndHCl and alleviates the need for any buffer exchange during sample preparation11. The streamlined sample procedure combines the advantages of GndHCl as a known RNase inhibitor12 with its capability for rapid protein digestion13. Moreover, the guanidinium salt has been shown to facilitate RNA-RNA hybridization14 and thereby substitutes for the lithium-based buffers commonly used for hybridization of poly(A) tails with oligo(dT) beads7-8,11. In addition, the streamlined sample preparation entailed by pCLAP allows the experiments to be analyzed with single-shot MS, which shortens the required MS acquisition time significantly. Briefly, in pCLAP we use oligo(dT) beads to enrich for Lys-C-digested peptides that have been cross-linked to RNA with UV light in vivo. This is followed by a second peptide digestion step using trypsin, which liberates the tryptic sequence parts of the cross-linked Lys-C peptides, thereby allowing for peptide identification using standard MS tools (Figure 1A). As pCLAP does not enrich for intact RBPs, the obtained data can be used to specify the RNA-binding regions of RBPs bound to RNA in a manner similar to the previously described RBDmap11 (Table S1). Moreover, as the identified peptides are derived from sequence regions proximal to the protein-RNA cross-linking sites, they do not harbor any cross-linked RNA moieties. Hence the obtained MS data do not require sophisticated data processing procedures10, but can be analyzed using conventional proteomics software platforms. The differences and improvements of pCLAP compared to current methodologies are highlighted in Table 1.


To benchmark our pCLAP method we analyzed RNA-binding proteins from human HEK293 cells, focusing on identifying the binding regions of mRNA-binding proteins. Using pCLAP we mapped the specific RNA-binding regions for known and novel RBPs, and we demonstrate that our methodology shows high reproducibility across replicate single-shot analyses. The presented pCLAP method will work with diverse cells and is applicable to tissue analysis. In conclusion, pCLAP is a streamlined and universal in vivo workflow allowing for rapid and sensitive characterization of RNA-binding regions in proteins.

Methods

Peptide-RNA complex isolation

HEK293T cells were washed with PBS and the living cells were irradiated with 0.15 J/cm2 of 254 nm UV light (Dr. Gröbel GmbH), harvested in ice-cold PBS and spun down immediately after cross-linking as previously described15. Cell pellets were lysed with ice-cold lysis buffer (6 M GndHCl, 100 mM Tris pH 8.0, 1 mM EDTA, 5 mM tris(2-carboxyethyl)phosphine (TCEP), 10 mM chloroacetamide (CAA); as described before16) and immediately sonicated for 20 sec at 70% amplitude (Sonics) until no cell debris was visible. Proteins in the sample were digested by adding Lys-C (Wako) to the cell lysate (protein:Lys-C ratio of 100:1) and incubating for 3 h at RT on a shaker at 300 rpm. The sample was diluted with ice-cold dilution buffer (100 mM Tris pH 8.0, 1 mM EDTA, 5 mM TCEP, 10 mM CAA) to lower the GndHCl concentration to 2 M. Magnetic poly(dT) beads (NE Biolabs) were added to the diluted cell lysate and incubated at 4°C for 1 h with gentle rotation. Beads were captured using a magnetic rack (ThermoFisher Scientific) and washed 4 times with washing buffer (2 M GndHCl, 100 mM
Tris pH 8.0, 1 mM EDTA, 5 mM TCEP, 10 mM CAA). RNA with cross-linked peptides was eluted at 55°C for 3 min in elution buffer (20 mM Tris pH 8.0, 1 mM EDTA). NaCl was added to the sample to a concentration of 0.15 M, followed by RNase treatment with an RNase A and T1 mix (Thermo Scientific) for 1 h at 37°C. The remaining peptides with short RNA moieties were further digested using trypsin (Sigma; peptide:trypsin ratio of 100:1) overnight at 37°C. Trypsin digestion was terminated and undigested proteins were precipitated by adding TFA until the reaction pH was 2-3; the sample was spun down at 14,000 rpm for 15 min and the pellet was discarded. Samples were concentrated and desalted, and digested RNA was removed, using SCX and C18 stage-tips17 one after another. Samples were eluted from C18 tips directly before MS analysis. Each technical replicate was analyzed as a single shot on the mass spectrometer. As a negative control, to identify unspecific binders in the pull-down, the same procedure was used on samples that were not subjected to UV cross-linking. Both the cross-linked and non-cross-linked samples were prepared in quadruplicate. Peptides significantly enriched in the cross-linked samples compared to the non-cross-linked samples were considered RNA-binding regions.
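The dilution step above (lowering GndHCl from 6 M in the lysis buffer to 2 M before bead binding) follows the usual C1·V1 = C2·V2 relation. The short Python sketch below works through that arithmetic; the function name and the 1 mL lysate volume are illustrative assumptions and are not taken from the protocol.

```python
def diluent_volume(stock_molar: float, target_molar: float, sample_volume_ml: float) -> float:
    """Volume of GndHCl-free dilution buffer to add to a sample.

    Solves C1 * V1 = C2 * (V1 + Vd) for Vd, assuming the dilution
    buffer contains no GndHCl (as in the dilution buffer listed above).
    """
    if not 0 < target_molar <= stock_molar:
        raise ValueError("target concentration must be positive and not exceed the stock")
    return sample_volume_ml * (stock_molar / target_molar - 1.0)


# Hypothetical example: 1 mL of lysate at 6 M GndHCl diluted to 2 M.
print(diluent_volume(6.0, 2.0, 1.0))  # 2.0 (mL of dilution buffer)
```

In other words, going from 6 M to 2 M is a three-fold dilution, so two volumes of dilution buffer are added per volume of lysate.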

RNA-protein complex isolation

RNA-protein complex isolation was done as previously described15, with the exception that in-gel protein digestion18 was used instead of in-solution protein digestion. HEK293T cells were irradiated with 0.15 J/cm2 of 254 nm UV light (Dr. Gröbel GmbH), harvested in ice-cold PBS and spun down. The cell pellet was resuspended in lysis buffer (20 mM Tris-HCl (pH 7.5), 500 mM lithium chloride (LiCl), 0.1% Na-deoxycholate, 1 mM EDTA, 5 mM dithiothreitol (DTT)) and passed through a needle (0.4 mm diameter) three times. Magnetic
poly(dT) beads (NE Biolabs) were added to the cell lysate and incubated at 4°C for 1 h with gentle rotation. Beads were captured using a magnetic rack (ThermoFisher Scientific) and washed once with lysis buffer, followed by washes with the following buffers: 1) 20 mM Tris-HCl (pH 7.5), 500 mM LiCl, 0.02% Na-deoxycholate (wt/vol), 1 mM EDTA, 5 mM DTT; 2) 20 mM Tris-HCl (pH 7.5), 500 mM LiCl, 1 mM EDTA, 5 mM DTT; 3) 20 mM Tris-HCl (pH 7.5), 200 mM LiCl, 1 mM EDTA, 5 mM DTT. Bound RNA and cross-linked proteins were eluted at 55°C for 3 min in elution buffer (20 mM Tris-HCl (pH 7.5), 1 mM EDTA). 10x RNase buffer (100 mM Tris-HCl (pH 7.5), 1.5 M NaCl, 0.5% (vol/vol) NP-40 and 5 mM DTT) was added to the sample along with a mixture of RNase A and T1 (Thermo Scientific), and the sample was incubated for 1 h at 37°C. From here on the protocol deviates from the previously published method15. The proteins in the sample were resolved via sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) on a NuPAGE Novex 4-12% Bis-Tris Protein Gel according to the manufacturer's instructions (ThermoFisher Scientific), followed by standard in-gel digestion18, C18 stage-tipping17 and MS analysis.

Proteome analysis

For proteome analysis, the supernatant of the poly(A) pull-down from the RNA-peptide capture experiment was used. Since UV cross-linking efficiency is only between 1% and 5%19, the supernatant still gives good proteome coverage while ensuring that the proteome is derived from cells grown under the same conditions. The Lys-C peptides in the cell lysate were further digested overnight with trypsin (trypsin:protein ratio of 1:100). Trypsin digestion was terminated and undigested proteins were precipitated by adding TFA until the solution reached pH 2-3, followed by centrifugation at 1800 × g for 5 min. Peptide purification was done using reverse-phase Sep-Pak C18 cartridges (Waters). Peptides were eluted using 40% and 60% acetonitrile, and the acetonitrile was then evaporated using a centrifugal vacuum concentrator (Eppendorf® Vacufuge plus). To obtain more in-depth coverage of the proteome, the sample was fractionated into 14 concatenated fractions using off-line high-pH reverse-phase fractionation as described previously20. Briefly, the sample was loaded by the Rheodyne MXII pump (IDEX Corporation) coupled to the HPLC onto a Waters XBridge BEH130 C18 3.5 μm 4.6 × 250 mm column on an Ultimate 3000 HPLC (Dionex, Sunnyvale, CA, USA) operating at 1 mL/min. The sample was eluted into fractions using an acetonitrile gradient from 0% to 90% in 10 mM ammonium formate. Acetonitrile was removed from the samples and the sample volume was reduced using a centrifugal vacuum concentrator (Eppendorf® Vacufuge plus). The samples were then analyzed on the mass spectrometer.
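The concatenation used for the high-pH fractionation can be illustrated with a small Python sketch. It assumes the common round-robin scheme in which every 14th collected fraction is pooled, so that each pool combines early-, middle- and late-eluting peptides. The number of fractions collected before pooling is not stated in the text; the value of 42 used here is purely hypothetical, as is the function name.

```python
from typing import List


def concatenate_fractions(n_collected: int, n_pools: int = 14) -> List[List[int]]:
    """Assign collected fractions (numbered from 1) to pools in round-robin order."""
    pools: List[List[int]] = [[] for _ in range(n_pools)]
    for index in range(n_collected):
        # Fraction 1 goes to pool 1, fraction 15 back to pool 1, and so on.
        pools[index % n_pools].append(index + 1)
    return pools


# Hypothetical example: 42 collected fractions pooled into 14 concatenated fractions.
pools = concatenate_fractions(42)
print(pools[0])   # [1, 15, 29]
print(pools[13])  # [14, 28, 42]
```

Pooling non-adjacent fractions in this way keeps each concatenated fraction spread across the whole second-dimension gradient, which is the usual motivation for concatenation.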

Tandem MS coupled to liquid chromatography (LC-MS/MS)

MS analysis was performed using a nanoscale UHPLC (EASY-nLC1000 from Proxeon Biosystems) coupled to an Orbitrap Q-Exactive HF equipped with a nanoelectrospray source (Thermo Fisher Scientific). The samples were separated on a 15 cm analytical column (75 µm inner diameter), packed in-house with 1.9 µm C18 beads (Reprosil Pur-AQ, Dr. Maisch). For separation of digested peptides we employed a 77-minute gradient ranging from 5% to 40% acetonitrile in 0.5% formic acid at a flow rate of 250 nL/min. The sample eluting from the column was directly electrosprayed into the Q Exactive HF. The Q Exactive HF mass spectrometer was operated in data-dependent acquisition mode; all pull-down samples were analyzed using the ‘sensitive’ acquisition method and the proteome using the ‘fast’ acquisition method as previously described21.


Data analysis

All acquired raw files (mass spectrometry data) were analyzed with the MaxQuant22 software suite using standard settings, supported by the Andromeda search engine23. Briefly, data were searched against a concatenated target/decoy24 (forward and reversed) version of the UniProtKB database encompassing 71,434 protein entries. We followed the step-by-step protocol of the MaxQuant (version 1.5.0.38) software suite to generate MS/MS peak lists that were filtered to contain at most six peaks per 100 Da interval prior to the Andromeda database search. Mass tolerance for searches was set to a maximum of 7 ppm for peptide masses and 20 ppm for HCD fragment ion masses. Data were searched with carbamidomethylation as a fixed modification and protein N-terminal acetylation and methionine oxidation as variable modifications. A maximum of two miscleavages was allowed while requiring strict trypsin specificity25. Three technical replicates were analyzed together and match between runs was turned on. Reverse hits and contaminants were removed from the protein and peptide data. The cross-linked samples were compared to the non-cross-linked samples to determine significantly enriched peptides using Perseus (version 1.5.1.2) (Max Planck Institute of Biochemistry, Department of Proteomics and Signal Transduction, Munich). Peptides that had intensities from at least three replicates in either the cross-linked or the non-cross-linked samples were compared. Missing values, including intensities for peptides identified in only the cross-linked or only the non-cross-linked samples, were imputed based on the normal distribution of the intensities in each sample, using standard parameters in Perseus. Gene Ontology (GO) and InterPro motif enrichment analyses were conducted using GeneCodis26. Peptide localization in proteins was assessed using MaxQuant Viewer and Perseus by mapping sequences onto known Pfam domains as reported in the Pfam database27.
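The valid-value filter and the imputation of missing intensities described above can be sketched in a few lines of Python. This is not the Perseus implementation; it is a minimal illustration that assumes log2-transformed intensities, with missing values drawn from a normal distribution that is narrowed and shifted downwards relative to the observed values in the same sample. The width (0.3) and down-shift (1.8) factors are the values commonly cited as Perseus defaults and should be treated as assumptions here, as should all function and variable names.

```python
import random
import statistics
from typing import List, Optional

Intensities = List[Optional[float]]  # log2 intensities; None marks a missing value


def passes_valid_filter(uv: Intensities, control: Intensities, min_valid: int = 3) -> bool:
    """Keep a peptide if it was quantified in at least `min_valid` replicates
    of either the cross-linked (UV) or the non-cross-linked (control) group."""
    def n_valid(values: Intensities) -> int:
        return sum(v is not None for v in values)
    return n_valid(uv) >= min_valid or n_valid(control) >= min_valid


def impute_sample(column: Intensities, width: float = 0.3, downshift: float = 1.8,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Replace missing values in one sample (one MS run) with random draws from
    N(mean - downshift * sd, (width * sd)^2), where mean and sd are computed
    from the observed values of that sample (at least two are required)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mean - downshift * sd, width * sd)
            for v in column]


# Hypothetical log2 intensities for four peptides in one sample.
sample = [24.1, 22.8, None, 26.3]
print(impute_sample(sample))
```

Drawing imputed values from the low end of the intensity distribution reflects the assumption that a peptide missing from a run was most likely below the detection limit rather than absent at random.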


Using an in-house developed script, we mapped the identified tryptic peptides back to an in silico digested human FASTA file. This allows for identification of the Lys-C peptides that the tryptic peptides originate from, and in turn mapping of the Lys-C peptides initially cross-linked to mRNA.

Results and Discussion

Here we describe a novel workflow referred to as pCLAP, which allows efficient in vivo mapping of RNA-binding sequence regions in proteins. To induce covalent cross-links between RBPs and RNA, but not between proteins, we first subject human cells to UV light at 254 nm (Figure 1A)28. Following UV treatment, we lyse the cross-linked cells using the denaturing chaotrope GndHCl13 (Figure 1A) and subsequently digest cellular proteins using the Lys-C endoproteinase (Figure 1A)13. As Lys-C cleaves C-terminal to lysine residues with high specificity29, this digestion step ensures that only Lys-C-digested peptides remain cross-linked to RNA, while all other non-cross-linked peptides belonging to RBPs are eluted. Moreover, using the Lys-C endoproteinase in GndHCl results in a high digestion efficiency already after short digestion times13, ensuring a streamlined sample preparation. To further investigate this, we tested different time points for Lys-C digestion, thereby reconfirming that reduced digestion time does not have an impact on the number of missed cleavages (Figure S1). With only Lys-C peptides remaining cross-linked to RNA, we next use poly(dT) beads to enrich for poly-adenylated RNA (Figure 1A), and subsequently degrade the RNA using RNase (Figure 1A). The resulting sample thus only contains free RNA nucleotides along with RNA moieties cross-linked to Lys-C-digested peptides.
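The mapping of MS-identified tryptic peptides back to their parent Lys-C peptides can be illustrated with a short Python sketch. The in-house script itself is not shown in the manuscript, so the code below is only a minimal stand-in under simplifying assumptions: cleavage occurs C-terminal to every K (Lys-C) or every K and R (trypsin), no missed cleavages or proline rules are modeled, and only the first occurrence of a peptide in the protein is considered. The function names and the toy sequence are hypothetical.

```python
from typing import List, Optional, Tuple


def digest(sequence: str, residues: str) -> List[Tuple[int, str]]:
    """In silico digestion: cleave C-terminal to any residue in `residues`.

    Returns (start_position, peptide) pairs with 0-based start positions.
    """
    peptides = []
    start = 0
    for i, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a cleavage residue
        peptides.append((start, sequence[start:]))
    return peptides


def parent_lysc_peptide(protein: str, tryptic_peptide: str) -> Optional[str]:
    """Return the Lys-C peptide that fully contains the tryptic peptide, if any."""
    position = protein.find(tryptic_peptide)  # first occurrence only
    if position < 0:
        return None
    end = position + len(tryptic_peptide)
    for start, peptide in digest(protein, "K"):
        if start <= position and end <= start + len(peptide):
            return peptide
    return None  # peptide spans a Lys-C cleavage site (missed cleavage)


protein = "MARTKLLRGEAKSSR"  # hypothetical toy sequence
print(digest(protein, "K"))                  # [(0, 'MARTK'), (5, 'LLRGEAK'), (12, 'SSR')]
print(digest(protein, "KR"))                 # [(0, 'MAR'), (3, 'TK'), (5, 'LLR'), (8, 'GEAK'), (12, 'SSR')]
print(parent_lysc_peptide(protein, "GEAK"))  # LLRGEAK
```

In this toy case the tryptic peptide GEAK would be the sequence identified by MS, while the parent Lys-C peptide LLRGEAK is the inferred region that remained attached to the RNA during the pull-down.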


In order to release the cross-linked peptides for MS identification, we next perform a second digestion step by cleaving the RNA-bound Lys-C peptides using trypsin (Figure 1A). As trypsin cleaves C-terminal to arginines, while Lys-C does not, the additional digestion step liberates the tryptic sequence parts of the initially cross-linked Lys-C peptide that lack cross-linked RNA moieties but still reside in an RNA-binding region proximal to the RNA and UV cross-linking site (Figure 1A). After tryptic digestion, the released tryptic peptides are sequenced by single-shot high-resolution LC-MS, with the acquired peptide sequences being identified using standard proteomic software tools. Importantly, these tryptic peptides do not harbor any cross-linked RNA, which would introduce a highly variable modification to the peptides. Hence, with the pCLAP method, information regarding the specific cross-linking sites is lost. However, as UV light induces cross-links proximal to the in vivo RNA-protein interaction, a cross-link does not necessarily form between the actual amino acid and nucleotide that interact in vivo28. Thus, information about the physical cross-linking site would not provide the same information that, for example, a high-resolution crystal structure would30. Finally, as the peptides identified by MS are derived from a tryptic digestion, we map the tryptic peptides back to their corresponding protein sequences (Figure 1D), which in turn determines the specific protein sequence region that was initially precipitated along with the poly-adenylated RNA. To discriminate between tryptic peptide sequences directly identified by MS and Lys-C peptide sequences inferred by in silico mapping, we mark MS-identified peptide sequences as “peptideMS”, while in silico derived sequences are referred to as “peptideIS”.

pCLAP enriches for UV-cross-linked peptides


To benchmark our pCLAP methodology we performed quadruplicate analyses of RNA-binding peptide regions in both UV- and non-UV-treated HEK293 cells. Overall the pCLAP enrichment showed high reproducibility in identified tryptic peptidesMS between technical replicates derived from four UV-treated samples (Figure 1B). To eliminate nonspecific binders, we performed a label-free quantification (LFQ)31 between the UV-treated samples and quadruplicate samples not subjected to UV light, with the latter harboring no cross-links between RNA and proteins (Figure 1D, enriched peptidesMS highlighted in blue; Table S2). This comparison allows for the identification of potential background peptidesMS that bind unspecifically to the beads or the oligo(dT) sequences and whose binding is not induced by UV cross-linking. By comparing the signal abundance of peptide sequences from non-UV-treated samples to that of their UV-treated counterparts, we can quantitatively determine the specificity of the pCLAP approach using LFQ31. As expected, the quantitative analysis from four replicates revealed that peptidesMS identified in the UV-cross-linked samples are on average 17 times more abundant in peptide signal intensity compared to peptides derived from the non-cross-linked samples (Figure 2C). This highlights that the tryptic peptidesMS bind more strongly to poly(A)-enriched mRNA upon UV cross-linking as compared to the control experiments, and supports the notion that the peptidesMS identified by pCLAP are indeed cross-linked to the RNA by the employed UV treatment. From this LFQ analysis we determined the peptidesMS significantly enriched in the UV-cross-linked samples using standard methods31, and for all subsequent analyses of our pCLAP methodology we only considered peptideMS identifications that appeared significantly enriched in the cross-linked samples (p