
Simultaneous proteoform analysis of histones H3 and H4 with a simplified middle-down proteomics method Christoph Schräder, Daniel S. Ziemianowicz, Kathleen Merx, and David C. Schriemer Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b03948 • Publication Date (Web): 06 Feb 2018 Downloaded from http://pubs.acs.org on February 13, 2018


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Simultaneous Proteoform Analysis of Histones H3 and H4 with a Simplified Middle-down Proteomics Method

Christoph U. Schräder1, Daniel S. Ziemianowicz1, Kathleen Merx1 and David C. Schriemer1,2*

1 Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada T2N 1N4

2 Department of Chemistry, University of Calgary, Calgary, Alberta, Canada T2N 1N4

*Corresponding author: David C. Schriemer, Ph.D. Department of Biochemistry and Molecular Biology The University of Calgary, Room 300 Heritage Medical Research Building 3330 Hospital Drive NW Calgary, Alberta, Canada T2N 4N1 Email: [email protected]

Running title: Middle-down analysis of histones H3 and H4 with neprosin


ABSTRACT

Dynamic post-translational modifications of histones regulate transcriptional gene expression in eukaryotes. Unique combinations of modifications, almost exclusively displayed at the flexible N-terminal tails on histones, create distributions of proteoforms that need to be characterized in order to understand the complexity of gene regulation and how aberrant modification patterns influence disease. Although mass spectrometry is a preferred method for the analysis of histone modifications, information is lost when using conventional trypsin-based histone methods. Newer “middle-down” protocols may retain a greater fraction of the full proteoform distribution. We describe a strategy for the simultaneous characterization of histones H3 and H4 with near-complete retention of proteoform distributions, using a conventional proteomics LC-MS/MS configuration. The selective prolyl endoprotease neprosin generates convenient peptide lengths for retention and dispersion of modified H3 and H4 peptides on reversed-phase chromatography, offering an alternative to the hydrophilic interaction liquid chromatography typically used in middle-down methods. No chemical derivatizations are required, presenting a significant advantage over the trypsin-based protocol. Over 200 proteoforms can be readily profiled in a single analysis of histones from HeLa S3 cells. An in-gel digestion protocol provides additional options for effective histone analysis.


INTRODUCTION

Histones, a class of five different Lys- and Arg-rich proteins (H1, H2A, H2B, H3 and H4), are the core protein components of chromatin in eukaryotes around which DNA is wound. These proteins represent more than just a scaffold for chromatin assembly. Gene expression in eukaryotes is regulated by the post-translational modifications (PTMs) of histones in a number of complex ways [1], involving PTM “readers” and “writers” that allow the cell to adjust its output of gene products during cell division and development. The patterns of modifications are dynamic, and aberrant patterns can transform cells into cancer cells [2]. It is therefore essential to map these modifications, not only to understand basic transcriptional mechanisms in biology but also to understand how cancer develops. Histones are tightly packed into octamers but the loose, highly basic, N-terminal tails contain the vast majority of the modifications. Among them, the tails of H3 and H4 are of the highest interest due to the number of different modifications associated with the regulation of gene expression [3,4]. In the course of extensive PTM studies, many single sites have been mapped with high confidence and linked to cellular processes. For example, it has been shown that methylation of H3 at K27 induces gene silencing, mediated by the polycomb group of proteins (e.g. polycomb repressive complex 2, or PRC2) [5]. However, in vivo regulation most probably involves complex patterns of individual proteoforms (i.e. unique groupings of PTMs) [6,7], more specifically referred to as the “histone code” in this context [8]. Mass spectrometry (MS) is the key analytical technique for histone PTM analysis. Antibody-based methods exist and are very sensitive, but MS can in theory provide access to the


code in an unbiased fashion and allow for the discovery of unknown histone PTMs [9]. Not surprisingly, LC-MS/MS routines derived from proteomics protocols are typically used in histone analysis, but suitably comprehensive methods remain a work in progress due to the hydrophilic character of histone tails and the extremely large number of possible proteoforms [10]. The classical tryptic digestion (or bottom-up) method renders the tails into very small pieces and erodes much of the PTM patterning. A useful variant of the classical approach involves lysine derivatization, usually with propionic anhydride, to generate peptides with longer ‘read lengths’, as trypsin is then only able to cleave after arginine [11]. The resulting peptides are analyzed using conventional reversed-phase liquid chromatography (RPLC). The approach provides high sensitivity combined with straightforward data analysis. However, much of the information about individual proteoforms still remains inaccessible; further, propionylation generates several side reactions, and full derivatization of Lys residues is hard to achieve in practice [12]. In theory, top-down proteomics methods that analyze intact histones are the preferred technique since all proteoforms are preserved. Encouraging analytical routines are emerging [13], but improvements are still needed in sensitivity, fragmentation [14] and dispersion of isobaric proteoforms across retention time. Middle-down methods should return some of the benefits of the bottom-up method. They apply alternative enzymes to generate larger peptides comprising mostly the tail regions [15], where the modifications are clustered. For example, GluC (an enzyme cleaving C-terminal to glutamic acid) generates a 50 aa N-terminal tail for histone H3, H3(1-50) [16]. AspN (an enzyme cleaving N-terminal to aspartic acid) generates a 23 aa N-terminal tail for histone H4, H4(1-23) [17]. In both cases the majority of PTMs are preserved, but alternative chromatographic configurations are needed prior to mass analysis. While RPLC of GluC digests retains most peptides from histones H2A,


H2B and H4, the tail of histone H3 is weakly retained [18]. Capturing the modified states of histone H3 thus requires the derivatization of unmodified lysine residues to enable RPLC [19], or weak cation exchange hydrophilic interaction liquid chromatography (WCX-HILIC) [20]. The WCX-HILIC strategy enables the separation of differentially acetylated tails and the profiling of hundreds of modification states, but it is technically demanding and not available in most proteomics labs [20-22]. Alternative middle-down approaches are required that return the utility of RPLC and allow any proteomics lab to efficiently analyze histones with conventional technologies. Here, we describe a one-enzyme method that does not require histone derivatization, yet provides comprehensive middle-down H3/H4 tail analysis in a single RPLC run. The approach incorporates neprosin, a novel prolyl-endoprotease recently introduced for bottom-up proteomics [23]. We also describe an approach involving in-gel digestion that extends the utility of the method.


EXPERIMENTAL SECTION

HeLa Cell Culture and Histone Extraction

HeLa S3 cells were grown at the National Cell Culture Centre in Joklik modified MEM supplemented with 5% newborn calf serum to high cell density (1 × 10^6 cells mL^-1). Cells were collected by centrifugation at 2500g, followed by two washes in warm (37 °C) PBS (Ca2+/Mg2+ free) and stored at -80 °C. After thawing, histones from 2 × 10^8 cells were isolated using an acid extraction protocol according to Shechter et al. [24]. TCA-precipitated histones were resuspended in dd-H2O and purity was confirmed by SDS-PAGE. Crude histones from this large preparation (10%, the equivalent of 2 × 10^7 cells) were further used for RPLC-based fractionation and enrichment of histones H3 and H4, simply to allow for blending and testing of simultaneous analysis methods.

Histone preparations

Modified histones, either isolated from HeLa cells as described above or purchased (calf thymus histones, Sigma Aldrich, St. Louis, MO, USA; product #H9250), were prepared to a concentration of 5 µg µL^-1 in dd-H2O and fractionated on an Agilent 1100 series HPLC using a C18 column (Aeris 3.6 µm WIDEPORE XB-C18 200 Å, 50 x 2.1 mm, Phenomenex, Torrance, CA, USA) at a flow rate of 400 µL min^-1. The injected histone mass was 100 µg for calf thymus histones and 160 µg for HeLa S3 histones. The gradient consisted of solvent A (0.1% TFA in H2O) and solvent B (97% ACN, 0.1% TFA), run linearly from 5 to 35% B in 15 min, followed by 35 to 55% B in 35 min, and 55 to 95% B in 5 min for column washing and regeneration. The H3


and H4 fractions were identified by SDS-PAGE, combined and lyophilized. The H3/H4 blend was resuspended in dd-H2O and stored at -20 °C prior to further analysis. Individual unmodified recombinant mammalian histones were expressed and purified as in Luger et al. [25].

Histone Digestions

Fractionated and reblended H3/H4 samples from calf thymus and HeLa S3 cells were resuspended in 100 mM Gly-HCl pH 2.5 at 1 µg µL^-1 and 10 µg of the blend was subsequently digested overnight at 37 °C by the addition of recombinant neprosin [23] at an E:S ratio of 1:500. Samples were desalted prior to mass spectrometric analysis using the Stage Tip protocol [26] and eluted with 20% ACN in 0.1% TFA, where only 1/20th of the amount was used per technical replicate. Recombinant histones (15 µg) were resuspended in 50 mM NH4HCO3 at pH 8.0 for digestion using GluC (Roche Diagnostics, Mannheim, Germany; product #11 420 399 01) and AspN (Roche Diagnostics, Mannheim, Germany; product #11 420 488 01), or in 100 mM Gly-HCl pH 2.5 for neprosin digestion. GluC was added to an E:S ratio of 1:50 for H3 and AspN was added to an E:S ratio of 1:50 for H4. The digestions were carried out overnight. Recombinant neprosin was added to an E:S ratio of 1:500 for H3 and H4, and the digestion was also carried out overnight. AspN and neprosin digests were performed at 37 °C and GluC digests were performed at room temperature. After incubation, samples were desalted prior to mass spectrometric analysis using the Stage Tip protocol and eluted with 40% ACN in 0.1% TFA. Crude HeLa S3 histones (5 µg) were analyzed in duplicate by 15% SDS-PAGE. The gel was stained with Coomassie, de-stained with dd-H2O and the H3 bands were cut out and pooled for digestion. The in-gel digestion using neprosin was carried out using a typical trypsin-based protocol [27] with two changes. The reduction and subsequent alkylation step was skipped due to


the absence of disulfide bridges, and we used 100 mM Gly-HCl pH 2.5 instead of 25 mM ammonium bicarbonate buffer. The digestion was carried out overnight at 37 °C.

PRC2 histone-methyltransferase assay

Recombinant PRC2 consisted of human EZH2, EED, SUZ12, RbAp48 and AEBP2; these proteins were expressed and purified from insect cells as summarized in the Supporting Information. Histone assemblies were prepared as previously described [28] and consisted of H2A.1, H2B type 1-C/E/F/G/I, H3.2 and H4 from Homo sapiens. At the assay ionic strength, histones were present in a mixed state (mostly tetramers; not shown). The histone methyltransferase (HMTase) assay was performed by incubating 1.5 µM PRC2 with equimolar amounts of histone tetramer and a 33x molar excess of the methyl donor S-adenosyl methionine (SAM, Sigma Aldrich, product #A4377) at 30 °C, buffered in 150 mM NaCl, 10 mM HEPES, pH 8.0. HMTase activity was quenched with 0.6% formic acid (Sigma Aldrich, product #33015), lowering the pH to 2.5. Samples were subsequently digested overnight at 37 °C using neprosin, at an enzyme-to-substrate (E:S) ratio of 1:500.

LC/MS

All histone digests were measured on an Orbitrap Velos mass spectrometer coupled to an EASY-nLC 1000 system (Thermo Fisher Scientific, Bremen, Germany) equipped with a Nanospray Flex Ion Source. Peptides were chromatographically separated using a 15 cm PicoTip fused silica emitter with an inner diameter of 75 µm (New Objective, Woburn, MA, USA), packed in-house with reversed-phase Aeris 3.6 µm PEPTIDE XB-C18 100 Å material (Phenomenex, Torrance, CA, USA; product #04A-4507). The flow rate was set to 300 nL min^-1. All samples were analyzed at least in duplicate.


For the analysis of digests of individual unmodified recombinant histones H3 and H4, a conventional gradient consisting of solvent A (0.1% FA in H2O) and solvent B (80% ACN in 0.1% FA) running linearly from 2% to 35% B over 40 min was applied. Digests were analyzed in biological duplicates. Data were acquired using data-dependent MS/MS mode. Each high-resolution precursor ion scan in the Orbitrap (m/z 300 to 1200, R = 60,000 at m/z 400) was followed by high-resolution product ion scans (isolation window 2 Th) in the Orbitrap after HCD fragmentation at 35% normalized collision energy (NCE), for initial characterization. The resolution was set to 7,500 at m/z 400. The 12 most abundant signals with a charge state greater than two were selected for fragmentation, followed by dynamic exclusion for 60 s. For the in-depth analysis of neprosin-digested H3 and H4 from calf thymus and PRC2-methylated H3, peptides were eluted using a shallow gradient of 1% to 20% B over 55 min, which allowed for the improved separation of modified N-terminal peptide tails. Samples were analyzed in three technical replicates. Data were acquired using data-dependent MS/MS mode. Each high-resolution precursor ion scan in the Orbitrap (m/z 350 to 1200, R = 100,000 at m/z 400) was followed by low-resolution product ion scans (isolation window 1.5 Th) after ETD fragmentation (max. 105 ms with supplemental activation) in the linear ion trap in enhanced scan mode with 4 microscans. The 5 most abundant signals with a charge state greater than three were selected for fragmentation, followed by dynamic exclusion for 30 s. For the in-depth analysis of histone H3 and H4 proteoforms from HeLa cells after digestion with neprosin (either in-solution or in-gel), we used the same shallow gradient as described to analyze the digests (also in three technical replicates).
We used the same wide window for MS ion scans in the Orbitrap as described (m/z 350 to 1200, R = 100,000 at m/z 400), but limited the window for precursor ion selection to m/z 555 to 600. This restricted ion


selection to a single charge state (6+ of peptide H4(1-32) and 7+ of peptide H3(1-38)). Each high-resolution precursor ion scan in the Orbitrap was followed by high-resolution product ion scans (isolation window 1.5 Th) after ETD fragmentation (activation time max. 105 ms with supplemental activation) in the Orbitrap at a resolution of 7,500 (m/z 400). The 4 most abundant signals with a charge state greater than three were selected for fragmentation, followed by dynamic exclusion for 20 s. For each product ion scan, 3 microscans were acquired. Data acquisition was controlled in all experiments with Xcalibur software (v. 3.0.63) and all RAW data are publicly available via Chorus (https://chorusproject.org/pages/index.html; ID 1370).

Informatics

All spectra were processed and analyzed in PEAKS Studio (v. 8.0) [29], followed by deconvolution and export to the MGF file format. Database searches were then additionally performed with Mascot (v. 2.5, Matrix Science) using the following search parameters for both PEAKS and Mascot: MS tolerance was set to 12 ppm, and MS/MS tolerance to 0.03 u for product ion spectra acquired in the Orbitrap analyzer and to 0.5 u for product ion spectra acquired in the linear ion trap. Data from digests of recombinant histones digested with neprosin, GluC and AspN were searched with enzyme specificity set to ‘none’ for an unbiased analysis of cleavage sites. On the basis of these results, we limited the cleavage sites to those C-terminal to Pro residues for neprosin digests of endogenous histones, with a maximum of 2 missed cleavages in order to restrict the search space, and used manually curated database files containing all histone isoforms from Homo sapiens and Bos taurus, respectively. Variable modifications were: monomethylation (+14.0157 u) of Lys and Arg, dimethylation (+28.0313 u) of Lys, phosphorylation of Ser (+79.9663 u), trimethylation (+42.0468 u) of Lys and acetylation (+42.0106 u) of Lys and the


protein’s N-terminus. The peptide IDs with a Mascot score > 30 and the IDs from a PEAKS peptide search at an FDR of 2% were retained and manually checked with the de novo module in PEAKS. The methylation of H3 mediated by PRC2 in the HMTase assay was analyzed in MZmine 2 (v. 2.23) [30] using the charge states 6+ to 9+ of H3(1-38). Combinatorial PTMs on HeLa histones were processed in PEAKS, on native spectra. Signals were first quantified at the MS1 level using extracted ion chromatograms (XICs) for all relevant charge states (6+ to 9+) in MZmine 2 (v. 2.23), and the XICs were summed. Subsequently, positional isomers were identified using triggered MS2 scans and manually verified. Following the method of Pesavento et al. [31], the relative abundance of unique fragment ions for a given proteoform was used to weight the XIC for coeluting positional isomers. Then, a relative abundance for a given histone tail was determined against the sum of all the histone tails observed.
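The abundance weighting described above can be illustrated with a minimal Python sketch. It assumes that XIC areas for the four charge states have already been extracted, and it splits the summed area among coeluting positional isomers in proportion to the intensities of their unique fragment ions. The function names, isomer labels and all numerical values are hypothetical and are not taken from the data set.

```python
def apportion_xic(xic_area, fragment_intensities):
    """Split one summed XIC area among coeluting positional isomers.

    fragment_intensities maps an isomer label to the intensity of a
    fragment ion that is unique to that isomer.
    """
    total = sum(fragment_intensities.values())
    if total <= 0:
        raise ValueError("need at least one non-zero fragment intensity")
    return {name: xic_area * inten / total
            for name, inten in fragment_intensities.items()}


def relative_abundance(areas):
    """Express each area as a fraction of the sum of all areas."""
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


# Hypothetical XIC areas for charge states 6+ to 9+ of one modified tail.
charge_state_areas = {6: 2.0e6, 7: 5.0e6, 8: 2.5e6, 9: 0.5e6}
summed = sum(charge_state_areas.values())

# Hypothetical unique fragment ion intensities for two coeluting isomers.
isomers = apportion_xic(summed, {"K9me2": 300.0, "K27me2": 100.0})

# Relative abundance against all tails observed (here only these two).
fractions = relative_abundance(isomers)
```

Summing over charge states before apportioning keeps the MS1 quantity independent of any single charge state, which matches the integration strategy described in this section.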


RESULTS & DISCUSSION

We have previously demonstrated that neprosin, a prolyl-endoprotease from the carnivorous pitcher plant (genus Nepenthes), shows promise in the analysis of histones [23]. Using a crude histone mix from calf thymus, we observed peptide 1-38 (~4.0 kDa) from H3 and peptide 1-32 from H4 (~3.3 kDa). We suspected that neprosin can generate these fragments in a very selective and stable manner, even though there appear to be internal cleavage sites and the potential to generate smaller fragments (Fig. 1). For several reasons, these peptides could be better suited for simultaneous middle-down analysis than those generated by GluC and AspN. GluC generates a peptide that is somewhat over-long (1-50), as most modifications occur on or before K37 [19,32]. Information on some low-abundance PTMs (e.g. phosphorylation of H3Y41 and H3T45) will be lost, but a high peptide mass confounds the resolution of isobaric species such as trimethylated (+42.0468 u) and acetylated (+42.0106 u) tails. A shorter peptide should aid in the resolution of these important modifications on most instruments. Although the neprosin fragment for H4 is longer than that generated by AspN (1-23), these isobaric forms should still be distinguishable in the 1-32 neprosin product and it may be retained better in RPLC. Further, the H4(1-32) neprosin product could allow for detection of the low-abundance H4K31ac mark [33]. To test these ideas, we first compared neprosin cleavage specificity with GluC (for H3) and AspN (for H4) under “end-state” digestion conditions (i.e. extensive digestion), using conventional RPLC-based LC-MS/MS applied to synthetic, unmodified histones. Peptide 1-38 is the most abundant H3 digestion product (Fig. 2A). Even though H3 contains two other N-terminal proline residues at positions 16 and 30 (and internal alanines), we observed at best minor digestion products due to these positions.
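As an illustration of the cleavage sites discussed here, the following Python sketch enumerates the theoretical products of cleavage C-terminal to Pro for the first 50 residues of H3. The sequence is the canonical human H3 N-terminal sequence, supplied here for illustration since it is not listed in the text, and the function names are hypothetical. The sketch lists theoretical products only and says nothing about which sites neprosin actually prefers.

```python
# Residues 1-50 of canonical human histone H3 (assumption: supplied for
# illustration; initiator Met removed, numbering as used in the text).
H3_1_50 = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"


def prolyl_sites(seq):
    """Return 1-based positions of Pro residues that have a residue after them."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "P" and i + 1 < len(seq)]


def digest(seq, max_missed=2):
    """Enumerate (start, end) pairs for cleavage C-terminal to Pro.

    Positions are 1-based and inclusive; max_missed is the number of
    internal Pro sites a peptide may span without being cleaved.
    """
    cuts = [0] + prolyl_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + max_missed, len(cuts))
        for j in range(i + 1, last):
            peptides.append((cuts[i] + 1, cuts[j]))
    return peptides


if __name__ == "__main__":
    print("Pro positions:", prolyl_sites(H3_1_50))
    for start, end in digest(H3_1_50, max_missed=2):
        print(f"{start}-{end}", H3_1_50[start - 1:end])
```

Under these assumptions, peptide 1-38 spans the Pro residues at positions 16 and 30, so it appears among the theoretical products only when at least two missed cleavages are allowed.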
Longer incubation times and higher E:S ratios may lead to additional cleavages, but the conditions we used were aggressive for the enzyme. The


basic amino acids near the P1 position appear to render P38 an ideal cleavage site, consistent with our earlier observations that charged residues in the vicinity of the cleavage site positively influence processing. GluC reliably released peptide H3(1-50) as expected (Fig. S1A), but the partial deamidation of Q19 to E19 promoted the generation of the peptide H3(20-50) as well [34]. GluC-generated H3(1-50) can be retained by RPLC in our experiments, but we were unsuccessful in using this approach to profile modifications of endogenous histones, which appears consistent with the lack of reports using this combination for H3 analysis. In both H3 digests, the hydrophilic N-terminal peptide tails elute the earliest, while C-terminal peptides are chromatographically better retained, a common feature in histone analysis [35]. We observed seven different charge states (4+ to 10+) for neprosin-generated H3(1-38) (Fig. 2A). The larger GluC product produced a wider charge state distribution and additional signal splitting (nine different charge states, 5+ to 13+, Fig. S1A). Using the same RPLC configuration for the analysis of the H4 digests, we observed that the end-state neprosin digest generated abundant H4(1-32) signal over six different charge states (4+ to 9+) (Fig. 2B). We saw poor retention of the unmodified AspN peptide H4(1-23) (Fig. S1B), but the neprosin-generated H4(1-32) tail was sufficiently well retained on a generic C18 column, under a conventional RPLC gradient. We then explored more challenging sample types, reflective of available isolation strategies, to determine the ability of a neprosin-driven method to process mixed states. For example, kits are available for the efficient isolation of intact H3/H4 tetramers. To simulate the output of such an approach, we constructed a blend of H3 and H4 from purified calf thymus histones (Fig. S2A-C). We then refined the RPLC conditions to improve retention slightly and increase resolution, simply by implementing a shallower gradient.
The H3/H4 blend was


digested overnight with neprosin. The injected sample amount was approximately 0.1 µg and ETD fragmentation was used in the data-dependent acquisition of MS/MS spectra. ETD (compared with CID or HCD) generates better sequence coverage for larger histone peptides and peptides with higher charge states [36]. The analysis shows good dispersion of multiple acetylation and methylation marks (Fig. 3), with a staggered elution of H3 and H4 modification states. The chromatography is reproducible, as confirmed by technical replicates (not shown). Gradient programming can compress the states if desired, but the elution order remains the same. In RPLC, acetylated histone peptides elute later than their non-modified forms, which is in contrast with HILIC separations [37]. We achieve baseline separation for all naturally occurring degrees of acetylation using standard C18-RPLC. While this has been reported for AspN peptide H4(1-23), it has not been shown for the GluC peptide H3(1-50) [20,38]. The tails of H3 eluted over a time period of ca. 23 min and those of H4 eluted over a time period of ca. 30 min, providing ample acquisition time for comprehensive coverage using ETD-based MS2 fragmentation. Some duty cycle may be lost in the regions where H3 and H4 states coelute, but this should not overwhelm more efficient ETD systems currently available. We note that the differentiation between the isobaric acetylation and trimethylation modifications is a problem in both the middle-down and top-down analyses of histone H3 [39]. The mass difference between the two is only 0.0362 u. This difference corresponds to a relative mass error of ~9 ppm for the neprosin-generated H3(1-38) and of ~6 ppm for the GluC-generated H3(1-50). The larger relative difference is straightforward to resolve using a range of current MS instruments, which should help reduce the demand on the chromatographic separation.
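The relative mass difference quoted above follows from simple arithmetic, sketched below in Python. The modification masses are those listed for the database search, the peptide mass of 4000 u is the approximate value given for the H3 peptide 1-38, and the function names are illustrative.

```python
# Mass shifts as listed among the variable modifications (in u).
TRIMETHYL = 42.0468
ACETYL = 42.0106


def relative_difference_ppm(peptide_mass):
    """Trimethyl/acetyl mass difference relative to the peptide mass, in ppm."""
    return (TRIMETHYL - ACETYL) / peptide_mass * 1e6


def nominal_resolving_power(peptide_mass):
    """Nominal m / delta-m ratio corresponding to the same mass difference."""
    return peptide_mass / (TRIMETHYL - ACETYL)


if __name__ == "__main__":
    mass = 4000.0  # approximate mass of the H3 peptide 1-38 (~4.0 kDa)
    print(f"{relative_difference_ppm(mass):.2f} ppm")
    print(f"m/delta-m of about {nominal_resolving_power(mass):.0f}")
```

For a 4000 u peptide this gives about 9 ppm, in line with the value stated in the text. A heavier peptide gives a proportionally smaller relative difference, which is why the shorter tail is easier to resolve.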
Before developing a method for H3/H4 proteoform profiling using the MS/MS data, we first needed to determine the impact of modification status on the peptide charge state


distributions, as any profiling method requires an abundance determination derived from an MS feature. Since acetylation reduces charge and can influence ionization efficiency [40,41], we investigated if selecting a single charge state introduces a bias in determining the relative abundance of histone modifications. Using the calf thymus H3/H4 blend, we tracked the four most abundant charge states for H3 (6+ to 9+) and H4 (5+ to 8+) and measured the acetylation levels as a function of charge state. The average acetylation is weakly but significantly influenced by the charge state for both histones H3 and H4 (Fig. S3); thus, to minimize bias for any quantitation of histone marks, we integrated all charge states, regardless of the modification. The relative abundance of each degree of modification (e.g. 1 ac plus 1 me on H3(1-38)) can be measured by integrating the corresponding XICs, revealing the global methylation and acetylation status of H3 (Fig. S4A) and H4 (Fig. S4B). To evaluate this middle-down approach for profiling histone proteoforms by MS/MS, we mapped the combined H3/H4 marks in HeLa S3 cells, running the same RPLC gradient as above but changing the MS method slightly. While still monitoring a broad mass window to capture full charge-state distributions, we narrowed the window for precursor ion selection down to m/z 555-600, to capture a single charge state [16,29]. The resulting MS/MS spectra were used to identify unique reporter ions for a given proteoform. The characterization of certain proteoforms (e.g. those involving K36me) is greatly simplified when using peptide H3(1-38) compared to the GluC-generated H3(1-50), since reporter ions can often be found in the lower m/z range (e.g. Fig. 4). This range is usually easier to interpret than the higher m/z ranges, which contain multiply charged and possibly overlapping fragment ions.
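The effect of the restricted precursor window can be checked with a short calculation. The Python sketch below converts a neutral peptide mass to m/z for a given charge state and lists the charge states that fall inside the m/z 555 to 600 window. The proton mass constant, the function names and the use of the approximate 4.0 kDa mass for the H3 peptide 1-38 are assumptions made for illustration; modified tails are heavier than this nominal value.

```python
PROTON = 1.007276  # mass of a proton in u (assumed charge carrier)


def mz(neutral_mass, z):
    """m/z of a peptide of the given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z


def charge_states_in_window(neutral_mass, lo, hi, charges=range(1, 15)):
    """Charge states whose m/z falls within [lo, hi]."""
    return [z for z in charges if lo <= mz(neutral_mass, z) <= hi]


def neutral_mass_range(lo, hi, z):
    """Neutral mass range admitted by an m/z window at charge state z."""
    return (z * (lo - PROTON), z * (hi - PROTON))


if __name__ == "__main__":
    print(charge_states_in_window(4000.0, 555.0, 600.0))
    print(neutral_mass_range(555.0, 600.0, 7))
    print(neutral_mass_range(555.0, 600.0, 6))
```

With these assumptions, only the 7+ ion of a 4.0 kDa peptide falls in the window, and the window at 7+ admits neutral masses from roughly 3.88 to 4.19 kDa, which leaves room for the mass added by several modifications.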
The relative abundance of these reporter ions was used to allocate the relative abundance level of the MS charge state distributions to the appropriate proteoform. All identified proteoforms are listed in Table S1 (H3.2) and Table S2


(H4), and the results are aggregated in Fig. 5 according to modification site. Compared to the degree of H3/H4 modification seen in calf thymus histones, histones H3/H4 from HeLa S3 cells show a lower degree of acetylation (Fig. S4 and Fig. S5). Collectively we mapped 204 combinatorial states in a single analysis. The observed modification patterns for H3 and H4 generally agree well with previously reported data for histones derived from HeLa cell lines4,42,43. For example, we return a very similar degree of lysine acetylation in H3, where acetylation abundance decreases in the order K23 > K14 > K18. However, we did observe a slightly higher abundance of H3K36me marks. This mark was readily measurable in our workflow by calculating the relative abundance of differently methylated z2 ions and z3 ions (Fig. 4). We could not identify H3K4me3 in our profiling, but we note that this modification is of low abundance in HeLa cells as noted by others 44

. A suitably optimized analysis would be required for detection. We did not detect the

H4K31ac nor the H4K31me1 marks on H41-32 tails, which is consistent with an earlier top-down study on HeLa H4 histones43. More recent analyses have detected the H4K31ac marks45, perhaps suggesting that a modification in the P2 position could prohibit digestion. Encouragingly, modified residues in close proximity to cleavage sites (such as H3K36me or H3K37me) do not appear to negatively affect the cleavage specificity, or generate missed cleavages. H4K31ac may be underestimated in our analysis for sensitivity reasons. Further manual analysis would likely increase the depth of proteoform coverage in these samples, as would improved ETD technologies on newer instruments. The computational profiling in all middle-down methods still lack robustness, as recently reviewed by Sidoli et al.44. Improved automated profiling will need newer algorithms, but ultimately we are limited by the complexity of chimeric spectra arising from chromatographically-unresolved combinatorial states.
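The two quantitation steps described above (integrating XICs over all charge states for each modification state, then apportioning that abundance among co-isolated proteoforms according to reporter-ion intensities) can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the data layout and all numerical values are hypothetical and chosen only to show the arithmetic.

```python
from collections import defaultdict


def relative_abundance(xic_areas):
    """Sum XIC areas over all charge states, then normalize per modification state.

    xic_areas maps (modification_state, charge) -> integrated XIC area.
    Returns modification_state -> fraction of the total signal.
    """
    totals = defaultdict(float)
    for (state, _charge), area in xic_areas.items():
        totals[state] += area  # integrate all charge states, not just one
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no signal to normalize")
    return {state: area / grand_total for state, area in totals.items()}


def allocate_to_proteoforms(state_abundance, reporter_intensities):
    """Split one modification state's abundance among proteoforms.

    reporter_intensities maps proteoform -> summed intensity of its unique
    reporter (fragment) ions in the MS/MS spectrum of that state.
    """
    total = sum(reporter_intensities.values())
    if total <= 0:
        raise ValueError("no reporter-ion signal")
    return {
        proteoform: state_abundance * intensity / total
        for proteoform, intensity in reporter_intensities.items()
    }


if __name__ == "__main__":
    # Hypothetical XIC areas (arbitrary units) for two modification states.
    xic = {
        ("1ac", 6): 10.0, ("1ac", 7): 30.0,
        ("0ac", 6): 20.0, ("0ac", 7): 40.0,
    }
    states = relative_abundance(xic)
    # Hypothetical reporter-ion intensities for two proteoforms sharing "1ac".
    shares = allocate_to_proteoforms(states["1ac"], {"A": 3.0, "B": 1.0})
    print(states, shares)
```

Summing over charge states before normalizing is what removes the charge-dependent acetylation bias noted for Fig. S3; normalizing each charge state separately would reintroduce it.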

The analyses we described above reflect a blended sample to illustrate simultaneous analysis, but other histone preparations can generate isolated histones, including the preparative RPLC method we used to formulate the H3/H4 blends. Such isolations would further improve characterization, but one method particularly useful for smaller samples involves conventional gel-based proteomics workflows. We tested if neprosin could be applied to an in-gel digestion of histones, as in a typical trypsin-based approach. H3 was cut out from a 15% SDS-PAGE separation of crude histones from HeLa S3 cells (Fig. S6A), digested overnight with neprosin and analyzed using the same methodology as in Fig. 3. We observed that peptide H31-38 was highly abundant, showing extensive modification at the MS1 level from an injection of 600 u) are shown as bars. A legend for known PTM sites is provided. Displayed PTMs are chosen based on the commonly investigated PTMs in middle-down based MS studies of histones H3 and H4. A comprehensive overview of all known histone PTMs is given by Huang et al.49.

Figure 2. The dominant cleavage products of recombinant histones treated with neprosin, for (A) H3 with the charge state distribution of H31-38 generated at 10.3 min, and (B) H4 with the charge state distribution of H41-32 at 14.6 min. N-terminal tails H31-38 and H41-32 are annotated in red.

Figure 3. Reversed-phase LC/MS heat map of all modified forms of peptides H31-38 (charge state 7+) and H41-32 (charge state 6+) grouped according to modification state, based on MS/MS characterizations. Baseline-separated acetylation states are boxed in green and blue for H3 and H4, respectively.

Figure 4. Deconvoluted chimeric product ion spectrum (ETD fragmentation) of a multiply modified H31-38 peptide from HeLa S3 cells (m/z 570.481; z = 7; RT = 19.5 min), showing the unambiguous identification of the proteoform K23acK27me3, but revealing the additional presence of mono-, di- and trimethylated K36, based on high-resolution low-mass fragment ions (expanded spectrum).

Figure 5. Middle-down quantitation of histone marks for H3.2 (A) and H4 (B) derived from HeLa S3 cells, identified in a single LC-MS run. The data for the 204 marks aggregated in this figure are shown in Tables S1 and S2. The legend for histone marks is provided.

Figure 6. Quantitation of H3K27 methylation in H31-38 proteoforms after incubation with polycomb repressive complex 2 (PRC2), a protein complex that presents the methyltransferase EZH2 in an active form. (A) Methylation kinetics as a function of methylation state, showing the relative abundances of the unmethylated form (me0, red circles), the monomethylated form (me1, turquoise circles), the dimethylated form (me2, green circles) and the trimethylated form (me3, black circles). The cumulative yield is shown as grey area. Error bars represent the standard deviation of three technical replicates. (B) XICs for the 8+ charge state (time point: 180 min), showing increases in retention time for increasingly methylated peptide H31-38.
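As a rough arithmetic check on the precursor selection window, the sketch below takes the precursor quoted in the Figure 4 legend (m/z 570.481, z = 7) and asks which of the H3 charge states tracked in this work (6+ to 9+) would fall inside an m/z 555–600 window. It assumes a simple protonated ion, [M + zH]z+, with a proton mass of 1.007276 Da; the function names are hypothetical and the calculation is illustrative only.

```python
PROTON_MASS = 1.007276  # Da; assumed charge carrier for [M + zH]z+ ions


def neutral_mass(mz, z):
    """Neutral mass M implied by an observed m/z at charge z."""
    return z * (mz - PROTON_MASS)


def mz_at_charge(mass, z):
    """m/z expected for neutral mass M carrying z protons."""
    return mass / z + PROTON_MASS


def charges_in_window(mass, low, high, charges):
    """Charge states whose m/z falls inside the window [low, high]."""
    return [z for z in charges if low <= mz_at_charge(mass, z) <= high]


if __name__ == "__main__":
    mass = neutral_mass(570.481, 7)  # about 3986.3 Da
    for z in range(6, 10):
        print(z, round(mz_at_charge(mass, z), 3))
    # Only the 7+ ion lies between m/z 555 and 600.
    print(charges_in_window(mass, 555.0, 600.0, range(6, 10)))  # [7]
```

Under these assumptions the neighboring 6+ and 8+ ions sit near m/z 665 and m/z 499, well outside the window, which is consistent with the window capturing a single charge state for this peptide.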
