

Article

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00834 • Publication Date (Web): 03 May 2018



Journal of Proteome Research

Chemical-mediated digestion: an alternative realm for middle-down proteomics?

Kristina Srzentić1£, Konstantin O. Zhurov1, Anna A. Lobas2, Gennady Nikitin1, Luca Fornelli1£, Mikhail V. Gorshkov2,3, and Yury O. Tsybin4*

1 Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2 V. L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Leninsky Prospect 38, 119334 Moscow, Russia
3 Moscow Institute of Physics and Technology (State University), 9 Institutskiy per., 141707 Dolgoprudny, Moscow Region, Russia
4 Spectroswiss, EPFL Innovation Park, 1015 Lausanne, Switzerland
£ Present address: Northwestern University, 2170 Campus Drive, Evanston, IL 60208, USA

Correspondence should be addressed to Dr. Yury O. Tsybin, Spectroswiss Sàrl, EPFL Innovation Park, Building I, 1015 Lausanne, Switzerland. E-mail: [email protected]  

Running title: Chemical cleavage-based middle-down approach

ACS Paragon Plus Environment




Abstract

Protein digestion in mass spectrometry (MS)-based bottom-up proteomics targets mainly lysine and arginine residues, yielding primarily 0.6-3 kDa peptides for the proteomes of organisms of all major kingdoms. Recent advances in MS technology enable high-throughput analysis of complex mixtures of increasingly longer (>3 kDa) peptides, supporting the development of a middle-down proteomics (MDP) approach. Generating longer peptides is a paramount step in launching an MDP pipeline, but the search for a cleaving agent that would provide the desired 3-15 kDa peptides remains open. Recent bioinformatics studies have shown that cleavage at rarely occurring amino acid residues such as methionine (Met), tryptophan (Trp) or cysteine (Cys) would be suitable for the MDP approach. Interestingly, chemical-mediated proteolytic cleavage uniquely allows targeting of these rare amino acids, for which no specific proteolytic enzymes are known. Herein, as potential candidates for MDP-grade proteolysis, we have investigated the performance of chemical agents previously reported to target primarily Met, Trp, and Cys residues: CNBr, BNPS-Skatole (3-bromo-3-methyl-2-(2-nitrophenyl)sulfanylindole), and NTCB (2-nitro-5-thiocyanatobenzoic acid), respectively. Figures of merit such as digestion reproducibility, peptide size distribution and occurrence of side reactions are discussed. The NTCB-based MDP workflow demonstrated particularly attractive performance, and NTCB is put forward here as a potential cleaving agent for further MDP development.

Keywords: middle-down, MD; mass spectrometry, MS; Fourier transform mass spectrometry, FTMS; Orbitrap; chemicals; chemical cleavage


Introduction

Proteolysis is the main avenue for in-solution and in-gel digestion of complex protein mixtures for mass spectrometry (MS)-based proteomics.1-5 The most common proteomic approach is bottom-up proteomics (BUP), which employs enzymes that digest proteins at frequently occurring amino acid residues. As a result, BUP-grade enzymes generate peptides that are mainly lighter than 3 kDa. Trypsin is by far the most widely used enzyme in modern BUP workflows, cleaving at the C-terminus of lysine and arginine residues. Almost exclusive specificity and excellent proteome sequence coverage are among the advantages offered by the trypsin-based BUP approach. On the other hand, BUP enzymes produce an extremely high number of short peptides (fewer than 10 amino acids long), which exceeds the ability of MS to analyze them all, even when MS is aided by up-front fractionation techniques.6, 7 Proteome analysis with enzymatically derived peptides larger than 3 kDa has recently emerged in response to the need for in-depth biological studies on the one hand and to MS technological advances on the other. MS-based analysis of 3-15 kDa peptides is generally referred to as the middle-down proteomics (MDP) approach.8-10 Furthermore, the analysis of 3-7 kDa peptides is sometimes referred to as the extended bottom-up proteomics (eBUP) approach.11, 12

Previous reports described proteases such as Lys-C, Glu-C, Sap911 or OmpT8 as the ‘way to go’ for generating large peptides in the mass range suitable for the MDP and eBUP approaches. However, the protease-based approaches attempted so far either produced only slightly longer peptides than the trypsin-based bottom-up approach, e.g., Lys-C,13 Lys-N14 and Glu-C,15 or their anticipated dibasic amino acid specificity was found to be non-exclusive, e.g., Sap9 and OmpT.8, 11 Nevertheless, some of these proteases have demonstrated attractive performance in selected applications, as in the targeted analysis of monoclonal antibodies with Sap912 or histone analysis with Glu-C.16-18 An alternative route to larger peptides with proteases, including trypsin, is the controlled restriction of the proteolysis reaction time.19, 20 The reproducibility and protein structure dependence of this restricted (limited) proteolysis approach remain to be established, including its capacity to provide protein structural information beyond sequencing.21-23 Previously, we reported a bioinformatics study on the potential of each amino acid residue for generating MDP-sized peptides.24 Amino acids infrequently represented in the examined proteomes, including Met, Trp, and Cys, emerged in that study as viable candidates for the development of an MDP platform. Interestingly, proteases already benchmarked for MS-based proteomics or potentially suitable for proteomic applications do not target these residues. On the other hand, chemical-mediated digestion of proteins at these specific amino acids has been reported previously.25, 26

Although digestion with chemical agents essentially performs the same type of reaction as proteases, i.e., hydrolysis of peptide bonds in proteins, this alternative approach has been applied exclusively in a targeted fashion, to cleave selected, purified proteins. Examples include obtaining protein sequence information, aiding the elucidation of protein higher-order structure, and protein engineering.27 For instance, cleavage after Met using cyanogen bromide (CNBr) is one of the more commonly employed chemical cleavage methods; it has been reported to induce limited side reactions and to give yields of 90-100%.28 Vestling et al. reported cleavage at the C-terminal side of Trp residues using 3-bromo-3-methyl-2-(2-nitrophenyl)sulfanylindole, also known as BNPS-Skatole.26 There have been periodic attempts to use Cys as a target cleavage site for the analysis of proteins with particular structures or structure-related problems. However, most of these studies were limited to investigating a particular introduced modification, such as dehydroalanine formation on cysteines, in a single protein (e.g., serum albumin) or in a couple of proteins.29 Other studies considered particular groups of proteins, such as hydrophobic proteins, with CNBr-assisted digestion.30, 31

Finally, some studies optimized a single reaction step or an entire reaction with the aim of explaining the underlying organic-chemistry mechanism. To the best of our knowledge, with the exception of acid hydrolysis, for example using formic acid,9, 32-34 the utility of chemical digestion has not yet been investigated in proteomic-type studies.35 Moreover, the cleavage efficiency of the chemical methods listed above, in terms of their capability to produce peptides in the desired mass range, has not been evaluated against enzymatic procedures for the proteolysis of simple protein mixtures analyzed by MS. One of the main reasons for these restrictions on chemical-mediated applications in proteomics might lie in the fact that chemical agents generate large peptides, which could not be investigated in a high-throughput manner until recently owing to technological limitations in MS and tandem MS (MS/MS). High-resolution mass spectrometers, such as those equipped with Orbitrap or time-of-flight mass analyzers, have been adapted for high-throughput characterization of large biomolecules only in the last decade.36, 37 Additionally, until recently, comprehensive information about the average size distribution of peptides obtainable by different chemical cleavages was missing. Furthermore, proteases have been favored over chemicals owing to their non-toxic nature. Finally, cleavage of proteins with small molecules, unlike enzymatic cleavage, may entail significant modification of the side chains of the targeted residues. This feature changes the cleavage rules that must be supplied to protein database search algorithms.

Nowadays, adequate bioinformatics platforms,38-40 mainly developed for top-down (TD) proteomics,41 may address the analysis of this type of peptide by creating specific databases, dealing with complex tandem mass spectra obtained by the fragmentation of large polypeptides, and accounting for possible unexpected modifications.42-44 Therefore, current experimental and bioinformatics capabilities, combined with a growing understanding of BUP limitations, create appropriate momentum for evaluating the performance of a chemical-mediated MDP approach. Here, we first report protocol refinements for chemical cleavage at the three aforementioned amino acid residues. We then describe the results of high-resolution MS analysis of the digestion products of a seven-protein mixture obtained with different chemical agents, and benchmark a selected chemical agent, 2-nitro-5-thiocyanatobenzoic acid (NTCB), which appears more suitable for future large-scale MDP applications.

Experimental methods

Sample preparation. An equimolar (100 µM) model protein mixture consisting of yeast enolase 1 and 2, bovine apo-transferrin, serum albumin, pancreatic ribonuclease A, chicken egg white lysozyme (all from Sigma Aldrich, St. Louis, MO) and bovine carbonic anhydrase 2 (Protea Biosciences, Morgantown, WV) was prepared in 6.8 M urea in 100 mM ammonium bicarbonate buffer (pH 7.8). An aliquot of the mixture was set aside for digestion with NTCB, and the remainder was reduced with dithiothreitol (DTT, 5 mM final concentration) at 50°C for 1 hr, followed by 45 min alkylation at room temperature in the dark with iodoacetamide (18 mM final concentration). The sample was then split into aliquots (each containing 1 nmol of the protein mixture), dried completely in a SpeedVac concentrator (Eppendorf, Hamburg, Germany) and resuspended in the appropriate buffer for each chemical digestion procedure. For all procedures, digestion was carried out on three sample replicates. All cleavage procedures employed were based on previously tailored protocols25, 45-48 and are summarized below.

N-terminal Cys cleavage with NTCB. Aliquots containing 1 nmol of protein mixture were treated with a 10-fold molar excess of DTT, sealed under N2 and incubated for 1 hr at room temperature. After reduction, samples were buffer-exchanged into 200 mM Tris acetate (pH 8) and subjected to a labeling step with NTCB solution (Millipore Sigma, Merck, Darmstadt, Germany, cat. no. N7009, prepared in Tris acetate buffer), which was added in a 10-fold molar excess over all thiol groups in the reaction mixture (920 nmol of NTCB in total) and incubated for 30 min at 40°C. Samples were purified by overnight acetone precipitation prior to the digestion step. The precipitated pellet was resuspended in 100 mM sodium borate buffer (pH 9) and diluted with an equal volume of water, and digestion was carried out for 1 hr at 50°C. After 1 hr, an aliquot was used to check the pH, which was adjusted back to 9 with 2 mM NaOH. Digestion proceeded for another hour at 50°C. Note that the NTCB reagent is harmful and may cause acute toxicity (category 4) if swallowed or inhaled. Even in small amounts, NTCB may cause skin, eye, and respiratory irritation.
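The reagent amounts in this protocol follow from molar-excess arithmetic. The minimal sketch below, with helper names of our own choosing, back-calculates the thiol content implied by the stated 920 nmol of NTCB at a 10-fold excess (92 nmol of thiol groups per aliquot), counts thiols from sequences under the assumption that every Cys is reduced and free, and converts a reagent amount to a mass to weigh, assuming an average molecular weight of 224.19 g/mol for NTCB (C8H4N2O4S).

```python
def thiols_from_sequences(sequences, nmol_each=1.0):
    """Thiol groups (nmol) from free Cys in an equimolar set of proteins.

    Assumes every Cys is reduced and free (no retained disulfides).
    """
    return sum(seq.count("C") for seq in sequences) * nmol_each


def reagent_nmol(thiol_nmol, fold_excess=10.0):
    """Reagent amount (nmol) giving the requested molar excess over thiols."""
    return thiol_nmol * fold_excess


def implied_thiol_nmol(reagent_total_nmol, fold_excess=10.0):
    """Back-calculate the thiol amount (nmol) implied by a reagent amount."""
    return reagent_total_nmol / fold_excess


def reagent_mass_ug(amount_nmol, mw_g_per_mol=224.19):
    """Mass in micrograms: nmol x g/mol gives ng, so divide by 1000."""
    return amount_nmol * mw_g_per_mol / 1000.0


if __name__ == "__main__":
    thiols = implied_thiol_nmol(920.0)      # thiol nmol implied per aliquot
    print(thiols, reagent_mass_ug(920.0))   # and the mass of 920 nmol NTCB
```

With the stated numbers, 920 nmol of NTCB corresponds to roughly 206 µg of reagent.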

C-terminal Met cleavage with CNBr. Reduced and alkylated pellets containing 1 nmol of protein were resuspended in an aqueous solution of 75% (v/v) formic acid, and acetonitrile was then added to 1/20 of the final reaction volume. CNBr (Millipore Sigma, Merck, Darmstadt, Germany, cat. no. 481432) was added as crystals, weighed to provide the number of moles required for a 200:1 CNBr:Met molar ratio (9 mmol final in the reaction mixture). Samples were carefully vortexed, covered with aluminum foil and incubated for 24 hr at room temperature in a chemical fume hood. After digestion, 10 volumes of Milli-Q water were added to the samples, which were then dried completely in a dedicated SpeedVac evaporator. It should be explicitly noted that CNBr is harmful and causes acute toxicity if swallowed, inhaled or in contact with skin (categories 1, 2, and 3). CNBr liberates a very toxic gas in contact with acids and is hazardous to the aquatic environment. CNBr is corrosive and, even in small amounts, may cause serious eye damage and respiratory and skin irritation.
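The CNBr amount is set by the Met content of the sample. As a hedged sketch (function names are ours, and the Met count is an input the reader must supply from the sequences of the proteins actually present), the amount and mass of CNBr for a given CNBr:Met ratio can be computed as follows, using the average molecular weight of BrCN (105.92 g/mol). Whether to count an initiator Met, which is often removed in mature proteins, is left as an explicit option.

```python
BRCN_MW = 105.922  # g/mol, average molecular weight of cyanogen bromide


def met_nmol(sequences, nmol_each=1.0, drop_initiator=False):
    """Met residues (nmol) in an equimolar set of protein sequences.

    drop_initiator=True ignores an N-terminal Met; this is an assumption
    about the mature protein forms, not something stated in the protocol.
    """
    total = 0
    for seq in sequences:
        count = seq.count("M")
        if drop_initiator and seq.startswith("M"):
            count -= 1
        total += count
    return total * nmol_each


def cnbr_umol(met_amount_nmol, ratio=200.0):
    """CNBr (umol) needed for a given CNBr:Met molar ratio."""
    return met_amount_nmol * ratio / 1000.0


def cnbr_mass_mg(amount_umol, mw=BRCN_MW):
    """umol x g/mol gives ug; divide by 1000 for mg."""
    return amount_umol * mw / 1000.0
```

For example, a hypothetical sample containing 10 nmol of Met would need 2 µmol (about 0.21 mg) of CNBr at the 200:1 ratio.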

C-terminal Trp cleavage with BNPS-Skatole. Reduced and alkylated pellets containing 1 nmol of protein were resuspended in 20 µl of 1% acetic acid. BNPS-Skatole (1 mg/ml; Millipore Sigma, Merck, Darmstadt, Germany, cat. no. B4651) was prepared in 1% glacial acetic acid immediately before use and added to the protein sample (3:1 volume parts BNPS-Skatole:protein solution, 60 µl total). The reaction mixture was incubated for 1 hr at 37°C. BNPS-Skatole was precipitated from the solution by addition of an equal volume of Milli-Q water, followed by centrifugation for 5 min at 12000 rpm at room temperature. The supernatant was transferred to a new tube and lyophilized for further analysis. Note that the BNPS-Skatole reagent is harmful if swallowed and may cause skin, eye, and respiratory irritation.

C-terminal Trp cleavage with o-iodosobenzoic acid. The digestion solution was prepared by dissolving o-iodosobenzoic acid (Millipore Sigma, Merck, Darmstadt, Germany, cat. no. I8000) in 1 ml of 80% (v/v) acetic acid/4 M guanidine-HCl/2% (v/v) p-cresol to a final concentration of 10 mg/ml. This solution was first incubated for 2 hrs at room temperature. Reduced and alkylated pellets containing 1 nmol of protein were resuspended in the digestion solution in a tube flushed with nitrogen and incubated for 24 hrs in the dark at room temperature. The reaction was terminated by adding 10 volumes of Milli-Q water and drying the samples in a SpeedVac concentrator. Note that o-iodosobenzoic acid is harmful if swallowed and may cause skin, eye, and respiratory irritation.

LC-MS/MS analysis. All peptides obtained through the different chemical digestion procedures were subjected to pooled C4-C18 stage tip clean-up with ZipTip cartridges (Millipore, Billerica, MA) prior to LC separation, as described previously.12 Approximately 8 pmol of peptide mixture was loaded onto a C8 trap column (Dionex, 2 cm, 100 Å, 5 µm particles, 75 µm i.d.) for 10 min with 0.1% formic acid (FA) at a flow rate of 8 µL/min. Reversed-phase nano-LC was performed using a Dionex Ultimate 3000 system (Thermo Scientific, Bremen, Germany) equipped with a C8 column (Dionex, Acclaim PepMap300, 150 mm, 300 Å, 5 µm, 75 µm i.d.). Solvent A was 0.1% FA in water, and solvent B was 50% MeOH, 20% ACN, 10% TFE, and 0.1% FA. The percentage of organic phase was increased from 5 to 60% over 60 min in all analyses. For all chemical procedures employed, each digestion replicate was analyzed in three consecutive technical replicates. The outlet of the chromatographic column was coupled online to a nanoelectrospray ionization (ESI) source (Nanospray Flex ion source, Thermo Scientific) equipped with a metallic emitter to which a 2.2 kV potential was applied. Mass spectrometric analysis was performed on a hybrid high-field LTQ Orbitrap Elite FTMS (Thermo Scientific). In all LC-MS/MS runs, the survey scan was performed at 60'000 resolution (at m/z 400) in the Orbitrap FTMS, with the automatic gain control (AGC) target set at 1e6. Dynamic exclusion was enabled with a 60 s duration. Isolated precursor ions were subjected to higher-energy collisional dissociation (HCD), with singly and doubly charged precursor ions excluded from triggering MS/MS events. The AGC (number of charges) target value for MS/MS events was set to 5e4. HCD was performed in a top-5 data-dependent mode with product ion detection in the Orbitrap FTMS operating at 15'000 resolution (at m/z 400) with 3 microscans per scan. Normalized collision energy (NCE) was set at 27% (default charge state: 3+).49 The signal-to-noise (S/N) threshold for triggering an MS/MS event was set to 15'000 (relative intensity units) in all experiments. Collision-induced dissociation (in the form of HCD) was chosen as the method of ion activation and dissociation (MS/MS) owing to its widespread and preferential use in BUP, and because it was expected to meet the objective of the current study: to define the precise chemistry (including identification of by-products) of the proposed chemical-mediated digestion methods.
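The data-dependent acquisition settings above amount to a selection rule applied to each survey scan. The Python sketch below is a simplified model of that rule under our own assumptions, not the instrument's firmware logic. It keeps precursors of charge 3+ or higher whose signal reaches the 15'000 threshold, skips m/z values already selected within the 60 s dynamic-exclusion window (matched within an assumed 10 ppm tolerance), and picks the five most intense. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float
    charge: int
    signal: float  # relative intensity units from the survey scan


def select_precursors(survey, now_s, exclusion, top_n=5, min_charge=3,
                      min_signal=15000.0, exclusion_s=60.0, tol_ppm=10.0):
    """Simplified top-N data-dependent precursor selection.

    survey: precursors from one survey scan.
    exclusion: {m/z: time (s) when it was last selected}; not mutated.
    Returns (selected precursors, updated exclusion dict).
    """
    # Keep only exclusion entries younger than the exclusion duration.
    active = {mz: t for mz, t in exclusion.items() if now_s - t < exclusion_s}

    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in active)

    # Singly and doubly charged ions never trigger MS/MS (min_charge=3).
    candidates = [p for p in survey
                  if p.charge >= min_charge
                  and p.signal >= min_signal
                  and not is_excluded(p.mz)]
    candidates.sort(key=lambda p: p.signal, reverse=True)
    selected = candidates[:top_n]
    for p in selected:
        active[p.mz] = now_s  # excluded for the next exclusion_s seconds
    return selected, active
```

Returning a new exclusion dictionary instead of mutating the caller's one keeps each survey-scan step easy to test in isolation.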

Data processing. Theoretical feature distributions and in-silico digestions of the non-redundant UniProt protein databases of human, yeast (Saccharomyces cerevisiae), and bacteria (Escherichia coli) were computed using an in-house Python interface built on the pyteomics library.50 Peptide size distributions were determined for the amino acid cleavage sites currently targeted for MDP (dibasic, Asp, Gln/Glu, Trp, Met, and Cys). The acquired .raw files were peak-picked using ReadW, centroided and converted to mzXML format for deconvolution with an S/N threshold of 3.
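The in-silico digestion step can be illustrated without the in-house interface. The sketch below does not use pyteomics. It encodes simplified cleavage rules of our own as zero-width regular expressions (C-terminal to Lys/Arg for trypsin, ignoring the proline rule; C-terminal to Met and to Trp; N-terminal to Cys), splits each sequence, and bins peptide monoisotopic masses in 1 kDa steps. The rule names and bin width are assumptions, and the dibasic, Asp and Gln/Glu rules are omitted. Splitting on zero-width patterns requires Python 3.7 or later.

```python
import re
from collections import Counter

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Zero-width split points (simplified cleavage rules).
RULES = {
    "trypsin": r"(?<=[KR])",  # after Lys/Arg; proline rule ignored
    "Met": r"(?<=M)",         # after Met (CNBr)
    "Trp": r"(?<=W)",         # after Trp (BNPS-Skatole)
    "Cys": r"(?=C)",          # before Cys (NTCB)
}


def digest(sequence, rule):
    """Complete digestion (no missed cleavages); empty pieces dropped."""
    return [p for p in re.split(RULES[rule], sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def size_histogram(sequences, rule, bin_kda=1.0):
    """Peptide counts per mass bin; key k covers [k, k+1) x bin_kda kDa."""
    counts = Counter()
    for seq in sequences:
        for pep in digest(seq, rule):
            if all(aa in MONO for aa in pep):  # skip U, X and other codes
                counts[int(peptide_mass(pep) / 1000.0 // bin_kda)] += 1
    return dict(sorted(counts.items()))
```

Applied to whole FASTA proteomes, histograms of this kind are what allow the rare-residue rules to be compared with dibasic cleavage.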

Database search. Data were searched in ‘PTM discovery mode’ with MS-Align+ against a custom database containing the primary sequences of the seven proteins used in the experiments. Precursor tolerance was set to 10 ppm, and product ion tolerance to 0.1 Da. In all cases except the Cys-based protocol, carbamidomethylation of cysteine residues was set as a fixed modification. Detailed results for each tested chemical digestion approach are available as Supporting Information (Tables S1-S4) and include the MS-Align+ output tables (lists of identified peptides).
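The search tolerances translate into simple acceptance tests. The sketch below uses our own helper names and is not MS-Align+ code. It shows that the 10 ppm precursor window scales with mass (0.05 Da at 5 kDa) while the 0.1 Da fragment window is absolute. It also shows how the fixed carbamidomethylation (+57.02146 Da per Cys) is switched off for the Cys-targeting protocol, whose samples were not alkylated.

```python
CARBAMIDOMETHYL = 57.02146  # Da, iodoacetamide adduct on Cys


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed, theoretical, tol_ppm=10.0):
    """Relative (ppm) tolerance, as used for precursor masses."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed, theoretical, tol_da=0.1):
    """Absolute (Da) tolerance, as used for product ions."""
    return abs(observed - theoretical) <= tol_da


def with_fixed_cam(unmodified_mass, sequence, enabled=True):
    """Add carbamidomethylation to every Cys.

    Pass enabled=False for the NTCB protocol, whose Cys are not alkylated.
    """
    if not enabled:
        return unmodified_mass
    return unmodified_mass + sequence.count("C") * CARBAMIDOMETHYL
```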


Data analysis. Manual validation was performed to confirm or elucidate expected and unexpected modifications on residue side chains. For that purpose, the search algorithm results were exported to Excel, where the mass shifts of all protein species were logged, along with the purported mass shift localization data. The mass shifts were split into cleavage-inducing and non-cleavage-inducing groups (e.g., +25 Da at the N-terminus, associated with formation of an ITZ peptide upon cleavage N-terminal to a cysteine, vs. +16 Da on methionine, associated with methionine oxidation). Next, the mass shifts were evaluated for their plausibility under the expected chemistry of a given chemical method. Notably, a significant percentage of mass shifts corresponded to sums of multiple mass shifts within a region of a peptide that could not be localized individually owing to a lack of ion assignments (e.g., 41 Da = 25 Da + 16 Da, commonly occurring at the N-terminal end of a peptide with a nearby methionine). In such cases, manual ion assignment was carried out on the .raw data, with the aid of Protein Prospector, in an attempt to localize the individual mass shifts. Furthermore, the output was processed further in cases where the protein-spectrum match (PrSM)38 with the highest E-score was clearly misassigned (i.e., contained short peptide sequences on either side of a truncation with mass shifts of several thousand Da). In certain instances, the software identified the correct net mass shift but assigned the contributing individual mass shifts to a wrong residue, so that the matched peptide sequence had one residue more or one fewer than the true peptide. In all such cases, the .raw data were analyzed manually in a ‘de novo’ approach to verify the proposed alternative assignments.
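The decomposition of non-localized net mass shifts into sums of expected shifts (e.g., 41 Da = 25 Da + 16 Da) can be automated as a first pass before manual validation. The sketch below enumerates combinations of up to two shifts, taken from the values quoted in this article, and keeps those within an assumed 0.02 Da tolerance. The short labels, the tolerance and the function name are our choices. "Met-ox" uses the standard monoisotopic mass of one added oxygen (15.995 Da) where the text gives +16 Da.

```python
from itertools import combinations_with_replacement

# Mass shifts (Da) as quoted in the text, plus Met oxidation (+O).
KNOWN_SHIFTS = {
    "ITZ": 24.995,     # NTCB cleavage, N-terminal iminothiazolidine
    "DHA": -33.987,    # NTCB, dehydroalanine without cleavage
    "HEL": -48.003,    # CNBr cleavage, homoserine lactone
    "Hse": -29.992,    # CNBr, homoserine without cleavage
    "DAS": 13.979,     # BNPS-Skatole cleavage, spirolactone
    "Oia": 15.994,     # BNPS-Skatole, oxindolylalanine without cleavage
    "Met-ox": 15.995,  # methionine oxidation
}


def explain_shift(delta, max_terms=2, tol=0.02):
    """List combinations of known shifts whose sum lies within tol of delta.

    Returns (combination, summed shift) pairs, closest match first.
    """
    hits = []
    names = sorted(KNOWN_SHIFTS)
    for n in range(1, max_terms + 1):
        for combo in combinations_with_replacement(names, n):
            total = sum(KNOWN_SHIFTS[name] for name in combo)
            if abs(total - delta) <= tol:
                hits.append((combo, round(total, 3)))
    hits.sort(key=lambda hit: abs(hit[1] - delta))
    return hits
```

For a +40.990 Da shift this returns both ITZ + Met-ox and ITZ + Oia, because Met oxidation and Trp oxidation to oxindolylalanine differ by only about 1 mDa. Ambiguities of this kind illustrate why manual ion assignment remained necessary.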


Results and Discussion

Selection of a protein cleavage site and a reagent. The peptide mass distributions from in-silico digestion of a human, yeast, or bacterial proteome indicate that cleavage at the rarely occurring residues (Met, Cys, and Trp) produces peptide distributions similar to those from dibasic cleavage sites (Figure S1, Supporting Information). In terms of obtainable proteome coverage, a theoretical survey shows that targeting any of the three rarely occurring residues yields substantial characterization of the human proteome (Figure 1). Here, proteome coverage is defined as the percentage of all proteins in a given proteome that contain at least one targeted amino acid residue. Proteins lacking a target amino acid residue are divided into two groups: below and above a 30 kDa molecular weight threshold. With current top-down MS capabilities, proteins lighter than 30 kDa can be readily analyzed in their intact form,51-53 whereas larger proteins might require specifically designed, sophisticated top-down data acquisition strategies with reduced throughput.54 Therefore, cleavage at Trp would yield 93.57 % proteome coverage, with another 4.92 % of proteins suitable for high-resolution top-down MS, and only 1.51 % of proteins both lacking Trp and too heavy for high-throughput top-down MS. Backbone cleavage at Cys would yield 97.25 % proteome coverage, with 2.29 % and 0.46 % of proteins lacking Cys in their primary sequence falling below and above 30 kDa, respectively. Notably, cleavage at Met would yield 99.85 % proteome coverage, with only 0.15 % of proteins, all of low (< 30 kDa) molecular weight, lacking Met. Interestingly, Met also emerges as the residue of choice for MDP across kingdoms, yielding 99.85 %, 99.97 %, and 99.98 % proteome coverage for the human, bacterial, and yeast proteomes, respectively (Figure 1 and Figure S2, Supporting Information). To summarize, cleavage at any of the three residues may provide almost complete human proteome coverage, so all three may be considered residues of choice for MDP. The choice of residue therefore depends on the chemistry of a suitable reagent, including its toxicity, specificity, and side reactions. For example, protein backbone cleavage on the amino-terminal side of Cys can be accomplished using 2-nitro-5-thiocyanatobenzoic acid (NTCB), which is known to be less toxic than cyanogen bromide (CNBr), the reagent that cleaves on the carboxy-terminal side of Met. Based on the results presented above, Cys is preferentially considered here as the target residue for MDP.
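The coverage categories used above can be computed directly from a proteome. In the hedged sketch below, each protein is given as a (sequence, molecular weight in Da) pair, with the molecular weight supplied by the caller, for instance from UniProt. The function name and input format are our assumptions, and proteins at exactly the threshold are counted as above it.

```python
def coverage_breakdown(proteins, residue, threshold_da=30000.0):
    """Percentages of proteins that contain `residue`, or that lack it and
    fall below, or at/above, the molecular-weight threshold."""
    proteins = list(proteins)
    if not proteins:
        raise ValueError("empty proteome")
    covered = below = above = 0
    for sequence, mw in proteins:
        if residue in sequence:
            covered += 1
        elif mw < threshold_da:
            below += 1  # could still be analyzed intact by top-down MS
        else:
            above += 1  # lacks the residue and is too heavy for routine TD
    n = len(proteins)
    return {
        "covered": 100.0 * covered / n,
        "lacking_below_threshold": 100.0 * below / n,
        "lacking_above_threshold": 100.0 * above / n,
    }
```

Run over the human proteome with residue "W", a breakdown of this form corresponds to the 93.57 %, 4.92 % and 1.51 % split reported above for Trp.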

Characteristics of proteome digestion at cysteine. To estimate the potential of Cys cleavage for MDP of a human proteome, we compared the NTCB-based protocol with the benchmark BUP protocol, trypsin digestion, with respect to the numbers of proteins and peptides potentially generated (Figure 2). These numbers are estimated as a function of peptide length (a count of amino acid residues). Considering the entire range of peptides (0-150 residues) and no missed cleavages, trypsin may help identify a slightly higher number of proteins than NTCB: 19,955 vs. 19,687, respectively (Figure 2, top panel). Considering only peptides in the MDP range, namely peptides 30-150 residues long, Cys cleavage outperforms trypsin digestion, with 18,966 vs. 15,893 proteins, respectively. Overall, the potential performance of trypsin is impressive for both the 0-30 and 30-n regions, as expected. Nevertheless, Cys cleavage targets comparable numbers of proteins but requires significantly fewer unique peptides to achieve these numbers than trypsin (Figure 2, bottom panel; the NTCB-generated distribution is stacked on top of the trypsin-generated one). The number of unique peptides that trypsin generates in the BUP region (0-30 residues bin) exceeds 550,000 (Figure 2, bottom panel). This large number (which does not account for multiple forms of the same peptide arising from modifications such as post-translational modifications, PTMs, not considered here) makes LC separation challenging without pre-fractionation. Conversely, in the 0-30 residues bin, the number of unique peptides generated by targeting Cys is two orders of magnitude lower than that obtainable with trypsin. As peptide length increases into the MDP range (30-150 residues), the number of peptides generated with NTCB gradually becomes twice that generated with trypsin (Figure 2, bottom panel inset). However, when trypsin and NTCB are compared in their respective designated proteomic approaches, i.e., BUP for the former and MDP for the latter, the total number of unique peptides is significantly reduced (only ~95,000 peptides in the 30-150 residues bin are produced by NTCB), with potential benefits for chromatographic separation. The peptide mass range criterion (3-15 kDa peptides) for protein identification via the MDP approach would be fulfilled if a single missed cleavage occurred upon NTCB digestion (Figures S3 and S4, Supporting Information). Similar trends for NTCB vs. trypsin specificity were found for the yeast and bacterial proteomes (Figure S5, Supporting Information).
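The per-bin counts compared above (unique peptides per length bin, and proteins with at least one peptide in a bin) can be reproduced in outline with the sketch below. It uses simplified cleavage rules of our own (trypsin after Lys/Arg with the proline rule ignored; NTCB before Cys). It treats bins as half-open length ranges [lo, hi), which is our assumption about the bin edges, and it optionally joins adjacent pieces to model missed cleavages. Peptides shared by several proteins are counted once, as unique sequences.

```python
import re

RULES = {"trypsin": r"(?<=[KR])", "NTCB": r"(?=C)"}  # simplified rules


def peptides(sequence, rule, missed=0):
    """All peptides with up to `missed` missed cleavages."""
    pieces = [p for p in re.split(RULES[rule], sequence) if p]
    out = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            out.append("".join(pieces[i:j + 1]))
    return out


def bin_stats(proteome, rule, bins=((0, 30), (30, 150)), missed=0):
    """Per length bin [lo, hi): (unique peptides, proteins with >= 1 peptide).

    proteome: {accession: sequence}.
    """
    unique = {b: set() for b in bins}
    proteins = {b: set() for b in bins}
    for accession, sequence in proteome.items():
        for pep in peptides(sequence, rule, missed):
            for lo, hi in bins:
                if lo <= len(pep) < hi:
                    unique[(lo, hi)].add(pep)
                    proteins[(lo, hi)].add(accession)
    return {b: (len(unique[b]), len(proteins[b])) for b in bins}
```

Raising `missed` to 1 shows how a single missed cleavage shifts NTCB products toward the longer end of the MDP range.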


Overall, the results strongly indicate that cleavage at Cys may satisfy the needs of the MDP approach: i) peptides longer than 30 residues provide almost complete proteome coverage; ii) the number of unique peptides in the 30-n residues bin for MDP is more than five-fold smaller than that in the 0-30 residues bin for the trypsin-based BUP approach; and iii) according to the literature, specific cleavage at the amino-terminal side of Cys may be obtained with NTCB-mediated digestion.

Cleavage rules for chemical-mediated digestion. The reaction mechanisms of protein digestion with the small molecules considered here have been proposed previously and are known to differ from those of proteases. Most notably, this type of hydrolysis introduces a modification to the side chain of the targeted residue upon cleavage, and a characteristic side-chain modification is introduced even when no cleavage occurs. NTCB-mediated digestion of proteins45, 55, 56 proceeds either toward cleavage at the amino-terminal side of Cys with formation of an iminothiazolidine (ITZ) peptide, or toward formation of dehydroalanine (DHA) without backbone cleavage, as depicted in Scheme 1. The products of all chemical-mediated digestions considered here are summarized in Scheme 2, where each product is shown next to the corresponding original residue (Cys, Met, or Trp), with the modified part of the residue highlighted. In the case of NTCB digestion, ITZ-peptide formation adds 24.995 Da (the mass difference between CN and H) to the N-terminal side of the resulting peptide.


Missed cleavage with dehydroalanine formation is accompanied by the loss of an SH2 group and thus leads to a mass loss of 33.987 Da (Scheme 2, top panel). Similarly to cleavage at Cys residues, digestion of proteins targeting Met residues with CNBr leads either to peptide backbone cleavage with formation of a homoserine lactone (HEL peptide)57 or to a missed cleavage resulting in formation of homoserine (Scheme S1, Supporting Information). Additionally, CNBr may induce side reactions; for example, it cleaves at the N-terminal side of Trp residues (Scheme S1, right panel, Supporting Information). As shown in Scheme 2, middle panel, CNBr digestion products show a loss of 48.003 Da (SCH4 group) when a HEL peptide is formed, and a loss of 29.992 Da (the mass difference between SCH2 and O) when the reaction proceeds with homoserine formation. A third chemical digestion approach considered here entails BNPS-skatole-mediated cleavage at Trp residues. This results either in formation of a dioxindole alanine spirolactone (DAS peptide) following peptide backbone cleavage, or in formation of oxindolylalanine when cleavage is omitted and an oxidation reaction takes place (Scheme S2, Supporting Information). Formation of a DAS peptide upon BNPS-skatole digestion adds 13.979 Da (the mass difference between O and H2) to the N-terminal side of Trp (Scheme 2, bottom panel). In contrast, oxindolylalanine formation is accompanied by a shift of +15.994 Da from an additional oxygen atom attached to the Trp side chain, as previously reported by Vestling et al.26
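The mass shifts listed above follow directly from the elemental compositions gained or lost. The sketch below recomputes them from standard monoisotopic element masses. Those masses are typed in here as an assumption and are not taken from the paper, and the dictionary keys are illustrative labels only.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
MONO = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}


def mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def shift(gained=None, lost=None):
    """Net mass shift (Da) for gaining one composition and losing another."""
    return mass(gained or {}) - mass(lost or {})


SHIFTS = {
    "itz": shift({"C": 1, "N": 1}, {"H": 1}),           # NTCB ITZ peptide: +CN -H
    "dha": shift(lost={"S": 1, "H": 2}),                 # NTCB dehydroalanine: -SH2
    "hel": shift(lost={"S": 1, "C": 1, "H": 4}),         # CNBr homoserine lactone: -SCH4
    "hse": shift({"O": 1}, {"S": 1, "C": 1, "H": 2}),    # CNBr homoserine: +O -SCH2
    "das": shift({"O": 1}, {"H": 2}),                    # BNPS-skatole DAS peptide: +O -H2
    "oia": shift({"O": 1}),                              # BNPS-skatole oxindolylalanine: +O
}

if __name__ == "__main__":
    for name, delta in SHIFTS.items():
        print(f"{name:4s} {delta:+9.4f} Da")
```

The computed values agree with those quoted in the text to within 0.001 Da. The remaining differences affect only the last quoted digit.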

A fourth digestion procedure we employed uses a different reagent, o-iodosobenzoic acid, which targets the same residue as BNPS-skatole (Trp). In our hands, this protocol resulted in various side reactions and lacked both selectivity and cleavage efficiency. We therefore opted not to expand on its underlying mechanisms and focused instead on the BNPS-skatole reagent. The principal mechanisms of o-iodosobenzoic acid and BNPS-skatole are similar: the reaction with BNPS-skatole results in formation of a dioxindole alanine spirolactone (DAS peptide; Scheme 2), whereas the reaction with o-iodosobenzoic acid results in dioxindolylalanine.46

Comparative analysis of a seven-protein mixture with chemical-mediated digestion. A seven-protein mixture was digested in triplicate with each of the proposed chemical methods (o-iodosobenzoic acid, BNPS-skatole, CNBr, and NTCB), and the resulting peptides were analyzed by LC-MS/MS. Superposition of the total ion current (TIC) chromatograms from the three replicate runs indicates superior reproducibility of the NTCB-based approach compared with the other chemical-mediated digestion protocols (Figure S6, Supporting Information). Table 1 summarizes the results of the applied chemical digestion methods. The experimentally obtained average molecular weights of the peptides generated by each protocol were 5.8 kDa for cleavage at Cys, 7.8/8.8 kDa for cleavage at Trp, and 7.3 kDa for cleavage at Met. These values are in accordance with the theoretical distributions (Figure 2; Figures S1 and S2, Supporting Information) and fall within the desired 3-15 kDa mass range for MDP. These results are considered in more detail below.

A. Digestion with o-iodosobenzoic acid. For the o-iodosobenzoic acid protocol, the average sequence coverage was 51.4%, with cleavages at Trp detected in 79.4% (27/34) of cases. Notably, several of the identified peptides showed cleavage at the N-terminal rather than the C-terminal side of tryptophan. In a number of cases, we observed a mass shift of 31.98 Da (13.98 Da + H2O) that, to the best of our knowledge, has not been previously reported for this protocol (Figure S7, Supporting Information). Moreover, for this particular protein mixture, the identified peptides, particularly those of the larger proteins, tended to be located at or very close to the protein termini (e.g., for serotransferrin, serum albumin, and enolase). As expected, ribonuclease, which lacks Trp residues and is relatively small (13.6 kDa), was detected as the intact protein with the employed MDP set-up.
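Per-residue cleavage percentages such as 79.4% (27/34) amount to counting, over all target residues, those flanked by an observed peptide terminus. The sketch below assumes that identified peptides are available as 0-based half-open (start, end) spans on the protein sequence. The function names and the toy sequence are hypothetical. The `sides` argument lets N-terminal as well as C-terminal cuts at Trp be counted, as observed for this protocol.

```python
def observed_cuts(spans, length):
    """Internal cut positions k (bond between residues k-1 and k) implied by
    identified peptides given as 0-based half-open (start, end) spans."""
    cuts = set()
    for start, end in spans:
        cuts.update((start, end))
    return cuts - {0, length}  # protein termini are not cleavage events


def site_efficiency(seq, residue, spans, sides=("N", "C"), skip_first=False):
    """Return (cleaved, total) for occurrences of `residue` in `seq`.

    A site counts as cleaved if an observed cut lies directly on a requested
    side: "N" = bond before the residue, "C" = bond after it.
    """
    cuts = observed_cuts(spans, len(seq))
    first = 1 if skip_first else 0  # e.g. disregard an N-terminal Met
    sites = [i for i in range(first, len(seq)) if seq[i] == residue]
    cleaved = sum(
        1 for i in sites
        if ("N" in sides and i in cuts) or ("C" in sides and i + 1 in cuts)
    )
    return cleaved, len(sites)


if __name__ == "__main__":
    seq = "AWGGGWKKMAA"                      # hypothetical sequence
    spans = [(0, 2), (2, 6), (6, 11)]        # hypothetical identified peptides
    print(site_efficiency(seq, "W", spans))  # (2, 2)
    print(site_efficiency(seq, "M", spans))  # (0, 1)
```

Pooling the `(cleaved, total)` pairs across proteins and dividing gives the reported percentage, for example `100 * 27 / 34` ≈ 79.4%. Setting `skip_first=True` mimics disregarding N-terminal Met residues, as is done for CNBr below.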

B. Digestion with BNPS-skatole. The BNPS-skatole protocol gave similar results, with an overall sequence coverage (excluding ribonuclease) of 47.8% and cleavages observed at 73.5% (25/34) of all tryptophan residues. Importantly, some of the identified peptides contained cleavages around basic and acidic amino acids rather than in the vicinity of Trp residues.

C. Digestion with cyanogen bromide. An in-depth analysis of the cleavage sites identified for the CNBr protocol revealed that, contrary to the chemoselectivity proposed for the reagent, cleavages were equally likely to occur at Met and at Trp (see reaction mechanisms in Scheme S1, Supporting Information). Disregarding N-terminal Met residues, cleavages were identified at 72.5% (37/51) of Met residues and 76.5% (26/34) of Trp residues. The latter percentage is effectively equal to those observed for the two Trp-specific protocols (see above). A mechanism involving halogenation of Trp followed by HBr loss and hydrolysis has been proposed to explain cleavage at Trp residues (Scheme S1, Supporting Information). Finally, multiple instances of protein cleavage near aspartic and glutamic acids were also observed. These are likely a result of deprotonation of the side chain followed by nucleophilic substitution at the carbonyl group of the amide bond. Notably, this reduced specificity of the CNBr protocol also underlies its significantly higher sequence coverage for the seven-protein mixture (78.8%) compared with the Trp-cleaving protocols. However, such limited specificity could be detrimental for the analysis of complex protein mixtures (e.g., whole proteomes).

D. Digestion with NTCB. NTCB-specific cleavages were observed at 62.5% (60/96) of cysteines, with sequence coverage above 80% (Table 1). These numbers exclude carbonic anhydrase, which lacks cysteines. They also exclude the two enolases, which contain one cysteine each and would form fragments of circa 18.9 kDa and 24.8 kDa that are unlikely to be detected with the employed MDP instrumental set-up. Notably, only a single instance of cleavage not proximate to a cysteine was observed (within the sequence motif …QSNSKD…, Table 1). However, this cleavage was observed with all four protocols and is thus deemed not to be reagent-specific. It is likely a result of local structural effects that render hydrolysis under basic conditions particularly efficient. In five additional instances, secondary backbone cleavages were observed within three amino acids of a cysteine. These cleavages occurred within the following sequence motifs: …SCHTGL…, …DKKSCHT…, …CGDNTRK…, …SSNYCN…, and …TKDRCK… (Table 1). Among these five sequence tags, Ser was present in three, Lys, Thr, and Asp were all present in three, and His and Arg were each present in two. Therefore, in four out of five instances a basic residue is present in the vicinity of the secondary cleavage, and in all cases a residue containing an alcohol or a carboxylic acid is proximate to the cysteine. Hence, both the amino acid sequence and the local secondary structure likely played a role. Finally, one should note that a single chemical cleavage of a protein with NTCB produces two peptides (Scheme 1). The peptide C-terminal to the cleavage site bears the chemical modification on its N-terminus and thus becomes an ITZ peptide. In contrast, base-catalyzed hydrolysis of the peptide bond yields the peptide N-terminal to the cleavage site with a free acid C-terminus. This poses no problem for its detection or for assignment of its fragments in tandem mass spectra using standard algorithms. In the case of a missed cleavage, a label is introduced on the side chain of a cysteine residue but the backbone bond is not cleaved. Both b- and y-ions are then diagnostic, since any fragment ion containing the modified residue carries the mass shift associated with the modification.
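The residue tallies for the five secondary-cleavage motifs can be checked mechanically. This sketch only tests whether each residue class is present anywhere in the quoted motif. It does not check position relative to the cleavage site, which the motif strings alone do not give. The class definitions (basic: K, R, H; alcohol or carboxylic acid: S, T, Y, D, E) are our assumption.

```python
# Secondary-cleavage motifs as quoted in the text (Table 1)
MOTIFS = ["SCHTGL", "DKKSCHT", "CGDNTRK", "SSNYCN", "TKDRCK"]

BASIC = set("KRH")               # assumed basic residues
HYDROXYL_OR_ACID = set("STYDE")  # assumed alcohol (S, T, Y) or carboxylic acid (D, E)


def tally(motifs):
    """Number of motifs containing each residue class (presence, not position)."""
    return {
        "Ser": sum("S" in m for m in motifs),
        "Lys+Thr+Asp": sum(set("KTD") <= set(m) for m in motifs),
        "His": sum("H" in m for m in motifs),
        "Arg": sum("R" in m for m in motifs),
        "basic": sum(bool(BASIC & set(m)) for m in motifs),
        "OH/COOH": sum(bool(HYDROXYL_OR_ACID & set(m)) for m in motifs),
    }


if __name__ == "__main__":
    print(tally(MOTIFS))
    # {'Ser': 3, 'Lys+Thr+Asp': 3, 'His': 2, 'Arg': 2, 'basic': 4, 'OH/COOH': 5}
```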

Analysis of a seven-protein mixture with NTCB-mediated digestion. Based on the results presented above, we now consider the NTCB-mediated approach, the most promising one, in more detail. Figure 3 shows the corresponding TIC chromatogram and example intact peptide mass spectra of the seven-protein mixture digested with NTCB. Even with the downsizing of the proteolytic pool, a problem inherent to top-down MS is pronounced in the MDP approach as well: the high heterogeneity of molecular weights and charge states within the same LC-MS/MS run.58 The sample complexity challenges successful peptide fragmentation and complete sequence coverage, which in turn can bias protein identification. On the other hand, owing to its substantial length, even a single MDP peptide can potentially be sufficient for unambiguous protein identification. This reduces the “two-peptide” rule to a “single-peptide” rule for protein identification in MDP.59 Figure 4 illustrates an example of a single peptide sufficient for unambiguous identification of a protein (serotransferrin) against the Swiss-Prot database after NTCB digestion of the seven-protein mixture. The isolated [M+13H]13+ precursor ion is shown in Figure 4, right inset. The assignment is confirmed by the presence of the expected chemical label (N-terminal cyanylation, Δmass = 24.99 Da) in the HCD MS/MS data. Eight diagnostic b-ions bearing a +24.99 Da shift were identified, confirming N-terminal cyanylation and ITZ-peptide formation. The high-intensity b2-b5 ion series additionally confirms that the modification is located at the N-terminal cysteine. The absence of y-ions in the N-terminal region of the peptide, and similarly of b-ions near the C-terminal region, could be explained by their large size (~13 kDa, the size of a small protein). Constraints of the employed search algorithm parameters (particularly the S/N ratio threshold) may also contribute. Importantly, because the chemical modification is generally retained at the N-terminus of an ITZ peptide, the y-ion series can be used solely for increasing peptide sequence coverage. It can also confirm NTCB missed cleavages, which are indicated by specific modifications on non-terminal Cys residues.
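The diagnostic use of N-terminally shifted b-ions can be sketched by computing a b-ion ladder with and without the +24.995 Da ITZ shift. Every b-ion of an ITZ peptide is offset by the same amount, whereas y-ions are unaffected by an N-terminal shift. The residue mass table, the peptide `CSEKHWR`, and the function names are illustrative assumptions. Only singly charged b-ions of a free-acid peptide are modelled.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da, added once for the free-acid C-terminus
ITZ_SHIFT = 24.995  # Da, N-terminal cyanylation (+CN -H) as quoted in the text


def b_ions(peptide, nterm_shift=0.0):
    """Singly charged b1..b(n-1) m/z values, optionally carrying an N-terminal shift."""
    ions = []
    running = PROTON + nterm_shift
    for aa in peptide[:-1]:
        running += RESIDUE[aa]
        ions.append(running)
    return ions


def precursor_mz(peptide, z, nterm_shift=0.0):
    """m/z of the [M+zH]z+ ion for a free-acid peptide."""
    neutral = sum(RESIDUE[aa] for aa in peptide) + WATER + nterm_shift
    return (neutral + z * PROTON) / z


if __name__ == "__main__":
    pep = "CSEKHWR"  # hypothetical ITZ peptide with N-terminal Cys
    plain, itz = b_ions(pep), b_ions(pep, ITZ_SHIFT)
    for k in range(2, 6):  # b2-b5
        print(f"b{k}: {plain[k - 1]:.4f} -> {itz[k - 1]:.4f}")
    print(f"[M+2H]2+ m/z: {precursor_mz(pep, 2, ITZ_SHIFT):.4f}")
```

For a ~13 kDa ITZ peptide such as the serotransferrin example, `precursor_mz(sequence, 13, ITZ_SHIFT)` would give the m/z of the [M+13H]13+ ion, provided the full sequence is supplied.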


Data interpretation peculiarities. The structurally different way of cleaving proteins with chemicals, outlined in Schemes 1, S1, and S2, poses a challenge for the data analysis software currently available in proteomics. In addition, MDP data analysis in general is complicated by the lack of appropriate data interpretation tools. Fortunately, currently available software tools for TD MS can be used for MDP data. However, manual validation is required, mainly because of the occasional incorrect assignment of a peptide's N- or C-terminal residue. This may stem from the fact that typical TD MS software accounts for fragmentation patterns rather than cleavage (digestion) rules. Hence, the crucial constraint on a peptide's starting residue (for N-terminal cleavage) or ending residue (for C-terminal cleavage) is excluded a priori from the search parameters. In TD MS, no in-silico peptide database is constructed, unlike in protease-based MS approaches. Identification is instead based on the matched (or missing) fragmentation ladder and the mass of the investigated precursor ion. The latter can be biased by the various modifications occurring on the peptide, leading to incorrect peptide assignment, as exemplified for bovine serotransferrin (UniProt accession number G3X6N3) in Figure 5. Here, an additional amino acid was included at the N-terminus (with respect to the primary sequence expected from the cleaving agent specificity), so that the identified peptide starts with Ser instead of Cys. This is likely due to the large (>50 Da) mass shift associated with the expected N-terminal amino acid and to the lack of identified N-terminal product ions. The first assigned product ion, b448, carried an unknown mass shift. It matched the theoretical mass of a protein segment starting with Ser with a shift of -19.03 Da, which is not in the list of known PTMs. Here, the number on a product ion indicates the position of the cleaved bond with respect to the entire protein sequence, as considered by the employed search algorithm. We then manually assigned product ions from the mass spectra and matched their masses against the theoretical product ion list for the peptide starting with Cys. This identified a b-type ion series consistent with cyanylation of cysteine (ITZ-peptide formation; Cys +24.99 Da), as well as b- and y-ions localizing carbamylation at lysine (Lys +43 Da), which has been reported to occur in this protocol.60 Importantly, the software returned ~3000 individual entries for the nine LC-MS/MS runs of the NTCB protocol. Over 99% of their net mass shifts were rationalized and found to be consistent with chemical reactions occurring during the experimental protocol. Therefore, an automated en masse assignment should account for combinations of mass shifts that sum to the same nominal mass while involving different numbers of Cys residues (from zero to four). For the other cleavage methods, about 10-15% of mass shifts remained unexplained (data not shown). This is likely due to hitherto unidentified chemical reactions under the protocol conditions or to persistent intermediate species, which were outside the scope of this evaluation study. In other cases, for example for the o-iodosobenzoic acid protocol, an unexpected mass shift of 31.98 Da (13.98 Da + H2O) was revealed, highlighting the importance of manual validation of results (Figure S7, Supporting Information).
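The need to consider combinations of mass shifts can be illustrated by enumerating them. The sketch below assumes only four modification types: cyanylation/ITZ and DHA on Cys, carbamylation on Lys, and a search-engine artefact that prepends one Ser residue. It also assumes at most four modified Cys per peptide and monoisotopic deltas computed from elemental compositions. All names and limits are hypothetical.

Under these assumptions, one combination matching the -19.03 Da shift of the Figure 5 example is ITZ cyanylation (+24.995 Da) plus Lys carbamylation (+43.006 Da) minus a Ser residue (87.032 Da). This sums to about -19.031 Da and is consistent with the manual assignment described above. The same enumeration also shows that one cyanylated Cys, two DHA residues, and one carbamylated Lys (about +0.026 Da in total) share nominal mass 0 with the unmodified peptide while involving three Cys residues.

```python
from collections import defaultdict
from itertools import product

# name: (monoisotopic delta in Da, max count considered, modifies a Cys?)
MODS = {
    "cyanyl_Cys": (24.99525, 4, True),         # +CN -H: ITZ N-terminus or labelled Cys
    "DHA": (-33.98772, 4, True),               # -H2S: beta-elimination, missed cleavage
    "carbamyl_Lys": (43.00581, 2, False),      # +CHNO on Lys
    "extra_Nterm_Ser": (-87.03203, 1, False),  # hypothetical: engine prepends one Ser residue
}
MAX_CYS = 4  # assumption: at most four modified Cys per peptide


def combinations():
    """Yield (counts, n_cys, delta) for every allowed combination of modifications."""
    names = list(MODS)
    for counts in product(*(range(MODS[n][1] + 1) for n in names)):
        n_cys = sum(c for n, c in zip(names, counts) if MODS[n][2])
        if n_cys > MAX_CYS:
            continue
        delta = sum(MODS[n][0] * c for n, c in zip(names, counts))
        yield dict(zip(names, counts)), n_cys, delta


def explain(observed, tol=0.01):
    """Combinations whose summed delta lies within `tol` Da of an observed shift."""
    return [(c, n, d) for c, n, d in combinations() if abs(d - observed) <= tol]


def nominal_collisions():
    """Nominal mass shifts reachable with different numbers of modified Cys."""
    groups = defaultdict(set)
    for _, n_cys, delta in combinations():
        groups[round(delta)].add(n_cys)
    return {nom: cys for nom, cys in groups.items() if len(cys) > 1}


if __name__ == "__main__":
    for counts, n_cys, delta in explain(-19.03):
        print(counts, n_cys, f"{delta:+.4f}")
    print(sorted(nominal_collisions().get(0, set())))
```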


Specificity of NTCB-induced cleavage. Figure 6 shows the occurrence of the various chemical channels for NTCB digestion of the seven-protein mixture. The vertical axes in both panels show the number of assigned instances (counts) of detected mass shifts in the ensemble of identified peptide species with unique combinations of net mass shifts. Species-based abundance weighting factors are not applied. Importantly, the figure therefore represents chemical reaction (and hence species) diversity rather than the relative kinetics of the occurring reactions. For example, suppose the most abundant species contains no DHA and one of the least abundant species contains two DHAs. This will be recorded as an equal number of counts for ITZ-peptide and DHA instances. As expected, the predominant channel is Cys cyanylation leading to ITZ-peptide formation (Figure 6, top panel). Relative to the number of ITZ-peptide counts, the missed-cleavage counts were 40% for the competing reaction (β-elimination producing DHA) and 20% for non-cleaved labelled Cys (Figure 6, top panel, inset). In Scheme 1, the detected labelled species is the precursor of both ITZ-peptide and DHA formation. It therefore represents Cys residues that escaped both irreversible reactions. Additional channels and unknown modifications are shown as a function of their observed mass shifts in Figure 6, bottom panel. The dotted red line in both panels of Figure 6 marks the upper threshold count for the most frequently observed unknown modification. It is intended to highlight the high specificity of the NTCB-mediated chemistry and the relative lack of additional chemical reactions. Indeed, unique peptide species with unknown modifications make up only 5% of the total number of entries returned by the automated MS-Align+ search over nine runs (three digestion replicates, each analyzed in three technical replicates). Overall, the NTCB protocol was found to be the most chemoselective of the four protocols under consideration and the one providing the highest proteome coverage (Table 1).
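The species-based (unweighted) counting used for Figure 6 can be sketched as follows. Each unique species contributes its modification counts once, regardless of its abundance, and the ratios are then expressed relative to ITZ-peptide counts, as in the inset. The data structure and function names are hypothetical, as are the two toy species, which are assumed here to be ITZ peptides.

```python
from collections import Counter


def channel_counts(species):
    """Tally modification instances over unique species, ignoring abundance.

    `species` is an iterable of (mods, abundance) pairs, where `mods` maps a
    channel name (e.g. "ITZ", "DHA", "labelled") to its count in that species.
    """
    counts = Counter()
    for mods, _abundance in species:  # abundance deliberately unused
        counts.update(mods)           # Counter.update adds counts from a mapping
    return counts


def relative_to_itz(counts):
    """Counts of each non-ITZ channel as a percentage of ITZ-peptide counts."""
    return {k: 100.0 * v / counts["ITZ"] for k, v in counts.items() if k != "ITZ"}


if __name__ == "__main__":
    # Hypothetical: a highly abundant ITZ peptide without DHA and a
    # low-abundance ITZ peptide carrying two DHA residues
    species = [({"ITZ": 1}, 1e9), ({"ITZ": 1, "DHA": 2}, 1e5)]
    counts = channel_counts(species)
    print(dict(counts))
    print(relative_to_itz(counts))  # {'DHA': 100.0}
```

Because abundance is ignored, scaling any species' abundance leaves the counts unchanged. This is why the figure reflects reaction diversity rather than kinetics.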

Conclusions
We found that protein hydrolysis using the 2-nitro-5-thiocyanatobenzoic acid (NTCB) reagent meets the required characteristics for a middle-down approach to proteome analysis. These include: i) high amino acid residue specificity, ii) reproducibility, iii) generation of long peptides in the middle-down proteomics (MDP) mass range, and iv) availability of the reagent and its relatively low toxicity. Thus, an NTCB-based protocol can be proposed for qualitative MDP analysis. Compared with other cleaving agents suggested for MDP, NTCB shows promise for development into a quantitatively accurate approach. The appearance of unconstrained mass modifications in the data analysis was informative. It revealed that changing the base to one with reduced basicity, reasonable nucleophilicity, and low steric hindrance allowed us to shift the branching ratio of the competing pathways toward proteolysis rather than dehydroalanine formation. The characteristic mass shifts associated with both major reaction channels, coupled with the chemoselectivity of the reactions in question, enable facile data interpretation and manual validation of assignments. Further method development may involve ion activation techniques other than HCD for the characterization of large MDP peptides. These include high-capacity electron transfer dissociation, activated ion electron transfer dissociation, or ultraviolet photodissociation.52, 61, 62 Naturally, the suggested chemical-mediated approach needs to be validated at a larger scale. This should involve complex mixtures of proteins present over a wide dynamic range of concentrations, as in the case of whole proteomes.

Supporting Information: the following files are available free of charge on the ACS website, http://pubs.acs.org:

SI Chemicals MDP Srzentic JPR 2018. This file contains supplementary Figures S1-S7 and supplementary Schemes S1 and S2:
Scheme S1. Reaction mechanisms for protein digestion with CNBr.
Scheme S2. Reaction mechanisms for protein digestion with BNPS-Skatole.
Figure S1. In-silico digestion of human, yeast and bacterial proteomes.
Figure S2. Venn diagrams of yeast and bacterial proteome coverage with chemical methods targeting less frequent residues (Met, Cys, and Trp).
Figure S3. Statistics on digestion of the human proteome with NTCB and trypsin when only peptides generated allowing one missed cleavage are considered.
Figure S4. Statistics on digestion of the human proteome with NTCB and trypsin when peptides generated with both zero and one missed cleavages are considered.
Figure S5. Digestion statistics for bacterial and yeast proteomes with NTCB and trypsin.
Figure S6. LC-MS/MS analysis of peptides obtained by chemical hydrolysis of the seven-protein mixture with NTCB, CNBr, BNPS-Skatole and o-iodosobenzoic acid.
Figure S7. Assignment example of the +31.98 Da modification for a peptide.


Table_S1_Srzentic_et_al_PeptideList_NTCB_replicate_1_1.xlsx contains Table S1: list of peptides identified by MS-Align+ for NTCB digestion.
Table_S2_Srzentic_et_al_PeptideList_o-iodosobenzoic_acid_replicate_1_2.xlsx contains Table S2: list of peptides identified by MS-Align+ for o-iodosobenzoic acid digestion.
Table_S3_Srzentic_et_al_PeptideList_CNBr_replicate_1_1.xlsx contains Table S3: list of peptides identified by MS-Align+ for CNBr digestion.
Table_S4_Srzentic_et_al_PeptideList_BNPSskatole_replicate_1_1.xlsx contains Table S4: list of peptides identified by MS-Align+ for BNPS-Skatole digestion.

Experimental data. Mass spectrometry .raw files are available on MassIVE with the data set identifier MSV000082216: https://massive.ucsd.edu/

Competing Financial Interests. The authors declare no competing financial interest(s).

Acknowledgments
The authors thank Dr. Grigory Karateev and Dr. Elena Dubikovskaya for technical support. We are grateful for financial support through the Swiss National Science Foundation (SNF project 200021-125147/1 to YOT), the European Research Council (ERC Starting Grant 280271 to YOT), and the Russian Foundation for Basic Research (RFBR project 16-54-21006 to MVG).


References
1. Aebersold, R.; Mann, M., Mass spectrometry-based proteomics. Nature 2003, 422, (6928), 198-207.
2. Tsiatsiani, L.; Heck, A. J., Proteomics beyond trypsin. FEBS J 2015, 282, (14), 2612-26.
3. Chait, B. T., Mass spectrometry in the postgenomic era. Annu Rev Biochem 2011, 80, 239-46.
4. Chait, B. T., Mass Spectrometry: Bottom-Up or Top-Down? Science 2006, 314, (5796), 65-66.
5. Zhang, X., Less is More: Membrane Protein Digestion Beyond Urea-Trypsin Solution for Next-level Proteomics. Mol Cell Proteomics 2015, 14, (9), 2441-53.
6. Michalski, A.; Cox, J.; Mann, M., More than 100,000 Detectable Peptide Species Elute in Single Shotgun Proteomics Runs but the Majority is Inaccessible to Data-Dependent LC-MS/MS. Journal of Proteome Research 2011, 10, (4), 1785-1793.
7. Zubarev, R. A., The challenge of the proteome dynamic range and its implications for in-depth proteomics. PROTEOMICS 2013, 13, (5), 723-726.
8. Wu, C.; Tran, J. C.; Zamdborg, L.; Durbin, K. R.; Li, M.; Ahlf, D. R.; Early, B. P.; Thomas, P. M.; Sweedler, J. V.; Kelleher, N. L., A protease for 'middle-down' proteomics. Nat Meth 2012, 9, (8), 822-824.
9. Cannon, J.; Lohnes, K.; Wynne, C.; Wang, Y.; Edwards, N.; Fenselau, C., High-throughput middle-down analysis using an orbitrap. J Proteome Res 2010, 9, (8), 3886-90.
10. Sweredoski, M. J.; Moradian, A.; Raedle, M.; Franco, C.; Hess, S., High Resolution Parallel Reaction Monitoring with Electron Transfer Dissociation for Middle-Down Proteomics. Analytical Chemistry 2015, 87, (16), 8360-8366.
11. Laskay, U. A.; Srzentic, K.; Monod, M.; Tsybin, Y. O., Extended bottom-up proteomics with secreted aspartic protease Sap9. J Proteomics 2014, 110, 20-31.
12. Srzentic, K.; Fornelli, L.; Laskay, U. A.; Monod, M.; Beck, A.; Ayoub, D.; Tsybin, Y. O., Advantages of extended bottom-up proteomics using Sap9 for analysis of monoclonal antibodies. Anal Chem 2014, 86, (19), 9945-53.
13. Wu, S.-L.; Kim, J.; Hancock, W. S.; Karger, B., Extended Range Proteomic Analysis (ERPA): A New and Sensitive LC-MS Platform for High Sequence Coverage of Complex Proteins with Extensive Post-translational Modifications: Comprehensive Analysis of Beta-Casein and Epidermal Growth Factor Receptor (EGFR). Journal of Proteome Research 2005, 4, (4), 1155-1170.
14. Taouatas, N.; Drugan, M. M.; Heck, A. J. R.; Mohammed, S., Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase. Nat Meth 2008, 5, (5), 405-407.
15. Drapeau, G. R.; Boily, Y.; Houmard, J., Purification and Properties of an Extracellular Protease of Staphylococcus aureus. Journal of Biological Chemistry 1972, 247, (20), 6720-6726.
16. Kalli, A.; Sweredoski, M. J.; Hess, S., Data-Dependent Middle-Down Nano-Liquid Chromatography–Electron Capture Dissociation-Tandem Mass Spectrometry: An Application for the Analysis of Unfractionated Histones. Analytical Chemistry 2013, 85, (7), 3501-3507.
17. Moradian, A.; Kalli, A.; Sweredoski, M. J.; Hess, S., The top-down, middle-down, and bottom-up mass spectrometry approaches for characterization of histone variants and their post-translational modifications. Proteomics 2014, 14, (4-5), 489-497.
18. Sidoli, S.; Garcia, B. A., Middle-down proteomics: a still unexploited resource for chromatin biology. Expert Review of Proteomics 2017, 14, (7), 617-626.
19. Zhang, L.; English, A. M.; Bai, D. L.; Ugrin, S. A.; Shabanowitz, J.; Ross, M. M.; Hunt, D. F.; Wang, W.-H., Analysis of Monoclonal Antibody Sequence and Post-translational Modifications by Time-controlled Proteolysis and Tandem Mass Spectrometry. Molecular & Cellular Proteomics : MCP 2016, 15, (4), 1479-1488.
20. Yang, H.-J.; Shin, S.; Kim, J.; Hong, J.; Lee, S.; Kim, J., Vortex-assisted tryptic digestion. Rapid Communications in Mass Spectrometry 2011, 25, (1), 88-92.
21. Fontana, A.; de Laureto, P. P.; Spolaore, B.; Frare, E.; Picotti, P.; Zambonin, M., Probing protein structure by limited proteolysis. Acta Biochim Pol 2004, 51, (2), 299-321.
22. Feng, Y.; De Franceschi, G.; Kahraman, A.; Soste, M.; Melnik, A.; Boersema, P. J.; de Laureto, P. P.; Nikolaev, Y.; Oliveira, A. P.; Picotti, P., Global analysis of protein structural changes in complex proteomes. Nat Biotech 2014, 32, (10), 1036-1044.
23. Schopper, S.; Kahraman, A.; Leuenberger, P.; Feng, Y.; Piazza, I.; Müller, O.; Boersema, P. J.; Picotti, P., Measuring protein structural changes on a proteome-wide scale using limited proteolysis-coupled mass spectrometry. Nature Protocols 2017, 12, 2391.
24. Laskay, U. A.; Lobas, A. A.; Srzentic, K.; Gorshkov, M. V.; Tsybin, Y. O., Proteome digestion specificity analysis for rational design of extended bottom-up and middle-down proteomics experiments. J Proteome Res 2013, 12, (12), 5558-69.
25. Crimmins, D. L.; Mische, S. M.; Denslow, N. D., Chemical cleavage of proteins in solution. Curr Protoc Protein Sci 2005, Chapter 11, Unit 11.4.
26. Vestling, M. M.; Kelly, M. A.; Fenselau, C.; Costello, C. E., Optimization by mass spectrometry of a tryptophan-specific protein cleavage reaction. Rapid Communications in Mass Spectrometry 1994, 8, (9), 786-790.
27. Chapman, E.; Thorson, J. S.; Schultz, P. G., Mutational Analysis of Backbone Hydrogen Bonds in Staphylococcal Nuclease. Journal of the American Chemical Society 1997, 119, (30), 7151-7152.
28. Smith, B. J., Basic Protein and Peptide Protocols. In Methods in Molecular Biology, Humana Press, Totowa, NJ: 1994; Vol. 32, pp 297-309.
29. Bar-Or, R.; Rael, L. T.; Bar-Or, D., Dehydroalanine derived from cysteine is a common post-translational modification in human serum albumin. Rapid Communications in Mass Spectrometry 2008, 22, (5), 711-716.
30. Kuhn, K.; Thompson, A.; Prinz, T.; Müller, J.; Baumann, C.; Schmidt, G.; Neumann, T.; Hamon, C., Isolation of N-Terminal Protein Sequence Tags from Cyanogen Bromide Cleaved Proteins as a Novel Approach to Investigate Hydrophobic Proteins. 2003; Vol. 2, p 598-609.
31. Prinz, T.; Müller, J.; Kuhn, K.; Schäfer, J.; Thompson, A.; Schwarz, J.; Hamon, C., Characterization of Low Abundant Membrane Proteins Using the Protein Sequence Tag Technology. Journal of Proteome Research 2004, 3, (5), 1073-1081.
32. Cannon, J. R.; Edwards, N. J.; Fenselau, C., Mass-biased partitioning to enhance middle down proteomics analysis. J Mass Spectrom 2013, 48, (3), 340-3.
33. Fenselau, C.; Laine, O.; Swatkoski, S., Microwave assisted acid cleavage for denaturation and proteolysis of intact human adenovirus. Int J Mass Spectrom 2011, 301, (1-3), 7-11.
34. Swatkoski, S.; Gutierrez, P.; Wynne, C.; Petrov, A.; Dinman, J. D.; Edwards, N.; Fenselau, C., Evaluation of microwave-accelerated residue-specific acid cleavage for proteomic applications. Journal of Proteome Research 2008, 7, (2), 579-586.
35. Han, K.-K.; Richard, C.; Biserte, G., Current developments in chemical cleavage of proteins. International Journal of Biochemistry 1983, 15, (7), 875-884.
36. Kelstrup, C. D.; Bekker-Jensen, D. B.; Arrey, T. N.; Hogrebe, A.; Harder, A.; Olsen, J. V., Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics. Journal of Proteome Research 2018, 17, (1), 727-738.
37. Beck, S.; Michalski, A.; Raether, O.; Lubeck, M.; Kaspar, S.; Goedecke, N.; Baessmann, C.; Hornburg, D.; Meier, F.; Paron, I.; Kulak, N. A.; Cox, J.; Mann, M., The Impact II, a Very High-Resolution Quadrupole Time-of-Flight Instrument (QTOF) for Deep Shotgun Proteomics. Molecular & Cellular Proteomics 2015, 14, (7), 2014-2029.
38. Liu, X.; Sirotkin, Y.; Shen, Y.; Anderson, G.; Tsai, Y. S.; Ting, Y. S.; Goodlett, D. R.; Smith, R. D.; Bafna, V.; Pevzner, P. A., Protein identification using top-down spectra. Mol Cell Proteomics 2012, 11, (6), M111.008524.
39. Liu, X.; Inbar, Y.; Dorrestein, P. C.; Wynne, C.; Edwards, N.; Souda, P.; Whitelegge, J. P.; Bafna, V.; Pevzner, P. A., Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Mol Cell Proteomics 2010, 9, (12), 2772-82.
40. LeDuc, R. D.; Kelleher, N. L., Using ProSight PTM and Related Tools for Targeted Protein Identification and Characterization with High Mass Accuracy Tandem MS Data. Current Protocols in Bioinformatics 2007, 19, (1), 13.6.1-13.6.28.
41.

Toby, T. K.; Fornelli, L.; Kelleher, N. L., Progress in Top-Down

Proteomics and the Analysis of Proteoforms. Annual review of analytical chemistry (Palo Alto, Calif.) 2016, 9, (1), 499-519. 42.

Ansong, C.; Wu, S.; Meng, D.; Liu, X.; Brewer, H. M.; Deatherage Kaiser,

B. L.; Nakayasu, E. S.; Cort, J. R.; Pevzner, P.; Smith, R. D.; Heffron, F.; Adkins, J. N.; Paša-Tolić, L., Top-down proteomics reveals a unique protein S-thiolation switch in Salmonella Typhimurium in response to infection-like conditions. Proceedings of the National Academy of Sciences of the United States of America 2013, 110, (25), 10153-10158. 43.

Vyatkina, K.; Wu, S.; Dekker, L. J. M.; VanDuijn, M. M.; Liu, X.; Tolić,

N.; Luider, T. M.; Paša-Tolić, L.; Pevzner, P. A., Top-down analysis of protein samples by de novo sequencing techniques. Bioinformatics 2016, 32, (18), 2753-2759.

34  ACS Paragon Plus Environment

Page 35 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

44.

Vyatkina, K., De Novo Sequencing of Top-Down Tandem Mass Spectra: A

Next Step towards Retrieving a Complete Protein Sequence. 2017; Vol. 5, p 6. 45.

Degani, Y.; Patchornik, A., Cyanylation of sulfhydryl groups by 2-nitro-

5-thiocyanobenzoic acid. High-yield modification and cleavage of peptides at cysteine residues. Biochemistry 1974, 13, (1), 1-11. 46.

Mahoney, W. C.; Smith, P. K.; Hermodson, M. A., Fragmentation of

proteins with o-iodosobenzoic acid: chemical mechanism and identification of o-iodoxybenzoic acid as a reactive contaminant that modifies tyrosyl residues. Biochemistry 1981, 20, (2), 443-448. 47.

Gross, E.; Witkop, B., Selective cleavage of the methionyl peptide bonds

in ribonuclease with cyanogen bromide. Journal of the American Chemical Society 1961, 83, (6), 1510-1511. 48.

Smith, B. J., Chemical Cleavage of Proteins. In New Protein Techniques,

Walker, J. M., Ed. Humana Press: Totowa, NJ, 1988; pp 71-88. 49.

Laskay, U. A.; Srzentic, K.; Fornelli, L.; Upir, O.; Kozhinov, A. N.;

Monod, M.; Tsybin, Y. O., Practical considerations for improving the productivity of mass spectrometry-based proteomics. Chimia (Aarau) 2013, 67, (4), 244-9. 50.

Goloborodko, A. A.; Levitsky, L. I.; Ivanov, M. V.; Gorshkov, M. V.,

Pyteomics - a Python Framework for Exploratory Data Analysis and Rapid Software Prototyping in Proteomics. Journal of The American Society for Mass Spectrometry 2013, 24, (2), 301-304. 51.

Fornelli, L.; Toby, T. K.; Schachner, L. F.; Doubleday, P. F.; Srzentić,

K.; DeHart, C. J.; Kelleher, N. L., Top-down proteomics: Where we are, where we are going? Journal of Proteomics 2018, 175, 3-4. 35  ACS Paragon Plus Environment

Journal of Proteome Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

52.

Page 36 of 49

Riley, N. M.; Westphall, M. S.; Coon, J. J., Activated Ion-Electron

Transfer

Dissociation

Enables

Comprehensive

Top-Down

Protein

Fragmentation. Journal of Proteome Research 2017, 16, (7), 2653-2659. 53.

Cheon, D. H.; Yang, E. G.; Lee, C.; Lee, J. E., Low-Molecular-Weight

Plasma

Proteome

Analysis

Using

Top-Down

Mass

Spectrometry.

In

Serum/Plasma Proteomics: Methods and Protocols, Greening, D. W.; Simpson, R. J., Eds. Springer New York: New York, NY, 2017; pp 103-117. 54.

Fornelli, L.; Durbin, K. R.; Fellers, R. T.; Early, B. P.; Greer, J. B.;

LeDuc, R. D.; Compton, P. D.; Kelleher, N. L., Advancing Top-down Analysis of the Human Proteome Using a Benchtop Quadrupole-Orbitrap Mass Spectrometer. Journal of Proteome Research 2017, 16, (2), 609-618. 55.

Wu, J.; Watson, J. T., Optimization of the Cleavage Reaction for

Cyanylated Cysteinyl Proteins for Efficient and Simplified Mass Mapping. Analytical Biochemistry 1998, 258, (2), 268-276. 56.

Jacobson, G. R.; Schaffer, M. H.; Stark, G. R.; Vanaman, T. C., Specific

Chemical Cleavage in High Yield at the Amino Peptide Bonds of Cysteine and Cystine Residues. Journal of Biological Chemistry 1973, 248, (19), 6583-6591. 57.

Lawson, W. B.; Gross, E.; Foltz, C. M.; Witkop, B., Specific cleavage of

methionyl peptides. Journal of the American Chemical Society 1961, 83, (6), 1509-1510. 58.

Compton, P. D.; Zamdborg, L.; Thomas, P. M.; Kelleher, N. L., On the

Scalability and Requirements of Whole Protein Mass Spectrometry. Analytical Chemistry 2011, 83, (17), 6868-6874. 59.

Omenn, G. S.; Lane, L.; Lundberg, E. K.; Beavis, R. C.; Nesvizhskii, A.

I.; Deutsch, E. W., Metrics for the Human Proteome Project 2015: Progress on 36  ACS Paragon Plus Environment

Page 37 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

the

Human

Proteome

and

Guidelines

for

High-Confidence

Protein

Identification. Journal of Proteome Research 2015, 14, (9), 3452-3460. 60.

Tang, H.-Y.; Speicher, D. W., Identification of alternative products and

optimization of 2-nitro-5-thiocyanatobenzoic acid cyanylation and cleavage at cysteine residues. Analytical Biochemistry 2004, 334, (1), 48-61. 61.

Riley, N. M.; Mullen, C.; Weisbrod, C. R.; Sharma, S.; Senko, M. W.;

Zabrouskov, V.; Westphall, M. S.; Syka, J. E. P.; Coon, J. J., Enhanced Dissociation of Intact Proteins with High Capacity Electron Transfer Dissociation. Journal of The American Society for Mass Spectrometry 2016, 27, (3), 520-531. 62.

Cleland, T. P.; DeHart, C. J.; Fellers, R. T.; VanNispen, A. J.; Greer, J.

B.; LeDuc, R. D.; Parker, W. R.; Thomas, P. M.; Kelleher, N. L.; Brodbelt, J. S., High-Throughput Analysis of Intact Human Proteins Using UVPD and HCD on an Orbitrap Mass Spectrometer. Journal of Proteome Research 2017, 16, (5), 2072-2079.


Figure captions.

Scheme 1. Reaction mechanisms for chemical-mediated protein digestion with NTCB, leading either to cleavage at the X-Cys bond (ITZ-peptide) or to a miscleavage with dehydroalanine (DHA-peptide) formation.

Scheme 2. Characteristic reaction products and mass shifts upon X-Cys, Met-X, and Trp-X cleavages with chemical agents. Structures of reaction products with and without miscleavages are shown relative to their unmodified residues. The dotted red lines indicate the regions of the molecules where the modifications have been introduced.
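As a worked illustration of the mass shifts summarized in Scheme 2, the sketch below derives the net shifts of the Cys and Met products from elemental monoisotopic masses and applies them to a peptide. The elemental changes assigned to each product (ITZ as +CN -H, DHA as -H2S, homoserine as -CH2S +O, homoserine lactone as -CH4S), the shift labels and the helper names are our assumptions for illustration; the Trp products are not modelled.

```python
"""Sketch: net mass shifts of chemical-cleavage products (illustrative assumptions)."""
from typing import Optional

# Monoisotopic element masses (Da).
ELEMENT = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
           "O": 15.99491462, "S": 31.97207117}

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 2 * ELEMENT["H"] + ELEMENT["O"]


def composition_delta(**counts: int) -> float:
    """Mass change for an elemental composition change, e.g. H=-2, S=-1."""
    return sum(ELEMENT[el] * n for el, n in counts.items())


# Assumed composition changes relative to the unmodified peptide (hypothetical labels).
SHIFT = {
    "ITZ": composition_delta(C=1, N=1, H=-1),         # cyanylated N-terminal Cys: +CN -H
    "DHA": composition_delta(H=-2, S=-1),             # Cys -> dehydroalanine: -H2S
    "HSE": composition_delta(C=-1, H=-2, S=-1, O=1),  # C-terminal Met -> homoserine
    "HSL": composition_delta(C=-1, H=-4, S=-1),       # C-terminal Met -> homoserine lactone
}


def peptide_mass(seq: str, shift: Optional[str] = None) -> float:
    """Neutral monoisotopic mass of a linear peptide, optionally with one net shift."""
    mass = sum(RESIDUE[aa] for aa in seq) + WATER
    return mass + (SHIFT[shift] if shift else 0.0)


if __name__ == "__main__":
    for name, delta in SHIFT.items():
        print(f"{name}: {delta:+.4f} Da")
```

Run as a script, this prints ITZ: +24.9952 Da, DHA: -33.9877 Da, HSE: -29.9928 Da and HSL: -48.0034 Da; the ITZ value is consistent with the 24.99 Da N-terminal label quoted for Figure 4.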

Figure 1. Human proteome coverage obtainable with chemical agents targeting the rare amino acid residues Trp, Cys and Met. The percentage of non-cleavable proteins is further separated into two categories by mass (MW > 30 kDa, pink circle; MW < 30 kDa, yellow circle). The circles are positioned to show their relative percentages, not the overlap between the categories.
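The Figure 1 statistic can be approximated from a protein FASTA by checking which sequences contain no copy of the targeted residue and splitting those by average mass. A minimal sketch, assuming that "non-cleavable" simply means the sequence lacks the residue (this ignores terminal positions and miscleavages) and that the 30 kDa split uses unmodified average masses; the function names are hypothetical.

```python
"""Sketch: share of proteins a residue-specific agent cannot cleave (illustrative)."""
from typing import Dict

# Average residue masses (Da), used only for the molecular-weight split.
AVERAGE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVERAGE_WATER = 18.01528


def average_mass_kda(seq: str) -> float:
    """Average mass of an unmodified protein sequence in kDa."""
    return (sum(AVERAGE[aa] for aa in seq) + AVERAGE_WATER) / 1000.0


def noncleavable_summary(proteome: Dict[str, str], residue: str,
                         mw_cut_kda: float = 30.0) -> Dict[str, float]:
    """Percentage of proteins lacking `residue`, and their split at `mw_cut_kda`."""
    lacking = [seq for seq in proteome.values() if residue not in seq]
    above = sum(1 for seq in lacking if average_mass_kda(seq) > mw_cut_kda)
    return {
        "percent_noncleavable": 100.0 * len(lacking) / len(proteome),
        "above_cut": above,
        "below_cut": len(lacking) - above,
    }


if __name__ == "__main__":
    toy = {"p1": "MKW", "p2": "AAAA", "p3": "G" * 600}  # toy sequences, not real data
    for target in ("W", "C", "M"):
        print(target, noncleavable_summary(toy, target))
```

In practice the `proteome` dictionary would be filled from a reviewed human FASTA file; the toy entries above only exercise both sides of the 30 kDa split.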

Figure 2. Statistics on the in-silico digestion of the human proteome with NTCB (cleavage at Cys) and trypsin, allowing no miscleavages. (Top panel) Number of proteins as a function of peptide length, given as a count of residues. For both cleaving agents two length ranges were considered (0-n and 30-n residues); the maximum attainable numbers of potentially identifiable proteins in both ranges are given for n = 150. (Bottom panel) Stacked histograms of the number of unique peptides for both cleaving agents in the length ranges for BUP (0-30 residues) and MDP (30-150 residues).
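The in-silico digestion behind Figure 2 can be sketched with zero-width regular-expression split points. This is a minimal sketch, not the authors' pipeline: it assumes the simplified rules "trypsin cleaves after K/R unless followed by P" and "NTCB cleaves every X-Cys bond", no missed cleavages, and half-open length bins (BUP: fewer than 30 residues; MDP: 30 to 150 residues). `RULES`, `digest`, `unique_peptides_by_range` and `proteins_with_peptide_in` are hypothetical names.

```python
"""Sketch: fully specific in-silico digestion and length binning (illustrative)."""
import re
from typing import Dict, List, Set

# Assumed, simplified cleavage rules expressed as zero-width split points.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "ntcb": r"(?=C)",              # before Cys, i.e. the X-Cys bond
}


def digest(seq: str, agent: str) -> List[str]:
    """Digest with no missed cleavages (needs Python 3.7+ for empty-match splits)."""
    return [p for p in re.split(RULES[agent], seq) if p]


def unique_peptides_by_range(proteome: Dict[str, str], agent: str,
                             bup_max: int = 30, mdp_max: int = 150) -> Dict[str, Set[str]]:
    """Unique peptides in the BUP (< bup_max) and MDP (bup_max..mdp_max) bins."""
    bins: Dict[str, Set[str]] = {"BUP": set(), "MDP": set()}
    for seq in proteome.values():
        for pep in digest(seq, agent):
            if len(pep) < bup_max:
                bins["BUP"].add(pep)
            elif len(pep) <= mdp_max:
                bins["MDP"].add(pep)
    return bins


def proteins_with_peptide_in(proteome: Dict[str, str], agent: str,
                             min_len: int, max_len: int) -> int:
    """Number of proteins yielding at least one peptide of min_len..max_len residues."""
    return sum(
        any(min_len <= len(p) <= max_len for p in digest(seq, agent))
        for seq in proteome.values()
    )
```

Sweeping `max_len` from small values up to 150 for `min_len` = 1 and `min_len` = 30 would give curves of the kind shown in the top panel.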

Figure 3. Evaluation of peptide mass and charge state variation over the LC-MS/MS gradient, obtained with an LTQ Orbitrap Elite FTMS for a seven-protein mixture digested with NTCB. Insets illustrate three examples of detection of short (~20 residues), average (~50 residues), and long (>60 residues) peptides.

Figure 4. Representative example of a single peptide sufficient for unambiguous identification of a protein against the Swiss-Prot database. The isolated 13+ precursor ion of serotransferrin (Bos taurus), derived from NTCB cleavage N-terminal to Cys, is shown in the right inset. The assignment is confirmed by the presence of the expected chemical label (N-terminal cyanylation, Δmass = 24.99 Da) in the HCD MS/MS spectrum. Identified diagnostic ITZ-peptide b-ions in the vicinity of the Cys residue are highlighted in red.
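The diagnostic b-ions in Figure 4 carry the N-terminal label, so every b-ion of an ITZ-peptide is shifted by about +24.995 Da relative to the unlabeled peptide. A minimal sketch of that arithmetic, assuming singly charged b-ions computed as the residue sum plus the N-terminal shift plus a proton, and a precursor m/z of (M + z × proton)/z; the helper names are hypothetical and no real sequence from the figure is reproduced.

```python
"""Sketch: b-ion ladder and precursor m/z for an N-terminally labeled ITZ-peptide."""
from typing import List

PROTON = 1.007276    # Da
WATER = 18.010565    # Da, monoisotopic
ITZ_SHIFT = 12.0 + 14.00307401 - 1.00782503  # assumed +CN -H label, about 24.995 Da

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(seq: str, nterm_shift: float = ITZ_SHIFT) -> List[float]:
    """Singly charged b1..b(n-1) m/z values carrying an N-terminal mass shift."""
    ions: List[float] = []
    running = nterm_shift
    for aa in seq[:-1]:
        running += MONO[aa]
        ions.append(running + PROTON)
    return ions


def precursor_mz(seq: str, charge: int, nterm_shift: float = ITZ_SHIFT) -> float:
    """m/z of the [M + zH]z+ precursor of the labeled peptide."""
    neutral = sum(MONO[aa] for aa in seq) + WATER + nterm_shift
    return (neutral + charge * PROTON) / charge
```

Comparing `b_ions(seq)` with `b_ions(seq, 0.0)` separates labeled from unlabeled ladders, and `precursor_mz(seq, 13)` gives the m/z expected for a 13+ precursor such as the one isolated in Figure 4.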

Figure 5. Example of an incorrect assignment of the cleavage site by the search algorithm and the correct manual assignment. The peptide sequence is shown in black, while adjacent amino acid residues that complete the protein sequence are in grey. Product ions that confirm the modifications and their localization are indicated in red. Correct mass shifts are indicated above the respective residues in blue.

Figure 6. Statistical analysis of Cys cleavage in a model seven-protein mixture reflects the chemical reaction complexity of the detected species ensemble. For clarity, net mass shifts with identified contributing channels are split into individual reaction channels. (Top panel) Predominance of ITZ-peptide formation among the identified channels. The inset shows count frequencies for instances of DHA formation (uncleavable product), Cys sites that carry a label, and ITZ-peptide formation. (Bottom panel) Remaining assigned and unidentified mass shifts, with a threshold (dotted red line) for the most frequently encountered unassigned net mass shift.
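Splitting a net mass shift into individual reaction channels, as done for Figure 6, amounts to searching for channel combinations whose summed masses match the observed shift within a tolerance. A minimal sketch, assuming a hypothetical channel list (cyanylation and DHA from Scheme 1, plus Met oxidation as an assumed example artifact), at most two contributing channels and a 0.01 Da tolerance; shifts with no match would fall into the unassigned category.

```python
"""Sketch: decomposing a net mass shift into reaction channels (illustrative)."""
from itertools import combinations_with_replacement
from typing import List, Tuple

# Hypothetical channel list with monoisotopic mass changes (Da).
CHANNELS = {
    "cyanylation": 24.99525,  # +CN -H (ITZ formation or S-cyanylated Cys)
    "dha": -33.98772,         # Cys -> dehydroalanine (-H2S)
    "oxidation": 15.99491,    # +O, assumed example artifact channel
}


def decompose(net_shift: float, max_terms: int = 2,
              tol: float = 0.01) -> List[Tuple[str, ...]]:
    """Channel combinations (up to max_terms) whose summed mass matches net_shift."""
    names = list(CHANNELS)
    hits: List[Tuple[str, ...]] = []
    for k in range(1, max_terms + 1):
        for combo in combinations_with_replacement(names, k):
            if abs(sum(CHANNELS[n] for n in combo) - net_shift) <= tol:
                hits.append(combo)
    return hits
```

Counting how often each channel appears across all decomposed shifts gives frequencies of the kind plotted in the top panel; an empty result marks a shift as unassigned.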

Table 1. Sequence coverage, number of identified peptides and average peptide length for the seven-protein mixture digested with each of the chemical methods employed. The numbers of experimentally assigned vs. expected cleavage sites and the observed secondary cleavages are indicated for each chemical agent. * protein-specific secondary cleavage; “” protocol-independent secondary cleavage, observed in all protocols.

| Targeted residue (cleaving agent) | Proteins identified | Peptide average MW (kDa) | Peptide average charge state | Cleavage sites (exp./theor.) | Protein sequence mapped (%) | Secondary cleavages |
|---|---|---|---|---|---|---|
| W (iodosobenzoic acid) | 7 | 6.5 | 8.8 | 27/34 | 51.4 | D, N; QSN¦SKD“” |
| W (BNPS-Skatole) | 7* | 6.1 | 7.8 | 25/34 | 47.8 | D, N; QSN¦SKD“” |
| M (CNBr) | 7 | 5.7 | 7.3 | 37/51 | 78.8 | 26/34 of W, D; QSN¦SKD“” |
| C (NTCB) | 5 | 4.3 | 5.8 | 60/96 | 82.4 | KSC¦H¦TGL*, KSC¦HTA*, LLC¦G¦D¦NT*, SSN¦YC¦N¦Q¦M*, TK¦D¦RC¦K*, QSN¦SKD“” |
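The "Protein sequence mapped" and "Cleavage sites (exp./theor.)" columns can, in principle, be derived from the peptides mapped to each protein. A minimal sketch, assuming exact substring matching of unmodified peptide sequences, that theoretical sites are all X-Cys bonds (NTCB) or all Met-X/Trp-X bonds (CNBr, Trp agents), and that a site counts as observed when any mapped peptide starts or ends at it; the function names are hypothetical, and real assignments must also account for the mass shifts and secondary cleavages listed in the table.

```python
"""Sketch: sequence coverage and observed vs. theoretical cleavage sites."""
from typing import Iterable, List, Set, Tuple


def _match_starts(protein: str, pep: str) -> List[int]:
    """All start indices of `pep` in `protein` (overlapping matches included)."""
    starts, pos = [], protein.find(pep)
    while pep and pos != -1:
        starts.append(pos)
        pos = protein.find(pep, pos + 1)
    return starts


def coverage_percent(protein: str, peptides: Iterable[str]) -> float:
    """Percentage of protein residues covered by at least one mapped peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        for start in _match_starts(protein, pep):
            for i in range(start, start + len(pep)):
                covered[i] = True
    return 100.0 * sum(covered) / len(protein)


def theoretical_sites(protein: str, residue: str, before: bool) -> Set[int]:
    """Bond positions, indexed by the residue that follows the cleaved bond."""
    if before:  # X-Cys type: cleave N-terminal to the residue
        return {i for i in range(1, len(protein)) if protein[i] == residue}
    # Met-X / Trp-X type: cleave C-terminal to the residue
    return {i + 1 for i in range(len(protein) - 1) if protein[i] == residue}


def observed_vs_theoretical(protein: str, peptides: Iterable[str],
                            residue: str, before: bool) -> Tuple[int, int]:
    """(observed, theoretical) site counts, as in the exp./theor. column."""
    theory = theoretical_sites(protein, residue, before)
    seen: Set[int] = set()
    for pep in peptides:
        for start in _match_starts(protein, pep):
            seen.update({start, start + len(pep)})
    return len(seen & theory), len(theory)
```

Summing the per-protein counts over the seven proteins would give ratios in the same form as 60/96 for NTCB.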


Scheme 1.


Scheme 2.


Figure 1.


Figure 2.


Figure 3.


Figure 4.


Figure 5.


Figure 6.


TOC figure 82x43mm (300 x 300 DPI)
