
Protease specificity profiling in a pipette tip using “charge-synchronized” proteome-derived peptide libraries. Minh T.N. Nguyen, Gerta Shema, René P. Zahedi, and Steven H.L. Verhelst. J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00004 • Publication Date (Web): 17 Apr 2018




Journal of Proteome Research

Protease specificity profiling in a pipette tip using “charge-synchronized” proteome-derived peptide libraries Minh T. N. Nguyen‡, Gerta Shema‡, René P. Zahedi‡,†,| and Steven H. L. Verhelst*‡§



‡Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Str. 6b, 44227 Dortmund, Germany

†Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, Montreal, Quebec H4A 3T2, Canada

|Segal Cancer Proteomics Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada

§KU Leuven – University of Leuven, Laboratory of Chemical Biology, Department of Cellular and Molecular Medicine, Herestraat 49 box 802, 3000 Leuven, Belgium

* [email protected], phone: +32 16-374517


ABSTRACT

About 2% of the genes of humans and other organisms encode proteases. An important step toward deciphering the biological function of a protease and designing inhibitors is profiling its substrate specificity. In this work we present a novel, label-free, proteomics-based protease specificity profiling method that requires only simple sample preparation steps. It uses proteome-derived peptide libraries and enriches the cleaved sequences using strong cation exchange (SCX) chromatography material in a pipette tip. As a demonstration of the method’s versatility, we determined the specificity of GluC, caspase-3, chymotrypsin, MMP-1 and cathepsin G from several hundred to almost 2000 cleavage events per protease. Interestingly, we also found a novel intrinsic preference of cathepsin G for Asn at the P1 subsite, which we confirmed using synthetic peptides. Overall, this method is straightforward and requires the lowest investment in material and equipment of the protease specificity profiling methods reported so far. We therefore expect it to be applicable in any biochemistry laboratory and to promote a better understanding of protease specificity.

KEYWORDS proteases, substrate specificity, ChaFRAtip, label free, tip-based fractionation


INTRODUCTION

Proteases are involved in a wide range of biological processes, such as blood coagulation,1, 2 apoptosis,3 and immune response.4, 5 Dysregulation of proteolytic activity can lead to a variety of disorders including cancer,6 Alzheimer’s disease,7 Parkinson’s disease8 and atherosclerosis,9 making proteases attractive drug targets. Promising advances in protease-targeting drugs over the last decades not only confirm this notion but also indicate the enormous potential for further protease drug development.1, 10 Several mechanisms prevent proteases from indiscriminately cleaving proteins, and the substrate specificity of a protease is one of the factors that govern which proteins are cleaved. Ultimately, the cleaved substrates determine the downstream effect of activating a particular protease. Hence, knowledge of the substrate specificity is an important step toward understanding the biological function of a protease. Substrate specificities can be used for the design of active-site-targeting inhibitors, as well as for the development of important research tools, such as fluorogenic substrates and activity-based probes.11-14 Proteases recognize their substrates through pockets around the active site (called subsites), which are designated S1-Sn on the N-terminal side of the scissile bond (non-prime side) and S1’-Sn’ on the C-terminal side of the scissile bond (prime side). The structures of these pockets allow interaction with certain amino acid side chains of the substrate, termed P sites, which carry the same numbering as the subsites they occupy.15 Together, the properties of the protease subsites determine whether the substrate repertoire is narrow or broad.
For example, while both trypsin and thrombin cleave after arginine, trypsin has no selectivity at other subsites and digests virtually all sequences containing arginine, in agreement with its role in food digestion. The substrate repertoire of thrombin, by contrast, is much more limited by selectivity at subsites other than P1, in line with its involvement in blood coagulation, a process that requires strict regulation. There are several ways to reveal the substrate specificity of a protease. Databases such as MEROPS16 and CutDB17 catalog individual cleavage events, often from biochemical studies. Collectively, these cleavages draw a picture of the substrate specificity, provided that the protease is well studied and the database contains enough cleavage events. Another popular approach to determine protease specificities is the use of synthetic peptide substrates, such as fluorogenic positional scanning libraries.18, 19 Although this requires substantial synthetic effort, it allows the incorporation of non-natural amino acids.20 A third approach involves proteomics-based methods.21 These methods have received increasing attention over the last decade owing to steady improvements in the sensitivity of mass spectrometry (MS) techniques. Currently, most proteomics-based studies focus on identifying cleavage events at the protein level. They either enrich the neo-N-termini generated by the protease, e.g. by selective biotinylation,22-24 use isotope labels to distinguish substrates from non-substrates,25 or separate cleaved substrates from uncleaved ones by SDS-PAGE (as in PROTOMAP)26 or by chromatography-based steps, as in the hydrophobicity-based COFRADIC (COmbined FRActional DIagonal Chromatography)27 or the charge-based ChaFRADIC (Charge-based FRActional DIagonal Chromatography).28 In a very recent report, Vidmar et al. developed a label-free method (DIPPS) for profiling protease specificity using an in-gel digestion workflow.29 This method is straightforward, but it is restricted to proteases that cleave each protein multiple times, so that the resulting peptides are small enough for gel extraction and MS analysis. Hence, it may not be applicable to all proteases. Peptide-centric methods avoid this limitation. A peptide-centric proteomics-based method called PICS (Proteomic Identification of protease Cleavage Sites) introduced the concept of proteome-derived peptide libraries.30, 31 With limited time and resources, it enables the generation of a peptide library containing a rich mixture of sequences from which the substrate specificity can be determined.
The original PICS protocol uses biotinylation followed by affinity enrichment,30 whereas a recently published modified version skips enrichment and instead uses isotope labeling by reductive dimethylation.31 To circumvent the use of isotope labels or affinity tags, we here report a novel, label-free, proteomics-based protease specificity profiling technique that uses “charge-synchronized” proteome-derived peptide libraries. The enrichment of the cleaved sequences is based on ChaFRADIC,28 but is performed here in a simple pipette tip-based format (ChaFRAtip).32 It separates cleaved peptides from uncleaved background peptides that would otherwise compete for detection in MS experiments. Here, we validate this novel method using the proteases GluC, caspase-3, chymotrypsin, MMP-1 and cathepsin G. The specificity consensus of these proteases was determined from several hundred to almost 2000 detected cleavage events per protease and agreed well with their reported specificities. We also found a novel intrinsic preference of cathepsin G for Asn at the P1 subsite, a feature that has been largely overlooked in previous studies of this protease.

EXPERIMENTAL SECTION

In silico digest
All protein sequences from E. coli K12 (UniProt release of July 2015, 4433 entries) were processed with a Python script (Theo_Dig_Dbase.py, Supplemental File S1) that cuts after every Arg and returns all possible ArgC-digested peptides. It then calculates the theoretical net charges (TNCs) at pH 2.7 of all peptides with a minimum length of 6 amino acids, assuming that all N-termini and lysine side chains are acetylated. Results of this in silico digest are shown in Supplemental Table S1.

Generation of the proteome-derived peptide library
Escherichia coli MG1655 was grown in Luria–Bertani (LB) medium to an OD600 of 0.7-1.0. Cells were lysed by ultrasonication in lysis buffer (100 mM HEPES pH 7.5, 10 mM EDTA, with 1 mM PMSF and 10 µM E64 freshly added). Lysates were cleared of cell debris by centrifugation for 30 min at 10,000 g and 4 °C. The supernatant was collected and ultracentrifuged (100,000 g, 4 °C, 45 minutes). The resulting supernatant (soluble fraction) was used throughout this work, and its total protein concentration was determined using the bicinchoninic acid (BCA) assay (Thermo Scientific). For peptide library generation, typically 200-250 µg of total protein was processed. Proteins were reduced with 5 mM DTT for 1 hour at 25 °C and subsequently carbamidomethylated by incubation with 20 mM iodoacetamide (IAA) for 30 min at room temperature in the dark. Protein precipitation, subsequent resolubilization in 100 mM HEPES pH 8.0 and acetylation of free N-termini were carried out as described before.28 For digestion into peptides, acetonitrile and CaCl2 were added to final concentrations of 10% (v/v) and 2 mM, respectively.


Sequencing-grade trypsin (Promega, 1:20 (w:w) enzyme:protein) was used to digest the sample at 37 °C overnight. From this step onwards, only low-protein-binding Eppendorf tubes were used. Complete digestion was verified by monolithic reversed-phase chromatography, as described previously.33 After digestion, the generated peptides were desalted using C18 solid-phase extraction cartridges (Sep-Pak tC18 1cc, Waters) according to the manufacturer’s instructions. Peptides were dried under vacuum, resolubilized in 100 mM HEPES pH 8.0 and acetylated to block the newly generated free N-termini. Next, a similar desalting step with a Sep-Pak cartridge was performed and the peptides were dissolved in SCX buffer A (10 mM KH2PO4, pH 2.7, 20% (v/v) ACN). The peptide library was then ready for the first separation based on TNC at pH 2.7.

ChaFRAtip preparation
The SCX separations were performed in a 200 µL pipette tip fitted with a frit (Whatman® glass microfiber filter GF/F, 47 mm diameter circles, Cat. No. 1825-047) and loaded with 25 µL of a 0.2% suspension of strong cation exchange material (polysulfoethyl A in SCX buffer A). Typically, one ChaFRAtip was used to separate 60-100 µg of peptides.

First dimension of strong cation exchange separation (SCX1)
Two SCX buffers were used throughout the ChaFRAtip procedure:
- SCX buffer A (10 mM KH2PO4, 20% acetonitrile, pH 2.7)
- SCX buffer B (same as A with 250 mM KCl)
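Because SCX buffer B differs from SCX buffer A only by its 250 mM KCl content, the salt concentration of any elution step expressed as a percentage of buffer B in buffer A follows by simple proportion. The short Python sketch below illustrates this arithmetic; the constant and the function name `kcl_mm` are hypothetical and not part of the authors' scripts.

```python
# Hypothetical helper (not from the authors' scripts): SCX buffers A and B share
# 10 mM KH2PO4 and 20% acetonitrile at pH 2.7; buffer B additionally has 250 mM KCl.
BUFFER_B_KCL_MM = 250.0


def kcl_mm(percent_b: float) -> float:
    """KCl concentration (mM) of an elution buffer containing `percent_b` % SCX buffer B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must lie between 0 and 100")
    return BUFFER_B_KCL_MM * percent_b / 100.0


if __name__ == "__main__":
    # Percentages of buffer B used in the SCX1 and SCX2 elution steps.
    for pct in (0, 15, 20, 35, 40):
        print(f"{pct:>3}% SCX buffer B -> {kcl_mm(pct):.1f} mM KCl")
```

For example, the 15% step used to elute Fraction B corresponds to 37.5 mM KCl.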

For every step, the elution buffer is given as the percentage of SCX buffer B in SCX buffer A. The ChaFRAtip was equilibrated with 150 µL of 0% SCX buffer B and centrifuged for 7 minutes at 1000 g. Next, the samples (in 0% SCX buffer B; volume 50 µL) were loaded onto the ChaFRAtip and centrifuged at 700-800 g for 7 minutes. The eluate was loaded onto the tip again and eluted by centrifugation, and this process was repeated once more to ensure binding to the SCX material. Next, the tip was washed with 100 µL of 0% SCX buffer B.

The eluate from this wash was pooled with the flow-through and labeled Fraction A, representing peptides with a TNC of +1. Fraction B (TNC +2) was eluted in two steps: 100 µL of 15% SCX buffer B followed by 100 µL of 0% SCX buffer B. Since our library contained mostly peptides with TNC +1 and +2, we did not proceed with further elution steps for higher charge states. Fractions A and B were concentrated under vacuum until their volume was reduced by at least two thirds and then desalted with Oligo R3 reversed-phase resin (Applied Biosystems) according to a previously described procedure.34 The synchronized peptide library was stored in 100 mM HEPES pH 8.0 at -20 °C until further use. For quality control of the SCX1 fractions, typically 100-300 ng was used for LC-MS/MS analysis.

Digestion by test protease
For each test protease, three biological replicates of the ChaFRAtip-based protease specificity profiling were performed. In each assay, 20-30 µg of peptide library from Fraction A and 5-10 µg from Fraction B were used. For GluC (sequencing-grade P8100S, NEB), a 1:10 (w:w) enzyme:peptide ratio and 16 hours of digestion at 37 °C in 100 mM HEPES pH 8.0 were used. Caspase-3, expressed and purified as described,35 was incubated at 1:10 (w:w) enzyme:peptide library for 3 hours at 37 °C in modified caspase buffer (100 mM HEPES pH 7.4, 100 mM NaCl, 1 mM EDTA, 10 mM DTT, 0.1% CHAPS and 10% (w/v) sucrose) as previously described.36 Chymotrypsin (sequencing-grade C6423, Sigma) was incubated in 100 mM HEPES pH 8.0 and 10 mM CaCl2 at a 1:100 (w:w) enzyme:peptide ratio at 30 °C for 5 hours. For cathepsin G (BML-SE2830100, Enzo Life Sciences), a 1:25 (w:w) enzyme:peptide ratio and 3 hours of digestion at 37 °C in PBS pH 7.2 were used. MMP-1 (SRP3117, Sigma) was incubated at 1:100 (w:w) enzyme:peptide library for 3 hours at 37 °C in digestion buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 5 mM CaCl2 and 0.5 mM ZnCl2).
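The enzyme:peptide ratios above translate directly into enzyme amounts per assay. A minimal Python sketch follows, assuming per-assay peptide amounts picked from the stated ranges (20-30 µg for Fraction A, 5-10 µg for Fraction B); the dictionary and function names are hypothetical.

```python
# Enzyme:peptide ratios (w:w) from the digestion conditions, stored as the
# denominator x of 1:x. Names are illustrative, not from the authors' scripts.
RATIO_DENOMINATOR = {
    "GluC": 10,
    "caspase-3": 10,
    "chymotrypsin": 100,
    "cathepsin G": 25,
    "MMP-1": 100,
}


def enzyme_amount_ug(protease: str, peptide_ug: float) -> float:
    """Enzyme mass (µg) for `peptide_ug` µg of peptide library at the stated ratio."""
    if peptide_ug < 0:
        raise ValueError("peptide amount must be non-negative")
    return peptide_ug / RATIO_DENOMINATOR[protease]


if __name__ == "__main__":
    fraction_a_ug, fraction_b_ug = 25.0, 7.5  # example values within the stated ranges
    for name in RATIO_DENOMINATOR:
        a = enzyme_amount_ug(name, fraction_a_ug)
        b = enzyme_amount_ug(name, fraction_b_ug)
        print(f"{name:>12}: {a:.2f} µg (Fraction A), {b:.3f} µg (Fraction B)")
```

Storing the denominator rather than a pre-computed fraction keeps the values identical to the ratios written in the text.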
At the end of the incubation, protease activity was quenched by adding formic acid to a final concentration of 2%, and the sample was kept at -20 °C until further processing. Samples were desalted using Oligo R3 reversed-phase resin as described in the previous section and resolubilized in SCX buffer A.

Second dimension of strong cation exchange separation (SCX2)


The equilibration and sample loading of the ChaFRAtip were performed as described above for the SCX1 dimension. Fractional elution of the ChaFRAtip was performed by centrifugation for 7 min at 1000 g, or longer if needed. Elution steps are described in Table 1.

Table 1. ChaFRAtip elution scheme during the SCX2 step

| Digested fraction | Elution fraction | Conditions | Action |
|---|---|---|---|
| A | Fraction A0 | Flow-through | Discard |
| A | Fraction A1 | 185 µL 0% SCX buffer B | Discard |
| A | Fraction A2 | 200 µL 15% SCX buffer B; 200 µL 0% SCX buffer B | Pool and concentrate |
| B | Fraction B0 | Flow-through | Concentrate |
| B | Fraction B1 | 150 µL 0% SCX buffer B | Concentrate |
| B | Fraction B2 | 150 µL 15% SCX buffer B; 150 µL 0% SCX buffer B | Discard |
| B | Fraction B3 | 150 µL 20% SCX buffer B; 150 µL 35% SCX buffer B; 150 µL 0% SCX buffer B; 150 µL 40% SCX buffer B | Pool and concentrate |
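Table 1 can also be encoded as a small data structure, which makes explicit which SCX2 fractions are kept and how much liquid each pooled fraction contains before concentration. The sketch below is illustrative only: the names are hypothetical, flow-through volumes are not counted because the table does not specify them, and the step order for Fraction B3 follows the table as given.

```python
# Table 1 encoded as data. Each elution step is (volume in µL, % SCX buffer B);
# flow-through fractions have no listed elution steps.
SCX2_SCHEME = {
    "A0": {"steps": [], "action": "discard"},
    "A1": {"steps": [(185, 0)], "action": "discard"},
    "A2": {"steps": [(200, 15), (200, 0)], "action": "pool and concentrate"},
    "B0": {"steps": [], "action": "concentrate"},
    "B1": {"steps": [(150, 0)], "action": "concentrate"},
    "B2": {"steps": [(150, 15), (150, 0)], "action": "discard"},
    "B3": {"steps": [(150, 20), (150, 35), (150, 0), (150, 40)],
           "action": "pool and concentrate"},
}
BUFFER_B_KCL_MM = 250.0  # SCX buffer B = buffer A + 250 mM KCl


def collected_fractions(scheme: dict) -> list[str]:
    """Fractions that are kept (concentrated) rather than discarded."""
    return [name for name, entry in scheme.items() if entry["action"] != "discard"]


def elution_volume_ul(scheme: dict, name: str) -> int:
    """Total elution volume (µL) pooled into a fraction, excluding flow-through."""
    return sum(volume for volume, _ in scheme[name]["steps"])


def max_kcl_mm(scheme: dict, name: str) -> float:
    """Highest KCl concentration (mM) applied while collecting a fraction."""
    percents = [pct for _, pct in scheme[name]["steps"]]
    return BUFFER_B_KCL_MM * max(percents, default=0) / 100.0


if __name__ == "__main__":
    for fraction in collected_fractions(SCX2_SCHEME):
        print(fraction, elution_volume_ul(SCX2_SCHEME, fraction), "µL,",
              max_kcl_mm(SCX2_SCHEME, fraction), "mM KCl max")
```

The kept fractions returned by `collected_fractions` (A2, B0, B1 and B3) are the four fractions taken forward to MS analysis.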


The four elution fractions A2, B0, B1 and B3 were collected, concentrated under vacuum, desalted with Oligo R3 reversed-phase resin as described above, and finally resolubilized in 100 µL of 0.1% TFA in water for MS analysis.

In vitro cleavage assay of cathepsin G
Peptides based on the results of the cathepsin G specificity profiling experiments were synthesized by our in-house technical service using an automated solid-phase peptide synthesizer (Biotage Syro I). One nmol of each peptide was incubated either in PBS (digestion buffer) alone or in PBS with cathepsin G (1:25 (w:w) enzyme:peptide ratio) in a reaction volume of 10 µL. The mixtures were incubated for 3 hours at 37 °C and desalted using Oligo R3 reversed-phase resin as described above before MS analysis.

LC-MS/MS
For protease profiling experiments
Approximately 100-300 ng of total peptides from SCX1 fractions and 5 µL of the 100 µL resolubilized SCX2 fractions were analyzed by nano-LC-MS/MS using a Q Exactive Plus mass spectrometer (Thermo Scientific) coupled to a Dionex UltiMate 3000 nano RSLC system (Thermo Scientific). Peptides were separated at a flow rate of 250 nL/min with a gradient of increasing ACN concentration (2.5-50% over 60 min, with 0.1% formic acid). The analytical column was an Acclaim PepMap column (Thermo Scientific 164570; 3 µm particle size, 50 cm length, 75 µm inner diameter). The mass spectrometer was operated in data-dependent acquisition mode with survey scans over m/z 300-2000, followed by MS/MS of the 15 most abundant ions (Top 15) using higher-energy collisional dissociation (HCD) at a normalized collision energy of 27. The polysiloxane ion at m/z 371.1012 was used as lock mass. MS/MS spectra were acquired at a resolution of 15,000, with a dynamic exclusion time of 20 s and an isolation window of 1.2 m/z.
Automatic gain control (AGC) target values and maximum fill times were set to 1 × 10⁶ and 120 ms for MS and 5 × 10⁴ and 250 ms for MS/MS, respectively, with a minimum intensity threshold of 1.5 × 10⁴.

In vitro cleavage experiments of synthetic peptides by cathepsin G


Desalted digestion reactions were resolubilized in 50 µL of 0.1% TFA, of which 0.1% was injected onto an LC-MS/MS system (UltiMate 3000 nano RSLC system coupled to an Orbitrap mass spectrometer). A shorter gradient of increasing ACN concentration (2.5-50% over 35 min, with 0.1% formic acid) was used for peptide separation on the same analytical column as above. MS survey scans were acquired in the Orbitrap from m/z 300 to 1500 at a resolution of 60,000, using the polysiloxane ion at m/z 371.101236 as lock mass. The five most intense signals were subjected to collision-induced dissociation (CID) in the ion trap, with a dynamic exclusion of 8 s. CID spectra were acquired with a normalized collision energy of 35%. AGC target values were set to 10⁶ for MS1 and 10⁴ for ion trap MS2 scans, and maximum injection times were set to 100 ms for both full MS and MS2 scans. Peptides were identified in Xcalibur by manual inspection of MS1 and MS2 data.

Search engine settings
Raw MS data were searched against an Escherichia coli UniProt database (July 2015 release; 4433 forward entries). For peptide identification, Proteome Discoverer version 1.4 (Thermo Scientific) was used with Mascot 2.4 (Matrix Science) as search engine and the Target Decoy PSM Validator. Mascot search parameters were as follows: no enzyme specificity (“none”); Lys acetylation as fixed modification; peptide N-terminal acetylation as fixed modification for SCX1 samples and as dynamic modification for SCX2 samples. In both cases, carbamidomethylation of Cys was set as fixed and oxidation of Met as variable modification. Mass tolerances were set to 20 ppm for MS and 0.02 Da for MS/MS. Identified peptides were filtered for high confidence, corresponding to a false discovery rate (FDR) of 1% at the PSM level, a Mascot ion score threshold of 20 and a search engine rank of 1.
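The PSM-level filters above (1% FDR, Mascot ion score of at least 20, search engine rank 1) can be illustrated with a generic target-decoy sketch. This is an assumption-laden illustration, not the algorithm of the Target Decoy PSM Validator: it estimates the FDR as the ratio of decoy to target PSMs above a score cutoff (as for separate target and decoy searches) and picks the lowest cutoff that satisfies the FDR limit. The `PSM` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PSM:
    """Hypothetical minimal peptide-spectrum match record."""
    sequence: str
    ion_score: float
    rank: int
    is_decoy: bool


def fdr_score_cutoff(psms: list[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff at which decoys/targets among rank-1 PSMs is <= max_fdr."""
    ranked = sorted((p for p in psms if p.rank == 1),
                    key=lambda p: p.ion_score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, psm in enumerate(ranked):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM sharing this score, so ties are kept together.
        last_of_score = i + 1 == len(ranked) or ranked[i + 1].ion_score != psm.ion_score
        if last_of_score and targets and decoys / targets <= max_fdr:
            cutoff = psm.ion_score
    return cutoff


def high_confidence(psms: list[PSM], max_fdr: float = 0.01,
                    min_score: float = 20.0) -> list[PSM]:
    """Rank-1 target PSMs passing both the FDR-derived cutoff and the ion score threshold."""
    cutoff = fdr_score_cutoff(psms, max_fdr)
    if cutoff is None:
        return []
    threshold = max(cutoff, min_score)
    return [p for p in psms
            if p.rank == 1 and not p.is_decoy and p.ion_score >= threshold]
```

Taking the lowest passing cutoff is equivalent to keeping PSMs whose q-value is at most `max_fdr`; with real data, the decoy hits would come from searching reversed or shuffled sequences.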
The datasets can be accessed via ProteomeXchange (PXD007556).37

Data filter settings
Protein and peptide identification result files from Proteome Discoverer were exported to Excel. TNCs at pH 2.7 of high-confidence PSMs were calculated from the identified peptide sequences. The data were converted to CSV format for further use in our Python scripts (Supplemental Files S2 and S3, cleavage_seq_selection.py and cs_sel_exe.py). Briefly, the first Python script performs the following steps (see also Figure 1C and the Results section):
1. Combine all identified peptides in the MS data sets of the four analyzed SCX2 fractions into a non-redundant list and attach information about their corresponding elution fractions.
2. Sort the peptides into prime and non-prime: a peptide ending with Arg is sorted to prime, otherwise to non-prime.
3. a) Remove from the non-prime list: protein C-terminal peptides. b) Remove from the prime list: protein N-terminal peptides and full ArgC peptides.
4. a) Remove from the non-prime list peptides with a TNC of 2 or higher; remove from the prime list peptides with a TNC lower than 2. b) Remove from the non-prime list peptides eluted in fraction A2 or B3; remove from the prime list peptides eluted in fraction B0 or B1.
5. Remove from the prime list peptides with acetylated N-termini.
6. Remove from both lists peptides shorter than 6 amino acids.
7. Reconstitute cleavage sequences from the prime and non-prime lists by searching the E. coli protein database, and combine them into a non-redundant list of sequences.
All of these steps are defined as Python functions in the first file; the second file is an executor script that calls these functions. Names of input and output files can be modified as needed. Each MS result (number of cleavage sequences, % correct cleavages, charge separation, etc.) was averaged over three biological replicates.

Consensus sequence determination
iceLogo representations38 of protease specificity were based on the total non-redundant cleavage sequences from each single experiment. To determine the substrate consensus sequence, the list of non-redundant cleavage sequences was submitted to iceLogo using the E. coli (strain K12) proteome as reference set. For special peptide libraries generated from, e.g., a secretome, membrane proteome or organelle-specific proteome, the peptide dataset from SCX1 MS measurements can be used directly as the reference set in iceLogo instead. As an example, we regenerated consensus sequences for the five test proteases using a reference set consisting of all non-redundant peptides from our own SCX1 data sets (Supplemental Figure S3). These iceLogo consensus sequences differ little, if at all, from those presented in Figure 3, which were obtained with the E. coli K12 reference set. In all iceLogos, only amino acids that are significantly (P < 0.05) enriched or disfavored at a given position relative to their average occurrence in the reference set are shown. At each position, an amino acid is considered regulated if its Z-score falls outside the 95% confidence interval, where the Z-score expresses by how many standard deviations its frequency (X) deviates from the mean µ (the frequency of that amino acid at that position in the reference set).39 The size of an amino acid letter reflects the difference in its frequency between the experimental and the reference set. The default iceLogo amino acid color code was used. Amino acids that were not identified are colored pink. As a reference for our experiments, correct cleavage sequences are defined according to the MEROPS database as follows: P1 (E or D) for GluC (MEROPS ID S01.269); P1 (D) for caspase-3 (MEROPS ID C14.003); and P1 (L, F, Y, M, H and W) for chymotrypsin (MEROPS ID S01.001). For cathepsin G, P1 (L, F, Y, M, H and N), based on both MEROPS (ID S01.133) and our experimental data. For MMP-1, P1’ (L, I, M), based on both MEROPS (ID M10.001) and Eckhard et al., 2016.40
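The significance criterion above can be sketched for a single position of aligned cleavage sequences (for example P4-P4' octamers). The sketch assumes a binomial standard deviation, sqrt(mu * (1 - mu) / n) for n cleavage sequences, as one common way to express how far an observed frequency X deviates from the reference frequency mu; iceLogo's own implementation may differ in detail, and all names here are hypothetical.

```python
import math
from collections import Counter

Z_CRITICAL_95 = 1.959964  # bounds of the two-sided 95% confidence interval


def z_score(count: int, n: int, reference_freq: float) -> float:
    """Z-score of `count` occurrences among n sequences vs. a reference frequency (binomial SD assumed)."""
    observed_freq = count / n
    sd = math.sqrt(reference_freq * (1.0 - reference_freq) / n)
    return (observed_freq - reference_freq) / sd


def regulated_residues(cleavage_seqs: list[str], position: int,
                       reference_freqs: dict[str, float]) -> dict[str, float]:
    """Residues at `position` whose Z-score lies outside the 95% confidence interval."""
    n = len(cleavage_seqs)
    counts = Counter(seq[position] for seq in cleavage_seqs)
    regulated = {}
    for residue, mu in reference_freqs.items():
        if not 0.0 < mu < 1.0:
            continue  # Z-score undefined for a reference frequency of 0 or 1
        z = z_score(counts.get(residue, 0), n, mu)
        if abs(z) > Z_CRITICAL_95:
            regulated[residue] = z  # positive: enriched; negative: disfavored
    return regulated
```

Positive Z-scores correspond to enriched residues and negative ones to disfavored residues, matching the two directions shown in the iceLogos.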


RESULTS

ChaFRAtip workflow with proteome-derived peptide libraries – Our protease specificity profiling method is based on three main principles: (i) the theoretical net charge (TNC) of a peptide at pH 2.7 depends on its N-terminus and the presence of (generally a few) positively charged amino acid residues (lysine, arginine and histidine); (ii) the daughter peptides resulting from proteolysis of the peptide library by a protease of interest have a different charge state than the parental peptides because of the generated neo-N-termini and/or the loss of positively charged residues (see Figure 1B); (iii) peptides with different charge states can be efficiently separated using ChaFRAtip (Figure 1A and B).
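Principle (i), together with the in silico digest described in the Experimental Section, can be sketched in a few lines of Python. This is not the authors' Theo_Dig_Dbase.py; it assumes cleavage after every Arg (no missed cleavages, no proline rule), treats carboxyl groups as neutral at pH 2.7, and counts +1 for every free N-terminus, Lys, Arg and His. With acetylated N-termini and lysines, the TNC therefore reduces to the number of Arg plus His residues. All function names are hypothetical.

```python
from collections import Counter


def argc_digest(sequence: str) -> list[str]:
    """Cut after every Arg and return all resulting peptides in order."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "R":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # protein C-terminal peptide without Arg
        peptides.append(sequence[start:])
    return peptides


def tnc_ph27(peptide: str, free_n_terminus: bool = False, free_lys: bool = False) -> int:
    """Integer TNC at pH 2.7; the defaults model acetylated N-termini and Lys."""
    charge = peptide.count("R") + peptide.count("H")
    if free_lys:
        charge += peptide.count("K")
    if free_n_terminus:
        charge += 1
    return charge


def tnc_distribution(proteins: dict[str, str], min_length: int = 6) -> Counter:
    """Count ArgC peptides (length >= min_length) per TNC state across a proteome."""
    counts = Counter()
    for sequence in proteins.values():
        for peptide in argc_digest(sequence):
            if len(peptide) >= min_length:
                counts[tnc_ph27(peptide)] += 1
    return counts


if __name__ == "__main__":
    toy_proteome = {"P1": "MKHLAERGDSTVIKRPEDFGHHK", "P2": "AARSTLLNDR"}
    print(argc_digest(toy_proteome["P1"]))
    print(tnc_distribution(toy_proteome))
```

Applied to a full FASTA-derived dictionary of protein sequences, `tnc_distribution` gives the kind of charge-state breakdown summarized in Supplemental Table S1, under the simplifications stated above.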

Figure 1. Detailed workflow for protease specificity profiling using ChaFRAtip and “charge-synchronized” proteome-derived peptide libraries. (A) Generation of a “charge-synchronized” peptide library by a first dimension of separation (SCX1). Acetylation of lysine residues and N-termini, together with the ArgC cleavage specificity, leads to a proteome-derived peptide library composed mostly of peptides with a theoretical net charge of 1+ and 2+ (at pH 2.7), which can be separated in a tip with SCX material. (B) Digestion by a protease of interest is followed by a second dimension of separation (SCX2) and subsequent MS analysis. Conversion of 1+ and 2+ peptides into cleavage fragments yields daughter peptides with a different charge state from their parental peptides. This charge shift enables separation of daughter peptides from the large background of uncleaved peptides. (C) Data analysis workflow. Briefly, identified peptides are sorted into prime or non-prime according to their C-terminal amino acid residue. Subsequent peptide filters are applied to exclude uncleaved background and other contaminating peptides. Finally, the prime and non-prime peptides identified by MS are reconstituted to the parental sequences prior to cleavage (called “cleavage sequences”). MS-identified peptides are depicted in bold black, while the sequences preceding and following them in the protein sequence are colored gray.
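The data-analysis logic of Figure 1C (steps 2 to 7 of the data filter settings in the Experimental Section) can be sketched as below. This is a simplified illustration, not the authors' cleavage_seq_selection.py: the `Peptide` fields are hypothetical, protein-terminal and full-ArgC status are assumed to be precomputed from the database, each record carries a single elution fraction, and step 7 is reduced to extracting a P4-P4' window around the scissile bond.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record of one identified peptide."""
    sequence: str            # one-letter amino acid sequence
    fraction: str            # SCX2 elution fraction: "A2", "B0", "B1" or "B3"
    tnc: int                 # theoretical net charge at pH 2.7
    nterm_acetylated: bool
    protein_n_terminal: bool = False
    protein_c_terminal: bool = False
    full_argc: bool = False  # identical to an uncleaved ArgC library peptide


def split_prime_nonprime(peptides):
    """Step 2: peptides ending with Arg are prime-side, all others non-prime."""
    prime = [p for p in peptides if p.sequence.endswith("R")]
    nonprime = [p for p in peptides if not p.sequence.endswith("R")]
    return prime, nonprime


def filter_nonprime(peptides, min_length=6):
    """Steps 3a, 4a, 4b and 6 for the non-prime list."""
    return [p for p in peptides
            if not p.protein_c_terminal
            and p.tnc < 2
            and p.fraction not in {"A2", "B3"}
            and len(p.sequence) >= min_length]


def filter_prime(peptides, min_length=6):
    """Steps 3b, 4a, 4b, 5 and 6 for the prime list."""
    return [p for p in peptides
            if not (p.protein_n_terminal or p.full_argc)
            and p.tnc >= 2
            and p.fraction not in {"B0", "B1"}
            and not p.nterm_acetylated
            and len(p.sequence) >= min_length]


def cleavage_window(protein: str, peptide: str, is_prime: bool,
                    flank: int = 4) -> Optional[str]:
    """Simplified step 7: P4..P1 followed by P1'..P4' ('-' pads protein ends)."""
    start = protein.find(peptide)
    if start < 0:
        return None
    site = start if is_prime else start + len(peptide)  # index of the P1' residue
    padded = "-" * flank + protein + "-" * flank
    return padded[site:site + 2 * flank]
```

A non-prime and a prime fragment from the same cleavage reconstitute the same window, which is why the two lists can be merged into one non-redundant set of cleavage sequences.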

A ‘synchronized’ peptide library – To maximize the possibility of charge-based separation between parental peptides and cleavage fragments, while staying within the optimal ChaFRAtip separation range, we generated a proteome-derived peptide library with “synchronized” charge states. To this end, we acetylated the soluble E. coli proteome before and after trypsin digestion. This has three advantages.

First, acetylated lysine residues are not recognized by trypsin, so the digest acquires ArgC cleavage specificity and generates longer peptides. This should prevent the test protease from producing peptides that are too short for MS analysis.

Second, acetylation is a straightforward modification that can be performed with high efficiency using inexpensive reagents.

Third and most importantly, this treatment results in a library in which the majority of peptides (86%) possess a 1+ or 2+ TNC at pH 2.7, as shown by an in silico digest of the complete E. coli proteome (Figure 2A and Supplemental Table S1). These are peptides with one arginine residue, peptides with one arginine and one histidine residue, or missed-cleavage peptides with two arginine residues.

Conveniently, the peptides generated this way can be separated into fractions enriched in the 1+ or 2+ charge state. This is done by loading the peptide mixture onto SCX material in a pipette tip and eluting with an increasing potassium concentration (Figure 1A). A considerable portion of peptides with a 1+ TNC elutes in the flow-through during sample loading, so we combined the flow-through and the first elution into Fraction A. Fortunately, according to the in silico digest, uncharged peptides constitute only a very small fraction of the total peptide library (3%) (Supplemental Table S1). Hence, Fraction A will mostly consist of 1+ peptides.

Experimental data from MS analysis of three biological replicates of the SCX1 fractions confirmed this prediction. In Fraction A, 91% of peptides had a TNC of 1+, whereas only small percentages had a TNC of 0 (4%), 2+ (4%) or 3+ (1%). Note that these TNC values do not reflect the charge states detected during LC-MS. We observed a similar picture for the second elution (Fraction B): 80% of the peptides had a 2+ TNC and 1+ peptides made up 17%, while uncharged and 3+ peptides constituted 1% and 2%, respectively (Figure 2A).

Additional fractions were not collected. Peptides of a higher charge state (containing more than one histidine residue) represent less than 12% of the digested and acetylated proteome (Supplemental Table S1) and would therefore not substantially increase the number of protease cleavage sites. Importantly, the characteristics of the peptide library and of the SCX1 charge fractions are independent of the protease to be tested. The library can therefore be prepared once for multiple uses.

Peptide library digestion by proteases – Treating fractions A and B with a protease of interest cleaves peptide sequences that match the protease substrate specificity. Each cleavage generates two daughter peptides, one of which carries a free, positively charged neo-N-terminus. This charge increase of the cleaved substrate peptide is the core principle by which newly generated peptides can be selectively enriched from the background of uncleaved peptides (Figure 1B).
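The charge bookkeeping behind the library design can be illustrated with a short Python sketch. It is not the authors' Supplemental File S1, and the function names and toy sequences are hypothetical. The sketch rests on three simplifying assumptions:

- ArgC specificity is modelled as cleavage after every Arg, with no proline rule and no missed cleavages.
- Acetylated N-termini and Lys side chains carry no charge.
- At pH 2.7, only Arg and His contribute +1 each; Asp, Glu and the C-terminus are treated as neutral.

```python
"""Sketch: in silico ArgC digest and theoretical net charge (TNC) at pH 2.7.

Assumptions (hypothetical model, not the authors' script):
- cleavage after every Arg (no proline rule, no missed cleavages);
- acetylated N-termini and Lys side chains are uncharged;
- Arg and His count +1 each; Asp, Glu and the C-terminus are neutral.
"""
from collections import Counter


def argc_digest(protein: str) -> list[str]:
    """Cleave after every Arg; any C-terminal remainder is kept as a peptide."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "R":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def tnc_acetylated(peptide: str) -> int:
    """TNC at pH 2.7 for a peptide with acetylated N-terminus and Lys."""
    return peptide.count("R") + peptide.count("H")


def charge_distribution(proteins: list[str]) -> dict[int, float]:
    """Fraction of library peptides per TNC value."""
    counts = Counter(
        tnc_acetylated(p) for prot in proteins for p in argc_digest(prot)
    )
    total = sum(counts.values())
    return {charge: n / total for charge, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy "proteome" for illustration only; these are not E. coli sequences.
    toy = ["MKALHSERGVLDKR", "MSTPEHKRLLQA"]
    print(argc_digest(toy[0]))        # ['MKALHSER', 'GVLDKR']
    print(charge_distribution(toy))   # {0: 0.25, 1: 0.25, 2: 0.5}
```

Protein C-terminal peptides without Arg come out uncharged in this model. That is consistent with the small uncharged share of the library reported above.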
In fraction A, all prime peptides have a 2+ charge. They can therefore be separated from the uncharged non-prime peptides and the 1+ uncleaved peptides. In fraction B (mainly peptides with a His residue), two cases need to be considered:

- The histidine residue is located C-terminal to the cleavage site. Proteolysis then yields one uncharged and one 3+ daughter peptide, and both can be separated from the 2+ uncleaved peptides.
- The histidine residue is located N-terminal to the cleavage site. This results in a 1+ daughter peptide (non-prime peptide) and a 2+ daughter peptide (prime peptide). Here, only the non-prime peptide can be separated from the background of non-substrate peptides (see Table 2).

The four fractions from SCX2, called fractions A2, B0, B1 and B3, are individually analyzed by MS to maximize peptide identification (Figure 1B and Table 2).
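The charge shift described above can be checked with a minimal Python sketch. It assumes the same TNC model as for the library: acetylated N-terminus and Lys in the parent, with Arg and His counted as +1 at pH 2.7. The only addition is that cleavage exposes a protonated neo-N-terminus (+1) on the prime peptide.

The helper names and example peptides are hypothetical. The `shifted` flag marks a daughter peptide whose TNC differs from that of the uncleaved parent. That difference is the property that allows separation from the background; which fractions are actually collected is decided by the experimental design, not by this code.

```python
"""Sketch: predicted TNC of daughter peptides after cleavage of a library peptide.

Assumptions (hypothetical helpers, illustrative only):
- the parent has an acetylated N-terminus and acetylated Lys;
- Arg and His count +1 each at pH 2.7;
- the prime peptide gains a free, protonated neo-N-terminus (+1).
"""


def tnc(peptide: str, free_n_terminus: bool) -> int:
    return peptide.count("R") + peptide.count("H") + (1 if free_n_terminus else 0)


def cleave(parent: str, site: int) -> dict[str, tuple[str, int, bool]]:
    """Split `parent` between P1 (index site-1) and P1' (index site).

    For each daughter peptide, return (sequence, TNC, shifted), where
    `shifted` is True if its TNC differs from the uncleaved parent's TNC.
    """
    if not 0 < site < len(parent):
        raise ValueError("site must fall strictly inside the peptide")
    parent_tnc = tnc(parent, free_n_terminus=False)
    daughters = (("non-prime", parent[:site], False), ("prime", parent[site:], True))
    result = {}
    for name, seq, free_n in daughters:
        charge = tnc(seq, free_n)
        result[name] = (seq, charge, charge != parent_tnc)
    return result


if __name__ == "__main__":
    # 1+ parent (fraction A): the prime peptide shifts to 2+.
    print(cleave("AGLEVSTR", 4))
    # 2+ parent, His C-terminal to the site: 0 and 3+ daughters.
    print(cleave("AGLEVHSTR", 4))
    # 2+ parent, His N-terminal to the site: only the 1+ non-prime peptide shifts.
    print(cleave("AHLEVSTR", 4))
```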

Table 2: Predicted parameters of selected SCX2 fractions

Predicted parameter           | Fraction A2   | Fraction B0 | Fraction B1     | Fraction B3
Daughter peptide (schematic)  | H3N+ … Arg+   | AcHN …      | AcHN … His+     | H3N+ … His+ … Arg+
Prime or non-prime?           | Prime         | Non-prime   | Non-prime       | Prime
Theoretical net charge (TNC)  | +2            | 0 and +1    | +1              | +3

Peptide filter – Ideally, the four collected fractions would contain only prime and non-prime peptides (i.e., cleavage products of the protease of interest). These two types of peptides can be distinguished by their C-terminal amino acid: a peptide ending with Arg should be a prime peptide; otherwise, it is a non-prime peptide.

However, two kinds of peptide also end with Arg and may contaminate the collected fractions: peptides derived from native protein N-termini, and uncleaved peptides that have slipped through the SCX selection procedure. In addition, uncleaved peptides derived from protein C-termini could act as false non-prime peptides. We therefore apply several bioinformatic filtering steps, implemented in a straightforward Python script, to remove these contaminants from the list of identified peptides (Figure 1C; see the Experimental Section for details).

Finally, the script combines the lists of prime and non-prime peptides and looks up the sequences flanking them in the database of E. coli protein sequences. This restores the sequence information of the parental peptides that underwent proteolysis by the protease of interest (the “cleavage sequences”). The list of non-redundant cleavage sequences is then submitted to iceLogo38 to generate a consensus sequence that represents the protease specificity profile.
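The classification, filtering and reconstruction logic described above can be sketched in Python. This is not the authors' Supplemental File S2, whose exact filters are given in the Experimental Section; the helper names and the toy database are hypothetical. The sketch makes these assumptions:

- Each peptide is mapped only to its first match in the database.
- A peptide starting right after an initiator Met counts as a native N-terminus.
- Cleavage sequences are P6-P6′ windows, padded with `-` at protein ends.

```python
"""Sketch of prime/non-prime classification, contaminant filters and
reconstruction of P6-P6' cleavage sequences.

Hypothetical helpers; NOT the authors' Supplemental File S2.
Assumptions: first database match only; a peptide starting after an
initiator Met counts as a native N-terminus.
"""
from typing import Optional

WINDOW = 6  # residues on each side of the scissile bond (P6-P6')


def locate(peptide: str, proteins: dict[str, str]) -> Optional[tuple[str, int]]:
    """Return (accession, 0-based start) of the first match, or None."""
    for accession, sequence in proteins.items():
        start = sequence.find(peptide)
        if start >= 0:
            return accession, start
    return None


def cleavage_sequence(peptide: str, proteins: dict[str, str]) -> Optional[str]:
    """Return the P6-P6' window of the inferred cleavage site, or None if filtered."""
    hit = locate(peptide, proteins)
    if hit is None:
        return None
    accession, start = hit
    seq = proteins[accession]
    end = start + len(peptide)
    if peptide.endswith("R"):
        # Prime peptide: the scissile bond lies just before its N-terminus.
        native_n_term = start == 0 or (start == 1 and seq[0] == "M")
        uncleaved = start > 0 and seq[start - 1] == "R"
        if native_n_term or uncleaved:
            return None
        site = start
    else:
        # Non-prime peptide: the scissile bond lies just after its C-terminus.
        if end == len(seq):
            return None  # uncleaved protein C-terminal peptide
        site = end
    left = seq[max(0, site - WINDOW):site].rjust(WINDOW, "-")
    right = seq[site:site + WINDOW].ljust(WINDOW, "-")
    return left + right


if __name__ == "__main__":
    db = {"P0": "MSAKLEDGTRAVLEEQSPIRGHHK"}
    identified = ["AVLEEQSPIR", "AVLE", "MSAKLEDGTR", "GHHK", "EQSPIR"]
    windows = {cleavage_sequence(p, db) for p in identified} - {None}
    print(windows)  # {'TRAVLEEQSPIR'}
```

In this toy example, the prime peptide EQSPIR and the non-prime peptide AVLE point to the same scissile bond. Collecting the windows in a set therefore yields one non-redundant cleavage sequence. The other three identifications are removed by the filters:

- AVLEEQSPIR is an uncleaved library peptide.
- MSAKLEDGTR is the native N-terminus.
- GHHK is the uncleaved protein C-terminus.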


Figure 2: Monitoring the experimental workflow. (A) Charge distribution of fractions A and B in the first dimension of separation (SCX1), compared with that of an in silico ArgC digest. Error bars indicate the standard deviation of three independent experiments. (B), (C) Charge distribution and prime/non-prime fragment distribution of fractions A2, B0, B1 and B3 in the second dimension of separation (SCX2), from specificity profiling experiments using GluC as the test protease. Error bars indicate the standard deviation of three independent experiments. (D) Contributions of fractions A2, B0, B1 and B3 to the total cleavage sequences identified in a typical GluC specificity profiling experiment.

Initial proof-of-concept with GluC – In triplicate experiments, we used GluC as the first protease to evaluate the method and the data analysis strategy. The charge distribution in the SCX2 fractions agreed well with the expectations from the method design. 87% of peptides in fraction A2 were in the correct charge state, as were 88%, 80% and 86% of peptides in fractions B0, B1 and B3, respectively (Figure 2B). As predicted, non-prime peptides constituted the majority of fractions B0 and B1 (>89%), whereas prime peptides made up most of fractions A2 and B3 (>94%) (Figure 2C).

Each fraction contributed differently to the total identified cleavage events. The largest contribution came from fraction A2 (more than 65%), followed by B0 and B1 (14-16% each). Fraction B3 had the most limited contribution, but still accounted for up to 6% of total cleavage sites (Figure 2D).

Overall, from only 25 µg of the peptide library and four MS analyses totaling 9 hours of measurement, we identified 1504 ± 183 GluC cleavage sequences. Their average accuracy was 95%, judged by the proportion of correct P1 residues. As shown in Figure 3A, the iceLogo generated from these cleavage sequences clearly depicts the known specificity of GluC, including its preference, by several orders of magnitude, for Glu over Asp in the P1 position.

Substrate specificity profiling of other proteases – To further assess the applicability of our protease specificity profiling method, we chose the well-characterized proteases caspase-3, chymotrypsin, MMP-1 and cathepsin G. For each protease, the adapted ChaFRAtip strategy yielded several hundred to almost 2000 identified cleavage events. The resulting consensus sequences also matched well with the prototypical specificity of the respective proteases (Figure 3).

Experiments with caspase-3 resulted in the detection of approximately 650 non-redundant cleavage events. The dominant preference for Asp in the P1 position is supported by approximately 92% of the identified prime and non-prime peptides. The preferences at the other subsites (P4, P3, P2 and P1′) reported in previous studies41-44 were also confirmed (Figure 3B). This provides strong evidence that the strategy described here can substitute for combinatorial peptide libraries without compromise.45, 46

For chymotrypsin, more than 1000 detected cleavage sequences showed the well-known P1 specificity for large hydrophobic amino acids (Tyr, Trp, Phe, Leu, Met) as well as His,47 with an average of 83% correct cleavage sites. Moreover, we found a slight preference for Pro at P2, in line with previous research48, 49 and with commercial peptide substrates (Figure 3C).

To further challenge our method, we applied the workflow to matrix metalloproteinase-1 (MMP-1). Using only 25 µg of the peptide library, the digestion reactions yielded more than 1300 cleavage sites from three replicates. The consensus sequence generated from these cleavage sites almost identically replicated the MMP-1 specificity profile obtained from more than 300 cleavage sites using PICS.40 The experiments confirmed the following preferences (Figure 3D):

- Pro and small aliphatic residues in P3;
- Gln and larger residues in P2;
- the primary specificity determinant of Leu at P1′, together with Ile and, to a lesser extent, Met.

In our results, Lys also appeared as a preferred amino acid at P1′. This is unsurprising: in our workflow all lysine residues are acetylated, and the uncharged acetyl-lysine side chain resembles that of methionine, which also appears as a preferred residue at this position. Overall, this highlights that our method yields large numbers of cleavage sites from low sample amounts, even for proteases with a looser substrate specificity such as MMP-1.

Experiments with cathepsin G, a serine protease involved in antigen processing, resulted in almost 2000 cleavage events. We found the reported preference for hydrophobic amino acid residues at the P1 position and the partial propensity for Pro at P2 and Glu at P3 (Figure 4A).50-53 The iceLogo also revealed that cathepsin G has an intricate subsite specificity beyond P1-P3. In particular, cathepsin G preferentially accommodates mostly small or aliphatic residues such as Ile, Ala and Ser at the S1′ subsite, and the acidic residues Asp and Glu at the S2′ subsite (Figure 4A). This is supported by previous reports using synthetic peptides whose sequences fit this subsite preference.50, 51, 53
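The accuracy values quoted above (for example 95% for GluC and 83% for chymotrypsin) are judged by the proportion of cleavage sequences with a correct P1 residue. A minimal sketch of that metric follows. It assumes 12-residue P6-P6′ windows, so P1 is index 5. The helper name, the toy windows and the expected P1 set for GluC (Glu or Asp) are illustrative choices, not values taken from the authors' pipeline.

```python
"""Sketch: P1 'accuracy' of a set of P6-P6' cleavage sequences.

Assumes 12-residue windows, so P1 is index 5. The expected P1 residues
per protease are supplied by the user (illustrative values below).
"""
P1 = 5


def p1_accuracy(windows: list[str], expected_p1: set[str]) -> float:
    """Fraction of windows whose P1 residue is in `expected_p1`."""
    if not windows:
        raise ValueError("no cleavage sequences given")
    return sum(w[P1] in expected_p1 for w in windows) / len(windows)


if __name__ == "__main__":
    # Toy windows; the third one has Lys at P1 and therefore counts as incorrect.
    gluc_windows = ["TRAVLEEQSPIR", "GGSAADLKVTAR", "AAAAAKGGGGGG", "LLGQVEAITGPR"]
    print(p1_accuracy(gluc_windows, {"E", "D"}))  # 0.75
```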


Figure 3: Specificity profiling of four proteases (GluC, caspase-3, chymotrypsin and MMP-1), represented by iceLogos generated from the cleavage sequences identified by our ChaFRAtip method, with the Escherichia coli (strain K12) proteome as reference set. The iceLogos represent the P6 to P6′ sites, according to the Schechter and Berger nomenclature. Amino acids that are most frequently observed (above axis) and least frequently observed (below axis) are illustrated. The number of cleavage sites used to make each iceLogo is listed together with each protease name. In all iceLogos, only significant amino acids (P < 0.05) are shown. The size of an amino acid reflects the difference in its frequency in the experimental versus the reference set.
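The quantity shown in these iceLogos can be approximated by a simple per-position frequency difference. This difference is taken between the experimental cleavage sequences and a reference set. The Python sketch below is a simplified stand-in, not iceLogo itself. It uses the overall amino acid composition of the supplied proteome as the reference (an assumption) and does not reproduce iceLogo's statistics or its P < 0.05 filtering. The function names and toy data are hypothetical.

```python
"""Sketch: per-position frequency difference, experimental vs. reference.

Simplified stand-in for the quantity visualized by iceLogo; it does NOT
reproduce iceLogo's statistics (no P-value filtering). The reference is the
overall amino acid composition of the supplied proteome (an assumption).
"""
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins: list[str]) -> dict[str, float]:
    """Overall amino acid frequencies of the reference proteome."""
    counts = Counter(aa for seq in proteins for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def position_frequencies(windows: list[str]) -> list[dict[str, float]]:
    """Amino acid frequencies at each window position ('-' padding is skipped)."""
    freqs = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows if w[i] in AMINO_ACIDS)
        total = sum(counts.values())
        freqs.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return freqs


def frequency_difference(windows: list[str], reference: dict[str, float]) -> list[dict[str, float]]:
    """Percentage-point difference per position and amino acid."""
    return [
        {aa: 100 * (pos[aa] - reference[aa]) for aa in AMINO_ACIDS}
        for pos in position_frequencies(windows)
    ]


if __name__ == "__main__":
    ref = composition(["AAED"])
    diff = frequency_difference(["AE", "AD"], ref)
    print(diff[0]["A"], diff[1]["E"])  # 50.0 25.0
```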


Intriguingly, we observed a previously unreported preference for Asn at the P1 site, which accounted for almost 10% of all identified cleavage sequences. This suggests that these are bona fide cleavage events. To confirm that peptides with a P1 Asn can indeed be cleaved by cathepsin G, we synthesized three peptides based on the identified consensus sequences:

- a control peptide based on the overall consensus for cathepsin G (peptide D11);
- a peptide in which the P1 position was replaced by Asn (peptide E11);
- a peptide based exclusively on the consensus of the cleavage sequences with Asn at P1 (peptide F11). The lower panel of Figure 4A shows the iceLogo of this subset of cleavage sequences.

An in vitro cleavage experiment showed that cathepsin G was able to process all three synthetic peptides (Figure 4B and Supplemental Figure S2). Interestingly, peptide F11 was cleaved most efficiently, validating the newly identified preference for Asn at P1 as a genuine characteristic of cathepsin G (Figure 4).
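Selecting the P1 Asn subset and deriving a consensus from it can be sketched in a few lines of Python. The sketch assumes 12-residue P6-P6′ windows (P1 is index 5). Its consensus is simply the most frequent residue at each position, with ties broken alphabetically. This is a deliberate simplification of an iceLogo-based consensus, which weighs frequencies against a reference set. The helper names are hypothetical. Apart from the first window, which is taken from peptide F11, the toy windows are invented.

```python
"""Sketch: select cleavage sequences by P1 residue and build a naive consensus.

Assumptions: 12-residue P6-P6' windows (P1 = index 5). The consensus is the
most frequent residue per position (ties broken alphabetically), a
simplification of an iceLogo-derived consensus. Helper names are hypothetical.
"""
from collections import Counter

P1 = 5


def select_by_p1(windows: list[str], residue: str) -> list[str]:
    """Keep only windows with the given residue at P1."""
    return [w for w in windows if w[P1] == residue]


def naive_consensus(windows: list[str]) -> str:
    """Most frequent residue at each position."""
    consensus = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        # max() returns the first maximal item, so sorting breaks ties alphabetically
        consensus.append(max(sorted(counts), key=counts.__getitem__))
    return "".join(consensus)


if __name__ == "__main__":
    # First window: P6-P6' of peptide F11 (IDEFEVN|SDGDIAR); the others are toy data.
    windows = ["DEFEVNSDGDIA", "DEFAVNSEGDIA", "AVPEFLSAGKRA"]
    asn_subset = select_by_p1(windows, "N")
    print(len(asn_subset), naive_consensus(asn_subset))
```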

Figure 4: Validation of an intrinsic preference of cathepsin G for Asn at the P1 subsite. (A) Specificity profile of cathepsin G, represented by an iceLogo generated from all cleavage sequences identified by our ChaFRAtip method (N = 1792, upper), with the Escherichia coli (strain K12) proteome as reference set. It is compared with the iceLogo generated from all cleavage sequences with P1 Asn identified in the same experiment (N = 146, lower). Disfavored amino acids were omitted from the latter iceLogo for simplicity. (B), (C) Peptide F11 (sequence: IDEFEVN↓SDGDIAR) was analyzed by LC-MS/MS after a 3-hour incubation with cathepsin G (E:S = 1:25, w/w), or without protease as a control. The full-length peptide and its non-prime cleavage product were identified by their correct m/z in MS1 (right panels) and by MS2 fragmentation (data not shown) from the corresponding TIC peaks (left panels, boxed).
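The MS1 check in panels B and C relies on the expected m/z of the full-length peptide F11 and of its cleavage products. A minimal Python sketch of that arithmetic follows, using standard monoisotopic residue masses. It assumes unmodified peptides with free termini, because this section does not state whether the synthetic peptides carry terminal modifications. The function names are hypothetical.

```python
"""Sketch: monoisotopic m/z of peptide F11 and its cleavage products.

Assumes unmodified peptides with free termini (terminal modifications of the
synthetic peptides are not stated in this section).
"""
# Monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    f11 = "IDEFEVNSDGDIAR"
    non_prime, prime = f11[:7], f11[7:]  # IDEFEVN | SDGDIAR
    for name, pep in (("F11", f11), ("non-prime", non_prime), ("prime", prime)):
        print(name, pep, [round(mz(pep, z), 4) for z in (1, 2, 3)])
```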

DISCUSSION In this study, we described a straightforward, label-free method for protease specificity profiling, using “charge-synchronized” proteome-derived peptide libraries and a tailored ChaFRAtip procedure. The first part of the method, the generation of a “charge-synchronized” peptide library, is a critical step in the whole procedure. It determines whether the test protease is provided with a rich and unbiased mixture of peptides. Owing to the transparent principle of the tip-based separation method, it is straightforward to produce different charge fractions and also possible to check their quality. This helps minimize the risk of proceeding with a poor-quality library.

Here, we chose to acetylate N-termini and Lys side chains. This generates only two easy-to-analyze library fractions (i.e., those with a TNC of +1 and +2), which minimizes fractionation and instrument time. However, this procedure means that no basic amino acids are located in the middle of the cleavage sequences: lysines are acetylated, and arginines are only present at the C-terminal end due to the ArgC digest. Hence, proteases that require arginine or lysine around the scissile bond cannot be analyzed with the method described here.

This limitation may be overcome by omitting the acetylation steps. Without acetylation, an ArgC digest of the E. coli proteome contains approximately 79% peptides with a TNC between 2+ and 4+, which is well within the ChaFRAtip separation range32 (Supplemental Table S2). Because more fractions then need to be analyzed, this variant requires an extended ChaFRAtip separation procedure and longer LC-MS/MS analysis times (Supplemental Table S2 and Supplemental Figure S1).
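The effect of omitting acetylation on the charge distribution can be illustrated with a short Python sketch. It uses the same simplified TNC model at pH 2.7: cleavage after every Arg, Arg and His counted as +1, acidic groups treated as neutral. The difference is that, without acetylation, the free N-terminal amine and each Lys side chain also add +1. The helper names and toy sequences are hypothetical; the percentages in Supplemental Table S2 come from the authors' own in silico digest, not from this code.

```python
"""Sketch: TNC distribution with and without acetylation for an ArgC digest.

Simplified model at pH 2.7: cleavage after every Arg; Arg and His count +1;
without acetylation, the free N-terminus and each Lys also count +1; acidic
groups are neutral. Helper names and toy sequences are hypothetical.
"""
from collections import Counter


def argc_digest(protein: str) -> list[str]:
    """Cleave after every Arg; any C-terminal remainder is kept as a peptide."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "R":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def tnc(peptide: str, acetylated: bool) -> int:
    charge = peptide.count("R") + peptide.count("H")
    if not acetylated:
        charge += 1 + peptide.count("K")  # free N-terminal amine and Lys side chains
    return charge


def distribution(proteins: list[str], acetylated: bool) -> dict[int, float]:
    counts = Counter(tnc(p, acetylated) for prot in proteins for p in argc_digest(prot))
    total = sum(counts.values())
    return {z: n / total for z, n in sorted(counts.items())}


def share_between(dist: dict[int, float], low: int, high: int) -> float:
    """Fraction of peptides whose TNC lies in [low, high]."""
    return sum(frac for z, frac in dist.items() if low <= z <= high)


if __name__ == "__main__":
    toy = ["MKALHSERGVLDKR", "MSTPEHKRLLQA"]
    print(distribution(toy, acetylated=True))
    without = distribution(toy, acetylated=False)
    print(without, share_between(without, 2, 4))
```

Even in this toy case, dropping acetylation spreads the peptides over more charge states. That is why the non-acetylated variant needs more SCX fractions and more LC-MS/MS time.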


With the “charge-synchronized” 1+ and 2+ peptide libraries described here, the lack of Arg and Lys in the middle of cleavage sequences did not interfere with the overall specificity profiles of the five proteases studied. Human cathepsin G is a noteworthy case: this protease not only has chymotrypsin-like activity, but also acquired trypsin-like specificity during primate evolution.52 Naturally, we did not find Arg or Lys in the P1 position. Nevertheless, this did not prevent the generation of valuable information. The consensus sequence from our experiments fits very well with the sequence of a reported fluorogenic cathepsin G substrate that is selective over two closely related proteases (chymase and β2-tryptase).53

Although the tip-based SCX fractionation does not give perfect charge-state separations, the bioinformatic workflow efficiently filters out false-positive peptides that do not represent protease cleavage sites. We provide a straightforward Python script that can generate several intermediate outputs, so that experimenters have full access to the results of each selection or filtering step. Overall, this results in an easy and very robust method that detects hundreds of cleavage events even from low sample amounts.

The high number of cleavage sites partially originates from a notable feature of the strategy itself. It identifies not only peptides with new N-termini (prime peptides), as in most “N-terminomics” methods, but also non-prime peptides (originating from library fraction B). For example, for GluC more than 20% of the total identified cleavage sequences were contributed by the identification of non-prime peptides (Figure 2D).
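One way to quantify the added value of non-prime peptides is to merge both identification routes into non-redundant cleavage events. An event seen through both its prime and its non-prime peptide is then counted once. The Python sketch below shows this bookkeeping. It assumes each event is keyed by (protein accession, position of the scissile bond), which is a hypothetical representation. It is one possible accounting, not necessarily the one behind Figure 2D.

```python
"""Sketch: counting non-redundant cleavage events from prime and non-prime IDs.

Assumes each event is keyed by (protein accession, position of the scissile
bond); the representation and names are hypothetical.
"""


def contributions(prime_sites: set, non_prime_sites: set) -> dict:
    """Share of non-redundant events found via prime only, non-prime only, or both."""
    all_sites = prime_sites | non_prime_sites
    if not all_sites:
        raise ValueError("no cleavage events")
    n = len(all_sites)
    return {
        "events": n,
        "prime_only": len(prime_sites - non_prime_sites) / n,
        "non_prime_only": len(non_prime_sites - prime_sites) / n,
        "both": len(prime_sites & non_prime_sites) / n,
    }


if __name__ == "__main__":
    prime = {("P0", 14), ("P1", 33), ("P2", 8)}
    non_prime = {("P0", 14), ("P3", 51)}
    print(contributions(prime, non_prime))
```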
Except for the recently reported DIPPS29 and a recently modified PICS method,31 no reported protease specificity profiling method targets both prime and non-prime fragment peptides in a single experiment and separates them without a labeling technique. The ChaFRAtip method described here achieves this without complicated sample preparation and analysis. It therefore provides access to peptides that would be missed by many other terminomics methods. The resulting specificity profiles of four well-studied proteases match well with known consensus sequences. They provide firm evidence that the method is effective and applicable to various proteases.


Strikingly, the results revealed that cathepsin G is also able to cleave after Asn, a bona fide preference that has gone unnoticed in previous specificity studies. So far, only one study has reported a substrate of human cathepsin G with a P1 Asn: P. aeruginosa flagellin.54 Interestingly, we replaced the P1 residue in the general consensus sequence with Asn (peptide E11). This peptide was cleaved less efficiently by cathepsin G than the peptide based on the consensus derived only from cleavage sequences containing P1 Asn (peptide F11). This points to subsite cooperativity in cathepsin G.

CONCLUSIONS Overall, our ChaFRAtip method with charge-synchronized proteome-derived peptide libraries is a simple, label-free protease specificity profiling technique. It can be readily performed in any biochemistry or sample preparation lab, as it only requires tip-based fractionation. Its simplicity and lower investment in material and equipment compared with similar methods come without compromise in final efficiency. We therefore expect that this method can find wide application and will facilitate an increased understanding of protease specificity. This understanding may be used for the development of peptide-based substrates, inhibitors and chemical probes.

ASSOCIATED CONTENT Supporting Information: The following supporting information is available free of charge on the ACS website http://pubs.acs.org:
Supplemental Figure S1: Detailed workflow for ChaFRAtip specificity profiling without the acetylation step.
Supplemental Figure S2: Cleavage of synthetic peptides by cathepsin G.
Supplemental Table S1: Charge distribution of an in silico ArgC-digested E. coli proteome with acetylation of all amino groups.
Supplemental Table S2: Charge distribution of an in silico ArgC-digested E. coli proteome without acetylation. (PDF)
Supplemental File S1: Python script for in silico ArgC digestion of a proteome.
Supplemental File S2: Python script to apply the data filters.
Supplemental File S3: Python script that serves as an executor file to call the functions in the previous script.


AUTHOR INFORMATION Corresponding Author *Tel: +32-16-374517. E-mail: [email protected]

Author contributions: MTNN, GS, RPZ and SHLV designed research, MTNN performed research, MTNN and GS contributed new analytic tools, MTNN and SHLV analyzed data, MTNN and SHLV wrote the paper, RPZ and SHLV supervised research.

Notes The authors declare no competing interests.

ACKNOWLEDGMENT This study was supported by the Deutsche Forschungsgemeinschaft (DFG; VE 502/4-1), the Ministerium für Innovation, Wissenschaft und Forschung des Landes Nordrhein-Westfalen, the Senatsverwaltung für Technologie und Forschung des Landes Berlin, the Bundesministerium für Bildung und Forschung, and KU Leuven (C12/16/020). We also thank Dr. D. Kopczynski for help with the optimization of Python scripts and the Technical Service Bioanalytics of Leibniz Institute for Analytical Sciences ISAS for peptide synthesis.

ABBREVIATIONS AGC: automatic gain control; BCA assay: bicinchoninic acid assay; ChaFRADIC: Charge-based FRActional DIagonal Chromatography; ChaFRAtip: Charge-based fractional diagonal chromatography in tip format; COFRADIC: COmbined FRActional DIagonal Chromatography; DIPPS: direct in-gel profiling of protease specificity; FDR: false discovery rate; HCD: higher-energy collisional dissociation; PICS: Proteomic identification of protease cleavage sites; PSM: peptide-spectrum match; SCX: strong cation-exchange chromatography; TNC: theoretical net charge


Reference List
1. Turk, B., Targeting proteases: successes, failures and future prospects. Nat. Rev. Drug Discovery 2006, 5, (9), 785-99.
2. Loof, T. G. et al., Coagulation, an ancestral serine protease cascade, exerts a novel function in early immune defense. Blood 2011, 118, (9), 2589-98.
3. Salvesen, G. S., Caspases and apoptosis. Essays Biochem. 2002, 38, 9-19.
4. Villadangos, J. A. and Ploegh, H. L., Proteolysis in MHC class II antigen presentation: who's in charge? Immunity 2000, 12, 233-239.
5. Sarma, J. V. and Ward, P. A., The complement system. Cell Tissue Res. 2011, 343, (1), 227-35.
6. Olson, O. C. and Joyce, J. A., Cysteine cathepsin proteases: regulators of cancer progression and therapeutic response. Nat. Rev. Cancer 2015, 15, (12), 712-29.
7. De Strooper, B., Proteases and proteolysis in Alzheimer disease: a multifactorial view on the disease process. Physiol. Rev. 2010, 90, (2), 465-94.
8. Martinelli, P. and Rugarli, E. I., Emerging roles of mitochondrial proteases in neurodegeneration. Biochim. Biophys. Acta 2010, 1797, (1), 1-10.
9. Weiss-Sadan, T. et al., Cysteine proteases in atherosclerosis. FEBS J. 2017, 284, (10), 1455-1472.
10. Drag, M. and Salvesen, G. S., Emerging principles in protease-based drug discovery. Nat. Rev. Drug Discovery 2010, 9, (9), 690-701.
11. Jeffery, A. D. and Bogyo, M., Chemical proteomics and its application to drug discovery. Curr. Opin. Biotechnol. 2003, 14, 87-95.
12. Poreba, M. et al., Caspase substrates and inhibitors. Cold Spring Harbor Perspect. Biol. 2013, 5, (8), a008680.
13. Waldner, B. J. et al., Protease inhibitors in view of peptide substrate databases. J. Chem. Inf. Model. 2016, 56, (6), 1228-35.
14. Kasperkiewicz, P. et al., Emerging challenges in the design of selective substrates, inhibitors and activity-based probes for indistinguishable proteases. FEBS J. 2017, 284, (10), 1518-1539.
15. Schechter, I. and Berger, A., On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 1967, 27, (2), 157-162.
16. Rawlings, N. D. et al., Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2016, 44, (D1), D343-50.
17. Igarashi, Y. et al., CutDB: a proteolytic event database. Nucleic Acids Res. 2007, 35, D546-9.
18. Thornberry, N. A. et al., A combinatorial approach defines specificities of members of the caspase family and granzyme B: functional relationships established for key mediators of apoptosis. J. Biol. Chem. 1997, 272, (July 18), 17907-11.
19. O'Donoghue, A. J. et al., Global identification of peptidase specificity by multiplex substrate profiling. Nat. Methods 2012, 9, (11), 1095-100.
20. Rut, W. et al., Recent advances and concepts in substrate specificity determination of proteases using tailored libraries of fluorogenic substrates with unnatural amino acids. Biol. Chem. 2015, 396, (4), 329-37.
21. Vizovisek, M. et al., Current trends and challenges in proteomic identification of protease substrates. Biochimie 2016, 122, 77-87.
22. Timmer, J. C. et al., Profiling constitutive proteolytic events in vivo. Biochem. J. 2007, 407, (1), 41-8.
23. Timmer, J. C. et al., Structural and kinetic determinants of protease substrates. Nat. Struct. Mol. Biol. 2009, 16, (10), 1101-8.
24. Mahrus, S. et al., Global sequencing of proteolytic cleavage sites in apoptosis by specific labeling of protein N termini. Cell 2008, 134, (5), 866-76.
25. Kleifeld, O. et al., Isotopic labeling of terminal amines in complex samples identifies protein N-termini and protease cleavage products. Nat. Biotechnol. 2010, 28, (3), 281-8.
26. Dix, M. M. et al., Global mapping of the topography and magnitude of proteolytic events in apoptosis. Cell 2008, 134, (4), 679-91.
27. Staes, A. et al., Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nat. Protoc. 2011, 6, (8), 1130-41.
28. Venne, A. S. et al., Novel highly sensitive, specific, and straightforward strategy for comprehensive N-terminal proteomics reveals unknown substrates of the mitochondrial peptidase Icp55. J. Proteome Res. 2013, 12, (9), 3823-30.
29. Vidmar, R. et al., Protease cleavage site fingerprinting by label-free in-gel degradomics reveals pH-dependent specificity switch of legumain. EMBO 2017.
30. Schilling, O. and Overall, C. M., Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites. Nat. Biotechnol. 2008, 26, (6), 685-94.
31. Biniossek, M. L. et al., Identification of protease specificity by combining proteome-derived peptide libraries and quantitative proteomics. Mol. Cell. Proteomics 2016, 15, (7), 2515-2524.
32. Shema, G. et al., Simple, scalable and ultra-sensitive tip-based identification of protease substrates: novel insights into the dynamics of apoptosis. Mol. Cell. Proteomics 2018.
33. Burkhart, J. M. et al., Systematic and quantitative comparison of digest efficiency and specificity reveals the impact of trypsin quality on MS-based proteomics. J. Proteomics 2012, 75, (4), 1454-62.
34. Venne, A. S. et al., An improved workflow for quantitative N-terminal charge-based fractional diagonal chromatography (ChaFRADIC) to study proteolytic events in Arabidopsis thaliana. Proteomics 2015, 15, (14), 2458-69.
35. Stennicke, H. R. and Salvesen, G. S., Caspases: preparation & characterization. Methods 1999, 17, 313-319.
36. Stennicke, H. R. and Salvesen, G. S., Biochemical characteristics of caspases-3, -6, -7, and -8. J. Biol. Chem. 1997, 272, (42), 25719-23.
37. Vizcaino, J. A. et al., ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32, (3), 223-6.
38. Colaert, N. et al., Improved visualization of protein consensus sequences by iceLogo. Nat. Methods 2009, 6, (11), 786-7.
39. Colaert, N. et al., iceLogo Manual. https://iomics.ugent.be/icelogoserver/resources/manual.pdf.
40. Eckhard, U. et al., Active site specificity profiling of the matrix metalloproteinase family: Proteomic identification of 4300 cleavage sites by nine MMPs explored with structural and synthetic peptide cleavage analyses. Matrix Biol. 2016, 49, 37-60.
41. Rotonda, J. et al., The three-dimensional structure of apopain/CPP32, a key mediator of apoptosis. Nat. Struct. Biol. 1996, 3, 619-25.
42. Talanian, R. V. et al., Substrate specificities of caspase family proteases. J. Biol. Chem. 1997, 272, (April 11), 9677-82.
43. Earnshaw, W. C. et al., Mammalian caspases: structure, activation, substrates and function during apoptosis. Annu. Rev. Biochem. 1999, 68, 383-424.
44. Stennicke, H. R. et al., Internally quenched fluorescent peptide substrates disclose the subsite preferences of human caspases 1, 3, 6, 7 and 8. Biochem. J. 2000, 350, 563-568.
45. Thornberry, N. A. et al., A combinatorial approach defines specificities of members of the caspase family and granzyme B. J. Biol. Chem. 1997, 272, (July 18), 17907-11.
46. Petrassi, H. M. et al., A strategy to profile prime and non-prime proteolytic substrate specificity. Bioorg. Med. Chem. Lett. 2005, 15, (12), 3162-6.
47. Gráf, L. et al., Chymotrypsin. In Handbook of Proteolytic Enzymes, Academic Press: 2013; Vol. 3, pp 2626-2633.
48. Delmar, E. G. et al., A sensitive new substrate for chymotrypsin. Anal. Biochem. 1979, 99, 316-320.
49. Bru, R. and Walde, P., Product inhibition of alpha-chymotrypsin in reverse micelles. Eur. J. Biochem. 1991, 199, 95-103.
50. Nakajima, K. and Powers, J. C., Mapping extended substrate binding sites of cathepsin G and human leukocyte elastase. J. Biol. Chem. 1979, 254, 4027-32.
51. Attucci, S. et al., Measurement of free and membrane-bound cathepsin G in human neutrophils using new sensitive fluorogenic substrates. Biochem. J. 2002, 366, 965-970.
52. Raymond, W. W. et al., How immune peptidases change specificity: cathepsin G gained tryptic function but lost efficiency during primate evolution. J. Immunol. 2010, 185, (9), 5360-8.
53. Korkmaz, B. et al., Discriminating between the activities of human cathepsin G and chymase using fluorogenic substrates. FEBS J. 2011, 278, (15), 2635-46.
54. Lopez-Boado, Y. S. et al., Neutrophil serine proteinases cleave bacterial flagellin, abrogating its host response-inducing activity. J. Immunol. 2003, 172, (1), 509-515.

For Table of Contents Only

Figure 1. Detailed workflow for protease specificity profiling using ChaFRAtip and “charge-synchronized” proteome-derived peptide libraries. (A) Generation of a “charge-synchronized” peptide library by a first dimension of separation (SCX1). Acetylation of lysine residues and N-termini, combined with the cleavage specificity of ArgC, yields a proteome-derived peptide library composed mostly of peptides with a net charge of 1+ or 2+ at pH 2.7, which can be separated in a pipette tip packed with SCX material. (B) Digestion with the protease of interest is followed by a second dimension of separation (SCX2) and MS analysis. Cleavage of the 1+ and 2+ peptides yields daughter peptides whose charge states typically differ from those of their parental peptides, because the prime-side fragment carries a new, non-acetylated N-terminus and the non-prime fragment no longer contains the C-terminal Arg. This charge shift allows the daughter peptides to be separated from the large background of uncleaved peptides. (C) Data analysis workflow. Briefly, identified peptides are sorted into prime and non-prime peptides according to their C-terminal amino acid residue. Further peptide filters then exclude the uncleaved background and other contaminating peptides. Finally, the prime and non-prime peptides identified by MS are extended to the parental sequences as they were before cleavage (termed “cleavage sequences”). MS-identified peptides are shown in bold black, and the sequences preceding and following them in the protein are shown in gray.
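The charge-shift principle of panels A and B and the sorting and reconstruction steps of panel C can be sketched in Python. This is a minimal illustration, not the authors' software, and it rests on a deliberately simplified charge model (assumption: at pH 2.7 only Arg, His and a free N-terminal amine carry +1; acetylated Lys and N-termini and the largely protonated acidic groups count as neutral). The function names `net_charge_ph27`, `cleave`, `classify` and `cleavage_window` and all example sequences are hypothetical.

```python
from dataclasses import dataclass


def net_charge_ph27(peptide: str, free_n_terminus: bool) -> int:
    """Approximate net charge at pH 2.7 (simplified, hypothetical model).

    Arg and His count +1 each; a free N-terminal amine counts +1.
    Acetylated Lys/N-termini and the (mostly protonated) Asp, Glu and
    C-terminal carboxyl groups are treated as neutral.
    """
    charge = peptide.count("R") + peptide.count("H")
    return charge + 1 if free_n_terminus else charge


@dataclass
class CleavageProducts:
    non_prime: str  # N-terminal fragment: keeps the acetylated N-terminus
    prime: str      # C-terminal fragment: new free N-terminus, keeps the C-terminal Arg


def cleave(parent: str, site: int) -> CleavageProducts:
    """Cut a peptide between P1 (index site - 1) and P1' (index site)."""
    if not 0 < site < len(parent):
        raise ValueError("cleavage site must fall inside the peptide")
    return CleavageProducts(non_prime=parent[:site], prime=parent[site:])


def classify(peptide: str) -> str:
    """Sort an identified peptide by its C-terminal residue.

    ArgC-derived prime fragments still end in Arg; non-prime fragments end
    at the new cleavage site. Ambiguous for proteases that cleave after Arg.
    """
    return "prime" if peptide.endswith("R") else "non-prime"


def cleavage_window(protein: str, peptide: str, side: str, flank: int = 6) -> str:
    """Reconstruct the P6-P6' cleavage sequence, padding termini with '-'."""
    if side not in ("prime", "non-prime"):
        raise ValueError("side must be 'prime' or 'non-prime'")
    start = protein.find(peptide)  # first occurrence only
    if start < 0:
        raise ValueError("peptide not found in protein")
    # Index of P1' in the protein sequence.
    site = start + len(peptide) if side == "non-prime" else start
    padded = "-" * flank + protein + "-" * flank
    return padded[site : site + 2 * flank]


if __name__ == "__main__":
    for parent in ("AGELVSDFKTPR", "AGELVSDHKTPR"):  # hypothetical 1+ and 2+ parents
        products = cleave(parent, 6)
        print(
            parent, net_charge_ph27(parent, free_n_terminus=False), "->",
            products.non_prime, net_charge_ph27(products.non_prime, False),
            products.prime, net_charge_ph27(products.prime, True),
        )
```

Under this model the hypothetical 1+ parent splits into a neutral non-prime fragment and a 2+ prime fragment, so both daughters leave the charge state of the uncleaved background. The C-terminal Arg rule in `classify` fails for proteases that cleave after Arg, and `str.find` maps only the first occurrence of a peptide, so a real pipeline needs additional filtering and ambiguity handling.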

Figure 2. Monitoring the experimental workflow. (A) Charge distributions of fractions A and B from the first dimension of separation (SCX1), compared with that of an in silico ArgC digest. Error bars indicate the standard deviation of three independent experiments. (B), (C) Charge distributions and fragment distributions of fractions A2, B0, B1 and B3 from the second dimension of separation (SCX2) in specificity profiling experiments using GluC as a test protease. Error bars indicate the standard deviation of three independent experiments. (D) Contributions of fractions A2, B0, B1 and B3 to the total number of cleavage sequences identified in a typical GluC specificity profiling experiment.
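The in silico reference in panel A can be approximated with a simple ArgC digest followed by the same simplified pH 2.7 charge count (assumption: after acetylation only Arg and His contribute +1). Whether ArgC cleavage is blocked before Pro, the minimum peptide length and the neglect of missed cleavages are assumptions exposed as parameters; `argc_digest`, `charge_after_acetylation_ph27` and `charge_distribution` are hypothetical names, not the authors' code.

```python
from collections import Counter


def argc_digest(protein: str, skip_before_pro: bool = True, min_length: int = 6) -> list[str]:
    """In silico ArgC digest: cut C-terminal to Arg, no missed cleavages.

    skip_before_pro and min_length are assumptions, not parameters from the paper.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa != "R" or i + 1 == len(protein):
            continue  # not Arg, or Arg is already the protein C-terminus
        if skip_before_pro and protein[i + 1] == "P":
            continue  # optionally leave Arg-Pro bonds uncut
        peptides.append(protein[start : i + 1])
        start = i + 1
    peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


def charge_after_acetylation_ph27(peptide: str) -> int:
    """Simplified model: with N-termini and Lys acetylated, only Arg and His add +1."""
    return peptide.count("R") + peptide.count("H")


def charge_distribution(proteins: list[str], **digest_options) -> dict[int, float]:
    """Percentage of in silico ArgC peptides in each net charge state."""
    counts = Counter(
        charge_after_acetylation_ph27(peptide)
        for protein in proteins
        for peptide in argc_digest(protein, **digest_options)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {z: 100.0 * n / total for z, n in sorted(counts.items())}


if __name__ == "__main__":
    toy_proteome = ["AGELVSDFKTPRAGHLVSDEKTPRGGSVLPWT"]  # hypothetical sequence
    print(charge_distribution(toy_proteome))
```

Run over a full proteome, the resulting percentages could be compared with the measured distributions of fractions A and B; the C-terminal protein peptide, which lacks Arg, is what produces the 0 charge state in this toy case.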

Figure 3. Specificity profiles of four proteases (GluC, caspase-3, chymotrypsin and MMP-1), represented as iceLogos of the cleavage sequences identified with our ChaFRAtip method, using the Escherichia coli (strain K12) proteome as reference set. The iceLogos span the P6 to P6′ sites, according to the Schechter and Berger nomenclature. Amino acids observed more frequently (above the axis) and less frequently (below the axis) than in the reference set are shown. The number of cleavage sites used to build each iceLogo is given next to the protease name. In all iceLogos, only significant amino acids (P < 0.05) are shown. The height of each amino acid letter reflects the difference between its frequency in the experimental set and in the reference set.
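The frequency-difference representation described in this caption can be approximated by comparing position-specific amino acid frequencies in the P6 to P6′ cleavage sequences with reference frequencies. The sketch below is not iceLogo's implementation: it assumes a single, position-independent reference composition (for example, precomputed from the E. coli K12 proteome) and uses a normal approximation to a binomial test, |z| ≥ 1.96 (two-sided P < 0.05), as the significance filter. `frequency_difference_logo` is a hypothetical name.

```python
import math
from collections import Counter

# Window positions P6..P1 followed by P1'..P6' (ASCII apostrophe for the prime).
POSITIONS = [f"P{i}" for i in range(6, 0, -1)] + [f"P{i}'" for i in range(1, 7)]


def frequency_difference_logo(
    windows: list[str], reference_freq: dict[str, float], z_crit: float = 1.96
) -> dict[str, dict[str, float]]:
    """Per position, return {amino acid: percent frequency difference} for
    residues whose observed count deviates significantly from the reference.

    Significance uses a normal approximation to a binomial test (|z| >= z_crit);
    '-' padding at protein termini is ignored.
    """
    if any(len(w) != len(POSITIONS) for w in windows):
        raise ValueError("each cleavage sequence must span P6-P6'")
    logo: dict[str, dict[str, float]] = {}
    for idx, pos in enumerate(POSITIONS):
        counts = Counter(w[idx] for w in windows if w[idx] != "-")
        n = sum(counts.values())
        significant: dict[str, float] = {}
        for aa, p_ref in reference_freq.items():
            if n == 0 or not 0.0 < p_ref < 1.0:
                continue
            observed = counts.get(aa, 0)
            z = (observed - n * p_ref) / math.sqrt(n * p_ref * (1.0 - p_ref))
            if abs(z) >= z_crit:
                # Positive values plot above the axis, negative values below it.
                significant[aa] = 100.0 * (observed / n - p_ref)
        logo[pos] = significant
    return logo


if __name__ == "__main__":
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    uniform = {aa: 1 / 20 for aa in amino_acids}  # hypothetical reference set
    # Hypothetical cleavage sequences: every residue once per position, except Glu at P1.
    windows = []
    for i in range(20):
        residues = [amino_acids[(i + k) % 20] for k in range(12)]
        residues[5] = "E"
        windows.append("".join(residues))
    print(frequency_difference_logo(windows, uniform)["P1"])
```

A position-specific reference (as iceLogo also supports) would replace the single `reference_freq` dictionary with one dictionary per position.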

Figure 4. Validation of an intrinsic preference of cathepsin G for Asn at the P1 subsite. (A) Specificity profiles of cathepsin G, represented as iceLogos with the Escherichia coli (strain K12) proteome as reference set: all cleavage sequences identified by our ChaFRAtip method (N = 1792, upper) and only those with Asn at P1 from the same experiment (N = 146, lower). Disfavored amino acids are omitted from the lower iceLogo for simplicity. (B), (C) The peptide F11 (sequence IDEFEVN↓SDGDIAR) was analyzed by LC-MS/MS after 3 h of incubation with cathepsin G (E:S = 1:25, w:w) or without protease as a control. The full-length peptide and its non-prime cleavage product were identified from the corresponding TIC peaks (left panels, boxed) by their correct m/z in MS1 (right panels) and confirmed by MS2 fragmentation (data not shown).
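The MS1 check in panels B and C amounts to comparing observed m/z values with those calculated for the full-length peptide and its non-prime product. A minimal sketch using standard monoisotopic residue masses follows; it assumes both peptides are unmodified (free N-terminus and C-terminal carboxylic acid), which the caption does not state, so the values are illustrative only. `monoisotopic_mass` and `mz` are hypothetical helper names.

```python
# Standard monoisotopic residue masses (Da); Ile and Leu are isobaric.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per linear, unmodified peptide
PROTON = 1.007276   # mass of a proton


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide (assumption: no PTMs)."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    for label, seq in (("full-length", "IDEFEVNSDGDIAR"), ("non-prime", "IDEFEVN")):
        m = monoisotopic_mass(seq)
        print(label, seq, round(m, 4), [round(mz(m, z), 4) for z in (1, 2, 3)])
```

As a sanity check on any cleavage assignment, the mass of the full-length peptide equals the sum of the masses of its two cleavage products minus one water.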
