
Article

Multi-Enzymatic Limited Digestion - the Next-Generation Sequencing for Proteomics?

Denis Morsa, Dominique Baiwir, Raphaël La Rocca, Tyler A. Zimmerman, Emeline Hanozin, Elodie Grifnée, Rémi Longuespée, Marie-Alice Meuwis, Nicolas Smargiasso, Edwin De Pauw, and Gabriel Mazzucchelli

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.9b00044 • Publication Date (Web): 03 May 2019




Journal of Proteome Research

Denis Morsa (1,2), Dominique Baiwir (1,2), Raphaël La Rocca (1), Tyler A. Zimmerman (1,3), Emeline Hanozin (1), Elodie Grifnée (1), Rémi Longuespée (1,4), Marie-Alice Meuwis (1,5,6), Nicolas Smargiasso (1), Edwin De Pauw (1), Gabriel Mazzucchelli (1,*)

(1) University of Liege, Mass Spectrometry Laboratory, MolSys Research Unit, Liege, Belgium
(2) University of Liege, GIGA Proteomics Facility, Liege, Belgium
(3) Current address: Collins Aerospace, San Dimas, CA, United States
(4) Current address: University of Heidelberg, Department of clinical pharmacology and pharmacoepidemiology, Heidelberg, Germany
(5) Department of hepato-gastroenterology and digestive oncology, CHU, Liege, Belgium
(6) Laboratory of translational gastroenterology, GIGA Institute, Liege, Belgium

Correspondence Dr. Gabriel Mazzucchelli: [email protected] ORCID: 0000-0002-8757-8133


ABSTRACT

Over the past 40 years, proteomics, generically defined as the field dedicated to the identification and analysis of proteins, has gained tremendously in popularity and potency through advancements in genome sequencing, separative techniques, mass spectrometry and bioinformatics algorithms. As a consequence, its scope of application has gradually enlarged and diversified to address specialized biomedical topics. Although the tryptic bottom-up approach is widely regarded as the gold standard for rapid screening of complex samples, its application to precise and confident mapping of protein modifications is often hindered by partial sequence coverage, poor redundancy in indicative peptides and lack of method flexibility. We show here how the synergic and time-limited action of a properly diluted mix of multiple enzymes can be exploited in a versatile yet straightforward protocol to alleviate these drawbacks. Merging bottom-up and middle-down ideologies, our results highlight broad assemblies of overlapping peptides that enable refined and reliable characterization of proteins, including variant identification, and of the modifications they carry, including post-translational modifications, truncations and cleavages. Beyond this boost in performance, our methodology also offers efficient de novo sequencing capabilities, for which we present a dedicated custom assembly algorithm.

Keywords: Proteolytic method; Limited digestion; Protein characterization; PTMs; de novo sequencing; mAbs; Digestibility profile; Mass spectrometry.


INTRODUCTION

The beginning of the 2000s marks the entry into the so-called post-genomics era and the advent of new “omics” fields aiming at an ever-improving comprehension of the human biochemical machinery [1]. Positioned as a major contributor among these emerging disciplines, proteomics refers to the systematic identification and characterization of proteins, including their intrinsic modifications, conformations, interactions, abundances, activities and cellular localizations [2]. As such, it is a constitutive part of leading topics in modern research, namely biomarker discovery and drug development [3], both driving the biopharmaceutical field towards “personalized medicine” applications [4]. Capitalizing on its high sensitivity and specificity, as well as on interfacing possibilities with both pre-fractionation techniques [5,6] and alternative structural methods [7], mass spectrometry (MS) has emerged as an essential tool on the road to protein elucidation [8]. From this angle, two main proteomics approaches were historically developed and classified [9]: the top-down approach, targeting medium-sized intact protein ions [10], and the bottom-up approach, targeting short sub-4.6-kDa peptides issued from proteolytic digestions [11]. In the second scenario, the resulting peptide mix is typically separated by liquid chromatography (LC) and further subjected to collision-mediated (CID) or electron-mediated (ETD, ECD) dissociation [12] within the scope of tandem MS data acquisition [13]. These activation techniques lead to predictable fragmentation patterns that are readily compared with in silico simulations and database searches for identification purposes [14].
Although its efficiency and sensitivity establish the tryptic bottom-up approach as the gold standard for the screening of complex protein samples in the framework of shotgun proteomics [15], major limitations often occur when targeting refined sequence characterizations on pure to semi-complex samples. These limitations include sequence-dependent digestion efficiencies, non-comprehensive datasets caused by the oversampling of the most abundant peptides and the need for uniform distributions of cleavage sites within the protein sequence [16]. In turn, these aspects may lead to unsatisfactory sequence coverage, missed or biased characterization of modifications such as post-translational modifications (PTMs) and deficient discrimination of isoforms, among others [17].

In this context, considerable efforts have been spent over the past two decades to develop alternatives to the classic bottom-up strategy [18,19], among which a prevalent trend involves the use of enzymes other than trypsin [20]. These substitutes may be selected for their high single-residue selectivity, with a view to generating larger 3-kDa to 15-kDa peptides and reducing the complexity of the proteolytic mix, thus establishing the foundations of the middle-down proteomics approach [21]. Another option lies in multi-enzymatic schemes that combine distinct mono-enzymatic proteolysis steps within parallel [17,22] or, more rarely, consecutive [23] workflows. In this way, higher numbers of proteolytic peptides and better sequence coverages were reported [24], in turn improving the characterization of PTMs [25], especially for phosphopeptides [26]. These experiments are still time-consuming and, as they involve prolonged reaction times with high enzyme concentrations, often achieve complete protein digestions with inherently low missed-cleavage counts. The redundancy in strategic peptides covering PTMs or mutation zones of interest is therefore, most of the time, only marginally improved over tryptic workflows.

We here describe an innovative approach merging the bottom-up and middle-down ideologies to provide a fast, robust and refined solution for current proteomics challenges targeting the characterization of low-to-medium complexity samples. The methodology, referred to as MELD for Multi-Enzymatic Limited Digestion, capitalizes on a single two-hour proteolytic step involving the synergic action of a diluted mix of several enzymes.
While the multi-enzymatic benefits are retained, the lower enzyme concentration and shorter reaction time limit the proteolytic reaction. This feature engenders numerous missed cleavage events [27] and results in various peptide products of diverse length with overlapping stretches of residues. This complex outcome eventually increases the extent of and confidence in the sequence coverage, in turn enabling refined protein characterization and efficient database-free de novo sequencing.

MATERIALS AND METHODS

Sample preparation. Bovine serum albumin (BSA) was purchased as a 2 mg/mL standard protein solution packaged in sealed ampules from Sigma-Aldrich (Darmstadt, Germany). Adalimumab antibody was purchased as a 40 mg/0.8 mL Humira solution (AbbVie, Rungis) encapsulated in a subcutaneous injection pen and diluted to 2 mg/mL using 50 mM NH4HCO3. Both compounds were aliquoted as 20 µg protein fractions and further processed for cysteine reduction and alkylation following a three-step reaction scheme performed in 50 mM ammonium bicarbonate: (a) reduction with dithiothreitol (DTT) at 10 mM final concentration at 60 °C for 1 h; (b) alkylation with iodoacetamide (IAA) at 20 mM final concentration at room temperature for 30 min; and (c) quenching of the alkylation reaction by addition of 11 mM DTT to reach 21 mM final concentration. The reduced and alkylated samples were purified by precipitation using the 2D clean-up kit commercialized by GE Healthcare (Chicago, IL, USA). They were subsequently reconstituted to 0.5 mg/mL in 50 mM ammonium bicarbonate.

Enzymatic digestion. Trypsin, Glu-C and chymotrypsin proteases were purchased from ThermoFisher Scientific (Waltham, MA, USA) as 20 µg MS-grade lyophilized powders. They were reconstituted to 1 mg/mL on ice using 10 mM HCl, ultrapure Milli-Q H2O and 1 mM HCl, respectively.

1. Mono-enzymatic digestion: each pure enzyme solution was added to a 20 µg protein fraction to reach a final protease-to-protein ratio of 1:50 (w:w), according to the manufacturer's recommendations. The tubes were incubated at 37 °C for 16 h under stirring at 600 rpm (Thermomixer comfort, Eppendorf, Hamburg, Germany) before the digestion was stopped using trifluoroacetic acid (TFA) at 0.5% (v:v) final concentration to reach pH < 3. The resulting samples were evaporated to dryness under vacuum and reconstituted to 1 pmol/µL in H2O with 0.1% TFA for injection.

2. Combinatory multi-enzymatic digestion: the mono-enzymatic digests obtained from the independent action of each of the three enzymes (trypsin, Glu-C, chymotrypsin) following the above-detailed protocol were pooled in equal proportions before injection.

3. Synergic multi-enzymatic and limited digestion (MELD): two multi-enzyme mixtures combining trypsin, Glu-C and chymotrypsin were prepared on ice immediately before use by mixing the pure 1 mg/mL solutions in a ratio of 1.00:1.00:1.55 (v:v:v), respectively. The high-ratio mixture was used as such. The low-ratio mixture was obtained by a 9-fold dilution of the former using 25 mM NH4HCO3. Two distinct digestions were performed by adding the same volume of each mixture to a separate 20 µg protein fraction, giving final protease-to-protein ratios of 1:85, 1:85 and 1:55 for the high-ratio mixture and of 1:750, 1:750 and 1:500 for the low-ratio mixture, for trypsin, Glu-C and chymotrypsin respectively. CaCl2 at 5 mM final concentration was also added. Each tube was incubated for 2 h at 37 °C under stirring at 600 rpm (Thermomixer comfort, Eppendorf, Hamburg, Germany) before the digestion was stopped using TFA at 0.5% (v:v) final concentration to reach pH < 3. Equal amounts of both digests were subsequently pooled. The resulting mixture was evaporated to dryness under vacuum and reconstituted to 15 pmol/9 µL in H2O with 0.1% TFA for injection.

Liquid Chromatography-Mass Spectrometry Analysis. The chromatographic separation was performed on an M-Class ACQUITY UPLC (Waters, Milford, MA, USA).
A 3-min sample trapping step was first achieved on a reversed-phase (RP) ACQUITY UPLC M-Class Trap Column (Symmetry C18, 100 Å, 5 µm, 180 µm x 20 mm, Waters, Milford, MA, USA) prior to release onto an ACQUITY UPLC M-Class HSS T3 C18 analytical column (100 Å, 1.8 µm spherical silica, 75 µm x 250 mm, Waters, Milford, MA, USA). Water and acetonitrile, both supplemented with 0.1% (v:v) of formic acid (FA), were used as eluents and mixed according to a 177-min gradient method (Supporting Information, Section S1). The flow rate was set at 0.6 µL/min. The mass detection was performed on an online hyphenated Q-Exactive Plus Hybrid Quadrupole-Orbitrap mass spectrometer (ThermoFisher Scientific, Waltham, MA, USA) operated in data-dependent positive acquisition mode and calibrated using Pierce LTQ ESI positive ion calibration solution (ThermoFisher Scientific, Waltham, MA, USA). Source parameters were set as follows: spray voltage = 2.3 kV, capillary temperature = 270 °C and S-lens RF level = 50. The MS spectra were acquired with 70,000 mass resolution (m/z 200) from m/z 400 to m/z 1600, an automatic gain control (AGC) target of 1E+6 and lock mass internal calibration at m/z 445.12003. The MS/MS spectra were acquired for the 12 most intense ions of each MS scan (TopN = 12) with 17,500 mass resolution (m/z 200), an isolation window of m/z 2, an AGC target of 1E+5, a maximum injection time of 50 ms and (N)CE = 28. Exclusion of singly charged ions and a 10 s dynamic exclusion were used. The optimization of the standard MELD protocol was performed on a Dionex UltiMate 3000 HPLC (ThermoFisher Scientific, Waltham, MA, USA) hyphenated to an Esquire HCT Ultra ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany).

Database search. The database search was performed in PEAKS Studio Software v7.5 (Bioinformatics Solutions Inc, Waterloo, ON, Canada) on the adalimumab sequence backgrounded with the SWISS-PROT database [28] restricted to the bovine taxonomy. Tolerances of 5 ppm and 0.02 Da were allowed on the precursor and fragment masses, respectively. Carbamidomethylation of cysteine was set as a fixed modification. Deamidation of asparagine and glutamine, oxidation of methionine and N-glycosylation of asparagine with G0F, G1F and G2F fucosylated glycans were set as variable modifications. A maximum of 5 PTMs per peptide was allowed.
Unless otherwise stated, the no-enzyme rule was used. To ensure high-quality MS/MS sequencing matches, a filter for a false discovery rate (FDR) < 0.1% was applied.
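To make the two mass tolerances of the database search concrete, the following minimal sketch shows how a relative (ppm) tolerance on the precursor and an absolute (Da) tolerance on fragments can be evaluated. The function names are hypothetical and chosen for illustration only; this is not how PEAKS Studio implements its matching.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed: float, theoretical: float, tol_ppm: float = 5.0) -> bool:
    """True if the precursor mass lies within the relative tolerance (5 ppm in the text)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed: float, theoretical: float, tol_da: float = 0.02) -> bool:
    """True if the fragment mass lies within the absolute tolerance (0.02 Da in the text)."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # A 5 ppm window on a 1000 Da precursor is 0.005 Da wide on each side.
    print(precursor_matches(1000.004, 1000.0))
    print(fragment_matches(500.015, 500.0))
```

A relative tolerance scales with the mass, which is why it is usually applied to the high-resolution precursor measurement, whereas a fixed window in Da is applied to the fragments.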


De novo sequencing and assembly. The de novo sequencing and assembly were performed in PEAKS AB v1.0 (“sequence validation job”) and in PEAKS AB v2.0 (“protein sequencing job”) using the same mass tolerances, fixed and variable PTMs, and maximum PTM counts as described above.

De novo peptide candidates. Candidate lists used as input for the Sequence Assembly algorithm were generated in PEAKS Studio Software v7.5 (Bioinformatics Solutions Inc, Waterloo, ON, Canada) and initially extracted without local confidence (LC) score filtering.

In silico digestion. The digestion was simulated on human serum albumin (HSA) using SequenceEditor BioTools v3.2 (Bruker Daltonics, Bremen, Germany). The cleavage rules were set as follows: K and R for trypsin, E for Glu-C and F, L, M, W, Y for chymotrypsin. Peptide lists were further refined based on their masses, considering an LC-MS compliant interval between 500 and 5000 Da.

Data Analysis. Venn distributions were calculated on the Bioinformatics and Evolutionary Genomics portal of the Vlaams Instituut voor Biotechnologie (Ghent, Belgium). Chromatographic peak integration was performed in Skyline v.4.0 [29] (MacCoss Lab Software, University of Washington, WA, USA).

Sequence Assembly algorithm. The de novo assembly algorithm was written in the Java programming language. The code is available free of charge on the MSLab software webpage at http://www.mslab.ulg.ac.be/software.
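The principle behind assembling a sequence from overlapping peptides can be illustrated with a short sketch. The code below is a generic greedy suffix-prefix merge, written in Python for brevity; it is not the authors' Java algorithm, whose details are not given in this section. The function names, the `min_len` parameter and the toy peptides are assumptions made for illustration, and the sketch ignores de novo confidence scores, sequencing errors and the isobaric I/L ambiguity.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(peptides, min_len: int = 3):
    """Repeatedly merge the pair of contigs sharing the longest overlap."""
    # Deterministic order: longest first, then alphabetical.
    contigs = sorted(set(peptides), key=lambda p: (-len(p), p))
    # Drop peptides that are fully contained in a longer (earlier) one.
    contigs = [p for i, p in enumerate(contigs)
               if not any(p in q for q in contigs[:i])]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining overlap of at least min_len residues
        merged = contigs[best_i] + contigs[best_j][best_n:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        # Contigs swallowed by the merged sequence are dropped.
        contigs = [c for c in rest if c not in merged] + [merged]


if __name__ == "__main__":
    # Hypothetical overlapping peptides, as a limited digestion might produce.
    toy = ["ACDEFGH", "EFGHIKLM", "FGHIK", "KLMNPQRS", "PQRSTVWY"]
    print(greedy_assemble(toy))
```

The sketch makes the dependence on peptide redundancy explicit: contigs can only be joined where at least `min_len` residues are shared, so a digest without overlapping peptides leaves the sequence fragmented.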


RESULTS

Theorizing on multi-enzymatic and limited digestions

The outcome of diverse proteolytic digestions carried out on human serum albumin (HSA) was simulated in silico with increasing numbers of allowed missed cleavages. Three distinct approaches were considered: a) a mono-enzymatic tryptic digestion, b) a multi-enzymatic mix issued from the combination of parallel mono-enzymatic digestions independently performed with trypsin, glutamyl endopeptidase (Glu-C) and chymotrypsin, and c) a multi-enzymatic digestion issued from the synergic action of the three aforementioned enzymes pooled together. Results were assessed based on the number of unique proteolytic peptides (Fig. 1a), the protein sequence coverage (Fig. 1b) and the mean residue occurrence (Fig. 1c), defined as the average number of peptides encompassing a given residue of the protein sequence.

Protocols employed in state-of-the-art mono-enzymatic [20,30] and combinatory multi-enzymatic methodologies [22] commonly rely on long reaction times and high enzyme-to-substrate ratios. As such, they favor low missed cleavage counts and thus correspond to conditions localized in the left part of the graphs. Within this region, both methods are associated with already extensive sequence coverages but display limited numbers of proteolytic peptides and low mean residue occurrence. In practice, these last aspects may result in missing information, which impedes exhaustive protein characterization, and in incomplete peptide connectivity, which is nevertheless a required feature for de novo sequencing.

Gradually raising the possibility of missed cleavage events reflects a progressive limitation of the enzymatic digestion and a shift towards conditions localized on the right side of the graphs. Within this region, the number of unique peptides and the mean residue occurrence both increase drastically.
This is chiefly observed for the synergic multi-enzymatic approach which, in those respects, proves to be the most efficient of the three simulated strategies. Taken together, these aspects theoretically emphasize opportunities for improvement in protein characterization using the MELD methodology.

Figure 1. In silico digestion performances monitored on human serum albumin (HSA) based on (a) the number of generated unique peptides, (b) the sequence coverage, and (c) the mean occurrence per residue, as functions of the number of theoretically allowed missed cleavages and considering an LC-MS compliant mass interval between 500 and 5000 Da. Three approaches were considered: a mono-enzymatic tryptic digestion, a combined multi-enzymatic digestion relying on trypsin, Glu-C and chymotrypsin, and a synergic multi-enzymatic digestion relying on trypsin, Glu-C and chymotrypsin.
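The three simulated approaches and the three metrics of Figure 1 can be sketched in a few lines of code. The example below is a simplified illustration and not the SequenceEditor BioTools simulation used in the study: it assumes cleavage C-terminal to the listed residues with no exception rules, uses unmodified monoisotopic residue masses, counts peptides by their position in the sequence, and runs on an arbitrary toy sequence rather than HSA. All function and variable names are hypothetical.

```python
# Standard monoisotopic residue masses in Da (unmodified cysteine; the
# carbamidomethylation used experimentally is ignored in this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Simplified cleavage rules quoted in the text.
RULES = {"trypsin": "KR", "Glu-C": "E", "chymotrypsin": "FLMWY"}


def peptide_mass(pep):
    return sum(RESIDUE_MASS[a] for a in pep) + WATER


def digest(seq, residues, max_missed, lo=500.0, hi=5000.0):
    """Return the (start, end) spans of peptides with up to `max_missed` missed cleavages."""
    cuts = [0] + [i + 1 for i, a in enumerate(seq[:-1]) if a in residues] + [len(seq)]
    spans = set()
    for i in range(len(cuts) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            start, end = cuts[i], cuts[j]
            if lo <= peptide_mass(seq[start:end]) <= hi:  # LC-MS compliant mass window
                spans.add((start, end))
    return spans


def metrics(seq, spans):
    """Return (peptide count, sequence coverage, mean residue occurrence)."""
    depth = [0] * len(seq)
    for start, end in spans:
        for k in range(start, end):
            depth[k] += 1
    coverage = sum(d > 0 for d in depth) / len(seq)
    return len(spans), coverage, sum(depth) / len(seq)


def simulate(seq, max_missed):
    tryptic = digest(seq, RULES["trypsin"], max_missed)
    # Combinatory: pool of three independent mono-enzymatic digests.
    combinatory = set().union(*(digest(seq, r, max_missed) for r in RULES.values()))
    # Synergic: all three enzymes act on the same substrate.
    synergic = digest(seq, "".join(RULES.values()), max_missed)
    return {"tryptic": metrics(seq, tryptic),
            "combinatory": metrics(seq, combinatory),
            "synergic": metrics(seq, synergic)}


# Arbitrary toy sequence for illustration only (not HSA).
TOY = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"

if __name__ == "__main__":
    for missed in (0, 2, 5, 10):
        print(missed, simulate(TOY, missed))
```

In this formulation the synergic digestion has many more cleavage sites than any single enzyme, so allowing more missed cleavages opens up a much larger set of overlapping spans, which is the trend described for the right-hand side of the graphs.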


Setting up the conditions of the standard MELD protocol

This section aims to provide a standard MELD protocol that constitutes a methodological basis readily applicable to the mainstream characterization of a wide variety of protein samples. It can be tuned through the nature and concentrations of the enzymes in order to shape the proteolytic peptide mix for specific applications or to further enhance compatibility with specific substrates, as further debated in the discussion section.

The protocol optimization started from a collection of mono-enzymatic proteolytic datasets acquired after the time-limited digestion of bovine serum albumin (BSA) using each of six different enzymes: trypsin, chymotrypsin, Glu-C, Asp-N, Arg-C and Lys-C. To this end, eight distinct enzyme-to-protein ratios ranging from 1:100 to 1:10000 (w:w) were systematically evaluated, and a fixed 2-hour reaction time was used to ensure adequate reaction reproducibility while minimizing time-dependent artifacts [31]. The digests were examined through 1D polyacrylamide gel electrophoresis to visually assess both the digestion yield and the proteolytic peptide size distribution (Fig. 2a). Nano-HPLC-MS/MS analyses were also performed to assess the unique proteolytic peptide count, the sequence coverage, the mean residue occurrence (Fig. 2b) and the missed-cleavage pattern diversity (Fig. 2c).

Results are illustrated for trypsin. The 1D gel electrophoresis displays an expected progressive transition from a low molecular weight (MW) smear towards high-MW peptide bands as the enzyme-to-substrate ratio is decreased from 1:100 to 1:10000. This adjustment reflects a gradual rise in the diversity and maximum reachable size of proteolytic peptides when the probability of missed cleavage is increased through a limitation of the proteolytic reaction.
Moving to LC-MS/MS experiments, this modulation results in an optimum in the identification efficiency that is localized within the 1:200 to 1:750 ratios. At other dilutions, both the unique peptide count and the sequence coverage gradually worsen as an increasing fraction of the produced peptides becomes incompatible with the analysis, being either too short at high ratios or too long at low ratios.

While an acceptable sequence coverage of 90.2% is already achieved with the 1:750 single-ratio digestion, drastic gains in the peptide counts and mean residue coverage are obtainable with multiple-ratio combinations. This strategy offers both a greater diversity in the proteolytic peptide lengths, therefore increasing the probability of overlapping segments required for de novo assembly, and an increased versatility, thus generalizing the digestion protocol regardless of the protein sequences and properties. The best efficiency, monitored for the experimentally unviable combination of all 1:100 to 1:10000 ratios, may be efficiently approached by mixing two digests arising from conditions localized on either side of the single-ratio optimum. In this context, the combination of the 1:100 and 1:1000 ratios was found to be the best compromise for trypsin. The so-generated proteolytic set covers a wide mass distribution (Fig. 2d), overlaps most of the peptides generated at the single-ratio optimum (Fig. 2e) and accounts for a 76% (86/113) sampling of the complete set issued from the all-ratio mix (Fig. 2f). Similar tunings were performed for the other five aforementioned enzymes, each leading to its own optimal two-ratio combination (Supporting Information, Section S2).

Compromising between the respective digestion efficiencies and the diversity of cleavage sites offered by each enzyme, and capitalizing on the two-ratio strategy, the standard MELD protocol is eventually set up as a combination of two independent digestions, both performed with the same three enzymes working synergistically: trypsin, Glu-C and chymotrypsin. After further refinements carried out on proteins with different digestibility and size properties, i.e. myoglobin, β-lactamase and fetuin (Supporting Information, Section S3), the high-ratio or concentrated enzyme mix is fixed at 1:85, 1:85 and 1:55 ratios, whereas the low-ratio or diluted enzyme mix is fixed at 1:750, 1:750 and 1:500 ratios, for the three enzymes in the order given above. Applied to HSA in a triplicate run, this protocol achieved 100% sequence coverage with 1287 ± 127 unique peptides bearing from 0 to 30 missed cleavages. An average count of 9.9 ± 0.4 missed cleavage events per peptide was monitored, with respective contributions of 2.1 ± 0.2, 2.7 ± 0.1 and 5.1 ± 0.1 for trypsin, Glu-C and chymotrypsin.
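The protease-to-protein ratios of the standard protocol follow from the mixing proportions given in the Methods (1 mg/mL stocks mixed 1.00:1.00:1.55 (v:v:v), 20 µg of protein, 9-fold dilution for the low-ratio mix). The sketch below reproduces that arithmetic. The volume of mix added per digestion is not stated in this section, so it is back-calculated here from the 1:85 trypsin ratio; that value, like the function names, is an assumption made for illustration.

```python
STOCK_UG_PER_UL = 1.0  # each protease reconstituted to 1 mg/mL (= 1 µg/µL)
MIX_VOLUMES = {"trypsin": 1.00, "Glu-C": 1.00, "chymotrypsin": 1.55}  # v:v:v
PROTEIN_UG = 20.0


def volume_for(target, enzyme="trypsin"):
    """Mix volume (µL) giving a 1:`target` (w:w) ratio for `enzyme` (back-calculation)."""
    total = sum(MIX_VOLUMES.values())
    return PROTEIN_UG / target * total / (STOCK_UG_PER_UL * MIX_VOLUMES[enzyme])


def ratios(added_ul, dilution=1.0):
    """Return X for each enzyme, where the protease-to-protein ratio is 1:X (w:w)."""
    total = sum(MIX_VOLUMES.values())
    out = {}
    for enzyme, vol in MIX_VOLUMES.items():
        enzyme_ug = STOCK_UG_PER_UL * (vol / total) * added_ul / dilution
        out[enzyme] = PROTEIN_UG / enzyme_ug
    return out


if __name__ == "__main__":
    v = volume_for(85)              # assumed: trypsin at 1:85 in the high-ratio mix
    print(round(v, 3), "µL of mix")
    print(ratios(v))                # high-ratio mix
    print(ratios(v, dilution=9.0))  # low-ratio mix (9-fold dilution)
```

Under these assumptions chymotrypsin comes out at about 1:55 in the high-ratio mix, and the 9-fold dilution gives about 1:765, 1:765 and 1:494, which is consistent with the quoted 1:750, 1:750 and 1:500 if those figures are rounded.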

Figure 2. Evaluation of the tryptic digestion efficiency on bovine serum albumin (BSA) performed with enzyme-to-substrate ratios ranging from 1:100 to 1:10000 (w:w). Outcomes were analyzed with (a) 1D gel electrophoresis to monitor the proteolytic peptide length distribution, as well as with nano-HPLC-MS/MS to quantify (b) the unique proteolytic peptide count, the sequence coverage, the mean residue occurrence, and (c) the missed cleavage pattern distribution. With the objective of enhancing the proteolysis versatility, digest mixes issued from the combination of all ratios as well as of the 1:100 and 1:1000 ratios were considered and compared with the single-ratio optimum (1:500) with respect to (d) the proteolytic peptide mass distribution, and (e-f) the identity overlap.

Validating and benchmarking the MELD approach on an antibody sequence

Starting from the advent of hybridoma technologies, monoclonal antibodies (mAbs) have rapidly raised tremendous interest in the scope of immunotherapy [32] and are currently the focus of sustained characterization efforts from the proteomics community [33]. In this context, we used adalimumab, a therapeutic ~144 kDa human immunoglobulin G (IgG1) involved in the regulation of diverse inflammatory processes, as a candidate molecule to gauge the efficacy of the standard MELD protocol in comparison with the well-established mono-enzymatic strategies used in individual and combinatory workflows [30]. To this end, the peptide mixes issued from each tested proteolysis were individually analyzed by LC-MS/MS and the generated spectra were searched against the adalimumab sequence, both with and without compliance with enzyme-specific cleavage rules, to assess the contribution of non-specific cleavages (Supporting Information, Tables S1, S2, S3 and S4).

The respective performances of each strategy were evaluated over triplicate experiments with regard to the digestion reproducibility (Fig. 3a); identification-related attributes, i.e. the unique peptide count, the sequence coverage, the mean residue occurrence and the mean peptide confidence score; quantitative-related attributes, i.e. the averaged absolute peptide intensity and its variability (Fig. 3b); and the distribution of the residue coverage across the sequences of both heavy (Fig. 3c) and light (Fig. 3d) chains.

The qualitative digestion reproducibility inherent to each methodology was gauged from the three respective digest replicates based on the number of common and singly found peptide identities.
Calculated over the respective total populations, the common peptides account for 60%, 56%, 65% and 65%, while the peptides found in only one replicate account for 24%, 27%, 11% and 20%, for the MELD, combinatory, constrained tryptic and unconstrained tryptic methodologies respectively. Additionally, the mean confidence scores per peptide, reported as -10log10(P-value) [34], are respectively equal to 55 ± 7, 52 ± 7, 95 ± 16 and 54 ± 6, a score of 20 being often qualified as of relatively high confidence (Supporting Information, Section S4). Although these results emphasize the expected best reproducibility and confidence achieved for the gold-standard tryptic workflow, they also validate the adequate robustness and reproducibility inherent to the MELD methodology. Indeed, despite the more complex peptide mixtures expected from our strategy, its reproducibility performance is on par with both the single and combinatory mono-enzymatic strategies when the “no-enzyme” rule is used for database searching.

Turning to identification-related attributes, our results highlight 100% sequence coverage for both the heavy and light chains when exploiting the standard MELD protocol. Slightly lower performances were monitored with the tryptic and combinatory approaches. With a 6.5% coefficient of variation (CV) across triplicates, 949 unique peptides were obtained using MELD, with respective contributions of 628 and 321 for the heavy and light chains. These numbers correspond to a 1.8-fold increase over the combinatory methodology and to a 2-to-6-fold increase over the unconstrained mono-enzymatic methodologies, depending on the enzyme. This increase may be attributed both to the synergic action of the enzymes, which diversifies the cleavage sites, and to the limitation of the digestion, which favors missed cleavages.
The latter idea is first supported by the fact that 49% of the complete population of proteolytic peptides resulting from the combination of all four strategies are found solely with the MELD proteolysis, while this contribution drops to 14% for the combinatory, 6% for the unconstrained tryptic and less than 1% for the constrained tryptic workflows (Fig. 3e). It is further reinforced by an average of 6.1 missed cleavage events per peptide observed with the MELD against 4.3 with the combinatory strategy. These discrepancies also rationalize the higher 30 ± 3 mean occurrences per residue recorded using the MELD, with respective contributions of 28 ± 3 and 35 ± 3 for the heavy and light chains, together with an extensive coverage distribution across the entire sequences. Benchmarked against the tested alternatives, these data represent an improvement of more than 190% over the combinatory method and of 280% to 800% over the individual mono-enzymatic methods. In parallel, results acquired using mono-enzymatic workflows show substantial gains in the unique peptide counts, on the order of 100% with trypsin and of 180% and 300% for Glu-C and chymotrypsin respectively (Supporting Information, Section S5), as well as in the mean residue occurrence, when the database search is performed without specific cleavage rules. Further corroborating the already documented aspecificity of chymotrypsin³⁵, these data support the idea that the identification accuracy and versatility of mono-enzymatic strategies would strongly benefit from unconstrained rules for database searching. This option nevertheless requires sufficient spectral resolution and bioinformatics adjustments regarding the database search algorithms, FDR filtering and results scoring.

Finally, focusing on quantitative attributes, the averaged absolute peptide intensity was found equal to 9.6E+9 when injecting 15 pmol of MELD-digested adalimumab. This corresponds to 6.4E+8 per pmol of digested protein, denoting a decrease of less than one order of magnitude from the combinatory strategy, evaluated at 2.9E+9, and of slightly more than one order of magnitude from the tryptic strategies, evaluated at 4.4E+10 and 1.4E+10 after constrained and unconstrained database searches, respectively. While these differences might foreshadow possible hindrances concerning the detection and fragmentation of MELD-generated peptides on low-end instruments, their root cause still reflects a benefit. Indeed, the oversampling of the most abundant peptides when using trypsin induces a distribution in peptide intensity that is largely bimodal, centered on 1E+7 and 1E+12 (Fig. 3f). In contrast, the distribution achieved using the MELD is much more uniform and approaches an ideal Gaussian-type function centered on 1E+9, indicating a more efficient sampling of the proteolytic population. The mean variability in peptide intensity over triplicates is 49% ± 17% for the MELD, which is on the same order as the 44% ± 17% obtained with the combinatory methodology. This result nevertheless shows that the MELD methodology is less suited than tryptic workflows, associated with a variability on the order of 20%, for relative batch-to-batch quantification purposes. Still, a variation in peptide intensity of at least two orders of magnitude observed using our methodology can safely be considered relevant.
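
The reproducibility figures quoted in this section (share of peptides found in one, two or three replicates, intensity normalized per pmol, and intensity variability over triplicates) rest on simple bookkeeping that can be sketched as follows. This is a minimal illustration, not the authors' processing pipeline: the function names and the toy peptide sets are hypothetical, and the use of the sample standard deviation for the variability is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def replicate_overlap(replicates):
    """Fraction of unique peptides found in exactly 1, 2, ... n replicates."""
    seen = {}
    for peptides in replicates:
        for peptide in set(peptides):
            seen[peptide] = seen.get(peptide, 0) + 1
    total = len(seen)
    if total == 0:
        return {}
    return {k: sum(1 for c in seen.values() if c == k) / total
            for k in range(1, len(replicates) + 1)}


def intensity_per_pmol(intensity, injected_pmol):
    """Normalize an absolute intensity to 1 pmol of injected protein."""
    return intensity / injected_pmol


def intensity_cv(intensities):
    """Coefficient of variation: sample standard deviation over the mean (assumed)."""
    return stdev(intensities) / mean(intensities)


# Toy replicate contents (hypothetical peptide sequences, not data from the study).
reps = [{"DIQMTQ", "SPSSLS", "ASVGDR"}, {"DIQMTQ", "SPSSLS"}, {"DIQMTQ", "VTITCR"}]
print(replicate_overlap(reps))
print(intensity_per_pmol(9.6e9, 15))  # the 9.6E+9 / 15 pmol case quoted in the text
print(intensity_cv([1.0e9, 2.0e9, 3.0e9]))
```

Averaging `intensity_cv` over all peptides of a methodology would give a figure comparable in spirit to the mean variability reported above.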

Figure 3. Qualitative reproducibility, identification, quantification and coverage performances monitored for the different proteolytic methodologies: the standard MELD protocol, the mono-enzymatic strategy following a combinatory workflow, the mono-enzymatic tryptic strategy with constrained database search and the mono-enzymatic tryptic strategy with unconstrained database search. Results are established from triplicates. (a) Distribution of the number of unique peptides according to their presence or absence in the proteolytic populations generated for each replicate of a given methodology. The corresponding relative proportions of unique peptides found in the three replicates, in two of the three replicates and in one single replicate of each methodology are displayed in the second line. (b) Performances regarding the identification-related attributes (the unique peptide count, the sequence coverage, the mean residue occurrence and the mean peptide confidence score) and the quantitative-related attributes (the averaged absolute peptide intensity normalized to 1 pmol and its mean variability over triplicates). Amplitude of the residue coverage across the sequence of the (c) heavy adalimumab chain and (d) light adalimumab chain. (e) Distribution of the number of unique peptides according to their presence or absence in the proteolytic populations generated by each of the four tested proteolytic methodologies after merging their three respective replicates. (f) Distribution of the proteolytic population associated with each methodology as a function of peptide intensity, gauged from the respective chromatographic peak areas.

Looking deeper into the characterization of PTMs, truncations and cleavage sites

Built around a tetrameric protein assembly, mAbs exhibit a variable domain devoted to antigen recognition and a constant N-glycosylated domain responsible for receptor binding. The production-inherent diversity in the glycosylation patterns was reported to drastically influence the effector function and pharmacokinetics of therapeutic mAbs³⁶. As such, it constitutes a mandatory but still challenging attribute to characterize by bottom-up MS-based proteomics³⁷. Pursuing our study of adalimumab, we here compared the efficiency and extent of the coverage of the N-glycosylated Asn-301 residue using the standard MELD protocol and both the tryptic and combinatory mono-enzymatic strategies (Fig. 4a). Additionally, specific modifications reported for adalimumab³⁸ were monitored following the same rationale: the succinimidation of Asp-284 by reaction with Gly-285, which results in dehydration (Fig. 4b), the preferential deamidation of Asn-329 (Fig. 4c), the C-terminal truncation of Lys-451 (Fig. 4d) and the cleavage occurring between His-228 and Thr-229 (Fig. 4e).
Our results show that the MELD is successful in covering all listed PTMs and mutated zones of interest with the highest count and redundancy of indicative unique peptides. Focusing first on N-glycosylation, the MELD enables the confident sequencing of 9 unique peptides covering the G0F-modified Asn-301 residue, among which 4 are found in all three replicates, and of 5 unique peptides covering the G1F-modified Asn-301 residue, among which 2 are found in all three replicates. In terms of numbers, these data represent a 3-fold increase in the coverage of the main G0F N-glycoform over the tryptic methodologies, considering both constrained and unconstrained database searches. In terms of specificity, the MELD is the only methodology that enables the confident characterization of the less abundant G1F N-glycoform³⁸. The dehydration of Asp-284 and the deamidation of Asn-329 are respectively supported by 3 unique peptides (1 found in all replicates) and by 6 unique peptides (4 found in all replicates) using the MELD, while only one candidate is found at each of these sites when using the tryptic methodologies. Turning to the C-terminal truncation of Lys-451, the MELD provides 8 highly redundant unique peptides covering the modification. These are detected as matching pairs showing either an intact or a truncated C-terminus, therefore confidently evidencing the existence of both forms. In turn, the cleavage occurring between His-228 and Thr-229 is supported by 3 unique peptides whose N-terminus starts at Thr-229. Both the truncation and the cleavage events are left uncovered using the tryptic methodologies. Only the combinatory approach provides results on par with the MELD when considering the deamidation of Asn-329 and the cleavage at His-228.
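
The counts reported above (unique peptides carrying a modification at a given residue, and how many of them recur in all three replicates) can be reproduced from an identification table with a few lines of code. The sketch below is only illustrative: the record layout (`start`, `end`, `mods`, `replicate`) and the function name are hypothetical and do not correspond to any specific software export.

```python
def site_support(peptides, site, modification, n_replicates=3):
    """Count unique modified peptides covering `site` and those seen in every replicate.

    Each peptide is a dict with 1-based `start`/`end` positions on the protein,
    a `mods` mapping {residue position: modification name} and a `replicate` id.
    """
    replicates_by_peptide = {}
    for p in peptides:
        covers = p["start"] <= site <= p["end"]
        if covers and p["mods"].get(site) == modification:
            # A unique peptide is defined by its boundaries and full modification state.
            key = (p["start"], p["end"], tuple(sorted(p["mods"].items())))
            replicates_by_peptide.setdefault(key, set()).add(p["replicate"])
    unique = len(replicates_by_peptide)
    in_all = sum(1 for reps in replicates_by_peptide.values()
                 if len(reps) == n_replicates)
    return unique, in_all


# Toy identifications around Asn-301 (hypothetical boundaries, not data from the study).
toy = [
    {"start": 295, "end": 305, "mods": {301: "G0F"}, "replicate": 1},
    {"start": 295, "end": 305, "mods": {301: "G0F"}, "replicate": 2},
    {"start": 295, "end": 305, "mods": {301: "G0F"}, "replicate": 3},
    {"start": 298, "end": 310, "mods": {301: "G0F"}, "replicate": 2},
    {"start": 295, "end": 305, "mods": {}, "replicate": 1},
]
print(site_support(toy, 301, "G0F"))
```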

Figure 4. Coverage of specific modifications of adalimumab monitored from the number and redundancy of indicative unique peptides achieved using different proteolytic methodologies: the standard MELD protocol, the mono-enzymatic strategy following a combinatory workflow and the mono-enzymatic tryptic strategy with both constrained and unconstrained database search. Different types of modifications are investigated: (a) the N-glycosylation of Asn-301 by fucosylated glycans G0F and G1F, (b) the dehydration consecutive to succinimidation of Asp-284, (c) the deamidation of Asn-329, (d) the C-terminal truncation of Lys-451 and (e) the cleavage occurring between His-228 and Thr-229. Only peptides showing the modifications are reported together with their redundancy monitored from their appearance in each replicate.

Capitalizing on the MELD protocol for de novo sequencing

The extensive network of overlapping peptides, added to the large quantity of sequence-inherent daughter ions offered by the MELD, makes our methodology an ideal and easily implementable alternative for de novo sequencing. While efficient strategies relying on combinatory proteolytic workflows with various activation techniques³⁹ or on acid hydrolysis coupled with dedicated assembly algorithms⁴⁰ have been reported, our method has the practical benefits of being easily implementable on CID-equipped instruments, avoiding exogenous chemical protein modifications and minimizing time-dependent artifacts. As a theoretical proof of concept, we established the lists of de novo peptide candidates issued from the triplicate adalimumab digests generated by the different methodologies, aligned each candidate on the whole protein sequence and enumerated those whose identity matches a k-mer stretch of 3 to 8 successive residues. The so-defined target window was moved in steps of 1 AA across the complete protein sequence to evaluate the coverage amplitude offered by each methodology (Fig. 5a). Our results show that the MELD methodology offers superior performance compared to the combinatory methodology when the k-mer length increases, which is associated with a more stringent filtering of de novo candidates and a lower false positive rate (Supporting Information, Section S6). Considering a k-mer length of 6 or 7 as optimal for accurate sequence assembly⁴¹, the MELD theoretically enables 90% ± 1% sequence coverage while the combinatory and tryptic strategies only sample 78% ± 4% and 67% ± 1% of the sequence, respectively (Supporting Information, Section S7).
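
The sliding-window evaluation described above can be sketched as follows. This is a simplified reading of the procedure, under stated assumptions: a window counts as matched when at least one de novo candidate contains the exact k-mer, coverage is the fraction of residues belonging to at least one matched window, and isobaric Leu/Ile residues are not merged. The function name and toy sequences are hypothetical.

```python
def kmer_coverage(protein, candidates, k):
    """Slide a k-residue window along `protein` in steps of one residue.

    Returns the number of candidates matching each window and the fraction
    of residues covered by at least one matched window.
    """
    n_windows = max(len(protein) - k + 1, 0)
    hits = [0] * n_windows
    covered = [False] * len(protein)
    for i in range(n_windows):
        kmer = protein[i:i + k]
        hits[i] = sum(1 for candidate in candidates if kmer in candidate)
        if hits[i] > 0:
            for j in range(i, i + k):
                covered[j] = True
    fraction = sum(covered) / len(protein) if protein else 0.0
    return hits, fraction


# Toy example (hypothetical sequences): the first residue is never covered.
hits, fraction = kmer_coverage("ACDEFGHIK", ["CDEF", "GHIKL"], k=3)
print(hits, round(fraction, 3))
```

Increasing `k` makes each match more specific, which is consistent with the more stringent filtering of candidates noted in the text for longer k-mers.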

Figure 5. Performances of tryptic, combinatory and MELD protocols exploited for de novo assembly. (a) Theoretical sequence coverage generated from de novo candidate lists as furnished by PEAKS Studio for k-mer lengths ranging from 3 to 8 and (k-1) overlapping. (b) Correct sequence coverage, average confidence score related to residue assignment, number of residues with confidence score > 85, proportion of exploitable MS2 data considering an average local confidence (ALC) score above 50, sequence-supporting peptides and PSMs as monitored from PEAKS AB v2.0 software.

The digest raw files were further processed using PEAKS AB Software (Bioinformatics Solutions Inc., Waterloo, Canada), which offers an integrated solution for de novo assembly of monoclonal antibody sequences. Results achieved with the different methodologies were compared in terms of correct sequence coverage, confidence score related to residue assignment, number of sequence-supporting peptides and peptide-spectrum matches (PSMs), and proportion of exploitable MS2 data considering an average local confidence (ALC) score above 50 (Fig. 5b). The MELD methodology enables 99% of the complete adalimumab sequence to be correctly sequenced, with an averaged confidence score of 89 out of 100 and with 92% of sequenced residues associated with a confidence score higher than 85. This outperforms the combinatory methodology, which allows 97% of the total sequence to be correctly sequenced, with a lower confidence score of 84 and with 85% of sequenced residues above the confidence threshold of 85. Additionally, the number of supporting peptides is also higher using the MELD, with a gain of about 30% over the combinatory methodology. Focusing on N-glycosylation, 12 peptides were sequenced with the correct modification on the Asn-301 residue, compared to 8 with trypsin and only 1 with the combinatory methodology (Supporting Information, Section S8).

In an effort to extend de novo assembly possibilities to all types of proteins, we are currently designing a dedicated “Sequence Assembly” algorithm operating from de novo candidate lists generated in PEAKS Studio (Bioinformatics Solutions Inc., Waterloo, Canada) and taking advantage of the specific features offered by the MELD methodology. Its rationale is based on three successive steps, i.e. the initiation occurring from the selection of a seed sequence, the elongation of the sequence in both C-terminus and N-terminus directions, and the termination (Fig. 6). The initiation step starts with the establishment of a list of 5-AA long seed sequence candidates selected based on a minimum confidence score threshold, as furnished by the PEAKS Studio software. In practice, a threshold of 70 was found to be an ideal compromise between extensive sampling and low false positive rate. The redundancy of each seed sequence is then calculated, as well as a global confidence score obtained by averaging over all instances of a given seed sequence.
Seed sequences associated with the highest redundancies (>30) and global confidence scores are used in priority. Next, for a given selected seed sequence, the last triplet of AA localized at the C-terminus is searched against the sequences of all the de novo candidates. The first neighboring amino acid position is evaluated to find the four most frequent AA candidates at this position. After selection, two values are calculated for these four candidates: (a) the global confidence score averaged over all their respective instances and (b) the CountCorrect score, which corresponds to a stricter evaluation of the occurrence encompassing the C-terminus quadruplet of AA from the seed sequence instead of the initial triplet. Based on these three data, i.e. occurrence, confidence and CountCorrect, a hierarchy of rules is used to select the most probable candidate to concatenate onto the growing sequence. This hierarchy relies on three criteria of decreasing priority: (i) if the occurrence count of any of the four candidates is greater than 10, the amino acid to concatenate is the one with the highest product of the occurrence count and the CountCorrect score; (ii) if one candidate has both a CountCorrect score and an occurrence count higher by at least two and five points, respectively, than the next best candidate, this candidate is selected; (iii) if the confidence score of one candidate is higher than 200, the amino acid to concatenate is the one with the highest product of the global confidence score and the CountCorrect score. The seed sequence is extended iteratively in steps of one AA in both C-terminus and N-terminus directions until no valid match is found. The output file reports, for each unique tested seed sequence, the concatenated sequence tag along with the average confidence and occurrence scores of each constitutive amino acid. Using this algorithm on the triplicate adalimumab MELD digests, we were able to cover respectively 64 ± 1% and 59 ± 1% of the heavy and light chain sequences and to obtain a correct 96-amino-acid long tag localized in the first portion of the heavy chain sequence⁴².
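
The initiation, elongation and termination steps described above can be illustrated with the following sketch. It is a hedged reading of the text, not the authors' implementation: the candidate layout (a sequence paired with per-residue confidence scores), all function names, the use of the mean window confidence to qualify a seed, the comparison against the runner-up by occurrence in rule (ii), and the `max_len` safeguard against endless growth on repeats are assumptions introduced here.

```python
from collections import defaultdict

# Assumed layout of a de novo candidate: (sequence, per-residue confidence list).


def select_seeds(candidates, length=5, min_conf=70.0, min_redundancy=30):
    """Initiation: rank seed k-mers by redundancy, then by global confidence."""
    instances = defaultdict(list)
    for seq, conf in candidates:
        for i in range(len(seq) - length + 1):
            window_conf = sum(conf[i:i + length]) / length
            if window_conf >= min_conf:
                instances[seq[i:i + length]].append(window_conf)
    seeds = [(s, len(v), sum(v) / len(v))
             for s, v in instances.items() if len(v) > min_redundancy]
    return sorted(seeds, key=lambda t: (t[1], t[2]), reverse=True)


def next_residue_stats(growing, candidates):
    """Four most frequent residues following the C-terminal triplet of `growing`.

    Returns tuples (aa, occurrence, count_correct, mean_confidence).
    """
    triplet, quad = growing[-3:], growing[-4:]
    stats = defaultdict(lambda: [0, 0, 0.0])  # occurrence, CountCorrect, confidence sum
    for seq, conf in candidates:
        start = seq.find(triplet)
        while start != -1:
            pos = start + 3  # index of the residue right after the triplet
            if pos < len(seq):
                entry = stats[seq[pos]]
                entry[0] += 1
                entry[2] += conf[pos]
                # Stricter match: the preceding residue must also agree (quadruplet).
                if start >= 1 and seq[start - 1:pos] == quad:
                    entry[1] += 1
            start = seq.find(triplet, start + 1)
    ranked = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)[:4]
    return [(aa, occ, cc, total / occ) for aa, (occ, cc, total) in ranked]


def pick_residue(top4, occ_min=10, conf_min=200.0):
    """Apply rules (i) to (iii) in decreasing priority; None means termination."""
    if not top4:
        return None
    if any(occ > occ_min for _, occ, _, _ in top4):            # rule (i)
        return max(top4, key=lambda c: c[1] * c[2])[0]
    best = top4[0]
    runner = top4[1] if len(top4) > 1 else (None, 0, 0, 0.0)
    if best[2] - runner[2] >= 2 and best[1] - runner[1] >= 5:  # rule (ii)
        return best[0]
    if any(conf > conf_min for _, _, _, conf in top4):         # rule (iii)
        return max(top4, key=lambda c: c[3] * c[2])[0]
    return None


def elongate_c_term(seed, candidates, max_len=2000):
    """Elongation: extend `seed` one residue at a time until no rule fires."""
    growing = seed
    while len(growing) < max_len:
        aa = pick_residue(next_residue_stats(growing, candidates))
        if aa is None:
            break
        growing += aa
    return growing
```

Under the same assumptions, N-terminus elongation can reuse `elongate_c_term` on reversed inputs, i.e. by reversing the seed and every candidate sequence together with its confidence list, and reversing the result.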

Figure 6. Design of the Sequence Assembly algorithm devoted to the de novo assembly of MELD-generated data. Lists of de novo peptide candidates issued from PEAKS Studio software are used as inputs. The algorithm starts with the selection of eligible 5-AA long seed sequences based on their respective redundancies and global confidence scores. The selected seed is then elongated in the C-terminus direction through an alignment of its last triplet of AA on all peptide candidates. The AA to concatenate is selected based on a hierarchy of rules involving three criteria: the occurrence count, the global confidence score and the CountCorrect score. The process is pursued iteratively and followed by a similar N-terminus elongation. The final concatenated sequence tag, along with the average confidence and occurrence scores of each constitutive amino acid, is eventually reported in the output file.

DISCUSSION

The synergistic and limited proteolytic methodology described here, abbreviated MELD for multi-enzymatic limited digestion, presents numerous advantages over alternative bottom-up strategies relying on mono-enzymatic workflows: (i) it generates a higher unique peptide count and mean residue occurrence, in turn potentially improving sequence coverage and characterization accuracy; (ii) it enables confident PTM identification and localization, notably within the challenging field of N-glycosylation in mAbs, which classically requires alternative fragmentation strategies to CID⁴³; (iii) through the two-ratio approach, it offers greater peptide diversity and operational versatility; (iv) it engenders an extensive network of overlapping peptides, thereby opening the way to efficient de novo sequencing and related applications; (v) it is tunable to cope with substrate particularities and/or specific study requirements. In this latter perspective, adjustments in the composition of the multi-enzymatic mixture regarding the concentration and nature of proteases are expected to be assets to accommodate unusual occurrence and distribution of cleavable residues, or to purposely shape specific properties of the proteolytic mixture such as the median length of the peptides.

The current limitations of our methodology lie in the increased complexity of the resulting proteolytic mixtures, a consequence of the joint action of multiple enzymes and of the promotion of missed cleavage events. This feature amplifies the effects of competitive ionization inherent to electrospray and somewhat hinders the application of the MELD for shotgun proteomics on complex mixtures with large concentration dynamics. In addition, the reproducibility of the digest products is commonly lower than with mono-enzymatic strategies, which advises caution for quantitative proteomics studies relying on the MELD methodology.
The standard MELD protocol was successfully employed as part of specific proteomic studies ranging from sequence confirmation and PTM characterization⁴⁴ to terminal amine isotopic labeling of substrates (TAILS)⁴⁵. This encompasses the characterization of point mutations inherent to sequence frameshifts, substitutions and truncations, as well as the identification and sequencing of proteins from databases. This latter application recently led to the exhaustive characterization, with total sequence determination, PTM identification and cleavage site localization, of a novel walnut allergen from a transcriptome repository converted into a protein and expressed sequence tag (EST) database⁴⁶. While our results already demonstrate the adequacy of the MELD methodology to answer specific queries among today’s challenges in proteomics, we strongly believe that the advent of tomorrow’s technologies will extend its field of application towards the characterization of more complex protein samples. In a context where a growing number of studies highlight the partial⁴⁷, and at times inadequate⁴⁸, sampling of the protein information offered by tryptic approaches, the transition towards more comprehensive alternatives for differential shotgun proteomics constitutes a key step. The major bottleneck presently impeding the widespread use of the MELD resides in the capacity to analyze the highly complex proteolytic mixtures it engenders. While the upstream implementation of specific enrichments, e.g. based on immunoaffinity⁴⁹, offers readily available solutions to simplify them, we foresee future evolutions in LC and MS setups as a way towards novel opportunities. In this context, improvements in the accessible dynamic range for detection and developments of alternative techniques for sample fractionation⁵⁰ and peptide fragmentation⁵¹, together with refinements in the bioinformatics tools concerning the digestion rules, search algorithms and FDR calculations, should greatly contribute to changing the game. Based on this prospective outcome, and capitalizing on the depth of information it furnishes, the MELD methodology appears as a promising alternative to address present and upcoming biomedical challenges.
Among these is the identification of protein mutations occurring in certain types of pathologies, such as those affecting amyloidogenic proteins in the context of hereditary amyloidosis⁵². Tuning of the proteolytic conditions to achieve adequate peptide sizing may also be exploited to refine epitope mapping data issued from pull-down assays. In other respects, the demonstrated higher count of unique proteolytic peptides is expected to enhance the number and confidence of identifications in the scope of biomarker discovery studies. In the same vein, the extended sequence coverage might also constitute a valuable step towards the sequencing of the whole proteome, with expected repercussions on cancer medicine, in the wake of what resulted from whole exome sequencing⁵³. As a last insight, the concept of “digestibility profiles”, linking the intensity of proteolytic peptides with their respective localization inside the protein 3D structure, should benefit from the MELD to become a relevant structural probe used in the mapping of disordered stretches or enzyme-resistant regions found in allergens⁵⁴.

CONFLICT OF INTEREST

None of the authors declares any conflict of interest.

ASSOCIATED CONTENT

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository⁵⁵ with the dataset identifier PXD009800.

Supporting Information. Sections S1 Composition of the eluent; S2 Optimal “two-ratio” combinations; S3 MELD performances achieved on myoglobin, β-lactamase and fetuin; S4 PSM score distributions; S5 Performances of mono-enzymatic workflows relying on chymotrypsin and Glu-C; S6 False positive rate in the matching of de novo candidates; S7 Amplitude of de novo coverage with the MELD, combinatory and tryptic methods; S8 Coverage of the N-glycosylated Asn-301 residue from de novo candidates; and supplementary tables S1, S2, S3 and S4 respectively listing peptides issued from the standard MELD protocol, the unconstrained tryptic workflow, the constrained tryptic workflow and the combinatory workflow as described in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

ACKNOWLEDGMENTS

The authors thank Lisette Trzpiot and Nancy Rosière (University of Liege) for experimental assays and technical help. Zac Anderson and Wen Zhang from Bioinformatics Solutions Inc. (Waterloo, Canada) are deeply acknowledged for their support, guidance and comments on PEAKS AB software and de novo results. D.M. and D.B. are financed by the FEDER and the Wallonia Region conventions.

REFERENCES

(1)

Brower, V. Proteomics: Biology in the Post-Genomic Era. Companies All over the World Rush to Lead the Way in the New Post-Genomics Race. EMBO Rep. 2001, 2 (7), 558–560.

(2)

Pandey, A.; Mann, M. Proteomics to Study Genes and Genomes. Nature 2000, 405 (6788), 837–846.

(3)

He, Q. Y.; Chiu, J. F. Proteomics in Biomarker Discovery and Drug Development. J. Cell. Biochem. 2003, 89 (5), 868–886.

(4)

Chan, I. S.; Ginsburg, G. S. Personalized Medicine: Progress and Promise. Annu. Rev. Genomics Hum. Genet. 2011, 12 (1), 217–244.

(5)

Metzger, J.; Schanstra, J. P.; Mischak, H. Capillary Electrophoresis-Mass Spectrometry in Urinary Proteome Analysis: Current Applications and Future Developments. Anal. Bioanal. Chem. 2009, 393 (5), 1431–1442.

(6)

Arpino, P. Combined Liquid Chromatography Mass Spectrometry. Part III. Applications of Thermospray. Mass Spectrom. Rev. 1992, 11 (1), 3–40.

(7)

Rajabi, K.; Ashcroft, A. E.; Radford, S. E. Mass Spectrometric Methods to Analyze the Structural Organization of Macromolecular Complexes. Methods 2015, 89, 13–21.

(8)

Han, X.; Aslanian, A.; Yates, J. R. Mass Spectrometry for Proteomics. Curr. Opin. Chem. Biol. 2008, 12 (5), 483–490.

(9)

Switzar, L.; Giera, M.; Niessen, W. M. A. Protein Digestion: An Overview of the Available Techniques and Recent Developments. J. Proteome Res. 2013, 12 (3), 1067–1077.

(10)

McLafferty, F. W.; Breuker, K.; Jin, M.; Han, X.; Infusini, G.; Jiang, H.; Kong, X.; Begley, T. P. Top-down MS, a Powerful Complement to the High Capabilities of Proteolysis Proteomics. FEBS J. 2007, 274 (24), 6256–6268.

(11)

Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R. Protein Analysis by Shotgun/Bottom-up Proteomics. Chem. Rev. 2013, 113 (4), 2343–2394.

(12)

Brodbelt, J. S. Ion Activation Methods for Peptides and Proteins. Anal. Chem. 2016, 88 (1), 30–51.

(13)

Sleno, L.; Volmer, D. A. Ion Activation Methods for Tandem Mass Spectrometry. J. Mass Spectrom. 2004, 39, 1091–1112.

(14)

Aebersold, R.; Mann, M. Mass Spectrometry-Based Proteomics. Nature 2003, 422 (6928), 198–207.

(15)

Yates, J. R. The Revolution and Evolution of Shotgun Proteomics for Large-Scale Proteome Analysis. J. Am. Chem. Soc. 2013, 135 (5), 1629–1640.

(16)

Yates, J. R.; Ruse, C. I.; Nakorchevsky, A. Proteomics by Mass Spectrometry: Approaches, Advances, and Applications. Annu. Rev. Biomed. Eng. 2009, 11 (1), 49–79.

(17)

Swaney, D. L.; Wenger, C. D.; Coon, J. J. Value of Using Multiple Proteases for Large-Scale Mass Spectrometry-Based Proteomics. J. Proteome Res. 2010, 9 (3), 1323–1329.

(18)

Capelo, J. L.; Carreira, R.; Diniz, M.; Fernandes, L.; Galesio, M.; Lodeiro, C.; Santos, H. M.; Vale, G. Overview on Modern Approaches to Speed up Protein Identification Workflows Relying on Enzymatic Cleavage and Mass Spectrometry-Based Techniques. Anal. Chim. Acta 2009, 650 (2), 151–159.

(19)

Meyer, B.; Papasotiriou, D. G.; Karas, M. 100% Protein Sequence Coverage: A Modern Form of Surrealism in Proteomics. Amino Acids 2011, 41 (2), 291–310.

(20)

Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. R. Six Alternative Proteases for Mass Spectrometry–based Proteomics beyond Trypsin. Nat. Protoc. 2016, 11 (5), 993–1006.

(21)

Forbes, A. J.; Mazur, M. T.; Patel, H. M.; Walsh, C. T.; Kelleher, N. L. Toward Efficient Analysis of >70 KDa Proteins with 100% Sequence Coverage. Proteomics 2001, 1 (8), 927–933.

(22)

Tsiatsiani, L.; Heck, A. J. R. Proteomics beyond Trypsin. FEBS J. 2015, 282 (14), 2612–2626.

(23)

Wiśniewski, J. R.; Mann, M. Consecutive Proteolytic Digestion in an Enzyme Reactor Increases Depth of Proteomic and Phosphoproteomic Analysis. Anal. Chem. 2012, 84 (6), 2631–2637.

(24)

Choudhary, G.; Wu, S. L.; Shieh, P.; Hancock, W. S. Multiple Enzymatic Digestion for Enhanced Sequence Coverage of Proteins in Complex Proteomic Mixtures Using Capillary LC with Ion Trap MS/MS. J. Proteome Res. 2003, 2 (1), 59–67.

(25)

MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; et al. Shotgun Identification of Protein Modifications from Protein Complexes and Lens Tissue. Proc. Natl. Acad. Sci. 2002, 99 (12), 7900–7905.

(26)

Molina, H.; Horn, D. M.; Tang, N.; Mathivanan, S.; Pandey, A. Global Proteomic Profiling of Phosphopeptides Using Electron Transfer Dissociation Tandem Mass Spectrometry. Proc. Natl. Acad. Sci. U. S. A. 2007, 104 (7), 2199–2204.

(27)

Feng, Y.; De Franceschi, G.; Kahraman, A.; Soste, M.; Melnik, A.; Boersema, P. J.; de Laureto, P. P.; Nikolaev, Y.; Oliveira, A. P.; Picotti, P. Global Analysis of Protein Structural Changes in Complex Proteomes. Nat. Biotechnol. 2014, 32 (10), 1036–1044.

(28)

Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bairoch, A. UniProtKB/Swiss-Prot. Methods Mol. Biol. 2007, 406, 89–112.

(29)

Schilling, B.; Rardin, M. J.; MacLean, B. X.; Zawadzka, A. M.; Frewen, B. E.; Cusack, M. P.; Sorensen, D. J.; Bereman, M. S.; Jing, E.; Wu, C. C.; et al. Platform-Independent and Label-Free Quantitation of Proteomic Data Using MS1 Extracted Ion Chromatograms in Skyline. Mol. Cell. Proteomics 2012, 11 (5), 202–214.

(30)

Gundry, R. L.; White, M. Y.; Murray, C. I.; Kane, L. A.; Fu, Q.; Stanley, B. A.; Van Eyk, J. E.; Su, C.; Yang, F.; et al. Preparation of Proteins and Peptides for Mass Spectrometry Analysis in a Bottom-Up Proteomics Workflow. Curr. Protoc. Mol. Biol. 2009, 77 (5), 342–355.

(31)

Dick Jr, L. W.; Mahon, D.; Qiu, D.; Cheng, K. Peptide Mapping of Therapeutic Monoclonal Antibodies: Improvements for Increased Speed and Fewer Artifacts. J. Chromatogr. B 2009, 877 (3), 230–236.

(32)

Yamada, T.; Choy, E. H.; Panayi, G. S.; Kingsley, G. H. Therapeutic Monoclonal Antibodies. Keio J. Med. 2011, 60 (2), 37–46.

(33)

Zhang, H.; Cui, W.; Gross, M. L. Mass Spectrometry for the Biophysical Characterization of Therapeutic Monoclonal Antibodies. FEBS Lett. 2013, 588 (2), 308–317.

(34)

du Prel, J.-B.; Hommel, G.; Röhrig, B.; Blettner, M. Confidence Interval or P-Value? Part 4 of a Series on Evaluation of Scientific Publications. Dtsch. Arztebl. Int. 2009, 106 (19), 335–339.

(35)

Keil, B. Proteolysis Data Bank: Specificity of Alpha-Chymotrypsin from Computation of Protein Cleavages. Protein Seq. Data Anal. 1987, 1 (1), 13–20.

(36)

Higel, F.; Seidl, A.; Sörgel, F.; Friess, W. N-Glycosylation Heterogeneity and the Influence on Structure, Function and Pharmacokinetics of Monoclonal Antibodies and Fc Fusion Proteins. Eur. J. Pharm. Biopharm. 2016, 100, 94–100.

(37)

Dell, A.; Morris, H. R. Glycoprotein Structure Determination by Mass Spectrometry. Science 2001, 291 (5512), 2351–2356.

(38)

Füssl, F.; Cook, K.; Bones, J.; Fitzgerald, O.; Trappe, A.; Scheffler, K. Comprehensive Characterisation of the Heterogeneity of Adalimumab via Charge Variant Analysis Hyphenated On-Line to Native High Resolution Orbitrap Mass Spectrometry. MAbs 2018, 11 (1), 116–128.

(39)

Guthals, A.; Clauser, K. R.; Frank, A. M.; Bandeira, N. Sequencing-Grade De Novo Analysis of MS/MS Triplets (CID/HCD/ETD) From Overlapping Peptides. J. Proteome Res. 2013, 12 (6), 2846–2857.

(40)

Savidor, A.; Barzilay, R.; Elinger, D.; Yarden, Y.; Lindzen, M.; Gabashvili, A.; Adiv Tal, O.; Levin, Y. Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Mol. Cell. Proteomics 2017, 16 (6), 1151–1161.

(41)

Tran, N. H.; Rahman, M. Z.; He, L.; Xin, L.; Shan, B.; Li, M. Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci. Rep. 2016, 6 (8), 31730.

(42)

Mazzucchelli, G.; Zimmerman, T.; Smargiasso, N.; Baiwir, D.; Meuwis, M.-A.; De Pauw, E. De Novo Sequencing Using MELD Proteolysis Coupled to a “Sequence Assembly” Algorithm. In 63rd ASMS Conference Proceedings; 2015.

(43)

Kolarich, D.; Jensen, P. H.; Altmann, F.; Packer, N. H. Determination of Site-Specific Glycan Heterogeneity on Glycoproteins. Nat. Protoc. 2012, 7 (7), 1285–1298.

(44)

Chiavarina, B.; Thiry, M.; Scheijen, J. L.; Hutton, C. A.; Belpomme, D.; Peixoto, P.; Turtoi, A.; Bellahcène, A.; Bianchi, E.; Delvenne, P.; et al. Methylglyoxal, a Glycolysis Side-Product, Induces Hsp90 Glycation and YAP-Mediated Tumor Growth and Metastasis. Elife 2016, 5, e19375.

(45)

Bekhouche, M.; Leduc, C.; Dupont, L.; Janssen, L.; Delolme, F.; Vadon-Le Goff, S.; Smargiasso, N.; Baiwir, D.; Mazzucchelli, G.; Zanella-Cleon, I.; et al. Determination of the Substrate Repertoire of ADAMTS2, 3, and 14 Significantly Broadens Their Functions and Identifies Extracellular Matrix Organization and TGF-β Signaling as Primary Targets. FASEB J. 2016, 30 (5), 1741–1756.

(46)

Dubiela, P.; Kabasser, S.; Smargiasso, N.; Geiselhart, S.; Bublin, M.; Hafner, C.; Mazzucchelli, G.; Hoffmann-Sommergruber, K. Jug r 6 Is the Allergenic Vicilin Present in Walnut Responsible for IgE Cross-Reactivities to Other Tree Nuts and Seeds. Sci. Rep. 2018, 8 (1), 11366.

(47)

Raijmakers, R.; Neerincx, P.; Mohammed, S.; Heck, A. J. R. Cleavage Specificities of the Brother and Sister Proteases Lys-C and Lys-N. Chem. Commun. 2010, 46 (46), 8827–8829.

(48)

Beltran, L.; Cutillas, P. R. Advances in Phosphopeptide Enrichment Techniques for Phosphoproteomics. Amino Acids 2012, 43 (3), 1009–1024.

(49)

Guo, A.; Gu, H.; Zhou, J.; Mulhern, D.; Wang, Y.; Lee, K. A.; Yang, V.; Aguiar, M.; Kornhauser, J.; Jia, X.; et al. Immunoaffinity Enrichment and Mass Spectrometry Analysis of Protein Methylation. Mol. Cell. Proteomics 2013, 13 (1), 372–387.

(50)

Swearingen, K. E.; Moritz, R. L. High-Field Asymmetric Waveform Ion Mobility Spectrometry for Mass Spectrometry-Based Proteomics. Expert Rev. Proteomics 2012, 9 (5), 505–517.

(51)

Chalkley, R. J.; Medzihradszky, K. F.; Lynn, A. J.; Baker, P. R.; Burlingame, A. L. Statistical Analysis of Peptide Electron Transfer Dissociation Fragmentation Mass Spectrometry. Anal. Chem. 2010, 82 (2), 579–584.

(52)

Rowczenio, D. M.; Noor, I.; Gillmore, J. D.; Lachmann, H. J.; Whelan, C.; Hawkins, P. N.; Obici, L.; Westermark, P.; Grateau, G.; Wechalekar, A. D. Online Registry for Mutations in Hereditary Amyloidosis Including Nomenclature Recommendations. Hum. Mutat. 2014, 35 (9), E2403–E2412.

(53)

Van Allen, E. M.; Wagle, N.; Stojanov, P.; Perrin, D. L.; Cibulskis, K.; Marlow, S.; Jane-Valbuena, J.; Friedrich, D. C.; Kryukov, G.; Carter, S. L.; et al. Whole-Exome Sequencing and Clinical Interpretation of Formalin-Fixed, Paraffin-Embedded Tumor Samples to Guide Precision Cancer Medicine. Nat. Med. 2014, 20 (6), 682–688.

(54)

Mazzucchelli, G.; Holzhauser, T.; Cirkovic Velickovic, T.; Diaz-Perales, A.; Molina, E.; Roncada, P.; Rodrigues, P.; Verhoeckx, K.; Hoffmann-Sommergruber, K. Current (Food) Allergenic Risk Assessment: Is It Fit for Novel Foods? Status Quo and Identification of Gaps. Mol. Nutr. Food Res. 2018, 62 (1), 1700278.

(55)

Vizcaíno, J. A.; Csordas, A.; Del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; et al. 2016 Update of the PRIDE Database and Its Related Tools. Nucleic Acids Res. 2016, 44 (D1), D447–D456.
