Article pubs.acs.org/jpr

Cite This: J. Proteome Res. 2019, 18, 2501−2513

Multi-Enzymatic Limited Digestion: The Next-Generation Sequencing for Proteomics?

Denis Morsa,†,‡ Dominique Baiwir,†,‡ Raphaël La Rocca,† Tyler A. Zimmerman,†,□ Emeline Hanozin,† Elodie Grifnée,† Rémi Longuespée,†,○ Marie-Alice Meuwis,†,⊥,# Nicolas Smargiasso,† Edwin De Pauw,† and Gabriel Mazzucchelli*,†

† Mass Spectrometry Laboratory, MolSys Research Unit, University of Liege, Liege 4000, Belgium
‡ GIGA Proteomics Facility, University of Liege, Liege 4000, Belgium
⊥ Department of Hepato-Gastroenterology and Digestive Oncology, CHU, Liege 4000, Belgium
# Laboratory of Translational Gastroenterology, GIGA, Liege 4000, Belgium


ABSTRACT: Over the past 40 years, proteomics, generically defined as the field dedicated to the identification and analysis of proteins, has tremendously gained in popularity and potency through advancements in genome sequencing, separative techniques, mass spectrometry, and bioinformatics algorithms. As a consequence, its scope of application has gradually enlarged and diversified to meet specialized topical biomedical subjects. Although the tryptic bottom-up approach is widely regarded as the gold standard for rapid screening of complex samples, its application for precise and confident mapping of protein modifications is often hindered due to partial sequence coverage, poor redundancy in indicative peptides, and lack of method flexibility. We here show how the synergic and time-limited action of a properly diluted mix of multiple enzymes can be exploited in a versatile yet straightforward protocol to alleviate present-day drawbacks. Merging bottom-up and middle-down ideologies, our results highlight broad assemblies of overlapping peptides that enable refined and reliable characterizations of proteins, including variant identification, and their carried modifications, including post-translational modifications, truncations, and cleavages. Beyond this boost in performance, our methodology also offers efficient de novo sequencing capabilities, in view of which we here present a dedicated custom assembly algorithm.

KEYWORDS: proteolytic method, limited digestion, protein characterization, PTMs, de novo sequencing, mAbs, digestibility profile, mass spectrometry



Received: January 18, 2019
Published: May 3, 2019

DOI: 10.1021/acs.jproteome.9b00044 J. Proteome Res. 2019, 18, 2501−2513

INTRODUCTION

The beginning of the 2000s marks the entry into the so-called postgenomics era and the advent of new “omics” fields aiming at an ever-improving comprehension of the human biochemical machinery.1 Positioned as a major contributor among these emerging disciplines, proteomics refers to the systematic identification and characterization of proteins, including their intrinsic modifications, conformations, interactions, abundances, activities, and cellular localizations.2 As such, it lies as a constitutive part among leading topics in the modern research area, namely within the scope of biomarker discovery and drug development,3 both bringing the emergence of the biopharmaceutical field toward “personalized medicine” applications.4 Capitalizing on its high sensitivity and specificity as well as interfacing possibilities with both prefractionation techniques5,6 and alternative structural methods,7 mass spectrometry (MS) has emerged as an essential tool on the road to protein elucidation.8

From this angle, two main proteomics approaches were historically developed and classified:9 the top-down approach targeting medium-sized intact protein ions on one hand,10 and the bottom-up approach targeting short sub-4.6kDa peptides issued from proteolytic digestions on the other hand.11 In the second scenario, the resulting peptide mix is typically separated by liquid chromatography (LC) and further subjected to collision-mediated (CID) or electron-mediated (ETD, ECD) dissociation12 within the scope of tandem MS data acquisition.13 These activation techniques lead to predictable fragmentation patterns that are readily compared against in silico simulations and database searches for identification purposes.14 Although its efficiency and sensitivity have established the tryptic bottom-up approach as the gold standard for the screening of complex protein samples in the framework of shotgun proteomics,15 major limitations often occur when targeting refined sequence characterizations on pure to semicomplex samples. These limitations include sequence-dependent digestion efficiencies, noncomprehensive data sets caused by the oversampling of the most abundant peptides, and the need for uniform cleavage site distributions within the protein sequence.16 In turn, these aspects may lead to unsatisfactory sequence coverage, missed or biased characterization of modifications such as post-translational modifications (PTMs), and deficient isoform discrimination, among others.17

In this context, considerable efforts have been spent to develop alternatives to the classic bottom-up strategy over the past two decades,18,19 among which a prevalent trend involves the utilization of enzymes other than trypsin.20 These substitutes may be selected based on their high single-residue selectivity in view of generating larger 3-kDa to 15-kDa peptides and slimming down the proteolytic mix complexity, thus establishing the foundations of the middle-down proteomics approach.21 Another option lies in multienzymatic schemes issued from the combination of distinct monoenzymatic steps encompassed within parallel,17,22 or more rarely consecutive,23 workflows. In this way, higher numbers of proteolytic peptides and better sequence coverages were reported,24 in turn improving the characterization of PTMs,25 especially for phosphopeptides.26 These experiments are still time-consuming and, as they involve prolonged reaction times with high enzyme concentrations, complete protein digestions yielding inherently low missed-cleavage counts are often achieved. The redundancy in strategic peptides covering PTMs or mutation zones of interest is therefore most of the time only marginally improved over tryptic workflows.

We here describe an innovative approach merging both the bottom-up and middle-down ideologies to provide a fast, robust, and refined solution for current proteomics challenges targeting the characterization of low-to-medium complexity samples. The methodology, referred to as MELD for MultiEnzymatic Limited Digestion, capitalizes on a single 2 h-long proteolytic step involving the synergic action of a diluted mix of several enzymes. While the multienzymatic benefits are retained, the lower enzyme concentration and reaction time induce a limitation in the proteolytic reaction. This feature engenders numerous missed cleavage events27 and results in various peptide products of diverse length with overlapping stretches of residues. This complex outcome eventually increases the extent of and confidence in the sequence coverage, in turn enabling refined protein characterization and efficient database-free de novo sequencing.

MATERIALS AND METHODS

Sample Preparation

Bovine serum albumin (BSA) was purchased as a 2 mg/mL standard protein solution packaged in sealed ampules from Sigma-Aldrich (Darmstadt, Germany). Adalimumab antibody was purchased as a 40 mg/0.8 mL Humira solution (AbbVie, Rungis) encapsulated in a subcutaneous injection pen and diluted to 2 mg/mL using NH4HCO3 50 mM. Both compounds were aliquoted as 20 μg protein fractions and further processed for cysteine reduction and alkylation following a three-step reaction scheme performed in 50 mM ammonium bicarbonate: (a) reduction with dithiothreitol (DTT) 10 mM final at 60 °C for 1 h, (b) alkylation in iodoacetamide (IAA) 20 mM final at room temperature for 30 min, and (c) quenching of the alkylation reaction by addition of DTT 11 mM to reach 21 mM final. The so-reduced and alkylated samples were purified by precipitation using the 2D cleanup kit commercialized by GE Healthcare (Chicago, IL, United States). They were subsequently reconstituted to 0.5 mg/mL in 50 mM ammonium bicarbonate.

Enzymatic Digestion

Trypsin, Glu-C, and chymotrypsin proteases were purchased from ThermoFisher Scientific (Waltham, MA, United States) as 20 μg MS-grade lyophilized powders. They were reconstituted to 1 mg/mL on ice using respectively HCl 10 mM, ultrapure H2O Milli-Q, and HCl 1 mM.

1. Monoenzymatic Digestion. Each pure enzyme solution was added to a 20 μg protein fraction to reach a final protease-to-protein ratio of 1:50 (w:w), according to manufacturer’s recommendations. The tubes were incubated at 37 °C for 16 h under stirring at 600 rpm (Thermomixer comfort, Eppendorf, Hamburg, Germany) before the digestion was stopped using trifluoroacetic acid (TFA) at 0.5% (v:v) final concentration to reach pH < 3. The resulting samples were evaporated to dryness under vacuum and reconstituted to 1 pmol/μL in H2O with 0.1% TFA for injection.

2. Combinatory Multienzymatic Digestion. The monoenzymatic digests achieved from the independent action of each of the three enzymes (trypsin, Glu-C, chymotrypsin) following the above-detailed protocol were pooled in equiproportion before injection.

3. Synergic Multienzymatic and Limited Digestion (MELD). Two multienzyme mixtures combining trypsin, Glu-C, and chymotrypsin were prepared on ice immediately before use by mixing the pure 1 mg/mL solutions in a ratio of 1.00:1.00:1.55 (v:v:v), respectively. The high-ratio mixture was used as such. The low-ratio mixture was obtained by a 9-fold dilution of the former using 25 mM NH4HCO3. Two distinct digestions were performed by respectively adding the same volume of the so-prepared mixtures to two 20 μg protein fractions to obtain final protease-to-protein ratios of 1:85, 1:85, and 1:55 on one hand and 1:750, 1:750, and 1:500 on the other hand for trypsin, Glu-C, and chymotrypsin, respectively. CaCl2 at 5 mM final concentration was also added. Each tube was incubated for 2 h at 37 °C under stirring at 600 rpm (Thermomixer Comfort, Eppendorf, Hamburg, Germany) before the digestion was stopped using TFA at 0.5% (v:v) final concentration to reach pH < 3. Equal amounts of both digests were subsequently pooled. The resulting mixture was evaporated to dryness under vacuum and reconstituted to 15 pmol/9 μL in H2O with 0.1% TFA for injection.

Liquid Chromatography−Mass Spectrometry Analysis

The chromatographic separation was performed on a M-Class ACQUITY UPLC (Waters, Milford, MA, United States). A 3 min-long sample trapping step was first achieved on a reversed-phase (RP) ACQUITY UPLC M-Class Trap Column (Symmetry C18, 100 Å, 5 μm, 180 μm × 20 mm, Waters, Milford, MA, United States) prior to releasing on an ACQUITY UPLC M-Class HSS T3 C18 analytical column (100 Å, 1.8 μm spherical silica, 75 μm × 250 mm, Waters, Milford, MA, United States). Water and acetonitrile both supplemented with 0.1% (v:v) of formic acid (FA) were used as eluents and mixed according to a 177 min-long gradient method (Supporting Information, Section S1). The flow rate was set at 0.6 μL/min.


Figure 1. In silico digestion performances monitored on human serum albumin (HSA) based on (a) the number of generated unique peptides, (b) the sequence coverage, and (c) the mean occurrence per residue, as functions of the number of theoretically allowed missed cleavages and considering an LC-MS compliant mass interval between 500 and 5000 Da. Three approaches were considered: a monoenzymatic tryptic digestion; a combined multienzymatic digestion relying on trypsin, Glu-C, and chymotrypsin; and a synergic multienzymatic digestion relying on trypsin, Glu-C, and chymotrypsin.

The mass detection was performed on an online hyphenated Q-Exactive Plus Hybrid Quadrupole-Orbitrap mass spectrometer (ThermoFisher Scientific, Waltham, MA, United States) operated in data-dependent positive acquisition mode and calibrated using Pierce LTQ ESI positive ion calibration solution (ThermoFisher Scientific, Waltham, MA, United States). Source parameters were set as follows: spray voltage = 2.3 kV, capillary temperature = 270 °C, and S-lens RF level = 50. The MS spectra were acquired with 70 000 mass resolution (m/z 200) from m/z 400 to m/z 1600, AGC target of 1 × 10⁶, and lock mass internal calibration at m/z 445.12003. The MS/MS spectra were acquired for the 12 most intense ions of each MS scan (TopN = 12) with 17 500 mass resolution (m/z = 200), an isolation window of m/z 2, automatic gain control (AGC) target of 1 × 10⁵, maximum injection time of 50 ms, and (N)CE = 28. Singly charged ion exclusion and a 10 s dynamic exclusion were used. The optimization of the standard MELD protocol was performed on a Dionex UltiMate 3000 HPLC (ThermoFisher Scientific, Waltham, MA, United States) with an Esquire HCT Ultra ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany).

Database Search

The database search was performed in PEAKS Studio Software v7.5 (Bioinformatics Solutions Inc., Waterloo, ON, Canada) on the adalimumab sequence backgrounded with the SWISSPROT database28 restricted to the bovine taxonomy. Tolerances of 5 ppm and 0.02 Da were allowed on the precursor and fragment masses, respectively. Carbamidomethylation of cysteine was set as a fixed modification. Deamidation of asparagine and glutamine, oxidation of methionine, and N-glycosylation of asparagine with G0F, G1F, and G2F fucosylated glycans were set as variable modifications. A maximum of 5 PTMs per peptide was allowed. Unless otherwise stated, the no-enzyme rule was used. To ensure high quality MS/MS sequencing matches, a filtering for a false discovery rate (FDR) < 0.1% was applied.

De Novo Sequencing and Assembly

The de novo sequencing and assembly were performed in PEAKS AB v1.0 (“sequence validation job”) and in PEAKS AB v2.0 (“protein sequencing job”) using identical mass tolerances, fixed and variable PTMs, and maximum PTM counts as described above.
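To make the two mass tolerances quoted for the database search concrete (5 ppm on precursors, 0.02 Da on fragments), the following minimal Python sketch shows how such windows are commonly evaluated. The function names are hypothetical and this is not PEAKS code; whether the software applies the tolerance to m/z or to neutral mass is not stated here and is left as an assumption.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_ok(observed: float, theoretical: float, tol_ppm: float = 5.0) -> bool:
    """True if the precursor lies within a relative (ppm) tolerance window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_ok(observed: float, theoretical: float, tol_da: float = 0.02) -> bool:
    """True if the fragment lies within an absolute (Da) tolerance window."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # Width of a 5 ppm window at the lock mass quoted in the text (m/z 445.12003).
    print(445.12003 * 5.0 / 1e6)
    print(precursor_ok(1000.004, 1000.0), fragment_ok(500.015, 500.0))
```

A relative window scales with the precursor mass, whereas the absolute fragment window does not, which is why the two tolerances are expressed in different units.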


Figure 2. Evaluation of the tryptic digestion efficiency on BSA performed with enzyme to substrate ratios ranging from 1:100 to 1:10 000 (w:w). Outcomes were analyzed with (a) 1D gel electrophoresis to monitor proteolytic peptide length distribution as well as with nano-HPLC-MS/MS to quantify (b) the unique proteolytic peptide count, the sequence coverage, the mean residue occurrence, and (c) the missed cleavage pattern distribution. With the objective to enhance the proteolysis versatility, digest mixes issued from the combination of all ratios as well as of the 1:100 and 1:1000 ratios were considered and confronted to the single-ratio optimum (1:500) with respect to (d) the proteolysis peptides mass distribution and (e−f) the identity overlapping.
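Enzyme-to-substrate ratios (w:w) such as those in the caption above can be converted into enzyme masses with simple arithmetic. The sketch below, with hypothetical names, also cross-checks the MELD mixture described under Enzymatic Digestion: because all three stocks are at 1 mg/mL, the 1.00:1.00:1.55 volume ratio should match the mass ratio implied by 1:85, 1:85, and 1:55. It assumes that a "9-fold dilution" divides the enzyme concentration by 9, in which case the reported 1:750 and 1:500 read as rounded values.

```python
# Values quoted in the text; the dictionary keys are illustrative names.
MIX_VOLUMES = {"trypsin": 1.00, "glu_c": 1.00, "chymotrypsin": 1.55}
HIGH_RATIO = {"trypsin": 85, "glu_c": 85, "chymotrypsin": 55}  # 1:N (w:w)


def enzyme_mass_ug(protein_ug: float, denominator: float) -> float:
    """Enzyme mass needed for a 1:denominator (w:w) protease-to-protein ratio."""
    return protein_ug / denominator


def mass_ratio_vs_trypsin(denominators: dict) -> dict:
    """Enzyme mass of each protease relative to trypsin."""
    ref = denominators["trypsin"]
    return {name: ref / d for name, d in denominators.items()}


def dilute(denominators: dict, fold: float) -> dict:
    """Ratios obtained when the enzyme mix is diluted `fold` times (assumed meaning)."""
    return {name: d * fold for name, d in denominators.items()}


if __name__ == "__main__":
    for name, d in HIGH_RATIO.items():
        print(name, round(enzyme_mass_ug(20.0, d), 3), "ug per 20 ug protein")
    print(mass_ratio_vs_trypsin(HIGH_RATIO))  # compare with MIX_VOLUMES
    print(dilute(HIGH_RATIO, 9))              # compare with 1:750, 1:750, 1:500
```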

De Novo Peptide Candidates

Candidate lists used as input for the Sequence Assembly algorithm were generated in PEAKS Studio Software v7.5 (Bioinformatics Solutions Inc., Waterloo, ON, Canada) and initially extracted without local confidence (LC) score filtering.

Data Analysis

Venn distributions were calculated on the Bioinformatics and Evolutionary Genomics portal of the Vlaams Instituut voor Biotechnologie (Ghent, Belgium). Chromatographic peak integration was performed in Skyline v.4.0 (ref 29) (MacCoss Lab Software, University of Washington, WA, United States).

In Silico Digestion

The digestion was simulated on human serum albumin (HSA) using SequenceEditor BioTools v3.2 (Bruker Daltonics, Bremen, Germany). The cleavage rules were set as follows: K and R for trypsin, E for Glu-C, and F,L,M,W,Y for chymotrypsin. Peptide lists were further refined based on their masses considering an LC-MS compliant interval between 500 and 5000 Da.
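The simulation described above can be approximated in a few lines of Python. This is a minimal sketch, not the BioTools implementation: it assumes cleavage on the C-terminal side of each listed residue, ignores proline effects and residue modifications, and uses standard monoisotopic residue masses. It also computes the three metrics of Figure 1, taking the mean residue occurrence as the average number of peptides covering a residue. All function names and the toy sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
RULES = {"trypsin": "KR", "glu_c": "E", "chymotrypsin": "FLMWY"}  # rules from the text


def cleavage_sites(seq, residues):
    """Positions after which the chain is cut (C-terminal side, assumed)."""
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in residues]


def digest(seq, residues, max_missed, mass_range=(500.0, 5000.0)):
    """Set of (start, end) spans with up to max_missed missed cleavages."""
    bounds = [0] + cleavage_sites(seq, residues) + [len(seq)]
    spans = set()
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            start, end = bounds[i], bounds[j]
            mass = sum(MONO[a] for a in seq[start:end]) + WATER
            if mass_range[0] <= mass <= mass_range[1]:
                spans.add((start, end))
    return spans


def metrics(seq, spans):
    """Return (unique peptide count, sequence coverage, mean residue occurrence)."""
    depth = [0] * len(seq)
    for start, end in spans:
        for k in range(start, end):
            depth[k] += 1
    unique = len({seq[s:e] for s, e in spans})
    coverage = sum(d > 0 for d in depth) / len(seq)
    return unique, coverage, sum(depth) / len(seq)


def compare_approaches(seq, max_missed, mass_range=(500.0, 5000.0)):
    """Tryptic, combined (pooled digests), and synergic (pooled enzymes) digestion."""
    tryptic = digest(seq, RULES["trypsin"], max_missed, mass_range)
    combined = set().union(*(digest(seq, r, max_missed, mass_range) for r in RULES.values()))
    synergic = digest(seq, "".join(RULES.values()), max_missed, mass_range)
    return {"tryptic": metrics(seq, tryptic),
            "combined": metrics(seq, combined),
            "synergic": metrics(seq, synergic)}


if __name__ == "__main__":
    toy = "GASPVTKLEFNDQRWMHYCIKAGELRSTFPDWVAKNEMLKR"  # arbitrary toy sequence, not HSA
    for missed in (0, 2, 5):
        print(missed, compare_approaches(toy, missed))
```

The only difference between the combined and synergic cases is where the union is taken: over the peptide lists of separate digests, or over the cleavage rules of a single digest.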

Sequence Assembly Algorithm

The de novo assembly algorithm was written in Java. The code is available free of charge on the MSLab software webpage at http://www.mslab.ulg.ac.be/software.
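The authors' Java algorithm is not described in this section, so the following Python sketch only illustrates the general idea of assembling a sequence from overlapping peptides. It uses a simple greedy suffix-prefix merge, which is an assumption and should not be read as the published method; the function names, the minimum overlap of 3 residues, and the toy peptides are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_overlap: int = 3):
    """Repeatedly merge the pair of contigs sharing the longest overlap."""
    contigs = sorted(set(peptides), key=len, reverse=True)
    # Drop peptides fully contained in a longer one.
    contigs = [p for i, p in enumerate(contigs) if not any(p in q for q in contigs[:i])]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = [c for c in rest if c not in merged] + [merged]


if __name__ == "__main__":
    toy_peptides = ["ACDEFGH", "FGHIKLMN", "LMNPQRS", "QRSTVWY", "DEFG"]  # toy data
    print(greedy_assemble(toy_peptides))
```

A real de novo workflow would additionally have to weigh residue-level confidence scores and tolerate sequencing errors (for example I/L ambiguity), which this exact-match sketch does not attempt.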


RESULTS

Theorizing on Multienzymatic and Limited Digestions

The outcome of diverse proteolytic digestions carried out on human serum albumin (HSA) was simulated in silico with increasing numbers of allowed missed cleavages. Three distinct approaches were considered: (a) a monoenzymatic tryptic digestion, (b) a multienzymatic mix issued from the combination of parallel monoenzymatic digestions independently performed with trypsin, glutamyl endopeptidase (Glu-C), and chymotrypsin, and (c) a multienzymatic digestion issued from the synergic action of the three aforementioned enzymes pooled together. Results were assessed based on the number of unique proteolytic peptides (Figure 1a), the protein sequence coverage (Figure 1b), and the mean residue occurrence (Figure 1c), defined as the average number of peptides encompassing a given residue among the protein sequence. Protocols employed in state-of-the-art monoenzymatic20,30 and combinatory multienzymatic methodologies22 commonly rely on long reaction times and high enzyme-to-substrate ratios. As such, they favor low missed cleavage counts and are thus embodied by conditions localized in the left part of the graphs. Within this region, both these methods are associated with already extensive sequence coverages but display limited numbers of proteolytic peptides and low mean residue occurrence. Practically considered, these last aspects may result in missing information, therefore impeding exhaustive protein characterization, and in incomplete peptide connectivity, which is nevertheless required for de novo sequencing. Gradually raising the possibility of missed cleavage events reflects a progressive limitation of the enzymatic digestion and a shift toward conditions localized on the right side of the graphs. Within this region, the number of unique peptides and the mean residue occurrence both drastically increase. This is chiefly observed for the synergic multienzymatic approach which, in those respects, proves to be the most efficient of the three simulated strategies.

Summed together, these aspects theoretically emphasize opportunities of improvement in protein characterization using the MELD methodology.

Setting Up the Conditions of the Standard MELD Protocol

This section aims to provide a standard MELD protocol that constitutes a methodological basis readily applicable for the mainstream characterization of a wide variety of protein samples. It can be tuned through the nature and concentrations of the enzymes to shape the proteolytic peptide mix for specific applications or to further enhance compatibility with specific substrates, as further debated in the Discussion section. The protocol optimization started from a collection of monoenzymatic proteolytic data sets acquired after the time-limited digestion of bovine serum albumin (BSA) using each of six different enzymes: trypsin, chymotrypsin, Glu-C, Asp-N, Arg-C, and Lys-C. In this perspective, 8 distinct enzyme-to-protein ratios ranging from 1:100 to 1:10 000 (w:w) were systematically evaluated, and a fixed 2-h reaction time was considered to ensure adequate reaction reproducibility while minimizing time-dependent artifacts.31 The digests were examined through 1D polyacrylamide gel electrophoresis to visually assess both the digestion yield and the proteolytic peptide size distribution (Figure 2a). Nano-HPLC-MS/MS analysis was also performed to assess the unique proteolytic peptide count, the sequence coverage, the mean residue occurrence (Figure 2b), and the missed-cleavage pattern diversity (Figure 2c). Results are illustrated for trypsin.

The 1D gel electrophoresis displays an expected progressive transition from a low molecular weight (MW) smear toward high MW peptide bands as the enzyme-to-substrate ratio is decreased from 1:100 to 1:10 000. This adjustment reflects a gradual rise in the diversity and maximum reachable size of proteolytic peptides when the probability of missed cleavage is increased through a limitation in the proteolytic reaction. Moving to LC-MS/MS experiments, this modulation results in the presence of an optimum in the identification efficiency that is localized within the 1:200 to 1:750 ratios. Considering other dilutions, both the unique peptide count and the sequence coverage get gradually worse as an increasing fraction of the produced peptides become incompatible with the analysis, being either too short at high ratios or too long at low ratios. While an acceptable sequence coverage of 90.2% is already achieved with the 1:750 single-ratio digestion, drastic gains in the peptide counts and mean residue coverage are obtainable considering multiple-ratio combinations. This strategy offers both a greater diversity in the proteolytic peptide lengths, therefore increasing the probability of overlapping segments required for de novo assembly, and an amplified versatility, thus universalizing the digestion protocol regardless of the protein sequences and properties. The best efficiency monitored for the experimentally unviable combination of all 1:100 to 1:10 000 ratios may be efficiently approached by mixing two digests arising from conditions localized on either side of the single-ratio optimum. In this context, the combination of the 1:100 and 1:1000 ratios was found to be the best compromise for trypsin.

The so-generated proteolytic set covers a wide mass distribution (Figure 2d), overlaps most of the peptides generated at the single-ratio optimum (Figure 2e), and accounts for a 76% (86/113) sampling of the complete set issued from the all-ratio mix (Figure 2f). Similar tunings were performed for the five other aforementioned enzymes, each leading to their own optimal two-ratio combination (Supporting Information, Section S2). Compromising between respective digestion efficiencies and diversities in the cleavage sites offered by each enzyme and capitalizing on the two-ratio strategy, the standard MELD protocol is eventually set up as a combination of two independent digestions, both performed with the same three enzymes working synergistically: trypsin, Glu-C, and chymotrypsin. After further refinements carried out on proteins with different digestibility and size properties, i.e. myoglobin, β-lactamase, and fetuin (Supporting Information, Section S3), the high-ratio or concentrated enzyme mix is fixed at 1:85, 1:85, and 1:55 ratios, whereas the low-ratio or diluted enzyme mix is fixed at 1:750, 1:750, and 1:500 ratios, for the three enzymes in the aforementioned order, respectively. Applied to HSA in a triplicate run, we achieved 100% sequence coverage with 1287 ± 127 unique peptides bearing from 0 to 30 missed cleavages. An average count of 9.9 ± 0.4 missed cleavage events per peptide was monitored, with respective contributions of 2.1 ± 0.2, 2.7 ± 0.1, and 5.1 ± 0.1 for trypsin, Glu-C, and chymotrypsin.

Validating and Benchmarking the MELD Approach on an Antibody Sequence

Starting from the advent of hybridoma technologies, monoclonal antibodies (mAbs) have rapidly raised tremendous


Figure 3. Qualitative reproducibility, identification, quantification, and coverage performances monitored from the different proteolytic methodologies: the standard MELD protocol, the monoenzymatic strategy following a combinatory workflow, the monoenzymatic tryptic strategy with constrained database search, and the monoenzymatic tryptic strategy with unconstrained database search. Results are established from triplicate. (a) Distribution of the number of unique peptides according to their presence or absence in the proteolytic populations generated for each replicate of a given methodology. The corresponding relative proportions of unique peptides found in the three replicates, in two of the three replicates, and in one single replicate of each methodology are displayed in the second line. (b) Performances regarding the identification-related attributes: the unique peptide count, the sequence coverage, the mean residue occurrence, and the mean peptide confidence score, and regarding the quantitative-related attributes: the averaged absolute peptide intensity normalized to 1 pmole and its mean variability over triplicate. Amplitude in the residue coverage across the sequence of the (c) heavy adalimumab chain and (d) light adalimumab chain. (e) Distribution of the number of unique peptides according to their presence or absence in the proteolytic populations generated for each four experienced proteolytic methodologies after merging their three respective replicates. (f) Distribution of the proteolytic population associated with each methodology as a function of their intensity gauged from their respective chromatographic peak area.
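The replicate distribution described in panel (a) of the caption above amounts to counting, for each unique peptide, the number of replicates in which it was identified. A minimal Python sketch with hypothetical names and toy identifiers follows; it is an illustration of the bookkeeping, not the Venn tool used by the authors.

```python
from collections import Counter


def replicate_distribution(replicates):
    """Fraction of unique peptides found in exactly k replicates, for k = n..1."""
    counts = Counter(pep for rep in replicates for pep in set(rep))
    total = len(counts)
    if total == 0:
        return {}
    n = len(replicates)
    return {k: sum(1 for c in counts.values() if c == k) / total
            for k in range(n, 0, -1)}


if __name__ == "__main__":
    # Toy peptide identifiers for three digest replicates (not real data).
    reps = [{"A", "B", "C", "D"}, {"A", "B", "C"}, {"A", "B", "E"}]
    print(replicate_distribution(reps))
```

The three fractions sum to 1 by construction, so the share of peptides found in exactly two replicates follows by difference once the other two are known.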

interest in the scope of immunotherapy32 and are currently the focus of sustained characterization efforts from the proteomics community.33 In this context, we here used adalimumab, a therapeutic ∼144 kDa human immunoglobulin G (IgG1) involved in the regulation of diverse inflammatory processes, as a candidate molecule to gauge the efficacy of the standard MELD protocol in comparison to the well-established monoenzymatic strategies encompassed in individual and


Figure 4. Coverage of specific modifications of adalimumab monitored from the number and redundancy of indicative unique peptides achieved using different proteolytic methodologies: the standard MELD protocol, the monoenzymatic strategy following a combinatory workflow, and the monoenzymatic tryptic strategy with both constrained and unconstrained database search. Different types of modifications are investigated: (a) the N-glycosylation of Asn-301 by fucosylated glycans G0F and G1F, (b) the dehydration consecutive to succinimidation of Asp-284, (c) the deamidation of Asn-329, (d) the C-terminal truncation of Lys-451, and (e) the cleavage occurring between His-228 and Thr-229. Only peptides showing the modifications are reported together with their redundancy monitored from their appearance in each replicate.

combinatory workflows.30 To this end, the peptide mixes issued from each tested proteolysis were individually analyzed by LC-MS/MS, and the generated spectra were searched against the adalimumab sequence, both with and without enzyme-specific cleavage rule compliance to assess the contribution of nonspecific cleavages (Supporting Information, Tables S1−S4). The respective performances of each strategy were evaluated over triplicate experiments with regard to the digestion reproducibility (Figure 3a), identification-related attributes, i.e. the unique peptide count, the sequence coverage, the mean residue occurrence, and the mean peptide confidence score, and quantitative-related attributes, i.e. the averaged absolute peptide intensity and its variability (Figure 3b), as well as the distribution in the residue coverage across the sequences of both heavy (Figure 3c) and light (Figure 3d) chains. The qualitative digestion reproducibility inherent to each methodology was gauged from their three respective digest replicates based on the amount of common and singly found peptide identities. Calculated over the respective total populations, the common peptides account for 60, 56, 65, and 65%, while the peptides only found in one replicate account for 24, 27, 11, and 20% for the MELD, combinatory, constrained tryptic, and unconstrained tryptic methodologies, respectively. Additionally, the mean confidence scores per peptide, reported as −10 log10(P-value),34 are respectively equal to 55 ± 7, 52 ± 7, 95 ± 16, and 54 ± 6, a score of 20 being often qualified as of relatively high confidence (Supporting Information, Section S4). Although these results emphasize the expected best reproducibility and confidence achieved for the gold-standard tryptic workflow, they also validate the adequate robustness and reproducibility inherent to the MELD methodology. Indeed, despite the more complex peptide mixtures expected from our strategy, its reproducibility performance is on par with both the single and combinatory monoenzymatic strategies when the “no-enzyme” rule is used for database searching. Heading to identification-related attributes, our results highlight 100% sequence coverage for both the heavy and light chains when exploiting the standard MELD protocol. Slightly lower performances were monitored with the tryptic and combinatory approaches. Considering a 6.5% coefficient of variation (CV) from triplicates, 949 unique peptides were obtained using MELD with respective contributions of 628 and 321 for the heavy and light chains. These numbers correspond to a 1.8-fold increase from the combinatory methodology and to a 2-to-6-fold increase from the unconstrained monoenzymatic methodologies, depending on the enzyme nature. This increase may be attributed both to the synergic action of enzymes, diversifying the cleavage sites, and to the limitation of the


digestion, favoring missed cleavages. This last idea is first supported by the fact that 49% of the complete population of proteolytic peptides resulting from the combination of all four strategies are solely found with MELD proteolysis, while this contribution goes down to 14% for the combinatory, 6% for the unconstrained tryptic, and less than 1% for the constrained tryptic workflows (Figure 3e). It is further reinforced by an average of 6.1 missed cleavage events per peptide monitored with MELD against 4.3 with the combinatory strategy. These discrepancies also rationalize the higher 30 ± 3 mean occurrences per residue recorded using MELD with respective contributions of 28 ± 3 and 35 ± 3 for the heavy and light chains joined with an extensive coverage distribution across the entire sequences. Benchmarked against the tested alternatives, these data represent an improvement of above 190% over the combinatory method and from 280 to 800% over the individual monoenzymatic methods. In parallel, results acquired using monoenzymatic workflows highlight substantial gains in the unique peptide counts on the order of 100% with trypsin and 180% and 300% for Glu-C and chymotrypsin, respectively (Supporting Information, Section S5) as well as in the mean residue occurrence when the database search is performed without specific cleavage rules. Further corroborating the already-documented aspecificity of chymotrypsin,35 these data support the idea that the identification accuracy and versatility imparted from monoenzymatic strategies would strongly benefit from unconstrained rules for database searching. This option yet requires sufficient spectral resolution and bioinformatics adjustments regarding the database search algorithms, FDR filtering, and results scoring. Finally, focusing on quantitative-related attributes, the averaged absolute peptide intensity was found equal to 9.6 × 10⁹ when injecting 15 pmol of MELD-digested adalimumab.
This corresponds to 6.4 × 108 per pmole of digested protein, denotating a decrease of less than 1 order of magnitude from the combinatory strategy evaluated at 2.9 × 109 and of slightly more than 1 order of magnitude from the tryptic strategies respectively evaluated at 4.4 × 1010 and 1.4 × 1010 after constrained and unconstrained database searches. While these differences might foreshadow possible hindrances concerning the detection and fragmentation of MELD-generated peptides on low-end instruments, their root cause still witnesses of a benefit. Indeed, the oversampling of the most abundant peptides when using trypsin induces a distribution in peptide intensity that is largely bimodal, centered on 1 × 107 and 1 × 1012 (Figure 3f). On the contrary, the distribution achieved using MELD is much more uniform and related to an ideal Gaussian-type function centered on 1 × 109, indicating a more efficient sampling of the proteolytic population. The mean variability in peptide intensity over triplicate is of 49 ± 17% for MELD, which is on the same order as the 44 ± 17% obtained with the combinatory methodology. This result yet shows that the MELD methodology is less adequate than tryptic workflows, associated with a variability on the order of 20%, for relative batch-to-batch quantification purposes. Still, a variation in peptide intensity of at least 2 orders of magnitude that would be observed using our methodology can safely be considered as relevant.
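The per-pmol normalization and the order-of-magnitude comparisons quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the variable names are hypothetical, and it assumes that the intensities quoted for the combinatory and tryptic strategies are expressed per pmol of digested protein, as the MELD value is.

```python
import math

# Values quoted in the text for MELD-digested adalimumab
meld_total_intensity = 9.6e9   # averaged absolute peptide intensity
injected_pmol = 15             # amount injected

# Normalize to one pmol of digested protein
per_pmol = meld_total_intensity / injected_pmol

# Reference intensities quoted for the other workflows
# (assumed here to be per pmol as well)
references = {
    "combinatory": 2.9e9,
    "tryptic, constrained search": 4.4e10,
    "tryptic, unconstrained search": 1.4e10,
}

# Gap to MELD expressed in orders of magnitude (log10 of the ratio)
gaps = {name: math.log10(value / per_pmol) for name, value in references.items()}

print(f"MELD intensity per pmol: {per_pmol:.1e}")
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} orders of magnitude above MELD")
```

Under this assumption, the combinatory gap stays below one order of magnitude and both tryptic gaps fall between one and two, in line with the wording of the text.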

Looking Deeper into the Characterization of PTMs, Truncations, and Cleavage Sites

Built around a tetrameric protein assembly, mAbs exhibit a variable domain devoted to antigen recognition and a constant N-glycosylated domain responsible for receptor binding. The production-inherent diversity in the glycosylation patterns was reported to drastically influence the effector function and pharmacokinetics of therapeutic mAbs.³⁶ As such, it constitutes a mandatory but still challenging attribute to characterize by bottom-up MS-based proteomics.³⁷ Pursuing our study of adalimumab, we here compared the efficiency and extent of the coverage of the N-glycosylated Asn-301 residue using the standard MELD protocol and both the tryptic and combinatory monoenzymatic strategies (Figure 4a). Additionally, specific modifications reported for adalimumab³⁸ were also monitored following the same rationale: the succinimidation of Asp-284 by reaction with Gly-285 that results in dehydration (Figure 4b), the preferential deamidation of Asn-329 (Figure 4c), the C-terminal truncation of Lys-451 (Figure 4d), and the cleavage occurring between His-228 and Thr-229 (Figure 4e). Our results show that MELD is successful in covering all listed PTMs and mutated zones of interest with the highest count and redundancy in indicative unique peptides. First focusing on N-glycosylation, MELD enables the confident sequencing of 9 unique peptides covering the G0F-modified Asn-301 residue, among which 4 are found in all three replicates, and of 5 unique peptides covering the G1F-modified Asn-301 residue, among which 2 are found in all three replicates. In terms of numbers, these data highlight a threefold increase in the coverage of the main G0F N-glycoform compared with the tryptic methodologies, considering both constrained and unconstrained database searches. In terms of specificity, MELD is the only methodology that enables the confident characterization of the less abundant G1F N-glycoform.³⁸ The dehydration of Asp-284 and the deamidation of Asn-329 are respectively supported by three unique peptides (one found in all replicates) and by six unique peptides (four found in all replicates) using MELD, while only one candidate is found at both sites when using the tryptic methodologies. Turning to the C-terminal truncation of Lys-451, MELD provides eight highly redundant unique peptides covering the modification. These are detected as matching pairs showing either an intact or a truncated C-terminus, therefore confidently evidencing the existence of both forms. On the other hand, the cleavage occurring between His-228 and Thr-229 is supported by three unique peptides whose N-terminus starts at Thr-229. Both the truncation and the cleavage events are left uncovered using the tryptic methodologies. Only the combinatory approach provides results on par with MELD when considering the deamidation of Asn-329 and the cleavage at His-228.

Capitalizing on the MELD Protocol for de Novo Sequencing

The extensive network of overlapping peptides, added to the large quantity of sequence-inherent daughter ions offered by MELD, makes our methodology an ideal and easily implementable alternative for de novo sequencing. While efficient strategies relying on combinatory proteolytic workflows with various activation techniques³⁹ or on acid hydrolysis coupled with dedicated assembly algorithms⁴⁰ have been reported, our method has the practical benefits of being easily implementable on CID-equipped instruments, avoiding exogenous chemical protein modifications, and minimizing time-dependent artifacts. As a theoretical proof of concept, we here established the lists of de novo peptide candidates issued from the triplicate adalimumab digests generated by the different methodologies, aligned each candidate on the whole protein sequence, and enumerated those whose identity matches a k-mer stretch of 3 to 8 successive residues. The so-defined target window was moved in 1-AA steps across the complete protein sequence to evaluate the extent of coverage offered by each methodology (Figure 5a). Our results show that the MELD methodology offers superior performance compared to the combinatory methodology when the k-mer length increases, which ties in with a more stringent filtering of de novo candidates and a lower false positive rate (Supporting Information, Section S6). Considering a k-mer length of 6 or 7 as optimal for accurate sequence assembly,⁴¹ MELD theoretically enables 90 ± 1% sequence coverage while the combinatory and tryptic strategies only sample 78 ± 4% and 67 ± 1% of the sequence, respectively (Supporting Information, Section S7).

The digest raw files were further processed using PEAKS AB software (Bioinformatics Solutions Inc., Waterloo, Canada), which offers an integrated solution for de novo assembly of monoclonal antibody sequences. Results achieved with the different methodologies were compared in terms of correct sequence coverage, confidence score related to residue assignment, number of sequence-supporting peptides and peptide-spectrum matches (PSMs), and proportion of exploitable MS2 data considering an average local confidence (ALC) score above 50 (Figure 5b). The MELD methodology correctly sequences 99% of the complete adalimumab sequence with an average confidence score of 89 out of 100 and with 92% of sequenced residues associated with a confidence score higher than 85. This outperforms the combinatory methodology, which correctly sequences 97% of the total sequence with a lower confidence score of 84 and with 85% of sequenced residues above the confidence threshold of 85. Additionally, the number of supporting peptides is also higher using MELD, with a gain of about 30% over the combinatory methodology. Focusing on N-glycosylation, 12 peptides were sequenced with the correct modification on the Asn-301 residue, compared to 8 with trypsin and only 1 with the combinatory methodology (Supporting Information, Section S8).

Figure 5. Performances of tryptic, combinatory, and MELD protocols exploited for de novo assembly. (a) Theoretical sequence coverage generated from de novo candidate lists as furnished by PEAKS Studio for k-mer lengths ranging from 3 to 8 with (k − 1) overlap. (b) Correct sequence coverage, average confidence score related to residue assignment, number of residues with confidence score >85, proportion of exploitable MS2 data considering an ALC score above 50, sequence-supporting peptides, and PSMs as monitored from PEAKS AB v2.0 software.

In an effort to extend de novo assembly possibilities to all types of proteins, we are currently designing a dedicated “Sequence Assembly” algorithm operating from de novo candidate lists generated in PEAKS Studio (Bioinformatics Solutions Inc., Waterloo, Canada) and taking advantage of the specific features offered by the MELD methodology. Its rationale is based on three successive steps, i.e., the initiation from the selection of a seed sequence, the elongation of the sequence in both the C-terminus and N-terminus directions, and the termination (Figure 6). The initiation step starts with the establishment of a list of 5-AA-long seed sequence candidates selected based on a minimum confidence score threshold, as furnished by the PEAKS Studio software. Practically, a threshold of 70 was found to be an ideal compromise between extensive sampling and low false positive rate. The redundancy of each seed sequence is then calculated, as well as a global confidence score obtained by averaging over all instances of a given seed sequence. Seed sequences associated with the highest redundancies (>30) and global confidence scores are used in priority. Next, for a given selected seed sequence, the last triplet of AA localized at the C-terminus is searched against the sequences of all the de novo candidates. The first neighboring amino acid position is evaluated to find the four most frequent AA candidates at this position. After selection, two values are calculated for these four candidates: (a) the global confidence score averaged from all their respective instances and (b) the CountCorrect score, which corresponds to a stricter evaluation of the occurrence encompassing the C-terminus quadruplet of AA from the seed sequence instead of the initial triplet. On the basis of these data, i.e., occurrence, confidence, and CountCorrect, a hierarchy of rules is used to select the most probable candidate to concatenate onto the growing sequence. This hierarchy relies on three criteria of decreasing priority: (i) if the occurrence count of any of the four candidates is greater than 10, the amino acid to concatenate is selected as the one with the highest product of the occurrence count and the CountCorrect score; (ii) if one candidate has both a CountCorrect score and an occurrence count respectively higher by at least two and five points than the next best candidate, the amino acid to concatenate is selected as the former; (iii) if the confidence score of one candidate is higher than 200, the amino acid to concatenate is selected as the one with the highest product of the global confidence score and the CountCorrect score. The seed sequence is extended iteratively in steps of one AA in both the C-terminus and N-terminus directions until no further valid match is found. The output file reports, for each unique tested seed sequence, the concatenated sequence tag along with the average confidence and occurrence scores of each constitutive amino acid. Using this algorithm on the triplicate adalimumab MELD digests, we were able to cover, respectively, 64 ± 1% and 59 ± 1% of the heavy and light chain sequences and to obtain a correct 96-amino-acid-long tag localized in the first portion of the heavy chain sequence.⁴²

Figure 6. Design of the Sequence Assembly algorithm devoted to the de novo assembly of MELD-generated data. Lists of de novo peptide candidates issued from PEAKS Studio software are used as inputs. The algorithm starts with the selection of eligible 5-AA-long seed sequences based on their respective redundancies and global confidence scores. This sequence is next elongated in the C-terminus direction through an alignment of the last triplet of AA on all peptide candidates. The AA to concatenate is selected based on a hierarchy of rules involving three criteria: the occurrence count, the global confidence score, and the CountCorrect score. The process is pursued iteratively and followed by a similar N-terminus elongation. The final concatenated sequence tag along with the average confidence and occurrence scores of each constitutive amino acid are eventually reported in the output file.
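The elongation step of the Sequence Assembly algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`next_residue_stats`, `choose_residue`, `elongate_c_term`, `elongate`), the input layout (a list of `(sequence, per-residue confidences)` pairs), and the `max_len` guard are hypothetical. Where the text leaves details open, the sketch makes explicit assumptions: the "next best candidate" of rule (ii) is ranked by CountCorrect then occurrence, a lone candidate is compared against zero scores, and the threshold of 200 in rule (iii) is kept as a parameter because the scale of the score is not specified. Seed selection is omitted and the seed is taken as given.

```python
from collections import Counter, defaultdict

def next_residue_stats(tag, candidates, anchor=3, strict=4, top=4):
    """Statistics on the residue that follows the C-terminal triplet of `tag`.

    Returns {aa: (occurrence, mean_confidence, count_correct)} for the
    `top` most frequent residues found after the triplet in the candidates.
    """
    triplet, quad = tag[-anchor:], tag[-strict:]
    occ, conf_sum, correct = Counter(), defaultdict(float), Counter()
    for seq, conf in candidates:
        start = seq.find(triplet)
        while start != -1:
            nxt = start + anchor
            if nxt < len(seq):
                aa = seq[nxt]
                occ[aa] += 1
                conf_sum[aa] += conf[nxt]
                # CountCorrect: stricter count that also requires the
                # C-terminal quadruplet of the tag to match
                if nxt >= strict and seq[nxt - strict:nxt] == quad:
                    correct[aa] += 1
            start = seq.find(triplet, start + 1)
    return {aa: (n, conf_sum[aa] / n, correct[aa]) for aa, n in occ.most_common(top)}

def choose_residue(stats, min_occ=10, d_correct=2, d_occ=5, min_conf=200):
    """Apply the three rules in decreasing priority; None means termination."""
    if not stats:
        return None
    # Rule (i): some candidate is frequent enough
    if any(n > min_occ for n, _, _ in stats.values()):
        return max(stats, key=lambda aa: stats[aa][0] * stats[aa][2])
    # Rule (ii): the best candidate clearly dominates the next best one
    # (ranking by CountCorrect, then occurrence, is an assumption)
    ranked = sorted(stats, key=lambda aa: (stats[aa][2], stats[aa][0]), reverse=True)
    best = stats[ranked[0]]
    second = stats[ranked[1]] if len(ranked) > 1 else (0, 0.0, 0)
    if best[2] - second[2] >= d_correct and best[0] - second[0] >= d_occ:
        return ranked[0]
    # Rule (iii): some candidate has a high confidence score
    if any(c > min_conf for _, c, _ in stats.values()):
        return max(stats, key=lambda aa: stats[aa][1] * stats[aa][2])
    return None

def elongate_c_term(seed, candidates, max_len=200, **rule_kwargs):
    """Extend `seed` one residue at a time toward the C-terminus."""
    tag = seed
    while len(tag) < max_len:  # guard against endless growth on repeats
        aa = choose_residue(next_residue_stats(tag, candidates), **rule_kwargs)
        if aa is None:
            break
        tag += aa
    return tag

def elongate(seed, candidates, **kwargs):
    """C-terminal elongation, then N-terminal elongation on reversed data."""
    right = elongate_c_term(seed, candidates, **kwargs)
    reversed_candidates = [(s[::-1], c[::-1]) for s, c in candidates]
    return elongate_c_term(right[::-1], reversed_candidates, **kwargs)[::-1]

# Toy data: every overlapping 8-mer of a made-up sequence, confidence 90
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
toy_candidates = [(toy_protein[i:i + 8], [90] * 8) for i in range(len(toy_protein) - 7)]
# min_occ is lowered because the toy data are far less redundant than real digests
toy_tag = elongate("FGHIK", toy_candidates, min_occ=1)
print(toy_tag)
```

Handling the N-terminal direction by reversing both the tag and the candidates keeps a single elongation routine. In this toy run the terminal residues are not reached because they are supported by a single peptide only, which mirrors the role of redundancy in the rules above.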

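The theoretical k-mer coverage evaluated in Figure 5a can be sketched as follows. This is one plausible reading of the procedure, not the authors' code: a window of k residues is moved in 1-AA steps along the protein, and the residues of a window count as covered when the exact k-mer occurs in at least one de novo candidate. The function name `kmer_coverage` and the toy sequences are hypothetical, and the handling of isobaric residues such as Leu/Ile is deliberately left out.

```python
def kmer_coverage(protein, candidates, k):
    """Fraction of residues of `protein` lying in a k-mer found in a candidate."""
    # Collect every k-mer present in the de novo candidate peptides
    kmers = set()
    for peptide in candidates:
        for i in range(len(peptide) - k + 1):
            kmers.add(peptide[i:i + k])

    # Slide the k-wide window along the protein in steps of one residue
    covered = [False] * len(protein)
    for i in range(len(protein) - k + 1):
        if protein[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(protein)

# Toy example: the second candidate carries one wrong residue (X instead of L)
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
toy_peptides = ["ACDEFG", "GHIKXM", "PQRSTVWY"]
coverage_by_k = {k: kmer_coverage(toy_sequence, toy_peptides, k) for k in range(3, 9)}
for k, value in coverage_by_k.items():
    print(k, f"{value:.2f}")
```

In this toy case, the erroneous candidate still contributes at k = 3 but is filtered out at k = 6, which illustrates why longer k-mers act as a more stringent filter on de novo candidates.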


DISCUSSION

The synergic and limited proteolytic methodology described here, abbreviated MELD for multienzymatic limited digestion, presents numerous advantages over alternative bottom-up strategies relying on monoenzymatic workflows: (i) it generates a higher unique peptide count and mean residue occurrence, in turn potentially improving sequence coverage and characterization accuracy; (ii) it enables confident PTM identification and localization, notably within the challenging field of N-glycosylation in mAbs, which classically requires fragmentation strategies other than CID;⁴³ (iii) through the two-ratio approach, it offers greater peptide diversity and operational versatility; (iv) it engenders an extensive network of overlapping peptides, thereby opening the way to efficient de novo sequencing and related applications; and (v) it is tunable to cope with substrate particularities and/or specific study requirements. In this latter perspective, adjustments in the composition of the multienzymatic mixture regarding the concentration and nature of proteases are expected to help accommodate unusual occurrence and distribution of cleavable residues or purposely shape specific properties of the proteolytic mixture such as the median length of the peptides.

The current limitations of our methodology lie in the increased complexity of the resulting proteolytic mixtures, which follows from the joint action of multiple enzymes and the promotion of missed cleavage events. This feature amplifies effects of competitive ionization inherent to electrospray and somewhat hinders the application of MELD for shotgun proteomics on complex mixtures with a large concentration dynamic range. In addition, the reproducibility of the digest products is commonly lower than with monoenzymatic strategies, therefore advising caution for quantitative proteomics studies relying on the MELD methodology.

The standard MELD protocol was successfully employed as part of specific proteomic studies ranging from sequence confirmation and PTM characterization⁴⁴ to terminal amine isotopic labeling of substrates (TAILS).⁴⁵ This encompasses the characterization of point mutations inherent to sequence frameshifts, substitutions, and truncations as well as the identification and sequencing of proteins from databases. This latter application recently led to the exhaustive characterization, with total sequence determination, PTM identification, and cleavage site localization, of a novel walnut allergen from a transcriptome repository converted to a protein and expressed sequence tag (EST) database.⁴⁶

While our results already demonstrate the adequacy of the MELD methodology to answer specific queries among today’s challenges in proteomics, we strongly believe that the advent of tomorrow’s technologies will enlarge its field of application toward the characterization of more complex protein samples. In a context where a growing number of studies highlight the partial,⁴⁷ and at times inadequate,⁴⁸ sampling of the protein information offered by tryptic approaches, the transition toward more comprehensive alternatives for differential shotgun proteomics constitutes a key step. The major bottleneck presently impeding the widespread utilization of MELD resides in the capacity to analyze the highly complex proteolytic mixtures it engenders. While the upstream implementation of specific enrichments, e.g., based on immunoaffinity,⁴⁹ offers readily available solutions to slim them down, we foresee future evolutions in LC and MS setups as a way toward novel opportunities. In this context, improvements in the accessible dynamic range for detection and developments of alternative techniques for sample fractionation⁵⁰ and peptide fragmentation,⁵¹ together with refinements in the bioinformatics tools concerning the digestion rules, search algorithms, and FDR calculations, should greatly contribute to changing the game.

On the basis of this prospective outcome and capitalizing on the depth of information it furnishes, the MELD methodology appears as a promising alternative to address present and upcoming biomedical challenges. Among others is the identification of protein mutations occurring in certain types of pathologies, such as those affecting amyloidogenic proteins in the context of hereditary amyloidosis.⁵² Tweaking of the proteolytic conditions to achieve adequate peptide sizing may also be exploited to refine epitope mapping data issued from pulldown assays. In other respects, the demonstrated higher count of unique proteolytic peptides is expected to enhance the number and confidence of identifications in the scope of biomarker discovery studies. In the same vein, the extended sequence coverage might also constitute a valuable step toward the sequencing of the whole proteome, with awaited reverberations on cancer medicine in the wake of what resulted from whole exome sequencing.⁵³ As a last insight, the concept of “digestibility profiles”, linking the intensity of proteolytic peptides with their respective localization inside the protein 3D structure, should benefit from MELD to become a relevant structural probe used in the mapping of disordered stretches or enzyme-resistant regions found in allergens.⁵⁴


ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.9b00044.

Sections S1, Composition of the eluent; S2, Optimal “two-ratio” combinations; S3, MELD performances achieved on myoglobin, β-lactamase, and fetuin; S4, PSM score distributions; S5, Performances of monoenzymatic workflows relying on chymotrypsin and Glu-C; S6, False positive rate in the matching of de novo candidates; S7, Amplitude of de novo coverage with MELD, combinatory, and tryptic methods; S8, Coverage of the N-glycosylated Asn-301 residue from de novo candidates (PDF)
Supplementary Table S1: peptides issued from the standard MELD protocol (XLSX)
Supplementary Table S2: unconstrained tryptic workflow (XLSX)
Supplementary Table S3: constrained tryptic workflow (XLSX)
Supplementary Table S4: combinatory workflow (XLSX)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Denis Morsa: 0000-0002-1944-2760
Tyler A. Zimmerman: 0000-0002-5408-9771
Emeline Hanozin: 0000-0002-7717-9999
Edwin De Pauw: 0000-0003-3475-1315
Gabriel Mazzucchelli: 0000-0002-8757-8133

Present Addresses

□ T.A.Z.: Collins Aerospace, San Dimas, California 91773, United States.
○ R.L.: Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg 69117, Germany.

Notes

The authors declare no competing financial interest. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository⁵⁵ with the data set identifier PXD009800.



ACKNOWLEDGMENTS

The authors thank Lisette Trzpiot and Nancy Rosière (University of Liege) for experimental assays and technical help. Zac Anderson and Wen Zhang from Bioinformatics Solutions Inc. (Waterloo, Canada) are deeply acknowledged for their support, guidance, and comments on PEAKS AB software and de novo results. D.M. and D.B. are financed by the FEDER and the Wallonia Region conventions.



REFERENCES

(1) Brower, V. Proteomics: Biology in the Post-Genomic Era. Companies All over the World Rush to Lead the Way in the New Post-Genomics Race. EMBO Rep. 2001, 2 (7), 558−560.
(2) Pandey, A.; Mann, M. Proteomics to Study Genes and Genomes. Nature 2000, 405 (6788), 837−846.
(3) He, Q. Y.; Chiu, J. F. Proteomics in Biomarker Discovery and Drug Development. J. Cell. Biochem. 2003, 89 (5), 868−886.
(4) Chan, I. S.; Ginsburg, G. S. Personalized Medicine: Progress and Promise. Annu. Rev. Genomics Hum. Genet. 2011, 12 (1), 217−244.
(5) Metzger, J.; Schanstra, J. P.; Mischak, H. Capillary Electrophoresis-Mass Spectrometry in Urinary Proteome Analysis: Current Applications and Future Developments. Anal. Bioanal. Chem. 2009, 393 (5), 1431−1442.
(6) Arpino, P. Combined Liquid Chromatography Mass Spectrometry. Part III. Applications of Thermospray. Mass Spectrom. Rev. 1992, 11 (1), 3−40.
(7) Rajabi, K.; Ashcroft, A. E.; Radford, S. E. Mass Spectrometric Methods to Analyze the Structural Organization of Macromolecular Complexes. Methods 2015, 89, 13−21.
(8) Han, X.; Aslanian, A.; Yates, J. R. Mass Spectrometry for Proteomics. Curr. Opin. Chem. Biol. 2008, 12 (5), 483−490.
(9) Switzar, L.; Giera, M.; Niessen, W. M. A. Protein Digestion: An Overview of the Available Techniques and Recent Developments. J. Proteome Res. 2013, 12 (3), 1067−1077.
(10) McLafferty, F. W.; Breuker, K.; Jin, M.; Han, X.; Infusini, G.; Jiang, H.; Kong, X.; Begley, T. P. Top-down MS, a Powerful Complement to the High Capabilities of Proteolysis Proteomics. FEBS J. 2007, 274 (24), 6256−6268.
(11) Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M. C.; Yates, J. R. Protein Analysis by Shotgun/Bottom-up Proteomics. Chem. Rev. 2013, 113 (4), 2343−2394.
(12) Brodbelt, J. S. Ion Activation Methods for Peptides and Proteins. Anal. Chem. 2016, 88 (1), 30−51.
(13) Sleno, L.; Volmer, D. A. Ion Activation Methods for Tandem Mass Spectrometry. J. Mass Spectrom. 2004, 39, 1091−1112.
(14) Aebersold, R.; Mann, M. Mass Spectrometry-Based Proteomics. Nature 2003, 422 (6928), 198−207.
(15) Yates, J. R. The Revolution and Evolution of Shotgun Proteomics for Large-Scale Proteome Analysis. J. Am. Chem. Soc. 2013, 135 (5), 1629−1640.
(16) Yates, J. R.; Ruse, C. I.; Nakorchevsky, A. Proteomics by Mass Spectrometry: Approaches, Advances, and Applications. Annu. Rev. Biomed. Eng. 2009, 11 (1), 49−79.
(17) Swaney, D. L.; Wenger, C. D.; Coon, J. J. Value of Using Multiple Proteases for Large-Scale Mass Spectrometry-Based Proteomics. J. Proteome Res. 2010, 9 (3), 1323−1329.
(18) Capelo, J. L.; Carreira, R.; Diniz, M.; Fernandes, L.; Galesio, M.; Lodeiro, C.; Santos, H. M.; Vale, G. Overview on Modern Approaches to Speed up Protein Identification Workflows Relying on Enzymatic Cleavage and Mass Spectrometry-Based Techniques. Anal. Chim. Acta 2009, 650 (2), 151−159.
(19) Meyer, B.; Papasotiriou, D. G.; Karas, M. 100% Protein Sequence Coverage: A Modern Form of Surrealism in Proteomics. Amino Acids 2011, 41 (2), 291−310.
(20) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. R. Six Alternative Proteases for Mass Spectrometry-Based Proteomics beyond Trypsin. Nat. Protoc. 2016, 11 (5), 993−1006.
(21) Forbes, A. J.; Mazur, M. T.; Patel, H. M.; Walsh, C. T.; Kelleher, N. L. Toward Efficient Analysis of >70 kDa Proteins with 100% Sequence Coverage. Proteomics 2001, 1 (8), 927−933.
(22) Tsiatsiani, L.; Heck, A. J. R. Proteomics beyond Trypsin. FEBS J. 2015, 282 (14), 2612−2626.
(23) Wiśniewski, J. R.; Mann, M. Consecutive Proteolytic Digestion in an Enzyme Reactor Increases Depth of Proteomic and Phosphoproteomic Analysis. Anal. Chem. 2012, 84 (6), 2631−2637.
(24) Choudhary, G.; Wu, S. L.; Shieh, P.; Hancock, W. S. Multiple Enzymatic Digestion for Enhanced Sequence Coverage of Proteins in Complex Proteomic Mixtures Using Capillary LC with Ion Trap MS/MS. J. Proteome Res. 2003, 2 (1), 59−67.
(25) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; et al. Shotgun Identification of Protein Modifications from Protein Complexes and Lens Tissue. Proc. Natl. Acad. Sci. U. S. A. 2002, 99 (12), 7900−7905.
(26) Molina, H.; Horn, D. M.; Tang, N.; Mathivanan, S.; Pandey, A. Global Proteomic Profiling of Phosphopeptides Using Electron Transfer Dissociation Tandem Mass Spectrometry. Proc. Natl. Acad. Sci. U. S. A. 2007, 104 (7), 2199−2204.
(27) Feng, Y.; De Franceschi, G.; Kahraman, A.; Soste, M.; Melnik, A.; Boersema, P. J.; de Laureto, P. P.; Nikolaev, Y.; Oliveira, A. P.; Picotti, P. Global Analysis of Protein Structural Changes in Complex Proteomes. Nat. Biotechnol. 2014, 32 (10), 1036−1044.
(28) Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bairoch, A. UniProtKB/Swiss-Prot. Methods Mol. Biol. 2007, 406, 89−112.
(29) Schilling, B.; Rardin, M. J.; MacLean, B. X.; Zawadzka, A. M.; Frewen, B. E.; Cusack, M. P.; Sorensen, D. J.; Bereman, M. S.; Jing, E.; Wu, C. C.; et al. Platform-Independent and Label-Free Quantitation of Proteomic Data Using MS1 Extracted Ion Chromatograms in Skyline. Mol. Cell. Proteomics 2012, 11 (5), 202−214.
(30) Gundry, R. L.; White, M. Y.; Murray, C. I.; Kane, L. A.; Fu, Q.; Stanley, B. A.; Van Eyk, J. E.; Su, C.; Yang, F.; et al. Preparation of Proteins and Peptides for Mass Spectrometry Analysis in a Bottom-Up Proteomics Workflow. Curr. Protoc. Mol. Biol. 2009, 77 (5), 342−355.
(31) Dick, L. W., Jr; Mahon, D.; Qiu, D.; Cheng, K. Peptide Mapping of Therapeutic Monoclonal Antibodies: Improvements for Increased Speed and Fewer Artifacts. J. Chromatogr. B: Anal. Technol. Biomed. Life Sci. 2009, 877 (3), 230−236.
(32) Yamada, T.; Choy, E. H.; Panayi, G. S.; Kingsley, G. H. Therapeutic Monoclonal Antibodies. Keio J. Med. 2011, 60 (2), 37−46.
(33) Zhang, H.; Cui, W.; Gross, M. L. Mass Spectrometry for the Biophysical Characterization of Therapeutic Monoclonal Antibodies. FEBS Lett. 2014, 588 (2), 308−317.
(34) du Prel, J.-B.; Hommel, G.; Röhrig, B.; Blettner, M. Confidence Interval or P-Value? Part 4 of a Series on Evaluation of Scientific Publications. Dtsch. Arztebl. Int. 2009, 106 (19), 335−339.
(35) Keil, B. Proteolysis Data Bank: Specificity of Alpha-Chymotrypsin from Computation of Protein Cleavages. Protein Seq. Data Anal. 1987, 1 (1), 13−20.
(36) Higel, F.; Seidl, A.; Sörgel, F.; Friess, W. N-Glycosylation Heterogeneity and the Influence on Structure, Function and Pharmacokinetics of Monoclonal Antibodies and Fc Fusion Proteins. Eur. J. Pharm. Biopharm. 2016, 100, 94−100.
(37) Dell, A.; Morris, H. R. Glycoprotein Structure Determination by Mass Spectrometry. Science (Washington, DC, U. S.) 2001, 291 (5512), 2351−2356.
(38) Füssl, F.; Cook, K.; Bones, J.; Fitzgerald, O.; Trappe, A.; Scheffler, K. Comprehensive Characterisation of the Heterogeneity of Adalimumab via Charge Variant Analysis Hyphenated On-Line to Native High Resolution Orbitrap Mass Spectrometry. MAbs 2019, 11 (1), 116−128.
(39) Guthals, A.; Clauser, K. R.; Frank, A. M.; Bandeira, N. Sequencing-Grade De Novo Analysis of MS/MS Triplets (CID/HCD/ETD) From Overlapping Peptides. J. Proteome Res. 2013, 12 (6), 2846−2857.
(40) Savidor, A.; Barzilay, R.; Elinger, D.; Yarden, Y.; Lindzen, M.; Gabashvili, A.; Adiv Tal, O.; Levin, Y. Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Mol. Cell. Proteomics 2017, 16 (6), 1151−1161.
(41) Tran, N. H.; Rahman, M. Z.; He, L.; Xin, L.; Shan, B.; Li, M. Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci. Rep. 2016, 6 (1), 31730.
(42) Mazzucchelli, G.; Zimmerman, T.; Smargiasso, N.; Baiwir, D.; Meuwis, M.-A.; De Pauw, E. De Novo Sequencing Using MELD Proteolysis Coupled to a “Sequence Assembly” Algorithm. In 63rd ASMS Conference Proceedings; 2015.
(43) Kolarich, D.; Jensen, P. H.; Altmann, F.; Packer, N. H. Determination of Site-Specific Glycan Heterogeneity on Glycoproteins. Nat. Protoc. 2012, 7 (7), 1285−1298.
(44) Chiavarina, B.; Thiry, M.; Scheijen, J. L.; Hutton, C. A.; Belpomme, D.; Peixoto, P.; Turtoi, A.; Bellahcène, A.; Bianchi, E.; Delvenne, P.; et al. Methylglyoxal, a Glycolysis Side-Product, Induces Hsp90 Glycation and YAP-Mediated Tumor Growth and Metastasis. eLife 2016, 5, No. e19375.
(45) Bekhouche, M.; Leduc, C.; Dupont, L.; Janssen, L.; Delolme, F.; Vadon-Le Goff, S.; Smargiasso, N.; Baiwir, D.; Mazzucchelli, G.; Zanella-Cleon, I.; et al. Determination of the Substrate Repertoire of ADAMTS2, 3, and 14 Significantly Broadens Their Functions and Identifies Extracellular Matrix Organization and TGF-β Signaling as Primary Targets. FASEB J. 2016, 30 (5), 1741−1756.
(46) Dubiela, P.; Kabasser, S.; Smargiasso, N.; Geiselhart, S.; Bublin, M.; Hafner, C.; Mazzucchelli, G.; Hoffmann-Sommergruber, K. Jug r 6 Is the Allergenic Vicilin Present in Walnut Responsible for IgE Cross-Reactivities to Other Tree Nuts and Seeds. Sci. Rep. 2018, 8 (1), 11366.
(47) Raijmakers, R.; Neerincx, P.; Mohammed, S.; Heck, A. J. R. Cleavage Specificities of the Brother and Sister Proteases Lys-C and Lys-N. Chem. Commun. 2009, 46 (46), 8827−8829.
(48) Beltran, L.; Cutillas, P. R. Advances in Phosphopeptide Enrichment Techniques for Phosphoproteomics. Amino Acids 2012, 43 (3), 1009−1024.
(49) Guo, A.; Gu, H.; Zhou, J.; Mulhern, D.; Wang, Y.; Lee, K. A.; Yang, V.; Aguiar, M.; Kornhauser, J.; Jia, X.; et al. Immunoaffinity Enrichment and Mass Spectrometry Analysis of Protein Methylation. Mol. Cell. Proteomics 2014, 13 (1), 372−387.
(50) Swearingen, K. E.; Moritz, R. L. High-Field Asymmetric Waveform Ion Mobility Spectrometry for Mass Spectrometry-Based Proteomics. Expert Rev. Proteomics 2012, 9 (5), 505−517.
(51) Chalkley, R. J.; Medzihradszky, K. F.; Lynn, A. J.; Baker, P. R.; Burlingame, A. L. Statistical Analysis of Peptide Electron Transfer Dissociation Fragmentation Mass Spectrometry. Anal. Chem. 2010, 82 (2), 579−584.
(52) Rowczenio, D. M.; Noor, I.; Gillmore, J. D.; Lachmann, H. J.; Whelan, C.; Hawkins, P. N.; Obici, L.; Westermark, P.; Grateau, G.; Wechalekar, A. D. Online Registry for Mutations in Hereditary Amyloidosis Including Nomenclature Recommendations. Hum. Mutat. 2014, 35 (9), E2403−E2412.
(53) Van Allen, E. M.; Wagle, N.; Stojanov, P.; Perrin, D. L.; Cibulskis, K.; Marlow, S.; Jane-Valbuena, J.; Friedrich, D. C.; Kryukov, G.; Carter, S. L.; et al. Whole-Exome Sequencing and Clinical Interpretation of Formalin-Fixed, Paraffin-Embedded Tumor Samples to Guide Precision Cancer Medicine. Nat. Med. 2014, 20 (6), 682−688.
(54) Mazzucchelli, G.; Holzhauser, T.; Cirkovic Velickovic, T.; Diaz-Perales, A.; Molina, E.; Roncada, P.; Rodrigues, P.; Verhoeckx, K.; Hoffmann-Sommergruber, K. Current (Food) Allergenic Risk Assessment: Is It Fit for Novel Foods? Status Quo and Identification of Gaps. Mol. Nutr. Food Res. 2018, 62 (1), 1700278.
(55) Vizcaíno, J. A.; Csordas, A.; Del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; et al. 2016 Update of the PRIDE Database and Its Related Tools. Nucleic Acids Res. 2016, 44 (D1), D447−D456.