

Article

Generic Workflow for Mapping of Complex Disulfide Bonds Using In-Source Reduction and Extracted Ion Chromatograms from Data-Dependent Mass Spectrometry Christian N. Cramer, Christian D Kelstrup, Jesper V. Olsen, Kim F. Haselmann, and Peter Kresten Nielsen Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 07 Jun 2018 Downloaded from http://pubs.acs.org on June 7, 2018



Analytical Chemistry

Generic Workflow for Mapping of Complex Disulfide Bonds Using In-Source Reduction and Extracted Ion Chromatograms from Data-Dependent Mass Spectrometry

Christian N. Cramer1,2, Christian D. Kelstrup2, Jesper V. Olsen2, Kim F. Haselmann1 & Peter Kresten Nielsen1*

1. Protein Engineering, Global Research, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark
2. Proteomics Program, The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark

* Corresponding author: Peter Kresten Nielsen, Ph.D., Protein Engineering, Global Research, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark. Email: [email protected] Tel.: (+45)30790375

Notes: The authors declare no competing financial interest.

ACS Paragon Plus Environment


Abstract: Disulfide bond mapping is a critical task in protein characterization, as protein stability, structure and function depend on correct cysteine connectivities. Mass spectrometry (MS) is the method of choice for this, providing fast and accurate characterization of simple disulfide bonds. Disulfide mapping by liquid chromatography tandem mass spectrometry (LC-MS/MS) is performed by identifying disulfide-bonded partner peptides following proteolytic digestion. With the recently introduced ability to assign complex disulfide patterns by online postcolumn partial disulfide reduction by in-source reduction (ISR) in an LC-ISR-MS/MS methodology, the main challenge is data analysis to ensure detection of both expected and unexpected disulfide species. In this study, we introduced a workflow for confident and unbiased mapping of complex disulfide bonds using the powerful combination of extracted ion chromatograms (XICs) and LC-ISR-MS/MS data. With postcolumn partial reduction, identical LC retention times of intact disulfide-bonded species, their constituting free peptides and partially reduced variants were observed. Subsequent selective MS/MS fragmentation of all reduction products allowed confident identification of free cysteine-containing peptides using a classical shotgun proteomics database search. Matching XICs of the identified cysteine-containing peptides allowed identification of both predicted and unpredicted disulfide species, including unforeseen proteolytic specificities, missed cleavage sites, scrambled disulfide variants and the presence of disulfide-entangled complexes. Applying this workflow, we successfully mapped the complex disulfide bonds of tertiapin and the epidermal growth factor (EGF) family members transforming growth factor α (TGFα) and EGF. In addition, we were able to characterize the disulfide patterns of the special disulfide fold of the TGFβ superfamily in an all-online methodology.

Introduction: Disulfide mapping is a crucial task in analytical protein chemistry to ensure proper protein structure, stability, and function. With the increasing use of recombinant proteins as biotherapeutics, which usually are enriched with disulfide bonds to support their extracellular functions, disulfide mapping is especially important in the biopharmaceutical industry for quality control and to ensure the safety and potency of the drug products.1 Mass spectrometry (MS) is the most widely used technique for this. The general procedure in MS-based disulfide bond mapping is to enzymatically digest the protein of interest under nonreducing conditions and analyze the resulting disulfide-linked peptide species, consisting of two peptides connected by a single disulfide bond, by LC-MS. These peptide species can then be identified either by their mass alone or by MS/MS sequencing. The challenges associated with disulfide mapping by LC-MS of nonreduced proteolytic digests are to resolve complex disulfide connectivities and to ensure confident data analysis. These challenges are described in the following paragraphs.

Complex disulfide patterns arise when analyzing proteins that lack enzymatic digestion sites between closely spaced cysteines or that contain nested disulfide bonds. Proteolytic digestion of such complex patterns results in disulfide-bonded species containing more than a single disulfide bond. To determine distinct cysteine pairings of such species, partial disulfide bond reduction is needed, whereby some disulfide bonds are reduced while others are kept intact. This facilitates disulfide mapping by LC-MS/MS from the partially reduced species. Online disulfide reduction approaches compatible with LC-MS have attracted a lot of attention in recent years.
Postcolumn reduction of disulfide bonds can be generated in the gas phase by specialized MS/MS fragmentation techniques, by introduction of chemical reducing agents or an electrochemical (EC) flow cell in the LC-MS flow path, or more recently by in-source reduction (ISR) during electrospray ionization. Gas-phase reduction of disulfide bonds can be obtained by MS/MS using electron transfer dissociation (ETD)2-4, electron capture dissociation (ECD)5,6 and ultraviolet photodissociation (UVPD)7,8. However, using these fragmentation techniques as a
disulfide reduction step in MS2 requires an additional fragmentation step by collision-induced dissociation (CID)-MS3 to obtain selective sequence information of the reduced products. This is far from straightforward due to the significantly reduced sensitivity in MS3 mode. Postcolumn disulfide reduction prior to ionization has successfully been demonstrated by tee-in of DTT9 or TCEP10 and by using EC reduction in flow cells placed in an LC-EC-MS setup.11-14 The benefit of postcolumn disulfide reduction is that intact disulfide-bonded species, their constituting free peptides and partially reduced variants are present at identical retention times when the reduction is only partial. However, these approaches suffer from the difficulty in controlling the reduction efficiency, the nonautomated modification of the LC-MS setup, and ion suppression effects from the chemical reductants9,10 and working electrode material.12 Recently, we described selective disulfide reduction obtained by ISR during electrospray ionization as an online partial reduction approach.15 We suggested that the reducing nature of ISR stems from corona discharge, for which subsequent studies have added further evidence, through ISR of disulfide-bonded peptides monitored by ion mobility MS16 and ISR generation of reduction products of small molecules.17 With disulfide reduction obtained during ionization, ISR represents an easy-to-implement postcolumn reduction approach in a standard LC-MS setup. Using such an LC-ISR-MS/MS setup, assignment of complex disulfide patterns was recently demonstrated by selective MS/MS characterization of partially reduced disulfide products.15

A major challenge in data analysis of disulfide mapping studies lies in confident and unbiased identification of different disulfide species. Many different approaches exist, ranging from manual interpretation to automated software solutions.
Among the manual approaches, the Desaire group demonstrated the power of using extracted ion chromatograms (XICs) of ETD spectra to map disulfide bonds.18,19 Utilizing the efficient ETD cleavage of disulfide bonds, XICs of cysteine-containing peptides could be plotted and correlated to find disulfide-bonded peptide partners. The strength of this approach was automatic detection of both major and minor disulfide forms, as the XIC of a cysteine-containing peptide that is part of different disulfide species will show several retention times. The drawbacks of their method were the biased generation of the XICs, which was based solely on theoretical masses of cysteine-containing peptides from a fully specific trypsin digestion, and the fact that only disulfide-linked species that have been selected for ETD fragmentation can be analyzed this way. Among software solutions, multiple approaches exist that enable disulfide mapping based on the special MS/MS fragmentation patterns of disulfide-bonded species.20,21 While the many software tools developed so far can readily assign simple interchain disulfide species, they run into limitations with increasingly complex database searches. These include disulfide patterns resulting in species more complex than two peptides connected by a single disulfide bond, combinatorial effects of modifications of the disulfide-linked peptides and unpredictable digestion patterns of less specific enzymes.

In the present study, we demonstrate the powerful combination of XICs and LC-ISR-MS/MS data to map complex disulfide bonds. Utilizing confident identification of reduced cysteine-containing peptides obtained by postcolumn disulfide reduction, selective CID-based MS/MS sequencing and database searching, XICs could be generated to reveal both expected and unexpected disulfide species.
Using this workflow, we successfully determined distinct cysteine connectivities of the nested disulfide patterns of tertiapin and the EGF family members TGFα and EGF, and confirmed expected disulfide patterns of the special disulfide fold of the TGFβ superfamily in an all-online methodology.

Experimental

Reagents: Lyophilized tertiapin (>95%), lyophilized transforming growth factor-α (TGFα) (≥98%), lyophilized epidermal growth factor (EGF) (>98%), lyophilized growth differentiation factor 2 (GDF2) (>95%), lyophilized transforming growth factor-β 3 (TGFβ3)
(>98%) and formic acid (FA) were purchased from Sigma-Aldrich; modified trypsin was purchased from Promega; endoproteinase Lys-C and chymotrypsin from Roche Diagnostics; Optima LC/MS grade 0.1% FA in water (LC buffer A) and 0.1% FA in ACN (LC buffer B) from Fisher Scientific; and sodium dihydrogen phosphate monohydrate (NaH2PO4∙H2O) and disodium hydrogen phosphate dihydrate (Na2HPO4∙2H2O), used to make a 100 mM NaH2PO4/Na2HPO4 buffer (pH 6.8), from Merck.

Sample preparation: Nonreducing proteolytic digestions were performed in NaH2PO4/Na2HPO4 buffer (pH 6.8) with final protein concentrations of 0.5 mg/mL. Tertiapin, GDF2 and TGFβ3 were digested by addition of Lys-C and trypsin at 1:40 and 1:20 (w/w) enzyme-to-protein ratios, respectively. TGFα and EGF were digested using chymotrypsin at a 1:40 (w/w) enzyme-to-protein ratio. The digest reactions were incubated overnight at 37°C and quenched by addition of FA to a final concentration of 20% (v/v).

Mass spectrometry: All MS experiments were carried out using an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA), using the HESI ion source operating in positive ion mode. The source settings were: spray voltage of 3.5 kV; sheath, aux and sweep gases of 15, 10 and 0 arbitrary units, respectively; ion transfer tube temperature of 300°C; and vaporizer temperature of 135°C. All LC-ISR-MS/MS experiments were performed using data-dependent higher-energy collision-induced dissociation (ddHCD)22 acquisition. In the ddHCD method, survey scans of precursors were performed at 120k resolution with an AGC target of 4.0 × 10^5 and a maximum injection time of 200 ms, with scan ranges from 250 to 1550 m/z for tertiapin, TGFα and EGF and from 350 to 2000 m/z for GDF2 and TGFβ3. MS/MS was performed with a quadrupole precursor isolation window of 3 m/z, a normalized HCD fragmentation energy of 24% and orbitrap detection at 120k resolution. The MS/MS AGC target was set to 3.0 × 10^5 and the maximum injection time to 333 ms.
The filters applied were: peptide monoisotopic precursor selection, charge states 2-12, an intensity threshold of 1.0 × 10^5 and a dynamic exclusion duration of 8 sec with a 25 ppm tolerance. The method was run in top speed mode with cycles of 2 sec.

Liquid Chromatography: All LC separations were performed using a Thermo Scientific Vanquish Horizon UHPLC system connected to the Orbitrap MS instrument. The analytical column used was an Acquity UPLC CSH C18 reversed-phase column, 1.7 µm, 1.0 x 150 mm (Waters Company, UK) with a column temperature of 55°C. The mobile phase buffers A and B were water and ACN, respectively, both containing 0.1% FA. The flow rate was 100 µL/min, and sample elution was performed with an increasing ratio of buffer B, using a gradient starting with 4% B for the first minute and then increasing linearly to 35% B at 34 min. The amounts of protein material injected on the column were between 2 and 2.5 µg.

Data analysis: Xcalibur 4.0 (Thermo Fisher Scientific) was used for recording data, analyzing raw MS files and generating XICs. Byonic (Protein Metrics, San Carlos, CA, USA) was used to perform database searches, using the semispecific digestion specificity parameter. GPMAW v. 9.50 (Lighthouse Data, DK) was used to assist data analysis.
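The LC gradient described above can be written as a small function, which may be convenient when relating a retention time to the approximate eluent composition. This is a minimal sketch in Python: the function name `percent_b` is hypothetical, and holding 35% B after 34 min is an assumption, since the text does not describe the wash or re-equilibration steps.

```python
def percent_b(t_min: float) -> float:
    """Approximate %B at time t_min (minutes) for the gradient in the text.

    4% B is held for the first minute, followed by a linear ramp to 35% B
    at 34 min. Behaviour after 34 min is NOT given in the text; holding
    35% B is an assumption made only so the function is defined everywhere.
    """
    start_b, end_b = 4.0, 35.0
    ramp_start, ramp_end = 1.0, 34.0
    if t_min <= ramp_start:
        return start_b
    if t_min >= ramp_end:
        return end_b
    fraction = (t_min - ramp_start) / (ramp_end - ramp_start)
    return start_b + (end_b - start_b) * fraction


if __name__ == "__main__":
    for t in (0.5, 1.8, 11.0, 12.0, 34.0):
        print(f"{t:5.1f} min -> {percent_b(t):.1f}% B")
```

The programmed composition ignores the system dwell volume, so the composition actually reaching the column at a given time lags behind these values.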

Results and discussion

Method Overview: A schematic overview of the disulfide mapping workflow is shown in Figure 1. This workflow consists of a data acquisition strategy (Figure 1A), which we previously described15, and a more advanced data analysis strategy (Figure 1B), which is presented here. In brief, the data acquisition strategy can be described in four steps. The disulfide-containing proteins of interest are subjected to proteolytic digestion under nonreducing conditions (acquisition step 1) followed by LC-ISR-MS/MS data acquisition. The LC-ISR-MS/MS data acquisition comprises LC separation (acquisition step 2) of species from the nonreducing protein digests, online partial reduction of disulfide bonds by ISR
(acquisition step 3), followed by selective MS/MS characterization of all generated species by ddHCD (acquisition step 4). The fifth and final step in the disulfide mapping workflow is data analysis, which is often the most challenging part of disulfide mapping studies, as previously introduced. The method we describe here utilizes the presence of intact disulfide-bonded species, their free constituting peptides and partially reduced intermediates, all present at identical retention times, as disulfide reduction is achieved postcolumn. A schematic representation of the data analysis workflow we present in this study is shown in Figure 1B. The data analysis will be thoroughly introduced step-by-step throughout this section using the LC-ISR-ddHCD methodology on a tryptic digestion of tertiapin. Tertiapin is a 21 amino acid peptide containing two intertwined intrachain disulfide bonds (Figure 2, top), which we analyzed using the LC-ISR-ddHCD workflow. The starting point of the data analysis is a classical shotgun proteomics database search to identify free reduced cysteine-containing peptides, which have been generated by the online postcolumn ISR and selective MS/MS characterization in the LC-ISR-ddHCD setup (analysis step 1 in Figure 1B). The database search can be performed by common proteomics software with high confidence due to the well-known fragmentation patterns of linear peptides. This eliminates the need for specialized software and circumvents the challenges associated with confident detection of disulfide-bonded species. In addition, the database search for linear cysteine-containing peptides can be expanded to include more advanced searches, such as increased number of missed cleavage sites, semispecific digestion (i.e. one terminus may disagree with the digestion specificity parameters) and combinations of different peptide modifications (e.g. oxidations, pyroglutamic acid formation) without compromising the amount of false discovery hits. 
Having identified a list of cysteine-containing peptides, the next step is to generate XICs for these species (analysis step 2). From the database search of the tryptically digested tertiapin, the peptides ALCNCNR and IIIPHMCWK were identified by MS/MS in free forms from the LC-ISR-ddHCD data acquisition. In addition to these peptides, which are expected from an in silico fully specific trypsin digestion, the peptide MCWK stemming from cleavage C-terminal to histidine was also detected. The XICs of these peptides are shown in Figure 2A, B and C. As disulfide reduction is achieved postcolumn, peptide partners involved in disulfide bonding co-elute perfectly, which is detected by correlation of their respective XICs (analysis step 3). This is observed for the ALCNCNR and IIIPHMCWK peptides, which co-elute around 11 minutes (Figure 2, A and B), and for the ALCNCNR and MCWK peptides around 1.8 minutes (Figure 2, A and C). The full MS spectra at the retention times of these cysteine-containing peptides should then be inspected (analysis step 4) to see whether the identified disulfide partner peptides account for the intact mass of the disulfide-bonded species present in the full MS spectrum, or whether a low-mass cysteine-containing peptide is also present but only as a singly charged ion and thus not selected for MS/MS (referred to as the ∆mass approach in analysis step 5). As an example, the full MS spectrum of the most abundant disulfide-containing species, eluting at 11 minutes, is shown in Figure 3A. In addition to the ALCNCNR and IIIPHMCWK peptides present in multiple charge states, a singly charged peak at 307.143 m/z is also present. This mass corresponds to the reduced CGK peptide from tertiapin, which together with the ALCNCNR and IIIPHMCWK peptides accounts for the mass of the intact disulfide-bonded species present in the spectrum.
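The mass bookkeeping behind the ∆mass approach can be illustrated with a short calculation. The sketch below, in Python, uses standard monoisotopic residue masses to reproduce the singly charged m/z of reduced CGK and to compute the mass of an intact species from its reduced constituents, subtracting two hydrogens per disulfide bond. The function names (`peptide_mass`, `mz`, `intact_mass`, `unexplained_mass`) are hypothetical and are not part of any software used in the study.

```python
# Standard monoisotopic residue masses (Da) for the residues needed here.
RESIDUE = {
    "A": 71.03711, "C": 103.00919, "G": 57.02146, "H": 137.05891,
    "I": 113.08406, "K": 128.09496, "L": 113.08406, "M": 131.04049,
    "N": 114.04293, "P": 97.05276, "R": 156.10111, "W": 186.07931,
}
WATER = 18.01056   # added once per linear peptide
PROTON = 1.00728   # charge carrier in positive ion mode
H2 = 2.01565       # two hydrogens lost per disulfide bond formed


def peptide_mass(seq):
    """Monoisotopic mass of a fully reduced linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def mz(mass, z):
    """m/z of a species of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def intact_mass(peptides, n_disulfides):
    """Mass of a disulfide-bonded species built from reduced peptides."""
    return sum(peptide_mass(p) for p in peptides) - n_disulfides * H2


def unexplained_mass(observed_intact, identified, n_disulfides):
    """Reduced mass still missing after accounting for identified peptides."""
    return observed_intact + n_disulfides * H2 - sum(
        peptide_mass(p) for p in identified
    )


if __name__ == "__main__":
    print(f"CGK [M+H]+ = {mz(peptide_mass('CGK'), 1):.3f}")  # 307.143
    species = intact_mass(["ALCNCNR", "IIIPHMCWK", "CGK"], n_disulfides=2)
    missing = unexplained_mass(species, ["ALCNCNR", "IIIPHMCWK"], 2)
    print(f"unexplained mass = {missing:.4f} (CGK = {peptide_mass('CGK'):.4f})")
```

In practice the observed intact mass would come from deconvolution of the full MS spectrum. Here it is computed from the sequences only to show that the two MS/MS-identified peptides leave a remainder equal to the reduced mass of CGK.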
Having identified this CGK peptide, an XIC can also be generated for it (Figure 2D), as well as for other singly charged cysteine-containing species using the same approach, such as the ALCN peptide stemming from unspecific digestion between N and C (Figure 2E). Combining the information from these XICs with annotation of linear peptides without cysteine residues identified by the database search, a full annotation of the TIC chromatogram can be generated (Figure 2F). If an identified disulfide-containing species consists of only two peptides and a single disulfide bond (analysis step 6), the disulfide mapping of this species is complete (analysis step 9). However, if a disulfide-containing species is more complex than two peptides connected by a single disulfide bond, such as the disulfide-containing species at 11 minutes, the distinct cysteine connectivities should be determined from partially
reduced intermediates. The presence of partially reduced variants is confirmed from the full MS spectrum (analysis step 7), as observed in Figure 3A by the presence of the [T1+T2] and [T1+T4] species. In order to determine exact cysteine pairings, the MS/MS fragmentation of the partially reduced variants is inspected for connectivity-determining fragments (analysis step 8). The ddHCD fragmentation of the partially reduced [T1+T4] species is shown in Figure 3B. As annotated in the spectrum, successful disulfide assignment could be performed from the presence of fragments stemming from backbone cleavage between the two closely spaced cysteines of the T1 peptide, with the mass increase of the T4 peptide on the C-terminal cysteine of the T1 peptide. Using this approach, disulfide mapping of multi-chain disulfide species can be performed down to the cysteine-connectivity level, as has also been demonstrated previously.15 The advantage of the presented methodology of using XICs of LC-ISR-MS/MS data is the unbiased detection of both expected and unexpected disulfide species. Unexpected disulfide species include species from unpredictable enzyme specificities and disulfide-scrambled variants. To minimize disulfide bond scrambling during protein digestion, the solution pH was lowered to 6.8. Trypsin is generally a very specific protease that cleaves C-terminal to arginine and lysine residues.23 However, its activity has an optimum at pH 8.5; at lower pH the protease has lower activity, and we speculate that it is also likely to have lower specificity, resulting in unspecific cleavages. As is evident from the annotations in Figure 2F, both unpredicted trypsin specificity and scrambled disulfide variants were observed in the tertiapin digestion (see Supplementary Table S1 for a list of the predicted and unpredicted disulfide-containing species detected).
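The XIC matching of analysis step 3, on which this unbiased detection rests, can be sketched in a few lines of Python. The sketch assumes each XIC is available as a list of (retention time, intensity) points. The function names, the retention time tolerance, the intensity threshold and the toy traces are all hypothetical. Only the approximate co-elution times of 1.8 and 11 minutes come from the text.

```python
from itertools import combinations


def apex_times(xic, min_intensity):
    """Retention times of local maxima in an XIC [(rt, intensity), ...]."""
    apexes = []
    for i in range(1, len(xic) - 1):
        rt, inten = xic[i]
        if (inten >= min_intensity
                and inten > xic[i - 1][1]
                and inten >= xic[i + 1][1]):
            apexes.append(rt)
    return apexes


def coeluting_pairs(xics, rt_tol=0.05, min_intensity=1e5):
    """Peptide pairs whose XIC apexes coincide within rt_tol minutes.

    xics maps a peptide sequence to its trace. Each hit is returned as
    (peptide_a, peptide_b, mean apex time) and is a candidate pair of
    disulfide-bonded partners that still needs full MS inspection.
    """
    apexes = {pep: apex_times(trace, min_intensity) for pep, trace in xics.items()}
    hits = []
    for a, b in combinations(sorted(apexes), 2):
        for ta in apexes[a]:
            for tb in apexes[b]:
                if abs(ta - tb) <= rt_tol:
                    hits.append((a, b, round((ta + tb) / 2, 3)))
    return hits


# Toy traces with invented intensities, for illustration only.
toy_xics = {
    "ALCNCNR": [(1.7, 0), (1.8, 2e5), (1.9, 0), (10.9, 0), (11.0, 9e5), (11.1, 0)],
    "IIIPHMCWK": [(10.9, 0), (11.0, 8e5), (11.1, 0)],
    "MCWK": [(1.7, 0), (1.8, 3e5), (1.9, 0)],
}

if __name__ == "__main__":
    for pair in coeluting_pairs(toy_xics):
        print(pair)
```

Because a peptide that takes part in several disulfide species shows several apexes, the comparison is made per apex rather than per trace, which is what allows minor or scrambled forms to surface alongside the main species.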
Interestingly, the half-tryptic MCWK peptide, whose N-terminus stems from an unconventional cleavage C-terminal to histidine, was found to be involved in several low-abundance scrambled disulfide species. All of these scrambled disulfide species involved the presence of two Cys14 residues, and no covalent dimer formation was seen in the intact molecule (i.e. RP-UPLC analysis before digestion, data not shown), suggesting that the source of these scrambled species is the sample digestion procedure. As pH 6.8 was used in this study, this illustrates that nonreducing proteolytic digestion at reduced pH is no guarantee against disulfide scrambling during sample preparation. This has also been reported in another study using digestion at pH 6.5.20

Disulfide Mapping Using Proteases with Broad Specificity: Data analysis is often the most challenging step in disulfide mapping studies, as previously introduced. As a consequence, most disulfide mapping studies utilize proteases with the highest digestion specificities, such as trypsin and/or Lys-C, to avoid further complicating the data analysis task. However, depending on the protein sequence of interest and the location of the disulfide bonds, less specific proteases may need to be used in order to obtain disulfide-containing species useful for disulfide mapping. An example of such a class of proteins with complicated disulfide patterns is the epidermal growth factor (EGF) family. All members of the EGF family contain one or more repeats of the conserved amino acid sequence CX7CX4-5CX10-13CXCX8GXRC, where X can be any amino acid and the 6 cysteine residues form three intramolecular disulfide bonds with pairings between C1-C3, C2-C4 and C5-C6.24 The disulfide bonds form three loops which are essential for high-affinity binding to the receptors, with roles in cell growth and differentiation.25 The functional role of the disulfide bonds makes disulfide mapping a pivotal task.
The sequence and disulfide pattern of the EGF family member transforming growth factor-α (TGFα) are shown in Figure 4, top. Based on the amino acid sequence of TGFα, chymotrypsin was chosen for the nonreducing proteolytic digestion to maximize the number of enzymatic cleavages between the cysteine residues. Using the presented workflow (Figure 1A and B), the XICs of the major cysteine-containing peptides identified by the database search of chymotryptically digested TGFα are shown in Figure 4A-E. Assisted by these XICs, full annotation of the TIC chromatogram could be performed (Figure 4F), as previously described. Among the most surprising findings regarding the chymotryptic digestion specificity were a substantial amount of missed cleavage at the phenylalanine residue in the CFHGTCRF peptide and a surprisingly specific digestion C-terminal to the valine residue between the closely spaced C4 and C5 cysteines, thus producing the LVQEDKPACV
peptide (see Figure 4B and C, as well as Supplementary Table S2 for a list of predicted and unpredicted disulfide-containing species detected). These specificities resulted in an intense peptide peak with a retention time of 12 minutes. The constituting peptides involved in disulfide bonding in this species could be directly identified from the XIC traces in Figure 4A-C, and confirmed by inspection of the full MS spectrum (Figure 5A). As seen in the spectrum, all three reduced peptides were present in multiple charge states, together with their partially reduced intermediates. From selective MS/MS characterization of both partially reduced variants, mapping of distinct cysteine connectivities could be obtained (Figure 5B and C). As highlighted in the spectra, intense y5 and y6 fragments stemming from backbone dissociation between the two cysteine residues of the CFHGTCRF peptide were observed in both spectra. These y5 and y6 fragments carried the mass increase of the full-length NDCPDSHTQF peptide in the partially reduced [1+2] species (Figure 5B), whereas they were unmodified in the partially reduced [2+3] species (Figure 5C). The assigned disulfide linkages and the sequence coverages obtained are summarized as insets in the spectra.

Identification of Isobaric Differences in Disulfide-Containing Species: An interesting aspect of the presented methodology applied to the chymotryptic digestion of TGFα is the ability to distinguish isobaric disulfide-containing species. By inspection of the species eluting at 12 minutes and 13.3 minutes, identical masses of the intact disulfide-bonded species were observed (Supporting Figure S1). As revealed by the ISR of disulfide bonds observed in the full MS of the isobaric disulfide species, the only difference between the species was the location of a leucine residue after the proteolytic digestion (located N-terminally on peptide 3 vs. C-terminally on peptide 2).
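The fragment logic used above for the y5 and y6 ions of CFHGTCRF can be made concrete with a short calculation. The sketch below, in Python, computes the singly charged y-ion m/z values both unmodified and carrying the disulfide-linked NDCPDSHTQF peptide, which adds the reduced partner mass minus two hydrogens. The function names are hypothetical, standard monoisotopic masses are used, and singly charged fragments are assumed only for simplicity.

```python
# Standard monoisotopic residue masses (Da) for the residues needed here.
RESIDUE = {
    "C": 103.00919, "D": 115.02694, "F": 147.06841, "G": 57.02146,
    "H": 137.05891, "N": 114.04293, "P": 97.05276, "Q": 128.05858,
    "R": 156.10111, "S": 87.03203, "T": 101.04768,
}
WATER, PROTON, H2 = 18.01056, 1.00728, 2.01565


def peptide_mass(seq):
    """Monoisotopic mass of a fully reduced linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def y_ion_mz(seq, n, attached=None, z=1):
    """m/z of the y_n ion of seq (its n C-terminal residues).

    If `attached` is given, the fragment is assumed to still carry that
    peptide through one disulfide bond (reduced partner mass minus 2 H).
    """
    mass = sum(RESIDUE[aa] for aa in seq[-n:]) + WATER
    if attached is not None:
        mass += peptide_mass(attached) - H2
    return (mass + z * PROTON) / z


chain = "CFHGTCRF"      # y5 = GTCRF and y6 = HGTCRF hold only the second Cys
partner = "NDCPDSHTQF"

if __name__ == "__main__":
    for n in (5, 6):
        free = y_ion_mz(chain, n)
        linked = y_ion_mz(chain, n, attached=partner)
        print(f"y{n}: unmodified {free:.4f}, with partner {linked:.4f}")
```

Since y5 and y6 contain only the C-terminal cysteine of CFHGTCRF, observing them with the partner mass places the disulfide on that cysteine, whereas observing them unmodified places the remaining disulfide of that species on the N-terminal cysteine.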
Without online disulfide reduction by ISR and selective MS/MS characterization of the reduced products, elucidation of this difference would have been difficult, with the risk of ascribing one of the species to a disulfide-scrambled form due to their isobaric nature. This example of alternative digestion patterns, together with the unpredictably high digestion specificity at the valine residue between the C4 and C5 cysteines and the resulting peptides from these cleavages, nicely illustrates the challenges associated with disulfide mapping without the presented methodology. The disulfide mapping methodology was also applied to EGF, the founding member of the EGF family. Using the same approach with XICs of LC-ISR-ddHCD data, the disulfide connectivities of EGF were successfully mapped using chymotryptic digestion, demonstrating the general usefulness of the established workflow (Supporting Figure S2).

Characterization of the Unusual Disulfide Fold of the TGFβ Superfamily: To challenge the presented disulfide mapping workflow with extremely complex disulfide-bonded proteins, we investigated the unusual disulfide fold of the TGFβ superfamily. The TGFβ superfamily consists of structurally related proteins involved in cell regulation activities, such as controlling differentiation, proliferation and adult tissue homeostasis.26 Most of the family members exist as homo- or heterodimers, with each monomer containing either seven cysteine residues arranged in the conserved disulfide pattern C1-C5, C2-C6, C3-C7 and C4-C4* or nine cysteine residues introducing an additional disulfide bond in the N-terminal region (i.e. C1-C3, C2-C7, C4-C8, C5-C9 and C6-C6*). The asterisks indicate the interchain disulfide bond bridging two monomers. The amino acid sequence and disulfide connectivities of the TGFβ family member growth differentiation factor 2 (GDF2), also known as bone morphogenetic protein 9 (BMP9), are shown in Figure 6A as an example.
The special disulfide fold of the TGFβ superfamily consists of a cystine knot in each monomer, comprising the three conserved intrachain disulfide bonds, with two disulfide bonds forming a loop together with the protein backbone and the third disulfide bond passing through the loop.27,28 In Figure 6B the cystine knot of GDF2 is shown in a three-dimensional representation of monomeric GDF2. When we applied the presented disulfide mapping workflow using tryptic digestion of GDF2, we detected four major cysteine-containing peptides, which are highlighted with color-coding in Figure 6. The XICs of these cysteine-containing peptides showed co-eluting profiles with the apex for all peptides at the same retention time (Supporting Figure S3). Lack of backbone
digestion within the loop between the two pairs of closely spaced cysteines results in all peptides being connected by entangled disulfide bonds in a complex consisting of eight peptides due to the dimeric nature of GDF2 (Figure 6C).29 As is evident from the color-coded peptides in Figure 6A, several missed digestion sites were observed, including a lysine residue between the closely spaced cysteines on the orange peptide, where digestion would otherwise have opened the ring structure. The lack of digestion around the cysteine residues was probably caused by the bulkiness of the cystine knot preventing binding of the substrate to the active site of the enzyme. Identifying the constituents of an entangled disulfide-bonded complex of eight connected peptides, with a total of 8 missed cleavage sites, would have been an overwhelming analytical task without the presented methodology. Furthermore, the intact mass of the complex provides no information on its own about the disulfide connectivities. In contrast, the plethora of information obtained by the LC-ISR-MS/MS setup is illustrated by the full MS spectrum of the entangled disulfide-bonded complex from the tryptic digestion of GDF2 in Figure 7. Firstly, all four free reduced peptides were observed in the spectrum, from which selective MS/MS characterization allowed identification of the constituting peptides. The intact disulfide-bonded complex was observed with charge state 12 at 1266 m/z as the most abundant charge state, which corresponds exactly to two of each of the constituting peptides. The most intense peak in the spectrum in Figure 7, present in charge state 8 at 949 m/z, corresponded to the monomeric core consisting of one of each of the constituting peptides, thus being a partially reduced variant in which the interchain disulfide bond between the two monomeric cores is reduced.
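The relation between the two quoted peaks can be checked with simple charge-state arithmetic: a neutral mass is z × (m/z − proton mass), and the intact dimeric complex should weigh twice the monomeric core minus two hydrogens for the interchain disulfide bond. The following is a rough Python sketch with hypothetical function names. Because the m/z values in the text are quoted to the nearest integer and may refer to the most abundant rather than the monoisotopic isotope peak, the comparison is only meaningful to within roughly 15 Da.

```python
PROTON = 1.00728  # mass of the charge carrier (Da)
H2 = 2.01565      # two hydrogens lost when one disulfide bond forms


def neutral_mass(mz_value, z):
    """Neutral mass of an ion observed at mz_value with z protons."""
    return z * (mz_value - PROTON)


def dimer_from_core(core_mass):
    """Mass expected for two monomeric cores joined by one disulfide bond."""
    return 2 * core_mass - H2


complex_mass = neutral_mass(1266, 12)  # nominal m/z quoted for the complex
core_mass = neutral_mass(949, 8)       # nominal m/z quoted for the core
difference = complex_mass - dimer_from_core(core_mass)

if __name__ == "__main__":
    print(f"complex from 12+ ion: {complex_mass:.1f} Da")
    print(f"core from 8+ ion:     {core_mass:.1f} Da")
    print(f"difference:           {difference:.1f} Da")
```

With accurate monoisotopic m/z values from the raw data, the same two functions would give a much tighter check than the nominal values allow.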
The disulfide connectivities between the peptides in the monomeric core could be elucidated from the presence of partially reduced species consisting of only two disulfide-linked peptides. As annotated in the spectrum, a partially reduced species consisting of the green [1-14]-peptide and the blue [69-87]-peptide linked by one disulfide bond was observed (682 m/z, charge state 5), as well as a partially reduced species consisting of the orange [31-53]-peptide and the red [97-110]-peptide linked by two disulfide bonds. The presence of these partially reduced variants complies with the expected disulfide pattern. The selective MS/MS fragmentation of both partially reduced species was investigated, but no connectivity-determining fragments from cleavage between the closely-spaced cysteines were found (data not shown). Lacking this last piece of information, two possible cysteine connectivities remained for each of these partially reduced species: C1-C4 vs. C1-C5 in the [1-14 + 69-87]-species and C2-C6, C3-C7 vs. C2-C7, C3-C6 in the [31-53 + 97-110]-species. The latter case, with two peptides connected by two (or more) interchain disulfide bonds, is a central disulfide mapping challenge known from the hinge region of antibodies.30,31 Determination of distinct cysteine connectivities in such loop/hinge-type disulfide patterns relies either on sequence-dependent observation of MS/MS fragments from double backbone cleavage between both pairs of closely-spaced cysteines18 or on complementary purification and Edman degradation of the disulfide-containing species.29,30,32 The feasibility of Edman degradation for disulfide mapping purposes depends on the proximity of the disulfide bonds to the N-terminus together with the ability to distinguish different disulfide patterns after certain cycles, thus requiring the disulfide bonds to be arranged in parallel. Both of these parameters are dictated by the proteolytic digestion.
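The two remaining candidate connectivities for each partially reduced species can be enumerated mechanically. The following is a minimal sketch, not part of the published workflow: it assumes that only interchain bonds between the two peptides are considered and that the number of bonds is known. The function name `interchain_patterns` and the cysteine labels as strings are illustrative.

```python
from itertools import combinations, permutations


def interchain_patterns(cys_a, cys_b, n_bonds):
    """List every way to form n_bonds disulfides between two peptides.

    Assumption: each bond joins one cysteine of peptide A to one cysteine
    of peptide B (no intrachain bonds), and each cysteine is used once.
    """
    patterns = []
    for subset_a in combinations(cys_a, n_bonds):
        for ordered_b in permutations(cys_b, n_bonds):
            patterns.append(sorted(zip(subset_a, ordered_b)))
    return sorted(patterns)


# [1-14 + 69-87] species: one bond between C1 and either C4 or C5.
print(interchain_patterns(["C1"], ["C4", "C5"], 1))

# [31-53 + 97-110] species: two bonds between (C2, C3) and (C6, C7).
print(interchain_patterns(["C2", "C3"], ["C6", "C7"], 2))
```

Both calls return exactly two patterns, matching the alternatives listed in the text.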
In the case of the two possible disulfide patterns of the [31-53 + 97-110]-species (C2-C6, C3-C7 vs. C2-C7, C3-C6) from the tryptic digestion of GDF2, disulfide bonds would be detected in Edman degradation cycles 11 and 13 for both disulfide patterns, due to the location of the cysteine residues in the orange (7 and 11) and red (11 and 13) tryptic peptides (Figure 6A), thus providing no additional information. In order to obtain distinct cysteine connectivities in such cases, alternative digestion strategies should be tried, such as thermolysin digestion under protective gas.29
The same disulfide mapping approach was applied to TGFβ3, a TGFβ superfamily member containing the additional N-terminal disulfide bond in each monomer. Using tryptic digestion of TGFβ3 and LC-ISR-ddHCD data acquisition, the constituent peptides were identified from the disulfide-bonded complex consisting of 10 peptides linked by entangled disulfide bonds (Supporting Figures S4 and S5). Peptide partners directly disulfide-linked to each other were identified, confirming the expected disulfide bonds (Supporting Figure S5).
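The Edman degradation argument made above for the [31-53 + 97-110]-species of GDF2 can be reproduced with a few lines of code. This is a sketch under stated assumptions: a disulfide is taken to be detected in the cycle at which the later of its two cysteines is reached (which reproduces the cycles 11 and 13 given in the text), and the labels C2/C3 and C6/C7 are assigned to the quoted positions in sequence order. The names `CYS_POSITION` and `detection_cycles` are illustrative.

```python
# Cysteine positions within each tryptic peptide, as quoted in the text.
# Assumption: labels follow sequence order within each peptide.
CYS_POSITION = {
    "C2": 7, "C3": 11,   # orange [31-53] peptide
    "C6": 11, "C7": 13,  # red [97-110] peptide
}


def detection_cycles(pattern):
    """Edman cycles in which each disulfide of a pattern would be detected.

    Assumption: a bond is detected once both of its cysteines have been
    reached, i.e. at the larger of the two residue positions.
    """
    return sorted(max(CYS_POSITION[a], CYS_POSITION[b]) for a, b in pattern)


pattern_1 = [("C2", "C6"), ("C3", "C7")]
pattern_2 = [("C2", "C7"), ("C3", "C6")]

print(detection_cycles(pattern_1), detection_cycles(pattern_2))
```

Both patterns give the same pair of cycles, which is why Edman degradation of this tryptic species cannot discriminate between them.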

Concluding Remarks
Here we demonstrated the power of correlating XICs of peptide species analyzed by LC-ISR-MS/MS. This enabled us to unravel complex disulfide patterns from nonreduced protein digests. The strength of the method is the level of detail obtainable through unbiased detection of disulfide bonds: the starting point of the data analysis is the free cysteine-containing peptides, which are generated by online reduction of disulfide bonds by ISR and subsequently subjected to selective MS/MS characterization. The LC-ISR-ddHCD data acquisition setup combined with the introduced data analysis workflow allowed identification of both predicted and unpredicted disulfide species, including unforeseen proteolytic specificities, missed cleavage sites, scrambled disulfide variants and large disulfide-entangled complexes. Using this methodology, complete mapping of cysteine connectivities was obtained for tertiapin and the EGF family members TGFα and EGF, including intertwined disulfide bonds and closely-spaced cysteines. In addition, we were able to verify the expected disulfide linkages of the special disulfide fold of the TGFβ superfamily. To our knowledge, this is the first study to report useful information on the disulfide pattern of any TGFβ superfamily member in an all-online and streamlined MS-based workflow.
With the ability to perform streamlined and unbiased identification of disulfide bonds, the limiting factor in the presented workflow is the digestion procedure. The two main considerations for the design of the proteolytic digestion are 1) to maximize the number of cleavages between cysteines in the protein backbone, so that the disulfide-bonded species to be analyzed are as simple as possible, and 2) to reduce the risk of disulfide scrambling during the digestion. Development of efficient digestion procedures that completely avoid disulfide scrambling is a challenging task, but would be highly beneficial. Such digestion procedures, combined with the ability of the presented methodology to sensitively identify low-abundance disulfide-scrambled forms, could be used to detect low levels of alternative disulfide isoforms present in the intact protein. Finally, development of software solutions enabling easy visualization of XICs would further streamline the presented methodology by assisting quick identification of the constituents of disulfide-containing peaks in the chromatogram. With the chromatogram as the starting point, all major peaks in the chromatogram should be explainable, especially when performing in-depth protein characterization in the biopharmaceutical industry.
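As an illustration of what such software support for XIC correlation might look like, the sketch below scores co-elution of cysteine-containing peptides by the Pearson correlation of their XIC traces. This is an assumption-laden example, not the procedure used in this work, where XICs were correlated by inspection: the Pearson score, the 0.9 threshold, the synthetic Gaussian traces and all names (`pearson`, `gaussian_peak`, `coeluting_pairs`, `peptide_A` etc.) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally sampled intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def gaussian_peak(times, apex, width=0.1, height=1.0):
    """Synthetic chromatographic peak, used here only as stand-in data."""
    return [height * math.exp(-((t - apex) ** 2) / (2 * width ** 2)) for t in times]


def coeluting_pairs(xics, threshold=0.9):
    """Return peptide pairs whose XICs correlate at or above the threshold."""
    names = sorted(xics)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(xics[a], xics[b]) >= threshold
    ]


# Synthetic retention-time axis (min) and XICs; real traces would be
# extracted from the LC-ISR-MS data on a shared time grid.
times = [i * 0.02 for i in range(1000)]
xics = {
    "peptide_A": gaussian_peak(times, apex=11.0, height=5e6),
    "peptide_B": gaussian_peak(times, apex=11.0, height=2e6),
    "peptide_C": gaussian_peak(times, apex=14.0, height=3e6),
}

print(coeluting_pairs(xics))
```

Pearson correlation is insensitive to absolute intensity, which matters here because the free peptides released by ISR from one disulfide-bonded species can differ widely in signal intensity while sharing the same elution profile.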


Supporting information: Additional information, as mentioned in the text, is available free of charge via the Internet at http://pubs.acs.org.

Acknowledgements: This work, a part of the industrial PhD project for Christian N. Cramer, was supported by The Danish Agency for Science, Technology and Innovation and the Novo Nordisk STAR program. The Novo Nordisk Foundation Center for Protein Research (CPR) is supported financially by the Novo Nordisk Foundation (Grant agreement NNF14CC0001).


B) Data analysis workflow:
1. Database search
2. Generate XICs for cysteine-containing peptide hits
   (Optional)
3. Correlations of XICs to detect disulfide partners based on co-elution
   (Optional)
4. Inspection of full MS spectra at RTs of Cys-containing peptides
5. Check for low-mass cysteine-containing peptides, only present as singly charged (∆ mass approach) (present in the spectrum but not selected for MS/MS)
6. Does the disulfide species only consist of two peptides and a single disulfide bond? (Yes / No, more)
7. Confirm presence of partially reduced species
8. Manually inspect MS/MS fragmentations of partially reduced species consisting of two peptides for disulfide-determining fragments
   Fragments present: continue to step 9
   Fragments not present: try alternative digestion strategy
9. Disulfide mapping complete

Figure 1. Schematic representations of the disulfide mapping workflow. A) The data acquisition workflow by LC-ISR-MS/MS, and B) the data analysis workflow.


Figure 2. LC-ISR-MS/MS of tertiapin digested with trypsin under nonreducing conditions. In (A-C), the XICs of cysteine-containing peptides identified by a database search are shown. Additional XICs of small cysteine-containing peptides, identified by the ∆ mass approach as described in the text, are shown in (D) and (E). Vertical blue bars highlight correlation of XICs from cysteine-containing peptides involved in disulfide bonding. Assisted by these, all major peaks are annotated in the TIC chromatogram of the LC-separated tertiapin digest shown in (F). The sequence and disulfide pattern of tertiapin are shown at the top of the figure.

Figure 3. Disulfide mapping of the tertiapin species eluting at 11 min (Figure 2F) by LC-ISR-ddHCD. (A) Full MS spectrum following LC separation and partial disulfide reduction by ISR. The intact disulfide-bonded species, the constituent free peptides and partially reduced variants were all observed owing to the partial reduction by ISR. (B) ddHCD MS/MS fragmentation spectrum of the partially reduced [T1+T4]3+ species. Enlarged annotations indicate fragments originating from backbone dissociation between the two closely-spaced cysteines of the T1 peptide. These fragments allowed unambiguous assignment of the disulfide linkages, as summarized by the sequence coverage in the inset. P: Intact disulfide-bonded precursor.


Figure 4. LC-ISR-MS/MS of TGFα digested with chymotrypsin under nonreducing conditions. In (A-E), the XICs of cysteine-containing peptides identified by a database search are shown. Vertical blue bars highlight correlation of XICs from cysteine-containing peptides involved in disulfide bonding. The TIC chromatogram of the LC-separated TGFα digest is shown in (F), with all major peaks annotated. The sequence and disulfide pattern of TGFα are shown at the top of the figure.

Figure 5. Disulfide mapping of the TGFα species eluting at 12 min (Figure 4F) by LC-ISR-ddHCD. (A) Full MS spectrum following LC separation and partial disulfide reduction by ISR. The intact disulfide-bonded species, the constituent free peptides and partially reduced variants were all observed owing to the partial reduction by ISR. In (B) and (C), the ddHCD MS/MS fragmentation spectra of the partially reduced [1+2]3+ and [2+3]3+ species are shown, respectively. Enlarged annotations indicate fragments originating from backbone dissociation between the two cysteines of the 2-peptide. These fragments allowed unambiguous assignment of the disulfide linkages, as summarized by the sequence coverages in the insets. P: Intact disulfide-bonded precursor.


Figure 6. Disulfide structure of GDF2. (A) Amino acid sequence and disulfide pattern of the GDF2 dimer. (B) 3D structure of the GDF2 monomer (PDB: 5I05). The disulfide bonds of the monomeric cystine knot are highlighted as black sticks. (C) Schematic representation of the disulfide-entangled complex obtained from a tryptic digestion of GDF2. The complex consists of eight peptides connected by entangled disulfide bonds. The green, orange, blue and red color-coded peptides indicate the [1-14], [31-53], [69-87], and [97-110] peptides obtained from the tryptic digestion of GDF2, respectively.

Figure 7. LC-ISR-MS of GDF2 digested with trypsin. The full MS spectrum shows the products of the entangled disulfide complex of GDF2, following LC separation (Figure S3) and partial reduction by ISR. The free cysteine-containing peptides are annotated in color coding, with numbering according to the amino acid sequences they span. The intact disulfide-entangled complex and the partially reduced intermediates are annotated with schematic representations. Only the most abundant charge state of each species is annotated, together with charge state annotation of all major peaks in the spectrum.


References:
(1) Lakbub, J. C.; Shipman, J. T.; Desaire, H. Analytical and bioanalytical chemistry 2018, 410, 2467-2484.
(2) Wu, S. L.; Jiang, H.; Hancock, W. S.; Karger, B. L. Anal Chem 2010, 82, 5296-5303.
(3) Massonnet, P.; Upert, G.; Smargiasso, N.; Gilles, N.; Quinton, L.; De Pauw, E. Anal Chem 2015, 87, 5240-5246.
(4) Ni, W.; Lin, M.; Salinas, P.; Savickas, P.; Wu, S. L.; Karger, B. L. Journal of the American Society for Mass Spectrometry 2013, 24, 125-133.
(5) Fort, K. L.; Cramer, C. N.; Voinov, V. G.; Vasil'ev, Y. V.; Lopez, N. I.; Beckman, J. S.; Heck, A. J. R. Journal of proteome research 2018, 17, 926-933.
(6) Zubarev, R. A.; Kruger, N. A.; Fridriksson, E. K.; Lewis, M. A.; Horn, D. M.; Carpenter, B. K.; McLafferty, F. W. Journal of the American Chemical Society 1999, 121, 2857-2862.
(7) Wongkongkathep, P.; Li, H.; Zhang, X.; Loo, R. R.; Julian, R. R.; Loo, J. A. Int J Mass Spectrom 2015, 390, 137-145.
(8) Agarwal, A.; Diedrich, J. K.; Julian, R. R. Anal Chem 2011, 83, 6455-6458.
(9) Liu, H.; Lei, Q. P.; Washabaugh, M. Anal Chem 2016, 88, 5080-5087.
(10) Li, X.; Wang, F.; Xu, W.; May, K.; Richardson, D.; Liu, H. Analytical biochemistry 2013, 436, 93-100.
(11) Switzar, L.; Nicolardi, S.; Rutten, J. W.; Oberstein, S. A.; Aartsma-Rus, A.; van der Burgt, Y. E. Journal of the American Society for Mass Spectrometry 2016, 27, 50-58.
(12) Cramer, C. N.; Haselmann, K. F.; Olsen, J. V.; Nielsen, P. K. Anal Chem 2016, 88, 1585-1592.
(13) Zhang, Y.; Yuan, Z.; Dewald, H. D.; Chen, H. Chem. Commun. (Camb.) 2011, 47, 4171-4173.
(14) Cai, Y.; Zheng, Q.; Liu, Y.; Helmy, R.; Loo, J. A.; Chen, H. European journal of mass spectrometry (Chichester, England) 2015, 21, 341-351.
(15) Cramer, C. N.; Kelstrup, C. D.; Olsen, J. V.; Haselmann, K. F.; Nielsen, P. K. Anal Chem 2017, 89, 5949-5957.
(16) Stocks, B. B.; Melanson, J. E. Journal of the American Society for Mass Spectrometry 2018, 29, 742-751.
(17) Pei, J.; Hsu, C. C.; Zhang, R.; Wang, Y.; Yu, K.; Huang, G. Journal of the American Society for Mass Spectrometry 2017, 28, 2454-2461.
(18) Lakbub, J. C.; Clark, D. F.; Shah, I. S.; Zhu, Z.; Go, E. P.; Tolbert, T. J.; Desaire, H. Analytical methods : advancing methods and applications 2016, 8, 6046-6055.
(19) Clark, D. F.; Go, E. P.; Desaire, H. Anal Chem 2013, 85, 1192-1199.
(20) Lu, S.; Fan, S. B.; Yang, B.; Li, Y. X.; Meng, J. M.; Wu, L.; Li, P.; Zhang, K.; Zhang, M. J.; Fu, Y.; Luo, J.; Sun, R. X.; He, S. M.; Dong, M. Q. Nat. Methods 2015, 12, 329-331.
(21) Liu, F.; van Breukelen, B.; Heck, A. J. Molecular & cellular proteomics : MCP 2014, 13, 2776-2786.
(22) Olsen, J. V.; Macek, B.; Lange, O.; Makarov, A.; Horning, S.; Mann, M. Nature methods 2007, 4, 709-712.
(23) Olsen, J. V.; Ong, S. E.; Mann, M. Molecular & cellular proteomics : MCP 2004, 3, 608-614.
(24) Dreux, A. C.; Lamb, D. J.; Modjtahedi, H.; Ferns, G. A. Atherosclerosis 2006, 186, 38-53.
(25) Harris, R. C.; Chung, E.; Coffey, R. J. Experimental cell research 2003, 284, 2-13.
(26) Weiss, A.; Attisano, L. Wiley interdisciplinary reviews. Developmental biology 2013, 2, 47-63.
(27) Schlunegger, M. P.; Grutter, M. G. Nature 1992, 358, 430-434.
(28) Daopin, S.; Piez, K. A.; Ogawa, Y.; Davies, D. R. Science (New York, N.Y.) 1992, 257, 369-373.
(29) Trachsel, C.; Kampfer, U.; Bechtold, R.; Schaller, J.; Schurch, S. Analytical biochemistry 2009, 390, 103-108.
(30) Zhang, B.; Harder, A. G.; Connelly, H. M.; Maheu, L. L.; Cockrill, S. L. Anal Chem 2010, 82, 1090-1099.
(31) Cheng, Y.; Chen, Y.; Yu, C. Journal of pharmaceutical and biomedical analysis 2016, 129, 203-209.
(32) Zhang, W.; Marzilli, L. A.; Rouse, J. C.; Czupryn, M. J. Analytical biochemistry 2002, 311, 1-9.
