MoFi – A software tool for annotating glycoprotein mass spectra by

Apr 6, 2018 - Here, we describe a novel software tool, MoFi, which integrates hybrid MS data to assign glycans and other post-translational modificati...

MoFi – A software tool for annotating glycoprotein mass spectra by integrating hybrid data from the intact protein and glycopeptide level Wolfgang Skala, Therese Wohlschlager, Stefan Senn, Gabriel E. Huber, and Christian G. Huber Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b00019 • Publication Date (Web): 06 Apr 2018 Downloaded from http://pubs.acs.org on April 9, 2018




Analytical Chemistry

MoFi – A software tool for annotating glycoprotein mass spectra by integrating hybrid data from the intact protein and glycopeptide level

Wolfgang Skala†,‡, Therese Wohlschlager†,‡, Stefan Senn†,‡, Gabriel E. Huber†, and Christian G. Huber*,†,‡

† Department of Biosciences, Bioanalytical Research Labs, University of Salzburg, Hellbrunner Straße 34, 5020 Salzburg, Austria
‡ Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Straße 34, 5020 Salzburg, Austria

ABSTRACT: Hybrid mass spectrometry (MS) is an emerging technique for characterizing glycoproteins, which typically display pronounced microheterogeneity. Since hybrid MS combines information from different experimental levels, it crucially depends on computational methods. Here, we describe a novel software tool, MoFi, which integrates hybrid MS data to assign glycans and other post-translational modifications (PTMs) in deconvoluted mass spectra of intact proteins. Its two-stage search algorithm first assigns monosaccharide/PTM compositions to each peak and then compiles a hierarchical list of glycan combinations compatible with these compositions. Importantly, the program only includes those combinations which are supported by a glycan library as derived from glycopeptide or released glycan analysis. By applying MoFi to mass spectra of rituximab, ado-trastuzumab emtansine and recombinant human erythropoietin, we demonstrate how integration of bottom-up data may be used to refine information collected at the intact protein level. Accordingly, our software reveals that a single mass frequently can be explained by a considerable number of glycoforms. Yet, it simultaneously ranks proteoforms according to their probability, based on a score which is calculated from relative glycan abundances. Notably, glycoforms that comprise identical glycans may nevertheless differ in score if those glycans occupy different sites. Hence, MoFi exposes different layers of complexity that are present in the annotation of a glycoprotein mass spectrum.

Protein glycosylation is one of the most intricate and frequent types of post-translational modification (PTM). SwissProt release 2017_10 classifies 5.5% (30382) of all entries (556006) as experimentally verified glycoproteins. Yet, the expected number of glycoproteins is considerably higher, since the N-glycosylation sequon appears in 61.3% (340940) of all sequences1,2. Glycans affect a plethora of cellular processes, either by interacting with glycan-binding proteins or by influencing the conformation, stability and turnover of the protein to which they are attached3,4. Defects in glycosylation pathways are associated with clinically relevant disorders5,6, and alterations of glycosylation patterns occur in a range of malignancies7. Glycosylation has also attracted substantial interest from the biopharmaceutical industry: Most biologicals are produced as glycoproteins in mammalian cell lines8, and their efficacy may depend on their glycosylation state9,10. Consequently, glycosylation has emerged as a critical quality attribute of such drugs11 and may determine their approval and release12.

Currently, mass spectrometry (MS) represents the method of choice for initial characterization of glycoproteins13. MS-based strategies comprise analysis of released glycans, glycopeptides, and intact glycoproteins14,15. In the area of glycomics, single-stage or tandem MS determines the structure and abundance of glycans that have been enzymatically or chemically released from a target protein16. In the field of glycoproteomics, tandem MS ultimately yields information on glycosylation sites and glycan repertoires of glycopeptides, which are derived from glycoproteins by proteolytic digestion17,18. Intact glycoproteins are commonly analyzed under denaturing conditions, which typically arise from coupling MS to reversed-phase liquid chromatography or capillary electrophoresis19. Alternatively, native MS allows a glycoprotein to remain in its folded state in solution prior to the ionization step20, which increases its average mass-to-charge ratio. Thereby, the number of overlapping charge states decreases, resulting in an increased spatial resolution, which in turn allows resolving highly complex proteoform profiles14.

Each of these analytical levels yields unique, yet complementary details on the macro- and microheterogeneity of glycoproteins. Those terms describe structural diversity due to the presence or absence of a glycan at a specific glycosylation site, or due to the presence of different glycans at a given site, respectively21. This has prompted the development of so-called hybrid MS approaches, which combine information from several levels of MS. Hybrid MS has so far been successfully employed to detect sequence errors22 and to characterize glycoforms23 of therapeutic monoclonal antibodies (mAbs); to quantify differences in N-glycosylation of prostate-specific antigen isoforms in an interlaboratory study24; and to identify new glycosylation sites on the human complement C9 protein25 and properdin26.

ACS Paragon Plus Environment


Glycoproteins typically yield complex mass spectra, whose interpretation relies on bioinformatic support27. While there are several programs for analyzing mass spectra of released glycans and glycopeptides28,29, only one dedicated tool for combining information from different MS levels is currently available: Yang et al. simulated native mass spectra of intact glycoproteins by using PTM masses and relative abundances derived from middle-down proteomics26. Here, we describe MoFi (short for modification finder), a Python-based program that integrates intact (native) and glycopeptide or released glycan MS data via a two-stage search algorithm. The first stage assigns PTM compositions (comprising monosaccharides and other modifications) to the peaks in a deconvoluted mass spectrum, taking into account the masses of the amino acid sequence and fixed modifications. The second stage merges these intermediary annotations with data from a glycan library, i.e., a (site-specific) list of all glycans and their relative abundances. Thereby, it allows MoFi to compile a list of glycan combinations for each peak and to calculate scores in order to rank alternative annotations. We demonstrate our software by analyzing a series of increasingly complex glycoprotein mass spectra. This approach reveals that some peaks are indeed compatible with a considerable number of glycan combinations. Yet, it also demonstrates that integration of bottom-up data may refine information gained at the intact protein level, since scores derived from relative glycan abundances reveal glycoforms occurring with a higher probability.

EXPERIMENTAL SECTION

Implementation of the software. MoFi was implemented in Python v3.5 (Python Software Foundation, https://www.python.org), except for the central algorithm of the first search stage, which was implemented as a C++ extension module for performance reasons. The software imports the NumPy v1.13.1 (ref 30), pandas v0.20.3 (ref 31), and matplotlib v2.0.2 (ref 32) modules from the SciPy stack. The graphical user interface was built with the Qt 5 framework bound via PyQt5 v5.9 (Riverbank Computing, https://www.riverbankcomputing.com). MoFi is freely available as source code (File S-1) from GitHub (https://github.com/cdl-biosimilars/mofi) and as a frozen executable for Linux and Windows (http://cdlbiosimilars.sbg.ac.at).

MoFi annotates deconvoluted mass spectra of intact proteins by performing a two-stage search. The first search stage, or composition search, requires a list of experimental masses. Each experimental mass is the sum of two components: (1) a constant mass $m_\mathrm{fix}$, which can be readily calculated from the amino acid sequence of the protein and from fixed PTMs; and (2) a residual mass $m_\mathrm{res}$, which is due to unknown modifications. Consequently, $m_\mathrm{fix}$ is equal for all experimental masses in a spectrum, while $m_\mathrm{res}$ differs from peak to peak. The composition search is able to detect any chemical modification of the protein that changes its mass and thereby contributes to $m_\mathrm{res}$. For instance, addition of a single hexose moiety (C6H10O5) increases $m_\mathrm{res}$ by 162.14 Da (using average atomic masses), while formation of pyroglutamate from glutamine (N–1H–3) decreases it by 17.03 Da.

Given a residual mass $m_\mathrm{res}$, a list of $i = 1, \dots, n$ possible modifications with masses $m_1, \dots, m_n$ and allowed counts $k_i$ subject to $k_i^\mathrm{min} \le k_i \le k_i^\mathrm{max}$ ($k_i^\mathrm{min}$, $k_i^\mathrm{max}$: lower and upper limits, respectively), as well as a mass tolerance $\epsilon$, the composition search returns a list of tuples $(k_1, \dots, k_n)$. Each tuple satisfies

$$\left| m_\mathrm{res} - \sum_{i=1}^{n} k_i m_i \right| \le \epsilon$$

and thus represents a valid stage 1 annotation. For instance, the algorithm may assign the composition “seven hexoses, eight N-acetylhexosamines, two fucoses” to a residual mass of 3054.58 Da, leaving an unexplained mass of 1.78 Da.

The search problem encountered in the first search stage is essentially similar to the subset sum problem, a classical problem of theoretical computer science that is known to be NP-complete33,34. Hence, there is presumably no efficient algorithm to solve this problem, which is why MoFi performs a plain exhaustive search: For each peak, the program recursively enumerates all possible PTM compositions and accepts any composition whose mass falls within the closed interval $[m_\mathrm{res} - \epsilon;\ m_\mathrm{res} + \epsilon]$. Despite the complexity class of the stage 1 search problem, we were able to improve the performance of MoFi by two techniques: First, we implemented the composition search algorithm in C++, which decreased its runtime by approximately two orders of magnitude. Second, we limited the minimum and maximum counts for each modification manually or by employing prior biological knowledge stored in the glycan library, which further reduced the search space size. Taken together, these improvements allowed us to annotate even highly complex spectra within seconds.

The second search stage, or structure search, reduces the potentially large number of stage 1 annotations by integrating released glycan or bottom-up data represented by a glycan library. A glycan library is a set of glycans with known monosaccharide content and glycosylation site, optionally supplemented with relative abundances. Given such a library, the structure search first enumerates all possible glycan combinations. For a protein with $j = 1, \dots, s$ glycosylation sites, each of which may harbor a single glycan from a site-specific set of glycans $\Gamma_j$, the number of possible glycan combinations is $\prod_{j=1}^{s} |\Gamma_j|$. (Note that $\Gamma_j$ has to include a “null glycan” if site $j$ may exist in an unglycosylated form.)
The search algorithm then calculates the overall monosaccharide composition of each glycan combination. For instance, the combination A2G0F/A2G1F, which is characteristic of mAbs, contains seven hexoses, eight N-acetylhexosamines, and two fucoses. In order to finally obtain stage 2 annotations, the structure search performs an inner join of the stage 1 annotations with the glycan combinations, using identity of the respective monosaccharide compositions as join predicate.

MoFi further splits the stage 2 annotations of each peak into a number of groups called hits. A single hit comprises all annotations that agree in their glycan set, but differ in site occupancy. For instance, a hit with glycan set (A2G0F, A2G1F) may comprise the two assignments “A2G0F to site A, A2G1F to site B” and vice versa. Here, we call these isobaric annotations within a hit permutations.

If the glycan library provides relative abundances, MoFi computes permutation and hit scores from these values, assuming that all sites are mutually independent with respect to their glycan distribution. The permutation score is proportional to the probability of a glycan combination in a peak. Assume that MoFi found $P$ permutations for a peak, and that the $p$-th permutation contains the combination $C_p = (g_1, \dots, g_s)$ of glycans $g_1 \in \Gamma_1, \dots, g_s \in \Gamma_s$ with known abundances $a_{g_1}, \dots, a_{g_s}$. Then, the permutation score $s_p^\mathrm{perm}$ of the $p$-th permutation is

$$s_p^\mathrm{perm} = \frac{\prod_{g \in C_p} a_g}{\sum_{q=1}^{P} \prod_{g \in C_q} a_g}.$$

Notably, glycan abundances may differ between sites, i.e., $a_{g_1} \ne a_{g_2}$ for $g_1 \equiv g_2$ if $g_1 \in \Gamma_1$, $g_2 \in \Gamma_2$ and $\Gamma_1 \ne \Gamma_2$. In this case, any two permutations within a common hit will generally have different scores.

The hit score is the sum of the scores of all permutations in a hit. Assume that MoFi found $H$ hits for a peak, and that the $h$-th hit comprises a set of permutations $\Pi_h \subseteq \{1, \dots, P\}$. Then, the hit score $s_h^\mathrm{hit}$ of the $h$-th hit is

$$s_h^\mathrm{hit} = \sum_{p \in \Pi_h} s_p^\mathrm{perm}.$$
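The structure search and both scores can be illustrated by a minimal Python sketch. It is not MoFi's actual code: the function name `structure_search`, the two-site library, and all abundance values are hypothetical, and compositions are given as (Hex, HexNAc, Fuc) tuples. The sketch enumerates all site-specific glycan combinations, keeps those whose summed composition equals a stage 1 composition (the join predicate), and normalizes the abundance products as in the score definitions above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical library: site -> {glycan name: ((Hex, HexNAc, Fuc), relative abundance)}
LIBRARY = {
    "A": {"A2G0F": ((3, 4, 1), 0.40), "A2G1F": ((4, 4, 1), 0.45), "A2G2F": ((5, 4, 1), 0.15)},
    "B": {"A2G0F": ((3, 4, 1), 0.50), "A2G1F": ((4, 4, 1), 0.40), "A2G2F": ((5, 4, 1), 0.10)},
}


def structure_search(library, composition):
    """Return (permutation scores, hit scores) for one stage 1 composition."""
    sites = sorted(library)
    permutations = []
    # One glycan per site: the Cartesian product of the site-specific sets.
    for combo in product(*(library[site].items() for site in sites)):
        names = tuple(name for name, _ in combo)
        total = tuple(sum(counts) for counts in zip(*(comp for _, (comp, _) in combo)))
        if total == composition:  # join predicate: identical compositions
            weight = 1.0
            for _, (_, abundance) in combo:
                weight *= abundance  # sites are assumed to be independent
            permutations.append((names, weight))

    norm = sum(weight for _, weight in permutations)
    if norm == 0:
        return {}, {}
    perm_scores = {names: weight / norm for names, weight in permutations}

    # A hit groups all permutations that share the same (unordered) glycan set.
    hit_scores = defaultdict(float)
    for names, score in perm_scores.items():
        hit_scores[tuple(sorted(names))] += score
    return perm_scores, dict(hit_scores)


perm_scores, hit_scores = structure_search(LIBRARY, (8, 8, 2))
for names, score in perm_scores.items():
    print("/".join(names), round(score, 3))
```

With these made-up abundances, the composition (8, 8, 2) yields two hits: one with the two permutations A2G0F/A2G2F and A2G2F/A2G0F, which receive different scores because the abundances differ between sites, and one with the single permutation A2G1F/A2G1F. A site that may be unoccupied would need an explicit null glycan entry with composition (0, 0, 0).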

Both scores relate to the relative abundance of a peak: From the definition of the permutation score, it is evident that $\sum_{p=1}^{P} s_p^\mathrm{perm} = 1$. Consequently, the score discloses the fraction of the relative abundance that is explained by a permutation. For instance, assume that a given residual mass may be compatible with the glycoforms A2G0F/A2G2F and A2G1F/A2G1F, whose scores are 0.7 and 0.3, respectively. Then, the former permutation will account for 70% of the peak abundance, while the latter one will explain the remaining 30%. The same applies to the hit score: Since the family of sets of permutations $\Pi_h$ is a partition of the set of all permutations $\{1, \dots, P\}$, it follows that $\sum_{h=1}^{H} s_h^\mathrm{hit} = 1$.

Software workflow. The layout of the main window (Figure S-1) encourages a workflow that comprises six steps, which are summarized below. More information can be found in the extensive manual included in the software.

(1) Load a protein sequence. The amino acid sequence must be in FASTA format, with polypeptide chains separated by header lines (i.e., lines starting with “>”). Calculation of the protein mass depends on (a) the number of chains, since each separate chain increases the protein mass by the mass of a water molecule in addition to the added amino acid residues; (b) the number of disulfides, since each cystine decreases the protein mass by the mass of two hydrogen atoms; (c) prior treatment with PNGase F, which converts all glycosylated asparagine residues to aspartates, corresponding to a mass change of N–1H–1O+1 per N-glycosylation sequon; and (d) the atomic mass set used for calculation35. MoFi ships with three predefined sets of atomic masses: average masses as defined by IUPAC in 2013 (ref 36), average masses estimated from the isotopic abundance in organic materials35,37, and monoisotopic masses from the 2012 atomic mass evaluation38. Additional mass sets may be provided in the form of configuration files.

(2) Specify a list of modifications for the composition search. Each modification comprises a name, a molecular mass (given either explicitly in Da or by a molecular formula) and lower/upper bounds for the number of occurrences. In the case of monosaccharides, these bounds are inferred from the glycan library if available. Modifications can be entered directly in the main window or loaded from a CSV or XLS(X) file. Since the CSV file format is not standardized, MoFi allows changes in CSV parameters like the separation character prior to importing such a file (Figure S-2). The program also provides dialogs for creating modifications that correspond to the mass differences associated with a single amino acid exchange (Figure S-3) or with N- or C-terminal truncations (Figure S-4).

(3) Provide a glycan library for the structure search (optional). Each glycan comprises a name, a monosaccharide content, a single glycosylation site or multiple sites, and a relative abundance. Providing the latter value is optional, but allows MoFi to calculate hit and permutation scores and is therefore strongly recommended. The software supports arbitrary glycan abbreviations, since there is no generally accepted nomenclature for glycans. Glycans can be entered manually in the main window (table of structures) or loaded from a CSV or XLS(X) file. For convenience, MoFi is able to calculate the monosaccharide composition from abbreviations conforming to the Zhang nomenclature39,40 (e.g., “A2G0F” is converted to “3 Hex, 4 HexNAc, 1 Fuc”). Moreover, the program recognizes modifications like “N300+A2G0F” in peptide mapping results as generated by Thermo Scientific BioPharma Finder (tested up to v3.0), from which it extracts the glycan name, monosaccharide composition and glycosylation site (“N300”). The second search stage also supports N- or C-terminal truncations, which are treated as peptides that are composed of amino acid residues and may be attached to a single site. In this case, MoFi automatically includes the relevant residues as modifications in the composition search (Figure S-4).

(4) Load or enter mass values. MoFi requires either centroid masses from a deconvoluted spectrum stored in a CSV or XLS(X) file, or a single mass entered in the main window. While peak intensities allow the program to draw a proper mass spectrum, they are irrelevant for the search algorithm and thus optional.

(5) Perform the search(es). If a glycan library is available from step 3, MoFi will perform a composition search followed by a structure search. Otherwise, the program only reports stage 1 annotations. This step also allows the user to set the mass tolerance (in Da or ppm) for acceptable annotations.

(6) Evaluate and save the results. Search results are inherently hierarchical: For each peak, MoFi finds zero or more stage 1 annotations; each of these PTM compositions corresponds to zero or more stage 2 hits, and each hit comprises at least one permutation. The tables at the bottom of the main window represent search results by a tree model, thereby reflecting this hierarchy.
For reasons of clarity, results from the composition and structure search appear in separate tables, but remain linked by the nomenclature for the annotation indices.

Data sets and software tools. Deconvoluted spectra of ado-trastuzumab emtansine41 (Genentech Kadcyla, batch B0003B01, expiration date 01/2017) and recombinant human erythropoietin26 (EDQM reference standard, batch 4) have been previously published. A glycan library for the latter biopharmaceutical was provided by Vojtech Franc (Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, The Netherlands). Mass spectra of intact rituximab (Roche MabThera, batch N7025B04, expiration date 2017-02-01, purchased from a local pharmacy) and its tryptic glycopeptides were provided by Christof Regl (Department of Biosciences, University of Salzburg, Austria) and deconvoluted with Thermo Scientific™ BioPharma Finder™ software v3.0 using default settings. All MoFi input files used in this study are available as a ZIP archive (File S-2). Charts were prepared with plotnine v0.2.1 (Hassan Kibirige, https://github.com/has2k1/plotnine). All glycan structures were drawn in GlycanBuilder2 v1.0.0.0_beta (ref 42) and appear in SNFG representation43. Figure S-5 illustrates the glycan structures that appear in the libraries used. Throughout this article, we use a glycan nomenclature as introduced by Zhang39,40, augmented by an Ln portion which indicates the presence of n N-acetyllactosamine units.

Statistical analyses. Calculation of precision-recall curves is described in the Supplementary Methods. The respective Python code and all input files are available as a ZIP archive (File S-3).
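The exhaustive composition search of the first search stage, as described under Implementation of the software, can be illustrated by a minimal Python sketch. It is not MoFi's C++ implementation: the function name `composition_search`, the residue masses rounded to two decimals, and the count limits are assumptions chosen for illustration, and the early termination of the loop assumes that all modification masses are positive.

```python
# Each modification: (name, mass in Da, minimum count, maximum count).
# Masses are rounded average residue masses; the limits are illustrative only.
MODS = [
    ("Hex", 162.14, 0, 10),
    ("HexNAc", 203.19, 0, 10),
    ("Fuc", 146.14, 0, 4),
]


def composition_search(m_res, mods, tol):
    """Recursively enumerate all count tuples whose mass lies within m_res +/- tol."""
    names = [name for name, _, _, _ in mods]
    results = []

    def recurse(index, counts, mass_so_far):
        if index == len(mods):
            delta = m_res - mass_so_far  # unexplained mass
            if abs(delta) <= tol:
                results.append((dict(zip(names, counts)), delta))
            return
        _, mass, k_min, k_max = mods[index]
        for k in range(k_min, k_max + 1):
            total = mass_so_far + k * mass
            if total > m_res + tol:
                break  # valid only because all masses are positive
            recurse(index + 1, counts + [k], total)

    recurse(0, [], 0.0)
    return results


annotations = composition_search(3054.58, MODS, tol=5.0)
for counts, delta in annotations:
    print(counts, round(delta, 2))
```

Among the reported compositions is “seven hexoses, eight N-acetylhexosamines, two fucoses”, the example given in the text; the unexplained mass printed here differs slightly from the quoted 1.78 Da because of the rounded residue masses. A modification with negative mass, such as pyroglutamate formation, would require removing the early `break`.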

RESULTS AND DISCUSSION

We selected the following model proteins for demonstrating the capabilities of MoFi: Rituximab, ado-trastuzumab emtansine and human erythropoietin. An application of the program to a more complex setting, that is, for characterizing the glycoform heterogeneity of etanercept on the subunit and intact protein level, will be described elsewhere44.

N-glycans of rituximab. Rituximab was the first anti-CD20 mAb to be approved by the US Food and Drug Administration for the treatment of non-Hodgkin’s lymphomas45. The protein comprises two light and two heavy chains with 1326 residues in total. The overall molecular formula based on its amino acid sequence is C6414H9908N1696O2006S44, yielding a molecular mass of 144286.32 Da using average IUPAC masses36. N-glycosylation of each heavy chain at Asn 301 results in a marked microheterogeneity of rituximab, which is evident in a representative deconvoluted mass spectrum of the intact protein (Figure 1): Instead of a single peak at the mass of the unmodified protein, the spectrum contains a series of peaks corresponding to masses between 146619.92 and 147853.38 Da. Since the glycan combinations present on rituximab are well-characterized46,47, we chose this mAb as a test case.

In a first attempt to annotate the mass spectrum of intact rituximab, we sought to determine the PTM compositions responsible for the observed residual masses. To this end, we supplied MoFi with a list of monosaccharides expected to occur in the glycans: Hexose (Hex), representing isobaric mannose and galactose; N-acetylhexosamine (HexNAc), representing N-acetylglucosamine; N-acetylneuraminic acid (Neu5Ac); and fucose (Fuc). We also included up to two lysines as variable modifications, referring to the C-terminal lysines of the heavy chains, since these residues may still be present in some molecules48.
The N-terminal Gln residues of all four chains usually cyclize to form pyroglutamate49, which results in a loss of four ammonia molecules and corresponds to a mass decrease of 68.12 Da. Moreover, 16 intra- and intermolecular disulfide bonds reduce the mass by an additional 32.26 Da. Taking these fixed modifications into account, MoFi faced a list of residual masses between 2433.98 and 3667.44 Da. The composition search reported a large number of possible stage 1 annotations for these masses, ranging from 67 possibilities for the smallest mass up to 225 possibilities for the largest mass. Obviously, a significant number of these annotations were implausible from a biological perspective. For instance, 86 annotations of the most abundant peak (147236.59 Da) contained three or more Fuc residues, which is unlikely given that Chinese hamster ovary cells produce N-glycans with predominantly one core Fuc50,51.

Hence, we aimed at reducing the number of stage 1 annotations by restricting the types of monosaccharides and their lower and upper limits of occurrence. To this end, we provided MoFi with a glycan library which was derived from tryptic peptide mapping of rituximab and described seven glycans along with their relative abundances. By analyzing the monosaccharide compositions of all $\binom{7+2-1}{2} = 28$ possible glycan combinations inferred from the library (unordered pairs of seven glycans, with repetition), MoFi could narrow the range of allowed counts for Hex, HexNAc and Fuc, and eliminate sialic acids altogether from the composition search. Concomitantly, the number of stage 1 annotations per peak diminished to values between 0 and 4.

The glycan library also enabled MoFi to perform a structure search, which found glycan combinations for 10 out of 13 peaks and confirmed a glycosylation pattern typical of mAbs (Figure 1): The four most abundant peaks denote a series of core-fucosylated glycoforms differing by the mass of a single Hex. In this series, the glycoform with the lowest mass carries two A2G0F glycans, while the glycoform with the highest mass accommodates the glycans A2G1F and A2G2F. The lowest mass in the spectrum corresponds to the M5/M5 glycoform. Five minor peaks indicate proteoforms that comprise one or two C-terminal Lys or lack one Fuc.
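The mass bookkeeping for rituximab can be retraced with a few lines of Python. This is only a worked check of the numbers quoted in the text; the atomic masses of H and N (1.008 and 14.007) are assumed conventional average values, and all variable names are illustrative.

```python
from math import comb

# Assumed conventional average atomic masses (Da).
M_H, M_N = 1.008, 14.007
M_NH3 = M_N + 3 * M_H

m_sequence = 144286.32           # mass from the amino acid sequence (from the text)
pyro_loss = 4 * M_NH3            # four N-terminal Gln -> pyroglutamate
disulfide_loss = 16 * 2 * M_H    # 16 disulfide bonds, two H atoms each

# Constant mass shared by all peaks of the spectrum.
m_fix = m_sequence - pyro_loss - disulfide_loss

# Smallest and largest experimental mass reported for intact rituximab.
peaks = [146619.92, 147853.38]
residuals = [round(peak - m_fix, 2) for peak in peaks]

# Unordered glycan pairs (with repetition) from 7 glycans on 2 equivalent sites.
n_combinations = comb(7 + 2 - 1, 2)

print(round(pyro_loss, 2), round(disulfide_loss, 2), residuals, n_combinations)
```

Under these assumptions, the sketch reproduces the mass decreases of 68.12 and 32.26 Da, the residual mass range of 2433.98 to 3667.44 Da, and the 28 glycan combinations.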

Figure 1. Glycoforms of rituximab as reported by the structure search. Glycans of major glycoforms are shown above the peaks in SNFG representation. Empty circles and triangles indicate mass differences between adjacent peaks that correspond to the mass of a single Hex or Fuc, respectively. PTMs of three minor glycoforms (relative abundance less than 0.5%) appear below the spectrum. With default settings, MoFi annotates all peaks drawn in green. For peaks drawn in pink, the software only finds glycan combinations if the library contains additional glycans (lower left) or if the mass tolerance is increased from 5 to 7 Da (lower right). C-terminal Lys residues are indicated by CPK-colored stick representations of their side chains.

Three masses remained unannotated during the structure search. For a pair of peaks at 146862.49 and 147025.17 Da, MoFi reported the glycan combinations M5/A2G1 and M5/A2G2, respectively, after these non-fucosylated glycans were added to the library. This finding demonstrates that MoFi is susceptible to errors that occur upstream in the analytical workflow or to missing information. Accordingly, putative errors in the glycan library will propagate to the structure search, ultimately resulting in wrong or missing annotations. The third unannotated peak at 147718.21 Da probably corresponds to the A2G2F/A2G2F glycoform with a theoretical mass of 147725.19 Da. In this case, the deconvolution algorithm may have failed at calculating the correct uncharged mass, which again results in erroneous annotations in the first or second search stage.

Figure 2. Search results and statistics for Kadcyla. (a) Assignment of glycans and drug molecules. The mass spectrum of intact Kadcyla contains ten groups of peaks whose drug–antibody ratio varies from zero to nine as indicated below the spectrum. Colored peaks correspond to glycoforms with three conjugated DM1-MCC molecules. Glycans of the six most abundant glycoforms (violet, filled circles) are shown. Each glycoform may contain a single unconjugated MCC linker, giving rise to an additional series of peaks (green, open circles). MoFi also reports putative glycan combinations for most minor glycoforms, nine of which are annotated (labels A to I). (b) Semilogarithmic plot of the search space size versus residual mass. From the top to the bottom curve, an increasing number of restrictions is applied to the type and number of modifications. Arrows indicate the corresponding factors by which the cumulative search space size decreases. (c) Boxplot of the number of annotations versus mass tolerance. Each box summarizes the number of results for all peaks in the spectrum. In the case of stage 1 results, semitransparent points depict the raw data. The ends of the whiskers indicate the smallest and largest value within 1.5 interquartile range of the first and third quartile, respectively.

N-glycans and drug molecules of Kadcyla. Kadcyla (ado-trastuzumab emtansine) is an antibody-drug conjugate (ADC) that has been approved for treating metastasized HER2-positive breast cancer52. Kadcyla consists of the maytansinoid emtansine (DM1) conjugated to trastuzumab through a maleimidomethyl cyclohexane-1-carboxylate (MCC) linker53. A side reaction during the manufacturing process generates antibodies with bound MCC linkers that lack the DM1 “warhead”. These unconjugated linkers result in crosslinking of two Lys residues at the surface of the antibody54.

We employed MoFi to annotate a deconvoluted mass spectrum of intact Kadcyla41. Since both heavy chains of trastuzumab are N-glycosylated at Asn 300, we obtained a library from glycopeptide analysis comprising 15 entries (14 N-glycans and the unglycosylated site) for the structure search. Apart from the monosaccharides deduced from the glycan library (i.e., Hex, HexNAc, Neu5Ac, and Fuc), the composition search included DM1-MCC (C47H61ClN4O13S, average mass 957.53 Da) and MCC (C12H15O3N, average mass 221.26 Da) as variable modifications with maximum occurrences of 10 and 1, respectively. With these settings, MoFi proposed glycan combinations for all major and most of the low-abundant peaks in the spectrum, leaving only 9 out of 158 peaks unannotated. The Highlight delta series functionality of MoFi further aided us in interpreting initial search results; this feature highlights up to two delta series (i.e., series of peaks with a user-defined mass difference) simultaneously (Figure S-6).

Comprehensive analysis of the annotations and delta series revealed that the mass spectrum of intact Kadcyla results from a superposition of three components55,56 (Figure 2a): (a) N-glycosylation and its associated peak pattern (see the discussion on rituximab above); (b) zero to nine DM1-MCC molecules, whose number is known to follow a Poisson distribution57; and (c) the presence of at most one unconjugated (“dead”) linker.
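The average masses quoted for DM1-MCC and MCC follow directly from their molecular formulas. The following Python sketch shows one way to perform this calculation; the helper `formula_mass` and the table of conventional average atomic masses are assumptions of this example and do not represent MoFi's own formula parser or mass sets.

```python
import re

# Assumed conventional average atomic masses (Da).
AVERAGE_MASS = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def formula_mass(formula):
    """Sum the atomic masses of a simple formula such as 'C12H15O3N'."""
    mass = 0.0
    # An element symbol (capital letter, optional lowercase letter), then an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += AVERAGE_MASS[element] * (int(count) if count else 1)
    return mass


for name, formula in [("DM1-MCC", "C47H61ClN4O13S"),
                      ("MCC", "C12H15O3N"),
                      ("Hex", "C6H10O5")]:
    print(name, round(formula_mass(formula), 2))
```

This simple parser handles neither negative counts (as in N–1H–3 for pyroglutamate formation) nor parentheses; both would need additional logic.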


Figure 3. Search statistics and hierarchy of stage 2 annotations for rhEPO. (a, b) Variation of search space size and number of annotations with residual mass. Residual masses to the right of the vertical solid line derive from the mass spectrum of rhEPO, which is shown at the top of chart (a). Residual masses to the left of the line (0 to 7850 Da) were added manually in 50 Da steps. They correspond to simulated peaks whose masses are lower than the ones actually observed, but still larger than the mass of the rhEPO peptide chain. (c) Hierarchical search results for the most abundant peak in the spectrum (index 111, 29888.02 Da). Considering all glycan combinations that can be inferred from the library, only a single stage 1 annotation (111-13) is able to explain the residual mass associated with this peak (11652.05 Da). Three stage 2 hits (111-13-0 to 111-13-2) and ten permutations (111-13-0-0 to 111-13-2-0) are compatible with this PTM composition. Three-letter strings depict all possible assignments of the N-glycans (A to D) to the three glycosylation sites. For example, “CDA” indicates a glycoform with glycan C at Asn 24, glycan D at Asn 38, and glycan A at Asn 83. Each arrow has a line width proportional to the respective hit or permutation score.

Due to its complexity, the mass spectrum of Kadcyla is well-suited to demonstrate how the combinatorial search may benefit from prior knowledge. A “naïve” composition search includes at least seven modifications (DM1, MCC, Hex, HexNAc, Neu5Ac, N-glycolylneuraminic acid (Neu5Gc), and Fuc), whose lower and upper counts are not restricted. Consequently, the search space grows almost exponentially with the residual mass (Figure 2b), and its cumulative size is above 3 × 10⁸ (Table S-1). (The cumulative search space size is the total number of compositions MoFi has to check during the first search stage.) However, the following restrictions drastically decrease the cumulative size by a total factor of 454.9: (1) Limit the maximum number of DM1 and MCC moieties to 10 and 1, respectively; (2) extract the set of monosaccharides from the glycan library; and (3) calculate feasible lower and upper bounds for their counts. The overall improvement of the search space is especially pronounced in the case of large residual masses: While prior knowledge decreases the search space size merely by a factor of 9.7 for the smallest mass, the largest mass benefits from a reduction factor of 1351.8.
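To make the effect of restriction (1) concrete, the following sketch bounds the number of candidate compositions by the product of the allowed count ranges and compares an unrestricted search with one that caps the drug and linker counts. It is a rough upper-bound estimate under stated assumptions, not MoFi's counting procedure, and it will not reproduce the factors in Table S-1: the function names are hypothetical, the naive upper count is taken as floor(residual mass / modification mass), the monosaccharide masses are standard average values, and restrictions (2) and (3) are ignored.

```python
from math import floor, prod

# Average masses in Da. DM1-MCC and MCC are quoted in the text; the
# monosaccharide residue masses are standard average values (assumed).
MASSES = {
    "DM1-MCC": 957.53, "MCC": 221.26, "Hex": 162.1406, "HexNAc": 203.1925,
    "Neu5Ac": 291.2546, "Neu5Gc": 307.2540, "Fuc": 146.1412,
}
# Restriction (1) from the text: at most 10 drug-linker and 1 free linker moieties.
CAPS = {"DM1-MCC": 10, "MCC": 1}


def naive_max_counts(residual_mass):
    """Without prior knowledge, a count is limited only by the residual mass."""
    return {name: floor(residual_mass / m) for name, m in MASSES.items()}


def size_bound(max_counts):
    """Upper bound on the number of compositions: product of (max + 1), with min = 0."""
    return prod(n + 1 for n in max_counts.values())


def reduction_factor(residual_mass):
    """Ratio of the naive bound to the bound after capping DM1-MCC and MCC."""
    naive = naive_max_counts(residual_mass)
    capped = {name: min(n, CAPS.get(name, n)) for name, n in naive.items()}
    return size_bound(naive) / size_bound(capped)


for mass in (1000.0, 5000.0, 10000.0):
    print(mass, reduction_factor(mass))
```

Even this crude bound shows the trend reported above: the benefit of the caps increases with the residual mass, because the naive upper counts keep growing while the capped ones do not.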

To a lesser extent, search space size depends on the mass tolerance, at least if an increase of this parameter allows MoFi to explore additional PTM compositions. Importantly, high values of the mass tolerance yield a large number of stage 1 and 2 annotations (Figure 2c). Hence, an appropriate mass tolerance is critical: Values too high will lead to many incorrect annotations, while annotations may be missed if the mass tolerance setting is too strict. This is also evident from a precision-recall (PR) analysis58, which was performed for a series of simulated zero charge mass spectra of Kadcyla differing in mass error. For our program, the PR analysis highlights the influence of mass tolerance on the overall quality of the search results in terms of precision (i.e., how many of the found annotations are indeed correct?) and recall (i.e., how many of the correct annotations are found?). On each PR curve (Figure S-7), the point where the mass tolerance equals the mass error generally denotes high precision combined with high recall. This observation confirms that the mass tolerance used by MoFi should match the mass error present in the spectrum. Notably, curves recorded for low mass errors run close to the top right corner of the respective plots, which implies that high analytical accuracy enables annotations that are both sensitive and precise.

N- and O-glycans of human erythropoietin. Erythropoietin is a hormone that is mainly synthesized in peritubular fibroblasts of the renal cortex and stimulates the proliferation of erythrocyte progenitors via JAK/STAT signaling59. Since 1989, recombinant human erythropoietin (rhEPO) has been available for the treatment of anemia induced by chronic renal failure60. Due to its infamous use as a blood doping agent in sports, rhEPO may be regarded as one of the best-known biopharmaceuticals61. The protein comprises 165 amino acids and two disulfide bridges62, resulting in a molecular mass of 18235.97 Da for the peptide chain using average IUPAC masses. However, N-glycosylation at three sites (Asn 24, 38 and 83)63 as well as O-glycosylation at Ser 12664 typically increases the mass of rhEPO to values between 26 and 33 kDa26. Although glycosylation decreases the affinity of rhEPO for its receptor, it concomitantly increases the biological half-life of the hormone and therefore is a prerequisite for its in vivo efficacy64,65. Compared to the previous examples, the native mass spectrum of intact rhEPO represents an even more complex test case for MoFi: First, the residual masses are relatively large, i.e., between 7851.76 and 14617.90 Da. Second, the protein accommodates both N- and O-glycans. Unlike the glycosylation sites of mAb heavy chains, which mostly harbor identical sets of glycans due to symmetry, each N-glycosylation site of rhEPO is characterized by a different glycan repertoire. Third, all glycans are sialylated, and possible sialic acids comprise Neu5Ac, Neu5Gc, and their O-acetylated derivatives66. For these reasons, we expected an escalation of the search space size for large residual masses. Indeed, the search space grows exponentially with the residual mass at low values. However, its size stagnates for large residual masses, yielding a sigmoid curve (Figure 3a).
This counterintuitive mode of growth results from the limits $n_i^{\min}$ and $n_i^{\max}$ imposed on the number of each modification $i$. For $k$ modifications, these minimum and maximum counts limit the search space size to $\prod_{i=1}^{k} (n_i^{\max} - n_i^{\min} + 1)$. The variation of the number of stage 1 results with residual mass follows a bell-shaped curve, which reaches its maximum at approximately half the maximum possible residual mass (Figure 3b). (Here, the maximum possible residual mass is 15289.76 Da, which corresponds to three A4S4L3G4F glycans, one doubly-sialylated core 1 structure, one O-acetylation and one replacement of Neu5Ac by Neu5Gc.) The observed curve shape results from PTM counts whose range is markedly confined at both the lower and the upper limit of residual masses: Small residual masses are only compatible with compositions that contain few modifications, while large residual masses require tighter lower bounds on the number of modifications. The number of stage 2 annotations depends in a less obvious way on the residual mass. Apparently, only particular masses between 11 and 13 kDa permit a large number of different glycan assignments. However, this mass range also comprises peaks with unique glycan combinations and even unannotated peaks. If supplied with a deconvoluted mass spectrum of rhEPO and a glycan library containing information on both N- and O-glycans, MoFi reproduces the monosaccharide annotations that have been published previously (Supplementary table 1 in Yang et al.26). Notably, MoFi finds additional annotations for the masses with index 96, 97, 127, 128, 144, 145, 162, and 163. However, these alternatives are characterized by large mass deviations and/or low hit scores, which renders them rather unlikely. Since each N-glycosylation site accommodates a different set of glycans, each permutation within a common hit has a unique score. For instance, the first search stage yields 115 valid PTM compositions for the most abundant peak (29888.02 Da). The second search stage, however, only retains a single composition, which gives rise to three hits with distinct scores (Figure 3c). In turn, these hits comprise one, three, or six different permutations that drastically differ in their score, which assumes values between 0.32 and 28.34%. Taken together, high permutation scores identify those isobaric glycoforms that contribute the bulk intensity to a peak.
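The counts of one, three, or six permutations per hit follow from the number of distinct ways three glycans can be assigned to three sites. The sketch below enumerates these assignments and scores each one as the product of per-site relative abundances, treating the abundances as independent probabilities. The function names and the abundance table are hypothetical, and the plain product is an assumption for illustration; MoFi's actual score normalization may differ.

```python
from itertools import permutations
from math import prod


def distinct_permutations(glycans):
    """Distinct assignments of the given glycans to ordered sites (e.g., Asn 24, 38, 83)."""
    return sorted(set(permutations(glycans)))


def permutation_score(assignment, site_abundances):
    """Product of per-site relative abundances; a glycan absent from a site scores 0."""
    return prod(site_abundances[i].get(g, 0.0) for i, g in enumerate(assignment))


# Hypothetical relative abundances of glycans A to D at three N-glycosylation sites.
SITE_ABUNDANCES = [
    {"A": 0.10, "B": 0.20, "C": 0.60, "D": 0.10},
    {"A": 0.05, "B": 0.15, "C": 0.30, "D": 0.50},
    {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10},
]

# Three identical glycans, two identical glycans, and three different glycans.
for hit in ("CCC", "CCD", "CDA"):
    for perm in distinct_permutations(hit):
        print(hit, "".join(perm), permutation_score(perm, SITE_ABUNDANCES))
```

Because the sites carry different abundance profiles in this table, permutations of the same hit receive different scores, which mirrors the behavior described for rhEPO.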

CONCLUSIONS

Hybrid MS approaches for characterizing glycoproteins demand innovative bioinformatics tools that combine heterogeneous data collected in different types of experiments14. In line with these requirements, the search algorithms implemented in MoFi represent a novel way of integrating MS data from the intact protein and glycopeptide level. By adopting a purely combinatorial approach, MoFi first determines all monosaccharide/PTM compositions that are consistent with the residual masses derived from a deconvoluted spectrum of an intact glycoprotein. The program then combines these primary annotations with a site-specific glycan library, which is commonly generated by peptide mapping. Alternatively, information on the overall glycan identity and abundance may be entered into MoFi upon released glycan analysis. However, the latter method does not reveal differences between the glycan repertoire of multiple glycosylation sites. For each peak, MoFi eventually reports all glycan combinations whose monosaccharide compositions are supported by stage 1 annotations, augmented with scores proportional to their probability, which in turn is based on glycopeptide abundances. The program also structures these search results hierarchically. Thereby, it facilitates their analysis at different degrees of complexity, while simultaneously disclosing all proteoforms that are compatible with a given residual mass. Importantly, highly ambiguous annotations indicate that a target protein is too complex for direct analysis at the intact protein level. Consequently, such findings may encourage reducing sample complexity, e.g., by cleaving multidomain proteins into subunits or by employing a panel of glycosidases to decrease glycan heterogeneity44.
Our analyses of therapeutic proteins clearly demonstrate that certain information must be provided to MoFi a priori (i.e., protein mass, fixed modifications, choice of putative PTMs), while other restrictions are optional (i.e., upper/lower bounds for PTM counts, glycan identity and site specificity; see Figure S-8 for an overall scheme). Yet, the latter constraints reduce the search space size and, concomitantly, the number of alternative annotations, which renders them highly useful. If the presence of an unknown modification is suspected, an extensive list of allowed modifications may nevertheless be used. Moreover, glycan abundances may further improve the significance of the annotations: If interpreted as independent probabilities, they allow hit and permutation scores to be calculated, which in turn may help to identify the most abundant glycoform for each peak. In this respect, it is essential to ensure that upstream data are correct (i.e., glycan libraries and deconvoluted masses), since any errors will propagate, ultimately resulting in wrong annotations at the intact protein level. In order to verify the applied glycan library, one may therefore consider cross-validation with data obtained from an orthogonal approach, such as released glycan analysis. In addition, deconvolution of complex mass spectra should be manually verified by comparison of the deconvoluted and the raw mass spectrum. In summary, we consider MoFi a tool both for characterizing novel glycoproteins and for routinely checking protein samples for changes in their proteoform profiles. Providing a solution to the challenges of intact protein MS and glycan analysis, our software is applicable to both top-down proteomics and biopharma characterization. In this context, the intuitive user interface and unbiased reporting capabilities of MoFi may facilitate implementation of intact protein characterization in a GMP-regulated environment, which is especially attractive for the biopharmaceutical industry.

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website.
Precision-recall analysis (Supplementary Methods), the main window of MoFi (Figure S-1), the CSV/XLS(X) import dialog (Figure S-2), the create point mutation dialog (Figure S-3), the create terminal truncation dialog (Figure S-4), glycan structures and abundances for therapeutic proteins (Figure S-5), the highlight delta series feature (Figure S-6), precision-recall analysis (Figure S-7), influence of a priori information on search space size (Figure S-8), and search statistics for Kadcyla (Table S-1) (PDF)
MoFi source code (File S-1) (ZIP)
MoFi input files and search results for therapeutic proteins (File S-2) (ZIP)
Python code and input files for precision-recall analyses (File S-3) (ZIP)

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]. Phone: (+)43 662 8044 5738. Fax: (+)43 662 8044 5751.

ORCID
Wolfgang Skala: 0000-0002-7350-4045
Therese Wohlschlager: 0000-0001-9359-6744
Christian G. Huber: 0000-0001-8358-1880

Notes
The authors declare the following competing financial interests: Novartis BTDM, Sandoz GmbH as well as Thermo Fisher Scientific provide financial support for the Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization. The salaries of Wolfgang Skala and Therese Wohlschlager are fully funded; Christian G. Huber’s salary is partly funded by the Christian Doppler Laboratory for Biosimilar Characterization. The authors declare no other competing financial interest.

ACKNOWLEDGMENT

The financial support by the Austrian Federal Ministry of Science, Research, and Economy and by a Start-up Grant of the State of Salzburg is gratefully acknowledged. We thank Vojtech Franc and Albert Heck (University of Utrecht), Aaron Bailey (Thermo Fisher Scientific) and Christof Regl (University of Salzburg) for sharing data; Kai Scheffler, Jennifer Sutton, and Stephane Houel (all from Thermo Fisher Scientific) for scientific discussions; and Urs Lohrig (Sandoz) for critical reading of the manuscript.

REFERENCES

(1) Apweiler, R.; Hermjakob, H.; Sharon, N. Biochim. Biophys. Acta 1999, 1473, 4-8.
(2) Khoury, G. A.; Baliban, R. C.; Floudas, C. A. Sci. Rep. 2011, 1, 90.
(3) Cummings, R. D.; Pierce, J. M. Chem. Biol. 2014, 21, 1-15.
(4) Ohtsubo, K.; Marth, J. D. Cell 2006, 126, 855-867.
(5) Freeze, H. H.; Chong, J. X.; Bamshad, M. J.; Ng, B. G. Am. J. Hum. Genet. 2014, 94, 161-175.
(6) Freeze, H. H. J. Biol. Chem. 2013, 288, 6936-6945.
(7) Stowell, S. R.; Ju, T.; Cummings, R. D. Annu. Rev. Pathol. 2015, 10, 473-510.
(8) Walsh, G. Nat. Biotechnol. 2014, 32, 992-1000.
(9) Higel, F.; Seidl, A.; Sorgel, F.; Friess, W. Eur. J. Pharm. Biopharm. 2016, 100, 94-100.
(10) Jefferis, R. Nat. Rev. Drug Discov. 2009, 8, 226-234.
(11) Rogers, R. S.; Nightlinger, N. S.; Livingston, B.; Campbell, P.; Bailey, R.; Balland, A. mAbs 2015, 7, 881-890.
(12) Schiestl, M.; Stangler, T.; Torella, C.; Cepeljnik, T.; Toll, H.; Grau, R. Nat. Biotechnol. 2011, 29, 310-312.
(13) Alley, W. R., Jr.; Mann, B. F.; Novotny, M. V. Chem. Rev. 2013, 113, 2668-2732.
(14) Yang, Y.; Franc, V.; Heck, A. J. R. Trends Biotechnol. 2017, 35, 598-609.
(15) Zhang, L.; Luo, S.; Zhang, B. mAbs 2016, 8, 205-215.
(16) Leymarie, N.; Zaia, J. Anal. Chem. 2012, 84, 3040-3048.
(17) Cao, L.; Qu, Y.; Zhang, Z.; Wang, Z.; Prytkova, I.; Wu, S. Expert Rev. Proteomics 2016, 13, 513-522.
(18) Zhu, Z.; Desaire, H. Annu. Rev. Anal. Chem. (Palo Alto Calif.) 2015, 8, 463-483.
(19) Staub, A.; Guillarme, D.; Schappler, J.; Veuthey, J. L.; Rudaz, S. J. Pharm. Biomed. Anal. 2011, 55, 810-822.
(20) Leney, A. C.; Heck, A. J. J. Am. Soc. Mass Spectrom. 2017, 28, 5-13.
(21) Zacchi, L. F.; Schulz, B. L. Glycoconj. J. 2016, 33, 359-376.
(22) Ayoub, D.; Jabs, W.; Resemann, A.; Evers, W.; Evans, C.; Main, L.; Baessmann, C.; Wagner-Rousset, E.; Suckau, D.; Beck, A. mAbs 2013, 5, 699-710.
(23) Yang, Y.; Wang, G.; Song, T.; Lebrilla, C. B.; Heck, A. J. R. mAbs 2017, 9, 638-645.
(24) Leymarie, N.; Griffin, P. J.; Jonscher, K.; Kolarich, D.; Orlando, R.; McComb, M.; Zaia, J.; Aguilan, J.; Alley, W. R.; Altmann, F.; Ball, L. E.; Basumallick, L.; Bazemore-Walker, C. R.; Behnken, H.; Blank, M. A.; Brown, K. J.; Bunz, S. C.; Cairo, C. W.; Cipollo, J. F.; Daneshfar, R., et al. Mol. Cell. Proteomics 2013, 12, 2935-2951.
(25) Franc, V.; Yang, Y.; Heck, A. J. Anal. Chem. 2017, 89, 3483-3491.
(26) Yang, Y.; Liu, F.; Franc, V.; Halim, L. A.; Schellekens, H.; Heck, A. J. Nat. Commun. 2016, 7, 13397.
(27) Lazar, I. M.; Deng, J.; Ikenishi, F.; Lazar, A. C. Electrophoresis 2015, 36, 225-237.
(28) Tsai, P. L.; Chen, S. F. Mass Spectrom. (Tokyo) 2017, 6, S0064.
(29) Woodin, C. L.; Maxon, M.; Desaire, H. Analyst 2013, 138, 2793-2803.
(30) van der Walt, S.; Colbert, S. C.; Varoquaux, G. Comput. Sci. Eng. 2011, 13, 22-30.
(31) McKinney, W. In Proceedings of the 9th Python in Science Conference, van der Walt, S.; Millman, J., Eds., 2010, pp 51-56.
(32) Hunter, J. D. Comput. Sci. Eng. 2007, 9, 90-95.
(33) Garey, M. R.; Johnson, D. S. Computers and intractability; Freeman: New York, 1979, p 338.
(34) Karp, R. M. In Complexity of Computer Computations, Miller, R. E.; Thatcher, J. W.; Bohlinger, J. D., Eds.; Springer US: Boston, MA, 1972, pp 85-103.
(35) Zhang, Z.; Pan, H.; Chen, X. Mass Spectrom. Rev. 2009, 28, 147-176.
(36) Meija, J.; Coplen, T. B.; Berglund, M.; Brand, W. A.; Bièvre, P.; Gröning, M.; Holden, N. E.; Irrgeher, J.; Loss, R. D.; Walczyk, T.; Prohaska, T. Pure Appl. Chem. 2016, 88, 201.
(37) Coplen, T. B.; Böhlke, J. K.; De Bièvre, P.; Ding, T.; Holden, N. E.; Hopple, J. A.; Krouse, H. R.; Lamberty, A.; Peiser, H. S.; Revesz, K.; Rieder, S. E.; Rosman, K. J. R.; Roth, E.; Taylor, P. D. P.; Vocke, R. D.; Xiao, Y. K. Pure Appl. Chem. 2002, 74, 1987.
(38) Audi, G.; Wang, M.; Wapstra, A. H.; Kondev, F. G.; MacCormick, M.; Xu, X.; Pfeiffer, B. Chin. Phys. C 2012, 36, 1287-1602.
(39) Zhang, Z. Anal. Chem. 2009, 81, 8354-8364.
(40) Shah, B.; Jiang, X. G.; Chen, L.; Zhang, Z. J. Am. Soc. Mass Spectrom. 2014, 25, 999-1011.
(41) Bailey, A. O.; Houel, S.; Scheffler, K.; Damoc, E.; Sutton, J.; Josephs, J. L., Complete characterization of a lysine-linked antibody drug conjugate by native LC/MS intact mass analysis and peptide mapping; Thermo Fisher Application note 72511; 2017.
(42) Tsuchiya, S.; Aoki, N. P.; Shinmachi, D.; Matsubara, M.; Yamada, I.; Aoki-Kinoshita, K. F.; Narimatsu, H. Carbohydr. Res. 2017, 445, 104-116.
(43) Varki, A.; Cummings, R. D.; Aebi, M.; Packer, N. H.; Seeberger, P. H.; Esko, J. D.; Stanley, P.; Hart, G.; Darvill, A.; Kinoshita, T.; Prestegard, J. J.; Schnaar, R. L.; Freeze, H. H.; Marth, J. D.; Bertozzi, C. R.; Etzler, M. E.; Frank, M.; Vliegenthart, J. F.; Lütteke, T.; Perez, S., et al. Glycobiology 2015, 25, 1323-1324.
(44) Wohlschlager, T.; Scheffler, K.; Forstenlehner, I. C.; Senn, S.; Damoc, E.; Holzmann, J.; Huber, C. G.: submitted for publication, 2017.
(45) Leget, G. A.; Czuczman, M. S. Curr. Opin. Oncol. 1998, 10, 548-551.
(46) Samonig, M.; Huber, C. G.; Scheffler, K., LC/MS Analysis of the Monoclonal Antibody Rituximab Using the Q Exactive Benchtop Orbitrap Mass Spectrometer; Thermo Fisher Application Note 591; 2013.
(47) Visser, J.; Feuerstein, I.; Stangler, T.; Schmiederer, T.; Fritsch, C.; Schiestl, M. Biodrugs 2013, 27, 495-507.
(48) Harris, R. J. J. Chromatogr. A 1995, 705, 129-134.
(49) Liu, H.; Ponniah, G.; Zhang, H.-M.; Nowak, C.; Neill, A.; Gonzalez-Lopez, N.; Patel, R.; Cheng, G.; Kita, A. Z.; Andrien, B. mAbs 2014, 6, 1145-1154.
(50) Reusch, D.; Haberger, M.; Falck, D.; Peter, B.; Maier, B.; Gassner, J.; Hook, M.; Wagner, K.; Bonnington, L.; Bulau, P.; Wuhrer, M. mAbs 2015, 7, 732-742.
(51) Reusch, D.; Haberger, M.; Maier, B.; Maier, M.; Kloseck, R.; Zimmermann, B.; Hook, M.; Szabo, Z.; Tep, S.; Wegstein, J.; Alt, N.; Bulau, P.; Wuhrer, M. mAbs 2015, 7, 167-179.
(52) Verma, S.; Miles, D.; Gianni, L.; Krop, I. E.; Welslau, M.; Baselga, J.; Pegram, M.; Oh, D.-Y.; Diéras, V.; Guardino, E.; Fang, L.; Lu, M. W.; Olsen, S.; Blackwell, K. N. Engl. J. Med. 2012, 367, 1783-1791.
(53) Lewis Phillips, G. D.; Li, G.; Dugger, D. L.; Crocker, L. M.; Parsons, K. L.; Mai, E.; Blättler, W. A.; Lambert, J. M.; Chari, R. V. J.; Lutz, R. J.; Wong, W. L. T.; Jacobson, F. S.; Koeppen, H.; Schwall, R. H.; Kenkare-Mitra, S. R.; Spencer, S. D.; Sliwkowski, M. X. Cancer Res. 2008, 68, 9280-9290.
(54) Chen, Y.; Kim, M. T.; Zheng, L.; Deperalta, G.; Jacobson, F. Bioconjug. Chem. 2016, 27, 2037-2047.
(55) Marcoux, J.; Champion, T.; Colas, O.; Wagner-Rousset, E.; Corvaïa, N.; van Dorsselaer, A.; Beck, A.; Cianférani, S. Protein Sci. 2015, 24, 1210-1223.
(56) Chen, L.; Wang, L.; Shion, H.; Yu, C.; Yu, Y. Q.; Zhu, L.; Li, M.; Chen, W.; Gao, K. mAbs 2016, 8, 1210-1223.
(57) Kim, M. T.; Chen, Y.; Marhoul, J.; Jacobson, F. Bioconjug. Chem. 2014, 25, 1223-1232.
(58) Raghavan, V.; Bollmann, P.; Jung, G. S. ACM Trans. Inf. Syst. 1989, 7, 205-229.
(59) Jelkmann, W. Transfus. Med. Hemother. 2013, 40, 302-309.
(60) Ridley, D. M.; Dawkins, F.; Perlin, E. J. Natl. Med. Assoc. 1994, 86, 129-135.
(61) Robinson, N.; Giraud, S.; Saudan, C.; Baume, N.; Avois, L.; Mangin, P.; Saugy, M. Br. J. Sports Med. 2006, 40 Suppl 1, i30-34.
(62) Lai, P. H.; Everett, R.; Wang, F. F.; Arakawa, T.; Goldwasser, E. J. Biol. Chem. 1986, 261, 3116-3121.
(63) Tsuda, E.; Goto, M.; Murakami, A.; Akai, K.; Ueda, M.; Kawanishi, G.; Takahashi, N.; Sasaki, R.; Chiba, H.; Ishihara, H.; et al. Biochemistry 1988, 27, 5646-5654.
(64) Tsuda, E.; Kawanishi, G.; Ueda, M.; Masuda, S.; Sasaki, R. Eur. J. Biochem. 1990, 188, 405-411.
(65) Darling, R. J.; Kuchibhotla, U.; Glaesner, W.; Micanovic, R.; Witcher, D. R.; Beals, J. M. Biochemistry 2002, 41, 14524-14531.
(66) Hokke, C. H.; Bergwerff, A. A.; Van Dedem, G. W.; Kamerling, J. P.; Vliegenthart, J. F. Eur. J. Biochem. 1995, 228, 981-1008.
