Rapid Identification of Disulfide Bonds and Cysteine-Related Variants


Article

Rapid Identification of Disulfide Bonds and Cysteine-Related Variants in an IgG1 Knob-into-Hole Bispecific Antibody Enhanced by Machine Learning
Jordan J Baker, Dana McDaniel, David Cain, Paula Lee Tao, Charlene Li, Yuting Huang, Hongbin Liu, Judith Zhu-Shimoni, and Milady R. Ninonuevo
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04071 • Publication Date (Web): 01 Dec 2018
Downloaded from http://pubs.acs.org on December 2, 2018


Analytical Chemistry


Rapid Identification of Disulfide Bonds and Cysteine-Related Variants in an IgG1 Knob-into-Hole Bispecific Antibody Enhanced by Machine Learning

Jordan J. Baker1,2, Dana McDaniel1, David Cain1, Paula Lee Tao1, Charlene Li1, Yuting Huang1,3, Hongbin Liu1,4, Judith Zhu-Shimoni1, Milady Niñonuevo1*

1 Genentech, 1 DNA Way, South San Francisco, CA 94080

Present Addresses:

2 (J.J.B.) Department of Bioengineering, University of California, Berkeley, CA 94720

3 (Y.H.) Pfizer, Inc., 700 Chesterfield Parkway West, Mailstop AA4E, Chesterfield, MO 63017

4 (H.L.) Nektar Therapeutics, 455 Mission Bay Boulevard South, San Francisco, CA 94158

*Corresponding author email: [email protected]; phone: +1 (650) 225-6670

Abstract

Bispecific antibodies are regarded as the next generation of therapeutic modalities because they can simultaneously bind multiple targets, increasing the efficacy of treatments for several diseases and opening up previously unattainable treatment designs. Linking two half antibodies to form the knob-into-hole bispecific requires an additional in vitro assembly step, starting with reduction of the antibodies followed by re-oxidation. Analysis of the disulfide bonds (DSBs) is vital to ensure the correct assembly, stability, and higher-order structures of these important biomolecules because incorrect disulfide bond formation and/or the presence of cysteine-related post-translational modifications can cause a loss of biological activity or even elicit an immune response from the host. Despite advancements in analytical methods, characterization of cysteine forms remains technically challenging and time-consuming. Herein, we report the development of an improved non-reduced peptide map method coupled with machine learning to enable rapid identification of disulfide bonds and cysteine-related variants in an IgG1 knob-into-hole bispecific antibody. The enhanced method offers a fast, consistent, and accurate workflow for mapping out expected disulfide bonds in both half antibodies and bispecifics, and for identifying cysteine-related modifications. Comparison between two versions of the bispecific molecule and analysis of stressed samples were also accomplished, indicating that this method can be used to identify alterations originating from bioprocess changes and to determine the impact of assembly and post-assembly stress conditions on product quality.

Introduction

Bispecific antibodies provide an innovative therapeutic option for previously untreated medical conditions due to their ability to bind at least two targets with a single molecular entity. There are over sixty different bispecific formats to date derived from antibody motifs1. The “knob-into-hole” format is generated by the heterodimerization of two half antibodies, which is favored by “knob” and “hole” modifications of the CH3 domain in the Fc region2.

ACS Paragon Plus Environment


While holding incredible promise for treating patients with unmet medical needs, these bispecific antibodies present unique analytical challenges with atypical manufacturing requirements3. The knob-into-hole format is produced by generating each half antibody in a separate cell fermentation and then assembling the two in vitro to construct the bispecific antibody. This distinct assembly step poses a risk of forming unwanted by-products because the antibodies are reduced and then re-oxidized to form the bispecific antibody. During assembly, excess reduced glutathione (GSH) provides reducing equivalents for the hinge region cysteines by reducing disulfide bonds and capping the cysteines. The assembly conditions, such as elevated temperature, air oxidation, and alkaline pH, then favor the formation of the disulfide bonds between the two half antibodies to generate the bispecific antibody4. Incorrect disulfide bond (DSB) formation or cysteine modifications generated during or after assembly can cause a loss of biological activity or even elicit an immune response from the host5,6,7,8. Therefore, it is essential to analyze the assembled bispecific antibody for product quality to ensure that the protein is properly folded with minimal traces of cysteine-related by-products that may affect structural integrity3,9. Identification and analysis of these critical quality attributes (CQAs) is an essential step of pharmaceutical development to ensure the quality, efficacy, and safety of the final product as stated on the label. Knowledge of the CQAs enables the design and development of the manufacturing processes, including cell culture, purification, and formulation. As with most analytical methods, throughput capability is a critical element in meeting today’s needs10,11,12,13.

Non-reduced/reduced peptide mapping has been used to detect various disulfide isoforms of human IgG2 antibodies14, to detect locations of conjugated species on antibodies15, to map and identify disulfide bonds in proteins16,17, and to detect modifications and sequence alterations to single amino acids, such as oxidation, thioethers, trisulfides, and other cysteine-related modifications18,19,20. These methods can also be used in biosimilar development to compare the disulfide bonds and structures of antibodies21. While such methods provide many analytical benefits, developing them presents several challenges. Each step needs to be refined to ensure complete digestion of the protein while minimizing off-target labeling and method-induced artifacts such as disulfide scrambling. In addition, the liquid chromatography (LC) gradient and buffers need to be optimized to obtain adequate separation, and the mass spectrometry (MS) settings need to be optimized to provide accurate full scan MS and high-quality tandem MS data for enhanced identification of cysteine-related variants22. Current workflows for non-reduced peptide maps suffer from inefficient and potentially inconsistent data analysis because all the peptides, modifications, and disulfide linkages must be verified manually for correct peak assignments. For a knob-into-hole bispecific, for example, there could be about a hundred LC peaks and tens of thousands of MS peaks that need to be curated, leading to time-consuming data analysis, especially if many modifications or alterations are found. With the remarkable improvement in the speed of MS data acquisition, along with the information-rich data that MS provides, demand has grown for ways to analyze complex and heterogeneous data sets and to interpret the data as meaningful and useful information.
To that end, machine learning (ML) algorithms have been increasingly exploited in wide-ranging areas of science such as metabolomics23, proteomics, genomics24, drug discovery25,26,27, and pharmaceutical formulation development28, and likewise in the realms of business and government29. Machine learning techniques have already been applied to the analysis of MS data for proteins to improve peptide identification from MS2 data30, rescore MASCOT peptide-spectrum matches31, improve S/N ratios in SAMDI-TOF peptide array designs32, detect ovarian cancer early33, and identify sub-cellular protein locations34.

In this paper, an improved non-reduced peptide mapping LC-MS method is described. The analysis of an IgG1 knob-into-hole bispecific antibody with eleven unique (sixteen total) disulfide peptide bonds (Figure 1) was streamlined based on a method for analyzing standard monoclonal antibodies35. Using an Excel workbook, PADMA (PepFinder Automated Data Macro Analysis), and machine learning for data analysis, this method allows for a semi-automated, high-throughput analysis of cysteine-related characteristics, including verification of expected disulfide bonds and identification of mis-paired disulfide bonds. The comprehensive and accurate results also contain information on free thiols and cysteine modifications35. Moreover, this method could provide product quality information related to cysteine modifications stemming from bioprocess changes, varying assembly conditions, or post-assembly stress conditions.

Figure 1. Structure and disulfide bonds of an IgG1 bispecific antibody. Red lines indicate a disulfide bond between two cysteines (labeled C1-C32).

Materials and Methods

Protein samples were prepared and analyzed by LC-MS, and the resulting data were processed and analyzed to obtain meaningful results. Schematic Diagram 1 details the entire process from sample preparation to data analysis. Two versions of the bispecific were analyzed to evaluate product quality and to demonstrate whether this method can identify differences (if any) in product quality stemming from process changes. The two versions differed in the cell line used for the production of the knob half antibody, in the feeding strategy for the hole half antibody cell culture process, and in the order of the purification steps: in Version 2, the second and third chromatographic steps were reversed compared to Version 1.

1. Protein denaturation and free thiol capping with NEM
2. Buffer exchange into digestion buffer
3. Digestion with Trypsin/Lys-C
4. Sample split into two aliquots: one is reduced with TCEP, the other is left untreated
5. Both reduced and non-reduced samples analyzed via LC-MS
6. LC-MS data for both samples analyzed using PepFinder 2.0 software
7. PepFinder 2.0 outputs are used as inputs for PADMA
8. Machine learning algorithms update PADMA to improve ID of disulfide bonds and modifications

Schematic Diagram 1. Workflow for sample preparation, LC-MS analysis, and data analysis from the beginning sample to final results. N-Ethylmaleimide (NEM), tris(2-carboxyethyl) phosphine (TCEP), PepFinder Automated Data Macro Analysis (PADMA).

Sample Preparation

Five hundred (500) µg of protein (>3 mg/mL) was diluted to 1 µg/µL using denaturing buffer with N-Ethylmaleimide (NEM) (6 M guanidine HCl + 360 mM sodium acetate + 2 mM EDTA + NEM to a final molar ratio of protein:NEM of 1:1,460, pH 6) to a final volume of 500 µL, based on previous protocols for analyzing monoclonal antibodies35. The protein in denaturing buffer with NEM was incubated at 37 °C for 30 minutes, shaking at 300 rpm. The length of the denaturing/NEM labeling step was optimized to minimize off-target NEM labeling of amino acids other than cysteine. A time-course experiment was run with incubation times up to four hours at the current molar ratio of protein to NEM. The concentration of the protein was measured using a Thermo Nanodrop 8000 at a wavelength of 280 nm, and then the sample was buffer exchanged using Illustra NAP-5 columns equilibrated with digestion buffer (25 mM Tris + 1 mM CaCl2, pH 7) and eluted with 1,000 µL of the digestion buffer. The protein concentration was measured to be around 0.5 µg/µL. Ten (10) µL of Promega Trypsin/Lys-C enzyme mixture at 1 µg/µL was added and the samples were incubated for four hours at 37 °C at 300 rpm. After the incubation, the sample was split into two 500 µL aliquots and 15 µL of 0.5 M TCEP-HCl (tris(2-carboxyethyl)phosphine hydrochloride) was added to one of the aliquots to reduce the disulfide bonds in the sample. This reduced digest was incubated at 37 °C for 40 minutes while the non-reduced digest was kept at room temperature for 40 minutes. Finally, both samples were quenched with 10 µL of 25% trifluoroacetic acid (TFA).
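The quantities in this protocol can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes an antibody molar mass of about 150 kDa, which the text does not state, and the function names are hypothetical helpers rather than part of the published method.

```python
# Worked arithmetic for the sample preparation described above.
# ASSUMPTION: antibody molar mass of ~150,000 g/mol (typical for an IgG1,
# not stated in the text). All function names are hypothetical.
ANTIBODY_MW_G_PER_MOL = 150_000.0


def nem_concentration_mM(protein_ug, volume_uL, nem_per_protein=1460,
                         mw=ANTIBODY_MW_G_PER_MOL):
    """NEM concentration implied by a protein:NEM molar ratio of 1:1,460."""
    protein_mol = protein_ug * 1e-6 / mw          # micrograms -> grams -> moles
    nem_mol = protein_mol * nem_per_protein
    return nem_mol / (volume_uL * 1e-6) * 1e3     # mol per litre -> mM


def final_concentration_mM(stock_M, added_uL, sample_uL):
    """Concentration of a reagent after adding a small volume of stock."""
    return stock_M * 1e3 * added_uL / (added_uL + sample_uL)


def protein_per_enzyme(enzyme_ug, protein_ug):
    """Mass of protein digested per unit mass of enzyme."""
    return protein_ug / enzyme_ug


# 500 ug of protein in 500 uL of denaturing buffer
nem_mM = nem_concentration_mM(protein_ug=500, volume_uL=500)
# 15 uL of 0.5 M TCEP added to a 500 uL aliquot
tcep_mM = final_concentration_mM(stock_M=0.5, added_uL=15, sample_uL=500)
# 10 uL of enzyme at 1 ug/uL against ~0.5 ug/uL protein in 1,000 uL
ratio = protein_per_enzyme(enzyme_ug=10 * 1.0, protein_ug=0.5 * 1000)

print(f"NEM during labeling: {nem_mM:.1f} mM (assumes 150 kDa antibody)")
print(f"TCEP during reduction: {tcep_mM:.1f} mM")
print(f"Enzyme:protein ratio: 1:{ratio:.0f} (w/w)")
```

Under the stated molar mass assumption, the 1:1,460 molar ratio corresponds to roughly 10 mM NEM, and the enzyme is used at about 1:50 (w/w).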


Liquid Chromatography-Mass Spectrometry (LC-MS)

Peptide separation was performed using a Waters H-Class UPLC (ultra-performance liquid chromatography) system with a Waters Acquity BEH C8, 1.7 µm, 2.1 x 150 mm column. The flow rate was set at 0.4 mL/min with a split flow of 0.2 mL/min diverted to the mass spectrometer, a column temperature of 60 °C, an autosampler temperature of 2-8 °C, an injection volume of 40 µL, and UV detection at 214 nm. Mobile phase A was 0.1% (v/v) TFA in water and mobile phase B was 0.08% (v/v) TFA in acetonitrile, run with the 65-minute gradient described below. The mass spectrometer was a Thermo Q Exactive Plus Hybrid Quadrupole-Orbitrap Mass Spectrometer equipped with a heated electrospray ionization probe (HESI-II) operated in positive polarity, with a full scan range of 250-2000 m/z, a full MS resolution of 70,000, an MS2 resolution of 17,500, and a top 5 dynamic exclusion. ESI source conditions were as follows: spray voltage of 3.5 kV, capillary temperature of 320 °C, and a sheath gas pressure of 8 arbitrary units.

The UPLC was operated with a 65-minute gradient starting at 0% mobile phase B for the first 3 minutes. The gradient then changed linearly with time to 19% B at 12 minutes, 23% B at 16 minutes, and 30% B at 30 minutes, held at 30% B until 32 minutes, changed to 33% B at 33 minutes, held at 33% B until 40 minutes, changed to 40% B at 43 minutes, 45% B at 48 minutes, and 80% B at 49 minutes, held at 80% B until 55 minutes, returned to 0% B at 56 minutes, and stayed at 0% B until 65 minutes.

Degraded Panel/Stress Sample Conditions

Formulated drug substance solutions were subjected to a panel of typical stress conditions to assess stability and to identify any changes to the bispecific antibody under these conditions. Thermally stressed materials were stored at 40 °C for 4 weeks. Oxidative stress was applied by adding 2,2’-azobis(2-amidinopropane) dihydrochloride (AAPH) to a final concentration of 1 mM and then incubating the samples at 40 °C for 24 hours. Light-stressed samples were placed in a light box at 25 °C and 60% relative humidity at 5,000 lux for 240 hours (1.2 million lux hours). To assess basic stress, samples were buffer exchanged into pH 8.2 Tris buffer using dialysis cassettes, placed on stability at 40 °C for seven days, and then buffer exchanged back into formulation buffer. Glutathione (GSH) stress during assembly was tested by performing the operation at one third of the normal GSH concentration (Low GSH) and at 2.7 times the normal GSH concentration (High GSH) compared to the reference material. To create a positive control for scrambled disulfide bonds, samples were heated to 90 °C for either one or two hours before starting the peptide mapping sample preparation.

Data Analysis

PepFinder 2.0 software from Thermo Fisher Scientific was used to analyze the MS files using the following settings, which were determined after an extensive design of experiments (DOE) study investigating how changing input settings affected output results: a minimum MS signal of 1,000, processing of the non-reduced (NR) and reduced peptide maps together, and searching for the modifications of interest. Those modifications were NEM additions, cysteinylation, glutathionylation, thioether, trisulfide, and oxidation. PepFinder searches for these modifications by analyzing the mass shift caused by each modification relative to the expected mass of each amino acid in a peptide, along with MS data.
From the PepFinder data output, the peptides and modifications were then selected for inclusion based on the following criteria: confidence > 0.6, a non-reduced MS peak area more than five times the reduced peak area (for disulfide bonds only; the peak area should be low in the reduced condition but large in the non-reduced condition), and a mass error less than 5 ppm. Additionally, MS2 data were manually analyzed to verify the identification of the peptide and/or modification. These settings provided consistent results for replicate injections and sample preparations. An Excel workbook, PADMA, was created to automate the data analysis and increase throughput by comparing results to previously identified and manually verified peptides, disulfide bonds, and modifications. Machine learning models were integrated into PADMA to further increase the speed and accuracy of the data analysis by screening out artifacts, as shown in the Results section.

Other MS software packages, such as Mass Matrix and Protein Metrics (PMI), were evaluated, and PepFinder was found to be the most suitable for the purpose of this work. While Mass Matrix is a fast and effective visual tool for identifying disulfide bonds and scrambled disulfides, the software relies heavily on MS2 data and was unable to identify two expected disulfide bonds (hinge, CH1), potentially due to poor MS2 spectra for relatively large peptides such as CH1. The software also limits further automation of this workflow. PMI software could be a viable option for automation; however, similar to PepFinder, PMI requires searching and filtering results and then performing a manual verification in order to extract concise and meaningful information from the search results output. Furthermore, PepFinder requires a reduced peptide map as an additional confirmatory criterion for a positive identification, which is currently not a feature of the PMI software.

Machine Learning Methods

Machine learning (ML) approaches are designed to generate the most effective model from large-volume, multidimensional data to enable consistent, high-quality analysis without compromising analysis throughput. Decision trees (also called classification trees) are a popular technique in data mining due to the interpretability of the resulting model36 and their tolerance for noise in data37. Stochastic gradient boosting is an independent ML technique that leverages random sampling from training data and a gradient descent algorithm to iteratively construct an ensemble regression model with greater accuracy and less overfitting38. These techniques are readily available in statistical software packages.

Machine learning algorithms learned from PepFinder data exports (n = 39 developmental peptide maps) to create a model that evaluated MS data orthogonally to a human analyst. These peptide maps were drawn from files representing bispecific antibody samples that differed in instrument, analyst during preparation, bispecific assembly conditions, and stress conditions. These maps were selected as representative data and as a safeguard against changing process conditions over time. Quantitative attributes pertaining to cysteine-linked peptides identified in PepFinder, such as MS areas, monoisotopic mass, and confidence score, were loaded into the R statistical computing environment and filtered to retain observations with a monoisotopic mass greater than zero; the origin and significance of zero-mass observations in the PepFinder exports were unknown. Each observation was QC labeled with the number of charge states observed in the peptide map, the ratio of reduced MS area to non-reduced MS area, a quality screen for an MS area ratio less than zero or infinite, and a screen for zero ppm error. Data were partitioned into training and test datasets with an 80:20 split, while maintaining the class balance of misidentified and correctly identified peptide species. Misclassifications by either model were manually assessed for classification updates to the original dataset, and the model was retrained. This cycle proceeded until no further updates were found.
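The QC labeling and class-balanced 80:20 partition described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R code: the `Observation` fields, the `qc_labels` and `stratified_split` helpers, and the toy data are all hypothetical names and values chosen for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Observation:
    """One cysteine-linked peptide observation (hypothetical field names)."""
    peptide: str
    nr_area: float          # non-reduced MS area
    red_area: float         # reduced MS area
    ppm_error: float
    n_charge_states: int
    misidentified: bool     # analyst-assigned class label


def qc_labels(obs):
    """QC features: reduced/non-reduced area ratio plus two quality flags."""
    # Treat a zero non-reduced area as an infinite ratio (an assumption).
    ratio = math.inf if obs.nr_area == 0 else obs.red_area / obs.nr_area
    return {
        "n_charge_states": obs.n_charge_states,
        "area_ratio": ratio,
        "bad_area_ratio": ratio < 0 or math.isinf(ratio),
        "zero_ppm_error": obs.ppm_error == 0.0,
    }


def stratified_split(observations, train_fraction=0.8, seed=0):
    """Split into training and test sets, preserving the class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (True, False):
        group = [o for o in observations if o.misidentified == label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data: 10 misidentified and 40 correctly identified observations.
data = [
    Observation(f"pep{i}", nr_area=1e6, red_area=1e4, ppm_error=1.2,
                n_charge_states=2, misidentified=(i < 10))
    for i in range(50)
]
train, test = stratified_split(data)
print(len(train), "training observations,", len(test), "test observations")
print(qc_labels(data[0]))
```

Splitting each class separately is what keeps the proportion of misidentifications the same in both partitions; the caret package cited later in the text offers an equivalent facility in R.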
The decision tree learning models were created by the RPART algorithm39 and implemented by the recursive partitioning (rpart) package for R40. The RPART learning algorithm recursively partitions the data into groups with greater class purity until a stopping point is reached; the classification tree is then pruned into a subtree with lower misclassification on cross-validation data. Decision trees are represented as graphs in the computer science sense: collections of nodes connected by edges. Nodes are symbolic representations of some set of the data, which are split by decisions into smaller sets of data, also represented by nodes. Descendent nodes are those produced by a split, and each contains a subset of the ancestor node's data. The edge between two nodes represents the decision that was made to generate the descendent node from the ancestor node. Algorithms for tree construction generally revolve around three procedures: selecting a decision boundary to split the parent node, deciding when to stop splitting the data into smaller nodes, and assigning terminal nodes to classes such as misidentifications and true identifications. What follows is a summary adapted from Therneau and Atkinson (2011)39 and Breiman, Friedman, Olshen, and Stone (1984)41.

The rpart package scans the domain of the input data by binary search for a split that maximizes the information content of the child nodes. Shannon’s entropy42 was used as the basis for the package’s information index:

$$f(p_i) = -p_i \ln(p_i)$$

where $p_i$ is the proportion of observations in class $i$, and node impurity is defined as:

$$I(A) = \sum_{i=1}^{C} f(p_i)$$

where $C$ is the number of classes being assessed. In this case, PepFinder observations are being screened for misidentifications, so there are two classes. The best split of ancestor node $A$ into descendent nodes $A_L$ and $A_R$ is the one that maximizes the gain in information content, measured as:

$$\Delta I = p(A) I(A) - p(A_L) I(A_L) - p(A_R) I(A_R)$$
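A small numerical example may make the impurity and split-gain formulas concrete. The sketch below is a hedged illustration in Python rather than the rpart implementation; it assumes that $p(\cdot)$ is the fraction of all observations that fall in a node, and the class counts are invented for the example.

```python
import math


def entropy_impurity(class_counts):
    """I(A) = sum over classes of -p_i * ln(p_i), skipping empty classes."""
    total = sum(class_counts)
    impurity = 0.0
    for count in class_counts:
        if count > 0:
            p = count / total
            impurity -= p * math.log(p)
    return impurity


def split_gain(parent, left, right, n_total=None):
    """Delta I = p(A) I(A) - p(A_L) I(A_L) - p(A_R) I(A_R).

    ASSUMPTION: p(node) is the share of all observations in that node.
    """
    n_total = sum(parent) if n_total is None else n_total
    weighted = lambda counts: sum(counts) / n_total * entropy_impurity(counts)
    return weighted(parent) - weighted(left) - weighted(right)


# Counts are [misidentified, correctly identified]; values are made up.
parent = [20, 20]
good_split = split_gain(parent, left=[18, 2], right=[2, 18])
useless_split = split_gain(parent, left=[10, 10], right=[10, 10])
perfect_split = split_gain(parent, left=[20, 0], right=[0, 20])

print(f"informative split gain: {good_split:.4f}")
print(f"uninformative split gain: {useless_split:.4f}")
print(f"perfect split gain: {perfect_split:.4f}")
```

A split that leaves the class proportions unchanged has zero gain, while a split that separates the two classes completely recovers the full parent impurity, $\ln 2$ for a balanced two-class node.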

Splits are performed until the data being assessed has < 20 observations, or until the child nodes from the resulting split would have < 7 observations. Setting the minimum data size for the child nodes and the parent split size saves computation time, since smaller nodes are typically pruned away during tree cross-validation. The cross-validation procedure repeatedly evaluates the accuracy of the fully grown tree and then prunes the tree one terminal node at a time, based on whether the node improves the tree information content by more than a given complexity parameter. Typically, accuracy increases during the initial pruning because large and complex trees overfit the data during training. This pruning procedure repeats until the accuracy of the tree begins to decline, and the tree with maximum accuracy is accepted as the final tree.

Stochastic gradient boosting learning models (gbm) are implemented using the gbm package43, which relies on a gradient boosting algorithm44,45 of sequentially fitting a base learner to pseudo-residuals by least squares regression38,43:

Initialize $f(x_i)$ as a constant, $f(x_i) = \arg\min_{\rho} \sum_{i=1}^{N} \Psi(y_i, \rho)$

For $t$ in range $(1, T)$ do:

1. Compute the negative gradient as a working response:

$$z_i = -\left.\frac{\partial \Psi(y_i, f(x_i))}{\partial f(x_i)}\right|_{f(x_i) = \hat{f}(x_i)}$$

2. Fit a regression model, $g(x_i)$, predicting $z_i$ from the input covariates $x_i$

3. Choose a gradient descent step size:

$$\rho = \arg\min_{\rho} \sum_{i=1}^{N} \Psi(y_i, f(x_i) + \rho g(x_i))$$

4. Update the estimate of $f(x)$ as: $f(x_i) \leftarrow f(x_i) + \rho g(x_i)$

Each of these algorithms is defined by several hyperparameters that were evaluated for optimum performance. For the rpart algorithm, several complexity costs were examined. For the gbm algorithm, a combination of several interaction depths (a feature of $\rho$) and numbers of trees was examined. This hyperparameter tuning for both learning algorithms was performed to maximize the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve is generated from the relationship between sensitivity and specificity in a diagnostic, and has the advantage of being independent of the decision threshold and insensitive to imbalanced class proportions46,47. Model hyperparameter tuning was performed with 10-fold cross-validation repeated three times within the training dataset. Algorithm deployment, data partitioning, model evaluation, variable importance, and cross-validation were facilitated by standard methods in the caret package48.

Readers should be aware that for machine learning model evaluation, standard statistical definitions are used to compare machine learning models against the gold standard of an analyst manually evaluating MS2 data for each observation and classifying it as a misidentification or a correct identification:

$$\text{Sensitivity} = \frac{\sum \text{true misidentifications detected}}{\sum \text{all misidentifications}}$$

$$\text{Specificity} = \frac{\sum \text{true correct identifications detected}}{\sum \text{all correct identifications}}$$

$$\text{Positive Predictive Value} = \frac{\sum \text{true misidentifications detected}}{\sum \text{all observations flagged as misidentifications}}$$
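The boosting loop and the evaluation metrics can be illustrated together on toy data. The following is a minimal sketch, not the gbm or caret implementation: it assumes a squared-error loss, a single covariate, and one-split regression stumps as the base learner, and it takes positive predictive value in its standard form (true positives over all positive calls). All names and data are hypothetical.

```python
import random


def fit_stump(xs, zs):
    """Least-squares regression stump (one split) on a single covariate."""
    best = None
    for threshold in sorted(set(xs)):
        left = [z for x, z in zip(xs, zs) if x <= threshold]
        right = [z for x, z in zip(xs, zs) if x > threshold]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((z - ml) ** 2 for z in left) + sum((z - mr) ** 2 for z in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:                       # no valid split: constant learner
        mean = sum(zs) / len(zs)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def boost(xs, ys, n_trees=20, subsample=0.5, seed=0):
    """Stochastic gradient boosting for the loss 0.5 * (y - f)^2."""
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n                       # constant minimizing squared error
    preds, learners = [f0] * n, []
    for _ in range(n_trees):
        z = [y - p for y, p in zip(ys, preds)]             # negative gradient
        idx = rng.sample(range(n), max(2, int(subsample * n)))  # random subsample
        g = fit_stump([xs[i] for i in idx], [z[i] for i in idx])
        gx = [g(x) for x in xs]
        denom = sum(v * v for v in gx)
        # Closed-form line search for squared error: rho = sum(z*g) / sum(g*g)
        rho = sum(a * b for a, b in zip(z, gx)) / denom if denom else 0.0
        preds = [p + rho * v for p, v in zip(preds, gx)]
        learners.append((rho, g))
    return lambda x: f0 + sum(rho * g(x) for rho, g in learners)


def confusion_metrics(truth, predicted):
    """Sensitivity, specificity, and PPV; 1 marks a misidentification."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    ratio = lambda a, b: a / b if b else float("nan")
    return {"sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp)}


def squared_loss(model, xs, ys):
    return sum(0.5 * (y - model(x)) ** 2 for x, y in zip(xs, ys))


# Toy data: a single made-up covariate; label 1 marks a misidentification.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5, 8.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = boost(xs, ys)
calls = [1 if model(x) > 0.5 else 0 for x in xs]
print(confusion_metrics(ys, calls))
```

Because the step size is chosen by a line search over the full training set, the training loss cannot increase from one round to the next, even when the stump is fitted to a random subsample.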

Results & Discussion

To achieve adequate peak separation during chromatography, high-quality MS data, and accurate and reproducible results, several steps of this method were optimized, including the protein concentration used, choice of digestion enzyme, enzyme to protein ratio, digestion time and pH, column type, LC gradient, MS acquisition parameters, and the MS software used (data shown in Supplemental Information). Optimizing the NEM incubation time for capping free thiols was a critical step in identifying free thiols while preventing method-induced artifacts and off-target labeling, which could lead to misidentification of peptides. NEM can bind to other amino acids through reactions of its imide group or reactive alkene49. For example, since NEM is an alkylating agent, it has been shown to alkylate the amino group of lysine (K) and the imidazole group of histidine (H). While NEM has a high affinity for sulfhydryl groups, it can also react with each of the listed amino acids50.

N-Ethylmaleimide (NEM) Incubation Time Optimization

To investigate how changing the incubation time affected the labeling of free thiols as well as off-target NEM labeling, samples were incubated for 0.5, 1, 2, 3, or 4 hours, then digested with Trypsin/Lys-C and analyzed using LC-MS peptide mapping, as described in the methods section. Table 1 shows the relative free thiol amount for six cysteine residues; no NEM labeling was detected on any of the other cysteines. Off-target labeling on amino acids other than cysteine was also investigated and is shown in Table 1.

Table 1. Relative percentage of specific free thiol sites and off-target labeling of NEM on amino acids other than cysteine with differing NEM incubation times.

| Cysteine Residue | 4 hours | 3 hours | 2 hours | 1 hour | 0.5 hours |
|---|---|---|---|---|---|
| Knob VH C1 | 57.9 % | 57.2 % | 56.6 % | 56.5 % | 55.8 % |
| Knob VH C2 | 41.0 % | 39.4 % | 40.4 % | 41.6 % | 40.3 % |
| Knob CH3 C10 | 3.5 % | 3.6 % | 3.6 % | 3.3 % | 3.5 % |
| Hole VH C17 | 1.8 % | 1.7 % | 1.6 % | 1.8 % | 1.7 % |
| Hole VH C18 | 5.3 % | 5.3 % | 5.0 % | 5.4 % | 5.3 % |
| Knob CH3 C26 | 5.8 % | 6.2 % | 5.8 % | 5.8 % | 5.9 % |
| # of off-target residues modified | 8 | 6 | 3 | 3 | 1 |
| Off-target amino acids labeled | KDHIE | KDHIE | DH | DI | I |
| Total MS area of off-target labeling | 1.10E+06 | 7.40E+05 | 3.40E+05 | 1.60E+05 | 3.00E+04 |
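The trends in Table 1 can be checked numerically. The short sketch below simply re-enters the table values and computes two summary figures, the spread of each free thiol site across incubation times and the fold change in off-target MS area; the variable names are arbitrary and nothing beyond the table data is assumed.

```python
# Relative free thiol (%) from Table 1; columns are 4, 3, 2, 1 and 0.5 hours.
free_thiol = {
    "Knob VH C1": [57.9, 57.2, 56.6, 56.5, 55.8],
    "Knob VH C2": [41.0, 39.4, 40.4, 41.6, 40.3],
    "Knob CH3 C10": [3.5, 3.6, 3.6, 3.3, 3.5],
    "Hole VH C17": [1.8, 1.7, 1.6, 1.8, 1.7],
    "Hole VH C18": [5.3, 5.3, 5.0, 5.4, 5.3],
    "Knob CH3 C26": [5.8, 6.2, 5.8, 5.8, 5.9],
}
# Total MS area of off-target labeling, keyed by incubation time in hours.
off_target_area = {4: 1.10e6, 3: 7.40e5, 2: 3.40e5, 1: 1.60e5, 0.5: 3.00e4}

# Range (max - min) of each site across the five incubation times,
# in percentage points.
spread = {site: max(values) - min(values) for site, values in free_thiol.items()}
largest_spread = max(spread.values())

# Fold change in off-target labeling between the longest and shortest times.
fold_change = off_target_area[4] / off_target_area[0.5]

for site, value in spread.items():
    print(f"{site}: range {value:.1f} percentage points")
print(f"off-target area, 4 h vs 0.5 h: {fold_change:.1f}-fold")
```

No site varies by more than about two percentage points across the time course, while the off-target MS area differs by more than an order of magnitude between the 4-hour and 0.5-hour incubations.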

The results show that the relative free thiol amount remained fairly consistent at each identified site between 0.5 and 4 hours, with less off-target labeling at shorter incubation times, leading to the selection of a 0.5 hr incubation time for this method. During these experiments, the total MS peak area for labeled free thiol sites remained relatively constant. Interestingly, some free cysteines detected did not have a corresponding free cysteine from the expected disulfide bond (e.g., Knob CH3 C10 had a small amount of free cysteine while the other cysteine in the expected disulfide bond, Knob CH3 C11, had no free cysteine detected). This phenomenon has been found in the analysis of other proteins and suggests that one cysteine is possibly buried and less accessible to chemical modification (ref 20). In addition, decreasing the incubation time decreased both the number of different off-target residues labeled and the total peak area of off-target labeling.

PADMA Improves Data Analysis Throughput

Previously, data analysis for this method required ~4-6 hours per sample. The majority of this analysis time came after the PepFinder software provided a CSV output. First, the data had to be sorted into disulfide bond peptides and cysteine modification peptides and filtered to a confidence greater than 0.6 (~15 minutes). Next, each disulfide bond peptide, numbering about 70 per sample, had to be compared to the theoretical disulfide bond peptides provided by GPMAW_9.0 software, including the calculation of mass error and verification with MS2 data (~2.5 hours). After the disulfide bonds were analyzed, a similar process had to be carried out for the cysteine modifications, numbering around 40 per sample (~1.5 hours). From there, each of the fragments had to be grouped by disulfide bond or modified cysteine to calculate total peak areas for comparison against other disulfide bonds and modifications in the final results (~1 hour).

To improve the throughput of this method, an Excel workbook, named PADMA, was constructed to take inputs from the PepFinder comma-separated values (CSV) output (disulfide bond and cysteine modification data with confidences greater than 0.6) and calculate the normalized peak areas and relative free thiol amounts. PADMA also automatically calculates mass errors, filtering out any peptides with mass errors greater than 5 ppm by cross-referencing peptides and modifications that were manually verified previously with acceptable mass accuracy and MS/MS data. After analyzing the expected peptides and modifications, PADMA labels all other peptides and modifications that need to be verified and analyzed manually. PADMA automatically analyzed ~90% of the peptides and modifications, reducing the data analysis time by ~80% (~5 hours to 30 minutes) by eliminating manual analysis of disulfide bond peptides (~2.5 hours to 10 minutes) and cysteine modification peptides (~1.5 hours to 5 minutes), and by decreasing the time for the final analysis (~1 hour down to 5 minutes). Only the remaining ~10% of the peptides had to be manually analyzed and checked with MS2.

In addition, a system suitability test was built into PADMA to verify expected results of the reference material. System suitability was established with reference material data obtained during assay optimization. Assay results are accepted if the attributes are quantified within two standard deviations of the reference material average values. An example of the PADMA output can be seen in Figure S-4.
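The filtering and normalization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PADMA workbook itself: the row field names (`bond`, `observed_mass`, `theoretical_mass`, `confidence`, `area`) and the example rows are hypothetical, while the 0.6 confidence cutoff, the 5 ppm limit, and normalization to the CL bond are taken from the text and the Table 2 caption.

```python
from collections import defaultdict

PPM_LIMIT = 5.0        # mass-error cutoff quoted in the text
MIN_CONFIDENCE = 0.6   # PepFinder confidence cutoff quoted in the text


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def normalized_bond_areas(rows, reference_bond="CL"):
    """Sum accepted peak areas per bond, then divide by the reference bond total."""
    totals = defaultdict(float)
    for row in rows:
        error = ppm_error(row["observed_mass"], row["theoretical_mass"])
        if row["confidence"] > MIN_CONFIDENCE and abs(error) <= PPM_LIMIT:
            totals[row["bond"]] += row["area"]
    if reference_bond not in totals:
        raise ValueError("no accepted peptides for the reference bond")
    reference_total = totals[reference_bond]
    return {bond: area / reference_total for bond, area in totals.items()}


# Hypothetical rows, not data from the study.
example_rows = [
    {"bond": "CL", "observed_mass": 2000.002, "theoretical_mass": 2000.000,
     "confidence": 0.95, "area": 4.0e6},
    {"bond": "CL", "observed_mass": 3000.003, "theoretical_mass": 3000.000,
     "confidence": 0.90, "area": 1.0e6},
    {"bond": "CH2", "observed_mass": 2500.001, "theoretical_mass": 2500.000,
     "confidence": 0.85, "area": 2.5e6},
    # About 20 ppm off, so this row is rejected by the mass-error filter.
    {"bond": "CH2", "observed_mass": 2500.050, "theoretical_mass": 2500.000,
     "confidence": 0.85, "area": 9.0e6},
    # Confidence below 0.6, so this row is rejected as well.
    {"bond": "Hinge", "observed_mass": 4000.001, "theoretical_mass": 4000.000,
     "confidence": 0.50, "area": 3.0e6},
]

print(normalized_bond_areas(example_rows))
```

Grouping fragments by bond before normalizing mirrors the manual step that previously took about an hour per sample; the reference bond always normalizes to 1.00, which is why CL carries no RSD in the repeatability table.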
Machine Learning Predicts Method Artifacts and Enhances Data Analysis Accuracy

PADMA is improved by the integration of separate ML models for the prediction of assay artifacts and misidentifications in PepFinder data exports for cysteine modifications and disulfide bonded peptides. This discussion focuses on the ML models predicting artifacts in the DSB data sets because, outside of free thiols, the bioprocess does not generate enough PTMs at a level that could support robust general-purpose models. Although both boosting and recursive partitioning algorithms created models with acceptable performance in predicting misidentifications from PepFinder, the sensitivity (99.8%) and positive predictive value (95.5%) of the gbm model were superior to the sensitivity (93.1%) and positive predictive value (87.2%) of the rpart model. The capability of both ML models to describe test data sets is illustrated by the receiver operating characteristic (ROC) curve in Figure 2. The corresponding confusion matrices are presented in Table S-1. The decision tree from the rpart model was selected for integration into the PADMA template due to its ease of interpretation, performance comparable to the superior gbm model, ease of integration into PADMA with native MS Excel functions, and acceptable performance parameters.
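For readers less familiar with the two performance measures quoted above, the following sketch shows how sensitivity and positive predictive value follow from a confusion matrix. The labels are hypothetical and are not the Table S-1 data, and treating "DSB" as the positive class is an assumption made only for this illustration.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of actual positives that the model recovered."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of positive calls that were correct."""
    return tp / (tp + fp)


# Hypothetical labels, not the Table S-1 data.
actual = ["DSB", "DSB", "DSB", "DSB", "artifact", "artifact"]
predicted = ["DSB", "DSB", "DSB", "artifact", "DSB", "artifact"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="DSB")
print(sensitivity(tp, fn), positive_predictive_value(tp, fp))
```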


Figure 2. Receiver operating characteristic (ROC) curve comparing machine learning methods. This graph illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

An interesting property of PepFinder datasets was discovered during repeated decision tree modeling: a mass error of exactly zero ppm between the observed mass in the MS data and the expected mass is a strong (but not complete) predictor of assay artifacts. This finding is supported by the high relative importance of mass error in predicting artifacts in the gradient boosting model, which uses an independent learning algorithm. Manual investigation of these species determined that >97.4% of all misidentifications had a reported mass error of exactly zero (Figure 3A). This lends confidence to the final decision tree used (shown in Figure 3D).


Figure 3. Important data attributes used by recursive partitioning to develop a decision tree for filtering out PepFinder misidentifications. (A) Histogram of empirical mass errors reveals that many misidentifications have exactly zero mass error. (B) Histogram of average mass reveals that many misidentifications have larger mass. (C) Empirical cumulative distribution reveals that misidentifications tend to have lower confidence than true identifications. (D) The final decision tree topology to filter out misidentifications and artifacts.

The decision tree model, which was implemented in Excel, starts by evaluating the mass error of the species and confirms the species as a disulfide bond if the mass error is greater than 0.0013 ppm. For the remaining species, the tree examines whether the average mass is greater than 6,597 Da (the size of the largest expected disulfide peptide), in which case the species is labeled as an artifact and not a disulfide bond. For those with a mass less than 6,597 Da, if the confidence of the PepFinder identification is greater than 82.8%, the species is labeled as a disulfide bond. If the confidence is lower, it is labeled as an artifact that is not a disulfide bonded peptide. It was investigated whether raising the PepFinder confidence filter to 0.8 would be sufficient to screen out misidentifications, but because 82.7% of the misidentifications have confidence >0.8 (Table S-2), this was considered insufficient.

It was also noted that although the rpart model topology was unstable in a fashion characteristic of decision trees (ref 44), performance characteristics were relatively stable. The performance of the machine learning model was observed to be robust to changing conditions when evaluating data from individual peptide maps, despite important differences in data conditions and quality. Variable importance measurements routinely found the following attributes to be useful discriminators of DSB artifacts: ppm error, number of charge states identified for the peptide, retention time, peptide mass, and confidence. In contrast, measured differences between instruments, MS areas, and charge states did not have significant importance. Implementing the recursive partitioning model in Excel is straightforward with Excel's native logical functions. Because PepFinder occasionally reports what are reasoned to be spurious peptide hits with zero monoisotopic mass, an exception against classifying these entries is included.
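The tree described above reduces to a short chain of comparisons. The sketch below is a Python rendering for illustration only, not the Excel formula used in PADMA. It assumes that the mass error is compared as an absolute value, that the 82.8% confidence split corresponds to 0.828 on PepFinder's 0 to 1 scale, and that entries with zero monoisotopic mass are simply left unclassified; the function and label names are hypothetical. It also returns a class label, whereas the worksheet reports a historical probability.

```python
MASS_ERROR_SPLIT_PPM = 0.0013   # first split quoted in the text
MAX_EXPECTED_MASS_DA = 6597.0   # largest expected disulfide peptide
CONFIDENCE_SPLIT = 0.828        # 82.8% expressed as a fraction (assumption)


def classify_species(mass_error_ppm, average_mass_da, confidence,
                     monoisotopic_mass_da):
    """Walk the three splits of the decision tree and return a label."""
    # Exception: spurious hits with zero monoisotopic mass are not classified.
    if monoisotopic_mass_da == 0:
        return "unclassified"
    # Split 1: a non-zero mass error supports a real disulfide bond.
    if abs(mass_error_ppm) > MASS_ERROR_SPLIT_PPM:
        return "disulfide bond"
    # Split 2: heavier than any expected disulfide peptide.
    if average_mass_da > MAX_EXPECTED_MASS_DA:
        return "artifact"
    # Split 3: PepFinder confidence decides the remaining species.
    if confidence > CONFIDENCE_SPLIT:
        return "disulfide bond"
    return "artifact"


print(classify_species(1.2, 3000.0, 0.70, 2998.5))   # disulfide bond
print(classify_species(0.0, 7000.0, 0.99, 6996.0))   # artifact
print(classify_species(0.0, 3000.0, 0.70, 2998.5))   # artifact
```

Ordering the checks this way reflects the observation that a reported mass error of exactly zero accounts for most misidentifications, so the first split removes the bulk of the true identifications from further scrutiny.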
The recursive partitioning algorithm, described above, is implemented in DSB Data worksheet column N: ML Screener Probability of DSB (Figure S-4). The calculated output of this formula is the percentage of historical DSB data that supports real disulfide bond status at the time of model training. Excel’s native conditional formatting feature enabled easy-to-see visual guides to complement the probability output: researchers using this worksheet are presented with green cells when the historical probability of artifacts is 0%, red when the historical probability of artifact is 100%, and some grade of yellow is presented when the historical probability is less certain.
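The color guide described above amounts to a three-way mapping from the historical artifact probability to a cell color. A minimal sketch, assuming the probability is expressed as a fraction between 0 and 1 (the function name is hypothetical, and Excel's graded yellow shading is collapsed into a single "yellow" value here):

```python
def screener_color(artifact_probability):
    """Map a historical artifact probability (0 to 1) to a cell color."""
    if not 0.0 <= artifact_probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if artifact_probability == 0.0:
        return "green"    # history fully supports a real disulfide bond
    if artifact_probability == 1.0:
        return "red"      # history fully supports an artifact
    return "yellow"       # uncertain, flag for manual review


print([screener_color(p) for p in (0.0, 0.35, 1.0)])
```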


Method Repeatability

Using this optimized method for sample preparation and data analysis, method repeatability was assessed by testing independent sample preparations (n=14) as well as repeat injections (n=6) for Version 1 of the bispecific antibody. Consistent peak areas (RSD no greater than 18%) were obtained for all eleven disulfide bonds and seven cysteine free thiols across these fourteen sample preparations (Table 2), suggesting the method is repeatable. The repeat injections provided smaller RSDs, so only the independent sample preparations are shown in Table 2 as the worst-case variability.

Table 2. Disulfide bond and free thiol analyses of an IgG1 bispecific antibody (Version 1) using the optimized non-reduced LC-MS peptide map enhanced with machine learning.

Normalized disulfide bond peak area (Normalized to CL Peak) (n=14)

Bond | Avg ± Std | RSD (%)
HC-LC | 0.12 ± 0.01 | 9
Hole VH | 0.12 ± 0.01 | 8
CH2 | 1.00 ± 0.08 | 8
Hole CH3 | 0.40 ± 0.02 | 5
Knob VH | 0.012 ± 0.001 | 12
Knob CH3 | 0.52 ± 0.03 | 5
CL | 1.00 ± 0.00 | N/A
Knob VL | 0.14 ± 0.01 | 7
Hole VL | 0.16 ± 0.03 | 18
CH1 | 0.96 ± 0.06 | 6
Hinge | 0.73 ± 0.06 | 7

Relative free thiols (%) (n=14)

Cysteine | Avg ± Std | RSD (%)
Knob VH C1 | 58 ± 6 | 11
Knob VH C2 | 41 ± 4 | 9
Knob CH3 C10 | 3.6 ± 0.4 | 10
Hole VH C17 | 1.7 ± 0.2 | 11
Hole VH C18 | 5.3 ± 0.4 | 8
Hole CH3 C26 | 6.0 ± 0.7 | 12
Hole VL C28 | 3.1 ± 0.4 | 12
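The RSD values reported in Table 2 are the standard deviation expressed as a percentage of the mean. A minimal sketch of that calculation follows; the replicate values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical normalized peak areas from three preparations.
replicates = [0.11, 0.12, 0.13]
print(round(relative_standard_deviation(replicates), 1))
```

Because the tabulated averages and standard deviations are rounded, recomputing an RSD from them will not always reproduce the tabulated RSD exactly.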

The relative free thiol percentages for the “knob” VH region were significantly higher than all the other free thiol amounts, possibly because the disulfide bond peak area for the knob VH bond was consistently low (presumably due to poor ionization efficiency related to the large size of the peptide), resulting in artificially inflated free thiol percentages. It is important to note that the relative free thiol percentage is not an absolute quantitation, but is rather used for comparison between samples. Overall, the method produces robust and repeatable results for the assembled bispecific antibody.

Identification of Disulfide Bonds and Cysteine Modifications

Both the UV (ultraviolet) chromatogram and TIC (total ion chromatogram) were analyzed to characterize and comprehensively annotate the chromatogram of the bispecific antibody. A sequence coverage of 99.4% was obtained with only three single amino acid residues missing. This annotation was supplied to PADMA as the reference against which all samples are compared.


Using the theoretical mass to charge ratios (m/z) of the disulfide bonds, the extracted ion chromatogram (EIC), found in Figure 4, can be extracted from the TIC. All eleven of the unique disulfide bonds in the assembled bispecific can be identified as well as all eight of the expected disulfide bonds in each half antibody.

Figure 4. Representative extracted ion chromatograms (EIC) showing the disulfide bonds for (A) full-length assembled bispecific, (B) “hole” half-antibody, and (C) “knob” half-antibody.

Each individual half antibody was analyzed to better understand the product quality before and after assembly. The half antibodies behaved as expected with the eight expected disulfide bonds (three of the eleven unique bonds are specific to each half) being identified with no detectable scrambled peptides. Additionally, free thiols were identified at all the corresponding sites in the assembled material. There were a few free thiol sites identified in the unassembled half antibodies, but not in the assembled bispecific; however, each of these was at levels below 5%. Cysteinylation and glutathionylation modifications were found on some of the hinge cysteines on the half antibodies, but were not present in the assembled material, as verified by mass accuracy and MS2 data (Figure 5A). A possible explanation for these observations is that cysteine modifications found on the half antibody, most notably on the hinge cysteines, are removed from the protein during assembly and downstream purification, where disulfide bonds are formed between the two half antibodies. The presence of glutathionylation was also consistent with the intact mass data, which showed a double glutathionylation modification as a small percentage of the half antibody (Figure 5B).


Figure 5. (A) MS/MS spectrum and (B) intact mass analysis of a double glutathionylation modification on the hinge cysteines of a half antibody (hole).

Analysis of Critical Quality Attributes

The assembled Version 1 bispecific antibody was subjected to a panel of common stresses to analyze CQAs (disulfide bond stability and cysteine by-products). Assembled material was subjected to light, basic, oxidative (AAPH), and thermal stresses. None of the eleven unique disulfide bonds differed significantly from the Version 1 representative material: the stressed-sample normalized bond areas fell within two standard deviations of the representative material in all cases except HC-LC under thermal stress and knob CH3 under basic stress (Table 3). The relative free thiol levels of all of the stressed materials were also similar to the Version 1 material.

Table 3. Normalized disulfide bond peak area for the stressed material compared to the Version 1 assembled bispecific.

Bond ID | v1 (n=14) (Avg ± 2 Std) | Light Stress | Basic Stress | AAPH Stress | Thermal Stress | Low (50x) GSH | High (400x) GSH
HC-LC | 0.12 ± 0.02 | 0.10 | 0.10 | 0.11 | 0.09 | 0.13 | 0.12
Hole VH | 0.12 ± 0.02 | 0.13 | 0.13 | 0.13 | 0.13 | 0.12 | 0.12
CH2 | 1.00 ± 0.2 | 0.90 | 0.84 | 0.85 | 0.87 | 1.06 | 1.00
Hole CH3 | 0.40 ± 0.04 | 0.38 | 0.36 | 0.38 | 0.43 | 0.41 | 0.38
Knob VH | 0.012 ± 0.003 | 0.012 | 0.014 | 0.012 | 0.011 | 0.011 | 0.011
Knob CH3 | 0.52 ± 0.06 | 0.48 | 0.45 | 0.47 | 0.52 | 0.50 | 0.49
CL | 1.00 ± 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Knob VL | 0.14 ± 0.02 | 0.15 | 0.16 | 0.16 | 0.15 | 0.13 | 0.13
Hole VL | 0.16 ± 0.06 | 0.15 | 0.14 | 0.13 | 0.14 | 0.17 | 0.15
CH1 | 0.96 ± 0.1 | 0.90 | 0.90 | 0.94 | 0.97 | 0.95 | 0.98
Hinge | 0.73 ± 0.1 | 0.69 | 0.68 | 0.71 | 0.69 | 0.74 | 0.68
% mis-paired | not detected | not detected | not detected | not detected | not detected | not detected | not detected
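The two-standard-deviation comparison used for Table 3 can be reproduced directly from the tabulated values. The sketch below is an illustration, not the PADMA system suitability formula: the data structure and function name are hypothetical, the numbers are those of Table 3, and the small tolerance is an assumption added so that values sitting exactly on the limit (for example 0.36 against 0.40 ± 0.04) are counted as within range despite floating-point rounding.

```python
STRESS_CONDITIONS = ["Light", "Basic", "AAPH", "Thermal", "Low GSH", "High GSH"]

# bond: (v1 average, two standard deviations, stressed values in column order)
TABLE_3 = {
    "HC-LC":    (0.12, 0.02, [0.10, 0.10, 0.11, 0.09, 0.13, 0.12]),
    "Hole VH":  (0.12, 0.02, [0.13, 0.13, 0.13, 0.13, 0.12, 0.12]),
    "CH2":      (1.00, 0.2, [0.90, 0.84, 0.85, 0.87, 1.06, 1.00]),
    "Hole CH3": (0.40, 0.04, [0.38, 0.36, 0.38, 0.43, 0.41, 0.38]),
    "Knob VH":  (0.012, 0.003, [0.012, 0.014, 0.012, 0.011, 0.011, 0.011]),
    "Knob CH3": (0.52, 0.06, [0.48, 0.45, 0.47, 0.52, 0.50, 0.49]),
    "CL":       (1.00, 0.00, [1.00, 1.00, 1.00, 1.00, 1.00, 1.00]),
    "Knob VL":  (0.14, 0.02, [0.15, 0.16, 0.16, 0.15, 0.13, 0.13]),
    "Hole VL":  (0.16, 0.06, [0.15, 0.14, 0.13, 0.14, 0.17, 0.15]),
    "CH1":      (0.96, 0.1, [0.90, 0.90, 0.94, 0.97, 0.95, 0.98]),
    "Hinge":    (0.73, 0.1, [0.69, 0.68, 0.71, 0.69, 0.74, 0.68]),
}


def outside_two_sd(table, tolerance=1e-9):
    """List (bond, condition) pairs whose value lies outside average ± 2 SD."""
    flagged = []
    for bond, (average, two_sd, values) in table.items():
        for condition, value in zip(STRESS_CONDITIONS, values):
            if abs(value - average) > two_sd + tolerance:
                flagged.append((bond, condition))
    return flagged


print(outside_two_sd(TABLE_3))
```

With the tabulated values, the only pairs flagged are HC-LC under thermal stress and knob CH3 under basic stress, matching the two exceptions named in the text.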


While no scrambled disulfide bonds were identified in the representative Version 1 or stressed material, it was important to test positive controls for scrambled disulfide bonds to ensure the method can detect scrambling. To create positive controls, samples were heated to 90°C for one or two hours before sample preparation. Heating to 90°C created a significant amount of mis-paired disulfide bonds that were identified by PepFinder. These mis-paired disulfide bonds can be visualized using the Mass Matrix software (ref 21), which creates a heat map of all possible disulfide bonds in which dark blue squares represent no disulfide bond and red squares represent high confidence in the existence of a disulfide bond (Figure 6). Each axis lists all of the cysteines in the protein, and each square represents a possible disulfide bond between the cysteine on the y-axis and the cysteine on the x-axis. In Figure 6A, the unstressed control (Version 1) shows the correct/expected disulfide bonds in the bispecific antibody, while Figures 6B and 6C show the mis-paired disulfide bonds in the heated samples. The significantly different disulfide bond profiles indicate mis-paired disulfide bonds, which match the PepFinder outputs, verifying a significant percentage of mis-paired disulfide bonds. The disulfide bonds scrambled during intense heat were primarily bonds involving the CL, HC-LC, CH1, Knob VL, Hinge, CH2, and Hole VL cysteines. These mis-paired peptides were verified using mass accuracy (limit of 5 ppm) and MS/MS data. In addition to affecting the disulfide bonds, these conditions also significantly lowered the free thiol levels at all sites. These results indicate that extreme heat stress (90°C) can cause the breakage and reshuffling of disulfide bonds and can be used as a positive control for detecting mis-paired peptides by PADMA.
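The cysteine-by-cysteine layout of such a heat map, and the idea of flagging pairs that fall outside the expected pattern, can be sketched as follows. This is a generic illustration and does not use or reproduce the Mass Matrix software; the cysteine names, scores, and function names are hypothetical.

```python
def pairing_matrix(cysteines, observed_pairs):
    """Build a symmetric cysteine-by-cysteine score matrix (0.0 = no bond)."""
    index = {name: position for position, name in enumerate(cysteines)}
    size = len(cysteines)
    matrix = [[0.0] * size for _ in range(size)]
    for (first, second), score in observed_pairs.items():
        i, j = index[first], index[second]
        matrix[i][j] = matrix[j][i] = score
    return matrix


def mispaired(observed_pairs, expected_pairs):
    """Return observed pairs that are not among the expected disulfide bonds."""
    expected = {frozenset(pair) for pair in expected_pairs}
    return sorted(
        tuple(sorted(pair))
        for pair in observed_pairs
        if frozenset(pair) not in expected
    )


# Hypothetical four-cysteine protein with two expected bonds.
cysteines = ["C1", "C2", "C3", "C4"]
expected = [("C1", "C2"), ("C3", "C4")]
observed = {("C1", "C2"): 0.9, ("C4", "C3"): 0.8, ("C2", "C3"): 0.4}

for row in pairing_matrix(cysteines, observed):
    print(row)
print(mispaired(observed, expected))
```

Using unordered pairs means a bond reported as C4-C3 is still recognized as the expected C3-C4 bond, so only genuinely unexpected pairings are flagged.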

Figure 6. Disulfide bond configuration analyzed by Mass Matrix software for (A) assembled Version 1 unstressed bispecific protein, (B) heating the protein at 90°C for one hour, and (C) heating the protein at 90°C for two hours.

Overall, this method has shown the ability to detect correct and mis-paired disulfide bonds, free thiols, and other cysteine modifications, such as glutathionylation, that can change between the half antibodies and the assembled bispecific molecules.

Application of Improved Method to Assess Impact of Process Changes on Product Quality

To better understand how process changes affect disulfide bonds and cysteine modifications, samples of the assembled bispecific antibody generated from two different versions of the process were analyzed. Compared to Version 1, Version 2 utilized a different cell line for the knob half antibody and a different feeding strategy for the hole half antibody, and two of the purification chromatography steps were reversed. The normalized peak areas for the Version 1 and 2 materials were relatively similar and well within one standard deviation of each other, indicating the disulfide bonds occurred at similar amounts with no detectable scrambled disulfide bonds (Table 4).

Table 4. Disulfide bond and free thiol analyses of two different versions of an IgG1 bispecific antibody produced from two different processes.

Normalized disulfide bond peak area (Normalized to CL Peak)

Bond ID | v1 (n=14) Avg ± Std | v2 (n=4) Avg ± Std
HC-LC | 0.12 ± 0.01 | 0.12 ± 0.03
Hole VH | 0.12 ± 0.01 | 0.10 ± 0.02
CH2 | 1.00 ± 0.08 | 0.94 ± 0.05
Hole CH3 | 0.40 ± 0.02 | 0.42 ± 0.03
Knob VH | 0.012 ± 0.001 | 0.009 ± 0.003
Knob CH3 | 0.52 ± 0.03 | 0.52 ± 0.04
CL | 1.00 ± 0.00 | 1.00 ± 0.00
Knob VL | 0.14 ± 0.01 | 0.14 ± 0.01
Hole VL | 0.16 ± 0.03 | 0.16 ± 0.02
CH1 | 0.96 ± 0.06 | 1.04 ± 0.07
Hinge | 0.73 ± 0.06 | 0.69 ± 0.07
% mis-paired | not detected | not detected

Relative free thiols (%)

Cysteine | v1 (n=14) Avg ± Std | v2 (n=4) Avg ± Std
Knob VH C1 | 58 ± 6 | 74 ± 5
Knob VH C2 | 41 ± 4 | 58 ± 4
Knob CH3 C10 | 3.6 ± 0.4 | 3.3 ± 0.7
Hole VH C17 | 1.7 ± 0.2 | 1.8 ± 0.5
Hole VH C18 | 5.3 ± 0.4 | 5.2 ± 1.2
Hole CH3 C26 | 6.0 ± 0.7 | 5.8 ± 1.3
Hole VL C28 | 3.1 ± 0.4 | 2.1 ± 0.3

The relative free thiol percentages were consistent for the two versions of the assembled bispecific material with the exception of Knob VH C1 and C2. Free thiols were consistently identified at the same seven cysteines, with RSDs below 10% for the majority of the relative free thiol percentages (Table 4). While the same seven free thiol sites were identified in both versions, there was a significant difference in the free thiol amounts at the two cysteines in the “knob” VH region, with the Version 2 material having a higher relative free thiol percentage (58% ± 6% for Version 1 compared to 74% ± 5% for Version 2 at knob VH C1, and 41% ± 4% compared to 58% ± 4% at knob VH C2). The difference in free thiol amount between the Version 1 and 2 materials was verified using an RP-UHPLC NcHM (reverse-phase ultra-high pressure liquid chromatography N-cyclohexylmaleimide) tagging method, which showed that the Version 1 material had a total free thiol amount of 0.70 mol free thiol/mol protein and the Version 2 material had 0.86 mol free thiol/mol protein. While the values derived from these two methods are not directly comparable, the trend is similar for both methods: an elevated level of free thiol in the Version 2 material is apparent. The site-specific results indicate that the change in process between Versions 1 and 2 led to a slight increase in free thiols on the VH region of the “knob” half antibody and demonstrate that this method can identify differences in relative free thiols arising from process changes.

Conclusion and Perspectives

The enhanced non-reduced peptide map LC-MS method described in this work allows for a consistent and comprehensive characterization of cysteine forms in a bispecific antibody with a much shorter turn-around time compared to previous methods. Sample preparation and data analysis were greatly improved by decreasing the incubation time for free thiol capping and by implementing a semi-automated data analysis protocol developed in-house combined with machine learning. The creation of PADMA and incorporation of machine learning significantly cut down the data analysis from about six hours per sample down to 30 minutes. The method enables consistent and reliable relative quantification of expected disulfide bonds, mis-paired disulfide bonds, free thiols and other cysteine-related modifications with good reproducibility (