Codon-Directed Determination of the Biological Causes of Sequence

Analytical, Cell Line and Process Development, Bristol-Myers Squibb Company, 311 Pennington-Rocky Hill Road, Pennington, New Jersey 08534, United Stat...

Article

Codon-Directed Determination of the Biological Causes of Sequence Variants in Therapeutic Proteins Tao Jiang, Hangtian Song, Thomas R Slaney, Wei Wu, Erik Langsdorf, Gargi Gupta, Richard Ludwig, Li Tao, Duncan McVey, and Tapan K. Das Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b02914 • Publication Date (Web): 31 Oct 2017 Downloaded from http://pubs.acs.org on November 1, 2017


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Codon-Directed Determination of the Biological Causes of Sequence Variants in Therapeutic Proteins

Tao Jiang1,*, Hangtian Song2,**, Thomas R. Slaney2, Wei Wu2, Erik Langsdorf2, Gargi Gupta2, Richard Ludwig2, Li Tao2, Duncan McVey2, Tapan K. Das2

1 Department of Chemistry, the University of Michigan, 930 North University Avenue, Ann Arbor, MI 48109

2 Analytical, Cell Line and Process Development, Bristol-Myers Squibb Company, 311 Pennington-Rocky Hill Road, Pennington, NJ 08534

*Current address: 200 Technology Square, Cambridge, MA

Author Information **To whom correspondence should be addressed Phone: 1-609-818-4251. Email: [email protected] Notes: the authors declare no competing financial interest.

ACS Paragon Plus Environment


Abstract

Recombinant monoclonal antibodies (mAbs) manufactured from immortalized mammalian cell lines are becoming increasingly important as therapies. Ensuring the quality of expressed proteins is critical when developing manufacturing processes. Protein sequence variants (PSVs) are a type of product-related variant in which errors in the protein sequence are present. Detecting PSVs and determining whether they originate from DNA mutation or mRNA mistranslation is critical. Mutations cannot be remediated without developing new clones, which can be costly and time consuming. In contrast, mistranslation can usually be mitigated by optimizing cell culture conditions. In this work, we first developed a new method to detect low-abundance PSVs with improved sensitivity. Then, a statistical metric was proposed to determine whether the observed PSVs originate from mutation or mistranslation by characterizing the distribution of PSVs. This method was applied to the evaluation of 50 clones from 5 mAb programs, allowing for identification of 5 mutation and 139 mistranslation PSVs. The presence of even a few mutations demonstrates the necessity of clone screening during process development.

Keywords: monoclonal antibody, sequence variant analysis, liquid chromatography, mass spectrometry, peptide mapping


Introduction

Recombinant proteins are becoming increasingly important as therapies for diseases such as cancer [1], hepatitis [2], genetic disorders [3] and rheumatoid arthritis [4]. Most therapeutic proteins are expressed by extensively characterized systems such as Chinese Hamster Ovary (CHO) cells. During manufacturing, any deviation in the primary sequence or higher-order structure of a therapeutic protein could either raise safety concerns such as immunogenicity, or impact efficacy by altering binding affinity, stability or pharmacokinetics. Therefore, monitoring and controlling variants of the protein product are crucial to biopharmaceutical development.

Protein sequence variants (PSVs) are a type of variant in which the amino acid sequence deviates from the designed sequence in one or more locations. PSVs have been observed at single [5] or multiple [6,7] sites in therapeutic proteins. PSVs could originate from multiple steps in gene expression, from deoxyribonucleic acid (DNA) replication [8] and messenger ribonucleic acid (mRNA) transcription [9] to peptide synthesis (mRNA translation) in the ribosome [10]. In general, DNA mutations and mRNA mistranslations have been observed more frequently, while transcriptional errors, by their nature, are unlikely to be observed [11]. An unexpected nucleotide in the vector that is transfected into the host cell, or a nucleotide mutation during DNA replication, may cause an inheritable mutated DNA sequence in the cell bank [5]. However, translation errors (or misincorporation) represent the most widely observed PSVs and can occur either by undesired tRNAs mismatching with mRNA codons [12-15] or by tRNAs being mischarged with the incorrect amino acid [7,9,16-18]. Differentiating mutations from mistranslations is critical since the strategy for minimizing PSVs depends on their root causes.
Recombinant protein products frequently need large-scale protein manufacturing, and the consequences of a mutant in the master cell bank can be costly and time consuming to remediate, especially during late stages of clinical development. A belatedly identified PSV may even lead to a repeat of pre-clinical and clinical studies. In contrast, translational misincorporation can be reduced by optimizing cell culture conditions at a much lower cost [7,19,20]. A clone exhibiting mistranslational PSVs during screening but with otherwise superior performance may therefore still be considered for development.

Due to the mechanistic differences between the PSVs resulting from mutation versus mistranslation, their occurrence rates and distributions are different. Regarding occurrence rates, mistranslation (mismatch and mischarge) is usually more common than DNA replication errors, by up to 10⁴-fold [18,21,22]. However, the chemical reagents used to select for cells expressing the desired gene, such as methotrexate, can impact DNA replication, leading to mutations. If a missense mutation occurs within the expression vector sequence, then it is likely to be observed as a PSV. It is highly improbable that clones sharing no common ancestry after transfection would contain the same mutation at the same amino acid residue [21]. On the other hand, a panel of similarly prepared clones under identical conditions may exhibit the same type of mistranslation in multiple clones, as well as at additional residues within the same clone. Therefore, distribution patterns provide a framework for determining the root cause of each observed PSV.

Detection of low-level PSVs requires sensitive analytical techniques. For detecting mutations, mRNA/cDNA sequencing and PCR screening can be applied [5,18]. State-of-the-art sequencing methods such as duplex sequencing can detect as few as 1 mutant nucleotide in 10⁷ nucleotides [23]. To detect PSVs in recombinant protein products, several common protein analytical techniques such as charge-based [24], size-based [18], and hydrophobicity-based [5] separations, as well as Edman sequencing [6], have been successfully applied. Peptide mapping with liquid chromatography and tandem mass spectrometry (LC-MS/MS) detection provides comprehensive information on location, identity and relative quantification of PSVs at the amino acid residue level with high sensitivity [25-27]. In LC-MS/MS experiments, data-dependent acquisition (DDA) mode is commonly used to detect PSVs, in which the most abundant peaks in each MS scan are individually isolated and subjected to sequencing analysis by fragmentation.


Although DDA is a useful technique given its wide applicability, its sensitivity is limited by the scanning and fragmentation speed of the instrument, which may leave some low-level variants undetected. In this work, we developed a novel technique to improve sensitivity for detecting low-abundance PSVs by LC-MS/MS analyses. Next, a quantitative scoring scheme developed in our laboratory was applied to differentiate PSVs originating from mutation versus mistranslation. This scoring scheme is based on the distribution of PSVs and their relative abundances across clones and was demonstrated on five therapeutic proteins under development. This method allowed us to characterize the typical distribution patterns of PSVs caused by mutation versus mistranslation, thereby greatly facilitating cell line development.


Experimental Section

Materials and Reagents

All therapeutic proteins used in this work were recombinant monoclonal IgG1 or IgG4 antibodies expressed in CHO cells and purified by Protein A affinity chromatography (Bristol-Myers Squibb, Bloomsbury, NJ). Unless otherwise specified, all chemicals were purchased from Sigma-Aldrich (St. Louis, MO) and were BioXtra grade or better. Amicon Ultra-0.5 mL centrifugal filters (3K molecular weight cut-off, MWCO) were purchased from EMD Millipore (Billerica, MA). Zeba 7 kDa MWCO gel desalting 96-well spin-plates were purchased from Thermo Fisher Scientific (Rockford, IL). LC-MS grade solvents were purchased from Avantor (Center Valley, PA). Sequencing grade modified trypsin was purchased from Promega (Madison, WI). Synthetic peptides were ordered from GenScript (Piscataway, NJ).

UPLC Tryptic Peptide Mapping

Protein samples were concentrated to 25 mg/mL using Amicon centrifugal filters. First, the sample protein (20 µL) was added to 70 µL of 8 M guanidine hydrochloride in 100 mM Tris HCl pH 8.0, followed by addition of 9 µL of 200 mM dithiothreitol. The sample was then heated at 37 °C for 20 min. Next, 10 µL of 400 mM iodoacetamide was added, and the sample was incubated at room temperature for 15 min in the dark. A Zeba desalting plate was equilibrated with 50 mM Tris HCl, 2 M urea, 10 mM CaCl2, pH 7.6 by four cycles of adding 250 µL of Tris buffer to each well and centrifuging at 1200 × g for 2 min. The reduced and alkylated protein samples were loaded onto the equilibrated Zeba desalting plate and eluted into a collection plate by centrifugation at 1200 × g for 2 min. Each sample was diluted with 400 µL of 50 mM Tris HCl, 10 mM CaCl2, pH 7.6 to approximately 1 mg/mL. Trypsin (0.5 mg/mL) was added to each sample (1:40, w/w, enzyme to protein) followed by incubation at 37 °C for 2 h. Digestion was stopped by adding 1.0 M HCl to a pH of 2 to 3. Tryptic digests (~1 mg/mL, 25 µg per injection) were chromatographically separated on an Acquity UPLC BEH300 C18 column (1.7 µm, 2.1×150 mm, Waters, Milford, MA) at a flow rate of 0.2 mL/min using an Acquity UPLC instrument (Waters, Milford, MA) at 45 °C. Mobile phase A (MPA) was 0.1% formic acid in water (v/v). Mobile phase B (MPB) was 0.1% formic acid in 80% acetonitrile and 20% water (v/v). The MPB concentration throughout the gradient was as follows: 1% from 0 to 4 min; 1 to 20% from 4 to 40 min; 20 to 30% from 40 to 80 min; 30 to 50% from 80 to 100 min; 50 to 100% from 100 to 102 min; 100% from 102 to 105 min.

MS/MS Sequence Variant Detection

Tryptic peptides in the chromatographic eluate were analyzed on-line by electrospray ionization MS using a Thermo Orbitrap Elite MS (San Jose, CA). First, a top-ten ion DDA MS/MS acquisition was performed, wherein parent ions were detected by the Orbitrap analyzer, and the 10 most abundant ions were fragmented using collision-induced dissociation with detection in the linear ion trap [28]. For MS1, the resolution of the Orbitrap detector was set to 30,000 and the scan range was 200 to 3000 m/z. Automatic gain control was set to 1×10⁶, with maximum injection time set to 200 ms. For MS/MS scans, the mass analyzer utilized was the ion trap at rapid scan rate in normal mass range. Automatic gain control was set to 1×10⁴. Maximum injection time was set to 100 ms, and the minimum signal threshold was 1000 counts. Dynamic exclusion was enabled, with the following settings: repeat count 2; repeat duration 5 s; exclusion list size 500; exclusion duration 30 s; exclusion mass width 1.5 m/z (for both low and high mass). Database searching of spectra was performed using either Trans Proteomic Pipeline with X!Tandem [29] or Protein Metrics Byonic (San Carlos, CA) [30] software. Sequence variants identified by DDA were designated as major sequence variants. Next, the same samples were re-analyzed by LC-MS, acquiring 240,000 resolution MS1 scans only.
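As a quick consistency check on the sample-preparation quantities quoted in the peptide mapping procedure above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes complete protein recovery and no volume change through the desalting plate, and it ignores the small volume of HCl used to stop the digest. The variable names are hypothetical.

```python
# Quantities taken from the procedure (volumes in µL, concentrations in mg/mL).
# Assumption: full recovery and unchanged volume through the desalting step.
protein_ug = 25.0 * 20.0                      # 25 mg/mL × 20 µL = 500 µg
reaction_vol_ul = 20.0 + 70.0 + 9.0 + 10.0    # sample + guanidine + DTT + IAM
diluted_vol_ul = reaction_vol_ul + 400.0      # after adding 400 µL digestion buffer
diluted_conc = protein_ug / diluted_vol_ul    # µg/µL, numerically equal to mg/mL

trypsin_ug = protein_ug / 40.0                # 1:40 (w/w) enzyme to protein
trypsin_vol_ul = trypsin_ug / 0.5             # volume of 0.5 mg/mL trypsin stock

injection_vol_ul = 25.0 / diluted_conc        # volume carrying 25 µg of digest

print(f"diluted concentration: {diluted_conc:.2f} mg/mL")
print(f"trypsin: {trypsin_ug:.1f} µg = {trypsin_vol_ul:.0f} µL of stock")
print(f"injection volume for 25 µg: {injection_vol_ul:.1f} µL")
```

Under these assumptions the diluted sample comes out at about 0.98 mg/mL, in line with the "approximately 1 mg/mL" stated in the procedure.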
MS1 data sets were deconvoluted using Thermo SIEVE version 2.0 software (San Jose, CA) with these settings: MaximumFrames 400,000; MZWidthPPM 10; PRMaxCharge 5; RTWidth 0.5; Threshold 5000. An in-house R (R Foundation, Vienna, Austria) script was developed to compare the major sequence variants detected to the cDNA sequence of the therapeutic protein, as well as to search the aligned MS1 spectra for ions corresponding to peptides with the same amino acid substitutions at other sites encoded by the same codon. Each potential MS1 match was verified via a third analysis by LC-MS/MS, using a targeted MS/MS acquisition to fragment each suspected ion. Relative quantification was based on the extracted ion chromatogram peak area of a given sequence variant peptide versus the sum of the peak areas of the sequence variant peptide and its corresponding wild-type peptide. The m/z of the first isotope peak at the charge state with the highest MS response was used for each extracted ion chromatogram. UV 215 nm chromatographic peaks were used for quantitation of peptides corresponding to lysine or arginine variants whenever they were baseline resolved. This experimental workflow is shown in Figure 1. The procedures for identification of DNA mutations by characterization of sequence variant distributions are described in detail in the Supporting Information.

Results and Discussion

Improvement of Sensitivity in Sequence Variant Detection

To increase sensitivity for detecting low-abundance PSVs, we improved the typical LC-MS/MS workflow. This method starts with an LC-MS/MS DDA experiment and database searching. When a PSV is identified, the same amino acid substitution is applied to other sites with the same codon to generate a list of potential variant peptides. For example, if one Ser (AGT)-to-Asn (AAT) variant is detected, a list of theoretical tryptic peptides containing Ser-to-Asn at other AGT-coded Ser sites is generated. Then, another LC-MS/MS experiment targeting the m/z values and retention times (observed using an LC-MS1 only experiment) of the theoretical PSV peptide list is performed to obtain MS/MS spectra of low-abundance PSVs previously missed by the DDA experiment.
This improved workflow is illustrated in Figure 1.
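The codon-directed list generation described above can be sketched in Python as follows. This is a minimal illustration, not the in-house R script, whose logic is not given in the text. The function names, the toy cDNA and protein sequences, and the simplified trypsin rule (cleavage after every K or R, no missed cleavages, no proline exception) are all assumptions. Cysteine is taken as carbamidomethylated because the samples were alkylated with iodoacetamide.

```python
# Monoisotopic residue masses (Da); C includes carbamidomethylation (+57.02146).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 160.03065, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def codon_sites(cdna, codon):
    """Return 0-based residue indices whose in-frame codon equals `codon`."""
    return [i // 3 for i in range(0, len(cdna) - 2, 3) if cdna[i:i + 3] == codon]


def tryptic_peptides(protein):
    """Yield (start_index, peptide), cleaving after every K or R (simplified)."""
    start = 0
    for i, aa in enumerate(protein):
        if aa in "KR" or i == len(protein) - 1:
            yield start, protein[start:i + 1]
            start = i + 1


def mz(peptide, charge):
    """Monoisotopic m/z of a peptide at the given positive charge state."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def candidate_variants(cdna, protein, codon, new_aa, known_site):
    """List (1-based position, variant peptide) for every other site with `codon`."""
    hits = []
    for site in codon_sites(cdna, codon):
        if site == known_site:
            continue  # already found by the DDA search
        for start, pep in tryptic_peptides(protein):
            if start <= site < start + len(pep):
                offset = site - start
                hits.append((site + 1, pep[:offset] + new_aa + pep[offset + 1:]))
    return hits


# Toy example (made-up sequence): Ser residues 2 and 8 use AGT, residue 5 uses AGC.
cdna = "GCTAGTAAAGGTAGCCGTACTAGTAAG"
protein = "ASKGSRTSK"
targets = candidate_variants(cdna, protein, "AGT", "N", known_site=1)
print(targets)  # [(8, 'TNK')]
for pos, pep in targets:
    print(pos, pep, round(mz(pep, 1), 4), round(mz(pep, 2), 4))
```

In this sketch the Ser coded by AGC at position 5 is deliberately left out of the target list, since only sites sharing the codon of the detected variant are searched. A substitution that creates or removes a K or R would change the tryptic peptides themselves, which this simplified version does not handle.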


Figure 1. Targeted Acquisition for Detecting Low-Level PSVs. The blue sequence shows a typical bottom-up LC-MS/MS peptide mapping workflow for PSV searching. The yellow and orange sequences show the codon-based targeted LC-MS/MS workflow additions for detecting low-level PSVs and determining their biological mechanism of origin, respectively.

To evaluate the improvement in sensitivity for PSV detection using the new method compared to existing LC-MS/MS methods, the peptide mapping workflow described above was applied to the analysis of BMS antibodies A, B, C, D and E. A total of twelve clones for each of the antibodies A, C and E, eight clones for antibody B, and six clones for antibody D were examined by these methods. The number of clones evaluated was dependent on the needs of each program. Because the first step of the new procedure is a typical DDA LC-MS/MS experiment, the improvement in sensitivity results from the additional targeted acquisition step and can be measured by counting how many additional PSVs are identified. For example, the DDA method detected one valine-to-leucine PSV in antibody B at amino acid residue number 305 on the heavy chain (annotated as HC V305L) at 0.03% abundance. This result then triggered a search for all variant peptides caused by the same codon change, GTC(Val)-to-CTC(Leu), throughout the protein, which detected four additional low-level Val-to-Leu variants, including HC V273L at less than 0.01% abundance. It should be noted that Leu and Ile are not distinguishable by MS/MS as performed in this study; Val-to-Leu is shown for simplicity. Both GTC(Val)-to-ATC(Ile) and GTC(Val)-to-CTC(Leu) are possible. GTC(Val)-to-ATC(Ile) is possible according to the G/U mismatching mechanism [15]. GTC(Val)-to-CTC(Leu) could possibly be caused by tRNA mischarging. For each antibody, the numbers of variants detected by DDA alone and by our method are shown in Table S1. The smallest increase was observed for antibody D, from 19 hits in the DDA search to 34 hits after the codon-directed search. On the other hand, antibody C exhibited the greatest increase, from 7 hits in the DDA search to 42 hits after the targeted search, a 5-fold increase in the number of hits.

Sequence Variant Abundance Distribution Patterns

An important feature of the peptide mapping method developed in this work is examining each PSV by the codon at which it occurs, then searching other locations of the same codon within the protein for the presence of the same substitution. From this perspective, two distinct distribution patterns of PSVs were observed for the five therapeutic antibodies examined. In the first pattern, a specific PSV is detected at only one codon site of the protein and within just one or a few clones. This distribution pattern is consistent with a mutation of the cell DNA rather than mistranslation, as DNA mutations are expected to be rare and independent of each other. Conversely, it is unlikely that the conditions causing mistranslation would affect only one codon site in only one or a few clones. An example of this distribution is the HC K370N PSV present in antibody B. This variant was detected in only three clones out of the total of eight, and only at one codon site per clone (Figure 2A).

[Figure 2 image not reproduced. Panel A: PSV abundance (y-axis, 0% to 70%) for HC K370N and other sites in clones B1 to B8. Panel B: PSV abundance (y-axis, 0.00% to 0.25%) at HC S142N, HC S304N, HC S408N, HC S440N, LC S174N and LC S202N in clones B1 to B8.]

Figure 2. Sequence variant abundances for two types of distributions observed across different codon sites in 8 clones (represented by different colors) expressing antibody B. Panel A: AAA(K) to AAC/AAT(N) is consistent with DNA mutation. Panel B: AGC(S) to AAC(N) had a pattern consistent with mistranslation-induced PSVs.

In the second, more common pattern, certain PSVs are observed across all clones and at multiple sites derived from the same codon within a protein molecule. While the abundances of the PSV at each codon site varied, the relative abundances at specific sites are similar for different clones, as shown in Figure 2B. Compared to other clones, clone B6 expressed the most abundant levels of the Ser-to-Asn PSV at 5 out of 6 locations, and similarly clone B2 expressed the second most abundant levels at the same 5 out of 6 sites, and so on. This distribution suggests that a common stress factor (such as amino acid starvation or another culture condition) induces mistranslation on the same codon at several locations and among all clones. Furthermore, when this stress varies from clone to clone (or flask to flask, bioreactor to bioreactor, etc.), mistranslation rates at each codon site are impacted too.

Another pattern we have observed is that the relative levels of some PSVs tended to be consistent even between different antibody molecules within the conserved sequence domains. For example, Ser (AGC)-to-Asn (AAC) PSVs at conserved sites showed similar levels of mistranslation in 4 of 5 therapeutic antibodies assessed, as shown in Figure 3 (not detected in antibody D). These distribution patterns across PSV sites could be caused by several factors. First, ionization efficiency differences between wild-type and variant peptides could contribute to the observed differences in PSV levels at different sites. Because the constant regions of the mAbs are similar, this would also explain the similar PSV distributions observed in these regions. However, the impact of ionization bias was tested by comparing UV chromatographic peak areas for antibody B (Table S2), which demonstrated that the profile of PSV levels was comparable to that observed by extracted ion chromatograms. We therefore conclude that these differences in abundance at different sites reflect real PSV levels within the test samples. Second, the varying distribution across sites may be caused by preferential removal of some PSVs during purification. Some amino acid substitutions may cause protein misfolding, aggregation, or weaker affinity column binding and thus lead to their removal. This would again hold true for PSVs at conserved sites between different mAbs, leading to the pattern similarities observed between mAbs.
Third, local differences in the mRNA-ribosome microenvironment may affect translation fidelity at different regions of the expressed protein, especially between the antibody heavy chain and light chain. Future work will examine if these distribution differences are caused by translation or purification.

[Figure 3 image not reproduced. Bar chart of % abundance (y-axis, 0.00 to 0.25) for antibodies A, B, C and E. Heavy chain positions: 82a, 82b, 98, 99, 119, 134, 136, 138, 165, 183, 184, 191, 207, 219, 267, 298, 304, 330, 331, 364, 375, 383, 408, 415, 440. Light chain positions: 30, 31, 53, 63, 76, 77, 168, 171, 174/182, 176/177, 202, 208.]

Figure 3. Average levels of serine (AGC) to asparagine (AAC) PSVs at different amino acid positions from 4 of 5 therapeutic antibodies. Some sites such as HC 267, 408 and 440 (Kabat and EU numbering) demonstrated similar levels of mistranslation for different antibody molecules. No serine to asparagine PSVs were detected in antibody D.

Quantitative Differentiation of DNA Mutations and Translation Errors

As described in the case studies above, the results of this codon-directed LC-MS workflow can be used to differentiate mRNA mistranslation from DNA mutation errors if the PSV distribution across codon sites matches one of the two patterns identified in this study. However, not every PSV resulting from DNA mutation can be distinguished by the absence of PSVs at other codon sites, as low-level mistranslation may co-occur with DNA errors. In such a situation, it may be difficult to distinguish a high-level PSV falling within the expected mistranslation distribution pattern from a low-level DNA mutation.

We therefore opted to develop a quantitative, empirically derived algorithm to determine the expected abundance of PSVs at each codon site when screening multiple clones simultaneously, which can then be used to calculate the probability of a DNA error versus a translation error. This algorithm is based on assessing how much of the variability in measured PSV levels is contributed by the protein expression system and how much by the analytical methods used. If the level of one mistranslation PSV from one clone expressed multiple times under identical culture conditions is assumed to follow a Gaussian distribution (the "Process Variability"), as is the error resulting from the analytical measurement of the PSV (the "Analytical Variability"), then a z-score (and its corresponding p-value) can be used to determine the probability that this PSV significantly deviates from the distribution of sequence variants, as shown in Equation (1).

Equation (1): $z = \dfrac{x - \tilde{x}}{\sigma}$

To obtain the z-score of one PSV in one clone at one site, three values are needed. First, the relative abundance (x) of this PSV is experimentally measured by MS (or UV) as the relative chromatographic peak area. Second, the expected abundance value (x̃, median) of this sequence variant is derived from the PSV levels of all clones and all PSV sites for this codon. Third, the standard deviation of this sequence variant abundance measurement (σ) from this clone is determined, accounting for variability introduced from cell culture to analysis (including both process variability and analytical variability). Methods to calculate the expected PSV abundance and PSV standard deviations are described in detail in the Supporting Information. Calculated z-scores can then be converted to a p-value for assessing the probability of a DNA mutation versus mRNA mistranslation. If a PSV codon site has a z-score greater than 2.33 (corresponding to a single-sided p-value of less than 0.01), this PSV is considered outside the mRNA mistranslation distribution and is therefore attributed to a DNA mutation at a confidence level of 99%.

Validation of the z-Score PSV Metric

To evaluate the performance of the z-score metric for distinguishing PSVs resulting from DNA mutation versus mRNA mistranslation, a standard addition experiment was performed. A synthetic peptide identical to the tryptic peptide containing sequence variant HC S304N (corresponding to an AGC-to-AAC mutation or misreading) from antibody B was spiked into one clone sample of antibody B which expressed 0.04% of HC S304N. Peptide levels added were from 0.025% to 0.1% of the molar concentration of the sample heavy chain. Each spiked sample and the other eight clone samples were analyzed by LC-MS/MS. In each sample, the expected abundance value (x̃), deviation (σ) and z-score were calculated for HC S304N. As illustrated in Figure 4, the expected abundance value (x̃) of HC S304N did not change upon peptide addition. This is expected because the median, rather than the mean, was adopted for calculating the expected abundance value in order to minimize the impact of outlier values. Analytical variability did not change upon peptide addition because analytical variability was assessed independently and was not related to a specific sequence variant or clone. Overall variability did increase upon peptide addition, because all clones and samples contribute to this metric. At the 0.025% and 0.05% spiked HC S304N concentration levels, the measured PSV abundance was still within the 99% confidence range of analytical and clone variability, thus the variant was considered a mistranslation. At the 0.075% and 0.1% PSV spike levels, the measured abundance was outside the 99% confidence range of analytical and clone variability, thus at these levels HC S304N was considered a DNA mutation-induced PSV. This demonstrates the feasibility of distinguishing a DNA mutant PSV with a limit of detection of less than 0.2% of PSVs in the presence of a 0.04% mistranslation background.
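The z-score test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' algorithm: the derivation of the expected abundance and of σ is given only in the Supporting Information, so here σ is passed in directly and the expected value is simply the median of a supplied list. All function names and all numeric inputs are hypothetical and chosen only to show the mechanics.

```python
import math
from statistics import median

Z_CUTOFF = 2.33  # one-sided p < 0.01, the threshold stated in the text


def relative_abundance(variant_area, wild_type_area):
    """Percent variant = variant / (variant + wild type), from peak areas."""
    return 100.0 * variant_area / (variant_area + wild_type_area)


def z_score(x, expected, sigma):
    """Equation (1): z = (x - expected) / sigma."""
    return (x - expected) / sigma


def one_sided_p(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def classify(x, reference_levels, sigma):
    """Return (z, p, label) using the median of `reference_levels` as expected."""
    expected = median(reference_levels)
    z = z_score(x, expected, sigma)
    label = "mutation" if z > Z_CUTOFF else "mistranslation"
    return z, one_sided_p(z), label


# Hypothetical inputs (percent abundance); sigma is an assumed combined value.
reference = [0.03, 0.04, 0.04, 0.05, 0.04]
sigma = 0.025
for measured in (0.065, 0.09, 0.115, 0.14):
    z, p, label = classify(measured, reference, sigma)
    print(f"{measured:.3f}%  z={z:.2f}  p={p:.4f}  {label}")
```

Using the median as the expected value means that a single elevated clone barely shifts the reference point, which is the behavior the text describes for the spiked sample.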


Figure 4. A synthetic tryptic peptide containing PSV (S304N) was spiked into one antibody B sample to simulate a DNA mutation in the presence of mistranslation. When the concentration was increased by 0.075%, a DNA mutation was detected (p < 0.01) by comparison to the PSV distribution across the clones.

Application of the z-Score Approach for Differentiating DNA Mutation from Translational Misincorporation

PSVs in antibodies A, B, C, D and E were analyzed by our method to differentiate those caused by DNA mutations from those caused by mistranslation. Z-scores for all detected PSVs in all clones are shown in Figure 5. The vast majority of PSVs detected (134 out of 139, or 96%) were determined to be mistranslations (z-score < 2.33). To verify that these variants are process-related, antibody D was expressed in CHO cells again under improved culture conditions and re-analyzed. The results demonstrated that the mistranslational PSVs identified previously were significantly reduced (Table S4). After media optimization, Asn-to-Ser mistranslation was nearly eliminated but one Ser-to-Asn PSV increased, indicating that the media may need to be further optimized. Overall, only five (4%) PSVs were attributed to DNA mutations (z-score > 2.33, p < 0.01), as listed in Table 1, including a Val-to-Leu mutation at 0.14%. The Val-to-Leu variant was not observed (noise level at around 0.001% of the wild-type peptide) in other clones. As an orthogonal method, mRNA sequencing was utilized to search for DNA mutations in clones. Two occurrences of DNA mutations were confirmed by sequencing: AAA-to-AAC in antibody B, and GGG-to-TGG in antibody D. However, no DNA mutation was detected in the other three instances, which is likely due to limitations in the sensitivity of the mRNA sequencing method used. This low occurrence of DNA mutations relative to mistranslations is consistent with previous observations [15].

Table 1. Sequence Variants in Each Antibody Project Attributed to Mutation by Using the z-Score Metric (p < 0.01).

| Codon | Amino Acid | Clone | Abundance | Confirmed by mRNA Sequencing |
| --- | --- | --- | --- | --- |
| AAA->AAC | Lys->Asn | Antibody B, Clones B1, B2, B3 | 63%, 61%, 62% | Yes |
| GAC->GGC | Asp->Ala | Antibody D, Clone D6 | 3.3% | No |
| GGG->TGG | Gly->Trp | Antibody D, Clone D4 | 25.3% | Yes |
| TCA->GCA | Ser->Ala | Antibody E, Clone E4 | 3.8% | No |
| GTG->CTG | Val->Leu* | Antibody E, Clone E3 | 0.14% | No |

* Leu and Ile were not distinguishable by MS/MS in this study. In this case Leu is assigned because Ile (ATT, ATC or ATA) cannot be explained by a single nucleotide change, as in the G/U mismatch mechanism (15).
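The reasoning in the Table 1 footnote (Leu is assigned because no Ile codon is a single nucleotide change away from the Val codon) can be checked with a short sketch. The function names below are hypothetical, and the codon lists come from the standard genetic code written in DNA letters. This is an illustration of the argument, not code used in the study.

```python
def nucleotide_changes(codon_a, codon_b):
    """Count the positions at which two codons differ (Hamming distance)."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three nucleotides long")
    return sum(a != b for a, b in zip(codon_a, codon_b))


def single_change_codons(wild_type, candidates):
    """Return the candidate codons reachable from wild_type by one substitution."""
    return [c for c in candidates if nucleotide_changes(wild_type, c) == 1]


WILD_TYPE_VAL = "GTG"  # Val codon listed in Table 1
LEU_CODONS = ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"]  # standard genetic code
ILE_CODONS = ["ATT", "ATC", "ATA"]                       # as listed in the footnote

if __name__ == "__main__":
    print("Leu codons one change from GTG:",
          single_change_codons(WILD_TYPE_VAL, LEU_CODONS))
    print("Ile codons one change from GTG:",
          single_change_codons(WILD_TYPE_VAL, ILE_CODONS))
```

Every Ile codon differs from GTG at two positions, so the sketch returns an empty list for Ile, whereas Leu codons include CTG, the codon reported in Table 1.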


Figure 5. Heat maps showing z-scores of PSVs detected in 6 to 12 different clones expressing antibodies A, B, C, D and E. PSVs with a z-score greater than 2.33 (p < 0.01, black) are considered the result of a DNA mutation, as they fall outside the distribution of mistranslation-induced PSVs.
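The correspondence between the z-score cutoff of 2.33 and p < 0.01 can be verified with Python's standard library, assuming a one-sided test against a standard normal distribution (an assumption, since the text does not state the test explicitly). The helper names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_p(z):
    """Upper-tail probability of exceeding z under a standard normal."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def z_cutoff(p):
    """z-score whose upper-tail probability equals p."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p)


if __name__ == "__main__":
    print(f"z cutoff for p = 0.01: {z_cutoff(0.01):.3f}")
    print(f"one-sided p at z = 2.33: {one_sided_p(2.33):.4f}")
```

Under this assumption the cutoff for p = 0.01 rounds to 2.33, matching the threshold used in the heat maps.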


Conclusion

Differentiating PSVs resulting from DNA mutation from those resulting from mRNA mistranslation is critical during cell line development for the manufacturing of biotherapeutics: most translational errors can be reduced or eliminated by optimizing cell culture conditions, whereas a DNA mutation cannot be fixed without developing a new cell line, which is costly and time-consuming. Therefore, detailed information on DNA mutation and mRNA mistranslation is invaluable to both cell line development and upstream process development. The novel approach developed in this work can be used to differentiate PSVs resulting from DNA mutation versus mRNA mistranslation with improved sensitivity and coverage.


Supporting Information

The supplemental information file includes detailed procedures of calculating expected PSV abundance, estimating variance of PSV abundance, and correcting ionization efficiency by synthetic peptides. It also includes representative PSV distribution and PSV levels before and after medium optimization.

References

(1) Scott, A. M.; Wolchok, J. D.; Old, L. J. Nat. Rev. Cancer 2012, 12, 278-287.
(2) Marasco, W. A.; Sui, J. Nat. Biotechnol. 2007, 25, 1421-1434.
(3) Desnick, R. J.; Schuchman, E. H. Nat. Rev. Genet. 2002, 3, 954-966.
(4) Walsh, G. Nat. Biotechnol. 2014, 32, 992-1000.
(5) Harris, R. J.; Murnane, A. A.; Utter, S. L.; Wagner, K. L.; Cox, E. T.; Polastri, G. D.; Helder, J. C.; Sliwkowski, M. B. Nat. Biotechnol. 1993, 11, 1293-1297.
(6) Yu, X. C.; Borisov, O. V.; Alvarez, M.; Michels, D. A.; Wang, Y. J.; Ling, V. Anal. Chem. 2009, 81, 9282-9290.
(7) Wen, D.; Vecchi, M. M.; Gu, S.; Su, L.; Dolnikova, J.; Huang, Y.-M.; Foley, S. F.; Garber, E.; Pederson, N.; Meier, W. J. Biol. Chem. 2009, 284, 32686-32694.
(8) Lebkowski, J. S.; Miller, J. H.; Calos, M. P. Mol. Cell. Biol. 1986, 6, 1838-1842.
(9) Parker, J. Microbiol. Rev. 1989, 53, 273-298.
(10) Dietrich, A.; Kern, D.; Bonnet, J.; Giege, R.; Ebel, J. P. Eur. J. Biochem. 1976, 70, 147-158.
(11) Li, M.; Wang, I. X.; Li, Y.; Bruzel, A.; Richards, A. L.; Toung, J. M.; Cheung, V. G. Science 2011, 333, 53-58.
(12) Brinkmann, U.; Mattes, R. E.; Buckel, P. Gene 1989, 85, 109-114.
(13) Calderone, T. L.; Stevens, R. D.; Oas, T. G. J. Mol. Biol. 1996, 262, 407-412.
(14) Seetharam, R.; Heeren, R. A.; Wong, E. Y.; Braford, S. R.; Klein, B. K.; Aykent, S.; Kotts, C. E.; Mathis, K. J.; Bishop, B. F.; Jennings, M. J.; Smith, C. E.; Siegel, N. R. Biochem. Biophys. Res. Commun. 1988, 155, 518-523.
(15) Zhang, Z.; Shah, B.; Bondarenko, P. V. Biochemistry 2013, 52, 8165-8176.
(16) Guo, M.; Chong, Y. E.; Shapiro, R.; Beebe, K.; Yang, X.-L.; Schimmel, P. Nature 2009, 462, 808-812.
(17) Beebe, K.; Mock, M.; Merriman, E.; Schimmel, P. Nature 2008, 451, 90-93.
(18) Guo, D.; Gao, A.; Michels, D. A.; Feeney, L.; Eng, M.; Chan, B.; Laird, M. W.; Zhang, B.; Yu, X. C.; Joly, J.; Snedecor, B.; Shen, A. Biotechnol. Bioeng. 2010, 107, 163-171.
(19) Khetan, A.; Huang, Y.-M.; Dolnikova, J.; Pederson, N. E.; Wen, D.; Yusuf-Makagiansar, H.; Chen, P.; Ryll, T. Biotechnol. Bioeng. 2010, 107, 116-123.
(20) Ibba, M.; Soll, D. Science 1999, 286, 1893-1897.
(21) Kunkel, T. A.; Bebenek, R. Annu. Rev. Biochem. 2000, 69, 497-529.
(22) Kramer, E. B.; Farabaugh, P. J. RNA-Publ. RNA Soc. 2007, 13, 87-96.
(23) Kennedy, S. R.; Schmitt, M. W.; Fox, E. J.; Kohrn, B. F.; Salk, J. J.; Ahn, E. H.; Prindle, M. J.; Kuong, K. J.; Shen, J.-C.; Risques, R.-A.; Loeb, L. A. Nat. Protoc. 2014, 9, 2586-2606.
(24) Ling, J.; Söll, D. Proc. Natl. Acad. Sci. USA 2010, 107, 4028-4033.
(25) Que, A. H.; Zhang, B.; Yang, Y.; Zhang, J.; Derfus, G.; Amanullah, A. BioProcess Int. 2010, 8, 52-60.
(26) Zhang, T.; Huang, Y.; Chamberlain, S.; Romeo, T.; Zhu-Shimoni, J.; Hewitt, D.; Zhu, M.; Katta, V.; Mauger, B.; Kao, Y.-H. mAbs 2012, 4, 694-700.
(27) Yang, Y.; Strahan, A.; Li, C.; Shen, A.; Liu, H.; Ouyang, J.; Katta, V.; Francissen, K.; Zhang, B. mAbs 2010, 2, 285-298.
(28) Makarov, A.; Denisov, E.; Kholomeev, A.; Baischun, W.; Lange, O.; Strupat, K.; Horning, S. Anal. Chem. 2006, 78, 2113-2120.
(29) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Farrah, T.; Lam, H.; Tasman, N.; Sun, Z.; Nilsson, E.; Pratt, B.; Prazen, B.; Eng, J. K.; Martin, D. B.; Nesvizhskii, A. I.; Aebersold, R. Proteomics 2010, 10, 1150-1159.
(30) Bern, M.; Cai, Y. H.; Goldberg, D. Anal. Chem. 2007, 79, 1393-1400.


For TOC only

[TOC graphic: schematic with panels labeled "Mistranslation" (tRNA AAC, mRNA AGC) and "Mutation" (DNA AAC, TTG), and a plot of sequence variant % for clones 1-8 with series S142N, S304N, S408N, S440N, S174N and S202N.]