Comprehensive Discovery and Quantitation of Protein Heterogeneity


Article

Matthew J. Traylor, Anna V. Tchoudakova, Amie M. Lundquist, John E. Gill, Ferenc L. Boldog, and Bruce S. Tangarone

Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b02895 • Publication Date (Web): 27 Aug 2016


Analytical Chemistry

Traylor, M.J.¹,²; Tchoudakova, A.V.³; Lundquist, A.M.³; Gill, J.E.³; Boldog, F.L.³; Tangarone, B.S.²

² Department of Analytical Development; ³ Department of Cell Line Development; Shire, Lexington, MA USA

¹ To whom correspondence should be addressed. Email: [email protected].

Disclosures: The authors declare no competing financial interest.

Abstract

Development of biopharmaceutical production cell lines requires efficient screening methods to select the host cell line and final production clone. This is often complicated by an incomplete understanding of the relationship between protein heterogeneity and function at early stages of product development. LC-MS/MS peptide mapping is well suited to the discovery and quantitation of protein heterogeneity. However, the extensive hands-on time required to generate and analyze LC-MS/MS data typically restricts it to smaller sample sets at later stages of clone selection. Here we describe a simple approach to peptide mapping designed for large sample sets that includes higher-throughput sample preparation and automated data analysis. This approach allows for the inclusion of orthogonal protease digestions and multiple replicates of an assay control, which encode an assessment of accuracy and precision into the data and significantly simplify the identification of true-positive annotations in the LC-MS/MS results. This methodology was used to comprehensively identify and quantify glycosylation, degradation, unexpected post-translational modifications, and three types of sequence variants in a previously uncharacterized non-mAb protein therapeutic expressed in approximately 100 clones from three host cell lines. Several product quality risks were identified, allowing for a more informed selection of the production clone. Moreover, the variability inherent in this unique sample set provides important structure/function information to support quality attribute identification and criticality assessments, two key components of Quality by Design.
Keywords: Multi-attribute Method, Clone Selection, Peptide Map, Glycan, Sequence Variant, PTM, Glycopeptides

Abbreviations: Liquid chromatography tandem mass spectrometry (LC-MS/MS); ultraviolet (UV); mass spectrometry (MS); post-translational modification (PTM); trifluoroacetic acid (TFA); collision-induced dissociation (CID); higher-energy collisional dissociation (HCD); N-acetylneuraminic acid (NeuAc); N-glycolylneuraminic acid (NeuGc); galactose (Gal); mannose (Man); N-acetylglucosamine (GlcNAc); master cell bank (MCB); methionine sulfoximine (MSX); core-fucosylated, biantennary glycan with two terminal NeuAc monosaccharides (A2S2F); core-fucosylated, biantennary glycan with one terminal NeuAc and one terminal NeuGc (A2Sg1S1F); Chinese hamster ovary cells (CHO); Chinese hamster ovary cells with Glutamine Synthetase gene knock-out as sold by SAFC/SIGMA (CHOZN GS -/-); FreeStyle™ human embryonic kidney cells as sold by ThermoFisher Scientific (HEK 293-F); monoclonal antibody (mAb); relative standard deviation (RSD); quality by design (QbD).


Introduction

Protein therapeutics, especially those derived from eukaryotic hosts, can be highly heterogeneous. This heterogeneity arises from the wide variety of modifications that can occur during production, including post-translational modifications (PTMs), sequence variations, and degradation reactions.1-3 It can affect therapeutic safety and efficacy, so characterizing protein modifications is critically important for understanding and predicting protein function. Mass spectrometry (MS) offers unparalleled sensitivity and specificity for the identification and quantitation of protein modifications.2,4-11 In particular, peptide mapping methods often yield high sequence coverage with residue-level detail of modifications. In these methods, the protein is digested into peptides with a sequence-specific protease. The peptides are then separated chromatographically and detected with in-line UV absorbance and MS. The peptide components of chromatographic peaks are identified by their exact mass and MS/MS fragmentation spectra. Protein modifications are quantified from the intensities of the UV or MS signals for the modified and non-modified peptides. Peptide mapping approaches, also called Bottom Up methods, contrast with Top Down methods.5,8,12,13 Top Down methods seek to provide the same residue-level detail without proteolytic treatment, relying instead on the separation and fragmentation power of the MS instrumentation. They can retain protein-level information that is lost during digestion, and they involve simpler sample preparation, which can reduce degradation artifacts and improve throughput. To date, however, identifying and quantifying low-level modifications (e.g., 1% abundance) in larger or more heterogeneous proteins is significantly more challenging in Top Down than in Bottom Up experiments.12,13
Thus, peptide mapping methods (i.e., Bottom Up approaches) are the most generally applicable techniques for comprehensively characterizing protein samples.

However, data analysis can be a bottleneck in the development of MS-based peptide mapping methods because annotating peptides requires interpreting complicated MS/MS spectra. Many academic and commercial software packages considerably accelerate peptide annotation.11,14,15 Even so, differentiating true-positive from false-positive annotations in the software output remains a laborious task. It requires significant manual inspection of MS/MS spectra and a high level of technical expertise. This bottleneck limits how quickly high-throughput MS peptide mapping methods can be developed, and therefore limits their application to problems early in the biopharmaceutical development process (e.g., clone selection).

In biopharmaceutical manufacturing, a master cell bank (MCB) is used to seed production bioreactors. The MCB is required to be a clonal cell line derived from a single transfected cell. During cell line development, the choice of host cell line, and even of the exact clone used to generate the MCB, significantly affects the heterogeneity of the expressed protein. Clone selection is the workflow used to identify the "best" clone from a transfected pool of cells. It typically involves several sequential rounds of testing, in which a decreasing number of clones are subjected to an increasing number of analytical methods16 (Schematic 1). Because peptide mapping protocols are difficult to develop and apply, rigorous characterization of protein heterogeneity is typically deferred until later stages of clone selection. By then, decisions such as host cell line selection have already been made, which limits the parameter space from which the final production clone can be selected.
This limited parameter space can be especially problematic for non-antibody protein therapeutics. For these proteins, the types of modifications present, the functional impact of those modifications, and the variability between clones and host cell lines are often poorly understood when the cell line development project is initiated.

[Schematic 1 graphic: Host 1, Host 2, and Host 3 each undergo Transfection/Stable Pool Selection, followed by screening at 96-well (1000's of clones), 24-well (100's), 6-well (10's), and Shake Flask/Tube Spin (30) scales; the Lead/Back-up Host proceeds to Bench-Scale Bioreactor (5) to choose the Lead/Back-up Clone. Screening assays: Productivity/Growth, Activity, Product Quality.]

Schematic 1. Example clone selection workflow where multiple hosts are screened concurrently. Potential culture scales, clone counts at each stage, and screening assays are shown.

The goal of this work was to develop a simple, robust, and high-throughput LC-MS/MS peptide mapping method to discover and quantify protein heterogeneity in large sample sets. This approach enabled the use of peptide mapping during clone selection. It identified multiple risks related to glycosylation, degradation, PTMs, and sequence variants inherent in different cell lines and within specific clones. These risks were identified early enough that mitigation strategies could be implemented with little impact on the overall development timeline.

Experimental Procedures

Protein Samples. The protein analyzed in this work contains 448 residues, two N-linked glycosylation sites, and five disulfide bonds within one polypeptide chain. It was expressed in three host cell lines: a CHO cell line (CHOZN GS -/- from SAFC/Sigma) and two human cell lines, HEK 293-F (ThermoFisher Scientific) and HT1080.17 Protein samples derived from clonal expression cultures were subjected to single-step purification with a butyl resin prior to LC-MS/MS analysis.

Peptide Mapping Sample Preparation. Protein samples and assay controls were prepared in duplicate starting with 10 to 50 µg of protein. The amount used was primarily determined by expression yield. Samples were first denatured with Denaturation Buffer (6.25 M guanidine hydrochloride, 0.1 M Tris, pH 8.2) by one of two procedures. For HEK clones, duplicate samples were dried under vacuum and reconstituted with 90 µL of Denaturation Buffer. For HT1080 and CHO clones, duplicate samples were buffer exchanged into Denaturation Buffer with a 96-well Zeba desalting plate (ThermoFisher Scientific, Waltham, MA).
Following denaturation, samples were reduced with 2-3 µL of 0.5 M TCEP Bond-Breaker (ThermoFisher Scientific, Waltham, MA) by incubation at room temperature in the dark with shaking at 750 RPM for 1 hr. The reduced samples were alkylated with 3-5 µL of 0.5 M iodoacetic acid (ThermoFisher Scientific, Waltham, MA) for 1 hr in the dark at room temperature without shaking. The cysteine-alkylated samples were desalted into Digestion Buffer (50 mM Tris, pH 8.0) using 96-well Zeba desalting plates. One replicate of each sample was digested with sequencing-grade endoprotease Lys-C (Roche Diagnostics, Switzerland), and the second replicate was digested with sequencing-grade endoprotease Glu-C (Roche Diagnostics, Switzerland). For both proteases, 2 µg of protease was added to the desalted protein sample. The sample plate was sealed with foil sealing film, and digestion was allowed to proceed for 18 hr at 25 °C. Digestion was quenched by adding 2 µL of 5% trifluoroacetic acid (TFA). Samples were transferred to Waters Total Recovery UPLC vials (Waters, Milford, MA) and stored frozen at -80 °C until analysis.

Chromatography and Mass Spectrometry. Analyses were performed with an Acquity Classic UPLC (Waters, Milford, MA) with an in-line LTQ Orbitrap Velos mass spectrometer (ThermoFisher Scientific, Waltham, MA) operated in positive-ion mode using a data-dependent top-two method. The heated electrospray ionization source was set with a capillary temperature of 275 °C and a source voltage of 4 kV. Full MS scans were acquired at a resolution of 30,000 (FWHM) over m/z 500 to 2000. Each full scan was followed by two data-dependent CID MS/MS scans and two data-dependent HCD MS/MS scans, with dynamic exclusion set at 15 s. Samples were separated on a BEH300 2.1 x 100 mm column with 1.7 µm particles (Waters, Milford, MA). Mobile Phase A was 0.1% TFA in LC-MS-grade water, and Mobile Phase B was 0.085% TFA in LC-MS-grade acetonitrile. Chromatographic separation was conducted at a flow rate of 0.25 mL/min with the following 75 min gradient: initial condition of 2% B, 5 min hold at 2% B, 56 min linear gradient to 55% B, 5 min linear gradient to 95% B, 2 min hold at 95% B, 2 min linear gradient to 2% B, and 5 min hold at 2% B.
The column temperature was set at 35 °C and the autosampler at 5 °C.

Automated Data Analysis. MS data were analyzed with PepFinder 2.0 (ThermoFisher Scientific, Waltham, MA) and were separated into two batches based on the digestion protease used. Default analysis parameters were used with the following modifications:

- a 5000 signal-to-noise cutoff was applied;
- the maximum peptide mass was set at 10 kDa;
- the mass accuracy threshold was set at 10 ppm;
- strict protease specificity was used;
- CHO glycans were selected;
- "single base change" amino acid substitutions were chosen.

A modification list and an ion list were generated using the default software parameters and exported into Microsoft Excel for further analysis. True-positive annotations in the PepFinder modification list were identified by visually comparing the variability of the assay control replicates with that of the clone samples. These annotations were then confirmed by inspecting the fragmentation spectra. A sufficiently complex calculation could likely be developed to identify true-positive annotations computationally. However, our overall goal was a simple, high-throughput method, and we found visual inspection of the data to be a simple and robust way to identify true-positives. The relative quantitation of amino acid modifications is based on MS peak areas. Except where stated, it is taken directly from the modification list generated by PepFinder 2.0. Abundances calculated from relative MS peak areas may be affected by differences in ionization efficiency between modified and unmodified peptides. Any such bias should apply similarly to every sample for a given peptide pair, so it would not be expected to affect the clone rankings in this paper.
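The 75 min gradient described under Chromatography and Mass Spectrometry is easier to check when written as a timetable of (time, %B) breakpoints joined by linear segments. The following is a minimal sketch, assuming the method text is transcribed as such a timetable. The names `GRADIENT` and `percent_b` are hypothetical and are not part of any instrument or vendor software. The sketch confirms that the segment durations sum to 75 min. It also shows that the main separation ramp runs at roughly 0.95% B per minute.

```python
# Hypothetical transcription of the gradient as (time in min, %B) breakpoints.
# %B changes linearly between breakpoints; equal %B values mark a hold.
GRADIENT = [
    (0.0, 2.0),    # initial condition
    (5.0, 2.0),    # 5 min hold at 2% B
    (61.0, 55.0),  # 56 min linear ramp to 55% B
    (66.0, 95.0),  # 5 min linear ramp to 95% B
    (68.0, 95.0),  # 2 min hold at 95% B
    (70.0, 2.0),   # 2 min linear ramp back to 2% B
    (75.0, 2.0),   # 5 min re-equilibration at 2% B
]


def percent_b(t, table=GRADIENT):
    """Return %B at time t (min) by linear interpolation between breakpoints."""
    if not table[0][0] <= t <= table[-1][0]:
        raise ValueError(f"t = {t} min is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return table[-1][1]


durations = [t1 - t0 for (t0, _), (t1, _) in zip(GRADIENT, GRADIENT[1:])]
print(durations)         # [5.0, 56.0, 5.0, 2.0, 2.0, 5.0]
print(percent_b(33.0))   # 28.5
```

Interpolating from breakpoints, rather than storing per-segment rates, keeps the table in the same form as the prose description. It also makes an arithmetic slip in the program length easy to spot.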




Glycan nomenclature follows that of Zhang and Shah 2010 and of Shah et al. 2014.18,19 Details of the calculation of Relative Terminal Monosaccharide content and Relative Glycan Type are described in the Supporting Information.

Results and Discussion

Three host cell lines were transfected with expression vectors encoding the same recombinant therapeutic protein. Two CHO-derived pools of stable transfectants were generated: one with and one without the selection reagent methionine sulfoximine (MSX). One stable transfectant pool was generated for each of the human cell lines (HEK and HT1080). These four pools were cloned, and the best clones were identified over three testing rounds (Schematic 1) based on high-throughput titer and potency assays (data not shown). The top 89 clones (30 HEK, 16 HT1080, 27 CHO selected without MSX, and 16 CHO selected with MSX) were analyzed by the high-throughput LC-MS/MS peptide mapping workflow described below.

Overview of peptide mapping workflow

This peptide mapping workflow includes three important features that simplify the acquisition and interpretation of MS data.

First, peptide mapping sample preparation was executed using high-throughput methodologies. In our hands, the most labor-intensive part of peptide mapping sample preparation is the buffer exchange steps. These are normally executed individually with spin or drip columns, which limits throughput to one to two dozen samples per assay. In this protocol, 96-well desalting plates were used. This increases throughput roughly 4-6 fold and allows 96 peptide maps to be prepared concurrently.

Second, the higher-throughput sample preparation enabled the inclusion of multiple control replicates and orthogonal protease digestions, which encode an estimate of accuracy and precision into the MS results. All samples and controls were prepared with separate Lys-C and Glu-C digestions. The use of two proteases improves sequence coverage.
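A toy in silico digest illustrates why two proteases improve coverage. This is a minimal sketch under several stated simplifications. Lys-C is modeled as cutting C-terminal to every Lys. Glu-C is modeled as cutting only C-terminal to Glu. There are no missed cleavages and no proline rule. A peptide is counted as "recovered" only if its length falls in an assumed 4-30 residue window. The sequence and the functions `digest` and `covered` are hypothetical and do not come from the study.

```python
def digest(sequence, cleave_after):
    """Cut C-terminal to every residue in `cleave_after` (no missed cleavages,
    no proline rule). Returns half-open (start, end) index pairs."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append((start, i + 1))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, len(sequence)))
    return peptides


def covered(peptides, min_len=4, max_len=30):
    """Residue indices from peptides whose length lies in an assumed
    'recoverable' window; very short or very long peptides count as lost."""
    positions = set()
    for start, end in peptides:
        if min_len <= end - start <= max_len:
            positions.update(range(start, end))
    return positions


seq = "AAAAKAKAAAAEAAAAK"  # hypothetical toy sequence, 17 residues
lys_c = covered(digest(seq, "K"))
glu_c = covered(digest(seq, "E"))
print(len(lys_c) / len(seq), len(lys_c | glu_c) / len(seq))
```

Here the Lys-C dipeptide "AK" falls below the assumed length window and is lost. The Glu-C peptides span that region, so the union of the two digests covers the whole toy sequence.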
Areas of the protein that are poorly recovered with one protease may be well recovered with the other. Poor recovery can result from too many or too few cleavage sites, or from poor chromatographic peak shape. The two proteolytic treatments also generate orthogonal data sets, so attributes that are well quantified with both proteases can be compared to evaluate the accuracy of the quantitation. In addition, multiple replicates of an assay control were included, and their variability provides an estimate of the precision of each measurement. In general, the attributes of interest for clone selection are those that vary more between clones than between control replicates (i.e., the samples vary more than the precision of the measurement).

Third, the MS data files were analyzed using the commercial software package PepFinder. PepFinder is based on the MassAnalyzer algorithm11 and generates a comprehensive data set of potential protein modifications. A significant advantage of this algorithm for high-throughput analyses is that it applies an MS/MS annotation identified in one file to all other files analyzed concurrently. Ion selection for MS/MS fragmentation is stochastic, so each MS data file has a slightly different set of fragmentation spectra and thus a different set of annotations. In PepFinder, the annotations are pooled and applied to all files. The resulting data set therefore contains the same quantified modifications for all samples, which substantially simplifies data processing. Additionally, the total number of true-positive annotations increases when large numbers of diverse samples (e.g., those from different host cell lines) are processed together. This yields a more comprehensive characterization profile for the protein of interest.

Existing approaches for LC-MS/MS analysis of proteins were not sufficient to characterize large numbers of disparate samples quickly and comprehensively, as clone selection requires. The main obstacle was the difficulty of identifying true-positive annotations in computational results. Together, these three features produce a comprehensive data set with built-in quality control elements (e.g., precision, accuracy). As a result, discriminating true-positives from false-positives is drastically simplified and requires very little manual analysis of MS/MS spectra or of individual MS peak areas. Overall, this method delivers a roughly 4-6 fold increase in the number of samples that can be processed concurrently, and the data analysis time is reduced from days or weeks to hours. In total, 210 data files were generated in four separate experiments (one each for the HT1080, HEK, CHO selected with MSX, and CHO selected without MSX clones). These comprised 89 protein samples and 16 assay control replicates, with 2 proteolytic digestions per sample. The analysis of the Lys-C maps generated 272 annotations, and the Glu-C maps generated 283 annotations.

Glycan Analysis

Glycosylation plays a fundamental role in determining the safety and efficacy of many protein therapeutics20-22 and can be one of the most important PTMs to consider during clone selection. For the candidate protein therapeutic in the present work, two glycan properties were deemed most important.
First, we sought to maximize the level of N-acetylneuraminic acid (NeuAc). NeuAc increases serum half-life by reducing affinity to the asialoglycoprotein receptor, a key route of protein uptake and degradation in vivo.23 Second, we sought to minimize the relative abundance of N-glycolylneuraminic acid (NeuGc), a non-human monosaccharide produced in CHO cells that has been associated with immunogenicity.24-27 NeuGc is expected to be absent from the HEK and HT1080 samples. Human cell lines lack the enzymes to synthesize NeuGc, and the cell culture medium does not contain exogenous NeuGc.27

N-linked glycosylation sites can be highly heterogeneous, with numerous glycan types present at each site.28 Measuring the relative abundance of each glycan can yield a large data set that is difficult to interpret. For this study, specific glycan attributes relevant to clone selection were calculated, including Relative Terminal Monosaccharide content and Relative Glycan Type (Figure 1B,C). The Relative Terminal Monosaccharide value indicates the proportion of glycan antennae terminating in monosaccharides of a given type (i.e., Man, Gal, GlcNAc, NeuAc, or NeuGc). This attribute is useful for assessing the relative NeuAc or NeuGc content. Relative Glycan Type indicates the proportion of the glycosylation site that is unoccupied or occupied by hybrid, high-mannose, or complex glycans. Complex glycans are further separated by the number of antennae (i.e., bi-, tri-, or tetraantennary). The antennarity results are also useful for predicting in vivo clearance, because complex glycans with higher antennarity have a higher affinity to the asialoglycoprotein receptor and are cleared more quickly.23

This protein contains two N-linked glycosylation sites. Glycan site 1 was quantified with both the Lys-C (Fig. 1) and Glu-C (Fig. S1) glycopeptide maps, while glycan site 2 was only recovered with Glu-C digestion (Fig. S2).
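The authors' exact calculation of Relative Terminal Monosaccharide content is given in the Supporting Information. The following is only a hypothetical sketch of one plausible reading, assuming Zhang/Shah-style names for complex glycans. In these names, A<n> gives the number of antennae, Sg<n> and S<n> count antennae capped with NeuGc and NeuAc, G<n> counts antennae ending in Gal, and F marks core fucose. Any remaining antennae are assumed to end in GlcNAc. Each glycan's terminal counts are weighted by its relative abundance and normalized to the total number of antennae. High-mannose and hybrid glycans (terminal Man) are not handled. The functions `terminal_counts` and `relative_terminal` are illustrative names, not part of PepFinder.

```python
import re

# Assumed reading of complex-glycan names such as "A2S2F", "A2Sg1S1F", "A4S1G3F":
#   A<n> antennae, Sg<n> NeuGc caps, S<n> NeuAc caps, G<n> terminal Gal, F core fucose.
NAME = re.compile(r"^A(\d)(?:Sg(\d))?(?:S(\d))?(?:G(\d))?(F?)$")


def terminal_counts(name):
    """Return terminal-monosaccharide counts for one complex-glycan name."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"not a complex-glycan name: {name}")
    antennae, neugc, neuac, gal, _fucose = match.groups()
    counts = {"NeuAc": int(neuac or 0), "NeuGc": int(neugc or 0), "Gal": int(gal or 0)}
    # Antennae not capped by NeuAc, NeuGc, or Gal are assumed to end in GlcNAc.
    counts["GlcNAc"] = int(antennae) - sum(counts.values())
    if counts["GlcNAc"] < 0:
        raise ValueError(f"more caps than antennae in {name}")
    return counts


def relative_terminal(abundances):
    """Antenna-weighted fraction of each terminal monosaccharide at one site.

    abundances: dict mapping glycan name -> relative abundance.
    """
    totals = {"NeuAc": 0.0, "NeuGc": 0.0, "Gal": 0.0, "GlcNAc": 0.0}
    for name, abundance in abundances.items():
        for sugar, n in terminal_counts(name).items():
            totals[sugar] += n * abundance
    antennae = sum(totals.values())
    return {sugar: value / antennae for sugar, value in totals.items()}


# Hypothetical site with two glycoforms at equal abundance.
print(relative_terminal({"A2S1G0F": 0.5, "A3S3F": 0.5}))
# {'NeuAc': 0.8, 'NeuGc': 0.0, 'Gal': 0.0, 'GlcNAc': 0.2}
```

Weighting by antenna count, rather than counting each glycan once, matches the definition in the text: the proportion of glycan antennae that terminate in a given monosaccharide.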
The consistent quantitation of glycans at both sites in the 16 HT1080 control replicates demonstrates good precision. For control glycans with greater than 1% relative abundance, the average RSD was 13% for site 1 with Lys-C (N=26), 19% for site 1 with Glu-C (N=23), and 17% for site 2 with Glu-C (N=23). Additionally, the calculated Relative Terminal Monosaccharide and Relative Glycan Type attributes at Site 1 correlate very well between the Lys-C and Glu-C maps (Fig. S3). The relative abundances of individual glycans at Site 1 are also well correlated between the two maps (Fig. S4). Taken together, these correlations suggest an accurate measurement of glycosylation.

The glycan characterization results indicate several broad trends. Site 1 is composed of complex glycans of variable antennarity, while Site 2 is a mix of high-mannose, hybrid, and complex glycans. There appears to be more glycan variability between host cell lines than between clones from the same host. CHO-derived samples contain very little terminal GlcNAc at Site 1 compared with the human-derived samples. Given the absence of terminal Man at Site 1, this may indicate that the CHO-derived clones are more competent at adding galactose during glycan biosynthesis. The CHO-derived samples also appear to generally contain complex glycans of lower antennarity than the human samples.

[Figure 1 graphic. Panel A: % N-Linked Glycan Abundance at Site 1; legend lists the annotated glycoforms (A1G0 through A4Sg1S2G1F, e.g., A2S2F, A2Sg1S1F, A3S3F, A4S4F) and Deamidation. Panel B: % Terminal Monosaccharide at Site 1 (% NeuAc, % NeuGc, % GlcNAc, % Gal, % Man). Panel C: % Glycan Type at Site 1 (Unoccupied, High Man, Hybrid, Biantennary, Triantennary, Tetraantennary). Bar groups: 30 HEK Clones, 16 HT1080 Clones, 27 CHO (-) MSX Clones, 16 CHO (+) MSX Clones, 16 HT1080 Controls.]

Figure 1. A) Relative abundance of annotated N-linked glycopeptides at Site 1 from the Lys-C glycopeptide maps. Each bar is an individual sample (either clone or control). B) Relative Terminal Monosaccharide content and C) Relative Glycan Type were calculated as described in the Supporting Information.

This data set can be used to identify the clones with the highest NeuAc content. Relative Terminal NeuAc levels at Site 1 are correlated with those at Site 2 (Figure 2A). This is likely because variation in the sialylation machinery of glycan biosynthesis affects both glycosylation sites. However, the human clones (HT1080 and HEK) appear to fall on a separate trend line from the CHO clones. In general, the CHO and HEK clones produce more highly sialylated glycans than the HT1080 clones, and the clones with the highest NeuAc content can be identified.

[Figure 2 graphic. Panel A: % Terminal NeuAc [LysC, Site 1] vs % Terminal NeuAc [GluC, Site 2]. Panel B: % Terminal NeuGc [LysC, Site 1] vs % Terminal NeuGc [GluC, Site 1]. Panel C: A2Sg1S1F / (A2Sg1S1F + A2S2F) [LysC, Site 1] vs [GluC, Site 1]. Legend: HEK, HT1080, HT1080 Control, CHO (-) MSX, CHO (+) MSX. Fit lines shown in the graphic: y = 0.66x - 0.11 (R² = 0.86), y = 0.51x - 0.16 (R² = 0.82), y = 0.51x + 0.01 (R² = 0.20), y = 0.91x + 0.01 (R² = 0.95).]

Figure 2. Comparison of A) Relative Terminal NeuAc content calculated at N-linked Site 1 from the Lys-C maps and at N-linked Site 2 from the Glu-C maps; B) Relative Terminal NeuGc content calculated at N-linked Site 1 from the Lys-C maps versus the Glu-C maps; and C) relative A2Sg1S1F content at N-linked Site 1 from the Lys-C versus the Glu-C maps. Relative A2Sg1S1F content is calculated from the MS peak areas of the +4 charge state as A2Sg1S1F / (A2Sg1S1F + A2S2F). Linear least-squares best-fit lines and equations are shown. Only the CHO results (excluding the outlier) are included in the least-squares fits for Fig. 2B,C.

Identifying clones with lower NeuGc content is more difficult than identifying clones with higher NeuAc content. In this study, NeuGc-containing glycans have weak signal intensities with poor signal-to-noise because NeuGc is a minor component of the CHO-derived glycans. No NeuGc-containing glycans were detected at Site 2, but glycans at Site 1 can be used to estimate NeuGc content. Comparing the NeuGc results from the Lys-C and Glu-C maps then estimates the accuracy of this measurement. As shown in Figure 2B, Relative Terminal NeuGc content at Site 1 correlates poorly between the Lys-C and Glu-C maps. The calculated Relative Terminal NeuGc results are therefore not accurate enough to confidently identify clones with lower NeuGc content.




To improve signal-to-noise in the NeuGc quantitation, we focused on a single charge state (+4) of two glycans. The first was the most abundant sialylated glycan in the CHO samples, the core-fucosylated, doubly-sialylated, biantennary glycan (A2S2F). The second was its NeuGc-containing analogue (A2Sg1S1F). The structures of these two glycans are shown in Fig. 3. The relative abundance of A2Sg1S1F versus A2S2F can be used as an indicator of NeuGc content, and it correlates well between the Lys-C and Glu-C maps (Fig. 2C). This good correlation between the two orthogonal data sets suggests an accurate ranking of clones by NeuGc content and allows confident identification of clones with lower NeuGc content.
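The ratio used for Fig. 2C and the least-squares comparison between digests can be sketched in a few lines. This is a minimal sketch under stated assumptions. The peak areas are hypothetical, not data from this study. The fit is ordinary least squares with an intercept, and R² is computed as Sxy²/(Sxx·Syy), which holds for simple linear regression. The function names are illustrative. In practice the inputs would first be restricted to the CHO clones, excluding the outlier, as in the figure caption.

```python
def relative_a2sg1s1f(area_a2sg1s1f, area_a2s2f):
    """A2Sg1S1F / (A2Sg1S1F + A2S2F) from +4 charge-state MS peak areas."""
    return area_a2sg1s1f / (area_a2sg1s1f + area_a2s2f)


def least_squares(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (A2Sg1S1F, A2S2F) peak areas for three CHO clones in each digest.
lysc = [relative_a2sg1s1f(a, b) for a, b in [(10, 90), (20, 80), (30, 70)]]
gluc = [relative_a2sg1s1f(a, b) for a, b in [(12, 88), (19, 81), (31, 69)]]
slope, intercept, r2 = least_squares(lysc, gluc)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

Comparing the same ratio across two orthogonal digests is what turns the fit into an accuracy check. A slope near 1 with high R² suggests that both maps rank the clones the same way.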

Figure 3. Schematic of glycan structures A2S2F and A2Sg1S1F.

Unexpected PTMs

Two unexpected PTMs were identified: hydroxylation at a proline residue29 and mannosylation at a tryptophan residue.30-32 Both PTMs are mediated by amino acid sequence motifs, which were found in the protein sequence after MS identification. Neither PTM had previously been reported for this protein in the literature. Notably, both PTMs are located in regions of the protein important for function. Figure 4A shows the level of proline hydroxylation. The variability of the control results (15% RSD in Lys-C maps, 24% RSD in Glu-C maps) is small compared with that of the samples (56% RSD in Lys-C maps, 65% RSD in Glu-C maps). This shows that the measurement is precise relative to clone-to-clone variation. The measured level of proline hydroxylation is similar between the orthogonal Lys-C and Glu-C data sets (Fig. 4C), which implies accurate quantitation of this PTM. The HT1080 clones have the lowest amount of proline hydroxylation, while the CHO clones produce relatively higher levels, ranging from approximately 25% to 30% hydroxylation.
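The control-versus-sample comparison above (e.g., 15% vs 56% RSD for proline hydroxylation in the Lys-C maps) can be expressed as a simple numeric screen. The paper used visual inspection rather than a formula, so this is only a hypothetical analogue. It flags an attribute when the clone-to-clone RSD exceeds a chosen multiple of the control-replicate RSD. The threshold `factor=2.0`, the function names, and the example values are all assumptions.

```python
from statistics import mean, stdev


def rsd(values):
    """Percent relative standard deviation (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)


def varies_beyond_precision(sample_values, control_values, factor=2.0):
    """Hypothetical screen for one attribute (e.g., % Pro hydroxylation).

    True when the clone-to-clone RSD exceeds `factor` times the RSD of the
    assay-control replicates; `factor` is an arbitrary, assumed threshold.
    """
    return rsd(sample_values) > factor * rsd(control_values)


# Hypothetical % modification values, not data from this study.
clones = [5.0, 12.0, 20.0, 28.0, 30.0]
controls = [14.0, 15.0, 16.0, 15.5]
print(round(rsd(clones)), round(rsd(controls)), varies_beyond_precision(clones, controls))
```

A ratio of RSDs is crude. For example, it ignores differences in mean level between clone groups. It does, however, capture the rule stated in the workflow overview: an attribute is interesting only if the samples vary more than the measurement does.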



[Figure 4 graphic. Panel A: % Pro Hydroxylation, LysC Data and GluC Data. Panel B: % Trp Mannosylation. Bars in A and B are grouped by host (e.g., 30 HEK Clones, 16 HT1080 Clones) and 16 HT1080 Controls. Panel C: % Pro Hydroxylation (LysC) vs % Pro Hydroxylation (GluC), fit y = 1.00x - 0.01 (R² = 0.87). Panel D: % Trp Double Oxidation Site 1 vs % Trp Double Oxidation Site 2, fit y = 6.42x - 0.00 (R² = 0.75). Legend: HEK, HT1080, HT1080 Control, CHO (-) MSX, CHO (+) MSX.]

Figure 4. A) Percent proline hydroxylation observed in the Lys-C and Glu-C maps. B) Percent tryptophan mannosylation observed in the Glu-C maps. Each bar is an individual sample (either clone or control). C) Comparison of the percent proline hydroxylation observed in the Lys-C and Glu-C maps. D) Comparison of two tryptophan double oxidations observed in the Lys-C maps. Linear least-squares best-fit lines and equations are shown. Relative abundance is based on MS peak areas with no correction for possible differences in ionization efficiency between the modified and unmodified peptides.

Figure 4B shows the level of tryptophan mannosylation, which was quantified only in the Glu-C maps. The variability of the control (20% RSD) is less than that of the samples (44% RSD); thus, the clonal variability in tryptophan mannosylation is larger than the variability of the measurement. The clones range from roughly 1% to nearly 20% mannosylation.
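The Figure 4 caption notes that relative abundance is computed from MS peak areas without correcting for ionization efficiency. Under that assumption, percent modification reduces to the modified area divided by the total area of the modified and unmodified forms of the same peptide. The Python sketch below assumes the areas from all observed charge states of each form are simply summed; that aggregation choice and the function name are assumptions, not the authors' software.

```python
"""Hypothetical sketch: percent modification from MS peak areas, with no
ionization-efficiency correction (modified and unmodified forms weighted equally)."""
from typing import Sequence


def percent_modified(modified_areas: Sequence[float], unmodified_areas: Sequence[float]) -> float:
    """100 * modified / (modified + unmodified), summing areas over charge states."""
    if any(a < 0 for a in list(modified_areas) + list(unmodified_areas)):
        raise ValueError("peak areas must be non-negative")
    modified = sum(modified_areas)
    total = modified + sum(unmodified_areas)
    if total == 0:
        raise ValueError("no signal for this peptide")
    return 100.0 * modified / total


if __name__ == "__main__":
    # Illustrative areas for a mannosylated vs. unmodified Trp-containing Glu-C peptide,
    # each observed at two charge states (values are made up).
    mannosylated = [2.0e5, 0.8e5]
    unmodified = [1.9e6, 0.7e6]
    print(f"% Trp mannosylation = {percent_modified(mannosylated, unmodified):.1f}")
```

If ionization efficiencies of the two forms differ, this ratio is biased by a constant factor, which still preserves the clone ranking but not the absolute level.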




It is highly advantageous to detect and quantify unexpected PTMs at early stages of clone selection. In this case, potency data were collected concurrently and demonstrated no significant impact of these PTMs on potency (data not shown). This information can be highly beneficial for quality attribute criticality assessments, an important component of QbD-based process development.33 For PTMs that are difficult to modulate in vitro, such as proline hydroxylation and tryptophan mannosylation, it would otherwise be difficult to gather a similarly diverse data set with sufficient PTM variability to assess attribute criticality.

Degradation

The levels of oxidation at two tryptophan residues, both quantified in the Lys-C maps, were found to be well correlated (Fig. 4D). Based on their relative positions in the crystal structures of homologous proteins, these two residues are expected to be surface exposed and spatially separate. The good correlation between oxidation levels at these two sites could indicate an accurate ranking of the oxidative propensity of clones, where a low propensity for oxidation would be a desirable trait of the selected production clone. However, the HT1080 control measurements show significant variability relative to the samples (Fig. 4D). The level of oxidation appears to vary significantly between experiments, with some experiments (e.g., those including the HT1080 clone samples) yielding more oxidation than others. Therefore, the observed variation in tryptophan oxidation is likely due to day-to-day variability in the experiment rather than to differences between clones. This demonstrates the utility of multiple control replicates for determining whether observed modifications are experimental artifacts or attributes of the samples. No other annotated modifications indicated a significant propensity for specific host cell lines or individual clones to generate degraded protein.
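The oxidation argument above combines two checks: how well the two Trp sites track each other across samples (the linear least-squares fit reported in Fig. 4D) and whether the replicate controls shift between experiments. A minimal Python sketch of both follows; grouping controls by a hypothetical experiment identifier is an assumption about how such a batch effect could be inspected, and all example values are illustrative.

```python
"""Hypothetical sketch: (1) ordinary least-squares fit of site-2 vs. site-1 oxidation,
(2) per-experiment mean of a repeated control to expose day-to-day shifts."""
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Sequence, Tuple


def least_squares_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, R^2) for y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    ss_tot = sum((b - my) ** 2 for b in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


def control_mean_by_experiment(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean control value per experiment ID; large differences suggest day-to-day artifacts."""
    groups = defaultdict(list)
    for experiment_id, value in records:
        groups[experiment_id].append(value)
    return {exp: mean(vals) for exp, vals in groups.items()}


if __name__ == "__main__":
    # Illustrative % double oxidation at two Trp sites (not study data).
    site1 = [0.10, 0.15, 0.22, 0.30, 0.41]
    site2 = [0.7, 1.0, 1.4, 2.0, 2.6]
    slope, intercept, r2 = least_squares_fit(site1, site2)
    print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
    # Replicate control injections tagged by (hypothetical) experiment ID.
    controls = [("exp_A", 0.12), ("exp_A", 0.13), ("exp_B", 0.35), ("exp_B", 0.33)]
    print(control_mean_by_experiment(controls))
```

A high R² alone cannot distinguish a clone effect from a batch effect; if the control means differ between experiments by as much as the samples do, the correlation is more plausibly driven by run-to-run oxidation.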
Sequence Variants

It is critically important for the chosen clone to express the correct protein sequence. Unexpected mutations may impact the safety and efficacy of the protein therapeutic and would invariably require significant effort to characterize and justify to health agencies. Three types of sequence variants were observed. First, two sequence variants likely caused by mutations acquired during the selection process are shown in Figures 5A and 5B. Second, a series of sequence variants appeared only in the CHO clones, at roughly 1-10%; two of these are shown in Figures 5C and 5D. These sequence variants were found to correspond to the endogenous CHO analogue of the protein therapeutic, which is a recombinant human protein. The CHO and human versions of this protein have 94% amino acid identity, and the MassAnalyzer algorithm identified the unique amino acid differences of the CHO variant. The residues that differ between the human protein and the endogenous CHO variant are expected to be highly surface exposed and could pose an immunogenicity risk.34-36 Additionally, the CHO and human variants are not sufficiently different to expect downstream purification to separate the variant from the desired product. Therefore, any CHO production clone would likely require a genetic knockout of the gene encoding the endogenous CHO protein. Finally, variability in the amino acid sequence was also identified at the N-terminus of the protein. The signal sequence, which directs the protein to be translated into the endoplasmic reticulum, is heterogeneously cleaved.37-39 A small percentage of the protein contains one, four, or seven additional amino acids at the N-terminus. As shown in Figure 5E, all clones from all hosts appear to express similar amounts of the elongated forms of the protein (~5%). There were no clear examples of sequence variants due to amino acid misincorporation,40,41 although it is possible that the threshold used for efficient data processing was too high to detect low-level misincorporations.
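The ~5% elongated fraction in Figure 5E is a share of the total N-terminal peptide signal. A minimal Python sketch of that bookkeeping follows, assuming each N-terminal form (expected, +1, +4, and +7 residues) is represented by one summed peak area and that the forms ionize equally; the form labels follow the figure legend, while the function names and numbers are hypothetical.

```python
"""Hypothetical sketch: relative abundance of N-terminal forms produced by
heterogeneous signal-peptide cleavage (peak areas assumed equally ionizing)."""
from typing import Dict

EXPECTED = "Expected Nterm"


def nterm_percentages(areas: Dict[str, float]) -> Dict[str, float]:
    """Percent of total N-terminal peptide signal contributed by each form."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {form: 100.0 * area / total for form, area in areas.items()}


def percent_elongated(areas: Dict[str, float], expected_form: str = EXPECTED) -> float:
    """Summed percent of every form other than the expected N-terminus."""
    return sum(p for form, p in nterm_percentages(areas).items() if form != expected_form)


if __name__ == "__main__":
    # Illustrative peak areas for one clone (not study data).
    clone_areas = {
        EXPECTED: 9.4e6,
        "Nterm + 1AA": 3.1e5,
        "Nterm + 4AA": 1.6e5,
        "Nterm + 7AA": 0.8e5,
    }
    for form, pct in nterm_percentages(clone_areas).items():
        print(f"{form}: {pct:.1f}%")
    print(f"elongated total: {percent_elongated(clone_areas):.1f}%")
```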

[Figure 5 graphic. Point mutations labeled in the panels: S -> F, V -> M, L -> V, and S -> A; panel E shows N-terminal forms (Expected Nterm, Nterm + 1AA, Nterm + 4AA, Nterm + 7AA). Y-axes: % Abundance. Sample groups include 30 HEK clones, 16 HT1080 clones, and 16 HT1080 controls.]

Figure 5. Percent abundance of point mutations observed in A) a single HEK clone and B) a single CHO clone, which are likely due to mutations in the genomic DNA of the integrated transfected gene. C, D) Percent abundance of two point mutations observed in all CHO clones, indicating the relative abundance of the endogenous CHO analogue of the recombinant therapeutic protein. E) Percent abundance of protein generated from different cleavage sites in the N-terminal signal sequence. Each bar is an individual sample (either clone or control).

Conclusions

In this work, a simple, robust, and high-throughput LC-MS/MS peptide mapping method was demonstrated. The combination of high-throughput sample preparation and automated data analysis enabled the inclusion of multiple control replicates and orthogonal protease digestions. The control replicates supported an estimation of method precision, while the orthogonal protease digestions allowed for an estimation of method accuracy. This greatly simplified distinguishing true-positive from false-positive results in the data analysis output. This approach allows 4- to 6-fold more samples to be processed concurrently and significantly reduces data analysis time. While this approach is high-throughput with respect to sample preparation and data analysis, the chromatographic separation could be optimized to further improve throughput. In this study, data collection for 210 peptide maps with a 75-minute method required roughly 1.5 weeks of instrument time in total. However, continuous operation of an LC-MS/MS instrument requires minimal hands-on time, especially when compared with the time saved during data analysis. Additionally, as chromatography columns, UPLC systems, and MS instruments continue to improve, less instrument time is expected to be required for similar studies in the future. A key goal in clone selection is to identify and assess product quality risks. Glycosylation results, including NeuAc and NeuGc levels, were measured. HT1080 clones produced poorly sialylated protein, while the CHO and HEK clones produced the highest NeuAc levels. CHO clones with the lowest levels of NeuGc were identified. In addition to glycosylation, two unexpected PTMs were identified: proline hydroxylation and tryptophan mannosylation. Activity results demonstrated that these PTMs had no significant impact on potency. No modifications were identified that indicated that any clones express significantly more or less degraded protein (e.g., oxidation, deamidation).
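The instrument-time figure quoted in the conclusions follows from simple arithmetic. The short Python check below assumes back-to-back injections with no blanks, extra equilibration, or downtime beyond the 75-minute method; the optional overhead parameter is a hypothetical knob for adding such time.

```python
"""Back-of-the-envelope instrument time for a peptide-mapping campaign."""

MINUTES_PER_WEEK = 60 * 24 * 7


def instrument_time_weeks(n_maps: int, method_minutes: float, overhead_minutes: float = 0.0) -> float:
    """Total LC-MS/MS time in weeks for n_maps runs, assuming continuous operation."""
    if n_maps < 0 or method_minutes <= 0 or overhead_minutes < 0:
        raise ValueError("invalid run parameters")
    return n_maps * (method_minutes + overhead_minutes) / MINUTES_PER_WEEK


if __name__ == "__main__":
    weeks = instrument_time_weeks(210, 75)
    print(f"{weeks:.2f} weeks ({weeks * 7:.1f} days)")  # 1.56 weeks (10.9 days)
```

The zero-overhead result of about 1.56 weeks is consistent with the roughly 1.5 weeks stated; any per-run overhead scales the total linearly.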
Two clones were identified that produced protein with point mutations, and all CHO clones expressed 1-10% of the endogenous CHO version of the recombinant protein, which potentially presents an immunogenicity risk. Taken together, these results were used to identify the top HEK and CHO clones. The HT1080 clones were excluded due to less favorable glycosylation and poor productivity (data not shown). The top CHO clones are preferable due to higher productivity, but they would require gene editing to knock out the endogenous CHO variant. The top HEK clones would not require gene editing but have lower productivities. Using these results, a final decision to select a CHO or HEK production clone will be based on a balance between program material needs and timeline constraints. This method proved to be highly valuable for the analytical support of a biopharmaceutical cell-line development process. It was used to identify the types of protein modifications present, the functional impact of those modifications on product quality, and the variability between clones and host cell lines, thereby enabling a more informed decision for clone selection. The characterization results generated in this study are also useful for the identification of potential critical quality attributes and for an assessment of their criticality33 to safety and efficacy. Finally, this characterization work can provide the foundation for the development of multi-attribute peptide mapping methods for use in routine testing to support later process development and manufacturing activities.18,42,43

Supporting Information

Calculation details for aggregate glycan metrics, Glu-C peptide map results for both N-linked glycosylation sites, and a comparison of results at N-linked site 1 from Lys-C versus Glu-C peptide maps.




References

(1) Walsh, C. T.; Garneau-Tsodikova, S.; Gatto, G. J., Jr. Angew Chem Int Ed Engl 2005, 44, 7342-7372.
(2) Beck, A.; Wagner-Rousset, E.; Ayoub, D.; Van Dorsselaer, A.; Sanglier-Cianferani, S. Anal Chem 2013, 85, 715-736.
(3) Patel, J.; Kothari, R.; Tunga, R.; Ritter, N. M.; Tunga, B. BioProcess Int 2011, 9, 20-31.
(4) Kita, A.; Ponniah, G.; Nowak, C.; Liu, H. Anal Chem 2016, 88, 5430-5437.
(5) Levy, M. J.; Gucinski, A. C.; Boyne, M. T., 2nd. Anal Chem 2015, 87, 6995-6999.
(6) Sjogren, J.; Olsson, F.; Beck, A. Analyst 2016, 141, 3114-3125.
(7) Srebalus Barnes, C. A.; Lim, A. Mass Spectrom Rev 2007, 26, 370-388.
(8) Tran, B. Q.; Barton, C.; Feng, J.; Sandjong, A.; Yoon, S. H.; Awasthi, S.; Liang, T.; Khan, M. M.; Kilgour, D. P.; Goodlett, D. R.; Goo, Y. A. J Proteomics 2016, 134, 93-101.
(9) Yang, X.; Kim, S. M.; Ruzanski, R.; Chen, Y.; Moses, S.; Ling, W. L.; Li, X.; Wang, S. C.; Li, H.; Ambrogelly, A.; Richardson, D.; Shameem, M. MAbs 2016, 8, 706-717.
(10) Zhang, B.; Jeong, J.; Burgess, B.; Jazayri, M.; Tang, Y.; Taylor Zhang, Y. J Chromatogr B Analyt Technol Biomed Life Sci 2016.
(11) Zhang, Z. Anal Chem 2009, 81, 8354-8364.
(12) Compton, P. D.; Zamdborg, L.; Thomas, P. M.; Kelleher, N. L. Anal Chem 2011, 83, 6868-6874.
(13) Toby, T. K.; Fornelli, L.; Kelleher, N. L. Annu Rev Anal Chem (Palo Alto Calif) 2016, 9, 499-519.
(14) Cox, J.; Mann, M. Nat Biotechnol 2008, 26, 1367-1372.
(15) Sadygov, R. G.; Cociorva, D.; Yates, J. R., 3rd. Nat Methods 2004, 1, 195-202.
(16) Wurm, F. M. Nat Biotechnol 2004, 22, 1393-1398.
(17) Rasheed, S.; Nelson-Rees, W. A.; Toth, E. M.; Arnstein, P.; Gardner, M. B. Cancer 1974, 33, 1027-1033.
(18) Shah, B.; Jiang, X. G.; Chen, L.; Zhang, Z. J Am Soc Mass Spectrom 2014, 25, 999-1011.
(19) Zhang, Z.; Shah, B. Anal Chem 2010, 82, 10194-10202.
(20) Arnold, J. N.; Wormald, M. R.; Sim, R. B.; Rudd, P. M.; Dwek, R. A. Annu Rev Immunol 2007, 25, 21-50.
(21) Sola, R. J.; Griebenow, K. BioDrugs 2010, 24, 9-21.
(22) Sola, R. J.; Griebenow, K. J Pharm Sci 2009, 98, 1223-1245.
(23) Stockert, R. J. Physiol Rev 1995, 75, 591-609.
(24) Borys, M. C.; Dalal, N. G.; Abu-Absi, N. R.; Khattak, S. F.; Jing, Y.; Xing, Z.; Li, Z. J. Biotechnol Bioeng 2010, 105, 1048-1057.
(25) Noguchi, A.; Mukuria, C. J.; Suzuki, E.; Naiki, M. J Biochem 1995, 117, 59-62.
(26) Padler-Karavani, V.; Yu, H.; Cao, H.; Chokhawala, H.; Karp, F.; Varki, N.; Chen, X.; Varki, A. Glycobiology 2008, 18, 818-830.
(27) Tangvoranuntakul, P.; Gagneux, P.; Diaz, S.; Bardor, M.; Varki, N.; Varki, A.; Muchmore, E. Proc Natl Acad Sci U S A 2003, 100, 12045-12050.
(28) Stanley, P.; Schachter, H.; Taniguchi, N. In Essentials of Glycobiology; Varki, A.; Cummings, R. D.; Esko, J. D.; Freeze, H. H.; Stanley, P.; Bertozzi, C. R.; Hart, G. W.; Etzler, M. E., Eds.; Cold Spring Harbor Laboratory Press, The Consortium of Glycobiology Editors, La Jolla, California: Cold Spring Harbor (NY), 2009.
(29) Gorres, K. L.; Raines, R. T. Crit Rev Biochem Mol Biol 2010, 45, 106-124.
(30) Buettner, F. F.; Ashikov, A.; Tiemann, B.; Lehle, L.; Bakker, H. Mol Cell 2013, 50, 295-302.
(31) Hofsteenge, J.; Muller, D. R.; de Beer, T.; Loffler, A.; Richter, W. J.; Vliegenthart, J. F. Biochemistry 1994, 33, 13524-13530.
(32) Julenius, K. Glycobiology 2007, 17, 868-876.
(33) Group, C. B. W. Emeryville, CA: CASSS 2009.
(34) Chirino, A. J.; Ary, M. L.; Marshall, S. A. Drug Discov Today 2004, 9, 82-90.
(35) Schellekens, H. Clin Ther 2002, 24, 1720-1740; discussion 1719.
(36) Schellekens, H. Nephrol Dial Transplant 2003, 18, 1257-1259.
(37) Petersen, T. N.; Brunak, S.; von Heijne, G.; Nielsen, H. Nat Methods 2011, 8, 785-786.
(38) Ambrogelly, A.; Liu, Y. H.; Li, H.; Mengisen, S.; Yao, B.; Xu, W.; Cannon-Carlson, S. MAbs 2012, 4, 701-709.
(39) Kotia, R. B.; Raghani, A. R. Anal Biochem 2010, 399, 190-195.
(40) Harris, R. P.; Kilby, P. M. Curr Opin Biotechnol 2014, 30, 45-50.
(41) Zhang, Z.; Shah, B.; Bondarenko, P. V. Biochemistry 2013, 52, 8165-8176.
(42) Rogers, R. S.; Nightlinger, N. S.; Livingston, B.; Campbell, P.; Bailey, R.; Balland, A. MAbs 2015, 7, 881-890.
(43) Dong, J.; Migliore, N.; Mehrman, S. J.; Cunningham, J. C.; Lewis, M. J.; Hu, P. Anal Chem 2016.


