
Article

In-depth Characterization and Spectral Library Building of Glycopeptides in the Tryptic Digest of a Monoclonal Antibody Using 1D and 2D LC-MS/MS Qian Dong, Xinjian Yan, Yuxue Liang, and Stephen E. Stein J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b01046 • Publication Date (Web): 18 Mar 2016 Downloaded from http://pubs.acs.org on April 5, 2016


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

In-depth Characterization and Spectral Library Building of Glycopeptides in the Tryptic Digest of a Monoclonal Antibody Using 1D and 2D LC-MS/MS

Qian Dong*, Xinjian Yan, Yuxue Liang, and Stephen E. Stein

Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8362, Gaithersburg, Maryland 20899, United States

* Corresponding author: Email: [email protected]; Tel: 301-975-2569

Running title: Characterization and Spectral Library Building of Glycopeptides


ABBREVIATIONS

LC-MS: liquid chromatography–mass spectrometry; LC-MS/MS: liquid chromatography–tandem mass spectrometry; MWCO: molecular weight cut-off; RSLC: rapid separation liquid chromatography; UPLC: ultra-performance liquid chromatography; HILIC: hydrophilic interaction liquid chromatography; 1D RPLC: one-dimensional reversed-phase liquid chromatography; 2D RPLC: two-dimensional reversed-phase liquid chromatography; MS1: full MS scan; MS2: tandem MS scan; HCD: higher energy collision dissociation; NCE: normalized collisional energy; DTT: dithiothreitol; IAA: iodoacetamide; TCEP: tris(2-carboxyethyl)phosphine; TRIS: tris-hydroxymethyl-aminomethane; NIST: National Institute of Standards and Technology; XIC: extracted ion chromatogram; mAb: monoclonal antibody; GlcNAc: N-acetylglucosamine; GalNAc: N-acetylgalactosamine; HexNAc: GlcNAc or GalNAc; Fuc: fucose; Neu5Gc: N-glycolylneuraminic acid; Y0: peptide; Y1: Y0+HexNAc; 0,2X0: Y0+C4H5NO (Y0+83.0371, or Y0+GlcNAc fragment), from cross-ring cleavage of the reducing GlcNAc residue; Hex: galactose or mannose; Alpha-Gal: galactose-alpha-1,3-galactose


Abstract

This work presents a detailed analysis of glycopeptides produced in the tryptic digestion of an IgG1 reference material. Analysis was done by nanospray ESI LC-MS/MS over a wide range of HCD collision energies with both conventional 1D separation for various digestion conditions and a 20-fraction 2D-LC study of a single digest. An extended version of NIST-developed software for analysis of “shotgun” proteomics served to identify the glycopeptides from their precursor masses and product ions for peptides with up to three missed cleavages. A peptide with a single missed cleavage, TKPREEQYNSTYR, was dominant and led to the determination of almost all glycans reported in this study. The 2D studies found a total of 247 glycopeptide ions and 60 glycans of different masses, including 30 glycans found in the 1D studies. This significantly larger number of glycans than found in any other glycoanalysis of therapeutic glycoproteins is due to both the improved separation of sialylated vs asialylated species in the first (high pH) dimension and the ability to inject large amounts of glycosylated peptides in the 2D studies. Systematic variations in retention with glycan size were also noted. Energy-dependent changes in HCD fragmentation confirmed the proposed glycan structures and led to a peak-annotated mass spectral library to aid the analysis of glycopeptides derived from IgG1 drugs.

Keywords: Glycopeptides, HCD fragmentation, N-glycosylation, quantification, fractionation, 2D LC-MS/MS, high pH RPLC and low pH RPLC, Human IgG1 mAb

Introduction

A well-known source of heterogeneity in antibody drugs, and of significant variation between different drugs or even between batches of the same drug, is the distribution of glycans on the protein.1-9 A variety of methods have been developed for determining this distribution, which for many IgG-based drugs occurs at a single, conserved site that is nevertheless occupied by a wide range of glycan structures.10-14 Methods for measuring this distribution include chromatographic determination of enzymatically released and fluorescently labeled or permethylated glycans,15-16 determination of the mass distribution of intact proteins,17 and measurement of glycopeptides released by enzymatic digestion.18-19 In this paper we extend the last of these methods by using 2D-LC for glycoanalysis of a NIST reference material, the recombinant human IgG1κ NISTmAb. While antibody digestion, typically by trypsin, is widely used for sequence determination, the product glycopeptides are invisible to most current proteomics software since their fragmentation generally does not generate sufficient sequence information to identify the peptide backbone. Using the known glycan location, characteristic fragmentation and a list of possible glycans, individual glycopeptides can be identified in digests. Detailed analysis shows that it is possible to avoid well-known influences of in-source fragmentation.20-21 This method is employed in the present study to identify a large number of glycans and generate a glycopeptide library to aid the analysis of glycopeptides derived from IgG drugs. Since glycosylation of most IgG drugs occurs in a conserved region of the protein, this library is applicable to most antibody drugs.

Materials and Methods

Materials

The candidate reference material (RM) 8670 NISTmAb, lot 3f1b,22 expressed in an NS0 murine myeloma cell line, was obtained from the Bioanalytical Science Group at NIST. The digestion reagents guanidine hydrochloride, urea, dithiothreitol (DTT), and iodoacetamide (IAA) were purchased from Sigma-Aldrich (St. Louis, MO, USA; see Disclaimer). Sequencing grade trypsin was purchased from Promega (Madison, WI, USA). Zeba spin columns (7K MWCO) were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Chromatographic separations were performed on an Acclaim PepMap100 nano column (150 mm × 75 µm, C18, 3 µm particle size, 100 Å pore size; Dionex, Sunnyvale, CA). The first dimension of the 2D separation was conducted on a ZORBAX capillary reversed-phase column (Extend-C18, 2.1 × 150 mm, 5 µm particle size; Agilent).

Tryptic digestion for 1D and 2D analysis

For most digestions reported here, 1 mg of NISTmAb was denatured with 6 mol/L guanidine in 100 µL of 100 mmol/L tris buffer at either room temperature or high temperature (85 °C). Reduction was performed by adding 5 µL of 200 mmol/L DTT to the denatured mixture and incubating at room temperature for 1 h, followed by alkylation with 20 µL of 200 mmol/L IAA at room temperature in the dark for 1 h. The excess IAA was then quenched by adding 20 µL of 200 mmol/L DTT solution and incubating at room temperature for 1 h. The resulting solution was desalted on a Zeba spin column (7K MWCO) and then digested with trypsin at 37 °C, with aliquots taken after 0.25 h, 2 h and 18 h for the 1D studies. Samples for the 2D studies were denatured with guanidine at high temperature and digested for 18 h. Each of the above digests was quenched with 5 µL of formic acid (50 % volume fraction).

Several 1D studies were also done using the above 1D method but with denaturation by 6 mol/L urea at room temperature.

1D LC-MS/MS analysis

Aliquots (1 µL) of the digest solution (0.2 µg/µL) were analyzed on a Dionex Ultimate 3000 RSLC Nano LC fitted with an Acclaim PepMap100 column and a nanospray source connected to a Q Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA) operated in the positive ion mode. Mobile phase A consisted of 0.1 % (volume fraction) formic acid in water and mobile phase B consisted of 0.1 % (volume fraction) formic acid in ACN. The peptides were eluted by increasing mobile phase B from 1 % to 90 % over 75 minutes. Data were collected in data-dependent mode with a dynamic exclusion of 20 seconds. The 10 most abundant precursor ions were selected from a full scan over m/z 250 to m/z 1850 for HCD fragmentation at normalized collisional energy (NCE) settings of 12, 16, 20, 24, 28, 32, 36, and 40. The resolution of the full MS scan was set at 70,000 and that of the MS/MS scan at 17,500.

2D LC-MS/MS analysis

For the first dimension of the 2D LC analysis, 1.0 mg of digest was fractionated by basic (pH 10) reversed-phase high-performance liquid chromatography (HPLC) on an Agilent ZORBAX capillary column using a Thermo Dionex Ultimate 3000 HPLC system. Solvent A was composed of 20 mmol/L ammonium formate and 2 % (volume fraction) ACN at pH 10, and solvent B was 90 % (volume fraction) ACN and 10 % (volume fraction) 20 mmol/L ammonium formate at pH 10. The flow rate was 200 µL per minute. The gradient was 0 min to 5 min, 0 % solvent B; 5 min to 60 min, 50 % solvent B; 60 min to 70 min, 100 % solvent B; 70 min to 95 min, 0 % solvent B. A total of 80 fractions were collected at one-minute intervals between 5 minutes and 85 minutes. These were pooled into 20 final fractions, each combining four one-minute fractions separated by 20-minute intervals (e.g., fractions 1, 21, 41, and 61 were combined, as were 2, 22, 42, and 62, etc.). Final fractions were frozen at −80 °C and dried before LC-MS/MS analysis. For the second dimension, each of the 20 fractions was dissolved in 200 µL of 0.1 % (volume fraction) formic acid in water and analyzed using the same procedures described above for the 1D analysis, except that 1 µg of peptides was injected, about 5 times the loading of the 1D studies.
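The concatenation scheme described above (80 one-minute fractions pooled into 20 final fractions) can be illustrated with a short Python sketch. The function names are hypothetical and are not taken from any software used in this work; the sketch only assumes that one-minute fractions are numbered 1 to 80 in order of collection.

```python
def pooled_fraction(minute_fraction: int, n_pools: int = 20) -> int:
    """Map a one-minute fraction number (1-based) to its pooled fraction (1-based)."""
    if minute_fraction < 1:
        raise ValueError("fraction numbers start at 1")
    # Fractions 1, 21, 41, 61 share pool 1; 2, 22, 42, 62 share pool 2; and so on.
    return (minute_fraction - 1) % n_pools + 1


def pooling_scheme(n_fractions: int = 80, n_pools: int = 20) -> dict[int, list[int]]:
    """Return {pooled fraction: [one-minute fractions combined into it]}."""
    scheme: dict[int, list[int]] = {pool: [] for pool in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        scheme[pooled_fraction(fraction, n_pools)].append(fraction)
    return scheme


if __name__ == "__main__":
    for pool, members in pooling_scheme().items():
        print(pool, members)
```

Pooling fractions that are far apart in the first-dimension gradient keeps each pooled sample chromatographically diverse, so the second dimension is used more evenly than it would be if adjacent fractions were combined.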

Data analysis

In the following discussion five abbreviations are used to represent monosaccharide residues: HexNAc, N-acetylglucosamine (GlcNAc) or N-acetylgalactosamine (GalNAc); Hex, galactose or mannose; Fuc, fucose; Neu5Ac, N-acetylneuraminic acid; and Neu5Gc, N-glycolylneuraminic acid. A five-digit numerical code is used to denote glycan composition: 1st, the number of HexNAc residues; 2nd, the number of Hex; 3rd, the number of Fuc; 4th, the number of Neu5Ac; 5th, the number of Neu5Gc residues. Code G45101, for example, represents a glycan composition of 4 HexNAc, 5 Hex, 1 Fuc, 0 Neu5Ac, and 1 Neu5Gc. The fragmentation nomenclature used in this work for oligosaccharides is based on that of Domon and Costello.23 For ions from fragmentation of the glycan it includes Y0, Y1, Y1F (Y1 + Fuc), and Y2 (Y1 + HexNAc), formed by cleavage at glycosidic bonds, and 0,2X0, a cross-ring cleavage product.
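As an illustration of the five-digit composition code, the following Python sketch parses a code such as G45101 and sums the corresponding monosaccharide residue masses. The function names are hypothetical, the residue masses are standard monoisotopic values rather than numbers taken from this paper, and the parser assumes each residue count is a single digit, as in all codes used here.

```python
import re

# Standard monoisotopic residue masses (Da), listed in the order of the five-digit code.
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Fuc": 146.057909,
    "Neu5Ac": 291.095417,
    "Neu5Gc": 307.090331,
}


def parse_glycan_code(code: str) -> dict[str, int]:
    """Turn a code like 'G45101' into {'HexNAc': 4, 'Hex': 5, ...}."""
    match = re.fullmatch(r"G(\d)(\d)(\d)(\d)(\d)", code)
    if match is None:
        raise ValueError(f"not a five-digit glycan code: {code!r}")
    return {name: int(n) for name, n in zip(RESIDUE_MASS, match.groups())}


def glycan_residue_mass(code: str) -> float:
    """Mass added to the peptide by the glycan (sum of residue masses, Da)."""
    composition = parse_glycan_code(code)
    return sum(RESIDUE_MASS[name] * n for name, n in composition.items())


if __name__ == "__main__":
    print(parse_glycan_code("G45101"))
    print(round(glycan_residue_mass("G43100"), 4))  # G0F
```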

MS1 data analysis software developed at NIST and used in the MSQC Pipeline24-26 was extended to identify glycopeptides. The identification was based on a list of glycopeptide masses generated by combining all related peptide sequences containing up to 3 missed cleavages with all possible glycan compositions. This process involved the following steps.

(1) Each tandem spectrum was examined for oxonium ions, including m/z 204.0867 (HexNAc), m/z 366.1395 (HexNAc1Hex1), and m/z 528.1923 (HexNAc1Hex2). Spectra having one or more of these ions were tentatively recognized as arising from glycopeptides.

(2) Spectra passing Step (1) were then selected as candidates if their precursor masses equaled the sum of a possible peptide mass and a possible glycan mass to within 8 ppm.

(3) Candidate spectra obtained in Step (2) were confirmed if their fragment ions corresponded to expected products and they contained no major unexplained glycopeptide fragments (>20 % of base peak). Furthermore, to identify sialylation, the fragment ion at m/z 290.0870 (Neu5Gc with loss of water) was required to be present at over 5 % of the base peak; to identify non-core fucosylation, the fragment ion at m/z 512.1974 (HexNAc1Hex1Fuc1) was required at the same level.

(4) Retention time constraints were also applied based on observed values. For each peptide sequence, glycopeptides were required to elute within 3 minutes of the median retention time of the most abundant glycoforms (G43100 and G44100). Sialylated glycopeptides eluting before G43100 and G44100 were also rejected. Prior to final acceptance, spectra were manually examined at the lowest fragmentation energies to confirm the glycan composition and at high HCD energies to confirm the peptide sequence. Around 30 % of spectra were rejected because they had low levels of fragmentation and significant questionable peaks.

(5) All acceptable glycopeptide spectra were annotated and incorporated into a spectral library to facilitate and improve the analysis of glycopeptides derived from IgG drugs.

Abundances of glycans on each glycopeptide sequence are reported relative to the most abundant glycoform and, except where noted, for the most abundant charge state(s) of the sequence.
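Steps (1) and (2) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NIST software: all function names are hypothetical, the amino acid and monosaccharide residue masses are standard monoisotopic values, the 0.005 m/z oxonium tolerance is an arbitrary choice, and cysteine alkylation is ignored because the glycopeptides discussed here contain no cysteine.

```python
from itertools import product

PROTON = 1.007276  # Da
WATER = 18.010565  # Da

# Standard monoisotopic amino acid residue masses (Da).
AMINO_ACID = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
# Residue masses in code order: HexNAc, Hex, Fuc, Neu5Ac, Neu5Gc.
GLYCAN_RESIDUE = (203.079373, 162.052824, 146.057909, 291.095417, 307.090331)
# Oxonium ions screened in Step (1).
OXONIUM_MZ = (204.0867, 366.1395, 528.1923)


def peptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(AMINO_ACID[aa] for aa in sequence) + WATER


def glycan_mass(code: str) -> float:
    """Sum of residue masses for a five-digit code such as 'G45201'."""
    if len(code) != 6 or code[0] != "G" or not code[1:].isdigit():
        raise ValueError(f"not a five-digit glycan code: {code!r}")
    return sum(int(d) * m for d, m in zip(code[1:], GLYCAN_RESIDUE))


def has_oxonium(peaks_mz, tol: float = 0.005) -> bool:
    """Step (1): does the spectrum contain at least one diagnostic oxonium ion?"""
    return any(abs(p - ox) <= tol for p in peaks_mz for ox in OXONIUM_MZ)


def match_candidates(precursor_mz, charge, peptides, glycan_codes, tol_ppm=8.0):
    """Step (2): peptide + glycan combinations within tol_ppm of the precursor."""
    observed = (precursor_mz - PROTON) * charge  # neutral mass
    hits = []
    for (name, seq), code in product(peptides.items(), glycan_codes):
        theoretical = peptide_mass(seq) + glycan_mass(code)
        ppm = (observed - theoretical) / theoretical * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((name, code, round(ppm, 2)))
    return hits


if __name__ == "__main__":
    peptides = {"M0": "EEQYNSTYR", "M1": "TKPREEQYNSTYR"}
    codes = ["G43100", "G44100", "G45201"]
    # Precursor reported in this paper for M1+G45201 at charge 4+.
    print(match_candidates(974.1545, 4, peptides, codes))
```

Run on the precursor m/z of 974.1545 (charge 4+) reported later in this paper, the sketch returns M1+G45201 as the only combination within 8 ppm among the candidates supplied.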

Results

Workflow of N-linked glycopeptide analysis

A schematic representation of glycopeptide analysis, from digestion to spectrum annotation, is shown in Figure 1.

Identification of N-linked glycopeptides by 1D LC-MS/MS

Tryptic glycopeptides were first identified in a total of 45 1D LC-MS/MS runs at multiple collisional energy settings, digestion times and conditions. This large number of analyses was done primarily for the purpose of building a peptide spectral library and comparing digestion methods.24 Table 1 gives the numbers of peptide ion identifications for each of the glycopeptide sequences observed, summing over all charge states. Over half of the glycopeptides were detected in multiple charge states, with the dominant ions mostly being quadruply charged for the M1 glycopeptides and triply charged for the M0 glycopeptides; others are shown in Column 6 of Table 1. A total of 1,700 tandem spectra are included in the final spectral library. All identified glycopeptides contained the well-known glycosylation site in the CH2 domain of the heavy chain of IgG proteins (Asn300 in the case of NISTmAb), and were found and analyzed by the NIST in-house ProMS program.25-26 Glycopeptides of the four most commonly detected peptide sequences are examined in detail. These differ in the number of missed cleavages and are labeled “Mn”, where n is the number of missed cleavages in each sequence, as shown in the first column of Table 1. They are EEQYNSTYR (M0), TKPREEQYNSTYR (M1), EEQYNSTYRVVSVLTVLHQDWLNGKEYK (M2), and TKPREEQYNSTYRVVSVLTVLHQDWLNGKEYK (M3). These glycopeptides are marked in the base-peak chromatogram given in Figure 2A, where, in order of elution, M1 glycopeptides elute at 26.5 min to 30.1 min, M0 at 29.5 min to 34.0 min, M3 at 68.0 min to 70.0 min, and M2 at 73.5 min to 75.5 min. Figures 2B to 2E display the extracted ion chromatograms of each of these peptide sequences for the major glycoform, G43100 (HexNAc4Hex3Fuc1, G0F). Figure 3 presents the averaged mass spectrum for the period 27 min to 34 min, which includes both M0 and M1 glycoforms, with some of the major ions observed in multiple charge states.

A total of 195 glycosylated peptide ions at different charge states were identified in the 1D studies. Table 2 shows results for these ions in their higher, generally predominant charge states at 2-hour digestion, where M1 glycoforms reached maximum abundances. In all cases, the mass accuracy of the identified glycopeptides was within 3 ppm. In Table 2, M0 glycoforms are compared to those of M1, and M2 glycoforms are compared to those of M3. At 2-hour digestion, the M1 peptide produced the largest number of identified glycoforms (30 different glycans), with its two major glycoforms (G43100 and G44100) detected in 2+, 3+ and 4+ charge states, whereas the M0 peptide produced 19 glycoforms. There were 11 minor M1 glycoforms that were not detected in the M0 peptides. For abundant M1 peptides, charge state 4+ was typically 3 to 10 times more abundant than charge state 3+, while for M0 charge state 3+ was 1.5 to 3 times more abundant than charge state 2+. Since the less abundant, lower charge states could arise from both charge partitioning and in-source fragmentation (see later discussion of in-source effects), only the higher charge states are used for quantification. Relative abundances of major glycoforms in M0 and M1 were generally consistent, with a standard deviation of 0.17, though differences tended to increase with decreasing abundance. While differences in charge partitioning for different glycans were possible, they were too small to be clearly observed, so no corrections were made. Note that the M1 glycoform G32100, observed in 60 % of the runs and a major fragment in the dissociation of the major glycopeptide G43100 in charge state 4+, was found only in charge state 3+, suggesting that it arose solely from in-source fragmentation (see later discussion).

The HPLC retention times (RT) for glycopeptides in Table 2 show that glycopeptides with the same peptide backbone elute within a 2 to 3 minute period, as shown in Figure 2A, with sialylated glycopeptides eluting after those without sialylation and the aglycosylated peptide eluting between them. The order of elution in each structurally related group was found to vary inversely with the number of monosaccharides in the glycan. For example, the M1 glycopeptide series G47100, G46100, G45100, G44100, and G43100 elutes at 27.32, 27.35, 27.44, 27.52, and 27.63 min, respectively, giving an average separation of approximately 5 seconds. This observation agrees with Wang et al.,27 who attributed this retention trend to the hydrophilic nature of the glycan moieties. Overall, our observations indicate that the LC elution profile of glycopeptides was mainly influenced by the peptide moiety, as evidenced by the elution relation between glycosylated and aglycosylated species. Nevertheless, the minor contribution from the glycan moiety helps resolve different glycoforms with reasonable confidence.
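The average spacing quoted for the M1 series G47100 to G43100 can be checked with a few lines of Python. The sketch assumes the retention times listed in the text are five separate values, one per glycoform, in order of elution; the function name is hypothetical.

```python
# Retention times (min) of M1 + G47100, G46100, G45100, G44100, G43100,
# read from the text as five separate values (an assumption about the listing).
RETENTION_MIN = [27.32, 27.35, 27.44, 27.52, 27.63]


def mean_spacing_seconds(retention_min: list[float]) -> float:
    """Mean gap between consecutive retention times, converted to seconds."""
    gaps = [later - earlier for earlier, later in zip(retention_min, retention_min[1:])]
    return 60.0 * sum(gaps) / len(gaps)


if __name__ == "__main__":
    # About 4.65 s, i.e. roughly 5 s per monosaccharide removed.
    print(round(mean_spacing_seconds(RETENTION_MIN), 2))
```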

For the runs in Table 2, the aglycosylated forms were identified in over 20 of 45 runs with a median abundance of approximately 0.006 relative to the largest glycoform (M1+G43100). The M0 form of G00000 was found to be consistently 4-fold higher than the M1 form in our experiments. This was due to the lower abundance of the largest glycoform of the M0 peptide and reveals a greater effect of ion suppression on M0 glycopeptides than on M1 glycopeptides (see further discussion in the next paragraph). This was improved in the 2D studies, as shown in Table 3. Unlike the glycosylated forms, the most intense ions of M1+G00000 were triply charged and those of M0+G00000 were doubly charged, one charge lower than their glycosylated counterparts. This observation, obtained in the present 1D and 2D (Table 3) studies, is in agreement with previous reports,21,28-29 and we hypothesize that the extra proton may be associated with the glycan moieties.

Relative abundances of each of the four groups of glycopeptides, M0 to M3, are shown in Figure 4 at different digestion times. This figure shows that the M1 peptide produced abundant glycoforms at all three times, and that the M0 group became relatively significant after 18-hour digestion. The stability of M1 was consistent with the presence of two glutamic acid residues at the C-terminal side of the missed-cleavage arginine.24,30 M2 glycopeptides were never abundant, while M3 was significant only at short digestion times. However, M2 and M3 peptides generated few identifiable glycoforms (bottom of Table 2), due at least partly to the dilution of their signal over multiple charge states and isotopes. As evidenced by Table 2 and previous reports,21,28 the amino acid sequence of the peptide does affect glycoform intensities. While both M1 and M0 peptide moieties have been shown to be suitable for glycoform identification and quantification, the M1 peptide sequence, with two more available sites of protonation, appeared to have the advantage over M0 in producing stronger MS signals, which led to better detection and fragmentation of more minor glycoforms.

Determination of N-linked glycosylation profile based on 1D and 2D LC-MS/MS of glycopeptides

1D LC-MS/MS. Thirty N-linked glycan structures were identified based on 195 glycopeptide ions in the 1D LC-MS/MS runs. Using the glycan classification scheme proposed by Prien et al.,31 glycans are classified into eight groups (Table 4) for easy comparison. The three most abundant Fc N-glycoforms are G44100, G43100, and G45100 (Group 1), accounting for 87 % of the glycosylation. Also detected are minor levels of glycans with high mannose (Group 4), hybrid (Group 2), and terminal alpha-linked Gal structures (Group 6). The low-abundance sialic acid Neu5Gc was found on 9 of the 30 identified glycans, along with other trace-level glycan structures.

2D LC-MS/MS. A total of 60 different glycan compositions were determined from 247 glycopeptide ions in the 2D runs, including all 30 glycan structures in the 1D study. Supplementary Figures S1 (b) and (c) show the LC-MS peptide mapping of NISTmAb from Fractions 17 and 18 of a 2D experiment, respectively, with various M1 glycoforms eluting at 28 to 34 min and M0 glycoforms eluting at 34 to 39 min. The longer 2D retention time period (4 to 5 min) compared to that of 1D (2 to 3 min) was a result of additional glycoforms identified in 2D, including difucosylated and disialylated species. The LC-MS peak profile and MS2 fragmentation pattern of each individual glycan structure were manually examined and confirmed. Table 3 compares M0 and M1 glycoforms from the 2D studies, with all proposed structures shown in Supplementary Table S1. Forty-five glycoforms were identified in both M0 and M1, while 11, including five difucosylated glycan types, were found only in M1 and 4 more, including three disialylated glycan types, were found only in M0. Abundances of corresponding 1D glycoforms, taken from Table 2, are entered in the third column, showing that the range of abundances in the 2D studies is approximately 10 to 50 times greater than in the 1D studies (Supplementary Figure S2). This greater depth is a result of the 20-fold concatenated fractionation as well as the five times higher loading in the 2D studies.

The well-known lower ionization efficiency of glycopeptides in LC-ESI-MS relative to non-glycosylated peptides makes quantitative analysis very challenging.32-33 Furthermore, abundance measurements of low intensity ions in 1D studies are challenging due to possible run-to-run variation in ion suppression and abundance threshold effects, including variations in detectable isotopic peaks over the elution profile. In 2D LC, computed abundances are even more variable since peptides can elute in multiple fractions and may be subject to other losses in the high pH first LC dimension. In the present 2D approach, each initial fraction was collected over 1/80th of the peptide elution period and the initial fractions were subsequently combined into 20 final fractions. Depending on abundance, some major glycoforms were observed in most fractions, while less abundant glycoforms were often identified in only one. For M1, a large proportion of the glycoform abundance was found in two adjacent fractions (17 and 18), with over 96 % of the sialylated abundance in fraction 17 and 60 % to 80 % of the nonsialylated abundance in fraction 18. The most abundant M0 glycoforms were in four contiguous fractions (13, 14, 15, and 16), with 98 % of the sialylated abundance in fractions 13 and 14 and over 80 % of the nonsialylated abundance in fractions 15 and 16. Total abundance for each glycopeptide ion was estimated by summing over these fractions. The percentages of individual glycoform abundance found in the 1D and 2D experiments are shown in a log-log plot in Figure 5 (non-sialylated glycans in blue and sialylated glycans in green). While trends in abundances were similar in the two studies, values for the 2D study showed significant signal intensity enhancement compared with those in the 1D study. Relative to the predominant glycoforms, minor asialylated glycoforms were typically 1.6 times higher in the 2D studies, while the sialylated glycoforms were 3.5 times higher. The advantage conferred by performing 2D LC-MS/MS analysis (fractionation following glycopeptide separation by RP-HPLC at pH 10) was greater separation of glycopeptides from peptides and of sialylated from asialylated species (Supplementary Figures S1 (a) to (c)). Consequently, even though the higher sample load used in the 2D separation compromised chromatographic resolution, it was possible to increase the analytical dynamic range and detect greater numbers of glycopeptide species from the high resolution mass data.

The N-glycosylation site occupancy was determined from the abundances of higher charge states of the aglycosylated and glycosylated M1 peptides obtained in the 2D studies. The 3+ M1+G00000 signal relative to the total abundance of the glycosylation site was approximately 0.39 %, which is consistent with previous reports that the occupancy level of the recombinant IgG1 monoclonal antibody is over 99.0 %.20,31
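The occupancy estimate above amounts to one minus the aglycosylated fraction of the total site signal. A minimal Python sketch, with a hypothetical function name and the two abundances expressed on any common scale, is:

```python
def site_occupancy(aglycosylated: float, glycosylated_total: float) -> float:
    """Fraction of the site that carries a glycan.

    aglycosylated      -- abundance of the unoccupied peptide (here 3+ M1+G00000)
    glycosylated_total -- summed abundance of all glycoforms of the same peptide
    """
    total = aglycosylated + glycosylated_total
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return 1.0 - aglycosylated / total


if __name__ == "__main__":
    # An aglycosylated signal of 0.39 % of the site total gives 99.61 % occupancy.
    print(f"{site_occupancy(0.39, 99.61):.2%}")
```

This treats the MS signals of the glycosylated and aglycosylated forms as directly comparable, which ignores the differences in ionization efficiency and dominant charge state noted elsewhere in the text.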


Identification of glycopeptides using HCD fragmentation at multiple collision energies

The utility of examining peptide fragmentation at various collision energies has been explored by the Yates group and others as a means of confirming the identity of peptide sequences and modifications.34-35 In this work, we applied similar methods to glycopeptides. Spectra were acquired over a wide range of HCD energies, at NCE values of 12, 16, 20, 24, 28, 32, 36, and 40, to aid the determination of glycan structures as well as to provide a variety of spectra for use in spectral libraries.

Three illustrations of how fragmentation analysis is used to show consistency with the expected glycan structures are presented in Figures 6 through 8. Note that such fragmentation does not generally prove the uniqueness of the structural assignment. The first example, shown in Figure 6, presents general glycosidic cleavage patterns found in the energy-dependent analysis using the charge 4+ state of M1+G44100 (HexNAc4Hex4Fuc1). At the lowest energy NCE 12, Figure 6A shows that two primary fragment ions at m/z 971.765 and m/z 1,025.776 in the high m/z range are due to the loss of a singly charged HexNAc1Hex1 and HexNAc1 from non-reducing termini, respectively. These were fragmented at the labile glycosidic bond between mannose and HexNAc residues and would be expected to be the primary insource fragments of this glycopeptide. Other fragment ions in this range correspond to subsequent sugar losses. In addition, two intense, singly charged oxonium ions at m/z 366.137 and m/z 204.085 are the complement of the two major fragments respectively, which are commonly monitored to indicate the presence of glycopeptides. Together, these ions provide majority of glycosidic cleavage products (except Y1, Y1F, and Y2) representing the structure of the N-linked glycopeptide M1 + G44100. It is important to note that in this, and all other glycopeptide fragments from the M1 4+ charge state that no significant neutral sugar loss fragments were observed, indicating significant in-source fragment ions would have a lower charge state than their precursor ion. This is equally valid for M0 charge state +3 ions. At NCE 20


(Figure 6B), two doubly charged fragment ions become dominant, arising from the loss of an additional oligosaccharide ion from the initial 3+ charge state ions. In addition, minor glycopeptide ions from more extensive glycan loss (Y1, Y1F, and Y2) can be observed in the spectrum. Figure 6C shows a growing dominance of these more extensively fragmented glycan ions at NCE 28. At the higher energy of NCE 36, Figure 6D shows the presence of some y and b peptide fragments, thereby confirming the peptide sequence assignment. At NCE 40, Figure 6E shows dominant 0,2X0 and Y1 ions with no remaining characteristic glycopeptide peaks. Note that at higher energies, oxonium peaks such as m/z 204 or other smaller m/z oxonium peaks would be the most abundant peaks if they were recorded in these spectra. All of these and other trends can be observed in the glycopeptide library described later.
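The charge bookkeeping in the Figure 6A discussion can be checked with simple arithmetic. The sketch below assumes that each 3+ fragment and its singly charged oxonium complement together account for the whole 4+ precursor, so both pairs should imply the same precursor m/z. The function name and the 0.05 tolerance are illustrative choices and are not taken from the paper; only the four m/z values come from the text.

```python
def summed_ion_mass(fragment_mz, fragment_z, oxonium_mz, oxonium_z=1):
    """Sum of m/z * z for a fragment ion and its complementary oxonium ion."""
    return fragment_mz * fragment_z + oxonium_mz * oxonium_z


# m/z values quoted in the text for the 4+ precursor of M1+G44100 at NCE 12:
# (3+ glycopeptide fragment, complementary singly charged oxonium ion).
pairs = {
    "loss of HexNAc1Hex1": (971.765, 366.137),
    "loss of HexNAc1": (1025.776, 204.085),
}

PRECURSOR_Z = 4

# Each complementary pair should add back up to the same 4+ precursor.
implied_precursor_mz = {
    label: summed_ion_mass(frag_mz, PRECURSOR_Z - 1, oxo_mz) / PRECURSOR_Z
    for label, (frag_mz, oxo_mz) in pairs.items()
}

for label, mz in implied_precursor_mz.items():
    print(f"{label}: implied 4+ precursor m/z = {mz:.3f}")

values = list(implied_precursor_mz.values())
# Assumed tolerance (in m/z units) for calling the two estimates consistent.
consistent = abs(values[0] - values[1]) < 0.05
print("complementary pairs agree:", consistent)
```

Under this assumption, both pairs point to a 4+ precursor near m/z 820.36, which is the behavior expected when the charged sugar loss lowers the charge state of the remaining glycopeptide by one.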

Figure 7 shows an example of an unusual glycan structure having a Neu5Gc residue and an outer-arm fucose residue. The spectrum generated at NCE 16 was identified as the charge 4+ M1+G45201 at m/z 974.1545. Three peaks confirm the presence of sialylation: m/z 308 corresponds to the Neu5Gc residue, m/z 290 to its water loss, and m/z 673 to the fragment containing the sialylated branch, HexNAc1Hex1Neu5Gc1. The outer-arm fucose is demonstrated by a relatively large peak at m/z 512 (22 % of the base peak) corresponding to HexNAc1Hex1Fuc1 at the non-reducing end, while the Y1F peak confirms the presence of a core fucose. In our library spectra, the m/z 512 peak was observed at abundances of 5 % to 50 % of the base peak, with a median value of 17 %, at NCE values of 12, 16, 20, 24, and 28 where two or more fucose residues were present. Note that this peak was also found at very low abundances in core monofucosylated glycopeptides, where it arises from Fuc transfer (see HCD Glycan Rearrangement in Discussion).
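The nominal m/z values of the diagnostic ions named above follow from the monosaccharide compositions. The sketch below is a minimal illustration that assumes standard monoisotopic residue masses and singly protonated oxonium ions; the helper name `oxonium_mz` and the dictionary layout are hypothetical and are not part of the authors' software.

```python
# Standard monoisotopic residue masses (Da); these are textbook values,
# not numbers taken from the paper.
RESIDUE_MASS = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "Fuc": 146.05791,
    "Neu5Gc": 307.09033,
}
PROTON = 1.00728
WATER = 18.01056


def oxonium_mz(composition, water_losses=0):
    """m/z of a singly protonated oxonium ion for a {residue: count} composition."""
    mass = sum(RESIDUE_MASS[name] * count for name, count in composition.items())
    return mass - water_losses * WATER + PROTON


diagnostic_ions = {
    "Neu5Gc": oxonium_mz({"Neu5Gc": 1}),
    "Neu5Gc - H2O": oxonium_mz({"Neu5Gc": 1}, water_losses=1),
    "HexNAc1Hex1Neu5Gc1": oxonium_mz({"HexNAc": 1, "Hex": 1, "Neu5Gc": 1}),
    "HexNAc1Hex1Fuc1": oxonium_mz({"HexNAc": 1, "Hex": 1, "Fuc": 1}),
}

for name, mz in diagnostic_ions.items():
    print(f"{name}: m/z {mz:.3f}")
```

The integer parts of the computed values are 308, 290, 673, and 512, matching the nominal peaks cited for the Neu5Gc residue, its water loss, the sialylated branch, and the outer-arm fucosylated branch.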

The NS0 cell line used for making the IgG under study is known to express non-reducing alpha-Gal residues, which can elicit an immunogenic response.6 Figure 8 shows the MS2 spectrum of the charge 4+ M1+G47100


(m/z 941.8939), confirming the structural feature of the α-linked galactose. A set of dominant peaks arising from the cleavage of one or two terminal Gal-Gal-GlcNAc branches is clearly exhibited in the spectrum: the m/z 528, 1079, 1274, and 1355 ions. Furthermore, the abundance of the m/z 528 ion for this species is about 70 % of the base peak, compared with below 10 % in the spectra of the major glycoform M1+G44100 shown in Figures 6 and 7. These results are consistent with previous reports concerning terminal alpha-Gal residues.36-37
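The contrast in m/z 528 abundance described above (about 70 % versus below 10 % of the base peak) suggests a simple screening check. The sketch below is only an illustration: the peak lists are invented, the 30 % cutoff is an assumed value that does not come from the paper, and `relative_abundance` is a hypothetical helper.

```python
def relative_abundance(peaks, target_mz, tol=0.5):
    """Percent of base peak for the most intense peak within tol of target_mz."""
    base = max(intensity for _, intensity in peaks)
    matched = [intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol]
    return 100.0 * max(matched) / base if matched else 0.0


# Hypothetical (m/z, intensity) peak lists, invented for illustration only.
alpha_gal_like = [(204.09, 40.0), (366.14, 100.0), (528.19, 70.0), (1079.4, 55.0)]
major_glycoform_like = [(204.09, 60.0), (366.14, 100.0), (528.19, 8.0)]

# Assumed cutoff in percent of base peak; not a value reported in the paper.
ALPHA_GAL_THRESHOLD = 30.0

for name, peaks in [
    ("alpha-Gal-like", alpha_gal_like),
    ("major-glycoform-like", major_glycoform_like),
]:
    rel = relative_abundance(peaks, 528.19)
    verdict = "flag" if rel >= ALPHA_GAL_THRESHOLD else "no flag"
    print(f"{name}: m/z 528 at {rel:.0f} % of base peak -> {verdict}")
```

A check of this kind can only indicate consistency with a terminal Gal-Gal-GlcNAc branch; as the text notes, such fragmentation does not generally prove the uniqueness of a structural assignment.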

Discussion

Comparison with released glycan studies. Released glycans from NISTmAb were previously characterized using UPLC HILIC separation with 2-aminobenzoic acid (2-AA)- and 2-aminobenzamide (2-AB)-labeled fluorescence and online mass spectrometry detection.31 Figure 9A shows a Venn diagram comparing the 32 unique glycan compositions derived from 42 peak assignments, including several isomers, identified by that study (Table 1 and Table 5 of Reference 31) with the present results of the 1D and 2D experiments. The UPLC-fluorescence and 1D studies had 26 glycan compositions in common; four additional monoantennary glycans, including three with a Neu5Gc residue, were seen only in the present study, while five higher-mass triantennary complex types, including three with a Neu5Gc residue, were identified only in the labeled glycan studies. Figure 9B shows that the abundances of major glycans in the present 1D LC studies were in good agreement with the released glycan studies, with a standard deviation of 0.14. However, the lower levels of sialylated and other minor glycans from our 1D studies are also shown in this figure, indicating that their MS signals appear to be under-represented in 1D LC-MS when compared with those of the labeled glycans. This is consistent with the aforementioned abundance difference between our 1D and 2D studies as well as with the glycoanalysis using synthetic peptides and glycopeptides.28 The 2D studies identified almost all glycans from both the released glycan and 1D studies, in addition to 25 low-abundance glycoforms, with comparison details shown in Table 4. In total, we identified 60 unique glycan


compositions, at least roughly double the numbers in earlier reports for individual recombinant monoclonal IgG1 antibodies expressed in CHO, NS0, and other cells.20-21, 31, 38-45 On the one hand, this comparison directly illustrates the added depth of the 2D studies: high-pH RPLC with fraction concatenation of complex glycopeptides, combined with higher loading capacities, led to the extended abundance range and glycoanalysis coverage.46 On the other hand, it revealed cell line glycoform specificity, with higher extents of galactosylation and sialylation, including non-human alpha-Gal and Neu5Gc residues, occurring in mAbs expressed in mouse cell lines (Table 4) than in CHO cell lines (see the released NIST glycopeptide library for details).
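The counts quoted in this comparison can be tallied as a quick consistency check. The sketch below assumes that the four numbers correspond to disjoint regions of the Figure 9A Venn diagram; that assignment is an interpretation of the text rather than something the figure labels state, and the dictionary keys are descriptive names chosen here.

```python
# Counts quoted in the comparison with released glycan studies. Treating them
# as disjoint Venn regions is an assumption made for this tally.
venn_regions = {
    "shared by released glycan and 1D studies": 26,
    "1D only (monoantennary)": 4,
    "released glycans only (triantennary)": 5,
    "additional low-abundance glycoforms from 2D": 25,
}

total_unique = sum(venn_regions.values())
print("total unique glycan compositions:", total_unique)
```

Under this reading the regions sum to 60, in line with the total number of unique glycan compositions reported in the text.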

To summarize the glycosylation pattern observed in this protein, the glycan structures from the present 1D and 2D studies described in the previous section were also compared with the released glycans (Table 4) using a table similar to that presented in Reference 31. The number and percentage of glycosylated groups by the three methods showed that the IgG-Fc N-linked glycosylation profile of NISTmAb was dominated by asialylated, biantennary, core-fucosylated complex glycans (Group 1), which are the well-known main glycoforms observed in IgG1 mAbs produced by all major mammalian cell expression systems.10, 20, 47-48 A small percentage of hybrid N-glycans (Group 2) and trace levels of glycans with triantennary and high-mannose structures (Groups 3 and 4) were also observed by all three methods. Additionally, significantly higher amounts of alpha-linked galactosylation (8.19 %, Group 6) and sialylation (6.25 %, Group 5) were observed in the 2D study than with the 1D LC-MS and released glycan methods. Both are typical for mAbs expressed in the murine cell lines NS0 and SP2/0.10, 36, 38, 48-49 The energy dependence of HCD glycopeptide fragmentation was used to confirm the presence of these features, as demonstrated in Figures 7 and 8. Compared with the glycan distribution observed in the 1D and released glycan studies, a greater variety of low-abundance glycan structures was detected in the 2D study. The 2D LC-MS/MS approach detected more higher-mass glycans consisting of lower-abundance bi- and tri-antennary saccharides carrying terminal Neu5Gc (Group 5), alpha-Gal (Group 6), or outer-arm fucose (Group 8) residues. The numbers of sialylated, alpha-galactosylated, and outer-arm fucosylated glycans identified in the 2D studies are 1.7, 3.2, and 6 times those in the 1D studies, respectively. It is worth noting that eight isomeric glycan structures arising from G44100, G35100, G45100, G54100, G55100, and G44101 were well separated by the labeled glycan method, whereas they could not be separated in the current study using HPLC of glycopeptides.

In-source fragmentation of the glycopeptides. The formation of smaller glycopeptides from the in-source fragmentation of larger glycopeptides has been a concern for both the identification and quantification of naturally occurring glycans.20-21, 44 In-source fragments can be identified both by their co-elution with their precursor ion and by their presence as peaks in the low-energy fragmentation of their precursor. The relatively small differences in elution of similar glycoforms make retention values an equivocal measure, whereas their identification in lower-NCE spectra (NCE 10, 12, and 16 in this case) is definitive evidence of in-source vs. native glycopeptides. Specifically, a thorough examination of the higher charge states of all M0 and M1 glycoform spectra (charge 3+ for M0 and 4+ for M1) shows that the most significant in-source fragments of the major glycopeptides arise from loss of charged fragments from the precursor and therefore invariably generate lower charge state products. Although some minor product ions […]

[Workflow figure; recovered text: 20 fractions; 2: C18 RPLC pH 2; HCD NCE energy settings: 12, 16, 20, 24, 28, 32, 36, 40; MS2: consistent with peptide and glycan moiety; falls within expected retention time range of glycopeptide sequence; Abundance Determination; glycoforms; Spectral Library Creation; create consensus spectra; annotate peaks and record retention and occurrences.]
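The criteria given in the in-source fragmentation discussion (co-elution with the precursor, appearance as a peak in the precursor's low-NCE spectrum, and a lower charge state) can be combined into a single check. The sketch below is a hypothetical illustration and not the authors' procedure: the `Ion` class, the tolerances, the retention times, and the precursor m/z are all assumed values, while the two fragment m/z values are those quoted for Figure 6A.

```python
from dataclasses import dataclass


@dataclass
class Ion:
    mz: float
    charge: int
    rt_min: float  # retention time in minutes


def is_candidate_in_source_fragment(candidate, precursor, low_nce_peaks,
                                    rt_tol_min=0.2, mz_tol=0.02):
    """Return True when all three criteria hold: co-elution, lower charge
    state, and presence in the precursor's low-NCE MS2 peak list."""
    co_elutes = abs(candidate.rt_min - precursor.rt_min) <= rt_tol_min
    lower_charge = candidate.charge < precursor.charge
    in_low_nce = any(abs(mz - candidate.mz) <= mz_tol for mz in low_nce_peaks)
    return co_elutes and lower_charge and in_low_nce


# Illustrative values: the precursor m/z and all retention times are assumed.
precursor = Ion(mz=820.36, charge=4, rt_min=25.0)
low_nce_peaks = [971.765, 1025.776]  # fragment m/z values quoted for NCE 12

co_eluting = Ion(mz=971.765, charge=3, rt_min=25.05)
well_separated = Ion(mz=971.765, charge=3, rt_min=27.5)

print(is_candidate_in_source_fragment(co_eluting, precursor, low_nce_peaks))
print(is_candidate_in_source_fragment(well_separated, precursor, low_nce_peaks))
```

Because the text describes retention values as an equivocal measure for similar glycoforms, the retention tolerance here acts only as a coarse filter; the low-NCE peak match carries most of the evidential weight.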


Figure 2


Figure 3.

Figure 4.


Figure 5. [Scatter plot; vertical axis: Log10 2D abundance; horizontal axis: Log10 1D abundance; both axes span 0.0 to 4.0; legend: Non-Sialylated, Sialylated.]

Figure 6.


Figure 7.

Figure 8.


Figure 9A. [Venn diagram; set labels: 1D LC-MS/MS, 2D LC-MS/MS Fractionation, Released Glycans; region counts recovered: 4, 26, 5, 25.]

Figure 9B. [Scatter plot; vertical axis: Log10 %RA of Glycoforms from 1D studies; horizontal axis: Log10 %RA of Glycans from Released glycans; both axes span 0.0 to 3.5; annotation: Standard deviation: 0.14.]


Figure 10.


For TOC only
