Anal. Chem. 2009, 81, 3109–3118
Targeted N-Linked Glycosylation Analysis of H5N1 Influenza Hemagglutinin by Selective Sample Preparation and Liquid Chromatography/Tandem Mass Spectrometry
Thomas A. Blake, Tracie L. Williams, James L. Pirkle, and John R. Barr*
Biological Mass Spectrometry Laboratory, Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention, 4770 Buford Highway, MS F-50, Atlanta, Georgia 30341
Using liquid chromatography/tandem mass spectrometry (LC/MS/MS) analysis of deglycosylated and intact glycopeptides from tryptic digests of whole influenza virus, we determined that the six predicted N-linked glycosylation sites within the N-terminal ectodomain of hemagglutinin (HA) from three selected H5N1 strains are occupied. The use of selective sample preparation strategies, including solid-phase extraction (SPE) of glycopeptides via hydrazide capture chemistry as well as hydrophilic interaction liquid chromatography (HILIC), sufficiently reduced sample complexity to allow determination of occupied glycosylation sites. The specific amino acid sequence of the tryptic glycopeptides for the identified sites varied slightly among strains, but the overall locations of the occupied glycosylation sites were conserved in the protein sequence. We used this knowledge of glycosylation site occupation to examine the glycans attached to these occupied sites on HA for a reassortant H5N1 strain grown in embryonated chicken eggs. By applying mass spectrometry-based methodologies for examining glycosylation to the study of influenza virus proteins, we can better understand the effect that this post-translational modification has upon the virulence and antigenicity of emerging strains.
Hemagglutinin (HA) is the primary, regulated antigen in commercial vaccines against seasonal influenza; the production of antibodies neutralizing to HA is the main mechanism of defense against infection.
HA is a membrane-bound glycoprotein on the virus surface which initially binds to cell receptors1 and subsequently mediates fusion with the cell membrane following a pH-induced conformation change during endocytosis.2 N-Linked glycosylation is a known post-translational modification (PTM) of the asparagine (Asn) side chains in HA, and to date, no observation of O-linked glycosylation has been reported. HA glycosylation is thought to affect the virulence of an influenza strain by attenuating receptor binding,3-7 masking antigenic regions of the protein,8-10 or impeding the activation of the protein precursor HA0 via its cleavage into the disulfide-linked subunits HA1 and HA2.11-15 The number of potential N-linked glycosylation sites on HA depends upon the strain’s specific amino acid (AA) sequence, and the complexity of the glycans attached is determined by host cell conditions during virus propagation.16-19
Analysis of HA glycosylation has previously been reported through use of lectin affinity chromatography,18,20 radioactive labeling with gel electrophoresis,20 gas chromatography,21 methylation with 2-D nuclear magnetic resonance (NMR) analysis,22 and serial exoglycosidase treatment with matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS) analysis.18 Nevertheless, genome-based approaches are typically utilized when investigating the effects of changes in glycosylation due to the greater simplicity of these measurements.23-25 Such studies are based upon searching for the N-linked glycosylation consensus sequence (Asn-Xxx-Ser/Thr where Xxx can be any amino acid except proline) in the AA sequence predicted by the viral RNA. By relying on genome data alone, the assumption is made that all N-linked glycosylation sites are occupied and that O-linked glycosylation does not occur. Unfortunately, the AA sequence cannot be relied upon as the sole determinant of glycosylation. It is, therefore, necessary to measure the differences in glycosylation site occupation and the glycans residing at those sites in order to link changes in HA glycosylation to differences in virulence and antigenicity among strains or virus growth conditions.
* To whom correspondence should be addressed. E-mail: [email protected]. Fax: 770-488-0509.
10.1021/ac900095h Not subject to U.S. Copyright. Publ. 2009 Am. Chem. Soc. Published on Web 03/16/2009
Analytical Chemistry, Vol. 81, No. 8, April 15, 2009
(1) Skehel, J. J.; Wiley, D. C. Annu. Rev. Biochem. 2000, 69, 531–569.
(2) Bullough, P. A.; Hughson, F. M.; Skehel, J. J.; Wiley, D. C. Nature 1994, 371, 37–43.
(3) Gambaryan, A. S.; Marinina, V. P.; Tuzikov, A. B.; Bovin, N. V.; Rudneva, I. A.; Sinitsyn, B. V.; Shilov, A. A.; Matrosovich, M. N. Virology 1998, 247, 170–177.
(4) Matrosovich, M.; Zhou, N.; Kawaoka, Y.; Webster, R. J. Virol. 1999, 73, 1146–1155.
(5) Ohuchi, M.; Ohuchi, R.; Feldmann, A.; Klenk, H. D. J. Virol. 1997, 71, 8377–8384.
(6) Mishin, V. P.; Novikov, D.; Hayden, F. G.; Gubareva, L. V. J. Virol. 2005, 79, 12416–12424.
(7) Wagner, R.; Wolff, T.; Herwig, A.; Pleschka, S.; Klenk, H.-D. J. Virol. 2000, 74, 6316–6323.
(8) Munk, K.; Pritzer, E.; Kretzschmar, E.; Gutte, B.; Garten, W.; Klenk, H.-D. Glycobiology 1992, 2, 233–240.
(9) Skehel, J. J.; Stevens, D. J.; Daniels, R. S.; Douglas, A. R.; Knossow, M.; Wilson, I. A.; Wiley, D. C. Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 1779–1783.
(10) Abe, Y.; Takashita, E.; Sugawara, K.; Matsuzaki, Y.; Muraki, Y.; Hongo, S. J. Virol. 2004, 78, 9605–9611.
(11) Klenk, H.-D.; Rott, R.; Orlich, M.; Blödorn, J. Virology 1975, 68, 426–439.
(12) Lazarowitz, S. G.; Choppin, P. W. Virology 1975, 68, 440–454.
(13) Bosch, F. X.; Garten, W.; Klenk, H. D.; Rott, R. Virology 1981, 113, 725–735.
(14) Deshpande, K. L.; Fried, V. A.; Ando, M.; Webster, R. G. Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 36–40.
(15) Ohuchi, M.; Orlich, M.; Ohuchi, R.; Simpson, B. E. J.; Garten, W.; Klenk, H.-D.; Rott, R. Virology 1989, 168, 274–280.
(16) Deom, C. M.; Schulze, I. T. J. Biol. Chem. 1985, 260, 14771–14774.
(17) Wagner, R.; Geyer, H.; Geyer, R.; Klenk, H.-D. J. Virol. 1996, 70, 4103–4109.
(18) Mir-Shekari, S. Y.; Ashford, D. A.; Harvey, D. J.; Dwek, R. A.; Schulze, I. T. J. Biol. Chem. 1997, 272, 4027–4036.
(19) Schulze, I. T. J. Infect. Dis. 1997, 176, S24–S28.
(20) Keil, W.; Niemann, H.; Schwarz, R. T.; Klenk, H. D. Virology 1984, 133, 77–91.
(21) Basak, S.; Pritchard, D. G.; Bhown, A. S.; Compans, R. W. J. Virol. 1981, 37, 549–558.
(22) Keil, W.; Geyer, R.; Dabrowski, J.; Dabrowski, U.; Niemann, H. v.; Stirml, S.; Klenk, H.-D. EMBO J. 1985, 4, 2711–2720.
Investigation of glycopeptides by MS is complicated by several factors which, along with common MS-based analysis approaches, have been discussed in the literature.26-28 First, the ionization efficiency of intact glycopeptides is lower than that of nonglycosylated peptides. Second, there may be a wide variety of glycans attached at one particular glycosylation site (multiple glycoforms). This inherent heterogeneity dilutes the analyte signal, further complicating MS-based analysis. A third complication arises in the tandem mass spectrometry (MS/MS) analysis of intact glycopeptides. In a typical MS/MS experiment, collision-induced dissociation (CID) primarily yields glycosidic bond cleavage with varying levels of peptide fragmentation. The reduced signal intensity of these precursor and product ions decreases the likelihood of identifying the bare peptide molecular weight from the MS/MS spectrum. Although these factors increase the difficulty of this type of investigation, selective sample preparation techniques and targeted precursor ion selection assist in overcoming these hurdles.
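The genome-based consensus-sequence search described above (Asn-Xxx-Ser/Thr, Xxx not proline) can be illustrated with a short script. This is only a minimal sketch of the prediction step, not the procedure used in this work; the function name `find_sequons` and its `first_residue` argument are hypothetical names introduced for illustration. As emphasized in the text, a match indicates a potential site only and says nothing about occupancy.

```python
import re
from typing import List

# N-X-S/T sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps matches zero-width after the Asn, so overlapping
# sequons (for example the "NNST" motif) are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str, first_residue: int = 1) -> List[int]:
    """Return residue numbers of Asn residues that begin an N-X-S/T sequon.

    first_residue is the residue number of the first letter of `sequence`
    (hypothetical convenience argument for peptides cut from a longer protein).
    """
    return [m.start() + first_residue for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Tryptic peptide listed for glycosite 1 in Table 1 (residues 16-38).
    print(find_sequons("SDQICIGYHANNSTEQVDTIMEK", first_residue=16))
    # Peptide listed for the bar-headed goose strain at site 3 (residues 170-177).
    print(find_sequons("NNAYPTIK", first_residue=170))
```

For the glycosite 1 peptide, the search flags both Asn26 and Asn27, which is why the sequence alone cannot say which of the two residues carries the glycan.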
Additionally, significant effort is being placed into the development of alternative ion fragmentation schemes27,29-32 and automated data analysis tools33-37 to further mature this rapidly developing field.
(23) Inkster, M. D.; Hinshaw, V. S.; Schulze, I. T. J. Virol. 1993, 67, 7436–7443.
(24) Zhang, M.; Gaschen, B.; Blay, W.; Foley, B.; Haigwood, N.; Kuiken, C.; Korber, B. Glycobiology 2004, 14, 1229–1246.
(25) Vigerust, D. J.; Ulett, K. B.; Boyd, K. L.; Madsen, J.; Hawgood, S.; McCullers, J. A. J. Virol. 2007, 81, 8593–8600.
(26) Morelle, W.; Canis, K.; Chirat, F.; Faid, V.; Michalski, J.-C. Proteomics 2006, 6, 3993–4015.
(27) Wuhrer, M.; Catalina, M. I.; Deelder, A. M.; Hokke, C. H. J. Chromatogr., B 2007, 849, 115–128.
(28) Dalpathado, D. S.; Desaire, H. Analyst 2008, 133, 731–738.
(29) Hogan, J. M.; Pitteri, S. J.; Chrisman, P. A.; McLuckey, S. A. J. Proteome Res. 2005, 4, 628–632.
(30) Adamson, J. T.; Hakansson, K. J. Proteome Res. 2006, 5, 493–501.
(31) Alley, W. R.; Mechref, Y.; Novotny, M. V. Rapid Commun. Mass Spectrom. 2009, 23, 161–170.
(32) Wu, S.-L.; Huhmer, A. F. R.; Hao, Z.; Karger, B. L. J. Proteome Res. 2007, 6, 4230–4244.
(33) Ceroni, A.; Maass, K.; Geyer, H.; Geyer, R.; Dell, A.; Haslam, S. M. J. Proteome Res. 2008, 7, 1650–1659.
(34) Go, E. P.; Rebecchi, K. R.; Dalpathado, D. S.; Bandu, M. L.; Zhang, Y.; Desaire, H. Anal. Chem. 2007, 79, 1708–1713.
(35) Goldberg, D.; Bern, M.; Parry, S.; Sutton-Smith, M.; Panico, M.; Morris, H. R.; Dell, A. J. Proteome Res. 2007, 6, 3995–4005.
(36) Irungu, J.; Go, E. P.; Dalpathado, D. S.; Desaire, H. Anal. Chem. 2007, 79, 3065–3074.
(37) Ozohanics, O.; Krenyacz, J.; Ludányi, K.; Pollreisz, F.; Vékey, K.; Drahos, L. Rapid Commun. Mass Spectrom. 2008, 22, 3245–3254.
Lectin affinity chromatography has recently been reported as an effective technique for separating hemagglutinin from influenza preparations to increase yields of the glycoprotein during vaccine manufacturing.38,39 To investigate HA glycosylation by liquid chromatography (LC)/MS/MS, the glycoprotein is digested with trypsin to produce a mixture of glycopeptides and nonglycosylated peptides that can be ionized, fragmented, and analyzed more easily than the intact glycoprotein itself. Due to the selective affinity of lectins for specific sugar isomers and linkages, a mixture of agarose-bound lectins may be required to perform an affinity separation of all glycopeptides from all nonglycosylated peptides in an HA tryptic digest. Additionally, direct coupling of lectin affinity chromatography to MS is not typically practiced due to the concentration of buffers and elution compounds required for binding and release of the glycopeptides, although recent work has shown that this barrier can be overcome using silica-based lectin microcolumns.40 Solid-phase extraction (SPE) of glycopeptides via hydrazide capture41,42 is an alternative approach since the separation is based upon a fundamental chemical difference between glycans and peptides and therefore does not discriminate between classes of glycosylation (N-linked vs O-linked) or among sugar and linkage types, although any information on the glycan portion of the captured glycopeptides is lost.
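The tryptic digestion step mentioned above can be mimicked in silico to list the peptides that would be expected from a protein sequence. The following is a minimal sketch under the common textbook rule (cleavage C-terminal to Lys or Arg, except before Pro); the function name `tryptic_digest`, the `missed_cleavages` parameter, and the toy input sequence are assumptions made for illustration and are not taken from this work. Real digests can deviate from this rule.

```python
import re
from typing import List


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Return peptides from an idealized tryptic digest.

    Cleaves after K or R unless the next residue is P, and optionally joins
    up to `missed_cleavages` neighboring fragments.
    """
    # Zero-width split points: just after K/R and not directly before P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    toy = "AKNSTYPTIKRSYNNTK"  # hypothetical toy sequence, not a real HA region
    print(tryptic_digest(toy))
    print(tryptic_digest(toy, missed_cleavages=1))
```

Allowing one or more missed cleavages enlarges the candidate list, which is one simple way to account for incomplete digestion when matching observed masses.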
Once hydrazide SPE is used to aid in the determination of glycosite occupation and identification of the peptide portion of the glycopeptides, a two-stage separation using a more broadly selective strategy such as hydrophilic interaction liquid chromatography (HILIC) can be used to preconcentrate intact glycopeptides prior to reversed-phase LC to reduce sample complexity and improve analysis of the glycans that are attached to those glycosylation sites.43
By utilizing the hydrazide SPE technique in combination with LC/MS/MS analysis, we have identified the N-linked glycosylation sites within the N-terminal ectodomain of HA that are occupied for three selected H5N1 strains. Although the specific AA sequence of the tryptic glycopeptides for the identified sites varies slightly among strains, the locations of the occupied sites in the overall protein sequence are conserved. We have used this information on glycosylation site location and occupation to interpret intact glycopeptide LC/MS/MS data for a reassortant H5N1 strain grown in embryonated chicken eggs. This combined approach is a first step toward the development of LC/MS/MS methods for determining extent of glycosylation of influenza HA which may be linked to changes in virulence and antigenicity. Such an approach may be useful for deciding which strains should be included in seasonal vaccines, how strains for the vaccines should be grown, and which strains might be implicated in a future influenza pandemic.
(38) Opitz, L.; Salaklang, J.; Buttner, H.; Reichl, U.; Wolff, M. W. Vaccine 2007, 25, 939–947.
(39) Opitz, L.; Zimmermann, A.; Lehmann, S.; Genzel, Y.; Lübben, H.; Reichl, U.; Wolff, M. W. J. Virol. Methods 2008, 154, 61–68.
(40) Madera, M.; Mechref, Y.; Novotny, M. V. Anal. Chem. 2005, 77, 4081–4090.
(41) Zhang, H.; Li, X.-j.; Martin, D. B.; Aebersold, R. Nat. Biotechnol. 2003, 21, 660–666.
(42) Tian, Y.; Zhou, Y.; Elliott, S.; Aebersold, R.; Zhang, H. Nat. Protoc. 2007, 2, 334–339.
(43) Zhang, Y.; Go, E. P.; Desaire, H. Anal. Chem. 2008, 80, 3144–3158.
METHODS
Virus Samples. A monovalent (i.e., single strain) subvirion H5N1 vaccine (research purposes only) was obtained through the
National Institutes of Health’s (NIH) Biodefense and Emerging Infections Research Resources Repository, National Institute of Allergy and Infectious Diseases (NIAID) (rgA/Vietnam/1203/2004, NR-4143). Formalin-inactivated whole-virus reagents representing three different viral strains (H5N1 reference antigens no. 50: rgA/Vietnam/1203/2004; no. 59: A/Indonesia/05/2005; and no. 62: A/Bar-headed goose/Qinghai Lake/1A/05) were obtained from the Center for Biologics Evaluation and Research (CBER), Food and Drug Administration. β-Propiolactone (BPL)-inactivated whole virus (Ind05/PR8-RG2) grown in embryonated chicken eggs was obtained from the National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC). This reassortant virus includes the HA and neuraminidase (NA) from an H5N1 strain (A/Indonesia/05/2005) with all other viral proteins belonging to a nonpathogenic H1N1 strain (A/Puerto Rico/8/34). Note that in the reassortant virus the HA1-HA2 cleavage site has been modified so that there are no arginine or lysine residues present to ensure complete removal of pathogenicity.
Enzymatic Digestion of Split Virus Vaccine and Whole Inactivated Virus Reagents. Virus and vaccine samples were buffer-exchanged using 10 kDa molecular weight cutoff (MWCO) spin-filters (Amicon Ultra-4, Millipore, Billerica, MA) prior to proteolysis. Buffer-exchanged samples were resuspended with 0.1% Rapigest (Waters Corporation, Milford, MA) in 50 mM ammonium bicarbonate (aq) to ensure complete solubilization. Samples were then heated at 100 °C for 10 min to aid in protein unfolding. Dithiothreitol (DTT) (aq) (Sigma-Aldrich, St. Louis, MO) was added to reduce any disulfide bonds present (10 mM DTT final concentration, 1 h, room temperature) followed by addition of iodoacetamide (IAM) (aq) (Sigma-Aldrich, St. Louis, MO) to alkylate the reduced cysteine residues (13 mM IAM final concentration, 1 h in dark, room temperature).
Additional DTT was added to quench any unreacted IAM (23 mM DTT total final concentration). Typically, 50 µL (20 µg; ∼860 pmol) sequencing grade modified trypsin (Promega Corp., Madison, WI) was added followed by incubation at 37 °C for 4-6 h to ensure complete protein digestion. Assuming the molecular weight of HA is approximately 76 kDa, 0.59-4.11 nmol of HA was used in each digestion reaction. When using Rapigest, it is critical that the surfactant is properly degraded to remove interference with subsequent chromatographic separations.44 Samples were incubated for a maximum of 1 h at room temperature following addition of HCl (final concentration 40-50 mM HCl; pH < 2.0) to ensure complete degradation of any Rapigest present. Samples were then centrifuged to remove the insoluble degradation product, although the precipitate did not always collect on the bottom of the sample tube. The presence of the precipitate was not observed to interfere with downstream C18 SPE cleanup.
(44) Yu, Y. Q.; Gilar, M.; Lee, P. J.; Bouvier, E. S.; Gebler, J. C. Anal. Chem. 2003, 75, 6023–6028.
Isolation of Glycopeptides for Glycosylation Site Identification. Intact glycopeptides were isolated via the reaction of hydrazide resin with oxidized glycans following a published protocol.41,42 Briefly, excess DTT, IAM, and degradation products of Rapigest were removed from the HA tryptic digest mixture using C18 SPE (Sep-Pak Vac 1 cm³ [100 mg] C18 cartridges,
Waters Corp., Milford, MA). The cartridge was washed with 3 × 1 mL of 0.1% trifluoroacetic acid (TFA) (Thermo Fisher Scientific, Inc., Rockford, IL) in 50% acetonitrile (HPLC grade, Honeywell Burdick & Jackson, Morristown, NJ) followed by equilibration with 2 × 1 mL of 0.1% TFA in water (HPLC grade, Honeywell Burdick & Jackson, Morristown, NJ). Sample was loaded onto the cartridge along with all vial washes to minimize sample loss. The loaded cartridge was washed with 3 × 1 mL of 0.1% TFA(aq). Sample was eluted with 2 × 200 µL of 0.1% TFA in 50% acetonitrile. Freshly made NaIO4(aq) (CAUTION, strong oxidizer) (Bio-Rad Laboratories, Hercules, CA) was added to the sample (12 mM final concentration, 1 h in dark, 4 °C) to oxidize the glycan portions of any glycopeptides present. A second round of C18 SPE was utilized to remove the oxidizing agent. Samples were eluted with 3 × 200 µL of 0.1% TFA in 80% acetonitrile directly into a tube containing 50 µL of 50% slurry of terminal hydrazide (Hz) modified agarose gel (AffiGel Hz, Bio-Rad Laboratories, Hercules, CA) that had been washed using 1 mL of H2O. The coupling reaction between the oxidized glycopeptides and the Hz gel was performed overnight with end-over-end rotation at room temperature. Following incubation, the Hz gel was washed with 3 × 800 µL of H2O followed by 3 × 800 µL of 50 mM NH4HCO3(aq) to remove any nonglycosylated peptides present in the tryptic digest mixture. The Hz gel was then resuspended in 50 µL of 50 mM NH4HCO3(aq) along with 3 µL of glycopeptidase F (PNGase F, glycerol free, 500 U/µL, New England Biolabs, Inc., Ipswich, MA) with end-over-end rotation at 37 °C for at least 4 h to release the peptide portion of any bound glycopeptides. The tryptic glycopeptides released from the Hz gel via enzymatic deglycosylation were collected, dried, and reconstituted for direct analysis by LC/MS/MS.
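Because PNGase F release converts the formerly glycosylated Asn to Asp, the released peptide is expected to be about 0.984 Da heavier than the unmodified sequence, and alkylation with iodoacetamide adds a carbamidomethyl group to each Cys. The sketch below computes monoisotopic peptide masses under those two assumptions. It uses standard monoisotopic residue masses; the function names `peptide_mass` and `deglycosylate` are hypothetical and are not part of any software named in this work.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide for the free termini
CARBAMIDOMETHYL = 57.02146  # mass added to Cys by iodoacetamide alkylation


def peptide_mass(sequence: str, alkylated: bool = True) -> float:
    """Monoisotopic neutral mass of a peptide, optionally with alkylated Cys."""
    mass = WATER + sum(MONO[aa] for aa in sequence)
    if alkylated:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass


def deglycosylate(sequence: str, site_index: int) -> str:
    """Return the sequence with the Asn at 0-based site_index written as Asp."""
    if sequence[site_index] != "N":
        raise ValueError("site_index does not point at an Asn residue")
    return sequence[:site_index] + "D" + sequence[site_index + 1:]


if __name__ == "__main__":
    peptide = "NGTYDYPQYSEEAR"  # glycosite 6 peptide listed in Table 1
    released = deglycosylate(peptide, 0)
    shift = peptide_mass(released) - peptide_mass(peptide)
    print(released, round(shift, 3))
```

The small mass shift is the same as that produced by spontaneous deamidation, which is why the selectivity of the capture step matters for unambiguous site assignment.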
Note that the glycan portion of any captured glycopeptide remains attached to the Hz gel when using this sample preparation strategy.
LC/MS/MS Conditions and Data Analysis. Custom HILIC columns (300 µm × 150 mm) were packed with a methanol slurry of TSKgel Amide-80 particles (5 µm) obtained from a 4.6 mm i.d. × 10 cm column (Tosoh Bioscience LLC, Montgomeryville, PA) at ≤1500 psi. Chromatography was performed using an Agilent 1100 capillary flow system (Agilent Technologies, Inc., Santa Clara, CA) at a flow rate of 10 µL/min. Mobile phase A was 0.1% formic acid (Sigma-Aldrich, St. Louis, MO) in acetonitrile, and mobile phase B was 0.1% formic acid in water. Samples were injected in 0.1% TFA in 80% acetonitrile. Nonglycosylated peptides were washed from the column for 10 min at 20% B followed by a quick ramp to 95% B in 2.5 min to elute the glycopeptides isocratically while holding at 95% B for 20 min. The column was re-equilibrated at 20% B in 15 min. The HILIC-retained glycopeptide fraction was collected and dried to be further separated using reversed-phase chromatography.
For reversed-phase chromatography, HILIC-concentrated intact glycopeptides or deglycosylated hydrazide SPE-captured glycopeptides were injected in 0.1% TFA(aq) onto a commercial C18 column (3.5 µm particle, 300 µm i.d. × 150 mm, Symmetry300 NanoEase column, Waters Corporation, Milford, MA) coupled to a Waters QTOF Premier mass spectrometer (Waters Corporation, Milford, MA) for tandem MS analysis. Mobile phase A was 0.1% formic acid in water, and mobile phase B
Table 1. Predicted Tryptic Peptides Containing Potential N-Linked Glycosylation Sites from the H5N1 Influenza Strains Investigated (a)

| glycosite | residue | region | strain | amino acid sequence |
|---|---|---|---|---|
| 1 | 16-38 | HA1 | b | (K) SDQICIGYHANNSTEQVDTIMEK (N) |
| 1 | 16-38 | HA1 | c | (K) SDQICIGYHANNSTEQVDTIMEK (N) |
| 1 | 16-38 | HA1 | d | (K) SDQICIGYHANNSTEQVDTIMEK (N) |
| 2 | 39-51 | HA1 | b | (K) NVTVTHAQDILEK (K) |
| 2 | 39-51 | HA1 | c | (K) NVTVTHAQDILEK (T) |
| 2 | 39-51 | HA1 | d | (K) NVTVTHAQDILEK (T) |
| 3 | 170-177 | HA1 | b | (K) NSTYPTIK (R) |
| 3 | 170-177 | HA1 | c | (K) NSTYPTIK (K) |
| 3 | 170-177 | HA1 | d | (K) NNAYPTIK (R) |
| 4 | 179-205 | HA1 | b | (R) SYNNTNQEDLLVLWGIHHPNDAAEQTK (L) |
| 4 | 179-205 | HA1 | c | (K) SYNNTNQEDLLVLWGIHHPNDAAEQTR (L) |
| 4 | 179-205 | HA1 | d | (R) SYNNTNQEDLLVLWGIHHPNDAAEQTR (L) |
| 5 | 294-320 | HA1 | b | (K) CQTPMGAINSSMPFHNIHPLTIGECPK (Y) |
| 5 | 294-320 | HA1 | c | (K) CQTPMGAINSSMPFHNIHPLTIGECPK (Y) |
| 5 | 294-320 | HA1 | d | (K) CQTPIGAINSSMPFHNIHPLTIGECPK (Y) |
| 6 | 500-513 | HA2 | b | (R) NGTYDYPQYSEEAR (L) |
| 6 | 500-513 | HA2 | c | (R) NGTYNYPQYSEEAR (L) |
| 6 | 500-513 | HA2 | d | (R) NGTYDYPQYSEEAR (L) |
| 7 | 524-565 | CYT | b | (K) LESIGIYQILSIYSTVASSLALAIMVAGLSLWMCSNGSLQCR (I) |
| 7 | 524-565 | CYT | c | (K) LESIGTYQILSIYSTVASSLALAIMMAGLSLWMCSNGSLQCR (I) |
| 7 | 524-565 | CYT | d | (K) LESIGTYQILSIYSTVASSLALAIMVAGLSLWMCSNGSLQCR (I) |

a Potential N-linked glycosylation sites are underlined and in bold. Amino acid differences are italicized and in bold. Note that these are only the amino acids that are different in the potential tryptic glycopeptides and should not be considered the only differences among the selected strains.
b rgA/Vietnam/1203/2004 [AAW80717].
c A/Indonesia/05/2005 [ABW06108].
d A/Bar-headed goose/Qinghai Lake/1A/05 [no sequence in database]; sequence from A/Bar-headed goose/Qinghai/61/05 [ABE68926] used.
HA1 = globular head region; HA2 = stem region; CYT = cytosolic tail region.
was 0.1% formic acid in acetonitrile. Chromatography was performed at a flow rate of 10 µL/min with a linear gradient of 5-50% B in 55 or 110 min following an initial 5 min hold at 5% B.
The PepSeq program within the MassLynx software package (Waters Corporation, Milford, MA) was used for de novo sequence analysis of peptide MS/MS data that had been deconvoluted using the MaxEnt 3 function. Sequence tags determined from PepSeq were used in MS-Pattern of Protein Prospector45 to search the nonredundant database of the National Center for Biotechnology Information (NCBI) for protein identification. Protein sequences for specific strains were retrieved and downloaded from NCBI’s Influenza Virus Resource46 and aligned using MUSCLE.47 PepSeq was used along with ExPASy’s GlycoMod Tool48 to predict potential glycan compositions from glycan masses determined from MS/MS analysis of intact glycopeptides. Monosaccharides selected in GlycoMod Tool for consideration as potentially present were chosen based on species historically observed for influenza HA18 (allowed hexose, N-acetylhexosamine, deoxyhexose, N-acetylneuraminic acid, and sulfate; disallowed N-glycolylneuraminic acid, deaminated neuraminic acid, glucuronic acid, pentose, and phosphate).
(45) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871–2882.
(46) Bao, Y.; Bolotov, P.; Dernovoy, D.; Kiryutin, B.; Zaslavsky, L.; Tatusova, T.; Ostell, J.; Lipman, D. J. Virol. 2008, 82, 596–601.
(47) Edgar, R. Nucleic Acids Res. 2004, 32, 1792–1797.
(48) Cooper, C. A.; Gasteiger, E.; Packer, N. H. Proteomics 2001, 1, 340–349.
RESULTS AND DISCUSSION
Significant effort has been made recently to determine amino acid differences among subtypes and strains with a focus on the selection of peptide targets that are conserved within a subtype
for the LC/MS/MS quantification of hemagglutinin.49,50 Differences in the peptides containing potential glycosylation sites were not investigated in those studies. In this study, protein sequences were first examined to check for interstrain variation in the number and location of potential glycosylation sites within the three H5N1 strains selected. Sequence analysis can be used as a prediction tool but not as a definitive answer on the actual presence of glycosylation at a particular site. Protein sequences were available in the database for the two human strains investigated; however, the specific bar-headed goose strain was not. From the sequences available, there are six potential N-linked glycosylation sites within the N-terminal ectodomain of HA for the human strains (Table 1). On the basis of the 13 bar-headed goose sequences in the database, there are only five sites predicted within this same portion of the protein. Although there is an additional predicted glycosylation site inside the viral membrane near the C-terminus for both the human and the bar-headed goose strains, the transmembrane and cytosolic portions of HA are not typically of interest for binding studies involving an influenza virion to a host cell. Additionally, the cytosolic tail of the protein is known to be palmitoylated, and observation of peptides from this portion of the protein was not expected based on the sample preparation strategy chosen.
(49) Williams, T. L.; Luna, L.; Guo, Z.; Cox, N. J.; Pirkle, J. L.; Donis, R. O.; Barr, J. R. Vaccine 2008, 26, 2510–2520.
(50) Luna, L. G.; Williams, T. L.; Pirkle, J. L.; Barr, J. R. Anal. Chem. 2008, 80, 2688–2693.
Sequences for the entire subset of human H5N1 strains from Vietnam (47 sequences) and Indonesia (106 sequences) and for all bar-headed goose strains (13 sequences) were aligned to determine conservation of the potential tryptic glycopeptides predicted from the reagent strain sequences (Table 2). In the case of the bar-headed goose subset, the 13 available sequences were identical for the five predicted tryptic glycopeptides. For the missing potential glycopeptide (site 3), 11 of the strains had the sequence reported in Table 1 (two strains had Asp170 instead of Asn170). For all three reagent strains examined, the exact position of glycosylation site 1 cannot be predicted from the AA sequence alone since either Asn26 or Asn27 could potentially have a glycan attached. It is unlikely that both would be occupied due to steric hindrance, and MS/MS should be able to determine which Asn residue is the actual site of glycosylation. For the Vietnam and Indonesia strain subsets, the AA sequence for the peptide containing potential glycosylation site 4 shows the least amount of sequence conservation, suggesting that this region of the protein is more susceptible to mutation than the other potential glycosylation sites. Additionally, the bar-headed goose strain subset was found to have 100% sequence conservation for the five predicted glycosylation sites. However, when the hydrazide SPE method was used in our studies a sixth, nonpredicted site was unveiled, suggesting a region of the protein that may readily undergo mutation.

Table 2. Potential Glycosylation Site Conservation for H5N1 Influenza Strain Subsets
Values are percent (number) of strains with sequence conserved at each glycosite.

| strain subset (total strains) | site 1 | site 2 | site 3 | site 4 | site 5 | site 6 | site 7 |
|---|---|---|---|---|---|---|---|
| Vietnam (47) | 100 (47) | 100 (47) | 91.5 (43) | 55.3 (26) | 91.5 (43) | 95.7 (45) | 89.4 (42) |
| Indonesia (106) | 99.1 (105) | 98.1 (104) | 84.9 (90) | 73.6 (78) | 98.1 (104) | 99.1 (105) | 51.9 (55) |
| bar-headed goose (13) | 100 (13) | 100 (13) | not predicted (13) | 100 (13) | 100 (13) | 100 (13) | 100 (13) |

Initially we tried a two-stage sample fractionation approach without first determining the peptide portion of the glycopeptides present. Direct MS/MS analysis of glycopeptides generated from digestion of whole inactivated H5N1 influenza was complicated by the fact that CID tends to be dominated by the formation of product ions generated from fragmentation of the glycan portion of the molecule.51,52 In the best cases this analysis yielded the bare peptide molecular weight and sometimes confirmatory peptide fragment ions. Unfortunately, we found that in our hands influenza hemagglutinin does not form only truly tryptic fragments but also undergoes nonspecific and missed cleavages among other modifications.
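The percent-conserved values in Table 2 amount to an exact-match count of each predicted tryptic glycopeptide across the aligned strain sequences. The sketch below reproduces one entry (site 4 of the Vietnam subset, 26 of 47 strains) under that assumption. The function name `percent_conserved` is hypothetical, and the list of 21 variant peptides is an invented stand-in, since the individual strain sequences are not given in the text.

```python
from typing import List


def percent_conserved(reference: str, aligned_peptides: List[str]) -> float:
    """Percent of aligned peptides identical to the reference, to 0.1%."""
    matches = sum(1 for peptide in aligned_peptides if peptide == reference)
    return round(100.0 * matches / len(aligned_peptides), 1)


if __name__ == "__main__":
    # Site 4 peptide of the Vietnam reagent strain (Table 1).
    reference = "SYNNTNQEDLLVLWGIHHPNDAAEQTK"
    # Placeholder variant used only to fill the non-matching strains;
    # the real variants in the 47-strain subset are not listed in the text.
    variant = "SYNNTNQEDLLVLWGIHHPNDAAEQTR"
    subset = [reference] * 26 + [variant] * 21
    print(percent_conserved(reference, subset))
```

An exact-match criterion is strict: a single substitution anywhere in the peptide counts as nonconserved even when the sequon itself is intact, so these percentages describe conservation of the tryptic peptide rather than of the glycosylation site alone.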
Additionally, protein sequence information for many strains and serotypes is not available, further complicating glycosylation site identification from the MS/MS data of intact HA glycopeptides alone. Since we were interested in all of the occupied glycosylation sites within the N-terminal ectodomain of the protein, we found it highly beneficial to first experimentally determine the products of our specific tryptic digest conditions before attempting to interpret the intact glycopeptide MS/MS data (especially since this data interpretation is primarily a manual task).
(51) Conboy, J. J.; Henion, J. D. J. Am. Soc. Mass Spectrom. 1992, 3, 804–814.
(52) Medzihradszky, K. F.; Gillece-Castro, B. L.; Settineri, C. A.; Townsend, R. R.; Masiarz, F. R.; Burlingame, A. L. Biol. Mass Spectrom. 1990, 19, 777–781.
Figure 1. Workflow used for determining N-linked glycosylation site occupation and for investigating the size of glycans attached at particular sites on HA.
The general workflow used to accomplish this task for influenza HA is depicted in Figure 1. Terminal hydrazide groups bound to agarose gel selectively bind aldehydes and ketones to form a stable hydrazone bond. The use of a low concentration of NaIO4 to oxidize vicinal diols (not typically found in peptides) present in the glycan portions of glycopeptides allows this method to discriminate between glycosylated and nonglycosylated species in tryptic digests of the selected influenza samples. The deglycosylated forms of the glycopeptides isolated by hydrazide SPE were de novo sequenced, and glycosylation site occupation was confirmed. Once the identity of the tryptic peptides that contain occupied glycosylation sites was determined, a second tryptic digest sample that was not treated with NaIO4 was fractionated with HILIC and analyzed using LC/MS/MS to determine the size of the glycan portion of each glycopeptide for the reassortant egg-grown strain (Ind05/PR8-RG2).
Although the total number of unique proteins present in the whole inactivated H5N1 influenza virus samples should be relatively small (influenza RNA codes for only 11 proteins), the resulting chromatogram from the LC/MS/MS analysis of the tryptic digest mixture is still rather complex. Also, there is a much higher abundance of matrix protein 1 (M1) and nucleoprotein (NP) than of HA and NA. This makes searching for signal from intact glycopeptides or their deglycosylated counterparts difficult if no discrimination between glycosylated and nonglycosylated peptides is utilized. Figure 2 demonstrates the reduction in sample complexity gained through use of hydrazide SPE to separate glycopeptides from nonglycosylated peptides. Essentially the only species observed following this cleanup method are peptides resulting from capture of the occupied glycosylation sites. The hydrazide reaction with oxidized glycans is highly specific; it allows for the observation of low-abundance peptide modifications that might not be typically observed if deglycosylation without a prior separation step had been performed.
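Determining the size of the glycan portion of a glycopeptide, as described above, leads naturally to the question of which monosaccharide compositions fit that mass. The following is a minimal brute-force sketch of that kind of search, restricted to the residue types allowed in the Methods (hexose, N-acetylhexosamine, deoxyhexose, N-acetylneuraminic acid, sulfate). It is not the GlycoMod algorithm; the function name `glycan_compositions`, the upper count limits, and the default tolerance are arbitrary assumptions chosen for illustration. The input is the attached-glycan mass, taken here as intact glycopeptide mass minus bare peptide mass, so anhydro (residue) masses are summed.

```python
from itertools import product
from typing import Dict, List

# Monoisotopic residue (anhydro) masses in Da.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
    "Sulfate": 79.95682,
}
# Assumed upper bounds on each residue count (illustrative only).
MAX_COUNT = {"Hex": 12, "HexNAc": 8, "dHex": 3, "NeuAc": 4, "Sulfate": 2}


def glycan_compositions(glycan_mass: float, tol: float = 0.05) -> List[Dict[str, int]]:
    """List compositions whose summed residue mass is within tol of glycan_mass."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(*(range(MAX_COUNT[n] + 1) for n in names)):
        mass = sum(c * RESIDUE_MASS[n] for c, n in zip(counts, names))
        if abs(mass - glycan_mass) <= tol:
            # Report only the residues actually present.
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits


if __name__ == "__main__":
    # Mass of five hexoses plus two N-acetylhexosamines, used as a test input.
    target = 5 * RESIDUE_MASS["Hex"] + 2 * RESIDUE_MASS["HexNAc"]
    for composition in glycan_compositions(target, tol=0.02):
        print(composition)
```

A composition match is not a structure: isomeric monosaccharides and linkage patterns share the same mass, so the output should be read as a list of candidates to be weighed against known N-glycan biosynthesis.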
In addition, since the hydrazide SPE method is selective for occupied glycosylation sites, de novo interpretation Analytical Chemistry, Vol. 81, No. 8, April 15, 2009
3113
Figure 2. (a) Base peak intensity (BPI) chromatogram for a tryptic digest of the A/Vietnam/1203/2004 whole inactivated virus reagent without any sample treatment prior to liquid chromatography/tandem mass spectrometry (LC/MS/MS) analysis. (b) BPI chromatogram of a tryptic digest from the same A/Vietnam/1203/2004 strain with LC/ MS/MS analysis performed after isolation of all tryptic glycopeptides by hydrazide solid-phase extraction. Ions observed were the deglycosylated forms of the captured glycopeptides. Numbers 1-6 correspond to the predicted glycosylation sites that were identified as occupied.
of the deglycosylated peptides generated by the enzymatic release of the captured glycopeptides becomes a simpler task. Occupied glycosylation sites can be identified even in the case of other modifications, missed/nonspecific cleavages, or unknown peptide sequences. By using the hydrazide SPE technique described, we found that our specific proteolysis conditions consistently yielded a mixture of digestion products for many of the occupied glycosylation sites for the strains investigated, hence the observation of multiple peaks in the chromatogram in Figure 2b for several of the glycosylation sites. This information underscores the need in this application for determining the peptide portion generated from enzymatic digestion of HA before attempting to interpret intact glycopeptide mass spectra. Similar chromatograms were also generated for digests of the other H5N1 strains after being processed by the hydrazide SPE method (data not shown). When deglycosylation is performed without first separating out glycosylated species, deamidation can occur in nonglycosylated peptides under normal sample handling conditions causing potential misidentifications of N-linked glycosylation sites. 18O labeling during deglycosylation has been proposed as a method of avoiding this potential ambiguity in glycosylation site identification53 and has been utilized in the analysis of pathogen glycosylation in combination with concanavalin A preconcentration of glycopeptides.54 However, the sample must be completely free of trypsin in these experiments or nonspecific incorporation can (53) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431–1440. (54) Atwood, J. A., III.; Minning, T.; Ludolf, F.; Nuccio, A.; Weatherly, D. B.; Alvarez-Manilla, G.; Tarleton, R.; Orlando, R. J. Proteome Res. 2006, 5, 3376– 3384.
Analytical Chemistry, Vol. 81, No. 8, April 15, 2009
Figure 3. Mass spectra of deglycosylated tryptic peptides containing occupied glycosylation sites for A/Vietnam/1203/2004 (H5N1). Numbers refer to glycosylation sites 1-6 as predicted from the amino acid sequence (see Table 1). Note that for each glycosylation site (underlined and in bold), Asn has been converted to Asp via deglycosylation during the glycopeptide release step of the hydrazide solid-phase extraction method. Mox = oxidized methionine; Cam = carboxyamidomethyl cysteine; Camox = carboxyamidomethyl cysteine sulfoxide; * = cone voltage induced fragment ion of 2a.
occur.55 The use of hydrazide SPE removes the possibility that an observed Asn-to-Asp conversion resulted from deamidation rather than deglycosylation, since nonglycosylated species are not retained by the Hz gel or, if they are, they will not be released by using an endoglycosidase such as PNGase F. Hydrazide SPE was used to isolate glycopeptides from the three selected H5N1 strains independently. Figure 3 shows the survey scan mass spectra for the glycopeptides isolated from the Vietnam whole-virus reagent sample following reduction and alkylation of disulfide bonds, proteolysis with trypsin, and oxidation of the glycans present. The peptides observed in this analysis were the deglycosylated forms, and the glycosylation site was identified by the presence of an Asp within the N-linked consensus sequence (Asp-Xxx-Ser/Thr where Xxx ≠ Pro) instead of an Asn. Similar spectra were observed for tryptic digests of the other whole-virus reagent samples when the hydrazide SPE method was used (data not shown). De novo sequencing of the hydrazide-captured glycopeptides confirmed that all of the predicted N-linked glycosylation sites (Table 1) were in fact occupied for the three different H5N1 strains investigated. By isolating the glycopeptides using the Hz gel, we were able to identify all occupied N-linked glycosylation sites within the N-terminal ectodomain of HA, even when unpredicted modifications or missed/nonspecific tryptic cleavages had occurred. In the case of glycosylation site 1, de novo sequencing of the MS/MS data identified that the glycan was in fact attached to Asn27 and not Asn26 for all three strains.
(55) Angel, P. M.; Lim, J.-M.; Wells, L.; Bergmann, C.; Orlando, R. Rapid Commun. Mass Spectrom. 2007, 21, 674–682.
We were also able to confirm that glycosylation site 3 was present
in the bar-headed goose strain even though the sequence of this particular strain had not been deposited in the database and subsequent prediction of that glycosylation site was not possible from the other available bar-headed goose strains. The sequence of this additional site in the selected bar-headed goose strain was determined to be NNTYPTIK. Additionally, all of the glycosylation-site-containing peptides for the bar-headed goose sample were found to match those predicted by the other bar-headed goose sequences available in the database, with the exception of site 3. Although de novo analysis of MS/MS data is more labor-intensive than standard database search methods, the information that is provided by manually generating sequence tags that are then used to perform directed database searches is too valuable to neglect. It therefore becomes important to reduce sample complexity as much as possible prior to LC/MS/MS analysis, to keep accurate records of peptides identified via de novo sequencing, and to actively use the include/exclude list settings of the MS/MS scan functions. While sequencing the HA peptides captured by the hydrazide SPE method, we observed multiple modifications caused by the sample processing steps used. These included the oxidation of methionine,56-58 although no similar oxidation of tryptophan59 was observed in our samples. A portion of the carboxyamidomethyl cysteine residues present following reduction and alkylation was also oxidized in a similar fashion.60 Additionally, N-terminal carboxyamidomethyl cysteine was observed as a cyclic form in some instances.61 Initially, no steps were taken to reduce and alkylate any disulfide bonds present in HA in order to keep sample handling and data interpretation as simple as possible; however, insufficient peptide fragmentation of suspected tryptic glycopeptides for glycosylation site 5 following deglycosylation indicated that an internal disulfide bond might be present.
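The consensus-sequence rule used above to predict N-linked sites (Asn-Xxx-Ser/Thr, Xxx ≠ Pro) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the reported workflow; the peptides are the site 3 sequences quoted in this paper plus two short illustrative strings. A sequence scan only lists candidate sites. For adjacent candidates such as Asn26/Asn27, the occupied residue still has to be assigned from MS/MS data, as was done here.

```python
import re

# N-linked consensus sequon: Asn-Xxx-Ser/Thr with Xxx != Pro.
# The zero-width lookahead reports overlapping candidates (e.g. in "NNST").
SEQUON = re.compile(r"(?=N[^P][ST])")

# Monoisotopic mass gain (Da) when Asn is converted to Asp on deglycosylation.
ASN_TO_ASP_SHIFT = 0.984016


def find_sequons(sequence: str) -> list[int]:
    """Return 0-based indices of Asn residues that start a sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def mark_deglycosylated(sequence: str, site: int) -> str:
    """Write the released peptide with Asn at `site` replaced by Asp."""
    if sequence[site] != "N":
        raise ValueError("site must point at an Asn residue")
    return sequence[:site] + "D" + sequence[site + 1:]


if __name__ == "__main__":
    for peptide in ("NNTYPTIK", "NSTYPTIK", "NNST", "NPSK"):
        print(peptide, find_sequons(peptide))
```

For NNST the scan returns both Asn positions, which is the ambiguity that hydrazide capture followed by de novo sequencing resolves experimentally.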
Including a reduction and alkylation step was found to allow better fragmentation of this peptide and was therefore incorporated into the protocol. Having determined which glycosylation sites were actually occupied for HA from the three H5N1 reagent strains, knowledge of the glycosylated peptides produced by the selected digestion conditions and their elution order in reversed-phase LC separation was then used to aid in the analysis of intact glycopeptides from an egg-grown reassortant virus (Ind05/PR8-RG2). When glycopeptides are analyzed by MS/MS, unique fragment ions are produced in the low m/z range that are indicative of the presence of glycosylation.51 These signature oxonium ions have been utilized for determining the presence of glycosylated species when analyzing tryptic digests of proteins.62-64
(56) Bouchon, B.; Jaquinod, M.; Klarskov, K.; Trottein, F.; Klein, M.; Van Dorsselaer, A.; Bischoff, R.; Roitsch, C. J. Chromatogr., B 1994, 662, 279–290.
(57) Lagerwerf, F. M.; Weert, M. v. d.; Heerma, W.; Haverkamp, J. Rapid Commun. Mass Spectrom. 1996, 10, 1905–1910.
(58) Mo, W.; Ma, Y.; Takao, T.; Neubert, T. A. Rapid Commun. Mass Spectrom. 2000, 14, 2080–2081.
(59) Taylor, S. W.; Fahy, E.; Murray, J.; Capaldi, R. A.; Ghosh, S. S. J. Biol. Chem. 2003, 278, 19587–19590.
(60) Yagüe, J.; Núñez, A.; Boix, M.; Esteller, M.; Alfonso, P.; Casal, J. I. Proteomics 2005, 5, 2761–2768.
(61) Geoghegan, K. F.; Hoth, L. R.; Tan, D. H.; Borzilleri, K. A.; Withka, J. M.; Boyd, J. G. J. Proteome Res. 2002, 1, 181–187.
(62) Carr, S. A.; Huddleston, M. J.; Bean, M. F. Protein Sci. 1993, 2, 183–196.
(63) Sullivan, B.; Addona, T. A.; Carr, S. A. Anal. Chem. 2004, 76, 3112–3118.
(64) Olivova, P.; Chen, W.; Chakraborty, A. B.; Gebler, J. C. Rapid Commun. Mass Spectrom. 2008, 22, 29–40.
A list of some typical glycopeptide oxonium ions that may be observed in MS/
Table 3. Oxonium Ions Potentially Observed in Tandem Mass Spectrometry Spectra

[M + H]+ (m/z) [mono]   MW (Da) [mono]   composition (a)
147.066                 146.058          dHex
163.061                 162.053          Hex
168.066                 167.058          HexNAc-2H2O
186.077                 185.069          HexNAc-H2O
204.087                 203.079          HexNAc
274.093                 273.085          NeuAc-H2O
292.103                 291.095          NeuAc
366.140                 365.132          Hex(HexNAc)
407.167                 406.159          (HexNAc)2
528.193                 527.185          Hex2(HexNAc)
657.235                 656.228          NeuAc(Hex)(HexNAc)

(a) dHex = deoxyhexose; Hex = hexose; HexNAc = N-acetylhexosamine; NeuAc = N-acetylneuraminic acid (sialic acid).
MS spectra is included in Table 3. On the QTOF MS used in the experiments reported herein, a high/low expression scan function is available that allows monitoring for these oxonium ions by alternating the energy applied to the collision cell without performing a precursor ion selection. This function records all potential precursor ions in one scan (low energy) followed by all of the product ions in the subsequent scan (high energy). This allows fragmentation of low-abundance glycopeptides that would not otherwise be selected for fragmentation in a typical data-dependent analysis (DDA) experiment. This scan type can be used to quickly check for the presence of glycopeptides in a sample and to determine LC retention times of the intact glycopeptides. When it is not practical to obtain the amount of sample required for full glycan analysis, or when the exact glycan sequence is not needed, an accurate glycan mass obtained from MS/MS of the intact glycopeptides can be used to predict glycan composition. Combining the data from our LC/MS/MS analyses of tryptic digests of the H5N1 influenza samples both before and after enzymatic deglycosylation yielded the intact glycopeptide, bare peptide, and glycan masses as well as the peptide sequence for the selected reassortant virus grown in eggs. A broadly selective preconcentration step was utilized prior to analyzing intact glycopeptides from tryptic digests of this sample. In-house packed HILIC columns were chosen to perform a rough separation of the intact glycopeptides from nonglycopeptides based on reports in the literature.65,66 The separation relies upon interactions of the hydrophilic glycan portion of the glycopeptides with carbamoyl functional groups on the HILIC stationary phase. Conveniently, the same solvents and chromatography system were used for this separation as were utilized in our reversed-phase separations.
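As an illustration of how the high-energy scans could be screened for the oxonium ions in Table 3, the following Python sketch flags peaks that fall within a mass tolerance of the tabulated m/z values. It is not the instrument software's algorithm. The 0.02 Da tolerance, the function names, and the peak list in the usage example are assumptions made for illustration, and requiring both m/z 204.087 and 366.140 is one possible reading of the two-ion monitoring described in this paper.

```python
# Oxonium ion [M + H]+ values (m/z, monoisotopic) copied from Table 3.
OXONIUM_IONS = {
    "dHex": 147.066,
    "Hex": 163.061,
    "HexNAc-2H2O": 168.066,
    "HexNAc-H2O": 186.077,
    "HexNAc": 204.087,
    "NeuAc-H2O": 274.093,
    "NeuAc": 292.103,
    "Hex(HexNAc)": 366.140,
    "(HexNAc)2": 407.167,
    "Hex2(HexNAc)": 528.193,
    "NeuAc(Hex)(HexNAc)": 657.235,
}


def find_oxonium_ions(peaks, tol_da=0.02, min_intensity=0.0):
    """Return (label, observed m/z, intensity) for peaks matching Table 3.

    `peaks` is an iterable of (m/z, intensity) pairs from one high-energy scan.
    `tol_da` is an assumed tolerance, not a value taken from the paper.
    """
    hits = []
    for mz, intensity in peaks:
        if intensity < min_intensity:
            continue
        for label, reference in OXONIUM_IONS.items():
            if abs(mz - reference) <= tol_da:
                hits.append((label, mz, intensity))
    return hits


def looks_like_glycopeptide(peaks, tol_da=0.02):
    """Assumed rule: both monitored ions (m/z 204.087 and 366.140) are present."""
    labels = {label for label, _, _ in find_oxonium_ions(peaks, tol_da)}
    return {"HexNAc", "Hex(HexNAc)"} <= labels


if __name__ == "__main__":
    scan = [(204.085, 1200.0), (366.142, 800.0), (512.300, 50.0)]  # hypothetical
    print(find_oxonium_ions(scan))
    print(looks_like_glycopeptide(scan))
```

Summing the matched intensities scan by scan would give a trace comparable in spirit to an extracted ion chromatogram for the two monitored ions.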
This allowed for optimal separation conditions to be determined by monitoring the LC output for oxonium ions selected from those listed in Table 3 (namely, m/z 204.087 and 366.140). Although the HILIC packing material was not ideal for performing high-resolution separations of the individual glycopeptides, it was useful for collecting a total glycopeptide-enriched fraction to further separate with a standard C18 column.
(65) Wuhrer, M.; Boer, A. R. d.; Deelder, A. M. Mass Spectrom. Rev. 2009, 28, 192–206.
(66) Zaia, J. Mass Spectrom. Rev. 2009, 28, 254–272.
From our separation optimization experiments using digests of influenza samples as well as standard glycoproteins, we found that the general chromatogram shape was independent of the
Figure 4. Liquid chromatography/tandem mass spectrometry (LC/MS/MS) of the hydrophilic interaction liquid chromatography-retained (glycopeptide-rich) fraction of a tryptic digest of an egg-grown reassortant influenza virus (Ind05/PR8-RG2). (a) Extracted ion chromatogram (EIC) of m/z 204.087 and 366.140 from the high-energy scan showing presence of glycopeptides. Labels refer to protein and glycosylation site assignments made for glycopeptides observed at particular retention times (i.e., NA1 = neuraminidase glycosite 1). (b) Deconvoluted MS/MS spectrum of [938.397]3+. The proposed glycan composition for this particular glycopeptide is (Hex)1(HexNAc)3(Fuc)1(SO3)(Man)3(GlcNAc)2. Fuc = fucose; Man = mannose; GlcNAc = N-acetylglucosamine; Hex = hexose; HexNAc = N-acetylhexosamine.
starting glycopeptide mixture (data not shown). With the conditions selected, the glycopeptide-containing fraction consistently eluted from roughly 19 to 23 min. This separation allowed for the removal of many of the high-abundance nonglycosylated tryptic peptides, although some of the more hydrophilic nonglycosylated peptides did coelute with the intact glycopeptides. However, the HILIC separation removed enough components to improve the chances of selecting glycopeptides for MS/MS using the DDA scan function. For example, only the most abundant glycopeptides present in a tryptic digest of the egg-grown reassortant virus sample were actually selected for MS/MS when the HILIC separation was not used. This was the case even when using the PID product (precursor ion discovery via product ion detection) scan mode67 available on the QTOF MS, which combines the high/low expression scan function with DDA analysis to selectively collect fragmentation spectra for those ions that actually yield the glycopeptide oxonium ions at m/z 204.087 and 366.140. Figure 4 shows an example of the LC/MS/MS analysis of the HILIC-separated glycopeptide fraction of a tryptic digest of the egg-grown reassortant influenza virus (Ind05/PR8-RG2). The PID product scan was used in this case to try to increase the number of glycopeptides successfully selected for MS/MS analysis. Figure 4a is the high-energy scan from this experiment, showing the extracted ion chromatogram (EIC) for the oxonium ions at m/z 204.087 and 366.140.
(67) Bateman, R. H.; Carruthers, R.; Hoyes, J. B.; Jones, C.; Langridge, J. I.; Millar, A.; Vissers, J. P. C. J. Am. Soc. Mass Spectrom. 2002, 13, 792–803.
Glycopeptides from this sample were identified from five of the six known occupied glycosylation sites
for HA and from one occupied glycosylation site of NA. Knowledge of the tryptic peptides and modifications expected was important for interpreting the MS/MS data. Figure 4b shows an example of the MS/MS analysis of one of the intact glycopeptides for HA glycosylation site 3 (m/z 938.397). The inset in Figure 4b is the deconvoluted low-energy survey scan showing the different glycoforms present for this particular glycosylation site at this particular retention time. Fragmentation of the parent ion is dominated by glycosidic bond cleavage giving predominantly sequential losses of monosaccharide units. This particular ion had strong enough signal to give a recognizable fragment ion for the bare peptide NSTYPTIK (HA glycosite 3 from A/Indonesia/5/2005), which was confirmed by the presence of several peptide fragment ions in the MS/MS spectrum and by hydrazide capture experiments. A likely N-linked glycan composition of (Hex)1(HexNAc)3(SO3)(Fuc)1(Man)3(GlcNAc)2 was proposed based on the calculated glycan molecular weight (intact glycopeptide = 2812.168 Da; bare peptide = 922.507 Da; glycan = 1889.661 Da). Note this was one of two potential glycan compositions predicted for this calculated exact mass glycan molecular weight (predicted MW for this proposed glycan assignment = 1889.623 Da; difference of 0.038 Da). The other potential composition [(Hex)4(HexNAc)1(Fuc)1(Man)3(GlcNAc)2; MW = 1889.666 Da] was ruled out since the fragment ion at m/z 407.181 corresponding to the oxonium ion [HexNAc2 + H]+ would originate from an unlikely internal fragment of the pentasaccharide core in this alternate composition. In the composition assigned, the ion for [HexNAc2 + H]+
Table 4. N-Linked Glycans Identified by Liquid Chromatography/Tandem Mass Spectrometry of Hydrophilic Interaction Liquid Chromatography-Enriched Intact Glycopeptides from H5N1 Influenza (Ind05/PR8-RG2) Grown in Eggs

site  ion observed (m/z) (mono)  notes  z (+)  glycan [M + H]+ (mono)  Δmass (Da)  proposed glycan composition (a)
HA2   1030.780                   b      3      1622.571                0.011       (Hex)2(HexNAc)2(Man)3(GlcNAc)2
HA2   1079.467                   b      3      1768.624                0.016       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA2   1152.539                   b      3      1987.756                0.042       (Hex)3(HexNAc)3(Man)3(GlcNAc)2
HA2   1179.145                          3      2067.656                0.015       (Hex)3(HexNAc)3(SO3)(Man)3(GlcNAc)2
HA2   901.129                    b      4      2133.737                0.035       (Hex)3(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA2   955.938                    b      4      2352.888                0.042       (Hex)4(HexNAc)4(Man)3(GlcNAc)2
HA2   1322.889                   b      3      2498.835                0.069       (Hex)4(HexNAc)4(dHex)1(Man)3(GlcNAc)2
HA2   1047.221                   b      4      2718.021                0.043       (Hex)5(HexNAc)5(Man)3(GlcNAc)2
HA2   1083.698                   b      4      2863.964                0.073       (Hex)5(HexNAc)5(dHex)1(Man)3(GlcNAc)2
HA3   886.749                    b,c    3      1606.647                0.060       (Hex)1(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA3   898.074                    b,c    3      1768.652                0.012       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA3   911.748                    c      3      1809.711                0.045       (Hex)1(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA3   938.397                    b,c    3      1889.661                0.038       (Hex)1(HexNAc)3(dHex)1(SO3)(Man)3(GlcNAc)2
HA3   946.757                           3      1914.743                0.046       (Hex)2(HexNAc)2(dHex)2(Man)3(GlcNAc)2
HA3   965.765                    b,c    3      1971.761                0.042       (Hex)2(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA3   987.048                           3      2035.646                0.035       (Hex)1(HexNAc)3(dHex)2(SO3)(Man)3(GlcNAc)2
HA3   992.414                           3      2051.712                0.036       (Hex)2(HexNAc)3(dHex)1(SO3)(Man)3(GlcNAc)2
HA3   1014.456                   c      3      2117.814                0.037       (Hex)2(HexNAc)3(dHex)2(Man)3(GlcNAc)2
HA4   1088.766                   b      4      1216.439                0.016       (Hex)2(Man)3(GlcNAc)2
HA4   1129.251                          4      1378.548                0.072       (Hex)3(Man)3(GlcNAc)2
HA4   1139.470                          4      1419.477                0.047       (Hex)2(HexNAc)1(Man)3(GlcNAc)2
HA4   1149.728                          4      1460.242                0.190       (Hex)1(HexNAc)2(Man)3(GlcNAc)2
HA4   1169.732                          4      1540.481                0.066       (Hex)4(Man)3(GlcNAc)2
HA4   1179.999                   b      4      1581.583                0.071       (HexNAc)3(SO3)(Man)3(GlcNAc)2
HA4   1190.311                   b      4      1622.630                0.048       (Hex)2(HexNAc)2(Man)3(GlcNAc)2
HA4   1200.515                   b      4      1663.674                0.066       (Hex)1(HexNAc)3(Man)3(GlcNAc)2
HA4   1226.771                   b      4      1768.712                0.072       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA4   1230.760                   b      4      1784.589                0.045       (Hex)3(HexNAc)2(Man)3(GlcNAc)2
HA4   1237.029                   b      4      1809.566                0.100       (Hex)1(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA4   1241.081                   b      4      1825.730                0.069       (Hex)2(HexNAc)3(Man)3(GlcNAc)2
HA4   1257.031                          4      1889.600                0.023       (Hex)1(HexNAc)3(dHex)1(SO3)(Man)3(GlcNAc)2
HA4   1022.273                   b,c    5      1971.735                0.016       (Hex)2(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA5   1198.042                   c      4      1768.633                0.008       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA5   1052.472                          4      1971.809                0.090       (Hex)2(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA5   1092.983                   c      4      2133.773                0.001       (Hex)3(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA5   1148.181                          4      2336.751                0.100       (Hex)3(HexNAc)4(dHex)1(Man)3(GlcNAc)2
HA5   1188.769                   c      4      2498.884                0.020       (Hex)4(HexNAc)4(dHex)1(Man)3(GlcNAc)2
HA6   1105.476                   b      3      1622.642                0.060       (Hex)2(HexNAc)2(Man)3(GlcNAc)2
HA6   1132.131                          3      1702.622                0.084       (Hex)2(HexNAc)2(SO3)(Man)3(GlcNAc)2
HA6   1154.169                   b      3      1768.694                0.054       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2
HA6   1173.120                          3      1825.595                0.067       (Hex)2(HexNAc)3(Man)3(GlcNAc)2
HA6   1180.774                          3      1848.556                0.040       (Hex)2(HexNAc)2(dHex)1(SO3)(Man)3(GlcNAc)2
HA6   1202.793                          3      1914.676                0.021       (Hex)2(HexNAc)2(dHex)2(Man)3(GlcNAc)2
HA6   920.639                    b      4      1987.741                0.027       (Hex)3(HexNAc)3(Man)3(GlcNAc)2
HA6   957.150                    b      4      2133.761                0.011       (Hex)3(HexNAc)3(dHex)1(Man)3(GlcNAc)2
HA6   993.638                           4      2279.743                0.087       (Hex)3(HexNAc)3(dHex)2(Man)3(GlcNAc)2
HA6   1011.890                          4      2352.780                0.066       (Hex)4(HexNAc)4(Man)3(GlcNAc)2
NA1   978.147                           3      1216.476                0.053       (Hex)2(Man)3(GlcNAc)2
NA1   991.822                    b      3      1257.488                0.039       (Hex)1(HexNAc)1(Man)3(GlcNAc)2
NA1   1032.167                          3      1378.590                0.114       (Hex)3(Man)3(GlcNAc)2
NA1   1045.792                          3      1419.436                0.066       (Hex)2(HexNAc)1(Man)3(GlcNAc)2
NA1   1099.859                          3      1581.525                0.013       (HexNAc)3(SO3)(Man)3(GlcNAc)2
NA1   835.400                    b      4      1622.599                0.017       (Hex)2(HexNAc)2(Man)3(GlcNAc)2
NA1   1162.226                   b      3      1768.706                0.066       (Hex)2(HexNAc)2(dHex)1(Man)3(GlcNAc)2

(a) dHex = deoxyhexose; Hex = hexose; HexNAc = N-acetylhexosamine; Man = mannose; GlcNAc = N-acetylglucosamine; Fuc = fucose.
(b) Multiple charge states observed for this glycopeptide.
(c) Multiple peptides observed for the glycan due to nonspecific or missed cleavages or other posttranslational modifications.
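The mass bookkeeping behind the glycan assignments in Table 4 can be sketched in Python as follows. This is an illustrative sketch rather than the procedure used in this work. It takes the m/z 938.397 (3+) precursor and the 922.507 Da bare peptide mass reported for HA glycosite 3, subtracts the peptide from the deconvoluted glycopeptide mass, and lists every composition built on the (Man)3(GlcNAc)2 core whose summed residue masses fall within a tolerance. The residue set, the search limits in `MAX_EXTRA`, the 0.1 Da tolerance, and the function names are assumptions.

```python
from itertools import product

PROTON = 1.007276  # Da

# Monoisotopic residue masses (Da). Hex, HexNAc, and dHex agree with the MW
# column of Table 3 to 0.001 Da; SO3 is the standard sulfate addition.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579, "SO3": 79.9568}

# N-linked pentasaccharide core (Man)3(GlcNAc)2, counted as 3 Hex + 2 HexNAc.
CORE_MASS = 3 * RESIDUE_MASS["Hex"] + 2 * RESIDUE_MASS["HexNAc"]

# Assumed upper limits for residues added to the core (not taken from the paper).
MAX_EXTRA = {"Hex": 6, "HexNAc": 6, "dHex": 3, "SO3": 1}


def neutral_mass(mz: float, z: int) -> float:
    """Deconvolute a positive-mode [M + zH]z+ ion to its neutral mass."""
    return z * (mz - PROTON)


def candidate_compositions(glycan_mass: float, tol_da: float = 0.1):
    """List (extra residues, predicted mass, delta) within tol_da, closest first."""
    names = list(MAX_EXTRA)
    hits = []
    for counts in product(*(range(MAX_EXTRA[n] + 1) for n in names)):
        mass = CORE_MASS + sum(c * RESIDUE_MASS[n] for n, c in zip(names, counts))
        delta = mass - glycan_mass
        if abs(delta) <= tol_da:
            hits.append((dict(zip(names, counts)), round(mass, 3), round(delta, 3)))
    return sorted(hits, key=lambda hit: abs(hit[2]))


if __name__ == "__main__":
    glycopeptide = neutral_mass(938.397, 3)  # HA glycosite 3 precursor
    glycan = glycopeptide - 922.507          # bare peptide mass from the text
    print(round(glycopeptide, 3), round(glycan, 3))
    for composition, mass, delta in candidate_compositions(glycan):
        print(composition, mass, delta)
```

Under these assumptions the search returns two candidates, matching the two compositions discussed in the text, and the candidate closest in mass is the one that was ruled out. Mass alone therefore does not settle the assignment; the fragment ion evidence is still needed.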
could easily be generated by fragmentation from the nonreducing terminal end of the glycan. A full listing of the glycopeptides identified for this sample, with proposed glycan compositions, can be found in Table 4. Since all of the glycopeptides analyzed are N-linked species, the common pentasaccharide core composed of three mannose and two N-acetylglucosamine units can be identified along with fragment ions containing these residues. The exact sugar isomers and linkages for the rest of the glycan beyond this core structure
cannot be determined from this experiment; however, this information might not be entirely necessary when initially investigating glycosylation patterns of individual influenza strains and comparing differences among strains and virus propagation conditions. It is important to note that significantly more starting material is needed for this kind of analysis than for determining glycosylation site occupation by hydrazide SPE, because the analyte signal is diluted over multiple glycoforms for each of the glycosylation sites. Ion signal for each of these glycoforms is further
diluted for several of the occupied glycosylation sites since multiple proteolysis products are observed. This is the reason that multiple peaks in the chromatogram (Figure 4a) are observed for glycosylation sites HA3 and HA5. Also, since automated data processing software is not readily available for accurately processing glycopeptide MS/MS data, it is important to perform targeted analyses with a reduction in the amount of data produced to allow for more samples to be processed and analyzed and to increase the likelihood of obtaining information on glycosylation site occupation and glycan composition.

CONCLUSIONS

We have demonstrated an approach for determining N-linked glycosylation site occupation and for investigating the glycans attached to those sites for tryptic digests of whole influenza virus samples by selective sample preconcentration and LC/MS/MS analysis. We have found that for the three selected H5N1 strains, all six predicted N-linked glycosylation sites are occupied. Identification of occupied N-linked glycosylation sites was possible even in cases such as the selected bar-headed goose strain in which the exact protein sequence was not available from the database. This approach actually identified the presence of glycosylation site 3 for the bar-headed goose strain, even when all other strains within this subset did not predict the presence of this site. From sequence alignment analysis for the three H5N1 strain subsets, site 4 would be of further interest in the Vietnam and Indonesia strains, whereas site 3 would be of special interest in the bar-headed goose strains, since these regions of the protein seem to be more variable than the other glycosylation sites. Such variation in the AA sequence around these individual glycosylation sites might have an influence on the glycans that are attached or the presence of a glycosylation site at all. Either of these situations may have an effect on the virulence and antigenicity of a particular strain.
The reported method is a straightforward means of acquiring information about N-linked glycosylation of antigenic viral proteins and could be expanded to include analysis for O-linked glycosylation by incorporating additional peptide release steps into the hydrazide SPE sample processing. This methodology is general enough to identify occupied glycosylation sites even when the protein sequence is unknown or when the predicted site is ambiguous due to two potential sites being immediately next to each other (e.g., NNST). This approach can also be utilized to determine changes in glycosylation incurred via different virus propagation pathways. The information gained from these experiments could then be used to examine the relationship between changes in HA glycosylation and changes in virulence/antigenicity.

ACKNOWLEDGMENT

The authors thank the other members of the Biological Mass Spectrometry Laboratory at the National Center for Environmental Health, CDC for helpful discussions on sample preparation and data interpretation. We also gratefully acknowledge the Influenza Division at NCIRD/CDC for providing egg-grown reassortant virus samples. Reference in this article to any specific commercial product, process, service, manufacturer, or company does not constitute an endorsement or a recommendation by the U.S. Government or CDC. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Received for review January 14, 2009. Accepted February 18, 2009. AC900095H