Subtyping of the Influenza Virus by High Resolution Mass Spectrometry

Anal. Chem. 2009, 81, 3500–3506

Alexander B. Schwahn,† Jason W. H. Wong,‡ and Kevin M. Downard*,†

School of Molecular & Microbial Biosciences, University of Sydney, NSW 2006, Australia, and UNSW Cancer Research Centre, University of New South Wales, Sydney, NSW, Australia

High resolution, high mass accuracy mass spectra of hemagglutinin and whole virus digests of influenza are shown to be usable to type and subtype the major circulating forms of the virus in humans. Conserved residues and peptide segments of the hemagglutinin antigen have been identified across type A and B strains, and for type B strains of the Yamagata 16/88 and Victoria 2/87 lineages. The theoretical masses of the protonated ions of tryptic peptides of conserved sequence were subsequently shown to be unique when compared to in silico generated peptides from all influenza viral protein sequences and from those proteins known to contaminate virus preparations. The method offers a more rapid and direct means of typing and subtyping the virus, which is critically needed to prepare strategies and treatments in the event of a local epidemic or global pandemic.

Influenza is one of the most deadly viruses known to man.1 It remains a leading cause of death throughout the developed world. According to the World Health Organization, the virus contributes to between 3 and 5 million cases of severe illness and between 250,000 and 500,000 deaths every year.2 To respond effectively to this antigenically variable virus, vaccines must be reassessed and reformulated annually to provide adequate protection against the most common circulating strains. This requires that such strains be characterized, a process orchestrated through a global surveillance network involving 112 influenza centers in 83 countries.
The need to identify and characterize new strains is even more important given that new, highly virulent strains can lead to global pandemics resulting in millions of deaths. Humans are primarily infected with type A and B influenza, and inactivated forms of common circulating strains of these types form the basis of current vaccines administered against the virus.3 While type B influenza is almost exclusively a human pathogen, type A influenza also infects birds and other mammals.4 Type A

* To whom correspondence should be addressed. E-mail: kdownard@usyd.edu.au. Phone: +61 (0)2 9351 4140.
† University of Sydney.
‡ University of New South Wales.
(1) Barry, J. M. The Great Influenza: The Epic Story of the Deadliest Plague In History; Viking: New York, 2004.
(2) World Health Organization, Fact Sheet No. 211, March 2003.
(3) World Health Organization, Wkly. Epidemiol. Rec. 2005, 80, 279–287.
(4) Cheung, T. K.; Poon, L. L. Ann. N.Y. Acad. Sci. 2007, 1102, 1–25.


Analytical Chemistry, Vol. 81, No. 9, May 1, 2009

strains are further subtyped5 according to the nature of the two surface protein antigens hemagglutinin (HA) and neuraminidase (NA). These facilitate the binding and release of the viral particles to the host cell respectively.6 There are presently 16 known HA and 9 known NA subtypes.7,8 Many different combinations of HA and NA proteins are possible though relatively few influenza A subtypes (e.g., H1N1, H1N2, and H3N2) are generally found circulating in the human population. However, where hosts are exposed simultaneously to both human and animal strains, gene reassortment in the host9 can give rise to the generation of highly virulent, diverged new strains of the virus.

The major approach to type and subtype the virus employs the reverse transcriptase polymerase chain reaction (RT-PCR).10 Subtyping of the virus by RT-PCR uses sets of primers for conserved sequences of the different hemagglutinin and neuraminidase antigens for particular viral subtypes. The success of RT-PCR-based methods is heavily influenced by the choice of appropriate primer sequences. In terms of the hemagglutinin antigen, primer pairs specific for the hemagglutinin (HA) gene of currently circulating strains of the virus are used. When the primers successfully anneal to their target complementary sequence, the presence of an amplified product is indicative of a particular type and/or subtype. The subtype of the virus is then confirmed by sequencing of the PCR products and a comparison of the sequences with those deposited in GenBank or specialized flu databases. Parallel multiplex RT-PCR reactions can be employed to achieve this more rapidly.11

Given that significant sequence variations can appear across strains, and that unpredictable mutations can arise through antigenic drift and shift, by gene reassortment,9 the RT-PCR approach is not always successful. Primers can fail to anneal to the target sequence, and thus, the PCR is unable to amplify the

(5) Wright, K. E.; Wilson, G. A.; Novosad, D.; Dimock, C.; Tan, D.; Weber, J. M. J. Clin. Microbiol. 1995, 33, 1180–1184.
(6) Wagner, R.; Matrosovich, M.; Klenk, H. D. Rev. Med. Virol. 2002, 12, 159–166.
(7) Fouchier, R. A. M.; Munster, V.; Wallensten, A.; Bestebroer, T. M.; Herfst, S.; Smith, D.; Rimmelzwaan, G. F.; Olsen, B.; Osterhaus, A. D. M. E. J. Virol. 2005, 79, 2814–2822.
(8) Laver, W. G.; Colman, P. M.; Webster, R. G.; Hinshaw, V. S.; Air, G. M. Virology 1984, 137, 314–323.
(9) Barr, I. G.; Komadina, N.; Hurt, A.; Shaw, R.; Durrant, C.; Iannello, P.; Tomasov, C.; Sjogren, H.; Hampson, A. W. Virus Res. 2003, 98, 35–44.
(10) World Health Organization, Manual on Animal Influenza Diagnosis and Surveillance, 2002.
(11) Stockton, J.; Ellis, J. S.; Saville, M.; Clewley, J. P.; Zambon, M. C. J. Clin. Microbiol. 1998, 36, 2990–2995.
(12) Han, X.; Lin, X.; Liu, B.; Hou, Y.; Huang, J.; Wu, S.; Liu, J.; Mei, L.; Jia, G.; Zhu, Q. J. Virol. Methods 2008, 152, 117–121.

10.1021/ac900026f CCC: $40.75 © 2009 American Chemical Society. Published on Web 04/07/2009

desired target sequence. Microarrays of oligonucleotide probes12 to known types of the hemagglutinin and neuraminidase antigens can also fail to identify newly diverged strains or variants. A genotyping approach employing mass spectrometric analysis of the PCR amplicons also suffers from the same limitations.13 In the study of Sampath et al., a high-resolution mass spectrometer was used to “weigh” the amplicons obtained from the PCR amplification of conserved gene segments for the matrix, nucleoprotein, polymerase, and non-structural proteins across various serotypes. These masses were converted to likely base compositions that, through a comparison with known database sequences, may enable the type and subtype of both human and animal influenza strains to be determined.

Given that mass spectrometers are better suited to analyzing proteins than their nucleic acid counterparts14 because of the greater ionization efficiencies of proteins and their greater stabilities in a mass spectrometer, the implementation of the most direct method yet of typing and subtyping the virus is described here. It involves the proteotyping of the virus through the targeted study of the hemagglutinin from a range of type A and B strains and subtypes by high accuracy mass analysis of their proteolytic peptides following digestion of the whole virus or following the release of its component surface antigens.
It has previously been shown that it is possible to survey both the primary structure and antigenicity of the influenza virus in a single step using a matrix-assisted laser desorption ionization (MALDI) based immunoassay.15 These studies have enabled both the structure and antigenicity of the hemagglutinin antigen of H1N1 and H3N2 type A, and type B strains of the virus to be determined using whole virus16 or gel-recovered antigens that have been separated using modern proteomics approaches.17,18 The assay allows specific epitopic peptides, targeted by monoclonal antibodies raised to a specific serotype, to be localized with fine molecular detail. Further, in time course experiments the relative rates of binding of various epitopic peptides and their variants can be established.19 Where high mass accuracy is achieved when studying the whole virus or single antigen digests, the identity of a conserved peptide and thus the viral subtype can be determined.

High-resolution mass spectrometry (HR-MS) has been used for decades to establish the identity of chemical compounds, particularly natural products, by establishing their elemental composition.14 All elements exist as isotopes whose masses are non-integer; that is, they contain an integer and a fractional component. The fractional mass component gives rise to subtle differences in the molecular mass of compounds that may share a common nominal (integer) mass. These can be unique identifiers such that a molecular mass measurement that is accurate to a few parts-per-million (ppm) can be sufficient to distinguish one possible elemental composition from another.

(13) Sampath, R.; Hall, T. A.; Massire, C.; Li, F.; Blyn, L. B.; Eshoo, M. W.; Hofstadler, S. A.; Ecker, D. J. Ann. N.Y. Acad. Sci. 2007, 1102, 109–120.
(14) Downard, K. M. Mass Spectrometry - A Foundation Course; Royal Society of Chemistry: Cambridge, 2004.
(15) Kiselar, J. G.; Downard, K. M. Anal. Chem. 1999, 71, 1792–1801.
(16) Kiselar, J. G.; Downard, K. M. Biochemistry 1999, 38, 14185–14191.
(17) Morrissey, B.; Downard, K. M. Proteomics 2006, 6, 2034–2041.
(18) Morrissey, B.; Streamer, M.; Downard, K. M. J. Virol. Methods 2007, 145, 106–114.
(19) Morrissey, B.; Downard, K. M. Anal. Chem. 2008, 80, 7720–7726.

With sufficient mass accuracy, such unequivocal assignments of elemental compositions and structures can be extended to larger molecules (>500 Da.) and macromolecules. Spengler has shown20 that a hypothetical peptide of 19 amino acids of mass 1005.4433, in which all non-isomeric amino acids are possible at each position, has an unequivocal elemental composition as the accuracy of the mass measurement is increased to 0.1 ppm. A peptide of mass 1000.0000 will be accurate to an error of ±0.0001 with a mass precision of 0.1 ppm. Thus it will be distinguishable by mass alone from another peptide of 1000.0002 with this mass precision. Mass accuracies below 1 ppm are routinely obtained with high resolution Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers.21 The concept of using such mass tags or signatures for protein identification has recently been reported22 and applied to studies of bacteria.23 Given the restrictions in sequence imposed on the viral antigens of the influenza virus, it is apparent that unique mass signatures for conserved regions of the protein antigens may be found to enable the virus to be typed and subtyped based solely on the mass of particular proteolytic peptides despite frequent mutations from antigenic drift. In some cases, it can be envisaged that the mass of a single detected proteolytic peptide might be enough to enable a particular antigen type to be characterized. However, since non-conserved regions of the protein may undergo antigenic drift and generate peptides common in sequence and that are indistinguishable in mass, even within the small experimental errors attainable in high resolution mass spectrometry experiments, a small set of conserved peptides is better monitored. This article describes the identification of the peptide mass signatures for conserved segments of the hemagglutinin antigen across types and subtypes of the influenza virus that are in common circulation in humans. 
It subsequently demonstrates that uncharacterized strains of the virus can be rapidly, and simply, typed and subtyped in this way.

EXPERIMENTAL SECTION

Virus Strains. All influenza strains used in this study were obtained from Advanced ImmunoChemicals Inc. (Long Beach, CA, U.S.A.) as inactivated virus preparations from allantoic fluid of 10-11 day old embryonated eggs. The following influenza strains were used: A/Beijing/262/95 (H1N1), A/Panama/2007/99 (H3N2), B/Tokyo/53/99 (of B/Victoria/2/87 lineage), and B/Victoria/504/2000 (of B/Yamagata/16/88 lineage).

Hemagglutinin Preparation and Digestion. Tryptic digests of HA derived from type A influenza strains A/Beijing/262/95 (H1N1) and A/Panama/2007/99 (H3N2) were kindly provided by B. Morrissey and were prepared as previously described.17,18 Viral proteins of the type B viruses (20 µg of each virus strain) were separated by SDS-PAGE (12.5% separation gel) and visualized

(20) Spengler, B. J. Am. Soc. Mass Spectrom. 2004, 15, 703–714.
(21) Marshall, A. G.; Hendrickson, C. L.; Shi, S. D. Anal. Chem. 2002, 74, 252A–259A.
(22) Conrads, T. P.; Anderson, G. A.; Veenstra, T. D.; Paša-Tolić, L.; Smith, R. D. Anal. Chem. 2002, 72, 3349–3354.
(23) Lipton, M. S.; Paša-Tolić, L.; Anderson, G. A.; Anderson, D. J.; Auberry, D. L.; Battista, J. R.; Daly, M. J.; Fredrickson, J.; Hixson, K. K.; Kostandarithes, H.; Masselon, C.; Markillie, L. M.; Moore, R. J.; Romine, M. F.; Shen, Y.; Stritmatter, E.; Tolić, N.; Udseth, H. R.; Venkateswaran, A.; Wong, K. K.; Zhao, R.; Smith, R. D. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 11049–11054.


by Coomassie brilliant blue staining. Hemagglutinin-containing bands were excised and destained (25 mM NH4HCO3, 50% acetonitrile) at 37 °C. The reduction and alkylation of cysteine residues of HA was achieved with DTT (10 mM DTT, 50 mM NH4HCO3; 30 min, 56 °C) and iodoacetamide (55 mM iodoacetamide, 50 mM NH4HCO3; 20 min at room temperature in the dark). Excess iodoacetamide was removed by washing with acetonitrile, and the gel pieces were dried in a vacuum concentrator. In-gel tryptic digestion of the HA was carried out overnight at 37 °C using 13 ng·µL⁻¹ modified trypsin (Roche Diagnostics GmbH, Mannheim, Germany) in a digestion buffer containing 25 mM NH4HCO3, 10% acetonitrile, and 3.42 mM octyl-β-D-glucopyranoside. Cleaved peptides were extracted by repeated sonication in 60% acetonitrile containing 0.1% trifluoroacetic acid. Extracted peptides were dried completely in a vacuum concentrator and dissolved in 25 mM NH4HCO3.

Tryptic Digestion of Whole Influenza Virus. A 50 µg quantity of influenza virus B/Tokyo/53/99 (corresponding to 42 µL of the virus suspension) was concentrated to near dryness in a vacuum concentrator, resuspended in 50 µL digestion buffer (50 mM NH4HCO3, 10% acetonitrile, 2 mM DTT), and incubated at 37 °C for 4 h. A 1.5 µL volume (at 1 mg·mL⁻¹) of modified trypsin (Roche Diagnostics, Mannheim, Germany) was added, and the digestion carried out overnight at 37 °C. To improve sample throughput, multiple whole virus samples can be digested in parallel, and digestion can be further accelerated using microwave irradiation.24

MALDI FT-ICR Mass Spectrometry. One microliter of sample was diluted with 3 µL matrix solution (10 mg·mL⁻¹ α-cyano-4-hydroxycinnamic acid, 50% acetonitrile, 0.1% TFA). One microliter of the analyte and matrix solution was spotted onto a MALDI target (MTP AnchorChip 400/384 TF) and dried in air. MALDI-FTICR mass spectra were recorded on a 7T Bruker APEX-Qe instrument (Bruker Daltonics, Billerica, MA, U.S.A.) in positive ion mode.
Ions produced from 25 laser shots (at 5% laser power) were accumulated and then transferred to the FTICR cell. The MALDI plate potential was held at 385 V. Ions were accumulated above the plate in the ion source for 0.2 s, stored in the hexapole for 1.0 s, and then passed through the transfer optics in 1.0 ms using a side kick voltage of 0 V and an offset of -1.5 V. Thirty-two scans were acquired and averaged into a single mass spectrum. Spectra were acquired for 512 K data points using a broadband excitation. The acquisition mass range was set to m/z 404-4000. An external mass calibration was applied using a mixture of peptides (comprising Angiotensin I, ACTH (1-17), (18-39) and (7-38)). A mass resolution of 50,000 was achieved for the ions of angiotensin. Mass spectra were processed using Data Analysis v3.4 software (Billerica, MA, U.S.A.) and recalibrated internally utilizing identified peptide ions in each spectrum derived from viral hemagglutinin or nucleoprotein. Mass spectra are acquired from each sample within seconds and, because of the high loading capacity of the MALDI plate (24 × 16 = 384 samples), many virus samples can be analyzed within a few minutes.

(24) Pramanik, B. N.; Mirza, U. A.; Ing, Y. H.; Liu, Y.-H.; Bartner, P. L.; Weber, P. C.; Bose, A. K. Protein Sci. 2002, 11, 2676–2687.


Alignment of Database Hemagglutinin Sequences Across Strains. Translated protein sequences for gene sequences of hemagglutinin derived from human pathogenic influenza A subtype H1N1 and H3N2, and type B viruses were obtained from the NCBI Influenza Virus Resource25 using data derived from the NIAID Influenza Genome Sequencing Project26 and GenBank.27 Redundant sequences were removed with a software filter. Multiple sequence alignments for influenza type A H1 and H3 hemagglutinin sequences, as well as influenza type B hemagglutinin sequences, were generated using the ClustalW algorithm.28,29 A consensus sequence based on the most conserved residue at each position of the protein was calculated for each alignment using the Jalview multiple alignment editor.30 Since not all deposited sequences cover the full length of HA an initial alignment using only full-length sequences was created and subsequently amended using alignments of sequences that cover a common section of the full sequence. The resulting final consensus sequence over all alignments was used to determine candidates for potential signature peptides. The type B HA sequences were further subdivided and realigned according to their lineage.31,32 Identification of Signature Tryptic Peptide Masses. Each final consensus sequence for a given HA subtype (or lineage in case of the type B HA) was used to create an in silico tryptic digest. The relative frequency of occurrence p(n) for a conserved residue at position n was used to assess the conservation of the peptide sequence PS and cleavage site PC for each theoretical tryptic fragment (see results section for details) and the most conserved tryptic peptides were chosen as candidates for potential signature peptides characteristic of either H1, H3 or type B hemagglutinin derived from Yam88-like or Vic87-like viruses. Validation of Unique Masses of Signature Peptides Using the FluGest Algorithm. 
The signature peptide candidate masses for a type and subtype of the virus were compared against the masses of theoretical tryptic fragments from all nonredundant influenza protein sequences, and those of several chicken proteins (ovotransferrin, α1-fetoprotein, serum albumin, ovalbumin, ovalbumin related Y protein, apolipoprotein AI and chondrogenesis associated lipocalin) that contaminate influenza virus preparations, using the FluGest computer algorithm written expressly for this purpose. This algorithm, written in C++, generates the masses for peptide ions of the predicted tryptic fragments for all proteins within the custom database, allowing for up to 3 missed cleavages,

(25) Bao, Y.; Bolotov, P.; Dernovoy, D.; Kiryutin, B.; Zaslavsky, L.; Tatusova, T.; Ostell, J.; Lipman, D. J. Virol. 2008, 82, 596–601.
(26) Ghedin, E.; Miller, N. A.; Shumway, M.; Zaborsky, J.; Feldblyum, T.; Subbu, V.; Spiro, D.; Sitz, J.; Koo, H.; Bolotov, P.; Dernovoy, D.; Tatusova, T.; Bao, Y.; St. George, K.; Taylor, J.; Lipman, D. J.; Fraser, C. M.; Taubenberger, J. K.; Salzberg, S. L. Nature (London) 2005, 437, 1162–1166.
(27) Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. Nucleic Acids Res. 2008, 36, D25–D30.
(28) Larkin, M. A.; Blackshields, G.; Brown, N. P.; Chenna, R.; McGettigan, P. A.; McWilliam, H.; Valentin, F.; Wallace, I. M.; Wilm, A.; Lopez, R.; Thompson, J. D.; Gibson, T. J.; Higgins, D. G. Bioinformatics 2007, 23, 2947–2948.
(29) Thompson, J. D.; Higgins, D. G.; Gibson, T. J. Nucleic Acids Res. 1994, 20, 4673–4680.
(30) Clamp, M.; Cuff, J.; Searle, S. M.; Barton, G. J. Bioinformatics 2004, 20, 426–427.
(31) McCullers, J. A.; Saito, T.; Iverson, A. R. J. Virol. 2004, 78, 12817–12828.
(32) Rota, P. A.; Wallis, T. R.; Harmon, M. W.; Rota, J. S.; Kendal, A. P.; Nerome, K. Virology 1990, 175, 59–68.

the oxidation of methionine and N-terminal pyroglutamate formation. It compares the signature masses with these tryptic peptide ion masses within a specified tolerance (in ppm).

RESULTS AND DISCUSSION

Alignment of Database Hemagglutinin Sequences Across Strains. Translated gene sequences for hemagglutinin for all strains of the virus isolated from humans were obtained from the NIAID Influenza Genome Sequencing Project and GenBank database via the NCBI Influenza Virus Resource.25 Multiple sequence alignments across type A H1 (931 sequences), type A H3 (2814 sequences) and type B HA (1057 sequences) entries were performed using the ClustalW algorithm28,29 to assess the level of conservation at each amino acid residue position (n). The type B sequences were further subdivided and realigned according to their lineage. B/Victoria/2/87-like (Vic87-like; 424 sequences) and B/Yamagata/16/88-like (Yam88-like; 629 sequences) strains, as differentiated by the deletion of amino acids at positions 165 and 167 in the hemagglutinin sequences of Yam88-like viruses,31,32 were subsequently realigned. A consensus sequence was calculated for each alignment (for H1, H3, all B type, B type Vic87-like and B type Yam88-like) and the most conserved residue assigned to each position of the hemagglutinin protein. The relative frequency of occurrence p(n) for the conserved residue (between 0 and 1), at each position n, was used to assess the sequence (PS) and cleavage site conservation (PC) across all tryptic peptides generated from an in silico digest according to eqs 1 and 2

$$P_S = \prod_{n} p(n) \qquad (1)$$

$$P_C = p(K/R)_N \cdot p(K/R)_C \qquad (2)$$

In these equations, p(K/R)N and p(K/R)C refer to the relative frequency of occurrence that a lysine (K) or arginine (R) residue occurs ahead of the N-terminal cleavage site, or at the C-terminus of a potential tryptic fragment, respectively. The product of PS with PC (eq 3) was then used to calculate the overall frequency of occurrence (PO) of each theoretically predicted tryptic peptide and is utilized to assess the level of its conservation within the sequences analyzed of a given influenza type or subtype.

$$P_O = P_S \cdot P_C \qquad (3)$$
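As an illustration of eqs 1-3, the conservation scores could be computed as in the following minimal Python sketch. The function names and the per-position frequencies in the example are hypothetical and are not taken from the study; the sketch assumes only that the frequencies p(n) for the positions of a peptide, and the two K/R frequencies at its cleavage sites, have already been read from a multiple sequence alignment.

```python
from math import prod

def sequence_conservation(freqs):
    """P_S (eq 1): product of the per-position frequencies p(n) of a peptide."""
    return prod(freqs)

def cleavage_conservation(p_kr_n, p_kr_c):
    """P_C (eq 2): K/R frequency ahead of the N-terminal cleavage site
    multiplied by the K/R frequency at the C-terminus of the fragment."""
    return p_kr_n * p_kr_c

def overall_conservation(freqs, p_kr_n, p_kr_c):
    """P_O (eq 3): product of P_S and P_C."""
    return sequence_conservation(freqs) * cleavage_conservation(p_kr_n, p_kr_c)

# Hypothetical frequencies for a four-residue peptide (illustrative values only).
example_freqs = [0.999, 0.998, 0.997, 0.999]
p_o = overall_conservation(example_freqs, p_kr_n=0.998, p_kr_c=0.999)

# A peptide is kept as a candidate when P_O exceeds a chosen threshold;
# 0.95 is used here purely as an example cutoff.
is_candidate = p_o > 0.95
```

Because every factor lies between 0 and 1, a single poorly conserved position or cleavage site is enough to pull the overall score below the threshold.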

Those peptides which were conserved above a certain threshold level (PO(H1) > 0.94, PO(H3) > 0.95 and PO(B type) > 0.97 based on the highest level of conservation among detected peptides within the mass spectra of analyzed strains) were identified as potential signature peptides for each hemagglutinin type and subtype. The theoretical monoisotopic masses (12C- only) of the protonated ions [M+H]+ were calculated for these peptides. Identification of Signature Tryptic Peptide Masses. These masses were subsequently used to search the sequences of all influenza viral proteins (and some additional chicken egg proteins commonly found within virus preparations) across all strains and hosts. This was performed to establish that the masses are unique to the hemagglutinin tryptic peptides for the type and subtype of the virus from which they were derived, and differ in mass from any other tryptic peptide generated

Table 1. Summary of the Output of the FluGest Algorithm for a Signature Mass for H1 Hemagglutinin Within a Mass Tolerance of 50 ppm

theoretical mass (1268.611683) | ∆ ppm | sequence | no. of hits | protein | strains
1268.611683 | 0.00 | EQLSSVSSFER | 1208 | HA (H1) | influenza A (H1N1, H1N2, human, swine, avian, ubiq.)
1268.611676 | -0.01 | EYNETVRTEK | 1 | NA (N8) | influenza A (A/duck/Australia/341/83(H15N8))
1268.615697 | 3.16 | LNWLYESESK | 25 | HA (H3) | influenza A (H3N2, human, ubiq.)
1268.605159 | -5.14 | EFRDLMSQSR | 1 | PB2 | influenza A (A/Chicken/Beijing/1/94(H9N2))
1268.61908 | 5.83 | TIADGFTAMVDK | 1 | SAP | G. gallus (Serum albumin precursor)
1268.620408 | 6.88 | YGNGVWMGRTK | 5 | NA (N1) | influenza A (H1N1, Thailand, 2001)
1268.620416 | 6.88 | CFWRGGSINTK | 13 | HA (H10) | influenza A (not NA-specific, avian, ubiq.)
1268.622917 | 8.86 | DPNNEKGNPGVK | 14 | NA (N2) | influenza A (H3N2, H5N2, H7N2, avian, ubiq.)
1268.630312 | 14.68 | TASSFQDILMR | 1 | NS2 | influenza A (A/duck/Shantou/515/2004(H9N2))
1268.632317 | 16.27 | WYM(+O)LMPRQK | 2 | NS1 | influenza A (not subtype specific, avian, ubiq.)
1268.584707 | -21.26 | DWFM(+O)LMPGQK | 1 | NS1 | influenza A (A/swine/Korea/CAS05/2004(H3N2))
1268.584523 | -21.41 | RM(+O)EDGFRDAR | 1 | HA (H6) | influenza A (A/swan/Shimane/190/2001(H6N9))
1268.639537 | 21.96 | RM(+O)GAQMQRFK | 1 | MP1 | influenza A (A/chicken/Pennsylvania/13609/93 (H5N2NSB))
1268.641536 | 23.53 | RYMAKRVESE | 1 | NS1 | influenza A (A/duck/Zhejiang/11/2000(H5N1))
1268.641546 | 23.54 | LPDGPPCAQRSK | 3 | PA | influenza A (not subtype specific, avian, ubiq.)
1268.642714 | 24.46 | TGIIRM(+O)M(+O)ESAK | 1 | NP | influenza A (A/Wisconsin/3623/1988(H1N1))
1268.645387 | 26.57 | GHRGDTQIQTR | 2 | PB1 | influenza A (not subtype specific (but both N2), avian, ubiq.)
1268.646059 | 27.10 | SRSIIFNM(+O)ER | 1 | PB2 | influenza A (A/swine/Korea/S452/2004(H9N2))
1268.576661 | -27.61 | TM(+O)MDQVRESR | 3 | NP | influenza A (not subtype specific, human, avian, ubiq.)
1268.648939 | 29.37 | FQDILMRMSK | 4 | NS2 | influenza A (not subtype specific, not host specific, ubiq.)
1268.649399 | 29.73 | AARNQYSGFVR | 7 | PB2 | influenza A (H5N2, H2N3, avian, U.S.A.)
1268.569425 | -33.31 | FAWWSSDVDR | 1 | NS1 | influenza A (A/domestic green-winged teal/Hunan/79/2005(H5N1))
1268.565402 | -36.48 | FAWRSSDEDR | 23 | NS1 | influenza A (H5N1, avian, human, Vietnam, China)
1268.659304 | 37.54 | TVGTHPNSSAGLK | 46 | MP1 | influenza A (not subtype specific (but only H6 and H9), avian, U.S.A.)
1268.560913 | -40.02 | SGSCVVQCQTEK | 1 | HA (H9) | influenza A (A/guinea fowl/Hong Kong/WF10/99(H9N2))
1268.557532 | -42.69 | QYDSDEPKMR | 14 | PA | influenza A (H1N1, H1N2, H3N2, swine, human, ubiq.)
1268.666698 | 43.37 | MQFSSLTVTVR | 4 | PB2 | influenza A (H7N2, avian, New York, 1994/95)
1268.669272 | 45.40 | pyGlu-QVLAELQDIEK | 1 | PA | influenza A (A/goose/Guiyang/1461/2006(H5N1))
1268.669272 | 45.40 | pyGlu-QVLAELQDLEK | 1 | PA | influenza A (A/equine/Berlin/1/1989(H3N8))
1268.67456 | 49.56 | DGRLPFPPNQK | 3 | NS1 | influenza A (H5N1, avian, China, HK)
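The way the candidate list in Table 1 shrinks as the mass tolerance is tightened can be reproduced with a short calculation. The following Python sketch is illustrative only: it is not the FluGest code, and the function names are assumptions. It takes the 30 theoretical masses listed in Table 1 and counts how many fall within a given ppm window of the H1 signature mass.

```python
def ppm_error(mass, reference):
    """Signed deviation of `mass` from `reference` in parts-per-million."""
    return (mass - reference) / reference * 1e6

def within_tolerance(masses, reference, tol_ppm):
    """Return the masses lying within +/- tol_ppm of the reference mass."""
    return [m for m in masses if abs(ppm_error(m, reference)) <= tol_ppm]

# Theoretical [M+H]+ masses copied from Table 1.
TABLE1_MASSES = [
    1268.611683, 1268.611676, 1268.615697, 1268.605159, 1268.61908,
    1268.620408, 1268.620416, 1268.622917, 1268.630312, 1268.632317,
    1268.584707, 1268.584523, 1268.639537, 1268.641536, 1268.641546,
    1268.642714, 1268.645387, 1268.646059, 1268.576661, 1268.648939,
    1268.649399, 1268.569425, 1268.565402, 1268.659304, 1268.560913,
    1268.557532, 1268.666698, 1268.669272, 1268.669272, 1268.67456,
]
SIGNATURE_MASS = 1268.611683  # EQLSSVSSFER, H1 hemagglutinin

# Number of candidate peptides remaining at each tolerance.
counts = {tol: len(within_tolerance(TABLE1_MASSES, SIGNATURE_MASS, tol))
          for tol in (50, 10, 5, 1)}
for tol, n in counts.items():
    print(f"{tol:>3} ppm: {n} candidate peptides")
```

Only the hemagglutinin peptide and the single-strain N8 neuraminidase peptide remain once the window is narrowed to 1 ppm, which is the behavior the table is intended to show.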


Table 2. Summary of the Outputs of the FluGest Algorithm for Signature Masses for H3, all B type, B type Vic87-like, and B Type Yam88-like Hemagglutinin Within a Mass Tolerance of 3 ppm

theoretical mass (514.265965) | ∆ ppm | sequence | no. of hits | protein | strains
514.265965 | 0.00 | GYFK | 2832 | HA (H3) | influenza A (not NA-specific, not host specific, ubiq.)
514.265965 | 0.00 | YGFK | 1 | MP2 | influenza A (A/Duck/Hong Kong/P185/97(H3N8))

theoretical mass (593.307520) | ∆ ppm | sequence | no. of hits | protein | strains
593.307520 | 0.00 | SSIMR | 2685 | HA (H3) | influenza A (not NA-specific, not host specific, ubiq.)

theoretical mass (937.510113) | ∆ ppm | sequence | no. of hits | protein | strains
937.510113 | 0.00 | TGTIVYQR | 298 | HA (Btype) | influenza B (human, ubiq.)
937.510120 | 0.01 | GPSLPPNQK | 14 | NS1 | influenza A (H5N2, chicken, ubiq.)

theoretical mass (939.489378) | ∆ ppm | sequence | no. of hits | protein | strains
939.489378 | 0.00 | TGTITYQR | 139 | HA (Btype) | influenza B (ubiq.)
939.489375 | 0.00 | KYKEESR | 189 | HA (H9) | influenza A (H9N2, avian, swine, ubiq.)

theoretical mass (1035.546891) | ∆ ppm | sequence | no. of hits | protein | strains
1035.546891 | 0.00 | LYGDSKPQK | 146 | HA (Btype) | influenza B (human, ubiq.)
1035.54689 | 0.00 | KESNYPVAK | 2 | HA (H2,H3) | influenza A (A/Moscow/1019/1965(H2N2); A/Singapore/1/1957(H2N2))
1035.5469 | 0.01 | DGKGDVAFVK | 1 | OT | G. gallus (Ovotransferrin precursor)
1035.54489 | -1.93 | KVDGKWMoxR | 1 | NP | influenza A (A/swine/Wisconsin/125/97(H1N1))
1035.54489 | -1.93 | VDGKWMoxRK | 3 | NP | influenza A (H1N1, H2N2, human, ubiq.)
1035.543553 | -3.22 | MoxGETVLEIK | 1 | BM2 | influenza B (B/Perth/25/2002)

theoretical mass (1280.626924) | ∆ ppm | sequence | no. of hits | protein | strains
1280.626924 | 0.00 | SKPYYTGEHAK | 445 | HA (Btype) | influenza B (human, ubiq.)
1280.627810 | 0.69 | DWFMLMPRGK | 3 | NS1 | influenza A (H1N1, H3N2 swine, China, 2005)
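The matching step that produces outputs such as those in Tables 1 and 2 can be sketched in a few lines of Python. This is not the FluGest implementation (which the text states was written in C++); it is a simplified illustration under several assumptions: trypsin is taken to cleave after every K or R with no proline exception, no modifications such as methionine oxidation or pyroglutamate formation are considered, and the residue masses are standard monoisotopic values. The two input sequences are toy strings built for the example, not real protein sequences.

```python
# Standard monoisotopic residue masses in Da (values assumed, not from the paper).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O added for the free termini
PROTON = 1.007276   # charge carrier for [M+H]+

def tryptic_peptides(sequence, missed_cleavages=0):
    """Cleave C-terminal to each K or R and return all fragments that
    contain at most `missed_cleavages` uncleaved internal sites."""
    cuts = [0] + [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in "KR"]
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides

def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide ion [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def match_mass(target, sequences, tol_ppm, missed_cleavages=0):
    """List (peptide, mass, ppm error) for fragments within tol_ppm of target."""
    hits = []
    for seq in sequences:
        for pep in tryptic_peptides(seq, missed_cleavages):
            mass = mh_plus(pep)
            ppm = (mass - target) / target * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((pep, mass, ppm))
    return hits

# Toy sequences (hypothetical): the first joins two H3 signature peptides,
# the second contains the rearranged YGFK sequence listed in Table 2.
toy_database = ["GYFKSSIMR", "MKYGFK"]
hits = match_mass(514.265965, toy_database, tol_ppm=3.0)
```

With these inputs both GYFK and its rearranged isomer YGFK are returned, mirroring the first block of Table 2: peptides with the same composition cannot be separated by mass alone, however accurate the measurement. Masses computed from the residue values assumed here agree with the tabulated ones only to within a small fraction of a ppm, so the sketch should not be used to reproduce the last decimal places of the tables.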

Table 3. Summary of Signature Peptides Derived from the Hemagglutinin Antigen for Human Type A (H1N1, H3N2) and Type B Influenza Viruses

type | subtype(a)/lineage | [M+H]+ | sequence | PO | PS | PC | residues(b) | Naln(c)
A | H1 | 780.429016 | FEIFPK | 0.9840 | 0.9904 | 0.9936 | 114-119 | 931
A | H1 | 1268.611683 | EQLSSVSSFER | 0.9424 | 0.9557 | 0.9861 | 103-113 | 931
A | H3 | 514.265965 | GYFK | 0.9897 | 0.9933 | 0.9964 | 257-260 | 2814
A | H3 | 593.307520 | SSIMR | 0.9553 | 0.9732 | 0.9816 | 266-270 | 2814
B | all | 1280.626920 | SKPYYTGEHAK | 0.9896 | 0.9906 | 0.9991 | 305-315 | 1057
B | Vic87 | 1035.546891 | LYGDSKPQK | 0.9789 | 0.9906 | 0.9882 | 203-211 | 424
B | Vic87 | 939.489378 | TGTITYQR | 0.9859 | 0.9882 | 0.9976 | 257-264 | 424
B | Yam88 | 937.510113 | TGTIVYQR | 0.9716 | 0.9716 | 1.0000 | 257-264 | 629

(a) Lineage affiliation for influenza type B. (b) Numbering of type B HA according to strain B/Lee/40. (c) Number of sequences used in multiple sequence alignment.
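Given the signature masses in Table 3, assigning a type and subtype to a measured peak list reduces to a tolerance lookup. The Python sketch below is a hypothetical illustration of that lookup, not a procedure described in the paper: the function names, the default tolerance, and the rule of reporting every matched signature are all assumptions. The signature masses are copied from Table 3, and the example peaks are the m/z values reported in the text for the A/Beijing/262/95 (H1N1) hemagglutinin digest.

```python
# (type, subtype or lineage, theoretical [M+H]+, sequence) from Table 3.
SIGNATURES = [
    ("A", "H1", 780.429016, "FEIFPK"),
    ("A", "H1", 1268.611683, "EQLSSVSSFER"),
    ("A", "H3", 514.265965, "GYFK"),
    ("A", "H3", 593.307520, "SSIMR"),
    ("B", "all", 1280.626920, "SKPYYTGEHAK"),
    ("B", "Vic87", 1035.546891, "LYGDSKPQK"),
    ("B", "Vic87", 939.489378, "TGTITYQR"),
    ("B", "Yam88", 937.510113, "TGTIVYQR"),
]

def matched_signatures(peaks, tol_ppm=5.0):
    """Return the signature entries for which some peak lies within tol_ppm."""
    found = []
    for entry in SIGNATURES:
        theoretical = entry[2]
        for mz in peaks:
            if abs(mz - theoretical) / theoretical * 1e6 <= tol_ppm:
                found.append(entry)
                break  # one matching peak per signature is sufficient
    return found

def assigned_types(peaks, tol_ppm=5.0):
    """Set of (type, subtype/lineage) labels supported by the peak list."""
    return {(e[0], e[1]) for e in matched_signatures(peaks, tol_ppm)}

# m/z values reported in the text for the H1 hemagglutinin digest.
h1_peaks = [1268.61089, 780.42895]
h1_result = assigned_types(h1_peaks, tol_ppm=1.0)
```

In practice a real peak list contains many non-signature ions, so requiring more than one matched signature per subtype before reporting an assignment would be a reasonable safeguard.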

from an in silico digest for all other influenza viral proteins within the database. These searches were achieved with a computer algorithm (FluGest) written expressly for this purpose. The algorithm compares an input mass against all theoretical [M+H]+ monoisotopic mass values and outputs a list of tryptic fragments that match the input mass within a given tolerance (in ppm). The mass value for each signature peptide ion was searched against those theoretically derived for the tryptic fragments for all proteins within the custom database composed of all known sequences for all influenza antigens (from all strains and hosts) in addition to a set of chicken albumin proteins (ovotransferrin, α-fetoprotein, serum albumin, ovalbumin, ovalbumin related Y protein, apolipoprotein AI and lipocalin). A summary of such an output is shown in Tables 1 and 2.

In the case of a tryptic peptide (residues 120-131) of H1 hemagglutinin of sequence EQLSSVSSFER, its m/z value of 1268.611683 is a unique indicator of this peptide within 1208 H1N1 and H1N2 strains (Tables 1 and 3). Within a 50 ppm mass error, the same mass can be attributed to 30 peptides in total across a range of hemagglutinin types, neuraminidase, non-structural, polymerase and matrix proteins. However, as the accuracy of the mass measurement improves to less than 1 ppm (Table 1, top two entries), only two candidate peptides are possible. One of them

Figure 1. High resolution MALDI mass spectrum of the tryptic digest of the hemagglutinin antigen of the influenza type A strain Beijing/262/95 (H1N1).

Figure 2. High resolution MALDI mass spectrum of the tryptic digest of the hemagglutinin antigen of the influenza type A strain Panama/2007/99 (H3N2).

Figure 3. High resolution MALDI mass spectrum of the tryptic digest of the hemagglutinin antigen of the type B/Victoria/504/00 strain of influenza of the Yam88 lineage.

is associated with a segment of N8 neuraminidase and is common to a single strain only. Thus below 1 ppm, in high mass accuracy experiments, the peptide at m/z 1268.611683 is a unique identifier or signature of H1 hemagglutinin. A second identifier of H1 hemagglutinin is the presence of a peptide ion at m/z 780.429016 in the spectrum of the whole virus or antigen digest. This peptide comprising residues 114-119 of the H1 hemagglutinin antigen has the amino acid sequence FEIFPK that is conserved across 1336 H1 hemagglutinin known sequences, and within a 5 ppm mass accuracy, is only common in mass to a segment of rearranged sequence in the polymerase antigen (PA) within 11 known sequences (see Table 3). Its identification in addition to the m/z 1268 peptide enables a H1 antigen to be unequivocally identified. Detection of Signature Tryptic Peptides in High Resolution MALDI-MS of Hemagglutinin Digests. Figure 1 illustrates the identification and typing of the hemagglutinin H1 antigen based on the presence of both signature peptide ions following the digestion of the antigen for the type A strain Beijing/262/95 (H1N1). The conserved peptide ions are detected among other tryptic peptides derived from the HA-rich band separated on a

Figure 4. High resolution MALDI mass spectrum of the tryptic digest of the hemagglutinin antigen of the influenza type B Tokyo/53/99 strain of the Vic87 lineage.

polyacrylamide gel, at m/z values of 1268.61089 and 780.42895, respectively. These values differ by 0.00079 mass units (or 0.63 ppm) and 0.00007 mass units (or 0.09 ppm) from the theoretical values, respectively. Note that baseline separation of the peaks within the isotopic cluster is achieved for all ions in such high resolution, high mass accuracy experiments (see inset Figure 1). This mass accuracy and resolution enables the signature peptides to be assigned with high confidence and also minimizes the probability that another peptide of a common nominal mass will overlap with the ion signals for the signature peptide.

Within a 5 ppm error, it can be seen from Table 2 that peptides at m/z 514.265965 and 593.307520 are identifiers of type H3 hemagglutinin. In only one H3N8 strain does the peptide mass also align to a tryptic peptide within the matrix protein MP1. Widening the mass error to 10 ppm, only two additional peptides

Figure 5. High resolution MALDI mass spectrum of the whole virus tryptic digest of the influenza type B strain Tokyo/53/99.

unique to a polymerase for one H6N2 strain and a neuraminidase of a type B strain are found to match the mass of the larger of the two peptides (data not shown).

Figure 2 shows the mass spectrum of the tryptic digest products of the hemagglutinin antigen of the A/Panama/2007/99 (H3N2) strain. Both signature peptides are detected, at m/z 514.26520 (see enlargement of low m/z region) and 593.31318. These differ by -0.00077 and +0.00566 mass units, or -1.49 and +9.54 ppm, from the theoretical masses, respectively.

The detection of a peptide at m/z 1280.626924 is a unique HA signature of type B influenza, while additional signature peptides, when also present in a mass spectrum, enable the lineage of the strain to be determined. The detection of this peptide is evident in Figures 3 and 4 for the B/Victoria/504/00 and B/Tokyo/53/99 strains, respectively. In the case of the Victoria B strain, the peptide is detected at an m/z value of 1280.62678, which is 0.00014 mass units or 0.11 ppm lower than the theoretical mass (Figure 3). For the Tokyo strain, the peptide appeared at m/z 1280.62698, which is 0.00006 mass units or 0.04 ppm above the theoretical mass (Figure 4).

For the B strains it is also possible to distinguish the lineage of the strain. Two lineages of B strains of the virus emerged in the 1970s,31 herein referred to as the Vic87 and Yam88 lineages. Among the conserved sequence differences in the HA1 subunit across strains of each lineage are those that result from mutations at residues 201, 202, 208, and 261. The mutation at position 261 involves the substitution of threonine with valine in the tryptic peptide comprising residues 257-264 with sequence TGTI(V/T)YQR. In strains of the Vic87 lineage, the tryptic peptide contains threonine, and its protonated peptide ions have a theoretical m/z of 939.489378. In the diverged Yam88 lineage this peptide contains a valine residue, which is 1.979 mass units lighter than threonine, and gives rise to unique signature peptide ions of m/z 937.510113. The mutation at residue 201 of Ala to Lys and the common mutations at residues 202 and 208 of Lys to Asn alter the tryptic cleavage sites in the HA1 subunits of the strains from the different lineages. One missed cleavage product detected, which results from the presence of proline at position 209 that hinders the action of trypsin, is a unique signature tryptic peptide of the Vic87 lineage. It has the sequence LYGDSKPQK across residues 202-211 and an m/z value of 1035.546891. While peptide ions at m/z 939.489378 associated with this lineage might be mistaken for H9 type A influenza (with 189 of the total 328 hits, see Table 2), when the peptide is detected in combination with peptide ions at m/z 1035.546891 (indicative of the Vic87 lineage) and the other signature peptide at m/z 1280.626924 (see Figure 3), the strain can be confidently assigned as a Vic87-like B strain of the virus.

Such assignments can also be made for whole virus digests, rather than only for separated or recovered antigen. The HR mass spectrum of a whole virus digest of the type B/Tokyo/53/99 strain is shown in Figure 5. While the presence of additional peptide ions associated with other viral proteins complicates the spectrum, the high resolution capabilities of the mass spectrometer enable the signature type B peptides at m/z 939.48854 and 1280.62614 of a strain of the Vic87 lineage to be characterized despite the absence, in this case, of the signature peptide predicted at m/z 1035.546891.

CONCLUSIONS

Signature tryptic peptides have been identified and detected in the mass spectra of digests of the hemagglutinin antigen or whole virus of influenza that enable the type and subtype or lineage of the strain to be unequivocally assigned. The approach provides a direct and more rapid method with which to type and subtype the influenza virus.
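The arithmetic behind the signature-peptide assignments can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes standard monoisotopic residue masses (rounded to five decimals), singly protonated ions, and a simple "any peak within tolerance" matching rule. The function names `protonated_mass`, `ppm_error` and `matched_signatures` and the `SIGNATURES` table are hypothetical names introduced for illustration; the m/z values in the table are those quoted in the text.

```python
# Standard monoisotopic residue masses (Da), rounded to 5 decimals.
# Assumption: only a subset of amino acids is listed, enough for the
# peptide sequences quoted in the text.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # charge carrier of a singly protonated ion


def protonated_mass(sequence):
    """Theoretical monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Theoretical signature m/z values as quoted in the text.
# The grouping labels are illustrative, not taken from the paper.
SIGNATURES = {
    "H1": [1268.611683, 780.429016],
    "H3": [514.265965, 593.307520],
    "type B": [1280.626924],
    "Vic87": [939.489378, 1035.546891],
    "Yam88": [937.510113],
}


def matched_signatures(peaks, tol_ppm=5.0):
    """Return, per label, the signature masses matched by any observed peak."""
    hits = {}
    for label, masses in SIGNATURES.items():
        found = [
            m for m in masses
            if any(abs(ppm_error(p, m)) <= tol_ppm for p in peaks)
        ]
        if found:
            hits[label] = found
    return hits


# Signature ions reported for the whole virus digest of B/Tokyo/53/99 (Figure 5).
print(matched_signatures([939.48854, 1280.62614]))
# Theoretical m/z of the H1 signature peptide FEIFPK.
print(round(protonated_mass("FEIFPK"), 4))
```

With the values reported for Figure 5, the sketch matches the type B signature and one of the two Vic87 signatures at a 5 ppm tolerance, mirroring the assignment made in the text despite the absent m/z 1035.546891 ion. For the Figure 2 values, the m/z 593.31318 ion is matched only when the tolerance is widened to 10 ppm.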
Mass spectra are acquired from each sample within seconds, which, together with the high loading capacity of the MALDI plate, allows many digested virus samples to be typed and subtyped within minutes.

ACKNOWLEDGMENT

The FT-ICR mass spectrometer was purchased with funds provided by an Australian Research Council Discovery Linkage Infrastructure Equipment Facility (LIEF) Grant (LE0668439) and the University of Sydney. Tryptic digests of HA derived from the type A influenza strains A/Beijing/262/95 (H1N1) and A/Panama/2007/99 (H3N2) were kindly prepared by B. Morrissey.

Received for review January 5, 2009. Accepted March 22, 2009. AC900026F