Anal. Chem. 2010, 82, 2717–2725

Rapid Identification of Protein Biomarkers of Escherichia coli O157:H7 by Matrix-Assisted Laser Desorption Ionization-Time-of-Flight-Time-of-Flight Mass Spectrometry and Top-Down Proteomics

Clifton K. Fagerquist,*,† Brandon R. Garbus,† William G. Miller,† Katherine E. Williams,‡ Emma Yee,† Anna H. Bates,† Síobhán Boyle,† Leslie A. Harden,† Michael B. Cooley,† and Robert E. Mandrell†

Western Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, 800 Buchanan Street, Albany, California 94710, and University of California, San Francisco, School of Medicine, Obstetrics, Department of Gynecology and Reproductive Sciences, 521 Parnassus, San Francisco, California 94143

Six protein biomarkers from two strains of Escherichia coli O157:H7 and one non-O157:H7, nonpathogenic strain of E. coli have been identified by matrix-assisted laser desorption ionization time-of-flight-time-of-flight tandem mass spectrometry (MALDI-TOF-TOF-MS/MS) and top-down proteomics. Proteins were extracted from bacterial cell lysates, ionized by MALDI, and analyzed by MS/MS. Protein biomarker ions were identified from their sequence-specific fragment ions by comparison to a database of in silico fragment ions derived from bacterial protein sequences. Web-based software, developed in-house, was used to rapidly compare the mass-to-charge (m/z) of MS/MS fragment ions to the m/z of in silico fragment ions derived from hundreds of bacterial protein sequences. A peak matching algorithm and a p-value algorithm were used to independently score and rank identifications on the basis of the number of MS/MS-in silico matches. The six proteins identified were the acid stress chaperone-like proteins, HdeA and HdeB; the cold shock protein, CspC; the YbgS (or homeobox protein); the putative stress-response protein YjbJ (or CsbD family protein); and a protein of unknown function, YahO.
HdeA, HdeB, YbgS, and YahO proteins were found to be modified post-translationally with removal of an N-terminal signal peptide. Gene sequencing of hdeA, hdeB, cspC, ybgS, yahO, and yjbJ for 11 strains of E. coli O157:H7 and 7 strains of the “near-neighbor” serotype O55:H7 revealed a high degree of sequence homology between these two serotypes. Although it was not possible to distinguish O157:H7 from O55:H7 from these six biomarkers, it was possible to distinguish E. coli O157:H7 from a nonpathogenic E. coli by top-down proteomics of YahO and YbgS. In the case of the YahO protein, a single amino acid residue substitution in its sequence (resulting in a molecular weight difference of only 1 Da) was sufficient to distinguish E. coli O157:H7 from a non-O157:H7, nonpathogenic E. coli by MALDI-TOF-TOF-MS/MS, whereas this would be difficult to distinguish by MALDI-TOF-MS. Finally, a protein biomarker ion at m/z ∼9060 observed in the MS spectra of non-O157:H7 E. coli strains but absent from MS spectra of E. coli O157:H7 strains was identified by top-down analysis to be the HdeB acid stress chaperone-like protein, consistent with previous identifications by gene sequencing and bottom-up proteomics.

An increasing area of interdisciplinary research involving analytical science and microbiology is the rapid identification and characterization of bacterial microorganisms. At the forefront of this research are mass spectrometry (MS)-based techniques.1-3 Because of its high specificity and sensitivity, MS has been used to identify/classify bacteria based on detection and/or identification of biomolecules that are characteristic of the microorganism either individually or in combination. Such biomolecules may be small molecules,4 proteins,5-36 or nucleic acids.37 Identification/classification of bacteria by MS analysis of expressed proteins has the advantage that the analyte being detected is linked to the genetic identity of the microorganism. The most widely used MS-based approach for protein-taxonomic bacterial classification is matrix-assisted laser desorption ionization (MALDI) time-of-flight mass spectrometry (TOF-MS).38,39 Two reasons for its wide use are relatively simple sample preparation and extremely rapid data acquisition and analysis. The only significant disadvantage is that the number of proteins detected by MALDI is typically less than 100, which is a small fraction of the 2000-5000 putative proteins present in bacterial genomes. Remarkably, MALDI-MS detection of this relatively small subset of the bacterial proteome has been shown to be sufficient for bacterial identification/classification at the genus, species, subspecies, and in some cases, strain level.5-32 Other MS-based techniques, most notably liquid chromatography (LC) coupled with electrospray ionization mass spectrometry (ESI-MS), have demonstrated detection of as many as ∼500 proteins with a concomitant increase in taxonomic resolution and the ability to differentiate very closely related strains of bacteria.33-36 However, LC separation is labor intensive and time-consuming, requiring multiple replicates, e.g., five 2 h LC runs. In addition, the highly charged protein ions generated by ESI-MS require charge-state envelope deconvolution of the data, which is also time-consuming. In contrast, MALDI generates primarily singly charged ions; therefore, deconvolution of MALDI-TOF-MS data is not necessary. Acquisition of multiple replicates by MALDI-TOF-MS is very rapid, requiring seconds as opposed to hours by the LC-ESI-MS approach.

A common requirement of proteo-centric/MS-based microbial identification approaches is the need to culture the bacteria prior to analysis in order to have sufficient protein for detection by mass spectrometry. In effect, culturing is a protein amplification step. Another benefit of culturing is that it demonstrates the viability of the microorganism(s).

Bacterial identification/classification by MALDI-TOF-MS has taken two general approaches toward data analysis. The first approach is pattern recognition, which relies upon identification of an unknown bacterial strain by comparing its MS spectrum to a database of MS spectra of known reference bacterial strains.17-19,25-27 These reference strains have been previously identified by a microbiological, biochemical, genetic, or genomic analysis. The protein biomarker ion peaks in an MS spectrum represent a “fingerprint” of the microorganism. In this approach, the actual identity of any individual protein biomarker ion is not important; only the pattern or profile comprising all of the protein biomarker ions is critical. Many researchers have devised their own in-house software to perform this kind of analysis.17-19,25,32 In addition, a number of instrument and software manufacturers have developed commercial software and MS reference databases for microbial identification that incorporate principal component analysis (PCA) and/or cluster analysis algorithms for pattern recognition identification.

* To whom correspondence should be addressed. E-mail: [email protected].
† U.S. Department of Agriculture.
‡ University of California, San Francisco.

10.1021/ac902455d © 2010 American Chemical Society. Published on Web 03/16/2010
Analytical Chemistry, Vol. 82, No. 7, April 1, 2010

(1) Demirev, P. A.; Fenselau, C. J. Mass Spectrom. 2008, 43, 1441–1457.
(2) Fenselau, C.; Demirev, P. A. Mass Spectrom. Rev. 2001, 20, 157–171.
(3) Lay, J. O., Jr. Mass Spectrom. Rev. 2001, 20, 172–194.
(4) Claydon, M. A.; Davey, S. N.; Edwards-Jones, V.; Gordon, D. B. Nat. Biotechnol. 1996, 14, 1584–1586.
(5) Cain, T. C.; Lubman, D. M.; Weber, W. J., Jr. Rapid Commun. Mass Spectrom. 1994, 8, 1026–1030.
(6) Krishnamurthy, T.; Ross, P. L.; Rajamani, U. Rapid Commun. Mass Spectrom. 1996, 10, 883–888.
(7) Krishnamurthy, T.; Ross, P. L. Rapid Commun. Mass Spectrom. 1996, 10, 1992–1996.
(8) Holland, R. D.; Wilkes, J. G.; Rafii, F.; Sutherland, J. B.; Persons, C. C.; Voorhees, K. J.; Lay, J. O., Jr. Rapid Commun. Mass Spectrom. 1996, 10, 1227–1232.
(9) Holland, R. D.; Duffy, C. R.; Rafii, F.; Sutherland, J. B.; Heinze, T. M.; Holder, C. L.; Voorhees, K. J.; Lay, J. O., Jr. Anal. Chem. 1999, 71, 3226–3230.
(10) Arnold, R.; Reilly, J. Rapid Commun. Mass Spectrom. 1998, 12, 630–636.
(11) Welham, K.; Domin, M.; Scannell, D.; Cohen, E.; Ashton, D. Rapid Commun. Mass Spectrom. 1998, 12, 176–180.
(12) Haag, A.; Taylor, S.; Johnston, K.; Cole, R. J. Mass Spectrom. 1998, 33, 750–756.
(13) Wang, Z.; Russon, L.; Li, L.; Roser, D.; Long, S. R. Rapid Commun. Mass Spectrom. 1998, 12, 456–464.
(14) Dai, Y.; Li, L.; Roser, D.; Long, S. R. Rapid Commun. Mass Spectrom. 1999, 13, 73–78.
(15) Ramirez, J.; Fenselau, C. J. Mass Spectrom. 2001, 36, 929–936.
(16) Whiteaker, J.; Karns, J.; Fenselau, C.; Perdue, M. L. Foodborne Pathog. Dis. 2004, 1, 185–194.
(17) Jarman, K. H.; Cebula, S. T.; Saenz, A. J.; Petersen, C. E.; Valentine, N. B.; Kingsley, M. T.; Wahl, K. L. Anal. Chem. 2000, 72, 1217–1223.
(18) Wahl, K. L.; Wunschel, S. C.; Jarman, K. H.; Valentine, N. B.; Petersen, C. E.; Kingsley, M. T.; Zartolas, K. A.; Saenz, A. J. Anal. Chem. 2002, 74, 6191–6199.
(19) Wunschel, S. C.; Jarman, K. H.; Petersen, C. E.; Valentine, N. B.; Wahl, K. L.; Schauki, D.; Jackman, J.; Nelson, C. P.; White, E. J. Am. Soc. Mass Spectrom. 2005, 16, 456–462.
(20) Demirev, P. A.; Ho, Y.-P.; Ryzhov, V.; Fenselau, C. Anal. Chem. 1999, 71, 2732–2738.
(21) Pineda, F. J.; Lin, J. S.; Fenselau, C.; Demirev, P. A. Anal. Chem. 2000, 72, 3739–3744.
(22) Demirev, P. A.; Lin, J. S.; Pineda, F. J.; Fenselau, C. Anal. Chem. 2001, 73, 4566–4573.
(23) Yao, Z.-P.; Demirev, P. A.; Fenselau, C. Anal. Chem. 2002, 74, 2529–2534.
(24) Pineda, F. J.; Antoine, M. D.; Demirev, P. A.; Feldman, A. B.; Jackman, J.; Longenecker, M.; Lin, J. S. Anal. Chem. 2003, 75, 3817–3822.
(25) Mandrell, R. E.; Harden, L. A.; Bates, A. H.; Miller, W. G.; Haddon, W. F.; Fagerquist, C. K. Appl. Environ. Microbiol. 2005, 71, 6292–6307.
(26) Fagerquist, C. K.; Miller, W. G.; Harden, L. A.; Bates, A. H.; Vensel, W. H.; Wang, G.; Mandrell, R. E. Anal. Chem. 2005, 77, 4897–4907.
(27) Fagerquist, C. K.; Bates, A. H.; Heath, S.; King, B. C.; Garbus, B. R.; Harden, L. A.; Miller, W. G. J. Proteome Res. 2006, 5, 2527–2538.
(28) Fagerquist, C. K.; Yee, E.; Miller, W. G. Analyst 2007, 132, 1010–1023.
(29) Fagerquist, C. K. J. Proteome Res. 2007, 6, 2539–2549.
(30) Donohue, M. J.; Smallwood, A. W.; Pfaller, S.; Rodgers, M.; Shoemaker, J. A. J. Microbiol. Methods 2006, 65, 380–389.
(31) Mazzeo, M. F.; Sorrentino, A.; Gaita, M.; Cacace, G.; Di Stasio, M.; Facchiano, A.; Comi, G.; Malorni, A.; Siciliano, R. A. Appl. Environ. Microbiol. 2006, 72, 1180–1189.
(32) Pierce, C. Y.; Barr, J. R.; Woolfitt, A. R.; Moura, H.; Shaw, E. I.; Thompson, H. A.; Massung, R. F.; Fernandez, F. M. Anal. Chim. Acta 2007, 583, 23–31.
(33) Williams, T. L.; Leopold, P.; Musser, S. Anal. Chem. 2002, 74, 5807–5813.
(34) Williams, T. L.; Monday, S. R.; Edelson-Mammel, S.; Buchanan, R.; Musser, S. M. Proteomics 2005, 5, 4161–4169.
(35) Williams, T. L.; Monday, S. R.; Feng, P. C. H.; Musser, S. M. J. Biomol. Tech. 2005, 16, 134–142.
(36) Williams, T. L.; Musser, S. M.; Nordstrom, J. L.; DePaola, A.; Monday, S. R. J. Clin. Microbiol. 2004, 42, 1657–1665.
(37) Ecker, D. J.; Sampath, R.; Massire, C.; Blyn, L. B.; Hall, T. A.; Eshoo, M. W.; Hofstadler, S. A. Nat. Rev. Microbiol. 2008, 6, 553–558.
(38) Karas, M.; Bachmann, D.; Bahr, U.; Hillenkamp, F. Int. J. Mass Spectrom. Ion Processes 1987, 78, 53–68.
(39) Tanaka, K.; Ido, Y.; Akita, S.; Yoshida, Y.; Yoshida, T. 2nd Japan-China Joint Symposium Mass Spectrometry, Osaka, Japan, 1987; Abstract p 185-188.
A weakness of the pattern recognition approach is the reproducibility of the MS spectrum, which is affected by variations in culturing, sample preparation, MALDI matrix and instrument tuning, sensitivity, and overall performance.19

The second approach for bacterial identification involves exploiting the information contained in bacterial genomic databases.20-24 This approach compares the mass-to-charge (m/z) of protein biomarker ions in a spectrum against the theoretical molecular weights (MW) of putative proteins (open reading frames) in bacterial genomic databases. A relatively high number of “matches” between the protein MWs of a particular microbial genome and the m/z of protein biomarker ions would indicate the identity of the microorganism. Typically, a significance testing algorithm (p-value) is used to rank/score identifications.20-24 This bioinformatics approach comes with several caveats. First, the microorganism being identified (or at least a closely related strain) must have been sequenced genomically. Second, the accuracy of the genomic information is critical. Third, for post-translationally modified (PTM) proteins, there can be a significant difference between the theoretical MW found in a genomic database and the MW of the mature, functional protein. The simplest PTM among bacterial proteins is N-terminal methionine removal, and there is a predictive rule governing this PTM that is dependent upon the penultimate residue.40-42 Although there can be exceptions to this rule,26,27 it is a fairly good predictor of this PTM and has been incorporated into the in-house developed bioinformatics algorithm for microorganism identification.22 More complicated PTMs (e.g., removal of signal peptides, methylation, phosphorylation, glycosylation, etc.) are less easily predicted and must be determined experimentally or inferred from sequence homology to a protein whose PTMs have been previously confirmed experimentally.

With the development of tandem time-of-flight mass spectrometers (TOF-TOF) for MALDI analysis,43,44 it has become possible to rapidly acquire MS and MS/MS data of intact proteins45 and to exploit this analysis for identification of microorganisms from bacterial cells or cell lysates.46,47 Although singly charged (protonated) proteins do not fragment with as high an efficiency as peptides, it has been shown that modest-sized proteins under ∼20 kDa may fragment on the microsecond time scale of tandem TOF instruments. These polypeptide fragmentations often occur at sites adjacent to aspartic acid (D), glutamic acid (E), and proline (P) residues.45 The fragmented proteins result in sequence-specific fragment ions which can be compared to a database of in silico fragment ions generated from the amino acid sequences of bacterial proteins derived from genomic sequencing databases.46,47 In this approach, MS/MS fragment ions are compared against the in silico fragment ions of bacterial proteins whose nominal mass is the same as the uncharged protein biomarker ion. A relatively high number of “matches” to a particular bacterial protein sequence indicates the identity of the protein and implicitly of the source microorganism.

Demirev and co-workers were the first to successfully use a top-down proteomic approach for identification of Bacillus atrophaeus and B. cereus spores with MALDI-TOF-TOF-MS/MS and software, developed in-house, that incorporated a p-value significance-testing algorithm to rank/score identifications.46 Very recently, Fagerquist et al. reported development of web-based software for the rapid identification of food-borne pathogens from MS/MS of protein biomarkers using MALDI-TOF-TOF mass spectrometry.47 The software incorporated both a peak matching algorithm and Demirev’s p-value algorithm. The software and algorithms were tested against protein biomarkers from species/strains of Campylobacter that had previously been identified by bottom-up proteomics. An additional innovation of the software was the ability to compare MS/MS fragment ions to residue-specific in silico fragment ions, i.e., in silico fragment ions generated from fragmentation of the polypeptide backbone at sites adjacent to specific amino acid residues, e.g., D, E, or P residues, etc. A residue-specific comparison often enhanced the top identification score (i.e., correct identification) relative to the “runner-up” identification scores when compared to the score differential obtained from a nonresidue-specific comparison. In addition, it was shown that it was possible to confirm the algorithm identification from the systematic fragment error generated by a correct algorithm identification.47

(40) Hirel, P. H.; Schmitter, J. M.; Dessen, P.; Fayet, G.; Blanquet, S. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 8247–8251.
(41) Gonzalez, T.; Baudouy, J. J. FEMS Microbiol. Rev. 1996, 18, 319–334.
(42) Solbiati, J.; Chapman-Smith, A.; Miller, J.; Miller, Ch.; Cronan, J., Jr. J. Mol. Biol. 1999, 290, 607–614.
(43) Medzihradszky, K. F.; Campbell, J. M.; Baldwin, M. A.; Falick, A. M.; Juhasz, P.; Vestal, M. L.; Burlingame, A. L. Anal. Chem. 2000, 72, 552–558.
(44) Suckau, D.; Resemann, A.; Schuerenberg, M.; Hufnagel, P.; Franzen, J.; Holle, A. Anal. Bioanal. Chem. 2003, 376, 952–965.
(45) Lin, M.; Campbell, J. M.; Mueller, D. R.; Wirth, U. Rapid Commun. Mass Spectrom. 2003, 17, 1809–1814.
(46) Demirev, P. A.; Feldman, A. B.; Kowalski, P.; Lin, J. S. Anal. Chem. 2005, 77, 7455–7461.
(47) Fagerquist, C. K.; Garbus, B. R.; Williams, K. E.; Bates, A. H.; Boyle, S.; Harden, L. A. Appl. Environ. Microbiol. 2009, 75, 4341–4353.
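As an illustration of the in silico fragment ions discussed above, the following Python sketch generates singly charged b- and y-type ions from a protein sequence using average residue masses, and can restrict them to cleavage sites adjacent to chosen residues such as D, E, or P. It is a minimal sketch under stated assumptions and is not the USDA software: the function names, the set of small penultimate residues used for the N-terminal methionine rule, and the restriction to b and y ions only are choices made for the example.

```python
# Average residue masses in Da (standard values, four decimals).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # average mass of H2O
PROTON = 1.00728   # approximate mass of the charging proton

# Assumption: small penultimate residues that usually permit removal of the
# N-terminal methionine; the text notes that exceptions occur.
NMET_PERMISSIVE = set("GASTCPV")


def remove_n_met(seq):
    """Drop an N-terminal Met when the penultimate residue is small."""
    if len(seq) > 1 and seq[0] == "M" and seq[1] in NMET_PERMISSIVE:
        return seq[1:]
    return seq


def average_mw(seq):
    """Average molecular weight of the uncharged polypeptide."""
    return sum(AVG_RESIDUE_MASS[r] for r in seq) + WATER


def fragment_ions(seq, residues=None):
    """Return (label, m/z, left residue, right residue) for b and y ions.

    If `residues` is given, keep only backbone cleavages where the residue
    on either side of the cleavage site belongs to that set.
    """
    total = sum(AVG_RESIDUE_MASS[r] for r in seq)
    n = len(seq)
    running = 0.0
    ions = []
    for i in range(1, n):
        running += AVG_RESIDUE_MASS[seq[i - 1]]
        left, right = seq[i - 1], seq[i]
        if residues is not None and left not in residues and right not in residues:
            continue
        ions.append((f"b{i}", running + PROTON, left, right))
        ions.append((f"y{n - i}", total - running + WATER + PROTON, left, right))
    return ions


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    mature = remove_n_met("MADEPKLS")
    print(mature, round(average_mw(mature), 2))
    for label, mz, left, right in fragment_ions(mature, residues={"D", "E", "P"}):
        print(f"{label:>3} {mz:9.2f}  {left}|{right}")
```

Passing `residues=None` corresponds to a nonresidue-specific comparison, in which every backbone cleavage contributes ions.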

In the current study, we have extended this top-down proteomic approach to protein/microorganism identification of two pathogenic E. coli O157:H7 strains and one non-O157:H7, nonpathogenic E. coli strain. Using this approach, we have identified protein biomarkers whose amino acid sequences for the O157:H7 strains are different from the nonpathogenic/non-O157:H7 E. coli strain. We also report a confirmatory top-down proteomic identification of a protein biomarker that is present in MALDI-TOF-MS spectra of non-O157:H7 strains but absent from MALDI-TOF-MS spectra of O157:H7 strains. Portions of this work were presented at the 57th American Society of Mass Spectrometry Conference, May 31-June 4, 2009, in Philadelphia, PA,48 and at the 238th American Chemical Society National Meeting, August 16-20, 2009, in Washington, DC.49

EXPERIMENTAL SECTION

Cultivation of E. coli Strains. E. coli strains were preserved at -80 °C in Microbank Microbial Preservation System beads (Pro-Lab Diagnostics, Richmond Hill, Ontario, Canada) and propagated on LB agar (Becton Dickinson, Sparks MD). The inoculated agar plates were incubated overnight at 37 °C in normal atmosphere. Select protein biomarkers from two strains of E. coli O157:H7 and one nonpathogenic, non-O157:H7 E. coli strain were identified by top-down proteomics. Safety considerations: E. coli strains were prepared in a class 2 biohazard safety cabinet using appropriate precautions. The genomically sequenced E. coli O157:H7 strain EDL933 (ref 50; designated RM1272 in our strain collection) was isolated originally from raw hamburger associated with a hemorrhagic colitis outbreak in 1982 and was generously provided by Jim Keen (USDA, Clay Center, NE). Another E. coli O157:H7 strain (designated RM5603) was isolated from a water sample taken from the Salinas River (Salinas, California) in 2006.51 RM5603 is negative for shiga toxin 1 (stx1) and positive for shiga toxin 2 (stx2), intimin (eae), and hemolysin (hly) by polymerase chain reaction (PCR). RM5603 was also analyzed by multilocus variable-number tandem repeat analysis (MLVA).51 The non-O157:H7 E. coli strain (designated RM3061) was isolated by Dan Mills from Romaine lettuce as part of the USDA-AMS Microbial Data Program microbiological survey of produce conducted in 2002. RM3061 was analyzed with a Biolog Microstation (Hayward, CA) and identified as a non-O157:H7 E. coli. RM3061 is also negative for shiga toxin 1 (stx1), shiga toxin 2 (stx2abc, stx2ex, stx2f), intimin (eae), hemolysin (ehly), and subtilase (subA) by PCR.

Genetic Sequencing of E. coli Protein Biomarkers. The genes of six protein biomarkers (hdeA, hdeB, cspC, ybgS, yahO, and yjbJ), identified by top-down proteomics from RM1272, RM5603, and RM3061 strains, were genetically sequenced in 11 E. coli O157:H7 strains, 7 O55:H7 strains, 9 O55:H6 strains, 2 O55:HN strains, 1 O55 strain, and 1 non-O157:H7 strain. The gene sequences were submitted to GenBank (NCBI), and their accession numbers are provided in Table S-1 in the Supporting Information. The (5′-3′) primers used to sequence these genes were as follows:

cspCF AGTGGACAGGAAAAMGACATGACAA
cspCR AACAGATTTCGGGTTAAACGAGGTA
hdeAF CGTTCACCAGTGACAAAGCCGC
hdeAR AACGCGTSTAAGAATGCAGTCGATT
hdeBF TGCCGCGCATTTGCCTTTC
hdeBR CCCAGCTATCGTTCAGGCTTGTACT
yahOF ATCGGGTGCGARAGAGATCACAA
yahOR GCACTGGAGAAATTTAAWGGCGATA
ybgSF ACAACAAGCGACGAGCATTATTATC
ybgSR AACGCCAGATTGTCGCGAAAAAC
yjbJF TTGCGCGGGCTTTCTCTGG
yjbJR TGTGATTGAAGCACATGGRCTCTG

Escherichia coli genomic DNA was amplified using a Tetrad thermocycler (Bio-Rad, Hercules, CA) with the following settings: 30 cycles of 94 °C for 30 s, 53 °C for 30 s, and 72 °C for 2 min. Each amplification reaction of 50 µL contained 50 ng genomic DNA, 1× MasterAmp PCR buffer (Epicenter, Madison, WI), 1× MasterAmp PCR enhancer (Epicenter), 2.5 mM MgCl2, 250 µM (each) dNTPs, 100 pmol of each primer, and 1 U Taq polymerase (New England Biolabs, Beverly, MA). Amplicons were purified on a BioRobot 8000 workstation (Qiagen, Valencia, CA). Cycle sequencing reactions were performed on a Tetrad thermocycler using the ABI PRISM BigDye terminator cycle sequencing kit (version 3.1; Applied Biosystems, Foster City, CA) and standard protocols. Cycle sequencing extension products were purified using BigDye XTerminator (Applied Biosystems). DNA sequencing was performed on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems) using POP-7 polymer and ABI PRISM Genetic Analyzer Data Collection and ABI PRISM Genetic Analyzer Sequencing Analysis software.

(48) Fagerquist, C. K.; Garbus, B. R.; Williams, K. E.; Bates, A. H.; Boyle, S.; Harden, L. A.; Miller, W. G.; Mandrell, R. E. Proceedings of 57th ASMS Conference on Mass Spectrometry and Allied Topics, Philadelphia, PA, May 31-June 4, 2009.
(49) Fagerquist, C. K. National American Chemical Society Meeting, Washington, DC, August 16-20, 2009.
(50) Perna, N. T.; Plunkett, G.; Burland, V.; Mau, B.; Glasner, J. D.; Rose, D. J.; Mayhew, G. F.; Evans, P. S.; Gregor, J.; Kirkpatrick, H. A.; Pósfai, G.; Hackett, J.; Klink, S.; Boutin, A.; Shao, Y.; Miller, L.; Grotbeck, E. J.; Davis, N. W.; Lim, A.; Dimalanta, E. T.; Potamousis, K. D.; Apodaca, J.; Anantharaman, T. S.; Lin, J.; Yen, G.; Schwartz, D. C.; Welch, R. A.; Blattner, F. R. Nature 2001, 409, 529–533.
(51) Cooley, M.; Carychao, D.; Crawford-Miksza, L.; Jay, M. T.; Myers, C.; Rose, C.; Keys, C.; Farrar, J.; Mandrell, R. E. PLoS One 2007, 2 (11), e1159.

E. coli Protein Biomarker Extraction and Analysis.
Bacterial cells were harvested and their proteins extracted as described in detail previously.25-27 Briefly, a 1 µL loop of bacterial cells was bead-beat for 1 min in 0.5 mL of extraction solution (67% water, 33% acetonitrile, and 0.1% TFA) with 40 mg of 0.1 mm zirconia/silica beads (BioSpec Products Inc., Bartlesville, OK). After bead-beating, the sample was centrifuged at 10 000 rpm for 5 min.

MALDI-TOF-MS and MALDI-TOF-TOF-MS/MS Analysis. Cell lysate/protein extract samples were analyzed using a 4800 TOF-TOF mass spectrometer (Applied Biosystems, Foster City, CA) as described in detail in a previous report.47 Briefly, equal aliquot volumes of sample extraction supernatant and a saturated solution of MALDI matrix were mixed. A 0.5 µL aliquot of this mixture was deposited onto a stainless steel target having 384 target spots. Two MALDI matrixes were utilized for this study: 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid, Mr = 224.21 Da) and α-cyano-4-hydroxycinnamic acid (HCCA, Mr = 189.17 Da). The mass spectrometer is equipped with a 200 Hz pulsed solid-state YAG laser (λ = 355 nm). Samples were analyzed in both linear (MS analysis) and reflectron (MS/MS analysis) modes. The instrument was operated for detection of only positively charged ions. The instrument was externally calibrated in linear mode using bovine insulin (MW = 5733.58 Da), E. coli thioredoxin (MW = 11 673.47 Da), and horse heart apomyoglobin (MW = 16 951.55 Da). The instrument was externally calibrated in reflectron mode using the y-type fragment ions of glu1-fibrinopeptide B (MW = 1570.60) at m/z 175.120 and 1441.635.

For MS analysis, laser desorbed ions were accelerated from the source at 20 kV. Ions were separated over an effective field free length of 1.5 m before striking a multichannel plate detector operated at 2.190 kV. For MS/MS analysis, ions were accelerated from the source at 8.0 kV. Both modes employed delayed ion extraction for improved ion focusing. Ions were separated in a field-free region between the source and the collision cell. A timed ion selector (TIS), positioned before the collision cell, deflects ions on the basis of their arrival time so that only ions with specific m/z pass through the TIS. The TIS window was typically operated at ±100 Da for MS/MS analysis. Ions passing through the TIS were decelerated to 1.70 kV prior to entry into a floating collision cell at 2.0 kV. Ions were fragmented by postsource dissociation (PSD) at higher than normal laser fluence. Ions were then reaccelerated to 15 kV. An ion gate, after the second acceleration region, was used to suppress the unfragmented protein ion signal. A two-stage reflectron mirror assembly (mirror 1, 10.910 kV; mirror 2, 18.750 kV) separates and deflects the fragment ions toward the reflectron multichannel plate detector, also operated at 2.190 kV. MS data were collected and summed from 1000-2000 laser shots, which provided an excellent signal-to-noise (S/N) ratio. Because of the low fragmentation efficiency of singly charged protein ions, MS/MS data were collected and summed from 30 000 to 40 000 laser shots in order to improve the S/N ratio. At a 200 Hz laser repetition rate, this required 2-3 min of data acquisition from a single spot.
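The acquisition times quoted above follow directly from the shot count and the laser repetition rate. The short Python sketch below works through that arithmetic; the function name is hypothetical, and the calculation assumes continuous firing with no dead time for target rastering or data transfer.

```python
def acquisition_time_s(n_shots, rep_rate_hz):
    """Seconds of laser firing needed to accumulate n_shots at a fixed rate."""
    return n_shots / rep_rate_hz


# Shot counts quoted in the text for MS (1000-2000) and MS/MS (30 000-40 000).
for shots in (1000, 2000, 30000, 40000):
    t = acquisition_time_s(shots, 200)
    print(f"{shots:>6} shots: {t:6.1f} s ({t / 60:.2f} min)")
```

At 200 Hz, 30 000 to 40 000 shots correspond to 150-200 s of firing, in line with the 2-3 min stated for MS/MS, whereas the 1000-2000 shots used for MS take 5-10 s.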
The ultrafast MALDI target rastering allowed higher than normal laser fluence, for protein ion fragmentation, without exhausting the sample spot. MS and MS/MS data were processed using the instrument software (Data Explorer Software, version 4.9). MS data were subjected to noise filtering (correlation factor = 0.7). MS/MS data were subjected to an advanced baseline correction (peak width = 32, flexibility = 0.5, degree = 0.1) followed by noise removal (std deviation = 2) and Gaussian smooth (filter width = 31 points). Processed and centroided MS and MS/MS spectra were exported as an ASCII file (m/z vs absolute intensity). MS and MS/MS ASCII files were uploaded to their respective tables in the USDA in-house database.

Data Analysis Software. The web-based software, developed in-house, has been described in detail.47 Briefly, the software was written in JAVA and uses a Tomcat Apache (TA) web server and Java Server Pages (JSP). The data analysis software uses the MySQL Database Management System. The software rapidly compares the m/z of MS/MS fragment ions to the m/z of in silico fragment ions from hundreds of bacterial protein sequences within a preset m/z tolerance. In addition, only those bacterial proteins having a molecular weight (MW) within a prespecified range (e.g., ±10 Da) of that of the uncharged protein biomarker ion are selected for comparison. The number of matches between the MS/MS fragment ions and the in silico fragment ions of a protein sequence is used to score/rank an identification. Protein/microorganism identification involves the use of two independent algorithms for scoring/ranking: a simple peak matching algorithm developed at the USDA and a more complicated p-value algorithm reported previously by Demirev et al.46 The software also allows the comparison of MS/MS fragment ions to nonresidue-specific in silico fragment ions (i.e., all in silico fragment ions) or to residue-specific in silico fragment ions, e.g., D-, E-, or P-specific in silico fragment ions. Singly charged (protonated) proteins are more likely to fragment at aspartic acid (D), glutamic acid (E), or proline (P) residues than at other amino acid residues.

In silico databases were constructed from bacterial protein MW searches performed using the TagIdent software at the ExPASy Web site (http://ca.expasy.org/tools/tagident.html). Searches were conducted against the UniProtKB/Swiss-Prot (versions 56.2-56.3) and UniProtKB/TrEMBL (versions 39.2-39.3) databases. Search criteria were as follows: protein biomarker MW ± 5 Da (i.e., the uncharged protein biomarker ion); taxonomy, bacteria; protein pI = 0.00-14.00. The search generates a single, multiprotein sequence FastA file, which was downloaded from the ExPASy Web site and batch processed using a β-version (8.01a5) of GPMAW software (Lighthouse Data, Denmark). GPMAW software converts each protein sequence into an individual file containing the taxonomic classification of the microorganism, the protein name, its amino acid sequence, its average MW, and in silico fragment ions identified by their average m/z, ion type (a, b, b-18, y, y-17, and y-18), and adjacent amino acid residues at the site of polypeptide backbone cleavage. These in silico files were batch uploaded to their respective databases within the USDA software.47

Figure 1. MS spectrum of the extracted cell lysate of E. coli O157:H7 strain RM1272 (EDL933) using the HCCA matrix.

RESULTS AND DISCUSSION

Genetic Sequencing of E. coli Protein Biomarkers. Table S-2 in the Supporting Information summarizes the E. coli strains and genes that were genetically sequenced: hdeA, hdeB, cspC, ybgS, yahO, and yjbJ.
Identical nucleotide gene sequences (NT) were assigned the same arbitrary number (e.g., 1, 2, 3, etc.). Identical protein amino acid sequences were likewise assigned identical arbitrary numbers. In some cases, different nucleotide sequences translate to the same amino acid sequence. For example, hdeB genes type 1 and 3 both translate to amino acid sequence HdeB type 1. The sequences uploaded to our in silico database for comparison are indicated in bold. The amino acid sequences are given in Table S-3 in the Supporting Information, where residue variations between different sequences are boxed.

E. coli O157:H7 Strain RM1272 (EDL933). Figure 1 shows the MALDI-TOF-MS spectrum of the extracted cell lysate of the E. coli O157:H7 strain RM1272 (EDL933) using the HCCA MALDI matrix. Figure 2 shows the MALDI-TOF-TOF-MS/MS spectrum of the protein biomarker ion at m/z 7705.6 shown in Figure 1. Prominent fragment ions are identified by their m/z, type/number, and amino acid residues adjacent to the site of polypeptide backbone cleavage of the protein sequence of the top scoring identification (Table 1). Many of the fragment ions are the result of polypeptide cleavage adjacent to aspartic acid (D) and/or glutamic acid (E) residues. Table 1 shows the top identification scores of the MS/MS spectrum in Figure 2. The top scoring identification of both the USDA peak matching algorithm and the p-value algorithm is the putative uncharacterized YahO protein, whose amino acid sequence is identical for the O157:H7 and O55:H7 serotypes. E. coli O55:H7 is a “near neighbor” of the more pathogenic O157:H7 serotype.52,53 The O55:H7 serotype is most often linked to infantile diarrhea. The second ranked identification is the YahO of non-O157:H7, nonpathogenic E. coli strains RM3061 and K-12. Thus, it was possible to distinguish pathogenic from nonpathogenic E. coli strains using this protein biomarker. The peak matching algorithm is three times faster,

(52) Wick, L. M.; Qi, W.; Lacher, D. W.; Whittam, T. S. J. Bacteriol. 2005, 187, 1783–1791.
(53) Zhou, Z.; Li, X.; Liu, B.; Beutin, L.; Xu, J.; Ren, Y.; Feng, L.; Lan, R.; Reeves, P. R.; Wang, L. PLoS One 2010, 5 (1), e8700.

Analytical Chemistry, Vol. 82, No. 7, April 1, 2010

2721

Figure 2. MS/MS spectrum of the protein biomarker ion at m/z 7705.6 (Figure 1). Prominent fragment ions are identified by their m/z, type/number, and amino acid residues adjacent to the site of polypeptide backbone cleavage of the sequence of the top identification from Table 1: YahO. Table 1. Top Six Identification Scores of a Protein Biomarker from E. coli O157:H7 Strain RM1272 (EDL933) Observed at m/z 7705.6 (Figure 1) and Analyzed by MS/MS (Figure 2) and Top-Down Proteomics Using a Nonresidue-Specific in Silico Fragment Ion Comparisona in silico ID

identifier

26947

>tr|Q8X699|Q8X699_ECO57

43989

>]|WGM|WGM_PSMRU_5A

43962

>]|WGM|WGM_PSMRU_1A

26281

>sp|P75694|YAHO_ECOLI

43983

>]|WGM|WGM_PSMRU_2A

25925

tr|B3G283|B3G283_PSEAE

USDA score

p-value

putative uncharacterized protein YahO PTM-21SigPep 7707.62 YahO protein PTM 21-SigPep 7707.62 YahO protein PTM-21SigPep 7707.62 UPF0379 protein YahO PTM-21SigPep 7706.64 YahO protein PTM-21SigPep 7706.64

66.30

4.3 × 10-18

66.30

4.3 × 10-18

66.30

4.3 × 10-18

63.04

9.0 × 10-16

63.04

9.0 × 10-16

putative uncharacterized protein 7709.01

44.57

4.8 × 10-5

sample name

protein

Escherichia coli O157:H7 (strain EDL933) Escherichia coli O157:H7 (strain RM5603) Escherichia coli O55:H7 (strain RM2057) Escherichia coli (strain K-12) Non-O157:H7 Escherichia coli (strain RM3061) Pseudomonas aeruginosa

a MS/MS to in silico comparison parameters: intensity threshold, 2%; no. of MS/MS peaks with intensity g2%, 92; m/z range for comparison, 0-14 000 Th; fragment ion tolerance, 2.5 Th; protein MW, 7705 ± 10 Da; no. of bacterial proteins, 1323; all in silico fragment ions compared; “PTM N-Met” indicates that the in silico protein sequence was modified to remove the N-terminal methionine; “PTM #SigPep” indicates that the in silico protein sequence was modified to remove a signal peptide. Algorithm computation times: USDA peak matching algorithm, 35.2 s;p-value, 98.3 s.

in computation speed, than the p-value calculation. Figure 3 shows the amino acid sequence of YahO of E. coli O157:H7 strains RM1272 (EDL933) and RM5603 and E. coli O55:H7 strain RM2057. Figure 3 also shows the YahO of the nonpathogenic E. coli strains K-12 and RM3061. The mature YahO has a 21-residue N-terminal signal peptide. The three sequences have amino acid variations at residue 14 (F T L) in the signal peptide which does not affect the MW of the mature protein. Another substitution at residue 86 (D T N) results in a MW difference in the mature protein of 1 Da. A 1 Da difference is difficult to detect by MALDI-TOF-MS; however, sequencespecific fragmentation by MALDI-TOF-TOF-MS/MS and topdown analysis allows this slight difference to be detected. Table 2 shows the top scoring identifications of the protein biomarker at m/z 9737.5 (Figure 1) of E. coli O157:H7 strain RM1272 (EDL933) analyzed by MS/MS and top-down analysis. 2722
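As a rough check on the 1 Da shift quoted for the D → N substitution in YahO, the change in average residue mass can be computed directly. The sketch below uses standard average residue masses; the dictionary and function names are illustrative assumptions and are not part of the authors' software.

```python
# Standard average residue masses (Da); only the residues discussed in the
# text are listed. Names below are illustrative, not from the paper.
AVERAGE_RESIDUE_MASS = {
    "D": 115.0886,  # aspartic acid
    "N": 114.1038,  # asparagine
    "P": 97.1167,   # proline
    "S": 87.0782,   # serine
    "T": 101.1051,  # threonine
}


def substitution_mass_shift(original: str, replacement: str) -> float:
    """Change in average protein mass (Da) when `original` is replaced by
    `replacement` at a single position (replacement minus original)."""
    return AVERAGE_RESIDUE_MASS[replacement] - AVERAGE_RESIDUE_MASS[original]


# Print the shift for the substitutions mentioned in the paper.
for old, new in [("D", "N"), ("P", "S"), ("T", "S")]:
    print(f"{old} -> {new}: {substitution_mass_shift(old, new):+.2f} Da")
```

The D → N shift of about 0.98 Da agrees with the difference between the two mature YahO masses listed in Table 1 (7707.62 and 7706.64), and the same function gives about 10 Da for a P → S substitution.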


The top protein identification in Table 2 is the acid stress chaperone-like protein HdeA. The top identification score is significantly higher than that of the top incorrect identification (52.50 versus 40.00), which is indicative of the high quality of the MS/MS spectrum. Unfortunately, identification of the source microorganism is limited because the HdeA sequence is identical across E. coli O157:H7, O55:H7, K-12, and RM3061 strains as well as Shigella flexneri. HdeA has a 21-residue N-terminal signal peptide and two cysteine residues, suggesting a disulfide bridge in the mature protein.54

Table S-4 in the Supporting Information shows the top scoring identifications of the protein biomarker ion at m/z 10471.7 (Figure 1) of E. coli O157:H7 strain RM1272 (EDL933) analyzed by MS/MS and top-down analysis. The top protein identification (homeobox protein or YbgS) of both the USDA peak matching algorithm and the p-value calculation identifies the source microorganism as either E. coli O157:H7 or E. coli O55:H7. The top score is not as “significant” relative to the lower-ranked identifications, primarily because of the poor quality of the MS/MS spectrum. It was nevertheless possible to distinguish between the E. coli O157:H7 strain EDL933 and E. coli strain K-12 and Shigella flexneri using this protein biomarker. The peak matching algorithm was more than twice as fast as the p-value algorithm. Figure S-1 in the Supporting Information shows the amino acid sequence of the homeobox protein of E. coli O157:H7 strain EDL933 and E. coli O55:H7 strain RM2057, the YbgS sequence of E. coli strain K-12 and Shigella flexneri, and the YbgS sequence of E. coli strain RM3061. This protein has a 24-residue N-terminal signal peptide as well as two cysteines, suggesting a possible disulfide bridge. These three sequences are differentiated from each other by substitutions at residue 31 (T → S), at residue 74 (P → S), or both. The latter substitution, by itself, results in a protein MW difference of 10 Da, which is detectable by MALDI-TOF-MS; however, top-down MS/MS also allows these two proteins to be differentiated. Additional protein biomarker identifications of E. coli O157:H7 strain RM1272 (e.g., cold shock-like protein CspC) are provided in the Supporting Information.

(54) Fagerquist, C. K.; Garbus, B. R.; Williams, K. E.; Bates, A. H.; Harden, L. A. J. Am. Soc. Mass Spectrom. 2010, in press.

Figure 3. Amino acid sequence of the YahO protein of pathogenic E. coli O157:H7 strains EDL933 and RM5603 and E. coli O55:H7 strain RM2057 and the non-O157:H7, nonpathogenic E. coli strains K-12 and RM3061. Proteins are post-translationally modified by removal of a 21-residue signal peptide (in outline). Variations in residues between the three proteins are boxed. The D → N amino acid substitution results in a molecular weight difference of 1 Da in the mature protein.

Table 2. Top Seven Identification Scores of a Protein Biomarker from E. coli O157:H7 Strain RM1272 (EDL933) Observed at m/z 9737.5 (Figure 1) and Analyzed by MS/MS and Top-Down Proteomics Using a Nonresidue-Specific in Silico Fragment Ion Comparison^a

in silico ID  identifier               sample name                                             protein                                                        USDA score  p-value
24512         >sp|P0AET0|HDEA_ECO57    Escherichia coli O157:H7 (strain EDL933)                chaperone-like protein HdeA PTM-21SigPep 9738.91               52.50       2.7 × 10^-8
43991         >]|WGM|WGM_PSMRU_5C      Escherichia coli O157:H7 (strain RM5603)                HdeA acid stress chaperone-like protein PTM-21SigPep 9738.91  52.50       2.7 × 10^-8
43979         >]|WGM|WGM_PSMRU_1C      Escherichia coli O55:H7 (strain RM2057)                 HdeA acid stress chaperone protein PTM-21SigPep 9738.91        52.50       2.7 × 10^-8
24511         >sp|P0AES9|HDEA_ECOLI    Escherichia coli (strain K-12)                          chaperone-like protein HdeA PTM-21SigPep 9738.91               52.50       2.7 × 10^-8
43985         >]|WGM|WGM_PSMRU_2C      Non-O157:H7 Escherichia coli (strain RM3061)            HdeA acid resistance chaperone protein PTM-21SigPep 9738.91   52.50       2.7 × 10^-8
24513         >sp|P0AET1|HDEA_SHIFL    Shigella flexneri                                       chaperone-like protein HdeA PTM-21SigPep 9738.91               52.50       2.7 × 10^-8
25275         >tr|B3E2H3|B3E2H3_GEOLS  Geobacter lovleyi (strain ATCC BAA-1151/DSM 17278/SZ)  Antisigma-28 factor, FlgM PTM N-Met 9734.00
24594         >tr|Q7VNM3|Q7VNM3_HAEDU  Haemophilus ducreyi                                     putative uncharacterized protein PTM-N-Met 9743.18             40.00       1.2 × 10^-3

^a MS/MS to in silico comparison parameters: intensity threshold, 2%; no. of MS/MS peaks with intensity ≥2%, 80; m/z range for comparison, 0-14 000 Th; fragment ion tolerance, 2.5 Th; protein MW, 9737 ± 10 Da; no. of bacterial proteins, 2017; all in silico fragment ions compared; “PTM N-Met” indicates that the in silico protein sequence was modified to remove the N-terminal methionine; “PTM #SigPep” indicates that the in silico protein sequence was modified to remove a signal peptide. Algorithm computation times: USDA peak matching algorithm, 59.7 s; p-value, 123.6 s.

E. coli O157:H7 Strain RM5603. Table S-7A in the Supporting Information shows the top scoring identifications of the protein biomarker ion from E. coli O157:H7 strain RM5603 at m/z 7706.1 (Figure S-2 in the Supporting Information), analyzed by MS/MS and top-down proteomics.


Figure 4. MS spectrum of the extracted cell lysate of the non-O157:H7, nonpathogenic E. coli strain RM3061 using the HCCA matrix. Note the protein biomarker ion at m/z 9063.4, which is absent from the MS spectrum of E. coli O157:H7 in Figure 1.
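The paper proposes that, when HdeA is observed, the absence of the HdeB ion noted in the Figure 4 caption could serve as a “negative” biomarker for the O157:H7 serotype. The following is a minimal sketch of that presence/absence check on a peak list. It is not the authors' software: the function names and the ±5 m/z tolerance are hypothetical, and the nominal m/z values are those quoted for Figures 1 and 4.

```python
from typing import Iterable

HDEA_MZ = 9737.5  # HdeA biomarker ion, as quoted for Figure 1
HDEB_MZ = 9063.4  # HdeB biomarker ion, as quoted for Figure 4


def has_peak(peaks: Iterable[float], target: float, tol: float = 5.0) -> bool:
    """True if any observed m/z lies within +/- tol of target.
    The default tolerance is an assumption made for illustration."""
    return any(abs(mz - target) <= tol for mz in peaks)


def hdeb_negative_marker(peaks: Iterable[float], tol: float = 5.0) -> str:
    """Classify a peak list by the HdeA-present / HdeB-absent pattern."""
    peak_list = list(peaks)
    hdea = has_peak(peak_list, HDEA_MZ, tol)
    hdeb = has_peak(peak_list, HDEB_MZ, tol)
    if hdea and not hdeb:
        return "HdeA present, HdeB absent: consistent with O157:H7"
    if hdeb:
        return "HdeB present: not the O157:H7 pattern"
    return "HdeA not observed: inconclusive"


# Peak lists containing only biomarker m/z values quoted in the text.
edl933_peaks = [7705.6, 9737.5, 10471.7]  # E. coli O157:H7 (Figure 1)
rm3061_peaks = [9063.4, 9738.4]           # nonpathogenic E. coli (Figure 4)
print(hdeb_negative_marker(edl933_peaks))
print(hdeb_negative_marker(rm3061_peaks))
```

The inconclusive branch reflects the paper's observation that some strains, such as E. coli O55:H7 strain RM2057, express neither protein, so the absence of HdeB alone is not informative.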

As with the EDL933 strain, the top scoring identification of this biomarker ion (m/z 7706.1) is YahO. Although it was possible to distinguish between E. coli O157:H7 and the nonpathogenic E. coli using this biomarker, it was not possible to distinguish between the O157:H7 and O55:H7 serotypes. Table S-7B in the Supporting Information shows a greater difference in identification scores between E. coli O157:H7 and nonpathogenic E. coli when MS/MS fragment ions are compared to D-, E-, and P-specific in silico fragment ions. The difference in computation times for the two algorithms is even more pronounced for this residue-specific analysis. Additional protein biomarker identifications (e.g., putative stress-response protein YjbJ or CsbD family protein) of the E. coli O157:H7 strain RM5603 are provided in the Supporting Information.

Non-O157:H7, Nonpathogenic E. coli Strain RM3061. Figure 4 shows the MS spectrum of the extracted cell lysate of the non-O157:H7, nonpathogenic E. coli strain RM3061. A very prominent ion peak at m/z ∼9063.4 is observed, which is conspicuously absent from the MS spectra of E. coli O157:H7 strains (Figure 1). Mandrell et al.55 (and others9,31) previously identified this peak as the acid stress chaperone-like protein HdeB using MALDI-TOF-MS and bottom-up proteomics9 or gene sequencing techniques.55 Table 3 shows the top scoring identifications of the protein biomarker at m/z 9063.4, which was analyzed by MS/MS and top-down analysis; the results confirm the previous identification of this peak as HdeB. The source microorganism is also correctly identified; however, the HdeB sequence is shared with several other serotypes of E. coli as well as with Shigella flexneri.
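The USDA scores reported in Tables 1 to 3 are consistent with the percentage of MS/MS peaks that have an in silico match within the fragment ion tolerance (for example, 61 of 92 peaks gives 66.30, and 48 of 92 gives 52.17). The sketch below illustrates that reading. It is an assumption inferred from the tabulated values rather than the authors' published algorithm, and the function name and toy peak lists are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def peak_match_score(observed: Sequence[float],
                     in_silico: Sequence[float],
                     tol: float = 2.5) -> float:
    """Percentage of observed fragment m/z values that have at least one
    in silico fragment within +/- tol (Th). The 2.5 Th default follows the
    fragment ion tolerance given in the table footnotes; the percentage
    form of the score is an assumption."""
    if not observed:
        return 0.0
    theoretical = sorted(in_silico)
    matched = 0
    for mz in observed:
        # Index of the first theoretical ion at or above the lower bound.
        i = bisect_left(theoretical, mz - tol)
        if i < len(theoretical) and theoretical[i] <= mz + tol:
            matched += 1
    return 100.0 * matched / len(observed)


# Toy example (hypothetical values): two of four observed peaks match.
observed_peaks = [1000.0, 2000.0, 3000.0, 4000.0]
in_silico_peaks = [1001.5, 2003.0, 2999.0]
print(peak_match_score(observed_peaks, in_silico_peaks))
```

Sorting the in silico list once and using a binary search keeps each lookup logarithmic, which matters when one spectrum is compared against the fragment ions of hundreds or thousands of candidate sequences.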
HdeB has a 29-residue N-terminal signal peptide and two cysteine residues, suggesting a disulfide bridge.54 Table S-10 in the Supporting Information shows the top scoring identifications of the protein biomarker ion at m/z 9738.4 (Figure 4), which was analyzed by MS/MS and top-down analysis. The biomarker ion was identified as the acid stress chaperone-like protein HdeA, which is consistent with previously reported results.9,55 The top identification of both algorithms includes the correct source microorganism; however, the sequence of HdeA is fully conserved across E. coli O157:H7 strains, E. coli O55:H7 strains, other E. coli strains, and Shigella flexneri. As noted previously, HdeA has a 21-residue N-terminal signal peptide as well as a putative disulfide bridge.54 Additional protein biomarker identifications of E. coli strain RM3061 are provided in the Supporting Information.

Table 3. Top Eight Identification Scores for a Protein Biomarker from the Non-O157:H7, Nonpathogenic E. coli Strain RM3061 Observed at m/z 9063.4 (Figure 4) and Analyzed by MS/MS and Top-Down Proteomics Using a Nonresidue-Specific in Silico Fragment Ion Comparison^a

in silico ID  identifier               sample name                                        protein                                                  USDA score  p-value
43963         >]|WGM|WGM_PSMRU_2D      Non-O157:H7 Escherichia coli (strain RM3061)       HdeB acid stress chaperone protein PTM-29SigPep 9063.26  52.17       8.2 × 10^-10
33855         >sp|P0AET2|HDEB_ECOLI    Escherichia coli (strain K-12)                     protein HdeB PTM-29SigPep 9063.26                        52.17       8.2 × 10^-10
43980         >]|WGM|WGM_PSMRU_1D      Escherichia coli O55:H7 (strain RM2057)            HdeB acid stress chaperone protein PTM-29SigPep 9063.26  52.17       8.2 × 10^-10
43969         >]|WGM|WGM_PSMRU_3D      Escherichia coli O55:H6 (strain RM2068)            HdeB acid stress chaperone protein PTM-29SigPep 9063.26  52.17       8.2 × 10^-10
43976         >]|WGM|WGM_PSMRU_4D      Escherichia coli O55:HN (strain RM2024)            HdeB acid stress chaperone protein PTM-29SigPep 9063.26  52.17       8.2 × 10^-10
33856         >sp|P0AET3|HDEB_ECOL6    Escherichia coli O6                                protein HdeB PTM-29SigPep 9063.26                        52.17       8.2 × 10^-10
33857         >sp|P0AET4|HDEB_SHIFL    Shigella flexneri                                  protein HdeB PTM-29SigPep 9063.26                        52.17       8.2 × 10^-10
34374         >tr|Q15UH4|Q15UH4_PSEA6  Pseudoalteromonas atlantica (strain T6c/BAA_1087)  putative uncharacterized protein PTM-Met 9062.05         38.04       1.6 × 10^-3

^a MS/MS to in silico comparison parameters: intensity threshold, 2%; no. of MS/MS peaks with intensity ≥2%, 92; m/z range for comparison, 0-14 000 Th; fragment ion tolerance, 2.5 Th; protein MW, 9063 ± 10 Da; no. of bacterial proteins, 1450; all in silico fragment ions compared; “PTM N-Met” indicates that the in silico protein sequence was modified to remove the N-terminal methionine; “PTM #SigPep” indicates that the in silico protein sequence was modified to remove a signal peptide. Algorithm computation times: USDA peak matching algorithm, 42.1 s; p-value, 115.6 s.

Identification of the Acid Stress Chaperone-Like Protein: HdeB. HdeA and HdeB are acid stress chaperone-like proteins.56,57 HdeB is detected (expressed) or not detected (not expressed) in MALDI-TOF-MS spectra depending on the E. coli serotype. HdeB appears to be consistently absent from O157:H7 strains.31,55 Mandrell et al. previously reported the nonexpression of HdeB in E. coli O157:H7 strains (and thus its absence from MALDI-TOF-MS spectra) due to a mis-sense mutation in the start codon of the hdeB gene.55 In the case of E. coli O55:H7 strain RM2057 (DECA 5D, Sri Lanka, 1965), neither HdeA nor HdeB is expressed (Table S-2 in the Supporting Information), presumably because of a mutation in the promoter region, although the hdeA and hdeB genes appear to be fully functional.55 When HdeA is observed in MALDI-TOF-MS spectra, the absence of HdeB in the same spectra could serve as a “negative” biomarker for identification of the O157:H7 serotype. Mazzeo et al. reported the absence of the HdeB ion at m/z ∼9060 in MALDI-TOF-MS spectra of E. coli O157:H7 strains and its presence in the spectra of non-O157:H7 E. coli strains and suggested that the absence of this protein ion peak could be used to distinguish between O157:H7 and non-O157:H7 strains.31 However, they were unable to identify the cause for the absence of the HdeB ion signal in the MS spectra of O157:H7 strains.31 In point of fact, the start codon for the hdeB gene is incorrectly identified in NCBI (and other public databases), thus precluding identification of the mis-sense mutation in the start codon of hdeB.55

(55) Mandrell, R. E.; Harden, L. A.; Horn, S. T.; Haddon, W. F.; Miller, W. G. American Society of Microbiology, Los Angeles, CA, May 21-25, 2000; Poster C-177.
(56) Malki, A.; Le, H. T.; Milles, S.; Kern, R.; Caldas, T.; Abdallah, J.; Richarme, G. J. Biol. Chem. 2008, 283, 13679–13687.
(57) Kern, R.; Malki, A.; Abdallah, J.; Tagourti, J.; Richarme, G. J. Bacteriol. 2007, 189, 603–610.

CONCLUSIONS

We have identified six protein biomarkers from two strains of E. coli O157:H7 and one non-O157:H7, nonpathogenic E. coli strain by MALDI-TOF-TOF-MS/MS and top-down proteomics using web-based software developed in-house. The proteins identified are the acid stress chaperone-like proteins HdeA and HdeB; the cold shock protein CspC; the YbgS (or homeobox) protein; the putative stress-response protein YjbJ (or CsbD family protein); and a protein of unknown function, YahO. Four of the six proteins possessed an N-terminal signal peptide. Presumably these proteins are shuttled to the bacterial periplasmic space, at which point their signal peptides are removed to generate the mature, functional protein. Because of the close taxonomic relationship between E. coli serotypes O157:H7 and O55:H7, the six protein biomarkers identified in this study were not sufficient to distinguish between these two serotypes. However, there may exist other, as yet undiscovered, protein biomarkers that will allow discrimination between these two important E. coli types. The MS/MS top-down analysis of intact proteins was able to distinguish the O157:H7 E. coli serotype from non-O157:H7, nonpathogenic E. coli on the basis of a single amino acid variation between the two E. coli types, which suggests the power of this approach for identifying protein biomarkers and, in certain cases, the source microorganism if the protein sequence is sufficiently unique. We were also able to confirm, by top-down analysis, the identity of a protein biomarker ion at m/z ∼9060 as the HdeB acid stress chaperone-like protein, which is often detected in MS spectra of non-O157:H7 E. coli strains but is consistently absent from the MS spectra of O157:H7 E. coli strains.

ACKNOWLEDGMENT

The mention of a brand or firm name does not constitute an endorsement by the U.S. Department of Agriculture over others of a similar nature not mentioned. We wish to thank Peter Højrup at Lighthouse for modifying his existing GPMAW software and providing it to us as a β-version. We also wish to thank Christine Hoogland at Bioinformatics Institute of Switzerland for her assistance. We also wish to thank Linda C. Whitehand for statistical discussions. We gratefully acknowledge the support of Applied Biosystems (Foster City, CA) for access to their 4800 TOF-TOF mass spectrometer for these experiments.

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review October 28, 2009. Accepted March 4, 2010. AC902455D
