
Identification of Salmonella taxon-specific peptide markers to the serovar level by mass spectrometry Shu-Hua Chen, Christine H. Parker, Timothy R Croley, and Melinda A. McFarland Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04843 • Publication Date (Web): 12 Mar 2019 Downloaded from http://pubs.acs.org on March 13, 2019


Analytical Chemistry is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Identification of Salmonella taxon-specific peptide markers to the serovar level by mass spectrometry

Shu-Hua Chen; Christine H. Parker; Timothy R. Croley; Melinda A. McFarland*

U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD

*Corresponding author: [email protected]

Abstract
We present an LC-MS/MS pipeline to discover taxon-specific tryptic peptide markers that identify Salmonella at the genus, species, subspecies, and serovar levels. Salmonella enterica subsp. enterica serovar Typhimurium and its four closest relatives, Saintpaul, Heidelberg, Paratyphi B, and Muenchen, were evaluated. A decision tree approach was used to identify peptides common to the five Salmonella proteomes for evaluation as genus-, species-, and subspecies-specific markers. Peptides identified in two or fewer Salmonella strains were evaluated as potential serovar markers. Approximately 140,000 assembled bacterial genomes are currently publicly available, more than 8,500 of which are for Salmonella. Consequently, the specificity of each candidate peptide marker was confirmed against all publicly available protein sequences in the NCBI non-redundant (nr) database. The performance of a subset of candidate taxon-specific peptide markers was evaluated with a targeted mass spectrometry method.

1 ACS Paragon Plus Environment

The presented workflow offers a marked improvement in specificity over existing MALDI-TOF-based bacterial identification platforms for identifying closely related Salmonella serovars.

Introduction
Salmonella is a bacterial pathogen that causes 1.2 million cases of salmonellosis annually in the United States, including 23,000 hospitalizations and 450 deaths.1 Rapid and accurate identification of Salmonella is important for surveillance, prevention, and control of foodborne disease. A broad range of methods has been used for Salmonella identification and taxonomic classification.2-4 For foodborne disease surveillance and epidemiological investigation, methods that identify Salmonella at the serovar level are needed to trace an outbreak back to its source. Serovars are subgroups defined by common cell surface antigens. Traditionally, Salmonella has been serotyped by agglutination of cells with antisera raised against these surface antigens. Conventional microbiological methods for identifying Salmonella in food involve multiple enrichment and selective plating steps, followed by biochemical and serological identification;3 complete confirmation typically takes more than 5 days.3 More recently, nucleic acid-based detection methods have been developed to improve assay sensitivity and accuracy.2,4 Since mass spectrometry (MS) was first applied to bacterial identification in 1975,5 it has been shown to detect proteins, nucleic acids, lipids, and oligosaccharides that are specific to a bacterium.6-9 Among mass spectrometric techniques, whole-cell matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS has gained much attention owing to its speed, low cost, straightforward sample preparation, and ease of operation. Typically, an unknown microbial species is identified by
comparing the mass spectral profiles of intact proteins in the experimental spectra with reference spectra of cultured colonies in a library.6,10 However, because these spectra carry limited information, this approach is generally not effective for distinguishing closely related bacteria, and identification beyond the species level remains a challenge.11 For Salmonella, classification below the species level has been demonstrated;12,13 however, a recent study concluded that MALDI-TOF MS is not useful for routine assignment of Salmonella serovars.14 Coupling MS to separation techniques, in particular liquid chromatography (LC), allows complex samples to be separated, providing an extended dynamic range and a large number of detectable analytes, typically hundreds to thousands of proteins in a single experiment. A typical bacterial genome encodes about 5,000 proteins,15 of which more than 2,000 are estimated to be translated at any given time.10 This moderate proteome complexity makes bacteria well suited to LC-MS analysis, because a significant portion of the expressed proteins can be detected and is likely to carry information from which bacterial taxonomy can be inferred. Gekenidis et al. showed that combining tryptic digestion with LC-MALDI identified 10 times more proteins than classical MALDI-TOF MS, facilitating the identification of biomarkers for Salmonella subspecies.16 Hu et al.
showed that peptide markers that differentiate a panel of bacteria at the genus and species levels can be identified and implemented in a selective MS/MS assay using capillary electrophoresis and tandem mass spectrometry.17 Cheng et al. classified Salmonella by identifying its biphasic flagellar antigens using LC-MS/MS.18 Approaches that identify bacteria to the genus and species level based on the overall proteome content detected by LC-MS/MS have also been developed.19,20 Methods that compare bacterial proteomes based on the number of total and/or discriminative peptides identified in bottom-up proteomic analyses against a given set of reference genomes have been proposed to determine
taxonomic relationships. A few groups have demonstrated strain-level resolution on a limited number of isolates.21-25 Recent advances in whole genome sequencing (WGS) technologies15 have led to a dramatic increase in the number of sequenced genomes since the first two bacterial genomes were fully sequenced in 1995.26,27 As of April 2018, NCBI included assembled genome sequences for approximately 140,000 bacterial isolates,28 approximately 8,500 of which are Salmonella genomes. The availability of such a large number of highly homologous genomes points to future protein databases that contain both a high degree of redundancy and highly informative amino acid variations. MS-based approaches can harness this wealth of WGS data to identify expressed proteins and peptides that are useful for taxonomic assignment, by comparing mass spectra or fragment ion (MS/MS) spectra with those predicted in silico from genomic data.12,13,19-24,29 Previously, we presented a top-down LC-MS/MS platform for the identification of Salmonella serovars.29 Here, we extend that approach to the peptide level, with the end goal of multiplexed identification of peptide markers for Salmonella at the genus, species, subspecies, and serovar levels. The method was developed using the Typhimurium complex, a closely related group of salmonellae from Salmonella reference collection A (SARA)30 consisting of S. Typhimurium and its four closest relatives, S. Saintpaul, S. Heidelberg, S. Paratyphi B, and S. Muenchen. Specifically, we developed a pipeline that combines bottom-up proteomics with filtering based on BLAST searches. Tryptic digests of each Salmonella lysate were compared to identify candidate peptide markers at the proteome level, and BLAST searches against all publicly available protein sequences were used to make a final determination of the specificity of each peptide as a Salmonella taxon marker. Taxon-specific identification was illustrated with a subset of Salmonella markers spanning genus- to serovar-level specificity, and these peptide markers were applied in a targeted assay of a mixture of isolates to classify Salmonella to the serovar level.

Experimental Section
Materials and bacterial strains
Dithiothreitol, iodoacetamide, and ammonium bicarbonate were purchased from Sigma-Aldrich (St. Louis, MO). Mass spectrometry grade Trypsin Gold was purchased from Promega (Madison, WI). Formic acid was obtained from Thermo Fisher Scientific (Waltham, MA). Optima LC-MS grade solvents were purchased from Fisher Scientific (Fair Lawn, NJ). Five strains from Salmonella reference collection A (SARA),30 namely Salmonella serovars Typhimurium SARA2 (LT2), Saintpaul SARA24, Heidelberg SARA39, Paratyphi B SARA50, and Muenchen SARA63, were obtained from the Food and Drug Administration/Center for Food Safety and Applied Nutrition stock culture collection. The taxonomic classification of these strains is provided in Table 1.
Sample preparation
Bacterial strains were streaked on trypticase soy agar plates and incubated at 35 °C for 16 h. Cells were harvested in 1 mL of deionized water, transferred into a 1.5 mL microcentrifuge tube, washed twice with deionized water and once with 70% ethanol, and resuspended in lysis/extraction buffer consisting of 50:49:1 acetonitrile:water:formic acid. Cell lysis and protein extraction were performed within 24 h of harvesting on a Barocycler NEP 3229 pressure cycling device (Pressure BioSciences, Inc., South Easton, MA). The pressure was cycled 24 times at 44 °C, each cycle consisting of 240 MPa for 15 s followed by ambient pressure for 10 s. Total protein
concentration of each bacterial lysate was determined using a Qubit Fluorometer (Thermo Fisher Scientific, Waltham, MA). Each strain was cultured twice to provide biological replicates, and strains were cultured again at a later time for the targeted LC-MS/MS experiments. For each lysate, 100 µg of protein was neutralized with ammonium bicarbonate buffer (pH 7.8), reduced with 10 mM dithiothreitol (30 min, 60 °C), and alkylated with 25 mM iodoacetamide (30 min, room temperature, in the dark). Enzymatic digestion was performed at 37 °C overnight at a 1:100 trypsin:protein ratio. The digest was diluted to 50 ng/µL with 96.9:3:0.1 water:acetonitrile:formic acid, and samples were centrifuged at 21,000 × g for 15 min to remove cellular debris. A portion of the supernatant was transferred to an autosampler vial with a PTFE/silicone septum and analyzed by LC-MS/MS. Remaining digests were stored at –40 °C for up to six months, after which all targeted peptides were still detectable. For targeted experiments on mixtures of the five isolates, digests (50 ng/µL) from each of the five Salmonella serovars were mixed in equal volumes (1:1:1:1:1). Two mixtures were prepared, each from a different set of cultures.
LC-MS/MS
Data-dependent LC-MS/MS analysis was performed using an Orbitrap Elite mass spectrometer (Thermo Scientific, San Jose, CA) coupled to a nanoACQUITY UPLC system (Waters, Milford, MA). For each analysis, 100 ng of Salmonella digest was loaded onto a Symmetry C18 trap column (100 Å, 5 µm, 180 µm × 20 mm), and peptides were subsequently separated on a BEH C18 column (130 Å, 1.7 µm, 100 µm × 100 mm) at 40 °C using a 120-min linear gradient of 3–40% acetonitrile in 0.1% formic acid at a flow rate of 300 nL/min. Survey scans (resolving power 30,000) were acquired in the Orbitrap, and the fifteen most intense ions in each survey spectrum were selected for collision-induced dissociation (CID) in the
linear ion trap (normalized collision energy of 35%; signal threshold of 5,000 counts). Automatic gain control (AGC) allowed accumulation of up to 5×10⁵ ions for MS scans and 1×10⁴ ions for MS/MS scans. The maximum injection time was 100 ms for FTMS scans and 50 ms for ion trap scans. Each biological replicate was analyzed in three technical replicates, for a total of six replicates per strain. Targeted LC-MS/MS was performed on an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Scientific, San Jose, CA) under the same conditions as described above unless noted otherwise. The instrument was operated in time-scheduled targeted selected ion monitoring (tSIM) mode with MS/MS. The quadrupole mass filter was set to pass a 2 Da window around the m/z of each precursor ion of interest, and MS1 scans were collected in the Orbitrap with an AGC target of 2×10⁴ ions. The m/z values of all 14 marker peptides were targeted in each experiment, with each peptide monitored for 5 min based on its elution time. When the ion of interest was present, the MS scan was followed by CID in the linear ion trap to confirm the peptide identification. Biological replicates of each isolate and mixture were analyzed at least twice to confirm reproducibility. Extracted ion chromatograms (XICs) were reconstructed from MS1 precursor data with a 5 ppm mass tolerance for all 14 targeted ions.
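
The 5 ppm XIC tolerance corresponds to an absolute m/z window that scales with the precursor m/z. The Python sketch below illustrates that arithmetic under stated assumptions: it computes the theoretical m/z of a tryptic peptide from standard monoisotopic residue masses (cysteine carbamidomethylated, no variable modifications), then derives a symmetric ±ppm extraction window. The function names, and the choice of the 2+ charge state for the genus marker in the demo, are illustrative assumptions rather than parameters reported for this assay.

```python
# Monoisotopic residue masses (Da). Cysteine includes the fixed
# carbamidomethyl modification (+57.021464) used in this workflow.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185 + 57.021464,
    "L": 113.084064, "I": 113.084064, "N": 114.042927, "D": 115.026943,
    "Q": 128.058578, "K": 128.094963, "E": 129.042593, "M": 131.040485,
    "H": 137.058912, "F": 147.068414, "R": 156.101111, "Y": 163.063329,
    "W": 186.079313,
}
WATER = 18.010565   # added once per peptide (N- and C-termini)
PROTON = 1.007276   # mass of a proton, one per charge


def precursor_mz(peptide: str, charge: int) -> float:
    """Theoretical m/z of [M + zH]z+ (carbamidomethyl Cys, no variable mods)."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds of a symmetric +/- ppm extraction window."""
    half_width = mz * ppm * 1e-6
    return mz - half_width, mz + half_width


def within_tolerance(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


if __name__ == "__main__":
    # Hypothetical example: the genus marker peptide, assumed to be 2+.
    target = precursor_mz("SLTDTLEEVLSSSGEK", 2)
    low, high = ppm_window(target, 5.0)
    print(f"target m/z {target:.4f}; 5 ppm XIC window {low:.4f} to {high:.4f}")
```

Because the window is relative, a 5 ppm tolerance is about ±0.005 at m/z 1,000 but only about ±0.0025 at m/z 500.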

Custom proteome database
A single custom database was created from the protein sequences of one sequenced genome for each of the five Salmonella enterica subsp. enterica serovars in this study (DT: Typhimurium; DS: Saintpaul; DH: Heidelberg; DP: Paratyphi B; DM: Muenchen), together with representative translated genomes of the closely related Escherichia coli (E. coli) K-12 (DK) and O157:H7 (DO), Shigella flexneri (DF), and Shigella sonnei (DN), and common protein contaminants
(DC), such as keratin and trypsin.31 Efforts were made to represent the actual strains used; strains with no publicly available annotated genome were represented by a closely related annotated genome at the same taxonomic rank. Because the goal of this study was to identify peptides shared across multiple strains of the same serovar that could serve as general markers, peptides expressed by the analyzed strains but absent from the database (and therefore unidentified) were not considered as marker candidates. The genomes used to assemble the custom database are listed in Table 1. Translated protein sequences were downloaded from the National Institutes of Health's National Center for Biotechnology Information (NCBI) in March 2017. Notably, because of the large number of sequenced bacterial genomes, the UniProt Knowledgebase (UniProtKB) began removing bacterial proteomes considered redundant in April 2015.32 The procedure involved grouping proteomes at the species level, flagging proteomes with greater than 90% similarity, and removing the corresponding protein entries from TrEMBL. The genomes of the five Salmonella serovars studied here are approximately 99% similar (average nucleotide identity based on BLAST+, calculated with the JSpeciesWS tool;33 data not shown), so these serovars were considered redundant and removed from UniProtKB.
Database search and parsimony analysis
Tandem mass spectra were searched with Mascot v2.5 (Matrix Science, Boston, MA) against the custom database described above (43,568 sequences) for tryptic peptides, with a precursor mass tolerance of 50 ppm, a fragment ion tolerance of 0.8 Da, and up to two missed cleavages. Carbamidomethylation of cysteine was set as a fixed modification and oxidation of methionine as a variable modification. MassSieve (v1.12)34 was used to pool the results of the six replicate injections of each strain and to perform peptide- and protein-level parsimony comparisons across experiments.
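
The search allowed tryptic peptides with up to two missed cleavages. As a simplified illustration of that search space, and not Mascot's internal implementation, the sketch below applies the common trypsin rule (cleave after K or R unless the next residue is P) and enumerates peptides spanning up to a chosen number of missed cleavages. The function names are hypothetical.

```python
import re


def cleavage_sites(protein: str) -> list[int]:
    """Positions just after K or R where trypsin cleaves (not before P)."""
    return [m.end() for m in re.finditer(r"[KR](?!P)", protein)]


def tryptic_peptides(protein: str, max_missed: int = 2) -> list[tuple[str, int]]:
    """Enumerate (peptide, missed_cleavages) pairs, up to `max_missed` each."""
    if not protein:
        return []
    # Fragment boundaries: protein start, each internal cleavage site, end.
    inner = [s for s in cleavage_sites(protein) if s < len(protein)]
    bounds = [0] + inner + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            j = i + 1 + missed  # skipping `missed` internal sites
            if j >= len(bounds):
                break
            peptides.append((protein[bounds[i]:bounds[j]], missed))
    return peptides


if __name__ == "__main__":
    for pep, missed in tryptic_peptides("AAKGGRCCKPLR"):
        print(missed, pep)
```

Under this rule a K-P or R-P bond is simply not a cleavage site, so "AKPGR" is reported with zero missed cleavages; the stricter post-search filter used in this study instead treated such bonds as possible missed cleavages and removed those peptides.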

For each spectrum, only the top-scoring peptide match with a Mascot score greater than or equal to the identity threshold (p < 0.05) was retained. Ambiguous matches, i.e., spectra for which more than one peptide sequence in the database received the same score, were excluded. Peptides containing missed tryptic cleavage sites were removed; KP and RP bonds were also treated as possible missed cleavages, and peptides containing them were removed. For each serovar, peptides from proteins with fewer than two identified peptides were discarded, and the remaining peptides were required to be identified in at least four of six replicates.
Sequence alignment to genomic databases
Not all serovars express all encoded proteins at detectable concentrations under a given experimental condition. To reduce the number of candidate peptides submitted to BLAST searches, the data were therefore filtered to account for peptides detected in a single isolate but encoded in more than one genome of the custom database. All peptides identified by MS/MS and passing the criteria above were aligned to the translated protein sequences in the custom database: for each peptide, a custom Python script read through the protein sequences of each genome and constructed a map between the peptide and the reference genomes. This alignment was used to exclude peptides with low specificity. The remaining peptides were then evaluated for marker candidacy by BLAST searches against all sequences in the NCBInr protein database. Specifically, BLASTP was run for each candidate peptide to retrieve a maximum of 500 sequence alignments with an e-value cutoff of 2×10⁵. The BLAST output was parsed to retrieve alignments that exactly match the query (sequence coverage = 100%, identity = 100%), yielding all proteins that contain the candidate peptide together with their taxonomic affiliations. The taxonomic position of each protein (genus, species, subspecies, serovar, and strain) was recorded where available, and the number of proteins in each taxon was counted. Batch
BLAST analyses and parsing were performed with in-house Python scripts against the entire online NCBInr database between March and September 2017.
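
The peptide-level filters and the peptide-to-genome map can be sketched in a few lines of Python. This is one plausible reading of the criteria, not the authors' in-house scripts. It assumes identifications are pooled per strain as one set of peptide sequences per replicate, and that each peptide maps to a single parsimonious protein. It also assumes BLAST hits are already parsed into dictionaries; their keys (qlen, length, pident, gapopen, taxon) are hypothetical names, most of which echo BLAST+ tabular output fields.

```python
from collections import Counter


def has_missed_cleavage(peptide: str) -> bool:
    """True if K or R occurs anywhere except the C-terminus (K-P/R-P included)."""
    return any(aa in "KR" for aa in peptide[:-1])


def reproducible_peptides(replicates: list[set[str]],
                          peptide_to_protein: dict[str, str],
                          min_replicates: int = 4,
                          min_peptides_per_protein: int = 2) -> set[str]:
    """Apply missed-cleavage, protein-support, and 4-of-6 replicate filters."""
    # Number of replicates in which each peptide was identified.
    counts = Counter(pep for rep in replicates for pep in rep)
    candidates = {pep for pep in counts if not has_missed_cleavage(pep)}
    # Distinct surviving peptides supporting each protein.
    support = Counter(peptide_to_protein[pep] for pep in candidates)
    return {pep for pep in candidates
            if support[peptide_to_protein[pep]] >= min_peptides_per_protein
            and counts[pep] >= min_replicates}


def genome_map(peptides, genomes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each peptide to the genome labels (DT, DK, ...) whose proteins contain it."""
    return {pep: {label for label, proteins in genomes.items()
                  if any(pep in seq for seq in proteins)}
            for pep in peptides}


def exact_hit_taxa(blast_rows: list[dict]) -> Counter:
    """Count proteins per taxon among full-length, 100% identity, gap-free hits."""
    return Counter(row["taxon"] for row in blast_rows
                   if row["pident"] == 100.0
                   and row["length"] == row["qlen"]
                   and row["gapopen"] == 0)
```

Treating every internal K or R as a missed cleavage mirrors the stated rule that peptides with KP and RP bonds were also removed, and the full-length, 100% identity condition corresponds to the coverage = 100%, identity = 100% criterion.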

Results and Discussion
Salmonella is the leading cause of hospitalization due to foodborne illness in the United States.35 Tracing Salmonella-contaminated food back to its source requires at least serovar-level specificity. More than 1,500 Salmonella enterica subsp. enterica serovars have been reported to date, all of which are genetically similar. For example, the genome sequences of the five strains representing the five Salmonella serovars in the custom database (Table 1) are approximately 99% similar. Assembled genomes for more than 8,100 S. enterica isolates are publicly available, approximately 6,800 of which are for S. enterica subsp. enterica serovars. Consequently, identifying serovar-specific markers, either genomic or proteomic, that are predictive across all known Salmonella isolates is a significant challenge. To explore the utility of a rapid, multiplexed mass spectrometry method for identifying Salmonella in food matrices, we developed a platform to identify informative peptide markers at the Salmonella genus, species, subspecies, and serovar levels that are amenable to targeted LC-MS/MS screening assays.
Proteome analysis
Salmonella taxon-specific markers were selected by bottom-up proteomic analysis and database searches. In brief, tryptic digests of stationary-phase bacteria were analyzed by LC-MS/MS. Because of the lack of commercial downstream software that can handle search results from large, redundant databases, in which each peptide is associated with a high number of proteins, the primary search of the resulting MS/MS spectra was
limited to a custom protein database. The database comprised the translated protein sequences from one sequenced genome per analyzed strain, representing the five Salmonella enterica subsp. enterica serovars S. Typhimurium, S. Saintpaul, S. Heidelberg, S. Paratyphi B, and S. Muenchen, as well as the closely related E. coli K-12, E. coli O157:H7, Shigella flexneri, and Shigella sonnei (Table 1). Figure 1 illustrates the degree to which identified peptides are shared across the Salmonella serovar strains. Each block represents a set of peptides identified in a subset of isolates, and each serovar is denoted by a single letter: an uppercase bold letter denotes a serovar in which the set of peptides was identified, and a lowercase letter denotes a serovar in which it was not. For example, TSHPM represents the peptides identified in the T (S. Typhimurium), S (S. Saintpaul), H (S. Heidelberg), P (S. Paratyphi B), and M (S. Muenchen) proteomes. The complete set of peptides identified in a bacterium represents the readily detectable portion of its expressed proteome and can contain a subset of marker peptides that are specific at various taxonomic levels. Approximately 6,400 tryptic peptides were identified in each Salmonella lysate; for example, a total of 6,198 peptides were reproducibly identified in S. Typhimurium (top row). The majority, 3,729 peptides, were identified in isolates from all five serovars (TSHPM); 882 peptides were present in four of the five serovars (TsHPM, TShPM, TSHpM, TSHPm), 774 in three (TshPM, TsHpM, TsHPm, TShpM, TShPm, TSHpm), and 373 in two (TshpM, TshPm, TsHpm, TShpm). Only 440 peptides were detected uniquely in S. Typhimurium (Tshpm). Peptide sequences conserved across all serovars can be treated as potential genus-, species-, and subspecies-level markers. The proteomes overlap considerably across the Salmonella serovar set, with only 7.0% of identified peptides unique to a single serovar.
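
The uppercase/lowercase notation used for the blocks in Figure 1 is easy to generate and decode programmatically. The sketch below, with hypothetical helper names, enumerates the patterns that include a given serovar and checks that the S. Typhimurium counts quoted above sum to the reported total of 6,198 peptides. Bold type from the figure cannot be represented here, so only letter case is used.

```python
from itertools import combinations

# Single-letter serovar codes in the order used in Figure 1.
SEROVARS = {"T": "Typhimurium", "S": "Saintpaul", "H": "Heidelberg",
            "P": "Paratyphi B", "M": "Muenchen"}


def decode(pattern: str) -> set[str]:
    """Serovars in which a peptide set was identified (uppercase letters)."""
    return {SEROVARS[ch] for ch in pattern if ch.isupper()}


def patterns_with(code: str, k: int) -> list[str]:
    """All Figure 1-style patterns with k uppercase serovars that include `code`."""
    return ["".join(c if c in combo else c.lower() for c in SEROVARS)
            for combo in combinations(SEROVARS, k) if code in combo]


# S. Typhimurium peptides by the number of serovars sharing them (from the text).
TYPHIMURIUM_COUNTS = {5: 3729, 4: 882, 3: 774, 2: 373, 1: 440}

if __name__ == "__main__":
    for k in sorted(TYPHIMURIUM_COUNTS, reverse=True):
        print(k, TYPHIMURIUM_COUNTS[k], patterns_with("T", k))
    print("total:", sum(TYPHIMURIUM_COUNTS.values()))  # 6198
```

The number of patterns for each sharing level follows from choosing the k − 1 companion serovars from the other four, which reproduces the 1, 4, 6, 4, and 1 blocks listed for S. Typhimurium.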

Genus and species markers
The largest population in Figure 1, shown in violet, represents the core proteome: peptides present in all strains (3,729 peptides), accounting for 58% of the identified Salmonella peptides. To reduce the number of candidate peptide markers and identify peptides with high Salmonella taxon specificity, a decision tree was developed (Figure 2). Approximately 60% of these core peptides (2,254 peptides) were shared with the E. coli and Shigella strains in the custom database, i.e., present in DK, DO, DF, or DN, and were removed. The remaining 1,475 peptides were considered the initial set of genus-, species-, and subspecies-specific candidate peptides. These candidates were subjected to BLAST searches against all public translated proteomes in the NCBInr protein database for a final determination of taxonomic specificity. For each peptide, the BLAST search returned a list of all proteins in the database that contain the candidate peptide. Core peptides are expected to derive primarily from essential and housekeeping proteins and therefore to be widespread across many bacteria when queried against a larger database. Indeed, analysis of the taxonomic data for the returned entries showed that approximately 40% of the candidate peptides were present in ten or more genera. Fewer than 24% of the candidates (350 peptides) were unique to the Salmonella genus, each present in an average of 5,400 Salmonella protein entries. Because these 350 peptides were unique to Salmonella when queried across the genomes of approximately 140,000 bacterial isolates, they were designated as conserved Salmonella genus markers. The genus markers were further analyzed as candidate S. enterica species markers: 151 peptides were found in both S. enterica and S. bongori, whereas the remaining 199 peptides were present only in S. enterica and were designated as S. enterica species markers.
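
The genus and species branches of the decision tree can be sketched as a small classification function. This is a hypothetical encoding based only on the criteria stated in the text. Core peptides found in the E. coli or Shigella genomes DK, DO, DF, or DN are removed. A peptide is genus-specific when every exact BLAST hit is a Salmonella protein, and species-specific when every hit is S. enterica. The actual tree in Figure 2 may include branches not shown here. The funnel counts from the text are included as a consistency check.

```python
# Custom-database labels of the non-Salmonella genomes (E. coli, Shigella).
NON_SALMONELLA = {"DK", "DO", "DF", "DN"}


def classify_core_peptide(db_genomes: set[str], hit_species: set[str]) -> str:
    """Assign a core (TSHPM) peptide to a marker level (hypothetical encoding).

    db_genomes: custom-database genome labels whose proteins contain the peptide.
    hit_species: species names of all exact BLAST hits in NCBInr.
    """
    if db_genomes & NON_SALMONELLA:
        return "removed"              # shared with E. coli or Shigella
    genera = {name.split()[0] for name in hit_species}
    if genera != {"Salmonella"}:
        return "not genus-specific"   # also found in other genera
    if hit_species == {"Salmonella enterica"}:
        return "genus and species marker"
    return "genus marker only"        # e.g., also present in S. bongori


# Funnel counts reported in the text.
core_peptides = 3729
shared_with_ecoli_shigella = 2254
initial_candidates = core_peptides - shared_with_ecoli_shigella  # 1,475
genus_markers = 350
also_in_bongori = 151
species_markers = genus_markers - also_in_bongori  # 199

if __name__ == "__main__":
    salmonella = {"DT", "DS", "DH", "DP", "DM"}
    print(classify_core_peptide(salmonella, {"Salmonella enterica"}))
    print(classify_core_peptide(salmonella | {"DK"}, {"Salmonella enterica"}))
    print(initial_candidates, species_markers)
```

In this encoding species markers are a subset of genus markers, matching the text, where the 199 S. enterica markers are drawn from the 350 genus markers.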

Subspecies markers
Of the 199 S. enterica species markers, all but one were present in more than one of the six S. enterica subspecies, namely enterica (I), salamae (II), arizonae (IIIa), diarizonae (IIIb), houtenae (IV), and indica (VI) (see taxonomic data in Supplementary File 1). The one peptide specific to S. enterica subsp. enterica was found in only a small number of serovars when searched against the NCBInr database. The absence of a representative, detectable subspecies-specific peptide is consistent with the increasing protein homology at deeper Salmonella taxonomic levels and underscores the difficulty of establishing suitable markers below the species level. In line with our data, a recent pan-genome study by Laing et al. evaluated 4,893 Salmonella enterica genomes and showed that a number of S. enterica genomic DNA markers can be identified, but none was specific to any single Salmonella subspecies.36 To establish subspecies markers, the filtering criteria were relaxed to include BLAST analysis of the full initial genus-, species-, and subspecies-specific candidate set (1,475 peptides), regardless of the peptides' presence in other genera and species. That is, for the subspecies analysis, the genus and species alignments were not restricted to Salmonella and S. enterica. The subspecies alignment analysis is illustrated in Figure S1. Only 15 peptides were specific to genomes of subspecies enterica; these were designated S. enterica subsp. enterica markers. However, none of them was present in the majority of available Salmonella enterica genomes. Consequently, more than one peptide marker is required to establish the subspecies. The possibility that a subspecies-level marker derives from a different genus or species is mitigated by the co-requirement of detecting a Salmonella genus marker and an S. enterica species marker.
Serovar markers
As illustrated in Figure 1 and listed in Table 2, only a few hundred peptides were identified in a single serovar, with 440, 343, 447, 679, 301 peptides identified uniquely in the S. Typhimurium, S. Saintpaul, S. Heidelberg, S. Paratyphi B and S. Muenchen isolates, respectively. While these peptides were unique to a single serovar within the confines of the current sample set, the aim of this study was to identify specific markers when considered across all 1,500 known S. enterica subsp. enterica serovars, for use in analysis of unknown samples. This subset of peptides was further analyzed by BLAST searches against the NCBInr database. Among the hundreds of peptides queried, only a few peptides were serovar-specific when searched against all available sequence data (3, 0, 1, 3, 0 peptides for Salmonella Typhimurium, Saintpaul, Heidelberg, Paratyphi B and Muenchen, respectively). Although peptides specific to a single serovar were identified, many of them were only present in a few isolates. The result is not surprising considering the more than 1,500 S. enterica subsp. enterica serovars and more than 8,000 Salmonella enterica genomes available in public databases. The staggering degree of homology across S. enterica subsp. enterica serovars makes selection of a single serovar specific peptide challenging and suggests sets of peptides are required to uniquely identify individual Salmonella serovars. Our data is in line with the pan-genome study of Salmonella enterica by Laing et. al which concluded that no serovar had a pan-genome DNA region that was specific to any one Salmonella serovar.36 Data were reevaluated to consider identified peptides that are genomically predicted in two and fewer Salmonella strains in the custom database (DT, DS, DH, DP, and DM). These sequences were aligned against NCBInr database using BLAST. As expected, most of the sequences were not unique and were found in an average of 29 serovars each. 
In total, 54, 7, 90, 38, and 27 peptides were present in ten or fewer serovars across the NCBInr database for S. Typhimurium, S. Saintpaul, S. Heidelberg, S. Paratyphi B, and S. Muenchen, respectively. Combinations of these peptides can be used as Salmonella serovar markers, as shown in Table S1. Detailed lists of all peptides are provided in Supplementary File 1. The numbers of peptides selected as markers for Salmonella identification at the genus, species, subspecies, and serovar levels are summarized in Table 2.

Targeted assays

When all available public sequences are considered, it is evident that identification of Salmonella to the serovar level by targeted LC-MS/MS requires sets of peptides specific to each level of taxonomic specificity. In this work, a multiplexed peptide-based assay was created for identification of Salmonella at the genus, species, subspecies, and serovar levels. One possible set of peptide markers is shown in Figure 3. These peptides were selected based on taxonomic specificity, ease of detection (judged by spectral counts), absence of variable modifications, and chromatographic resolution. The properties of these representative peptides are summarized in Table S1. The Salmonella genus marker (SG), SLTDTLEEVLSSSGEK, was derived from a membrane protein and was present in both species, S. enterica and S. bongori, and in more than 5,500 Salmonella entries. A peptide from a succinyltransferase, APAVEPAAQPALGAR, was selected as the species marker (SE) because of its specificity to S. enterica and its presence in all six S. enterica subspecies and in more than 5,500 S. enterica entries. Two S. enterica subsp. enterica subspecies markers (SEE) were monitored, each of which was present in more than 2,000 S. enterica subsp. enterica entries: SEE1, GLDLSPTNELLIDESLIGWK, was a peptide from carbamoyl phosphate synthase, and SEE2, AGNVIGGGDWAK, was from a dehydratase.
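The level-by-level use of this marker panel can be sketched as a small classifier. This is a minimal illustration, not the authors' software: it assumes each marker has already been reduced to a yes/no detection call (how a peak is judged "detected" is left open), uses only the genus-to-subspecies markers named above, and requires every marker at a level before descending to the next. The names `TAXON_LEVELS` and `deepest_taxon` are hypothetical.

```python
from typing import Iterable, Optional

# Genus-to-subspecies markers from the Figure 3 panel, ordered from the
# broadest to the most specific taxon. Requiring *all* markers at a level
# (e.g. both SEE1 and SEE2) is an assumption of this sketch.
TAXON_LEVELS = [
    ("Salmonella", {"SLTDTLEEVLSSSGEK"}),                           # SG
    ("Salmonella enterica", {"APAVEPAAQPALGAR"}),                    # SE
    ("Salmonella enterica subsp. enterica",
     {"GLDLSPTNELLIDESLIGWK", "AGNVIGGGDWAK"}),                      # SEE1, SEE2
]


def deepest_taxon(detected: Iterable[str]) -> Optional[str]:
    """Return the most specific taxon whose markers were all detected.

    The walk stops at the first level with a missing marker, so a
    lower-level call is never made without the higher-level markers.
    """
    found = set(detected)
    assignment = None
    for taxon, markers in TAXON_LEVELS:
        if not markers <= found:  # at least one marker at this level is missing
            break
        assignment = taxon
    return assignment


if __name__ == "__main__":
    # All four genus-to-subspecies markers detected.
    panel = ["SLTDTLEEVLSSSGEK", "APAVEPAAQPALGAR",
             "GLDLSPTNELLIDESLIGWK", "AGNVIGGGDWAK"]
    print(deepest_taxon(panel))  # Salmonella enterica subsp. enterica
```

Stopping at the first incomplete level keeps a subspecies call from being made on the SEE peptides alone, which matters if lower-level markers also occur in other taxa.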


A combination of two peptides was required to define a serovar. For example, S. Typhimurium marker 1 (T1), EVTPFGAALR, was present in serovars 4,[5],12:i:-,37 Aqua, Give, Kentucky, Newport, and Typhimurium, and marker T2, GNTIVGSGSGGTTK, was present in Mbandaka and Typhimurium. Consequently, Typhimurium was the only possible serovar when both markers were present. Similarly, a combination of two peptides was required to identify Salmonella serovars Saintpaul, Heidelberg, Paratyphi B, and Muenchen unambiguously (see Figure 3). The presence of these marker peptides in the NCBInr genomes of these five serovars is summarized in Table 3. As discussed above, the SEE and serovar markers can be found in genera other than Salmonella and in species other than enterica; however, these peptides are used in conjunction with the Salmonella genus- and species-specific peptides. Although the number of genomes that a given peptide matched was considered, the absolute number of hits of a given peptide to a particular serovar is not necessarily reflective of specificity. For example, more than 350 genomes are available for a single strain, S. Typhimurium str. DT104,28 so BLAST hits to this strain are over-represented. As is true for any identification method based on publicly available protein sequences, accuracy is limited by the correctness of the available genomic information, and the markers need to be re-assessed as newly sequenced genomes become available.

A scheduled targeted mass spectrometric assay was developed to evaluate the performance of the selected Salmonella markers based on reproducibility, chromatographic resolution, and MS intensity. The fourteen Salmonella peptides shown in Figure 3 were monitored in scheduled 5-min retention time windows. The top five panels in Figure 4 display overlaid extracted ion chromatograms for the set of peptide targets obtained from SARA2 (S. Typhimurium), SARA24 (S. Saintpaul), SARA39 (S. Heidelberg), SARA50 (S. Paratyphi B), and SARA63 (S. Muenchen) cultures. Although the ion intensity profiles vary across the samples, all markers from genus to subspecies (SG, SE, SEE1, and SEE2) were observed in all of the isolates analyzed. Further, for each serovar, the specific combination of peptide markers (T, S, H, P, and M series) was detected only in the target serovar. The selected markers were specific and reproducibly detected, with adequate ion abundance and little overlap in their elution profiles. Even within the confines of a scheduled targeted experiment, isobaric peptides were present. For example, the peak marked with an asterisk in Figure 4 (LGPTFAADIR, at approximately 53 min) has the same elemental composition as T1 (EVTPFGAALR) and elutes close to it within the T1 retention time window. However, this potential ambiguity was resolved with tandem mass spectrometric data. The specificity of the markers was verified by targeted analysis of additional strains of each serovar (see Figure S2).

The utility of a multiplexed assay for Salmonella identification in a mixture of closely related serovars was demonstrated with a 1:1:1:1:1 mixture of the same five SARA serovars. The bottom panel of Figure 4 contains extracted ion chromatograms for the same fourteen marker ions obtained from the mixture. Marker ions SG, SE, SEE1, and SEE2 were detected, allowing classification to S. enterica subsp. enterica. The marker ion set for each serovar (T, S, H, P, and M series) was observed, indicating the presence of all five serovars.
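The two-marker reasoning for S. Typhimurium amounts to a set intersection: each marker maps to the set of serovars in whose NCBInr genomes it was found, and only serovars common to every detected marker remain candidates. The sketch below hard-codes only the T1 and T2 hit lists quoted in the text; the dictionary `MARKER_SEROVARS` and the function `candidate_serovars` are hypothetical names, and entries for the other markers would have to be filled in from Supplementary File 1.

```python
# Serovars whose NCBInr genomes contained each marker, as quoted in the text
# for T1 and T2; the other markers are intentionally left out of this sketch.
MARKER_SEROVARS = {
    "T1": {"4,[5],12:i:-", "Aqua", "Give", "Kentucky", "Newport", "Typhimurium"},
    "T2": {"Mbandaka", "Typhimurium"},
}


def candidate_serovars(detected_markers):
    """Return the serovars consistent with every detected marker."""
    hit_sets = [MARKER_SEROVARS[m] for m in detected_markers]
    if not hit_sets:
        return set()
    # set.intersection returns a new set, so the lookup table is not mutated.
    return set.intersection(*hit_sets)


if __name__ == "__main__":
    print(candidate_serovars(["T1"]))        # six candidate serovars
    print(candidate_serovars(["T1", "T2"]))  # {'Typhimurium'}
```

With only T1 detected, six serovars remain possible; adding T2 narrows the candidates to Typhimurium. For a mixture such as the one in Figure 4, the same check would be applied separately to each serovar's marker pair.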

Conclusion

A pipeline that identifies sets of informative peptide markers for Salmonella identification down to the serovar level was developed as a multi-step classification strategy. Bottom-up proteomics was combined with a set of filtering criteria based on BLAST searches against all publicly available annotated genomic sequences, including those from 14,000 bacterial and 8,500 Salmonella isolates. The BLAST searches reduced the number of potential marker candidates but ensured applicability across the more than 1,500 known Salmonella enterica subsp. enterica serovars, for use in identification of unknown isolates. This work illustrates the difficulty of developing a mass spectrometry-based approach for taxonomic assignment of Salmonella to the serovar level given the large amount of available shotgun genome sequencing data. A multiplexed targeted assay confirmed the applicability of a subset of markers in a mixture of Salmonella serovars, in which the panel of closely related serovars was identified simultaneously. Because targeted MS approaches, including SIM and MRM, can provide sensitive and specific detection of a panel of marker ions, they offer an opportunity for bacterial identification in complex matrices such as food in a fraction of the time needed for whole-genome sequencing. Although the results are promising, food samples are likely to contain a mixture of bacteria in a protein-rich matrix. Ongoing studies of assay sensitivity and of potential interference from bacteria present in a variety of food matrices will determine whether a multiplexed, targeted peptide method is directly applicable to food or, more likely, requires enrichment of Salmonella by selective media and/or antibody-coated magnetic nanoparticles.
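The serovar branch of the filtering criteria (peptides predicted in two or fewer of the five custom-database Salmonella genomes and, after BLAST, found in ten or fewer serovars) reduces to a simple predicate. The sketch below is an illustration under stated assumptions: the `PeptideEvidence` record and `is_serovar_marker_candidate` function are hypothetical, and it presumes the BLAST hits have already been parsed into a set of serovar names.

```python
from dataclasses import dataclass, field

# Salmonella genomes in the custom database (Table 1 abbreviations).
CUSTOM_SALMONELLA_DBS = {"DT", "DS", "DH", "DP", "DM"}


@dataclass
class PeptideEvidence:
    sequence: str
    custom_db_hits: set = field(default_factory=set)  # custom-database genomes predicting the peptide
    blast_serovars: set = field(default_factory=set)  # serovars among its NCBInr BLAST hits


def is_serovar_marker_candidate(p: PeptideEvidence,
                                max_custom: int = 2,
                                max_serovars: int = 10) -> bool:
    """Apply the 'two or fewer genomes, ten or fewer serovars' criteria."""
    # Only the five Salmonella genomes count toward the custom-database limit.
    n_custom = len(p.custom_db_hits & CUSTOM_SALMONELLA_DBS)
    return n_custom <= max_custom and len(p.blast_serovars) <= max_serovars


if __name__ == "__main__":
    # Hypothetical evidence records, not data from this study.
    narrow = PeptideEvidence("PEPTIDEA", {"DT"}, {"Typhimurium", "Mbandaka"})
    broad = PeptideEvidence("PEPTIDEB", {"DT", "DS", "DH"}, {"Typhimurium"})
    print(is_serovar_marker_candidate(narrow))  # True
    print(is_serovar_marker_candidate(broad))   # False
```

Keeping both thresholds as parameters makes it easy to see how the candidate list would shrink or grow if the "two or fewer" and "ten or fewer" cut-offs were changed.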

Acknowledgments

The authors thank Hsueh-Ying Chen for informatics consultation, Ruth Timme for bacterial genome consultation, and Rebecca Bell and Christina Ferreira for providing isolates. Shu-Hua Chen was supported, in part, by an appointment to the Research Participation Program, administered by the Oak Ridge Institute for Science and Education (Oak Ridge, TN) through an interagency agreement between the DOE and the FDA.

Figure Legends

Figure 1. Proteomic comparison across five Salmonella serovars. The intersection/complement among the five sets of peptides is shown by different colored bars. The first letter of each serovar (T/t, S/s, H/h, P/p, M/m) denotes the peptides identified in that serovar proteome; uppercase black and lowercase grey letters signify that the peptides were identified or not identified, respectively, in that serovar. For example, TshpM represents the peptides identified only in the S. Typhimurium and S. Muenchen strains.

Figure 2. Decision tree classification for selection of Salmonella taxon-specific markers. See Table 1 for database abbreviations.

Figure 3. A subset of markers for identification of Salmonella from the genus to the serovar level. SG: Salmonella genus marker; SE: S. enterica marker; SEE: S. enterica subsp. enterica markers; T: S. Typhimurium; S: S. Saintpaul; H: S. Heidelberg; P: S. Paratyphi B; M: S. Muenchen. Properties of the selected peptides are provided in Table S1. Additional peptide markers are provided in Supplementary File 1.

Figure 4. Extracted ion chromatograms, plotted with a 5 ppm mass tolerance from MS1 scans, from targeted LC-MS/MS of a set of Salmonella taxon markers in tryptic digests of SARA2 (T; Typhimurium), SARA24 (S; Saintpaul), SARA39 (H; Heidelberg), SARA50 (P; Paratyphi B), and SARA63 (M; Muenchen). Fourteen peptides were targeted in pure cultures as well as in a mixture of the five serovars. Signal intensities were scaled for visualization (scaling factors are labeled above the peaks). Letters refer to the marker peptides: SG (Salmonella genus), SLTDTLEEVLSSSGEK; SE (S. enterica), APAVEPAAQPALGAR; SEE1 (S. enterica subsp. enterica 1), GLDLSPTNELLIDESLIGWK; SEE2 (S. enterica subsp. enterica 2), AGNVIGGGDWAK; T1 (S. Typhimurium 1), EVTPFGAALR; T2 (S. Typhimurium 2), GNTIVGSGSGGTTK; S1 (S. Saintpaul 1), LLTEHNLEASAIK; S2 (S. Saintpaul 2), TAIIWEGDDTSQSK; H1 (S. Heidelberg 1), AAEESAQISQR; H2 (S. Heidelberg 2), INDNQVIDGGESR; P1 (S. Paratyphi B 1), IGAADYTILGTVK; P2 (S. Paratyphi B 2), TVNYTDATGATK; M1 (S. Muenchen 1), VNLIESLESLSVTK; M2 (S. Muenchen 2), YDANNVYLAAQYSQTYNATR. Sequences were confirmed by tandem mass spectrometry. A peak denoted by an asterisk in the targeted elution time window of T1 was confirmed as a co-eluting peptide with the same elemental composition but a different amino acid sequence.

Table 1. Bacterial strains examined by proteomic analyses and used to construct a custom database.

Table 2. The number of peptides specific to different Salmonella taxonomic levels, determined by proteomic analyses, filtered against the strains in the custom database, and confirmed by BLAST searches against all genomic databases. See Table 1 for database abbreviations.

Table 3. Evaluated Salmonella serovar markers and their presence in genomes of the study serovars in NCBInr.
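Why the asterisked peak in Figure 4 falls inside the 5 ppm extraction window for T1 can be checked with a short calculation. T1 (EVTPFGAALR) and the co-eluting LGPTFAADIR reported in the Results differ only by E + V versus D + I, two residue pairs with the same elemental composition, so the two peptides have identical monoisotopic masses and no MS1 mass tolerance can separate them; retention time and MS/MS are what distinguish them. The sketch assumes unmodified residues and standard monoisotopic element masses; the helper names and the use of the 2+ charge state are illustrative assumptions, not details taken from the paper.

```python
# Residue compositions (C, H, N, O, S) for unmodified amino acids, i.e.
# the free amino acid minus one H2O.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
             "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276467


def composition(seq):
    """Elemental composition of an unmodified linear peptide."""
    totals = [0, 0, 0, 0, 0]
    for aa in seq:
        for i, n in enumerate(RESIDUES[aa]):
            totals[i] += n
    totals[1] += 2  # terminal H2O: two hydrogens ...
    totals[3] += 1  # ... and one oxygen
    return dict(zip(ELEMENTS, totals))


def monoisotopic_mass(seq):
    comp = composition(seq)
    return sum(comp[e] * MONO_MASS[e] for e in ELEMENTS)


def mz(seq, charge):
    """m/z of the [M + zH]z+ ion."""
    return (monoisotopic_mass(seq) + charge * PROTON) / charge


def ppm_window(center, tol_ppm=5.0):
    """Lower and upper bounds of a +/- tol_ppm extraction window."""
    delta = center * tol_ppm * 1e-6
    return center - delta, center + delta


if __name__ == "__main__":
    t1, other = "EVTPFGAALR", "LGPTFAADIR"
    print(composition(t1) == composition(other))  # True: C48H77N13O14 for both
    lo, hi = ppm_window(mz(t1, 2))
    print(lo <= mz(other, 2) <= hi)               # True: inside the T1 window
```

Because the compositions match exactly, the result holds for any charge state and any ppm tolerance, which is why the tandem mass spectra were needed to rule the peak out.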

Supplemental data

Detailed lists of all candidate Salmonella markers; subspecies alignments for peptides common to all five Salmonella serovar proteomes; verification of selected markers in additional strains.


References

(1) https://www.cdc.gov/salmonella/index.html
(2) Bell, R. L.; Jarvis, K. G.; Ottesen, A. R.; McFarland, M. A.; Brown, E. W. Microb. Biotechnol. 2016, 9, 279-292.
(3) Lee, K.-M.; Runyon, M.; Herrman, T. J.; Phillips, R.; Hsieh, J. Food Control 2015, 47, 264-276.
(4) Wattiau, P.; Boland, C.; Bertrand, S. Appl. Environ. Microbiol. 2011, 77, 7877-7885.
(5) Anhalt, J. P.; Fenselau, C. Anal. Chem. 1975, 47, 219-225.
(6) Sauer, S.; Kliem, M. Nat. Rev. Microbiol. 2010, 8, 74-82.
(7) von Wintzingerode, F.; Bocker, S.; Schlotelburg, C.; Chiu, N. H.; Storm, N.; Jurinke, C.; Cantor, C. R.; Gobel, U. B.; van den Boom, D. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 7039-7044.
(8) Calvano, C. D.; Zambonin, C. G.; Palmisano, F. Rapid Commun. Mass Spectrom. 2011, 25, 1757-1764.
(9) Gilbart, J.; Fox, A.; Morgan, S. L. Eur. J. Clin. Microbiol. 1987, 6, 715-723.
(10) Singhal, N.; Kumar, M.; Kanaujia, P. K.; Virdi, J. S. Front. Microbiol. 2015, 6, 791.
(11) Sandrin, T. R.; Goldstein, J. E.; Schumaker, S. Mass Spectrom. Rev. 2013, 32, 188-217.
(12) Dieckmann, R.; Helmuth, R.; Erhard, M.; Malorny, B. Appl. Environ. Microbiol. 2008, 74, 7767-7778.
(13) Dieckmann, R.; Malorny, B. Appl. Environ. Microbiol. 2011, 77, 4136-4146.
(14) Kang, L.; Li, N.; Li, P.; Zhou, Y.; Gao, S.; Gao, H.; Xin, W.; Wang, J. Eur. J. Mass Spectrom. 2017, 23, 70-82.
(15) Land, M.; Hauser, L.; Jun, S. R.; Nookaew, I.; Leuze, M. R.; Ahn, T. H.; Karpinets, T.; Lund, O.; Kora, G.; Wassenaar, T.; Poudel, S.; Ussery, D. W. Funct. Integr. Genomics 2015, 15, 141-161.
(16) Gekenidis, M. T.; Studer, P.; Wuthrich, S.; Brunisholz, R.; Drissner, D. Appl. Environ. Microbiol. 2014, 80, 4234-4241.
(17) Hu, A.; Chen, C. T.; Tsai, P. J.; Ho, Y. P. Anal. Chem. 2006, 78, 5124-5133.
(18) Cheng, K.; Sloan, A.; Meakin, J.; McCorrister, S.; Jerome, M.; Westmacott, G.; Drebot, M.; Nadon, C.; Knox, J. D.; Wang, G. J. Clin. Microbiol. 2014, 52, 2189-2192.
(19) Alves, G.; Wang, G.; Ogurtsov, A. Y.; Drake, S. K.; Gucek, M.; Suffredini, A. F.; Sacks, D. B.; Yu, Y.-K. J. Am. Soc. Mass Spectrom. 2016, 27, 194-210.
(20) Alves, G.; Wang, G.; Ogurtsov, A. Y.; Drake, S. K.; Gucek, M.; Sacks, D. B.; Yu, Y.-K. J. Am. Soc. Mass Spectrom. 2018, 29, 1721-1737.
(21) Dworzanski, J. P.; Snyder, A. P.; Chen, R.; Zhang, H.; Wishart, D.; Li, L. Anal. Chem. 2004, 76, 2355-2366.
(22) Dworzanski, J. P.; Deshpande, S. V.; Chen, R.; Jabbour, R. E.; Snyder, A. P.; Wick, C. H.; Li, L. J. Proteome Res. 2006, 5, 76-87.
(23) Tracz, D. M.; McCorrister, S. J.; Chong, P. M.; Lee, D. M.; Corbett, C. R.; Westmacott, G. R. J. Microbiol. Methods 2013, 94, 54-57.
(24) Boulund, F.; Karlsson, R.; Gonzales-Siles, L.; Johnning, A.; Karami, N.; Al-Bayati, O.; Ahren, C.; Moore, E. R. B.; Kristiansson, E. Mol. Cell. Proteomics 2017, 16, 1052-1063.
(25) Karlsson, R.; Davidson, M.; Svensson-Stadler, L.; Karlsson, A.; Olesen, K.; Carlsohn, E.; Moore, E. R. J. Proteome Res. 2012, 11, 2710-2720.
(26) Fleischmann, R. D.; Adams, M. D.; White, O.; Clayton, R. A.; Kirkness, E. F.; Kerlavage, A. R.; Bult, C. J.; Tomb, J. F.; Dougherty, B. A.; Merrick, J. M.; et al. Science 1995, 269, 496-512.
(27) Fraser, C. M.; Gocayne, J. D.; White, O.; Adams, M. D.; Clayton, R. A.; Fleischmann, R. D.; Bult, C. J.; Kerlavage, A. R.; Sutton, G.; Kelley, J. M.; Fritchman, R. D.; Weidman, J. F.; Small, K. V.; Sandusky, M.; Fuhrmann, J.; Nguyen, D.; Utterback, T. R.; Saudek, D. M.; Phillips, C. A.; Merrick, J. M.; et al. Science 1995, 270, 397-403.
(28) https://www.ncbi.nlm.nih.gov/genome/
(29) McFarland, M. A.; Andrzejewski, D.; Musser, S. M.; Callahan, J. H. Anal. Chem. 2014, 86, 6879-6886.
(30) Beltran, P.; Plock, S. A.; Smith, N. H.; Whittam, T. S.; Old, D. C.; Selander, R. K. J. Gen. Microbiol. 1991, 137, 601-606.
(31) https://www.thegpm.org/crap/
(32) https://www.uniprot.org/help/proteome_redundancy
(33) Burall, L. S.; Grim, C. J.; Mammel, M. K.; Datta, A. R. PLoS One 2016, 11, e0150797.
(34) Slotta, D. J.; McFarland, M. A.; Markey, S. P. Proteomics 2010, 10, 3035-3039.
(35) Scallan, E.; Hoekstra, R. M.; Angulo, F. J.; Tauxe, R. V.; Widdowson, M.-A.; Roy, S. L.; Jones, J. L.; Griffin, P. M. Emerg. Infect. Dis. 2011, 17, 7-15.
(36) Laing, C. R.; Whiteside, M. D.; Gannon, V. P. J. Front. Microbiol. 2017, 8, 1345.
(37) https://www.pasteur.fr/sites/default/files/veng_0.pdf


Table 1.

Taxonomy (experimental strains): genus Salmonella; species enterica; subsp. enterica
serovar       strain
Typhimurium   SARA2 (LT-2)
Saintpaul     SARA24
Heidelberg    SARA39
Paratyphi B   SARA50
Muenchen      SARA63

abbreviation  database genome
DT            Salmonella enterica subsp. enterica serovar Typhimurium str. LT2-4
DS            Salmonella enterica subsp. enterica serovar Saintpaul str. SARA26
DH            Salmonella enterica subsp. enterica serovar Heidelberg str. SARA39
DP            Salmonella enterica subsp. enterica serovar Paratyphi B str. SPB7
DM            Salmonella enterica subsp. enterica serovar Muenchen str. ATCC8388
DK            Escherichia coli str. K-12 substr. MG1655
DO            Escherichia coli O157:H7 EDL933
DF            Shigella flexneri 2a str. 2457T
DN            Shigella sonnei Ss046
DC            Common contaminants


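Table 2.

Genus, species, and subspecies levels (Salmonella genus / S. enterica / S. enterica subsp. enterica)
  filtering based on proteomic data, common across the five Salmonella proteomes: 3729 (one pool for all three levels)
  filtering based on selected genomes (custom database), absent in DK, DO, DF, DN: 1475 (one pool for all three levels)
  filtering based on all public genomes (NCBInr), taxon-specific: 15 / 350 / 199

Serovar level (S. Typhimurium / S. Saintpaul / S. Heidelberg / S. Paratyphi B / S. Muenchen)
  filtering based on proteomic data, identified in a single Salmonella strain proteome: 440 / 343 / 447 / 679 / 301
  filtering based on selected genomes (custom database), specific to two or fewer of DT, DS, DH, DP, DM: 164 / 86 / 242 / 112 / 141
  filtering based on all public genomes (NCBInr), specific to ten or fewer S. enterica subsp. enterica serovars: 54 / 7 / 90 / 38 / 27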


Table 3. marker T1

sequence

Salmonella serovar genomes in NCBInr Typhimurium Saintpaul

EVTPFGAALR

x x

T2

GNTIVGSGSGGTTK

S1

LLTEHNLEASAIK

S2

TAIIWEGDDTSQSK

x

x

x

x

Heidelberg Paratyphi B Muenchen

x

x x

H1

AAEESAQISQR

H2

INDNQVIDGGESR

x

P1

IGAADYTILGTVK

x

P2

TVNYTDATGATK

x

M1

VNLIESLESLSVTK

M2

YDANNVYLAAQYSQTYNATR

x

x x x x

x


Figure 1. [Bar chart with one panel per serovar proteome (Typhimurium, Saintpaul, Heidelberg, Paratyphi B, Muenchen); x axis: No. of identified peptides (0 to 6000); bars are labeled with the T/S/H/P/M intersection/complement codes described in the legend.]
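The letter code used in Figure 1 can be reproduced mechanically: for each peptide, one letter per serovar proteome in the fixed order T, S, H, P, M, uppercase if the peptide was identified in that proteome and lowercase otherwise; each bar is then a count of peptides sharing a label. The function names below are hypothetical, and the toy input is made up for illustration rather than taken from the study data.

```python
from collections import Counter

# Serovar proteomes in the letter order used by Figure 1.
SEROVAR_LETTERS = [("Typhimurium", "T"), ("Saintpaul", "S"), ("Heidelberg", "H"),
                   ("Paratyphi B", "P"), ("Muenchen", "M")]


def intersection_label(found_in):
    """Uppercase letter = identified in that serovar, lowercase = not identified."""
    return "".join(letter.upper() if name in found_in else letter.lower()
                   for name, letter in SEROVAR_LETTERS)


def label_counts(proteomes):
    """Count peptides per intersection/complement label.

    `proteomes` maps serovar name -> set of identified peptide sequences.
    """
    all_peptides = set().union(*proteomes.values())
    return Counter(
        intersection_label({s for s, peps in proteomes.items() if pep in peps})
        for pep in all_peptides
    )


if __name__ == "__main__":
    print(intersection_label({"Typhimurium", "Muenchen"}))  # TshpM
    toy = {"Typhimurium": {"PEPA", "PEPB"}, "Saintpaul": {"PEPB"},
           "Heidelberg": set(), "Paratyphi B": set(), "Muenchen": {"PEPA"}}
    print(label_counts(toy))
```

In the toy input, PEPA is found only in Typhimurium and Muenchen (label TshpM, matching the legend's example) and PEPB only in Typhimurium and Saintpaul (label TShpm).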


Figure 2. [Decision tree. Peptides identified across the five Salmonella serovar proteomes and not present in any of DK, DO, DF, DN are checked by BLAST for Genus = Salmonella, Species = enterica, and Subspecies = enterica, yielding Salmonella genus, Salmonella enterica, and Salmonella enterica subsp. enterica markers. Peptides not identified across all five proteomes but present in two or fewer of DT, DS, DH, DP, DM are checked by BLAST for number of hit serovars ≤ 10, yielding combinations of markers for Salmonella serovars.]


Figure 3. [Hierarchy of marker peptides, from all tryptic peptides down to the serovar level:]
  Salmonella: (SG) SLTDTLEEVLSSSGEK
  Salmonella enterica: (SE) APAVEPAAQPALGAR
  Salmonella enterica subsp. enterica: (SEE1) GLDLSPTNELLIDESLIGWK; (SEE2) AGNVIGGGDWAK
  Typhimurium: (T1) EVTPFGAALR; (T2) GNTIVGSGSGGTTK
  Saintpaul: (S1) LLTEHNLEASAIK; (S2) TAIIWEGDDTSQSK
  Heidelberg: (H1) AAEESAQISQR; (H2) INDNQVIDGGESR
  Paratyphi B: (P1) IGAADYTILGTVK; (P2) TVNYTDATGATK
  Muenchen: (M1) VNLIESLESLSVTK; (M2) YDANNVYLAAQYSQTYNATR


Figure 4. [Extracted ion chromatograms, Intensity (x10^7) versus Retention time (min, 20 to 100), in six panels: SARA2 (T), SARA24 (S), SARA39 (H), SARA50 (P), SARA63 (M), and the mixture. Peaks are labeled SG, SE, SEE1, SEE2, T1, T2, S1, S2, H1, H2, P1, P2, M1, and M2 with scaling factors from X0.01 to X10; asterisks mark the co-eluting peak in the T1 window.]
