
Article

Bacterial Whole Cell Typing by Mass Spectra Pattern Matching with Bootstrapping Assessment Yi Yang, Yu Lin, Zhuoxin Chen, Tianqi Gong, Pengyuan Yang, Hubert H. Girault, Baohong Liu, and Liang Qiao Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b03820 • Publication Date (Web): 31 Oct 2017 Downloaded from http://pubs.acs.org on November 6, 2017


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Bacterial Whole Cell Typing by Mass Spectra Pattern Matching with Bootstrapping Assessment

Yi Yang1, Yu Lin2,*, Zhuoxin Chen3, Tianqi Gong3, Pengyuan Yang1,3, Hubert Girault4, Baohong Liu1,3, and Liang Qiao1,3,*

1. Department of Chemistry, Shanghai Stomatological Hospital, Fudan University, Shanghai 200000, China
2. College of Engineering and Computer Science, The Australian National University, Canberra, Australia
3. Institutes of Biomedical Sciences, Fudan University, Shanghai 200000, China
4. Laboratoire d’Electrochimie Physique et Analytique, Ecole Polytechnique Fédérale de Lausanne, Industrie 17, CH1951 Sion, Switzerland

ABSTRACT: Bacterial typing is of great importance in clinical diagnosis, environmental monitoring, food safety analysis and biological research. Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is now widely used to analyze bacterial samples. Identification of bacteria at species level can be realized by matching the mass spectra of samples against a library of mass spectra of known bacteria. Nevertheless, identification accuracy should be further improved in order to type bacteria reliably. Herein, we propose a new framework for identification and assessment in MALDI-MS based bacterial analysis. Our approach combines new measures for spectra similarity and a novel bootstrapping assessment. We tested our approach on a general dataset containing the mass spectra of 1,741 strains of bacteria and another challenging dataset containing 250 strains, including 40 strains in the Bacillus cereus group that were previously considered impossible to resolve by MALDI-MS. With bootstrapping assessment, we achieved much more reliable predictions at both the genus and species level, and were able to resolve the Bacillus cereus group. To the best of the authors’ knowledge, our method is the first to provide a statistical assessment for MALDI-MS based bacterial typing, which could lead to more reliable bacterial typing.

Rapid and reliable microbial identification is an urgent civilian and military safety requirement.1-3 Exposure to pathogenic bacteria can happen in public areas, such as hospitals, airports and train stations, and in occupational workplaces, for example dental practices, wastewater treatment plants and slaughterhouses.4-11 The treatment of patients with bacterial infection calls for accurate and fast bacterial typing. However, traditional bacterial identification relies on plate culture followed by phenotypic analyses.12,13 It usually takes 2-3 days to generate a clinical diagnosis report, which limits its clinical value. Therefore, the usage of antibiotics still largely relies on experience, contributing to the global antimicrobial resistance problem.14 To date, super bacteria resistant to most existing antibiotics are increasingly reported.15

New bacterial identification approaches based on advanced genotypic and mass spectrometric techniques have been developed.16-18 Genotypic methods include high throughput sequencing, 16S rRNA gene sequencing and PCR.19-21 However, these methods suffer from complex sample pretreatment and expensive kits, limiting their widespread clinical use. Matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry (MS) profiling of bacteria has been exploited since the 1990s.22-24 Identification is based on spectra pattern matching between the mass spectra of samples and a library of standard mass spectra of known bacteria. Using this concept, commercial MALDI-TOF MS systems, such as Vitek MS of bioMérieux and MALDI Biotyper of Bruker Daltonics, have been adopted to identify a variety of bacteria.25-27 Because of its simple sample preparation, high automation and low cost, the technique is now widely deployed in clinical laboratories. Although the technique is promising, the performance of MALDI-MS based bacterial analyses, e.g. robustness, resolution and discriminatory power, needs to be improved in order to type bacteria reliably.28 Improvements could be made in bacterial sample preparation, MALDI sample spot preparation, optimization of mass spectrometry methods and data analysis. Herein, we focus on data analysis to improve the performance of MALDI-MS based bacterial typing.

To date, Biotyper from Bruker Daltonics and the spectral archive and microbial identification system (SARAMIS) from bioMérieux have emerged as two of the most widely used bacterial MALDI-TOF MS typing software solutions.29-34 In addition to Biotyper and SARAMIS, there are also some open access tools for microbial identification. Böhme et al. constructed a spectral library named SpectraBank, including peak mass lists for more than 200 bacterial strains/70 bacterial species.35 The Robert Koch-Institute developed a Matlab-based software, MicrobeMS, which allows spectral preprocessing, such as smoothing, baseline correction, intensity normalization and peak detection, and provides microbial identification using similarity measures based on interspectral distances.36


In these software solutions, spectra similarity based measures are used to quantify the level of pattern matching between the mass spectra of samples and those of a library. A threshold on the similarity score is usually applied for effective identification. For instance, Biotyper suggests a cutoff value of 2.0 for its log(score),37 wherein the log(score) is an ad hoc score between 0 and 3 reflecting the similarity between sample and reference spectra.38 Nevertheless, the assessment of reported identifications has not been properly addressed. Some researchers suggested that the cutoff value of Biotyper could be lower or higher depending on spectra quality and the analyzed bacterial samples.39 Moreover, some closely related species or subspecies cannot be resolved using MALDI-MS and current data analysis methods, e.g. the Bacillus cereus group including Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis, which share more than 70% of their genome according to DNA-DNA hybridization results.40 According to guideline M58 from the Clinical and Laboratory Standards Institute (CLSI),41 MALDI-TOF MS cannot resolve the Bacillus cereus group at species level to date.

In this study, we propose a new framework for more accurate bacterial typing based on MALDI-TOF MS spectra. The framework includes two steps. First, spectra pattern matching is applied to find the candidate identifications, where we introduce two new measures for spectra similarity, i.e. relative Euclidean similarity (Eu) and intensity-weighted relative Euclidean similarity (iEu), in addition to the well-known cosine correlation (Cos). Second, the identification results are further assessed by a novel bootstrapping model to provide more reliable characterization. The performance of the framework was evaluated with a general dataset containing the MALDI-TOF MS spectra of 1,741 strains of bacteria and another challenging dataset containing 250 strains, including 40 strains in the Bacillus cereus group. Compared to the commonly used cosine correlation, Eu and iEu showed better performance. In addition, much more reliable bacterial identification could be provided with the bootstrapping assessment. With the dataset containing 1,741 strains, the sensitivity of bacterial identification at species level increased from 41% to 88% at the same error rate (5%) with the help of bootstrapping assessment. Our approach also makes it possible to resolve the species in the Bacillus cereus group using their MALDI-TOF MS spectra.

MATERIALS AND METHODS

Mass Spectra Datasets and Coding

As shown in Dataset S-1, the mass spectra of 1,741 strains were obtained from the European Consortium of Microbial Resources Centres (EMbaRC) and the database of highly pathogenic bacteria from the Robert Koch-Institute. The mass spectra of 250 bacterial strains, including 40 Bacillus cereus group strains, were obtained from the Public Health Agency of Sweden. The two datasets were used separately because of differences in data quality control between the institutes. Raw mass spectra were processed using flexAnalysis software (version 3.4; Bruker Daltonics, Germany). Peaks with a signal-to-noise ratio (S/N) of at least 3 were extracted from each spectrum after baseline correction and intensity normalization, and were exported as text files (.txt).

All subsequent data analyses were conducted with R from the R Foundation for Statistical Computing (http://www.r-project.org). Source code for this work is available at https://github.com/lmsac/BacteriaMS.

Similarity Measures for Spectra Pattern Matching

In this work, we propose two new spectra similarity based measures, relative Euclidean distance similarity (Eu) and intensity-weighted relative Euclidean distance similarity (iEu). A pair of common peaks between spectrum i and spectrum j is identified when the m/z of a peak from spectrum j lies within the tolerance width of a peak from spectrum i, and the kth pair of common peaks is denoted by (Pjk, Pik). The tolerance is 2,000 ppm in this work. The Euclidean distance between the kth pair of common peaks is:

$$eu_k = \sqrt{(x_{ik} - x_{jk})^2 + (y_{ik} - y_{jk})^2}, \quad 1 \le k \le l_{ij} \tag{M1}$$

where lij is the number of common peaks between spectrum i and spectrum j, x is the m/z of a peak, and y is the intensity of a peak normalized to 100 with respect to the strongest peak in the corresponding spectrum. Assuming Pik is more intense than Pjk, the relative Euclidean distance (Euk, M2) between the pair of peaks is defined as the ratio of euk to the maximal possible Euclidean distance (eumk, M3) to Pik from a pseudopeak in spectrum j, where d(m/z) = tolerance width and d(I) = intensity of Pik (see Figure 1).

$$Eu_k = \frac{eu_k}{eum_k} \tag{M2}$$

$$eum_k = \sqrt{\left(\max(x_{ik}, x_{jk}) \cdot \frac{tolerance}{2}\right)^2 + \max(y_{ik}, y_{jk})^2} \tag{M3}$$

For peaks in spectrum i or j without a common peak, a relative Euclidean distance of 1 is assigned by assuming a pseudopeak with intensity = 0 and d(m/z) = tolerance width. The relative Euclidean similarity (Eu) score between spectra i and j is then defined as:

$$Eu = 1 - \frac{\sum_{k=1}^{l_{ij}} Eu_k + n_i + n_j - 2 l_{ij}}{n_i + n_j - l_{ij}} \tag{M4}$$

which is one minus the average relative Euclidean distance over all aligned peak pairs, with each unmatched peak contributing a distance of 1. ni is the number of peaks in spectrum i; nj is the number of peaks in spectrum j.
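As a hedged illustration of the Eu score (M1 to M4), the following Python sketch scores two peak lists. It is not the authors' R implementation, which is available in the BacteriaMS repository. The function names `match_peaks` and `eu_similarity`, the greedy nearest-peak pairing, and the reading of the 2,000 ppm tolerance as a full window (so that the largest m/z offset of a common pair is max(x) × tolerance / 2) are assumptions made for this example.

```python
import math


def match_peaks(spec_i, spec_j, tol_ppm=2000.0):
    """Greedily pair peaks of two spectra given as lists of (mz, intensity).

    Assumption: a pair is accepted when its m/z offset is at most
    max(mz) * tol_ppm * 1e-6 / 2, i.e. the tolerance is a full window.
    Returns (index_i, index_j) pairs; each peak is used at most once.
    """
    candidates = []
    for a, (xi, _) in enumerate(spec_i):
        for b, (xj, _) in enumerate(spec_j):
            d = abs(xi - xj)
            if d <= max(xi, xj) * tol_ppm * 1e-6 / 2:
                candidates.append((d, a, b))
    candidates.sort()  # closest pairs are assigned first
    used_i, used_j, pairs = set(), set(), []
    for _, a, b in candidates:
        if a not in used_i and b not in used_j:
            used_i.add(a)
            used_j.add(b)
            pairs.append((a, b))
    return pairs


def eu_similarity(spec_i, spec_j, tol_ppm=2000.0):
    """Relative Euclidean similarity between two non-empty peak lists."""
    n_i, n_j = len(spec_i), len(spec_j)
    if n_i + n_j == 0:
        raise ValueError("at least one spectrum must contain peaks")
    pairs = match_peaks(spec_i, spec_j, tol_ppm)
    total = 0.0
    for a, b in pairs:
        xi, yi = spec_i[a]
        xj, yj = spec_j[b]
        eu = math.hypot(xi - xj, yi - yj)                    # M1
        eum = math.hypot(max(xi, xj) * tol_ppm * 1e-6 / 2,
                         max(yi, yj))                        # M3
        total += eu / eum                                    # M2
    l_ij = len(pairs)
    unmatched = n_i + n_j - 2 * l_ij  # each unmatched peak has distance 1
    return 1.0 - (total + unmatched) / (n_i + n_j - l_ij)    # M4


spec_a = [(3000.0, 100.0), (5000.0, 50.0)]
spec_c = [(3000.0, 100.0)]
print(eu_similarity(spec_a, spec_a), eu_similarity(spec_a, spec_c))
```

With these toy peak lists, identical spectra score 1 and a spectrum that shares one of two peaks scores 0.5, because the single unmatched peak contributes a distance of 1 to an average over two entries.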


Figure 1. Concept of the relative Euclidean distance between two peaks. xik is the m/z of peak k from spectrum i.

For the iEu score, the relative Euclidean distance is weighted by the intensity (y) of the peaks, and is expressed as:

$$iEu = 1 - \frac{\sum_{k=1}^{l_{ij}} Eu_k (y_{ik} + y_{jk}) + \sum_{k=1}^{n_i} y_{ik} + \sum_{k=1}^{n_j} y_{jk} - \sum_{k=1}^{l_{ij}} (y_{ik} + y_{jk})}{\sum_{k=1}^{n_i} y_{ik} + \sum_{k=1}^{n_j} y_{jk}} \tag{M5}$$

The cosine correlation (Cos) is a frequently used method42 to calculate similarity between spectra. In this work, we modify the commonly used cosine correlation by introducing fake common peaks with intensity 0 for any peak in spectrum i or j which does not have a real common peak, and it is expressed as:

$$Cos = \frac{\sum_{k=1}^{l_{ij}} y_{ik} y_{jk}}{\sqrt{\sum_{k=1}^{n_i} y_{ik}^2 \sum_{k=1}^{n_j} y_{jk}^2}} \tag{M6}$$

The Bootstrapping Model

The bootstrapping model was introduced by Efron,43 and has been widely used in many research areas, e.g. phylogenomics by Joseph Felsenstein.44 Bootstrapping is used to estimate the confidence limits of complicated estimators44 (e.g., phylogenetic inference or bacterial identification in this paper) that may be analytically intractable. Given a sample spectrum with N peaks, a bootstrapped spectrum is a fictional spectrum constructed by randomly sampling N peaks with replacement from the sample spectrum (see Figure 2). In our experiments, 1,000 bootstrapped spectra were generated for each input spectrum. Each bootstrapped spectrum (Sb) was compared with the reference spectra in the database to find the best match, with its genus (GNb) and species (SPb). When calculating the similarity score between Sb and each reference spectrum (Sr), for a group of duplicated peaks (Pb) with occurrence frequency f > 1 in Sb, if there was a corresponding common peak (Pr) in Sr, all f copies of Pb were paired with Pr to obtain f common peak pairs (Pb, Pr); otherwise, each Pb was matched with a fake peak with intensity 0. The confidence score (Conf) of the best match (GN, SP) of the sample spectrum at genus/species level was defined as the ratio of the bootstrapped spectra with the best match in the same genus/species as that of the sample spectrum:

$$\mathrm{Conf}_{GN} = \frac{\mathrm{Count}(GN_b = GN)}{n} \tag{M7}$$

$$\mathrm{Conf}_{SP} = \frac{\mathrm{Count}(SP_b = SP)}{n} \tag{M8}$$

where n is the number of bootstrapped spectra.

Figure 2. Workflow of the bootstrapping model. 2, 3 and 4 on the bootstrapped spectra indicate the occurrence frequency of each peak.

RESULTS AND DISCUSSION

Performance of New Similarity Measures

In this work, three spectra similarity based measures are employed for spectra pattern matching: cosine correlation (Cos), relative Euclidean similarity (Eu) and intensity-weighted relative Euclidean similarity (iEu), as explained above in the Materials and Methods section. The performance of Eu, iEu and Cos in bacterial typing by mass spectra pattern matching was evaluated with a dataset containing 1,741 mass spectra of 1,741 bacterial strains (Dataset S-1). In the dataset, each species contains at least two strains. Each spectrum in the dataset was selected in turn as the sample spectrum, and the remaining spectra were used to construct the reference database. The three similarity measures were applied to calculate the similarity between the sample spectrum and each reference spectrum in the database. The species or genus corresponding to the reference spectrum with the highest spectra similarity score was regarded as the identification result of the sample spectrum. It is worth noting that the reference database never contained the correct strain for an input spectrum, which might lead to lower identification accuracy in this cross-validation test than in the real situation of MALDI-MS based bacterial typing.
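The top-match identification and the bootstrapping confidence score (M7 and M8) can be sketched together as follows. This is an illustrative Python sketch rather than the authors' R code. The helper names (`aligned_vectors`, `cos_similarity`, `best_match`, `bootstrap_confidence`), the nearest-peak matching within a full-window tolerance, and the choice to count a reference peak once per duplicated sample peak in the normalization are assumptions. Only the resampling of N peaks with replacement and the definition of Conf as a fraction of bootstrapped spectra follow the text.

```python
import math
import random


def aligned_vectors(sample, reference, tol_ppm=2000.0):
    """Build paired intensity vectors for two lists of (mz, intensity).

    Every sample peak (duplicates included) is paired with the nearest
    reference peak inside the tolerance, or with a zero-intensity fake peak.
    Reference peaks that were never hit are paired with a fake peak as well.
    """
    u, v, hit = [], [], set()
    for mz, inten in sample:
        best = None
        for r, (rmz, _) in enumerate(reference):
            d = abs(mz - rmz)
            if d <= max(mz, rmz) * tol_ppm * 1e-6 / 2:
                if best is None or d < best[0]:
                    best = (d, r)
        u.append(inten)
        if best is None:
            v.append(0.0)
        else:
            v.append(reference[best[1]][1])
            hit.add(best[1])
    for r, (_, rint) in enumerate(reference):
        if r not in hit:
            u.append(0.0)
            v.append(rint)
    return u, v


def cos_similarity(sample, reference, tol_ppm=2000.0):
    """Cosine of the aligned intensity vectors (in the spirit of M6)."""
    u, v = aligned_vectors(sample, reference, tol_ppm)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def best_match(sample, library):
    """library is a list of (label, spectrum); returns the top-scoring label."""
    return max(library, key=lambda entry: cos_similarity(sample, entry[1]))[0]


def bootstrap_confidence(sample, library, n_boot=1000, seed=0):
    """Return the top match of the sample and its confidence score."""
    rng = random.Random(seed)
    top = best_match(sample, library)
    agree = 0
    for _ in range(n_boot):
        boot = rng.choices(sample, k=len(sample))  # N peaks, with replacement
        if best_match(boot, library) == top:
            agree += 1
    return top, agree / n_boot
```

Here a label stands for either the genus or the species of a reference spectrum, so running the function once with genus labels and once with species labels gives the two confidence scores. Any other similarity function, such as Eu or iEu, could be substituted for `cos_similarity` without changing the resampling logic.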

The samples and corresponding top matches are shown in Dataset S-2. Considering similarity score s ≥ s0 as a threshold for reliable identification, the sensitivity (S), specificity (SP) and error rate (ER) at the genus and species levels were calculated as:

$$S_{genus} = \frac{\mathrm{Count}(\text{correct genus and } s \ge s_0)}{\mathrm{Count}(\text{correct genus})} \tag{R1}$$

$$SP_{genus} = \frac{\mathrm{Count}(\text{incorrect genus and } s < s_0)}{\mathrm{Count}(\text{incorrect genus})} \tag{R2}$$

$$ER_{genus} = 1 - \frac{\mathrm{Count}(\text{correct genus and } s \ge s_0)}{\mathrm{Count}(s \ge s_0)} \tag{R3}$$

$$S_{species} = \frac{\mathrm{Count}(\text{correct species and } s \ge s_0)}{\mathrm{Count}(\text{correct species})} \tag{R4}$$

$$SP_{species} = \frac{\mathrm{Count}(\text{incorrect species and } s < s_0)}{\mathrm{Count}(\text{incorrect species})} \tag{R5}$$

$$ER_{species} = 1 - \frac{\mathrm{Count}(\text{correct species and } s \ge s_0)}{\mathrm{Count}(s \ge s_0)} \tag{R6}$$

The receiver operating characteristic (ROC) curves for the classifiers were then obtained as shown in Figure 3. It was observed that Eu (area under the curve (AUC)genus = 96.9%, AUCspecies = 69.1%) and iEu (AUCgenus = 94.9%, AUCspecies = 70.1%) showed clearly better performance than Cos (AUCgenus = 88.8%, AUCspecies = 71.6%) at genus level. The performances at species level were similar for the three similarity measures.

Figure 3. Receiver operating characteristic (ROC) curves for the identification of bacterial samples with Cos, Eu and iEu at genus and species levels.

The widely used spectra similarity measures, such as Pearson’s correlation, Spearman’s correlation and cosine correlation,45 consider mainly the differences in relative intensities between the common peaks from two spectra and the number of common peaks. Eu and iEu proposed in this work further take the difference in m/z between each pair of common peaks into consideration. With the linear TOF technique, the mass spectra of bacterial whole cells normally have limited resolution (500-1,000 ppm). With such resolution, the tolerance for common peak alignment is quite large, leading to large d(m/z) between common peaks and possible false alignment. Taking d(m/z) into account in this case can help to rule out false positive identifications. Compared to Eu, iEu increases the weighting of peaks with strong intensity in similarity scoring. All these modifications aim at more reliable bacterial typing based on mass spectra similarity matching.

Bootstrapping Based Assessment for Bacterial Typing by MALDI-TOF MS

Although the predictor based on the iEu similarity measure showed better performance, incorrect identifications could still be observed, especially at species level. Therefore, we have further introduced a bootstrapping model43 to provide assessment of the identification results from spectra pattern matching. The workflow of the bootstrapping model has been explained in detail above in the Materials and Methods section.

Considering confidence score Conf ≥ Conf0 as a threshold for reliable identification, the sensitivity (S), specificity (SP) and error rate (ER) at the genus and species levels were calculated as:

$$S_{genus} = \frac{\mathrm{Count}(\text{correct genus and } \mathrm{Conf} \ge \mathrm{Conf}_0)}{\mathrm{Count}(\text{correct genus})} \tag{R7}$$

$$SP_{genus} = \frac{\mathrm{Count}(\text{incorrect genus and } \mathrm{Conf} < \mathrm{Conf}_0)}{\mathrm{Count}(\text{incorrect genus})} \tag{R8}$$

$$ER_{genus} = 1 - \frac{\mathrm{Count}(\text{correct genus and } \mathrm{Conf} \ge \mathrm{Conf}_0)}{\mathrm{Count}(\mathrm{Conf} \ge \mathrm{Conf}_0)} \tag{R9}$$

$$S_{species} = \frac{\mathrm{Count}(\text{correct species and } \mathrm{Conf} \ge \mathrm{Conf}_0)}{\mathrm{Count}(\text{correct species})} \tag{R10}$$

$$SP_{species} = \frac{\mathrm{Count}(\text{incorrect species and } \mathrm{Conf} < \mathrm{Conf}_0)}{\mathrm{Count}(\text{incorrect species})} \tag{R11}$$

$$ER_{species} = 1 - \frac{\mathrm{Count}(\text{correct species and } \mathrm{Conf} \ge \mathrm{Conf}_0)}{\mathrm{Count}(\mathrm{Conf} \ge \mathrm{Conf}_0)} \tag{R12}$$

By calculating the area under the ROC curves (Figure 4) with the dataset of 1,741 bacterial strains, we observed that the performance of the classifiers at species level was significantly enhanced after bootstrapping assessment. The AUC values for genus level identification increased from 0.88-0.97 to 0.93-0.98, and the AUC values for species level identification increased from 0.69-0.72 to 0.84-0.86. Dataset S-2 shows the Conf at genus and species level for the top match of each sample by Cos, Eu and iEu.

Figure 4. ROC curves for the identification of bacterial samples by Cos, Eu and iEu with bootstrapping assessment at genus and species levels.

Plots of sensitivity and error rate against similarity score and confidence score thresholds at genus and species levels are shown in Supporting Information Figure S-1 and Figure S-2. If we require error rate ≤ 1% for genus level identification and error rate ≤ 5% for species level identification, the corresponding score thresholds and sensitivities are listed in Table 1. Using the confidence scores from the bootstrapping model, sensitivities were greatly increased at the same error rates, e.g. from 41% to 88% with 5% error rate at the species level using Eu. Among the three similarity measures, Eu showed the best performance with or without bootstrapping. With 0.41 as the confidence score threshold for genus level identification, the error rates were < 1%, while the sensitivities were > 99%. Similarly, a threshold of 0.85 could be used for species level identification, wherein the error rates were < 5%, while the sensitivities were 88%.
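To make the threshold-based metrics concrete, the sketch below evaluates sensitivity, specificity and error rate at a chosen cutoff, following the pattern of R1 to R3 (R4 to R12 repeat it for species and for Conf). The function name `threshold_metrics` and the toy scores are hypothetical and are not taken from Dataset S-2.

```python
def threshold_metrics(results, cutoff):
    """Evaluate a score cutoff on a list of (score, is_correct) top matches.

    Returns (sensitivity, specificity, error_rate); a ratio whose
    denominator is empty is reported as None.
    """
    correct = [s for s, ok in results if ok]
    incorrect = [s for s, ok in results if not ok]
    accepted = [ok for s, ok in results if s >= cutoff]
    kept_correct = sum(1 for s in correct if s >= cutoff)
    rejected_incorrect = sum(1 for s in incorrect if s < cutoff)
    sensitivity = kept_correct / len(correct) if correct else None
    specificity = rejected_incorrect / len(incorrect) if incorrect else None
    error_rate = 1 - kept_correct / len(accepted) if accepted else None
    return sensitivity, specificity, error_rate


# Toy data: three correct and two incorrect top matches with their scores.
toy = [(0.95, True), (0.90, True), (0.60, True), (0.88, False), (0.40, False)]
print(threshold_metrics(toy, 0.85))
```

For the toy data and a cutoff of 0.85, two of the three correct matches are kept (sensitivity 2/3), one of the two incorrect matches is rejected (specificity 1/2), and one of the three accepted matches is wrong (error rate 1/3). Scanning the cutoff and keeping the most sensitive value that still satisfies the required error rate is one way to arrive at thresholds of the kind reported in Table 1.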

Table 1. Thresholds of similarity score and confidence score and corresponding sensitivities, given error rates at genus level (≤ 1%) and species level (≤ 5%). [Earlier rows of the table are not recoverable from the source.]

Measure      Genus level                                     Species level
             Threshold  Sensitivity (%)  Error rate (%)      Threshold  Sensitivity (%)  Error rate (%)
Conf(Eu)     0.41       > 99             0.98                0.85       88               4.9
Conf(iEu)    0.61       99               0.77                0.81       86               5.0

Resolving Species in Bacillus cereus Group with the Bootstrapping Model

The bootstrapping model provides statistical assessment of spectra similarity matching based identification by evaluating the confidence scores of top matches. Higher confidence scores indicate better reliability of the top matches. However, the assessment does not propose new matches for those with a low Conf score. Therefore, we further calculated the confidence score of each genus/species in the reference database, and re-ranked the matches by their corresponding confidence scores. The genus/species with the top confidence score (ConfR) was given as the identification at genus/species level.

Using this method, we tried to resolve species within the Bacillus cereus group, which was reported as an impossible task for MALDI-TOF MS.41 The Bacillus cereus group contains five species, i.e. Bacillus cereus sensu stricto, Bacillus thuringiensis, Bacillus anthracis, Bacillus mycoides, and Bacillus weihenstephanensis.46 A dataset containing the mass spectra of 250 bacterial strains (Dataset S-1), including 40 strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis, was used to illustrate the performance of the method. Each of the 40 strains was selected in turn as the sample spectrum, and the other 249 spectra were used to construct the reference database. The identification results are shown in Dataset S-3 and the ROC curves are shown in Figure 5. Without bootstrapping, the predicted identification at species level of the 40 strains showed random performance, in accordance with guideline M58 of CLSI.41 Using the re-ranked top matches according to Conf scores from the bootstrapping assessment, the AUC value reaches as high as 0.85 (similar to the AUC value of 0.86 for the 1,741 samples), which indicates much more reliable identification of the 40 strains in the Bacillus cereus group at species level. Therefore, the bootstrapping model enables us to resolve the species within the Bacillus cereus group by MALDI-TOF MS for the first time.

Figure 5. ROC curves for the identification of strains in the Bacillus cereus group at species level by using similarity measures and bootstrapping assessment. (A) ROC curves for the top matches by the similarity scores. (B) ROC curves for the re-ranked top matches by the Conf scores.

CONCLUSIONS

We have implemented a framework including three spectra similarity based measures and a bootstrapping model for MALDI-TOF MS characterization of bacterial samples. Compared to Cos, the new similarity measures, Eu and iEu, showed good performance in bacterial identification. The bootstrapping model provided reliable assessment, wherein identification performance at species level was highly enhanced. This distinctive feature was demonstrated using 40 strains from the Bacillus cereus group. The method could promote MALDI-MS based bacterial typing and may further be applied to many other spectroscopy-based analytics, e.g. MS/MS spectral pattern matching in proteomics and metabolomics.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website.

Plots of sensitivity and error rate against spectra similarity score and Conf score threshold at genus/species level. (Word)
Name list of the bacterial strains in the dataset. (Excel)
Identification results of the 1,741 spectra. (Excel)
Identification results of the 40 spectra of the Bacillus cereus group. (Excel)

AUTHOR INFORMATION

Corresponding Author

* Correspondence should be addressed to Dr. Yu Lin ([email protected]) or Dr. Liang Qiao ([email protected]).

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (NSFC, 81671849), the Ministry of Science and Technology of China (MOST, 2016YFE0132400), the Science and Technology Commission of Shanghai Municipality (17JC1400900), and the Thousand Talents Program of China. We acknowledge the following institutes for sharing their MALDI-TOF MS data: European Consortium of Microbial Resources Centres (EMbaRC), Robert Koch-Institute and the Public Health Agency of Sweden. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7, 2007-2013), Research Infrastructures action, under the grant agreement No. FP7-228310 (EMbaRC project).

REFERENCES

(1) Fitch, J. P.; Raber, E.; Imbro, D. R. Science 2003, 302, 1350-1354.
(2) Sadik, O. A.; Wanekaya, A. K.; Andreescu, S. J. Environ. Monit. 2004, 6, 513-522.
(3) Kitchen, L. W.; Vaughn, D. W. Vaccine 2007, 25, 7017-7030.
(4) Laitinen, S.; Kangas, J.; Kotimaa, M.; Liesivuori, J.; Martikainen, P. J.; Nevalainen, A.; Sarantila, R.; Husman, K. Am. Ind. Hyg. Assoc. J. 1994, 55, 1055-1060.
(5) Bennett, A. M.; Fulford, M. R.; Walker, J. T.; Bradshaw, D. J.; Martin, M. V.; Marsh, P. D. Br. Dent. J. 2000, 189, 664-667.
(6) Korzeniewska, E. Front. Biosci. (Schol. Ed.) 2011, 3, 393-407.
(7) Erik, M.; Wijnand, E.; Asbjnrrn, S.; Per, S.; Jsrgen, L.; Per, S.; Kari, H. Am. J. Ind. Med. 1994, 25, 59-63.
(8) Lawniczek-Walczyk, A.; Gorny, R. L.; Wlazlo, A. Ann. Agric. Environ. Med. 2013, 20, 259-268.
(9) Dahlman-Hoglund, A.; Renstrom, A.; Acevedo, F.; Andersson, E. Ann. Occup. Hyg. 2013, 57, 1020-1029.
(10) Eduard, W.; Heederik, D.; Duchainede, C.; Greenf, B. J. J. Environ. Monit. 2012, 14, 334-339.
(11) Lopata, A. L.; Jeebhay, M. F. Curr. Allergy Asthma Rep. 2013, 13, 288-297.
(12) Bertelli, C.; Greub, G. Clin. Microbiol. Infect. 2013, 19, 803-813.
(13) Loman, N. J.; Misra, R. V.; Dallman, T. J.; Constantinidou, C.; Gharbia, S. E.; Wain, J.; Pallen, M. J. Nat. Biotechnol. 2012, 30, 434-439.
(14) Laxminarayan, R.; Duse, A.; Wattal, C.; Zaidi, A. K. M.; Wertheim, H. F. L.; Sumpradit, N.; Vlieghe, E.; Hara, G. L.; Gould, I. M.; Goossens, H.; Greko, C.; So, A. D.; Bigdeli, M.; Tomson, G.; Woodhouse, W.; Ombaka, E.; Peralta, A. Q.; Qamar, F. N.; Mir, F.; Kariuki, S., et al. Lancet Infect. Dis. 2013, 13, 1057-1098.
(15) Herzog, T.; Chromik, A. M.; Uhl, W. Eur. J. Med. Res. 2010, 15, 525-532.
(16) Woo, P. C. Y.; Lau, S. K. P.; Teng, J. L. L.; Tse, H.; Yuen, K. Y. Clin. Microbiol. Infect. 2008, 14, 908-934.
(17) Maurin, M. Expert Rev. Mol. Diagn. 2012, 12, 731-754.
(18) Mai, M.; Müller, I.; Maneg, D.; Lohr, B.; Haecker, A.; Haberhausen, G.; Hunfeld, K. Methods Mol. Biol. 2015, 1237, 139-147.
(19) Matsuki, T.; Watanabe, K.; Fujimoto, J.; Kado, Y.; Takada, T.; Matsumoto, K.; Tanaka, R. Appl. Environ. Microbiol. 2004, 70, 167-173.
(20) Peters, R. P. H.; van Agtmael, M. A.; Danner, S. A.; Savelkoul, P. H. M.; Vandenbroucke-Grauls, C. Lancet Infect. Dis. 2004, 4, 751-760.
(21) Emerson, D.; Agulto, L.; Liu, H.; Liu, L. P. Bioscience 2008, 58, 925-936.
(22) Krishnamurthy, T.; Ross, P. L. Rapid Commun. Mass Spectrom. 1996, 10, 1992-1996.
(23) Claydon, M. A.; Davey, S. N.; Edwards-Jones, V.; Gordon, D. B. Nat. Biotechnol. 1996, 14, 1584-1586.
(24) Holland, R. D.; Wilkes, J. G.; Rafii, F.; Sutherland, J. B.; Persons, C. C.; Voorhees, K. J.; Lay, J. O. Rapid Commun. Mass Spectrom. 1996, 10, 1227-1232.
(25) Dixon, P.; Davies, P.; Hollingworth, W.; Stoddart, M.; MacGowan, A. Eur. J. Clin. Microbiol. Infect. Dis. 2015, 34, 863-876.
(26) Chen, J. H. K.; Ho, P. L.; Kwan, G. S. W.; She, K. K. K.; Siu, G. K. H.; Cheng, V. C. C.; Yuen, K. Y.; Yam, W. C. J. Clin. Microbiol. 2013, 51, 1733-1739.
(27) Loonen, A. J. M.; Jansz, A. R.; Stalpers, J.; Wolffs, P. F. G.; van den Brule, A. J. C. Eur. J. Clin. Microbiol. Infect. Dis. 2012, 31, 1575-1583.
(28) Sauget, M.; Valot, B.; Bertrand, X.; Hocquet, D. Trends Microbiol. 2017, 25, 447-455.
(29) Seng, P.; Drancourt, M.; Gouriet, F.; La Scola, B.; Fournier, P.-E.; Rolain, J. M.; Raoult, D. Clin. Infect. Dis. 2009, 49, 543-551.
(30) Bizzini, A.; Durussel, C.; Bille, J.; Greub, G.; Prod'hom, G. J. Clin. Microbiol. 2010, 48, 1549-1554.
(31) Risch, M.; Radjenovic, D.; Han, J. N.; Wydler, M.; Nydegger, U.; Risch, L. Swiss Med. Wkly. 2010, 140, w13095-w13095.
(32) Szabados, F.; Woloszyn, J.; Richter, C.; Kaase, M.; Gatermann, S. J. Med. Microbiol. 2010, 59, 787-790.
(33) Fothergill, A.; Kasinathan, V.; Hyman, J.; Walsh, J.; Drake, T.; Wang, Y. F. J. Clin. Microbiol. 2013, 51, 805-809.
(34) Lohmann, C.; Sabou, M.; Moussaoui, W.; Prevost, G.; Delarbre, J.-M.; Candolfi, E.; Gravet, A.; Letscher-Bru, V. J. Clin. Microbiol. 2013, 51, 1231-1236.
(35) Böhme, K.; Fernández-No, I. C.; Barros-Velázquez, J.; Gallardo, J. M.; Cañas, B.; Calo-Mata, P. Electrophoresis 2012, 33, 2138-2142.
(36) Lasch, P.; Wahab, T.; Weil, S.; Pályi, B.; Tomaso, H.; Zange, S.; Kiland Granerud, B.; Drevinek, M.; Kokotovic, B.; Wittwer, M.; Pflüger, V.; Di Caro, A.; Stämmler, M.; Grunow, R.; Jacob, D. J. Clin. Microbiol. 2015, 53, 2632-2640.
(37) Schulthess, B.; Bloemberg, G. V.; Zbinden, R.; Bottger, E. C.; Hombach, M. J. Clin. Microbiol. 2014, 52, 1089-1097.
(38) Schulthess, B.; Brodner, K.; Bloemberg, G. V.; Zbinden, R.; Bottger, E. C.; Hombach, M. J. Clin. Microbiol. 2013, 51, 1834-1840.
(39) Mather, C. A.; Rivera, S. F.; Butler-Wu, S. M. J. Clin. Microbiol. 2014, 52, 130-138.
(40) Ivanova, N.; Sorokin, A.; Anderson, I.; Galleron, N.; Candelon, B.; Kapatral, V.; Bhattacharyya, A.; Reznik, G.; Mikhailova, N.; Lapidus, A.; Chu, L.; Mazur, M.; Goltsman, E.; Larsen, N.; D'Souza, M.; Walunas, T.; Grechkin, Y.; Pusch, G.; Haselkorn, R.; Fonstein, M., et al. Nature 2003, 423, 87-91.
(41) Clinical and Laboratory Standards Institute. M58 Methods for the Identification of Cultured Microorganisms Using Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry, 1st ed., 2017.
(42) Wan, K. X.; Vidavsky, I.; Gross, M. L. J. Am. Soc. Mass Spectrom. 2002, 13, 85-88.
(43) Efron, B. Ann. Stat. 1979, 7, 1-26.
(44) Felsenstein, J. Evolution 1985, 39, 783-791.
(45) Kim, S.; Zhang, X. Comput. Math. Methods Med. 2013, 2013, 12.
(46) McIntyre, L.; Bernard, K.; Beniac, D.; Isaac-Renton, J. L.; Naseby, D. C. Appl. Environ. Microbiol. 2008, 74, 7451-7453.
