A New Phylogenetic Approach and Algorithm to Chart Evolutionary

Article pubs.acs.org/ac

Mass Trees: A New Phylogenetic Approach and Algorithm to Chart Evolutionary History with Mass Spectrometry

Aaron T. L. Lun,† Kavya Swaminathan,† Jason W. H. Wong,†,‡ and Kevin M. Downard*,†

†School of Molecular Bioscience, University of Sydney, Sydney, New South Wales, Australia
‡Prince of Wales Clinical School and Lowy Cancer Research Centre, University of New South Wales, Sydney, New South Wales, Australia



ABSTRACT: A new phylogenetic approach and algorithm with which to chart the evolutionary history of organisms is presented. It utilizes mass spectral data produced from the proteolytic digestion of proteins, rather than partial or complete gene or translated gene sequences. The concept and validity of the approach are demonstrated herein using both theoretical and experimental mass data, together with the translated gene sequences of the hemagglutinin protein of the influenza virus. A comparison of the mass trees with conventional sequence-based phylogenetic trees, using two separate tree comparison algorithms, reveals a high degree of similarity and congruence among the trees. Given that the mass map data can be generated more rapidly than gene sequences, even when next-generation parallel sequencing is employed, mass trees offer new opportunities and advantages for phylogenetic analysis.

Phylogenetic trees represent a mainstay in biology to display and infer the evolutionary relationships among various biological species based upon similarities or differences in their physical or genetic characteristics.1 The branches of the tree display the divergence of a species from a common ancestor.2 The use of gene sequences for phylogenetic studies is based on the assumption that gene mutations occur randomly over time. A full phylogenetic analysis requires the comparison of a wide range of gene sequences including the presence or absence of particular genes. In the case of the influenza virus, reassortment processes result in progeny viruses that contain genes with different evolutionary histories. This necessitates a large amount of gene sequencing that remains a time-consuming process even with next-generation sequencing technologies.3 It becomes even more challenging in the case of organisms where no reference genomes are available.

We have been investigating the use of mass spectral data to chart the evolutionary history of microorganisms without the need for gene or even protein sequencing. This has developed from an interest in characterizing and monitoring the evolution of the influenza virus and other biopathogens through their proteotyping with high-resolution mass spectrometry.4 It has been demonstrated that the approach enables strains of the influenza virus, and by extension other pathogens, to be typed,5 subtyped,4,6,7 and their lineage8 determined through single ion detection. The approach has more recently been applied to distinguish pandemic from seasonal influenza,9 study the evolution of influenza viruses across human and animal hosts,10 and identify reassorted strains.11 This proteotyping approach contrasts with other more conventional mass mapping methods employing mass spectrometry to characterize bacteria and viruses that make use of identifiable ions within the mass spectrum.12−14

Phylogenetic analysis can be performed with protein sequences where they are obtained from the in silico translation of gene sequences or by tandem mass spectrometry (MS/MS) using either “bottom−up”15,16 or “top−down”17,18 approaches. The former requires the availability of gene sequences whereas the latter is more time-consuming than mass mapping or fingerprinting.19,20 Regardless of the mass spectrometer employed, there is typically some loss in sensitivity in MS/MS spectra of peptides over their conventional analysis by mass (MS), albeit some background ions are removed. Data from MS/MS experiments are also more difficult to interpret, even when computer algorithms are available.21,22

The masses of peptides produced from the proteolytic digestion of a protein also reflect the sequence of that protein. As such, homologous proteins will yield proteolytic peptides with identical mass in sections of the protein that share a common sequence. When high-resolution mass spectrometry is employed, these masses are accurate to within, or even below, 1 ppm such that even subtle changes to a peptide’s sequence will alter its mass.23 Lists of peptide masses, rather than sequence data, can be used to construct phylogenetic trees and these “mass trees” can be used to trace evolutionary history following the digestion of component proteins post-recovery or even within whole microbe digests in the case of microorganisms. This affords a significant time

Received: February 24, 2013 Accepted: May 6, 2013


dx.doi.org/10.1021/ac4005875 | Anal. Chem. XXXX, XXX, XXX−XXX

Analytical Chemistry

Article

This scoring approach minimizes the distance between two sets of mass values that contain nearly identical masses, including those that differ only by a single amino acid substitution. This will be the case among closely related proteins, thus associating their parent strains. Note that less weight is given to a pair of peptides whose masses differ by a single amino acid substitution than to a pair whose masses are identical. Their significance to the distance score is arbitrarily weighted at one-half of that of a perfect mass match and the square root of the s value is used to avoid the latter factor being zero if all peptide masses in two peak lists were related by single amino acid substitutions. A distance matrix is then generated through pairwise comparison of mass values in all m/z lists: M1, M2, ..., Mt. The distance matrix is used to construct mass trees using a relaxed neighbor joining approach24 employing the clearcut algorithm.25 The branch lengths of the mass tree reflect the relative ratio of different mass values across all spectra. For a hypothetical pair of spectra with 5 common masses and 2 different ones, the total length of the branch that connects these two sets will be 2/7 (see Figure 1).

Generation of Mass and Phylogenetic Trees. Mass trees from theoretical and experimental mass map data were generated using the MassTree algorithm. In the case of experimental mass spectral data produced from the digestion of inactivated virus, deisotoped mass lists containing only the measured monoisotopic masses of all peptide ions were exported from the DataAnalysis software (Bruker Daltonics, Bremen, Germany) used to process the mass spectral data and input into the MassTree algorithm. The corresponding phylogenetic tree for each mass tree was generated using the translated HA sequences of identical strains extracted from the NCBI Influenza Virus Resource Database in FASTA format. Multiple sequence alignment was performed with the Clustal X algorithm (version 2.1),26 using the neighbor-joining method. All mass trees and phylogenetic trees were visualized and midpoint rooted with Archaeopteryx software.27

Comparison of Mass and Phylogenetic Trees. Three algorithms were used to compare the topologies of the mass and phylogenetic trees for each data set. A custom-written algorithm called TreeCompare was used to visualize the similarities between trees with edge colorization to highlight matching clades in each tree using a unique color. The Compare2Trees28 and MAST algorithms29,30 were used to compare the topologies of the mass and phylogenetic trees. The Compare2Trees algorithm compares every pair of branches in two trees and assigns a score that reflects the topological similarity of the branches. Branches in the trees are then paired to optimize the overall score expressed as a percentage. The MAST algorithm establishes maximum agreement subtrees (MAST) that result from the “pruning” of the fewest number of end-vertices among two trees.29 The number of leaves in the MAST for any pair of trees, normalized by the mean size expected from random trees, establishes their congruence index. The larger the congruence index, the more congruent the trees are relative to that expected by chance. A p-value is calculated from the congruence index, where a small p-value reflects that the two trees are more congruent than expected by chance. Furthermore, the p-values were input into the R software to ensure that the significance of the observed probabilities was not a result of multiple testing. The q-value is an adjusted p-value that corrects for the false discovery rate or

saving, and reduces sample consumption, given that mass spectral data of protein digests can be recorded within a fraction of a second. The concept, together with results that compare mass trees with conventional sequence-based phylogenetic trees, is demonstrated here for the first time.



EXPERIMENTAL SECTION

Generation of Simulated Mass Map Data. The FluSim algorithm11 was used to generate theoretical mass map data. Accessions corresponding to translated full-length hemagglutinin (HA) sequences from the strains within the various data sets were downloaded from the NCBI Influenza Virus Resource Database. All entries containing identical sequences were collapsed into a single nonredundant entry. The FluSim parameters were set to generate theoretical monoisotopic mass lists for protonated tryptic peptide ions from the sequences within each data set.

Mass Accuracy Issues. Only the isomeric residues leucine and isoleucine can be substituted without a change to the mass of a peptide or protein. They share a common theoretical residue mass (that of the amino acid less a molecule of water) of 113.0841 (monoisotopic). Note that these residues, for the same reason, are also often not distinguished in tandem mass spectrometry (MS/MS) sequencing experiments. Although lysine and glutamine also share the same nominal mass (128), their masses differ within the fractional mass component. This enables them to be distinguished because of the precision achieved with mass spectrometry. Evolutionary substitutions in the protein sequence will usually result in peptides of different mass. These differences can be identified if the peptide masses can be measured to a high mass accuracy. High-resolution mass spectrometers achieve a mass precision at or below 1 ppm. For a peptide with a molecular mass of 1000 Da, this equates to an error of 0.001 Da or better.

MassTree Algorithm. The MassTree algorithm reads t sets of monoisotopic m/z values M1, M2, ..., Mt. Each set, described as Mx = {mx1, mx2, ..., mxn}, contains m/z values for peptide ions detected in a mass spectrum following the proteolytic digestion of a viral or bacterial protein. The m/z values contained within pairs of sets are compared to establish the number of indistinguishable mass values, i. Mass values are deemed to be indistinguishable if the difference between them lies within a specified mass tolerance. The mass error tolerance for the high-resolution mass spectra recorded is set at a default of 5 ppm. Among the peptide ions detected, the number of pairs of mass values from M1 and M2 that differ in mass but correlate with a single amino acid substitution, s, is also determined.23 A distance score between M1 and M2 is then computed based on the number of matching mass values within each set according to eq 1.

$$
d_{M_1 M_2} = \left(1 - \frac{1.0 \times i}{\sqrt{n_{M_1} n_{M_2}}}\right) \times \left(1 - \frac{0.5 \times \sqrt{s}}{\sqrt{n_{M_1} n_{M_2}}}\right) \qquad (1)
$$

where $d_{M_1 M_2}$ denotes the distance score between the two peak lists M1 and M2, i is the number of matching masses (i.e., indistinguishable m/z values) in M1 and M2, s is the number of masses in M1 and M2 that reflect a single amino acid difference, and $n_{M_1}$ and $n_{M_2}$ are the total number of mass values present in M1 and M2, respectively.
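The scoring of eq 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MassTree implementation. It assumes that the denominators in eq 1 are the geometric mean of the two list sizes, √(nM1·nM2), and that the square root of s enters the second factor, as the accompanying text and its 2/7 worked example suggest. The function names `count_matches` and `mass_distance`, the greedy one-to-one matching, and the example peak lists are hypothetical choices made for this sketch.

```python
import math


def count_matches(masses_a, masses_b, tol_ppm=5.0):
    """Count one-to-one pairs of masses that agree within tol_ppm (greedy)."""
    unused = sorted(masses_b)
    matches = 0
    for a in sorted(masses_a):
        for k, b in enumerate(unused):
            if abs(a - b) / a * 1e6 <= tol_ppm:
                matches += 1
                del unused[k]  # each mass may be matched only once
                break
    return matches


def mass_distance(i, s, n1, n2):
    """Distance score of eq 1, under the assumptions stated in the lead-in."""
    g = math.sqrt(n1 * n2)  # assumed geometric-mean denominator
    return (1.0 - 1.0 * i / g) * (1.0 - 0.5 * math.sqrt(s) / g)


# Two hypothetical peak lists of 7 masses each, 5 of which agree within 5 ppm.
list_1 = [500.0, 600.0, 700.0, 800.0, 900.0, 1000.0, 1100.0]
list_2 = [500.001, 600.0, 700.0, 800.0, 900.0, 1234.5, 1500.0]

i = count_matches(list_1, list_2)
d = mass_distance(i, 0, len(list_1), len(list_2))  # s = 0 assumed here
print(i, round(d, 4))  # 5 0.2857
```

With 5 shared masses out of 7 in each list and no single-substitution pairs, the score is 2/7, which agrees with the branch length quoted in the text for a pair of spectra with 5 common masses and 2 different ones.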


Figure 1. Alignment of the first 100 N-terminal residues of the hemagglutinin sequences derived from four human strains of the influenza virus. Tryptic peptides are boxed and shown with their mass designations.

Figure 2. Hypothetical mass tree produced from the mass data of Figure 1, together with a phylogenetic tree generated for these sequences using the Phylogeny.fr algorithm.32
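The peptide masses behind a figure such as Figure 1 follow from a simple in silico digestion. The sketch below is a simplified stand-in for what a tool such as FluSim produces, not its actual code. It assumes cleavage after every lysine or arginine (ignoring missed cleavages and any proline rule), sums standard monoisotopic residue masses, and adds one water and one proton to give the singly protonated ion. The function names `tryptic_digest` and `protonated_mass` and the example sequence are hypothetical.

```python
# Standard monoisotopic residue masses (Da); each is the amino acid less water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # added for the singly protonated ion


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for pos, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:pos + 1])
            start = pos + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def protonated_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide ion."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


# Hypothetical sequence used only to exercise the functions.
for peptide in tryptic_digest("MKAILRGG"):
    print(peptide, round(protonated_mass(peptide), 4))
```

Under these masses, exchanging leucine for isoleucine leaves a peptide mass unchanged, whereas exchanging lysine for glutamine shifts it by about 0.036 Da, roughly 36 ppm for a 1000 Da peptide and well outside a 5 ppm tolerance.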

the proportion of false positive outcomes following multiple testing.

Comparison of Conventional Phylogenetic Sequence Trees. The same Compare2Trees28 and MAST algorithms29,30 were used to compare two conventional phylogenetic trees generated following the multiple alignment of 50 hemagglutinin sequences (for each of subsets 1 and 2) using two different tree generation approaches, namely, Neighbor-Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA). The UPGMA approach assumes a constant rate of evolution and clusters the most similar sequences, with the smallest distance between them, until a complete rooted tree is generated. The NJ approach, on the other hand, corrects for unequal evolutionary rates by finding a pair of neighboring leaves that have the same parent node and progressively clustering them. The overall percentile topology scores, congruence indices, and p-values were calculated by the Compare2Trees and MAST algorithms, respectively.

Figure 3. Mass and phylogenetic trees for 250 representative full-length H1 human hemagglutinin sequences representing one sequence from each country for each year (from 1918 to 2012).

Table 1. Results of Two Tree Comparison Algorithms When Mass and Phylogenetic Trees for Different Numbers of H1 Hemagglutinin Sequences Are Compared

                          Compare2Trees                    MAST
number of H1 sequences    overall topological score (%)    Icong    p-value          q-value
250                       60.1                             3.177    1.198 × 10−29    4.80 × 10−29
50 (subset 1)             75.7                             2.897    9.258 × 10−16    1.234 × 10−15
50 (subset 2)             71.7                             3.283    7.350 × 10−19    1.47 × 10−18
25                        80.2                             2.606    3.205 × 10−11    3.21 × 10−11
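The q-values of Table 1 can be recomputed from the listed p-values. The text does not name the adjustment that was applied in R, so the sketch below assumes the Benjamini-Hochberg procedure, a common false discovery rate correction; the function name `benjamini_hochberg` is a hypothetical choice for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# p-values from Table 1, in row order: 250, 50 (subset 1), 50 (subset 2), 25.
table_1_p = [1.198e-29, 9.258e-16, 7.350e-19, 3.205e-11]
for p, q in zip(table_1_p, benjamini_hochberg(table_1_p)):
    print(f"p = {p:.3e}  q = {q:.3e}")
```

Under this assumption the adjusted values agree with the q-value column of Table 1 to within rounding, which is consistent with, though not proof of, a Benjamini-Hochberg correction having been used.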

RESULTS AND DISCUSSION

The Mass Tree Concept. A simple illustration of the concept behind the generation of mass trees is shown in Figures 1 and 2. An alignment of the first 100 N-terminal residues of the hemagglutinin sequences derived from four human strains of the influenza virus is shown in Figure 1. If these proteins were proteolytically digested with trypsin, peptides would be generated that terminate in a lysine (K) or arginine (R) residue. These are shown boxed in Figure 1. Given that the sequences of the two A/California/2009 strains are nearly identical (they differ only at residues 8 and 99), all but two of the peptides generated from either sequence will have the same mass (m1a and m3a−m7a). The peptides of different mass can be designated m2a and m2b, and m8a and m8b. The sequences from the New Caledonia and Puerto Rico strains are more dissimilar when compared to one another, or to the sequences of the two California strains. This means that each strain will have more peptides with unique sequences and therefore unique mass. Only the N-terminal dipeptide MK is common to all four strains and its mass has been arbitrarily designated m1a. A peptide with mass designated m4c is common to both the New Caledonia and Puerto Rico strains while the remaining peptide masses are unique. A mass tree produced solely from the mass data will have an appearance like that shown in Figure 2. The California strains are located on a different clade to the other two, and the New Caledonia and Puerto Rico strains diverge earlier due to fewer peptides sharing a common mass. The mass tree groups the sequences to a clade or subclade where the relative length of the branches indicates the number of matching peptides of the total among the sequences. A phylogenetic tree generated using the sequence data from the same strains is also shown in Figure 2. It has the same topology as the mass tree.

Construction of Mass Trees from Large Sequence Data Sets: Type A H1 Hemagglutinin. To investigate the reliability of mass trees produced from a larger number of sequences, mass and phylogenetic trees were constructed for 250 representative full-length H1 human hemagglutinin sequences. The sequences were acquired from the NCBI Influenza Virus Resource, sorted first according to the date of collection and then by the country of origin. One sequence from each country for any given year (from 1918 to 2012) was selected and the resulting 250 sequences were utilized to generate the trees (Figure 3). Note that the overall topologies of the trees are very similar. Some branches of the trees were colored using the TreeCompare algorithm. The colored clades and subclades contain sequences that are identical in both trees. Uncolored clades contain at least one different strain between the two trees. The upper branch contains H1 sequences derived from strains that originated before the 2009 pandemic associated with an H1N1 swine-originating influenza virus (SOIV). The lower branch comprises H1 sequences from postpandemic strains.


Table 2. Results of Two Tree Comparison Algorithms When Phylogenetic Sequence Trees for Two Sets of 50 H1 Hemagglutinin Sequences Assembled by Two Different Approaches Are Compared

                          Compare2Trees                    MAST
number of H1 sequences    overall topological score (%)    Icong     p-value         q-value
50 (subset 1)             68.3                             1.2554    1.389 × 10−2    1.389 × 10−2
50 (subset 2)             66.8                             1.5451    6.568 × 10−5    1.3134 × 10−4

Figure 4. Mass tree of all known 3208 full-length H1 human hemagglutinin sequences showing location of strains by year group.

This is represented more clearly in Figure 4, in which the mass tree for all 3208 human H1 hemagglutinin sequences extracted from the NCBI Influenza Virus Resource is shown. The upper branch contains the pre-2009 pandemic strains isolated between 1999 and 2009 in red and others in violet and black. The lower major branch contains the SOIV pandemic 2009 strains in blue, in addition to those that have emerged since, colored in olive. Note that the current post-2010 strains have evolved from different pandemic strain ancestors.

To assess the performance and reliability of generating trees based on peptide masses versus gene sequences, the data of Figure 3 were compared using two different tree comparison algorithms. Many methods exist for comparing phylogenetic trees to reveal where similarities and differences exist. Tree construction methods use different criteria to compare sequences and generate trees, and thus different trees can result even where common sequence data are utilized. The first algorithm employed to compare the trees, Compare2Trees,28 matches branches with similar topologies through consideration of the leaf nodes along each branch in a tree. For two phylogenetic trees T1 and T2, the algorithm pairs each branch in one tree with a matching branch in the second tree, and finds the optimal map between pairs of branches i and j in the two trees. For every pair, a score s(i,j) is assigned that reflects the topological similarity of the branches i and j. Branches across the two trees are paired up to optimize the overall score. It is expressed as a percentage where 100% represents the optimal tree alignment, that is, identical trees.

As shown in Table 1, an overall topological score of 60.1% is achieved for all 250 sequences represented in Figure 3. Given the increased uncertainty in tree construction when the number of sequences is high, trees with fewer H1 sequences were generated. Trees for two subsets of 50 sequences of the total of 250 were produced. The first contained one sequence from each year through to 2009 of geographically diverse origin, while the second contained more recent 2012 strains but lacked some from earlier periods in order to maintain the same total. The Compare2Trees algorithm reports an overall topology score of 71.7−75.7% for these smaller trees and an improved 80.2% score when a smaller selection of 25 sequences was utilized. These scores compare favorably with those obtained for other sequence data31 assessed by the Compare2Trees algorithm. This is despite the fact that, in this instance, the trees were generated from different (mass versus sequence) data sets.

Each set of trees was further compared using the maximum agreement subtree (MAST) algorithm.29,30 The MAST method combines the information from different phylogenetic trees into a new agreement subtree. A congruence index, Icong, has been reported to test the topological congruence between pairs of trees based on the maximum agreement subtrees.30 A p-value, with a threshold of 0.05, is also employed to assess whether the two trees are more similar than expected by chance. The congruence indices, p-values, and the adjusted p-values (i.e., q-values) are reported in Table 1 for all four pairs of trees. The congruence index is found to be 3.18 in the case of the largest tree and 2.61 for the smallest tree with 25 leaf nodes. The two subset trees with 50 leaf nodes have congruence indices of 2.90 and 3.28. For all four pairs of trees, the p-values are very low and are similar to the q-values, ranging from 10−11 for the trees with 25 leaves to 10−29 for the trees with 250 leaves. Both tree comparison algorithms find topological similarities among the mass and sequence trees despite their generation using different data types. As expected with any tree comparison, some differences exist, but the localization of strains from particular hosts and periods (pandemic and nonpandemic) supports the validity of mass trees for representing the evolutionary history of strains of influenza, and by extension other microorganisms.

Comparison of Two Sequence Trees Using the Compare2Trees and MAST Algorithms. The tree comparison results were further considered in the context of those obtained for two different sequence trees. Conventional phylogenetic trees were generated for each of the two subsets of 50 hemagglutinin sequences with the Clustal X algorithm,26 using two different tree generation methods employing a neighbor-joining and an Unweighted Pair Group Method with Arithmetic Mean (UPGMA) approach. The Compare2Trees and MAST algorithms were then used to calculate the overall topology score, and the congruence indices and p-values, respectively (Table 2). Despite the use of identical sequence data sets, the topology scores reported by the Compare2Trees algorithm were 68.3% and 66.8% for subsets 1 and 2, respectively. These values are slightly lower than, but still in accord with, those obtained when the mass and sequence trees were compared with this algorithm, further validating the mass tree approach. The MAST algorithm also returned lower congruence indices; the trees are still more congruent than expected by chance, but with much larger p- and q-values. This is a consequence of the different clustering of sequences when a neighbor-joining versus a UPGMA approach is employed.

Mass Trees for Type B Hemagglutinin. Two antigenically distinct lineages of type B influenza viruses have cocirculated in the human population since the late 1980s. While viruses of the so-called Yamagata 88 lineage predominated in most parts of Africa, the Americas, and Europe in the 1990s, they were found to cocirculate with strains of the Victoria 87 lineage in eastern Asia. At a global level, type B viruses of each lineage have alternated in prevalence over the past two decades and vaccines against influenza have needed to be formulated accordingly. The ability of mass trees to discern and display the divergence of type B human influenza viruses is shown in Figure 5. A mass tree was generated from all 533 nonidentical hemagglutinin sequences of human type B viruses. All sequences of the upper branch are associated with strains of the Victoria 87 lineage (shown in blue) while those of the Yamagata 88 lineage appear in the lower branch of the two trees. The topology score calculated with the Compare2Trees algorithm was 54.6%.

Figure 5. Mass tree of all 533 known hemagglutinin sequences of human type B viruses showing diverged lineages.

Construction of Mass Trees from Real Experimental Data. To this point, all mass trees have been constructed using sets of theoretical masses for tryptic peptide ions across the entire hemagglutinin sequence, corresponding to 100% coverage. The mass spectra of protein digests rarely contain the ions of all potential peptides. This is due to a range of factors including incomplete protein digestion, the preferential ionization of some peptides over others, the ability to detect ions at lower m/z more easily than those of higher m/z on some instruments, and even the suppression of ion signals in complex mixtures, particularly in the case of peptide ions that share similar m/z values. To evaluate the mass trees generated from actual mass spectral data,4,8,10 the masses of peptides detected within the high-resolution MALDI mass spectra of twelve H1 and H5 hemagglutinin tryptic digests were used to generate the mass tree shown in Figure 6. The peptides detected represent between 17% and 47% of the reported sequence. Figure 6 also shows the phylogenetic tree generated from either the full or partial protein sequences. Both trees separate the H1 hemagglutinin sequences into a different clade (at top in both figures) from the H5 sequences that are grouped below. Of the latter, the H5 sequence derived from the pandemic Hong Kong 1997 strain is an outlier in both trees due to its divergence from other strains. The human Cambodian and Vietnam H5 sequences are grouped to a common subclade in both trees, as are the avian Hunan, Hong Kong, and human Indonesia sequences. Only the human 2003 Hong Kong and 2005 Turkey sequences are grouped differently in the two trees. The peptides detected in the mass spectra for these strains represent 17.4% and 31.5% of the HA sequence, respectively. The former has the lowest coverage of all sequences analyzed and this impacts on the ability to position the strain within the mass tree. Despite these minor differences, the topology of the trees is highly conserved, supporting the validity of the mass tree approach when real experimental mass data are used. A computer algorithm, known as FluShuffle,11 has recently been written that allows the peptides associated with different proteins to be identified, so that mass trees can be constructed for each digested protein within mixtures.

Figure 6. Comparison of mass tree, generated from experimental MALDI mass spectral data of twelve H1 and H5 hemagglutinin tryptic digests (representing between 17% and 47% of the available sequence) with the phylogenetic tree for these sequences.

CONCLUSIONS

The conceptualization and application of mass trees have shown them to be a viable alternative to conventional sequence trees for phylogenetic analysis. The application of several tree comparison algorithms demonstrates that, despite the deployment of different data sets, mass trees are largely congruent with those generated using sequence data. Furthermore, strains can be correctly identified and localized on such trees using mass spectral data. Mass trees can be generated from experimental mass map data more rapidly and directly than is possible with the current technologies employed to generate gene sequence data. Subject to the mass spectrometer employed, the approach also has benefits in terms of infrastructure costs, and is capable of high sample throughput and automation with currently available instrumentation. Although illustrated here with mass and sequence data for the hemagglutinin protein of a wide range of strains of the influenza virus, the fundamental basis of the mass tree approach demonstrates that it has broad applicability to evolutionary studies employing phylogenetic analysis.

AUTHOR INFORMATION

Corresponding Author

*Tel. +61 (0)2 93514140. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work was supported by an Australian Research Council Discovery Project grant (DP120101167) awarded to K.M.D. and J.W.H.W.



REFERENCES

(1) Nei, M.; Kumar, S. Molecular Evolution and Phylogenetics; Oxford University Press: Oxford, U.K., 2000.
(2) Felsenstein, J. Annu. Rev. Genet. 1988, 22, 521−565.
(3) Sanderson, M. J.; Driskell, A. C. Trends Plant Sci. 2003, 8, 374−379.
(4) Schwahn, A. B.; Wong, J. W. H.; Downard, K. M. Anal. Chem. 2009, 81, 3500−3506.
(5) Schwahn, A. B.; Wong, J. W. H.; Downard, K. M. J. Virol. Methods 2010, 165, 178−185.
(6) Schwahn, A. B.; Wong, J. W. H.; Downard, K. M. Analyst 2009, 134, 2253−2261.
(7) Schwahn, A. B.; Wong, J. W. H.; Downard, K. M. Eur. J. Mass Spectrom. 2010, 16, 321−329.
(8) Schwahn, A. B.; Downard, K. M. J. Virol. Methods 2011, 171, 117−122.
(9) Schwahn, A. B.; Wong, J. W. H.; Downard, K. M. Anal. Chem. 2010, 82, 4584−4590.
(10) Ha, J. W.; Downard, K. M. Analyst 2011, 136, 3259−3267.
(11) Lun, A. T. L.; Wong, J. W. H.; Downard, K. M. BMC Bioinf. 2012, 13, 208.
(12) Demirev, P. A.; Fenselau, C. Annu. Rev. Anal. Chem. 2008, 1, 71−93.
(13) Sauer, S.; Freiwald, A.; Maier, T.; Kube, M.; Reinhardt, R.; Kostrzewa, M.; Geider, K. PLoS One 2008, 3, e2843.
(14) Sjöholm, M. I. L.; Dillner, J.; Carlson, J. J. Clin. Microbiol. 2008, 46, 540−545.
(15) Biemann, K.; Papayannopoulos, I. A. Acc. Chem. Res. 1994, 27, 370−378.
(16) Palmblad, M.; Deelder, A. M. Rapid Commun. Mass Spectrom. 2012, 26, 728−732.
(17) Ge, Y.; Lawhorn, B. G.; El Naggar, M.; Strauss, E.; Park, J. H.; Begley, T. P.; McLafferty, F. W. J. Am. Chem. Soc. 2002, 124, 672−678.
(18) McLafferty, F. W.; Breuker, K.; Jin, M.; Han, X.; Infusini, G.; Jiang, H.; Kong, X.; Begley, T. P. FEBS J. 2007, 274, 6256−6268.
(19) Pappin, D. J.; Hojrup, P.; Bleasby, A. J. Curr. Biol. 1993, 3, 327−332.
(20) Henzel, W. J.; Watanabe, C.; Stults, J. T. J. Am. Soc. Mass Spectrom. 2003, 14, 931−942.
(21) Johnson, R. S.; Biemann, K. Biomed. Environ. Mass Spectrom. 1989, 18, 945−957.
(22) Dancík, V.; Addona, T. A.; Clauser, K. R.; Vath, J. E.; Pevzner, P. A. J. Comput. Biol. 1999, 6, 327−342.
(23) Tanaka, K.; Takenaka, S.; Tsuyama, S.; Wada, Y. J. Am. Soc. Mass Spectrom. 1999, 17, 508−513.
(24) Evans, J.; Sheneman, L.; Foster, J. A. J. Mol. Evol. 2006, 62, 785−792.
(25) Sheneman, L.; Evans, J.; Foster, J. A. Bioinformatics 2006, 22, 2823−2824.
(26) Larkin, M. A.; Blackshields, G.; Brown, N. P.; Chenna, R.; McGettigan, P. A.; McWilliam, H.; Valentin, F.; Wallace, I. M.; Wilm, A.; Lopez, R.; Thompson, J. D.; Gibson, T. J.; Higgins, D. G. Bioinformatics 2007, 23, 2947−2948.
(27) Han, M. V.; Zmasek, C. M. BMC Bioinf. 2009, 10, 356.
(28) Nye, T. M. W.; Lio, P.; Gilks, W. R. Bioinformatics 2006, 22, 117−119.
(29) Kubicka, E.; Kubicki, G.; McMorris, F. R. J. Classif. 1995, 12, 91−99.
(30) de Vienne, D. M.; Giraud, T.; Martin, O. C. Bioinformatics 2007, 23, 3119−3124.
(31) Shen, X.-X.; Liang, D.; Wen, J.-Z.; Zhang, P. Mol. Biol. Evol. 2011, 28, 3237−3252.
(32) Dereeper, A.; Guignon, V.; Blanc, G.; Audic, S.; Buffet, S.; Chevenet, F.; Dufayard, J.-F.; Guindon, S.; Lefort, V.; Lescot, M.; Claverie, J.-M.; Gascuel, O. Nucleic Acids Res. 2008, 36, W465−469.
