Anal. Chem. 2008, 80, 5574–5582

Mass Spectral Metabonomics beyond Elemental Formula: Chemical Database Querying by Matching Experimental with Computational Fragmentation Spectra

Dennis W. Hill,† Tzipporah M. Kertesz,† Dan Fontaine,† Robert Friedman,‡ and David F. Grant*,†

Department of Pharmaceutical Sciences, and Biotechnology Bioservices Facility, University of Connecticut, Storrs, Connecticut 06260-3092

Despite recent advances in NMR and mass spectrometry, the structural identification of organic compounds in complex biofluids remains a significant analytical challenge. For mass spectroscopy applications, chemical identification is generally limited to determination of elemental formula. Here we test the hypothesis that unknown chemical structures can be determined by matching their experimental collision-induced dissociation (CID) fragmentation spectra with computational fragmentation spectra of compounds retrieved from chemical databases. The monoisotopic molecular weights (MIMW ± 10 ppm) of 102 “test” compounds were used to download 102 “bins” from the PubChem database. Each bin contained the corresponding test compound and, on average, 272 other candidate compounds, including 158 compounds having the same elemental formula as the test compound. Commercially available software was used to generate fragmentation spectra for all compounds in each of the 102 bins. Experimental CID spectra for each of the 102 test compounds were then compared to the computational spectra in order to rank candidate compounds based on number of fragment MIMW matches. This method returned the test compound as the highest ranking (or tied with the highest ranking) compound for 65 of the 102 bins. The test compound was ranked within the top 20 candidate compounds for 87 bins. In addition, the correct elemental formula was ranked first for 98 of 102 bins.
Thus, matching experimental with computational fragmentation spectra is a valid method for rapidly discriminating among compounds having the same elemental formula and provides a novel approach for querying chemical databases for structural information.

* Corresponding author. Phone: 860-486-4265. Fax: 860-486-5792. E-mail: [email protected].
† Department of Pharmaceutical Sciences.
‡ Biotechnology Bioservices Facility.

Analytical Chemistry, Vol. 80, No. 14, July 15, 2008. 10.1021/ac800548g CCC: $40.75 © 2008 American Chemical Society. Published on Web 06/12/2008.

The advent of instrumentation with the capacity to resolve and quantify the levels of thousands of analytes (genes, transcripts, proteins, or metabolites) in a single biological sample has led to the development of “omics” technologies as emerging fields in biology and human health.1,2 These instruments and associated analytical techniques (genomics, transcriptomics, proteomics, metabonomics, and others) have allowed researchers to monitor simultaneous changes in multiple cellular pathways and relate these changes to cell, tissue, or organism function.

Metabonomics is a recent addition to the omics fields and is defined as “the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification”.3 The metabolome could be considered the final output of a cell, tissue, or organism and is dependent on its genes, transcripts, and protein composition.4 For metabonomics, analytes are typically organic compounds of molecular weight between 75 and 1000 Da, and chemical separation and identification has been accomplished primarily using spectrometric systems. The two most common spectrometric systems used have been nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS), both of which may or may not incorporate some type of prior chromatographic analysis.5

Currently, a major limitation of unsupervised (or global) metabonomic studies is that individual metabolites are frequently identified solely by their chromatographic and/or spectrometric characteristics and not by their actual structure.6 For example, a recent publication using high-resolution high-performance liquid chromatography/mass spectrometry (HPLC/MS) described approximately 900 “features” that were reproducibly observed in a biological sample.7 However, only 12 (1.5%) of these could be assigned to an actual structure. Although one study identified a significantly greater percentage of unknown metabolites using GC/MS,8 this is not typical. Clearly, without knowing actual structures it is difficult to predict biological significance or make mechanistic connections to biochemical pathways associated with a specific disease or toxic end point.

(1) Grigorov, M. G. Bioinformatics 2006, 22, 1424–1430.
(2) Hood, L.; Heath, J. R.; Phelps, M. E.; Lin, B. Science 2004, 306, 640–643.
(3) Nicholson, J. K.; Lindon, J. C.; Holmes, E. Xenobiotica 1999, 29, 1181–1189.
(4) German, J. B.; Hammock, B. D.; Watkins, S. M. Metabolomics 2005, 1, 3–9.
(5) Dunn, W. B.; Bailey, N. J.; Johnson, H. E. Analyst 2005, 130, 606–625.
(6) Wishart, D. S. Briefings Bioinf. 2007, 8, 279–293.
(7) Ding, J.; Sorensen, C. M.; Zhang, Q.; Jiang, H.; Jaitly, N.; Livesay, E. A.; Shen, Y.; Smith, R. D.; Metz, T. O. Anal. Chem. 2007, 79, 6081–6093.
(8) Denkert, C.; Budczies, J.; Kind, T.; Weichert, W.; Tablack, P.; Sehouli, J.; Niesporek, S.; Konsgen, D.; Dietel, M.; Fiehn, O. Cancer Res. 2006, 66, 10795–10804.

One approach to determine the structure of a chemical compound from its mass spectrum is to attempt to determine the structure of the individual fragment ions following collision-induced dissociation (CID) and then reconstruct the parent compound.9–11 This is usually a difficult process because multiple structures can be constructed for the same monoisotopic molecular weight (MIMW) and because usually not all portions of the original structure are represented as ions in the CID mass spectra.

Another approach to determine the structure of a compound from its mass spectrum is to compare the spectrum to those in a large database of mass spectra of known compounds. Under controlled conditions it has been shown that a relatively unique mass spectral fragmentation profile exists for different organic compounds.12 This approach necessitates the a priori analysis of a large number of compounds to generate the mass spectral library.13 The use of known mass spectral libraries has been shown to work well not only for electron ionization spectra, in which the use of a universal ionization energy ensures reproducible ion fragments and peak intensities, but also for CID spectra, in which the ion fragments and peak intensities vary due to the use of different ionization energies for each analyte.14,15

Increasing the internal energy of sample molecules by ionization and molecular collision mechanisms destabilizes the molecule, which results in bond cleavage and bond rearrangements to form a more stable ionized or neutral molecular structure. Since the weaker bonds will generally dissociate first, one might be able to use bond energy information to predict fragment ions generated in a fragmentation spectrum.
Indeed, for positive ions, general rules for mass spectral fragmentation have been established and for the most part have been described.15 Various computer programs that evaluate the structures of organic compounds and predict mass spectral fragments based on these rules have been developed (Mass Frontier, HighChem; Mass Fragmenter, Advanced Chemistry Development; EPIC 10). One of these programs (Mass Frontier) can also be run in a mode that predicts fragmentation ions by retrieving examples of structure type fragmentation mechanisms from a library of specific fragmentation mechanisms published in the mass spectral literature. Additionally, rules-based fragmentation and mechanistic library fragmentation can be run in tandem. Regardless of the fragmentation algorithm used, software can be configured to allow secondary, tertiary, etc. fragmentation of predicted ions, thus simulating an increase in internal molecular energy that is typical of experimental CID fragmentation analysis.

Here we investigated the possibility of identifying unknown organic compounds by matching experimental CID fragmentation spectra with computational predictions of fragmentation spectra of candidate compounds retrieved from a large chemical database. Additionally, we identified several analytical and computational variables that were important in determining the success of this strategy.

(9) Benecke, C.; Grund, R.; Hohberger, R.; Kerber, A.; Laue, R.; Wieland, T. Anal. Chim. Acta 1995, 314, 141–147.
(10) Hill, A. W.; Mortishire-Smith, R. J. Rapid Commun. Mass Spectrom. 2005, 19, 3111–3118.
(11) Plumb, R. S.; Johnson, K. A.; Rainville, P.; Smith, B. W.; Wilson, I. D.; Castro-Perez, J. M.; Nicholson, J. K. Rapid Commun. Mass Spectrom. 2006, 20, 1989–1994.
(12) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690–702.
(13) McLafferty, F. W.; Stauffer, D. A.; Loh, S. Y.; Wesdemiotis, C. J. Am. Soc. Mass Spectrom. 1999, 10, 1229–1240.
(14) Kienhuis, P. G.; Geerdink, R. B. J. Chromatogr., A 2002, 974, 161–168.
(15) McLafferty, F. W.; Turecek, F. Interpretation of Mass Spectra, 4th ed.; University Science Books: Sausalito, CA, 1993.

EXPERIMENTAL SECTION

Chemicals and Reagents. Supporting Information Table S1 lists the source, ALogPs values (calculated as described16 using SMILES), and MIMW of the compounds used to determine CID mass spectral fragmentation profiles. HPLC grade acetonitrile was purchased from Fisher Scientific (Pittsburgh, PA). LC/MS grade methanol was purchased from Riedel-de Haën (Sigma-Aldrich, St. Louis, MO). Reagent grade ammonium acetate was obtained from Fluka (Sigma-Aldrich, St. Louis, MO), and reagent grade water was generated by a Barnstead Diamond reverse osmosis/ion exchange/activated carbon system. Stock solutions of the compounds used for the CID fragmentation study were prepared at 1 mg/mL in methanol. A standard solution of each compound was prepared at 39 µg/mL in acetonitrile containing 8.3 µg/mL lisinopril and 0.01% trifluoroacetic acid. If a lower concentration of a compound was required to prevent saturation of the mass spectrometer detector, a 3.9 µg/mL solution of the compound was prepared by diluting the 39 µg/mL solution 10-fold with acetonitrile containing 8.3 µg/mL lisinopril and 0.01% trifluoroacetic acid.

Instrumentation and Analytical Methods. Mass spectrometric analyses were performed on a Micromass Q-TOF II (Beverly, MA) mass spectrometer interfaced to a Z-Spray electrospray source. All analyses were performed in the positive ion mode using a capillary potential of 2.5 V. The source temperature was 120 °C, and the desolvation gas (N2) temperature was 150 °C. The source cone gas (N2) flow rate was 50 L/h, and the desolvation gas flow rate was 450 L/h.
Compounds analyzed by CID were introduced into the electrospray source by injection through a Rheodyne (Cotati, CA) injector into a stream of acetonitrile containing 0.05% trifluoroacetic acid and 20% reagent grade water, delivered to the source by an HP1090 HPLC system (with no column) at a flow rate of 100 µL/min. The protonated molecular ions of test compounds were generated at a cone potential of 30 V and isolated at unit resolution in the quadrupole analyzer. The CID spectra of the protonated molecular ions were determined at 10 and 20 eV in one analysis and at 30, 40, and 50 eV in another analysis. The average number (and standard deviation) of scans that were coadded per processed spectrum was 15 (±4). Lisinopril (8.5 µg/mL) was coinjected with each compound solution; the protonated ion of this calibration standard was generated at a cone potential of 30 V, and its CID spectrum was generated at a collision energy of 25 eV. Argon was used as the collision gas at a pressure of 21 psi.

Compound Analysis. One hundred microliters of the 39 µg/mL solution of each test compound was analyzed individually by the described system to determine the CID spectrum at each collision energy. If the intensity of the ion peaks indicated saturation of the MCP detector on the mass spectrometer, 100 µL of a 3.9 µg/mL solution was used for the analysis.

Processing of Mass Spectral Data. The CID spectrum of each test compound obtained at each collision energy was constructed from the sum of all spectra generated across the half-height intensity of the ion elution profile during the analysis. The resulting spectrum was background corrected by subtracting the sum of an equal number of baseline pre- and postelution spectra. The linear regression slope and intercept for the measured mass versus the actual mass of five fragment ions in the CID spectrum of the coanalyzed lisinopril calibration standard were determined and used to correct the mass values of the ions in the CID spectra of each test compound.

(16) Tetko, I. V.; Tanchuk, V. Y.; Villa, A. E. J. Chem. Inf. Comput. Sci. 2001, 41, 1407–1421.

Evaluation of Mass Frontier Predictive Fragmentation Software and Optimization of Parameters. Mass Frontier version 4 predictive mass spectral fragmentation software was purchased from Thermo Corporation (Waltham, MA). Five compounds (phenylalanine, codeine, cocaine, methadone, and cefoperazone) of low, moderate, and high molecular weight were initially analyzed to establish optimal operational parameters of the software. The CID spectra of the protonated molecular ions of these five compounds were determined at a collision energy of 30 eV. The structures of these compounds (submitted as SDF files) were processed by the Mass Frontier software using the protonated ion mode and fragmentation rules prediction, allowing four, five, seven, or nine sequences of secondary reactions to occur. The developed ion fragment-matching algorithm was used to determine the quality of spectral match between the experimental spectrum and the predicted spectrum of each compound using different numbers of allowable secondary reaction sequences.

Algorithm for Comparing Predicted Fragment Ion Masses to Experimental Ion Masses. The compounds present in the PubChem database on February 6, 2006 were downloaded as SDF files. A subdatabase was then created by excluding compounds that contain one or more atoms other than C, H, O, N, S, or P and compounds composed only of C, or of C and H.
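The element filter used to build the subdatabase can be illustrated with a short Python sketch. It is an illustration only, not the authors' implementation: the function names `parse_formula` and `keep_for_subdatabase` are hypothetical, and the sketch assumes each compound is available as a plain elemental-formula string (for example `C9H11NO2`) with no charges, isotope labels, or parentheses.

```python
import re

# Elements retained in the subdatabase, as listed in the text.
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "S", "P"}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula string such as 'C9H11NO2' into element counts.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def keep_for_subdatabase(formula: str) -> bool:
    """Return True if a compound passes the element filter described above."""
    elements = set(parse_formula(formula))
    if not elements <= ALLOWED_ELEMENTS:
        return False  # contains an atom other than C, H, O, N, S, or P
    if elements <= {"C", "H"}:
        return False  # composed only of C, or only of C and H
    return True


# Hypothetical example formulas, for illustration only.
for example in ["C9H11NO2", "C6H6", "C6H5Cl", "C60"]:
    print(example, keep_for_subdatabase(example))
```

In this sketch phenylalanine (`C9H11NO2`) is kept, while benzene, chlorobenzene, and a carbon-only formula are rejected.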
For each test compound, the structures of all compounds in the PubChem subdatabase with a MIMW within ±10 ppm of the MIMW of the test compound were retrieved. The collection of structures for each MIMW range comprised a bin of candidate compounds that also contained the correct test compound. To evaluate the fragment ions that each compound in a bin could generate by CID, the structure of each compound (as an SDF file) was processed by the Mass Frontier predictive fragmentation software in the protonated ion mode using fragmentation rules, allowing varying numbers of secondary ions in the preliminary studies and a maximum of five secondary ions in the final study. Bins were submitted to Mass Frontier in batch mode as a single SDF file containing individual mol tables of all structures.

The experimental ions generated by the CID analysis of a test compound at each collision energy were compared to the ions predicted for each compound in the appropriate molecular weight bin of candidate compounds. The number of common ions between the experimentally determined ion fragments and the predicted ion fragments within ±10 ppm of the MIMW of the experimental ion fragments was used to determine the quality of match between the predicted and experimental data. The success of the algorithm in selecting the correct compound was determined by assigning each test compound a rank equal to the number of compounds within the molecular weight bin that had the same or a greater number of predicted fragment ion matches to the experimental data as the correct compound.

Analysis of Urine Samples. The use of human urine samples was approved by the University of Connecticut IRB. Urine samples were prepared for analysis by mixing 600 µL of the sample with 100 µL of 100 mM heptafluorobutyric acid and 300 µL of an aqueous solution containing 166.7 µg/mL phenylalanine-d5 and 66.7 µg/mL lisinopril. One hundred microliters of this mixture was injected onto an RP-1 column (2 mm × 20 mm, 5 µm particles) in a mobile phase of 1 mM heptafluorobutyric acid flowing at 1.5 mL/min for 2 min with the effluent passing to waste. A mobile phase consisting of 10 mM heptafluorobutyric acid in water/acetonitrile (20:80) was then passed over the column in the reverse direction at 100 µL/min with the effluent passing into the electrospray source of the Q-TOF mass spectrometer. The mass spectral profile was collected using a 30 V cone potential and a rate of 1 scan/s. Ions of interest were investigated in subsequent analyses in which the ion was isolated at unit resolution in the quadrupole analyzer and CID spectra were collected at various collision energies using lisinopril as the internal calibration standard as described in the Instrumentation and Analytical Methods section. Phenylacetylglutamine standard was synthesized17 and provided by Dr. Henri Brunengraber.

RESULTS AND DISCUSSION

Currently the largest free access database of chemical compounds is the National Center for Biotechnology Information’s PubChem database. This database was downloaded and modified specifically for metabonomics applications by eliminating compounds containing elements other than C, H, N, O, S, or P. Compounds containing only C and H were also eliminated since these are not detected by electrospray ionization MS. This initial filtering resulted in a database of approximately 3 × 10^6 candidate compounds.
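The ±10 ppm retrieval of candidate compounds into a bin, as described in the Experimental Section, reduces to a simple mass window. The sketch below is a minimal illustration under stated assumptions: `ppm_window` and `retrieve_bin` are hypothetical names, the filtered database is represented as a list of (name, MIMW) pairs, and the two non-phenylalanine candidate entries are invented placeholders rather than real database records.

```python
def ppm_window(mimw: float, ppm: float = 10.0) -> tuple[float, float]:
    """Return the (low, high) mass limits that lie within +/- ppm of mimw."""
    delta = mimw * ppm * 1e-6
    return mimw - delta, mimw + delta


def retrieve_bin(test_mimw: float,
                 candidates: list[tuple[str, float]],
                 ppm: float = 10.0) -> list[str]:
    """Return the names of all candidates whose MIMW falls inside the window."""
    low, high = ppm_window(test_mimw, ppm)
    return [name for name, mimw in candidates if low <= mimw <= high]


# Phenylalanine (MIMW 165.07897): the window reproduces the bin range
# 165.0773-165.0806 reported for this compound in Table 1.
low, high = ppm_window(165.07897)
print(round(low, 4), round(high, 4))

# Placeholder candidates (assumed values, for illustration only).
candidates = [
    ("phenylalanine", 165.07897),
    ("placeholder candidate inside window", 165.07800),
    ("placeholder candidate outside window", 165.08250),
]
print(retrieve_bin(165.07897, candidates))
```

Because the window scales with mass, a heavier test compound has a wider absolute window; for cefoperazone (MIMW 645.14239) the same function gives limits that round to 645.1359 and 645.1488.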
To initially test whether matching experimental mass fragments with computational fragments might be useful for identifying structures of unknown chemical compounds, a preliminary analysis using a set of five test compounds was performed (Table 1). All compounds in the filtered PubChem database with a MIMW within ±10 ppm of that of each test compound were retrieved into separate bins. The resultant MIMW bins ranged in size from 12 candidate compounds for the cefoperazone bin to 1115 candidate compounds for the codeine bin. The number of unique elemental formulas in each bin ranged from two for the phenylalanine bin to seven for the cocaine bin; however, in all cases except for cefoperazone, the predominant elemental formula in each bin was identical to the elemental formula of the test compound.

All candidate compounds in each of the five bins were processed by the Mass Frontier predictive fragmentation program in the protonated molecular ion mode using rules fragmentation mechanisms and reaction numbers of 4, 5, 7, and 9. This produced, for each reaction number, a list of MIMWs representing the computational fragmentation spectra for each candidate compound in each bin. These MIMW values were compared to the experimental MIMW values of fragments obtained from the 30 eV CID spectrum of the respective test compounds. For example, there were 79 experimental ions produced for codeine using a CID energy of 30 eV and there were computational fragments generated for codeine that matched 5 of these 79 experimental ions using a reaction number of 4, 11 ions that matched using a reaction number of 5, and 15 ions that matched using reaction numbers of 7 or 9 (Table 1). The results show that for all five compounds, as the number of computational secondary reactions increased, the number of fragment matches also increased.

(17) Yang, D.; Beylot, M.; Agarwal, K. C.; Soloviev, M. V.; Brunengraber, H. Anal. Biochem. 1993, 212, 277–282.

Table 1. Predictive Fragmentation Matching Results Used To Optimize Reaction Number for a Preliminary Set of Five Test Compounds (a)

| | phenylalanine | codeine | cocaine | methadone | cefoperazone |
|---|---|---|---|---|---|
| test compd formula | C9H11NO2 | C18H21NO3 | C17H21NO4 | C21H27NO | C25H27N9O8S2 |
| test compd MIMW | 165.07897 | 299.15214 | 303.14705 | 309.20925 | 645.14239 |
| bin MIMW range (±10 ppm) | 165.0773-165.0806 | 299.1491-299.1551 | 303.1440-303.1501 | 309.2062-309.2123 | 645.1359-645.1488 |
| no. compds in bin | 391 | 1115 | 532 | 317 | 12 |
| elemental formulas in bin/no. of compds with that formula | C9H11NO2/386; C7H9N4O/5 | C18H21NO3/1060; C13H23N4O2S/5; C14H17N7O/10; C16H19N4O2/29; C21H19N2/11 | C17H21NO4/406; C12H23N4O3S/1; C13H17N7O2/6; C15H19N4O3/12; C16H22N3OP/1; C18H17N5/70; C20H19N2O/36 | C21H27NO/308; C16H29N4S/1; C19H25N4/8 | C25H27N9O8S2/3; C29H31N3O10S2/2; C30H27N7O6S2/1; C33H31N3O5S3/2; C36H27N3O5S2/4 |
| no. computational fragment matches/total no. exptl fragments (b), Rx’n # (a) 4 | 5/10 | 5/79 | 9/28 | 7/35 | 7/15 |
| same, Rx’n # 5 | 6/10 | 11/79 | 14/28 | 10/35 | 8/15 |
| same, Rx’n # 7 | 6/10 | 15/79 | 21/28 | 12/35 | 8/15 |
| same, Rx’n # 9 | 6/10 | 15/79 | 22/28 | 14/35 | 8/15 |
| rank of test compd (c), Rx’n # 4 | 2 | >100 | 4 | 26 | 1 |
| same, Rx’n # 5 | 2 | 12 | 2 | 15 | 1 |
| same, Rx’n # 7 | 5 | 23 | 1 | 15 | 1 |
| same, Rx’n # 9 | 10 | 27 | 1 | 16 | 1 |

(a) Reaction number (Rx’n #) is a selectable parameter in the computational fragmentation algorithm. It indicates the number of fragmentation steps that are allowed to occur and can be viewed as simulating an increase in internal molecular energy that is typical of experimental CID fragmentation analysis.
(b) These data are for the test compound only. A CID energy of 30 eV was used for experimental fragmentation of the five test compounds as described in the Experimental Section.
(c) Number of compounds in the bin that had a fragment MIMW (±10 ppm) match number equal to or greater than the match number for the test compound. A “1” means that the test compound had more MIMW fragment matches than any other compound in the bin and is the minimum (best) rank value. The maximum (worst) rank value would be equal to the total number of compounds in the bin.

All candidate compounds in each bin were then ranked according to the number of computational fragment MIMWs that matched the experimental fragment MIMWs (±10 ppm). As can be seen (Table 1), the ranking of each test compound generally improved with increasing reaction number from 4 to 5, then generally deteriorated as the reaction number was further increased. For example, there were >100 compounds (out of 1115 total compounds) that had a match number equal to or greater than codeine using a reaction number of 4; however, there were only 12 compounds that had a match number equal to or greater than codeine when the reaction number was set at 5. Table 1 shows that even though the number of computational fragment matches increased with increasing reaction number, the overall ranking did not necessarily improve, suggesting that excessive computational fragmentation produced ions that were not necessarily unique to the test compound. With the use of a reaction number of 5, the average rank for these five test compounds was 6 out of 473 total compounds per bin, or 6 out of 432 compounds with identical elemental formula per bin.

On the basis of these results, an additional set of 102 compounds was similarly analyzed. The 102 compounds were chosen to approximate the MIMW distribution found in the filtered PubChem database and ranged in size from 137.0841 to 609.2951 Da (Figure 1, Supporting Information Table S1). The number of candidate compounds in each of the 102 bins ranged from 3 to 1185 with an average of 272 compounds per bin. The number of unique elemental formulas in each bin ranged from 1 to 21, and the number of compounds in a bin with the same (correct) elemental formula as the test compound ranged from 1 to 946 and averaged 158. Calculated LogP values for this set of 102 test compounds ranged from -2.85 to 7.0 with an average of 2.19 (Figure 2).

Figure 1. Size distribution of compound molecular weights: (A) compounds in the filtered PubChem database; (B) 102 test compounds used for analysis.

Figure 2. Characterization of the 102 test compounds (or 102 bins) used for analysis in this study: (A) distribution of the number of compounds in the corresponding bins; (B) distribution of the number of compounds in each bin with the same elemental formula as the test compound; (C) distribution of the number of unique elemental formulas in each bin; (D) distribution of calculated LogP (ALogPs, ref 16) values for the 102 test compounds.

For each of the 102 test compounds, five experimental CID energies (ranging from 10 to 50 eV, in 10 eV intervals) were used to generate five experimental fragmentation spectra in order to evaluate the effect of CID energy on ranking results. The Mass Frontier program was used in the protonated ion mode with rules fragmentation mechanism and a reaction number of 5 to predict fragments for all compounds (including the test compound) in each of the 102 bins (27760 total compounds).

The results (Table 2) are listed using the lowest experimental CID energy giving the best ranking for each test compound and are sorted from lowest (best) to highest (worst) rank number and secondarily from largest bin size to smallest bin size. For example, thioridazine showed the best overall result since it was ranked no. 1 in a bin containing the largest number (849) of compounds. As can be seen, 87 of the 102 test compounds were ranked in the top 20 compounds in each of their respective bins. Overall, these 87 test compounds had an average rank of fourth out of 234 total candidate compounds, including 110 candidate compounds with the same elemental formula as the test compound. Also note that for all compounds with a ranking ≥2, a zero in the “∆ best match” column of Table 2 indicates that the test compound was tied with one or more other compounds in the bin for highest match number, and therefore, the rank value listed is the total number of tied compounds. For example, cholesterol had a ranking of 27 and a ∆ best match value of 0. This indicates there were 27 compounds with the same highest match value (in this case 1 match out of 11 experimental fragments). Inspection of these 27 compounds showed that they were indeed structurally similar, with all of them containing a cholesterol ring system.
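The match counting, ranking, and “∆ best match” values discussed above can be sketched in a few lines of Python. This is a minimal illustration of the scoring as described in the text, not the authors' code: the function names and the toy masses are hypothetical, and the predicted fragment lists stand in for the output of the fragmentation software, whose interface is not shown here.

```python
def count_matches(experimental: list[float],
                  predicted: list[float],
                  ppm: float = 10.0) -> int:
    """Count experimental ions that have a predicted ion within +/- ppm.

    The tolerance is taken relative to the experimental mass, as in the text.
    """
    matched = 0
    for exp_mass in experimental:
        tolerance = exp_mass * ppm * 1e-6
        if any(abs(pred_mass - exp_mass) <= tolerance for pred_mass in predicted):
            matched += 1
    return matched


def rank_test_compound(experimental: list[float],
                       predicted_by_compound: dict[str, list[float]],
                       test_name: str,
                       ppm: float = 10.0) -> tuple[int, int]:
    """Return (rank, delta_best_match) for the test compound within its bin.

    rank: number of compounds in the bin (the test compound included) whose
          match number is equal to or greater than that of the test compound.
    delta_best_match: best match number in the bin minus the test compound's.
    """
    scores = {name: count_matches(experimental, predicted, ppm)
              for name, predicted in predicted_by_compound.items()}
    test_score = scores[test_name]
    rank = sum(1 for score in scores.values() if score >= test_score)
    delta_best = max(scores.values()) - test_score
    return rank, delta_best


# Toy bin with invented masses (assumed values, for illustration only).
experimental_ions = [100.0, 150.0, 200.0]
toy_bin = {
    "test compound": [100.0005, 150.0],      # 2 matches
    "candidate A": [100.0, 150.0001, 250.0], # 2 matches (tie)
    "candidate B": [200.0],                  # 1 match
    "candidate C": [100.0, 150.0, 200.0],    # 3 matches
}
print(rank_test_compound(experimental_ions, toy_bin, "test compound"))
```

With this definition a rank of 1 means that no other candidate matched as many experimental ions, and a rank greater than 1 with a ∆ best match of 0 means the test compound is tied for the top score, which is the situation described for cholesterol.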
For the group of 102 test compounds used in this study, 65 either had the single highest match value or were one of a few compounds that had the highest match value in their respective MIMW bins. Thus, a majority of the test compounds that were ranked second or third were tied with either 1 or 2 (respectively) other compounds having the same highest match number.

It was observed that even though a test compound was not always ranked first, the correct elemental formula was ranked first for all but four bins despite the use of a mass accuracy of only ±10 ppm (Supporting Information Table S1). We expected that the correct elemental formula would be predicted for bins composed predominantly of compounds having the correct elemental formula. However, the correct elemental formula was also predicted for bins that had very few compounds with the correct elemental formula. For example, the correct elemental formula was predicted for the thioridazine bin even though there were only eight compounds in this bin (of 849 total compounds) with the same elemental formula as thioridazine.

In addition, a comparison of the effects of CID energies on ranking values showed that for 43 of the 102 test compounds, two or more experimental CID spectra gave the same overall ranking (Supporting Information Table S1). This suggests that for many compounds, especially for those that were ranked among the top 20, the experimental CID fragmentation energy that gave optimum results was rather broad.

There were also several instances where Mass Frontier predicted 100% of the experimental fragments, yet the correct compounds were not ranked highest in their respective bins (Supporting Information Table S1). This was due to the presence of one or more other compounds in the bin, which also had predicted fragments that matched all of the experimental fragments. In general, however, compounds with a high percentage of experimental fragments matched were ranked higher than those with a low percentage of experimental fragments matched.

The results show that several test compounds were ranked greater than 100 in their respective bins.
Table 2. Results of Predictive Fragmentation Matching of Test Compounds Using the CID Energy Mass Spectrum That Gave the Best Ranking for Each Compound

| test compd | CID energy (a) | match no. (b) | no. exptl ions (c) | ∆ best match (d) | rank no. (e) | no. of compds in bin (f) |
|---|---|---|---|---|---|---|
| thioridazine | 20 | 3 | 3 | 0 | 1 | 849 |
| thiothixene | 20 | 2 | 3 | 0 | 1 | 726 |
| ampicillin | 20 | 13 | 26 | 0 | 1 | 615 |
| folic acid | 40 | 7 | 10 | 0 | 1 | 602 |
| methylergonovine | 30 | 12 | 32 | 0 | 1 | 515 |
| piperacetazine | 20 | 3 | 3 | 0 | 1 | 494 |
| oxytetracycline | 10 | 3 | 3 | 0 | 1 | 483 |
| sufentanil | 10 | 2 | 2 | 0 | 1 | 445 |
| acetophenazine | 30 | 11 | 19 | 0 | 1 | 435 |
| etodolac | 20 | 8 | 14 | 0 | 1 | 420 |
| diphenoxylate | 30 | 12 | 25 | 0 | 1 | 333 |
| spectinomycin | 30 | 47 | 132 | 0 | 1 | 310 |
| tetracaine | 30 | 7 | 8 | 0 | 1 | 308 |
| pyrilamine | 20 | 2 | 2 | 0 | 1 | 268 |
| enalapril | 20 | 4 | 5 | 0 | 1 | 246 |
| remifentanil | 20 | 14 | 16 | 0 | 1 | 246 |
| betaxolol | 20 | 15 | 17 | 0 | 1 | 190 |
| alfentanil | 20 | 6 | 10 | 0 | 1 | 134 |
| tetramisole | 30 | 12 | 33 | 0 | 1 | 120 |
| rolitetracycline | 10 | 2 | 2 | 0 | 1 | 105 |
| perindopril | 40 | 10 | 12 | 0 | 1 | 102 |
| tripelennamine | 30 | 3 | 6 | 0 | 1 | 97 |
| methionine enkephalin | 20 | 19 | 24 | 0 | 1 | 66 |
| apramycin | 20 | 18 | 29 | 0 | 1 | 54 |
| streptomycin | 30 | 16 | 33 | 0 | 1 | 37 |
| buspirone | 30 | 12 | 16 | 0 | 1 | 36 |
| dihydroergotamine | 20 | 6 | 6 | 0 | 1 | 35 |
| terfenadine | 30 | 5 | 9 | 0 | 1 | 34 |
| salmeterol | 20 | 8 | 10 | 0 | 1 | 32 |
| tenoxicam | 10 | 1 | 3 | 0 | 1 | 28 |
| ergocristine | 20 | 6 | 7 | 0 | 1 | 16 |
| gallamine | 10 | 3 | 15 | 0 | 1 | 10 |
| ergoloid mesylate | 20 | 2 | 6 | 0 | 1 | 7 |
| thonzide | 20 | 2 | 4 | 0 | 1 | 4 |
| vecuronium | 30 | 6 | 6 | 0 | 1 | 3 |
| thiethylperazine | 20 | 4 | 5 | 0 | 2 | 569 |
| norpropoxyphene | 20 | 2 | 7 | 0 | 2 | 392 |
| enalaprilat | 20 | 7 | 7 | 0 | 2 | 370 |
| acepromazine | 20 | 5 | 6 | 0 | 2 | 281 |
| gingerol | 10 | 9 | 11 | 0 | 2 | 182 |
| hydroxybutorphanol | 30 | 8 | 14 | 1 | 2 | 180 |
| morphine-3-glucuronide | 30 | 3 | 3 | 0 | 2 | 179 |
| drofenine | 20 | 6 | 10 | 0 | 2 | 117 |
| oxybutynin | 20 | 11 | 16 | 1 | 2 | 114 |
| mebeverine | 20 | 4 | 4 | 0 | 2 | 96 |
| cymarin | 30 | 15 | 82 | 0 | 2 | 61 |
| leucine enkephalin | 20 | 23 | 26 | 0 | 2 | 53 |
| buprenorphine | 50 | 9 | 266 | 0 | 2 | 40 |
| cromolyn | 20 | 6 | 7 | 0 | 2 | 33 |
| reserpine | 20 | 7 | 18 | 0 | 2 | 28 |
| boldenone undecylenate | 20 | 8 | 30 | 0 | 2 | 21 |
| tetracycline | 10 | 3 | 3 | 0 | 3 | 529 |
| noscapine | 20 | 6 | 15 | 0 | 3 | 275 |
| prednisolone tebutate | 10 | 5 | 8 | 0 | 3 | 143 |
| testosterone propionate | 30 | 15 | 54 | 10 | 3 | 134 |
| etamiphylline | 30 | 7 | 19 | 2 | 3 | 100 |
| doxorubicin | 20 | 8 | 12 | 0 | 3 | 60 |
| adenosine diphosphate | 20 | 3 | 4 | 0 | 3 | 32 |
| hydrocortisone | 20 | 21 | 97 | 2 | 4 | 260 |
| prazosin | 30 | 10 | 33 | 1 | 4 | 185 |
| aminophylline | 30 | 4 | 18 | 2 | 4 | 94 |
| taurocholate | 10 | 4 | 4 | 0 | 4 | 59 |
| isoxsuprine | 20 | 10 | 12 | 1 | 5 | 447 |
| fenoterol | 20 | 6 | 6 | 0 | 5 | 370 |
| amfenac | 20 | 5 | 6 | 0 | 5 | 344 |
| prednisone | 10 | 6 | 11 | 0 | 5 | 344 |
| ephedrine | 30 | 8 | 21 | 2 | 5 | 246 |
| sulfasalazine | 30 | 4 | 53 | 0 | 5 | 106 |
| adiphenine | 20 | 5 | 5 | 0 | 6 | 623 |
| dobutamine | 20 | 6 | 8 | 1 | 6 | 447 |
| 6a-methylprednisolone | 10 | 6 | 17 | 2 | 7 | 226 |
| methotrexate | 30 | 3 | 5 | 1 | 8 | 644 |
| prednisolone | 10 | 7 | 18 | 3 | 8 | 269 |
| prolintane | 20 | 4 | 4 | 0 | 9 | 105 |
| sulfadimethoxine | 10 | 1 | 1 | 0 | 9 | 94 |
| bumetanide | 20 | 3 | 21 | 2 | 10 | 619 |
| ketorolac | 30 | 5 | 10 | 1 | 10 | 344 |
| apomorphine | 10 | 1 | 2 | 1 | 12 | 453 |
| dextromethorphan | 20 | 2 | 3 | 1 | 12 | 166 |
| daunorubicin | 10 | 4 | 5 | 1 | 12 | 110 |
| theobromine | 30 | 5 | 34 | 6 | 14 | 94 |
| albuterol | 20 | 4 | 8 | 1 | 15 | 143 |
| antipyrine-4-amino | 10 | 4 | 9 | 2 | 16 | 226 |
| hematoporphyrin I | 30 | 2 | 2 | 0 | 17 | 42 |
| nandrolone | 20 | 10 | 49 | 9 | 18 | 124 |
| poldine | 30 | 4 | 14 | 2 | 19 | 682 |
| meprobamate | 10 | 1 | 3 | 2 | 19 | 85 |
| terbutaline | 30 | 6 | 20 | 5 | 27 | 175 |
| cholesterol | 30 | 1 | 11 | 0 | 27 | 52 |
| hydroxyphenethylamine | 30 | 2 | 12 | 2 | 32 | 166 |
| naltrexone | 30 | 9 | 41 | 4 | 34 | 1035 |
| nimesulide | 50 | 1 | 24 | 0 | 38 | 136 |
| dimefline | 30 | 2 | 9 | 2 | 39 | 644 |
| ormetoprim | 20 | 1 | 5 | 1 | 43 | 270 |
| fenbendazole | 40 | 2 | 13 | 1 | 47 | 403 |
| oxycodone | 20 | 3 | 5 | 1 | 65 | 776 |
| anileridine | 40 | 3 | 8 | 2 | 73 | 563 |
| oxaprozin | 30 | 2 | 8 | 2 | 96 | 461 |
| antipyrine | 30 | 6 | 68 | 8 | 97 | 306 |
| mefenamic acid | 50 | 1 | 25 | 5 | 309 | 579 |
| strychnine | 50 | 2 | 137 | 10 | 378 | 664 |
| strychnine N-oxide | 30 | 1 | 43 | 8 | 1033 | 1185 |

(a) Energy (eV) of the experimental CID spectrum used for comparison with predicted fragment ions.
(b) Number of experimental fragments predicted by Mass Frontier.
(c) Number of fragment ions in the experimental CID spectrum of the test compound.
(d) Difference between best match and match of correct compound.
(e) Number of compounds in the bin that had a match equal to or greater than the match for the correct compound.
(f) Number of compounds in PubChem database on Feb 6, 2006 that had the same MIMW ±10 ppm as that of the test compound.

In order to evaluate why some compounds ranked poorly, several characteristics that would be predicted to influence the results were compared between the 20 best ranking test compounds and the 20 worst ranking test compounds. There was no significant difference in the number of compounds per bin, the number of fragments produced in the optimal experimental CID spectra, or the calculated LogP values of the compounds for these two groups (Supporting Information Table S2). As might be expected, however, test compounds in the 20 best group were in bins that had a significantly (P < 0.02) greater number of different elemental formulas and significantly (P < 0.01) fewer compounds with the correct elemental formula (Table 3). These results seem reasonable since bins with a larger number of different elemental formulas would be expected to have fewer compounds per formula. In addition, as the number of compounds with the same correct elemental formula increases, it becomes less likely that fragments unique to the test compound will be predicted or matched. There was also a statistically significant (P < 0.02) difference in the MIMWs of the 20 best

Table 3. Comparison of Test Compound Characteristics and Bin Characteristics between the 20 Best Ranked Test Compounds and the 20 Worst Ranked Test Compounds

| characteristic | statistic | 20 best ranking test compds (a) | 20 worst ranking test compds (b) |
|---|---|---|---|
| no. of unique elemental formulas in each bin | mean | 10 | 7 |
| | SEM (c) | 1.0 | 0.6 |
| | range | 2-18 | 2-11 |
| no. of compds in each bin with correct elemental formula (d) | mean | 144 | 345 |
| | SEM | 30 | 63 |
| | range | 5-453 | 12-946 |
| MIMW of test compd | mean | 372 | 300 |
| | SEM | 17 | 21 |
| | range | 204-527 | 137-598 |
| no. of computational fragments predicted for each test compd | mean | 119 | 31 |
| | SEM | 18 | 5 |
| | range | 19-306 | 5-94 |

probability (t test)