Identification of Unknowns in Atmospheric ... - ACS Publications

Anal. Chem. 2008, 80, 7765–7777

Identification of Unknowns in Atmospheric Pressure Ionization Mass Spectrometry Using a Mass to Structure Search Engine

Wenta Liao, William M. Draper,* and S. Kusum Perera

California Department of Public Health, Sanitation & Radiation Laboratory Branch, Richmond, California 94804

This study evaluates a new model for identifying unknown compounds in atmospheric pressure ionization mass spectrometry based on a mass-to-structure (MTS) paradigm. In this method, rudimentary ESI spectrum interpretation is required to recognize key spectral features such as MH+, MNa+, and MNH4+, which lead to the unknown’s monoisotopic mass. The unknown’s mass is associated directly with known organic compounds using an Access 2003 database containing records of 19 438 substances assembled from common sources such as the Merck Index, pesticide and pharmaceutical compilations, and chemical catalogues. A user-defined mass tolerance (±0.001-0.5 Da) is set according to the instrument mass accuracy: unit mass resolution data require a wide mass tolerance (∼0.5 Da), while tolerances for accurate mass data can be as narrow as ±0.001 Da. Candidate structures retrieved with the MTS Search Engine appear in a report window providing formulas, mass error, and Internet links. This paper provides examples of structure elucidation with 15 organic compounds based on ESI mass spectra from both unit mass resolution (e.g., quadrupole ion trap and triple-stage quadrupole) and accurate mass instruments (e.g., TOF and Q-TOF). Orthogonal information (e.g., isotope ratios and fragmentation data) is complementary and useful for ranking candidates and confirming assignments. The MTS Search Engine identifies unknowns quickly and efficiently, and supplements existing interpretation schemes for unknown identification.

Unknown compounds are identified by various mass spectrometry methods in the instrumental analysis laboratory.
Volatile compounds, those with vapor pressures between ∼10^5 and 10^-10 Pa, are commonly identified by comparison of electron ionization (EI) spectra with spectral databases.1 The current libraries are large; e.g., the NIST 05 library incorporates 190 825 spectra of 163 198 different compounds (http://www.nist.gov/srd/nistla.htm), which increases their effectiveness for identifying unknowns, and searches are accomplished in seconds using high-speed computers.

About 30 million organic and inorganic substances have been prepared (http://www.cas.org). The vast majority of these are not compatible with gas chromatography due to low volatility, thermal instability, or high polarity. Many natural toxins and other biologically active substances are not compatible with gas chromatography, and therefore, GC/MS analysis is not adequate for comprehensive toxicological and forensic testing. These nonvolatile compounds are widely determined by high performance liquid chromatography (HPLC), and LC-MS instruments have become widely available, especially atmospheric pressure ionization (API) mass spectrometers that ionize molecules by electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), and atmospheric pressure photoionization.2 ESI is by far the most widely used. Many more compound classes can be determined by LC-MS than GC/MS because of the broad scope of HPLC, which includes reversed phase, normal phase, and ion exchange. This scope is expanding with new column technologies such as hydrophilic interaction liquid chromatography and monolithic and graphite columns. Most nonvolatile compounds can be ionized in one or more of the API sources. LC-MS is expected to surpass GC/MS as the dominant method for analysis of low to moderate weight organic compounds,3 and it has become the single most useful analytical technique.4 Thus, effective schemes for identification of unknowns in API mass spectrometry will be a boon to forensic science and other areas reliant on instrumental analysis.

Particle beam (PB) LC-MS instruments were of great interest because of their ability to generate library-searchable EI spectra. The PB interface coupled HPLC directly to a conventional electron ionization source4 but was most suited to the analysis of moderately labile and semivolatile compounds.5 Many thermally labile substances (e.g., cephapirin, cloxacillin, penicillin G, and glucuronide and sulfate metabolites) do not yield molecular ions in PB LC-MS but instead reveal low-mass fragments.5,6

API involves a different principle of ionization, one in which much lower volatility molecules are ionized without decomposition, including polypeptides and even proteins.2 ESI involves cationization of molecules, whereas in EI mass spectrometry high-energy electrons bombard the sample, generating molecular ions and fragments. API is a soft ionization method in which little excess energy is imparted during ionization, accounting in part for the lack of fragmentation. Moreover, the even-electron ions typically formed in API are more stable than the odd-electron ions formed in EI MS.1 Ancillary fragmentation is usually accomplished by applying excitation energy either in the source or in a separate collision cell or the ion trap cavity; this fragmentation is known as collision-induced dissociation (CID).

The ions of interest in API mass spectrometry are usually MH+ and [M - H]-, but other types of ions are encountered, such as adducts of alkali metals (Na+, K+) or ammonia, or carboxylates in -ESI (e.g., [M + HCOO]- and [M + CH3COO]-). Solvent clusters are common (e.g., with CH3OH, H2O, ACN), but application of a “declustering” potential can simplify spectra and enhance the ions of interest. In addition, there are aggregate ions (e.g., dimers, trimers) and multiply charged ions. The dimer ions (e.g., [2M + H]+, [2M + Na]+, or [2M - H]-) are concentration dependent, being less important at lower concentration.

ESI mass spectra are not as reproducible as EI mass spectra. EI mass spectrometers produce similar spectra when tuned using a standard protocol, tune compound, and ionization energy. As such, experimental EI spectra often closely match library spectra over a wide concentration range and regardless of the matrix. ESI spectra vary as to types and intensity of ions depending on the instrument design, source conditions, mobile-phase composition, buffers and additives, and sample components. These parameters are difficult to control between laboratories, and they also vary over time on a single instrument.

* To whom correspondence should be addressed. E-mail: william.draper@cdph.ca.gov.
(1) McLafferty, F. W.; Turecek, F. Interpretation of Mass Spectra, 4th ed.; University Science Books: Sausalito, CA, 1993.
(2) Cole, R. B., Ed. Electrospray Ionization Mass Spectrometry: Fundamentals, Instrumentation, and Applications; Wiley: New York, 1997.
(3) Crews, P.; Rodriguez, J.; Jaspars, M. Organic Structure Analysis; Topics in Organic Chemistry; Oxford University Press: New York, 1998; p 306.
(4) Dass, C. Principles and Practice of Biological Mass Spectrometry; Wiley-Interscience Series on Mass Spectrometry; John Wiley & Sons: New York, 2001.
(5) Voyksner, R.; Pack, P.; Smith, C.; Swaisgood, H.; Chen, D. In Liquid Chromatography/Mass Spectrometry: Applications in Agricultural, Pharmaceutical, and Environmental Chemistry; Brown, M. A., Ed.; ACS Symposium Series No. 420; American Chemical Society: Washington, DC, 1990; Chapter 2.
(6) Brown, R.; Draper, W. In Liquid Chromatography/Mass Spectrometry: Applications in Agricultural, Pharmaceutical, and Environmental Chemistry; Brown, M. A., Ed.; ACS Symposium Series No. 420; American Chemical Society: Washington, DC, 1990; Chapter 15.

10.1021/ac801166z CCC: $40.75 © 2008 American Chemical Society. Published on Web 09/24/2008.
Analytical Chemistry, Vol. 80, No. 20, October 15, 2008
As such, mass spectral libraries for API mass spectrometry have not proven to be widely useful. Identifying compounds in API mass spectrometry, therefore, is fundamentally different from that in EI mass spectrometry.7 Various approaches for this have been described.

It has been reported that structure elucidation in mass spectrometry requires initial determination of the elemental composition or empirical formula.8,9 This is accomplished mathematically through the use of empirical formula calculators that consider combinations of elements and identify theoretical empirical formulas that correspond to the experimental masses. Establishing which elements and how many atoms should be allowed is not straightforward, although some a priori rules exist.7 The rare elements of the periodic table that occur infrequently in organic molecules cannot be omitted from consideration. However, it is not unusual to consider only the most common elements, C, H, N, and O, for expediency. As the unknown’s mass increases, the list of candidate formulas grows exponentially and becomes impractically long.8 The presence of halogens or sulfur “eases dramatically the identification of the correct elemental composition by reducing the number of possible formulae”.10

(7) Suzuki, S.; Ishii, T.; Yasuhara, A.; Sakai, S. Rapid Commun. Mass Spectrom. 2005, 19, 3500–3516.
(8) Kind, T.; Fiehn, O. Bioinformatics 2007, 8, 105–124.
(9) Thurman, E. M.; Ferrer, I.; Fernandez-Alba, A. R. J. Chromatogr., A 2005, 1068, 127–134.
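The behavior of an empirical formula calculator described above can be illustrated with a minimal brute-force sketch. It is not the calculator used in this study: the function name, the element limits (`max_n`, `max_o`), and the absence of any valence or heuristic rules are assumptions made for illustration. It enumerates C, H, N, O combinations and keeps those whose monoisotopic mass falls within a tolerance of the target, listing them in order of increasing mass difference.

```python
"""Brute-force CHNO formula generator (illustrative sketch only)."""

# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def chno_formulas(target, tol, max_n=10, max_o=15):
    """Return (c, h, n, o, mass) tuples with |mass - target| <= tol.

    Assumes tol <= 0.5 Da, so only the nearest hydrogen count needs
    checking. No chemical rules are applied, so many hits are not
    sensible structures; max_n and max_o are arbitrary limits.
    """
    hits = []
    for c in range(int(target // MASS["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target - (c * MASS["C"] + n * MASS["N"] + o * MASS["O"])
                if rest < -tol:
                    break  # more oxygen only overshoots further
                h = round(rest / MASS["H"])
                if h < 0:
                    continue
                mass = (c * MASS["C"] + h * MASS["H"]
                        + n * MASS["N"] + o * MASS["O"])
                if abs(mass - target) <= tol:
                    hits.append((c, h, n, o, mass))
    # Order by increasing difference between theoretical and target mass.
    return sorted(hits, key=lambda f: abs(f[4] - target))


# Capsaicin, C18H27NO3, monoisotopic mass 305.19909 Da (Table 1).
capsaicin_hits = chno_formulas(305.19909, tol=0.01)
```

Even at a tolerance of 0.01 Da the list contains more than the correct composition (C16H25N4O2 also fits, for example), which is why the candidate list must be pruned by additional rules or orthogonal data.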


Advancements in structure elucidation have focused on reduction in the numbers of candidate elemental compositions. This can be accomplished, for example, by the application of heuristic rules.8 Kaufmann provides an overview of recent advancements in the determination of elemental compositions using accurate mass instruments and orthogonal information on relative isotopic abundances (RIAs).11 Identification of the elemental composition is further aided by consideration of product ions and neutral losses, accomplished by increasingly sophisticated and powerful algorithms.7,11,12 These new approaches rely primarily on time-of-flight (TOF) or quadrupole-time-of-flight (Q-TOF) instruments that have the fragmentation capability, the required mass accuracy, or both.

After successful elucidation of the elemental composition, the next step is to associate the empirical formula with a chemical structure. The approach may be ad hoc, such as entering the elemental formula into a web browser to locate a list of chemical structures. Searching commercially available compilations such as SciFinder (www.cas.org) also is effective. Thurman and co-workers have searched databases (e.g., ChemIndex and Merck Index) with empirical formulas to identify pesticide candidates.9,10 Fragmentation data from MS2 and MS3 scans are then used to select among these candidates and make tentative identifications. Ojanpera et al.13 previously used a large target database of exact monoisotopic masses and isotopic patterns representing the elemental formulas for toxicological drug screening by TOF mass spectrometry. Thus, the prevailing approach to unknown identification follows a paradigm of mass to empirical formula to structure (abbreviated here as MTEFTS). In this paper, we describe an alternative approach that we have found effective for identifying unknowns in API mass spectrometry.
This method involves the direct association of the molecular mass with the chemical structure without considering empirical formulas. The mass-to-structure (MTS) connection is accomplished by a computerized search of a database, the MTS Search Engine. This approach, as with empirical formula-based methods, requires simple interpretation of the spectrum to ascertain the molecular mass. And, as with empirical formula-based methods, the process is most effective when high mass accuracy data are available, such as those from TOF instruments that have recently become available. However, unit mass resolution data can also lead quickly to structure assignments, especially for higher mass compounds, and this is not true of empirical formula identification schemes. The current approach is simple and rapid. In seconds, a list of candidate structures is retrieved. The experimental database developed has ∼20 000 known compounds, about one-tenth the size of the NIST library. This paper summarizes the development of the database and provides a number of examples of the identification of “global” unknowns by +ESI LC-MS analysis of model compounds in standard mixtures as well as surface water samples. Global unknowns are those for which structural information is limited to the mass spectrum, according to McLafferty and Turecek.1

(10) Garcia-Reyes, J. F.; Ferrer, I.; Thurman, E. M.; Molina-Diaz, A.; Fernandez-Alba, A. R. Rapid Commun. Mass Spectrom. 2005, 19, 2780–2788.
(11) Kaufmann, A. Rapid Commun. Mass Spectrom. 2007, 21, 2003–2013.
(12) Grange, A. H.; Zumwalt, M. C.; Sovocool, G. W. Rapid Commun. Mass Spectrom. 2006, 20, 89–102.
(13) Ojanpera, S.; Pelander, A.; Pelzing, M.; Krebs, I.; Vuori, E.; Ojanpera, I. Rapid Commun. Mass Spectrom. 2006, 20, 1161–1167.

METHODS AND MATERIALS

Construction of Database. The objective was to construct a database including biologically active organic compounds of toxicological and public health interest that are amenable to determination by API-LC-MS. The experimental database was developed exclusively to test the feasibility of the MTS approach. The current database contains 19 438 compounds including food additives and drugs, toxic compounds, naturally occurring toxins, environmental contaminants, industrial and agricultural chemicals, and other commercially available chemicals. The majority of the chemical information was assembled from common sources (The Merck Index (14th ed.), United States Pharmacopeia/National Formulary (2007), US EPA (http://www.epa.gov/osa/fem/methcollectns.htm), FDA (http://www.fda.gov/cder/drug/DrugSafety/DrugIndex.htm), DEA (http://www.deadiversion.usdoj.gov/index.html), NIOSH (http://www.cdc.gov/niosh/npg/npgname-a.html), Compendium of Pesticide Common Names (http://www.alanwood.net/pesticides/index.html), Farm Chemicals Handbook (1998), ChemServices (www.chemservice.com), Sigma-Aldrich (https://www.sigmaaldrich.com), Cole-Parmer (http://www.coleparmer.com), Fisher Scientific (https://new.fishersci.com), and Wikipedia (http://wikipedia.com)).

The primary (or index) database consisting of 9094 organic compounds was assembled from the following sources: for pharmaceuticals and illicit drugs, USP, FDA, and DEA; for agrochemicals, Compendium of Pesticide Common Names, ChemServices Catalog, and Farm Chemicals Handbook; and as general sources covering naturally occurring toxins, pesticides, and pharmaceuticals, the Merck Index and NIOSH Pocket Guide to Chemical Hazards. With each addition to the database tables, duplicate records were eliminated.
Additional editing included removal of the following types of substances: compounds with fewer than six carbon atoms, most peptides and proteins, inorganic substances, mixtures, organometallic compounds, and salts. A supplemental database containing a further 10 346 substances was compiled from the following chemical catalogs: Sigma-Aldrich, Cole-Parmer, and Fisher Scientific.

The database was created using Microsoft Access 2003 (Microsoft, Redmond, WA), a relational database, where records (name, molecular formula, monoisotopic mass, reference link, isotope ratio, and other characteristics) of each compound are organized in tables. Depending on the application, the compounds can be investigated in subgroups such as (1) the primary or index database (9094 compounds), (2) the supplemental database (10 346 substances, mostly industrial chemicals), (3) a library of 1081 drugs, (4) a library of 1762 pesticides, and (5) compounds containing bromine or chlorine atoms. Search algorithms are macros that retrieve records consistent with user-defined criteria. The common criteria used in the MTS Search Engine are the unknown’s monoisotopic mass and a mass tolerance.

Database tables were prepared by digitizing chemical compound information from the listed sources. In most cases, this involved manual scanning using a flatbed scanner to generate a .pdf file. The digital images as .pdf files were then converted using optical character recognition (OCR) software to MS Word .doc files using copy/paste commands. The .doc files were

Table 1. Candidate Unknowns, CAS Numbers, Formulas, and Monoisotopic Masses

compound                 CAS No.       formula          monoisotopic mass (Da)
α-amanitin               23109-05-9    C39H54N10O13S    902.35925
amethopterin             59-05-2       C20H22N8O5       454.17132
avobenzone               70356-09-1    C20H22O3         310.15689
capsaicin                404-86-4      C18H27NO3        305.19909
chlorotetracycline       57-62-5       C22H23ClN2O8     478.11429
colchicine^a             64-86-8       C22H25NO6        399.16819
coumadin^a               81-81-2       C19H16O4         308.10486
denatonium benzoate^b    3734-33-6     C21H29N2O        325.22799
deguelin                 522-17-8      C23H22O6         394.14164
deltamethrin             52918-63-5    C22H19Br2NO3     502.97317
difenzoquat^b            49866-87-7    C17H17N2+        249.13917
digitoxin^a              143-62-4      C41H64O13        764.43469
digoxin^a                20830-75-5    C41H64O14        780.42961
diquat^b                 85-00-7       C12H12N2+2       184.10005
eserine^a                57-47-6       C15H21N3O2       275.16338
estrone                  53-16-7       C18H22O2         270.16198
malachite green^b        569-64-2      C23H25N2+        329.20177
methiocarb               2032-65-7     C11H15NO2S       225.08235
oxytetracycline          79-57-2       C22H24N2O9       460.14818
piperonyl butoxide       51-03-6       C19H30O5         338.20932
rotenone                 83-79-4       C23H22O6         394.14164
rotenolone               16431-42-8    C23H22O7         410.13655
strychnine               357-57-3      C21H22N2O2       334.16813
T-2 toxin                21259-20-1    C24H34O9         466.22028
tephrosin                76-80-2       C23H22O7         410.13655
tetracycline             60-54-8       C22H24N2O8       444.15327

a Screening method target compound. b Quaternary amines: quaternary cation formula and mass shown.

then edited and reformatted from text to table format (Excel .xls format). Inorganic compounds, salts, and nonpolar compounds such as alkanes were excluded from the tables. Salts of acids appeared as the free acid, and amine salts appeared as free bases. Extraneous information including page numbers and Chemical Abstracts Service (CAS) numbers was deleted. The chemical names are common names for the most part. The monoisotopic masses were determined with an accuracy of 0.1 mDa based on the tabulated formulas. Because of the volume of information compiled, it was not practical to correct all errors, although cursory proofing was done. OCR software is error prone in that certain font types or styles and subscripts of chemical formulas are not recognized very accurately. For the purposes of this study, however, the database proved sufficiently accurate to establish the feasibility of the technique.

Chemicals and Supplies. Table 1 lists all chemicals that were analyzed by +ESI MS in this work. Compounds were obtained from Aldrich or Sigma (St. Louis, MO) except for the following: capsaicin was purchased from Fluka Biochemika (Steinheim, Germany), and tephrosin was a gift from Professor John Casida (University of California, Berkeley). Syringe filters (25 mm, 0.45 µm PTFE) were obtained from Alltech Associates (Deerfield, IL). Cesium hydroxide and formic acid were from Aldrich.

Instruments. The following API mass spectrometers were used: an Agilent MSD TOF instrument (Agilent Instruments, Wilmington, DE), a Thermo-Finnigan LCQ Deca quadrupole ion trap (QIT) instrument (San Jose, CA), a Thermo-Finnigan TSQ Quantum triple-stage quadrupole instrument, and a Waters Q-TOF Micro quadrupole time-of-flight tandem instrument (Waters Corp., Milford, MA). Each instrument was equipped with both ESI and APCI sources, but for most experiments +ESI was used. The Agilent TOF employed a multimode source with simultaneous ESI and APCI. The instruments were tuned and calibrated according to the manufacturers’ procedures using autotune software. Both of the TOF instruments used internal mass references for optimal mass accuracy. Nitrogen for the API instruments was supplied by a Parker Balston N2-35 nitrogen generator (Parker-Hannifin, Haverhill, MA). The Thermo-Finnigan and Agilent instruments used Agilent 1100 binary HPLC pumps or capillary binary HPLC pumps equipped with solvent degassers, automated samplers, and diode array detectors (Agilent Instruments). The Waters Q-TOF instrument used an Acquity UPLC high-pressure system (Waters Corp.). Laboratory reagent water and water for HPLC mobile phases were produced with a Millipore Milli-Q Gradient A10 water purification system (Billerica, MA). Harvard syringe pumps were used for postcolumn addition of buffer salts (Harvard Apparatus, Holliston, MA).

API-LC-MS Screening Method. A general purpose API-LC-MS method was used. This screening method employed a C8 or C18 reversed-phase column with methanol-water gradient elution and +ESI. The method was developed for use in water security and emergency response for the California drinking water program. It has broad scope for determination of polar, thermally unstable, and high molecular weight substances in surface water, groundwater, and finished drinking water. The method identifies six target analytes, aldicarb, colchicine, coumadin (warfarin), digoxin, digitoxin, and eserine (physostigmine), although its principal use is the detection and identification of unknowns. Water samples (up to 100 µL but as low as 8 µL with capillary HPLC) were injected directly onto the column (2.1-, 1.0-, and 0.5-mm-i.d. columns are used depending on the pumping system) and eluted with a methanol-water gradient.

Mobile-phase additives were not used in order to facilitate cationization with H+, Na+, and NH4+. Alkali metal ions and ammonia originate from glass surfaces, silica column packing, and fused-silica surfaces, each of which functions as an ion exchanger. Cations are introduced continuously with mobile-phase solvents, buffers, and samples and from previous uses of the instrument. In this screening method, the mass spectrometer can be operated in ESI or APCI modes and in both + and - polarities, but +ESI seems capable of ionizing the greatest number of compounds with the lowest detection limits. The recommended gradient for a 2.1 mm (i.d.) × 15 cm C8 or C18 column with 5-µm packing and a uniform flow rate of 350 µL/min is the following (A = H2O, B = MeOH): 15% B (2 min); 15% B to 100% B (2 to 16 min); and 100% B (16 to 20 min). The mass spectrometer settings, including electrospray tip potential, cone voltage or declustering potential, temperatures, and gas flow rates and their temperatures, were adjusted for optimal response of the target compounds in LC-MS operation. Mass scanning data were acquired throughout the 20-min run after an initial 0.5-min start delay. A wide scan range is recommended (e.g., 150-1500 Da) consistent with the analysis objectives; however, this should be adjusted to cover the masses of interest in the sample.

Data Processing. A variety of methods and ancillary techniques were used to improve detection of unknowns. For example, diode array detectors were used in series. Base peak plots were effective for reducing noise levels in ESI chromatograms, and background subtraction was critical for spectral quality. Parallel analysis of uncontaminated matrix (e.g., upstream or control samples) was helpful in recognition of “normal” matrix constituents.

Mass Spectrum Interpretation Tools. Ion fragmentation was simulated using Mass Spec Calculator software (Version 3.2, ChemSW, Inc., Fairfield, CA). This software also was used to calculate isotope patterns. In limited comparative studies, empirical formulas were calculated for molecular masses measured by a TOF mass spectrometer. A calculator available on the Internet (http://www.ch.cam.ac.uk/magnus/EadFormW.html) was used.

Test Samples. Test compound spectra were determined both by infusion of standards and by LC-MS analysis of standard mixtures and surface water samples. Two surface water test mixtures are described. The test mixtures were prepared in surface water from the California aqueduct, so-called California project water (CPW), and a diluted seawater sample (DSW). The seawater, collected at Point Reyes, CA, was filtered and diluted 100-fold with laboratory reagent water. The DSW matrix simulates freshwater sources impacted by marine inputs, as occurs in areas of the Sacramento River delta of California. These surface waters have a high dissolved solids content and provide useful information for practical method development. Surface water was spiked with method analytes and unknowns at concentrations ranging from 50 to 1000 µg/L (ppb) depending on the compound. Much lower ESI detection limits are achieved by coupling the direct sample introduction method with sample extraction and concentration by LLE or SPE or by use of selected ion monitoring or multiple reaction monitoring.

RESULTS AND DISCUSSION

Conventional Approach: Use of an Empirical Formula Generator.
As discussed in the introduction, structure elucidation in mass spectrometry generally relies on elucidation of the empirical formula as a necessary adjunct to chemical structure elucidation. The empirical formula generator or calculator lists formulas in order of increasing mass difference between the theoretical and measured masses. Examples of the use of an empirical formula generator are illustrative and provide useful background. Six toxins were analyzed by ESI-LC-MS using a TOF instrument capable of high mass accuracy [...]

Table 2. [Title, compound names, and leading entries not recoverable. Surviving values for the six test compounds, in row order; column labels are inferred from the surrounding text.]

empirical formula candidates, restricted element set (partial)    60, 23, 2
empirical formula candidates, larger element set                  >60^c, >60, >60, >60, >60, >60
MTS Search Engine rank (structures retrieved)                     1 (2), 1 (1), 1 (3), 1 (2), 1 (1), 1 (3)
mass error                                                        -4 ppm, -4 ppm, -5 ppm, +32 ppm, -1 ppm, -3 ppm

a Elements considered. b Mass tolerance (0.01 Da except digitoxin with 0.05 Da). c The correct elemental composition was not reported among 60 candidates.

[...] restriction to C-, H-, N-, and O-containing formulas, a large number of consistent empirical formulas was generated. Each of these candidate formulas must then be evaluated to obtain the correct elemental composition using a variety of methods discussed earlier. Consideration of an even larger group of common elements leads to a failure of the calculator to generate the correct empirical formula among 60 reported (Table 2). Empirical formula identification becomes particularly difficult as the molecular weight increases due to the exponentially increased number of theoretical combinations of elements that correspond to the unknown’s mass.7 The final step, association of the empirical formula with a chemical structure, also is not straightforward or systematic. Thus, elucidation of both the elemental composition and chemical structure of global unknowns is challenging, as is evident from the number of recent publications focused on improving the available tools and techniques for this.

The MTS Search Engine search results for this set of test compounds also are summarized in Table 2. In each case, the total number of retrieved structures was three or fewer, and the correct compound was ranked 1. In the following sections of the paper, information is provided on both operation of the MTS Search Engine and its application to structure elucidation with unit mass resolution and accurate mass data.

General Operating Instructions for the MTS Search Engine. The MTS Search Engine is simple to use and requires only a basic knowledge of ESI mass spectrum interpretation. After launching the Access 2003 application, the MTS Search Engine main page opens (Figure 1). The user interface requires input of only three pieces of information: the ion mass, the adduct or ion type, and the mass tolerance. The ion mass is entered into the Search Mass box. Accurate mass data can be input to 0.1 mDa.
The adduct or ion type is indicated by the following check boxes: MH+, MNa+, MNH4+, [M - H]-, and MCs+. Recognition of these spectral features is essential. The MCs+ case applies where cesium additive is used to facilitate adduct identification.14 Each of these check boxes ensures that the proper mass is subtracted or added in calculating the experimental monoisotopic mass. The remaining check boxes on the main page direct the search to special database tables including the following: R4N+ (quaternary nitrogen compounds); Cl- or Br-containing compounds; pesticides; drugs; and supplemental compounds (industrial chemicals). If none of these boxes is checked, a default search occurs in the main or index table containing 9093 chemicals. Lastly, the mass tolerance is entered in daltons. The appropriate mass tolerance is determined by the mass accuracy of the instrument. Unit mass resolution data require a wide tolerance (typically ±0.5 Da), while accurate mass data allow a narrow mass tolerance, as low as ±0.001 Da. The number of candidate compounds retrieved varies with the applied tolerance.

MTS Search Engine results appear after 1 or 2 s in a separate report screen (Figure 2). The report page summarizes the adduct/ion type, the experimental monoisotopic mass (to 0.1 mDa, regardless of the number of decimal places in the search mass), the tolerance, the number of compounds retrieved, and a list of candidates. The candidate structures appear in a table with columns showing name, formula, theoretical mass, mass error (experimental - theoretical), and any associated Internet links, often Wikipedia (http://en.wikipedia.com) or eMolecule (http://www.emolecule.com). About 95% of the compounds in the index table have Internet links. The following sections provide examples of the use of the database in structure elucidation.

(14) Kaiser, P.; Akerboom, T.; Wood, W. G.; Reinauer, H. Clin. Lab. 2006, 52, 37–42.

Examples of Unit Mass Resolution Data and MTS Search Engine. T-2 Toxin. The sample, California project water, contained two method target compounds and two unknowns. An unknown at tR 14.41 min was detected in +ESI using a quadrupole ion trap instrument. The mass spectrum had three prominent ions: m/z 489.2 (base peak), 484.1, and 954.9. These ions are the corresponding MNa+ and MNH4+ adducts and the [2M + Na]+ dimer ion. The MTS search is carried out by inputting the mass of either the MNa+ or the MNH4+ ion, indicating the adduct or ion type, and setting a tolerance of ±0.5 Da as seen in Figure 1.
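The rudimentary interpretation step for the T-2 toxin spectrum can be sketched in a few lines. This is an illustrative aid, not part of the MTS Search Engine: the function names and the 0.3 Da spacing tolerance are assumptions, and the adduct shifts are standard values (mass of H, NH4, or Na minus one electron). The sketch looks for pairs of peaks whose spacing matches two different adducts of the same molecule.

```python
from itertools import combinations

# Mass added to the neutral molecule M by each cationizing species (Da):
# the mass of H, NH4, or Na minus one electron. Listed in ascending order.
ADDUCT_SHIFT = {"MH+": 1.007276, "MNH4+": 18.033826, "MNa+": 22.989221}


def adduct_pairs(peaks, tol=0.3):
    """Find peak pairs whose spacing matches two adducts of one molecule.

    tol is a spacing tolerance in Da chosen here for unit-resolution data.
    """
    found = []
    for lo, hi in combinations(sorted(peaks), 2):
        # Keys are in ascending shift order, so b is the heavier adduct.
        for a, b in combinations(ADDUCT_SHIFT, 2):
            expected = ADDUCT_SHIFT[b] - ADDUCT_SHIFT[a]
            if abs((hi - lo) - expected) <= tol:
                found.append(((a, lo), (b, hi)))
    return found


def neutral_mass(mz, adduct):
    """Monoisotopic mass of M implied by a singly charged adduct ion."""
    return mz - ADDUCT_SHIFT[adduct]


# The three prominent ions of the unknown at tR 14.41 min.
pairs = adduct_pairs([489.2, 484.1, 954.9])
```

For these three ions the only matching spacing is 484.1/489.2, consistent with the MNH4+ and MNa+ assignment given in the text; `neutral_mass` then gives the value that would be compared against the database.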
These searches are annotated MNa+/489.2 ± 0.5 Da and MNH4+/484.1 ± 0.5 Da, and either retrieves the same candidate structures shown in Figure 2. The two calculated monoisotopic masses differ because of both experimental and rounding errors, which is also reflected in the mass errors displayed. The unknown, T-2 toxin, appears in both candidate lists. Additional spectral information is required to solve this particular unknown, such as information on the isotope ratios (e.g., ratios of A, A+1, and A+2 isotopic ions)1 or interpretation of fragments produced by CID in a tandem instrument or by in-source CID in a single mass analyzer instrument. Spectral information additional to the monoisotopic mass is generally known as orthogonal information.
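The two searches above can be mimicked with a small in-memory sketch. The actual MTS Search Engine is an Access 2003 application whose macros are not shown in the paper, so everything here (the `Record` class, `mts_search`, the five-row table copied from Table 1, and the ion-type corrections) is an assumption made for illustration.

```python
from dataclasses import dataclass

# Ion-type corrections (Da), subtracted from m/z to give the neutral mass.
# The [M-H]- entry is negative because a proton must be added back.
ION_SHIFT = {"MH+": 1.007276, "MNa+": 22.989221, "MNH4+": 18.033826,
             "[M-H]-": -1.007276, "MCs+": 132.904903}


@dataclass
class Record:
    name: str
    formula: str
    mass: float  # theoretical monoisotopic mass, Da


# A few rows copied from Table 1; the real database holds ~20 000 records.
TABLE = [
    Record("deguelin", "C23H22O6", 394.14164),
    Record("rotenone", "C23H22O6", 394.14164),
    Record("tephrosin", "C23H22O7", 410.13655),
    Record("tetracycline", "C22H24N2O8", 444.15327),
    Record("T-2 toxin", "C24H34O9", 466.22028),
]


def mts_search(mz, ion, tol, table=TABLE):
    """Return (record, mass_error) pairs with |error| <= tol, best first.

    mass_error is experimental minus theoretical, as in the report page.
    """
    experimental = round(mz - ION_SHIFT[ion], 4)  # reported to 0.1 mDa
    hits = [(r, round(experimental - r.mass, 4)) for r in table
            if abs(experimental - r.mass) <= tol]
    return sorted(hits, key=lambda h: abs(h[1]))


via_na = mts_search(489.2, "MNa+", 0.5)    # MNa+/489.2 +/- 0.5 Da
via_nh4 = mts_search(484.1, "MNH4+", 0.5)  # MNH4+/484.1 +/- 0.5 Da
```

Both calls retrieve T-2 toxin from this small table, but with different mass errors, because each starts from a different rounded ion mass. A search such as `mts_search(395.2, "MH+", 0.5)` returns the isomers deguelin and rotenone together, which shows why orthogonal information is needed to rank candidates.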


Figure 1. MTS Search Engine Main Page. Information on the unknown’s search mass, mass tolerance, and adduct/ion type are entered here to define the database search.
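The isotope-ratio screening discussed in the Isotope Ratios section can be summarized in a short sketch. The function names and the candidate labels are hypothetical, and two simplifications are assumed: the A+1 peak is attributed entirely to 13C at 1.1% per carbon, and the ±20% window recommended by Ibanez et al. is applied around the estimated carbon number.

```python
C13_PER_CARBON = 1.1  # percent contribution to A+1 per carbon atom


def carbon_estimate(a1_percent):
    """Approximate carbon count from the A+1 intensity (percent of A).

    Other isotopes (e.g., 15N, 2H) are ignored, so this is an upper bound.
    """
    return a1_percent / C13_PER_CARBON


def within_window(n_carbons, estimate, window=0.20):
    """True if a candidate's carbon count lies within estimate +/- window."""
    return abs(n_carbons - estimate) <= window * estimate


def rank_by_carbon(candidates, a1_percent):
    """Drop (name, n_carbons) candidates outside the window and sort the
    rest by closeness to the estimated carbon number."""
    est = carbon_estimate(a1_percent)
    kept = [c for c in candidates if within_window(c[1], est)]
    return sorted(kept, key=lambda c: abs(c[1] - est))


# Hypothetical candidate labels; only the carbon counts matter here.
example = rank_by_carbon(
    [("cand_C11", 11), ("cand_C16", 16), ("cand_C20", 20), ("cand_C21", 21)],
    a1_percent=23.8,  # strychnine MH+ A+1 intensity
)
```

With the strychnine value of 23.8%, the estimate is about 21.6 carbons, so C21 candidates are ranked first, C20 next, and C11 or C16 candidates are excluded.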

Isotope Ratios. An example of effective use of orthogonal information is the interpretation of isotope ratios among MTS Search Engine candidates. Consideration of the A+1 isotope peak intensity, in particular, is very useful for rapid ranking of structure hypotheses. The A+1 isotope peak has major contributions from 13C, which has a natural abundance of 1.1%. Thus, the A+1 peak intensity is indicative of the maximum number of carbons in the molecule.1 Strychnine, a C21 compound, produces an intense MH+ ion, m/z 335.3 (QIT instrument, sample infusion). The MTS database search MH+/335.3 ± 0.5 Da retrieves 15 structures including strychnine. The intensity of the A+1 peak is 23.8%, corresponding to a carbon number of 21.6. The distribution of carbon atoms for the 15 candidates is plotted in Figure 3a; there is no need to consider compounds with 11, 12, 15, or 16 carbon atoms. Accordingly, consideration of the strychnine candidates should begin with the two C21 structures. If, in turn, they are ruled out, the C20 candidates are considered next, and so forth. Ibanez et al. have recommended limiting consideration to the theoretical carbon number ± 20% to accommodate experimental error.15

The A+1 peak of the C18 compound, estrone, has an intensity of 19.1%, corresponding to 17.4 carbon atoms. The MTS database search MH+/271.2 ± 0.5 Da retrieves 25 candidate structures. The carbon number distribution of these candidates is again plotted in Figure 3a. Highest priority should be placed on examination of the nine 17- and 18-carbon candidate structures. More detailed discussion of the use of RIA in organic structure determination appears in recent publications.7,11

(15) Ibanez, M.; Sancho, J. W.; Pozo, O. J.; Niessan, W.; Hernandez, F. Rapid Commun. Mass Spectrom. 2005, 19, 169–178.

Rotenone and Tephrosin. These two structurally related C23 natural products ionize to MH+ and MNa+ ions, respectively. Search results for rotenone (MH+/395.2 ± 0.5 Da; 15 candidates) and tephrosin (MNa+/433.1 ± 0.5 Da; 23 candidate structures) are displayed based on carbon number (Figure 3b). The experimental carbon numbers based on RIAs are 24.8 and 22.2 for rotenone and tephrosin, respectively. The same approach is used to rank the candidates, with initial priority placed on C25 and C22 compounds.

Tetracycline. Introduction by infusion to a QIT instrument in methanol-water (9:1, v/v) with 0.2 mM ammonium acetate produced a spectrum with three prominent ions: [MH - NH3 - H2O]+ (m/z 410.1, base), MH+ (m/z 445.1), and MNa+ (m/z 467.0). A dimer ion also is present, [2M + Na]+. The MH+/445.1 ± 0.5 Da database search retrieved nine candidate structures with formulas containing between 22 and 31 carbon atoms. The A+1 peak intensities were 27 (MH+), 22 (MNa+), and 49% ([2M + Na]+), corresponding to an average carbon number of 22.2. Based on RIA, structures with 29 and 31 carbons (2) are rejected. The six remaining candidates have the empirical formulas C22H20O10 (granaticin), C22H24N2O8 (quatrimycin, tetracycline, doxycycline, epitetracycline), and C22H28N4O6 (mitoxantrone). Each of the C22H24N2O8 compounds is a related isomer: quatrimycin and epitetracycline are synonyms for a tetracycline epimer, and doxycycline is the 6-deoxy-5-hydroxy tetracycline isomer. The three compatible structures are shown in Figure 4. The loss of water and ammonia, giving dehydro and dehydroacylium

Figure 2. MTS Search Engine results page. Results of the database search are summarized in this display showing the list of candidate structures, their formulas and theoretical masses, and the experimental mass error.

fragments is typical of tetracyclines, as is the m/z 154 ion resulting from retro Diels-Alder fragmentation. The cases of chlortetracycline and oxytetracycline are similar.

Chlortetracycline. Chlortetracycline’s spectrum has two chlorine-containing ions, m/z 479.1 (MH+, base) and m/z 501.0 (MNa+). In addition, two fragment ions are seen due to loss of NH3 (m/z 460.0) and loss of H2O and NH3 (m/z 444.1). The A+1 intensity for the MH+ ion is 25.4% (carbon number 23). The MH+/479.1 ± 0.5 Da search in the halogen table retrieves 10 candidates with carbon numbers of C16, C22 (4), C25, C26, and C28 (3). The four C22 structures are chlortetracycline, isochlortetracycline, 4-epitetracycline, and 4-epichlortetracycline. Each has the same empirical formula, and all are geometrical or positional isomers. MS4 scans with the QIT instrument established the following fragmentation maps for chlortetracycline: 479 > 462 > 444 > 154 and 479 > 462 > 444 > 371.

Oxytetracycline. Oxytetracycline’s spectrum is analogous, with two major ions, m/z 461.1 (MH+) and m/z 426.1 ([MH - H2O - NH3]+). In the -ESI spectrum a prominent [M - H]- ion (m/z 459.2) is seen as well as m/z 481.3, possibly the sodium replacement ion, [M + Na - 2H]-. MTS search of these masses, e.g., MH+/461.1 ± 0.5 Da, retrieves six substances with two empirical formulas: C22H24N2O9 (oxytetracycline, CAS No. 79-57-2, and epioxytetracycline, CAS No. 35259-39-3) and C25H32O8 (aspidin, CAS No. 989-54-8; albaspidin, CAS No. 58409-52-2; kosin, CAS No. 568-50-3; and prednisolone hemisuccinate, CAS No. 2920-86-7). The A+1 isotope, 25.3%, corresponded to a carbon number of 23. MTS search of the supplemental

compound table yielded a further six structures with C17, C21 (2), C22, C24, and C28 compounds; the additional C22 and C24 compounds were octyl β-D-glucopyranoside-tetraacetate (C22H36O10) and dicyclohexano-24-crown-8 (C24H44O8, CAS No. 17455-23-1). The [M - H]-/459.2 ± 0.5 Da search retrieves the same candidate list. Thus, in these examples, it is seen that the MTS Search Engine provides a short list of compatible structures, even using unit mass resolution spectra. The quadrupole ion trap MS4 scans provide orthogonal information as fragmentation maps such as the following: 461 > 443 > 426 > 337 + 381 + 226 + 154. The sequential loss of water and ammonia is characteristic of tetracyclines as seen earlier.

Deltamethrin. While unit mass resolution MS data usually require additional spectral information to assign a structure, there are many instances in which the MTS Search Engine can lead directly to a single structure. The case of deltamethrin (decamethrin), also analyzed by +ESI using the QIT instrument, illustrates this. Deltamethrin produces an intense MNa+ ion, m/z 525.9, with a 2-bromine isotope cluster. Fragment ions also appear in the MS spectrum, including a nonhalogenated ion (m/z 208.1) and a 2-bromine fragment (m/z 278.9). An MTS database search at MNa+/525.9 ± 0.5 Da (halogenated) retrieves a single compound, deltamethrin, from both the index and supplemental tables. The two fragments, a cyanophenoxytropylium ion and an acylium ion, provide strong supporting evidence for the structure.

α-Amanitin. The surface water sample described in the T-2 toxin case contained another unknown substance at tR 8.94 min. This compound’s +ESI spectrum had two major ions, m/z 919.3


Figure 3. Distribution of candidate carbon numbers for MTS database search for strychnine (MH+/335.3 ± 0.5 Da) (A), estrone (MH+/271.2 ± 0.5 Da) (B), rotenone (MH+/395.2 ± 0.5 Da) (C), and tephrosin (MNa+/433.1 ± 0.5 Da) (D). The experimental carbon numbers based on the RIA (indicated by letters and arrows) are used to select candidates retrieved by the MTS Search Engine.
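The carbon number estimate used to rank these candidates amounts to dividing the A+1 intensity by the 1.1% natural abundance of 13C. The sketch below reproduces the strychnine and estrone figures quoted in the text. It is a simplified illustration: it ignores A+1 contributions from other elements (so the result is an upper bound), the ± 20% window is applied around the experimental estimate as one reading of the Ibanez et al. recommendation, and the candidate list and function names are hypothetical.

```python
C13_ABUNDANCE_PERCENT = 1.1   # natural abundance of 13C quoted in the text

def carbon_number(a_plus_1_percent):
    """Estimate the carbon count from the A+1 intensity (percent of the A peak)."""
    return a_plus_1_percent / C13_ABUNDANCE_PERCENT

def rank_by_carbon(candidates, a_plus_1_percent, window=0.20):
    """Keep candidates whose carbon count lies within +/- window of the estimate,
    ordered by closeness to the estimate."""
    n_est = carbon_number(a_plus_1_percent)
    kept = [(name, n_c) for name, n_c in candidates
            if abs(n_c - n_est) <= window * n_est]
    return sorted(kept, key=lambda c: abs(c[1] - n_est))

# Hypothetical candidate list (name, carbon count) for a strychnine-like search.
candidates = [("C11 candidate", 11), ("C16 candidate", 16),
              ("C20 candidate", 20), ("strychnine", 21), ("C24 candidate", 24)]

print(round(carbon_number(23.8), 1))    # strychnine, A+1 = 23.8%
print(round(carbon_number(19.1), 1))    # estrone, A+1 = 19.1%
print(rank_by_carbon(candidates, 23.8))
```

For the strychnine-like list, the C11 and C16 entries fall outside the window, and the C21 structure ranks first, followed by the C20 and C24 entries.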

Figure 4. MTS Search Engine candidates for the MH+/445.1 ± 0.5 Da database search. Each compound is isobaric with tetracycline and could not be excluded based on experimental A+1 isotope ratios. In such cases, fragmentation data are effective for ranking hypothetical structures and ultimate structure assignment.

(base peak) and 941.4, which are the protonated and sodiated adducts, respectively. Additionally, there were two minor ions (5-10% of the base peak), m/z 901.3 ([MH - H2O]+) and an ammonium adduct. MTS searches of either MH+/919.3 ± 0.5

Figure 5. Distribution of compound masses in the database. Each point represents a 20-Da mass interval. Below m/z 100, there were 356 compounds and the total number of entries is 8192 for the index table.

Da or MNa+/941.4 ± 0.5 Da retrieved a single compound, α-amanitin. Higher mass molecules searched by the MTS Search Engine are more likely to retrieve a single structure because of the mass distribution of molecules in the database (Figure 5). A maximum is seen at ∼300 Da, where there are more than 500 compounds per interval. Above 700 Da, the number of records is very small; hence, even searches with a wide mass tolerance are likely to retrieve only a few candidate molecules. The situation is the opposite with an empirical formula generator, where the number of candidate formulas becomes unwieldy at high mass.

Malachite Green. The quaternary nitrogen dye, malachite green (C23H25N2+), gives a base peak of m/z 329.4 (M+) and a fragment ion, m/z 313.4 ([M - CH4]+). An MTS search of R4N+/329.4 ± 0.4 Da retrieves two candidate structures: malachite green and methylene green (C16H17N4O2S+). The intensity of the A+1 peak is 25.4%, consistent with a 23-carbon compound and not a C16 compound. The intensity of the A+2 isotope peak also distinguishes the candidates, because methylene green (A+2, 6%), but not malachite green (A+2, 3%), contains the A+2 element sulfur. Further support for the structure is provided by fragmentation in the QIT instrument. Four major fragment ions are formed from the m/z 329.4 precursor: m/z 208.3 (loss of dimethylaniline), 251.3 (loss of benzene), 285.3 (loss of dimethylamine radical), and 313.3 (loss of CH4). These fragments confirm the assignment of malachite green. The rationale for selecting the R4N+ check box (for quaternary amines) is based on the following spectral information.
The unknown’s spectrum (a) exhibits a +ve ion but no -ve ion at the expected mass; (b) does not exhibit a doublet or triplet of +ve ions (proton, sodium, or ammonium adducts) at 5-, 17-, or 22-Da intervals; (c) shows a +ve ion that is not shifted 132 Da by addition of a cesium salt; and (d) obeys the nitrogen rule (e.g., +ve even mass for odd nitrogen number, odd mass for even nitrogen number). Additional

sample information also may be important, such as the chromatographic or partitioning properties or history of the sample. The MTS Search Engine, like EI mass spectrum library searching, rarely retrieves a single organic compound. In fact, wide mass tolerances lead to large candidate lists, as many as 10-50 when both the index and supplemental compound tables are searched. The analyst must rely on all of the available spectral information to rank candidates and proceed with structure elucidation. This can be laborious, but it is much more efficient and practical than de novo spectrum interpretation.

Structure Elucidation with High Mass Resolution Data. Capsaicin. Another surface water sample, prepared in diluted seawater (DSW), contained five additives: two target compounds (eserine, 11.15 min; digoxin, 13.03 min) and three unknowns (14.10, 15.21, and 16.18 min). Base peak plots effectively reduced noise in the +ESI chromatograms, and the analysis of laboratory reagent water and an unspiked surface water sample aided in recognizing method artifacts and normal components of the DSW matrix, respectively. These peaks, however, were few in number given the screening method detection limits, which varied between ∼100 and 1000 µg/L using the capillary HPLC triple-stage quadrupole instrument; the maximum capillary injection volume was 8 µL and no sample preconcentration was employed. In this experiment, a unit mass resolution instrument was used with a scan range of 150-1500 Da (the mass accuracy is 100 mDa). The +ESI spectrum of the compound at tR 14.10 min had four prominent ions: m/z 306.2, 328.1, 611.4, and 633.4 (Figure 6a). The doublet of ions spaced at 22 Da indicates an MH+/MNa+ pair. These adducts are very common in +ESI using the methanol-water gradient without added buffer or acid. The higher mass ions are dimers, the [2M + H]+ and [2M + Na]+ ions. The MTS Search Engine processing the MH+/306.2 ± 0.5 Da query retrieves 20 structures.
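The rudimentary interpretation applied to this spectrum, spotting a 22-Da doublet and then confirming the dimer ions, can be sketched as follows. This is an illustrative helper, not part of the MTS Search Engine: the adduct spacings are computed from standard monoisotopic cation masses, and the function names and the 0.3 Da matching tolerance are assumptions chosen for unit mass resolution data.

```python
from itertools import combinations

H_SHIFT, NA_SHIFT = 1.007276, 22.989218   # Da, H+ and Na+ (cation less one electron)

# Expected m/z spacings (Da) between adducts of the same molecule.
SPACINGS = {
    ("MH+", "MNH4+"): 17.027,
    ("MH+", "MNa+"): 21.982,
    ("MNH4+", "MNa+"): 4.955,
}

def find_adduct_pairs(peaks, tol=0.3):
    """Return (lighter m/z, heavier m/z, labels) for peak pairs whose spacing
    matches a known adduct difference within tol (Da)."""
    found = []
    for low, high in combinations(sorted(peaks), 2):
        for labels, delta in SPACINGS.items():
            if abs((high - low) - delta) <= tol:
                found.append((low, high, labels))
    return found

def find_dimers(peaks, mh_mz, tol=0.3):
    """Look for [2M + H]+ and [2M + Na]+ peaks implied by a proposed MH+ ion."""
    m = mh_mz - H_SHIFT
    expected = {"[2M + H]+": 2 * m + H_SHIFT, "[2M + Na]+": 2 * m + NA_SHIFT}
    return {label: peak for label, mz in expected.items()
            for peak in peaks if abs(peak - mz) <= tol}

peaks = [306.2, 328.1, 611.4, 633.4]    # capsaicin spectrum, unit mass resolution
pairs = find_adduct_pairs(peaks)
dimers = find_dimers(peaks, 306.2)
print(pairs)
print(dimers)
```

Both the 306.2/328.1 and the 611.4/633.4 pairs show the 22-Da spacing; the dimer check attributes the heavier pair to [2M + H]+ and [2M + Na]+, leaving m/z 306.2 as the MH+ ion to search.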
Figure 6. Unit mass resolution +ESI spectra of capsaicin (a), avobenzone (b), and denatonium (c), test compounds used in evaluating the MTS Search Engine.

The experimental monoisotopic mass is 305.1927 Da, and all structures within the stipulated 0.5 Da tolerance are reported. Input of the experimental MNa+ ion mass, e.g., MNa+/328.1 ± 0.5 Da, retrieves the same candidate structures. The monoisotopic mass here, 305.1108 Da, is slightly different due primarily to experimental error. Shorter lists of candidates are retrieved with smaller mass tolerances. In an accurate mass TOF instrument, the same +ESI ions had measured masses as follows: m/z 306.2039, 328.1872, 611.4115, and 633.3837. Repeating the two searches using MH+/306.2039 and MNa+/328.1872 while decreasing the mass tolerance from 0.1 to 0.01 to 0.001 Da retrieved 12, 1, and 0 structure candidates, respectively. The unknown was tentatively identified as capsaicin (mass error -8 ppm); capsaicin’s theoretical mass is 305.1991 Da, comparable to the experimental mass of 305.1966 Da. Similarly, setting the MNa+ mass tolerance at 0.02 Da retrieves a single candidate structure, again capsaicin, with a mass error of -4 ppm. Another candidate structure, amiprilose, is unlikely due to the large mass errors, e.g., ±42 to 46 ppm, which are well above the instrument’s expected mass accuracy. Capsaicin is the active ingredient of pepper spray (CAS No. 404-86-4, C18H27NO3). Amiprilose is a carbohydrate-derived anti-inflammatory compound (CAS No. 56824-20-5, C14H27NO6).
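The effect of narrowing the tolerance, and the ppm errors quoted for the two candidates, can be checked with a short calculation. The sketch assumes a two-record candidate list whose theoretical masses are computed from the formulas given above; the function names are hypothetical and the proton mass is the standard value.

```python
PROTON = 1.007276   # Da, mass of H+ (hydrogen atom less one electron)

def ppm_error(experimental, theoretical):
    """Relative mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6

def within_tolerance(candidates, neutral, tol):
    """Names of candidates whose theoretical mass lies within tol (Da) of neutral."""
    return [name for name, mass in candidates if abs(neutral - mass) <= tol]

# Theoretical monoisotopic masses computed from the formulas in the text.
candidates = [
    ("capsaicin", 305.1991),    # C18H27NO3
    ("amiprilose", 305.1838),   # C14H27NO6
]

neutral = 306.2039 - PROTON     # TOF MH+ reading converted to the neutral mass
for tol in (0.03, 0.01, 0.001):
    print(tol, within_tolerance(candidates, neutral, tol))
for name, mass in candidates:
    print(name, round(ppm_error(neutral, mass)))
```

Under these assumptions both candidates pass at 0.03 Da, only capsaicin passes at 0.01 Da, and neither passes at 0.001 Da, with errors of about -8 ppm for capsaicin and 42 ppm for amiprilose.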


Avobenzone. Another unknown in the diluted seawater sample at tR 16.18 min has ions m/z 311.1, 333.1, 365.2, and 643.3 (Figure 6b). As with capsaicin, ions appearing at a 22-Da interval indicate an MH+/MNa+ pair. The dimer is consistent with [2M + Na]+, and the m/z 365.2 ion is a methanol cluster ([M + CH3OH + Na]+). Searching the protonated and sodiated ions with the MTS Search Engine (e.g., MH+/311.1 ± 0.5 or MNa+/333.1 ± 0.5 Da) retrieves the same list of 29 compounds. The Q-TOF instrument spectrum provides the following mass data: m/z 311.1643 (MH+) and 333.1211 (MNa+). The MH+ search, for example, retrieves lists with 2, 3, 8, and 19 compounds by setting the tolerance at 0.005, 0.01, 0.02, and 0.05 Da, respectively. In Table 3, avobenzone appears as the highest ranking candidate with a mass error of +0.5 ppm. The ranking of the candidates by mass accuracy has no significance per se as long as the error is typical of the instrument performance. Accordingly, avobenzone is the only “good” candidate, as only its error is typical of the instrument’s normal accuracy.

Denatonium. The remaining unknown in the DSW sample at tR 15.21 min exhibits a base peak at m/z 325.2 but displays no

Table 3. Results from MTS Search Engine Query MH+/311.1643 ± 0.015 Da

name                             formula       mass (Da)   error (ppm)
cupreine                         C19H22N2O2    310.1681    -35.6
sarpagine                        C19H22N2O2    310.1681    -35.6
Wieland-Gumlich aldehyde (WGA)   C19H22N2O2    310.1681    -35.6
DBHBT                            C17H26O3S     310.1602    -10
avobenzone                       C20H22O3      310.1569    0.5
mepazine                         C19H22N2S     310.1504    21
bifonazole                       C22H18N2      310.1470    32.4

other significant ions (Figure 6c). Thus, this ion cannot be identified as a common cationized species based upon a pattern of adducts. Querying the database produces many assignments, all incorrect if they assume the ion to be an adduct. In fact, the ion is a quaternary nitrogen. Checking the appropriate box (R4N+) and querying the database fails to give any hits, regardless of the mass tolerance allowed. The compound, therefore, is not currently in the database, as the experimental mass, 325.2365 Da, was in good agreement (26 ppm) with the theoretical mass, 325.22799 Da. The rationale for selection of the R4N+ check box was discussed earlier.

Using Cs+ Adduct Data. Postcolumn addition of a cesium salt is useful for modification of the adduct distribution.14 An aqueous cesium formate solution, prepared from cesium hydroxide and formic acid, was delivered by syringe pump to give a final cesium ion concentration of 0.1 mM. Cesium readily exchanges with the common adducts and forms MCs+ ions, corroborating the unknown’s mass. For this purpose, the MTS Search Engine includes an additional check box for inputting the mass of the MCs+ ion. The large cesium ion, 133 Da, displaces the original adduct considerably, with the MH+/MCs+ pair appearing as a doublet with a 132-Da interval. In the presence of cesium, the spectra essentially contained a single ion; the MNa+ ion was eliminated, but some of the protonated molecule remained (10-25% relative to the MCs+ base peak). For the capsaicin example, the MTS search of MCs+/438.0926 ± 0.01 Da retrieved capsaicin (number 2 hit, -37 ppm). Similarly, with avobenzone, MCs+/443.0515 ± 0.01 Da retrieved the correct structure as the fourth hit (mass error -33.1 ppm). The use of cesium addition is effective for molecules that preferentially form a single adduct.
Many compounds are cationized in the methanol-water gradient, forming a recognizable pattern of MH+, MNa+, and MNH4+ ions, and in these cases, the cesium ion adduct method is unnecessary.

Use of Fragmentation Data with the MTS Search Engine. As has already been demonstrated in the examples of deltamethrin, malachite green, and the tetracyclines, tandem instruments provide complementary or orthogonal information. Fragmentation information is useful in the capsaicin and avobenzone examples. The capsaicin MH+/306.2039 ± 0.03 Da search retrieves two compounds, and the avobenzone MH+/311.1643 ± 0.015 Da search retrieves five compounds. The candidate structures for capsaicin include capsaicin (-8 ppm) and amiprilose (42 ppm). CID of the capsaicin MH+ ion (m/z 306.2039) in the Q-TOF instrument produced two major product ions: m/z 182.1429 and 137.0503. Fragmentation of the two candidate structures was examined using the Mass Spec Calculator simulation software.

Figure 7. CID Fragmentation of capsaicin (a), avobenzone (b), and amethopterin (c). Fragmentation data were useful for structure elucidation (or confirmation), particularly where there are a number of candidates to evaluate.

Simulated fragmentation of capsaicin, but not amiprilose, revealed fragments that include a tropylium ion (C8H9O2, m/z 137.06025 Da) and a methyleneimine fragment (C11H20NO, m/z 182.15449 Da) shown in Figure 7a. The mass errors were -72 ppm for the tropylium and -64 ppm for the methyleneimine fragment. The Q-TOF mass accuracy in MS/MS is reduced relative to MS operation.7 Simulated fragmentation of amiprilose from either single or consecutive bond breaks produced 60 and 129 possible fragments, respectively, none with the correct mass. Fragmentation of the avobenzone MH+ ion, again in the Q-TOF instrument, yields two product ions, m/z 135.0412 and 161.0941. The simulated fragmentation of avobenzone reveals two fragments out of a total of 25 or 151 (1 + 2 bond breaks) (Figure 7b) with the correct masses, 135.0446 and 161.0966. The experimental mass errors, thus, are -25 and -15 ppm for the m/z 135 and 161 ions, respectively. None of the other candidates yielded these fragments in the computer simulation (Table 4).

Amethopterin/Methotrexate. Fragmentation information also is effective for ranking MTS Search Engine candidates from unit mass resolution data. In the amethopterin (methotrexate) +ESI spectrum from a triple-stage quadrupole instrument, the base peak is m/z 455.2 and a second major ion is m/z 477.2 (10% relative abundance); these ions are the MH+/MNa+ adduct pair. There are additional low-abundance ions including a dimer ([2M + H]+) and fragment ions, m/z 308.4+, 292.6+, and 203.7+ (each 5-10% with a declustering voltage of 20 V). The m/z 308.4+ fragment is the predominant product ion under most conditions. The MH+ ion efficiently fragments to m/z 308.4+ with 80% intensity (relative to m/z 455.6+) with 1.5 mTorr collision gas, but no offset potential. With the application of a small collision energy (5 V), m/z 308+ becomes the base peak and it remains so with collision energies


Table 4. Evaluating Avobenzone Candidates and Simulated Fragmentation to m/z 135 and 161 Fragment Ions

                                          observed fragment      observed fragment
                                          m/z 135.0412           m/z 161.0941
candidate     CAS No.      formula        one bond (c) (total)   one bond (c) (total)
cupreine      524-63-0     C19H22N2O2     0 (13)                 0 (13)
sarpagine     482-68-8     C19H22N2O2     0 (11)                 0 (11)
WGA (a)       466-85-3     C19H22N2O2     0 (3)                  0 (3)
DBHBT (b)     63147-28-4   C17H26O3S      0 (31)                 0 (31)
avobenzone    70356-09-1   C20H22O3       135.0446 (31)          161.0966 (31)
mepazine      60-89-9      C19H22N2S      0 (7)                  0 (7)

one + two bonds (d) (total), same entries for both observed fragments: 0 (86), 0 (61), 0 (59), 0 (166), 0 (92)

(a) Wieland-Gumlich aldehyde (curacurine or deacetyldiaboline). (b) 3,5-Di-tert-butyl-4-hydroxybenzylthioglycollate. (c) Simulated fragments from a single bond break. (d) Simulated fragments from one and/or two bond breaks.
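The comparison summarized in Table 4 reduces to matching each observed product ion against a candidate's simulated fragment masses within a ppm window. The sketch below is not the Mass Spec Calculator software: the avobenzone fragment masses are those listed in Table 4, while the second candidate's fragment list, the function names, and the 100 ppm window (chosen loosely because MS/MS mass accuracy is reduced) are assumptions made for illustration.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_fragments(observed, simulated, max_ppm):
    """Map each observed product ion to the simulated fragments within max_ppm,
    returned as (simulated mass, ppm error) tuples."""
    matches = {}
    for obs in observed:
        matches[obs] = [(mass, round(ppm_error(obs, mass), 1))
                        for mass in simulated
                        if abs(ppm_error(obs, mass)) <= max_ppm]
    return matches

observed = [135.0412, 161.0941]    # avobenzone product ions (Q-TOF)

# Simulated fragment masses per candidate. Avobenzone's two are from Table 4;
# the other list is a made-up stand-in for a candidate with no matching fragment.
simulated = {
    "avobenzone": [135.0446, 161.0966],
    "hypothetical candidate": [121.0653, 174.0919],
}

result = {name: match_fragments(observed, masses, 100)
          for name, masses in simulated.items()}
for name, matches in result.items():
    print(name, matches)
```

Only the avobenzone entry returns a match for both product ions, with errors close to the -25 and -15 ppm reported in the text.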

Table 5. Results for MTS Search MH+/455.2 ± 0.4 Da and Simulated Fragmentation Among Candidate Structures

                                                observed fragment m/z 308             observed fragment m/z 175
compound        formula          error (ppm)    one bond (a)   one + two bonds (b)    one bond (a)   one + two bonds (b)
                                                (total)        (total)                (total)        (total)
cefazolin       C14H14N8O4S3     799            0 (27)         0 (171)                0 (27)         0 (171)
saicar          C13H19N4O12P     703            0 (43)         0 (293)                0 (43)         0 (293)
verapamil       C27H38N2O4       241            0 (45)         0 (336)                0 (45)         1 (336)
ribostamycin    C17H34N4O10      364            0 (33)         0 (233)                0 (33)         0 (233)
methotrexate    C20H22N8O5       488            1 (37)         2 (260)                1 (37)         1 (260)
primisulfuron   C14H10F4N4O7S    820            0 (39)         0 (270)                0 (39)         2 (270)

(a) Simulated fragments from a single bond break. (b) Simulated fragments from one and/or two bond breaks.

of 10 and 20 V; under these fragmentation conditions, the m/z 175.6+ ion also is prominent. The MH+/455.2 ± 0.4 Da search retrieves six candidate structures shown in Table 5. In order to rank these candidates, the MS/MS fragmentation data are crucial. Fragmentation of the candidate structures was again simulated with the Mass Spec Calculator software. Verapamil, for example, yields a total of 45 possible fragments after breaking a single bond, but none has a mass of 308 or 175 Da (Table 5). In fact, none of the candidates except methotrexate is predicted to fragment to these ions; methotrexate yields both. Figure 7c depicts the proposed CID fragmentation. Using a quadrupole ion trap instrument and an MS4 experiment, the following fragmentation maps were established for methotrexate: 455 > 308 > 175 > 133 and 455 > 308 > 175 > 148.

Future Research and Development and Availability of the Database. We are continuing to develop the MTS Search Engine by expanding the content of the database and through technical improvements relating to spectral interpretation, search strategies, and database organization. Limitations to the database size have an obvious impact on reliability. A useful goal for environmental laboratories, for example, may be to incorporate records for the 2800 high production volume chemicals or the 87 000 commercially produced chemicals.12 Spectrum interpretation for special classes of compounds is important, as in the case of the quaternary amines. Interested laboratories are encouraged to request a test copy of the latest version of the MTS Search Engine database. We hope to improve the technique with input from other users and their experiences in solving challenging structure elucidation problems in API mass spectrometry and other mass spectrometry techniques.


CONCLUSIONS/SUMMARY

The identification of unknown substances is of fundamental importance in analytical chemistry and is necessary in many endeavors such as pharmaceutical and pesticide development, industrial chemistry, flavors and fragrances, forensic science, and environmental analytical chemistry, and it is critical to scientific research. Many regulatory analysis methods, such as those used in wastewater testing and drinking water compliance monitoring, are limited to determination of compounds specified as target analytes. Target compound analysis has advantages in that analytes have defined separation and detection characteristics and documented method performance. Expectations on instrumental analysis laboratories, however, have expanded in recent years, especially with the need to detect and identify toxic contaminants in drinking water, food, commodities, and consumer goods.16 High-profile examples of tainted products and harmful exposures to the public, companion animals, and livestock are familiar. Banned pesticides in food have been a long-standing concern, but recent cases involve obscure industrial chemicals and substances without established test methods or tolerances. Thus, broad-scope LC-MS screening methods for determination of unusual or unexpected contaminants are increasingly important. Here we demonstrate that rapid identification of global unknowns is feasible using the MTS Search Engine, an Access 2003 database containing mass data on ∼20 000 known organic compounds. This technique is useful for interpretation of both unit mass resolution and accurate mass spectral data. The paper also demonstrates the effective use of orthogonal information such as isotope ratios and fragmentation data in structure elucidation. Accurate mass data are valuable, but not essential, for identifying compounds by this technique. The MTS Search Engine is easy to use and rapid when compared to existing approaches, such as those based on intermediate discovery of empirical formulas, and provides needed complementary capability.

(16) Kemsley, J. Chem. Eng. News 2008, (May 12), 37-43.

ACKNOWLEDGMENT

We thank co-workers including Dadong Xu, Phil Hill, Dayananda Rajapaksa, and Michael McKinney of the California Department of Public Health. The reviewers anonymously provided useful technical information and valuable suggestions. This research was supported, in part, by the federal Centers for Disease Control and Prevention of the Department of Health and Human Services and the U.S. Environmental Protection Agency. Mention of trade names is incidental and does not constitute endorsement.

Received for review June 9, 2008. Accepted August 19, 2008. AC801166Z
