Anal. Chem. 2009, 81, 6656–6667
Integrated, Nontargeted Ultrahigh Performance Liquid Chromatography/Electrospray Ionization Tandem Mass Spectrometry Platform for the Identification and Relative Quantification of the Small-Molecule Complement of Biological Systems

Anne M. Evans,* Corey D. DeHaven, Tom Barrett, Matt Mitchell, and Eric Milgram

Metabolon, Incorporated, 800 Capitola Drive, Suite 1, Durham, North Carolina 27713

To address the challenges associated with metabolomics analyses, such as identification of chemical structures and elimination of experimental artifacts, we developed a platform that integrated the chemical analysis (including identification and relative quantification), data reduction, and quality assurance components of the process. The analytical platform incorporated two separate ultrahigh performance liquid chromatography/tandem mass spectrometry (UHPLC/MS/MS²) injections; one injection was optimized for basic species, and the other was optimized for acidic species. This approach permitted the detection of 339 small molecules in a total instrument analysis time of 24 min (two injections at 12 min each) while maintaining a median process variability of 9%. The resulting MS/MS² data were searched against an in-house generated authentic standard library that included retention time, molecular weight (m/z), preferred adducts, and in-source fragments, as well as their associated MS/MS spectra, for all molecules in the library. The library allowed the rapid and high-confidence identification of the experimentally detected molecules based on a multiparameter match without need for additional analyses. This integrated platform enabled the high-throughput collection and relative quantitative analysis of analytical data and identified a large number and broad spectrum of molecules with a high degree of confidence.
Metabolomics is “the nonbiased quantification and identification of all metabolites present in a biological system”, but in practice the term metabolomics is used in a rather broad sense and covers many different analytical methodologies.1 These range from global identification of as many metabolites as possible to targeted analysis of subsets of metabolites.2-4 Part of the ambiguity in the use of the term metabolomics is a result of the fact
* To whom correspondence should be addressed. E-mail: aevans@metabolon.com. Fax: 919-572-1721.
(1) Dunn, W. Phys. Biol. 2008, 5, 1.
(2) Lu, W.; Bennett, B. D.; Rabinowitz, J. D. J. Chromatogr., B 2008, 871, 236–242.
(3) Sabatine, M. S.; Liu, E.; Morrow, D. A.; Heller, E.; McCarroll, R.; Wiegand, R.; Berriz, G. F.; Roth, F. P.; Gerszten, R. E. Circulation 2005, 112, 3868–3875.
Analytical Chemistry, Vol. 81, No. 16, August 15, 2009
that truly nonbiased quantification and identification of all metabolites present in biological systems is currently not obtainable. So where does one draw the line at how many metabolites one must monitor and what constitutes a nonbiased analysis to be considered a metabolomics analysis? Regardless of one’s definition, a goal of any “metabolomics” analysis should be a greater understanding of the processes occurring as a result of some perturbation (e.g., disease, drug or toxin exposure, diet, or lifestyle habits).5,6 Since metabolites are critical to many cellular processes, the best way to accomplish this is to detect, quantify, and identify as many small molecules as possible.

There are many challenging aspects associated with creating a metabolomics platform. First, it is crucial to generate a quantitative analytical method that produces data with low process variability, because only such a method can measure small yet significant biological changes in experimental samples. The method must also be able to detect molecules spanning the chemical diversity and dynamic range common in biological systems, from small organic acids like those involved in the citric acid cycle to large lipids used for energy and membrane construction. A metabolomics method must also provide adequate information to permit the identification of detected molecules quickly and accurately. Unfortunately, even for the most comprehensive methods, identifying the detected small molecules is not trivial.7

Liquid chromatography coupled with mass spectrometry (LC/MS) has become a standard approach for many metabolomics analyses due to its ability to separate, ionize, and detect a wide range of chemicals.8-12 In addition, data obtained from LC/MS analyses contain chemically distinct information such as mass
(4) Sawada, Y.; Akiyama, K.; Sakata, A.; Kuwahara, A.; Otsuki, H.; Sakurai, T.; Saito, K.; Hirai, M. Y. Plant Cell Physiol. 2009, 50, 37–47.
(5) Goodacre, R.; Vaidyanathan, S.; Dunn, W. B.; Harrigan, G. G.; Kell, D. B. Trends Biotechnol. 2004, 22, 245–252. (6) Fiehn, O. Plant Mol. Biol. 2002, 48, 155–171. (7) Werner, E.; Heilier, J.; Ducruix, C.; Ezan, E.; Junot, C.; Tabet, J. J. Chromatogr., B 2008, 871, 143–163. (8) Nordstrom, A.; Want, E.; Northen, T.; Lehtio, J.; Siuzdak, G. Anal. Chem. 2008, 80, 421–429. (9) Patterson, A. D.; Li, H.; Eichler, G.; Krausz, K. W.; Weinstein, J. N.; Fornace, A. J.; Gonzalez, F. J.; Idle, J. R. Anal. Chem. 2008, 80, 665–674. (10) Wilson, I. D.; Nicholson, J. K.; Castro-Perez, J.; Granger, J. H.; Johnson, K. A.; Smith, B. W.; Plumb, R. S. J. Proteome Res. 2005, 4, 591–598. (11) Want, E.; Nordstrom, A.; Morita, H.; Siuzdak, G. J. Proteome Res. 2007, 6, 459–468. 10.1021/ac901536h CCC: $40.75 2009 American Chemical Society Published on Web 07/22/2009
to charge (m/z), retention time, and fragment ion spectra on the detected small molecules, which provides insight into chemical identity. LC/MS analyses also have the added benefit of being able to detect a wide range of small molecules without the need for derivatization, which is often needed in gas chromatography/ mass spectrometry (GC/MS) analyses.13-15 However, even with the improved range of chemical detection provided by LC/MS analyses, the chemical diversity is still so extreme that no single LC/MS method can detect all of the small molecules present in a biological sample, regardless of instrument sensitivity. There are also metabolomics analyses that utilize NMR due to its fundamental quantitative nature; however, these methods are restricted by the comparatively poor sensitivity and dynamic range of NMR.16 Fingerprint NMR analyses have been successful at being able to predict treatment/disease versus control groups based on pattern recognition; however, it is important to note that these analyses are based on the most abundant molecules present in the samples.17-20 To date, the most successful approaches to increase the breadth of small molecules measured in a sample have been to utilize and diversify LC/MS methods by varying ionization sources,8 monitoring positive and negative ions,21 varying chromatographic methods,22 and using ultrahigh performance liquid chromatography (UHPLC).10,23,24 Within LC/MS-based metabolomics studies there are different possible strategies toward the goal of detecting, quantifying, and identifying as many small molecules as possible. 
There are targeted methods, which are typically based on multiple reaction monitoring (MRM),2,25 and nontargeted methods, as well as combinations of these methods.26 MRM methods have the benefits of high sensitivity and large dynamic range measurements, in addition to eliminating the need for compound identification by selecting the metabolites to monitor.3,4 However, a significant drawback to this type of approach is that only those molecules that have been targeted are detected. It is possible to miss important chemical changes in a study because those (12) Wilson, I. D.; Plumb, R.; Granger, J.; Major, H.; Williams, R.; Lenz, E. M. J. Chromatogr., B 2005, 817, 67–76. (13) Timischl, B.; Dettmer, K.; Kaspar, H.; Thieme, M.; Oefner, P. J. Electrophoresis 2008, 29, 2203–2214. (14) Schauer, N.; Steinhauser, D.; Strelkov, S.; Schomburg, D.; Allison, G.; Moritz, T.; Lundgren, K.; Roessner-Tunali, U.; Forbes, M. G.; Willmitzer, L.; Fernie, A. R.; Kopka, J. FEBS Lett. 2005, 579, 1332–1337. (15) Issaq, H. J.; Abbott, E.; Veenstra, T. D. J. Sep. Sci. 2008, 31, 1936–1947. (16) Want, E.; Cravatt, B. F.; Siuzdak, G. ChemBioChem 2005, 6, 1941–1951. (17) Brindle, J. T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J. K.; Bethell, H. W.; Clarke, S.; Schofield, P. M.; McKilligin, E.; Mosedale, D. E.; Grainger, D. J. Nat. Med. 2002, 8, 1439–1444. (18) Clayton, T. A.; Lindon, J. C.; Cloarec, O.; Antti, H.; Charuel, C.; Hanton, G.; Provost, J.; Le Net, J.; Baker, D.; Walley, R. J.; Everett, J. R.; Nicholson, J. K. Nature 2006, 440, 1073–1077. (19) Griffin, J. L.; Shockcor, J. P. Nat. Rev. 2004, 4, 551–561. (20) Holmes, E.; Tsang, T. M.; Huang, J. T. J.; Leweke, F. M.; Koethe, D.; Gerth, C. W.; Nolden, B. M.; Gross, S.; Schreiber, D.; Nicholson, J. K.; Bahn, S. PLoS Med. 2006, 3, 1420–1428. (21) Tolstikov, V.; Fiehn, O. Anal. Biochem. 2002, 301, 298–307. (22) Van der Werf, M. J.; Overkamp, K. M.; Muilwijk, B.; Coulier, L.; Hankemeier, T. Anal. Biochem. 2007, 370, 17–25. 
(23) Nordstrom, A.; O’Maille, G.; Qin, C.; Siuzdak, G. Anal. Chem. 2006, 78, 3289–3295. (24) Plumb, R.; Castro-Perez, J.; Granger, J.; Beattie, I.; Joncour, K.; Wright, A. Rapid Commun. Mass Spectrom. 2004, 18, 2331–2337. (25) Kitteringham, N. R.; Jenkins, R. E.; Lane, C. S.; Elliott, V. L.; Park, B. K. J. Chromatogr., B 2009, 877, 1229–1239. (26) Woo, H. M.; Kim, K. M.; Choi, M. H.; Jung, B. H.; Lee, J.; Kong, G.; Nam, S. J.; Kim, S.; Bai, S. W.; Chung, B. C. Clin. Chim. Acta 2009, 400, 63–69.
chemicals were not targeted. Also, the number of MRM transitions that can be monitored in a single analysis is finite. In order to maintain quantitative quality the number of MRM transitions monitored must be low enough to permit adequate coverage of any given chromatographic peak. Therefore the number of MRM transitions that can be monitored is highly dependent on chromatographic peak width. As a result, these analyses often require multiple analyses of the same sample to cover all MRM transitions, and the more transitions monitored the longer the total analysis time. As an example, Sawada et al. published a targeted metabolomics method which utilized a MRM method. In this study, 497 plant metabolites were monitored. A single sample was analyzed 80 times, five MRM transitions per 3 min run, for a total of 240 min per sample analysis time. Sawada et al. detected 390 compounds with 378 being above signal-to-noise ratio (S/N) of 10. While perhaps the most comprehensive method to date and an impressive number of detected molecules, 4 h per sample is not practical for large studies.4 Most targeted MRM studies are not this extensive in either time of analysis per sample or in MRM transitions monitored. The advantages of targeted MRM analyses are significant; however, the length of analysis time and the possibility of missing significant biological changes must also be considered and are the fundamental reasons why nontargeted methods are implemented. Different strategies exist within nontargeted analyses as well. A common approach used for nontargeted metabolomics analyses is to analyze samples using an accurate mass MS and then process the detected ion features using a statistical method, e.g., principal component analysis (PCA), random forest, etc., to determine which ion features vary according to the study design. There are several significant drawbacks to this type of ion-centric approach. 
First, as a result of treating all ion features independently, without regard to their biochemical origin,7,21,27 it is possible to skew the statistical outcome by over-representation of a given biochemical. For example, if biochemical X produces three ion features, whereas biochemical Y produces 23, then biochemical Y, with 23 ions, will be over-represented and the statistical outcome may be skewed. In addition, this ion-centric approach suffers from an increase in the number of false discoveries due to the increased number of observations processed in the statistical analysis.28 Both of these drawbacks convolute and complicate data interpretation. In addition to the drawbacks related to statistics, this ion-centric approach can be time-consuming. This approach generally involves generating a list of potential molecular formulas from the accurate mass data and then searching those formulas against databases to generate a list of potential structural candidates.7,29,30 This approach to the identification process is time-consuming, as it often requires additional analyses7,9,13,29 and it often does not result in a definitive identification. The limitations of this approach to the identification process are rarely discussed1,7 and have led to the misconception that with accurate mass and MS/MS data most signals can be unambiguously identified. (27) Katajamaa, M.; Oresic, M. J. Chromatogr., A 2007, 1158, 318–328. (28) Broadhurst, D. I.; Kell, D. B. Metabolomics 2006, 2, 171–196. (29) Chen, J.; Zhao, X.; Fritsche, J.; Yin, P.; Schmitt-Kopplin, P.; Wang, W.; Lu, X.; Haring, H. U.; Schleicher, E. D.; Lenmann, R.; Xu, G. Anal. Chem. 2008, 80, 1280–1289. (30) Kind, T.; Fiehn, O. BMC Bioinf. 2007, 8, 105.
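The two statistical drawbacks of the ion-centric approach described above can be put into numbers with a short sketch. The 3-versus-23 ion example is taken from the text; the study size, the number of ions per biochemical, and the significance level in the second part are hypothetical values chosen only for illustration, and the function names are not from any published software.

```python
def feature_share(ions_per_biochemical: dict) -> dict:
    """Fraction of all tested ion features contributed by each biochemical."""
    total = sum(ions_per_biochemical.values())
    return {name: count / total for name, count in ions_per_biochemical.items()}


def expected_false_positives(n_features: int, alpha: float) -> float:
    """Expected number of false positives if no feature truly differs
    between groups and each feature is tested at significance level alpha."""
    return n_features * alpha


# Example from the text: biochemical X yields 3 ion features, Y yields 23.
shares = feature_share({"X": 3, "Y": 23})
print(f"Y contributes {shares['Y']:.0%} of the tested features")

# Hypothetical study (assumed numbers): 300 biochemicals, 7 ions each.
alpha = 0.05
one_ion_each = expected_false_positives(300, alpha)
all_ions = expected_false_positives(300 * 7, alpha)
print(f"expected false positives: {one_ion_each:.0f} vs {all_ions:.0f}")
```

Testing one representative ion per biochemical keeps the number of observations, and therefore the expected number of chance findings, proportional to the number of chemicals rather than the number of ions.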
Part of this misconception results from the assumption that with sufficient mass accuracy a molecule can be identified. This is a fallacy because even one chemical formula can have many valid chemical structures.31 Take, for example, the chemical formula of phenylalanine, C9H11NO2. This formula has over 6 million valid chemical structures as calculated using Molgen32 Online (http://molgen.de/?src=documents/molgenonline). Generally, the assumption is made that only those structures that have been documented are likely candidates, but even when only documented structures are considered, the molecular formula for phenylalanine has over 500 candidates as seen in ChemSpider (http://www.chemspider.com). Furthermore, using accurate mass data to identify candidates relies heavily on the completeness of the databases searched. Although databases such as KEGG, HMDB, MassBank, ChemACX, and ChemSpider contain millions of structures, they are far from containing all relevant structures. As an example, small molecules in biological systems are subject to phase I and II metabolism (glucuronidation, reduction, oxidation, sulfation, amino acid conjugation, etc.), and many of these modified small molecules are not covered in current databases. When these compounds are encountered experimentally they are much more difficult to identify accurately. In these cases searching the predetermined molecular formula will yield either no results or false positive results. To combat some of the ambiguity typical of this type of approach and refine the list of potential candidates, MSn spectra must generally be collected. Finally, to confirm identification a standard must be analyzed (if the standard is available for purchase). These additional analyses require additional instrument time and sample, which can be limited.
This entire identification process adds a great deal of time to the end of a study, and as a result, structure elucidation is only attempted for those molecules that show statistical significance. Although this process is the only route available when true unknown identification is needed, it is so inefficient that an alternative approach to identification is needed for a high-throughput metabolomics platform.

The method presented here represents an approach to address the challenges associated with metabolomics. It uses an in-house chemical library, generated from 1500 authentic standards. This library contains the retention time/index (RI), mass to charge (m/z), and MS/MS spectral data on all molecules present in the library, including their associated adducts, in-source fragments, and multimers, for a total of ∼10 000 recorded MS/MS spectra. This library allows rapid and high-confidence identifications of a compound’s molecular ion and its adducts without the need for further analyses in most cases. The platform includes software that highlights new chemical species not yet contained in the in-house library so that all detected small molecules, whether known or unknown, are documented and monitored. This approach ensures a truly nontargeted metabolomics analysis. The analytical method uses UHPLC to improve data quality, quantity, and speed and multiple LC/MS injections using different chromatographic conditions per sample to increase metabolome coverage. This method integrates both relative quantification (full scan MS) and qualitative capabilities (MS/MS)
(31) Kind, T.; Fiehn, O. BMC Bioinf. 2006, 7, 234.
(32) Gugisch, R.; Kerber, A.; Laue, R.; Meringer, M.; Rucker, C. MATCH 2007, 58, 239–280.
into each analysis without need for additional data acquisition. To illustrate the analytical improvements of this method relative to a conventional HPLC/MS platform, both methods were applied to the same sample set. The HPLC/MS analytical platform used a conventional HPLC that alternated ion polarity during a single analysis and did not collect MS/MS spectra. The results from the two methods are compared and discussed.

METHODS

Sample Material. Plasma samples were obtained from 38 Caucasian males, ages 18-40, with clinically defined metabolic syndrome (n = 19) and healthy age-matched controls (n = 19). Metadata on BMI (body mass index), blood pressure, HDL (high-density lipoprotein) levels, HOMA-IR (homeostasis model assessment of insulin resistance), triglyceride levels, fasting glucose, and insulin levels were documented for each individual in the cohort.

Sample Preparation. Prior to extraction, samples were stored at -80 °C. On the day of extraction, samples were thawed on ice. Proteins were precipitated from 100 µL of human plasma with methanol using an automated liquid handler (Hamilton LabStar). The methanol contained four standards, which permitted the monitoring of extraction efficiency. The precipitated extract was split into four aliquots and dried under nitrogen and then in vacuo. For the UHPLC method, one aliquot was reconstituted in 50 µL of 0.1% formic acid in water, and another aliquot was reconstituted in 50 µL of 6.5 mM ammonium bicarbonate in water, pH 8. For the conventional HPLC method an aliquot was reconstituted in 50 µL of 0.1% formic acid in 10% methanol. All reconstitution solvents contained instrument internal standards that were used to monitor instrument performance and as retention index markers. The remaining aliquot was stored dried for no longer than 2 days at -80 °C for rerun purposes, if necessary.
The extracts for both HPLC and UHPLC were analyzed on the same day using two separate linear trap quadrupole (LTQ) instruments, one equipped with a UHPLC and one with an HPLC. In addition to samples in the metabolic syndrome study, a pooled sample of human plasma was extracted five independent times per day per instrument. These samples, which were referred to as QC matrix samples (MTRX), served as technical replicates throughout the data set to assess process variability.33 In addition, 100 µL of water was extracted five independent times per day per instrument to serve as process blanks. Every sample analyzed was spiked with standards in order to monitor and evaluate instrument and extraction performance and align chromatograms. These standards were carefully chosen so as not to interfere with the measurement of endogenous species.

Instrument Analysis. UHPLC Analyses. The aliquots were separated using a Waters Acquity UPLC (Waters, Milford, MA). The extracts that were reconstituted in formic acid were gradient-eluted at 350 µL/min using (A) 0.1% formic acid in water and (B) 0.1% formic acid in methanol (0% B to 70% B in 4 min, 70-98% B in 0.5 min, 98% B for 0.9 min), whereas the extracts reconstituted in ammonium bicarbonate used (A) 6.5 mM ammonium bicarbonate in water, pH 8, and (B) 6.5 mM ammonium bicarbonate in 95/5 methanol/water (same gradient profile as above) also at 350
(33) Sangster, T.; Major, H.; Plumb, R.; Wilson, A. J.; Wilson, I. D. Analyst 2006, 131, 1075–1078.
µL/min. A 5 µL aliquot of sample was injected using 2× overfill and analyzed using an LTQ mass spectrometer (MS) (ThermoFisher Corporation) using electrospray ionization (ESI). The acidic extracts were monitored for positive ions, and the basic extracts were monitored for negative ions in independent injections using separate acid/base dedicated 2.1 mm × 100 mm Waters BEH C18 1.7 µm particle columns heated to 40 °C. The MS interface capillary was maintained at 350 °C, with a sheath gas flow of 40 (arbitrary units) and aux gas flow of 5 (arbitrary units) for both positive and negative injections. The spray voltage for the positive ion injection was 4.5 kV, and it was 3.75 kV for the negative ion injection. The instrument scanned 99-1000 m/z and alternated between MS and MS/MS scans. The scan speed was approximately six scans per second (three MS and three MS/MS scans). The MS scan had an ion-trap target of 2 × 10^4 (arbitrary units) and an ion-trap fill time cutoff of 200 ms. The MS/MS scan had an ion-trap target of 1 × 10^4 (arbitrary units) and an ion-trap fill time cutoff of 100 ms. MS/MS normalized collision energy was set to 40, activation Q 0.25, and activation time 30 ms, with a 3 m/z isolation window. MS/MS scans were collected using dynamic exclusion with an exclusion time of 3.5 s. Dynamic exclusion is a process in which, after an MS/MS scan of a specific m/z has been obtained, that m/z is placed on a temporary MS/MS exclude list for a user-set period of time, in this case 3.5 s (the approximate lifetime of a chromatographic peak). Dynamic exclusion allows greater MS/MS coverage of ions present in the MS scan because the instrument will not trigger an MS/MS scan of the same ion repeatedly.

HPLC Analyses. The aliquots were separated using a Surveyor MSplus pump (ThermoFisher) equipped with a 3 µm particle 2.1 mm × 100 mm Aquasil C18 (ThermoFisher) column.
These samples were gradient-eluted at 200 µL/min using (A) 0.1% formic acid in water and (B) 0.1% formic acid in methanol (0% B for 4 min, 0-50% B in 2 min, 50-80% B in 5 min, 80-100% B in 1 min, and 100% for 2 min). A 15 µL aliquot of sample was injected onto a 10 µL sample loop and was also analyzed using ESI LTQ MS. The instrument scanned 99-1500 m/z, alternated between positive and negative polarity within a given injection of a sample, and did not collect MS/MS data. Accurate Mass Analyses. In some instances it was necessary to obtain accurate mass MS data. In these instances a hybrid LTQ-Fourier transform ion cyclotron resonance (FTICR) MS (ThermoFisher) was implemented using the above stated chromatographic and ionization conditions. The instrument alternated scans between ion-trap MS data scanned 99-1000 m/z and FTICR MS data scanned 80-800 m/z with the mass resolving power set to 50 000. Mass accuracy was calculated based on internal standards to be 0.995) averages an LDR of 3-4 orders of magnitude and scans 6 scans/s (alternating full scan MS and MS/MS scans). It is important to note that the observed LDR is greatly influenced by the molecule’s ionization characteristics. For example, molecules that readily form adducts and multimers have much lower observed LDRs due to the
Figure 2. ESI+ single-ion chromatogram (SIC) for 132 m/z from a human plasma extract: (A and B) isoleucine and leucine are fully resolved by UHPLC; (C) isoleucine and leucine are not resolved by conventional HPLC.
Table 3. Comparison of Analytical Reproducibility (% RSD) Results of Pooled Human Plasma Technical Replicate Data for Compounds That Were Detected in 100% of the Replicate Samples(a)

platform    nonstandards median % RSD    standards median % RSD
HPLC        18 (n = 136)                 8 (n = 7)
UHPLC       9 (n = 252)                  5 (n = 20)

(a) The numbers of compounds used in the % RSD calculation are shown within the parentheses.
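The summary statistic reported in Table 3 can be illustrated with a minimal sketch. It assumes that % RSD is the sample standard deviation divided by the mean, times 100, and that a compound missing from any technical replicate is excluded, as the table title implies. The compound names and intensities are made up for illustration; the function names are hypothetical and not part of the platform software.

```python
from statistics import mean, median, stdev


def percent_rsd(values):
    """Relative standard deviation (%) of one compound across replicates."""
    return 100.0 * stdev(values) / mean(values)


def median_percent_rsd(replicates_by_compound):
    """Median % RSD over compounds detected in 100% of the replicates.

    A value of None marks a replicate in which the compound was not detected;
    such compounds are left out of the calculation.
    """
    complete = [
        values
        for values in replicates_by_compound.values()
        if all(v is not None for v in values)
    ]
    return median(percent_rsd(values) for values in complete)


# Made-up peak areas for three technical replicates of a pooled sample.
example = {
    "compound_1": [95.0, 100.0, 105.0],   # 5% RSD
    "compound_2": [90.0, 100.0, 110.0],   # 10% RSD
    "compound_3": [80.0, 100.0, 120.0],   # 20% RSD
    "compound_4": [50.0, None, 55.0],     # not detected in every replicate
}
print(round(median_percent_rsd(example), 1))
```

Using the median rather than the mean keeps a few poorly reproducible compounds from dominating the platform-level figure.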
percentage of the molecule being diverted into other ions. Molecules that solely produce a molecular ion have the largest LDR. Overall the ion-trap was used for its reproducibility, sensitivity, good dynamic range, and ability to perform MSn experiments. This new LC/MS method was designed to detect and quantify as many chemicals as possible, while maintaining data quality and sample throughput. The UHPLC system played a significant role in the achievement of these goals. Run time was reduced so significantly that two optimized injections could be implemented, which improved the number and variety of small molecules measured (i.e., metabolome coverage) while simultaneously
reducing the total run time as compared to the HPLC. Further, the improved chromatographic resolving power permitted the detection of more compounds and significantly lowered process variability.

Library Identification. Accurate metabolite identification from a complex mixture is perhaps one of the most challenging and time-consuming aspects of metabolomics studies. Correctly identifying a small molecule is paramount to accurate biochemical pathway interpretation. To accomplish the goal of creating a method that could quickly and accurately identify detected small molecules, an in-house chemical library was created using authentic chemical standards. This library contains the nominal mass, retention time, and MS/MS spectra of all the ions generated by the chemicals, including the molecular ion, adducts, in-source fragments, and multimers, in both negative and positive ion modes. This translates to over 10 000 recorded MS/MS spectra for 1500 compounds. The following case illustrates the usefulness of this library. In a set of three studies, a series of peaks were detected in the LC positive injection that shared the same nominal mass (also later determined by LTQ-FTICR to be the same accurate mass, 245.163 m/z ± 5 ppm) and strikingly similar
Figure 4. Venn diagram illustrating the ionization characteristics/ preferences of (A) the known/named molecules detected and (B) all molecules detected.
Figure 3. Peaks with the same nominal mass, 245.1 m/z ESI+ (and also later determined to have the same accurate mass 245.163 ± 5 ppm with an LTQ-FTICR MS) and similar ESI+ MS/MS spectra detected in three different matrixes: (A) 245.1 m/z SIC in human muscle tissue; (B) SIC IS tryptophan-d5 in human muscle tissue; (C) 245.1 m/z SIC in human urine; (D) SIC IS tryptophan-d5 in human urine; (E) 245.1 m/z SIC in human plasma; (F) SIC IS tryptophan-d5 in human plasma. Pivaloylcarnitine = a, 2-methylbutyrylcarnitine = b, isovalerylcarnitine = c, and valerylcarnitine = d.
MS/MS spectra. In human urine there were three peaks with this mass, in human plasma there were two peaks, and in human muscle tissue there were three peaks (Figure 3, parts A, C, and E). What compounds were in urine, in plasma, and in muscle tissue? In this case, neither mass (including accurate mass) nor MS/MS was able to differentiate these compounds. However, comparison to a library that included these factors in addition to retention time allowed the compounds to be quickly identified as follows: urine contained pivaloyl-, 2-methylbutyryl-, and isovalerylcarnitine; muscle tissue contained 2-methylbutyryl-, isovaleryl-, and valerylcarnitine; and human plasma contained 2-methylbutyryl- and isovalerylcarnitine. Interestingly, although a maximum of three peaks was found in any given matrix, the detected peaks were actually a combination of four different carnitines. Only a database that included all three criteria (retention time, mass, and tandem MS) would permit the rapid and accurate identification of these compounds without need for further analyses.

It is important to note that for a method to be truly nontargeted the identifications cannot be restricted to only those molecules present in the library. For that reason, at the completion of a study, after the data had been searched against the library and compound identifications had been made, a separate software program searched for ions that had not been associated with a library entry. These ions were then manually reviewed by an analyst to determine if they represented a new compound that was not
entered in the chemical library. If the compound had reproducible retention time, mass, and MS/MS spectra, then a new chemical entry was created and given a numerical designation that enabled that compound to be detected and documented in future experiments. As part of this analysis, an in-house software package38 looked for ions that displayed strong correlation across the study within specified chromatographic time windows. These highly correlated ion groups were inspected for isotope and adduct relationships and, where appropriate, grouped together. This correlation analysis permitted the proposed molecular ion and its isotopes and adducts to be grouped, even though the molecule was unknown. As a result of these procedures, the absence or presence of this new chemical could be reported and tracked, even though the identity of the chemical was unknown. This aspect of the data analysis ensures complete nontargeted coverage of all of the molecules present in the sample (above the limits of detection) without prior knowledge of their existence. Confidence in chemical identification was further improved with the use of the two independent separations (i.e., positive and negative injections). In a case where a compound was detected in both the positive and the negative injection, the compound had to match six independent variables: retention time on both the positive and negative injection, precursor mass on both the positive and negative injection, and MS/MS spectral match on both the positive and negative injection. The likelihood of another molecule sharing all six characteristics is low, thereby providing greater confidence in the proposed identification. In this study, 28% of the known (i.e., named) molecules were detected in both the positive and the negative injections (Figure 4). 
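The correlation analysis described above for unknown compounds can be sketched as follows. This is not the in-house software package: the Pearson correlation measure, the 0.9 threshold, the 0.05 min retention time window, and the example ions are all assumptions made for illustration. Ions are linked when they elute within the time window and their responses correlate strongly across the samples of a study; linked ions are then merged into groups that an analyst could inspect for isotope and adduct relationships.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def group_ions(ions, rt_window=0.05, min_r=0.9):
    """Group ions that co-elute and correlate across samples (union-find)."""
    parent = list(range(len(ions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(ions)):
        for j in range(i + 1, len(ions)):
            coelute = abs(ions[i]["rt"] - ions[j]["rt"]) <= rt_window
            if coelute and pearson(ions[i]["areas"], ions[j]["areas"]) >= min_r:
                parent[find(j)] = find(i)

    groups = {}
    for i, ion in enumerate(ions):
        groups.setdefault(find(i), []).append(ion["mz"])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ions: m/z, retention time (min), peak areas in four samples.
ions = [
    {"mz": 245.1, "rt": 2.00, "areas": [1.0, 2.0, 3.0, 4.0]},
    {"mz": 246.1, "rt": 2.01, "areas": [0.1, 0.2, 0.3, 0.41]},  # isotope-like
    {"mz": 267.1, "rt": 2.00, "areas": [2.0, 4.0, 6.1, 8.0]},   # adduct-like
    {"mz": 118.1, "rt": 2.02, "areas": [4.0, 3.0, 2.0, 1.0]},   # co-elutes, uncorrelated
    {"mz": 300.2, "rt": 5.00, "areas": [1.0, 2.0, 3.0, 4.0]},   # correlated, elutes later
]
print(group_ions(ions))
```

Requiring both co-elution and correlation keeps unrelated compounds apart even when their abundances happen to track each other across a study.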
Of note, the responses of compounds that were detected on both the positive and negative injections correlated with a median of 91% when positive and negative signals were compared, furthering confidence in the quantitative results.

For a database that incorporates chromatographic retention as one of its criteria to be used, chromatograms must be aligned to the same scale throughout the lifetime of the library and not just within a given study, which is all that most of today’s LC chromatographic alignment tools support.23,27,41 The current library uses a system used to align GC/MS data, a retention index.35,36 To our knowledge, using a retention index system to align LC/MS data has never been reported. A retention index aligns chromatograms by a process in which retention markers are placed throughout the chromatographic time window and all peaks are measured in relationship to these markers. Each RT marker is given a fixed RI value. As retention time shifts with time as a result of some systematic effect,
(41) Vorst, O.; de Vos, C. H. R.; Lommen, A.; Staps, R. V.; Visser, R. G. F.; Bino, R. J.; Hall, R. D. Metabolomics 2005, 1, 169–180.
Figure 5. MS/MS can distinguish two compounds with the same m/z. (A) Experimental ESI+ SIC of 118 m/z from human plasma extract. Two separate compounds were detected with the same m/z (later determined with an LTQ-FTICR MS to be the same mass within an instrument mass error of 5 ppm). (B) MS/MS spectrum of peak at 0.74 min (betaine). (C) MS/MS spectrum of peak at 1.09 min (valine).
the RI scale will remain the same. This system works exceedingly well for GC/MS analyses but is slightly complicated in LC/MS analyses, where compounds of varying classes can be affected differently as, for example, the column ages or the pH of the mobile phase changes with time. Since a compound’s RI is based on its retention as compared to its flanking retention markers, it is assumed that a compound will always elute in the same relative position to those markers. However, this is not always the case. The retention markers are isotopically labeled metabolites that have their own chemical behavior and characteristics. If the experimentally detected chemicals behave differently chromatographically from the retention markers as, for example, the column ages, the experimental chemical’s RI will drift. The experimentally detected chemicals that do show differing retention behaviors as compared to the markers are given larger RI windows. As an example, the standard RI window for chemicals in the library is 75; in cases where molecules shift in relation to the retention markers, their RI window might be as large as 250 (or approximately 15 s).

The authentic standard library included detailed information, including MS/MS spectra, on all of the ions produced by each given chemical, including isotopes, adducts (e.g., K and Na), multimers (e.g., (2m + H)+), and in-source fragments. On the basis of primary scan MS data from this database, a single biochemical produced an average of seven ions. However, the number of ions produced by a given biochemical could range from as few as one ion to as many as 43 (in extreme cases) depending on concentration and individual chemical behavior. By documenting the ion species a given chemical could form, as done in the library, when these ions were detected under experimental conditions they could be immediately linked back to their parent chemical. We term this type of approach chemo-centric. The focus with this type of approach is on the identification and removal of the redundant ion features produced by a single chemical prior to statistical analysis. As a result, a representative ion could be selected for statistical analysis, thereby limiting the drawbacks that are associated with the ion-centric approach, where all ion features are treated independently, namely, an increase in false discoveries and potential skewing of the statistical outcome.

MS/MS Spectral Matching. The field of metabolomics has a significant challenge in that small-molecule fragmentation varies between instrument types. For example, a triple-quadrupole MS/MS spectrum can look very different from an ion-trap MS/MS spectrum.42-44 For this reason MS/MS spectra of small molecules must be compared to experimentally obtained spectra. NIST is the largest example of a database of experimentally obtained spectra; however, as this database is constructed primarily of electron ionization (EI) spectra, it is not applicable to the ever-growing field of LC/MS/MS² users.14 There is currently a movement to put into place a public database of tandem MS spectra; however, the endeavor is complicated by these interinstrument differences in fragmentation spectra. In spite of this complication there are databases that contain MS/MS spectra, for example, HMDB, MassBank, Metlin, and LipidMaps;
(42) Bristow, A. W. T.; Webb, K. S.; Lubben, A. T.; Halket, J. Rapid Commun. Mass Spectrom. 2004, 18, 1447–1454.
(43) Jansen, R.; Lachatre, G.; Marquet, P. Clin. Biochem. 2005, 38, 362–372.
(44) Josephs, J.; Sanders, M. Rapid Commun. Mass Spectrom. 2004, 18, 743–759.
Analytical Chemistry, Vol. 81, No. 16, August 15, 2009
Figure 6. MS/MS spectra allow compounds with the same molecular formula and retention time to be differentiated. (A) MS/MS spectrum of anthranilic acid (138 m/z (m + H)+). (B) MS/MS spectrum of salicylamide (138 m/z (m + H)+). (C) Experimental SIC of 138 m/z ESI+. (D) Experimental MS/MS spectrum of the peak at 3.17 min indicates that the peak is anthranilic acid. The two compounds could not be distinguished by mass and retention time alone.
however, the number of MS/MS spectra is limited, and in most cases only the molecular ion is documented.35,44-47 Tandem MS provides a rich and diagnostic stream of data that gives direct information concerning the structural connectivity of a molecule. In many cases compounds with the same molecular formula can be differentiated by their MS/MS spectra. Valine and betaine, two common metabolites having the same m/z, can be readily differentiated based on MS/MS fragmentation (and, in this example, by retention time as well) (Figure 5). Identifications made using an MS/MS spectral library allow not only the confirmation of the compounds identified but also the immediate rejection of those compounds whose fragmentation pattern does not match the proposed identification. In this study, 16 compound identifications (data not shown) that were made based on mass and retention time alone using a conventional HPLC method were determined to be incorrect when MS/MS spectra were compared. For example, as shown in Figure 6, anthranilic acid and salicylamide were both possible identifications for a compound that could
(45) Moco, S.; Bino, R. J.; Vorst, O.; Verhoeven, H. A.; de Groot, J.; van Beek, T. A.; Vervoot, J.; Ric de Vos, C. H. Plant Physiol. 2006, 141, 1205–1218.
(46) Weinmann, W.; Gergov, M.; Goerner, M. Analusis 2000, 28, 934–941.
(47) Baumann, C.; Cintora, M. A.; Eichler, M.; Lifante, E.; Cooke, M.; Przyborowska, A.; Halket, J. M. Rapid Commun. Mass Spectrom. 2000, 14, 349–356.
not be distinguished by either mass or retention time. However, the experimental MS/MS scan revealed a large 120 m/z ion consistent with anthranilic acid. As a result it could be immediately concluded that anthranilic acid was present in the sample and salicylamide was not, without need for further analyses.
Artifact Removal. A commonly overlooked issue that leads to an increased false discovery rate is the presence of artifacts in metabolomic analyses. An artifact is defined as a chemical whose presence is attributable to the process rather than to the biological samples. Artifacts can include releasing agents present in plastic tubing or sample vials, solvent contaminants, polymers, etc. In the presented platform, artifacts were identified by analyzing the biochemicals present in a sample of water (i.e., QC process blank) analyzed concurrently with the experimental samples. Along with the MTRX samples, these process blanks were analyzed every seven injections. Biochemicals whose levels in the experimental samples were less than 3 times their levels in the process blanks were removed from the data set prior to any statistical analysis. In this study, 281 process artifacts were removed from the UHPLC data and over 400 from the HPLC data prior to statistical analysis.
Drawbacks of the Current Approach. The largest disadvantage of this type of approach is the expense, in both money and time, associated with creating and maintaining the in-house spectral library and software. All compounds for the in-house library had to be purchased from chemical vendors, analyzed on all platforms, and then manually reviewed and entered into the library by an analyst. While there is a significant time savings on the data processing and analysis side of every project completed, there is a significant expenditure of time in building the libraries and software that must also be taken into consideration.
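The process-blank criterion described under Artifact Removal can be illustrated with a short sketch. This is not the platform's software: the function name, the dictionary data layout, the compound names, and the intensity values are all hypothetical, and the use of the mean as the summary level is an assumption, since the text does not state how sample and blank levels were summarized. Compounds absent from the blanks are assumed here to have a blank level of zero.

```python
from statistics import mean


def filter_process_artifacts(sample_levels, blank_levels, min_ratio=3.0):
    """Split compounds into (kept, removed) using a sample-to-blank ratio.

    sample_levels, blank_levels: dicts mapping a compound name to a list of
    ion intensities (hypothetical layout, chosen for illustration only).
    A compound is removed when its mean sample level is less than
    min_ratio times its mean level in the process blanks.
    """
    kept, removed = [], []
    for compound, intensities in sample_levels.items():
        sample_mean = mean(intensities)
        blanks = blank_levels.get(compound, [])
        # Assumption: a compound never seen in a blank has blank level 0.
        blank_mean = mean(blanks) if blanks else 0.0
        if sample_mean < min_ratio * blank_mean:
            removed.append(compound)
        else:
            kept.append(compound)
    return kept, removed


# Made-up intensities for illustration; not data from this study.
samples = {
    "valine": [9.0e5, 1.1e6],
    "plasticizer_x": [2.0e4, 2.2e4],
    "betaine": [5.0e5, 6.0e5],
}
blanks = {
    "valine": [1.0e3],
    "plasticizer_x": [1.5e4],
}

kept, removed = filter_process_artifacts(samples, blanks)
print("kept:", kept)
print("removed:", removed)
```

In this made-up example the hypothetical "plasticizer_x" is removed because its sample level is only about 1.4 times its blank level, whereas valine and betaine pass the 3-fold threshold.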
Although metabolomics studies are often referred to as nontargeted, they are not unbiased. Bias is introduced in the choice of ionization technique, column phase, solvent, etc. The LC methods in this platform used ESI. Electrospray ionization allows detection only of those molecules that are able to retain a charge. Therefore, compounds such as cholesterol and carotenoids are not detectable by this method. The LC method also uses a C18 phase UHPLC column. While separation and integration of polar compounds are aided by the lack of methanol in the reconstitution buffer, many polar compounds still elute within the first minute of the chromatogram, leading to greater ion suppression in this region. As a result, only the most abundant and most efficiently ionized metabolites are detected in this region. Finally, this method is designed to obtain relative, not absolute, quantitative data. In many cases, such as biomarker validation and diagnostics, absolute quantification is necessary. In these applications, an additional absolute quantitation method would need to be utilized.
CONCLUSIONS
There is a breadth of challenges facing metabolomics, including compound identification, metabolome coverage, biological dynamic range, false discovery rate, and analytical reproducibility. Metabolomics holds solutions to many routinely encountered problems: unknown drug mechanism of action and toxicology, bioprocessing complications, need for disease state biomarkers,
and so forth.48-50 However, if these studies cannot be executed quickly and with the highest regard for quality, there is a danger that the field will not be utilized to its fullest potential. This platform permitted the rapid completion of metabolomics studies while still maintaining analytical quality and reproducibility. Much of the speed and efficiency associated with this platform was due to the ability to rapidly identify detected small molecules by way of the comprehensive library. This platform represents an approach to nontargeted metabolomics analysis that allows compounds, not just those that vary statistically, to be identified quickly and with a high degree of confidence, and that strives
(48) Sreekumar, A.; Poisson, L. M.; Rajendiran, T. M.; Khan, A. P.; Cao, Q.; Yu, J.; Laxman, B.; Mehra, R.; Lonigro, R. J.; Li, Y.; Nyati, M. K.; Ahsan, A.; Kalyana-Sundaram, S.; Han, B.; Cao, X.; Byun, J.; Omenn, G. S.; Ghosh, D.; Pennathur, S.; Alexander, D. C.; Berger, A.; Shuster, J. R.; Wei, J. T.; Varambally, S.; Beecher, C.; Chinnaiyan, A. M. Nature 2009, 457, 910–914.
(49) Lawton, K. A.; Beebe, K.; Berger, A.; Guo, L.; Rose, D.; Roulston, A.; Tsutsui, N.; Ryals, J. A.; Milburn, M. V. Computational and Systems Biology. Methods and Applications; Research Signpost: Kerala, India, 2009; Chapter 6.
(50) Lawton, K. A.; Berger, A.; Mitchell, M.; Milgram, K. E.; Evans, A. M.; Guo, L.; Hanson, R. W.; Kalhan, S. C.; Ryals, J. A.; Milburn, M. V. Pharmacogenomics 2008, 9, 383–397.
to improve data quality and reduce false leads to ultimately simplify and aid biological interpretation.
ACKNOWLEDGMENT
The authors thank the platform team at Metabolon, including John Oswald, Fred Hubbard, Danny Wright, Don Harvan, John Lennon, and Carolyn Amoretty for their assistance in extracting samples and acquiring MS data. We thank the IT and software development team including Sarada Tanikella, Hongping Dai, and Herb Lowe for their work in software development, creation, and optimization. We thank the Metabolon CEO and CSO, John Ryals and Mike Milburn, respectively, for their support and encouragement during method development.
SUPPORTING INFORMATION AVAILABLE
Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
Received for review April 3, 2009. Accepted July 10, 2009. AC901536H