Anal. Chem. 2010, 82, 9034–9042

Metabolite Identification in Synechococcus sp. PCC 7002 Using Untargeted Stable Isotope Assisted Metabolite Profiling

Richard Baran,† Benjamin P. Bowen,† Nicholas J. Bouskill,‡ Eoin L. Brodie,‡ Steven M. Yannone,† and Trent R. Northen*,†

Life Sciences Division and Earth Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States

Metabolite profiling using mass spectrometry provides an attractive approach for the interrogation of cellular metabolic capabilities. Untargeted metabolite profiling has the potential to identify numerous novel metabolites; however, de novo identification of metabolites from spectral features remains a challenge. Here we present an integrated workflow for metabolite identification using uniform stable isotope labeling. Metabolite profiling of cell and growth media extracts of unlabeled control, 15N-labeled, and 13C-labeled cultures of the cyanobacterium Synechococcus sp. PCC 7002 was performed using normal phase liquid chromatography coupled to mass spectrometry (LC-MS). Visualization of three-way comparisons of raw data sets highlighted characteristic labeling patterns for metabolites of biological origin, allowing exhaustive identification of corresponding spectral features. Additionally, unambiguous assignment of chemical formulas was greatly facilitated by the use of stable isotope labeling. Chemical formulas of metabolites responsible for redundant spectral features were determined, and fragmentation (MS/MS) spectra for these metabolites were collected. Analysis of acquired MS/MS spectra against spectral database records led to the identification of a number of metabolites absent not only from the reconstructed draft metabolic network of Synechococcus sp. PCC 7002 but also from databases of metabolism (MetaCyc or KEGG).
Metabolite profiling using mass spectrometry methods allows comprehensive analysis of metabolites in complex mixtures.1 These methods have successfully been applied in various contexts for functional genomics,2 providing an overall view of cellular metabolic capabilities and indicating the presence of specific metabolic pathways or enzymatic activities. Metabolite profiling using liquid chromatography coupled to high mass accuracy electrospray ionization mass spectrometry (LC-MS) generates large data sets usually containing on the order of thousands of spectral features.3 Identification of metabolites corresponding to spectral features remains a challenge.4 Chromatographic and spectral information from the analysis of chemical standards can be compared to features in sample data sets for metabolite identification.5 However, limited availability of chemical standards typically results in low coverage of spectral features in sample data sets.6,7

Spectral features which cannot be linked to those of the standards can be analyzed to identify metabolites de novo. Properties of spectral features such as accurate mass and isotopic profile, together with chemical rules and elemental ratios, can be used for the assignment of putative chemical formulas.4 Candidate chemical formulas can be further constrained by consistency with chemical formulas of corresponding fragment ions or neutral losses.8-10 Assignment of chemical formulas to spectral features applying the above rules is often possible for small molecules (Mr < 250) but becomes increasingly ambiguous with increasing molecular weight due to the large number of possible chemical formulas.11,12 Uniform labeling of cellular metabolites with stable isotopes has been accepted as a valuable aid for discriminating among alternative chemical formulas thanks to characteristic shifts of spectral features corresponding to the numbers of incorporated stable isotopes.13,14 It was shown that uniform labeling with stable isotopes discriminates among alternative chemical formulas of metabolites with higher molecular weights, providing a significant increase in coverage.11 This method provides additional advantages: it confirms the biological origin of metabolites (vs background) and allows relative quantification between samples.15,16

Chemical formulas alone are not sufficient for unambiguous identification of metabolites given the large numbers of possible structural isomers or stereoisomers matching any given chemical formula.12 Fragmentation patterns from tandem mass spectrometry analysis (MS/MS) provide additional structural information17-19 through comparison of patterns obtained for unknown ions with those of standards or structurally similar compounds.20 Structure-based computational prediction of MS/MS spectra can support identification of metabolites for which experimental MS/MS spectra of standards are not available.18,19,21-23

Microgreen algae such as the cyanobacteria Synechococcus and Prochlorococcus contribute significantly to the global carbon cycle.24 Yet much remains unknown about the metabolism of these organisms. Here we employed a novel integrated workflow for identifying metabolites in a model cyanobacterium, Synechococcus sp. PCC 7002, using uniform stable isotope labeling in combination with untargeted normal phase metabolite profiling and tandem mass spectrometry. Visualization of three-way comparisons of raw labeling data sets allowed exhaustive curation of the data to discriminate signals of biological origin from background, to assign chemical formulas to metabolites responsible for redundant spectral features (adducts and neutral loss products), and to select these metabolites for downstream MS/MS analysis. Comparison of the set of identified metabolites with the set of metabolites in a reconstructed draft metabolic network of Synechococcus sp. PCC 7002 showed a number of compounds absent not only from the draft metabolic network but unaccounted for even in the metabolism databases MetaCyc25 or KEGG.26

* To whom correspondence should be addressed. Phone: +1-510-486-5240. Fax: +1-510-486-4545. E-mail: [email protected].
† Life Sciences Division.
‡ Earth Sciences Division.

(1) Garcia, D. E.; Baidoo, E. E.; Benke, P. I.; Pingitore, F.; Tang, Y. J.; Villa, S.; Keasling, J. D. Curr. Opin. Microbiol. 2008, 11, 233–239.
(2) Baran, R.; Reindl, W.; Northen, T. R. Curr. Opin. Microbiol. 2009, 12, 547–552.
(3) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779–787.
(4) Kind, T.; Fiehn, O. BMC Bioinf. 2007, 8, 105.
(5) Lu, W.; Bennett, B. D.; Rabinowitz, J. D. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2008, 871, 236–242.
(6) Styczynski, M. P.; Moxley, J. F.; Tong, L. V.; Walther, J. L.; Jensen, K. L.; Stephanopoulos, G. N. Anal. Chem. 2007, 79, 966–973.
(7) Bowen, B. P.; Northen, T. R. J. Am. Soc. Mass Spectrom. 2010, 21, 1471–1476.
(8) Suzuki, S.; Ishii, T.; Yasuhara, A.; Sakai, S. Rapid Commun. Mass Spectrom. 2005, 19, 3500–3516.
(9) Kaufmann, A. Rapid Commun. Mass Spectrom. 2007, 21, 2003–2013.
(10) Böcker, S.; Rasche, F. Bioinformatics 2008, 24, i49–i55.
(11) Hegeman, A. D.; Schulte, C. F.; Cui, Q.; Lewis, I. A.; Huttlin, E. L.; Eghbalnia, H.; Harms, A. C.; Ulrich, E. L.; Markley, J. L.; Sussman, M. R. Anal. Chem. 2007, 79, 6912–6921.
(12) Matsuda, F.; Shinbo, Y.; Oikawa, A.; Hirai, M. Y.; Fiehn, O.; Kanaya, S.; Saito, K. PLoS One 2009, 4, e7490.
(13) Rodgers, R. P.; Blumer, E. N.; Hendrickson, C. L.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 2000, 11, 835–840.
(14) Birkemeyer, C.; Luedemann, A.; Wagner, C.; Erban, A.; Kopka, J. Trends Biotechnol. 2005, 23, 28–33.

10.1021/ac1020112 © 2010 American Chemical Society. Published on Web 10/14/2010
Analytical Chemistry, Vol. 82, No. 21, November 1, 2010

MATERIALS AND METHODS

Chemicals and Strains.
HPLC grade water, methanol, and acetonitrile (Honeywell), inorganic salts and vitamin B12 for culture media preparation, amino acid kit, 5-methyluridine (Sigma-Aldrich), [13C]sodium bicarbonate, [15N]sodium nitrate (Cambridge Isotope Laboratories), spongothymidine (MP Biomedicals), and Synechococcus sp. PCC 7002 (American Type Culture Collection, ATCC number 27264) were used.

(15) Giavalisco, P.; Köhl, K.; Hummel, J.; Seiwert, B.; Willmitzer, L. Anal. Chem. 2009, 81, 6546–6551.
(16) Feldberg, L.; Venger, I.; Malitsky, S.; Rogachev, I.; Aharoni, A. Anal. Chem. 2009, 81, 9257–9266.
(17) Tiller, P. R.; Yu, S.; Castro-Perez, J.; Fillgrove, K. L.; Baillie, T. A. Rapid Commun. Mass Spectrom. 2008, 22, 1053–1061.
(18) Leclercq, L.; Mortishire-Smith, R. J.; Huisman, M.; Cuyckens, F.; Hartshorn, M. J.; Hill, A. Rapid Commun. Mass Spectrom. 2009, 23, 39–50.
(19) Kertesz, T. M.; Hall, L. H.; Hill, D. W.; Grant, D. F. J. Am. Soc. Mass Spectrom. 2009, 20, 1759–1767.
(20) Benton, H. P.; Wong, D. M.; Trauger, S. A.; Siuzdak, G. Anal. Chem. 2008, 80, 6382–6389.
(21) Hill, A. W.; Mortishire-Smith, R. J. Rapid Commun. Mass Spectrom. 2005, 19, 3111–3118.
(22) Hill, D. W.; Kertesz, T. M.; Fontaine, D.; Friedman, R.; Grant, D. F. Anal. Chem. 2008, 80, 5574–5582.
(23) Heinonen, M.; Rantanen, A.; Mielikäinen, T.; Kokkonen, J.; Kiuru, J.; Ketola, R. A.; Rousu, J. Rapid Commun. Mass Spectrom. 2008, 22, 3043–3052.
(24) Scanlan, D. J.; West, N. J. FEMS Microbiol. Ecol. 2002, 40, 1–12.
(25) Caspi, R.; Altman, T.; Dale, J. M.; Dreher, K.; Fulcher, C. A.; Gilham, F.; Kaipa, P.; Karthikeyan, A. S.; Kothari, A.; Krummenacker, M.; Latendresse, M.; Mueller, L. A.; Paley, S.; Popescu, L.; Pujar, A.; Shearer, A. G.; Zhang, P.; Karp, P. D. Nucleic Acids Res. 2010, 38, D473–D479.
(26) Kanehisa, M.; Goto, S.; Furumichi, M.; Tanabe, M.; Hirakawa, M. Nucleic Acids Res. 2010, 38, D355–D360.

Culture Conditions. Synechococcus sp. PCC 7002 was grown in 1047 MN Marine medium (ATCC Medium 957) containing 10 mM HEPES (pH 8.2) and 2 mg/mL of sodium bicarbonate. [13C]sodium bicarbonate and [15N]sodium nitrate were used in the respective media for uniform stable isotope labeling. Cells were grown as 50 mL cultures in closed 50 mL centrifuge tubes in a 4080 Innova incubator at 33 °C, 200 rpm. Cultures were subjected to a 14/10 h light/dark cycle (Eiko metal halide light bulb, MH400/U) with a light intensity of approximately 3750 lx. The culture of Synechococcus sp. PCC 7002 obtained from ATCC was subcultured twice in standard Erlenmeyer flasks using standard 1047 MN Marine medium; cells were then pelleted, resuspended in one tenth of the original culture volume in 1047 MN Marine medium containing 10 mM glycerol and 5% (v/v) methanol, and aliquoted into vials to be stored under cryogenic conditions. Cells from a cryogenic vial were subcultured once before inoculation of the centrifuge tubes (the resulting optical density at 730 nm (OD730) was less than 0.008). OD730 = 1 is equivalent to (2.45 ± 0.12) × 10^8 cells mL-1.27 After reaching an OD730 of approximately 0.3, centrifuge tube cultures were subjected to metabolite extraction.

Metabolite Extraction. A volume of 2 mL of each culture was set aside for OD measurements. The remaining 48 mL were centrifuged at 4500g for 10 min. Supernatant was set aside for metabolite extraction of the media. The pellet was resuspended in 1 mL of cold methanol and sonicated for 20 s with a Branson Sonifier 250 (output 1, duty cycle 100%). The suspension was transferred to a 2 mL Eppendorf tube and centrifuged at 2350g for 10 min. Supernatant was dried down with a Savant SpeedVac Plus SC110A and redissolved in 100 µL (volume normalized according to culture OD) of methanol containing 1 µg mL-1 of 2-amino-3-bromo-5-methylbenzoic acid (ABMBA, Sigma-Aldrich) as an internal standard. A volume of 1.8 mL of the culture media supernatant was dried down and redissolved in 100 µL of methanol containing 1 µg mL-1 of ABMBA. Redissolved samples were stored overnight at 4 °C, filtered using 0.20 µm PVDF membrane microcentrifugal filters (National Scientific), and analyzed using LC-MS.

LC-MS Analysis. Samples were analyzed using an Agilent 1200 capillary LC system with an Agilent 6520 dual-ESI-Q-TOF mass spectrometer. A ZIC-HILIC column (150 mm × 1 mm, 3.5 µm 100 Å, Merck Sequant) was used for LC separation with the following LC conditions: solvent A, 5 mM ammonium acetate; solvent B, 90% acetonitrile with 5 mM ammonium acetate; timetable: 0 min, 100% B; 3 min, 100% B; 33 min, 0% B; 43 min, 0% B; 45 min, 100% B; 65 min, 100% B. Flow rate: 20 µL min-1. Injection volume: 1 µL for scan only MS profiling (MS1) and 4 µL for MS/MS analysis. MS/MS analysis was performed by specifying a preferred list of precursor ions based on the analysis of profiling data (collision energy, 10 eV). Three samples corresponding to cell extracts (control, 13C-labeled, and 15N-labeled cultures) along with three samples corresponding to growth media extracts (control, 13C, and 15N) were analyzed by LC-MS in both positive and negative mode. A total of 12 raw MS1 data sets were thus generated. MS/MS analysis was later performed on selected precursor ions on unlabeled control samples. MS/MS analysis of some precursor ions, the abundance of which was not sufficient in control samples, was performed with extracts from a different larger culture.

(27) Zhao, J.; Shen, G.; Bryant, D. A. Biochim. Biophys. Acta 2001, 1505, 248–257.

Data Analysis. Agilent MassHunter Workstation Software Qualitative Analysis (version B.03.01) was used to extract features from four control raw data sets (cell and media extracts in positive and negative mode). The following settings for feature extraction were applied: absolute height threshold, 500 counts; allowed ion species were + H+, + Na+, + K+ in positive mode and - H+ in negative mode; maximum charge state 2. Chemical formulas were assigned to extracted compounds (as referred to by MassHunter) by MassHunter’s formula generator. The resulting compound tables were exported and used as a basis for manual curation. Raw data sets for the corresponding labeling triplets (control, 13C, 15N) were preprocessed by the MathDAMP package28 (described and available via http://mathdamp.iab.keio.ac.jp/), and three-way comparison visualizations29 were generated. Annotation labels corresponding to elements of compound tables generated by MassHunter were overlaid on three-way comparison visualizations. Chemical formulas in compound tables generated by MassHunter were iteratively curated and corrected if necessary. Additional chemical formulas were added for labeling patterns without annotation labels on three-way comparison plots. Presence of corresponding spectral features in 13C and 15N data sets and consistency of accurate masses of these features (inspected using MassHunter software) was necessary for chemical formula confirmation. Features with confirmed chemical formulas along with features having clear labeling patterns but without confirmed chemical formulas (due to ambiguity) were clustered based on their retention times (RTs).
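The formula confirmation step relies on the mass shift between the unlabeled peak and its fully 13C- or 15N-labeled counterpart. The following is a minimal sketch of that reasoning, not the authors' implementation (the curation described above was done by visual inspection in MathDAMP and MassHunter). The function names, the example m/z values (calculated here for the [M + H]+ ion of adenosine, C10H13N5O4), and the second candidate formula are illustrative assumptions.

```python
import re

# Mass difference between the heavy and the light isotope (u).
DELTA_13C = 13.0033548378 - 12.0
DELTA_15N = 15.0001088982 - 14.0030740048


def parse_formula(formula):
    """Return element counts for a formula string such as 'C10H13N5O4'."""
    return {el: int(n) if n else 1
            for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


def atom_count(mz_unlabeled, mz_labeled, delta, charge=1):
    """Estimate how many atoms were replaced by the heavy isotope."""
    return round((mz_labeled - mz_unlabeled) * charge / delta)


def filter_candidates(candidates, n_carbon, n_nitrogen):
    """Keep candidate formulas whose C and N counts match the observed shifts."""
    kept = []
    for formula in candidates:
        counts = parse_formula(formula)
        if counts.get("C", 0) == n_carbon and counts.get("N", 0) == n_nitrogen:
            kept.append(formula)
    return kept


# Hypothetical example: the same feature in control, 13C, and 15N data sets.
mz_control, mz_13c, mz_15n = 268.1040, 278.1376, 273.0892
n_c = atom_count(mz_control, mz_13c, DELTA_13C)
n_n = atom_count(mz_control, mz_15n, DELTA_15N)

# The second candidate is an arbitrary alternative used only for illustration.
print(n_c, n_n, filter_candidates(["C10H13N5O4", "C11H17N3O5"], n_c, n_n))
```

Rounding to the nearest integer assumes near-complete labeling; partially labeled metabolites would need a tolerance on the shift instead.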
Redundant spectral features (e.g., adducts or fragments) corresponding to a single metabolite were identified with the help of this clustering and inspection of corresponding mass spectra in MassHunter. [M + H]+ peaks of putative metabolites (and their chemical formulas) responsible for redundant spectral features were then identified. Putative metabolites with identical chemical formulas and similar RTs in the four control data sets (cell and media extract in positive and negative mode) were cross-linked, and a table of putative metabolites was generated. The resulting table of putative metabolites was used to generate a list of preferred precursor ions for MS/MS analysis. MS/MS spectra corresponding to putative metabolites were extracted and analyzed against records in the MassBank database.30 Correspondence of major fragment ions between database records and MS/MS spectra from the samples was required for metabolite identification. For MS/MS spectra of metabolites without a match in the database, chemical formulas were calculated for fragment ions as well as corresponding neutral losses using MassHunter software. These product ions and neutral losses were then searched against the database to identify structurally related compounds and to support the elucidation of structures of unknown metabolites. The identity of some metabolites was assigned only putatively if structural isomers or stereoisomers could not be unambiguously discriminated.

(28) Baran, R.; Kochi, H.; Saito, N.; Suematsu, M.; Soga, T.; Nishioka, T.; Robert, M.; Tomita, M. BMC Bioinf. 2006, 7, 530.
(29) Baran, R.; Robert, M.; Suematsu, M.; Soga, T.; Tomita, M. BMC Bioinf. 2007, 8, 72.
(30) Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T. J. Mass Spectrom. 2010, 45, 703–714.

Draft Metabolic Network Reconstruction. The PathoLogic module of Pathway Tools software31 (version 13.0) was used for the draft metabolic network reconstruction of Synechococcus sp. PCC 7002. The reconstruction followed the workflow outlined in the User Guide of Pathway Tools. FASTA files with DNA sequences of genetic elements along with annotations of genetic elements of Synechococcus sp. PCC 7002 were downloaded from GenBank (Genome Project ID, 28247). The resulting pathway/genome database (PGDB) is available as Supporting Information.

Culture Purity Evaluation. The purity of cultures was thoroughly evaluated by a number of approaches. Uniform culture morphology was evaluated microscopically, and the presence of nonphotosynthetic heterotrophs was assayed by inoculation of the culture into rich media and incubation in the dark. The media used were LB media (LB Broth Miller, EMD), 5× diluted LB media, and MEM Alpha media (GIBCO). Finally, to confirm purity, nucleic acids were extracted from cultures and assayed by PCR for bacterial, fungal, and archaeal signatures, and bacterial amplicons were sequenced. Total nucleic acids were extracted from pelleted cultures (50 mL, OD730 approximately 0.5) according to a previously published protocol.32 Briefly, samples were resuspended in modified CTAB buffer (10% CTAB, 250 mM phosphate, 300 mM NaCl) and transferred to a lysing matrix E tube (MP Biomedicals), and an equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) was added.
Samples were agitated in the FastPrep bead beater (MP Biomedicals: 2 × 20 s, 5.5 m/s) and centrifuged (16 000g, 5 min, 4 °C). The aqueous phase was transferred to a new tube, an equal volume of chloroform-isoamyl alcohol (24:1) was added, and the solution was centrifuged again to yield a crude nucleic acid extract. Nucleic acids were precipitated with PEG-NaCl, and the crude nucleic acid pellet was washed in 70% ethanol. The pellet was dissolved in nuclease-free (DEPC-treated) water and purified using the DNA/RNA Allprep kit (Qiagen, Valencia, CA) following the manufacturer’s protocol. Purified DNA was eluted in buffer EB (2 × 30 µL) and used as genomic DNA for PCR amplifications. Cultures were screened for the presence of fungal and archaeal contaminants using primers amplifying regions of the 28S and 16S rRNA genes, respectively (primer sequences are presented in Figure S-1a in the Supporting Information). PCR amplifications were performed using Takara ExTaq DNA polymerase (Takara, Madison, WI), with the following thermocycling parameters: initial denaturation at 95 °C for 1 min followed by 25 cycles of 95 °C for 30 s, 58 °C for 30 s, and 72 °C for 2 min. Final product extension was at 72 °C for 7 min. Additionally, the V1 and V3 regions of the bacterial 16S rRNA gene were amplified with biotinylated primers (Figure S-1a in the Supporting Information) using the same thermocycling conditions as above except that the annealing temperature was decreased to 55 °C. Both regions were sequenced using the PyroMark system (PyroMark Q96 ID instrument, Qiagen, Valencia, CA) according to the manufacturer’s protocols, sequences were trimmed for quality, and the closest matching sequences were identified using BLASTn.33

(31) Karp, P. D.; Paley, S. M.; Krummenacker, M.; Latendresse, M.; Dale, J. M.; Lee, T. J.; Kaipa, P.; Gilham, F.; Spaulding, A.; Popescu, L.; Altman, T.; Paulsen, I.; Keseler, I. M.; Caspi, R. Brief. Bioinf. 2010, 11, 40–79.
(32) Ivanov, I. I.; Atarashi, K.; Manel, N.; Brodie, E. L.; Shima, T.; Karaoz, U.; Wei, D.; Goldfarb, K. C.; Santee, C. A.; Lynch, S. V.; Tanoue, T.; Imaoka, A.; Itoh, K.; Takeda, K.; Umesaki, Y.; Honda, K.; Littman, D. R. Cell 2009, 139, 485–498.

Figure 1. Curation of spectral features. Spectral features with labeling patterns found by MassHunter software (main pie charts) were complemented by features found manually during the curation process (detached elements) from unlabeled control data sets of cell and media extracts analyzed in positive (+) and negative (-) polarity modes.

RESULTS

Spectral feature extraction by MassHunter software from a raw data set corresponding to the analysis of the cell extract control sample in positive mode yielded 1259 feature sets (with m/z ≤ 820). Overlay of extracted features on the visualization of the three-way comparison of the control, 13C, and 15N data sets shows that only a small proportion of the extracted features corresponds to features with labeling patterns (Figure S-2 in the Supporting Information). Annotation labels without characteristic labeling patterns correspond to chemical background or noise. Multiple clear labeling patterns remained unannotated, either because the features correspond to adduct ions (e.g., [M + Na]+, [2M + H]+) that are part of a MassHunter extracted feature set or because the features were not detected by MassHunter software. Clear cases of features undetected by MassHunter are the ones which are chromatographically separated from other labeling patterns (Figure S-2 in the Supporting Information). Out of 946 chemical formulas assigned by MassHunter software to the above features, 94 with clear labeling patterns were confirmed, 44 corrected, and 45 added for unannotated labeling patterns (Figure 1a). Similar proportions were observed for data sets of cell extract analyzed in negative mode and growth media extracts analyzed in both positive and negative mode (Figure 1).
Growth media extract data sets contained significantly fewer features with labeling patterns compared to cell extracts (Figure 1). RT-based clustering of confirmed features with labeling patterns (Figure S-3 in the Supporting Information), identification of redundant spectral features, and cross-linking of putative metabolites across four analysis data set groups led to the finding of 82 distinct metabolites with assigned chemical formulas (Table 1). Chemical formulas could not be assigned to some chromatographically separated labeling patterns which correspond to distinct metabolites. This was either because of ambiguity between alternative chemical formulas (at higher m/z) or because no chemical formula consistent with the labeling pattern could be assigned to the metabolite (which probably contains unusual elements or forms unusual adduct ions). Additionally, some features with confirmed chemical formulas were not assigned as distinct metabolites if the features were not sufficiently distinguished from other confirmed metabolites either by chromatographic separation or elemental composition (Figure S-3 in the Supporting Information). Out of the 74 unique chemical formulas of the metabolites found, 45 did not correspond to any of the 805 metabolites from the reconstructed draft metabolic network of Synechococcus sp. PCC 7002 (Table 1). Moreover, 26 of the unique chemical formulas did not correspond to any of the metabolites in KEGG or MetaCyc databases (Table 1). Out of the 82 metabolites with assigned chemical formulas, 58 could be identified or putatively identified based on the RTs of standards or analysis of MS/MS spectra (Table 1). Evidence for the identification of some of these metabolites is described in more detail below. The overlay of confirmed spectral features on three-way comparison results along with confirmed chemical formulas and metabolite identifications is shown in Figure 2 and Figure S-4 in the Supporting Information.

Culture purity was confirmed by microscopy, rich-media cultivation, and biomarker sequencing, which yielded only bacterial PCR amplification products; these corresponded exactly to Synechococcus sp. PCC 7002 (Figure S-1b in the Supporting Information). Thus, contamination was eliminated as a possible source of unexpected metabolites.

(33) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. J. Mol. Biol. 1990, 215, 403–410.
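The RT-based clustering and the search for redundant features can be illustrated with a short sketch. The source does not state which clustering algorithm was used, so the gap-based (single-linkage) grouping below, the 0.1 min gap, the 0.005 m/z tolerance, and the feature values are all assumptions chosen for illustration. Within each cluster, pairs of features separated by the mass difference between Na and H are flagged as a possible [M + H]+ / [M + Na]+ pair.

```python
# Mass difference between [M + Na]+ and [M + H]+ (u): m(Na) - m(H).
NA_MINUS_H = 22.98976928 - 1.00782503


def cluster_by_rt(features, max_gap=0.1):
    """Group (rt, mz) features; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for feat in sorted(features):
        if clusters and feat[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(feat)
        else:
            clusters.append([feat])
    return clusters


def sodium_adduct_pairs(cluster, tol=0.005):
    """Return (mz_protonated, mz_sodiated) pairs found within one RT cluster."""
    pairs = []
    for _, mz_a in cluster:
        for _, mz_b in cluster:
            if abs((mz_b - mz_a) - NA_MINUS_H) <= tol:
                pairs.append((mz_a, mz_b))
    return pairs


# Hypothetical features as (retention time in min, m/z).
features = [
    (13.00, 150.0583), (13.04, 172.0402),
    (19.62, 190.0709), (19.66, 212.0528),
    (20.86, 417.1608),
]

for cluster in cluster_by_rt(features):
    print(cluster, sodium_adduct_pairs(cluster))
```

A fixed gap is the simplest possible choice; coeluting but distinct metabolites would still need the manual inspection of mass spectra that the text describes.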
Amino acids were identified in both the cell and media extract samples based on the correspondence of their RTs to those of standards, as well as matching of main product ions in the MS/MS spectra to database records (Table 1, Figure S-5aa-al in the Supporting Information). Various amino acid derivatives were identified based on the analysis of MS/MS spectra, including a series of histidine derivatives (Figure S-5am-au in the Supporting Information). γ-Glutamyl dipeptides with various amino acids were identified in cell extract samples (Figure 3 and Figure S-5av-bb in the Supporting Information). Identification of these dipeptides was not performed against standards but was based on fragmentation patterns characteristic

Table 1. Distinct Metabolites Found to Which Chemical Formulas Could Be Assigned^a

no. | RT (min) | neutral mass | neutral mass error (ppm) | formula | metabolite | formula matches in 7002 | MetaCyc | KEGG
1 | 6.90 ± 0.03 | 242.0806 | 0.9 | C12H10N4O2 | | 0 | 1 | 1
2 | 6.96 ± 0.06 | 122.0365 | -2.3 | C7H6O2 | benzoate | 1 | 6 | 6
3 | 7.36 ± 0.02 | 316.1405 | -5.7 | C17H20N2O4 | | 0 | 0 | 0
4 | 8.39 ± 0.01 | 281.1117 | -2.5 | C11H15N5O4 | methyladenosine | 0 | 1 | 2
5 | 8.85 ± 0.03 | 258.0831 | -8.1 | C10H14N2O6 | methyluridine | 0 | 0 | 2
6 | 9.62 ± 0.02 | 135.0539 | -4.4 | C5H5N5 | adenine | 2 | 2 | 1
7 | 10.03 ± 0.01 | 267.0952 | -5.8 | C10H13N5O4 | adenosine | 2 | 3 | 3
8 | 10.35 ± 0.08 | 233.0510 | -16.7 | C9H7N5O3 | | 0 | 0 | 0
9 | 11.49 ± 0.50 | 556.3825 | -2.0 | C28H52N4O7 | | 0 | 0 | 0
10 | 12.59 ± 0.58 | 139.0632 | -0.9 | C7H9NO2 | | 0 | 2 | 1
11 | 12.37 ± 0.36 | 285.3032 | 0.1 | C18H39NO | | 0 | 0 | 0
12 | 12.93 ± 0.04 | 242.1247 | -8.1 | C11H18N2O4 | | 0 | 0 | 0
13 | 13.03 ± 0.08 | 544.3462 | -1.9 | C26H48N4O8 | | 0 | 0 | 0
14 | 12.89 ± 0.13 | 301.2978 | -0.9 | C18H39NO2 | dihydrosphingosine | 0 | 1 | 1
15 | 13.13 ± 0.05 | 476.2480 | -0.5 | C20H36N4O9 | | 0 | 0 | 0
16 | 13.07 ± 0.18 | 315.3131 | -2.0 | C19H41NO2 | (dihydrosphingosine + CH2) | 0 | 0 | 0
17 | 13.67 ± 0.39 | 528.3495 | -5.3 | C26H48N4O7 | | 0 | 0 | 0
18 | 13.43 ± 0.02 | 297.1044 | -9.8 | C11H15N5O5 | methylguanosine | 0 | 3 | 1
19 | 14.29 ± 0.01 | 275.0992 | -4.7 | C11H17NO7 | (N-acetylmuramate - H2O) | 0 | 1 | 2
20 | 14.29 ± 0.02 | 354.1257 | 2.1 | C16H22N2O5S | | 0 | 0 | 0
21 | 14.65 ± 0.19 | 161.0683 | -3.2 | C6H11NO4 | | 1 | 5 | 9
22 | 14.80 ± 0.01 | 477.3292 | -2.0 | C24H47NO8 | (dihydrosphingosine + C6H8O6) | 0 | 0 | 0
23 | 15.33 ± 0.03 | 438.0904 | -8.3 | C17H19N4O8P | | 0 | 0 | 1
24 | 15.82 ± 0.00 | 444.1487 | -1.2 | C17H24N4O10 | | 0 | 0 | 0
25 | 15.93 ± 0.04 | 363.1070 | 1.0 | C16H17N3O7 | | 0 | 1 | 0
26 | 15.98 ± 0.02 | 111.0431 | -1.5 | C4H5N3O | cytosine | 1 | 1 | 1
27 | 16.18 ± 0.08 | 165.0783 | -4.1 | C9H11NO2 | phenylalanine | 1 | 4 | 4
28 | 16.68 ± 0.01 | 295.1268 | 0.3 | C11H21NO8 | (GlcNAc + glycerol - H2O) | 0 | 0 | 0
29 | 16.74 ± 0.07 | 131.0944 | -1.7 | C6H13NO2 | leucine | 2 | 8 | 11
30 | 16.82 ± 0.06 | 204.0884 | -7.2 | C11H12N2O2 | tryptophan | 1 | 2 | 7
31 | 17.22 ± 0.07 | 131.0944 | -1.7 | C6H13NO2 | isoleucine | 2 | 8 | 11
32 | 17.65 ± 0.11 | 149.0510 | -0.3 | C5H11NO2S | methionine | 1 | 5 | 4
33 | 17.67 ± 0.01 | 254.0993 | -3.4 | C9H18O8 | (glucosylglycerol) | 1 | 2 | 2
34 | 18.32 ± 0.04 | 181.0733 | -3.3 | C9H11NO3 | tyrosine | 1 | 8 | 5
35 | 18.49 ± 0.04 | 117.0787 | -2.4 | C5H11NO2 | valine | 1 | 8 | 10
36 | 18.56 ± 0.04 | 294.1199 | -5.7 | C14H18N2O5 | γ-Glu-Phe | 0 | 1 | 1
37 | 18.63 ± 0.05 | 129.0427 | 0.8 | C5H7NO3 | oxoproline | 1 | 6 | 5
38 | 18.72 ± 0.02 | 656.2228 | -7.3 | C25H40N2O18 | (hexos(amine)-based oligomer) | 0 | 0 | 0
39 | 18.76 ± 0.05 | 260.1359 | -5.1 | C11H20N2O5 | γ-Glu-Leu | 0 | 0 | 0
40 | 18.91 ± 0.06 | 219.1110 | 1.5 | C9H17NO5 | | 1 | 1 | 1
41 | 18.93 ± 0.03 | 260.1351 | -8.2 | C11H20N2O5 | (γ-Glu-Ile) | 0 | 0 | 0
42 | 19.04 ± 0.01 | 656.2223 | -8.1 | C25H40N2O18 | (hexos(amine)-based oligomer) | 0 | 0 | 0
43 | 19.14 ± 0.02 | 243.1040 | -0.6 | C10H17N3O2S | | 0 | 1 | 1
44 | 19.12 ± 0.03 | 333.1310 | -4.4 | C16H19N3O5 | γ-Glu-Trp | 0 | 0 | 0
45 | 19.16 ± 0.01 | 242.0180 | -4.8 | C6H11O8P | (hexosephosphate - H2O) | 0 | 1 | 2
46 | 19.18 ± 0.03 | 278.0923 | -4.8 | C10H18N2O5S | γ-Glu-Met | 0 | 0 | 0
47 | 19.38 ± 0.01 | 268.0784 | -3.9 | C9H16O9 | (glucosylglycerate) | 0 | 2 | 1
48 | 19.52 ± 0.03 | 277.9946 | -3.8 | C5H12O9P2 | | 1 | 1 | 1
49 | 19.51 ± 0.04 | 246.1222 | 2.5 | C10H18N2O5 | γ-Glu-Val | 0 | 0 | 0
50 | 19.62 ± 0.01 | 189.0631 | -3.3 | C7H11NO5 | acetylglutamate | 1 | 3 | 3
51 | 19.63 ± 0.00 | 243.0855 | -0.1 | C9H13N3O5 | (histidine + C3H6O4 - H2O) | 1 | 2 | 3
52 | 19.62 ± 0.06 | 310.1150 | -4.8 | C14H18N2O6 | γ-Glu-Tyr | 0 | 0 | 0
53 | 19.69 ± 0.01 | 342.1143 | -5.6 | C12H22O11 | (2Hexoses - H2O) | 2 | 26 | 29
54 | 19.76 ± 0.02 | 289.1287 | 4.5 | C11H19N3O6 | (γ-Glu-methylglutamine) | 0 | 0 | 0
55 | 19.80 ± 0.01 | 232.1057 | -1.0 | C9H16N2O5 | (Glu-aminobutyrate) | 0 | 3 | 4
56 | 19.85 ± 0.04 | 229.0887 | 0.9 | C9H15N3O2S | | 0 | 0 | 1
57 | 19.89 ± 0.02 | 343.0836 | -0.6 | C13H17N3O6S | | 0 | 0 | 0
58 | 19.92 ± 0.02 | 333.0458 | -4.9 | C9H12N5O7P | | 0 | 0 | 0
59 | 19.99 ± 0.08 | 160.0833 | -9.3 | C6H12N2O3 | (methylglutamine) | 1 | 3 | 4
60 | 19.93 ± 0.01 | 416.1502 | -6.7 | C15H28O13 | (2hexoses + glycerol - 2H2O) | 0 | 0 | 0
61 | 19.95 ± 0.03 | 218.0896 | -3.1 | C8H14N2O5 | (γ-Glu-Ala) | 0 | 3 | 1
62 | 19.95 ± 0.03 | 115.0632 | -1.1 | C5H9NO2 | proline | 1 | 4 | 3
63 | 20.04 ± 0.06 | 118.0267 | 0.8 | C4H6O4 | succinate | 1 | 4 | 3
64 | 20.09 ± 0.01 | 147.0531 | -0.4 | C5H9NO4 | glutamate | 3 | 9 | 10
65 | 20.86 ± 0.01 | 416.1535 | 1.2 | C15H28O13 | (2hexoses + glycerol - 2H2O) | 0 | 0 | 0
66 | 20.83 ± 0.06 | 89.0481 | 4.7 | C3H7NO2 | (alanine) | 4 | 7 | 8
67 | 20.99 ± 0.16 | 192.0251 | -9.9 | C6H8O7 | (citrate/isocitrate) | 2 | 10 | 9
68 | 20.95 ± 0.02 | 218.0893 | -4.5 | C8H14N2O5 | (Ala-Glu) | 0 | 3 | 1
69 | 21.21 ± 0.01 | 285.1436 | -0.4 | C11H19N5O4 | | 0 | 0 | 0
70 | 21.24 ± 0.06 | 160.0838 | -6.2 | C6H12N2O3 | (Ala-Ala) | 1 | 3 | 4
71 | 21.56 ± 0.01 | 229.0876 | -3.9 | C9H15N3O2S | (N,N,N-trimethylthiohistidine) | 0 | 0 | 1
72 | 21.49 ± 0.16 | 146.0686 | -3.7 | C5H10N2O3 | glutamine | 1 | 6 | 5
73 | 21.69 ± 0.06 | 132.0535 | 0.1 | C4H8N2O3 | asparagine | 3 | 7 | 6
74 | 21.76 ± 0.00 | 213.1113 | -0.2 | C9H15N3O3 | (N,N,N-trimethylhydroxyhistidine) | 0 | 0 | 2
75 | 21.95 ± 0.01 | 188.1151 | -5.3 | C8H16N2O3 | N2-acetyllysine | 1 | 5 | 4
76 | 21.91 ± 0.08 | 251.0983 | -8.8 | C9H17NO7 | | 0 | 0 | 1
77 | 22.00 ± 0.08 | 175.0949 | -4.5 | C6H13N3O3 | citrulline | 1 | 1 | 1
78 | 22.06 ± 0.02 | 89.0478 | 1.4 | C3H7NO2 | (sarcosine/β-Ala) | 4 | 7 | 8
79 | 22.05 ± 0.12 | 105.0430 | 3.9 | C3H7NO3 | serine | 1 | 3 | 3
80 | 22.89 ± 0.01 | 197.1156 | -4.2 | C9H15N3O2 | (N,N,N-trimethylhistidine) | 0 | 1 | 1
81 | 24.81 ± 0.10 | 303.1554 | 3.7 | C11H21N5O5 | (Glu-Arg) | 0 | 0 | 0
82 | 25.89 ± 0.06 | 299.1582 | -3.9 | C12H21N5O4 | | 0 | 0 | 0

Peak heights (columns: cell extract (+), cell extract (-), media extract (+), media extract (-)). The values are listed in source order because their row and column positions could not be recovered from the source layout:
Metabolites 1-71: 280; 3000 5215 310 2150 1498; 220 825 120 220; 480 2400 3988 570; 2595 500 3513 746 500 1000 450 2586; 250 570 350; 3599 791; 260 140; 580; 700 1359 1295 952 200 550 12 860 690 2089 1778 1200; 977; 8878; 24 417; 8259; 1992 2264 1301; 4093 929 4427 950; 1707 1532; 452 242 658 300 8259 8214 600 8517 5411 4507 4097 184 691 90 745 4736 1701 800 4672 2112 174 581 152 126 1830 4016 1929 1555 3995 11 525 4036 39 066 163 000 7416 53 520 10 973 22 110 31 690 19 715 7017 3648 19 819 83 700 2409 2032 9430 4845 808 1021 590 5391 2018 6034 2900 16 751 1400 6967 228 714 44 229 1752 3987 7325 1000 1643 2376 908 799 28 479 2192; 2479; 1500
Metabolites 72-82: 1424 1786; 850; 970; 2580 350; 500; 935; 1650; 1506 1300; 69 974 2444; 250; 830 330

a Names of metabolites are shown for ones which could be identified or putatively identified (in parentheses). The height of the [M + H]+ or [M - H]- peaks is listed for unlabeled control samples and analysis polarity modes where the respective metabolite was detected. Intensities of peaks found manually (not by MassHunter software) are rounded. RTs correspond to peaks of the metabolite in the first unlabeled control sample where it was detected. Standard deviations in RT from the corresponding peaks in the labeling data sets are also shown. Numbers of metabolites matching the chemical formulas in the draft metabolic network of Synechococcus sp. PCC 7002 (7002), MetaCyc, and KEGG databases are also listed. Names of these matching metabolites can be found in Figure S-3 in the Supporting Information.
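The neutral mass error column follows from the measured neutral mass and the monoisotopic mass of the assigned formula. A minimal sketch of that calculation is given below; the function names are illustrative, and the element masses are standard monoisotopic values restricted to the elements appearing in Table 1. The two example calls use the measured masses listed for benzoate (C7H6O2, 122.0365) and adenine (C5H5N5, 135.0539).

```python
import re

# Monoisotopic masses (u) of the elements occurring in Table 1.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a formula string such as 'C7H6O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured_mass, formula):
    """Relative deviation of the measured mass from the formula mass, in ppm."""
    theoretical = monoisotopic_mass(formula)
    return (measured_mass - theoretical) / theoretical * 1e6


print(round(ppm_error(122.0365, "C7H6O2"), 1))  # benzoate
print(round(ppm_error(135.0539, "C5H5N5"), 1))  # adenine
```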

for peptides.34 Glutamate is at the N-terminus of these dipeptides, as shown by characteristic patterns of y1 fragments of the second amino acid in positive mode fragmentation spectra (Figure 3a and Figure S-5av-bd in the Supporting Information). The occurrence of a prominent fragment ion with an m/z value of 128 in negative mode MS/MS spectra of these peptides (Figure 3b and Figure S-5av-bb in the Supporting Information) is indicative of a γ-glutamyl linkage.35 A γ-glutamyl linkage could not be confirmed for two dipeptides because negative mode MS/MS spectra of sufficient quality were not obtained (Figure S-5bc,bd in the Supporting Information). One of the detected dipeptides, containing glutamate and alanine (or an isomer of alanine), does not fit the above pattern and appears to have the glutamate residue at the C-terminus, as indicated by a characteristic y1 fragment of glutamate with an m/z value of 148 in the positive mode, the lack of a fragment with an m/z value of 128, and the characteristic neutral loss of CO2 in the negative mode35 (Figure S-5be in the Supporting Information). Alanyl-alanine, with a chemical formula identical to that of methylglutamine but different MS/MS spectra (Figure S-5ao,bf in the Supporting Information), was putatively identified by a y1 fragment of alanine in the negative mode with an m/z value of 88 (Figure S-5bf in the Supporting Information). γ-Glutamyl leucine and γ-glutamyl isoleucine were discriminated by extrapolating the chromatographic properties of leucine and isoleucine (leucine eluting before isoleucine) to these dipeptides. Analysis of MS/MS spectra led to the identification of a series of saccharide derivatives including a putative trisaccharide of a hexose, N-acetylglucosamine (GlcNAc), and an oxidized version of N-acetylmuramic acid (MurNAc, Figure S-5bg-bn in the Supporting Information).
Two chromatographically separated sets of spectral features correspond to this metabolite as well as to a metabolite which appears to be a condensation product of glycerol and two hexoses (Table 1 and Figure S-4 in the Supporting Information). Dehydrated versions of MurNAc and a hexose phosphate (HexP) were identified based on the correspondence of their MS/MS spectra to database records of MurNAc and HexP following a neutral loss of water (Figure S-5bm,bn in the Supporting Information). Spectral feature sets of these two metabolites did not contain peaks corresponding to either [MurNAc + H]+ or [HexP + H]+. Additional identified metabolites include nucleobases and nucleosides, organic acids, and dihydrosphingosine (Figure S-5bo-bu in the Supporting Information). A series of methylated nucleosides was detected predominantly in the culture media extracts (Table 1 and Figures S-6 and S-5br,bs in the Supporting Information). Negative mode MS/MS spectra of one of these metabolites contained neutral losses of CHNO and C3H6O3, which are characteristic of cytidine and uridine according to the MassBank database. The MS/MS spectra also contained a fragment with an m/z value of 125.0347, which could correspond to thymine. This information, along with the chemical formula of the metabolite (C10H14N2O6), pointed to a nucleoside of thymine with a pentose (instead of the deoxyribose in thymidine). There is no metabolite with a matching chemical formula in the MetaCyc database (Table 1). Two metabolites in the KEGG database match the chemical formula, one of them being spongothymidine, a nucleoside of thymine and arabinose (Supplementary Table 1 in the Supporting Information). Spongothymidine and 5-methyluridine (a nucleoside of thymine and ribose) were analyzed by LC-MS and showed RTs identical to that of the putative metabolite. Comparison of the MS/MS spectra of spongothymidine and the putative metabolite showed a discrepancy in the relative intensities of fragment ions (Figure S-6 in the Supporting Information). The MS/MS spectra of the putative metabolite were very close to those of 5-methyluridine (Figure S-6 in the Supporting Information).

(34) Johnson, R. S.; Martin, S. A.; Biemann, K. Int. J. Mass Spectrom. Ion Processes 1988, 86, 137–154.
(35) Harrison, A. G. J. Mass Spectrom. 2004, 39, 136–144.

DISCUSSION
Metabolite identification significantly benefited from the availability of stable isotope labeling data sets.
Chemical noise could be discriminated from metabolites, and chemical formulas could be assigned unambiguously to a larger set of spectral features with higher m/z values, which is consistent with previous observations.11 The apparently small proportion of distinct metabolites (82) with assigned chemical formulas relative to the large number of spectral features initially found by MassHunter in control data sets (>1000) is due to redundancy (fragments and adducts, e.g., the stack of features related to glucosylglycerol in Figure 2), chemical background and noise (features without labeling patterns, Figure S-2 in the Supporting Information), and difficulty in assigning formulas unambiguously to some features with high m/z values. The higher proportion of labeling features without assigned chemical formulas in negative mode data sets (Figure 1b,d) compared to positive mode data sets (Figure 1a,c) is likely a result of a higher proportion of peaks corresponding to ions with higher m/z values (>500) in negative vs positive mode data sets (38% and 32% vs 23% and 16%). Three-way comparison visualizations29 of the labeling data sets proved useful for the curation of the data, as the visualizations provide a convenient overall view of the data (Figure 2). Labeling with stable isotopes was not complete due to sub-100% purity of the stable isotope labeled substrates, carryover of unlabeled metabolites from the inoculum, and uptake of unlabeled CO2 from the headspace. This led to additional smaller peaks in labeling data sets (Figure 2). Nitrogen-containing metabolites are presented as characteristic patterns of red, blue, and green. Candidate chemical formulas overlaid on such visualizations can be readily checked against the labeling pattern for correct numbers of carbons and nitrogens as the first step in the curation process (Figure S-2 in the Supporting Information).

Figure 2. Overall view of the labeling data. Three-way comparison visualization of the data sets corresponding to the positive mode LC-MS analysis of cell extracts of the unlabeled control, 13C, and 15N-labeled cultures of Synechococcus sp. PCC 7002. Annotation labels of [M + H]+ ions of distinct metabolites are shown in black and bold, and labels corresponding to fragment, adduct, or unidentified ions are shown in gray. The inset shows the mass spectra corresponding to tryptophan. Dashed colored lines show the correspondence of signals between the mass spectra and the three-way comparison visualization. Only a part of the result is shown in this figure. Figure S-4 in the Supporting Information shows the full range along with visualizations of comparisons corresponding to negative mode analysis of cell extracts and positive and negative mode analysis of growth media extracts. The data were binned at 0.5 m/z units for visualization. Lanes with light gray background correspond to bins containing integer m/z values.
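Checking a candidate formula against the labeling pattern amounts to counting carbons and nitrogens from the m/z shifts between the unlabeled, 13C, and 15N data sets. The following is a minimal sketch under stated assumptions, not the procedure implemented in the study: it assumes singly charged ions and complete labeling, the function names are hypothetical, and the tryptophan m/z values are computed from its formula for illustration rather than taken from the measured data.

```python
# Mass differences between heavy and light isotopes (u), assumed standard values.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N

def atom_count(mz_unlabeled, mz_labeled, delta, charge=1):
    """Estimate the number of labeled atoms from the observed m/z shift."""
    return round((mz_labeled - mz_unlabeled) * charge / delta)

def consistent_with_labeling(candidate, mz_control, mz_13c, mz_15n):
    """True if the candidate's C and N counts match the observed shifts.

    candidate is a dict of element counts, e.g. {"C": 11, "H": 12, "N": 2, "O": 2}.
    """
    n_c = atom_count(mz_control, mz_13c, DELTA_13C)
    n_n = atom_count(mz_control, mz_15n, DELTA_15N)
    return candidate.get("C", 0) == n_c and candidate.get("N", 0) == n_n

# Illustrative [M + H]+ values for tryptophan (C11H12N2O2), computed, not measured.
mz_control, mz_13c, mz_15n = 205.0972, 216.1341, 207.0913

candidates = [
    {"C": 11, "H": 12, "N": 2, "O": 2},  # tryptophan
    {"C": 12, "H": 12, "O": 3},          # hypothetical competing formula
]
for candidate in candidates:
    print(candidate, consistent_with_labeling(candidate, mz_control, mz_13c, mz_15n))
```

With incomplete labeling, as described above, the most intense labeled peak may sit below the fully labeled m/z, so a real implementation would need to tolerate additional smaller peaks.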
MassHunter software was used for the finding of spectral features and assignment of candidate chemical formulas in the current study, but it can be replaced with open-source alternatives such as XCMS3 and HR2.4 Labeling patterns of metabolites without nitrogens should consist of two sets of green signals, but the signals for control and 15N data sets provide color gradients due to minor misalignments of peaks from these data sets (e.g., putative glucosylglycerol in Figure 1). Labeling patterns for metabolites containing equal numbers of carbons and nitrogens should consist of two sets of red signals, but the signals for 13C and 15N-labeled data sets are shifted toward purple due to slightly lower signal intensities in the 15N-labeled data sets (e.g., adenine in Figure S-4a in the Supporting Information). Peaks which do not change across the three labeling data sets should not provide any signals on the three-way comparison visualization. Noise or misalignments between background peaks (especially large ones such as HEPES at RT 20.27 min and m/z value of 239 in Figure 2) may lead to signals on the three-way comparison visualizations, but these signals are easily distinguishable from characteristic labeling patterns. The visualizations were key in confirming labeling patterns for features detected by external software (MassHunter) and in finding labeling patterns of additional undetected features (Figure 1 and Figure S-1 in the Supporting Information). Hierarchical clustering of confirmed features with labeling patterns according to RT (Figure S-3 in the Supporting Information), along with the inspection of labeling patterns on three-way comparison visualizations, helped to group features corresponding to a single metabolite.
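Grouping features by RT can be sketched in a few lines. The example below is an assumption-laden illustration, not the clustering used in the study: it replaces a full hierarchical clustering with its one-dimensional single-linkage equivalent (sorted features are split wherever the RT gap exceeds a threshold), the `max_gap` value is arbitrary, and the feature list is hypothetical.

```python
def group_by_rt(features, max_gap=0.05):
    """Group (mz, rt_minutes) features whose neighboring RTs differ by <= max_gap.

    In one dimension this is what cutting a single-linkage dendrogram
    at height max_gap would produce.
    """
    ordered = sorted(features, key=lambda feature: feature[1])
    groups = []
    for feature in ordered:
        if groups and feature[1] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(feature)   # close to the previous feature: same group
        else:
            groups.append([feature])     # gap too large: start a new group
    return groups

# Hypothetical features: (m/z, RT in minutes).
features = [
    (147.0764, 21.49),
    (133.0608, 21.69),
    (130.0499, 21.50),
    (116.0342, 21.70),
]
for group in group_by_rt(features):
    print(group)
```

RT alone cannot separate coeluting metabolites, which is why the text pairs the clustering with inspection of labeling patterns.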
Shifts in RTs of corresponding peaks in labeling data sets (probably due to small differences in analytical conditions across runs) for some metabolites (e.g., phenylalanine or leucine in Figure 2) or stability of RTs across labeling data sets (e.g., glutamate and acetylglutamate in Figure 2) were also indicative of correspondence of features. Assignment of fragment and adduct ions can also be supported by stable isotope labeling15 and is easily observable on three-way comparison visualizations. In-source fragmentation leads to the modification of both m/z and the labeling pattern, as seen for the common amino acid neutral losses of H2O and CO for phenylalanine or leucine, or of NH3 for tryptophan36 (Figure 2). Adduct formation leads to the modification of m/z without the modification of the labeling pattern, as seen for the ammonia adduct of putative glucosylglycerol (m/z 272 in Figure 2) or the acetate adduct of putative N,N,N-trimethylthiohistidine (m/z 288 in Figure S-4a in the Supporting Information). Three-way comparison visualizations can also help to identify detected peaks which do not belong to a distinct metabolite (false positives). For example, the peak corresponding to C4H5NO at RT 20.63 min with an m/z value of 83 (Figure 2) is chromatographically separated from other features (Figure S-3a in the Supporting Information). This peak most probably does not correspond to a distinct metabolite but is an artifact of a dip in the tail of a peak of a glutamate fragment ion. The dip is probably due to an ion suppression effect caused by large amounts of HEPES (RT 20.27). The ion suppression effect is visible on three-way comparison visualizations as dips in the tails of multiple ions in the elution time range of HEPES (Figure 2). Two metabolites were putatively identified as MurNAc missing H2O and a hexose phosphate also missing H2O (Figure S-5bm,bn in the Supporting Information). Absence of peaks corresponding to [MurNAc + H]+ and [HexP + H]+ in profiling (MS1) data sets indicates that either these metabolites are present in the cell extract, or the cell extract contains MurNAc and HexP which undergo a complete neutral loss of water during the ionization process. Alternatively, the detected ions could be fragments of even larger molecules. Detection of [M + H]+ or [M - H]- ions may not be possible for all metabolites due to their limited stability during either metabolite extraction or LC-MS analysis. These possibilities have to be taken into consideration during the interpretation of the data.

(36) Nagy, K.; Takáts, Z.; Pollreisz, F.; Szabó, T.; Vékey, K. Rapid Commun. Mass Spectrom. 2003, 17, 983–990.

Figure 3. γ-Glutamyl phenylalanine identification. MS/MS spectra of γ-glutamyl phenylalanine in positive (a) and negative (b) mode. Positive (c) and negative (d) mode MS/MS spectra of phenylalanine are shown for comparison to y1 fragments of γ-glutamyl phenylalanine. The dominant fragment peak with an m/z value of 128 in negative mode MS/MS spectra of γ-glutamyl phenylalanine (b) is indicative of a γ-glutamyl linkage.35 MS/MS spectra of additional identified γ-glutamyl dipeptides are shown in Figure S-5av-bb in the Supporting Information.
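The nominal m/z values quoted for adduct and y1 fragment ions can be reproduced from elemental compositions. The sketch below is illustrative and rests on assumptions that are not stated in the text: standard monoisotopic masses, a y1 ion treated as the protonated (or deprotonated) free C-terminal amino acid, and the formula C9H18O8 for glucosylglycerol (glucose plus glycerol minus water). All function names are hypothetical.

```python
# Monoisotopic masses (u), assumed standard values.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.007276    # mass of H+
ELECTRON = 0.000549  # electron mass

def neutral_mass(counts):
    """Monoisotopic mass from a dict of element counts."""
    return sum(MONO[element] * n for element, n in counts.items())

def protonated(mass):
    """m/z of [M + H]+; also the positive mode y1 ion of a free amino acid."""
    return mass + PROTON

def deprotonated(mass):
    """m/z of [M - H]-; also the negative mode y1 ion of a free amino acid."""
    return mass - PROTON

def ammonium_adduct(mass):
    """m/z of [M + NH4]+."""
    return mass + MONO["N"] + 4 * MONO["H"] - ELECTRON

glutamate = {"C": 5, "H": 9, "N": 1, "O": 4}
alanine = {"C": 3, "H": 7, "N": 1, "O": 2}
glucosylglycerol = {"C": 9, "H": 18, "O": 8}  # assumed formula

print(f"y1 of glutamate, positive mode: {protonated(neutral_mass(glutamate)):.4f}")
print(f"y1 of alanine, negative mode:   {deprotonated(neutral_mass(alanine)):.4f}")
print(f"[M + NH4]+ of glucosylglycerol: {ammonium_adduct(neutral_mass(glucosylglycerol)):.4f}")
```

The three results round to the nominal values 148, 88, and 272 given in the text. Because the ammonium ion in the adduct comes from the mobile phase rather than the labeled culture, the adduct keeps the carbon and nitrogen counts of the parent metabolite, whereas a fragment that loses carbon or nitrogen shows a correspondingly smaller labeling shift.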

Identification of metabolites using correspondence of m/z values of major fragments to database records (visual comparisons and fragment ion and neutral loss searches within MassBank) was validated for several compounds (i.e., amino acids, Figure S-5aa-al in the Supporting Information), the identity of which was also confirmed using RTs of chemical standards. In cases where an authentic standard was not available, a definitive assignment could not be made. However, spectral databases were extremely helpful in suggesting candidate metabolites. Correspondence of relative intensities of peaks of fragment ions could not be used for more discriminatory identification due to different analytical conditions, and probably also different instrument types, used for MS/MS spectra acquisition in this study and in the database. Excellent correspondence of m/z values of fragment ions was observed for multiple metabolites (e.g., acetylglutamate, Figure S-5an in the Supporting Information); in other cases, significant dissimilarity between fragment ions was used to rule out candidate metabolites from metabolite databases matching the confirmed chemical formula. For example, a metabolite with a chemical formula of C9H13N3O5 matches only cytidine in the draft metabolic network of Synechococcus sp. PCC 7002 and a few other metabolites in MetaCyc and KEGG (Figure S-3a in the Supporting Information). However, MS/MS spectra of this metabolite do not correspond to MS/MS spectra of cytidine but contain fragment ions and neutral losses characteristic of histidine (Figure S-5ar in the Supporting Information). The metabolite was therefore putatively identified as a histidine derivative, potentially a product of condensation with glycerate. Correspondence of m/z values of fragment peaks, without taking their relative intensities into account, is not sufficient to discriminate between stereoisomers and often also positional isomers37,38 (Figure S-6 in the Supporting Information).
Therefore, this information is not included in the names of identified metabolites (e.g., stereoisomers of amino acids, structural isomers of alanine, coeluting citrate and isocitrate, or positions of the methyl group on the nucleobases in methylated nucleosides). Spectral features with the highest signal intensities in profiling (MS1) data sets correspond to glucosylglycerol and glucosylglycerate, known abundant compatible solutes,39-42 glutamate, known to be abundant in bacteria,43,44 and putative trisaccharides of a hexose, GlcNAc, and an oxidized version of MurNAc (potentially N-acetylglucosamine enolpyruvate; Figure S-5bl in the Supporting Information and Table 1). Additional metabolites with intense peaks in the data sets include a putative betaine of histidine (hercynine) and a putative thiol of this metabolite (possibly ergothioneine, Table 1, Figure S-5as,at in the Supporting Information). However, biosynthesis of hercynine and ergothioneine has so far only been attributed to non-yeast fungi and Actinomycetales bacteria.45-47 These metabolites were detected in multiple independent experiments where culture purity was verified as outlined in the Materials and Methods section, indicating biosynthesis by Synechococcus sp. PCC 7002. Identification of a putative hydroxy derivative of hercynine, which is not implicated in ergothioneine biosynthesis in fungi or Actinomycetales,48,49 may point to a different biosynthesis of these metabolites by Synechococcus sp. PCC 7002. γ-Glutamyl dipeptides with various amino acids were another class of unexpected metabolites not covered by MetaCyc and KEGG. γ-Glutamylation is known to increase the solubility of nonpolar amino acids.50 In mammals, γ-glutamyl dipeptides were detected in urine,51 brain,52 and plasma53 and are formed during cellular import of amino acids.54

(37) March, R. E.; Stadey, C. J. Rapid Commun. Mass Spectrom. 2005, 19, 805–812.
(38) Berman, E. S. F.; Kulp, K. S.; Knize, M. G.; Wu, L.; Nelson, E. J.; Nelson, D. O.; Wu, K. J. Anal. Chem. 2006, 78, 6497–6503.
(39) Kollman, V. H.; Hanners, J. L.; London, R. E.; Adame, E. G.; Walker, T. E. Carbohydr. Res. 1979, 73, 193–202.
(40) Borowitzka, L. J.; Demmerle, S.; Mackay, M. A.; Norton, R. S. Science 1980, 210, 650–651.
(41) Roberts, M. F. Saline Syst. 2005, 1, 5.
(42) Klähn, S.; Steglich, C.; Hess, W. R.; Hagemann, M. Environ. Microbiol. 2010, 12, 83–94.
(43) Poolman, B.; Glaasker, E. Mol. Microbiol. 1998, 29, 397–407.
(44) Bennett, B. D.; Kimball, E. H.; Gao, M.; Osterhout, R.; Dien, S. J. V.; Rabinowitz, J. D. Nat. Chem. Biol. 2009, 5, 593–599.
(45) Genghof, D. S. J. Bacteriol. 1970, 103, 475–478.
(46) Fahey, R. C. Annu. Rev. Microbiol. 2001, 55, 333–356.
(47) Ey, J.; Schömig, E.; Taubert, D. J. Agric. Food Chem. 2007, 55, 6466–6474.
(48) Ishikawa, Y.; Israel, S. E.; Melville, D. B. J. Biol. Chem. 1974, 249, 4420–4427.
(49) Seebeck, F. P. J. Am. Chem. Soc. 2010, 132, 6632–6633.
(50) Suzuki, H.; Yamada, C.; Kato, K. Amino Acids 2007, 32, 333–340.
(51) Buchanan, D. L.; Haley, E. E.; Markiw, R. T. Biochemistry 1962, 1, 612–620.
(52) Reichelt, K. L. J. Neurochem. 1970, 17, 19–25.
(53) Wikoff, W. R.; Kalisak, E.; Trauger, S.; Manchester, M.; Siuzdak, G. J. Proteome Res. 2009, 8, 3578–3587.
(54) Orlowski, M.; Meister, A. Proc. Natl. Acad. Sci. U.S.A. 1970, 67, 1248–1255.


CONCLUSIONS
In this work, a novel workflow for metabolite identification in untargeted metabolomics was demonstrated. Metabolites were identified using uniform stable isotope labeling, exhaustive identification of labeling patterns on three-way comparison visualizations of raw data, and analysis of MS/MS spectra. A significant proportion of the identified metabolites lies outside of the reconstructed draft metabolic network of Synechococcus sp. PCC 7002 and points to unannotated metabolic capabilities or pathways. These metabolites are condensation products or derivatives of known metabolites. However, the extensive data curation carried out manually in this study presents a significant bottleneck, and it is therefore highly desirable to increase the level of automation of the data analysis procedure. Signal intensities of spectral features of the metabolites found spanned the whole dynamic range of the mass spectrometer. In spite of this, only a relatively small number of metabolites were found. The surprising absence of phosphorylated nucleosides and coenzymes from our results may be due to low levels in the studied physiological state or to limitations of the applied metabolite extraction protocol. Analysis of more concentrated extracts may lead to the identification of a wider set of metabolites, the levels of which were below the detection limit in the current study. Genetic or environmental perturbations may highlight an even wider set of metabolites. Overall, the workflow for metabolite identification presented here is suitable for such comprehensive studies, and the implementation of stable isotope assisted metabolite profiling was well suited for the study of cyanobacteria and may enable quantification of identified metabolites in the future.

ACKNOWLEDGMENT
We thank Aindrila Mukhopadhyay for valuable discussions, Justin P. Ishida for help with the microbial culture, and Bin Yoo for technical help. This work was part of the U.S. Department of Energy Genomics Sciences program. ENIGMA is a Scientific Focus Area Program supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomics, GTL Foundational Science through Contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy.

SUPPORTING INFORMATION AVAILABLE
Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review July 29, 2010. Accepted September 28, 2010. AC1020112