
Article

Coupling targeted and untargeted mass spectrometry for metabolome-microbiome-wide association studies of human fecal samples
Alexey V. Melnik, Pieter C. Dorrestein, Ricardo R. da Silva, Embriette R. Hyde, Alexander A. Aksenov, Fernando Vargas, Amina Bouslimani, Ivan Protsyuk, Alan Jarmusch, Anupriya Tripathi, Theodore Alexandrov, and Rob Knight
Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 19 Jun 2017
Downloaded from http://pubs.acs.org on June 20, 2017


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Coupling targeted and untargeted mass spectrometry for metabolome-microbiome-wide association studies of human fecal samples

Alexey V. Melnik,1 Ricardo R. da Silva,1 Embriette R. Hyde,2 Alexander A. Aksenov,1 Fernando Vargas,1 Amina Bouslimani,1 Ivan Protsyuk,4 Alan K. Jarmusch,1 Anupriya Tripathi,1,2,5 Theodore Alexandrov,1,4,6 Rob Knight,2,3,7 Pieter C. Dorrestein1,8

1 Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA.
2 Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA.
3 Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA 92093, USA.
4 Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany.
5 UC San Diego Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA 92093, USA.
6 To whom correspondence should be addressed regarding the Optimus and MRMConvert software: [email protected]
7 To whom correspondence should be addressed regarding the sequencing: [email protected]
8 To whom correspondence should be addressed regarding the mass spectrometry, the project and interpretation: [email protected]

ABSTRACT: Increasing appreciation of the gut microbiome’s role in health motivates understanding the molecular composition of human feces. To analyze such complex samples, we developed a platform coupling targeted and untargeted metabolomics. The approach is facilitated through split flow from one UPLC, joint timing triggered by contact closure relays, and a script to retrieve the data. It is designed to detect specific metabolites of interest with high sensitivity, allows for correction of targeted information, and enables better quantitation, thus providing an advanced analytical tool for exploratory studies. Procrustes analysis revealed that the untargeted approach provides a better correlation to microbiome data, associating specific metabolites with microbes that produce or process them. With a subset of over one hundred human fecal samples from the American Gut project, the implementation of the described coupled workflow revealed that targeted analysis using a combination of a single transition per compound with retention time misidentifies 30% of the targeted data and could lead to incorrect interpretations. At the same time, the targeted analysis extends detection limits and dynamic range, depending on the compounds, by orders of magnitude. A software application has been developed as a part of the workflow to allow for quantitative assessments based on calibration curves. Using this approach, we detect expected microbially modified molecules such as secondary bile acids, and unexpected microbial molecules including Pseudomonas-associated quinolones and rhamnolipids in feces, setting the stage for MMWAS (metabolome-microbiome-wide association studies).

Recent studies show that an imbalance of our microbial ecosystems gives rise to significant health problems ranging from cancer, asthma, allergies, infections, and weight gain to neurological disorders such as Parkinson’s disease, autism, and depression.1–4 In addition, the gut microbiome might be as important to drug metabolism as the liver.5,6 Many of these microbiome-associated ailments were discovered by understanding the fecal microbiome through association studies utilizing microbial marker gene or whole genome sequencing data.7,8 Although these studies contributed to an important knowledge base, we still have limited insight into the molecules associated with the human fecal microbial niche. Yet, this compilation of molecules likely defines the type of microbial communities a specific gut can support.

Most fecal metabolomics studies related to the microbiome are performed through targeted analysis of known microbiome-associated molecules such as short chain fatty acids, secondary bile acids, and serotonin, and in model systems such as mice.9,10 With murine models, one can control diet and environment as key factors that influence the complexity of the fecal metabolome. With humans it is more difficult to control these and other factors and, therefore, the diversity of molecules that make up the fecal metabolome is vastly greater. To illustrate the complexity associated with human fecal samples from an analytical standpoint, the molecules in the gut originate from ingested, absorbed, or inhaled materials such as food, drinks, and smoke, but also from the human host or its microbes, host- or microbially modified molecules, personal care and hygiene products, clothing, preservative and packaging materials, or anything else that we may be exposed to in our homes, offices, parks, and other places we visit throughout the day.

Mass spectrometric analysis is the most common tool used for the study of complex systems and can be performed using targeted or untargeted methods. Untargeted methods focus on comprehensive profiling of all detectable analytes in the sample of interest. Targeted methods, on the other hand, aim to detect a panel of specific predefined targets.11,12 Each approach has its own unique advantages. Targeted mass spectrometry tends to be more sensitive and more quantitative; however, only a limited number of predefined molecules can be detected in a single MS run. In an untargeted experiment, on the other hand, the investigator can get an inventory of molecules detected by the mass spectrometer and, therefore, possibly discover novel molecules or molecules not expected to be present. This is certainly possible for fecal samples, for which the extent of chemical diversity is unknown. Furthermore, on average, only 2% of all metabolomics data in untargeted liquid chromatography-based mass spectrometry experiments have been annotated.13,14 Although one can perform independent targeted and untargeted experiments, it is difficult to integrate the data, and there is always a certain level of uncertainty regarding whether the sample that went into the targeted workflow was indeed identical to the sample that went into the untargeted workflow, or whether an ion observed in one mode corresponds to the same ion in the other. Coupling both modes to a single separation achieves some of the same goals as the non-targeted post-acquisition approach, which takes an MS1 scan and, when an ion on the list is detected, fragments it to enable list-dependent MS/MS matches that validate the annotation.
While the post-acquisition strategy provides quality assurance for the listed ions when they are observed, which is also a major objective of coupling two separate instruments to one chromatography system, it is ultimately limited by the limit of detection and dynamic range achievable in a full MS1 scan compared to a strategy that includes a triple quadrupole, and it does not enable the discovery of unexpected molecules.15–17 To address these challenges, we describe a workflow designed for same-sample untargeted and targeted metabolomics with two designated standalone MS instruments. Using fecal samples collected from over 100 individuals, we not only highlight the aspects of the workflow that needed to be developed, but also assess the capabilities and performance of this methodology by exploring the chemical composition of subsets of samples in relationship to the microbial inventories as determined via sequencing. This work demonstrates how the single-sample dual metabolomics analysis approach can be used for microbiome-wide association studies, a key step required for functional understanding of the gut microbiome at the chemical level.

EXPERIMENTAL SECTION
A summary of the experimental protocol is provided next; an extended experimental procedure, including preparation of standards and benchmarking solutions, data processing parameters, and LC-MS/MS, can be found in the Supporting Information.


In summary, standard compounds for benchmarking solutions were solubilized according to the manufacturer’s recommendations and suspended in 50% MeOH at 10 µM final concentration. Human stool samples on swabs were obtained from a subset of the American Gut Project cohort according to approved HRPP protocol (UCSD IRB #141853). The swabs were extracted in 300 µL 50% MeOH overnight, concentrated in a centrifugal evaporator, and redissolved in 100 µL of 50% MeOH with internal standard. The samples were injected and chromatographically separated using a Vanquish UPLC (Thermo Fisher Scientific, Waltham, MA) with a 100 x 2.1 mm Kinetex 1.7 µm, C18, 100 Å chromatography column (Phenomenex, Torrance, CA), 40°C column temperature, 0.5 mL/min flow rate, mobile phase A 99.9% water (J.T.Baker, LC-MS grade) 0.1% formic acid (Thermo Fisher Scientific, Optima LC/MS), mobile phase B 99.9% acetonitrile (J.T.Baker, LC-MS grade) 0.1% formic acid (Fisher Scientific, Optima LC/MS), with the following gradient: 0-1 min 5% B, 1-8 min 100% B, 8-10.9 min 100% B, 10.9-11 min 5% B, 11-12 min 5% B. MS analysis was performed in parallel on Orbitrap (Q Exactive, Thermo Fisher Scientific, Waltham, MA) and triple quadrupole (TSQ Quantum Access Max, Thermo Fisher Scientific, Waltham, MA) mass spectrometers, both equipped with identical HESI-II probe sources. The following probe settings were used on both mass spectrometers for flow aspiration and ionization: spray voltage of 3500 V, sheath gas (N2) pressure of 35 psi, auxiliary gas (N2) pressure of 10 psi, ion source temperature of 270°C, S-lens RF level of 50 Hz (for Orbitrap only), and auxiliary gas heater temperature of 440°C. For Orbitrap MS, spectra were acquired in positive ion mode over a mass range of 100-1500 m/z. An external calibration with Pierce LTQ Velos ESI positive ion calibration solution (Thermo Fisher Scientific, Waltham, MA) was performed prior to data acquisition, with a mass error of less than 1 ppm.
Data were recorded in data-dependent MS/MS acquisition mode. The full scan at MS1 level was performed at 35K resolution. MS2 scans were performed at 17.5K resolution with a maximum injection time (IT) of 60 ms in profile mode. MS/MS precursor selection windows were set to m/z 2 with m/z 0.5 offset. The MS/MS active exclusion parameter was set to 5.0 seconds. Raw MS/MS data were converted to mzXML files using MSConvert.18 For the triple quadrupole MS, spectra were acquired in multiple reaction monitoring (MRM) mode. Transition ions were obtained from the National Institute of Standards and Technology (NIST) database19 for fecal samples. Data acquisition parameters were set as follows: minutes 0-0.5 were sent to waste; minutes 0.1-12 were recorded with collision gas pressure of 1.5 mTorr, isolation width of 0.2 m/z, and a time of 0.05 seconds for each transition ion. Batch data centroiding and conversion to mzML format were performed using the ReadW20 program on Windows. The workflow MRMConvert was developed to convert mzML into mzXML files suitable for peak integration. MRMConvert is implemented as a KNIME21 workflow and can be downloaded from GitHub (see Supporting Information). For detection, filtering, and integration of LC-MS/MS features, the open-source workflow Optimus was used (Supporting Information, data and code availability). Optimus uses OpenMS, an open source library for MS data processing,22 and the KNIME Analytics Platform as workflow management software.21 To automate calibration curve generation from standards and quantity estimation for samples, scripts and a visual interface were created to calculate a linear regression and to estimate quantities. The user interface can be accessed at http://dorresteintesthub.ucsd.edu:3838/PALMS/. Molecular networking and MS/MS database dereplication were performed using GNPS (http://gnps.ucsd.edu).13 16S rRNA sequencing and metabolomics multivariate comparisons were performed in QIIME 1.9.1.23 After dissimilarity calculation, principal coordinate analysis was performed and used for Procrustes analysis. The resulting best-scored transformed coordinates were plotted using EMPeror.24 Pairwise Spearman correlation was performed according to McHardy et al.25 The discovered correlations were visualized using Cytoscape software.26

RESULTS AND DISCUSSION
To enable single-sample dual metabolomics analyses for metabolome-microbiome-wide association studies (MMWAS), several key steps were implemented. The ultimate goal of this mass spectrometry workflow is to create two tables of mass spectrometry features, one for targeted and one for untargeted data, in conjunction with the sample information of all the detected features, the molecular annotations, the annotations propagated through spectral comparisons and, when possible, concentration estimations that can then be used to link to microbial sequencing for molecule-microbiome-wide association studies. To generate these tables for analysis, three steps are required.
First, it was necessary to couple two instruments, a triple quadrupole for targeted mass spectrometry and a Q Exactive for untargeted mass spectrometry, to a single ultrahigh performance liquid chromatography (UPLC) instrument to ensure that both instruments encounter the same molecules at the same time. To achieve this, we implemented a post-column flow split at equal distance from the two MS sources. The two computers controlling the instruments were set to start data acquisition simultaneously, triggered by the contact closure signal provided by the UPLC (Figure 1). The second requirement was the development of a data conversion and processing pipeline for the two distinct MS datasets. To achieve this, we employed the Optimus workflow (see Supporting Information) for finding, integrating, and filtering LC-MS features, where an LC-MS feature is defined as a pair of parent mass and retention time with the corresponding intensity value given as area under the curve. The final step was the development of a data processing workflow creating two summary tables for targeted and untargeted data, including annotations and sample metadata, for integration with microbiome sequencing data to enable MMWAS.
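As an illustration of the feature definition above, the following minimal Python sketch represents an LC-MS feature as a (parent m/z, retention time, area) record and pivots per-sample feature lists into a feature-by-sample table. It is not the Optimus implementation; the class name, the function names, and the m/z and retention time tolerances are hypothetical choices made only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float      # parent (precursor) m/z
    rt_min: float  # retention time in minutes
    area: float    # integrated area under the chromatographic peak


def feature_key(feature, mz_tol=0.01, rt_tol=0.1):
    """Bin a feature so that near-identical (m/z, RT) pairs share one key.

    The tolerances are assumptions for illustration, not values from the text.
    """
    return (round(feature.mz / mz_tol), round(feature.rt_min / rt_tol))


def build_table(samples):
    """Turn {sample_id: [Feature, ...]} into {feature_key: {sample_id: area}}."""
    table = defaultdict(dict)
    for sample_id, features in samples.items():
        for f in features:
            key = feature_key(f)
            # Sum areas if two peaks of one sample fall into the same bin.
            table[key][sample_id] = table[key].get(sample_id, 0.0) + f.area
    return dict(table)


if __name__ == "__main__":
    # Made-up values: the same ion observed in two samples.
    demo = {
        "sample_A": [Feature(295.129, 2.96, 1.0e6)],
        "sample_B": [Feature(295.131, 2.97, 4.0e5)],
    }
    for key, row in build_table(demo).items():
        print(key, row)
```

Simple rounding can split a feature that sits on a bin edge; a production workflow would use proper retention time alignment, which is one reason a dedicated tool is used in the study.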

Assessing the Mass Spectrometry Performance of the Platform. To benchmark the single-sample dual metabolomics analysis workflow and to assess its reproducibility, variability, and concentration estimation capabilities, we generated a series of data sets with standards, with and without human fecal background of increasing complexity. All of these benchmark data sets are publicly accessible through Global Natural Product Social Molecular Networking (GNPS).13,27–29 This study also serves as a guide for what can realistically be expected from the analysis of real-life complex biological samples such as human fecal material. The first benchmark data set was used to assess the entire workflow from sample to feature finding. The synchronization of data acquisition and the observation of ions on the two MS instruments were evaluated by injecting a 10 µM mixture of six synthetic standards (Supplementary Table 1). The synchronized data acquisition process resulted in two paired MS data sets from targeted and untargeted analysis. The chromatograms were compared between the two mass analyzers to ensure that the post-column split of the eluent yielded the same retention time; i.e., both mass analyzers encountered ions at the same time. The retention times of all six compounds matched exactly (sulfamethazole and sulfamethazine co-eluted) (Supplementary Figure 1a). We then prepared a dilution series starting from 10 µM of the same standard mixture and analyzed it on the platform to check the data linearity and limit of detection of the MS analyzers. A second dilution series of the mixture with fecal extract was prepared to determine how complex background matrix effects influence both data linearity and detection limits. The dilution series enabled a linear regression analysis for both the detection limits and the linearity of instrument response. In the split system and without significant background matrix, the limit of detection was an order of magnitude better for all six standards in targeted vs untargeted ion detection (Supplementary Figure 1b). However, matrix background had a significant impact on the limits of detection. In the fecal samples, the limit of detection differed by as much as two to three orders of magnitude between the targeted and untargeted modes (Supplementary Figure 1b).
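The regression step on a dilution series can be sketched as follows. This is a minimal illustration, not the authors' scripts: the function names are hypothetical, the example data are made up, and the limit of detection is estimated with the common 3.3 × (residual standard deviation) / slope convention, which is an assumption because the text does not state which definition was used.

```python
import math


def fit_line(conc, area):
    """Ordinary least-squares fit: area = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sd(conc, area, slope, intercept):
    """Standard deviation of the fit residuals (n - 2 degrees of freedom)."""
    ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, area))
    return math.sqrt(ss / (len(conc) - 2))


def estimate_conc(area, slope, intercept):
    """Invert the calibration line to estimate a concentration from an area."""
    return (area - intercept) / slope


def lod(sd, slope):
    """Assumed convention: LOD = 3.3 * residual SD / slope."""
    return 3.3 * sd / slope


if __name__ == "__main__":
    # Hypothetical dilution series (nM) and peak areas, for illustration only.
    conc = [1.0, 10.0, 100.0, 1000.0]
    area = [2.4e3, 2.1e4, 1.9e5, 2.0e6]
    slope, intercept = fit_line(conc, area)
    sd = residual_sd(conc, area, slope, intercept)
    print("slope:", slope, "intercept:", intercept)
    print("estimated conc for area 5e5:", estimate_conc(5.0e5, slope, intercept))
    print("LOD estimate (nM):", lod(sd, slope))
```

Because a dilution series spans several orders of magnitude, the highest concentrations dominate an unweighted fit; weighted or log-log regression is a frequent alternative.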
Limit of detection results for each compound ranged from 0.55 to 4.84 nM in targeted and 14.0 to 106 nM in untargeted analysis. When background was present, this changed to 5.48 to 34.6 nM and 403 to 724 nM in targeted and untargeted data, respectively (Supplementary Table 2). To understand how different amounts of background affect the mass spectrometry signals, we added three standards, sulfamethizole, amitriptyline, and sulfachloropyridazine (all at 200 nM), to the same fecal sample, but with different amounts of material ranging from 0.8 to 6.2 milligrams. The targeted analysis of the three added standards revealed measurement variation of 25.1%, 20.0%, and 20.2% for sulfamethizole, amitriptyline, and sulfachloropyridazine, respectively (Figure 2a), whereas the untargeted data varied by 29.7%, 26.0%, and 28.1% (Supplementary Table 3). For comparison, the variability of ten annotated compounds measured using the untargeted part of the workflow in the same fecal extracts ranged from 64.7% to 318.2% and is consistent with the amount of background fecal material present (Figure 2a). To measure the variability of the MS analyzers at different concentrations, a dilution series from 1 µM to 1 nM was prepared, this time in pure solvent, for the three standards specified above. Relative standard deviations (σ) were then calculated and plotted (Figure 2b). The σ of targeted and untargeted measurements was similar for the two most concentrated samples (1 µM and 100 nM) and ranged from 0.6% to 6.7%. However, at low concentrations the untargeted side showed a much higher σ: at 10 nM, σ ranged from 3.1% to 3.9% for targeted analysis of all six standards and from 12.6% to 26.9% for untargeted analysis, as we begin to push the limits of detection. At 1 nM, sulfachloropyridazine could no longer be detected via the untargeted arm of this experiment. As expected, overall better performance on known molecules was observed for targeted measurements, especially at lower concentrations (Figure 2b). To compare the detection capabilities and further understand the dynamic range of targeted vs. untargeted analysis, we prepared a standard mixture of 41 compounds that could potentially be found in fecal material at six different concentrations ranging from 10 µM to 100 pM (see Supporting Information). At the highest concentrations, 36 out of the 41 targeted molecules were detected by each of the analysis arms, 35 of which overlapped between the targeted and untargeted arms. At lower concentrations, the targeted branch of the analysis could detect additional molecules compared to the untargeted branch (Figure 2c). To assess the dynamic range, we plotted the highest intensity signal at 10 µM compared to the lowest signal that was detected for a multiple reaction monitoring (MRM) transition for a compound that had at least three points in the linear regression curve (Supplementary Figure 2), for eighteen molecules in total. This revealed that the dynamic range for these molecules in the untargeted and targeted branches spans over four and five orders of magnitude, respectively, which is consistent with previous studies that used such instruments in independent workflows30,31 (Figure 2d and e). Due to the high complexity of fecal material, we also assessed the detection performance of both analysis modes for the 41 molecules in a fecal background.
The standard mix at 10 µM was serially diluted with a single fecal extract down to 1 nM concentration and analyzed using the platform. The data revealed 31 transitions at correct retention times detected by the targeted branch at the highest 1 µM concentration, which is consistent with the detection rate without background addition. In contrast, only two ions could be detected at the lowest 1 nM concentration, compared to five detected when background is not present. Introduction of the complex background resulted in a decrease of the dynamic range by an order of magnitude.
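The two summary statistics used in this section, relative standard deviation and dynamic range in orders of magnitude, can be computed as in the short sketch below. The function names and example numbers are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not specify it.

```python
import math
import statistics


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * SD / mean.

    Uses the sample (n - 1) standard deviation, which is an assumption here.
    """
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def dynamic_range_orders(highest_signal, lowest_signal):
    """Dynamic range as log10 of the highest to lowest detected signal."""
    return math.log10(highest_signal / lowest_signal)


if __name__ == "__main__":
    # Made-up replicate peak areas for one standard at one concentration.
    replicates = [9.6e4, 1.02e5, 9.9e4]
    print("RSD (%):", percent_rsd(replicates))
    # Made-up highest and lowest detected signals for one compound.
    print("orders of magnitude:", dynamic_range_orders(3.0e9, 2.5e4))
```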

Demonstration of platform utility for molecule-microbiome-wide association studies. Targeted analysis. With the workflow in place and its behavior well characterized, we set out to assess how effectively this approach could be used in a metabolite-microbiome-wide association study of the American Gut project (http://americangut.org/), a crowdsourced microbiome sampling effort, using a random non-spiked sample subset from 103 volunteers. For the targeted analysis component, we aimed to detect the same 41 metabolites and correlate the metabolomics data with microbiome sequence data. Similar to the experiment with artificial background, 31 of the 41 MRM transitions corresponding to target metabolites could be detected in actual human fecal samples. Several of the 31 transitions were detected in more than 100 samples. These targeted compounds include tryptophan, lithocholic acid, and riboflavin; other molecules were detected less frequently. Biotin, C17-sphingosine, estradiol, and LysoPAF were detected in fewer than twenty samples. The higher overall frequency of riboflavin detection, for example, can be explained by its presence in commonly consumed foods such as milk, cheese, and eggs; the median intake of riboflavin from food in the United States and Canadian populations has been estimated to be approximately 2 mg/day for men and 1.5 mg/day for women.32 Conversely, cobalamin, a vitamin synthesized by bacteria and archaea,33 is present in much smaller quantities (median intake from food in the United States was estimated to be approximately 5 µg/day for men and 3.5 µg/day for women32) and therefore was detected only by the targeted arm of the workflow. Five bile acids were targeted in the study, and all of them were detected above the limit of detection in more than 50 samples by the targeted arm. Lithocholic, deoxycholic, and taurodeoxycholic acids, all microbially modified secondary bile acids, were detected with higher abundance compared to primary bile acids such as cholic and glycocholic acids, consistent with previous studies34,35 (Figure 3a). Although targeted analysis carried out on a QqQ instrument is generally accepted by the metabolomics community as the highest level of annotation when transitions are monitored and matched to the retention time of an authentic standard (level 1 according to the 2007 metabolomics initiative),36 one still cannot immediately assume that the MRM transitions, or parent mass and retention time, unambiguously identify one specific molecule when performing analysis in such complex backgrounds. In this study, at least ten compounds were monitored with a single transition ion, representing ~30% of the data, for which signal from other molecules could have been mistaken for the molecule of interest (Supplementary Figure 3).
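One reason a single transition plus retention time can be ambiguous is that isomers share the same precursor m/z. The sketch below computes the monoisotopic [M+H]+ m/z from an elemental formula. The formula C14H18N2O5, which aspartame and the dipeptide Glu-Phe share, is supplied here as an illustration and does not appear in the text; the element table is deliberately limited, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (Da) for a few common elements; a deliberately small table.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
PROTON = 1.00727647  # mass of a proton in Da


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C14H18N2O5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


if __name__ == "__main__":
    # Aspartame and Glu-Phe are isomers (both C14H18N2O5), so a quadrupole
    # isolating the precursor cannot tell them apart by m/z alone.
    print(round(mz_protonated("C14H18N2O5"), 4))
```

When two isomers also produce a common fragment and elute close together, only additional evidence, such as a full high-resolution MS/MS spectrum, can separate them.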
For example, with the chromatography setup used in this study, targeted molecules such as testosterone and mesterolone eluted within seconds of the closely related analogs androstanedione and androstenedione, respectively, which possess identical molecular masses as well as identical targeted transition ions. This would normally result in incorrect interpretation of the results unless additional information, such as high resolution MS/MS spectra from the untargeted part of the workflow, is available. This means that in the employed platform, the untargeted branch can be used to verify the targeted data. For example, to target aspartame in samples, the instrument was set to monitor m/z 295.13 as the parent ion and m/z 120.08 as the fragment ion, with an expected retention time of 2.96 minutes. Such ion pairs were detected in more than 80 samples, with estimated concentrations of the compound as high as 286 nM. Manual inspection of the m/z 295.13 MS/MS fragmentation pattern from the untargeted analysis revealed a match to a putative Glu-Phe dipeptide that is structurally similar to aspartame and gives rise to the same MRM transition. Similar misleading results were observed for adenine, the oral contraceptive mestranol (which would be a surprise to the men in this study), norepinephrine, and other molecules (Supplementary Figure 3). Due to the integrated nature of the acquired data, it is possible to cross-validate the results to improve the accuracy of the targeted information, especially in cases where tens to hundreds of MRM ions are monitored at once and it is impractical to target more than one fragment ion per molecule. Given the nature of biological samples such as feces that present complex backgrounds, it becomes even more important to obtain a high level of confidence to support the observations obtained through the targeted approach. It is reasonable to anticipate that many of the targeted metabolomics studies used widely in microbiome research will also have an error rate of about 30%, due to the high chance that several MRM transitions are shared between different molecular species in a complex sample. To alleviate high misannotation rates, great care should be taken when MRM transitions are assigned, especially for structurally similar compounds with close elution times (Supplementary Figure 3). Because of this, not only is great care needed in setting up the experiment, but we also encourage the public deposition of raw data in addition to the final quantification table when publishing targeted data, so that careful future analysis remains possible.37 Just as the untargeted analysis is beneficial for verifying the targeted data, the opposite is also true. The targeted analysis, in conjunction with a dilution series for the molecules of interest (e.g., those involved in particular metabolic pathways), allows quantitative information for these molecules to be propagated throughout the dataset. The orders-of-magnitude greater sensitivity of the targeted branch is essential for assessing the presence and amount of low-abundance compounds in a complex matrix, a common occurrence in fecal samples and in biological studies in general. Combining the two analysis modes allows new discoveries (untargeted approach) to be placed in the context of existing knowledge (targeted approach), and in this regard the described approach offers additional benefits compared to the individual modes used separately: the whole is greater than the sum of its parts.

Molecular Networking and Annotation of Detected Molecules.
To gain additional annotations and to visualize the diversity of chemistries obtained in an untargeted metabolomics experiment, the untargeted data were subjected to molecular networking on GNPS.13 The general concept of molecular networking is to infer structural similarities between molecules, which are connected into a network based on the similarities of their corresponding MS/MS fragmentation patterns. The successful annotation of some compounds in the molecular network can in principle be propagated to neighboring nodes, thus allowing for more in-depth annotation. There were 8,831 merged consensus spectra (nodes) corresponding to unique compounds across all samples in the molecular network from the 103 human fecal samples analyzed (Supplementary Figure 4), and 3,610 nodes were self-looped, indicating that these MS/MS spectra are not similar to any other spectra in the network. The frequency of finding molecules that were captured by the method was plotted onto the graph (Figure 3c). Pantothenic acid (vitamin B5) can be found in almost every sample analyzed, while specific drugs such as benazeprilat, a primary treatment for congestive heart failure, were detected in only a single sample. Many key compounds that can be directly attributed to metabolic processes, e.g., biliverdin, the product of heme catabolism,38–40 were detected in subsets of samples. All five bile acids of interest that were included in the targeted data set were also annotated via GNPS library matching in the untargeted data (Figure 3b). Three out of five bile acids were found in a single cluster that contained both primary bile acids and the corresponding secondary bile acids. The cluster included glycocholic and taurocholic acids and analogs that contain a thermally labile amide bond prone to fragmentation. The cholic acid subfamily found at the bottom of the cluster lacks this amide bond, which results in a different fragmentation pathway that separates these structures. The taurodeoxycholic acid subfamily, located at the top (Figure 3b), has an amide bond but lacks the 12α-hydroxylation.41 Other nodes, such as the secondary bile acids lithocholic and deoxycholic acid, were found at a separate location in the molecular network. Consistent with the targeted data, we observed a higher abundance of secondary bile acids such as lithocholic, deoxycholic, and taurodeoxycholic acids, indicating partial or complete transformation of these acids by the gut microbiota.34 The molecular networking performed with untargeted data provides additional evidence that a large number of bile acids beyond those originally targeted may be present; many of these compounds could not be identified via matches to reference data in the public domain. When a molecular family in a molecular network contains an annotated compound surrounded by unannotated nodes, additional structural insight for the unknowns can be proposed based on the annotated neighbors.42 As an example of this in the current study, we utilized the cluster of steroids and derivative compounds annotated via spectral matching to public reference data, shown in Figure 3d. In this molecular family, testosterone was annotated and matched to a database mass spectrum. Although the parent mass and fragmentation pattern match those of testosterone, it is possible that the compound is an isomer from the testosterone molecular family.
Using a commercial standard, we verified that this spectrum, obtained at its specific retention time, was indeed consistent with testosterone. Using public MS/MS spectra reference data, three additional testosterone neighbors were annotated as sitostenone, cholestenone, and bolasterone. Several nodes, including a node at m/z 399.362, remained unannotated and did not match any reference standards. Immediate proximity in the network to the testosterone, sitostenone, and cholestenone annotations suggests that this spectrum belongs to a molecule with structural similarity to all three compounds. Mass differences of -14 Da with cholestenone and +14 Da with sitostenone suggested a difference of one methyl group in both cases. Inspection of the fragmentation pattern suggests that this methyl group is on the lipid tail. The observed fragmentation pattern of the unannotated structure at m/z 399.362 matches campestenone or an isomer thereof (Figure 3d). While mass spectrometry can give significant insight into the nature of the molecules, it should be pointed out that differences in fragmentation patterns may not be sufficiently informative to discriminate all possible structural isomers and that mass spectrometry is blind to stereochemistry unless front-end techniques amenable to chirality-based separation are employed in conjunction with MS analysis.

Statistical analysis and data integration. To determine whether individual branches of the collected paired data enable explanation of chemical differences within the sample set, we visualized the samples after dimensionality reduction using principal coordinate analysis (PCoA) with the Bray-Curtis dissimilarity metric (Supplementary Figure 5a, b). When targeted metabolite abundances were used for PCoA, no obvious separation was observed apart from four distantly clustered samples (Supplementary Figure 5a). Although the same analysis with features from the untargeted data also showed four distantly located samples, it revealed much more informative metabolome-driven differences between samples from different geographical locations (Supplementary Figure 5b). Despite the project name “American Gut”, samples from multiple countries are represented in this subset. The two main countries, the United Kingdom and the USA, largely occupy distinct regions of the PCoA plot. Although both are westernized nations, this highlights that the molecular makeup of fecal samples is distinct and is driven by the country of residence. A few other samples were obtained from a handful of other European countries; most of these also clustered closer to the United Kingdom samples. To understand whether the variance captured by untargeted and targeted data is driven by the same biology, we performed Procrustes analysis (least-squares orthogonal mapping) on Bray-Curtis distances (Supplementary Figure 5c).24 Visually, limited correlation between the targeted and untargeted data branches was observed with regard to the country-based clustering. However, the four samples that clustered separately in both datasets were highly correlated with each other. Previous microbiome studies have shown that the geographical location of the host drives major differences in the gut ecosystem.5,6 This major trend was also observed for the 16S rRNA sequencing data of these fecal samples (Figure 4c). To test whether targeted and/or untargeted metabolomics data from the gut can predict microbiome composition, we performed pairwise Procrustes comparisons of the sample coordinates obtained from the metabolomics datasets (Supplementary Figure 5a, b) with those obtained from the 16S rRNA dataset (Figure 4c).
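The combination of Bray-Curtis dissimilarity, PCoA, and Procrustes comparison described above can be sketched in a few lines of NumPy. This is a hedged illustration under stated assumptions, not the pipeline used in the study: the feature table is random toy data, the function names are hypothetical, and the microbiome side (UniFrac distances) is not modeled. The Procrustes statistic M² is computed after centering and scaling both configurations to unit size, so that values near 0 indicate a near-perfect fit and values near 1 indicate essentially no shared structure.

```python
import numpy as np


def bray_curtis(table):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a table."""
    table = np.asarray(table, dtype=float)
    diff = np.abs(table[:, None, :] - table[None, :, :]).sum(axis=2)
    total = (table[:, None, :] + table[None, :, :]).sum(axis=2)
    return diff / total


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Non-Euclidean metrics can give small negative eigenvalues; clip them to 0.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def procrustes_m2(ref, other):
    """M^2 after optimal translation, scaling, and orthogonal mapping onto `ref`."""
    def standardize(x):
        x = x - x.mean(axis=0)
        return x / np.linalg.norm(x)

    a, b = standardize(ref), standardize(other)
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2


rng = np.random.default_rng(0)
features = rng.poisson(20, size=(12, 30)) + 1  # 12 toy samples x 30 toy features
coords = pcoa(bray_curtis(features))

# A rotated, rescaled, and shifted copy of the same ordination fits perfectly.
theta = np.pi / 5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
m2_same = procrustes_m2(coords, 3.0 * coords @ rotation + 1.5)

# Shuffling the sample order acts as a simple null model and fits worse.
m2_null = procrustes_m2(coords, coords[rng.permutation(len(coords))])
print(f"M2 (same structure) = {m2_same:.3f}, M2 (permuted) = {m2_null:.3f}")
```

Comparing an observed M² against the value obtained after permuting one of the two datasets, as in the last lines, is what gives the statistic its interpretation relative to chance.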
Visually, the untargeted metabolomics data appeared to correlate with the microbiome better than the targeted dataset (Figure 4a, b). This was confirmed by the M² statistic obtained from Procrustes analysis, for which lower values indicate a closer fit: the untargeted metabolomics data matched the patterns observed in the microbiome data more closely (M² = 0.774) than the targeted metabolomics data (M² = 0.822). In comparison, when Procrustes analysis was performed between the metabolomics datasets and a null model (randomly permuted microbiome data), the fit was much poorer (M² = 0.984 and M² = 0.983 for targeted and untargeted, respectively) (Supplementary Table 4). These results highlight that, although both targeted and untargeted datasets exhibited higher correlations with microbiome data than those occurring by random chance, untargeted data was able to predict the actual microbiome composition better than targeted data. We hypothesize that this is because there are many microbial molecules, or microbially modified molecules, not accounted for in the targeted data set. Overall, the relatively weak correlations of metabolites with the microbial communities in the gut suggest that only a small fraction of the detected molecules is defined by the microbes in the gut.

To determine which molecules could be associated with specific members of the microbial community, we examined the untargeted metabolomics data and their correlations with microbiome taxonomic profiles using Spearman’s rank correlation coefficient network analysis (Figure 4d).25 Microbe-to-metabolite networking revealed that the genus Pseudomonas was directly associated with a cluster of 129 MS features, Alkalimonas spp. were associated with 99 features, and the family Lachnospiraceae was associated with 46 features (Figure 4d). To further explore the microbiome-associated compounds, we reinspected the molecular network (Supplementary Figure 4).13 Out of all the metabolites associated with the microbe-to-metabolite network, annotations were available for only two molecular families. The sparsity of annotations is a direct result of the limited number of reference spectra for molecules from microbes in the public domain. The microbial annotations were all obtained using reference spectra that originated through GNPS crowdsourced capture of reference data, which underscores the importance of crowdsourced accumulation of mass spectrometry knowledge as one of the keys to an improved understanding of molecule-microbiome-wide associations. The molecular families that were identified in the molecular network were Pseudomonas aeruginosa-associated metabolites: the quinolone signaling compounds43 and the rhamnolipids,42 i.e., virulence factors that are also specific biosurfactants produced by Pseudomonas aeruginosa for biofilm formation, nutrient acquisition, and motility (Figure 5a).44,45 The sample information from the American Gut Project metadata reveals that the person with the largest amount of Pseudomonas aeruginosa-associated molecules self-reported a previous diagnosis of irritable bowel syndrome (IBS). According to previous studies, Pseudomonas spp., in particular P. aeruginosa, are associated with all IBS subtypes.46,47 To our knowledge, these are the first Pseudomonas-specific metabolites observed in human feces.
The taxonomic profiles of these two samples, 31151 and 27689, show a high abundance of the Pseudomonadaceae family, at ~33% and ~2% of the population, respectively (Figure 5b), as well as the presence of the genus Pseudomonas in both samples (Supplementary Table 5). Additionally, microbe-to-metabolite networking suggests that molecules associated with Alkalimonas spp., Lachnospiraceae, and several other organisms can also be detected. Such correlations could arise from organisms biosynthesizing these molecules directly or modifying molecules present in the gut. Alternatively, some of the co-detected molecules and microbes could originate from food or the external environment.
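The microbe-to-metabolite networking used in this section rests on Spearman's rank correlation between taxon abundances and feature intensities across samples. The following minimal sketch shows the idea under stated assumptions: the abundance values, the names `taxon_A`, `taxon_B`, `feature_1` to `feature_3`, and the cutoff `min_abs_rho` are all hypothetical, and no significance testing or multiple-testing correction is included, although a real analysis would need both.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        tied = values == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def correlation_network(taxa, features, min_abs_rho=0.8):
    """Return (taxon, feature, rho) edges whose |rho| reaches the cutoff."""
    edges = []
    for taxon, abundance in taxa.items():
        for feature, intensity in features.items():
            rho = spearman(abundance, intensity)
            if abs(rho) >= min_abs_rho:
                edges.append((taxon, feature, round(rho, 3)))
    return edges


# Invented relative abundances and peak areas for six toy samples.
taxa = {
    "taxon_A": [0.0, 0.0, 0.0, 0.0, 0.02, 0.33],
    "taxon_B": [0.30, 0.10, 0.25, 0.05, 0.20, 0.15],
}
features = {
    "feature_1": [0, 0, 0, 0, 150, 9000],   # rises with taxon_A
    "feature_2": [10, 50, 20, 60, 30, 40],  # falls as taxon_B rises
    "feature_3": [7, 1, 3, 9, 5, 2],        # unrelated to either taxon
}
network = correlation_network(taxa, features)
print(network)
```

Because the correlation is rank-based, a feature present in only a few samples at very different intensities, as for `feature_1` here, can still be strongly associated with a taxon that is abundant in those same samples. As noted in the text, such an edge indicates co-occurrence and does not by itself show that the organism produces the molecule.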

CONCLUSION
Human fecal material is one of the most complex sample types one can investigate via mass spectrometry. Given our increased understanding of the role of the gut microbiome in health and disease, there is a significant need to understand the chemical environment of the microbiome, as this determines which microbes can grow and when. Unlike animals in well-controlled studies, people have individualized lifestyles, and this influences the types of molecules and microbes that are present. Therefore, there is a need to develop integrated methods for molecule-microbiome-wide association studies and to assess their limitations. The paired single-sample, dual mass spectrometry approach described here reveals that targeted mass spectrometry is more sensitive and has a higher dynamic range than untargeted mass spectrometry for the investigated compounds under the current experimental conditions. At the same time, outputs from the untargeted approach can serve as a check on the data produced by the targeted approach to ensure that the correct molecule is identified and reported. Despite the limited annotations present in existing public reference data sets, untargeted metabolomics enables the discovery of compounds associated with specific microbes, even in complex samples such as stool; areas such as dissolved organic matter, sewage, or wastewater, which have a very high diversity of possible molecules, would also benefit from the approaches introduced here.48–50 This sets the stage for population-based molecule-microbe association studies that will be critical in further understanding the role of the gut in health and disease.

Figure 1. One sample - two metabolomics analysis workflow. Step 1 - Synchronized data acquisition: eluent from the UPLC is split post-column and ionized using identical HESI ion sources on two MS instruments, with acquisition synchronized via UPLC contact closure, generating targeted and untargeted data simultaneously. Step 2 - Data processing: data conversion and subsequent feature detection and alignment to create a table of features. Step 3 - Results: two metabolomics tables are generated. The targeted table is used for the quantification estimation plot. The untargeted data are used for statistical and correlative analysis with respect to the microbial sequencing data, which are represented as a BIOM table.

Figure 2. Benchmarking the one sample - two metabolomics workflow. a) Box plots of three standards, for which peak areas were calculated from targeted data (left), followed by ten annotated fecal compounds, for which peak areas were calculated from untargeted data (right); boxes represent the interquartile range, lines within boxes represent dataset median values, whiskers show the dataset range, and outliers are represented by circles. b) Variability assessment (heteroscedasticity) for each of the three standards at different concentrations (n = 3 replicates for each concentration) using targeted and untargeted data; bars represent the relative standard deviation in percent. c) Detection rate comparison of the paired data, i.e., the number of molecules detected by the targeted (blue bars) and untargeted (orange bars) branch at each concentration (see Methods for annotation details). d) and e) Bar charts of mean peak areas for 18 compounds; blue bars correspond to the targeted branch and orange bars to the untargeted branch. Flat color denotes the mean maximal areas for the highest quantities, whereas shaded color denotes the mean minimal areas for the lowest quantities detected for the respective compound. The black line is the mean baseline noise for all compounds plotted.

Figure 3. Targeted/untargeted data analysis for American Gut study samples. a) Concentration estimation plots for eleven targeted molecules in all 103 samples obtained from targeted data. b) Frequency plot showing in how many samples specific molecules, defined by a shared MS/MS spectrum, appear. For example, node 1 is found in the data of all samples, while the last nodes are found in only one or several samples. c) Molecular family of bile acids. d) Molecular family of steroid-like compounds, with the testosterone standard denoted in red. The structure for campestenone is proposed based on MS/MS fragmentation similarity to testosterone and library annotations of cholestenone and sitostenone. Brackets show the uncertainty in the position of methyl groups. Color key: orange denotes standards, red denotes standards that overlap with spectra from fecal samples, and grey denotes fecal samples only. V-shaped nodes are library identifications. Annotations that are based only on spectral matches and not compared to an authentic standard are called putative. Proposed molecules are propagated annotations based on spectral similarity to molecules that have spectral matches and/or standards.
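The mass arithmetic behind the proposed campestenone structure in panel d can be checked with a short calculation. The sketch below assumes that the ions are protonated species ([M+H]+) and uses the standard elemental compositions of the three 4-en-3-one steroids (C27H44O, C28H46O, and C29H48O); neither the adduct nor the formulas are stated in the text, and only the observed m/z 399.362 comes from the manuscript.

```python
# Monoisotopic masses in Da; PROTON is the mass of H minus one electron.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646


def mz_protonated(formula):
    """m/z of the singly charged [M+H]+ ion for an element-count dictionary."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


# Assumed elemental compositions (not given in the manuscript).
candidates = {
    "cholestenone": {"C": 27, "H": 44, "O": 1},
    "campestenone": {"C": 28, "H": 46, "O": 1},
    "sitostenone": {"C": 29, "H": 48, "O": 1},
}

observed = 399.362  # unannotated node reported in the text
for name, formula in candidates.items():
    calculated = mz_protonated(formula)
    print(f"{name:13s} [M+H]+ = {calculated:.4f}   observed - calculated = "
          f"{observed - calculated:+.4f}")
```

Under these assumptions the unannotated node lies one CH2 unit (about 14.016 Da) above cholestenone and one CH2 unit below sitostenone, and it agrees with the calculated campestenone value to within about a millidalton. As the text notes, this mass agreement cannot distinguish campestenone from its isomers.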

Figure 4. Correlation analysis of paired mass spectrometry data with the microbiome. a) and b) Procrustes analysis of targeted and untargeted metabolomics data, respectively, from a peak area feature matrix in all samples with 16S rRNA sequencing data. Bray-Curtis and UniFrac distance metrics were used for metabolome and microbiome data, respectively. c) PCoA of 16S rRNA microbial data in all samples using unweighted UniFrac distances. Each sphere represents a sample; color-coding is based on the country of sample origin. d) Microbe-metabolite correlation network of untargeted data. Green hexagon nodes denote microbes, gray circles denote metabolites, gray V-shaped nodes reflect library annotations from molecular networking results, and gray squares are nodes that have MS/MS fragmentation spectra. Pseudomonas spp. and related data are shown with red circles, Alkalimonas spp. with blue circles, and Lachnospiraceae with pink circles.

Figure 5. Microbial metabolites detected in samples. a) The quinolone and rhamnolipid molecular families are highly specific metabolites known to be produced by P. aeruginosa; they were found in only two samples, both shown to contain the microbe. Gray circles denote unannotated metabolites; gray V-shaped nodes reflect library annotations from molecular networking results. b) Taxonomic summary of the two Pseudomonas-containing samples; taxa fractions are log10-transformed, and the full taxa summary of these samples can be found in the supplementary materials (Supplementary Table 5). In sample 27689, ~2% of the microbial taxa are in the family Pseudomonadaceae, and ~0.05% of them are in the genus Pseudomonas. For sample 31151, ~33% are Pseudomonadaceae, and ~0.6% of them are Pseudomonas.

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website.

AUTHOR INFORMATION
Corresponding Author
* E-mail: [email protected]

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Study design: AVM, PCD. System setup: AVM, FV. Sample collection: EH, BA. Optimus and MRMConvert software: IP, TA. Data analysis: AVM, AA, AT. Statistical analysis: AVM, RS, AT, EH, IP, AA. Wrote the manuscript: AVM, PCD, AJ, AA.

ACKNOWLEDGMENT
We acknowledge NIH Grants 5P41GM103484-07 and GMS10RR029121 and the Office of Naval Research for the support that enabled this work. IP and TA acknowledge funding from the European Union’s Horizon 2020 program under the grant agreement 634402.

REFERENCES
(1) Hsiao, E. Y.; McBride, S. W.; Hsien, S.; Sharon, G.; Hyde, E. R.; McCue, T.; Codelli, J. A.; Chow, J.; Reisman, S. E.; Petrosino, J. F.; Patterson, P. H.; Mazmanian, S. K. Cell 2013, 155 (7), 1451–1463.
(2) Trompette, A.; Gollwitzer, E. S.; Yadava, K.; Sichelstiel, A. K.; Sprenger, N.; Ngom-Bru, C.; Blanchard, C.; Junt, T.; Nicod, L. P.; Harris, N. L.; Marsland, B. J. Nat. Med. 2014, 20 (2), 159–166.
(3) Garrett, W. S. Science 2015, 348 (6230), 80–86.
(4) Petersen, C.; Round, J. L. Cell. Microbiol. 2014, 16 (7), 1024–1033.
(5) Lloyd-Price, J.; Abu-Ali, G.; Huttenhower, C. Genome Med. 2016, 8 (1), 51.
(6) Yatsunenko, T.; Rey, F. E.; Manary, M. J.; Trehan, I.; Dominguez-Bello, M. G.; Contreras, M.; Magris, M.; Hidalgo, G.; Baldassano, R. N.; Anokhin, A. P.; Heath, A. C.; Warner, B.; Reeder, J.; Kuczynski, J.; Caporaso, J. G.; Lozupone, C. A.; Lauber, C.; Clemente, J. C.; Knights, D.; Knight, R.; Gordon, J. I. Nature 2012, 486 (7402), 222–227.
(7) Wang, J.; Jia, H. Nat. Rev. Microbiol. 2016, 14 (8), 508–522.
(8) Gilbert, J. A.; Quinn, R. A.; Debelius, J.; Xu, Z. Z.; Morton, J.; Garg, N.; Jansson, J. K.; Dorrestein, P. C.; Knight, R. Nature 2016, 535 (7610), 94–103.
(9) Jansson, J.; Willing, B.; Lucio, M.; Fekete, A.; Dicksved, J.; Halfvarson, J.; Tysk, C.; Schmitt-Kopplin, P. PLoS One 2009, 4 (7), e6386.
(10) Raman, M.; Ahmed, I.; Gillevet, P. M.; Probert, C. S.; Ratcliffe, N. M.; Smith, S.; Greenwood, R.; Sikaroodi, M.; Lam, V.; Crotty, P.; Bailey, J.; Myers, R. P.; Rioux, K. P. Clin. Gastroenterol. Hepatol. 2013, 11 (7), 868–875.e1–e3.
(11) Roberts, L. D.; Souza, A. L.; Gerszten, R. E.; Clish, C. B. Curr. Protoc. Mol. Biol. 2012, Chapter 30, Unit 30.2.1–24.
(12) Cajka, T.; Fiehn, O. Anal. Chem. 2016, 88 (1), 524–545.
(13) Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crüsemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderón, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya P, C. A.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O’Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodríguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias, L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Müller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. Ø.; Pogliano, K.; Linington, R. G.; Gutiérrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N. Nat. Biotechnol. 2016, 34 (8), 828–837.
(14) da Silva, R. R.; Dorrestein, P. C.; Quinn, R. A. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (41), 12549–12550.
(15) Coscollà, C.; León, N.; Pastor, A.; Yusà, V. J. Chromatogr. A 2014, 1368, 132–142.
(16) Senyuva, H. Z.; Gökmen, V.; Sarikaya, E. A. Food Addit. Contam. Part A Chem. Anal. Control Expo. Risk Assess. 2015, 32 (10), 1568–1606.
(17) Bijlsma, L.; Emke, E.; Hernández, F.; de Voogt, P. Anal. Chim. Acta 2013, 768, 102–110.
(18) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M.-Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. Nat. Biotechnol. 2012, 30 (10), 918–920.
(19) Center, N. M. S. D.; Stein, S. E. NIST Chemistry WebBook, NIST Standard Reference Database No. 69.
(20) Falkner, J. A.; Falkner, J. W.; Andrews, P. C. Bioinformatics 2007, 23 (2), 262–263.
(21) Warr, W. A. J. Comput. Aided Mol. Des. 2012, 26 (7), 801–804.
(22) Kohlbacher, O.; Reinert, K.; Gröpl, C.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Sturm, M. Bioinformatics 2007, 23 (2), e191–e197.
(23) Caporaso, J. G.; Kuczynski, J.; Stombaugh, J.; Bittinger, K.; Bushman, F. D.; Costello, E. K.; Fierer, N.; Peña, A. G.; Goodrich, J. K.; Gordon, J. I.; Huttley, G. A.; Kelley, S. T.; Knights, D.; Koenig, J. E.; Ley, R. E.; Lozupone, C. A.; McDonald, D.; Muegge, B. D.; Pirrung, M.; Reeder, J.; Sevinsky, J. R.; Turnbaugh, P. J.; Walters, W. A.; Widmann, J.; Yatsunenko, T.; Zaneveld, J.; Knight, R. Nat. Methods 2010, 7 (5), 335–336.
(24) Vázquez-Baeza, Y.; Pirrung, M.; Gonzalez, A.; Knight, R. Gigascience 2013, 2 (1), 16.
(25) McHardy, I. H.; Goudarzi, M.; Tong, M.; Ruegger, P. M.; Schwager, E.; Weger, J. R.; Graeber, T. G.; Sonnenburg, J. L.; Horvath, S.; Huttenhower, C.; McGovern, D. P.; Fornace, A. J., Jr; Borneman, J.; Braun, J. Microbiome 2013, 1 (1), 17.
(26) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Genome Res. 2003, 13 (11), 2498–2504.
(27) Watrous, J.; Roach, P.; Alexandrov, T.; Heath, B. S.; Yang, J. Y.; Kersten, R. D.; van der Voort, M.; Pogliano, K.; Gross, H.; Raaijmakers, J. M.; Moore, B. S.; Laskin, J.; Bandeira, N.; Dorrestein, P. C. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (26), E1743–E1752.
(28) Quinn, R. A.; Nothias, L.-F.; Vining, O.; Meehan, M.; Esquenazi, E.; Dorrestein, P. C. Trends Pharmacol. Sci. 2017, 38 (2), 143–154.
(29) Allard, P.-M.; Genta-Jouve, G.; Wolfender, J.-L. Curr. Opin. Chem. Biol. 2017, 36, 40–49.
(30) Makarov, A.; Denisov, E.; Lange, O.; Horning, S. J. Am. Soc. Mass Spectrom. 2006, 17 (7), 977–982.
(31) Henry, H.; Sobhi, H. R.; Scheibner, O.; Bromirski, M.; Nimkar, S. B.; Rochat, B. Rapid Commun. Mass Spectrom. 2012, 26 (5), 499–509.
(32) Institute of Medicine; Food and Nutrition Board; A Report of the Standing Committee on the Scientific Evaluation of Dietary Reference Intakes and its Panel on Folate, Other B Vitamins, and Choline and Subcommittee on Upper Reference Levels of Nutrients. Dietary Reference Intakes for Thiamin, Riboflavin, Niacin, Vitamin B6, Folate, Vitamin B12, Pantothenic Acid, Biotin, and Choline; National Academies Press, 2000.
(33) Roth, J. R.; Lawrence, J. G.; Bobik, T. A. Annu. Rev. Microbiol. 1996, 50, 137–181.
(34) Ridlon, J. M.; Kang, D. J.; Hylemon, P. B.; Bajaj, J. S. Curr. Opin. Gastroenterol. 2014, 30 (3), 332–338.
(35) Wahlström, A.; Sayin, S. I.; Marschall, H.-U.; Bäckhed, F. Cell Metab. 2016, 24 (1), 41–50.
(36) Sumner, L. W.; Amberg, A.; Barrett, D.; Beale, M. H.; Beger, R.; Daykin, C. A.; Fan, T. W.-M.; Fiehn, O.; Goodacre, R.; Griffin, J. L.; Hankemeier, T.; Hardy, N.; Harnly, J.; Higashi, R.; Kopka, J.; Lane, A. N.; Lindon, J. C.; Marriott, P.; Nicholls, A. W.; Reily, M. D.; Thaden, J. J.; Viant, M. R. Metabolomics 2007, 3 (3), 211–221.
(37) Nat. Methods 2016, 13 (10), 799–799.
(38) Zhou, K.; Jiang, M.; Liu, Y.; Qu, Y.; Shi, G.; Yang, X.; Qin, X.; Wang, X. PLoS One 2014, 9 (6), e98905.
(39) Nakao, A.; Otterbein, L. E.; Overhaus, M.; Sarady, J. K.; Tsung, A.; Kimizuka, K.; Nalesnik, M. A.; Kaizu, T.; Uchiyama, T.; Liu, F.; Murase, N.; Bauer, A. J.; Bach, F. H. Gastroenterology 2004, 127 (2), 595–606.
(40) Qin, X. Gut 2007, 56 (11), 1641–1642.
(41) Chiang, J. Y. L. J. Lipid Res. 2009, 50 (10), 1955–1966.
(42) Nguyen, D. D.; Melnik, A. V.; Koyama, N.; Lu, X.; Schorn, M.; Fang, J.; Aguinaldo, K.; Lincecum, T. L., Jr; Ghequire, M. G. K.; Carrion, V. J.; Cheng, T. L.; Duggan, B. M.; Malone, J. G.; Mauchline, T. H.; Sanchez, L. M.; Kilpatrick, A. M.; Raaijmakers, J. M.; Mot, R. D.; Moore, B. S.; Medema, M. H.; Dorrestein, P. C. Nat. Microbiol. 2016, 2, 16197.
(43) McKnight, S. L.; Iglewski, B. H.; Pesci, E. C. J. Bacteriol. 2000, 182 (10), 2702–2708.
(44) Ochsner, U. A.; Reiser, J.; Fiechter, A.; Witholt, B. Appl. Environ. Microbiol. 1995, 61 (9), 3503–3506.
(45) Abdel-Mawgoud, A. M.; Lépine, F.; Déziel, E. Appl. Microbiol. Biotechnol. 2010, 86 (5), 1323–1336.
(46) Bye, W.; Ishaq, N.; Bolin, T. D.; Duncombe, V. M.; Riordan, S. M. World J. Gastroenterol. 2014, 20 (10), 2449–2455.
(47) Kerckhoffs, A. P. M.; Ben-Amor, K.; Samsom, M.; van der Rest, M. E.; de Vogel, J.; Knol, J.; Akkermans, L. M. A. J. Med. Microbiol. 2011, 60 (Pt 2), 236–245.
(48) Moran, M. A.; Kujawinski, E. B.; Stubbins, A.; Fatland, R.; Aluwihare, L. I.; Buchan, A.; Crump, B. C.; Dorrestein, P. C.; Dyhrman, S. T.; Hess, N. J.; Howe, B.; Longnecker, K.; Medeiros, P. M.; Niggemann, J.; Obernosterer, I.; Repeta, D. J.; Waldbauer, J. R. Proc. Natl. Acad. Sci. U. S. A. 2016, 113 (12), 3143–3151.
(49) McCall, A.-K.; Bade, R.; Kinyua, J.; Lai, F. Y.; Thai, P. K.; Covaci, A.; Bijlsma, L.; van Nuijs, A. L. N.; Ort, C. Water Res. 2016, 88, 933–947.
(50) van Nuijs, A. L. N.; Mougel, J.-F.; Tarcomnicu, I.; Bervoets, L.; Blust, R.; Jorens, P. G.; Neels, H.; Covaci, A. Environ. Int. 2011, 37 (3), 612–621.