DynaMet: A Fully Automated Pipeline for Dynamic LC–MS Data

Article pubs.acs.org/ac

Patrick Kiefer,*,†,§ Uwe Schmitt,‡,§ Jonas E. N. Müller,† Johannes Hartl,† Fabian Meyer,† Florian Ryffel,† and Julia A. Vorholt†

† Institute of Microbiology, ETH Zurich, 8093 Zurich, Switzerland
‡ ID Scientific IT Services, ETH Zurich, 8093 Zurich, Switzerland



Supporting Information

ABSTRACT: Dynamic isotope labeling data provide crucial information about the operation of metabolic pathways and are commonly generated via liquid chromatography−mass spectrometry (LC−MS). Metabolome-wide analysis is challenging because it requires grouping metabolite features across different samples. We developed DynaMet for fully automated investigation of isotope labeling experiments from LC−high-resolution MS raw data. DynaMet enables untargeted extraction of metabolite labeling profiles and provides integrated tools for expressive data visualization. To validate DynaMet, we first used time-course labeling data of the model strain Bacillus methanolicus labeled with 13C methanol, which results in complex spectra for multicarbon compounds. Analysis of two biological replicates revealed high robustness and reproducibility of the pipeline. In total, DynaMet extracted 386 features showing dynamic labeling within 10 min. Of these features, 357 could be fitted by the implemented kinetic models. Feature identification against the KEGG database resulted in 215 matches covering multiple pathways of core metabolism and major biosynthetic routes. Moreover, we performed a time-course labeling experiment with Escherichia coli on uniformly labeled [U-13C]glucose, which yielded a comparable number of detected features with high-quality labeling profiles. The distinct labeling patterns of common central metabolites in the two model bacteria can readily be explained by one-carbon versus multicarbon compound metabolism. DynaMet is freely available as an extension package for the Python-based eMZed2, an open source framework built for rapid development of LC−MS data analysis workflows.

© 2015 American Chemical Society
Received: May 1, 2015 Accepted: September 14, 2015 Published: September 14, 2015
DOI: 10.1021/acs.analchem.5b01660 Anal. Chem. 2015, 87, 9679−9686

Time-course labeling data provide crucial information about the dynamics of metabolites in biological systems, because the speed at which label is incorporated into metabolite pools depends on the activities of the corresponding pathways. Depending on the sampling frequency, metabolic processes with different turnover rates can be resolved. Isotope labeling experiments, initially conducted with 14C and later with 13C, have a long tradition in the elucidation and fundamental understanding of essential pathways.1,2 With progress in mass spectrometry, stable isotope labeling strategies have entered the field of metabolomics (and proteomics) for systems-level studies of metabolic networks,3 in contrast to earlier targeted studies in which only a few selected metabolites were analyzed. Stable isotope probing is used for metabolic flux determination4 and in dynamic labeling experiments to demonstrate the operation of novel pathways.5,6 Moreover, label incorporation reveals the time point at which a metabolite pool reaches isotopic steady state, which is required for typical stationary metabolic flux analysis approaches.7−10 In particular, the analysis of metabolites by high-resolution mass spectrometry allows the untargeted detection of thousands of isotopologues with minimal background, which offers exciting perspectives for pathway discovery in the untapped diversity of microbial processes.

Especially LC−MS analysis with high-resolution mass spectrometers results in large data sets that require efficient tools to reduce them and extract valuable information. An essential step in LC−MS data analysis is the grouping of detected peaks into features, since features provide important information for compound identification, significantly reduce the search space, and simplify the comparison of different samples. Efficient and fast feature-grouping algorithms have already been developed for label-free metabolomics data11−14 that handle different types of MS instruments with different characteristics concerning, e.g., linearity or mass resolution. In label-free data sets, the isotopologue distribution (ID) is characteristic of a metabolite's elemental composition and remains unchanged; samples containing labeled isotopes are far more complex. In labeling experiments, the ID is a function of metabolic activity and substrate labeling, so features with different isotopic patterns have to be matched to the same metabolite. One approach to improve feature mapping consists of spiking naturally labeled sample into labeled ones and comparing those spectra with naturally labeled reference samples.15 To remove the reference from labeled compounds, IDs can subsequently be corrected for natural isotope abundance. Recently, an extension of the data analysis platform XCMS was introduced.16 X13CMS uses accurate isotope mass shifts to identify the isotopic patterns of labeled features. However, none of these tools directly targets the analysis of dynamic label incorporation.

Here, we introduce DynaMet, a comprehensive, fully automated data analysis workflow to extract metabolite labeling profiles from 13C time-course labeling experiments, starting from liquid chromatography−high-resolution mass spectrometry (LC−HRMS) raw data. The tool extracts all features with significant label incorporation and provides their labeling profiles as well as their isotopologue distributions over time. Moreover, it matches features against database(s) for metabolite identification. The pipeline was primarily designed for pathway discovery, that is, the unbiased discovery of new enzymatic reactions and pathways, but its output can also be used as input for 13C metabolic flux analysis (MFA). We tested DynaMet with LC−MS data sets generated in dynamic labeling experiments conducted with Bacillus methanolicus MGA3 during growth on methanol as the sole carbon source.17 Because all carbon–carbon bonds are built from the one-carbon precursor, such data are particularly suited to test and validate DynaMet. Our results show that the pipeline successfully identified core metabolic pathways in an automated manner and provides a broad overview of metabolic processes at the chosen time scale. In addition, we conducted labeling experiments with [U-13C]glucose using Escherichia coli and analyzed the data with DynaMet, demonstrating the general applicability of the automated platform.

EXPERIMENTAL SECTION

LC−HRMS Data. A first LC−MS data set was generated from samples taken previously in dynamic labeling experiments with B. methanolicus MGA3 grown in bioreactors (for details see Müller et al.17). The samples comprised two biological replicates of time-course labeling experiments with 13C methanol, with samples taken 0, 5, 10, 20, 30, 60, 120, 300, and 600 s after the carbon source switch. Prior to quenching, cells were separated from the medium by fast filtration. Metabolites were extracted with an acidified organic solution, freeze-dried, and stored at −20 °C until analysis. For the present study, samples were spiked with a reference set of compounds established here, yielding ions distributed broadly over the measured m/z and retention time (RT) ranges (for details of the composition, see Table S-1). LC−HRMS analysis was performed using a nanoscale ion-pair reversed-phase HPLC−MS method18 on an LTQ-Orbitrap instrument operating in negative FT mode at a resolution of 60 000 (at m/z 400), with a scan range of 150 ≤ m/z ≤ 850. After acquisition, raw data were converted into mzML format with a converter suited to the instrument's raw data format.19 LC−MS data of all data sets can be downloaded from MetaboLights20 (MTBLS228, MTBLS229, http://www.ebi.ac.uk/metabolights). A second data set, comprising two biological replicates of E. coli switched to [U-13C]glucose, was generated for this study. Cells were grown in baffled shake flasks in M9 minimal medium (37 °C, 120 rpm, Infors HT Minitron, Bottmingen, Switzerland), and 0.5 mL of exponentially growing cells (OD600 ∼ 1) were added to 4.5 mL of prewarmed (37 °C) medium containing [U-13C]glucose (99%, Cambridge Isotope Laboratories) at the same concentration as the naturally labeled glucose present at the sampling time (∼17 mM). After the labeling switch, samples were taken at 0, 1, 3, 6, 11, 16, 20, 30, 40, 60, 90, 120, 160, 200, 270, 300, 400, 500, and 600 s by fast filtration, washed with 10 mL of prewarmed (37 °C) deionized water containing 2 mM glucose (10-fold excess of [U-13C] compared to naturally labeled glucose), and quenched, extracted, and analyzed via LC−HRMS as described above (scan range of 100 ≤ m/z ≤ 1000).

Determination of Labeling Incorporation Kinetics. To assess the quality of the automatically extracted labeling profiles, all isotopologue peaks of selected core metabolites were also extracted manually by defining individual retention time and m/z windows. We calculated peak areas using eMZed2's exponentially modified Gaussian peak model or by numerical integration with the trapezoidal rule. Isotopologue fractions were determined as follows:

$$ f_{M_i} = \frac{A_{M_i}}{\sum_{j=0}^{n} A_{M_j}} \tag{1} $$

where A_{M_i} is the area of the ith isotopologue and n is the number of labeled isotopes. Values were corrected for the natural abundance of 13C as described previously,21 using the correction tool integrated in DynaMet, and the number of labeled carbons, n13C, was calculated by

$$ n_{^{13}\mathrm{C}} = \sum_{i=0}^{m} i \, f_{M_i} \tag{2} $$

where m is the number of carbon atoms of the metabolite. To fit the labeling kinetics, two different models were applied: the classical first-order kinetic switch response

$$ y = k\left(1 - e^{-t/T}\right) \tag{3} $$

and a logistic model that accounts for a time delay in label incorporation:

$$ y = \frac{k\,y_0\,e^{tT}}{k + y_0\,e^{tT} - y_0} \tag{4} $$
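The labeling profiles to which these models are fitted can be derived from isotopologue peak areas via eqs (1) and (2). The following is a minimal plain-Python sketch, assuming each isotopologue area is obtained by trapezoidal integration of its extracted ion chromatogram; the natural-abundance correction step is deliberately omitted, and the function names are hypothetical, not part of the DynaMet or eMZed2 API.

```python
from __future__ import annotations

from typing import Sequence


def trapezoid_area(rt: Sequence[float], intensity: Sequence[float]) -> float:
    """Integrate one isotopologue's chromatographic peak with the trapezoidal rule."""
    if len(rt) != len(intensity) or len(rt) < 2:
        raise ValueError("need at least two (rt, intensity) points of equal length")
    return sum(
        0.5 * (intensity[i] + intensity[i + 1]) * (rt[i + 1] - rt[i])
        for i in range(len(rt) - 1)
    )


def isotopologue_fractions(areas: Sequence[float]) -> list[float]:
    """Eq (1): f_Mi = A_Mi / sum_j A_Mj, with areas ordered M0, M1, ..., Mn."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total isotopologue area must be positive")
    return [a / total for a in areas]


def n13c(fractions: Sequence[float]) -> float:
    """Eq (2): average number of 13C atoms, the sum over i of i * f_Mi."""
    return sum(i * f for i, f in enumerate(fractions))


# Hypothetical example: a three-carbon metabolite at one sampling time point.
# A real workflow would first apply the natural-abundance correction (not shown).
areas = [50.0, 10.0, 15.0, 25.0]           # areas of M0..M3 (made-up numbers)
fractions = isotopologue_fractions(areas)  # [0.5, 0.1, 0.15, 0.25]
print(round(n13c(fractions), 6))           # 1.15
```

Repeating the last two steps for every sampling time yields an n13C(t) profile, which is the kind of time course that eqs (3) and (4) describe.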

To compare the time constants of both models, the half time T50, corresponding to the time needed to replace half of the 12C atoms of a metabolite pool with 13C, was calculated for the first-order fit by

$$ T_{50} = -T \ln 0.5 \tag{3a} $$

whereas for the logistic fit, T50 was calculated according to

$$ T_{50} = \frac{1}{T}\ln\frac{k - y_0}{y_0} \tag{4a} $$
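The half-time formulas can be checked numerically: at t = T50 both models should return k/2. Below is a minimal plain-Python sketch of eqs (3), (4), (3a), and (4a), assuming the reading of eq (4) in which T multiplies t in the exponent (the form consistent with eq (4a)), together with an NRMSE helper for comparing two fitted models. Parameter estimation itself is not shown; a nonlinear least-squares routine such as scipy.optimize.curve_fit could supply k, y0, and T. Normalizing the RMSE by the range of the observed values is an assumption, since the normalization is not stated here, and all function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def first_order(t: float, k: float, T: float) -> float:
    """Eq (3): y = k (1 - exp(-t / T)); T is a time constant."""
    return k * (1.0 - math.exp(-t / T))


def logistic(t: float, k: float, y0: float, T: float) -> float:
    """Eq (4), read as y = k*y0*exp(t*T) / (k + y0*exp(t*T) - y0); y(0) = y0."""
    g = y0 * math.exp(t * T)
    return k * g / (k + g - y0)


def t50_first_order(T: float) -> float:
    """Eq (3a): T50 = -T ln 0.5, the time at which y reaches k / 2."""
    return -T * math.log(0.5)


def t50_logistic(k: float, y0: float, T: float) -> float:
    """Eq (4a): T50 = (1 / T) ln((k - y0) / y0); requires 0 < y0 < k."""
    if not 0 < y0 < k:
        raise ValueError("expected 0 < y0 < k")
    return math.log((k - y0) / y0) / T


def nrmse(t: Sequence[float], y: Sequence[float],
          model: Callable[[float], float]) -> float:
    """RMSE of a fitted model, normalized by the data range (assumed normalization)."""
    residuals = [model(ti) - yi for ti, yi in zip(t, y)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    span = max(y) - min(y)
    if span <= 0:
        raise ValueError("observed values have zero range")
    return rmse / span


# Hypothetical parameters, as if returned by a nonlinear least-squares fit.
t = [0.0, 10.0, 20.0, 40.0, 80.0]
y = [first_order(ti, 3.0, 20.0) for ti in t]   # synthetic first-order profile
scores = {
    "first-order": nrmse(t, y, lambda ti: first_order(ti, 3.0, 20.0)),
    "logistic": nrmse(t, y, lambda ti: logistic(ti, 3.0, 0.1, 0.05)),
}
best = min(scores, key=scores.get)   # keep the model with the smaller NRMSE
print(best, t50_first_order(20.0))
```

Because the synthetic profile is generated from the first-order model itself, its NRMSE is zero and that model is selected; with real data both scores are nonzero, and the comparison decides between a plain switch response and a delayed, sigmoidal one.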

We fitted both models and selected the one with the smaller normalized root-mean-square error (NRMSE).

Comparison of Biological Replicates. To determine the differences between the feature sets of two sample data sets, we used eMZed2's table data structure. Two peaks were considered identical when both the absolute m/z difference and the retention time difference were within defined ranges. Since we observed mass accuracies