Improving Natural Products Identification through Targeted LC-MS/MS in an Untargeted Secondary Metabolomics Workflow
Thomas Hoffmann, Daniel Krug, Stephan Hüttel, and Rolf Müller
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/ac502805w • Publication Date (Web): 03 Oct 2014
Downloaded from http://pubs.acs.org on October 14, 2014
Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Improving Natural Products Identification through Targeted LC-MS/MS in an Untargeted Secondary Metabolomics Workflow

Thomas Hoffmann, Daniel Krug, Stephan Hüttel, Rolf Müller*

Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research and Department of Pharmaceutical Biotechnology, Saarland University, Building C 2.3, D-66123 Saarbrücken, Germany

KEYWORDS Mass spectrometry, precursor-directed MS/MS, secondary metabolites, metabolomics, myxobacteria

ABSTRACT Tandem mass spectrometry is a widely applied and highly sensitive technique for the discovery and characterization of microbial natural products such as secondary metabolites from myxobacteria. Here, a data mining workflow based on MS/MS precursor lists targeting only signals related to bacterial metabolism is established using LC-MS data of crude extracts from shaking flask fermentations. The devised method is not biased towards specific compound classes or structural features and is capable of increasing the information content of LC-MS/MS analyses by directing fragmentation events to signals of interest. The approach is thus contrary to typical auto-MS² setups where precursor ions are usually selected according to signal intensity,
which is regarded as a drawback for metabolite discovery applications when samples contain many overlapping signals and the most intense signals do not necessarily represent compounds of interest. In line with this, the method described here achieves improved MS/MS scan coverage for low-abundance precursor ions not captured by auto-MS² experiments and thereby facilitates the search for new secondary metabolites in complex biological samples. To underpin the effectiveness of the approach, the identification and structure elucidation of two new myxobacterial secondary metabolite classes are reported.
Introduction Searching for novel secondary metabolites in microbial sources is a laborious task that frequently involves the LC-MS analysis of complex crude extracts derived from shaking flask cultivations or bioreactor fermentations. Secondary metabolites are much sought-after - but often minor - constituents of these complex mixtures and identifying them is a challenging key step in natural products research.1–3 While targeted approaches focused on the identification and quantification of known compounds based on LC-MS and MS/MS data are well established,4,5 untargeted methods are fairly new to the rapidly developing field of mass spectrometry-driven natural products discovery. In particular, comprehensive data mining and computational methods for automated analysis of MS/MS spectra for secondary metabolomics are currently still at an early stage of development when compared to other –omics platform technologies. Examples of untargeted approaches include the use of multivariate statistical tools such as principal component analysis (PCA) to classify and prioritize microbial strains according to differences between their secondary metabolomes as determined by high-resolution mass spectrometry6–9 and to pinpoint the effects of genetic knock-out experiments.10–12
Reaching beyond biomarker discovery by statistical approaches, several recently reported methods use fragment spectra of metabolites to classify them in a largely unsupervised fashion. Spectral networking and fragmentation tree alignments may be used for clustering unknowns as well as for similarity-based matching of unknowns to known compounds.13–17 Other methods identify unknowns through direct spectral comparison of MSⁿ spectra or with the help of machine learning techniques.18,19 Furthermore, characteristic fragmentation patterns may be used to recognize metabolites bearing distinct building blocks, which in turn allows genome sequences to be searched for the matching biosynthetic machinery.20,21 However, ambitions to use MS/MS data for the de novo prediction of structures are still largely limited to peptide-like molecules due to their well-defined fragmentation behavior.22,23 Despite their different underlying principles and scopes, the above-mentioned data analysis techniques share a common prerequisite: their success critically relies on high-quality MS/MS spectra, where accurate mass acquisition is beneficial or even mandatory for analysis. While powerful bioinformatic methods are emerging at the downstream end of mass spectrometric analysis workflows, relatively few studies specifically address the acquisition and preprocessing of MS/MS spectra with the aim of increasing the information content of small-molecule MS/MS datasets. Specialized methods exist where pre-selection of supposedly relevant MS/MS precursors is achieved by scanning for special isotope ratios, e.g. from labelled scavenger compounds.24 Other approaches are based on indicative neutral losses in MS² spectra upon which the mass spectrometer initiates an MS³ scan.25,26 Although useful for a range of applications, these methods share one main drawback: the bias introduced by the predefined type of data-dependent acquisition.
Alternatively, several proteomics applications implement a concept of hypothesis-based precursor ion selection referred to as inclusion-list driven MS.27
These methods are based on highlighting relevant features in a first pass, followed by targeted fragmentation of these features in a second run in order to improve peptide coverage in shotgun proteomics.28,29 Similarly, Jaffe et al. introduced a workflow which targets mass spectral features related to differentially expressed proteins for subsequent fragmentation.30 In line with these experimental designs, the focus of this work is to improve the MS/MS data acquisition step by data-dependent precursor selection using a precursor list including ions of potential interest. The main objective is to acquire “reasonable” MS/MS spectra for small-molecule metabolites so that high MS/MS coverage of putatively interesting features is achieved irrespective of peak abundances (as opposed to automatically fragmenting the most abundant signals). By comparing distinct sample types we create precursor lists covering only signals related to bacterial growth, which enables us to focus on a particular subset of signals. Taken together, this method in combination with computational approaches for MS/MS data evaluation could improve the chances for discovery and identification of new natural products.
Figure 1: Overview scheme of a targeted MS/MS approach. (A) After creating a suitable sample set the LC-MS data are processed with a feature finding algorithm and transformed to a bucket table covering all features (B). Filtering this bucket table according to user-specific constraints yields a precursor list (C) that is used for re-measuring the strain-derived extract. The histogram plot (D) shows the distribution of MS² events based on the created precursor list (retention time binning with 2.5 s class size).
Experimental Section Cultivation and Extraction. Cultivation was done in 300 mL shaking flasks filled with 45 mL of medium, 1 mL of XAD-16 slurry (1 % w/v, PS-DVB resin, Rohm & Haas) and 4 mL of inoculation culture. For “blank” media cultivations, 49 mL of medium and 1 mL of XAD-16 slurry were used. The flasks were incubated at 30 °C and 140 rpm for 6 days (So
ceGT47) or 8 days (So ce38), respectively. After incubation, cultures were centrifuged and the adsorber resin together with the cells was extracted using 2 x 25 mL of methanol. The combined organic extracts were filtered and dried using a rotary evaporator. Residues were suspended in 1 mL of methanol and centrifuged prior to LC-MS.

LC-MS. All measurements were performed with a Dionex Ultimate 3000 RSLC (Thermo, Dreieich, Germany) comprising a high pressure gradient pump (HPG-3400RS) with a 150 µL mixing chamber. A BEH C18, 100 x 2.1 mm, 1.7 µm dp column (Waters, Eschborn, Germany) is used to separate 1 µL of sample by a linear gradient from (A) H2O + 0.1 % FA to (B) ACN + 0.1 % FA at a flow rate of 600 µL/min and 45 °C. The gradient is initiated by a 0.5 min isocratic step at 5 % B, followed by an increase to 95 % B in 18 min and a 2 min step at 95 % B before re-equilibration under the initial conditions. The LC flow is split to 75 µL/min before entering the maXis 4G hr-QqToF mass spectrometer (Bruker Daltonics, Bremen, Germany) using the Apollo ESI source. The ion source parameters are: capillary, 4000 V; endplate offset, 500 V; nebulizer, 1 bar; dry gas, 5 L/min; and dry gas temperature, 200 °C. Ion transfer parameters are: funnel RF, 350 Vpp; multipole RF, 400 Vpp; and quadrupole ion energy, 5 eV at low m/z 200. The collision cell is set to 8 eV with a collision RF of 2500 Vpp in full scan mode. Ion cooler settings are: transfer time, 90 µs; ion cooler RF, 120 Vpp; and pre-pulse storage, 5 µs. Mass spectra are acquired in centroid mode ranging from 150 – 2500 m/z at a 2 Hz scan rate in full scan positive ESI mode. Each measurement is started with the injection of a 20 µL plug of basic sodium formate solution introduced by a loop switched into the flow path. The resulting peak is used for automatic internal m/z calibration. In addition, a lock mass (Agilent Chip Cube High Mass HP-1221, Art.# G1982-85001) is used for recalibration of single spectra.
MS/MS settings. Minimum precursor intensity is set to 8000. Full scan spectra are acquired at 2 Hz followed by MS/MS spectra acquisition at a variable scan speed ranging from 2 to 6.7 Hz as a function of precursor intensity. CID energy is interpolated linearly between 30, 35, 45 and 55 eV for precursor m/z values of 300, 600, 1000 and 2000, respectively. The ion cooler is set to ramp collision energy (90 – 120 % of the set value) and ion cooler RF from 120 to 80 Vpp for every MS/MS scan. The precursor list is evaluated every 2 seconds to assign the upcoming precursors, and precursors are moved to an exclusion list for 0.2 min after two spectra have been measured (typical chromatographic peak width was 0.10 – 0.15 min).

Data processing. The software used for data processing comprises DataAnalysis 4.1, ProfileAnalysis 2.1, and TargetAnalysis 1.3 (Bruker Daltonics, Bremen, Germany) as well as the KNIME analytics platform (KNIME, Zurich, Switzerland) in combination with a MySQL database. Euler diagrams were created using the eulerAPE 2.0.3 freeware.31
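The m/z-dependent collision energy described under MS/MS settings can be illustrated with a short sketch. It is not the instrument's control code: the function names are hypothetical, and clamping outside the 300 – 2000 m/z range is an assumption, since the text does not say how such precursors are handled.

```python
from bisect import bisect_right

# Breakpoints taken from the MS/MS settings: (precursor m/z, CID energy in eV).
CE_TABLE = [(300.0, 30.0), (600.0, 35.0), (1000.0, 45.0), (2000.0, 55.0)]


def collision_energy(mz: float) -> float:
    """Return the linearly interpolated CID energy (eV) for a precursor m/z.

    Assumption: values outside 300-2000 m/z are clamped to the nearest
    breakpoint; the source does not state how the instrument handles them.
    """
    xs = [x for x, _ in CE_TABLE]
    if mz <= xs[0]:
        return CE_TABLE[0][1]
    if mz >= xs[-1]:
        return CE_TABLE[-1][1]
    i = bisect_right(xs, mz)  # index of the first breakpoint above mz
    (x0, y0), (x1, y1) = CE_TABLE[i - 1], CE_TABLE[i]
    return y0 + (y1 - y0) * (mz - x0) / (x1 - x0)


def energy_ramp(mz: float) -> tuple[float, float]:
    """Return the (low, high) energies for the 90-120 % ramp of the set value."""
    ce = collision_energy(mz)
    return 0.9 * ce, 1.2 * ce
```

Under these assumptions a precursor at m/z 800 would be fragmented at a set value of 40 eV, ramped between 36 and 48 eV within the scan.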
Results and Discussion Experimental design. The principal idea to guide MS/MS precursor selection in this study includes creating a scheduled precursor list for the MS/MS analysis of complex microbial samples through the comparison of two sample types, e.g. “strain” and “blank” samples. Both sample types are handled in exactly the same way throughout the experiment, although with the essential difference that a “strain” sample is inoculated with a microbial culture while a “blank” sample contains no living microorganism. The rationale behind this approach is the basic assumption that differences between “blank” and “strain” samples can be attributed to the growth of the microorganism. To elaborate on this point, measured differences may be actually caused
by a variety of life cycle-related reasons, such as decomposition of nutrients, formation and (partial) secretion of small molecules and proteins, as well as compounds related to their metabolic breakdown. In addition, the complex media components can chemically degrade during the incubation time. Finally, the secondary metabolites made by the microorganism – the most interesting portion of the sample in terms of natural product discovery – add up to this complex mixture. Hence, a comprehensive evaluation of differences between sample types reveals the compounds of interest, i.e. secondary metabolites, as a subset of all differentially occurring signals. The measurable differences are then transcribed into a set of molecular features as defined by retention time, accurate m/z and a response value (intensity or area). This information is used to select exclusively the pre-filtered features as precursors in a subsequent MS/MS analysis run. Consequently, this experimental design requires a higher workload than traditional data-dependent scanning, but in return an important pre-selection step is introduced in the untargeted secondary metabolomics workflow. Data-dependent methods cannot achieve equally efficient precursor selection up-front, as no spectral characteristics pinpoint the ions of interest.
Compiling an MS/MS precursor list. Method development in this work was done using the myxobacterial strain Sorangium cellulosum So ceGT47 grown in liquid P-medium with 6 biological replicates, alongside 6 biological replicates of P-medium “blank” cultures. After measurement of the crude methanolic extracts by LC-MS, each file was processed using the find molecular features algorithm (FMF, Bruker Daltonics) to give 4000 – 6000 molecular features in each run for “strain” samples and 2000 – 4000 features for “blank” samples (Table S1). The feature sets of all samples are then processed to create a table covering intensity values of all
features referred to as a bucket table. This type of processing was done using the ProfileAnalysis software and is basically a two-dimensional binning of molecular features across all samples with respect to retention time and m/z value (Figure 2, A). Note that intensity values are not binned but transferred as distinct entries to the bucket table.
Figure 2: (A) Buckets of predefined RT and m/z range are created around each feature. All samples are checked for features matching the bucket constraints upon which the respective intensity values are saved within the bucket table. (B) Number of strain-specific buckets in relation to the number of biological and technical replicates used for analysis. The bucket count approaches a basal level as a consequence of reducing false positives and noise when using larger sample sets.
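The bucketing idea of Figure 2A can be sketched in a few lines. This is an illustrative stand-in rather than the ProfileAnalysis algorithm, whose internals are not described here; the class names, the greedy assignment and the tolerance values (`rt_tol`, `mz_tol`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    sample: str       # name of the LC-MS run the feature was found in
    rt: float         # retention time in seconds
    mz: float         # accurate m/z
    intensity: float  # response value (intensity or area)


@dataclass
class Bucket:
    rt: float
    mz: float
    intensities: dict = field(default_factory=dict)  # sample name -> intensity


def build_bucket_table(features, rt_tol=6.0, mz_tol=0.01):
    """Greedy bucketing: a feature joins the first bucket whose centre lies
    within +/- rt_tol and +/- mz_tol, otherwise it opens a new bucket.
    The tolerances are hypothetical placeholders."""
    buckets = []
    # Process the most intense features first so they define bucket centres.
    for f in sorted(features, key=lambda f: -f.intensity):
        for b in buckets:
            if abs(f.rt - b.rt) <= rt_tol and abs(f.mz - b.mz) <= mz_tol:
                # Intensities are stored per sample, not summed or binned.
                b.intensities.setdefault(f.sample, f.intensity)
                break
        else:
            buckets.append(Bucket(f.rt, f.mz, {f.sample: f.intensity}))
    return buckets
```

Each resulting bucket carries one intensity entry per sample in which a matching feature was found, which is the information needed for the replicate-based filtering that follows.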
We assumed that a notable number of the underlying features, and thus of the created buckets, are related to noise and aberrant bucketing. Consequently, reasonable bucketing of samples should include replicate measurements to counteract this. To probe this effect, variable numbers of both biological and technical replicate measurements were used for bucketing, and the number of strain-specific buckets was plotted against the number of replicates (Figure 2, B). The decrease in
buckets is not linear but rather approaches a basal level, which indicates that the number of false positives is efficiently reduced. We decided to continue this study with data sets consisting of three samples per set, with each sample measured twice (3R-3R data set). The resulting 3R-3R bucket tables are filtered to retain only those buckets that are found in “strain” measurements but never in “blank” measurements, i.e. buckets which are present in every measured “strain” sample fulfill the condition “strain”/“blank” = 6/0. We also considered buckets that match a 5/0 or 4/0 condition for practical reasons: the 5/0 condition can mitigate the influence of one failed measurement or processing step, e.g. owing to technical problems, and the 4/0 condition additionally allows for the drop-out of a biological sample, e.g. owing to inappropriate growth of one individual culture. Based on this we created three different scheduled precursor lists (SPLs) for each bucket table named “only-6”, “6+5”, and “6+5+4”, referring to the minimum “strain” count that is used for filtering the bucket table. An intensity cut-off was applied to all SPL entries to eliminate precursors of very low abundance (Supporting Information). The compiled lists include all information necessary to perform precursor selection for MS/MS analysis. All molecular features that are most likely related to bacterial metabolism are part of this SPL. The SPLs comprised 459, 701, and 780 entries for the “only-6”, “6+5”, and “6+5+4” lists, respectively, for a So ceGT47 P-medium sample set. A histogram view of the “SPL_only-6” list presented in Figure 1D illustrates in which time slots acquisition of MS/MS spectra will be triggered. Based on the results of these preliminary experiments using one sample set, additional 3R-3R sample sets were produced covering three different media compositions for both S. cellulosum So ceGT47 and S. cellulosum So ce38. For all data sets, SPLs were created and used for subsequent MS/MS measurements (Table S3).
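The “strain”/“blank” filter can be written down compactly. The following is a minimal sketch under stated assumptions: the bucket table is represented as a list of dictionaries, the function and parameter names are hypothetical, and the intensity cut-off is applied to the highest strain intensity of a bucket, which is a guess because the actual cut-off rule is only given in the Supporting Information.

```python
def scheduled_precursor_list(bucket_table, strain_runs, blank_runs,
                             min_strain_count=6, min_intensity=0.0):
    """Keep buckets seen in at least `min_strain_count` strain runs and in no
    blank run. min_strain_count = 6, 5 or 4 corresponds to the "only-6",
    "6+5" and "6+5+4" lists, respectively."""
    spl = []
    for bucket in bucket_table:
        hits = bucket["intensities"]  # run name -> intensity
        n_strain = sum(1 for run in strain_runs if run in hits)
        n_blank = sum(1 for run in blank_runs if run in hits)
        if n_blank != 0 or n_strain < min_strain_count:
            continue
        # Assumption: the cut-off is tested against the highest strain intensity.
        top = max((hits[run] for run in strain_runs if run in hits), default=0.0)
        if top >= min_intensity:
            spl.append({"rt": bucket["rt"], "mz": bucket["mz"], "intensity": top})
    return spl
```

With six strain runs and six blank runs (three samples measured twice each), lowering `min_strain_count` from 6 to 4 can only add entries, which matches the growth of the list from 459 to 780 entries reported above.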
MS/MS data acquisition. With these precursor lists in hand we set out to measure samples in SPL-MS² and in auto-MS² mode in order to evaluate whether precursor-directed analyses confer a benefit. Notably, the samples used for MS/MS analysis are always a mixture of the three strain extracts used to create the bucket table. This enables analysis of 4/0 bucket scenarios, as those may be related to a feature physically missing in one of the three samples. Ideally, the instrument would be able to select all desired precursors and perform the requested high-resolution MS/MS measurements. However, complete coverage of the SPL is not achievable for technical reasons, since a time slot may contain more MS/MS triggers than the instrument can serve within its duty cycle. In this study, the origin of such time slots overloaded with SPL entries is mostly attributed to compounds susceptible to strong in-source fragmentation as well as adduct formation. In principle, applying MS/MS fragmentation to those in-source-derived fragments and adducts could increase confidence in compound identification, but on the other hand these data add redundancy to the dataset. Hence, restricting the number of precursors in a duty cycle could help to increase coverage of diverse compounds while avoiding redundant information. For the mass spectrometer used we evaluated the effect of different settings and eventually limited the number of precursors per full scan to a maximum of 4. With this restriction, some of the listed precursors in a particular time slot may be left out in favor of better coverage of adjacent time slots (Table S5). Taken together, the devised precursor-directed MS/MS method was set to allow up to 4 precursors using a full scan acquisition rate of 2 Hz followed by MS/MS spectra acquisition at variable scan speeds based on precursor intensity (1.5 – 6.8 Hz). This setting results in duty cycles ranging from 1.1 to 2.5 seconds. Note that auto-MS² mode always selects the 3 most abundant m/z of a full scan and fragments those using the same scan speed settings as for the
SPL mode. Hence, the only significant difference in these experiments is the approach taken for precursor selection. The effect of using a precursor list becomes evident when “blank” samples are measured under control of a SPL-MS² method which results in few MS/MS scans being initiated (Table 1).
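The duty cycle length follows from simple arithmetic on the scan rates. The sketch below assumes that a cycle consists of one full scan followed by the maximum of four MS/MS scans, with no overhead between scans; the function name is hypothetical, and the MS/MS rates used are the 2 – 6.7 Hz limits listed under MS/MS settings in the Experimental Section.

```python
def duty_cycle_seconds(n_precursors, full_scan_hz, msms_hz):
    """One full scan followed by n_precursors MS/MS scans, ignoring overhead."""
    return 1.0 / full_scan_hz + n_precursors / msms_hz


# Full scan at 2 Hz and up to 4 precursors per cycle, as stated in the text.
# The MS/MS rates below are the 2 - 6.7 Hz limits given in the MS/MS settings.
fastest = duty_cycle_seconds(4, full_scan_hz=2.0, msms_hz=6.7)
slowest = duty_cycle_seconds(4, full_scan_hz=2.0, msms_hz=2.0)
print(f"{fastest:.1f} s to {slowest:.1f} s")  # prints "1.1 s to 2.5 s"
```

Under the same assumptions, an auto-MS² cycle with three precursors would take between roughly 0.9 and 2.0 seconds.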
Table 1: Comparison of blank and strain samples when measured with a SPL-only6 method. The table shows the number of MS/MS events measured in each run and all MS/MS events that are related to a SPL entry, no matter if several scans belong to identical precursors. The SPL coverage is the percentage of SPL-based precursors being covered by these MS/MS events. Values are average values of duplicate measurements.
Sample                  all MS/MS events   MS/MS events related to SPL entries   SPL size   SPL coverage
blank, P medium         56                 23                                    459        4%
So ceGT47, P medium     683                583                                   459        76%
blank, A medium         66                 37                                    483        2%
So ceGT47, A medium     651                514                                   483        68%
blank, M medium         116                39                                    485        5%
So ceGT47, M medium     586                503                                   485        70%
blank, H medium         52                 13                                    436        2%
So ceGT47, H medium     647                566                                   436        78%
Evaluation of SPL-derived data. The efficacy of the precursor-selection method was evaluated by checking how many of the listed precursors (SPL) were actually used as precursors for fragmentation in both SPL-MS² and auto-MS² measurements. We named this value SPL
coverage throughout this work. As some precursors were fragmented twice, we introduce another measure called MS/MS efficiency, which is the percentage of MS/MS events that addressed entries of the precursor list even if the same precursor is fragmented several times. For this reason the MS/MS efficiency may be higher than the SPL coverage. The complete processing is accomplished by a combination of Bruker DataAnalysis tools, the KNIME software package, and a MySQL database (Supporting Information). In particular, we exported all precursors of an LC-MS/MS run, including retention time and m/z value of the parent ion, and checked each pair for a matching pair in the precursor list used. This was done within a given retention time and m/z window set around each SPL entry. The same type of query was repeated using a list of known compounds instead of the SPL, aiming to evaluate how many known compounds were actually fragmented. The known analytes were identified for each crude extract type by querying an in-house database using the TargetAnalysis software. Both SPL-MS² and auto-MS²-based LC-MS/MS data sets are processed the same way using the respective SPL and lists of knowns as references. Figure 3 depicts the results of this analysis by means of Euler diagrams. For each SPL used, the SPL-MS² data cover significantly more of the SPL entries than the auto-MS²-derived data, i.e. when comparing right and left sides of Figure 3. At the same time the overall number of MS/MS events is reduced, affording a significantly improved MS/MS efficiency.
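The two measures defined above can be made concrete with a small sketch. It does not reproduce the KNIME/MySQL processing used in this work; the function names and the window widths (`rt_window`, `mz_window`) are hypothetical, and each MS/MS event is simply assigned to the first SPL entry whose window contains it.

```python
def match_spl(msms_events, spl, rt_window=12.0, mz_window=0.02):
    """Return, for each MS/MS event (rt, mz), the index of the first SPL entry
    (rt, mz) whose RT and m/z window contains the precursor, or None.
    Window widths are hypothetical placeholders."""
    matches = []
    for rt, mz in msms_events:
        hit = None
        for i, (spl_rt, spl_mz) in enumerate(spl):
            if abs(rt - spl_rt) <= rt_window / 2 and abs(mz - spl_mz) <= mz_window / 2:
                hit = i
                break
        matches.append(hit)
    return matches


def spl_coverage(matches, spl_size):
    """Percentage of SPL entries fragmented at least once."""
    return 100.0 * len({m for m in matches if m is not None}) / spl_size


def msms_efficiency(matches):
    """Percentage of MS/MS events that addressed any SPL entry."""
    return 100.0 * sum(m is not None for m in matches) / len(matches)
```

Because repeated scans of the same precursor count towards efficiency but not towards coverage, the former can exceed the latter. For the So ceGT47 P-medium run in Table 1, for instance, 583 of 683 MS/MS events relate to SPL entries, i.e. an efficiency of roughly 85 % at a coverage of 76 %.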
Figure 3: The effect of using different SPL for the same sample represented by Euler diagrams. The overall areas as well as the overlapping areas are size-proportional to the values that contribute to the respective field. Data derived from S. cellulosum So ceGT47 P-medium data sets consisting of three strain and three blank samples. For each Euler diagram, indicated values are averages of three technical replicates.
Furthermore, this result shows that by increasing the size of the SPL, the number of SPL entries covered is increased in a comparable ratio (as indicated by the central overlapping areas in the Euler diagrams). A SPL-MS² run with “SPL-only6” addresses 350 out of 459 SPL entries (76 %). By using “SPL-6+5” and “SPL-6+5+4” the SPL size is increased by a factor of 1.5 and 1.7, respectively, whereas the number of MS/MS scans is increased 1.4- and 1.5-fold. At the same time the SPL coverage remains high with more than 65 % for both. Similar results are obtained for the additional sample sets used in this work, i.e. the strains Sorangium cellulosum So ceGT47 and So ce38 grown in different cultivation media (Figure S5, S6). From these results it can be concluded that the significantly increased SPL size is adequately processed by the instrument. This constitutes a remarkable difference to the auto-MS² method, which addressed only 35 to 40 % of the SPL entries while conducting more than double the number of MS/MS events. In terms of known compounds, no significant difference is observed between acquisition methods; this result is likely due to the fairly high abundance of known analytes found in the crude extracts used here. We reasoned that the difference in SPL coverage between SPL-MS² and auto-MS² is due to the inclusion of low abundant precursor ions in SPL-MS². To verify this assumption, the intensity distribution of the measured precursor ions was evaluated by creating an intensity-based histogram. We created such histograms for a “SPL-only6” and a “SPL-6+5+4” measurement as these significantly differ in overall SPL size as well as in SPL coverage (Figure 4).
Figure 4: Overlay displays of MS/MS event histograms for auto-MS² and SPL-MS² data. The additional MS/MS events of a SPL-MS² run compared to an auto-MS² run are usually found for precursor intensities of less than 1 × 10⁵ counts (highlighted area within the histograms).
The bars in Figure 4 reflect the number of measured precursors within a certain precursor intensity range. When using SPL-MS² the selection of low abundant precursor ions is highly increased as depicted by the blue bars. More precisely, the additional MS/MS events are related to precursor ions with intensities less than 1 × 10⁵ as can be seen by the difference of blue and
grey bars in the highlighted area. A similar trend is observed for large precursor lists when using the “SPL-6+5+4” list as shown in Figure 4 B. The question arises, however, whether MS/MS scans of lower-abundance precursors yield spectra of useful quality. To test this, crude “strain” extracts were diluted with “blank” extracts in order to lower the concentration of known secondary metabolites while maintaining the complexity of the sample. When these diluted samples were measured under control of a SPL, the MS/MS spectra obtained for low abundant known compounds contained a substantial number of indicative fragment signals that matched the respective reference spectra, as exemplified in Figure 5 (see Supporting Information as well). Hence, we can conclude that selection of low abundant precursor ions is feasible for the instrumental platform used in this study.
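An intensity-based histogram of the kind discussed above can be produced with a few lines of code. This is a sketch only: the logarithmic bin layout and the function names are assumptions, as the binning used for Figure 4 is not specified in the text; only the 1 × 10⁵ counts threshold is taken from it.

```python
import math
from collections import Counter


def intensity_histogram(intensities, bins_per_decade=2):
    """Count precursors in logarithmically spaced intensity bins.

    Returns a dict mapping the lower bin edge (in counts) to the number of
    precursors in that bin. The log-spaced layout is an assumption."""
    counts = Counter()
    for value in intensities:
        idx = math.floor(math.log10(value) * bins_per_decade)
        counts[10 ** (idx / bins_per_decade)] += 1
    return dict(sorted(counts.items()))


def low_abundance_fraction(intensities, threshold=1e5):
    """Fraction of precursors below the 1 x 10^5 counts mark from Figure 4."""
    return sum(v < threshold for v in intensities) / len(intensities)
```

Computing `low_abundance_fraction` for the precursor intensities of an SPL-MS² run and an auto-MS² run of the same sample would give a single number for the shift towards low abundant precursors that the overlaid histograms show graphically.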
Figure 5: (A) Using SPL-MS² enables MS/MS acquisition of the low abundant peak of microsclerodermin M in a S. cellulosum So ce38 H-medium extract. With an intensity of 9000 counts this peak is at the lower limit of the typical intensity range of 2000 – 1.25 × 10⁶ counts. (B)
The MS/MS spectra quality as derived from this single scan is evaluated by comparison to a reference spectrum (C) with matching m/z values highlighted.
New secondary metabolites - Lipothiazoles. Following the overall idea of highlighting and identifying novel natural products using this method, we set out to search for new secondary metabolites in crude extracts of So ceGT47. By using the bucket table as input for principal component analysis, we identified several unknown molecular features which are related to compounds produced by this strain (Figure 6). The corresponding MS/MS data suggested a C-terminal dipeptide moiety, which was ultimately proven by isolating and characterizing the two most abundant compounds, named lipothiazole A and B, by means of NMR and hr-MS analysis (Figure 6, A). We further tentatively identified two additional major and five minor derivatives of the new lipothiazole family by accurate m/z and similar fragmentation patterns (Supporting Information). The SPL-MS² data facilitated the search for these derivatives as MS/MS data were available even for the minor peaks of this class.
Figure 6: (A) Structures of lipothiazoles A and B. (B) PCA loadings plot from a So ceGT47 data set (log transformation, Pareto scaling). The zoomed-in view highlights the impact of lipothiazole-related features on the model (blue dots). (C) Base peak chromatogram of a So ceGT47 H-medium extract. (D) Parts of the lipothiazole biosynthetic gene cluster found by retrobiosynthetic analysis. FACL, fatty acyl CoA-ligase; PCP, peptidyl carrier protein; HC, condensation domain forming a heterocycle; A(cys), adenylation domain specific for cysteine; Ox, oxidation domain.
After full structure elucidation we were able to map the N-terminal part of the molecule to parts of a biosynthesis cluster found in the genome, thereby providing an entry point for the elucidation of biosynthesis of this compound class (Figure 6, D).
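The preprocessing named in the Figure 6 caption (log transformation, Pareto scaling) followed by PCA can be sketched as below. This is an illustration under stated assumptions rather than the software used in this study: it relies on NumPy, takes rows as samples and columns as buckets, and adds a hypothetical offset of 1 before the logarithm so that empty buckets stay finite.

```python
import numpy as np


def pareto_pca(bucket_table, n_components=2, offset=1.0):
    """Log-transform, Pareto-scale, and decompose a samples x buckets table.

    offset is an assumed constant that keeps log10 defined for zero intensities.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.log10(np.asarray(bucket_table, dtype=float) + offset)
    X = X - X.mean(axis=0)                # mean-center every bucket
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                     # leave constant buckets unscaled
    X = X / np.sqrt(sd)                   # Pareto scaling: divide by sqrt(sd)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T        # one row per bucket
    total = np.sum(S ** 2)
    explained = S ** 2 / total if total > 0 else np.zeros_like(S)
    return scores, loadings, explained[:n_components]


# Hypothetical table: two strain extracts, two blanks, three buckets
table = [
    [1.0e5, 500, 510],
    [1.2e5, 520, 495],
    [0.0, 505, 500],
    [0.0, 498, 505],
]
scores, loadings, explained = pareto_pca(table)
print(np.abs(loadings[:, 0]).argmax())    # bucket with the largest PC1 loading
```

Buckets with large absolute loadings on the first components are the ones that separate strain extracts from blanks, which is how features stand out in a loadings plot.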
New secondary metabolites – ε-amidated tripeptides. When crude extracts of Sorangium cellulosum So ceGT47 were tested for bioactivity, all of them showed an antimicrobial effect against Staphylococcus aureus. Since this activity is of particular interest, the crude extract was separated by LC-based fractionation to track down the compounds responsible. One region showing activity was located between 4.75 and 4.9 minutes, with no distinct peaks standing out in the BPC (Figure 7, A). Searching the SPL in the respective time window resulted in a list of features supposedly related to bacterial growth. In addition, a PCA of the features found between 4 and 6 min highlighted the same compounds in the loadings plot (Figure 7, B).
Figure 7: (A) Bioactivity-guided fractionation of So ceGT47 H-medium extract revealed a fraction active against S. aureus between 4.75 and 4.90 min of the chromatographic separation. (B) SPL features present in this time window as well as a PCA point toward a set of potential candidate compounds, with the feature [4.72 min; 408.21309 m/z] highlighted here. (C) This feature corresponds to the new ε-amidated tripeptide SH-407 as characterized by subsequent purification and structure elucidation.
A detailed analysis of the respective region revealed two isobaric compounds with 408.21309 m/z, of which the more abundant one was isolated and its structure elucidated by NMR. We thereby identified a new compound class of ε-amidated tripeptides, as depicted in Figure 7 C. Based on MS/MS patterns, we tentatively identified two additional members of this compound class and subsequently confirmed them by isolation and NMR analysis (Supporting Information). Again, the capability of this method to generate MS/MS spectra for relevant ions facilitated compound identification. In summary, this method allows tracking down signals in complex chromatographic regions and thus helps to identify new secondary metabolites.
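Searching the SPL within the retention time window of an active fraction can be sketched as a simple filter. The `Feature` class, the function name, and the `margin` parameter are hypothetical. The margin is included because the highlighted feature (4.72 min) elutes slightly before the nominal fraction window of 4.75 to 4.90 min, and the text does not state how the window was applied. All intensities and the two additional features are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    rt_min: float      # retention time in minutes
    mz: float          # precursor m/z
    intensity: float   # peak intensity in counts


def features_in_window(spl, rt_start, rt_end, margin=0.0):
    """Return SPL features whose RT lies in [rt_start - margin, rt_end + margin].

    Results are sorted by descending intensity; margin is an assumed tolerance.
    """
    lo, hi = rt_start - margin, rt_end + margin
    hits = [f for f in spl if lo <= f.rt_min <= hi]
    return sorted(hits, key=lambda f: f.intensity, reverse=True)


# The first entry uses the RT and m/z given in Figure 7; everything else is invented
spl = [
    Feature(4.72, 408.21309, 12000.0),
    Feature(4.80, 512.30000, 30000.0),
    Feature(5.30, 622.40000, 80000.0),
]
for f in features_in_window(spl, 4.75, 4.90, margin=0.05):
    print(f)
```

With `margin=0.0` only the 4.80 min feature is returned, whereas a margin of 0.05 min also returns the feature at 4.72 min.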
Conclusions
Following a comprehensive natural products screening workflow, the work presented here exemplifies how LC-MS data is processed and filtered to extract molecular features related to the secondary metabolism of a microorganism. The approach specifically addresses “relevant” precursor ions irrespective of their intensity, which is of notable importance considering the wide concentration range of metabolites in complex biological samples. Modern instrumental platforms such as the hr-QqTOF device used here deliver highly accurate and highly resolved MS/MS spectra, but the sequential acquisition of spectra can become time-consuming. Acquisition time should thus be spent on the subset of relevant precursors rather than on signals unlikely to contribute valuable information. This pre-filtering is advantageous, for example, when complex cultivation media are used, which inevitably add a plethora of matrix signals to crude extracts. Such nutrient-related molecules can exhibit secondary metabolite-like structures and fragmentation behavior, thereby introducing a bias to downstream clustering algorithms, as these “irrelevant” signals easily outnumber compounds of interest. When working with pre-filtered precursor lists, the respective MS/MS data sets, and thus the computational methods applied downstream, focus exclusively on relevant data. In this respect, we also showed that the quality of SPL-MS²-derived data is sufficient even when low-abundance precursor ions are selected for MS/MS. In addition, this approach facilitates a diverse precursor selection because the SPL can be filtered according to specific needs, e.g. by avoiding abundant peaks or by focusing on a certain m/z range. These lists may also serve as a basis to identify compounds in overloaded chromatographic regions, which is particularly useful when searching for bioactive compounds in the very first steps of bioactivity-guided fractionation.¹¹
In conclusion, careful selection of MS² spectra input is instrumental for increasing the significance of the results from unsupervised methods of computational analysis, thereby improving the utility of such tools for natural products screening in the future. Directing MS/MS spectra acquisition to pre-filtered compounds of interest aids both the classification and identification of unknowns and their prioritization for further characterization as they move along the natural product discovery pipeline. In light of the remarkable progress made over the past few years in genome mining, instrumental analytics, and computational tools, the outcome of modern natural products discovery workflows relies more than ever on the tight interplay of all involved methods and on the analytical data used as input for these methods.³²,³³
ASSOCIATED CONTENT
Supporting Information. Media recipes, results for all data sets, selected MS/MS spectra, NMR data, and detailed information on compound isolation are provided in the Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
Prof. Dr. Rolf Müller, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research and Pharmaceutical Biotechnology, Universität des Saarlandes, Campus C2.3, 66123 Saarbrücken, Germany. Fax: +49 (0)681 302-70202. E-mail: [email protected]
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Funding Sources
Research in R.M.’s laboratory was funded by the Bundesministerium für Bildung und Forschung and the Deutsche Forschungsgemeinschaft.
ACKNOWLEDGMENT
We thank Kevin Sours for editing the manuscript and Viktoria Schmitt and Jennifer Herrmann for performing the bioactivity assays.
ABBREVIATIONS
SPL, scheduled precursor list; SPL-MS², precursor-directed LC-MS/MS method using an SPL.
REFERENCES
(1) Müller, R.; Wink, J. Int. J. Med. Microbiol. 2014, 304, 3–13.
(2) Krug, D.; Müller, R. Nat. Prod. Rep. 2014, 31, 768–783.
(3) Hooft, J. J. J.; Vos, R. C. H.; Ridder, L.; Vervoort, J.; Bino, R. J. Metabolomics 2013, 1009–1018.
(4) El-Elimat, T.; Figueroa, M.; Ehrmann, B. M.; Cech, N. B.; Pearce, C. J.; Oberlies, N. H. J. Nat. Prod. 2013, 76, 1709–1716.
(5) Klitgaard, A.; Iversen, A.; Andersen, M. R.; Larsen, T. O.; Frisvad, J. C.; Nielsen, K. F. Anal. Bioanal. Chem. 2014, 406, 1933–1943.
(6) Krug, D.; Zurek, G.; Revermann, O.; Vos, M.; Velicer, G. J.; Müller, R. Appl. Environ. Microbiol. 2008, 74, 3058–3068.
(7) Krug, D.; Zurek, G.; Schneider, B.; Garcia, R.; Müller, R. Anal. Chim. Acta 2008, 624, 97–106.
(8) Hou, Y.; Braun, D. R.; Michel, C. R.; Klassen, J. L.; Adnani, N.; Wyche, T. P.; Bugni, T. S. Anal. Chem. 2012, 84, 4277–4283.
(9) Farag, M. A.; Weigend, M.; Luebert, F.; Brokamp, G.; Wessjohann, L. A. Phytochemistry 2013.
(10) Vinayavekhin, N.; Saghatelian, A. ACS Chem. Biol. 2009, 4, 617–623.
(11) Cortina, N. S.; Krug, D.; Plaza, A.; Revermann, O.; Müller, R. Angew. Chem. Int. Ed. Engl. 2012, 51, 811–816.
(12) Asamizu, S.; Abugreen, M.; Mahmud, T. Chembiochem 2013, 14, 1548–1551.
(13) Yang, J. Y.; Sanchez, L. M.; Rath, C. M.; Liu, X.; Boudreau, P. D.; Bruns, N.; Glukhov, E.; Wodtke, A.; de Felicio, R.; Fenner, A.; Wong, W. R.; Linington, R. G.; Zhang, L.; Debonsi, H. M.; Gerwick, W. H.; Dorrestein, P. C. J. Nat. Prod. 2013, 76, 1686–1699.
(14) Rasche, F.; Svatoš, A.; Maddula, R. K.; Böttcher, C.; Böcker, S. Anal. Chem. 2011, 83, 1243–1251.
(15) Rasche, F.; Scheubert, K.; Hufsky, F.; Zichner, T.; Kai, M.; Svatoš, A.; Böcker, S. Anal. Chem. 2012, 84, 3417–3426.
(16) Nguyen, D. D.; Wu, C.-H.; Moree, W. J.; Lamsa, A.; Medema, M. H.; Zhao, X.; Gavilan, R. G.; Aparicio, M.; Atencio, L.; Jackson, C.; Ballesteros, J.; Sanchez, J.; Watrous, J. D.; Phelan, V. V.; van de Wiel, C.; Kersten, R. D.; Mehnaz, S.; De Mot, R.; Shank, E. A.; Charusanti, P.; Nagarajan, H.; Duggan, B. M.; Moore, B. S.; Bandeira, N.; Palsson, B. Ø.; Pogliano, K.; Gutiérrez, M.; Dorrestein, P. C. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, E2611–20.
(17) Watrous, J.; Roach, P.; Alexandrov, T.; Heath, B. S.; Yang, J. Y.; Kersten, R. D.; van der Voort, M.; Pogliano, K.; Gross, H.; Raaijmakers, J. M.; Moore, B. S.; Laskin, J.; Bandeira, N.; Dorrestein, P. C. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, E1743–52.
(18) Sheldon, M. T.; Mistrik, R.; Croley, T. R. J. Am. Soc. Mass Spectrom. 2009, 20, 370–376.
(19) Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Bioinformatics 2012, 28, 2333–2341.
(20) Kersten, R. D.; Yang, Y.-L.; Xu, Y.; Cimermancic, P.; Nam, S.-J.; Fenical, W.; Fischbach, M. A.; Moore, B. S.; Dorrestein, P. C. Nat. Chem. Biol. 2011, 7, 794–802.
(21) Kersten, R. D.; Ziemert, N.; Gonzalez, D. J.; Duggan, B. M.; Nizet, V.; Dorrestein, P. C.; Moore, B. S. Proc. Natl. Acad. Sci. U. S. A. 2013.
(22) Ng, J.; Bandeira, N.; Liu, W.-T.; Ghassemian, M.; Simmons, T. L.; Gerwick, W. H.; Linington, R.; Dorrestein, P. C.; Pevzner, P. A. Nat. Methods 2009, 6, 596–599.
(23) Kavan, D.; Kuzma, M.; Lemr, K.; Schug, K. A.; Havlicek, V. J. Am. Soc. Mass Spectrom. 2013, 24, 1177–1184.
(24) Lim, H.-K.; Chen, J.; Cook, K.; Sensenhauser, C.; Silva, J.; Evans, D. C. Rapid Commun. Mass Spectrom. 2008, 22, 1295–1311.
(25) Yao, M.; Ma, L.; Humphreys, W. G.; Zhu, M. J. Mass Spectrom. 2008, 43, 1364–1375.
(26) Rochfort, S. J.; Trenerry, V. C.; Imsic, M.; Panozzo, J.; Jones, R. Phytochemistry 2008, 69, 1671–1679.
(27) Schmidt, A.; Claassen, M.; Aebersold, R. Curr. Opin. Chem. Biol. 2009, 13, 510–517.
(28) Schmidt, A.; Gehlenborg, N.; Bodenmiller, B.; Mueller, L. N.; Campbell, D.; Mueller, M.; Aebersold, R.; Domon, B. Mol. Cell. Proteomics 2008, 7, 2138–2150.
(29) Rudomin, E. L.; Carr, S. A.; Jaffe, J. D. J. Proteome Res. 2009, 8, 3154–3160.
(30) Jaffe, J. D.; Mani, D. R.; Leptos, K. C.; Church, G. M.; Gillette, M. A.; Carr, S. A. Mol. Cell. Proteomics 2006, 5, 1927–1941.
(31) Micallef, L.; Rodgers, P. PLoS One 2014, 9, e101717.
(32) Hufsky, F.; Scheubert, K.; Böcker, S. Nat. Prod. Rep. 2014, 31, 807–817.
(33) Bouslimani, A.; Sanchez, L. M.; Garg, N.; Dorrestein, P. C. Nat. Prod. Rep. 2014, 31, 718–729.
For Table of Contents Only
(A) Buckets of predefined RT and m/z range are created around each feature. All samples are checked for features matching the bucket constraints upon which the respective intensity values are saved within the bucket table. (B) Number of strain-specific buckets in relation to the number of biological and technical replicates used for analysis. The bucket count approaches a basal level as a consequence of reducing false positives and noise when using larger sample sets.
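The bucketing step described in (A) can be illustrated with a minimal sketch. It assumes symmetric RT and m/z tolerances around the first feature that opens a bucket, and it treats a bucket as strain-specific when it is populated in every strain sample and in no blank. The tolerances, function names, selection rule, and example values are hypothetical and may differ from the software and settings used in this study.

```python
def build_bucket_table(samples, rt_tol=0.1, mz_tol=0.01):
    """Group features from all samples into RT/m/z buckets.

    samples maps a sample name to a list of (rt, mz, intensity) tuples.
    Returns (buckets, table): bucket centers and, per bucket, a dict that
    maps sample names to the highest matching intensity.
    """
    buckets = []   # (rt, mz) center of each bucket
    table = []     # one {sample: intensity} row per bucket
    for name, features in samples.items():
        for rt, mz, intensity in features:
            for idx, (b_rt, b_mz) in enumerate(buckets):
                if abs(rt - b_rt) <= rt_tol and abs(mz - b_mz) <= mz_tol:
                    break  # feature falls into an existing bucket
            else:
                # No bucket matched: open a new one around this feature
                buckets.append((rt, mz))
                table.append({})
                idx = len(buckets) - 1
            row = table[idx]
            row[name] = max(row.get(name, 0.0), intensity)
    return buckets, table


def strain_specific(table, strain_names, blank_names):
    """Indices of buckets found in every strain sample and in no blank."""
    return [
        i for i, row in enumerate(table)
        if all(n in row for n in strain_names)
        and not any(n in row for n in blank_names)
    ]


# Hypothetical feature lists for two strain replicates and one blank
samples = {
    "strain_1": [(4.72, 408.213, 9000.0), (2.00, 300.100, 5.0e4)],
    "strain_2": [(4.74, 408.214, 8000.0), (2.02, 300.100, 4.0e4)],
    "blank_1": [(2.01, 300.100, 6.0e4)],
}
buckets, table = build_bucket_table(samples)
print(strain_specific(table, ["strain_1", "strain_2"], ["blank_1"]))
```

Requiring a bucket to appear in every strain replicate is one way in which adding replicates could lower the count of strain-specific buckets, in line with the trend described in (B).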