Article
DecoMetDIA: Deconvolution of Multiplexed MS/MS Spectra for Metabolite Identification in SWATH-MS based Untargeted Metabolomics
Yandong Yin, Ruohong Wang, Yuping Cai, Zhuozhong Wang, and Zheng-Jiang Zhu
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b02655 • Publication Date (Web): 22 Aug 2019
Analytical Chemistry
Yandong Yin1,†, Ruohong Wang1,2,†, Yuping Cai1, Zhuozhong Wang1, and Zheng-Jiang Zhu1,*
1 Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032 China
2 University of Chinese Academy of Sciences, Beijing, 100049 China
Corresponding Author: Dr. Zheng-Jiang Zhu
*e-mail: [email protected], Phone: 86-21-68582296
Author Contributions: † These authors contributed equally
ABSTRACT SWATH-MS based data-independent acquisition mass spectrometry (DIA-MS) has recently been developed for untargeted metabolomics because it can acquire all MS2 spectra and offers high quantitative accuracy. However, software tools that deconvolve multiplexed MS/MS spectra from SWATH-MS with high efficiency and high quality are still lacking in untargeted metabolomics. Here, we developed a new software tool, DecoMetDIA, to deconvolve multiplexed MS/MS spectra for metabolite identification and to support SWATH-based untargeted metabolomics. DecoMetDIA selects multiple model peaks to model the co-eluted and unresolved chromatographic peaks of fragment ions in multiplexed spectra, and decomposes them into a linear combination of the model peaks. DecoMetDIA reconstructed the MS2 spectra of metabolites from a variety of biological samples with high coverage. We also demonstrated that the deconvolved MS2 spectra from DecoMetDIA were highly accurate by comparing them with experimental MS2 spectra from data-dependent acquisition (DDA). Finally, about 90% of the deconvolved MS2 spectra in various biological samples were successfully annotated using software tools such as MetDNA and Sirius. The results demonstrated that the deconvolved MS2 spectra obtained from DecoMetDIA were accurate and valid for metabolite identification and structural elucidation. A comparison of DecoMetDIA with other deconvolution software such as MS-DIAL demonstrated that it performs very well for small polar metabolites. The DecoMetDIA software is freely available on the Internet (https://github.com/ZhuMSLab/DecoMetDIA).
KEYWORDS SWATH-MS; spectral deconvolution; untargeted metabolomics; multiplexed MS/MS; metabolite identification.
INTRODUCTION Untargeted metabolomics comprehensively measures metabolites on a large scale to characterize the physiological and pathological status of a given cellular or biological system, and offers mechanistic insights into biological phenotypes1-4. Liquid chromatography-mass spectrometry (LC-MS) is one of the most commonly used techniques for untargeted metabolomics5-8. Data-dependent acquisition (DDA) and data-independent acquisition (DIA) are two common data acquisition methods in LC-MS based untargeted metabolomics9-11. In DDA, all precursor ions are first measured to generate MS1 spectra. Then, MS1 ions are sorted by their intensities and sequentially isolated for fragmentation to acquire their corresponding MS/MS (MS2) spectra. In DDA, the precursor MS1 ions and their MS2 spectra are directly linked. Metabolite identification is achieved by matching an experimental MS2 spectrum with those from standard spectral libraries such as METLIN12, MassBank13, and MoNA (http://mona.fiehnlab.ucdavis.edu/). However, DDA suffers from low acquisition coverage of MS2 spectra and from undefined MS2 spectral quality, since precursor ions are not always isolated and fragmented at the peak apex. Our previous report demonstrated that less than 60% of MS1 ions were selected to acquire MS2 spectra in one DDA analysis14. These limitations compromise its application in untargeted metabolomics and in the subsequent biological research. In contrast, DIA techniques have recently been developed for untargeted metabolomics because they can acquire all MS2 spectra and offer high quantitative accuracy15-22. For example, the SWATH-based DIA technique sequentially isolates all precursor ions within a predefined wide mass range (e.g., 25 Da) to generate multiplexed MS2 spectra for all precursor ions in one analysis23. Therefore, SWATH-MS allows either MS1 or MS2 ions to be selected for metabolite quantification, which increases the quantitative accuracy15,16,23,24. However, the direct link between MS1 and MS2 ions is missing in multiplexed MS2 spectra. In addition, shared fragments from co-isolated precursor ions increase the complexity of multiplexed MS2 spectra. Both issues make it very challenging to process and reconstruct MS2 spectra for metabolite identification in SWATH-MS15,16. In general, targeted metabolite extraction and MS/MS spectral deconvolution are the two major strategies for metabolite identification in SWATH-MS based untargeted metabolomics15,16.
Previously, our group developed a software tool, MetDIA19, for targeted extraction of metabolites from multiplexed MS/MS spectra generated by SWATH-MS. The MetDIA approach considered each metabolite as an analysis target. In MetDIA, both MS1 and MS2 ion chromatograms for each metabolite in the in-house spectral library were extracted to generate the peak groups and pseudo MS2 spectra for metabolite identification. Then, metabolite-centric identification was performed by calculating two orthogonal scores: peak-peak correlation (PPC) and spectrum-spectrum similarity (SSM). This metabolite-centric strategy provided accurate and sensitive identification of the metabolites included in the in-house spectral library. However, it suffers from the limited size and coverage of the in-house spectral library. MetDIA also provided no deconvolved MS2 spectra with which to perform metabolite identification using external spectral libraries. In comparison, MetaboDIA20, developed by Chen et al., performed a large-scale targeted extraction using an external spectral library generated from either a DDA or a DIA dataset. The MS2 ions extracted from DIA datasets can be used for metabolite quantification. In addition, metabolite identification was performed by spectral similarity matching against external spectral libraries. In contrast, MS-DIAL18, developed by Tsugawa et al., was the first MS/MS spectral deconvolution-based software tool for SWATH-MS based untargeted metabolomics. It employed an MS/MS spectral deconvolution method (MS2Dec), modified from a commonly used deconvolution algorithm in GC-MS25, and reconstructed the MS2 spectra by linearly decomposing the original chromatograms with the objective peak and the two adjacent peaks on either side of it (3 model peaks in total). This algorithm calculated the coefficients with linear algebra and had a high computational efficiency for data processing.
However, this method could not fully model the complexity of the chromatograms of fragments shared by co-isolated precursor ions. The sensitivity of the peak spotting algorithm was limited and was affected by the smoothing method and settings. MS-DIAL is available only for Windows, which limits its application in processing large-scale metabolomics datasets with hundreds to thousands of samples. Software tools for DIA-based untargeted metabolomics with high efficiency and high quality are still lacking.
In this work, we developed a new deconvolution algorithm, DecoMetDIA (Figure 1), to deconvolve multiplexed MS/MS spectra for metabolite identification and to support SWATH-MS based untargeted metabolomics. In DecoMetDIA, multiple model peaks were first selected to model the complex co-elution conditions in the chromatograms of fragment ions. Then the chromatographically unresolved EIC peaks were decomposed into a linear combination of the model peaks, and the MS2 spectrum of the targeted analyte was reconstructed. Finally, metabolites were identified by matching the deconvolved MS2 spectra with external spectral libraries and by using other software tools such as MetDNA26 and Sirius27. Here, we also demonstrated that DecoMetDIA reconstructed the precursor-fragment ion relationships with high accuracy by comparing the deconvolved MS2 spectra from SWATH-MS with experimental MS2 spectra from DDA-MS. We also validated that DecoMetDIA provided high-confidence metabolite identification compared to DDA-MS and to other DIA processing software such as MS-DIAL.
Figure 1. The schematic illustration of DecoMetDIA workflow. (a) MS1 peak detection and alignment; (b) extraction of multiplexed MS2 spectrum & ion chromatograms; (c) MS2 peak detection and determination of components; (d) model peak selection & deconvolution; (e) MS2 peak reconstruction & MS2 spectrum generation; (f) metabolite identification through spectral match.
EXPERIMENTAL SECTION All LC-MS data sets were acquired using a UHPLC system (Agilent 1290, Agilent Technologies) coupled to a quadrupole time-of-flight mass spectrometer (TripleTOF 6600, Sciex). For SWATH-MS data acquisition, the cycle time was set to 990 ms, including one TOF-MS scan (100 ms/scan; mass scan range 100-700 Da) and 24 SWATH-MS2 scans (35 ms/scan; 25 Da/SWATH window; mass scan range 25-700 Da). The 24 SWATH windows covered the whole mass range of 100-700 Da. We also acquired data using 24 variable SWATH windows28,29 (see Supporting Information for details). The SWATH-MS2 spectra were acquired in high sensitivity mode. The collision energy was 30 eV. For DDA-MS data acquisition, the general parameters, such as cycle time, mass range, and collision energy, were set the same as for SWATH acquisition. Several DDA-specific parameters were set as follows: top 6 most intense ions (number of MS2 scans; 140 ms/scan); minimum precursor ion intensity: 100 cps; exclude former target ions: 4 seconds after 2 occurrences; isolation window: unit resolution; and dynamic background subtraction applied. Other experimental details about chemicals, sample preparation, and data acquisition parameters are provided in the Supporting Information.
DecoMetDIA workflow. DecoMetDIA is an R package and is freely available on GitHub (https://github.com/ZhuMSLab/DecoMetDIA). All acquired raw MS data files (.wiff) were first converted to mzXML format with the ‘msconvert’ program in ProteoWizard (v3.0.6428), and then imported into DecoMetDIA for data processing. The processing times of DecoMetDIA are summarized in the Supporting Information. The entire DecoMetDIA workflow contains the following 6 steps:
(1) MS1 peak detection & grouping. The MS1 peaks were detected using the ‘CentWave’30 algorithm in XCMS31 (version 1.46). For multiple data files, peak alignment was performed using the ordered bijective interpolated warping (OBI-Warp)32 algorithm.
Multiple peaks were grouped using the density method in XCMS (Figure 1a). MS1 peaks were further annotated using the R package CAMERA33. The parameters of peak detection and grouping are described in the Supporting Information.
(2) Extraction of multiplexed MS2 spectrum & ion chromatograms. For each detected MS1 peak, the corresponding multiplexed MS2 spectrum was extracted from the corresponding SWATH window at the apex of the MS1 peak. In each spectrum, MS2 ions with intensity below 50 counts were considered noise and removed, and MS2 ions with m/z larger than that of the precursor ion were removed as well. The intensity limit of 50 counts is an empirical setting (roughly S/N > 3 on the TripleTOF instrument), which users can modify through the parameter “int.filter”. The ion chromatograms of all MS2 ions within the extended retention time (RT) range between (RT of MS1 apex − 2.5 × peak width) and (RT of MS1 apex + 2.5 × peak width) were extracted and subjected to deconvolution. The extraction m/z tolerance was set to 15 ppm (or 0.006 Da for MS1 ions < 400 Da). The chromatogram of the precursor ion within its peak range was also extracted (Figure 1b). If one spectrum had n MS2 ions, n ion chromatograms were extracted. If an MS2 ion was missing in a scan, its intensity was recorded as 0.
(3) MS2 peak detection & determination of components. For each extracted ion chromatogram (EIC) of an MS2 ion, the noise level and baseline were determined. EICs with a maximum signal-to-noise (S/N) ratio lower than 3 were removed. The locally weighted scatterplot smoothing (LOESS) method was applied to smooth each EIC. For each smoothed EIC, local maxima and minima were detected and assigned as the apexes and boundaries of the MS2 peaks in the EIC. Noisy EICs with high fluctuation (normalized standard noise threshold of 0.35, see Supporting Information) were also removed. If only one MS2 peak was detected in a given EIC, the EIC was considered a ‘simple peak’; otherwise, it was considered a ‘complex peak’. Finally, the MS1 EIC peak and the corresponding MS2 EIC peaks were combined into one peak group. For each peak group, the number of components was determined using a modified two-step hierarchical clustering analysis (HCA) method similar to the one previously reported in ADAP-GC 3.034, which was originally used for spectral deconvolution in GC-MS. In our work, we modified the method to make it suitable for high-resolution SWATH-based LC-MS/MS data. Retention time and peak-shape similarity were used in the two steps of the HCA. Specifically, the MS2 EIC peaks in each peak group were first clustered according to their RT with the ‘centroid’ method. The cutting threshold was set to 1/4 of the maximum height of the HCA (or 15 if this threshold was < 15).
The clusters with more than one peak were kept as RT clusters. Next, instead of the spectral similarity clustering used in ADAP-GC34,35, the peak-peak correlation (PPC) was calculated as the distance between peaks in each RT cluster using a modified Pearson correlation coefficient (see Supporting Information). Then, HCA with a cutting threshold of 0.25 was used to classify the detected MS2 EIC peaks and determine the number of components (Figure 1c). As in the RT clustering, only components with at least two EIC peaks were kept.
(4) Model peak selection & deconvolution. To select the model peaks from all components, the sharpness of each MS2 peak was calculated using a method reported in ADAP-GC with some modifications. In each component, if multiple simple peaks were available, the sharpest of the simple peaks was selected. If all the peaks in the component were complex peaks, the sharpest of the complex peaks was selected. Specifically, for the component containing the MS1 EIC peak, we selected the MS2 EIC peak with the highest PPC score with the MS1 peak as the model peak and defined it as the ‘target model peak’. In this case, a simple MS2 peak was still preferred over a complex peak. If no component contained the MS1 peak, the MS1 peak was added as an additional component and was itself selected as the model peak and ‘target model peak’. All the model peaks were normalized by their highest intensity. MS2 EICs containing complex MS2 peaks were subjected to deconvolution. Each MS2 EIC peak was decomposed into a linear combination of all selected model peaks with a constrained optimization method (L-BFGS-B)36. The best parameters found in the optimization defined the weights of the model peaks, representing the contributions of the selected model peaks (Figure 1d). The magnitudes of the weights set the intensity scale and were used to reconstruct the corresponding MS2 peaks.
(5) MS2 peak reconstruction & MS2 spectrum generation. MS2 peaks were reconstructed with the deconvoluted weights of the target model peak in all EICs. The spectrum at the apex of the MS1 peak was extracted as the deconvolved MS2 spectrum of the targeted precursor ion (Figure 1e). With multiple samples, all deconvolved MS2 spectra of the same MS1 peak from all samples were used to generate a consensus spectrum, as in our previous publication14. If the samples were grouped, we generated a consensus MS2 spectrum for each sample group. To obtain a qualified MS2 spectrum, a minimum frequency was set so that each fragment ion was present in at least 2 samples.
(6) Metabolite identification. Metabolite identification was achieved by matching the accurate precursor mass (MS1) and the MS2 spectrum (MS/MS) with those from the standard spectral library (Figure 1f). The match tolerances for MS1 and MS2 m/z values were set to 25 ppm and 35 ppm, respectively. In addition, we submitted the deconvoluted MS2 spectra to other software tools such as MetDNA26 and Sirius27 for metabolite identification. For MetDNA, we uploaded the MS1 peak table and MS2 spectra (in .msp format) to MetDNA (version 1.1, http://metdna.zhulab.cn/).
The metabolite identification parameters were set as: “HILIC” for liquid chromatography; “Sciex TripleTOF” for instrument; and “30” for collision energy. We performed the MetDNA annotation in positive and negative modes separately. For Sirius, both molecular formula identification and structure elucidation were performed using the latest Sirius software (Version 4.0.1, https://bio.informatik.uni-jena.de/sirius/). First, all deconvoluted MS2 spectra were converted to an MGF file after removal of the isotopic peaks annotated by CAMERA. For molecular formula identification, the ion adducts [M+H]+, [M+Na]+ and [M+K]+ were checked for positive mode, while [M-H]-, [M+Cl]- and [M+Br]- were checked for negative mode. Formulas were computed with a tolerance of 15 ppm, and the top 10 formulas were retained for the subsequent CSI:FingerID-based structural elucidation. PubChem was chosen as the search database. All structure candidates were output.
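The filtering and extraction settings of step (2) amount to a few lines of arithmetic. The sketch below is written in Python for illustration only (DecoMetDIA itself is an R package), and none of it is taken from the package's code. The function names are hypothetical. It assumes that "peak width" means the end RT minus the start RT of the MS1 peak, and that the 0.006 Da tolerance replaces the ppm tolerance for precursors below 400 Da. The RT values and precursor m/z come from the M146T687 example in the Results; the fragment list is made up.

```python
def filter_ms2_ions(ions, precursor_mz, int_filter=50.0):
    """Drop noise ions (< int_filter counts) and ions heavier than the precursor.

    ions: list of (mz, intensity) pairs from one multiplexed MS2 spectrum.
    """
    return [(mz, inten) for mz, inten in ions
            if inten >= int_filter and mz <= precursor_mz]


def eic_rt_window(rt_apex, peak_width, factor=2.5):
    """Extended RT range: apex -/+ factor x peak width (all in seconds)."""
    half_width = factor * peak_width
    return rt_apex - half_width, rt_apex + half_width


def mz_tolerance(precursor_mz, ppm=15.0, fixed_da=0.006, cutoff_mz=400.0):
    """Absolute m/z tolerance in Da for extracting ion chromatograms."""
    if precursor_mz < cutoff_mz:
        return fixed_da  # assumption: fixed tolerance for small precursors
    return precursor_mz * ppm * 1e-6


# MS1 peak spanning 681.5-693.7 s with its apex at 687.6 s.
window = eic_rt_window(rt_apex=687.6, peak_width=693.7 - 681.5)

# Hypothetical fragment list: one ion is below 50 counts, one is above the precursor.
kept = filter_ms2_ions([(60.08, 1200.0), (87.04, 30.0), (150.0, 900.0)],
                       precursor_mz=146.1175)
```

Note that 15 ppm corresponds to 0.006 Da at m/z 400, so under these assumptions the two tolerance rules meet at the cutoff.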
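Step (4) describes each complex MS2 EIC as a constrained linear combination of normalized model peaks. The following Python sketch illustrates that idea under stated assumptions: the model peaks are synthetic Gaussians, the only constraint is that the weights are non-negative, and a plain projected-gradient loop stands in for the L-BFGS-B optimizer used by DecoMetDIA. All function names and numbers are hypothetical, and the code is not taken from the package.

```python
import math


def gaussian_peak(times, center, sigma):
    """Synthetic model peak with unit height (stand-in for a real EIC peak)."""
    return [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in times]


def fit_weights(eic, model_peaks, n_iter=2000):
    """Non-negative least-squares weights via projected gradient descent.

    Minimizes 0.5 * ||sum_k w[k] * model_peaks[k] - eic||^2 subject to w >= 0.
    This is a simple substitute for L-BFGS-B, not the same algorithm.
    """
    k = len(model_peaks)
    gram = [[sum(a * b for a, b in zip(p, q)) for q in model_peaks]
            for p in model_peaks]
    rhs = [sum(a * y for a, y in zip(p, eic)) for p in model_peaks]
    # The trace bounds the largest eigenvalue of the Gram matrix, so this
    # step size keeps the iteration stable.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    w = [0.0] * k
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * w[j] for j in range(k)) - rhs[i]
                for i in range(k)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[i] - step * grad[i]) for i in range(k)]
    return w


def reconstruct_target(weight, target_model_peak):
    """Contribution of the target analyte to one fragment EIC."""
    return [weight * v for v in target_model_peak]


times = list(range(660, 711))                   # RT grid in seconds
target = gaussian_peak(times, 687.0, 2.0)       # hypothetical target model peak
interferer = gaussian_peak(times, 694.0, 2.0)   # hypothetical co-eluting component
# Hypothetical complex EIC: 120000 counts from the target, 40000 from the other.
eic = [120000.0 * a + 40000.0 * b for a, b in zip(target, interferer)]

weights = fit_weights(eic, [target, interferer])
clean_eic = reconstruct_target(weights[0], target)
apex_intensity = clean_eic[times.index(687)]    # intensity kept in the MS2 spectrum
```

In this toy case the fitted weights recover the two contributions, and only the target's share of the fragment intensity is carried into the deconvolved spectrum at the apex.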
[Figure 2 image: panels (a)-(e) are chromatograms plotted against retention time (s), 660-710 s; panel (f) shows spectra plotted as relative intensity against m/z.]
Figure 2. Deconvolution of metabolite peak M146T687 in human tissue sample using DecoMetDIA. (a) The extracted ion chromatograms (EICs) from the multiplexed spectra. The red and blue lines represented the fragments present or absent, respectively, in its related MS2 spectrum acquired using DDA technique. The dashed black line showed the corresponding MS1 peak. (b) All detected MS2 peaks after normalization with its highest intensity. Different colors showed the RT clusters after the first step of HCA clustering with the RT of MS2 peaks. (c) The components were determined after the second step of HCA and marked with different colors. (d) The model peaks selected from the components. (e) The reconstructed MS2 EIC peaks for the corresponding MS1 peak. (f) The raw multiplexed MS2 spectrum (upper panel) and the deconvoluted MS2 spectrum (lower panel) were compared to the MS2 spectrum from DDA-MS.
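The peak-shape clustering behind panel (c) can be sketched in Python as follows. This is an illustration only: it uses the plain Pearson correlation rather than the modified coefficient described in the Supporting Information, takes 1 − r as the distance, assumes average linkage (the linkage for this step is not stated in the text), and runs on synthetic Gaussian peaks. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equally sampled EIC peaks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cluster_by_shape(peaks, cutoff=0.25):
    """Average-linkage agglomerative clustering on the distance 1 - r."""
    dist = [[1.0 - pearson(p, q) for q in peaks] for p in peaks]
    clusters = [[i] for i in range(len(peaks))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # remaining clusters are farther apart than the cut height
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters


def gaussian(times, center, sigma, height):
    return [height * math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in times]


times = list(range(21))
peaks = [
    gaussian(times, 10, 2.0, 1.0),   # component A
    gaussian(times, 10, 2.0, 0.5),   # component A, weaker fragment
    gaussian(times, 13, 2.0, 1.0),   # component B
    gaussian(times, 13, 2.0, 0.3),   # component B, weaker fragment
    gaussian(times, 6, 1.0, 0.8),    # isolated peak
]
# Keep only components with at least two EIC peaks, as described in step (3).
components = [sorted(c) for c in cluster_by_shape(peaks) if len(c) >= 2]
```

Because the correlation ignores peak height, fragments of very different intensity still fall into the same component as long as their elution profiles match.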
RESULTS AND DISCUSSION DecoMetDIA workflow. We first illustrated the DecoMetDIA workflow (Figure 1) using a set of human colorectal cancer (CRC) tissue samples acquired using SWATH-MS. In positive mode, a total of 4,294 MS1 peaks and 3,715 deconvoluted MS2 spectra (86.5%) were obtained, which indicated a high success rate for spectral deconvolution. To demonstrate the deconvolution process, the metabolite peak M146T687 is shown as an example (Figure 2). The MS1 peak has an m/z value of 146.1175 Da and an RT range of 681.5 - 693.7 s, with the peak apex at 687.6 s. In the extracted multiplexed MS2 spectrum, a total of 162 qualified MS2 ions were found and used for extraction of ion chromatograms within the RT range of 687.6 ± 30.6 s (Figure 2a). Next, 83 MS2 peaks were detected in the RT range, 45 of which were marked as complex peaks. Then, the two-step HCA was performed to determine the number of components. Two RT clusters were defined in the first step, the RT clustering (Figure 2b). Then, 11 components were generated after the peak-shape similarity clustering (Figure 2c). Model peaks for each component were selected and used to deconvolve the peak group (Figure 2d). The MS2 EIC peaks corresponding to the target MS1 EIC peak were reconstructed (Figure 2e). Finally, the MS2 spectrum of M146T687 was generated at the apex of the MS1 peak (Figure 2f). Similarly, the deconvolution process was repeated for the same MS1 peak in the other samples. Then, all MS2 spectra from the biological samples were combined to generate a consensus MS2 spectrum for metabolite identification (Figure 2f). Finally, the M146T687 peak was identified as deoxycarnitine through spectral matching with MoNA (MS1 error: