
Letter

Generation of molecular network from electron ionization mass spectrometry data by combining MZmine2 and MetGem software. Nicolas Elie, Cyrille Santerre, and David Touboul. Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b02802 • Publication Date (Web): 20 Aug 2019



Analytical Chemistry

Generation of molecular network from electron ionization mass spectrometry data by combining MZmine2 and MetGem software.

Nicolas Elie,1 Cyrille Santerre,1,2 David Touboul1,*

1 Institut de Chimie des Substances Naturelles, CNRS UPR2301, Université Paris-Sud, Université Paris-Saclay, Avenue de la Terrasse, 91190 Gif-sur-Yvette, France.
2 Institut Supérieur International Parfum Cosmétique Arômes, Plateforme scientifique, ISIPCA, 34-36 rue du parc de Clagny, 78000 Versailles, France.

ABSTRACT: Molecular networking (MN) organizes tandem mass spectrometry (MS/MS) data by spectral similarity. The cosine score, used as the metric for the distance between two spectra, is calculated from peak lists containing fragments and neutral losses from MS/MS spectra. Until now, the workflow could not generate molecular networks from electron ionization (EI) MS data, because no putative parent ion is selected in classical gas chromatography (GC)-EI-MS analysis. To fill this gap, new functionalities have been implemented in MetGem 1.2.2 software (https://github.com/metgem/metgem/releases), and results from a large EI-MS database and from GC-EI-MS analyses are presented.

The molecular network (MN) workflow was first described by the group of P. Dorrestein (1), based on an original approach introduced by M. L. Gross in 2002 (2). MN offers the unique possibility of classifying large collections of tandem mass spectrometry (MS/MS) data by spectral similarity and has significantly accelerated their annotation by searching experimental and in silico MS/MS databases. Nevertheless, one of the major limitations of database searching is the non-uniformity of registered MS/MS data. Depending on the type of tandem mass spectrometer (quadrupole versus ion trap, for example), collision energy, activation time and gas pressure in the collision cell, MS/MS data can vary dramatically (3). In contrast, electron ionization (EI) at 70 eV offers highly reproducible MS data with very low inter-instrument variability, and large open or commercial databases are available. Until now, EI-MS data have never been employed to generate MNs and to improve sample annotation. We propose here to fill this gap by describing a complete workflow based on two freely available software packages, i.e. MZmine2 (4) and MetGem (5), which can easily be run on a laptop or personal computer.

Experimental section

GC-EI-MS analysis. Four commercial perfumes were purchased and stored at 12 °C: Black Opium from Yves Saint Laurent (France), Poison Girl from Dior (France), 212 Sexy from Carolina Herrera (New York, USA) and 24 Faubourg from Hermes (France). The four perfumes were analyzed using an Agilent 7890B GC equipped with a Supelcowax 10 capillary column (30 m, 0.25 mm i.d., 0.25 µm film thickness) and coupled to an Agilent Technologies 5977 mass spectrometer. Helium was used as carrier gas at a flow rate of 1.3 mL/min. The column temperature was initially held at 40 °C for 2 min, then increased to 240 °C at 3 °C/min, and finally held at 240 °C for 5 min. For GC-MS detection, an electron ionization source was used with an ionization energy fixed at 70 eV. Data were acquired in full scan mode over an m/z range from 30 to 450. The perfumes were diluted 1:10 (v:v) with ethanol, and 1.0 µL of each diluted sample was automatically injected in split mode (split ratio 100:1). Injector and detector temperatures were set at 250 °C and the MS source at 230 °C. Agilent G1701EA MSD Productivity ChemStation Software Version E.02.02 was used to manage the analysis. Experimental data were deposited in the Zenodo data repository (https://zenodo.org/) and are publicly available (Access Number: 3249821).

MZmine 2 Data-Preprocessing Parameters. Raw files were directly processed using MZmine 2.39 software. Only the MS1 level was converted, as no MS2 data are registered when acquiring GC-EI-MS data. Mass detection was performed with the noise level fixed at 10. Chromatogram building was achieved using the ADAP Chromatogram Builder module with a minimum group size of 5 scans, a group intensity of 100 and a minimum group intensity of 300. The m/z tolerance was set to 0.3 (or 1000 ppm), as centroid data were acquired. For chromatogram deconvolution, the Wavelet (ADAP) method (6) was selected with the following settings: S/N threshold = 7, minimum feature height = 200, coefficient/area threshold = 50, peak duration range = 0.02−0.6 min and RT wavelet range = 0.0−0.05. Spectral deconvolution was performed using the Multivariate Curve Resolution algorithm with a deconvolution window width of 0.2 min, a retention time tolerance of 0.15 min and a minimum number of peaks of 1. This step is the one that differs significantly from classical LC-MS/MS data analysis. Moreover, the isotopic peak grouper algorithm was not used for GC-EI-MS. The peak list was generated by aligning the data sets using the RANSAC aligner (7) with an RT tolerance of 0.5 before correction and 0.1 after correction, RANSAC iterations at 0, minimum number of points at 80% and threshold value at 0.2.
Data were finally exported as .mgf files for spectral information and .csv files for metadata (peak area, peak intensity, retention times, etc.).
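Two of the settings above lend themselves to a quick arithmetic check. The sketch below is illustrative only and is not part of the published workflow: it assumes that the tolerance of 0.3 is expressed in m/z units, and the function names are hypothetical. It shows at which m/z value 0.3 and 1000 ppm coincide, and how long the stated oven program lasts.

```python
def da_to_ppm(tol_da: float, mz: float) -> float:
    """Convert an absolute m/z tolerance into a relative tolerance (ppm) at a given m/z."""
    return tol_da / mz * 1e6


def oven_program_minutes(start_c: float, hold_start_min: float,
                         ramp_c_per_min: float, end_c: float,
                         hold_end_min: float) -> float:
    """Total duration of a single-ramp GC oven program, in minutes."""
    return hold_start_min + (end_c - start_c) / ramp_c_per_min + hold_end_min


# Relative size of a 0.3 tolerance across the acquired m/z range (30 to 450).
for mz in (30.0, 300.0, 450.0):
    print(f"0.3 at m/z {mz:g} corresponds to {da_to_ppm(0.3, mz):.0f} ppm")

# 40 °C for 2 min, 3 °C/min up to 240 °C, then 5 min at 240 °C.
print(f"oven program length: {oven_program_minutes(40, 2, 3, 240, 5):.1f} min")
```

Under this reading, 0.3 and 1000 ppm are equivalent only at m/z 300; below that value the ppm setting is the tighter of the two. The oven program lasts about 73.7 min.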

ACS Paragon Plus Environment


Software and Python Libraries. MetGem software was described in a previous publication (5). A new option was added to the input data file dialog. When this option is activated, input spectra are treated as MS1 and the parent m/z ratio is thus fully ignored.

Molecular Network Analysis. The molecular networks were created using MetGem 1.2.2 software (https://metgem.github.io/). EI-MS spectra were window-filtered by keeping only the top 6 peaks in the ±50 Da window throughout the spectrum. Cosine scores were calculated using an m/z tolerance of 0.3 Th. Networks were then created in which edges were filtered to have a cosine score above 0.7 (or 0.75 for the GC-EI-MS data from perfumes) and more than six matched peaks. Furthermore, an edge between two nodes was kept in the network if and only if each node appeared in the other's top 10 most similar nodes. The library spectra were filtered in the same manner as the input data.

Results and discussion

The workflow to generate MNs from GC-EI-MS data first consists of converting the original vendor-format file into an .mzXML file with MSConvert, freely distributed by ProteoWizard (8). The decisive step of the procedure is to optimize the MZmine2 data extraction to generate the .mgf file containing consensus MS data and the .csv file with the metadata, such as retention time or peak area integration. Data treatment can be split into four steps. First, mass detection of centroid data is performed, followed by the ADAP Chromatogram Builder to construct extracted ion chromatograms (EICs) and detect chromatographic peaks from the EICs. Then spectral deconvolution using Hierarchical Clustering is performed, before chromatogram alignment if required. Finally, .mgf and .csv files are generated.
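The network-building rules given under Molecular Network Analysis (window filtering, fragment-only cosine score, edge thresholds and mutual top-10 neighbours) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MetGem's implementation: the function names are hypothetical, square-root intensity weighting and greedy one-to-one peak matching are assumed, and neighbours are ranked among the edges that already pass the score and matched-peak thresholds.

```python
import math
from typing import Dict, List, Set, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def window_filter(peaks: List[Peak], top_n: int = 6, half_width: float = 50.0) -> List[Peak]:
    """Keep a peak only if fewer than top_n stronger peaks lie within +/- half_width of it."""
    kept = []
    for mz, inten in peaks:
        stronger = sum(1 for mz2, i2 in peaks if abs(mz2 - mz) <= half_width and i2 > inten)
        if stronger < top_n:
            kept.append((mz, inten))
    return kept


def cosine_score(a: List[Peak], b: List[Peak], mz_tol: float = 0.3) -> Tuple[float, int]:
    """Fragment-only cosine score and matched-peak count (no parent m/z, no neutral losses)."""
    def unit(peaks: List[Peak]) -> List[Peak]:
        # Assumption: square-root intensity weighting, then scaling to unit Euclidean norm.
        weighted = [(mz, math.sqrt(i)) for mz, i in peaks]
        norm = math.sqrt(sum(x * x for _, x in weighted))
        return [(mz, x / norm) for mz, x in weighted] if norm > 0 else []

    ua, ub = unit(a), unit(b)
    pairs = sorted(
        ((xa * xb, i, j)
         for i, (ma, xa) in enumerate(ua)
         for j, (mb, xb) in enumerate(ub)
         if abs(ma - mb) <= mz_tol),
        reverse=True,
    )
    used_a: Set[int] = set()
    used_b: Set[int] = set()
    score, matches = 0.0, 0
    for product, i, j in pairs:  # greedy one-to-one assignment, largest products first
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += product
            matches += 1
    return score, matches


def build_edges(spectra: List[List[Peak]], min_cosine: float = 0.7,
                min_matched: int = 6, top_k: int = 10) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) edges passing the thresholds and the mutual top_k rule."""
    filtered = [window_filter(s) for s in spectra]
    scores: Dict[Tuple[int, int], float] = {}
    for i in range(len(filtered)):
        for j in range(i + 1, len(filtered)):
            s, m = cosine_score(filtered[i], filtered[j])
            # Strict comparisons follow the wording "above" and "more than".
            if s > min_cosine and m > min_matched:
                scores[(i, j)] = s
    top: Dict[int, Set[int]] = {}
    for node in range(len(filtered)):
        ranked = sorted(((s, b if a == node else a)
                         for (a, b), s in scores.items() if node in (a, b)), reverse=True)
        top[node] = {other for _, other in ranked[:top_k]}
    return [(i, j, s) for (i, j), s in sorted(scores.items())
            if j in top[i] and i in top[j]]


# Toy spectra (made-up values): the second is the first shifted by 0.1, the third by 7.
base = [(41.0, 30.0), (55.0, 20.0), (69.0, 50.0), (91.0, 100.0),
        (105.0, 60.0), (119.0, 80.0), (161.0, 40.0), (204.0, 25.0)]
shifted = [(mz + 0.1, i) for mz, i in base]
unrelated = [(mz + 7.0, i) for mz, i in base]
print(build_edges([base, shifted, unrelated]))
```

In this toy example only the first two spectra are linked, because a 0.1 shift stays within the 0.3 tolerance while a shift of 7 leaves no matched peaks.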

Figure 1. Global molecular network generated by MetGem with 15,113 EI-MS spectra from MoNA (MassBank of North America); single nodes were removed. In color: the three clusters depicted in Figure 2.

Before generating MNs with MetGem 1.2.2 software, the .mgf file needs to be cleaned by setting all “pepmass” values to 0. The MNs are then calculated by comparing spectra on the basis of fragment ions only, without neutral losses, as the molecular ion is usually not detected in EI-MS spectra. This procedure has been directly integrated into MetGem 1.2.2 software and is applied by selecting the corresponding option when loading the .mgf file.

In order to test the feasibility of generating MNs from EI-MS data, a set of 15,113 EI-MS spectra, freely available under a CC BY 4.0 license, was downloaded from MoNA (MassBank of North America) (9) and analyzed with MetGem 1.2.2. The resulting MNs are shown in Figure 1, with clusters related to different chemical spaces. Figure 2 shows the annotation of three different clusters in detail. Cluster MN1 (Figure 2A) aggregates EI-MS spectra of carbohydrate compounds. By setting the cosine score threshold to 0.7, it was possible to generate sub-groups within MN1 that finely discriminate between β(1→4) glucose and α(1→6) glucose disaccharides, even though their EI-MS data show quite similar fragmentation patterns, except for a signal at m/z 160 for β(1→4) glucose and at m/z 191 for α(1→6) glucose (Figure 3). MN2 (Figure 2B) is related to cyclic monoterpenoids, with sub-clusters of cyclic monoterpenes, oxidized monoterpenes and related esters, terpineol esters and dihydrocarveol esters. Lastly, cluster MN3 (Figure 2C) gathers phthalate-related compounds. Interestingly, MN3 clearly shows several sub-clusters associated with symmetric phthalates, non-symmetric phthalates, benzoic acid and benzaldehyde derivatives. These three examples demonstrate the discriminating and aggregating power of MNs generated by MetGem 1.2.2 software from EI-MS data.

Finally, experimental GC-EI-MS data from four commercial perfumes were processed with MZmine2 and MetGem 1.2.2 software (Figure 4). The complete MN is provided in the Supporting Information and was annotated against the MoNA database using a cosine score threshold of 0.75.
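For readers who prepare the .mgf file by hand rather than through the MetGem 1.2.2 import option, the “pepmass” clean-up described at the start of this section can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and the sample record (field names and values) is made up and may not match the exact layout exported by MZmine2.

```python
import re


def zero_pepmass(mgf_text: str) -> str:
    """Return MGF text in which every PEPMASS value is replaced by 0."""
    # Matches the rest of each PEPMASS line, including an optional intensity field.
    return re.sub(r"(?im)^(PEPMASS=)[^\r\n]*", r"\g<1>0", mgf_text)


# Hypothetical single-spectrum record, for illustration only.
sample = """BEGIN IONS
FEATURE_ID=1
PEPMASS=91.05 1200.0
RTINSECONDS=1428.0
41.0 30
91.0 100
END IONS
"""

print(zero_pepmass(sample))
```

Only the PEPMASS line changes; the peak list and the other fields are left untouched, so the spectra are compared on fragment ions alone.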
A focus on the second largest cluster, related to sesquiterpenes (C15H24), reveals typical spectral features of each perfume. A first node was annotated as α-cedrene (retention time (RT) = 23.8 min), one of the main components of cedar essential oil, on the basis of a high spectral similarity (cosine score 0.94) with the archived spectrum from the MoNA database. This compound was detected only in Poison Girl from Dior, whose scent has typical notes of cedar, orange blossom and vanilla (10), in good accordance with the detection of α-cedrene. Two other nodes (1 (RT = 26.4 min) and 2 (RT = 30.7 min)), present only in Poison Girl, show similarity scores of 0.87 and 0.82 with the α-cedrene standard, respectively, and can thus be annotated as α-cedrene analogues. Finally, three nodes (3 (RT = 34.8 min), 4 (RT = 58.2 min) and 5 (RT = 58.3 min)) were present only in 212 Sexy from Carolina Herrera. Their EI-MS spectra did not match any of the standards from the MoNA database at a cosine score threshold of 0.75. When the threshold was decreased to 0.65, α-cedrene was one of the best annotations, with scores of 0.69, 0.72 and 0.69 for 3, 4 and 5, respectively, indicating other analogues of α-cedrene. A last node was annotated as β-gurjunene on the basis of a cosine score of 0.85 with the archived spectrum from the MoNA database. This compound displays a medium woody scent that is typical of 24 Faubourg from Hermes (11).

Conclusion

By combining MZmine2, a data formatting procedure that removes the parent ion mass information, and MetGem 1.2.2 software, we demonstrated for the first time that MNs can be efficiently generated from GC-EI-MS data. This new workflow is compatible with low- or high-resolution mass spectra and allows EI-MS data to be compared and annotated in a fast and reliable fashion. As in-house or licensed databases can be read by MetGem 1.2.2 software and interrogated on a local computer, the annotation process can be significantly enriched.

Spectral similarities in GC-EI-MS data will also be of high interest for the annotation of metabolomic data, where unknown features are common and require complex workflows for their isolation and identification by complementary analytical tools. Moreover, the combination of MNs generated from complementary LC-MS/MS and GC-EI-MS data of the same samples could significantly increase metabolome coverage.

Figure 2 panel legends. A: polyol C6, polyol C4, sugar acid, oxidized sugar, 4-O-β-disaccharide, D-glucose derivatives, D-glucitol derivatives, fructose derivatives, disaccharide β(1→4)-Glc, disaccharide α(1→6)-Glc. B: oxidized monoterpenes and related esters, cyclic monoterpenes, dihydrocarveol and esters, terpineol and esters. C: symmetrical long-chain phthalate, benzaldehyde derivatives, non-symmetrical phthalate, short-chain phthalate.

Figure 2. Molecular networks related to sugar derivatives (MN1, A), monoterpenoids (MN2, B) and phthalate derivatives (MN3, C).

Figure 3 axes: relative intensity (0 to 100) versus m/z (0 to 500); spectra shown: Melibiose and D-Panose.

Figure 3. EI-MS spectral comparison of Melibiose and D-Panose.


Figure 4 labels. Annotated nodes: α-cedrene (0.94), β-gurjunene (0.85); other labeled nodes: 1, 2, 3, 4, 5. Legend: Poison Girl (Dior), Black Opium (Yves Saint Laurent), 212 Sexy (Carolina Herrera), 24 Faubourg (Hermes).

Figure 4. Cluster related to the sesquiterpene family calculated from GC-EI-MS data of four different commercial perfumes.

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website.
S1. Complete molecular network generated from GC-EI-MS data of four commercial perfumes.

AUTHOR INFORMATION
Corresponding Author
* Tel.: +33 686 24 16 92. E-mail: [email protected].
ORCID
Nicolas Elie: 0000-0002-8733-0971
David Touboul: 0000-0003-2751-774X
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
This work was supported by the Agence Nationale de la Recherche (Grant ANR-16-CE29-0002-01 CAP-SFC-MS).

REFERENCES
(1) Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crusemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderon, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya P, C. A.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O'Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodriguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias, L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Muller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. O.; Pogliano, K.; Linington, R. G.; Gutierrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking, Nat. Biotechnol. 2016, 34, 828-837.
(2) Wan, K. X.; Vidavsky, I.; Gross, M. L. Comparing similar spectra: from similarity index to spectral contrast angle, J. Am. Soc. Mass Spectrom. 2002, 13, 85-88.
(3) Scheubert, K.; Hufsky, F.; Böcker, S. Computational mass spectrometry for small molecules, J. Cheminform. 2013, 5, 12.
(4) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data, BMC Bioinform. 2010, 11, 1-11.
(5) Olivon, F.; Elie, N.; Grelier, G.; Roussi, F.; Litaudon, M.; Touboul, D. MetGem Software for the Generation of Molecular Networks Based on the t-SNE Algorithm, Anal. Chem. 2018, 90, 13900-13908.
(6) Ni, Y.; Su, M.; Qiu, Y.; Jia, W.; Du, X. ADAP-GC 3.0: Improved Peak Detection and Deconvolution of Co-eluting Metabolites from GC/TOF-MS Data for Metabolomics Studies, Anal. Chem. 2016, 88, 8802-8811.
(7) Fischler, M. A.; Bolles, R. C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Comm. of the ACM 1981, 24, 381-395.
(8) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M.-Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. A Cross-platform Toolkit for Mass Spectrometry and Proteomics, Nat. Biotechnol. 2012, 30, 918-920.
(9) http://mona.fiehnlab.ucdavis.edu/downloads
(10) https://en.wikipedia.org/wiki/Poison_(perfume)
(11) Groom, N. The New Perfume Handbook, second edition, p 421.
