Article

A Comprehensive Strategy to Construct In-house Database for Accurate and Batch Identification of Small Molecular Metabolites Xinjie Zhao, Zhongda Zeng, Aiming Chen, Xin Lu, Chunxia Zhao, Chunxiu Hu, Lina Zhou, Xinyu Liu, Xiaolin Wang, Xiaoli Hou, Yaorui Ye, and Guowang Xu Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b01482 • Publication Date (Web): 29 May 2018 Downloaded from http://pubs.acs.org on May 29, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Analytical Chemistry

ACS Paragon Plus Environment

A Comprehensive Strategy to Construct In-house Database for Accurate and Batch Identification of Small Molecular Metabolites

Xinjie Zhao1, Zhongda Zeng1,2, Aiming Chen2, Xin Lu1, Chunxia Zhao1, Chunxiu Hu1, Lina Zhou1, Xinyu Liu1, Xiaolin Wang1, Xiaoli Hou1, Yaorui Ye1, Guowang Xu1*

1. CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China. 2. Dalian ChemDataSolution Information Technology Co. Ltd., Dalian 116023, China.

* Correspondence: Prof. Dr. Guowang Xu, e-mail: [email protected], Tel: 0086-411-84379530, Fax: 0086-411-84379559

ABSTRACT: Metabolite identification is an essential step in metabolomics studies for interpreting the regulatory mechanisms of pathological and physiological processes. However, it remains a major challenge in LC-MSn-based studies because of the complexity of mass spectrometric data, the chemical diversity of metabolites, and the lack of standard databases. In this work, a comprehensive strategy was developed for accurate, batch metabolite identification in non-targeted metabolomics studies. First, a well-defined procedure was applied to generate reliable, standardized LC-MS2 data, including tR, MS1 and MS2 information, under a standard operating procedure (SOP). An in-house database of about 2000 metabolites was constructed. It was used to identify metabolites in non-targeted metabolic profiling through four steps: retention time calibration using internal standards, precursor ion alignment and ion fusion, automatic MS2 information extraction and selection, and batch database searching and scoring. As an application example, a pooled serum sample was analyzed with this strategy, and 202 metabolites were identified in positive ion mode. These results show that the strategy is useful for LC-MSn-based non-targeted metabolomics studies.

■ INTRODUCTION
Metabolomics, one of the important techniques of systems biology, has been widely used in studies of humans, animals, plants and microorganisms to interpret the metabolic regulatory mechanisms of pathological and physiological processes.1,2 Among the steps of a metabolomics study, metabolite identification is essential for connecting the experimental data with subsequent pathway analysis and biological interpretation. Liquid chromatography coupled with mass spectrometry (LC-MS) outperforms other techniques for metabolomics discovery because it combines efficient separation of complex samples with highly sensitive detection of metabolite features.3,4 Unfortunately, only 1.8% of experimental MS data have been reported to be annotatable in non-targeted metabolomics studies.5 This low rate makes metabolite identification one of the biggest bottlenecks in the field. Identifying a large number of unknown MS features is generally time-consuming and laborious. Most metabolite identification studies rely on online databases such as HMDB, Metlin and PubChem, which are based on experimental or predicted MS1 masses and MSn fragmentation.6 Information-dependent acquisition (IDA) and SWATH (sequential window acquisition of all theoretical fragment-ion spectra) have been widely applied to acquire MS2 data automatically together with the precursor ions, which greatly increases acquisition speed for identification.7-10 However, better data quality and better handling of MS1 and MS2 data are still needed for information extraction and identification matching. Some online databases partially support batch searching of MS1 data, but using MS1 information alone inevitably produces large numbers of false positives. Although some databases contain MS2 spectra of small molecules, batch retrieval and scoring of MS2 spectra remain difficult for most metabolites of interest. Some researchers have developed retrieval methods for batch identification, which have been applied to proteins, drug metabolites and endogenous metabolites.10-14

In addition, LC retention behavior provides an important reference for metabolite identification. Huan et al. introduced a dansylation-labeling LC-MS library in which retention time (tR) was used as a searchable parameter for metabolite identification.15 Aicheler et al. showed that combining a tR model with accurate mass search significantly reduced the false positive rate in complex lipid data sets. This combination improved identification in non-targeted lipidomics.16 To date, however, few LC retention time databases are available to improve identification accuracy. In this work, a comprehensive strategy was proposed for rapid metabolite identification in non-targeted LC-MS metabolomics studies. First, a standard operating procedure (SOP) was defined for the LC-MS analytical method. An in-house LC-MS2 database was then established under the SOP conditions using available metabolite standards and previously identified metabolites, with the tR, MS1 and MS2 spectra of each standard collected simultaneously. Next, a series of systematic, automated approaches was developed to improve data quality and identification results. These approaches include LC-MS2 data preprocessing and integration, retention time calibration, noise level determination, and high-throughput database searching and result scoring. As an application example, a pooled serum sample was analyzed and its metabolites were identified using the developed strategy. The general framework is shown in Figure 1.

Figure 1. General framework of the rapid metabolite identification strategy.

■ EXPERIMENTAL SECTION
LC-MS Analysis Method. All 1500 available metabolite standards were included in the in-house LC-MS2 database. The retention time, accurate mass and MS/MS fragments of these standards were determined consistently in both positive and negative ion modes. The system was a Waters ACQUITY ultra-high-performance liquid chromatography (UPLC) system (Waters Corp, Milford, USA) coupled to an AB SCIEX TripleTOF 5600+ system (AB SCIEX, Concord, ON, Canada).

For ESI positive ion mode, a BEH C8 column (2.1×100 mm, 1.7 µm particle size; Waters, Milford, MA, USA) was used for separation. The mobile phases consisted of 0.1% formic acid in water (A) and acetonitrile (B). The gradient started at 5% B and was held for 1 min. It then increased linearly to 100% B within 24 min, was held for another 4 min, and returned to 5% B. For ESI negative ion mode, separation was performed on an HSS T3 column (2.1×100 mm, 1.7 µm particle size; Waters, Milford, MA, USA). The mobile phases consisted of 6.5 mM ammonium bicarbonate in water (C) and 6.5 mM ammonium bicarbonate in 95% methanol/water (D). The gradient started at 2% D and was held for 1 min. It then increased linearly to 100% D within 18 min, was held for 4 min, and returned to 2% D. In both ion modes, the post-equilibration time was 2 min and the flow rate was 0.35 mL/min. The column temperature was 50 °C in positive ion mode and 55 °C in negative ion mode. A mixture of internal standards was injected, and their retention times were used to correct retention time drift between instruments and between experimental batches. The internal standards for positive and negative ion modes are listed in Table 1 and Supporting Information Table S1, respectively. MS2 fragments were collected simultaneously at high (45 eV), middle (30 eV) and low (15 eV) collision energies. The ion spray voltage was 5500 V in positive ion mode and 4500 V in negative ion mode, and the interface heater temperature was 500 °C. Curtain gas, ion source gas 1 and ion source gas 2 were set at 35, 50 and 50 psi, respectively.
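The two gradient programs above are piecewise-linear timetables. The following is a minimal Python sketch, assuming straight-line segments between the stated breakpoints. The names `POSITIVE_MODE_B`, `NEGATIVE_MODE_D` and `percent_organic` are hypothetical. The text does not state how quickly each gradient returns to its starting composition, so the timetables stop at the end of the final hold.

```python
from bisect import bisect_right

# (time in min, % organic phase) breakpoints transcribed from the text above.
# Hypothetical names; the return to initial conditions is omitted because its timing is not stated.
POSITIVE_MODE_B = [(0.0, 5.0), (1.0, 5.0), (25.0, 100.0), (29.0, 100.0)]  # BEH C8, %B
NEGATIVE_MODE_D = [(0.0, 2.0), (1.0, 2.0), (19.0, 100.0), (23.0, 100.0)]  # HSS T3, %D


def percent_organic(timetable, t_min):
    """Linearly interpolate the organic-phase percentage at time t_min (minutes)."""
    times = [t for t, _ in timetable]
    if t_min <= times[0]:
        return timetable[0][1]
    if t_min >= times[-1]:
        return timetable[-1][1]
    i = bisect_right(times, t_min)  # index of the first breakpoint strictly after t_min
    (t0, b0), (t1, b1) = timetable[i - 1], timetable[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


print(percent_organic(POSITIVE_MODE_B, 13.0))  # 52.5, halfway through the 24 min ramp
```

Keeping the program as data rather than prose makes it easy to compare against the short 15 min gradient used later for the retention time correction test.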

Application of Non-targeted LC-MS-based Metabolomics Method with IDA-based Auto-MS2

Sera from 10 volunteers were mixed to produce a pooled serum sample, which was used as the test sample. Four volumes of acetonitrile containing the internal standards were added to the pooled serum, and the mixture was centrifuged at 13,000 g for 15 min for deproteinization. The supernatant was dried in a vacuum centrifuge and reconstituted in 100 µL of acetonitrile/water (2:8), and 5 µL was injected for LC-MS analysis.

Table 1. List of internal standards used for retention time calibration in positive ion mode.

Internal standard    Molecular formula   Accurate mass (Da)   tR (min)
Carnitine C2:0-d3    C9D3H14NO4          206.1392             0.77
Phenylalanine-d5     C9D5H6NO2           170.1181             1.35
Tryptophan-d5        C11D5H7N2O2         209.1290             2.31
Leucine-enkephalin   C28H37N5O7          555.2693             6.36
Carnitine C8:0-d3    C15D3H26NO4         290.2331             7.59
Carnitine C10:0-d3   C17D3H30NO4         318.2644             9.86
CA-d4                C24D4H36O5          412.3189             11.1
CDCA-d4              C24D4H36O4          396.3240             12.85
Carnitine C16:0-d3   C23D3H42NO4         402.3583             14.13
LPC 19:0             C27H56NO7P          537.3794             16.49
SM d18:1/12:0        C35H71N2O6P         646.5050             19.64

The pooled serum extract was analyzed on a UPLC system coupled with an AB SCIEX TripleTOF 5600+ system, and MS2 spectra were collected in IDA auto-MS2 mode. The IDA-based auto-MS2 was performed every 0.25 s, and the 10 most intense metabolite ions in each full-scan cycle were selected for auto-MS2. The collision energy (CE) was again set at 15, 30 and 45 eV. Two LC separation methods were employed. The first was the 30 min elution gradient (gradient 1) described above for LC-MS2 database construction. The second served to test retention time correction: separation was also performed with a short (15 min) elution gradient on a 2.1×50 mm ACQUITY 1.7 µm C8 column using a UHPLC system (Shimadzu, Kyoto, Japan). For positive ion mode, this short gradient started at 5% B and was held for 0.5 min. It then increased linearly to 100% B within 12 min, was held for 3 min, and returned to 5% B. To compare mass spectra across MS instruments, the sample was also analyzed on two further systems: UPLC coupled with a Q-Exactive HF (Thermo Fisher Scientific, Rockford, IL, USA) and UPLC coupled with a Xevo G2-XS Q-TOF (Waters, Massachusetts, USA). Both used IDA auto-MS2 mode with CE set at 15, 30 and 45 eV, and a 30 min elution gradient.

Data Processing. Non-targeted LC-MS data from multiple runs were extracted and aligned using MarkerView (AB SCIEX, Concord, ON, Canada) and SIEVE software (version 2.2, Thermo Fisher Scientific, Rockford, IL, USA). The resulting list of ion peaks, including tR, m/z and intensity, was exported to an Excel file. Raw data files from the IDA-based auto-MS2 analyses at CE values of 15, 30 and 45 eV were converted to MGF files. From these files, the tR, m/z, charge and intensity of the product ions and of their corresponding precursor ions were exported. The Excel table of ion peaks and the corresponding auto-MS2 MGF files from the three CE settings were then imported into the in-house software OSI/SMMS for further data processing. The data-processing procedures and methods implemented in the software are described below.
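The MGF export step can be illustrated with a small reader. This is a sketch and not part of OSI/SMMS. It assumes each spectrum carries:

- a `PEPMASS` line (precursor m/z, optionally followed by an intensity),
- an optional `RTINSECONDS` line, and
- two-column peak lines.

`Ms2Spectrum` and `parse_mgf` are hypothetical names, and the peak values in the example are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Ms2Spectrum:
    precursor_mz: float
    rt_sec: Optional[float]
    charge: Optional[str]
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def parse_mgf(text: str) -> List[Ms2Spectrum]:
    """Minimal MGF reader: one Ms2Spectrum per BEGIN IONS ... END IONS block."""
    spectra: List[Ms2Spectrum] = []
    header: Optional[dict] = None
    peaks: List[Tuple[float, float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue  # skip blank lines and comment lines
        if line == "BEGIN IONS":
            header, peaks = {}, []
        elif line == "END IONS":
            if header is not None and "PEPMASS" in header:
                rt = header.get("RTINSECONDS")
                spectra.append(Ms2Spectrum(
                    precursor_mz=float(header["PEPMASS"].split()[0]),
                    rt_sec=float(rt) if rt is not None else None,
                    charge=header.get("CHARGE"),
                    peaks=peaks,
                ))
            header = None
        elif header is not None:
            if "=" in line:
                key, value = line.split("=", 1)  # header line such as PEPMASS=...
                header[key.strip().upper()] = value.strip()
            else:
                mz, intensity = line.split()[:2]  # peak line: "m/z intensity"
                peaks.append((float(mz), float(intensity)))
    return spectra


example = """BEGIN IONS
PEPMASS=137.0459 5000
CHARGE=1+
RTINSECONDS=49.8
110.0349 1200
119.0352 300
END IONS"""
print(parse_mgf(example)[0].precursor_mz)  # 137.0459
```

Each parsed spectrum keeps its precursor m/z and retention time, which is the information needed to pair it with a feature from the MS1 peak table in the extraction step.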

■ RESULTS AND DISCUSSION
Construction of In-house LC-MS2 Database. Reversed-phase liquid chromatography is the most widely used separation technique in LC-MS-based metabolomics studies. C8 and C18 columns, typically with formic acid in water and acetonitrile as mobile phases, give good separation and detection in positive ion mode. In our previous work,17-19 methods based on these columns gave good separation of complex metabolite mixtures. Therefore, a BEH C8 column was selected for the SOP analysis method in this study to achieve better separation of lipids. Similarly, an HSS T3 column with 6.5 mM ammonium bicarbonate in water (C) and 6.5 mM ammonium bicarbonate in 95% methanol/water (D) was used in negative ion mode. Using the defined SOP method, 1500 available pure metabolite standards were analyzed, including organic acids, amino acids, fatty acids, lipids, polyphenols and flavonoids. The analysis collected comprehensive qualitative information, including tR and the accurate masses of various adduct forms such as [M+H]+, [M+Na]+, [M+K]+ and [M+NH4]+. An LC-MS2 database was constructed containing the retention times, MS and MS/MS spectra of all these standards. Some of these compounds readily undergo in-source dissociation and are usually difficult to identify because their molecular ion peaks are weak or absent. For this reason, the products of in-source dissociation were also recorded in the MS1 data of the database. MS2 fragments were collected at high, middle and low CE. Using pure standards ensures good quality of the MS1 and MS2 data. In addition, the adduct forms and fragments were generated experimentally under real analytical conditions rather than by theoretical computation, which largely avoids false positives. Some metabolites have no commercially available standards but have previously been detected and identified in practice from their accurate masses, characteristic fragments and LC retention times. Such metabolites were collected extensively into an extended database for potential identification. About 600 metabolites of this type are included, such as free fatty acids, carnitines, bile acids and lipids.20-23
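The database stores several adduct forms per standard, so it can help to compare measured values with theoretical m/z. The following sketch computes neutral monoisotopic masses from simple molecular formulas and adds standard positive-mode adduct shifts. The helper names are hypothetical. The formula parser handles only element symbols followed by optional counts, with no brackets or charges. For hypoxanthine (C5H4N4O) it gives an [M+H]+ of 137.0458, consistent with the hypoxanthine ion at m/z 137.046 in the serum example.

```python
import re

# Monoisotopic masses (Da) of the elements handled here; D = deuterium.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503207, "D": 2.0141017778,
    "N": 14.0030740048, "O": 15.99491461956,
    "P": 30.97376163, "S": 31.97207100,
}
# Mass shifts (Da) from a neutral molecule M to common positive-mode adducts.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C5H4N4O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mzs(formula: str) -> dict:
    """Theoretical m/z (rounded to 4 decimals) of each adduct in ADDUCT_SHIFTS."""
    m = monoisotopic_mass(formula)
    return {name: round(m + shift, 4) for name, shift in ADDUCT_SHIFTS.items()}


print(adduct_mzs("C5H4N4O")["[M+H]+"])  # 137.0458 (hypoxanthine)
```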

Strategy of Metabolite Identification Based on the Database. To achieve rapid metabolite identification in non-targeted LC-MS metabolomics studies, the following comprehensive strategy is proposed. After pretreatment, the sample extract is analyzed by LC-MS with IDA-based auto-MS2. The OSI/SMMS software then provides a one-step solution for identifying small-molecule compounds. It automatically performs precursor ion fusion, extraction of precursor ions and their product ions, retention time calibration, and database searching and scoring. The general framework of the rapid metabolite identification strategy is shown in Figure 1.

Step 1: Precursor ion alignment and fusion. In our previous study, a systematic ion fusion approach was developed for high-resolution LC-MS metabolomics data.24 Ion fusion was shown to reduce redundant MS features significantly. This improves the chance of accurately identifying unknown metabolites by excluding many ambiguous candidate hits. The ion fusion approach removes or combines redundant ions so that each metabolite corresponds to one ion or ion group. These redundant ions include isotopic ions, adduct ions (e.g., [M+Na]+, [M+K]+ or [M+NH4]+), fragment ions (e.g., loss of H2O or CO2) and oligomers ([2M+H]+, [3M+H]+). The list of ion features after alignment and fusion is then used in the subsequent data-processing steps.
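The fusion rules can be sketched as pairwise tests on co-eluting ions. Two features are linked when their tR values agree and either of the following holds:

- their m/z difference matches a known isotope, adduct or neutral-loss spacing, or
- one feature is the [2M+H]+ dimer of the other.

This is a simplified illustration, not the published ion fusion algorithm.24 The 3 s tR tolerance, the 0.005 Da m/z tolerance and the function name `fuse_ions` are assumptions. All features are assumed to be singly charged, with [M+H]+ as the reference ion.

```python
PROTON = 1.007276  # mass of a proton, Da

# m/z spacings (Da) between related singly charged ions of one metabolite
MASS_DIFFS = {
    "13C isotope": 1.003355,
    "[M+NH4]+ vs [M+H]+": 17.026547,
    "H2O loss": 18.010565,
    "[M+Na]+ vs [M+H]+": 21.981942,
    "[M+K]+ vs [M+H]+": 37.955882,
    "CO2 loss": 43.989829,
}


def fuse_ions(features, rt_tol=3.0, mz_tol=0.005):
    """Group (m/z, tR in s) features that look like ions of the same metabolite.

    Returns lists of feature indices, one list per putative metabolite.
    """
    parent = list(range(len(features)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (mz_i, rt_i) in enumerate(features):
        for j in range(i + 1, len(features)):
            mz_j, rt_j = features[j]
            if abs(rt_i - rt_j) > rt_tol:
                continue  # not co-eluting
            lo, hi = sorted((mz_i, mz_j))
            related = any(abs((hi - lo) - d) <= mz_tol for d in MASS_DIFFS.values())
            related = related or abs(hi - (2 * lo - PROTON)) <= mz_tol  # [2M+H]+ of [M+H]+
            if related:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# [M+H]+, its 13C isotope and its [M+Na]+ co-elute; the fourth ion is unrelated.
print(fuse_ions([(137.0458, 49.8), (138.0491, 49.9), (159.0277, 49.7), (205.0972, 180.0)]))
# [[0, 1, 2], [3]]
```

Union-find keeps the grouping transitive, so an isotope of an adduct joins the same group even if it matches no rule against the [M+H]+ ion directly.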

Step 2: Auto-MS2 information extraction and selection. First, the fundamental noise level of the auto-MS2 data is determined and filtered out to reduce its influence on MS2 spectral comparison, especially for low-abundance compounds. This follows the principles of a previously reported algorithm for processing tandem MS-based proteomics data.25 Next, tolerances on the tR shift (∆tR) and m/z difference (∆m/z) between MS1 and MS2 are used to assign each precursor ion its product ions from the large number of MS2 spectra. The whole procedure was described in detail in our previous work on the software platform MRM-FINDER.26 Furthermore, MS2 spectra are compared for similarity to check peak purity. Note that a precursor ion may trigger more than one auto-MS2 spectrum. Conversely, MS2 spectra within a tR and m/z window may derive from different precursor ions of similar molecular weight in the same experimental run. Highly similar MS2 spectra are likely generated by the same precursor ion, so they are merged into one spectrum to ensure high spectral quality. Low similarity indicates that the spectra may come from different precursor ions, and further selection is needed to avoid falsely matching a precursor ion with product ions.
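The following Python sketch illustrates the Step 2 logic. First, MS2 spectra are assigned to an MS1 feature within ∆tR and ∆m/z windows. The defaults of 10 s and 0.007 Da are the values used in the serum application. Second, replicate spectra are merged only when each one is similar to the first. The text does not name the similarity measure, so the following are all assumptions:

- cosine similarity with a 0.01 Da peak-pairing tolerance,
- the 0.8 threshold, and
- the rule of summing intensities of merged peaks.

The function names are also hypothetical.

```python
import math


def match_ms2(feature_mz, feature_rt, spectra, rt_tol=10.0, mz_tol=0.007):
    """Spectra whose precursor lies within +/-rt_tol s and +/-mz_tol Da of an MS1 feature."""
    return [s for s in spectra
            if abs(s["rt"] - feature_rt) <= rt_tol
            and abs(s["precursor_mz"] - feature_mz) <= mz_tol]


def cosine_similarity(peaks_a, peaks_b, mz_tol=0.01):
    """Cosine of two (m/z, intensity) lists; each peak of a is paired with its closest unused peak of b."""
    used, dot = set(), 0.0
    for mz_a, int_a in peaks_a:
        candidates = [(abs(mz_a - mz_b), k) for k, (mz_b, _) in enumerate(peaks_b)
                      if k not in used and abs(mz_a - mz_b) <= mz_tol]
        if candidates:
            _, k = min(candidates)
            used.add(k)
            dot += int_a * peaks_b[k][1]
    norm_a = math.sqrt(sum(i * i for _, i in peaks_a))
    norm_b = math.sqrt(sum(i * i for _, i in peaks_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_if_similar(peak_lists, threshold=0.8, mz_tol=0.01):
    """Merge replicate MS2 spectra; return None if any spectrum is dissimilar to the first."""
    ref = peak_lists[0]
    if any(cosine_similarity(ref, other, mz_tol) < threshold for other in peak_lists[1:]):
        return None  # low similarity: possibly different precursors, needs further selection
    merged = []
    for mz, intensity in sorted(p for peaks in peak_lists for p in peaks):
        if merged and mz - merged[-1][0] <= mz_tol:
            merged[-1] = (merged[-1][0], merged[-1][1] + intensity)  # sum intensities of close peaks
        else:
            merged.append((mz, intensity))
    return merged
```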

Step 3: Retention time calibration. Chromatographic retention behavior is an important reference for compound identification, but tR drift is unavoidable between experimental batches and between instruments. To reduce the tR differences between experimental data and the in-house database, 11 internal standards were chosen for positive ion mode (Table 1). A local linear regression calibration method was then applied to correct retention time shifts.15 For negative ion mode, 9 internal standards were used (Supporting Information Table S1).
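One simple form of local linear calibration is piecewise-linear mapping between neighbouring internal standards, sketched below in Python. This is an assumed implementation, not necessarily the regression used in the software. Between two adjacent standards, the experimental tR is mapped onto the database scale by the straight line through them. Beyond the first or last standard, the nearest segment is extrapolated. In the usage example, the database tR values are taken from Table 1 and the experimental values are hypothetical.

```python
from bisect import bisect_left


def calibrate_rt(rt_exp, anchors):
    """Map an experimental tR onto the database tR scale.

    anchors: (experimental tR, database tR) pairs for the internal standards, at least
    two, with distinct experimental tR. Between neighbouring standards the correction is
    the straight line through them; outside them the nearest segment is extrapolated.
    """
    pts = sorted(anchors)
    xs = [x for x, _ in pts]
    i = bisect_left(xs, rt_exp)
    i = min(max(i, 1), len(pts) - 1)  # clamp to a valid segment index
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (rt_exp - x0) / (x1 - x0)


# Hypothetical experimental tR (min) of three internal standards vs their Table 1 values.
anchors = [(0.80, 0.77), (1.40, 1.35), (2.40, 2.31)]
print(round(calibrate_rt(1.9, anchors), 2))  # 1.83
```

Because each segment uses only its two neighbouring standards, a local distortion of the gradient affects only nearby retention times. A single global regression would not have this property.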

Step 4: Automatic Batch Searching and Scoring. Successful identification of metabolites from a known database depends strongly on two factors: (1) comprehensive integration and weighting of retention time, MS1 and MS2 information, and (2) whether MS2 intensity information is used or ignored. In this work, retention time and MS1 are applied together as prerequisite filters with instrument-dependent thresholds. They are then integrated with MS2 for scoring using user-defined weights. The in-house software provides two weighting schemes for retention time, MS1 and MS2, linear and exponential, so that they can be compared and the better one chosen for each feature. The score of a metabolite candidate is calculated by identifying the common and uncommon features of each MS pair extracted from the real sample and the standards database. Features are considered common when their m/z values agree within a predefined acceptable difference, ∆m/z. For a spectrum with N features, each common feature raises the score by 1/N and each uncommon feature lowers it by 1/N. The score therefore ranges from zero to one, and candidates with higher probability receive larger scores.

To reduce the influence of an incorrectly determined noise level, a noise-dependent buffer zone is defined between definite signal and noise; the principle is illustrated in Figure 2A. Once the noise level has been obtained with the methods described above, three zones are recommended:

- Zone 1: intensity less than 5 times the noise level.
- Zone 2: intensity 5 to 10 times the noise level.
- Zone 3: intensity more than 10 times the noise level.

These thresholds can be adjusted as parameters in the software. For MS features in Zone 1 and Zone 3, intensity is not used in the score computation. Features in these zones can be classified with certainty as noise (Zone 1) or as real signal (Zone 3), so they are excluded when MS intensity is used for weighting. For MS features in the ambiguous Zone 2, an extra weight on R_common for each feature is included in the final score. This weight is based on the rate of intensity change (W_intensity) between the searched feature and the corresponding database feature. Features in Zone 2 may be misclassified as real signal or noise, which makes them less reliable for MS comparison, so their intensity weighting in the score is partially reduced.
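The zone logic and the 1/N scoring rule can be sketched as follows. Several details are not specified in the text and are assumptions here:

- Zone 1 peaks are treated as noise and ignored.
- A matched fragment in Zone 3 adds 1/N, where N is the number of standard fragments.
- A matched fragment in Zone 2 adds w/N. Here w is a hypothetical intensity-agreement weight standing in for W_intensity: the ratio of the smaller to the larger base-peak-normalized intensity.
- Uncommon features simply add nothing, which keeps the score between zero and one.

The function names are illustrative.

```python
def noise_zone(intensity, noise, low=5.0, high=10.0):
    """Zone 1: < low x noise; Zone 2: low to high x noise; Zone 3: > high x noise."""
    ratio = intensity / noise
    if ratio < low:
        return 1
    if ratio <= high:
        return 2
    return 3


def zone_aware_score(exp_peaks, std_peaks, noise, mz_tol=0.01):
    """Fraction-of-N score over the N standard fragments (a sketch, not the OSI/SMMS formula).

    Zone 1 experimental peaks are ignored as noise; a match in Zone 3 adds 1/N;
    a match in Zone 2 adds w/N, with w a hypothetical intensity-agreement weight.
    """
    if not std_peaks or not exp_peaks:
        return 0.0
    n = len(std_peaks)
    exp_max = max(i for _, i in exp_peaks)
    std_max = max(i for _, i in std_peaks)
    score = 0.0
    for mz_s, int_s in std_peaks:
        hits = [(mz_e, int_e) for mz_e, int_e in exp_peaks
                if abs(mz_e - mz_s) <= mz_tol and noise_zone(int_e, noise) > 1]
        if not hits:
            continue  # uncommon feature: contributes nothing
        _, int_e = max(hits, key=lambda p: p[1])  # most intense candidate peak
        weight = 1.0
        if noise_zone(int_e, noise) == 2:
            rel_e, rel_s = int_e / exp_max, int_s / std_max  # base-peak-normalized intensities
            weight = min(rel_e, rel_s) / max(rel_e, rel_s)  # hypothetical stand-in for W_intensity
        score += weight / n
    return score
```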

Figure 2. A. Illustration of the fundamental principle of the noise-dependent buffer zone. B and C. Illustration of the fundamental principles of forward and reverse searching, respectively, for scoring metabolite identification.

Forward and reverse searches are then both used for MS searching; their principles are illustrated in Figure 2B and 2C. The forward search mainly considers the number of MS features in the standard. The reverse search mainly considers the number of features of the standard that are matched in the experimental data. The average of the two scores is then calculated as an integrated score for overall identification evaluation, and candidates with higher scores are more likely to be the correct metabolites.
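The averaging of forward and reverse search scores can be sketched with fragment counts. Which normalization corresponds to "forward" and which to "reverse" in Figure 2B and 2C is an assumption. Here, one fraction is normalized by the number of standard fragments and the other by the number of experimental fragments, and the integrated score is their mean. The function names are illustrative. The 0.01 Da default is the MS2 tolerance used in the application.

```python
def count_matches(query_mz, reference_mz, mz_tol=0.01):
    """Number of reference fragments paired with a distinct query fragment within mz_tol."""
    used, matched = set(), 0
    for ref in reference_mz:
        for k, q in enumerate(query_mz):
            if k not in used and abs(q - ref) <= mz_tol:
                used.add(k)
                matched += 1
                break
    return matched


def averaged_search_score(exp_mz, std_mz, mz_tol=0.01):
    """Mean of a standard-normalized and an experiment-normalized match fraction."""
    if not exp_mz or not std_mz:
        return 0.0
    m = count_matches(exp_mz, std_mz, mz_tol)
    by_standard = m / len(std_mz)    # share of the standard's fragments that were found
    by_experiment = m / len(exp_mz)  # share of the experimental fragments that were explained
    return (by_standard + by_experiment) / 2


print(averaged_search_score([110.034, 119.035, 137.046, 82.040], [110.035, 119.035, 137.046]))
# 0.875
```

Averaging the two fractions penalizes both missing standard fragments and extra experimental fragments, such as co-isolated interferences, without letting either dominate.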

Application of Metabolite Identification.

Precursor ion alignment and fusion, and Auto-MS2 information extraction and selection. A non-targeted metabolomics study of pooled serum was used to demonstrate the proposed strategy for systematic metabolite identification. The pooled serum extract was analyzed on a UPLC system coupled with an AB SCIEX TripleTOF 5600+ system. In each full-scan cycle, the 10 most intense metabolite ions were selected for IDA-based auto-MS2, and MS2 spectra were acquired at high, middle and low CE (Figure 3A). Precursor ions were aligned with the Peakview workstation, giving 2241 ions. MS ions derived from the same metabolites were then combined with the ion fusion approach, which removed or grouped isotopic ions, adduct ions, oligomers and some fragment ions (e.g., loss of H2O or CO2). After fusion, 1526 ion features were retained. For auto-MS2 information extraction and selection, ∆tR and ∆m/z between a precursor ion and its product ions were set to 10 s and 0.007 Da, respectively. In total, 1299 ion features were matched with their MS2 spectra. Figure 3B shows the distribution of ions with and without matching MS2 spectra. When many ions co-elute, for example in the region close to the dead time, low-intensity metabolite ions may not be selected for auto-MS2. Overall, 85% of the ion features (1299 of 1526) had complete tR, MS and MS/MS information (Figure 3C). Ion features without MS/MS spectra can be searched against the database using only their orthogonal tR and MS1 information.

Figure 3. A. Total ion chromatograms (TICs) of non-targeted LC-MS metabolomics analysis with IDA-based auto-MS2 in high, middle and low CE voltages, respectively; B. Distribution of ions matching with MS2 (blue color) or not (red color); and C. Percentage of ion features with matching comprehensive information.

Retention time calibration. Next, the retention times of the remaining MS ions were corrected by local linear calibration using the 11 internal standards. Twenty additional serum metabolites (Supporting Information Table S2) were chosen to evaluate how well the algorithm corrects retention time. After correction, the tR differences between the real sample and the in-house database were markedly reduced (Figure 4), which should considerably improve identification performance. With the same elution gradient as that used to build the database, ∆tR was less than 20 s for all 20 compounds and less than 10 s for 18 of them. Smaller retention time shifts reduce the chance of false positive matches. As an extreme test involving a large change in retention time, the sample was also separated on a 2.1×50 mm, 1.7 µm C8 column with a shorter (15 min) elution gradient on a UHPLC system (Shimadzu). After correction, ∆tR was less than 20 s for 15 compounds, less than 30 s for three compounds, and less than 40 s for two compounds. These results show that local linear regression calibration is well suited to tR correction under varying experimental conditions.

Automatic Batch Searching and Scoring. Compounds are identified by the OSI/SMMS software from the combined retention time, accurate mass and MS/MS information. Based on the retention time correction results above, the tR window was set to 20 s. The ∆m/z tolerances for MS1 and MS2 were set to 0.005 Da and 0.01 Da, respectively. Combining these criteria greatly reduces incorrect metabolite matches and improves identification quality. For example, an ion peak at tR = 49.8 s and m/z 137.046 matched two metabolites in the standard database, hypoxanthine and allopurinol. These two compounds are isomers with the molecular formula C5H4N4O, so retention time and accurate mass alone were not sufficient to distinguish them.

Figure 4. Correlation plots of the retention times obtained in experiment vs those in the database before retention time calibration (in blue) and after retention time calibration (in red). A. 30 min elution gradient; and B. 15 min elution gradient with a short column by UHPLC (Shimadzu).

However, the isomers can be distinguished by adding MS/MS information. The upper parts of Figure 5A and 5B show the experimental MS2 spectrum of the ion peak. The lower parts show the MS2 spectra of the hypoxanthine and allopurinol standards in the standard database, respectively. The matching scores were 0.762 and 0.441, respectively, so the ion peak was clearly identified as hypoxanthine on the basis of MS2. The possible fragmentation mechanism supports this identification (Figure 5C). In other cases, the MS2 spectra of some isomers are not easily distinguished. Examples are glycochenodeoxycholate (GCDCA), glycodeoxycholate (GDCA), glycoursodeoxycholate (GUDCA) and glycohyodeoxycholate (GHDCA), which can easily be distinguished once retention times are applied.

Figure 5. The upper parts of A and B show the MS2 spectrum of the ion peak with tR = 49.8 s and m/z 137.046. The lower parts of A and B show the MS2 spectra of the hypoxanthine and allopurinol standards in the standard database, respectively. A possible fragmentation mechanism is shown in C.
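The MS2 comparison behind matching scores such as 0.762 and 0.441 can be illustrated with a plain cosine (normalized dot-product) score. This is a minimal sketch that assumes fragments are paired within the 0.01 Da MS2 tolerance stated above; cosine similarity is a common spectral similarity measure, not necessarily the scoring formula implemented in OSI/SMMS, and the function names and spectra in the example are hypothetical toy values rather than the hypoxanthine or allopurinol spectra.

```python
import math


def align_peaks(exp, ref, tol=0.01):
    """Pair fragments of two spectra given as lists of (m/z, intensity).

    Each reference peak takes the closest unused experimental peak within tol (Da).
    Unmatched peaks of either spectrum are paired with intensity 0.
    """
    used = set()
    pairs = []
    for mz_r, int_r in ref:
        in_window = [j for j, (mz_e, _) in enumerate(exp)
                     if j not in used and abs(mz_e - mz_r) <= tol]
        if in_window:
            j = min(in_window, key=lambda k: abs(exp[k][0] - mz_r))
            used.add(j)
            pairs.append((exp[j][1], int_r))
        else:
            pairs.append((0.0, int_r))
    pairs.extend((int_e, 0.0) for j, (_, int_e) in enumerate(exp) if j not in used)
    return pairs


def cosine_score(exp, ref, tol=0.01):
    """Cosine similarity of the aligned intensity vectors (0 = no overlap, 1 = identical)."""
    pairs = align_peaks(exp, ref, tol)
    dot = sum(a * b for a, b in pairs)
    norm_exp = math.sqrt(sum(a * a for a, _ in pairs))
    norm_ref = math.sqrt(sum(b * b for _, b in pairs))
    if norm_exp == 0.0 or norm_ref == 0.0:
        return 0.0
    return dot / (norm_exp * norm_ref)


# Toy spectra: one shared fragment (within 0.01 Da) and one extra experimental fragment.
experimental = [(100.004, 1.0), (200.0, 1.0)]
reference = [(100.0, 1.0)]
print(round(cosine_score(experimental, reference), 3))  # 0.707
```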

Based on the combined information of retention time, accurate mass (MS1) and MS2 spectra, the non-targeted metabolomics data were searched in batch against the in-house database, and results with a score greater than 0.6 were retained. In total, 202 metabolites were identified in the pooled serum in ESI positive mode using the LC−MS in-house database and OSI/SMMS (details in Table S3); they are involved in 37 metabolic pathways (Figure S1).
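The retention rule above (score greater than 0.6) can be sketched as a filter over the batch search output. Keeping only the best-scoring candidate per feature is an assumption made for illustration; the function name is hypothetical, the hypoxanthine and allopurinol scores are taken from the text, and feature "F2" with its candidates is invented for the example.

```python
def retain_identifications(scored, threshold=0.6):
    """Keep, per feature, the best-scoring candidate if its score exceeds threshold.

    scored maps a feature ID to a list of (metabolite name, matching score).
    """
    kept = {}
    for feature_id, hits in scored.items():
        if not hits:
            continue
        name, score = max(hits, key=lambda hit: hit[1])
        if score > threshold:  # strictly greater, as stated in the text
            kept[feature_id] = (name, score)
    return kept


batch_results = {
    "F1": [("hypoxanthine", 0.762), ("allopurinol", 0.441)],  # tR 49.8 s, m/z 137.046
    "F2": [("candidate_a", 0.55), ("candidate_b", 0.32)],      # hypothetical feature
}
print(retain_identifications(batch_results))
# {'F1': ('hypoxanthine', 0.762)}
```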

Applicability to different high resolution mass spectrometers. Unlike GC-MS, in which electron impact (EI) ionization yields stable fragment ions, MS2 spectra acquired on different LC-MS instruments can differ considerably. To achieve better MS2 spectral matching, MS2 intensities were also weighted: fragments with higher intensity were given higher weight, so that unstable detection of low-intensity fragments has less effect on the scoring results. In addition, forward and inverse MS searches were combined for better matching with the standards in the database. Using tryptophan as an example, we assessed the applicability of the in-house database and identification strategy on different instruments. Using the same mass spectrometer, an AB SCIEX Triple TOF 5600+, the experimental fragment ions were very similar to those in the database, with a matching score of 0.974 (Figure 6A). However, with two different mass spectrometers, a Thermo Fisher Scientific Q-Exactive HF and a Waters Xevo G2-XS Q TOF, some low-intensity fragments did not match the database standard well, and some were not detected at all, such as m/z 188.070 (Figure 6B) and m/z 159.091 (Figures 6B and 6C).


Figure 6. The upper parts show MS2 spectra of tryptophan obtained on A. AB SCIEX Triple TOF 5600+, B. Thermo Fisher Scientific Q-Exactive HF, and C. Waters Xevo G2-XS Q TOF. The lower parts show the MS2 spectrum of the tryptophan standard in the standard database.

In another case, some fragments are found in the experiment but are missing from the database. By combining the forward and inverse search results as described above, the matching scores on the two other mass spectrometers were 0.806 and 0.760. These relatively high scores indicate that the metabolite was identified with high confidence; thus, the MS2 spectra obtained on all three instruments matched the database standard well. Furthermore, the matching scores of 20 metabolites obtained on different instruments are compared in Supporting Information Table S4. The matching scores for metabolites including amino acids, organic acids and lipids were acceptable, indicating that our method and database are suitable for different high resolution mass spectrometers.
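One way to combine forward and inverse (reverse) searches is sketched below under stated assumptions: the forward score is a cosine over all peaks of both spectra, the reverse score ignores experimental fragments absent from the library spectrum (the situation described above), and the two are averaged. The averaging, the helper names, and the use of raw intensities (which let intense fragments dominate, so a missing low-intensity fragment costs little) are illustrative choices, not the OSI/SMMS weighting scheme, which is not specified here. The spectra are toy values.

```python
import math


def _pair_peaks(exp, ref, tol):
    """Greedy m/z pairing within tol (Da); returns (pairs over ref peaks, extra exp peaks)."""
    used, pairs = set(), []
    for mz_r, int_r in ref:
        in_window = [j for j, (mz_e, _) in enumerate(exp)
                     if j not in used and abs(mz_e - mz_r) <= tol]
        if in_window:
            j = min(in_window, key=lambda k: abs(exp[k][0] - mz_r))
            used.add(j)
            pairs.append((exp[j][1], int_r))
        else:
            pairs.append((0.0, int_r))  # library fragment not detected experimentally
    extra = [(int_e, 0.0) for j, (_, int_e) in enumerate(exp) if j not in used]
    return pairs, extra


def _cosine(pairs):
    dot = sum(a * b for a, b in pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in pairs))
    norm_b = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(exp, ref, tol=0.01):
    pairs, extra = _pair_peaks(exp, ref, tol)
    forward = _cosine(pairs + extra)  # every fragment in either spectrum counts
    reverse = _cosine(pairs)          # experimental-only fragments are ignored
    return (forward + reverse) / 2    # assumed equal weighting


library = [(100.0, 100.0), (150.0, 50.0)]
measured = [(100.002, 100.0), (150.001, 50.0), (175.0, 50.0)]  # extra fragment at 175.0
print(round(combined_score(measured, library), 3))  # 0.956
```

With the extra experimental fragment, the forward score drops to about 0.913 while the reverse score stays at 1, so the combined score remains high.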

■ CONCLUSIONS In this study, an integrated strategy was introduced for rapid metabolite identification in non-targeted metabolomics. The procedure comprises data preprocessing, automatic MS2 information extraction and selection, retention time calibration, and automatic batch database searching using the combined tR, MS and MS/MS information. Dedicated algorithms were proposed for each data processing step, providing an integrated route to metabolite identification based on an in-house database. To implement this strategy, an in-house LC-MS2 database was established, and a systematic, automated approach with homemade software (OSI/SMMS) was developed. As an application, a pooled serum sample was analyzed with this strategy: in total, 202 metabolites were identified in ESI positive mode using the LC−MS in-house database and OSI/SMMS, based on the combined retention time, accurate mass and MS/MS information. The strategy proved effective for rapid metabolite identification in LC-MS non-targeted metabolomics.


■ ASSOCIATED CONTENT Supporting Information The Supporting Information is available free of charge on the ACS Publications website at http://pubs.acs.org/. It includes the following material, as noted in the text: Table S1, list of internal standards used for retention time calibration in negative ion mode; Table S2, evaluation of the retention time correction algorithm with twenty metabolites in serum; Table S3, the 202 metabolites identified in the pooled serum in ESI positive mode using the LC−MS in-house database and OSI/SMMS; Figure S1, pathway analysis of the pooled serum based on the 202 identified metabolites.

■ AUTHOR INFORMATION Notes The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS This research was supported by grants (21575140, 21675154, 21775147 and 81472374) and a key grant (21435006) from the National Natural Science Foundation of China, and by the National Key Research and Development Program of China (2017YFC0906900).


■ REFERENCES
(1) Patti, G. J.; Yanes, O.; Siuzdak, G. Nature Reviews Molecular Cell Biology 2012, 13, 263-269.
(2) Weckwerth, W. Annual Review of Plant Biology 2003, 54, 669-689.
(3) Lu, X.; Zhao, X.; Bai, C.; Zhao, C.; Lu, G.; Xu, G. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences 2008, 866, 64-76.
(4) Ren, S. C.; Shao, Y. P.; Zhao, X. J.; Hong, C. S.; Wang, F. B.; Lu, X.; Li, J.; Ye, G. Z.; Yan, M.; Zhuang, Z. P.; Xu, C. L.; Xu, G. W.; Sun, Y. H. Molecular & Cellular Proteomics 2016, 15, 154-163.
(5) da Silva, R. R.; Dorrestein, P. C.; Quinn, R. A. Proceedings of the National Academy of Sciences of the United States of America 2015, 112, 12549-12550.
(6) Chen, J.; Zhao, X.; Fritsche, J.; Yin, P.; Schmitt-Kopplin, P.; Wang, W.; Lu, X.; Haring, H. U.; Schleicher, E. D.; Lehmann, R.; Xu, G. Analytical Chemistry 2008, 80, 1280-1289.
(7) Zhu, X. C.; Chen, Y. P.; Subramanian, R. Analytical Chemistry 2014, 86, 1202-1209.
(8) Chen, L. Y.; Zhou, L.; Chan, E. C. Y.; Neo, J.; Beuerman, R. W. Journal of Proteome Research 2011, 10, 4876-4882.
(9) Bruderer, T.; Varesio, E.; Hopfgartner, G. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences 2017, 1071, 3-10.
(10) Bouyssie, D.; Dubois, M.; Nasso, S.; de Peredo, A. G.; Burlet-Schiltz, O.; Aebersold, R.; Monsarrat, B. Molecular & Cellular Proteomics 2015, 14, 771-781.
(11) Gao, Y.; Zhang, R. P.; Bai, J. F.; Xia, X. J.; Chen, Y. H.; Luo, Z. G.; Xu, J.; Liu, Y. L.; He, J. M.; Abliz, Z. Analytical Chemistry 2015, 87, 7535-7539.
(12) Lynn, K. S.; Cheng, M. L.; Chen, Y. R.; Hsu, C.; Chen, A.; Lih, T. M.; Chang, H. Y.; Huang, C. J.; Shiao, M. S.; Pan, W. H.; Sung, T. Y.; Hsu, W. L. Analytical Chemistry 2015, 87, 2143-2151.
(13) Zhu, Z. J.; Schultz, A. W.; Wang, J. H.; Johnson, C. H.; Yannone, S. M.; Patti, G. J.; Siuzdak, G. Nature Protocols 2013, 8, 451-460.
(14) Wang, Y.; Kora, G.; Bowen, B. P.; Pan, C. Analytical Chemistry 2014, 86, 9496-9503.
(15) Huan, T.; Wu, Y. M.; Tang, C. Q.; Lin, G. H.; Li, L. Analytical Chemistry 2015, 87, 9838-9845.
(16) Aicheler, F.; Li, J.; Hoene, M.; Lehmann, R.; Xu, G. W.; Kohlbacher, O. Analytical Chemistry 2015, 87, 7698-7704.
(17) Zhao, X.; Zhou, L.; Yin, P.; Xu, G. Methods in Molecular Biology 2015, 1277, 61-73.
(18) Zhao, X.; Xu, F.; Qi, B.; Hao, S.; Li, Y.; Li, Y.; Zou, L.; Lu, C.; Xu, G.; Hou, L. Journal of Proteome Research 2014, 13, 1101-1111.
(19) Lu, C. X.; Zhao, X. J.; Li, Y.; Li, Y. J.; Yuan, C. K.; Xu, F.; Meng, X. Y.; Hou, L. H.; Xu, G. W. Journal of Pharmaceutical and Biomedical Analysis 2016, 120, 127-133.
(20) Hansen, J. S.; Zhao, X.; Irmler, M.; Liu, X.; Hoene, M.; Scheler, M.; Li, Y.; Beckers, J.; de Angelis, M. H.; Haering, H.-U.; Pedersen, B. K.; Lehmann, R.; Xu, G.; Plomgaard, P.; Weigert, C. Diabetologia 2015, 58, 1845-1854.
(21) Huang, Q.; Tan, Y.; Yin, P.; Ye, G.; Gao, P.; Lu, X.; Wang, H.; Xu, G. Cancer Research 2013, 73, 4992-5002.
(22) Ren, S.; Shao, Y.; Zhao, X.; Hong, C. S.; Wang, F.; Lu, X.; Li, J.; Ye, G.; Yan, M.; Zhuang, Z.; Xu, C.; Xu, G.; Sun, Y. Molecular & Cellular Proteomics 2016, 15, 154-163.
(23) Huang, Y.; Chen, G.; Liu, X.; Shao, Y.; Gao, P.; Xin, C.; Cui, Z.; Zhao, X.; Xu, G. Journal of Proteome Research 2014, 13, 5715-5723.
(24) Zeng, Z. D.; Liu, X. Y.; Dai, W. D.; Yin, P. Y.; Zhou, L. N.; Huang, Q.; Lin, X. H.; Xu, G. W. Analytical Chemistry 2014, 86, 3793-3800.
(25) Xu, H.; Freitas, M. A. BMC Bioinformatics 2010, 11, 436.
(26) Luo, P.; Dai, W.; Yin, P.; Zeng, Z.; Kong, H.; Zhou, L.; Wang, X.; Chen, S.; Lu, X.; Xu, G. Analytical Chemistry 2015, 87, 5050-5055.
