Article

SWATHtoMRM: Development of High-Coverage Targeted Metabolomics Method Using SWATH Technology for Biomarker Discovery Haihong Zha, Yuping Cai, Yandong Yin, Zhuozhong Wang, Kang Li, and Zheng-Jiang Zhu Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b05318 • Publication Date (Web): 27 Feb 2018 Downloaded from http://pubs.acs.org on February 27, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Analytical Chemistry

SWATHtoMRM: Development of High-Coverage Targeted Metabolomics Method Using SWATH Technology for Biomarker Discovery

Haihong Zha1,2,†, Yuping Cai1,2,†, Yandong Yin1,†, Zhuozhong Wang1,3, Kang Li3, Zheng-Jiang Zhu1

1 Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032 P. R. China

2 University of Chinese Academy of Sciences, Beijing, 100049 P. R. China

3 Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin 150086 P. R. China

Corresponding Author: Dr. Zheng-Jiang Zhu

*e-mail: [email protected], Phone: 86-21-68582296

Author Contributions: † These authors contributed equally.

ABSTRACT

The complexity of the metabolome presents a great analytical challenge for quantitative metabolite profiling and restricts the application of metabolomics in biomarker discovery. Targeted metabolomics using the multiple-reaction monitoring (MRM) technique has excellent capability for quantitative analysis but suffers from limited metabolite coverage. To address this challenge, we developed a new strategy, namely, SWATHtoMRM, which utilizes the broad coverage of SWATH-MS technology to develop high-coverage targeted metabolomics methods. Specifically, the SWATH-MS technique was first used to profile one pooled biological sample in an untargeted manner and to acquire the MS2 spectra of all metabolites. Then, SWATHtoMRM was used to extract large-scale MRM transitions for targeted analysis, with coverage as high as 1000-2000 metabolites. Next, we demonstrated the advantages of the SWATHtoMRM method in quantitative analysis, including coverage, reproducibility, sensitivity, and dynamic range. Finally, we applied our SWATHtoMRM approach to discover potential metabolite biomarkers for colorectal cancer (CRC) diagnosis. A high-coverage targeted metabolomics method with 1,303 metabolites in one injection was developed to profile colorectal cancer tissues from CRC patients. Twenty potential metabolite biomarkers were discovered and validated for CRC diagnosis. In plasma samples from CRC patients, 17 out of 20 potential biomarkers were further validated to be associated with tumor resection, which may have great
potential in assessing the prognosis of CRC patients after tumor resection. Together, the SWATHtoMRM strategy provides a new way to develop high-coverage targeted metabolomics methods and facilitates the application of targeted metabolomics in disease biomarker discovery. The SWATHtoMRM program is freely available on the Internet (http://www.zhulab.cn/software.php).

KEYWORDS Targeted metabolomics; multiple-reaction monitoring (MRM); high coverage; SWATH-MS; biomarker discovery; colorectal cancer (CRC)

INTRODUCTION

Metabolomics aims to quantitatively measure the dynamic changes of all metabolites at the systems level and provides a comprehensive characterization of disease phenotypes to facilitate biomarker discovery.1-3 Metabolites have enormous structural diversity and span a broad range of concentrations (at least nine orders of magnitude).4 Therefore, metabolomics requires powerful analytical techniques with broad coverage, high sensitivity and specificity, and wide dynamic range. Liquid chromatography-mass spectrometry (LC-MS) is one of the most popular technologies for metabolite profiling, including untargeted and targeted metabolomics.1,5 Untargeted metabolomics commonly employs high-resolution MS (such as time-of-flight (TOF)6 and Orbitrap7) to measure as many metabolites as possible, both known and unknown.5,8 Untargeted metabolomics offers broad metabolite coverage but is significantly limited in dynamic range, quantitative accuracy, and sensitivity.9 For example, the detector is readily saturated by highly abundant ions in an MS scan,10 which makes accurate quantitation of metabolites across a wide range of concentrations challenging. In comparison, LC-MS-based targeted metabolomics, which typically uses the multiple-reaction monitoring (MRM) technique, can precisely quantify a predefined set of known metabolites in biological samples.9,11-13 Using chemical standards, several targeted approaches have been developed to measure 100-300 metabolites in one analysis.14-19 Targeted metabolomics is considered the gold standard for metabolite quantitation owing to its high sensitivity, wide dynamic range, and good reproducibility.11 Nevertheless, low metabolite coverage is a major bottleneck for targeted metabolomics. Therefore, it is important to develop high-coverage targeted metabolomics methods that can analyze both the known and unknown metabolites discovered in untargeted metabolomics.
Recently, several targeted metabolomics approaches have been developed to quantify metabolites with broad coverage, in which a large-scale set of MRM transitions for targeted analysis is obtained from untargeted profiling. For example, the GOT-MS method developed by Gu et al. utilized
selected ion monitoring (SIM) incremental scanning and MS/MS incremental scanning on a QqQ instrument to profile metabolites and construct MRM transitions.20 Similarly, Chen et al. recently developed a sequentially stepped targeted MS/MS approach to obtain MRM transitions from a high-resolution Orbitrap instrument.21 Both approaches target unknown metabolites to expand the coverage of targeted metabolomics. However, both require 20-30 injections to acquire MS2 spectra from biological samples. Alternatively, pseudotargeted metabolomics was developed by picking MRM transitions from untargeted profiling using data-dependent MS2 acquisition (DDA).22,23 Only one LC-MS analysis is needed, which increases efficiency, but metabolite coverage is largely restricted by the biased selection of precursor ions in DDA.24,25 In DDA, it is well known that the selection of precursor ions for fragmentation heavily favors highly abundant ions, so low-abundance ions often have no MS2 spectra acquired.25 In addition, none of these methods evaluated the detectable rate, defined as the number of MRM transitions detected in targeted analysis divided by the total number of MRM transitions generated from untargeted profiling. Taken together, metabolite coverage, data acquisition efficiency, and detectable rate are the key aspects of developing a high-coverage targeted metabolomics method. Data-independent acquisition (DIA) has recently been adopted in untargeted metabolomics because of its high efficiency and broad coverage in acquiring MS2 spectra.26,27 For example, the SWATH-MS technique sequentially fragments all precursor ions within a series of predefined wide isolation windows (e.g., 25 Da).28 Therefore, MS2 spectra of all precursor ions are readily acquired in one analysis. Tsugawa et al.
and our group have recently demonstrated the application of the SWATH-MS technique in metabolomics, and programs such as MS-DIAL26 and MetDIA27 have been developed for metabolite identification. The quantitative performance of SWATH-MS for metabolomics (using either SWATH_MS1 or SWATH_MS2 ions) was systematically evaluated by Chen et al.,29 but the dynamic range and reproducibility were limited for the analysis of complex biological samples. Since SWATH-MS can acquire MS2 spectra of all precursor ions in one analysis, we explored the possibility of using the SWATH-MS technique to develop a high-coverage targeted metabolomics method, which we call SWATHtoMRM. Specifically, the SWATH-MS technique was first used to acquire MS2 spectra for all metabolites in biological samples. Then, the SWATHtoMRM program was developed to extract a large-scale set of MRM transitions for targeted analysis, with coverage as high as 1000-2000 metabolites in one experiment. We also demonstrated the advantages of the SWATHtoMRM method in quantitative analysis, such as reproducibility, sensitivity, and dynamic range, compared with quantitation using either SWATH_MS1 or SWATH_MS2 ions. Finally, to demonstrate its utility, we applied our SWATHtoMRM approach to discover potential metabolite biomarkers for colorectal cancer (CRC) diagnosis. CRC is one of the most
common and lethal cancers worldwide.30 Current CRC diagnostic approaches, such as preoperative endoscopy and radiological imaging, are either invasive or costly.31 Recent studies have revealed significant metabolic dysregulation associated with CRC,31-35 and the clinical diagnosis of CRC using metabolites as biomarkers is considered promising.31 Here, using the SWATHtoMRM approach, we first developed a high-coverage targeted metabolomics method with 1,303 metabolites in one injection to profile colorectal cancer tissues and adjacent normal tissues surgically excised from CRC patients. Then, 20 potential metabolite biomarkers were discovered and externally validated using an independent set of CRC tissues. Importantly, in plasma samples, 17 of the 20 potential biomarkers were further validated to be associated with tumor resection. These 17 potential plasma biomarkers are of great diagnostic potential for clinical application, and they may also be valuable in assessing the outcome of tumor resection surgery and in predicting CRC recurrence. Together, the SWATHtoMRM strategy facilitates the development of high-coverage targeted metabolomics methods and the use of targeted metabolomics for biomarker discovery.

EXPERIMENTAL SECTION

All SWATH-MS data were acquired using a UHPLC system (Agilent 1290, Agilent Technologies) coupled to a quadrupole time-of-flight mass spectrometer (TripleTOF 6600, Sciex). All MRM data were acquired using a UHPLC system (Agilent 1290, Agilent Technologies) coupled to a triple quadrupole mass spectrometer (Agilent 6495 QqQ, Agilent Technologies). Experimental details about chemicals, sample preparation, and data acquisition parameters are provided in the Supporting Information.

SWATHtoMRM workflow. SWATHtoMRM is an R package that constructs a large-scale set of MRM transitions from acquired SWATH-MS data files (Figure 1a); it is freely available on the Internet (http://www.zhulab.cn/software.php). It consists of two main parts: untargeted analysis of SWATH-MS data and generation of MRM transitions.

(1) Untargeted analysis of SWATH-MS data. The raw SWATH-MS data (.wiff files) were first converted to mzXML files using the “msconvert” program from ProteoWizard (version 3.0.6428). Multiple data files were grouped in one folder and processed by SWATHtoMRM. The analysis of SWATH-MS data comprises four steps: (1) MS1 peak detection and alignment; (2) extraction of MS2 peaks and chromatograms; (3) MS1 and MS2 peak grouping; (4) generation of a consensus MS2 spectrum. First, peak detection was performed on MS1 data using the CentWave algorithm36 in XCMS (version 1.46.0, https://bioconductor.org/packages/3.2/bioc/html/xcms.html). The m/z tolerance for peak detection was set to 15 ppm. The parameter “peakwidth” was set to (8, 30) in seconds. The parameter “sn”, the signal-to-noise ratio threshold, was set to 20. The parameter “minfrac”, referring to
the frequency of the metabolite detected in one sample group, was set to 1. The R package CAMERA37 was used for peak annotation, and isotope peaks and low-abundance peaks with intensities below 1,000 counts were removed. For multiple data files, peak alignment was performed using the ordered bijective interpolated warping (OBI-Warp) algorithm38 in XCMS. Second, for each detected MS1 peak, the multiplexed MS2 spectrum was extracted from the corresponding SWATH window at the apex of the MS1 peak (Figure 1b). We then purified the MS2 spectrum by removing ions with intensities below 200 counts, isotope ions, and ions with m/z larger than that of the precursor ion. The ion chromatograms of the remaining fragment ions were then extracted. Third, we grouped the extracted ion chromatograms (EICs) of each precursor ion and its corresponding MS2 ions. As in our previous MetDIA work,27 peak-peak correlation (PPC) scores were calculated between the precursor ion and each of its product ions. Based on the results in MetDIA, MS2 ions with PPC scores larger than 0.8 were regarded as true product ions and were selected to generate a pseudo MS2 spectrum (Figure 1b). Finally, for each MS1 peak, all pseudo MS2 spectra from the biological samples were combined to generate one consensus MS2 spectrum (Figure 1c). The consensus spectrum strategy was applied because it has been shown to improve spectral reproducibility.39 Specifically, fragment ions of a common precursor ion that appeared in more than 50 % of all samples were retained in the consensus spectrum. The final m/z and relative intensity of each fragment ion in the consensus spectrum were defined as the mean m/z and the mean normalized intensity across all samples. The consensus spectra were then output for the subsequent generation of MRM transitions. (2) Generation of MRM transitions.
Each product ion in the consensus spectrum of a specific metabolite was further evaluated to generate MRM transitions. Three criteria were used to select MRM transitions: (1) m/z(product ion) < m/z(precursor ion) − 14.0126 Da, which means that at least a “CH2” group (14.0156 Da ± 3 mDa) is lost from the precursor ion upon fragmentation; (2) product ions arising from neutral losses of H2O, NH3, or CO2 are eliminated (users can specify other neutral losses in the program); (3) the product ion with the highest intensity among the remaining product ions is chosen. Thus, one MRM transition is selected for each metabolite. Finally, a csv file containing all generated MRM transitions was output for targeted analysis. A dynamic MRM (dMRM) method was then constructed using Agilent MassHunter Method Editor to maximize the number of MRM transitions measured in each analysis, with a minimum dwell time of 5 ms and a cycle time of 990 ms as user-defined parameters.

Data processing and statistical analysis. For data processing, raw MRM data (.d files) and the MRM transition list were imported into Skyline40 (version 3.6.0.10493) for qualitative analyses. For each MRM transition, the integration range was manually checked to ensure accurate quantitation.

Univariate analysis (Wilcoxon test) and multivariate analysis (principal component analysis (PCA) and partial least-squares (PLS)) were performed using R (version 3.3.2).
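
The three transition-selection criteria described above can be illustrated with a short Python sketch. This is not the SWATHtoMRM R package: the names `Fragment` and `select_transition`, the 0.01 Da tolerance for matching a neutral loss, and rounding Q1/Q3 to 0.1 for the QqQ method are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

CH2_MASS = 14.0156  # monoisotopic mass of a CH2 group (Da)
CH2_TOL = 0.003     # 3 mDa tolerance, giving the 14.0126 Da threshold
# Neutral losses excluded by default (monoisotopic masses, Da); users may add more.
DEFAULT_LOSSES: Dict[str, float] = {"H2O": 18.0106, "NH3": 17.0265, "CO2": 43.9898}


@dataclass
class Fragment:
    mz: float
    intensity: float


def select_transition(
    precursor_mz: float,
    fragments: List[Fragment],
    losses: Dict[str, float] = DEFAULT_LOSSES,
    loss_tol: float = 0.01,  # assumed tolerance for matching a neutral loss (Da)
) -> Optional[Tuple[float, float]]:
    """Return a (Q1, Q3) pair rounded to 0.1 (assumption), or None if nothing qualifies."""
    max_product_mz = precursor_mz - (CH2_MASS - CH2_TOL)
    candidates = []
    for frag in fragments:
        # Criterion 1: the fragment must have lost at least a CH2 group.
        if frag.mz >= max_product_mz:
            continue
        # Criterion 2: skip fragments explained by a common neutral loss.
        loss = precursor_mz - frag.mz
        if any(abs(loss - mass) <= loss_tol for mass in losses.values()):
            continue
        candidates.append(frag)
    if not candidates:
        return None
    # Criterion 3: keep the most intense remaining fragment.
    best = max(candidates, key=lambda f: f.intensity)
    return round(precursor_mz, 1), round(best.mz, 1)
```

For taurine (precursor m/z 126.0212), calling `select_transition(126.0212, [Fragment(44.0496, 100.0), Fragment(108.0106, 8.0)])`, where the water-loss fragment and both intensities are hypothetical, returns `(126.0, 44.0)`, the Q1/Q3 pair reported in the Results.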

Figure 1. (a) The general workflow of SWATHtoMRM. (b) Extraction of a pseudo MS2 spectrum from SWATH-MS data. (c) Generation of consensus MS2 spectra across multiple samples for the selection of MRM transitions. Fragments present in more than 50 % of samples were kept.
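
The pseudo-MS2 and consensus-spectrum steps shown in Figure 1b and 1c can be sketched in Python as follows. This is a simplified illustration, not the SWATHtoMRM implementation. It assumes that the PPC score is a Pearson correlation between the precursor EIC and each fragment EIC sampled on the same scans, that fragment intensity is read at the precursor apex, that fragments are matched across samples within a hypothetical 0.01 Da window, and that the mean normalized intensity is averaged over the samples in which a fragment occurs. All function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length traces (0.0 if either is flat)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pseudo_ms2(precursor_eic: List[float], fragment_eics: Dict[float, List[float]],
               min_ppc: float = 0.8) -> Dict[float, float]:
    """Keep fragments whose EIC correlates with the precursor EIC above min_ppc."""
    apex = max(range(len(precursor_eic)), key=precursor_eic.__getitem__)
    return {mz: eic[apex] for mz, eic in fragment_eics.items()
            if pearson(precursor_eic, eic) > min_ppc}


def consensus_spectrum(spectra: List[Dict[float, float]], mz_tol: float = 0.01,
                       min_frac: float = 0.5) -> Dict[float, float]:
    """Merge per-sample pseudo spectra; keep fragments seen in > min_frac of samples."""
    groups = []  # each group: list of (m/z, normalized intensity, sample index)
    for idx, spec in enumerate(spectra):
        if not spec:
            continue
        base = max(spec.values())  # normalize to the base peak of each spectrum
        for mz, inten in spec.items():
            for group in groups:  # greedy matching to an existing fragment group
                if abs(fmean(m for m, _, _ in group) - mz) <= mz_tol:
                    group.append((mz, inten / base, idx))
                    break
            else:
                groups.append([(mz, inten / base, idx)])
    consensus = {}
    for group in groups:
        n_samples = len({i for _, _, i in group})
        if n_samples / len(spectra) > min_frac:
            mean_mz = round(fmean(m for m, _, _ in group), 4)
            consensus[mean_mz] = fmean(r for _, r, _ in group)
    return consensus
```

Greedy m/z grouping keeps the sketch short; a production tool would likely need more careful fragment matching across samples.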

RESULTS AND DISCUSSION

SWATHtoMRM workflow. We first demonstrated the feasibility of the SWATHtoMRM strategy (Figure 1) using a variety of biological samples, including human urine, human colorectal tissue, and Jurkat cells. Taking the human urine sample as an example, a total of 3,554 MS1 features were detected in the SWATH-MS data set, of which 950 were removed as isotopes or low-abundance features. For the remaining 2,604 metabolite peaks, suitable MRM transitions were generated for 2,091 metabolites (80.3 %) using our SWATHtoMRM workflow. No MRM transition was generated for the other metabolite peaks because their consensus MS2 spectra contained no MS2 ions with PPC scores larger than 0.8. Manual analysis of all 2,091 MRM transitions confirmed that 1,614 metabolites (77.2 %) were successfully detected in human urine samples (Figure 2a). Figure 2b shows an example of constructing an MRM transition using SWATHtoMRM. A peak detected in the SWATH-MS data (m/z = 126.0212 Da, RT = 518 seconds) was identified as taurine through spectral matching against our in-house spectral library (Figure S1 in Supporting Information). First, the raw multiplexed MS2 spectra containing multiple MS2 ions were
extracted. Ion chromatograms of all MS2 ions were then extracted, and MS2 ions with PPC scores larger than 0.8 were kept. A consensus MS2 spectrum containing 13 MS2 ions was then generated from the human urine samples (n = 6). The consensus spectrum matches well with the MS2 spectrum of the chemical standard, which supports the validity of our workflow. Finally, the most intense MS2 ion (m/z = 44.0496 Da) was selected to construct the MRM transition (Q1/Q3, 126.0/44.0) for taurine. Similarly, we demonstrated the SWATHtoMRM approach using other biological samples (Figure 2c and Supporting Data Files 01-03), in which more than 75 % of the metabolite MRM transitions were successfully detected in targeted analysis. Taken together, these results show that SWATHtoMRM can generate reliable MRM transitions from biological samples on a large scale. In this work, all data were acquired in positive mode to demonstrate the SWATHtoMRM workflow; the approach is also applicable to negative-mode data.
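
The percentages reported for the urine data follow directly from the counts above and from the detectable-rate definition given in the Introduction (detected MRM transitions divided by generated MRM transitions). A minimal Python check, with the helper name `percent` chosen for illustration:

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Counts for the human urine SWATH-MS data set, taken from the text.
ms1_features = 3554
removed_features = 950          # isotopes and low-abundance features
metabolite_peaks = ms1_features - removed_features
transitions_generated = 2091
transitions_detected = 1614     # confirmed by manual analysis

transition_yield = percent(transitions_generated, metabolite_peaks)
detectable_rate = percent(transitions_detected, transitions_generated)
print(metabolite_peaks, transition_yield, detectable_rate)  # 2604 80.3 77.2
```

The same helper reproduces the tissue figures reported later, for example `percent(1303, 1705)` gives 76.4.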

Figure 2. (a) Generation of large-scale MRM transitions using a human urine sample. (b) Generation of the MRM transition for taurine (m/z = 126.0212 Da, RT = 518 seconds) from SWATH-MS data. The peak intensities of MS2 ions other than m/z 44.0496 were magnified 10-fold in the second panel. (c) The number of detected MRM transitions for different biological samples: human urine, human colorectal tissue, and Jurkat cells.

Broad coverage of SWATHtoMRM. To demonstrate the broad coverage of our strategy, we compared it with the DDA-MS technique (Figure 3). Using the same human urine samples, we sequentially acquired SWATH-MS and DDA-MS data with the same acquisition parameters (see Supporting Information). A modified SWATHtoMRM workflow was used to simultaneously process the SWATH-MS and DDA-MS data sets (see Supporting Information). For the same urine sample, 2,604 and 2,149 features were detected in the SWATH-MS and DDA-MS data sets, respectively (Figure 3a and 3b). We then compared the MS2 coverage of the two data sets. As shown in Figure 3b, the MS2 coverage of SWATH-MS (2,105 features, 80.8 %) is substantially higher than that of DDA-MS (1,174 features, 54.6 %). We further compared the MS2 coverage for the 1,539 MS1 features shared by both techniques (Figure 3a). Again, the MS2 coverage of SWATH-MS (84.9 %) is much higher than that of DDA-MS (61.1 %).

Figure 3. Broad coverage of MRM transitions generated from SWATH-MS data. (a) The Venn diagram of shared and unique MS1 features in the SWATH-MS and DDA-MS data sets. (b) Distribution of MS1 features with MS2 spectra using SWATH and DDA techniques. Red/blue: features with MS2 spectrum; grey: features without MS2 spectrum. (c) MRM transitions constructed from SWATH-MS and DDA-MS data sets. (d) The comparison of detected MRM transitions of metabolites generated from SWATH-MS and DDA-MS data sets in targeted analyses.

Using the detected MS2 spectra, we generated 2,091 and 1,163 MRM transitions from the SWATH-MS and DDA-MS data sets, respectively, to construct the targeted assays (Figure 3c). A total of 852 metabolites were shared by both data sets. For these 852 shared metabolites, the detectable rates were 87.2 % and 88.7 % using the MRM transitions generated from the SWATH-MS and DDA-MS data sets, respectively (Figure 3d). This shows that SWATHtoMRM generates reliable MRM transitions from SWATH-MS data, comparable to those from the conventional DDA-MS technique. However, across all analyzed metabolites, 66 % more metabolites were detected using transitions generated from SWATH-MS data than from DDA-MS data. In summary, our SWATHtoMRM approach constructs MRM transitions from SWATH-MS data with significantly broader coverage than those from DDA-MS data, while maintaining similar quality.

Quantitative performance of SWATHtoMRM. We further evaluated the quantitative performance of our SWATHtoMRM approach, comparing its sensitivity, dynamic range, and reproducibility with those of quantitation based on SWATH_MS1 and SWATH_MS2 ions from the SWATH-MS data set. We serially diluted the human urine samples up to 2^10-fold (1,024-fold) and analyzed these samples using both SWATH-MS and SWATHtoMRM. We then randomly selected 629 detected metabolites for the subsequent comparison (see Supporting Information). We first compared sensitivity based on the number of metabolites detected at each dilution level using SWATHtoMRM, SWATH_MS1, and SWATH_MS2 (Figure 4a). The SWATHtoMRM approach showed much better sensitivity than SWATH_MS1 and SWATH_MS2, owing to the higher sensitivity of the QqQ instrument compared with the TOF instrument. The sensitivities of SWATH_MS1 and SWATH_MS2 were comparable.
We further compared the sensitivity of the three approaches for metabolites of different abundances; SWATHtoMRM performed best across the entire concentration range, especially for low-abundance metabolites (Figure 4b). To compare dynamic range, we calculated the R² values for the quantification of the 629 metabolites across the entire 2^10-fold dilution series of urine samples. The dynamic range of the MRM method is much broader than those of SWATH_MS1 and SWATH_MS2: R² values of 0.95 ± 0.15, 0.87 ± 0.17, and 0.80 ± 0.2 (median ± SD) were obtained using SWATHtoMRM, SWATH_MS1, and SWATH_MS2, respectively. Almost 80 % of metabolites had R² values larger than 0.8 with SWATHtoMRM, compared with less than 60 % with either SWATH_MS1 or SWATH_MS2 (Figure 4c). The stepped shape of the curves results from metabolites that were not detected at low concentrations. Interestingly, SWATH_MS1 data yielded better quantitative performance than SWATH_MS2 data. Taking 2-aminoadipic acid and tyrosine as examples, the SWATHtoMRM approach demonstrated a wider dynamic range than SWATH_MS1 and SWATH_MS2 (Figure 4d). For SWATH_MS1 and
SWATH_MS2, metabolites that were not detected at low concentrations (i.e., high dilution factors) led to narrower dynamic ranges.
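
The text does not specify how the R² values were computed. One plausible approach, sketched below under stated assumptions, fits log2 intensity against log2 relative concentration for each metabolite using only the dilution levels at which it was detected. The 4-fold dilution levels up to 2^10, the minimum of three detected points, and the function names `r_squared` and `linearity` are assumptions, not the authors' procedure.

```python
import math
from typing import List, Optional

# Assumed 4-fold serial dilution levels, up to 2**10 (1,024-fold).
DILUTIONS = [1, 4, 16, 64, 256, 1024]


def r_squared(x: List[float], y: List[float]) -> float:
    """Coefficient of determination of an ordinary least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate input: no linear trend can be assessed
    return sxy ** 2 / (sxx * syy)


def linearity(intensities: List[Optional[float]], dilutions: List[int] = DILUTIONS,
              min_points: int = 3) -> Optional[float]:
    """R² of log2(intensity) vs log2(relative concentration); None = too few detections."""
    points = [(math.log2(1 / d), math.log2(i))
              for d, i in zip(dilutions, intensities) if i is not None and i > 0]
    if len(points) < min_points:
        return None
    xs, ys = zip(*points)
    return r_squared(list(xs), list(ys))
```

A log-log fit is used here because the dilution series spans three orders of magnitude; on a linear scale the least-diluted samples would dominate the fit.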

Figure 4. The quantitative performance of SWATHtoMRM. (a) Comparison of the number of detected metabolites at each dilution level among SWATHtoMRM, SWATH_MS1, and SWATH_MS2. (b) Number of detected metabolites with different abundances. (c) Cumulative percentages of R² values for quantification using SWATHtoMRM, SWATH_MS1, and SWATH_MS2 across the entire 2^10-fold dilution series of urine samples. (d) The dynamic ranges of two metabolites: 2-aminoadipic acid and tyrosine. (e) The distribution of measured intensity ratios of 629 metabolites between two adjacent diluted urine samples.
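
Because adjacent samples in the 4-fold serial dilution differ by a factor of 4, the expected intensity ratio for every metabolite is 4, and a narrower ratio distribution (Figure 4e) means better reproducibility. The sketch below summarizes that spread in Python; the median absolute log2 deviation and the 1.5-fold tolerance are illustrative choices, not the statistics used in the paper, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional


def adjacent_ratios(conc: List[Optional[float]], dil: List[Optional[float]]) -> List[float]:
    """Intensity ratios between a less-diluted sample and the next 4x dilution.

    Metabolites missing (None or 0) in either sample are skipped.
    """
    return [a / b for a, b in zip(conc, dil) if a and b]


def median_log2_deviation(ratios: List[float], expected: float = 4.0) -> float:
    """Median |log2(ratio) - log2(expected)|; 0 means perfect reproducibility."""
    return median(abs(math.log2(r) - math.log2(expected)) for r in ratios)


def fraction_within(ratios: List[float], expected: float = 4.0, fold_tol: float = 1.5) -> float:
    """Fraction of ratios within a factor fold_tol of the expected ratio."""
    ok = sum(1 for r in ratios if expected / fold_tol <= r <= expected * fold_tol)
    return ok / len(ratios)
```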

We further investigated the quantitative reproducibility of the three approaches. The measured intensity ratios of the 629 metabolites between adjacent diluted urine samples (4X vs. 16X and 16X vs. 64X in the serial dilution) were calculated. As shown in Figure 4e, the measured intensity ratios obtained with either SWATH_MS1 or SWATH_MS2 displayed broader distributions than those obtained with the SWATHtoMRM method. Therefore, the SWATHtoMRM method performed better in quantitative reproducibility, especially at lower metabolite concentrations. Collectively, our SWATHtoMRM approach demonstrated high sensitivity, broad dynamic range, and high reproducibility, and is suitable for quantitative metabolomic profiling.

Application of SWATHtoMRM to biomarker discovery. To demonstrate its utility, we applied our SWATHtoMRM approach to discover potential metabolite biomarkers for colorectal cancer diagnosis. We collected paired cancer and adjacent healthy tissues surgically excised from CRC patients (n = 18 pairs for the training set; n = 42 pairs for the validation set; Table S1 in Supporting Information). Pooled QC samples from the training set were used to acquire SWATH-MS data and construct the list of MRM transitions. A total of 1,705 MRM transitions were obtained from the SWATH-MS data and verified by MRM analysis. After manual analysis, 1,303 MRM transitions (76.4 %) were confirmed to be detected in targeted analysis. A dynamic MRM method with all 1,303 MRM transitions in one injection was then developed for the targeted analysis of the 36 tissue samples in the training set. Among these metabolites, 1,213 MRM transitions (93.1 %) were detected in 4 or more tissue samples and kept for subsequent statistical analysis. For comparison, we also acquired SWATH-MS data for the training set.
To evaluate the quantitative performance of SWATHtoMRM in analyzing clinical samples, we first calculated the relative standard deviations (RSDs) of metabolites detected in pooled QC samples acquired using SWATH_MS1 and SWATHtoMRM. SWATHtoMRM demonstrated higher reproducibility and sensitivity than SWATH_MS1 (Figure 5a and Figure S2a in Supporting Information). The higher RSDs of the SWATH_MS1 data likely arise from factors such as missing values and incorrect peak grouping during data processing. We further performed PCA on these two data sets (Figure 5b). SWATHtoMRM distinguished the CRC and control tissues better than the SWATH_MS1 approach. For intra-group comparison, the sum of distances between each sample and the center of its group was calculated to characterize the clustering tightness of each group and the classification reproducibility. In the SWATHtoMRM data set, the clustering tightness of both the CRC and healthy control groups was about 25 % higher than in the SWATH-MS data set (Figure S2b in Supporting Information). The inter-group distance was also larger using SWATHtoMRM than using SWATH_MS1 (28.7 versus 25.8, Figure S2c in Supporting Information). These results demonstrate that SWATHtoMRM reduced intra-group data variability and improved
inter-group discrimination owing to its good quantitative performance. Given these advantages in metabolite quantification, we used the SWATHtoMRM strategy for potential biomarker discovery for colorectal cancer diagnosis (Figure 5c). In the biomarker discovery phase, 1,303 metabolites were analyzed in a targeted manner in the training set, and 358 of them were significantly changed (fold change ≥ 1.5 and p-value < 0.01) (Figure 5d). Of the 358 changed metabolites, 67 were identified by matching their consensus spectra against our in-house spectral library. Meanwhile, PLS-DA was applied to characterize the metabolic differences between cancer and healthy tissues (Figure S3a in Supporting Information). The top 20 metabolites ranked by variable importance in projection (VIP) values were considered potential biomarkers (Figure 5d, Figure S3b, Figure S4, and Table S2 in Supporting Information). The PLS-based prediction model using the 20 selected metabolites showed excellent classification performance, with an AUC value of 1 (Figure S5 in Supporting Information). In the external validation set of tissues, the MRM transitions of the 20 potential biomarkers were analyzed in a targeted manner, and all 20 metabolites were detected. We then evaluated the prediction accuracy of the 20 potential metabolite biomarkers in the validation set. Similarly, the PLS-based prediction model using the 20 metabolites demonstrated excellent classification performance (Figure 5e and Figure S6 in Supporting Information), with an AUC value of 0.998 (95 % confidence interval (CI), 0.993-1.000), a sensitivity of 100 %, and a specificity of 97.6 % at the optimal cut-off value of 0.392.
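
The discovery-phase filter (fold change ≥ 1.5 and p-value < 0.01) can be sketched in Python as below. Assumptions: fold change is the ratio of mean cancer to mean normal intensity, decreases are counted when the fold change is ≤ 1/1.5, and p-values (for example from the Wilcoxon test run in R) are supplied precomputed. The PLS-DA VIP ranking is not reproduced, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def fold_change(cancer: Sequence[float], normal: Sequence[float]) -> float:
    """Ratio of mean intensity in cancer tissue to mean intensity in normal tissue."""
    return fmean(cancer) / fmean(normal)


def changed_metabolites(
    data: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    p_values: Dict[str, float],
    fc_cut: float = 1.5,
    p_cut: float = 0.01,
) -> List[str]:
    """Names of metabolites passing both the fold-change and p-value filters."""
    hits = []
    for name, (cancer, normal) in data.items():
        fc = fold_change(cancer, normal)
        # Count both increases (>= 1.5) and decreases (<= 1/1.5) as changes (assumption).
        if (fc >= fc_cut or fc <= 1 / fc_cut) and p_values[name] < p_cut:
            hits.append(name)
    return hits
```
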
Finally, we investigated the diagnostic potential of these metabolite biomarkers using plasma samples collected from 34 CRC patients before (with tumor) and after (without tumor) tumor resection surgery (Table S1 in Supporting Information). In the targeted analysis, 17 of the 20 metabolites were detected in plasma samples (Table S2 in Supporting Information). As illustrated in Figure 5f and Figure S7 in Supporting Information, a PLS-based prediction model using the panel of 17 potential biomarkers distinguished pre- from post-resection plasma samples with an AUC value of 0.779 (95% CI, 0.657-0.900), a sensitivity of 91.2%, and a specificity of 64.7% at a cut-off value of 0.650. These metabolites, validated in plasma samples, may have great potential for assessing the prognosis of CRC patients after tumor resection. Postoperative follow-up of these patients is in progress. In the future, we will correlate these metabolites with clinical data (i.e., CRC recurrence or complete remission) and validate whether they can be used to predict or differentiate postoperative status. Collectively, these results demonstrate the utility of the SWATHtoMRM method for profiling metabolites in clinical samples for biomarker discovery.
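The AUC, sensitivity and specificity values reported above can be computed from a model's predicted probabilities roughly as below. This sketch assumes the rank-based (Mann-Whitney) definition of AUC and that a score at or above the cut-off is called positive; the function names and example scores are hypothetical. As a consistency check, if each of the 34 patients contributed one pre- and one post-operative sample (an assumption), a sensitivity of 91.2% and a specificity of 64.7% correspond to 31/34 and 22/34 correctly classified samples.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney form); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when a score >= cutoff is called positive."""
    tp = sum(1 for s in pos_scores if s >= cutoff)
    tn = sum(1 for s in neg_scores if s < cutoff)
    return tp / len(pos_scores), tn / len(neg_scores)


# Hypothetical predicted probabilities for positive and negative samples
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(roc_auc(pos, neg))         # 8/9, about 0.889
print(sens_spec(pos, neg, 0.5))  # (2/3, 2/3)
```

The pairwise loop is O(n·m), which is adequate at these cohort sizes; a sort-based rank computation gives the same AUC for larger sets.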

Figure 5. (a) The distribution of relative standard deviations (RSDs) for 1,213 metabolites measured in the pooled QC samples of the training set. (b) PCA score plots of metabolic profiles from CRC tissues and adjacent healthy tissues obtained using SWATH-MS1 and SWATHtoMRM, respectively. (c) The integrative workflow for potential biomarker discovery for CRC diagnosis using the SWATHtoMRM strategy. (d) Volcano plot of 1,213 metabolites measured in the training set. Red dots represent the 20 potential biomarkers. (e) Left panel: receiver operating characteristic (ROC) curve for the PLS-based prediction model using 20 potential biomarkers. Right panel: distribution of predicted probabilities. The cut-off value is 0.392. (f) Left panel: ROC curve for the PLS-based prediction model using 17 potential biomarkers in plasma samples. Right panel: distribution of predicted probabilities. The cut-off value is 0.650.
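Two quantities plotted in Figure 5, the QC-sample RSD in panel (a) and the fold-change/p-value thresholds behind the volcano plot in panel (d), reduce to a few lines of code. The sketch below assumes fold change is the ratio of group means, that a change of at least 1.5-fold in either direction counts, and that p-values come from a separately chosen test, since the statistical test is not described here; all names and data are hypothetical.

```python
import math
import statistics


def rsd_percent(qc_intensities):
    """Relative standard deviation (%) of one metabolite across QC injections."""
    return 100.0 * statistics.stdev(qc_intensities) / statistics.mean(qc_intensities)


def volcano_hits(case, control, p_values, fc_cut=1.5, p_cut=0.01):
    """Indices of metabolites passing both volcano-plot thresholds.

    case, control: per-metabolite lists of intensities in each group
    p_values: per-metabolite p-values from a separately chosen test
    A change of >= fc_cut in either direction passes (assumed convention).
    """
    hits = []
    for i, (a, b, p) in enumerate(zip(case, control, p_values)):
        fold_change = statistics.mean(a) / statistics.mean(b)
        if p < p_cut and abs(math.log2(fold_change)) >= math.log2(fc_cut):
            hits.append(i)
    return hits


# Hypothetical data: 4 metabolites, 2 samples per group
case = [[30, 30], [11, 11], [5, 5], [4, 4]]
control = [[10, 10], [10, 10], [10, 10], [10, 10]]
p_values = [0.001, 0.001, 0.05, 0.005]
print(rsd_percent([100, 110, 90]))            # 10.0
print(volcano_hits(case, control, p_values))  # [0, 3]
```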

CONCLUSIONS
In this study, we demonstrated a new strategy, namely SWATHtoMRM, for developing targeted metabolomics methods that cover 1,000-2,000 metabolites in a single experiment. The coverage of metabolite MRM transitions generated by SWATHtoMRM is much broader than that obtained with the DDA-MS method. Importantly, SWATHtoMRM can generate MRM transitions for both known and unknown metabolites from a SWATH-MS data set, and the MRM transitions generated from SWATH-MS data are of similar quality to those generated from DDA-MS data. We also demonstrated that the SWATHtoMRM method has better reproducibility, higher sensitivity, and a wider dynamic range than methods using SWATH-MS1 or SWATH-MS2. As a proof-of-concept application, SWATHtoMRM was used to discover potential biomarkers for colorectal cancer diagnosis. A high-coverage targeted metabolomics method covering 1,303 metabolites in one injection was developed to profile colorectal tissues. As a result, 20 potential metabolite biomarkers were discovered, and 17 of them were further validated in plasma samples as being related to tumor resection surgery. These results suggest that the 17 metabolites have great potential for the postoperative monitoring of CRC patients. We therefore believe that SWATHtoMRM provides a new approach for developing large-scale targeted metabolomics methods and should find broad application in metabolomics, especially in the discovery of potential metabolite biomarkers for clinical use.

ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge via the Internet at http://pubs.acs.org: Tables S1-S2, Figures S1-S7, and a supplemental experimental section. Supporting data files (.csv) contain the SWATHtoMRM result outputs for human urine, colorectal tissue, and Jurkat cell samples.

AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]; phone: 86-21-68582296
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
This work was financially supported by the National Natural Science Foundation of China (Grant No. 21575151). Z.-J. Z. is supported by the Thousand Youth Talents Program of the Government of China.

REFERENCES
(1) Patti, G. J.; Yanes, O.; Siuzdak, G. Nat. Rev. Mol. Cell Biol. 2012, 13, 263-269.
(2) Nicholson, J. K.; Holmes, E.; Kinross, J. M.; Darzi, A. W.; Takats, Z.; Lindon, J. C. Nature 2012, 491, 384-392.
(3) Wishart, D. S. Nat. Rev. Drug Discov. 2016, 15, 473-484.
(4) Want, E. J.; Wilson, I. D.; Gika, H.; Theodoridis, G.; Plumb, R. S.; Shockcor, J.; Holmes, E.; Nicholson, J. K. Nat. Protoc. 2010, 5, 1005-1018.
(5) Cajka, T.; Fiehn, O. Anal. Chem. 2016, 88, 524-545.
(6) Zhu, Z. J.; Schultz, A. W.; Wang, J.; Johnson, C. H.; Yannone, S. M.; Patti, G. J.; Siuzdak, G. Nat. Protoc. 2013, 8, 451-460.
(7) Lu, W.; Clasquin, M. F.; Melamud, E.; Amador-Noguez, D.; Caudy, A. A.; Rabinowitz, J. D. Anal. Chem. 2010, 82, 3212-3221.
(8) Vinayavekhin, N.; Saghatelian, A. Curr. Protoc. Mol. Biol. 2010, Chapter 30, Unit 30.1.1-24.
(9) Zhou, J.; Yin, Y. Analyst 2016, 141, 6362-6373.
(10) Lacorte, S.; Fernandez-Alba, A. R. Mass Spectrom. Rev. 2006, 25, 866-880.
(11) Griffiths, W. J.; Koal, T.; Wang, Y.; Kohl, M.; Enot, D. P.; Deigner, H. P. Angew. Chem., Int. Ed. 2010, 49, 5426-5445.
(12) Wong, J. W.; Abuhusain, H. J.; McDonald, K. L.; Don, A. S. Anal. Chem. 2012, 84, 470-474.
(13) Tsugawa, H.; Arita, M.; Kanazawa, M.; Ogiwara, A.; Bamba, T.; Fukusaki, E. Anal. Chem. 2013, 85, 5191-5199.
(14) Bajad, S. U.; Lu, W.; Kimball, E. H.; Yuan, J.; Peterson, C.; Rabinowitz, J. D. J. Chromatogr. A 2006, 1125, 76-88.
(15) Wei, R.; Li, G.; Seymour, A. B. Anal. Chem. 2010, 82, 5527-5533.
(16) Bennette, N. B.; Eng, J. F.; Dismukes, G. C. Anal. Chem. 2011, 83, 3808-3816.
(17) Yuan, M.; Breitkopf, S. B.; Yang, X.; Asara, J. M. Nat. Protoc. 2012, 7, 872-881.
(18) Locasale, J. W.; Melman, T.; Song, S.; Yang, X.; Swanson, K. D.; Cantley, L. C.; Wong, E. T.; Asara, J. M. Mol. Cell. Proteomics 2012, 11, M111.014688.
(19) Cai, Y.; Weng, K.; Guo, Y.; Peng, J.; Zhu, Z.-J. Metabolomics 2015, 11, 1575-1586.
(20) Gu, H.; Zhang, P.; Zhu, J.; Raftery, D. Anal. Chem. 2015, 87, 12355-12362.
(21) Chen, Y.; Zhou, Z.; Yang, W.; Bi, N.; Xu, J.; He, J.; Zhang, R.; Wang, L.; Abliz, Z. Anal. Chem. 2017, 89, 6954-6962.
(22) Chen, S.; Kong, H.; Lu, X.; Li, Y.; Yin, P.; Zeng, Z.; Xu, G. Anal. Chem. 2013, 85, 8326-8333.
(23) Luo, P.; Dai, W.; Yin, P.; Zeng, Z.; Kong, H.; Zhou, L.; Wang, X.; Chen, S.; Lu, X.; Xu, G. Anal. Chem. 2015, 87, 5050-5055.
(24) Zhu, X.; Chen, Y.; Subramanian, R. Anal. Chem. 2014, 86, 1202-1209.
(25) Roemmelt, A. T.; Steuer, A. E.; Poetzsch, M.; Kraemer, T. Anal. Chem. 2014, 86, 11742-11749.
(26) Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; VanderGheynst, J.; Fiehn, O.; Arita, M. Nat. Methods 2015, 12, 523-526.
(27) Li, H.; Cai, Y.; Guo, Y.; Chen, F.; Zhu, Z. J. Anal. Chem. 2016, 88, 8757-8764.
(28) Gillet, L. C.; Navarro, P.; Tate, S.; Rost, H.; Selevsek, N.; Reiter, L.; Bonner, R.; Aebersold, R. Mol. Cell. Proteomics 2012, 11, O111.016717.
(29) Chen, G.; Walmsley, S.; Cheung, G. C. M.; Chen, L.; Cheng, C. Y.; Beuerman, R. W.; Wong, T. Y.; Zhou, L.; Choi, H. Anal. Chem. 2017, 89, 4897-4906.

(30) Siegel, R. L.; Miller, K. D.; Jemal, A. CA Cancer J. Clin. 2016, 66, 7-30.
(31) Wang, H.; Tso, V. K.; Slupsky, C. M.; Fedorak, R. N. Future Oncol. 2010, 6, 1395-1406.
(32) Zhang, F.; Zhang, Y.; Zhao, W.; Deng, K.; Wang, Z.; Yang, C.; Ma, L.; Openkova, M. S.; Hou, Y.; Li, K. Oncotarget 2017, 8, 35460-35472.
(33) Zhu, J.; Djukovic, D.; Deng, L.; Gu, H.; Himmati, F.; Chiorean, E. G.; Raftery, D. J. Proteome Res. 2014, 13, 4120-4130.
(34) Manna, S. K.; Tanaka, N.; Krausz, K. W.; Haznadar, M.; Xue, X.; Matsubara, T.; Bowman, E. D.; Fearon, E. R.; Harris, C. C.; Shah, Y. M.; Gonzalez, F. J. Gastroenterology 2014, 146, 1313-1324.
(35) Long, Y.; Sanchez-Espiridion, B.; Lin, M.; White, L.; Mishra, L.; Raju, G. S.; Kopetz, S.; Eng, C.; Hildebrandt, M. A. T.; Chang, D. W.; Ye, Y.; Liang, D.; Wu, X. Cancer 2017, 123, 4066-4074.
(36) Tautenhahn, R.; Bottcher, C.; Neumann, S. BMC Bioinformatics 2008, 9, 504.
(37) Kuhl, C.; Tautenhahn, R.; Bottcher, C.; Larson, T. R.; Neumann, S. Anal. Chem. 2012, 84, 283-289.
(38) Prince, J. T.; Marcotte, E. M. Anal. Chem. 2006, 78, 6140-6152.
(39) Yang, X.; Neta, P.; Stein, S. E. Anal. Chem. 2014, 86, 6393-6400.
(40) Egertson, J. D.; MacLean, B.; Johnson, R.; Xuan, Y.; MacCoss, M. J. Nat. Protoc. 2015, 10, 887-903.

Figure 1. (a) The general workflow of SWATHtoMRM. (b) Extraction of a pseudo-MS2 spectrum from SWATH-MS data. (c) Generation of consensus MS2 spectra across multiple samples for the selection of MRM transitions. Fragments with a frequency above 50% were retained.
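The >50% frequency rule in panel (c) can be illustrated with a small merging routine. This is a sketch under stated assumptions, not the published implementation: fragments from different samples are grouped greedily when their m/z values fall within a fixed tolerance (0.01 Da here, a hypothetical value), and a group is kept when it appears in more than half of the spectra. The function name and spectra are hypothetical.

```python
def consensus_spectrum(spectra, tol=0.01, min_freq=0.5):
    """Merge one feature's pseudo-MS2 spectra from several samples.

    spectra: list of spectra, each a list of (mz, intensity) pairs
    tol: m/z tolerance (Da) for grouping fragments; hypothetical default
    Returns (mean m/z, mean intensity) for fragment groups seen in more
    than min_freq of the spectra, sorted by m/z.
    """
    groups = []  # each group: list of (spectrum_index, mz, intensity)
    for idx, spectrum in enumerate(spectra):
        for mz, intensity in spectrum:
            for group in groups:
                centre = sum(m for _, m, _ in group) / len(group)
                if abs(mz - centre) <= tol:
                    group.append((idx, mz, intensity))
                    break
            else:  # no existing group within tolerance: start a new one
                groups.append([(idx, mz, intensity)])
    consensus = []
    for group in groups:
        # Frequency = fraction of spectra that contain this fragment
        freq = len({idx for idx, _, _ in group}) / len(spectra)
        if freq > min_freq:
            mean_mz = sum(m for _, m, _ in group) / len(group)
            mean_int = sum(i for _, _, i in group) / len(group)
            consensus.append((round(mean_mz, 4), mean_int))
    return sorted(consensus)


# Three hypothetical spectra of the same feature
spectra = [
    [(44.0496, 1000.0), (80.000, 50.0)],
    [(44.0500, 900.0), (108.000, 30.0)],
    [(44.0492, 1100.0), (80.002, 60.0)],
]
print(consensus_spectrum(spectra))
```

Greedy assignment depends on input order; clustering on sorted m/z values would be more robust, but the frequency filter itself works the same way.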

Figure 2. (a) Generation of large-scale MRM transitions using a human urine sample. (b) Generation of the MRM transition for the metabolite taurine (m/z 126.0212, RT 518 s) from SWATH-MS data. The peak intensities of all MS2 ions except m/z 44.0496 were magnified 10-fold in the second panel. (c) The number of detected MRM transitions for different biological samples: human urine, human colorectal tissue, and Jurkat cells.
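Turning a consensus spectrum into an MRM transition, as in panel (b), amounts to pairing the precursor m/z with one or more product ions. The sketch below assumes the most intense fragments are chosen, that fragments near or above the precursor m/z are skipped, and that Q1/Q3 values are rounded to 0.1 for a triple-quadrupole method. These rules, the names `Transition` and `select_transitions`, and every intensity in the example are hypothetical; only the taurine precursor m/z, the m/z 44.0496 fragment and the 518 s retention time come from Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    q1: float    # precursor m/z
    q3: float    # product-ion m/z
    rt_s: float  # retention time (s)


def select_transitions(name, precursor_mz, rt_s, fragments, n=1, tol=0.5):
    """Pair the precursor with its n most intense consensus fragments.

    Fragments within tol of, or above, the precursor m/z are skipped
    (assumed rule: residual precursor makes a poor product ion).
    """
    usable = [(mz, inten) for mz, inten in fragments if mz < precursor_mz - tol]
    usable.sort(key=lambda frag: frag[1], reverse=True)
    return [
        Transition(name, round(precursor_mz, 1), round(mz, 1), rt_s)
        for mz, _ in usable[:n]
    ]


# Taurine values from Figure 2; the intensities and the m/z 108.0 fragment
# are hypothetical placeholders
fragments = [(44.0496, 1000.0), (108.0, 80.0), (126.0212, 300.0)]
print(select_transitions("taurine", 126.0212, 518, fragments))
# [Transition(name='taurine', q1=126.0, q3=44.0, rt_s=518)]
```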

Figure 4. The quantitative performance of SWATHtoMRM. (a) Comparison of the number of metabolites detected at each dilution level among SWATHtoMRM, SWATH-MS1, and SWATH-MS2. (b) Number of detected metabolites with different abundances. (c) Cumulative percentages of R² values for quantification using SWATHtoMRM, SWATH-MS1, and SWATH-MS2 across the entire 210-fold dilution series of urine samples. (d) The dynamic ranges of two metabolites: 2-aminoadipic acid and tyrosine. (e) The distribution of measured intensity ratios of 629 metabolites between two adjacent diluted urine samples.
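The linearity metric in panel (c) and the adjacent-level ratios in panel (e) can be computed as below. This sketch assumes R² is that of an ordinary least-squares line fitted to log2 intensity versus log2 relative concentration, and assumes a two-fold serial dilution, under which an ideal adjacent ratio is 2; neither choice is stated in the caption, and the function names and data are hypothetical.

```python
import math


def r_squared(x, y):
    """R^2 of an ordinary least-squares line y ~ x (equals Pearson r^2).

    Undefined (division by zero) if x or y is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def adjacent_ratios(intensities):
    """Ratio of each dilution level to the next, more diluted, level."""
    return [hi / lo for hi, lo in zip(intensities, intensities[1:])]


# Hypothetical two-fold dilution series for one metabolite
rel_conc = [1, 1 / 2, 1 / 4, 1 / 8]
intensities = [8000.0, 4100.0, 1950.0, 1000.0]
x = [math.log2(c) for c in rel_conc]
y = [math.log2(v) for v in intensities]
print(round(r_squared(x, y), 4))
print(adjacent_ratios(intensities))
```

Working on a log scale keeps the high-concentration points from dominating the fit across a wide dilution range.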
