Article

Metabolite Identification for Mass Spectrometry-Based Metabolomics Using Multiple Types of Correlated Ion Information
Wen-Lian Hsu
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/ac503325c • Publication Date (Web): 28 Dec 2014
Downloaded from http://pubs.acs.org on January 1, 2015

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Analytical Chemistry

Journal: Analytical Chemistry
Manuscript ID: ac-2014-03325c.R1
Manuscript Type: Article
Date Submitted by the Author: 01-Dec-2014
Complete List of Authors:
Lynn, Ke-Shiuan; Institute of Information Science, Academia Sinica
Cheng, Mei-ling; Chang Gung University, Department of Biomedical Sciences
Sung, Ting-Yi; Institute of Information Science, Academia Sinica
Chen, Yet-Ran; Agricultural Biotechnology Research Center, Academia Sinica
Hsu, Chin; Department of Exercise Health Science, National Taiwan University of Physical Education and Sport
Chen, Ann; Department of Biomedical Sciences, Chang Gung University
Lih, T. Mamie; TIGP Bioinformatics Program, Academia Sinica
Chang, Hui-Yin; TIGP Bioinformatics Program, Academia Sinica
Huang, Ching-Jang; National Taiwan University, Biochemical Science and Technology
Shiao, Ming-Shi; Department of Biomedical Sciences, Chang Gung University
Pan, Wen-Harn; Academia Sinica, Institute of Biomedical Sciences
Hsu, Wen-Lian; Academia Sinica, Bioinformatics Lab, Institute of Information Science

ACS Paragon Plus Environment

Page 1 of 12

Metabolite Identification for Mass Spectrometry-Based Metabolomics Using Multiple Types of Correlated Ion Information
Ke-Shiuan Lynn1, Mei-Ling Cheng2, Ting-Yi Sung1, Yet-Ran Chen3, Chin Hsu4, Ann Chen2, T. Mamie Lih5, Hui-Yin Chang5, Ching-jang Huang6, Ming-Shi Shiao2, Wen-Harn Pan7, Wen-Lian Hsu1,*

1 Institute of Information Science, Academia Sinica, Taipei, Taiwan
2 Department of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
3 Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
4 Department of Exercise Health Science, National Taiwan University of Physical Education and Sport, Taichung, Taiwan
5 Bioinformatics Program, TIGP, Institute of Information Science, Academia Sinica, Taipei, Taiwan
6 Department of Biochemical Science and Technology, National Taiwan University, Taipei, Taiwan
7 Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

ABSTRACT: Metabolite identification remains a bottleneck in mass spectrometry (MS)-based metabolomics. Currently, this process relies heavily on tandem mass spectrometry (MS/MS) spectra acquired in separate runs for peaks of interest found in earlier MS runs. This delayed, labor-intensive procedure is a barrier to automation. Moreover, the information embedded in MS data has not been fully exploited for metabolite identification. Multimers, adducts, multiply charged ions, and fragments of given metabolites account for a substantial proportion (40–80%) of the peaks in a quantitation result. Although these derivatives complicate peak assignment, the extensive information they carry, especially in fragments, may facilitate metabolite identification. We propose a procedure, amenable to automation, that groups and annotates peaks associated with the same metabolite in the quantitation results of both ion modes and integrates this information for metabolite identification. In addition to the conventional mass and isotope ratio matches, we match annotated fragments with low-energy MS/MS spectra in public databases. For metabolites without accessible MS/MS spectra, we developed characteristic fragment and common substructure matches. The accuracy and effectiveness of the procedure were evaluated using one public and two in-house liquid chromatography-mass spectrometry (LC-MS) datasets. The procedure correctly identified 89% of the 28 standard metabolites that had derivative ions in the datasets. In terms of effectiveness, it confidently identified the correct chemical formula of at least 42% of the metabolites with derivative ions via MS/MS spectrum, characteristic fragment, and common substructure matches. The confidence level was determined from the identification criteria fulfilled by the various matches and from relative retention time.

Although the mass spectrometry (MS)-based metabolomics platform has become increasingly popular, metabolite identification remains a hurdle.1 High-throughput metabolite identification is usually more costly, time-consuming, and labor-intensive than peptide/protein identification owing to the complexity of metabolites, the lack of MS/MS spectra acquired alongside MS scans, and the limited number of spectral databases.2,3 Although some commercial kits provide rapid identification of hundreds of metabolites, they are usually expensive and limited to certain types of samples and metabolites. For untargeted metabolomics, MS remains one of the most widely used platforms; hence, there is a pressing need for a fast and effective metabolite identification tool. The basic approach to metabolite identification is mass match, which searches databases4-7 for the metabolite whose mass-to-charge ratio (m/z) is closest to that of the peak. However, owing to noise and other limitations, the measured peak mass often drifts slightly from its theoretical value. Even with state-of-the-art mass spectrometers claiming a mass error of a few ppm, mass match can still return multiple candidates. Moreover, around 40–80% of the peaks in a quantitation result are metabolite derivatives,8-10 which lead to false identifications under mass match. Although these derivative peaks may seem undesirable, they can provide valuable information for metabolite identification. Several approaches exploit such information, including those that combine mass match with information on adducts, neutral losses, and the isotope ratio (IR, the ratio of the abundance of the first isotopic peak to that of the monoisotopic peak) to further reduce the number of candidates.10-14 However, the efficacy is modest both for adduct/neutral loss matching via multiple mass matches and for IR matching, which is often degraded by inevitable spectral noise and poor quantitation accuracy. Other efforts using chemical rules,15 transformations,16 metabolic pathways/networks,17 etc. have limited applicability and thus contribute little to candidate reduction. Without tandem MS spectra, retention time (RT) is probably the most effective factor that can be combined with mass to pinpoint a metabolite. Although several approaches have been developed for RT prediction,18,19 they are specific to the experimental design, sample type, instrument, metabolite class, etc.

Overall, current approaches fall short of exploiting all ion types for metabolite identification in three respects. First, related ions are not properly grouped and annotated. Second, peaks detected in different ion modes are not integrated to provide sufficient information on the underlying metabolites. Third, the various ion types, especially fragments, are not considered simultaneously for identification. To the best of our knowledge, no algorithm proposed thus far addresses all of these problems. In addition, although fragments are known to appear in MS spectra and tools have been developed to annotate them,8-11,20 no attempt has been reported to identify a metabolite by matching the fragments in its MS spectra with those in low-energy MS/MS spectra in metabolite databases. Here we propose a metabolite identification procedure that uses information on multimers, adducts, multiply charged ions, isotopes, and fragments to facilitate metabolite identification in untargeted LC-MS-based metabolomics studies. The paper is organized as follows. The major tasks of the proposed procedure are described first. We then present a validation study on three datasets (two in-house, one public) that demonstrates the accuracy and effectiveness of our procedure, which identifies metabolites from MS data alone. Finally, we discuss some remaining issues in metabolite identification.
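The conventional mass match described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the neutral metabolite mass has already been deduced, uses a tiny hypothetical in-memory list of formulas instead of a query to HMDB or METLIN, and covers only the elements C, H, N, O, P, and S. The candidate formulas are the three listed for the 242.11 Da precursor in the Methods example; the observed mass is an illustrative value, not a measurement from the paper.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Parse a flat formula such as 'C10H15N3O4' into {element: count}."""
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def mass_match(observed_mass, formulas, tol_ppm=30.0):
    """Return (formula, ppm error) pairs within tol_ppm, closest first."""
    hits = []
    for formula in formulas:
        err = ppm_error(observed_mass, monoisotopic_mass(formula))
        if abs(err) <= tol_ppm:
            hits.append((formula, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    candidates = ["C10H15N3O4", "C15H15NO2", "C12H19NO2S"]
    for formula, err in mass_match(241.1063, candidates):
        print(f"{formula}: {err:+.1f} ppm")
```

With the 30 ppm tolerance later used in Task 4, this illustrative mass keeps two of the three formulas (C12H19NO2S falls just outside, at about -30.5 ppm), which shows how mass match alone can leave several candidates.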

METHODS
Types of MS ion information considered. We considered the following MS ion types for metabolite identification.
Adduct: An adduct ion is formed by the addition of atoms or molecules to a metabolite and is frequently observed in both electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) analyses. The protonated ion [M+H]+ in positive ion mode and the deprotonated ion [M-H]- in negative ion mode are the most frequently observed. In addition, [M+Na]+, [M+K]+, and [M+Cl]- are commonly observed in MS spectra owing to the ubiquity of sodium, potassium, and chloride ions in organic samples. The types of adducts may vary with experimental settings, including sample type, sample preparation method, pH, solvent and buffer used, and contaminants.21,22
Fragment: Some weak bonds in a metabolite can break during ionization, producing a series of fragment peaks in MS data.13,20,22 The proportion of fragments in MS data depends on the experimental design, the LC-MS hyphenation, the quantitation tools, and the parameters adopted therein. This type of ion reveals structural information about its parent metabolite. Although fragment matching usually relies on existing MS/MS spectra in databases, information on the fragments of some metabolite subclasses, such as lipids, amino acids, and glycans, can be found in the literature or derived using chemical rules.15 Notably, some fragment peaks, called characteristic fragments because they are unique to a certain class of metabolite, can assist metabolite identification. We consider two major types of characteristic fragments. A subclass-specific characteristic fragment is unique to a specific subclass of metabolite. For example, the fragment at 184.07 Da in positive ion mode spectra characterizes glycerophosphocholines through their phosphocholine (PC) head group, whereas the fragments representing the fatty acyl substituents can be used to specify their structure.
A neutral loss-specific characteristic fragment is a peak revealing a specific neutral loss from a modified metabolite, e.g., 176.03 Da for glucuronidation and 162.05 Da for glycosylation.
Isotope: Because metabolites are composed of elements that have naturally occurring heavy isotopes (e.g., 13C), peaks associated with their isotopes can be detected in MS spectra. The m/z spacing between the isotopic peaks of a metabolite reveals its charge state, whereas their relative abundances indicate its possible atomic composition and the presence of coelution. The IR is useful for distinguishing isobaric metabolites with different atomic compositions.
Multiply charged ion: The formation of a multiply charged ion depends on the chemical structure of the metabolite, the experimental settings, the solvents and acids used, and other conditions.22 Such ions are more frequently observed in ESI-MS spectra, and their information can be used for metabolite identification. Even without charge information, a charge-n (n = 2, 3, etc.) ion can be recognized because it has the same IR as, but roughly 1/n the m/z value of, its singly charged counterpart. Multiply charged ions can also occur with various adducts and neutral losses.
Multimer: When the sample concentration is high, some metabolites form multimers, mostly dimers and trimers. The abundance of a multimer is usually much lower than that of the monomer. Multimers sometimes appear as non-proton adducts or with neutral losses, which often leads to false annotations. In such cases, a multimer can be distinguished from other ion types by checking whether its IR is roughly n times (n = 2, 3, etc.) that of its associated monomer.
We have collected information on many possible adducts and characteristic fragments in serum and urine samples, and Tables S1–S5 (Supporting Information, SI) list the mutual mass differences among these ions in both modes.
Workflow of the proposed MS-based metabolite identification procedure. We propose a metabolite identification procedure based on the aforementioned ion information.
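As a concrete illustration of the ion types defined above, the Python sketch below computes the m/z of adducts, multiply charged ions, and multimers from a neutral mass, converts a multiply charged m/z back to its singly charged equivalent (the conversion applied in Task 3), estimates the charge state from isotopic peak spacing, and applies the IR-based multimer check. The mass shifts are standard monoisotopic values for a few common adducts rather than the contents of Tables S1–S5, and the function names and the multimer tolerance are hypothetical.

```python
# Standard monoisotopic mass shift (u) contributed per charge by common
# charge carriers; cation/anion values already account for the electron mass.
PROTON = 1.007276
ION_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": 22.989221,
    "[M+K]+": 38.963158,
    "[M-H]-": -PROTON,
    "[M+Cl]-": 34.969402,
}
C13_SPACING = 1.003355  # mass difference between 13C and 12C (u)


def ion_mz(neutral_mass, shift, n_mer=1, charge=1):
    """m/z of [nM + zX] with |charge| = z, assuming every charge comes from
    the same carrier X, e.g. [M+H]+, [2M+H]+ (n=2), or [M+2H]2+ (z=2)."""
    return (n_mer * neutral_mass + charge * shift) / charge


def to_singly_charged(mz, charge, shift):
    """Convert the m/z of [M + zX] to that of the singly charged [M + X]."""
    return charge * mz - (charge - 1) * shift


def charge_from_isotope_spacing(delta_mz):
    """Estimate the charge state from the m/z spacing of adjacent isotopic peaks."""
    return max(1, round(C13_SPACING / delta_mz))


def looks_like_multimer(ir_candidate, ir_monomer, n_mer, rel_tol=0.2):
    """Heuristic from the text: the IR of an n-mer is roughly n times the
    monomer IR. rel_tol is a hypothetical tolerance, not taken from the paper."""
    expected = n_mer * ir_monomer
    return abs(ir_candidate - expected) <= rel_tol * expected


if __name__ == "__main__":
    m = 241.10626  # neutral monoisotopic mass of C10H15N3O4
    for name, shift in ION_SHIFTS.items():
        print(name, round(ion_mz(m, shift), 4))
    z2 = ion_mz(m, PROTON, charge=2)  # [M+2H]2+
    print("[M+2H]2+", round(z2, 4), "->", round(to_singly_charged(z2, 2, PROTON), 4))
    print("[2M+H]+", round(ion_mz(m, PROTON, n_mer=2), 4))
```

Treating every charge as coming from one carrier keeps the formula simple; mixed-carrier ions such as [M+H+Na]2+ would need one shift per carrier.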
Its workflow involves two stages: (i) peak grouping and annotation and (ii) metabolite identification (Figure 1). The first stage processes the input data to group and annotate peaks associated with the same metabolite. The second stage applies different identification strategies to the peak groups according to the annotated ions each group contains and evaluates the corresponding identification confidence. Given the quantitation results of both ion modes (e.g., from XCMS23 or MZmine24), the procedure can be run in an automated fashion. The details of the procedure are described below.
Task 1: Grouping related ion peaks. Because the different derivative ions of a given metabolite can elute at around the same time, they may co-occur in MS scans. In addition, their relative abundances should exhibit similar patterns across samples. We used Pearson’s correlation coefficient (PCC) to evaluate the similarity of peak abundances.9,10 We ignored peak shape similarity8,10,13,20 because it requires raw spectra and becomes less accurate for small and low-resolution peaks. We grouped peaks that eluted within ±1.5 peak widths of the highest peak in RT and had a PCC ≥ 0.8 across samples. The grouping method is detailed in Method S1, and an example in which it successfully grouped related peaks of various ion types is shown in Figure S1 (SI). The Matlab programs for charge and IR determination and for PCC computation are provided in Data S1 and Data S2 (SI), respectively.
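The Task 1 criterion can be sketched as a greedy procedure in Python: the most abundant ungrouped peak seeds a group, and ungrouped peaks within an RT window of the seed whose abundance profiles across samples correlate with it at PCC ≥ 0.8 join the group. This is an assumed simplification for illustration only; the actual method is given in Method S1, and the function names and toy peaks are hypothetical. The caller supplies the RT window (e.g., 1.5 peak widths).

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: PCC undefined, treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def group_peaks(peaks, rt_window, min_pcc=0.8):
    """Greedy grouping of peaks given as dicts with 'id', 'rt' (min), and
    'abundances' (one value per sample). Returns lists of peak ids."""
    # Most abundant peaks (summed over samples) seed groups first.
    remaining = sorted(peaks, key=lambda p: sum(p["abundances"]), reverse=True)
    groups = []
    while remaining:
        seed, candidates = remaining[0], remaining[1:]
        group, leftover = [seed], []
        for peak in candidates:
            close_in_rt = abs(peak["rt"] - seed["rt"]) <= rt_window
            if close_in_rt and pearson(seed["abundances"], peak["abundances"]) >= min_pcc:
                group.append(peak)
            else:
                leftover.append(peak)
        groups.append([p["id"] for p in group])
        remaining = leftover
    return groups


if __name__ == "__main__":
    toy_peaks = [
        {"id": "A", "rt": 5.00, "abundances": [10, 20, 30, 40]},  # seed
        {"id": "B", "rt": 5.01, "abundances": [1, 2, 3, 4.2]},    # co-eluting, correlated
        {"id": "C", "rt": 5.01, "abundances": [4, 3, 2, 1]},      # co-eluting, anti-correlated
        {"id": "D", "rt": 7.50, "abundances": [2, 4, 6, 8]},      # correlated, elutes elsewhere
    ]
    print(group_peaks(toy_peaks, rt_window=0.02))
```

The toy call uses the 0.02 min RT difference reported for the rat dataset; it groups A with B, while C (anti-correlated) and D (distant in RT) end up as single-peak groups.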

Figure 1. The proposed metabolite identification procedure for MS-based metabolomics studies using multiple types of correlated MS ion information. First, the peaks in the quantitation results are grouped and annotated in the Peak Grouping and Annotation stage. Then, in the Metabolite Identification stage, each annotated peak group is subjected to mass, spectrum, characteristic fragment, substructure, and IR matches to determine the most likely metabolite. Each identification is assigned a confidence level and a reasoning summary based on the matches, and the results are refined.

Task 2: Removing irrelevant peak groups. We removed peak groups containing peaks that corresponded to large compounds (molecular weight ≥ 1,000 Da), since metabolomics studies focus on small molecules. In addition, almost all separation and detection techniques can introduce unwanted compounds or contaminants into the analytical system.22 We collected information on contaminant peaks provided by instrument vendors (e.g., Waters Corporation25), reported in the literature (e.g., Keller et al.22), listed in the well-known MaConDa database,26 and detected in blank MS data (if available), and we removed the corresponding peak groups.
Task 3: Annotating ion types and determining the metabolite mass for each peak group. For each peak group, we converted all multiply charged m/z values into their singly charged equivalents and calculated the m/z differences between peaks. The resulting m/z values of the peaks and the m/z differences between peaks were matched against those of the characteristic fragments, adducts, neutral losses, and modifications in Tables S1–S5 (SI). A match was recorded if the error was within 30 ppm. Based on the matched ion types, we annotated the peaks and deduced the metabolite mass. All remaining peaks with masses less than the deduced metabolite mass were annotated as fragments. If multiple metabolite masses were deduced, the largest was retained. If the metabolite mass could not be deduced, we chose the peak with the largest m/z value and iteratively assigned it as each possible adduct, or as a fragment with a water loss, to determine “assumed” metabolite masses.
Task 4: Identifying the metabolite associated with a peak group. For each peak group, we searched the deduced metabolite mass, or each of the assumed metabolite masses, against public metabolite databases, e.g., HMDB,6 METLIN,4 MassBank,7 and ChemSpider,27 with a mass tolerance of 30 ppm. If no matching metabolite was found, the peak group was noted as unidentifiable; otherwise, the matched metabolites were considered candidates. For each peak group with candidates after this precursor mass match, we then applied a series of matches to determine the most likely metabolite. First, we performed an MS/MS spectrum match by comparing the annotated fragments with the peaks of MS/MS spectra in the databases, specifically 0 and 10 V MS/MS spectra in METLIN and 0–15 V MS/MS
spectra in HMDB or MassBank, as shown in Figure S2 (SI). Merged or ramped spectra were also used if no low-energy spectrum was available in these databases. For each metabolite candidate, the spectrum coverage (the sum of the matched fragment intensities divided by the total fragment intensity in the spectrum) was used to determine the underlying metabolite, provided it was at least 0.5. Ties in spectrum coverage were broken by the fragment coverage (the number of matched fragments divided by the total number of fragments in the spectrum). If the metabolite could not be identified by spectrum match, we performed a characteristic fragment match, which checks whether characteristic fragments exist in the peak group in order to reduce the candidates. For example, if a positive-mode fragment at around 184.07 Da was detected, we inferred the possible existence of a phosphocholine component in the compound and narrowed the candidates down to metabolites with such a head group. Then, for each candidate, we further checked whether its side chains and their water neutral losses could be found among the annotated fragments. Similarly, we checked for several known modifications (e.g., glucuronidation, phosphorylation, and sulfation) to refine the possible candidates. An example of identifying chrysin 7-glucuronide via the characteristic fragment of glucuronidation is shown in Figure S3 (SI). For the remaining multi-peak groups, we performed a substructure match, which applies mass match to each of the fragment masses and then searches for common substructures (e.g., sub-chemical compositions) among the candidates of the different fragments. The precursor candidate that shares the largest common substructure with the candidates of its fragments was selected as the final identification.
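The substructure match can be approximated by treating a common substructure as a shared sub-formula (the element-wise minimum of two compositions), in line with the “sub-chemical compositions” mentioned above. The Python sketch below scores each precursor candidate by the atom count of its largest shared sub-formula with the candidates of each fragment peak, summed over fragment peaks; the summation rule and the function names are assumptions, and a structure-aware implementation would compare chemical graphs rather than compositions.

```python
import re


def parse_formula(formula):
    """Parse a flat formula such as 'C10H15N3O4' into {element: count}."""
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def common_subformula(formula_a, formula_b):
    """Element-wise minimum of two compositions (shared sub-formula)."""
    a, b = parse_formula(formula_a), parse_formula(formula_b)
    return {el: min(a[el], b[el]) for el in a.keys() & b.keys()}


def substructure_score(precursor, fragment_candidates):
    """Sum, over fragment peaks, of the largest shared sub-formula (in atoms)
    between the precursor candidate and any candidate of that fragment peak."""
    return sum(
        max(sum(common_subformula(precursor, cand).values()) for cand in cands)
        for cands in fragment_candidates
    )


def best_precursor(precursor_candidates, fragment_candidates):
    """Pick the precursor candidate with the highest substructure score."""
    return max(precursor_candidates,
               key=lambda p: substructure_score(p, fragment_candidates))


if __name__ == "__main__":
    precursors = ["C10H15N3O4", "C15H15NO2", "C12H19NO2S"]
    fragments = [["C5H7N3O"]]  # one fragment peak (126.07 Da) with one candidate
    for p in precursors:
        print(p, substructure_score(p, fragments))
    print("selected:", best_precursor(precursors, fragments))
```

On the formulas of the 5-methyldeoxycytidine example that follows, the scores are 16, 14, and 14 atoms, so C10H15N3O4 is selected, consistent with the identification reported in the text.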
For example, 5-methyldeoxycytidine (C10H15N3O4) is identified for the positive peak group [242.11, 126.07] Da because, among the three candidates (C10H15N3O4, C15H15NO2, and C12H19NO2S) for the precursor at 242.11 Da, it shares the largest common substructure with C5H7N3O, the only candidate for the fragment at 126.07 Da, as shown in Figure S4 (SI). After the above matches, IR matches were performed on all peak groups with metabolite candidates. For the precursor peak of each group, the IR was evaluated in all replicates/samples, and the median value was adopted as the realistic IR. The candidates of each peak group were then filtered by the criterion |theoretical IR - realistic IR| ≤ 0.05. If no candidate satisfied the criterion, the criterion was disregarded and all candidates were retained. If multiple candidates remained after the IR match, the one whose mass was closest to that of the precursor was assigned as the final identification.
Task 5: Assigning identification confidence and refining identification results. We assigned each peak group to one of five levels: discarded, unidentifiable, high confidence, medium confidence, and low confidence. Peak groups containing large compounds or contaminants, as described in Task 2, were denoted as discarded. A peak group was marked as unidentifiable if its deduced metabolite mass failed to match the mass of any metabolite in the databases within 30 ppm. For the remaining peak groups, we used two criteria, the IR match and the fragment matches (MS/MS spectrum, characteristic fragment, and substructure), to evaluate the confidence of the identification. If both criteria were satisfied, the identification confidence was high; if only one was satisfied, it was medium; otherwise, it was low. We also provided a reasoning section for each identification to allow further verification; it contains the satisfied criteria and the matched ion types with their m/z values.
All identified peak groups then underwent a refinement process. The process first checks the relative RTs (i.e., the sequential order of the RTs) among metabolites of the same class, since these are relatively stable across experiments. For example, the RTs of several lysoPCs reported by Dong et al.28 suggest that the length of the fatty acid chain and the location of the double bond in a lysoPC are strongly associated with its polarity and thus influence its RT. Based on their work, we derived rules to verify the relative RTs of PCs and PEs, as provided in Method S2 (SI). These rules were checked between an identified and a standard metabolite or among at least three identified metabolites. If the rules were satisfied, the confidence of the identified metabolite was increased; otherwise, it was lowered. The refinement process also attempted to recover group members for each multi-peak group from nearby single-peak groups. Such peaks are usually small, noisy, and/or coeluted, and their correlations with the other peaks in the group are too low for them to be included during grouping. For each multi-peak group, recovery was performed by adding neighboring single peaks whose abundances were low and whose m/z values matched possible multimers, multiply charged ions, adducts, modifications, or fragments (if an MS/MS spectrum was available).
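The scoring rules of Tasks 4 and 5 (spectrum and fragment coverage, the IR filter with its fallback, and the two-criterion confidence level) can be summarized in a short Python sketch. The text does not state the m/z tolerance used to match fragments to reference peaks; the 30 ppm default below is an assumption, as are the function names and the toy spectrum, and only the MS/MS coverage is used here as the fragment criterion.

```python
from statistics import median


def coverages(fragment_mzs, reference_peaks, tol_ppm=30.0):
    """Spectrum coverage (matched intensity / total intensity) and fragment
    coverage (matched peaks / total peaks) of a reference MS/MS spectrum,
    given as a list of (m/z, intensity) pairs."""
    matched = [
        (mz, inten) for mz, inten in reference_peaks
        if any(abs(f - mz) / mz * 1e6 <= tol_ppm for f in fragment_mzs)
    ]
    total = sum(inten for _, inten in reference_peaks)
    spectrum_cov = sum(inten for _, inten in matched) / total if total else 0.0
    fragment_cov = len(matched) / len(reference_peaks) if reference_peaks else 0.0
    return spectrum_cov, fragment_cov


def ir_filter(candidates, observed_irs, max_diff=0.05):
    """candidates: list of (name, theoretical IR). Keep candidates whose IR is
    within max_diff of the median observed IR; if none pass, keep them all.
    Returns (kept candidates, whether the IR criterion was satisfied)."""
    realistic_ir = median(observed_irs)
    kept = [c for c in candidates if abs(c[1] - realistic_ir) <= max_diff]
    return (kept, True) if kept else (list(candidates), False)


def confidence(ir_ok, fragment_ok):
    """High if both criteria hold, medium if exactly one holds, low otherwise."""
    if ir_ok and fragment_ok:
        return "high"
    if ir_ok or fragment_ok:
        return "medium"
    return "low"


if __name__ == "__main__":
    ref = [(126.0662, 100.0), (110.0600, 20.0), (81.0447, 30.0)]  # toy spectrum
    spec_cov, frag_cov = coverages([126.0660, 81.0450], ref)
    kept, ir_ok = ir_filter([("X", 0.12), ("Y", 0.20)], [0.11, 0.13, 0.125])
    print(round(spec_cov, 3), round(frag_cov, 3), kept,
          confidence(ir_ok, spec_cov >= 0.5))
```

Here two of the three toy reference peaks are matched (spectrum coverage 130/150, fragment coverage 2/3), candidate X passes the IR filter, and both criteria hold, so the sketch reports high confidence.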

EXPERIMENTAL SECTION
We used three datasets to test the proposed identification procedure: two in-house datasets, namely a rat diabetes development LC-MS dataset and a bitter gourd powder (BGP) diet mouse LC-MS dataset, and one public dataset, the AtMetExpress Development LC-MS dataset.29 The animal studies for the two in-house datasets were individually approved by the Institutional Animal Care and Use Committees of Chang-Gung University and National Taiwan University. The rat diabetes development LC-MS dataset was obtained from urine samples of 36 rats, including 8 healthy controls (CT), 10 on a high-fructose (HF) diet, 9 on a high-fat (HL) diet, and 9 treated with streptozotocin (STZ) to induce diabetes, to study interactions among multiple risk factors in the progression of obesity and diabetes. Metabolome data were acquired on an LC-ESI-QTOF/MS system (UPLC: Waters ACQUITY UPLC system; MS: Waters Synapt G1 HDMS System) in both modes with six technical replicates per sample. A detailed description of sample preparation and experimental settings is provided in Method S3 (SI). MassLynx V4.1 (Waters Corp., Manchester, U.K.) was used to generate quantitation results from the datasets. Key quantitation parameters were as follows: mass tolerance = 0.03 Da; intensity threshold = 50 counts; mass window = 0.03 Da; RT window = 0.1 min; the deisotope data option was checked; and noise elimination level = 6.0. As a result, 1,076 and 978 deisotoped peaks were found in the positive and negative modes, respectively. In addition, MS/MS spectra of 15 known metabolites (13 in positive and 2 in negative mode) were used to verify our identifications. The precursors and fragments of the 15 metabolites are listed in Table S6 (SI).

The BGP diet mouse LC-MS dataset was obtained from the liver tissues of 14 mice, 7 on a basal diet and 7 on a BGP diet, to compare the metabolite changes in mouse liver between basal-diet mice and mice fed an additional 5% of lyophilized BGP. Metabolome analysis was performed on an LC-ESI-Q-TOF/MS system (LC: Dionex Ultimate 3000 RSLC system; MS: Bruker maXis UHR-QTOF System) in positive mode only, with four technical replicates per sample. A detailed description of the sample preparation and experimental settings for this dataset is provided in Method S4 (SI). Quantitation results for each replicate were obtained with ProfileAnalysis V2.0 (Bruker Daltonics, Bremen, Germany) using mass window = 0.01 Da and RT window = 0.2 min, and were aligned by MetaboAnalyst V2.0.30 The quantitation result consisted of 757 deisotoped peaks. The AtMetExpress Development LC-MS dataset was constructed to analyze phytochemical accumulation during the development of Arabidopsis thaliana. The dataset contained 36 tissues, each with four technical replicates, from various growth stages and organs of Arabidopsis thaliana. Metabolome analysis was performed on an LC-ESI-QTOF/MS system (HPLC: Waters ACQUITY UPLC system; MS: Waters Q-TOF Premier) in both positive and negative modes. Quantitation results and fragment m/z values were extracted from its supplementary data. A total of 5,042 deisotoped peaks were detected in the dataset via metAlign31 and in-house software,32 and the structures of 167 metabolites were elucidated. Of the 167 annotated metabolites, 34 were further verified experimentally with standard metabolites; of these, 22 were identified only in positive mode, 2 only in negative mode, and 10 in both modes. The precursors and fragment lists of the 34 metabolites are provided in Table S7 (SI).

RESULTS AND DISCUSSION
Identification results for the rat diabetes development LC-MS dataset. We merged the quantitation results of the two modes and obtained 2,054 peaks (1,076 positive and 978 negative). These peaks were grouped by the PCC-based method (with RT difference ≤ 0.02 min and PCC ≥ 0.8), producing 1,100 peak groups. Because each peak group may be associated with a single metabolite, roughly 46% (954 (= 2,054 - 1,100) of 2,054) of the peaks were multimers, adducts, multiply charged ions, or fragments, resulting in an average of 0.87 derivative ions per metabolite. Of the 1,100 peak groups, 222 contained multiple peaks (1,172 peaks in total); the remaining 878 groups each consisted of a single peak. Since many existing tools can identify individual peaks,4-7,10-14 the following demonstration focuses only on the 222 multi-peak groups. We first removed 10 of the 222 multi-peak groups that were probably associated with large compounds or contaminants. We then annotated and identified the remaining 212 peak groups. As a result, 103 groups were identified by the novel MS/MS spectrum, characteristic fragment, or substructure matches, 107 groups were identified by the conventional mass match, and 2 groups were unidentifiable. During identification refinement, 4 peaks belonging to 2 multi-peak groups were recovered based on their matched spectra (878 single peaks remaining, as shown in Figure 2A). Finally, of the 222 multi-peak groups, 103 (46%) were identified with high confidence, 91 (41%) with medium confidence, and 16 (7%) with low confidence, while 2 (1%) were unidentifiable and 10 (5%) were discarded (Figure 2B). Among the 103 confidently identified peak groups, 47 (46%) were identified via MS/MS spectrum, 53 (51%) via characteristic fragments corresponding to methylation, sulfation, glycosidation, and glucuronidation, and 3 (3%) via substructures (Figure 2C). The quantitation and detailed identification results are provided in Data S3 (SI). These results indicate that our procedure can group related peaks and use them, especially fragments, to provide more accurate identifications (103 of 222, 46%, in Figure 2B) in the multi-peak groups than the conventional mass and IR matches. Moreover, a considerable 54% (56 of 103) of the identifications were achieved via the novel characteristic fragment and substructure matches, demonstrating their importance for metabolite identification without MS/MS spectra.
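The summary percentages above follow directly from the reported counts. The small Python check below treats each peak group as one metabolite, as the text does, so that peaks in excess of the number of groups count as derivative ions; the function names are hypothetical.

```python
def derivative_summary(n_peaks, n_groups):
    """Derivative ions = peaks beyond one per group (i.e., per metabolite).
    Returns (count, percentage of all peaks, average per metabolite)."""
    derivatives = n_peaks - n_groups
    return derivatives, round(100 * derivatives / n_peaks), round(derivatives / n_groups, 2)


def percentages(counts):
    """Integer percentage of each category relative to the total count."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


if __name__ == "__main__":
    print(derivative_summary(2054, 1100))  # rat diabetes development dataset
    print(percentages({"high": 103, "medium": 91, "low": 16,
                       "unidentifiable": 2, "discarded": 10}))  # Figure 2B
```

For the rat dataset this gives 954 derivative ions (46%, 0.87 per metabolite) and the Figure 2B percentages 46, 41, 7, 1, and 5. The same function applied to the BGP dataset counts reported below (757 peaks, 389 groups) gives 368 derivative ions (49%, 0.95 per metabolite).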

Figure 2. Identification statistics of the rat diabetes development LC-MS dataset; (A) numbers of single-peak groups and multipeak groups; (B) numbers of multi-peak groups that were discarded, unidentified, and identified with high, medium, and low confidence; (C) numbers of peak groups that were confidently identified via MS/MS spectrum, characteristic fragment, and substructure.

Furthermore, we evaluated the above identification results with 15 known metabolites. Of these 15 metabolites, 6 were associated with multi-peak groups, 5 were associated with single-peak groups, and the remaining 4 had no related peak in the quantitation result. The 6 multi-peak groups contained 15 peaks in total. The 5 groups containing fragments were identified with high confidence via MS/MS spectrum match. The remaining group, which contained only the positive precursor and a potassium adduct, was identified with medium confidence via IR match. One of the 5 high-confidence identifications, nicotinuric acid, exhibited an RT shift of 0.2 min (0.79 → 0.99 min) from the standard. This was possibly due to differences in sample complexity or environmental conditions. Among the 5 single-peak groups, the matched precursor of 5-hydroxyindoleacetate at 192.07 Da had a poorly correlated (PCC = 0.43) fragment at 146.07 Da. This was due to the low abundance of the precursor and the small number of samples (5 of 35) in which it was detected. Therefore, our procedure correctly grouped 15 of the 17 related peaks (all peaks of the 6 standard metabolites in multi-peak groups). It also correctly identified 6 of the 7 (86%) metabolites that exhibited multiple peaks in the quantitation result (detailed in Table S6, SI).

Identification results of the BGP diet mouse LC-MS dataset. For each of the 757 peaks in this dataset, we grouped neighboring peaks (RT difference ≤ 0.2 min) with highly correlated abundances (PCC ≥ 0.8). This resulted in 389 peak groups (83 multi-peak groups and 306 single-peak groups). A larger RT difference was used owing to the low RT resolution (0.1 min) of the quantitation. The difference between the numbers of peaks and groups suggests that 49% (368 (=757-389) of 757) of the peaks were putative multimers, adducts, multiply charged ions, or fragments. This corresponds to an average of 0.95 derivative ions per metabolite in the dataset. Of the 83 multi-peak groups, 27 (mainly eluting at 3.5–5.5 min) were discarded as large compounds or contaminants. The remaining 56 peak groups were then subjected to mass, MS/MS spectrum, characteristic fragment, substructure, and IR matches. First, 16 peak groups were identified with high confidence via MS/MS spectrum match. Among the remaining 40 peak groups, we found 8 with characteristic fragments for acetyl groups, lysoPEs, lysoPCs, and water neutral loss. Finally, no substructure was found for the remaining 32 peak groups, so their identities were determined by mass and IR matches. Of these, 25 were identified with medium confidence, 4 were identified with low confidence, and 3 were unidentifiable. During the refinement step, we recovered three peaks that were sodium adducts of lysoPE(22:6), lysoPC(22:6), and lysoPC(20:4), respectively (303 single peaks remaining, as shown in Figure 3A). Moreover, we used the 6 confidently identified lysoPCs and lysoPEs to identify 5 more lysoPCs and 3 more lysoPEs in the single-peak groups via mass, IR, and relative RT matches. As a result, of the 83 multi-peak groups, 24 (29%) were identified with high confidence, 25 (30%) with medium confidence, and 4 (5%) with low confidence; 3 (4%) were unidentified, and 27 (32%) were discarded (Figure 3B). Including the 8 metabolites identified via relative RT match, we identified a total of 61 metabolites in this dataset. Of the 24 high-confidence identifications, 8 were identified by characteristic fragment matches and 16 by MS/MS spectrum matches (Figure 3C). The quantitation and detailed identification results of the dataset are provided in Data S4 (SI).
These results show that fragments are also generated on a different brand of mass spectrometer (Bruker, compared with Waters for the previous dataset). They also demonstrate that our novel characteristic fragment match and relative RT refinement can aid metabolite identification without MS/MS spectra.
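A characteristic fragment match of the kind used in both in-house datasets can be approximated by searching for known neutral losses between peaks of the same group. The sketch below makes several assumptions. The loss table lists standard monoisotopic masses for the modifications named in this section (water loss, methylation, acetylation, sulfation, glycosidation with hexose or deoxyhexose, and glucuronidation). It is not the authors' rule set, and the lysoPE/lysoPC rules are omitted. The function name and tolerance are hypothetical.

```python
# Monoisotopic masses (Da) of common neutral losses; illustrative values
# computed from elemental compositions, not the authors' rule table.
NEUTRAL_LOSSES = {
    "H2O (water loss)": 18.0106,
    "CH2 (methylation)": 14.0157,
    "C2H2O (acetylation)": 42.0106,
    "SO3 (sulfation)": 79.9568,
    "C6H10O4 (deoxyhexose, e.g. rhamnoside)": 146.0579,
    "C6H10O5 (hexose, e.g. glucoside)": 162.0528,
    "C6H8O6 (glucuronidation)": 176.0321,
}


def match_characteristic_losses(mzs, tol=0.01):
    """Return (higher m/z, lower m/z, loss name) for every pair of peaks in one
    peak group whose m/z difference matches a known neutral loss within tol."""
    hits = []
    ordered = sorted(mzs, reverse=True)
    for i, high in enumerate(ordered):
        for low in ordered[i + 1:]:
            diff = high - low
            for name, mass in NEUTRAL_LOSSES.items():
                if abs(diff - mass) <= tol:
                    hits.append((high, low, name))
    return hits


if __name__ == "__main__":
    # Nominal positive-mode m/z values listed in Table 1 for
    # quercetin-3-O-glucopyranosyl-7-O-rhamnopyranoside; a wide tolerance
    # is needed because these values are nominal.
    for hit in match_characteristic_losses([611, 449, 303], tol=0.5):
        print(hit)
```

With accurate m/z values, a tolerance of a few mDa would separate isobaric losses far better than the nominal-mass example does.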

Figure 3. Identification statistics of the BGP diet mouse LC-MS dataset; (A) numbers of single-peak groups and of multi-peak groups; (B) numbers of multi-peak groups that were discarded, unidentified, and identified with high, medium, and low confidence; (C) numbers of peak groups that were confidently identified via MS/MS spectrum, characteristic fragment, and substructure.

Evaluation of the grouping and identification methods using the AtMetExpress Development LC-MS dataset. We used two sets of information in the AtMetExpress dataset: (1) the quantitation result of 5,042 peaks with nominal m/z values and (2) an extracted list of precursors and MS/MS fragments of 34 verified metabolites with rational m/z values. We noted that the peaks associated with a metabolite exhibited large time shifts (> 0.02 min) between the positive-mode and negative-mode quantitation results. Therefore, we analyzed the two quantitation results separately. Furthermore, the provided fragments were extracted from a ramped (5–50 eV) MS/MS spectrum and could differ from those in an MS spectrum. We therefore searched METLIN and MassBank for low-energy MS/MS spectra of the 34 metabolites to generate an augmented fragment list (Table S7, SI). We removed the 5th and 19th metabolites, as well as the positive-mode ions of the 3rd and 31st metabolites in the list, because their fragments seemed inconsistent with their corresponding precursors. We kept the negative-mode ions of the 3rd and 31st metabolites. We also removed the 2nd, 4th, 10th, 11th, 13th, 15th, 16th, 18th, 24th, 27th, and 30th metabolites because they lacked fragments in the list. As a result, 21 metabolites (listed in Table 1) were included in the following analysis. For this dataset, we did not directly apply the proposed identification procedure to the quantitation result. Its nominal m/z values would yield an enormous number of metabolite candidates after mass match, which would overwhelm the identification process and leave no reference data for validation. Instead, we used the 21 verified metabolites and their fragments to test the grouping effectiveness of PCC and to assess the capability of the proposed identification procedure. We compared the quantitation result with the augmented fragment list and obtained 64 matches in the two modes. These comprised 25 precursors (4 metabolites had both positive and negative precursors) and 39 fragments of the 21 metabolites. We first tested whether PCC could effectively group fragments with their corresponding precursors. For each precursor of the 21 metabolites, we grouped its neighboring peaks (RT difference ≤ 0.02 min) with highly correlated abundances (PCC ≥ 0.8) in the quantitation result.
Consequently, 36 of the aforementioned 39 fragments of the 21 metabolites (blue numbers in Table 1) were successfully grouped with the precursors of 19 metabolites. This gives a grouping effectiveness of 92% ((23 precursors + 36 fragments) out of (25 precursors + 39 fragments)) for precursors and fragments using PCC. The three ungrouped fragments of two metabolites (red numbers in Table 1) were not highly correlated with their corresponding precursors owing to low peak intensities and noisy signals. The value imputed by metAlign31 for missing peaks (i.e., three times the background noise) may also have affected the computed correlations. Although nominal masses are not suitable for mass match, we attempted to assess the identification capability of the proposed procedure on the 21 metabolites with fragments. Among them, 13 could be identified via MS/MS spectrum match with METLIN or MassBank. Five could be identified via characteristic fragments (i.e., a mass difference of 162 Da for glucoside and 146 Da for rhamnopyranoside). One could be identified via substructure match (C12H23NO10S3 for the precursor at m/z 436 and C11H19NO9S2 for the fragment at m/z 372, with the common composition C11H19NO9S2). The remaining 2 were unidentified owing to missing fragments. The proposed procedure thus achieved a 90% (19 of 21) identification rate. We summarize the grouping results and possible identification strategies of the 21 metabolites in Table 1 and provide detailed information on the original 34 metabolites in Table S7 (SI).
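The substructure example above (precursor C12H23NO10S3, fragment C11H19NO9S2, common composition C11H19NO9S2) can be reproduced by treating molecular formulas as element-count multisets. The sketch assumes that "common composition" means the element-wise minimum of the two formulas. That reading matches this example, but it is our interpretation rather than a definition given in the text. The parser handles only simple formulas without brackets or charges, and all function names are hypothetical. The computed difference, CH4OS, has a nominal mass of 64 Da, consistent with the gap between m/z 436 and 372.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a simple formula such as 'C12H23NO10S3' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def to_hill(counts):
    """Format element counts in Hill order (C, H, then alphabetical)."""
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def common_composition(formula_a, formula_b):
    """Element-wise minimum of two formulas (Counter intersection)."""
    return to_hill(parse_formula(formula_a) & parse_formula(formula_b))


def neutral_loss(precursor, fragment):
    """Composition lost from precursor to fragment, or None when the fragment
    is not contained in the precursor."""
    p, f = parse_formula(precursor), parse_formula(fragment)
    if any(f[e] > p[e] for e in f):
        return None
    return to_hill(p - f)


if __name__ == "__main__":
    precursor, fragment = "C12H23NO10S3", "C11H19NO9S2"
    print(common_composition(precursor, fragment))  # C11H19NO9S2
    print(neutral_loss(precursor, fragment))        # CH4OS
```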

Table 1. The 21 verified metabolites with fragments in the AtMetExpress Development LC-MS dataset, their matched precursors and fragments, and the potential identification strategy

Metabolite name [PubChem CID or CAS number] | Mass | m/z of matched precursors and fragments* | Peak group description; potential identification strategy†
sn-Glycero-3-phosphocholine [11234] | 257 | pos: 258, 184‡, 104‡ | Single precursor peak; Mass match
4-Methylsulfinyl-n-butylglucosinolate [9548634] | 437 | neg: 436, 372 | Multiple peaks containing fragment peaks; Substructure match (C11H19NO9S2)
Methionine [6137] | 149 | pos: 150, 133 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(26)
Glutathione (reduced form) [745] | 307 | pos: 308, 179 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(44)
Glutathione (oxidized form) [975] | 612 | pos: 613, 484, 355; neg: 611 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(45)
Tyrosine [6057] | 181 | pos: 182, 165, 136 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(34)
Tyramine [5610] | 137 | pos: 138, 121 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(60)
Phenylalanine [6140] | 165 | pos: 166, 120 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(28)
Tryptophan [6305] | 204 | pos: 205, 188; neg: 203 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(33)
Scopolin | 354 | pos: 355, 193 | Multiple peaks containing fragment peaks; Char. frag. match (gluc.)
Quercetin-3-O-α-L-rhamnopyranosyl-(1,2)-β-D-glucopyranoside-7-O-α-L-rhamnopyranoside [161993-01-7] | 756 | pos: 757, 499 | Multiple peaks containing fragment peaks; Spectrum match, MassBank(PR101052)
Kaempferol-3-O-α-L-rhamnopyranosyl-(1,2)-β-D-glucopyranoside-7-O-α-L-rhamnopyranoside [162062-89-7] | 740 | pos: 741, 595, 433, 287 | Multiple peaks containing fragment peaks; Spectrum match, MassBank(PR101051)
Epicatechin [182232] | 290 | neg: 289, 245 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(3420)
Quercetin-3-O-β-glucopyranosyl-7-O-α-rhamnopyranoside [18016-58-5] | 610 | pos: 611, 449, 303 | Multiple peaks containing fragment peaks; Char. frag. match (gluc. and rham.)
Quercetin-3,7-O-α-L-dirhamnopyranoside [28638-13-3] | 594 | pos: 595, 433, 287 | Multiple peaks containing fragment peaks; Char. frag. match (gluc. and rham.)
p-Coumaric acid [322] | 164 | pos: 165, 147‡ | Single precursor peak; Mass match
Kaempferol 3,7-O-dirhamnopyranoside [482-38-2] | 578 | pos: 579, 433; neg: 577, 431 | Multiple peaks containing fragment peaks; Char. frag. match (rham.)
Sinapoylmalate [11953815] | 340 | neg: 339, 223, 208, 179, 164, 149, 133, 115 | Multiple peaks containing fragment peaks; Spectrum match, MassBank(PR101003)
Sinapoylmalate (isomer) [11953815] | 340 | pos: 207; neg: 339, 179, 164, 133, 115 | Multiple peaks containing fragment peaks; Spectrum match, MassBank(PR101003)
Sulforaphane [5350] | 177 | pos: 178, 114 | Multiple peaks containing fragment peaks; Spectrum match, METLIN(58318)
Kaempferol-3-Glu [5282102] | 448 | pos: 449, 303; neg: 447 | Multiple peaks containing fragment peaks; Spectrum match, Char. frag. match (rham.)

* For each ionization mode, the matched precursor (black) is listed first, followed by its fragments. High-correlated fragments are shown in blue and low-correlated fragments in red (marked ‡).
† Char. frag. match, characteristic fragment match; gluc., glucoside; rham., rhamnopyranoside.

Other comments on metabolite identification. Conventional metabolite identification tools use mass and IR matches to identify peaks in an MS quantitation result. Such tools may either select incorrect metabolites among the candidates for precursors or waste time identifying ions other than precursors. Recent annotation tools may mitigate this problem, but they do not solve it. The ability to recognize the precursor and its (characteristic) fragments is crucial for improving identification accuracy. For this purpose, we propose a metabolite identification procedure with automation capability. The procedure groups and annotates various ion types in a quantitation result. It then uses these ion types, especially fragments, to identify the most likely metabolites associated with the peak groups. We tested 28 standard metabolites that exhibited fragments in the first and third datasets. Our identification procedure correctly grouped 91% (74 of 81) of the related ions using PCC. It also confidently identified all 25 standard metabolites whose fragments were grouped with their precursors. In addition, the two in-house datasets contained 305 multi-peak groups in total. Of these, our identification procedure discarded 37 (12%) irrelevant peak groups. It identified 127 (42%), 116 (38%), and 20 (7%) peak groups with high, medium, and low confidence, respectively, leaving only 5 (1%) peak groups unidentifiable. A software package that implements the proposed procedure is under development.

We suggest three possible strategies for further increasing confidence in metabolite identification: (i) increasing the number of dissociated metabolites; (ii) enriching MS/MS spectrum databases; and (iii) enhancing background-knowledge filtering. First, over 80% of the peak groups in the two in-house datasets contained no fragments, implying that many metabolites remained intact in the MS data. Increasing the number of dissociated metabolites in the MS data (e.g., by increasing the cone voltage during ionization) may enable them to be identified via fragment match. However, such a manipulation should be performed with caution so that the precursors remain detectable. Second, only about 20% of the multi-peak groups with fragments in the two in-house datasets were identified by spectrum match. This indicates that public databases still contain too few MS/MS spectra. We expect more metabolite databases to become available for download or electronic query in the near future. In addition, an in-house spectrum database could be included to improve identification. Third, at least 20% of the multi-peak groups in the two in-house datasets were identified via the proposed characteristic fragment, common substructure, and relative RT matches. Considerably more background knowledge from different fields could be implemented to further improve identification.15-17 In particular, if structural information could be digitized, an identification result could be verified by checking whether certain fragments (e.g., H2O, COOH) are possible for the metabolite.

For metabolite identification based on MS quantitation results, the accuracy of those results is crucial. Many quantitation software packages have been developed, but only a few are designed exclusively for metabolomics studies, and their quantitation results are sometimes inaccurate.
Issues such as noise removal, broken peaks, deconvolution, and deisotoping remain open problems. In addition, most software packages do not provide charge and IR information, which is essential for metabolite identification. Moreover, some packages require many input parameters. This can be inconvenient for general users who lack a background in signal processing and mass spectrometry. To address these issues, we have also developed algorithms and an integrated quantitation software package exclusively for metabolomics studies.

The peak grouping process helps metabolite identification. It may also provide a more accurate abundance for a metabolite by summing the abundances of its correlated peaks. However, caution is needed when accumulating these abundances. Similar RTs and similar abundance profiles across samples strongly suggest that grouped peaks are associated with a single metabolite. Nevertheless, we have observed peak groups containing multiple metabolites. For example, in Data S3 (SI), the 4th peak group includes citric acid, malic acid, and succinic acid from the citric acid cycle.33 The 169th peak group contains acacetin and its modified forms, including methylated acacetin, hydroxylated acacetin, and their glucuronidated metabolites. In addition, in Data S4 (SI), two pairs of phospholipids were grouped together. Some phospholipids have been reported to exhibit similar RTs,28 but why these pairs had similar abundance profiles across samples requires further investigation. Currently, the proposed procedure can identify only the metabolite with the largest mass in each group. For such groups, the relationship between each peak and the multiple metabolites should be clarified before the abundances are accumulated.
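The accumulation caveat above can be made concrete by summing peak profiles per assigned metabolite rather than per peak group. This is a hypothetical helper, not part of the described procedure. It assumes each peak in a group has already been assigned to exactly one metabolite or left unassigned (None). That assignment is the clarification step the text calls for in groups that contain several metabolites. The abundances in the example are invented.

```python
def accumulate_by_metabolite(profiles, peak_to_metabolite):
    """Sum abundance profiles sample-by-sample for the peaks assigned to each
    metabolite. Peaks mapped to None (unassigned) are left out."""
    totals = {}
    for peak, metabolite in peak_to_metabolite.items():
        if metabolite is None:
            continue
        profile = profiles[peak]
        current = totals.setdefault(metabolite, [0.0] * len(profile))
        totals[metabolite] = [t + a for t, a in zip(current, profile)]
    return totals


if __name__ == "__main__":
    # Invented abundances (two samples) for one peak group that holds more
    # than one metabolite, similar in kind to the 4th group in Data S3.
    profiles = {"p1": [10, 20], "p2": [5, 10], "p3": [1, 2], "p4": [7, 7]}
    assignment = {"p1": "citric acid", "p2": "citric acid",
                  "p3": "malic acid", "p4": None}
    print(accumulate_by_metabolite(profiles, assignment))
```

Summing over the whole group instead would fold malic acid (and the unassigned peak) into the citric acid total.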

ASSOCIATED CONTENT

Supporting Information
Additional information is noted in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
* E-mail: [email protected]. Fax: +886-2-2789-2910. Tel.: +886-2-2788-3799 ext. 1804.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
This work was supported by the National Science Council of Taiwan under grant NSC102-2319-B-010-002.

REFERENCES
(1) 2009 ASMS Conference Metabolomics Survey, http://metabolomics.us/subdomains/metabolomics/2009/ASMS/MetabolomicsWorkshop/SurveyResults/ASMS2009MetabolomicsSurveyDistributed.html
(2) Dunn, W. B., et al. Chemical Society Reviews 2011, 40, 387-426.
(3) Want, E. J., et al. Nature Protocols 2010, 5, 1005-18.
(4) Smith, C. A., et al. Therapeutic Drug Monitoring 2005, 27, 747-51.
(5) Cui, Q., et al. Nature Biotechnology 2008, 26, 162-4.
(6) Wishart, D. S., et al. Nucleic Acids Research 2013, 41, D801-7.
(7) Horai, H., et al. Journal of Mass Spectrometry 2010, 45, 703-14.
(8) Scheltema, R., et al. Bioanalysis 2009, 1, 1551-7.
(9) Brown, M., et al. The Analyst 2009, 134, 1322-32.
(10) Alonso, A., et al. Bioinformatics 2011, 27, 1339-40.
(11) Draper, J., et al. BMC Bioinformatics 2009, 10, 227.
(12) Lane, A. N., et al. Methods in Cell Biology 2008, 84, 541-88.
(13) Brown, M., et al. Bioinformatics 2011, 27, 1108-12.
(14) Zhou, B., et al. PLoS One 2012, 7, e40096.
(15) Kind, T.; Fiehn, O. BMC Bioinformatics 2007, 8, 105.
(16) Rogers, S., et al. Bioinformatics 2009, 25, 512-8.
(17) Zhou, B., et al. Proteomics 2013, 13, 248-60.
(18) Hagiwara, T., et al. Bioinformation 2010, 5, 255-8.
(19) Creek, D. J., et al. Analytical Chemistry 2011, 83, 8703-10.
(20) Kuhl, C., et al. Analytical Chemistry 2012, 84, 283-9.
(21) Gao, S., et al. Journal of Chromatography B, Analytical Technologies in the Biomedical and Life Sciences 2005, 825, 98-110.
(22) Keller, B. O., et al. Analytica Chimica Acta 2008, 627, 71-81.
(23) Benton, H. P., et al. Analytical Chemistry 2008, 80, 6382-9.
(24) Pluskal, T., et al. BMC Bioinformatics 2010, 11, 395.
(25) Waters Background Ion List, Waters Corporation, http://www.waters.com/webassets/cms/support/docs/bkgrnd_ion_mstr_list.pdf
(26) Weber, R. J. M., et al. Bioinformatics 2012, 28, 2856-2857.
(27) ChemSpider, Royal Society of Chemistry, http://www.chemspider.com/
(28) Dong, J., et al. Metabolomics 2010, 6, 478-488.
(29) Matsuda, F., et al. Plant Physiology 2010, 152, 566-78.
(30) Xia, J., et al. Nucleic Acids Research 2012, 40, W127-33.
(31) Lommen, A. Analytical Chemistry 2009, 81, 3079-86.
(32) Matsuda, F., et al. The Plant Journal 2009, 57, 555-77.
(33) Kornberg, H. L. Biochem Soc Symp 1987, 1-2.

Keywords: metabolite identification, LC-MS, quantitation result, characteristic fragment match, common substructure match

Figure 1. The proposed metabolite identification procedure for MS-based metabolomics studies using multiple types of correlated MS ion information. First, the peaks in the quantitation results are grouped and annotated in the Peak Grouping and Annotation stage. Then, in the Metabolite Identification stage, each annotated peak group is subjected to mass, spectrum, characteristic fragment, substructure, and IR matches to determine the most likely metabolite. The identifications are assigned confidence levels and reasoning based on the matches, and the results are then refined.
