Increasing Compound Identification Rates in Untargeted Lipidomics


Article

Increasing Compound Identification Rates in Untargeted Lipidomics Research with Liquid Chromatography Drift Time-Ion Mobility Mass Spectrometry
Ivana Blaženović, Tong Shen, Sajjan S. Mehta, Tobias Kind, Jian Ji, Marco Piparo, Francesco Cacciola, Luigi Mondello, and Oliver Fiehn
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b01527 • Publication Date (Web): 10 Aug 2018
Downloaded from http://pubs.acs.org on August 12, 2018


Analytical Chemistry

Increasing Compound Identification Rates in Untargeted Lipidomics Research with Liquid Chromatography Drift Time-Ion Mobility Mass Spectrometry

Ivana Blaženović1‡, Tong Shen1‡, Sajjan S. Mehta1‡, Tobias Kind1, Jian Ji1,2, Marco Piparo1,3, Francesco Cacciola4, Luigi Mondello3,5,6 and Oliver Fiehn1*

1 West Coast Metabolomics Center, UC Davis, Davis, CA, 95616, U.S.A.

2 School of Food Science, State Key Laboratory of Food Science and Technology, National Engineering Research Center for Functional Foods, School of Food Science Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Wuxi, Jiangsu 214122, China

3 Dipartimento di Scienze Chimiche, Biologiche, Farmaceutiche ed Ambientali, University of Messina-Polo Annunziata, Viale Annunziata, 98168, Messina, Italy

4 Dipartimento di Scienze Biomediche, Odontoiatriche e delle Immagini Morfologiche e Funzionali, University of Messina, Via Consolare Valeria, 98125 Messina, Italy

5 Chromaleont s.r.l., c/o Dipartimento di Scienze Chimiche, Biologiche, Farmaceutiche ed Ambientali, Polo Annunziata, University of Messina, viale Annunziata, 98168 Messina, Italy

6 Department of Medicine, University Campus Bio-Medico of Rome, Via Álvaro del Portillo 21, 00128 Rome, Italy

‡ These authors contributed equally.

KEYWORDS: Collision cross section (CCS); Ion mobility-mass spectrometry (IM-MS); Lipid; Lipidomics; Compound identification; Compound ID; Structure elucidation; Machine learning; Classification; Metabolomics

ABSTRACT: Unknown metabolites represent a bottleneck in untargeted metabolomics research. Ion mobility-mass spectrometry (IM-MS) facilitates lipid identification because it yields collision cross section (CCS) information that is independent of mass or lipophilicity. To date, only a few CCS values are publicly available for complex lipids such as phosphatidylcholines, sphingomyelins, or triacylglycerides. This scarcity of data limits the use of CCS values as an identification parameter that is orthogonal to mass, MS/MS, or retention time. A combination of lipid descriptors was used to train five different machine learning algorithms for automatic lipid annotations, combining accurate mass (m/z), retention time (RT), CCS values, carbon number, and unsaturation level. Using a training data set of 429 true positive lipid annotations from four lipid classes, 92.7% correct annotations overall were achieved using internal cross-validation. The trained prediction model was applied to an unknown milk lipidomics data set and allowed for class 3 level annotations of all features detected in this application set according to Metabolomics Standards Initiative (MSI) reporting guidelines.

INTRODUCTION

Lipids are molecules of great interest to the broad research community, as they are major components of cell membranes and play many vital roles in cells such as energy storage, cell signaling, and interactions with proteins.1-4 Dysregulation of lipid homeostasis, particularly of saturated fatty acids, has been associated with several major human diseases including obesity, diabetes, cardiovascular disease, and Alzheimer’s disease.5-8 In contrast, unsaturated fatty acids have been associated with reduced risks of these disorders.9,10 Unknown metabolites represent a major bottleneck in scientific research.
Computational chemists are developing software tools that assist in the structure elucidation process in order to increase compound identification rates in untargeted metabolomics.11-14 Despite continuous efforts, annotating the full lipidome is still impossible because there are few commercially available lipid standards for many lipid classes. Fortunately, many lipids fragment in a predictable manner in tandem mass spectrometry (MS/MS) experiments, enabling the creation of fragmentation-rule-based in silico lipid libraries for compound annotations. One such library is the LipidBlast database, in addition to other programs and tools for LC-MS/MS untargeted lipid annotation and quantification.15-19

Lipids are structurally diverse and can be divided into classes and subclasses depending on the head group, number and position of double bonds, and the composition of the acyl chains. Thus, powerful analytical platforms are needed in order to comprehensively separate and identify lipids in complex samples. Constant developments in liquid chromatography and mass spectrometry have accelerated the emerging field of lipidomics.20-22 Yet, in all untargeted lipidomics studies, many MS/MS spectra remain unresolved. In recent years, ion mobility-mass spectrometry (IM-MS) has been tested for a wide range of research fields.23-27 In addition to the well-established measurements of the mass-to-charge (m/z) ratio of molecular ions and their retention times on a chromatographic column, in drift time (DT) IM-MS ions are separated while they travel through a buffer gas under the influence of an electric field. The separation is affected by mass,


charge, and most prominently by shape (collision cross section, CCS), which provides another dimension for separating molecules.28-30 Both retention time (RT) and MS/MS descriptors in LC-MS/MS are affected by matrix effects (co-elution), fluctuations in solvent gradients, flow rates, temperature, electronic drift, and long-term drift.31,32 Tandem mass spectra are further affected by ion activation mode, precursor isolation window, acquisition speed, dynamic exclusion parameters, and monoisotopic precursor selection.33-35 Ion mobility is of interest for lipidomics because CCS measurements, unlike RT and MS/MS, have been shown to be reproducible across three laboratories with a relative standard deviation (RSD) of only 0.29% for 120 unique ion species.27 While additional orthogonal parameters such as CCS values increase identification rates, the number of compounds covered in CCS databases remains low, although it is increasing.36-41 The limitations of publicly available CCS databases can be overcome by using CCS values generated in silico with algorithms such as support vector regression (SVR).42 Software such as MetCCS Predictor is powerful and user-friendly, and can quickly and accurately predict the CCS values of metabolites when molecular descriptors are provided. The problem is that such descriptors can only be derived from already identified metabolites and have to be extensively validated for each new class of chemicals.

In the present work, we describe a platform developed to improve compound annotation of untargeted lipidomics drift time ion mobility-mass spectrometry data. We chose bovine milk fat as the matrix in this work because it is primarily rich in triglycerides, while other milk lipids include diacylglycerides, phospholipids, cholesterol, and free fatty acids (FFA).
It is also abundant in isomers and lipids with odd numbers of carbons, which are synthesized by bacteria in the rumen and add complexity.43 We developed a compound ID prediction model that utilizes the accurate mass, RT, and experimentally derived CCS values of identified compounds analyzed on DT-IM-MS in order to predict the lipid class, carbon number, and saturation number of the unknowns that make up the vast majority of any untargeted lipidomics data set.

EXPERIMENTAL SECTION

MATERIALS AND EXTRACTION

Bovine milk samples were obtained from the Dairy Barn at University of California, Davis. Aliquots of milk (20 µL) were thawed on ice and added to 225 µL of cold methanol, mixed, and shaken for 10 s. Subsequently, 750 µL of methyl tert-butyl ether (MTBE) was added with 10 min of continuous shaking. Last, 188 µL of LC-MS grade water was added and mixed by shaking. The suspension was centrifuged for 2 min at 14000 rcf, and 350 µL of the supernatant was transferred to a new tube. Samples were dried and resuspended in 110 µL of methanol:toluene (9:1, v/v) prior to LC-MS analysis.22 This extraction protocol extracts all main lipid classes in plasma with high recoveries, specifically phosphatidylcholines (PC), sphingomyelins (SM), phosphatidylethanolamines (PE), lysophosphatidylcholines (LPC), ceramides (Cer), cholesteryl esters (CholE), and triacylglycerols (TG).44 Lipid standards were purchased from Avanti Polar Lipids (Alabaster, USA).

INSTRUMENTATION

All measurements were carried out on an Agilent 6560 drift tube-ion mobility-Q-ToF (Agilent Technologies) coupled with an Agilent 1290 Infinity II LC system.45 A 1 µL volume of each diluted sample was separated on a Waters Acquity UPLC CSH C18 column (100 × 2.1 mm; 1.7 µm) coupled to an Acquity UPLC CSH C18 VanGuard precolumn (5 × 2.1 mm; 1.7 µm). The column was maintained at 65 °C at a flow rate of 0.6 mL/min. The mobile phases consisted of (A) acetonitrile:water (60:40, v/v) with ammonium formate (10 mM) and formic acid (0.1%) and (B) 2-propanol:acetonitrile (90:10, v/v) with ammonium formate (10 mM) and formic acid (0.1%). The separation was conducted under the following gradient: 0 min 15% B; 0−2 min 30% B; 2−2.5 min 48% B; 2.5−11 min 82% B; 11−11.5 min 99% B; 11.5−12 min 99% B; 12−12.1 min 15% B; 12.1−15 min 15% B. The QTOF MS instrument was operated with electrospray ionization (ESI) in positive mode with the following parameters: mass range, 120−1700 m/z; capillary voltage, 3.5 kV; nozzle voltage, 1 kV; gas temperature, 275 °C; drying gas (nitrogen), 8 L/min; nebulizer gas (nitrogen), 35 psi; sheath gas temperature, 325 °C; sheath gas flow (nitrogen), 11 L/min; fragmentor, 400 V; acquisition rate, 1 frame/s; total cycle time, 0.5 s. The ion trap fill time was 30 ms and its ion gate release time was 300 µs. The drift tube and ion funnels operated as follows: high pressure funnel RF, 100 V; trap funnel RF, 100 V; IM-drift tube entrance voltage, 1574 V; IM-drift tube exit voltage, 224 V; rear funnel entrance voltage, 217.5 V; rear funnel RF, 100 V; rear funnel exit voltage, 45 V. The IM drift gas pressure (N2) was maintained at ca. 4 Torr.

DATA PROCESSING AND IDENTIFICATION

The LC-IMS-MS data were analyzed with the Agilent IM-MS Browser B.07.01 and Mass Profiler B.08.00 for feature finding. Detailed parameter settings are listed in Table S1. Accurate masses, retention times, CCS values, and peak heights were then exported, and further analysis was performed in the R and Python programming languages. 2,086 features were found in the milk samples and were used for modeling.
Automated annotation of metabolites was performed with the help of an in-house library built from hundreds of authentic lipid standards of different classes, in addition to matching MS/MS spectra against the NIST14 and LipidBlast libraries.19,22

COMPUTATIONAL METHODS

To gain more insight into the experimental results and increase compound identification rates, several machine learning methods were utilized. An automated machine learning tool for supervised classification tasks, the Tree-based Pipeline Optimization Tool (TPOT), was briefly used on the ion mobility data.46,47 TPOT uses a broad range of supervised classification algorithms and transformers, and it optimizes parameters by genetic programming. We specifically utilized the k-nearest neighbors (KNN) Classifier from the scikit-learn machine learning package in Python.48 We tested several classification methods, namely the KNN Classifier, the Gaussian Naïve Bayes Classifier, and linear and radial basis function (RBF) kernel support vector machine algorithms, in order to find the best fit for lipidomics IM-MS data.49

RESULTS AND DISCUSSION

CREATION OF THE CCS DATABASE

First, we used the MetCCS software and its Metabolite Match function, which aims to provide an annotation for a query consisting of m/z and CCS. For the 27 metabolites that we queried (~6% of our training set) with 15 ppm tolerance for the m/z and 3% tolerance for the CCS matching, only 5 annotations were possible, leaving the other 22 without any result (see Table S1). The limited efficacy of the MetCCS software led us to create a new CCS database for lipidomic analyses. To this end, we analyzed bovine milk samples on UPLC DTIM-MS and processed the data with the vendor’s MassProfinder software. Raw data tables containing m/z, RT, DT,


and CCS values were exported. 429 lipids were then identified with our in-house m/z-RT library that was extensively validated by MS/MS spectrometry.45 In total, 429 lipids belonging to 4 lipid classes (with their CCS values) were identified in bovine milk: 364 triacylglycerides (TG), 25 diacylglycerides (DG), 27 phosphatidylcholines (PC), and 13 sphingomyelins (SM). This limited lipidome diversity and prevalence of TGs is consistent with the milk lipidome literature.43 146 identified lipids consisted of isomer pairs. Each of the lipid classes formed several adduct species including [Na]+, [NH4]+, [K]+, and [H]+ adducts. However, only a subset of these adducts was used for further CCS value modeling due to the lack of representatives in the training set for each adduct species per lipid class. We observed nominally identical lipids, such as TG 55:2, as both [Na]+ and [NH4]+ adducts that differed in CCS values. However, such lipids might still be structurally different, for example in the composition of their acyl groups, which would also cause CCS differences. Hence, such adducts should not be combined, and we separated each lipid class and modeled one specific adduct at a time. Figure 1 displays the relationships of the TGs and their descriptors: m/z, RT, and CCS. This adduct-specific modeling enabled us to perform predictive model building for fast lipid annotations. To restrict model complexity, we limited predictions to the positive ionization mode. Three adduct species were used for modeling: [M+Na]+ (n=98), [M+NH4]+ (n=296), and [M+H]+ (n=35). Each lipid class and adduct form was treated separately, and the data set was divided into training and validation sets in an 80:20 ratio. We used different descriptors for modeling lipid annotations for all lipids, including accurate mass (m/z), retention time (RT), CCS, carbon number, and unsaturation level. Lipids used for modeling in both training and validation data sets are shown in Table S2.
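The tolerance-based library query described at the start of this section (15 ppm on m/z, 3% on CCS) can be illustrated with a minimal sketch. The function name `match_library`, the `LibraryEntry` type, and all library and query values below are hypothetical placeholders; they are not taken from MetCCS or from the data of this study.

```python
from typing import NamedTuple


class LibraryEntry(NamedTuple):
    name: str
    mz: float   # m/z of the adduct ion
    ccs: float  # collision cross section in square angstroms


def match_library(mz, ccs, library, ppm_tol=15.0, ccs_tol_pct=3.0):
    """Return library entries within the m/z (ppm) and CCS (%) tolerances."""
    hits = []
    for entry in library:
        ppm_error = abs(mz - entry.mz) / entry.mz * 1e6
        ccs_error = abs(ccs - entry.ccs) / entry.ccs * 100.0
        if ppm_error <= ppm_tol and ccs_error <= ccs_tol_pct:
            hits.append(entry)
    return hits


# Hypothetical entries for illustration only (not measured values).
library = [
    LibraryEntry("lipid A", 760.5850, 285.0),
    LibraryEntry("lipid B", 760.5900, 310.0),
    LibraryEntry("lipid C", 782.5670, 288.0),
]

hits = match_library(760.5860, 287.0, library)
print([h.name for h in hits])
```

In this toy case the query agrees with "lipid B" in m/z but not in CCS, which shows how the second, orthogonal tolerance narrows down candidates that accurate mass alone cannot separate.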
The range of CCS values observed in Tables S1 and S2 indicates a high degree of structural diversity among lipid classes present in milk samples.50,51

MODELING CCS VALUES FOR COMPLEX LIPIDS

After obtaining CCS values for 429 unique lipid species, we used machine learning methods for metabolite prediction. Supervised machine learning methods can utilize experimental data to optimize models that are useful for classifying metabolomics data. We used scikit-learn and the TPOT pipeline optimizer with a minimal configuration to optimize hyperparameters and determine the best preprocessing and classification methods out of the KNN Classifier, Gaussian Naïve Bayes Classifier, and linear and radial basis function (RBF) kernels for support vector machines (SVM). TPOT was run over the entire specified data set to choose an optimal classifier. To avoid overfitting, we selected a model that yielded a high internal cross-validation score as well as strong average accuracy when bootstrapping over 5,000 iterations of randomized 80/20 training/test splits of the data set. After final evaluations, we found that the KNN classifiers consistently gave the best lipid annotations. Data scaling parameters were optimized for each set of input data and labels. We considered three different classification inputs (m/z and RT; m/z and CCS; m/z, RT, and CCS) with two outcomes (lipid class; lipid class with carbon number). While RT in combination with accurate mass is a strong parameter for classifying lipid classes with carbon numbers, with 84.05% accuracy (Table 1), CCS is clearly a stronger discriminator, boosting the correct classifications to 89.23%. Combining all three inputs further improves the classification accuracy to 91.78%, which is expected due to the separation of isomeric species by CCS. For annotating lipid classes alone, we found that our classifier reliably predicted the lipid class with 95.38% accuracy when utilizing the full classification input of m/z, RT, and CCS.
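The bootstrapped evaluation over randomized 80/20 training/test splits can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the authors' scripts: the function names are hypothetical, a simple 1-nearest-neighbor rule stands in for the TPOT-selected scikit-learn pipelines, the toy data are synthetic, and the default of 200 iterations is chosen only to keep the example fast (the study reports 5,000).

```python
import random


def nearest_neighbor_predict(train, query):
    """Label of the single closest training point (Manhattan distance)."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum(abs(a - b) for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def bootstrap_accuracy(data, n_iter=200, test_fraction=0.2, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(data) * test_fraction))
    scores = []
    for _ in range(n_iter):
        shuffled = data[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Synthetic (m/z, CCS) points for two well-separated classes; illustration only.
toy_rng = random.Random(1)
data = [((toy_rng.gauss(800, 20), toy_rng.gauss(290, 3)), "TG") for _ in range(40)]
data += [((toy_rng.gauss(600, 20), toy_rng.gauss(250, 3)), "DG") for _ in range(40)]

accuracy = bootstrap_accuracy(data)
print(f"mean accuracy over random 80/20 splits: {accuracy:.3f}")
```

Averaging over many random splits gives a score that depends less on any single lucky or unlucky partition than one fixed hold-out set would.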

To provide more complete lipid annotations that would include the degree of unsaturation, we developed a secondary prediction to be run on the primary lipid class classification. The first model provides the lipid class and carbon number information. Accurate mass information was subsequently used to predict the most likely number of double bonds by modeling total lipid masses with various adducts. We further modeled prediction accuracy by testing the impact of increased mass errors or of data sets that included different classes of lipids within similar mass and CCS ranges. Using a loop over C/H/O element ratios, we constructed a simple interpolation/extrapolation model for each lipid class in our training set to estimate the carbon number from the m/z value. Subsequently, the saturation number estimator was run on adjacent carbon number values to determine the most likely total match. To reduce false positive matches, we only considered predictions within 10 mDa of the observed m/z. The impact of mass errors (or new lipid classes with isobaric interference) was tested by applying a jitter within 10 ppm of the measured m/z values in our data set. Over 10,000 iterations using random jitters, we obtained an accurate prediction rate of 100% for the first saturation number estimator and 97.88% for the second full lipid description predictor. For example, an input of (TG, m/z 916.8314) yielded a correct prediction of TG (55:3) as the [M+NH4]+ adduct. Combining the two predictive models with the accuracy of the lipid class classifier gave a total lipid annotation accuracy of 93.3%. The interpolation of carbon numbers for TGs is shown in Figure S1.

The final, optimized lipid classification method was then applied to the blinded set of 2,087 bovine milk lipidomics features consisting of 1,658 unknown lipid features and the true positive set of 429 identified lipids.
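The mass-based estimate of carbon number and double bonds described in this section can be illustrated with a minimal sketch for triacylglycerols. It is a brute-force enumeration and not the authors' interpolation/extrapolation model: the function names, the searched carbon and double-bond ranges, and the adduct list are assumptions made for illustration, while the 10 mDa window and the example m/z come from the text. The sketch relies on the general TG sum formula C(n+3)H(2n+2−2d)O6 for n acyl carbons and d double bonds, and on standard monoisotopic masses.

```python
# Standard monoisotopic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "Na": 22.9897692809}
ELECTRON = 0.00054857990946

# Mass added to the neutral lipid for each positive-mode adduct.
ADDUCT_SHIFT = {
    "[M+H]+": MASS["H"] - ELECTRON,
    "[M+NH4]+": MASS["N"] + 4 * MASS["H"] - ELECTRON,
    "[M+Na]+": MASS["Na"] - ELECTRON,
}


def tg_neutral_mass(carbons, double_bonds):
    """Monoisotopic mass of TG carbons:double_bonds, C(n+3)H(2n+2-2d)O6."""
    n_c = carbons + 3
    n_h = 2 * carbons + 2 - 2 * double_bonds
    return n_c * MASS["C"] + n_h * MASS["H"] + 6 * MASS["O"]


def annotate_tg(mz, carbon_range=range(24, 73), max_double_bonds=12, tol_da=0.010):
    """Return (carbons, double_bonds, adduct, error_da) of the closest TG, or None."""
    best = None
    for carbons in carbon_range:                 # assumed search range
        for db in range(max_double_bonds + 1):   # assumed search range
            neutral = tg_neutral_mass(carbons, db)
            for adduct, shift in ADDUCT_SHIFT.items():
                error = abs(neutral + shift - mz)
                if error <= tol_da and (best is None or error < best[3]):
                    best = (carbons, db, adduct, error)
    return best


result = annotate_tg(916.8314)
print(result)
```

For the example input from the text, the only candidate inside the 10 mDa window is TG 55:3 as [M+NH4]+, about 1.4 mDa from the observed value. As a rough consistency check on the reported total accuracy, multiplying the three stage accuracies quoted above (1.00 × 0.9788 × 0.9538) gives about 0.934, in line with the reported 93.3%; whether the authors combined the values in exactly this way is an assumption.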
In addition to accurately classifying the set of identified lipids, this procedure also yielded class 3 level annotations, based on the compound annotation scheme of the Metabolomics Standards Initiative (MSI), for an additional 179 previously unknown lipids with an estimated accuracy of 95.38% using Classifier #6 (Table S1).52 This resulted in confident annotations of an additional 74 DGs, 8 PCs, and 97 TGs for the bovine milk lipidome.

COLLISIONAL CROSS SECTION CALIBRATION AND REPRODUCIBILITY

Since all of the calculations, modeling, and annotations rely heavily on the CCS values, we investigated their stability. We performed multiple intra-day and inter-day repeated measurements (Table 2). Calibration of drift times and CCS was carried out using a tune mix of known CCS provided by Agilent Technologies. A mass range of 60-1700 m/z was used to cover all masses of the bovine milk lipidome. Within the same day, CCS varied by 1 Ų, with a relative standard deviation (RSD) of 0.2%; among inter-day repeated measurements, CCS values varied at 0.7% RSD. These results are consistent with previously reported ranges and show that CCS is a reliable orthogonal parameter.27

COMPARISON WITH Q-TOF DATA

Odd- and branched-chain fatty acids (OBCFA) are highly specific components of complex milk lipids in cattle, goats, and animals with symbiotic fermentation sites of ingested food.53 While ion mobility provides an additional orthogonal parameter to metabolomics experiments, we used the same samples to determine lipidomics results in classic UPLC-QTOF MS/MS analysis without using drift time ion mobility-mass spectrometry. We compared


the results of accurate mass and MS/MS similarity matching, according to MSI level 2 annotations. An example of an annotated MS/MS spectrum for TG 49:1 [M+NH4]+ eluting at 10.806 min is given in Figure 2. Due to the complexity of milk triacylglycerol species, many MS/MS spectra for isobaric and isomeric species remained unresolved. In addition, high-throughput lipidomics chromatography (12-minute run times) did not separate all regio- and stereoisomers. Nevertheless, due to the specific neutral losses observed in product ion spectra, LipidBlast MS/MS matching annotated multiple compounds even for the same precursor ion (see Supporting Information), while regio- and stereoisomers usually remained ambiguous. To evaluate the advantage of using drift time ion mobility-mass spectrometry, we used m/z and RT values to model lipids detected by UPLC-QTOF MS/MS in the same way as given above. Results summarized in Table 3 demonstrate that the ion mobility data set (consisting of 429 identified peaks) provides significantly stronger predicting power than the same sample set analyzed on a Q-TOF with only 72 identified lipids (but 559 lipids if considering all low-quality MS/MS matches, including mixed spectra). When using these 72 confidently identified lipids as training data and testing the resulting TPOT model on the complement of 429 true positive lipid annotations (using the in-house lipidomic database), the best algorithm resulted in 92.31% accuracy in predicting the lipid class, but not the degree of unsaturation. In comparison, a recent investigation detected only 243 TG species in milk using UPLC-Orbitrap MS/MS detection.54

CONCLUSION

We provide here a lipidomic annotation database for lipids that span 300–1450 Da with CCS values ranging from 220 to 420 Ų. High-throughput UPLC-IM-QTOF MS measurements and a streamlined analysis workflow enabled the determination of CCS values for ions of 662 unique annotated lipid molecules.
Importantly, cross-validation yielded an accuracy of more than 92% for these annotations. Using ion mobility added peak capacity to further distinguish chromatographically unresolved lipids and detect distinct isomers with high confidence. In positive electrospray ionization, QTOF MS/MS spectra often lack the neutral losses needed to distinguish acyl group identities, but MSI level 3 modeling of lipid classes, carbon number, and degree of unsaturation will be very helpful in dairy nutritional studies.


These results will be used (1) to improve the confidence in lipid metabolite identification, (2) to annotate lipid classes based on CCS–m/z information, and (3) as benchmarks for the development of more general methods for calculating CCS values. The classification models and scripts are freely available at https://bitbucket.org/fiehnlab/ion-mobility-classification.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website.
Table S1. Lipidomics training data of 429 lipid species belonging to 4 lipid classes with experimentally derived CCS data
Table S2. Lipidomics test data set annotating 1838 at MSI class 3 level using m/z, retention time and CCS combined classifiers
Table S3. Settings used for ion mobility data processing with MassProfiler 8.0.
Figure S1. Carbon number interpolation for triacylglycerols.

AUTHOR INFORMATION

Corresponding Author

* Oliver Fiehn, PhD email: [email protected]

ORCID

Oliver Fiehn: 0000-0002-6261-8928

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. ‡These authors contributed equally.

ACKNOWLEDGMENT

We are thankful to John Fjeldsted and Jose Mesa from Agilent Technologies for the guidance and support provided for this research. The authors would like to acknowledge the University of Messina for support through the “Research and Mobility” collaborative project. We are thankful to Jessica Kwok for revision and linguistic editing efforts.


Table 1. Results of classification systems by utilizing TPOT with a minimal configuration to optimize hyperparameters and determine the best scaling and classification methods out of: KNN Classifier, Gaussian Naive Bayes Classifier, Linear & RBF SVM. KNN (N=2, p=2, w=dist) means that the K Nearest Neighbors was used over the data, the nearest N = 2 points were looked at, the distance function was the Minkowski metric with parameter p =2, and points were weighted by inverse distance.

Classifier | Input | Annotation | Algorithm | Bootstrapping Score
Classifier #1 | m/z, RT | Lipid Class + Carbon Number | StandardScaler + KNN (n=3, p=1, weights=distance) | 84.05%
Classifier #2 | m/z, RT | Lipid Class | KNN (n=7, p=2, weights=distance) | 93.51%
Classifier #3 | m/z, CCS | Lipid Class + Carbon Number | KNN (n=3, p=1, weights=distance) | 89.23%
Classifier #4 | m/z, CCS | Lipid Class | StandardScaler + KNN (n=3, p=2, weights=distance) | 94.56%
Classifier #5 | m/z, RT, CCS | Lipid Class + Carbon Number | KNN (n=1, p=1, weights=distance) | 91.78%
Classifier #6 | m/z, RT, CCS | Lipid Class | StandardScaler + KNN (n=13, p=1, weights=distance) | 95.38%
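The notation used in Table 1 (standard scaling, n nearest neighbors, Minkowski parameter p, inverse-distance weights) can be made concrete with a small dependency-free sketch. The study itself used the KNN classifier from scikit-learn; the functions below are hypothetical stand-ins written for illustration, and the (m/z, RT, CCS) training points are invented values, not data from this work.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_standard_scaler(rows):
    """Per-column (mean, std) for zero-mean, unit-variance scaling."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def scale(row, params):
    return tuple((value - mu) / sigma for value, (mu, sigma) in zip(row, params))


def knn_predict(train_x, train_y, query, n_neighbors=3, p=1):
    """Inverse-distance-weighted vote among the n nearest neighbors."""
    distances = []
    for x, label in zip(train_x, train_y):
        # Minkowski distance: p=1 is Manhattan, p=2 is Euclidean.
        dist = sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p)
        distances.append((dist, label))
    distances.sort(key=lambda item: item[0])
    votes = defaultdict(float)
    for dist, label in distances[:n_neighbors]:
        if dist == 0.0:
            return label  # an exact match decides the vote
        votes[label] += 1.0 / dist
    return max(votes, key=votes.get)


# Invented (m/z, RT in min, CCS in square angstroms) points; illustration only.
raw_x = [(848.77, 10.2, 310.0), (874.79, 10.6, 315.0), (902.82, 11.0, 321.0),
         (612.56, 7.1, 262.0), (640.59, 7.6, 268.0), (668.62, 8.0, 273.0),
         (760.59, 5.4, 284.0), (786.60, 5.6, 288.0), (788.62, 5.9, 290.0)]
labels = ["TG", "TG", "TG", "DG", "DG", "DG", "PC", "PC", "PC"]

params = fit_standard_scaler(raw_x)
train_x = [scale(row, params) for row in raw_x]

pred_a = knn_predict(train_x, labels, scale((876.80, 10.7, 316.0), params))
pred_b = knn_predict(train_x, labels, scale((638.57, 7.5, 267.0), params))
print(pred_a, pred_b)
```

Scaling matters here because m/z spans hundreds of units while RT spans only a few minutes; without it the distance would be dominated by m/z and the other two descriptors would contribute little.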

Table 2. Measurements of collision cross section reproducibility

Comparison | Lipid | DT Max∆ (ms) | DT Dif% | DT RSD | Ω Max∆ (Ų) | Ω Dif% | Ω RSD
Inter-day | DG 24:0 | 0.34 | 0.8% | 0.4% | 3.0 | 1.4% | 0.6%
Inter-day | DG 20:1 | 0.37 | 1.0% | 0.4% | 2.7 | 1.3% | 0.6%
Inter-day | LPE 17:1 | 0.34 | 0.8% | 0.3% | 3.7 | 1.7% | 0.7%
Intra-day | DG 24:0 | 0.03 | 0.1% | 0.0% | 0.2 | 0.1% | 0.0%
Intra-day | DG 20:1 | 0.05 | 0.1% | 0.1% | 0.2 | 0.1% | 0.0%
Intra-day | LPE 17:1 | 0.17 | 0.4% | 0.2% | 0.9 | 0.4% | 0.2%
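The reproducibility metrics reported in Table 2 can be computed from repeated measurements as in the sketch below. The definitions are assumptions made for illustration: Max∆ is taken as the range of the replicates, Dif% as that range relative to the mean, and RSD as the sample standard deviation relative to the mean. The replicate CCS values are hypothetical and are not measurements from this study.

```python
from statistics import mean, stdev


def reproducibility(values):
    """Range, relative range (%), and relative standard deviation (%)."""
    avg = mean(values)
    max_delta = max(values) - min(values)
    return {
        "max_delta": max_delta,
        "diff_pct": max_delta / avg * 100.0,
        "rsd_pct": stdev(values) / avg * 100.0,  # sample standard deviation
    }


# Hypothetical repeated CCS measurements (square angstroms) of one lipid.
ccs_replicates = [213.0, 214.0, 215.0, 216.0]
stats = reproducibility(ccs_replicates)
print({key: round(value, 2) for key, value in stats.items()})
```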


Table 3. Results of classification systems comparing identified ion mobility peaks with equivalent QTOF peaks. The IM classifier refers to the optimized classification model determined from the ion mobility training data in Table S2. The QTOF classifier was determined analogously utilizing TPOT and bootstrapping on the 72 identified peaks in the QTOF data. The results demonstrate that the larger ion mobility data set (429 identified peaks) provides significantly stronger predicting power.

Input Data | Classifier | Outcome | Bootstrapping Score
IM m/z, RT | IM Classifier: MinMaxScaler + KNN (n=7, p=2, weights=distance) | Lipid Class | 98.61%
QTOF m/z, RT | IM Classifier: MinMaxScaler + KNN (n=7, p=2, weights=distance) | Lipid Class | 92.31%
QTOF m/z, RT | QTOF Classifier: PCA(power=3) + GaussianNaiveBayes() | Lipid Class | 90.21%

Figure 1. Selected triacylglycerides found in milk and comparison of the measured orthogonal parameters: (A) m/z vs RT, (B) CCS vs m/z, and (C) CCS vs RT


Figure 2. Comparison of unresolved MS/MS spectra: 3 co-eluting peaks and their mixed spectra that could not be resolved on a QTOF.


Table of Contents artwork

(1) Wymann, M. P.; Schneiter, R. Lipid signalling in disease. Nature reviews. Molecular cell biology 2008, 9, 162-176.
(2) Hong, H. Role of Lipids in Folding, Misfolding and Function of Integral Membrane Proteins. Advances in experimental medicine and biology 2015, 855, 1-31.
(3) Vuorela, T.; Catte, A.; Niemela, P. S.; Hall, A.; Hyvonen, M. T.; Marrink, S. J.; Karttunen, M.; Vattulainen, I. Role of lipids in spheroidal high density lipoproteins. PLoS computational biology 2010, 6, e1000964.
(4) Barrera, N. P.; Zhou, M.; Robinson, C. V. The role of lipids in defining membrane protein interactions: insights from mass spectrometry. Trends in cell biology 2013, 23, 1-8.
(5) Chait, A.; Eckel, R. H. Lipids, Lipoproteins, and Cardiovascular Disease: Clinical Pharmacology Now and in the Future. The Journal of clinical endocrinology and metabolism 2016, 101, 804-814.
(6) Albrink, M. J. Serum lipids, diet, and cardiovascular disease. Postgraduate medicine 1974, 55, 87-92.
(7) Krauss, R. M. Lipids and lipoproteins in patients with type 2 diabetes. Diabetes care 2004, 27, 1496-1504.
(8) Lim, W. L.; Martins, I. J.; Martins, R. N. The involvement of lipids in Alzheimer's disease. Journal of genetics and genomics = Yi chuan xue bao 2014, 41, 261-274.
(9) Simopoulos, A. P. Omega-3 fatty acids in health and disease and in growth and development. The American journal of clinical nutrition 1991, 54, 438-463.
(10) Simopoulos, A. P. The importance of the ratio of omega-6/omega-3 essential fatty acids. Biomedicine & pharmacotherapy = Biomedecine & pharmacotherapie 2002, 56, 365-379.
(11) Blazenovic, I.; Kind, T.; Torbasinovic, H.; Obrenovic, S.; Mehta, S. S.; Tsugawa, H.; Wermuth, T.; Schauer, N.; Jahn, M.; Biedendieck, R.; Jahn, D.; Fiehn, O. Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy. Journal of cheminformatics 2017, 9, 32.
(12) Wenk, M. R. Lipidomics: new tools and applications. Cell 2010, 143, 888-895.
(13) Koelmel, J. P.; Kroeger, N. M.; Ulmer, C. Z.; Bowden, J. A.; Patterson, R. E.; Cochran, J. A.; Beecher, C. W. W.; Garrett, T. J.; Yost, R. A. LipidMatch: an automated workflow for rule-based lipid identification using untargeted high-resolution tandem mass spectrometry data. BMC bioinformatics 2017, 18, 331.
(14) Yang, K.; Cheng, H.; Gross, R. W.; Han, X. Automated lipid identification and quantification by multidimensional mass spectrometry-based shotgun lipidomics. Analytical chemistry 2009, 81, 4356-4368.
(15) Herzog, R.; Schuhmann, K.; Schwudke, D.; Sampaio, J. L.; Bornstein, S. R.; Schroeder, M.; Shevchenko, A. LipidXplorer: a software for consensual cross-platform lipidomics. PloS one 2012, 7, e29851.
(16) Sud, M.; Fahy, E.; Subramaniam, S. Template-based combinatorial enumeration of virtual compound libraries for lipids. Journal of cheminformatics 2012, 4, 23.
(17) Misra, B. B. New tools and resources in metabolomics: 2016-2017. Electrophoresis 2018, 39, 909-923.
(18) Kyle, J. E.; Crowell, K. L.; Casey, C. P.; Fujimoto, G. M.; Kim, S.; Dautel, S. E.; Smith, R. D.; Payne, S. H.; Metz, T. O. LIQUID: an-open source software for identifying lipids in LC-MS/MS-based lipidomics data. Bioinformatics 2017, 33, 1744-1746.
(19) Kind, T.; Liu, K. H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nature methods 2013, 10, 755-758.
(20) Zhou, Z.; Tu, J.; Xiong, X.; Shen, X.; Zhu, Z. J. LipidCCS: Prediction of Collision Cross-Section Values for Lipids with High Precision to Support Ion Mobility-Mass Spectrometry based Lipidomics. Analytical chemistry 2017, 89, 9559-9566.
(21) Sales, S.; Knittelfelder, O.; Shevchenko, A. Lipidomics of Human Blood Plasma by High-Resolution Shotgun Mass Spectrometry. Methods in molecular biology 2017, 1619, 203-212.
(22) Cajka, T.; Fiehn, O. LC-MS-Based Lipidomics and Automated Identification of Lipids Using the LipidBlast In-Silico MS/MS Library. Methods in molecular biology 2017, 1609, 149-170.
(23) Paglia, G.; Astarita, G. Metabolomics and lipidomics using traveling-wave ion mobility mass spectrometry. Nature protocols 2017, 12, 797-813.
(24) Ibrahim, Y. M.; Garimella, S. V.; Prost, S. A.; Wojcik, R.; Norheim, R. V.; Baker, E. S.; Rusyn, I.; Smith, R. D. Development of an Ion Mobility Spectrometry-Orbitrap Mass Spectrometer Platform. Analytical chemistry 2016, 88, 12152-12160.
(25) Gonzalez-Mendez, R.; Watts, P.; Howse, D. C.; Procino, I.; McIntyre, H.; Mayhew, C. A. Ion Mobility Studies on the Negative Ion-Molecule Chemistry of Isoflurane and Enflurane. Journal of the American Society for Mass Spectrometry 2017, 28, 939-946.
(26) Berry, K. A.; Barkley, R. M.; Berry, J. J.; Hankin, J. A.; Hoyes, E.; Brown, J. M.; Murphy, R. C. Tandem Mass Spectrometry in Combination with Product Ion Mobility for the Identification of Phospholipids. Analytical chemistry 2017, 89, 916-921.
(27) Stow, S. M.; Causon, T.; Zheng, X.; Kurulugama, R. T.; Mairinger, T.; May, J. C.; Rennie, E. E.; Baker, E. S.; Smith, R. D.; McLean, J. A.; Hann, S.; Fjeldsted, J. C. An Interlaboratory Evaluation of Drift Tube Ion Mobility-Mass Spectrometry Collision Cross Section Measurements. Analytical chemistry 2017, 89, 9048-9055.
(28) Stach, J.; Baumbach, J. Ion mobility spectrometry-basic elements and applications. Int J Ion Mobility Spectrom 2002, 5, 1-21.
(29) Kanu, A. B.; Dwivedi, P.; Tam, M.; Matz, L.; Hill, H. H., Jr. Ion mobility-mass spectrometry. Journal of mass spectrometry : JMS 2008, 43, 1-22.
(30) Mason, E. A.; Schamp, H. W. Mobility of gaseous ions in weak electric fields. Annals of Physics 1958, 4, 233-270.


(31) Beyaz, A.; Fan, W.; Carr, P. W.; Schellinger, A. P. Instrument parameters controlling retention precision in gradient elution reversed-phase liquid chromatography. Journal of chromatography. A 2014, 1371, 90-105.
(32) Marchand, D. H.; Williams, L. A.; Dolan, J. W.; Snyder, L. R. Slow equilibration of reversed-phase columns for the separation of ionized solutes. Journal of chromatography. A 2003, 1015, 53-64.
(33) Kind, T.; Tsugawa, H.; Cajka, T.; Ma, Y.; Lai, Z.; Mehta, S. S.; Wohlgemuth, G.; Barupal, D. K.; Showalter, M. R.; Arita, M.; Fiehn, O. Identification of small molecules using accurate mass MS/MS search. Mass spectrometry reviews 2017, 37, 513-532.
(34) Cotter, R. J. High energy collisions on tandem time-of-flight mass spectrometers. Journal of the American Society for Mass Spectrometry 2013, 24, 657-674.
(35) Kandiah, M.; Urban, P. L. Advances in ultrasensitive mass spectrometry of organic molecules. Chemical Society reviews 2013, 42, 5299-5322.
(36) Stephan, S.; Hippler, J.; Kohler, T.; Deeb, A. A.; Schmidt, T. C.; Schmitz, O. J. Contaminant screening of wastewater with HPLC-IM-qTOF-MS and LC+LC-IM-qTOF-MS using a CCS database. Analytical and bioanalytical chemistry 2016, 408, 6545-6555.
(37) Hines, K. M.; Herron, J.; Xu, L. Assessment of altered lipid homeostasis by HILIC-ion mobility-mass spectrometry-based lipidomics. Journal of lipid research 2017, 58, 809-819.
(38) Paglia, G.; Astarita, G. Metabolomics and lipidomics using traveling-wave ion mobility mass spectrometry. Nature protocols 2017, 12, 797-813.
(39) May, J. C.; Goodwin, C. R.; Lareau, N. M.; Leaptrot, K. L.; Morris, C. B.; Kurulugama, R. T.; Mordehai, A.; Klein, C.; Barry, W.; Darland, E.; Overney, G.; Imatani, K.; Stafford, G. C.; Fjeldsted, J. C.; McLean, J. A. Conformational ordering of biomolecules in the gas phase: nitrogen collision cross sections measured on a prototype high resolution drift tube ion mobility-mass spectrometer. Analytical chemistry 2014, 86, 2107-2116.
(40) Zheng, X.; Aly, N. A.; Zhou, Y.; Dupuis, K. T.; Bilbao, A.; Paurus, V. L.; Orton, D. J.; Wilson, R.; Payne, S. H.; Smith, R. D.; Baker, E. S. A structural examination and collision cross section database for over 500 metabolites and xenobiotics using drift tube ion mobility spectrometry. Chem Sci 2017, 8, 7724-7736.
(41) Paglia, G.; Kliman, M.; Claude, E.; Geromanos, S.; Astarita, G. Applications of ion-mobility mass spectrometry for lipid analysis. Analytical and bioanalytical chemistry 2015, 407, 4995-5007.
(42) Zhou, Z.; Xiong, X.; Zhu, Z. J. MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility-mass spectrometry based metabolomics. Bioinformatics 2017, 33, 2235-2237.
(43) Mansson, H. L. Fatty acids in bovine milk fat. Food & nutrition research 2008, 52.
(44) Matyash, V.; Liebisch, G.; Kurzchalia, T. V.; Shevchenko, A.; Schwudke, D. Lipid extraction by methyl-tert-butyl ether for high-throughput lipidomics. J Lipid Res 2008, 49, 1137-1146.
(45) Cajka, T.; Smilowitz, J. T.; Fiehn, O. Validating Quantitative Untargeted Lipidomics Across Nine Liquid Chromatography-High-Resolution Mass Spectrometry Platforms. Analytical chemistry 2017, 89, 12360-12368.
(46) Olson, R. S.; Urbanowicz, R. J.; Andrews, P. C.; Lavender, N. A.; Kidd, L. C.; Moore, J. H. Automating Biomedical Data Science Through Tree-Based Pipeline Optimization. In Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 - April 1, 2016, Proceedings, Part I; Squillero, G., Burelli, P., Eds.; Springer International Publishing: Cham, 2016; pp 123-137.
(47) Olson, R. S.; Bartley, N.; Urbanowicz, R. J.; Moore, J. H. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. In Proceedings of the Genetic and Evolutionary Computation Conference 2016; ACM: Denver, Colorado, USA, 2016; pp 485-492.
(48) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825-2830.
(49) Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning 1995, 20, 273-297.
(50) Fenn, L. S.; Kliman, M.; Mahsut, A.; Zhao, S. R.; McLean, J. A. Characterizing ion mobility-mass spectrometry conformation space for the analysis of complex biological samples. Analytical and bioanalytical chemistry 2009, 394, 235-244.
(51) Hines, K. M.; Ross, D. H.; Davidson, K. L.; Bush, M. F.; Xu, L. Large-Scale Structural Characterization of Drug and Drug-Like Compounds by High-Throughput Ion Mobility-Mass Spectrometry. Analytical chemistry 2017, 89, 9023-9030.
(52) Salek, R. M.; Steinbeck, C.; Viant, M. R.; Goodacre, R.; Dunn, W. B. The role of reporting standards for metabolite annotation and identification in metabolomic studies. GigaScience 2013, 2, 13.
(53) Vlaeminck, B.; Fievez, V.; Cabrita, A. R. J.; Fonseca, A. J. M.; Dewhurst, R. J. Factors affecting odd- and branched-chain fatty acids in milk: A review. Animal Feed Science and Technology 2006, 131, 389-417.
(54) Liu, Z.; Wang, J.; Cocks, B. G.; Rochfort, S. Seasonal Variation of Triacylglycerol Profile of Bovine Milk. Metabolites 2017, 7.
