LipidMS: An R Package for Lipid Annotation in Untargeted Liquid

Article Cite This: Anal. Chem. 2019, 91, 836−845

pubs.acs.org/ac

LipidMS: An R Package for Lipid Annotation in Untargeted Liquid Chromatography-Data Independent Acquisition-Mass Spectrometry Lipidomics

María Isabel Alcoriza-Balaguer,†,# Juan Carlos García-Cañaveras,†,# Adrián López,† Isabel Conde,‡ Oscar Juan,†,§ Julián Carretero,∥ and Agustín Lahoz*,†




†Biomarkers and Precision Medicine Unit and Analytical Unit, Instituto de Investigación Sanitaria Fundación Hospital La Fe, Valencia 46026, Spain
‡Hepatology Unit, Department of Digestive Medicine and §Department of Medical Oncology, Hospital Universitari i Politècnic La Fe, Valencia 46026, Spain
∥Department of Physiology, University of Valencia, Burjassot 4100, Spain
*S Supporting Information

ABSTRACT: High resolution LC-MS untargeted lipidomics using data independent acquisition (DIA) has the potential to increase lipidome coverage, as it enables the continuous and unbiased acquisition of all eluting ions. However, the loss of the link between the precursor and the product ions, combined with the high dimensionality of DIA data sets, hinders accurate feature annotation. Here, we present LipidMS, an R package aimed at confidently identifying lipid species in untargeted LC-DIA-MS. To this end, LipidMS combines a coelution score, which links precursor and fragment ions, with fragmentation and intensity rules. Depending on the MS evidence reached by the identification function survey, LipidMS provides three levels of structural annotation: (i) “subclass level”, e.g., PG(34:1); (ii) “fatty acyl level”, e.g., PG(16:0_18:1); and (iii) “fatty acyl position level”, e.g., PG(16:0/18:1). The comparison of LipidMS with freely available data dependent acquisition (DDA) and DIA identification tools showed that LipidMS provides significantly more accurate and structurally informative lipid identifications. Finally, to exemplify the utility of LipidMS, we investigated the lipidomic serum profile of patients diagnosed with nonalcoholic steatohepatitis (NASH), which is the progressive form of nonalcoholic fatty liver disease, a disorder characterized by a strong lipid dysregulation. As previously published, a significant decrease in lysophosphatidylcholines, phosphatidylcholines, and cholesterol esters and an increase in phosphatidylethanolamines were observed in NASH patients. Remarkably, LipidMS allowed the identification of a new set of lipids that may be used for NASH diagnosis. Altogether, LipidMS has been validated as a tool to assist lipid identification in the LC-DIA-MS untargeted analysis of complex biological samples.

© 2018 American Chemical Society

Lipidomics can be understood as the systems-level analysis of lipids and their interacting partners.1 More concretely, from an analytical point of view, it can be defined as the determination of the complete set of lipids (lipidome) present in a given biological sample (e.g., cell, tissue, biofluid, or organism). Lipids are a heterogeneous group of metabolites involved in many biological functions as intermediates or products in signaling pathways, structural components of cell membranes, and energy storage sources, among others.1 Alterations in general lipid profiles and in particular lipid species have been identified in many diseases including cancer,2,3 nonalcoholic fatty liver disease,4,5 diabetes,6 heart disease,7 and neurological diseases.8 From a quantitative point of view, lipids represent 60−70% of all detected and identified metabolites in the human serum metabolome9 and 20% of the human urine metabolome.10 According to the LIPID MAPS Consortium, lipids are classified into eight classes.11,12 In general, lipids can be described as a combination of various building blocks, usually a core structure that defines their class (e.g., glycerol, sphingoid bases, and cholesterol) and subclass (e.g., polar head groups of phospholipids such as phosphocholine and phosphoethanolamine) and a variable number of fatty acyl chains (FA) attached to that core structure13 (Figure S1). As a result of the different structural arrangements of the FA on the core structures, isobaric lipids (e.g., PC(18:1/18:1) vs PC(18:0/18:2)) and isomeric lipids (e.g., PC(16:0/20:4) vs PC(20:4/16:0)) can be found, hindering their actual identification. Liquid chromatography (LC) coupled to mass spectrometry (MS) is a powerful tool, which enables the comprehensive

Received: July 30, 2018
Accepted: November 30, 2018
Published: November 30, 2018

DOI: 10.1021/acs.analchem.8b03409 Anal. Chem. 2019, 91, 836−845


Analytical Chemistry

Figure 1. Simplified diagram of LipidMS operations.

lipid characterization of biological samples.14 Lipid identification in untargeted MS-based lipidomics usually relies on the combined acquisition of full MS, which provides information about the nominal mass and formula of the lipids, and MS/MS data, which allows identification of the building blocks that compose them.13,14 The most common procedure for the acquisition of MS/MS spectra is to perform data dependent acquisition (DDA), in which ions (lipids) of interest are isolated and then fragmented to obtain their corresponding MS/MS spectra.15 MS data independent acquisition (DIA) is an alternative to DDA, in which no ion isolation is performed, and all the ions that elute at a given time are fragmented and detected jointly; thus, MS/MS information is obtained for all the eluting compounds. However, the management of DIA data sets is not trivial, and it is even more complicated in the case of lipids, where, apart from the coelution of parent and fragment ions, their building block nature generates a number of fragments that are common to several lipid species and that usually are not chromatographically well-resolved (Figure S2). On top of that, the lack of a comprehensive collection of purified, well-characterized lipid standards forces lipid identification to be based on the combination of MS and MS/MS data with the only additional support of known fragmentation rules.16 A number of software tools have been developed for the identification of lipids using DDA: MS-DIAL,17 Greazy,18 LipiDex,19 LDA,20 or the use of the LipidBlast in silico database16 searched via the NIST MS Search program. However, only a few freely available tools are designed for DIA-based lipid identification: MS-DIAL,17 Lipid-Pro,21 and LipidMatch,22 with MS-DIAL being the most used one (based on the number of citations reported by Google Scholar). Here, we present LipidMS, an R package (https://CRAN.R-project.org/package=LipidMS) for lipid annotation in LC-


specific rules for each polarity. For further details, the reader is referred to the package manual (https://CRAN.R-project.org/package=LipidMS).

Data Conversion. Lipid identification functions within LipidMS require a separate peak list for each collision energy used (e.g., MS1, MS2low, and MS2high) as input, whereas peak picking tools usually handle only MS1 data as input. Therefore, it is mandatory to convert complex DIA-MS data into a format that can be used for peak picking. To convert raw data into mzXML, MSConvert software (ProteoWizard 3.0.10800 64 bit)26 can be used. The procedure to extract a file for each collision energy from the raw data is instrument dependent. Here, two independent Q-ToF platforms have been used (i.e., a Waters Synapt G2-Si Q-ToF and an Agilent Q-ToF 6550). For the Waters instrument, the raw archive file contains three different data acquisition functions (i.e., MS1, MS2, and lockspray). Lockspray files must be removed, and the other functions have to be separated by collision energy and then converted into .mzXML files. For the Agilent instrument, in contrast, the .d raw data archive is directly converted into a single mzXML file and subsequently separated into collision-energy-independent files using the LipidMS sepByCE function. Figure 1 shows the recommended data-processing workflow for LipidMS.

Peak Detection and Alignment. Data preprocessing (i.e., peak picking, deisotoping, and alignment) can be performed using either free software packages such as MZmine,27 XCMS,28 and enviPick (https://CRAN.R-project.org/package=enviPick) or commercial software packages such as Progenesis QI or MassHunter Workstation. LipidMS includes a function that takes advantage of enviPick for peak picking and of CAMERA29 for alignment and deisotoping, which is strongly recommended for performing data processing. Moreover, the use of the LipidMS dataProcessing function is the easiest way to get the data inputs required for using the PFCS to complement tR windows for the association of parents and fragments.

Lipid Identification. LipidMS allows efficient annotation of lipids within a wide range of concentrations (Figure S3). However, as a general rule, the use of saturated signals for lipid identification should be avoided, as saturation deteriorates both mass accuracy and peak shape, thus hampering feature annotation. Lipid identification is performed separately for data from positive and negative ESI modes through the idPOS and idNEG functions, respectively. Nevertheless, specific lipid classes can be identified alone by using class-defined functions (Table S1). The implementation of LipidMS within a lipidomics workflow is depicted in Figure S4. DIA data used to test LipidMS performance and an example script can be found at GitHub (https://github.com/maialba3/LipidMS-data-v1.0).

Samples Included in the Study. Test Samples. Two test samples were used to evaluate the performance of LipidMS. These samples were prepared by spiking a mixture containing 50 lipid standards into a blank sample or a pooled human serum sample (Sigma-Aldrich, Madrid, Spain). These lipid standards were selected according to their biological relevance, their representativeness of lipid classes, and their analytical relevance; to this end, isobaric/isomeric species were also included (Table S2).

Serum Samples from Patients with NAFLD. Patients diagnosed with NAFLD at the Liver Transplantation and Hepatology Unit at the Hospital La Fe (Valencia) were enrolled in this study. NAFLD diagnosis was performed by histological examination of liver biopsy specimens. NAFLD was assessed by using the NAFLD activity score (NAS).30 A

DIA-MS. LipidMS calculates a precursor and fragment coelution score (PFCS) for those ions present in a predefined retention time (tR) window and then applies a set of fragmentation and fragment intensity rules to annotate lipids. Moreover, LipidMS accepts either .csv files, for already preprocessed data sets, or .mzXML files, the common file format for MS data, as input; thus, it is compatible with multiple mass spectrometer vendors. To assess LipidMS performance, it was first used to process LC-DIA-MS data from two test samples (i.e., a standard sample and a pooled human serum sample). These samples were prepared by adding a mixture of 50 representative lipid standards and were then analyzed using two mass spectrometers (i.e., Agilent Q-ToF 6550 and Waters Synapt G2-Si Q-ToF). LipidMS was also compared with existing DDA and DIA tools.17 Finally, to exemplify the package utility in a biological context, LipidMS was applied in the lipidomic analysis of serum samples from patients diagnosed with nonalcoholic steatohepatitis (NASH), which is the progressive form of nonalcoholic fatty liver disease (NAFLD), a disorder characterized by a strong lipid dysregulation. NAFLD and NASH have been extensively studied by metabolomics and lipidomics approaches, and specific lipid patterns have been proposed as diagnostic and prognostic biomarker signatures.4,23,24 Our results not only confirm previously published lipid-related markers but also provide a new set of lipids that are now proposed as a lipid-based NASH biomarker signature.



EXPERIMENTAL SECTION

Other experimental details about chemicals, lipidome extraction, labeling techniques, LC-MS settings, and data-processing parameters are provided in the Supporting Information.

LipidMS Processing Workflow. LipidMS was developed in an R programming environment,25 and it is available via CRAN (https://CRAN.R-project.org/package=LipidMS). LipidMS includes dedicated functions for MS data processing, lipid identification, data import, export of lipid annotations, database customization, and creation of an inclusion list for targeted MS analysis (Table S1).

Format Requirement for Lipid Annotation Functions. LipidMS identification functions require two data inputs: (i) a peak table for MS1 and one or two peak tables for MS2, depending on the number of collision energies used, and (ii) one raw data table for MS1 and one or two raw data tables for MS2, depending on the number of collision energies used. The peak tables are mandatory and are used for identification, while the raw data tables are optional and only used for the calculation of the PFCS. If the raw data tables are not used, the association between parent and fragment ions will be based exclusively on tR windows. Both types of tables are obtained from mzXML files when the dataProcessing function is employed. The peak tables must contain deisotoped and tR-aligned peaks. Formally, they have to be stored as data frames containing, at least, the following columns: m/z, tR (in seconds), intensity/area, and peak identification (PeakID column). The raw data tables provide scan-by-scan information on each MS or MS/MS data file and have to contain the following columns: m/z, tR (in seconds), intensity/area, peakID, and scan number. These tables can be easily obtained by performing data processing with LipidMS, although other approaches can also be used. Data acquired in positive and negative electrospray ionization (ESI) modes have to be provided separately, as lipid identification functions apply


Figure 2. Flow diagram of lipid annotation in LipidMS. The steps for the identification of 747.5177 m/z with a tR of 285 s are shown as an example.
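The mass and retention-time tolerances that drive the example in Figure 2 can be illustrated numerically. The following is a minimal Python sketch, not LipidMS code: it assumes the default 10 ppm mass error and 10 s tR value described for the annotation workflow, and it treats the tR value as a symmetric half-width, which is an assumption of this sketch. The function names `ppm_window` and `matches` and the three example peaks are hypothetical.

```python
def ppm_window(mz, ppm):
    """Return (low, high) m/z bounds for a symmetric tolerance given in ppm."""
    half_width = mz * ppm / 1e6
    return mz - half_width, mz + half_width


def matches(peak, target_mz, target_rt, ppm=10.0, rt_tol=10.0):
    """True if a peak lies inside both the m/z and the retention-time window.

    Treating rt_tol as a +/- half-width is an assumption of this sketch.
    """
    low, high = ppm_window(target_mz, ppm)
    return low <= peak["mz"] <= high and abs(peak["rt"] - target_rt) <= rt_tol


# Feature used as the example in Figure 2 (m/z 747.5177, tR 285 s).
TARGET_MZ, TARGET_RT = 747.5177, 285.0
low, high = ppm_window(TARGET_MZ, 10.0)

# Hypothetical peaks, invented for illustration only.
peaks = [
    {"id": "p1", "mz": 747.5190, "rt": 287.2},  # inside both windows
    {"id": "p2", "mz": 747.5300, "rt": 285.0},  # m/z deviates by more than 10 ppm
    {"id": "p3", "mz": 747.5177, "rt": 300.0},  # elutes 15 s later
]
kept = [p["id"] for p in peaks if matches(p, TARGET_MZ, TARGET_RT)]
print(round(low, 4), round(high, 4), kept)
```

At 10 ppm the accepted window around m/z 747.5177 is only about ±0.0075 wide, which is why saturated or poorly calibrated signals can easily fall outside it.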

total of 20 patients with an NAS ≥ 5, which strongly correlates with NASH, were selected. Additionally, 14 serum samples from healthy donors with similar demographic characteristics from the Biobank at IIS-La Fe were selected as the control group. All the samples were obtained after receiving informed consent. The study was approved by the Institutional Ethics Committee.

untargeted LC-DIA-MS lipidomics. The building block nature of the majority of lipids enables the establishment of generic structure-derived fragmentation rules that can be used for MS-based identification and structure elucidation. This strategy has been satisfactorily implemented for lipid identification in both DDA and DIA approaches.16,20,22 However, most of the current methods rely on the use of the most intense fragments to accomplish lipid identification, which can generate false positives because of the poor selectivity of these ions when coelution is present. In reverse phase chromatography, lipid elution depends on both the lipid class and the FA composition; thus, each lipid class usually elutes within a



RESULTS AND DISCUSSION

Rationale behind LipidMS. LipidMS has been developed in the R programming language to serve as an easy-to-use and highly adaptable end-user tool for assisting lipid annotation in


LipidMS Annotation Workflow. LipidMS contains 31 functions aimed at annotating 22 lipid classes using either positive or negative ESI modes (Table S1). To exemplify the LipidMS annotation workflow, the annotation procedure for PG(16:0/18:1) is described in Figure 2. Overall, the following steps (internal functions are given in parentheses) are executed within each identification function survey for lipid annotation (e.g., idPGneg): (i) On the basis of the set of chemical entities included in the bbDB (Table S5) and on the ionization properties selected for each lipid class (Table S6), a target ion list is generated by LipidMS (QDB). This list is subsequently used to interrogate the full MS data within a defined tR window and a mass error gap (findCandidates). These parameters can be easily set up by the user. At this step, putatively annotated lipids are identified based on the lipid class, and the number of carbons and double bonds is determined. This level of survey is not reported by LipidMS by default, as we considered it noninformative. However, this information can be easily recovered with the findCandidates function or the class identification functions (e.g., idPGneg). (ii) The coeluting fragment ions for each putatively annotated lipid are selected based on the defined tR window. Optionally, a PFCS is then calculated for each of the ion pairs used for lipid identification, and only those above a previously defined threshold are retained. To minimize false positives, a value of 10 s for the tR and a PFCS value of 0.8 are set by default. However, these values can be easily changed by the user (coelutingFrags). (iii) On the basis of the established fragmentation rules (Tables S3 and S4) and on a default mass error of 10 ppm, a survey of informative fragment ions of the lipid class (e.g., head groups) is performed among the coeluting fragments extracted in step (ii) (checkClass). The mass error used in each survey can be modified by the user (ppm_products argument). (iv) Then, the same procedure is applied for searching fragment ions informative of the fatty acyl component (chainFrags). (v) On the basis of the proposed fatty acyl components, combinations that sum up to the expected total number of carbons and double bonds determined in step (i) are searched in the MS/MS data (combineChains). (vi) Once the fatty acyl components have been determined, intensity rules, which are based on the relative intensity ratios between the fragments, are applied to elucidate the position of those chains (checkIntensityRules). For further details regarding intensity rules, see Tables S3 and S4 and previously published data.19 Depending on the MS structural evidence reached by each annotation survey, LipidMS provides different levels of structural information:20,35 (i) “subclass level”, where specific class fragments (e.g., head groups of phospholipids) are used to determine the subclass, and the precursor ion is used to calculate the total number of carbons and double bonds of the chains. At this level, LipidMS cannot differentiate which fatty acids are linked to the backbone, and a sum of several isobaric/isomeric compounds is proposed (e.g., PG(34:1)); (ii) “fatty acyl level” (FA level), where the composition of the constituent chains is assigned based on chain-specific fragments, but no positional information is given (e.g., PG(16:0_18:1)); and (iii) “fatty acyl position level” (FA position level), where the specific position of each chain is elucidated through fragment intensity ratios (e.g., PG(16:0/18:1)). As a result of the execution of a lipid identification function (idPOS or idNEG), two separate R objects, which can be easily saved as tables, are generated (i.e., “results peak table” and

narrow tR window. As a result, many common fragments, such as those corresponding to head groups, are poorly chromatographically resolved (Figure S2), which strongly affects their selectivity for lipid annotation. This issue is particularly relevant when complex biological samples are analyzed. To overcome these drawbacks, lipid annotation in LipidMS is based on combining two complementary approaches. First, to modulate the stringency in the association of parent with coeluting fragment ions, a PFCS is calculated for all the MS/MS ions present in a predefined tR window around the parent ion. The PFCS is formally defined as a Pearson correlation coefficient calculated from the peak shapes (distribution of intensities over elution time) of parent and fragment ions, and it can be used to test the similarity between those ion chromatograms. This approach has been successfully applied to the analysis of MS data in the field of metabolomics.31 Second, and most importantly, LipidMS takes advantage of the use of fragmentation and fragment intensity rules. The latter are defined based on the relation between the intensities of different fragment ions and are used to elucidate the position of the different FA on the lipid backbone structure. Both fragmentation and intensity rules have been manually curated by using publicly available spectral information (i.e., LipidMaps,32 Metlin,33 LipidBlast,16 HMDB34) and in-house generated MS/MS spectra for DDA and DIA in two different MS/MS platforms (Waters Synapt G2-Si Q-ToF and Agilent Q-ToF 6550). In the fragmentation rules curation procedure, the use of highly intense fragments common to several lipid classes has been avoided when possible, and specific well-characterized fragments and adducts have been selected instead. Specific selected fragments as well as the preferred acquisition mode (i.e., ESI+ and ESI−) for each lipid class are summarized in Tables S2−S4.
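Because the PFCS is described as a Pearson correlation between the peak shapes of a parent ion and a fragment ion, its behavior can be sketched in a few lines of Python. This is a minimal illustration rather than the LipidMS implementation: it assumes both chromatograms have already been sampled on the same scans, and the function name `pearson` and the intensity vectors are hypothetical. The 0.8 threshold is the default value reported for the package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally sampled intensity traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traces must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace carries no peak-shape information
    return sxy / math.sqrt(sxx * syy)


PFCS_THRESHOLD = 0.8  # default threshold reported for LipidMS

# Hypothetical intensities over seven consecutive scans.
precursor = [0, 10, 50, 100, 50, 10, 0]
coeluting_fragment = [0.2 * i for i in precursor]  # same shape, lower intensity
shifted_fragment = [50, 100, 50, 10, 0, 0, 0]      # apex two scans earlier

for name, trace in [("coeluting", coeluting_fragment), ("shifted", shifted_fragment)]:
    score = pearson(precursor, trace)
    print(name, round(score, 3), "kept" if score >= PFCS_THRESHOLD else "discarded")
```

The score depends only on peak shape, not on absolute intensity, so a weak fragment that truly coelutes with its parent is retained while an intense fragment from a neighboring peak is rejected.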
Additionally, the experimental data supporting the selection of the fragmentation rules used by LipidMS are represented in Figures S5−S20.

Lipid Coverage and Building Block Database Customization. As previously mentioned, most lipids can be defined by a backbone structure, which defines the lipid class and subclass, and a number of acyl residues attached to that core structure. Thanks to these features, a lipid database can be built by defining both the lipid core and the set of acyl chains to be incorporated.13 In LipidMS, the acyl residues are specified in the building block database (bbDB), where an entity (e.g., FA(16:0)) can be used as a specific candidate (i.e., FA(16:0)) or as a fatty acyl radical of a number of more complex lipids (e.g., PL, GL, SM). By default, the bbDB includes 30 fatty acids, 4 sphingoid bases, and 3 bile acids (Table S5), which were selected based on their biological relevance.12 LipidMS arranges those chemical entities to build up a query database (QDB), which is eventually used to interrogate the MS data. The arrangement of the 37 entities included in the default bbDB covers 22 lipid classes and results in 2502 potential molecular formulas and more than 53 000 individual lipids. The lipidome coverage provided by LipidMS can be easily modified by varying the chemical entities provided in the bbDB using the createLipidDB function. For instance, odd-chain fatty acyls such as FA(19:0) can be included, which would be used as a potential candidate or as a part of more complex lipids (e.g., PC(19:0_19:0) or TG(19:0_19:0_19:0)). Additionally, the repertoire of lipids included in the bbDB used to build the QDB can also be exported elsewhere to be used as a library or a target inclusion list (createLipidDB).
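How a small set of building blocks expands into sum compositions, and how several chain combinations collapse onto the same composition, can be sketched as follows. This Python example is illustrative only: the five fatty acyls are an arbitrary subset chosen for the example, not the default bbDB, and `sum_compositions` is a hypothetical helper, not a LipidMS function.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Illustrative subset of fatty acyl chains as (carbons, double bonds).
# This is NOT the default bbDB, which is listed in Table S5.
FATTY_ACYLS = [(16, 0), (16, 1), (18, 0), (18, 1), (18, 2)]


def sum_compositions(chains, n_chains=2):
    """Group all unordered chain combinations by total carbons and double bonds."""
    groups = defaultdict(list)
    for combo in combinations_with_replacement(sorted(chains), n_chains):
        carbons = sum(c for c, _ in combo)
        double_bonds = sum(d for _, d in combo)
        groups[(carbons, double_bonds)].append(combo)
    return dict(groups)


groups = sum_compositions(FATTY_ACYLS)

# Chain pairs compatible with a diacyl lipid annotated as 34:1 at the subclass level.
for combo in groups[(34, 1)]:
    print("_".join(f"{c}:{d}" for c, d in combo))
```

With this subset, the 34:1 composition is compatible with both 16:0_18:1 and 16:1_18:0, which is why chain-specific fragments are needed to move from the subclass level to the fatty acyl level.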


Table 1. Summary of the Lipid Standards Identified in the Test Sample Using the Agilent Q-ToF 6550

class                   possible levels of structural     LipidMS    MS-DIAL    MS-DIAL
                        annotation (PFCS ≥ 0.8)           DIA        DDA        DIA
FA (16)^a               subclass level^c                  16         12         9
CE (1)^a                subclass level                    1          0          1
                        fatty acyl level^c                0          0          0
LPL (1)^a               subclass level                    0          0          0
                        fatty acyl level^c                1          1          1
PL (11)^a               subclass level                    0          0          0
                        fatty acyl level                  4          9          11
                        fatty acyl position level^c       6          0          0
Cer (2)^a               fatty acyl level^c                2          2          2
SM (1)^a                subclass level                    0          1          1
                        fatty acyl level^c                1          0          0
glycerolipids (9)^a     fatty acyl level                  2          5          6
                        fatty acyl position level^c       7          0          0
bile acids (9)^a,b      subclass level                    9          -          -
total identified standards                                49/50      30/41^g    31/41^g
total identified standards at max. annotation level       42/50      29/41^f    29/41^g
total number of false positives^d                         9          4          23

^a Denotes the total number of lipids per class. ^b MS-DIAL does not support bile acid identification. ^c Marks the maximum level of structural annotation reached in each lipid class. ^d False positive identities are annotated based on molecular ion and characteristic lipid fragment; specific identities are listed in Table S10. Statistical p-value was calculated by χ2 test. ^e p < 0.05. ^f p < 0.01. ^g p < 0.001.
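The kind of χ2 comparison mentioned in the table footnote can be sketched for a 2 × 2 table of identified versus missed standards. This Python example is an illustration under stated assumptions, not a reproduction of the authors' analysis: it assumes the comparison is between LipidMS (49 of 50 standards) and MS-DIAL in DDA mode (30 of 41 standards), uses the uncorrected Pearson statistic with one degree of freedom, and the helper name `chi_square_2x2` is hypothetical. Whether a continuity correction was applied in the original work is not stated.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns the statistic and the p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: tool; columns: identified standards, standards not identified.
# Counts taken from the "total identified standards" row (assumed pairing).
chi2, p_value = chi_square_2x2(49, 1, 30, 11)
print(f"chi2 = {chi2:.2f}, p < 0.001: {p_value < 0.001}")
```

The denominators differ (50 versus 41) because MS-DIAL does not support the nine bile acids, so the test compares identification rates rather than raw counts.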

Table 2. Summary of the Lipid Standards Identified in the Spiked Serum Sample Using the Agilent Q-ToF 6550

class                   possible levels of structural     LipidMS    MS-DIAL    MS-DIAL
                        annotation (PFCS ≥ 0.8)           DIA        DDA        DIA
FA (16)^a               subclass level^c                  16         9          12
CE (1)^a                subclass level                    0          0          0
                        fatty acyl level^c                0          0          0
LPL (1)^a               subclass level                    0          0          0
                        fatty acyl level^c                1          1          1
PL (11)^a               subclass level                    1          0          1
                        fatty acyl level                  1          8          9
                        fatty acyl position level^c       9          0          0
Cer (2)^a               fatty acyl level^c                2          2          2
SM (1)^a                subclass level                    0          1          0
                        fatty acyl level^c                1          0          1
glycerolipids (9)^a     fatty acyl level                  0          4          7
                        fatty acyl position level^c       9          0          0
bile acids (9)^a,b      subclass level                    7          -          -
total identified standards                                47/50      25/41^f    33/41^f
total identified standards at max. annotation level       45/50      25/41^e    32/41^e
total number of identified lipids                         366        112        326

^a Denotes the total number of lipids per class. ^b MS-DIAL does not support bile acid identification. ^c Marks the maximum level of structural annotation reached in each lipid class. Statistical p-value was calculated by χ2 test. ^d p < 0.05. ^e p < 0.01. ^f p < 0.001.

“annotated peak table”). On the one hand, the “results peak table” contains the following information for each annotated lipid: (i) feature identity, annotated as lipid class, total number of carbons, double bonds, and fatty acid composition; (ii) peak properties, including m/z, tR, peak intensity, and peakID information; and (iii) identification criteria used, reporting information about the adduct(s) detected, m/z error, structural annotation level, and the mean PFCS value. On the other hand, the “annotated peak table” links the original MS1 data with the “results peak table”, providing the following information for each feature: m/z, tR, peak intensity, peakID, all the possible identities ranked by the annotation level, ion adducts, and the mean value of the PFCS used in each lipid identification. Further information about the fragments that support each identification can be explored using class-specific identification functions (e.g., idPGneg). Among the extra functions incorporated in LipidMS, two should be further explained due to their utility: (i) the getInclusionList function, which builds a list of all annotated lipids with the following information: formula, tR in seconds, monoisotopic neutral mass, and lipid identity. This table may be used to apply the DIA-based identities to automate targeted peak picking in multiple samples containing only MS data or to prioritize ion fragmentation in DDA-based approaches (Figure S4); and (ii) the searchIsotopes function, which allows identification of compound isotopes when labeled compounds are used as tracers (e.g., U−13C-glucose or U−13C-glutamine). Here, LipidMS uses a control sample, where the


Figure 3. Lipidome alterations in the serum of NASH patients. (A) Principal component analysis scores plot for the control and NASH samples. (B) Volcano plot of the 258 lipids annotated by LipidMS, colored by lipid class; significant differential abundance for lipid species was assigned to p-value < 0.05 and log2 fold change (FC = mean value control/mean value NASH) > 1.5. (C) Boxplots showing significant changes in lipid classes. (D) Boxplots showing significant changes for lipids detected by LipidMS that have been previously reported as NASH biomarkers. Mann−Whitney U tests were used to calculate statistical significance, and p-values were corrected using the Benjamini-Hochberg procedure. *, p-value < 0.05; **, p-value < 0.01; ***, p-value < 0.001.

Performance Evaluation of LipidMS. As a first step to test the performance of LipidMS, a mixture of 50 representative lipid standards comprising several lipid classes (Table S2) was used to prepare a standard test sample and to fortify a pooled human serum sample. These two test samples were subsequently analyzed in both positive and negative ESI modes on a Q-ToF mass spectrometer (Agilent Q-ToF 6550). LipidMS was able to identify 49 standards at the subclass level in the standard test sample. Among them, 42 reached the maximum level of annotation possible for each class (i.e., FA and FA position levels), while for the serum test sample, 47 lipid standards were identified at the subclass level and 45 of them at FA and FA position levels when possible (Tables 1, 2, and S8−S12).
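The three annotation levels used to score the standards differ only in how much is stated about the acyl chains, which a short Python sketch makes concrete. The function `annotate` and its level names are hypothetical and only mirror the shorthand notation used in this article; they are not part of the LipidMS API.

```python
def annotate(lipid_class, chains, level):
    """Format a lipid annotation at one of the three structural levels.

    chains is a list of (carbons, double_bonds) tuples; for the position level
    the list order is taken as the position on the backbone.
    """
    if level == "subclass":
        carbons = sum(c for c, _ in chains)
        double_bonds = sum(d for _, d in chains)
        return f"{lipid_class}({carbons}:{double_bonds})"
    names = [f"{c}:{d}" for c, d in chains]
    if level == "fatty_acyl":
        return f"{lipid_class}({'_'.join(names)})"   # composition known, positions unknown
    if level == "fatty_acyl_position":
        return f"{lipid_class}({'/'.join(names)})"   # positions assigned
    raise ValueError(f"unknown annotation level: {level}")


chains = [(16, 0), (18, 1)]
for level in ("subclass", "fatty_acyl", "fatty_acyl_position"):
    print(level, annotate("PG", chains, level))
```

Each level is a strict refinement of the previous one, so a standard counted at the FA position level is also counted at the subclass level.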

tracer is not present, to generate a target inclusion list of lipids and their corresponding tR. This list is subsequently used to search for isotopes at each tR using the raw data generated in the presence of the tracer. Thus, lipid isotope distributions can be obtained (Figure S21). To test the utility of the searchIsotopes function, A549 cells were incubated in parallel with either U−12C-D-glucose or U−13C-D-glucose. Labeling incorporation into palmitic acid was used as an example showing that LipidMS can effectively assess 13C patterns when labeled compounds are used (Table S7). However, it should be noted that further improvements have to be implemented to take full advantage of LipidMS identification capabilities when using 13C-labeled samples.
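The isotope search described above amounts to looking, at a known tR, for ions shifted by multiples of the 13C−12C mass difference. A minimal Python sketch of the target m/z list for palmitic acid follows. It is not the searchIsotopes implementation: the helper `isotopologue_mzs` is hypothetical, and the [M−H]− value used for palmitate is an approximate figure entered by hand for illustration.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def isotopologue_mzs(mono_mz, n_carbons, charge=1):
    """Return the m/z of the M+0 ... M+n 13C isotopologues of an ion."""
    z = abs(charge)
    return [mono_mz + n * C13_C12_DELTA / z for n in range(n_carbons + 1)]


# Approximate [M-H]- m/z of palmitic acid (C16H32O2); illustrative value.
PALMITATE_MZ = 255.2330
targets = isotopologue_mzs(PALMITATE_MZ, n_carbons=16)

for n, mz in enumerate(targets):
    print(f"M+{n}: {mz:.4f}")
```

Fully labeled palmitate (M+16) lies about 16.05 m/z units above the unlabeled ion, so each target can be extracted with the same ppm tolerance used for identification.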


Therefore, finding new noninvasive biomarkers for NAFLD diagnosis and prognosis has aroused much interest. A considerable number of studies have relied on metabolomics or lipidomics for metabolite biomarker discovery.4,23,24,37 Here, LipidMS was applied to the LC-DIA-MS untargeted analysis of serum samples from patients diagnosed with NASH and from healthy donors. The baseline characteristics of the patients enrolled in the study are summarized in Table S16. The groups were similar with respect to gender, age, body mass index, fasting blood sugar, and hepatic synthetic functions. A pooled sample was generated by mixing equal amounts of each sample and used for lipid identification based on DIA-MS/MS. Combining both positive and negative ionization modes, 258 lipids were identified in the pooled sample and then extracted from the rest of the samples based on their accurate m/z and tR. Principal component analysis showed a clear separation between control and NASH groups (Figure 3A), suggesting differences in their underlying lipidomic profiles. In total, 22 lipids were significantly altered between control and NASH patients (p-value ≤ 0.05 and |log2 fold change| ≥ 1) (Figure 3B). Moreover, when analyzing generic trends based on the sum of the intensities of the lipids belonging to a given class, a significant decrease in PC, LPC, and CE and an increase in PE were observed for NASH patients (Figure 3C). These observations are in agreement with previously published data, where it is suggested that these lipid species could play a role in disease progression.4,23,24 Furthermore, LipidMS was also able to identify some specific lipids that have been previously proposed as NAFLD or NASH biomarkers (e.g., PE(16:0/22:6), PE(18:0/22:6), PC(16:0/20:4), and TG(54:5), among others)4,23,24 (Figure 3D). Interestingly, LipidMS also identified a set of new potential biomarkers of NASH (Table S17). However, this lipidomic signature should be further confirmed in a larger cohort of NASH patients. Overall, our results confirmed previously published data and validated LipidMS for DIA data analysis in untargeted LC-MS lipidomic approaches involving complex biological samples.
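The selection of significantly altered lipids combines a multiple-testing correction with a fold-change cutoff. The sketch below, in Python, shows the Benjamini-Hochberg adjustment and the log2 fold change as defined in the Figure 3 legend (control over NASH). It is an illustration, not the authors' analysis script: the function names are hypothetical, the p-values and intensities are invented, and the default thresholds are those quoted in the main text.

```python
import math
from statistics import mean


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(control, nash):
    """log2 of mean control intensity over mean NASH intensity."""
    return math.log2(mean(control) / mean(nash))


def is_significant(adj_p, log2fc, alpha=0.05, min_abs_log2fc=1.0):
    """Apply the p-value and fold-change thresholds quoted in the text."""
    return adj_p <= alpha and abs(log2fc) >= min_abs_log2fc


# Invented values for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
fc = log2_fold_change([100, 120, 110], [40, 50, 45])
print([round(p, 3) for p in adj_p], round(fc, 3), is_significant(adj_p[0], fc))
```

With this definition, a positive log2 fold change corresponds to a lipid that is lower in NASH than in controls.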

Once the reliability of LipidMS was proven, we decided to compare it with already available tools. MS-DIAL17 was selected as the software of reference, since it is one of the most valuable and cited tools used for lipid identification in both DDA and DIA modes. MS-DIAL employs a combination of mass spectral deconvolution, spectral matching, and LipidBlast (an in silico library with a broad lipid coverage) for lipid annotation. LipidMS identified a higher number of lipid standards in both test samples compared to MS-DIAL, independently of the acquisition mode (Tables 1 and 2). Accordingly, LipidMS also reported a higher number of total identified lipids in the untargeted analysis of the pooled human serum sample (Table 2). Interestingly, although the higher number of identifications was reported by LipidMS, MS-DIAL applied to DIA data also provided a higher number of identifications than MS-DIAL applied to DDA data, which highlights the importance of using DIA approaches. The number of false positive identifications was the only parameter in which DDA slightly outperformed DIA-based approaches in our comparison (Table 1). However, even in that aspect, LipidMS proved superior to MS-DIAL when applied to DIA samples. We would like to remark that MS-DIAL only reports two levels of identification: “annotated”, based on MS data, or “identified”, based on both MS and MS/MS data. However, no detailed information about the actual level of structural evidence is reported, and the highest level of annotation that can be achieved is FA level. Compared to MS-DIAL, LipidMS provides a more detailed report of the level of structural evidence that supports the identification, and thanks to the implementation of fragment intensity rules, a highest level of structural information can be reached (i.e., FA position level). Thus, LipidMS significantly outperformed MS-DIAL in the level of structural information reached in each standard identification (Tables S8 and S11). 
Ideally, further comparisons with other commonly used DIA methods as LipidMatch22 or Lipid-Pro21 should have been performed. However, LipidMatch only supports Thermo (Q Exactive) files for DIA, while in Lipid-Pro, fragmentation rules have to be manually provided for each lipid class, which was found to be very time-consuming. Moreover, we did not find the way to fully implement the rules employed by LipidMS in Lipid-Pro. Finally, to prove that LipidMS can be used with DIA data obtained from multiple platforms, we decided to compare the results obtained for the two test samples analyzed in two different Q-ToF instruments (i.e., a Waters Synapt G2-Si QToF and an Agilent Q-ToF 6550). No significant differences were observed in terms of the number of lipid standards identified in both test samples (Table 1, 2, S8, S11, and S13− S15), where 98 and 92% of coincidence were achieved, respectively. Furthermore, similar lipidomic characterization in terms of the type of lipid classes and the level of structural information reached was observed for both instruments (Figure S22). Altogether, these results proved that LipidMS performance is not dependent on the analytical platform used. However, its suitability for other mass analyzers (e.g., Orbitrap) or other vendors could be further confirmed once the package is used by the MS-based lipidomics community. Application of LipidMS in the LC-DIA-MS Analysis of NAFLD. NAFLD is now the most common liver disorder in the developed world, affecting up to a third of individuals. However, diagnosis is usually based on imaging tests, and liver biopsy is required for disease confirmation and staging.36
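
As an illustration of the two summaries used in the NASH comparison, the following Python sketch applies the stated significance criteria (p-value ≤ 0.05 and |log2 fold change| ≥ 1) and sums intensities per lipid class. It is not part of LipidMS, which is an R package. The function names, the input layout, the NASH/control direction of the fold change, and all numeric values are hypothetical and chosen only for the example. The lipid class is assumed to be the text preceding the first parenthesis of the annotation, and p-values are assumed to have been computed beforehand.

```python
import math
from collections import defaultdict


def select_altered(stats, p_cutoff=0.05, log2fc_cutoff=1.0):
    """Keep lipids with p <= p_cutoff and |log2 fold change| >= log2fc_cutoff.

    `stats` maps a lipid name to (mean_control, mean_nash, p_value).
    This layout is an assumption made for the sketch.
    """
    selected = []
    for lipid, (mean_control, mean_nash, p_value) in stats.items():
        # Fold change is taken as NASH over control (assumed direction).
        log2fc = math.log2(mean_nash / mean_control)
        if p_value <= p_cutoff and abs(log2fc) >= log2fc_cutoff:
            selected.append((lipid, log2fc))
    return selected


def class_totals(intensities):
    """Sum intensities per lipid class, e.g. 'PC(16:0/20:4)' counts toward 'PC'."""
    totals = defaultdict(float)
    for lipid, intensity in intensities.items():
        lipid_class = lipid.split("(", 1)[0]
        totals[lipid_class] += intensity
    return dict(totals)


# Made-up numbers for illustration only; they are not data from the study.
example_stats = {
    "lipid_A": (1000.0, 400.0, 0.010),  # passes both criteria
    "lipid_B": (1000.0, 900.0, 0.001),  # significant, but fold change too small
    "lipid_C": (500.0, 2000.0, 0.200),  # large fold change, but p-value too high
    "lipid_D": (250.0, 500.0, 0.050),   # exactly at both thresholds, passes
}
altered = select_altered(example_stats)

example_intensities = {"PC(16:0/20:4)": 10.0, "PC(34:1)": 5.0, "PE(16:0/22:6)": 2.0}
totals = class_totals(example_intensities)
```

Both thresholds are inclusive, matching the "≤" and "≥" in the stated criteria, so a lipid lying exactly on a cutoff is retained.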



CONCLUSIONS

A new freely available method for the analysis of DIA data sets in LC-MS untargeted lipidomics, namely LipidMS, has been developed. The new method takes advantage of combining curated fragmentation and intensity rules with a parent and fragment coelution score, which is calculated in predefined retention time windows for the reliable identification of lipids. LipidMS provides wide lipid coverage, and it is easily customizable thanks to the use of an R environment.25 Compared to existing DDA and DIA tools (MS-DIAL), LipidMS detected a significantly higher number of lipids in the analysis of two test samples (standard and human serum samples). Moreover, LipidMS provides a detailed description of the level of structural information achieved for each identified lipid, and thanks to the fragmentation and intensity rules implemented in LipidMS, a higher level of structural information can be reached (FA position level, compared to FA composition, which is the highest level reached by other tools). The platform independence and reproducibility of the data analysis were also proven by comparing the results obtained on two independent Q-ToF analytical platforms (Waters Synapt G2-Si Q-ToF and Agilent Q-ToF 6550). The usefulness of LipidMS was further demonstrated when it was applied to the analysis of real clinical samples, that is, NASH serum samples, where not only were previously identified lipid patterns corroborated but also a new set of biomarkers was proposed. Altogether, LipidMS has been validated as a tool to assist lipid identification in LC-DIA-MS untargeted analysis of complex biological samples.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.8b03409.

Supplementary experimental section: chemicals, preparation of lipid standards, lipid extraction from human serum samples, LC-MS analysis, parent-fragment coelution score (PFCS), isotopic labeling technique; Supplementary tables: list of functions and data sets implemented in the LipidMS package, ionization and fragmentation rules employed for lipid annotation, FA chains and sphingoid bases employed for building the QDB, detailed results from the searchIsotopes function using a labeled sample, detailed annotation results for both standard and pooled human serum samples, demographic characteristics of NASH and control patients enrolled in the study, NASH lipid-related biosignature; Supplementary figures: general structures for the main classes of lipids present in humans, coelution profile of generic fragment 184.074 (phosphocholine) of PC and SM for a complex lipid sample, MS response to increasing concentration of standards for a representative lipid species of each lipid class, graphic description of the implementation of LipidMS within a lipidomics study, fragmentation patterns of several lipid classes covered by LipidMS, workflow proposed for applying LipidMS to samples incubated with isotope tracers, comparison of annotation results using a pooled human serum sample for two different Q-ToF mass spectrometers (PDF)

AUTHOR INFORMATION

Corresponding Author

*Mailing Address: Agustín Lahoz, Biomarkers and Precision Medicine Unit and Analytical Unit, Instituto de Investigación Sanitaria Fundación Hospital La Fe, Av. Fernando Abril Martorell 106, Valencia 46026, Spain; E-mail: agustin.lahoz@uv.es; Tel: 961246652; Fax: 961246620.

ORCID

María Isabel Alcoriza-Balaguer: 0000-0001-5691-2787
Juan Carlos García-Cañaveras: 0000-0001-7112-1537
Agustín Lahoz: 0000-0001-7232-0626

Author Contributions

#M.I.A.-B. and J.C.G.-C. contributed equally.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work has been supported by the European Regional Development Fund (FEDER) Institute of Health Carlos III of the Spanish Ministry of Economy and Competitiveness (PI14/0026 and PI17/01282).

REFERENCES

(1) Wenk, M. R. Nat. Rev. Drug Discovery 2005, 4, 594−610.
(2) Hilvo, M.; Denkert, C.; Lehtinen, L.; Muller, B.; Brockmoller, S.; Seppanen-Laakso, T.; Budczies, J.; Bucher, E.; Yetukuri, L.; Castillo, S.; Berg, E.; Nygren, H.; Sysi-Aho, M.; Griffin, J. L.; Fiehn, O.; Loibl, S.; Richter-Ehrenstein, C.; Radke, C.; Hyotylainen, T.; Kallioniemi, O.; et al. Cancer Res. 2011, 71, 3236−3245.
(3) Patterson, A. D.; Maurhofer, O.; Beyoglu, D.; Lanz, C.; Krausz, K. W.; Pabst, T.; Gonzalez, F. J.; Dufour, J. F.; Idle, J. R. Cancer Res. 2011, 71, 6590−6600.
(4) Puri, P.; Baillie, R. A.; Wiest, M. M.; Mirshahi, F.; Choudhury, J.; Cheung, O.; Sargeant, C.; Contos, M. J.; Sanyal, A. J. Hepatology 2007, 46, 1081−1090.
(5) Garcia-Canaveras, J. C.; Peris-Diaz, M. D.; Alcoriza-Balaguer, M. I.; Cerdan-Calero, M.; Donato, M. T.; Lahoz, A. Electrophoresis 2017, 38, 2331−2340.
(6) Rhee, E. P.; Cheng, S.; Larson, M. G.; Walford, G. A.; Lewis, G. D.; McCabe, E.; Yang, E.; Farrell, L.; Fox, C. S.; O’Donnell, C. J.; Carr, S. A.; Vasan, R. S.; Florez, J. C.; Clish, C. B.; Wang, T. J.; Gerszten, R. E. J. Clin. Invest. 2011, 121, 1402−1411.
(7) Meikle, P. J.; Wong, G.; Tsorotes, D.; Barlow, C. K.; Weir, J. M.; Christopher, M. J.; MacIntosh, G. L.; Goudey, B.; Stern, L.; Kowalczyk, A.; Haviv, I.; White, A. J.; Dart, A. M.; Duffy, S. J.; Jennings, G. L.; Kingwell, B. A. Arterioscler., Thromb., Vasc. Biol. 2011, 31, 2723−2732.
(8) Han, X.; Rozen, S.; Boyle, S. H.; Hellegers, C.; Cheng, H.; Burke, J. R.; Welsh-Bohmer, K. A.; Doraiswamy, P. M.; Kaddurah-Daouk, R. PLoS One 2011, 6, e21643.
(9) Psychogios, N.; Hau, D. D.; Peng, J.; Guo, A. C.; Mandal, R.; Bouatra, S.; Sinelnikov, I.; Krishnamurthy, R.; Eisner, R.; Gautam, B.; Young, N.; Xia, J.; Knox, C.; Dong, E.; Huang, P.; Hollander, Z.; Pedersen, T. L.; Smith, S. R.; Bamforth, F.; Greiner, R.; et al. PLoS One 2011, 6, e16957.
(10) Bouatra, S.; Aziat, F.; Mandal, R.; Guo, A. C.; Wilson, M. R.; Knox, C.; Bjorndahl, T. C.; Krishnamurthy, R.; Saleem, F.; Liu, P.; Dame, Z. T.; Poelzer, J.; Huynh, J.; Yallou, F. S.; Psychogios, N.; Dong, E.; Bogumil, R.; Roehring, C.; Wishart, D. S. PLoS One 2013, 8, e73076.
(11) Fahy, E.; Subramaniam, S.; Brown, H. A.; Glass, C. K.; Merrill, A. H., Jr.; Murphy, R. C.; Raetz, C. R.; Russell, D. W.; Seyama, Y.; Shaw, W.; Shimizu, T.; Spener, F.; van Meer, G.; VanNieuwenhze, M. S.; White, S. H.; Witztum, J. L.; Dennis, E. A. J. Lipid Res. 2005, 46, 839−861.
(12) Fahy, E.; Subramaniam, S.; Murphy, R. C.; Nishijima, M.; Raetz, C. R.; Shimizu, T.; Spener, F.; van Meer, G.; Wakelam, M. J.; Dennis, E. A. J. Lipid Res. 2009, 50 (Suppl), S9−14.
(13) Han, X.; Yang, K.; Gross, R. W. Mass Spectrom. Rev. 2012, 31, 134−178.
(14) Cajka, T.; Fiehn, O. TrAC, Trends Anal. Chem. 2014, 61, 192−206.
(15) Zhu, X.; Chen, Y.; Subramanian, R. Anal. Chem. 2014, 86, 1202−1209.
(16) Kind, T.; Liu, K. H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O. Nat. Methods 2013, 10, 755−758.
(17) Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; VanderGheynst, J.; Fiehn, O.; Arita, M. Nat. Methods 2015, 12, 523−526.
(18) Kochen, M. A.; Chambers, M. C.; Holman, J. D.; Nesvizhskii, A. I.; Weintraub, S. T.; Belisle, J. T.; Islam, M. N.; Griss, J.; Tabb, D. L. Anal. Chem. 2016, 88, 5733−5741.
(19) Hutchins, P. D.; Russell, J. D.; Coon, J. J. Cell Systems 2018, 6, 621−625.e5.
(20) Hartler, J.; Triebl, A.; Ziegl, A.; Trötzmüller, M.; Rechberger, G. N.; Zeleznik, O. A.; Zierler, K. A.; Torta, F.; Cazenave-Gassiot, A.; Wenk, M. R.; Fauland, A.; Wheelock, C. E.; Armando, A. M.; Quehenberger, O.; Zhang, Q.; Wakelam, M. J. O.; Haemmerle, G.; Spener, F.; Köfeler, H. C.; Thallinger, G. G. Nat. Methods 2017, 14, 1171−1174.
(21) Ahmed, Z.; Mayr, M.; Zeeshan, S.; Dandekar, T.; Mueller, M. J.; Fekete, A. Bioinformatics 2015, 31, 1150−1153.
(22) Koelmel, J. P.; Kroeger, N. M.; Ulmer, C. Z.; Bowden, J. A.; Patterson, R. E.; Cochran, J. A.; Beecher, C. W. W.; Garrett, T. J.; Yost, R. A. BMC Bioinf. 2017, 18, 331.
(23) Anjani, K.; Lhomme, M.; Sokolovska, N.; Poitou, C.; Aron-Wisnewsky, J.; Bouillot, J.-L.; Lesnik, P.; Bedossa, P.; Kontush, A.; Clement, K.; Dugail, I.; Tordjman, J. J. Hepatol. 2015, 62, 905−912.
(24) Puri, P.; Wiest, M. M.; Cheung, O.; Mirshahi, F.; Sargeant, C.; Min, H. K.; Contos, M. J.; Sterling, R. K.; Fuchs, M.; Zhou, H.; Watkins, S. M.; Sanyal, A. J. Hepatology 2009, 50, 1827−1838.
(25) R Core Team. R Foundation for Statistical Computing, 2016.
(26) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. Bioinformatics 2008, 24, 2534−2536.
(27) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. BMC Bioinf. 2010, 11, 395.
(28) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779−787.
(29) Kuhl, C.; Tautenhahn, R.; Bottcher, C.; Larson, T. R.; Neumann, S. Anal. Chem. 2012, 84, 283−289.
(30) Brunt, E. M.; Kleiner, D. E.; Wilson, L. A.; Belt, P.; Neuschwander-Tetri, B. A. Hepatology 2011, 53, 810−820.
(31) Li, H.; Cai, Y.; Guo, Y.; Chen, F.; Zhu, Z.-J. Anal. Chem. 2016, 88, 8757−8764.
(32) Fahy, E.; Sud, M.; Cotter, D.; Subramaniam, S. Nucleic Acids Res. 2007, 35, W606−612.
(33) Smith, C. A.; O’Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. Ther. Drug Monit. 2005, 27, 747−751.
(34) Wishart, D. S.; Jewison, T.; Guo, A. C.; Wilson, M.; Knox, C.; Liu, Y.; Djoumbou, Y.; Mandal, R.; Aziat, F.; Dong, E.; Bouatra, S.; Sinelnikov, I.; Arndt, D.; Xia, J.; Liu, P.; Yallou, F.; Bjorndahl, T.; Perez-Pineiro, R.; Eisner, R.; Allen, F.; et al. Nucleic Acids Res. 2013, 41, D801−D807.
(35) Rustam, Y. H.; Reid, G. E. Anal. Chem. 2018, 90, 374−397.
(36) Younossi, Z.; Anstee, Q. M.; Marietti, M.; Hardy, T.; Henry, L.; Eslam, M.; George, J.; Bugianesi, E. Nat. Rev. Gastroenterol. Hepatol. 2017, 15, 11−20.
(37) Garcia-Canaveras, J. C.; Donato, M. T.; Castell, J. V.; Lahoz, A. J. Proteome Res. 2011, 10, 4825−4834.