
Article

LipidMS: an R package for lipid annotation in untargeted liquid chromatography-data independent acquisition-mass spectrometry lipidomics
Maria Isabel Alcoriza-Balaguer, Juan Carlos García-Cañaveras, Adrian Lopez, Isabel Conde, Oscar Juan, Julian Carretero, and Agustín Lahoz
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b03409 • Publication Date (Web): 30 Nov 2018



Analytical Chemistry

LipidMS: an R package for lipid annotation in untargeted liquid chromatography-data independent acquisition-mass spectrometry lipidomics

María Isabel Alcoriza-Balaguer 1,#, Juan Carlos García-Cañaveras 1,#, Adrián López 1, Isabel Conde 2, Oscar Juan 1,3, Julián Carretero 4, Agustín Lahoz 1,*

1 Biomarkers and Precision Medicine Unit and Analytical Unit, Instituto de Investigación Sanitaria Fundación Hospital La Fe, Valencia 46026, Spain.
2 Hepatology Unit, Department of Digestive Medicine, Hospital Universitari i Politècnic La Fe, Valencia 46026, Spain.
3 Department of Medical Oncology, Hospital Universitari i Politècnic La Fe, Valencia 46026, Spain.
4 Department of Physiology, University of Valencia, Burjassot 4100, Spain

# These authors contributed equally

*To whom correspondence should be addressed. Agustín Lahoz. E-mail: [email protected]. Biomarkers and Precision Medicine Unit, Analytical Unit, Instituto de Investigación Sanitaria Fundación Hospital La Fe, Av. Fernando Abril Martorell 106, Valencia 46026, Spain. Tel: 961246652, Fax: 961246620

1 ACS Paragon Plus Environment


Abstract

High-resolution LC-MS untargeted lipidomics using data-independent acquisition (DIA) has the potential to increase lipidome coverage, as it enables the continuous and unbiased acquisition of all eluting ions. However, the loss of the link between the precursor and the product ions, combined with the high dimensionality of DIA data sets, hinders accurate feature annotation. Here, we present LipidMS, an R package designed to confidently identify lipid species in untargeted LC-DIA-MS. To this end, LipidMS combines a coelution score, which links precursor and fragment ions, with fragmentation and intensity rules. Depending on the MS evidence reached by the identification function survey, LipidMS provides three levels of structural annotation: i) “subclass level”, e.g., PG(34:1); ii) “fatty acyl level”, e.g., PG(16:0_18:1); and iii) “fatty acyl position level”, e.g., PG(16:0/18:1). The comparison of LipidMS with freely available data-dependent acquisition (DDA) and DIA identification tools showed that LipidMS provides significantly more accurate and structurally informative lipid identifications. Finally, to exemplify the utility of LipidMS, we investigated the lipidomic serum profile of patients diagnosed with non-alcoholic steatohepatitis (NASH), which is the progressive form of non-alcoholic fatty liver disease, a disorder characterized by strong lipid dysregulation. As previously published, a significant decrease in lyso- and phosphatidylcholines and cholesterol esters and an increase in phosphatidylethanolamines were observed in NASH patients. Remarkably, LipidMS allowed the identification of a new set of lipids that may be used for NASH diagnosis. Altogether, LipidMS has been validated as a tool to assist lipid identification in the LC-DIA-MS untargeted analysis of complex biological samples.

Keywords: lipidomics, mass spectrometry, data-independent acquisition, lipid annotation, R package, non-alcoholic steatohepatitis.


Lipidomics can be understood as the systems-level analysis of lipids and their interacting partners.1 More concretely, from an analytical point of view, it can be defined as the determination of the complete set of lipids (lipidome) present in a given biological sample (e.g. cell, tissue, biofluid, organism). Lipids are a heterogeneous group of metabolites involved in many biological functions as intermediates or products in signalling pathways, structural components of cell membranes and energy storage sources, among others.1 Alterations in general lipid profiles and in particular lipid species have been identified in many diseases including cancer,2,3 non-alcoholic fatty liver disease,4,5 diabetes,6 heart disease,7 and neurological diseases.8 From a quantitative point of view, lipids represent 60-70% of all detected and identified metabolites in the human serum metabolome9 and 20% of the human urine metabolome.10 According to the LIPID MAPS Consortium, lipids are classified into eight classes.11,12 In general, lipids can be described as a combination of various building blocks, usually a core structure that defines their class (e.g., glycerol, sphingoid bases, and cholesterol) and subclass (e.g. polar head groups of phospholipids such as phosphocholine and phosphoethanolamine) and a variable number of fatty acyl chains (FA) attached to that core structure13 (Figure S1). As a result of the different structural arrangements of the FA on the core structures, isobaric lipids (e.g. PC(18:1/18:1) vs. PC(18:0/18:2)) and isomeric lipids (e.g. PC(16:0/20:4) vs. PC(20:4/16:0)) can be found, hindering their actual identification.

Liquid chromatography (LC) coupled to mass spectrometry (MS) is a powerful tool, which enables the comprehensive lipid characterization of biological samples.14 Lipid identification in untargeted MS-based lipidomics usually relies on the combined acquisition of full MS, which provides information about the nominal mass and formula of the lipids, and MS/MS data, which allow the identification of the building blocks that compose them.13,14 The most common procedure for the acquisition of MS/MS spectra is to perform a data-dependent acquisition (DDA), in which ions (lipids) of interest are isolated and subsequently fragmented to obtain their corresponding MS/MS spectra.15 MS data-independent acquisition (DIA) is an alternative to DDA in which no ion isolation is performed and all the ions that elute at a given time are fragmented and detected jointly; thus, MS/MS information is obtained for all the eluting compounds. However, the management of DIA data sets is not trivial and it is even more complicated in the case of lipids, where, apart from the coelution of parent and fragment ions, their building block nature generates a number of fragments that are common to several lipid species and that are usually not well chromatographically resolved (Figure S2). On top of that, the lack of a comprehensive collection of purified, well-characterized lipid standards forces lipid identification to be based on the combination of MS and MS/MS data with the only additional support of known fragmentation rules.16 A number of software tools have been developed for the identification of lipids using DDA: MS-DIAL,17 Greazy,18 LipiDex,19 LDA,20 or the use of the LipidBlast in silico database16 searched via the NIST MS Search program. However, only a few freely available tools are designed for DIA-based lipid identification, among them MS-DIAL,17 Lipid-Pro21 and LipidMatch,22 with MS-DIAL being the most used one (based on the number of citations reported by Google Scholar).

Here, we present LipidMS, an R package (https://CRAN.R-project.org/package=LipidMS) for lipid annotation in LC-DIA-MS. LipidMS calculates a precursor and fragment coelution score (PFCS) for those ions present in a predefined retention time (tR) window, then it applies a set of fragmentation and fragment intensity rules to annotate lipids. Moreover, LipidMS accepts as input either .csv files, for already pre-processed data sets, or .mzXML files, the common file format for MS data; thus, it is compatible with multiple mass spectrometer vendors. To assess LipidMS performance, it was first used to process LC-DIA-MS data from two test samples (i.e., a standard sample and a pooled human serum sample). These samples were prepared by adding a mixture of 50 representative lipid standards and then analysed using two mass spectrometers (i.e., Agilent Q-ToF 6550 and Waters Synapt G2-Si Q-ToF). LipidMS was also compared with existing DDA and DIA tools.17 Finally, to exemplify the package utility in a biological context, LipidMS was applied in the lipidomic analysis of serum samples from patients diagnosed with non-alcoholic steatohepatitis (NASH), which is the progressive form of non-alcoholic fatty liver disease (NAFLD), a disorder characterized by a strong lipid dysregulation. NAFLD and NASH have been extensively studied by metabolomics and lipidomics approaches and specific lipid patterns have been proposed as diagnostic and prognostic biomarker signatures.4,23,24 Not only do our results confirm previously published lipid-related markers, but they also provide a new set of lipids that are now proposed as a lipid-based NASH biomarker signature.


Experimental section

Other experimental details about chemicals, lipidome extraction, labelling techniques, LC-MS settings and data processing parameters are provided in the Supporting Information.

LipidMS processing workflow

LipidMS was developed in the R programming environment25 and is available via CRAN (https://CRAN.R-project.org/package=LipidMS). LipidMS includes dedicated functions for MS data processing, lipid identification, data import, export of lipid annotations, database customization and creation of inclusion lists for targeted MS analysis (Table S1).

Format requirement for lipid annotation functions.

LipidMS identification functions require two data inputs: i) a peak table for MS1 and one or two peak tables for MS2, depending on the number of collision energies used; and ii) one raw data table for MS1 and one or two raw data tables for MS2, depending on the number of collision energies used. The peak tables are mandatory and are used for identification, while the raw data tables are optional and are only used for the calculation of the PFCS. If the raw data tables are not used, the association between parent and fragment ions will be based exclusively on tR windows. Both types of tables are obtained from mzXML files when the dataProcessing function is employed. The peak tables must contain deisotoped and tR-aligned peaks. Formally, they have to be stored as data frames containing, at least, the following columns: m/z, tR (in seconds), intensity/area and peak identification (PeakID column). The raw data tables provide scan-by-scan information for each MS or MS/MS data file and have to contain the following columns: m/z, tR (in seconds), intensity/area, peakID and scan number. These tables can be easily obtained by performing data processing with LipidMS, although other approaches can also be used. Data acquired in positive and negative electrospray ionization (ESI) modes have to be provided separately, as lipid identification functions apply specific rules for each polarity. For further details the reader is referred to the package manual (https://CRAN.R-project.org/package=LipidMS).
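As an illustration of the input format described above, the following Python sketch checks that a peak table or a raw data table carries the required fields. The column labels (`mz`, `rt`, `intensity`, `peakID`, `scan`), the helper name `missing_columns` and all numeric values are assumptions made for this example; they are not taken from the LipidMS code or manual, which should be consulted for the exact names expected by the package.

```python
# Fields required by the text; the exact column labels are illustrative assumptions.
PEAK_TABLE_COLUMNS = {"mz", "rt", "intensity", "peakID"}
RAW_TABLE_COLUMNS = PEAK_TABLE_COLUMNS | {"scan"}


def missing_columns(table, required):
    """Return the required column names that are absent from `table`.

    `table` maps a column name to the list of values in that column.
    """
    return sorted(required - set(table))


# Hypothetical MS1 peak table: two deisotoped, aligned peaks (tR in seconds).
ms1_peaks = {
    "mz": [747.5200, 773.5340],
    "rt": [412.3, 431.8],
    "intensity": [15200.0, 8400.0],
    "peakID": ["MS1_1", "MS1_2"],
}

# Hypothetical raw data table that lacks the scan number column.
ms1_raw = {
    "mz": [747.5198, 747.5203],
    "rt": [411.9, 412.6],
    "intensity": [9100.0, 15200.0],
    "peakID": ["MS1_1", "MS1_1"],
}

print(missing_columns(ms1_peaks, PEAK_TABLE_COLUMNS))  # nothing missing
print(missing_columns(ms1_raw, RAW_TABLE_COLUMNS))     # scan column missing
```

A check of this kind is mainly useful when the tables come from a tool other than the LipidMS dataProcessing function, since the identification step relies on every listed field being present.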

Data conversion

Lipid identification functions within LipidMS require a separate peak list for each collision energy used (e.g. MS1, MS2low and MS2high) as input, whereas peak picking tools usually handle only MS1 as input. Therefore, it is mandatory to convert complex DIA-MS data into a format that can be used for peak picking. To convert raw data into mzXML, MSConvert software (ProteoWizard 3.0.10800 64 bit)26 can be used. The procedure to extract each collision energy file from the raw data is instrument dependent. Here, two independent Q-ToF platforms have been used (i.e., a Waters Synapt G2-Si Q-ToF and an Agilent Q-ToF 6550). For the Waters instrument, the raw archive file contains three different data acquisition functions (i.e. MS1, MS2 and lockspray). Lockspray files must be removed and the other functions have to be separated by collision energy and then converted into .mzXML files. For Agilent, in contrast, the .d raw data archive is directly converted into a single mzXML file and subsequently separated into one file per collision energy using the LipidMS sepByCE function. Figure 1 shows the recommended data processing workflow for LipidMS.

Peak detection and alignment

Data pre-processing (i.e. peak picking, deisotoping and alignment) can be performed using either free GUI software packages such as MZmine,27 XCMS,28 enviPick (https://CRAN.R-project.org/package=enviPick) or commercial software packages such as Progenesis QI or MassHunter Workstation. LipidMS includes a function that takes advantage of enviPick for peak picking and of CAMERA29 for alignment and deisotoping, which is strongly recommended for performing data processing. Moreover, the use of the LipidMS dataProcessing function is the easiest way to get the required data inputs for using the PFCS to complement tR windows for the association of parents and fragments.

Lipid identification

LipidMS allows lipids to be annotated efficiently within a wide range of concentrations (Figure S3). However, as a general rule, the use of saturated signals for lipid identification should be avoided, as saturation deteriorates both mass accuracy and peak shape, thus hampering feature annotation. Lipid identification is performed separately using data from positive and negative ESI modes through the idPOS and idNEG functions, respectively. Nevertheless, specific lipid classes can be identified alone by using class-defined functions (Table S1). The implementation of LipidMS within a lipidomics workflow is depicted in Figure S4. DIA data used to test LipidMS performance and an example script can be found at GitHub (https://github.com/maialba3/LipidMS-data-v1.0).

Samples included in the study

Test Samples

Two test samples were used to evaluate the performance of LipidMS. These samples were prepared by spiking a mixture containing 50 lipid standards into a blank sample or a pooled human serum sample (Sigma-Aldrich, Madrid, Spain). These lipid standards were selected according to their biological relevance, their representativeness of lipid classes, and their analytical relevance; to this end, isobaric/isomeric species were also included (Table S2).

Serum samples from patients with NAFLD

Patients diagnosed with NAFLD at the Liver Transplantation and Hepatology Unit at the Hospital La Fe (Valencia) were enrolled in this study. NAFLD diagnosis was performed by histological examination of liver biopsy specimens. NAFLD was assessed using the NAFLD activity score (NAS).30 A total of 20 patients with a NAS ≥ 5, which strongly correlates with NASH, were selected. Additionally, 14 serum samples from healthy donors with similar demographic characteristics from the Biobank at IIS-La Fe were selected as the control group. All the samples were obtained after receiving informed consent. The study was approved by the Institutional Ethics Committee.

Results and discussion

Rationale behind LipidMS

LipidMS has been developed in the R programming language to serve as an easy-to-use tool, highly adaptable to the end user, for assisting lipid annotation in untargeted LC-DIA-MS lipidomics. The building block nature of the majority of lipids enables the establishment of generic structure-derived fragmentation rules that can be used for MS-based identification and structure elucidation. This strategy has been satisfactorily implemented for lipid identification in both DDA and DIA approaches.16,20,22 However, most of the current methods rely on the use of the most intense fragments to accomplish lipid identification, which can generate false positives due to the poor selectivity of these ions when coelution is present. In reverse phase chromatography, lipid elution depends on both the lipid class and the FA composition, thus each lipid class usually elutes within a narrow tR window. As a result, many common fragments, such as those corresponding to head groups, are poorly chromatographically resolved (Figure S2), which strongly affects their selectivity for lipid annotation. This issue is particularly relevant when complex biological samples are analysed. To overcome these drawbacks, lipid annotation in LipidMS is based on combining two complementary approaches. First, to modulate the stringency in the association of parent with coeluting fragment ions, a PFCS is calculated for all the MS/MS ions present in a predefined tR window around the parent ion. The PFCS is formally defined as a Pearson correlation coefficient calculated from the peak shape (distribution of intensities over elution time) of parent and fragment ions, and it can be used to test the similarity among those ion chromatograms. This approach has been successfully applied to the analysis of MS data in the field of metabolomics.31 Second, and most importantly, LipidMS takes advantage of the use of fragmentation and fragment intensity rules. The latter are defined based on the relation between the intensities of different fragment ions and are used to elucidate the position of the different FA in the lipid backbone structure. Both fragmentation and intensity rules have been manually curated by using publicly available spectral information (i.e. LipidMaps,32 Metlin,33 LipidBlast,16 HMDB34) and in-house generated MS/MS spectra for DDA and DIA in two different MS/MS platforms (Waters Synapt G2-Si Q-ToF and Agilent Q-ToF 6550). In the fragmentation rules curation procedure, the use of highly intense fragments common to several lipid classes has been avoided when possible, and specific well-characterized fragments and adducts have been selected instead. The selected fragments as well as the preferred acquisition mode (i.e., ESI+ or ESI-) for each lipid class are summarized in Tables S2-S4. Additionally, the experimental data supporting the selection of the fragmentation rules used by LipidMS are represented in Figures S5-S20.
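The coelution score described above can be illustrated with a short Python sketch that computes a Pearson correlation between two intensity profiles sampled at the same scans. This is a minimal illustration under stated assumptions, not the LipidMS implementation (which is written in R): the function name `pearson`, the example intensity vectors and the choice to return 0.0 for a flat profile are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long intensity profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption of this sketch: a flat profile carries no shape information.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical intensities over the same seven scans.
precursor = [0, 10, 50, 100, 50, 10, 0]
coeluting_fragment = [0, 2, 10, 20, 10, 2, 0]   # same peak shape, lower intensity
unrelated_fragment = [20, 10, 2, 0, 0, 0, 0]    # tail of an earlier-eluting peak

print(pearson(precursor, coeluting_fragment))
print(pearson(precursor, unrelated_fragment))
```

Because the correlation depends only on peak shape and not on absolute intensity, a weak fragment that follows the precursor profile scores close to 1, whereas a fragment that merely falls in the same tR window but belongs to a neighbouring peak scores low and can be excluded by a threshold.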

Lipid coverage and building block database customization

As previously mentioned, most lipids can be defined by a backbone structure, which defines the lipid class and subclass, and a number of acyl residues attached to that core structure. Thanks to these features, a lipid database can be built by defining both the lipid core and the set of acyl chains to be incorporated.13 In LipidMS the acyl residues are specified in the building block database (bbDB), where an entity (e.g., FA(16:0)) can be used as a specific candidate (i.e., FA(16:0)) or as the fatty acyl radical of a number of more complex lipids (e.g., PL, GL, SM). By default the bbDB includes 30 fatty acids, 4 sphingoid bases, and 3 bile acids (Table S5), which were selected based on their biological relevance.12 LipidMS arranges those chemical entities to build up a query database (QDB), which will eventually be used to interrogate the MS data. The arrangement of the 37 entities included in the default bbDB covers 22 lipid classes and results in 2502 potential molecular formulas and more than 53000 individual lipids. The lipidome coverage provided by LipidMS can be easily modified by varying the chemical entities provided in the bbDB, simply by using the createLipidDB function. For instance, odd-chain fatty acyls such as FA(19:0) can be included, which would be used as a potential candidate or as a part of more complex lipids (e.g., PC(19:0_19:0) or TG(19:0_19:0_19:0)). Additionally, the repertoire of lipids included in the bbDB used to build the QDB can also be exported elsewhere to be used as a library or a target inclusion list (createLipidDB).
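The way a small set of building blocks expands into a query database can be sketched as follows. The Python example below enumerates the unordered pairs of fatty acyl chains for a generic diacyl lipid and groups them by sum composition. It is a simplified illustration, not the createLipidDB function: the helper names and the four-chain example list are assumptions, and masses, adducts and lipid classes are deliberately left out.

```python
from itertools import combinations_with_replacement


def parse_chain(chain):
    """Split a chain label such as '18:1' into (carbons, double bonds)."""
    carbons, double_bonds = chain.split(":")
    return int(carbons), int(double_bonds)


def diacyl_species(chains):
    """Map each sum composition (e.g. '36:2') to the chain pairs that produce it."""
    species = {}
    ordered = sorted(chains, key=parse_chain)
    for first, second in combinations_with_replacement(ordered, 2):
        c1, d1 = parse_chain(first)
        c2, d2 = parse_chain(second)
        total = f"{c1 + c2}:{d1 + d2}"
        species.setdefault(total, []).append(f"{first}_{second}")
    return species


# Hypothetical building block list with four fatty acyl chains.
building_blocks = ["16:0", "18:0", "18:1", "18:2"]
query_db = diacyl_species(building_blocks)
print(query_db["36:2"])

# Adding an odd-chain fatty acyl extends the coverage, as described in the text.
extended_db = diacyl_species(building_blocks + ["19:0"])
print(extended_db["38:0"])
```

Even with four chains, the sum composition 36:2 is reached by two different chain pairs, which is the situation that makes chain-specific fragments necessary for annotation beyond the subclass level.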

LipidMS annotation workflow

LipidMS contains 31 functions aimed to annotate 22 lipid classes using either positive or negative ESI modes (Table S1). To exemplify the LipidMS annotation workflow, the annotation procedure for PG(16:0/18:1) is described in Figure 2. Overall, the following steps (internal functions, given in parentheses) are executed within each identification function survey for lipid annotation (e.g. idPGneg):

i) Based on the set of chemical entities included in the bbDB (Table S5) and on the ionization properties selected for each lipid class (Table S6), a target ion list is generated by LipidMS (QDB). This list is subsequently used to interrogate the full MS data within a defined tR window and a mass error gap (findCandidates). These parameters can be easily set up by the user. At this step, putatively annotated lipids are identified based on the lipid class, and the number of carbons and double bonds is determined. This level of survey is not reported by LipidMS by default, as we considered it non-informative. However, this information can be easily recovered with the findCandidates function or the class identification functions (e.g. idPGneg).

ii) The coeluting fragment ions for each putatively annotated lipid are selected based on the defined tR window. Optionally, a PFCS is then calculated for each of the ion pairs used for lipid identification and only those above a previously defined threshold are retained. To minimize false positives, a value of 10 seconds for the tR and a PFCS value of 0.8 are set by default. However, these values can be easily changed by the user (coelutingFrags).

iii) Based on the established fragmentation rules (Tables S3-S4) and on a default mass error of 10 ppm, a survey of informative fragment ions of the lipid class (e.g. head groups) is performed among the coeluting fragments extracted in step (ii) (checkClass). The mass error used in each survey can be modified by the user (argument ppm_products).

iv) Then, the same procedure is applied for searching fragment ions informative of the fatty acyl component (chainFrags).

v) Based on the proposed fatty acyl components, combinations that sum up to the expected total number of carbons and double bonds determined in step (i) are searched in the MS/MS data (combineChains).

vi) Once the fatty acyl components have been determined, intensity rules, which are based on the relative intensity ratios between the fragments, are applied to elucidate the position of those chains (checkIntensityRules). For further details regarding intensity rules see Tables S3 and S4 and previously published data.19
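Steps (i) and (ii) above can be sketched in Python as a mass-tolerance match followed by a retention-time and coelution filter. This is an illustration only and does not reproduce the findCandidates or coelutingFrags functions: the function names, the dictionary-based peak records, the reading of the 10 s value as a symmetric tolerance around the precursor tR, and all m/z, tR and score values are assumptions of this example. The target m/z is an approximate value for the [M-H]- ion of PG(34:1).

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_candidates(ms1_peaks, targets, ppm):
    """Step (i): match MS1 peaks to target ions within a ppm tolerance."""
    hits = []
    for name, target_mz in targets.items():
        for peak in ms1_peaks:
            if abs(ppm_error(peak["mz"], target_mz)) <= ppm:
                hits.append((name, peak["peakID"]))
    return hits


def coeluting_fragments(ms2_peaks, precursor_rt, rt_tol=10.0, scores=None, cutoff=0.8):
    """Step (ii): keep fragments inside the tR tolerance and, when coelution
    scores are supplied, at or above the score cutoff."""
    kept = []
    for frag in ms2_peaks:
        if abs(frag["rt"] - precursor_rt) > rt_tol:
            continue
        if scores is not None and scores.get(frag["peakID"], 0.0) < cutoff:
            continue
        kept.append(frag["peakID"])
    return kept


# Hypothetical target list and peak tables (tR in seconds).
targets = {"PG(34:1) [M-H]-": 747.5181}
ms1_peaks = [
    {"peakID": "p1", "mz": 747.5200, "rt": 412.0},
    {"peakID": "p2", "mz": 747.5300, "rt": 300.0},
]
ms2_peaks = [
    {"peakID": "f1", "mz": 255.2330, "rt": 405.0},
    {"peakID": "f2", "mz": 281.2486, "rt": 414.0},
    {"peakID": "f3", "mz": 152.9958, "rt": 430.0},
]
scores = {"f1": 0.95, "f2": 0.60}  # hypothetical coelution scores

print(find_candidates(ms1_peaks, targets, ppm=10.0))
print(coeluting_fragments(ms2_peaks, precursor_rt=412.0))
print(coeluting_fragments(ms2_peaks, precursor_rt=412.0, scores=scores))
```

The two calls to `coeluting_fragments` mirror the two modes described in the text: association by tR window alone when no raw data are available, and the stricter association obtained when a coelution score is also applied.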

Depending on the MS structural evidence reached by each annotation survey, LipidMS provides different levels of structural information:20,35 i) “subclass level”, where specific class fragments (e.g. head groups of phospholipids) are used to determine the subclass and the precursor ion is used to calculate the total number of carbons and double bonds of the chains. At this level, LipidMS cannot differentiate which fatty acids are linked to the backbone and a sum of several isobaric/isomeric compounds is proposed (e.g. PG(34:1)); ii) “fatty acyl level” (FA level), where the composition of the constituent chains is assigned based on chain-specific fragments but no positional information is given (e.g. PG(16:0_18:1)); and iii) “fatty acyl position level” (FA position level), where the specific position of each chain is elucidated through fragment intensity ratios (e.g. PG(16:0/18:1)).
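The three notations can be generated from the same chain information, which makes the difference between the levels explicit. The following Python sketch is illustrative only; the function name `annotation` and the level labels passed as strings are assumptions, not part of LipidMS.

```python
def parse_chain(chain):
    """Split a chain label such as '18:1' into (carbons, double bonds)."""
    carbons, double_bonds = chain.split(":")
    return int(carbons), int(double_bonds)


def annotation(lipid_class, chains, level):
    """Format a lipid name at the requested structural level.

    For the 'FA position' level, `chains` must already be in positional order.
    """
    if level == "subclass":
        parsed = [parse_chain(c) for c in chains]
        total_c = sum(c for c, _ in parsed)
        total_db = sum(d for _, d in parsed)
        return f"{lipid_class}({total_c}:{total_db})"
    if level == "FA":
        # Chain composition known, positions unknown: canonical order, '_' separator.
        joined = "_".join(sorted(chains, key=parse_chain))
        return f"{lipid_class}({joined})"
    if level == "FA position":
        # Positions known: keep the given order, '/' separator.
        joined = "/".join(chains)
        return f"{lipid_class}({joined})"
    raise ValueError(f"unknown annotation level: {level}")


for level in ("subclass", "FA", "FA position"):
    print(annotation("PG", ["16:0", "18:1"], level))
```

The separator carries the claim being made: an underscore asserts only which chains are present, whereas a slash additionally asserts their positions on the backbone.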

As a result of the execution of a lipid identification function (idPOS or idNEG), two separate R objects, which can be easily saved as tables, are generated (i.e., ‘results peak table’ and ‘annotated peak table’). On the one hand, the ‘results peak table’ contains the following information for each annotated lipid: i) feature identity, annotated as lipid class, total number of carbons, double bonds and fatty acid composition; ii) peak properties, including m/z, tR, peak intensity and peakID information; and iii) identification criteria used, reporting information about the adduct/s detected, m/z error, structural annotation level, and the mean PFCS value. On the other hand, the ‘annotated peak table’ links the original MS1 data with the ‘results peak table’, providing the following information for each feature: m/z, tR, peak intensity, peakID, all the possible identities ranked by the annotation level, ion adducts and the mean value of the PFCS used in each lipid identification. Further information about the fragments that support each identification can be explored using class-specific identification functions (e.g. idPGneg).

Among the extra functions incorporated in LipidMS, two should be further explained due to their utility: i) the getInclusionList function, which builds a list of all annotated lipids with the following information: formula; tR in seconds; monoisotopic neutral mass; and lipid identity. This table may be used to apply the DIA-based identities to automate targeted peak picking in multiple samples containing only MS data or to prioritize ion fragmentation in DDA-based approaches (Figure S4); and ii) the searchIsotopes function, which allows the identification of compound isotopes when labelled compounds are used as tracers (e.g., U-13C-glucose or U-13C-glutamine). Here, LipidMS uses a control sample, where the tracer is not present, to generate a target inclusion list of lipids and their corresponding tR. This list is subsequently used to search for isotopes at each tR using the raw data generated in the presence of the tracer. Thus, lipid isotope distributions can be obtained (Figure S21). To test the utility of the searchIsotopes function, A549 cells were incubated in parallel with either U-12C-D-glucose or U-13C-D-glucose. Labelling incorporation into palmitic acid was used as an example, showing that LipidMS can effectively assess 13C patterns when labelled compounds are used (Table S7). However, it should be noted that further improvements have to be implemented to take full advantage of LipidMS identification capabilities when using 13C-labelled samples.
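The isotope search can be pictured as looking, at a known tR, for a ladder of m/z values spaced by the 13C-12C mass difference. The Python sketch below computes that ladder for a compound with a given number of carbons and summarizes a measured distribution as a fractional enrichment. It is an illustration under stated assumptions rather than the searchIsotopes algorithm: the function names are hypothetical, the palmitate [M-H]- m/z is an approximate value, the intensities are invented, and no correction for natural isotope abundance is applied.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def isotopologue_mz(monoisotopic_mz, n_carbons, charge=1):
    """Expected m/z of the M+0 ... M+n_carbons 13C isotopologues."""
    step = C13_C12_DELTA / abs(charge)
    return [monoisotopic_mz + n * step for n in range(n_carbons + 1)]


def fractional_enrichment(intensities):
    """Mean fraction of labelled carbons: sum(n * I_n) / (N * sum(I_n)).

    `intensities[n]` is the signal of the M+n isotopologue; N is the carbon count.
    """
    n_carbons = len(intensities) - 1
    total = sum(intensities)
    if n_carbons == 0 or total == 0:
        return 0.0
    return sum(n * i for n, i in enumerate(intensities)) / (n_carbons * total)


# Palmitate (16 carbons), approximate [M-H]- m/z.
palmitate_ladder = isotopologue_mz(255.2330, n_carbons=16)
print(len(palmitate_ladder), round(palmitate_ladder[-1], 4))

# Invented distribution for a two-carbon toy compound: half unlabelled, half fully labelled.
print(fractional_enrichment([1.0, 0.0, 1.0]))
```

In practice each m/z in the ladder would be searched within a mass tolerance in the raw data of the labelled sample, at the tR taken from the unlabelled control.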

Performance evaluation of LipidMS

As a first step to test the performance of LipidMS, a mixture of 50 representative lipid standards comprising several lipid classes (Table S2) was used to prepare a standard test sample and to fortify a pooled human serum sample. These two test samples were subsequently analyzed in both positive and negative ESI modes in a Q-ToF mass spectrometer (Agilent Q-ToF 6550). LipidMS was able to identify 49 standards at the subclass level in the standard test sample; among them, 42 reached the maximum level of annotation possible for each class (i.e., FA and FA position levels), while for the serum test sample, 47 lipid standards were identified at the subclass level and 45 of them at FA and FA position levels when possible (Tables 1-2, S8-S12).

3

Once the reliability of LipidMS had been established, we compared it with already available tools. MS-DIAL17 was selected as the reference software since it is one of the most valuable and most cited tools used for lipid identification in both DDA and DIA modes. MS-DIAL employs a combination of mass spectral deconvolution, spectral matching and LipidBlast (an in silico library with broad lipid coverage) for lipid annotation. LipidMS identified a higher number of lipid standards in both test samples than MS-DIAL, independently of the acquisition mode (Tables 1-2). Accordingly, LipidMS also reported a higher number of total identified lipids in the untargeted analysis of the pooled human serum sample (Table 2). Interestingly, although the highest number of identifications was reported by LipidMS, MS-DIAL applied to DIA data also provided a higher number of identifications than MS-DIAL applied to DDA data, which highlights the importance of using DIA approaches. The number of false positive identifications was the only parameter in which DDA slightly outperformed DIA-based approaches in our comparison (Table 1). However, even in that respect, LipidMS proved superior to MS-DIAL when applied to DIA samples. It should be noted that MS-DIAL only reports two levels of identification: “annotated”, based on MS data, or “identified”, based on both MS and MS/MS data. No detailed information about the actual level of structural evidence is reported, and the highest level of annotation that can be achieved is the FA level. Compared to MS-DIAL, LipidMS provides a more detailed report of the level of structural evidence that supports the identification and, thanks to the implementation of fragment intensity rules, a higher level of structural information can be reached (i.e., FA position level). Thus, LipidMS significantly outperformed MS-DIAL in the level of structural information reached in each standard identification (Tables S8 and S11).
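
The comparison of identification rates between tools can be sketched as a test on a 2 x 2 contingency table. The example below is a minimal Python sketch, assuming a Pearson chi-squared test with one degree of freedom and no continuity correction; the exact test settings used for Table 1 are not stated in this section, and the function name is hypothetical. The counts in the usage example (49 of 50 standards for LipidMS, 30 of 41 for MS-DIAL in DDA mode) are taken from Table 1.

```python
import math


def chi2_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-squared statistic and p-value for two proportions.

    Assumes 1 degree of freedom and no continuity correction.
    """
    table = [[hits_a, total_a - hits_a],
             [hits_b, total_b - hits_b]]
    n = total_a + total_b
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be non-zero")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p_value = chi2_2x2(49, 50, 30, 41)
    print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

With small expected counts, as in some cells here, an exact test may be preferable; the chi-squared form is used only because it is the test named in the table footnote.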

Ideally, further comparisons with other commonly used DIA tools, such as LipidMatch22 or Lipid-Pro21, should have been performed. However, LipidMatch only supports Thermo (Q Exactive) files for DIA, while in Lipid-Pro fragmentation rules have to be provided manually for each lipid class, which was found to be very time-consuming. Moreover, we did not find a way to fully implement the rules employed by LipidMS in Lipid-Pro.

Finally, to show that LipidMS can be used with DIA data obtained from multiple platforms, we compared the results obtained for the two test samples analyzed in two different Q-ToF instruments (i.e., a Waters Synapt G2-Si Q-ToF and an Agilent Q-ToF 6550). No significant differences were observed in the number of lipid standards identified in the two test samples (Tables 1-2, S8, S11, S13-S15), for which 98% and 92% coincidence was achieved, respectively. Furthermore, a similar lipidomic characterization in terms of the type of lipid classes and the level of structural information reached was observed for both instruments (Figure S22). Altogether, these results showed that LipidMS performance does not depend on the analytical platform used. However, its suitability for other mass analyzers (e.g., Orbitrap) or other vendors remains to be confirmed once the package is used by the MS-based lipidomics community.

Application of LipidMS in the LC-DIA-MS analysis of NAFLD

NAFLD is now the most common liver disorder in the developed world, affecting up to a third of individuals. However, diagnosis is usually based on imaging tests, and liver biopsy is required for disease confirmation and staging.36 Therefore, finding new non-invasive biomarkers for NAFLD diagnosis and prognosis has aroused much interest. A considerable number of studies have relied on metabolomics or lipidomics for metabolite biomarker discovery.4,23,24,37 Here, LipidMS was applied to the LC-DIA-MS untargeted analysis of serum samples from patients diagnosed with NASH and from healthy donors. The baseline characteristics of the patients enrolled in the study are summarized in Table S16. The groups were similar with respect to gender, age, body mass index, fasting blood sugar, and hepatic synthetic functions. A pooled sample was generated by mixing equal amounts of each sample and used for lipid identification based on DIA-MS/MS. Combining both positive and negative ionization modes, 258 lipids were identified in the pooled sample and then extracted from the rest of the samples based on their accurate m/z and tR. Principal component analysis showed a clear separation between control and NASH groups (Figure 3A), suggesting differences in their underlying lipidomic profiles. In total, 22 lipids were significantly altered between control and NASH patients (p-value ≤ 0.05 and |log2 fold change| ≥ 1) (Figure 3B). Moreover, when analyzing generic trends based on the sum of the intensities of the lipids belonging to a given class, a significant decrease in PC, LPC, and CE and an increase in PE were observed for NASH patients (Figure 3C). These observations are in agreement with previously published data suggesting that these lipid species could play a role in disease progression.4,23,24 Furthermore, LipidMS was also able to identify some specific lipids that have been previously proposed as NAFLD or NASH biomarkers (e.g., PE(16:0/22:6), PE(18:0/22:6), PC(16:0/20:4) and TG(54:5), among others)4,23,24 (Figure 3D). Interestingly, LipidMS also identified a set of new potential biomarkers of NASH (Table S17). However, this lipidomic signature should be further confirmed in a larger cohort of NASH patients. Overall, our results confirmed previously published data and validated LipidMS for DIA data analysis in untargeted LC-MS lipidomic approaches involving complex biological samples.
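
The selection criteria and the class-level sums described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the p-values are assumed to come from a separate two-group test (the caption of Figure 3 mentions Mann-Whitney U tests with Benjamini-Hochberg correction), the fold change is assumed to be a ratio of group means, and the lipid class is assumed to be the alphabetic prefix of the shorthand name.

```python
import math
import re
from collections import defaultdict
from statistics import fmean


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(case, control):
    """log2 of the ratio of group means (case over control)."""
    return math.log2(fmean(case) / fmean(control))


def select_altered(names, pvalues, log2fc, alpha=0.05, min_abs_log2fc=1.0):
    """Names passing both thresholds: adjusted p <= alpha and |log2 FC| >= cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, p, fc in zip(names, adjusted, log2fc)
            if p <= alpha and abs(fc) >= min_abs_log2fc]


def lipid_class(name):
    """Class prefix of a shorthand name, e.g. 'PE(16:0/22:6)' gives 'PE'."""
    match = re.match(r"[A-Za-z]+", name)
    return match.group(0) if match else name


def class_sums(intensities):
    """Sum per-lipid intensities ({name: value}) by lipid class for one sample."""
    totals = defaultdict(float)
    for name, value in intensities.items():
        totals[lipid_class(name)] += value
    return dict(totals)


if __name__ == "__main__":
    sample = {"PE(16:0/22:6)": 1.0, "PE(18:0/22:6)": 2.0,
              "PC(16:0/20:4)": 4.0, "TG(54:5)": 8.0}  # toy intensities
    print(class_sums(sample))
```

The class sums computed per sample can then be compared between groups with the same test and correction used for the individual lipids.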

Conclusions

A new, freely available method for the analysis of DIA data sets in LC-MS untargeted lipidomics, namely LipidMS, has been developed. The new method combines curated fragmentation and intensity rules with a parent and fragment coelution score, which is calculated in predefined retention time windows, for the reliable identification of lipids. LipidMS provides wide lipid coverage and is easily customizable thanks to the use of the R environment.25 Compared to an existing tool for DDA and DIA data (MS-DIAL), LipidMS detected a significantly higher number of lipids in the analysis of two test samples (standard and human serum samples). Moreover, LipidMS provides a detailed description of the level of structural information achieved for each identified lipid and, thanks to the fragment and intensity rules implemented in LipidMS, a higher level of structural information can be reached (FA position level, compared to FA composition, which is the highest level reached by other tools). The platform independence and reproducibility of the data analysis were also shown by comparing the results obtained with two independent Q-ToF analytical platforms (Waters Synapt G2-Si Q-ToF and Agilent Q-ToF 6550). The usefulness of LipidMS was further demonstrated when it was applied to the analysis of real clinical samples, namely NASH serum samples, where not only were previously identified lipid patterns corroborated, but a new set of biomarkers was also proposed. Altogether, LipidMS has been validated as a tool to assist lipid identification in the LC-DIA-MS untargeted analysis of complex biological samples.

References

(1) Wenk, M. R. The emerging field of lipidomics. Nat Rev Drug Discov 2005, 4, 594-610.
(2) Hilvo, M.; Denkert, C.; Lehtinen, L.; Muller, B.; Brockmoller, S.; Seppanen-Laakso, T.; Budczies, J.; Bucher, E.; Yetukuri, L.; Castillo, S.; Berg, E.; Nygren, H.; Sysi-Aho, M.; Griffin, J. L.; Fiehn, O.; Loibl, S.; Richter-Ehrenstein, C.; Radke, C.; Hyotylainen, T.; Kallioniemi, O., et al. Novel theranostic opportunities offered by characterization of altered membrane lipid metabolism in breast cancer progression. Cancer Res 2011, 71, 3236-3245.
(3) Patterson, A. D.; Maurhofer, O.; Beyoglu, D.; Lanz, C.; Krausz, K. W.; Pabst, T.; Gonzalez, F. J.; Dufour, J. F.; Idle, J. R. Aberrant lipid metabolism in hepatocellular carcinoma revealed by plasma metabolomics and lipid profiling. Cancer Res 2011, 71, 6590-6600.
(4) Puri, P.; Baillie, R. A.; Wiest, M. M.; Mirshahi, F.; Choudhury, J.; Cheung, O.; Sargeant, C.; Contos, M. J.; Sanyal, A. J. A lipidomic analysis of nonalcoholic fatty liver disease. Hepatology 2007, 46, 1081-1090.
(5) Garcia-Canaveras, J. C.; Peris-Diaz, M. D.; Alcoriza-Balaguer, M. I.; Cerdan-Calero, M.; Donato, M. T.; Lahoz, A. A lipidomic cell-based assay for studying drug-induced phospholipidosis and steatosis. Electrophoresis 2017, 38, 2331-2340.
(6) Rhee, E. P.; Cheng, S.; Larson, M. G.; Walford, G. A.; Lewis, G. D.; McCabe, E.; Yang, E.; Farrell, L.; Fox, C. S.; O'Donnell, C. J.; Carr, S. A.; Vasan, R. S.; Florez, J. C.; Clish, C. B.; Wang, T. J.; Gerszten, R. E. Lipid profiling identifies a triacylglycerol signature of insulin resistance and improves diabetes prediction in humans. J Clin Invest 2011, 121, 1402-1411.
(7) Meikle, P. J.; Wong, G.; Tsorotes, D.; Barlow, C. K.; Weir, J. M.; Christopher, M. J.; MacIntosh, G. L.; Goudey, B.; Stern, L.; Kowalczyk, A.; Haviv, I.; White, A. J.; Dart, A. M.; Duffy, S. J.; Jennings, G. L.; Kingwell, B. A. Plasma lipidomic analysis of stable and unstable coronary artery disease. Arterioscler Thromb Vasc Biol 2011, 31, 2723-2732.
(8) Han, X.; Rozen, S.; Boyle, S. H.; Hellegers, C.; Cheng, H.; Burke, J. R.; Welsh-Bohmer, K. A.; Doraiswamy, P. M.; Kaddurah-Daouk, R. Metabolomics in early Alzheimer's disease: identification of altered plasma sphingolipidome using shotgun lipidomics. PLoS One 2011, 6, e21643.
(9) Psychogios, N.; Hau, D. D.; Peng, J.; Guo, A. C.; Mandal, R.; Bouatra, S.; Sinelnikov, I.; Krishnamurthy, R.; Eisner, R.; Gautam, B.; Young, N.; Xia, J.; Knox, C.; Dong, E.; Huang, P.; Hollander, Z.; Pedersen, T. L.; Smith, S. R.; Bamforth, F.; Greiner, R., et al. The human serum metabolome. PLoS One 2011, 6, e16957.
(10) Bouatra, S.; Aziat, F.; Mandal, R.; Guo, A. C.; Wilson, M. R.; Knox, C.; Bjorndahl, T. C.; Krishnamurthy, R.; Saleem, F.; Liu, P.; Dame, Z. T.; Poelzer, J.; Huynh, J.; Yallou, F. S.; Psychogios, N.; Dong, E.; Bogumil, R.; Roehring, C.; Wishart, D. S. The human urine metabolome. PLoS One 2013, 8, e73076.
(11) Fahy, E.; Subramaniam, S.; Brown, H. A.; Glass, C. K.; Merrill, A. H., Jr.; Murphy, R. C.; Raetz, C. R.; Russell, D. W.; Seyama, Y.; Shaw, W.; Shimizu, T.; Spener, F.; van Meer, G.; VanNieuwenhze, M. S.; White, S. H.; Witztum, J. L.; Dennis, E. A. A comprehensive classification system for lipids. J Lipid Res 2005, 46, 839-861.
(12) Fahy, E.; Subramaniam, S.; Murphy, R. C.; Nishijima, M.; Raetz, C. R.; Shimizu, T.; Spener, F.; van Meer, G.; Wakelam, M. J.; Dennis, E. A. Update of the LIPID MAPS comprehensive classification system for lipids. J Lipid Res 2009, 50 Suppl, S9-14.
(13) Han, X.; Yang, K.; Gross, R. W. Multi-dimensional mass spectrometry-based shotgun lipidomics and novel strategies for lipidomic analyses. Mass Spectrom Rev 2012, 31, 134-178.
(14) Cajka, T.; Fiehn, O. Comprehensive analysis of lipids in biological systems by liquid chromatography-mass spectrometry. Trends Analyt Chem 2014, 61, 192-206.
(15) Zhu, X.; Chen, Y.; Subramanian, R. Comparison of information-dependent acquisition, SWATH, and MS(All) techniques in metabolite identification study employing ultrahigh-performance liquid chromatography-quadrupole time-of-flight mass spectrometry. Anal Chem 2014, 86, 1202-1209.

(16) Kind, T.; Liu, K. H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat Methods 2013, 10, 755-758.
(17) Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; VanderGheynst, J.; Fiehn, O.; Arita, M. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat Methods 2015, 12, 523-526.
(18) Kochen, M. A.; Chambers, M. C.; Holman, J. D.; Nesvizhskii, A. I.; Weintraub, S. T.; Belisle, J. T.; Islam, M. N.; Griss, J.; Tabb, D. L. Greazy: Open-Source Software for Automated Phospholipid Tandem Mass Spectrometry Identification. Anal Chem 2016, 88, 5733-5741.
(19) Hutchins PD, R. J., Coon JJ. LipiDex: An Integrated Software Package for High-Confidence Lipid Identification. Cell Systems 2018, 6, 621-625.
(20) Hartler J, T. A., Ziegl A, Trötzmüller M, Rechberger GN, Zeleznik OA, Zierler KA, Torta F, Cazenave-Gassiot A, Wenk MR, Fauland A, Wheelock CE, Armando AM, Quehenberger O, Zhang Q, Wakelam MJO, Haemmerle G, Spener F, Köfeler HC, Thallinger GG. Deciphering lipid structures based on platform-independent decision rules. Nature Methods 2017, 14, 1171-1174.
(21) Ahmed, Z.; Mayr, M.; Zeeshan, S.; Dandekar, T.; Mueller, M. J.; Fekete, A. Lipid-Pro: a computational lipid identification solution for untargeted lipidomics on data-independent acquisition tandem mass spectrometry platforms. Bioinformatics 2015, 31, 1150-1153.
(22) Koelmel JP, K. N., Ulmer CZ, Bowden JA, Patterson RE, Cochran JA, Beecher CWW, Garrett TJ, Yost RA. LipidMatch: an automated workflow for rule-based lipid identification using untargeted high-resolution tandem mass spectrometry data. BMC Bioinformatics 2017, 18, 112.
(23) Kavya Anjani, M. L., Nataliya Sokolovska, Christine Poitou, Judith Aron-Wisnewsky, Jean-Luc Bouillot, Philippe Lesnik, Pierre Bedossa, Anatol Kontush, Karine Clement, Isabelle Dugail, Joan Tordjman. Circulating phospholipid profiling identifies portal contribution to NASH signature in obesity. Journal of Hepatology 2015, 62, 905-912.
(24) Puri P, W. M., Cheung O, Mirshahi F, Sargeant C, Min HK, Contos MJ, Sterling RK, Fuchs M, Zhou H, Watkins SM, Sanyal AJ. The Plasma Lipidomic Signature of Nonalcoholic Steatohepatitis. Hepatology 2009, 50, 1827-1838.
(25) R Core Team. R Foundation for Statistical Computing, 2016.
(26) R Core Team. R: A language and environment for statistical computing. 2008.
(27) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010, 11, 395.
(28) Smith, C. A.; Want, E. J.; O'Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical chemistry 2006, 78, 779-787.
(29) Kuhl, C.; Tautenhahn, R.; Bottcher, C.; Larson, T. R.; Neumann, S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Analytical chemistry 2012, 84, 283-289.
(30) Brunt EM, K. D., Wilson LA, Belt P, Neuschwander-Tetri BA; NASH Clinical Research Network (CRN). Nonalcoholic fatty liver disease (NAFLD) activity score and the histopathologic diagnosis in NAFLD: distinct clinicopathologic meanings. Hepatology 2011, 53, 810-820.
(31) Hao Li, Y. C., Yuan Guo, Fangfang Chen, and Zheng-Jiang Zhu. MetDIA: Targeted Metabolite Extraction of Multiplexed MS/MS Spectra Generated by Data-Independent Acquisition. Analytical chemistry 2016, 88, 8757-8764.
(32) Fahy, E.; Sud, M.; Cotter, D.; Subramaniam, S. LIPID MAPS online tools for lipid research. Nucleic Acids Res 2007, 35, W606-612.
(33) Smith, C. A.; O'Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: a metabolite mass spectral database. Ther Drug Monit 2005, 27, 747-751.
(34) Wishart, D. S.; Jewison, T.; Guo, A. C.; Wilson, M.; Knox, C.; Liu, Y.; Djoumbou, Y.; Mandal, R.; Aziat, F.; Dong, E.; Bouatra, S.; Sinelnikov, I.; Arndt, D.; Xia, J.; Liu, P.; Yallou, F.; Bjorndahl, T.; Perez-Pineiro, R.; Eisner, R.; Allen, F., et al. HMDB 3.0--The Human Metabolome Database in 2013. Nucleic Acids Res 2013, 41, D801-807.
(35) Yepy Hardi Rustam, a. G. E. R. Analytical Challenges and Recent Advances in Mass Spectrometry Based Lipidomics. Analytical chemistry 2018, 90, 374-397.
(36) Younossi, Z.; Anstee, Q. M.; Marietti, M.; Hardy, T.; Henry, L.; Eslam, M.; George, J.; Bugianesi, E. Global burden of NAFLD and NASH: trends, predictions, risk factors and prevention. Nature reviews. Gastroenterology & hepatology 2018, 15, 11-20.
(37) Garcia-Canaveras, J. C.; Donato, M. T.; Castell, J. V.; Lahoz, A. A comprehensive untargeted metabonomic analysis of human steatotic liver tissue by RP and HILIC chromatography coupled to mass spectrometry reveals important metabolic alterations. Journal of proteome research 2011, 10, 4825-4834.


Acknowledgements


This work has been supported by the European Regional Development Fund (FEDER) Institute of Health Carlos III of the Spanish Ministry of Economy and Competitiveness (PI14/0026 and PI17/01282).


Figures

Figure 1. Simplified diagram of LipidMS operations.


Figure 2. Flow diagram of lipid annotation in LipidMS. The steps for the identification of m/z 747.5177 with a tR of 285 seconds are shown as an example.

Figure 3. Lipidome alterations in the serum of NASH patients. (A) Principal component analysis scores plot for the control and NASH samples; (B) Volcano plot of the 258 lipids annotated by LipidMS and coloured by lipid class, significant differential abundance for lipid species was assigned to p value 1.5. (C) Boxplots showing significant changes in lipid classes; (D) Boxplots showing significant changes for lipids that have been previously reported as NASH biomarkers detected by LipidMS. Mann-Whitney U tests were used to calculate statistical significance, and p values were corrected using the Benjamini-Hochberg procedure. *, p value < 0.05; **, p value < 0.01; ***, p value < 0.001


Tables

Table 1. Summary of the lipid standards identified in the test sample using the Agilent Q-ToF 6550.

Class (1)             Possible levels of structural      LipidMS   MS-DIAL    MS-DIAL
                      annotation (PFCS ≥ 0.8)            DIA       DDA        DIA
FA (16)               Subclass level (3)                 16        12         9
CE (1)                Subclass level                     1         0          1
                      Fatty acyl level (3)               0         0          0
LPL (1)               Subclass level                     0         0          0
                      Fatty acyl level (3)               1         1          1
PL (11)               Subclass level                     0         0          0
                      Fatty acyl level                   4         9          11
                      Fatty acyl position level (3)      6         0          0
Cer (2)               Fatty acyl level (3)               2         2          2
SM (1)                Subclass level                     0         1          1
                      Fatty acyl level (3)               1         0          0
Glycerolipids (9)     Fatty acyl level                   2         5          6
                      Fatty acyl position level (3)      7         0          0
Bile acids (9) (2)    Subclass level                     9         -          -
Total identified standards                               49/50     30/41***   31/41***
Total identified standards at max. annotation level      42/50     29/41**    29/41***
Total number of false positives (4)                      9         4          23

(1) denotes the total number of lipids per class, (2) MS-DIAL does not support bile acids identification, (3) in bold the maximum level of structural annotation reached in each lipid class, (4) false positives identities are annotated based on molecular ion and characteristic lipid fragment, specific identities are listed in Table S10. Statistical p-value was calculated by 2 test * p