
Article

Lipostar, a comprehensive platform-neutral cheminformatics tool for lipidomics. Laura Goracci, Sara Tortorella, Paolo Tiberi, Roberto Maria Pellegrino, Alessandra Di Veroli, Aurora Valeri, and Gabriele Cruciani. Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 04 May 2017. Downloaded from http://pubs.acs.org on May 4, 2017


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Lipostar, a comprehensive platform-neutral cheminformatics tool for lipidomics

Laura Goracci1†, Sara Tortorella1†, Paolo Tiberi2, Roberto Maria Pellegrino1, Alessandra Di Veroli1, Aurora Valeri1, Gabriele Cruciani1*

1 Department of Chemistry, Biology and Biotechnology, University of Perugia, Via Elce di Sotto 8, 06123 Perugia, Italy
2 Molecular Discovery Ltd., Pinner, Middlesex, London, United Kingdom

KEYWORDS: lipidomics, bioinformatics, LC-MS, drug safety

ABSTRACT: To date, the main limitations for LC-MS-based untargeted lipidomics reside in the lack of adequate computational and cheminformatics tools that are able to support the analysis of several thousands of species from biological samples, enabling data mining and automating lipid identification and external prediction processes. To address these issues, we developed Lipostar, a novel vendor-neutral high-throughput software that effectively supports both targeted and untargeted LC-MS lipidomics, implementing data acquisition, user-friendly multivariate analysis (to be used for model generation and new sample predictions), and advanced lipid identification protocols that can work with or without the support of preformed lipid databases. Moreover, Lipostar integrates the lipidomic processes with a full metabolite identification (MetID) procedure, essential in drug safety applications and in translational studies. Case studies demonstrating a number of Lipostar features are also presented.

INTRODUCTION

Lipids are essential molecules for biological systems: they determine biofunctionality, form the structural components of cell membranes, store energy, and regulate and control cellular signalling, function and disease.1,2 Alteration of lipid regulation can contribute to the pathophysiology of diseases including diabetes, obesity, heart disease, infectious diseases and neurodegenerative diseases.1,3 Consequently, lipidomics, a sub-speciality of metabolomics, represents an emerging field for investigating the lipidome in disease epidemiology, with the aim of unravelling diagnostic biomarkers and new drug targets, and of rationalizing toxicity effects. Mass spectrometry (MS), due to its sensitivity and selectivity, is the method of choice for qualitative and quantitative lipidomics analysis, and recent advances in MS technologies have significantly improved the knowledge of lipid metabolism and impairment processes at the level of individual species.4,5 Nowadays, soft ionization techniques such as electrospray ionization (ESI)6,7 are broadly used in lipidomics; lipid separation is usually performed prior to MS (e.g. by liquid chromatography or ion mobility), but direct infusion of a lipid solution into the MS instrument (i.e. shotgun lipidomics) represents a valuable alternative. Both approaches have been extensively described elsewhere,1,8 and each has its advantages and pitfalls.9

The recent improvement of high-throughput lipidomics techniques has shifted interest from targeted to untargeted approaches. According to SciFinder,10 the term “untargeted lipidomics” first appeared in 2012, and only 18 publications have been reported to date. Although targeted and untargeted approaches each have their own advantages and limitations,9 the choice between them mostly depends on the level of knowledge available prior to the lipidomics analysis. In targeted lipidomics, predefined lipid species are monitored, with the aim of quantifying potential lipid biomarkers. However, the selection of lipids to be monitored strongly influences the final results (Figure 1a-b). An untargeted approach is, on the other hand, unbiased and applicable in an early phase of research, aiming at revealing possible lipid impairment induced by a disease state or by drug treatments. This approach is not truly quantitative and provides a blurrier description of the entire lipidome (Figure 1c), although advances in LC-MS analysis have helped to make untargeted lipidomics accurate enough for sample comparison.9 In addition, an untargeted approach can serve as a starting point to drive a targeted one at a later stage, as exemplified in Figure 1d.

Another consequence of the emerging high-throughput approaches for the characterization of complex lipid mixtures is that computational tools have become essential to handle the massive amount of generated data.11–15 Such in silico support has greatly accelerated advances in the field, as demonstrated by the ever-increasing number of publications on lipidomics year after year (over 1500 in the last five years).10 As summarized by Song et al.,16 these in silico tools share a general procedure for data processing, consisting of five general steps: 1) baseline or noise reduction; 2) smoothing; 3) signal-to-noise ratio calculation; 4) peak extraction; 5) deisotoping and deconvolution. Afterwards, identification or statistical analysis can be performed on the processed data. A number of hardware vendor-dependent (LipidView17, SCIEX; LipidSearch18, Thermo Fisher; MassHunter19 and Mass Profiler Professional20, Agilent) or vendor-neutral (SimLipid21, MZmine22, MS-DIAL23 among others) bioinformatics tools for

ACS Paragon Plus Environment


lipidomics have been proposed in recent years, but unfortunately most of these software packages only partially cover the requirements for untargeted lipidomics (Table 1). Indeed, in addition to the quantitative aspects, the untargeted lipidomics approach is usually based on the comparison of samples from complex lipid mixtures. It follows that multivariate analysis represents an essential tool to transform information from single samples into knowledge about a given biological effect, but the algorithms supported are often very limited. In addition, when untargeted lipidomics is applied to pharmaceutical and medical scenarios, two further needs appear: i) when drug safety studies are performed, the drug used for treatment and its metabolites can be trapped in the biological matrix, strongly influencing the statistical analysis if not removed; ii) if lipidomics outcomes are used for the detection of a potential disease state, predictive statistical tools are needed. Thus, it can be concluded that, especially in untargeted lipidomics, the complexity of lipids and their regulation at multiple levels requires full integration across methodologies such as algorithm and software development, cheminformatics, biophysical and pathway modeling, and high-performance computing.


To address these challenges, in this paper we present Lipostar, to the best of our knowledge the first fully automated and vendor-neutral bioinformatics procedure tailored for untargeted lipidomics using LC-MS, comprising raw data processing, multivariate statistical tools for modeling and external predictions, and automatic drug metabolite and lipid identification. Targeted lipidomics and quantitative lipid analysis are also supported, associated or not with previous untargeted analyses. How Lipostar performs the single steps of the lipidomics workflow will be described in the next sections, namely: raw data handling, data processing, multivariate analysis, and lipid identification; additional details are provided in the Supporting Information (SI). Finally, two simple case studies are reported to illustrate the bases of the Lipostar workflow.

Table 1. Lipidomics bioinformatic tools and platform comparisons. VN: vendor neutral; PD: peak detection; M: metabolites identification; ID: lipid identification; Q: quantification; MA: multivariate analysis (PCA, CPCA, PLS, PLS-DA, PLS-TP); EP: external predictions.
Tools compared: LipidView, LipidSearch, MassHunter, Mass Profiler Professional, SimLipid, MZmine, MS-DIAL, Lipostar.
a: only class prediction available; b: only PCA available; c: specifically designed for data-independent acquisitions.

Figure 1. Exemplification of the difference between targeted and untargeted approaches to lipidomics. Assume that the overall lipid profile associated with a biological sample is the painting of Mona Lisa and that this painting is not disclosed. Using a targeted approach, accurate and quantitative analyses are performed; in this case, the nature of the painting (lipidome) may still be revealed by details to an expert eye, but interpretation strongly depends on the region selected for inspection (a and b). Using an untargeted approach, results are less accurate but still informative (c), and they can drive a targeted analysis as a second step (d).

WORKFLOW AND METHODS

Raw data handling. In developing Lipostar, our first aim was to design a tool able to deal with multiple input file sources. Files acquired on Agilent Q-TOF (.d), Waters Q-TOF (.raw), Bruker Q-TOF (.d), ABSciex Triple-TOF (.wiff), and Thermo Ion-Trap and Orbitrap (.raw) instruments are currently supported by Lipostar, either directly or by using a converter with the additional vendor-specific libraries. Both data-dependent (DDS, DDA) and data-independent (MSE, All ions, all ion fragmentations, SWATH or Broad band) acquisition modes are supported, as well as positive and negative polarity in separate acquisitions or in a single acquisition, as for Thermo instruments.

Data processing and alignment. As previously mentioned, according to Song et al.16 bioinformatics tools for lipidomics generally perform data processing in five steps: 1) baseline and noise reduction; 2) peak extraction; 3) smoothing; 4) signal-to-noise ratio calculation; and 5) deisotoping and deconvolution. In Lipostar, the first four steps match the general trend, dealing with noise reduction, peak recognition and smoothing. Indeed, these steps are generally needed in computational tools for LC-MS analysis, from -omics to MetID studies. Thus, to cover steps 1-4, the algorithms previously developed in Mass-MetaSite (a widely used software for drug metabolite identification) are used in Lipostar with minor modifications, and general details are described elsewhere.24 Concerning the applied modifications, the Savitzky-Golay algorithm25 for smoothing was added as an alternative to the Mass-MetaSite26 smoothing algorithm, as we found that it performed better in complex matrices. All of the standard data processing parameters can be adjusted based on the instrumentation and acquisition mode used; nevertheless, default settings for each supported instrument are also provided, based on in-house testing on experimental raw data (see Fig. S1-1). Compared to exclusively


targeted software for LC-MS data elucidation, the major innovative point in Lipostar concerns the deisotoping and deconvolution step. In the study of lipids from complex matrices, the co-elution of lipids with very similar structures (e.g. differing by one double bond only) is likely to occur, and thus the recognition of the isotopic pattern for each species can be challenging. In targeted lipidomics, the starting point in deisotoping is the chemical formula of the known compound, which allows the generation of the theoretical patterns. In untargeted lipidomics, the chemical formulae of the lipids in the matrix are not known a priori, which removes the possibility of using the subtraction algorithms27 generally applied in targeted approaches. To overcome this limit, Lipostar includes an additional step in data processing, termed sample alignment. As untargeted lipidomics usually deals with a comparison of lipid profiles from several samples, data processing after peak extraction in Lipostar can be summarized as follows:

• The lipid profile of each analytical sample is initially defined as a series of mass-to-charge ratios at given retention times (m/z@RT); at this stage, isotopic peaks of a lipid as well as peaks related to different adducts are not clustered and are treated as distinct entities;
• The lipid profiles so defined are aligned to form a first data matrix, where rows represent the analysed samples and columns represent the m/z@RT entries; the m/z and retention time (RT) tolerances for the alignment process can be tuned by the user to fit data from different instruments;
• After alignment, a column-wise search for isotopic patterns is performed among those peaks that possess similar RT, according to a strict but slightly tuneable tolerance. The isotopic pattern relation in Lipostar was developed to take into account theoretical spacing and abundance based on a series of chemical formulae that are compatible with the m/z value under investigation. The shape of the potentially correlated peaks is also evaluated;
• Upon isotopic pattern clustering, a new matrix is generated, with the same number of rows (samples) but a reduced number of columns. In this second matrix, adducts of a lipid are still not clustered, so they occupy different columns. Indeed, at this stage the lack of additional structural data does not allow a reliable clustering process. Nevertheless, this second matrix can be used for multivariate statistical analysis as is, reasonably assuming that adduct peaks related to the same lipid will be highly correlated with each other;
• Once the lipid identification step is performed (see below), a third matrix that takes into account adduct clustering can be generated and used for further statistical analysis.

Handling large amounts of data as a matrix from the data processing stage onwards is a central feature that not only bypasses the issue of isotopic patterns for unknown compounds but also enriches the information needed for identification. Firstly, the baseline and noise suppression routines, as well as the signal-to-noise algorithm, may remove a lipid with a weak signal in one sample even though the same lipid is detected in other samples (e.g. due to higher intensity). Working column-wise, Lipostar analyses the matrix gaps and returns to the total ion chromatogram of the corresponding sample to search for the missing signal and, if found, adds it to the matrix. The reduction of matrix gaps is extremely important in multivariate statistical methods, due to the potentially detrimental weight of missing variables.

The second advantage of the matrix-based approach in Lipostar lies in the possibility of selectively enriching the MS/MS data associated with a given data matrix. Indeed, in untargeted lipidomics, a first analytical run may lack a large amount of MS/MS data because many lipids co-elute at the same time. Additionally, as also reported by Dennis et al.,5 a valuable strategy to increase the identification coverage with MS/MS data could be to initially operate the mass spectrometer in full-scan mode (e.g. Full-MS pos/neg in Thermo or MS-FPS Fast Polarity Switching in Agilent) to search for new m/z values and, in a further investigation, to export the list of important compounds as inclusion or preferred lists to carry out DDS experiments on only a few representative samples. Bearing this in mind, Lipostar was designed to allow the enrichment of the data matrix with MS/MS data using a three-step approach: 1) selection of the lipids of interest lacking MS/MS data, to generate an inclusion list for the instrument used; 2) automatic extraction of the minimum number of samples to re-run in order to detect all of the compounds in the list; 3) once the extracted samples are reanalysed, automatic upload of the MS/MS data for the selected lipids to be associated with the original data matrix. Therefore, a second identification process will be based on more MS/MS data, reducing the false discovery rate, without wasting time reprocessing all of the data. Finally, the Lipostar data matrices are connected to a number of inspection tools. For instance, detected artefacts28 can be visualized in an m/z vs. RT plot prior to exclusion (Figure S1-2).

Multivariate Analysis. Multivariate analysis has emerged as a powerful tool for supporting lipidomics investigations.12,29–35 Individual lipid species, lipid families, or specific lipid changes from sample to sample can be easily revealed using multivariate statistical procedures.
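These multivariate procedures operate on the aligned data matrix described in the data processing section (rows are samples, columns are m/z@RT entries). The following is a minimal Python sketch of how such a matrix, and the list of its gaps, might be built with user-tuneable m/z and RT tolerances. It is not Lipostar's algorithm: the names `Peak`, `align_samples` and `gaps`, the default tolerances, and the simple first-match strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float    # mass-to-charge ratio
    rt: float    # retention time (minutes)
    area: float  # integrated peak area


def align_samples(samples, mz_tol=0.01, rt_tol=0.2):
    """Align per-sample peak lists into a samples x features matrix.

    samples: dict mapping sample name -> list of Peak.
    Returns (features, matrix): features is a list of (mz, rt) reference
    entries; matrix[name][j] is the area of feature j in that sample,
    or None when the feature was not detected there (a matrix gap).
    Tolerance values are hypothetical defaults, not Lipostar settings.
    """
    features = []
    rows = {name: {} for name in samples}
    for name, peaks in samples.items():
        for p in peaks:
            # Reuse the first existing feature within both tolerances.
            for j, (mz, rt) in enumerate(features):
                if abs(p.mz - mz) <= mz_tol and abs(p.rt - rt) <= rt_tol:
                    break
            else:
                # No match found: open a new m/z@RT column.
                features.append((p.mz, p.rt))
                j = len(features) - 1
            rows[name][j] = p.area
    matrix = {name: [cols.get(j) for j in range(len(features))]
              for name, cols in rows.items()}
    return features, matrix


def gaps(matrix):
    """Return (sample, feature_index) pairs where a value is missing."""
    return [(name, j) for name, row in matrix.items()
            for j, value in enumerate(row) if value is None]
```

In this sketch the entries returned by `gaps` are the positions for which one would go back to the raw chromatogram of the corresponding sample to look for a weak signal, as described above. A real implementation would also need to handle several peaks of one sample falling into the same column, which this sketch simply overwrites.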
To this end, different unsupervised (e.g. principal component analysis: PCA36,37, Consensus PCA38,39) and supervised (e.g. partial least squares: PLS40, PLS-DA41, PLS-TP42) algorithms for multivariate analysis are available in Lipostar. However, lipidomics tools tend to make multivariate statistical analysis as simple as possible for the user, leading in many cases to “black boxes” in which advanced data interpretation (e.g. detection of spurious variables or statistical artefacts) is very limited. In Lipostar, instead of opting for “black box” simplicity, we decided to implement user-friendly advanced tools for the visualization and interpretation of the statistical analysis outcomes, as well as for the evaluation of robustness. For instance, prior to selecting the variables (i.e. lipids) to be used for a statistical purpose, inspection of the frequency plot (frequency of variable occurrence vs. number of samples, Figure 2a) can represent a valid statistical approach to variable selection, reducing the risk of spurious relationships associated with the use of less populated variables. Once a model is built, the relationships among lipids can be explored using a number of plots (loadings plot, weights plot, and coefficients plot). To this aim, the coefficients plot shown in Figure 2b provides users with a clear picture of the m/z@RT variables, i.e. potential lipids, which are important for the observed correlation: positive coefficients are associated with more expressed lipids, while negative coefficients are associated with less expressed ones. In addition, the box and whiskers plot is available to analyse the distribution of different lipids or lipid families among two or more samples (Figure 2c). Variables that emerge as important from the statistical analysis can also be inspected in the m/z vs. RT plot,


Figure 2. Examples of visualization tools available in Lipostar for interpreting multivariate analysis: (a) compound frequency plot, representing the number of samples that share a specific number of compounds; (b) PLS coefficients plot; (c) box and whiskers plot showing the distribution of triacylglycerols in samples belonging to Group A and samples belonging to Group B; (d) m/z vs. RT plot with blue points representing automatically identified lipids; (e) Coomans plot.

suggesting whether or not they belong to a specific lipid family (Figure 2d). Moreover, the distance of training set or test set samples to a given model can also be displayed using a Coomans plot43 (Figure 2e). Finally, a major goal in statistical analysis is the “Prediction” step, in which the developed statistical models can be used to predict the state/condition of novel samples for the purposes of early discovery, prevention, or monitoring of disease pathology. To simplify the prediction step, Lipostar records all of the steps performed for model building, and is subsequently able to read the novel data and automatically repeat each step with the same settings to ultimately project the novel samples onto the model and make the prediction, limiting the risk of user error (Figure S1-3).

Lipid Identification. Lipid identification represents one of the most critical tasks toward biomarker discovery. To address this issue, a number of tools and procedures have been developed and proposed in recent years.23,44–46 Usually, such identification tools query a precompiled (public or customized) database of lipids and related MS/MS fragmentations.23,44,45 However, this workflow, being exclusively database driven, is far from the goal of an untargeted approach. Moreover, one drawback of dealing with in silico generated lipids and related fragmentations is that the number of false positives is likely to increase, and therefore a number of precautions must be followed (e.g. limiting searches to lipid classes known to exist in a given organism, verifying hits by acquiring mass spectra of additional reference compounds).44,45 To address these issues, Lipostar allows both database-driven and database-free identification algorithms (workflow summarized in Figure 3 and Table S1-1 in SI). Briefly, the central engine is a proprietary Database of Fragmentation Rules (FR-DB), collected from literature32,47,48 and in-house experimental data. The FR-DB can be used:

1) to generate a database of lipid fragments (LIP-DB) from databases of lipid structures (e.g. LIPID MAPS49,50, LipidBlast44 or customized) through the DB manager tool. Once fragment libraries are generated, identification is based on a query of the MS-MS/MS lipid database looking for matches;
2) to interpret the experimental MS/MS spectra of an unknown lipid, allowing a primary classification of the lipid into its lipid class/family.

In particular, the additional presence of a database-free algorithm for lipid identification as in 2) overcomes the limitations associated with the use of a database-only algorithm where, if the lipid is not included a priori in the database, it will never be identified. Obviously, the more complete the LC-MS/MS information for a lipid, the more accurate and reliable the identification (Figure S1-4 and Table S1-1 in SI). To this aim, Lipostar allows merging all of the experimental information available (i.e. spectra acquired at different polarities, fragmentation of multiple adducts) in order to minimize false discovery rates.32 Identification results are provided with a traffic light colour palette, in which uniquely identified lipids are represented in green, unidentified lipids in red, and identifications that require additional (manual or semi-automatic) inspection and approval because of sum composition or lipid family conflicts in orange. In addition to colour-labelling, an identification score is available to estimate the reliability of results (details in Table S1-1 in SI). After conflict inspection, lipid approval can be saved to be recalled by Lipostar in future processing. Indeed, in any subsequent new identification, the list of “already approved” lipids (characterized by MS-MS/MS and RT information) can be used to guide the match assignment. This particular feature of being trainable makes Lipostar unique when compared with other software that aims to provide automatic lipid profiling. It is noteworthy that when the identification approach based on a lipid database is used, unidentified m/z entries without MS/MS data are further processed to test whether they can be oxidized forms of lipids in the database, based on exact mass comparison. Additional details are given in Table S1-1 in SI.
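The exact-mass test for possible oxidized forms mentioned above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name `oxidized_candidates`, the 5 ppm tolerance, the limit of two added oxygens, and the use of neutral monoisotopic masses (i.e. the adduct has already been accounted for) are hypothetical choices, not Lipostar's documented behaviour. The only fixed quantity is the monoisotopic mass of oxygen, approximately 15.994915 Da.

```python
OXYGEN_MONOISOTOPIC = 15.994915  # Da, monoisotopic mass of 16O


def oxidized_candidates(neutral_mass, lipid_db, max_oxygens=2, ppm_tol=5.0):
    """Find database lipids of which `neutral_mass` could be an oxidized form.

    neutral_mass: neutral monoisotopic mass of the unidentified entry
                  (assumption: adduct mass already subtracted).
    lipid_db:     dict mapping lipid name -> neutral monoisotopic mass.
    Returns a list of (lipid_name, n_oxygens) for which
    mass + n * O matches neutral_mass within ppm_tol.
    """
    hits = []
    for name, mass in lipid_db.items():
        for n in range(1, max_oxygens + 1):
            expected = mass + n * OXYGEN_MONOISOTOPIC
            error_ppm = abs(neutral_mass - expected) / expected * 1e6
            if error_ppm <= ppm_tol:
                hits.append((name, n))
    return hits
```

A hit from such a test is only a hypothesis based on mass: without MS/MS data it cannot distinguish an oxidized lipid from an unrelated isobaric species, which is consistent with treating these entries as candidates for further inspection.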

Figure 3. Lipid identification workflow in Lipostar. More details are given in Table S1-1.

CASE STUDIES

Lipid identification applied to a standard mixture. Although this test case is far from representing a general lipidomics study, in our view this was the simplest way to describe the basic features of the Lipostar identification process. Nowadays, the classification of lipids into eight categories as proposed by the LIPID MAPS consortium50 (fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterol lipids, prenol lipids, saccharolipids, and polyketides) represents the reference in lipidomics, although prenol lipids, saccharolipids and polyketides are relatively less studied.32 Therefore, in this case study 40 standards were acquired with the aim of covering the main lipid classes49 (excluding saccharolipids and polyketides) and the major sub-classes. The lipid mixture was analysed by LC-MS as described in the Materials and Methods section below. As previously mentioned, the Lipostar identification algorithm can optionally be based on a LIP-DB of theoretical lipid MS, MS/MS and, if available, RT data previously generated by the DB manager tool from databases of lipid structures; in addition, experimental MS/MS spectra, when available, lead to a more accurate identification. A targeted analysis of the standards was performed, using an inclusion list to force MS/MS acquisition. The use of inclusion lists is a general trend in targeted lipidomics, and it is also recommended in Lipostar when specific lipids are targeted. Thus, a LIP-DB generated from the LIPID MAPS database49 was used to support identification. As shown in Table S2-1 in the SI, all of the analysed lipids were successfully identified by Lipostar and furthermore, in 60% of the cases, identification was unequivocal (except for acyl chain position; dark-green label assignment). For instance, PA (17:0/17:0) was successfully identified in negative mode, with six experimental fragments matching the corresponding theoretical LC-MS/MS information in the LIP-DB (see section 2.1 in SI). In the same fashion, PC (16:0/18:1) was identified along with the possible isobar PC (18:1/16:0) in positive mode with six experimental fragments. In other cases (e.g. lutein F, vitamin E), mainly due to the absence of lipid-specific fragmentation information, the lipid standard was identified but it was not possible to distinguish it from other isobars (Tables S2-1 and S2-2 in SI). However, this issue can be resolved for future Lipostar applications by training the Lipostar database (Table S1-1 in SI).

Untargeted lipidomics for drug safety assessment. Lipidomics is often applied to search for biomarkers associated with a disease state.1,8,30,51,52 In untargeted lipidomics, innovative applications concern the safety risk assessment associated with drugs.53 Cell-based in vitro assays are commonly performed for drug safety evaluation. Although it is well known that certain drugs affect lipid metabolism (e.g. ximelagatran54 and isoproterenol55) or induce lipid accumulation (e.g. amiodarone and other cationic amphiphilic drugs56,57), to the best of our knowledge lipidomics approaches in this field are extremely rare, and limited to targeting specific lipid classes.53,58 A possible reason for this scarce use is that untargeted lipidomics is still time-consuming, and drug safety evaluation requires a significant number of samples to determine the concentration effect. In this case study, we have used the Lipostar platform to monitor the potential impairment of cellular lipids induced in 3D InSight™ liver microtissues (InSphero AG, CH).
Indeed, 3D microtissues have become an appealing liver model for safety risk assessment, allowing studies under chronic exposure conditions.59 Here, microtissues were treated with one of two drugs, troglitazone or pioglitazone, at 20 µM for 11 days, with re-dosing every 48 hours. Treated samples were compared with vehicle controls (i.e. samples with medium containing 0.1% DMSO), analysed at days 7, 9, 11, and 14 of treatment. Troglitazone (Noscal or Rezulin) is an oral antidiabetic drug60 that was withdrawn from the market due to associated hepatotoxicity in humans.61 This hepatotoxic effect was not observed in animals, and the cause of the idiosyncratic effect in humans has been explored extensively.62,63 Pioglitazone, like troglitazone, is a thiazolidinedione drug for the treatment of diabetes mellitus type II, and the chemical structures of the two compounds are similar (Figure S3-1 in SI). However, pioglitazone has not been associated with severe hepatotoxicity so far, although cardiovascular events and especially an increased risk of bladder cancer have been reported.64 Since troglitazone was found to be cytotoxic in HepG2 cells,65,66 we hypothesized that lipidomics should highlight differences in the lipid profile of 3D liver microtissues associated with treatment with each drug. Of course, compared to studies in HepG2 cells, studies in 3D liver microtissues are more cost-effective and thus a reduced cell number is usually used for treatment. After treatment (experimental details are reported in SI, Methods Section), the lipid content of each sample was analysed by LC-MS, and the raw data were processed with Lipostar using an untargeted approach. An important issue of lipidomics applied to hepatotoxicity research is that drugs and their metabolites, depending on their lipophilicity, can be partially extracted with lipids during sample preparation.


This is more evident with cationic amphiphilic drugs (e.g. amiodarone or imipramine, data not shown), but it is always recommended to check whether drugs and related compounds are present in the chromatogram, for two reasons: 1) they may alter the statistical analysis by contributing to cluster formation for each drug when the effects of different drugs are compared; 2) knowledge about potential metabolites formed during analysis may be critical for future interpretation of the biological effect. Lipostar is the only software that is able to combine lipidomics analysis with fully automatic metabolite identification without the need for any metabolite library.23 To this end, the MetaSite algorithm26,67,68,24 can be connected to Lipostar through the interface, so that peaks related to the drugs used and their metabolites are automatically identified and a specific filter can be applied to remove them before statistical analysis. In the present case study, the drug analysis led to the identification of troglitazone and pioglitazone in their corresponding treated samples. In addition, for troglitazone-treated samples a metabolite formed by hydroxylation (Table S3-1 in SI) was observed. After removal of the drug-related peaks, a principal component analysis was performed. The scores plot (Figure 4a) clearly illustrates time-dependent changes in the lipidomes of samples treated with pioglitazone and troglitazone. The samples corresponding to shorter exposure of the liver microtissues to troglitazone or pioglitazone are closer to the control samples; however, the longer-exposure samples show more pronounced differences and, interestingly, the troglitazone and pioglitazone samples are almost orthogonally oriented. These data suggest not only that the lipid impairment induced by the two drugs increases with time, but also that the lipids involved in the impairment are different and not directly correlated.
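The sequence described above (drop the drug-related columns, then run a PCA on the remaining matrix) can be sketched in a few lines of plain Python. This is a minimal illustration, not the Lipostar implementation: the function names `remove_columns` and `first_pc_scores` are hypothetical, the indices of the drug-related columns are assumed to be known already, and power iteration is used here as a simple stand-in for a full PCA, returning only the scores on the first principal component.

```python
def remove_columns(matrix, drop):
    """Return a copy of `matrix` without the columns whose indices are in `drop`.

    matrix: list of rows (samples), each a list of feature intensities.
    drop:   set of column indices flagged as drug or metabolite peaks
            (assumption: supplied by a separate identification step).
    """
    keep = [j for j in range(len(matrix[0])) if j not in drop]
    return [[row[j] for j in keep] for row in matrix]


def first_pc_scores(matrix, iterations=200):
    """Scores of each sample on the first principal component.

    Uses power iteration on the mean-centred data. The sign of the
    scores is arbitrary, as in any PCA.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Deterministic, non-uniform start vector for the loadings.
    w = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        t = [sum(x[j] * w[j] for j in range(m)) for x in centred]   # scores
        w = [sum(centred[i][j] * t[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            return [0.0] * n  # no variance along the current direction
        w = [v / norm for v in w]
    return [sum(x[j] * w[j] for j in range(m)) for x in centred]
```

The point of filtering first is visible even in a toy matrix: a column that is non-zero only in treated samples (the drug itself) would otherwise dominate the first component and separate the groups for a reason unrelated to the lipidome.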
In this study, our interest was mainly in the lipid impairment induced by troglitazone, highlighted by red (increasing lipids) and green (decreasing lipids) circles in the PCA loadings plot in Figure 4b. Both the LIP-DB-based and FR-DB-based approaches were used for lipid identification, with LIP-DB generated from an in-house lipid database compiled from the LIPID MAPS database49 (accessed December 2016) and a virtual lipid database of an additional 800,000 structures. Among the 40 increasing lipids selected from the loadings plot (Figure 4b, red circles), 28 (70%) were automatically identified (pale or dark green color labelling in Table S3-2 in SI) on the basis of their exact mass, isotopic pattern, adduct analysis and, when available, MS/MS data. An additional 11 variables were labelled in orange upon identification, indicating that further inspection by the user is recommended, while identification was unsuccessful for one compound. The variables increasing upon troglitazone treatment were mainly triacylglycerols (TGs, Table S3-2 in SI), and TG accumulation has already been observed for another structural analogue of troglitazone, rosiglitazone.69 Figure 5 compares the peak areas of ten TGs, matched by identical sum composition, among control-vehicle samples, troglitazone-treated samples and pioglitazone-treated samples. Among the lipids that, according to the multivariate statistical model, decrease in troglitazone-treated samples (green circles in Figure 4b), 75% of the selected variables were automatically identified as diacylglycerophosphocholines; most of them were not available in the LIPID MAPS database49, but were identified from the in-house virtual database and from the presence of the abundant fragment with m/z 184 in the corresponding MS/MS spectra. It is noteworthy that MS/MS data were not available for all selected variables. As a second step of this study, inclusion lists of the lipids of interest can be generated to force MS/MS acquisition, and Lipostar will automatically upload the missing MS/MS data into the work session for a refined identification (see data processing section). However, this was beyond the scope of this case study, which aims to describe a procedure rather than to report safety risk assessment results. These preliminary insights might nonetheless inspire targeted studies focused on a limited number of lipids or lipid classes (e.g., TG(56:5), TG(56:6) or TG(56:7) in Figure 5), which is exactly the purpose of untargeted lipidomics.
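The use of the m/z 184 fragment as evidence for a phosphocholine-containing lipid can be sketched as follows. This is a minimal illustration, not the Lipostar identification routine: the exact value 184.0733 (the phosphocholine head-group ion, [C5H15NO4P]+), the 0.01 Da tolerance, the 20% relative-intensity threshold used to define "abundant", and the toy spectra are all assumptions introduced here.

```python
PC_HEADGROUP_MZ = 184.0733  # phosphocholine head-group ion, [C5H15NO4P]+


def has_pc_fragment(spectrum, tol_da=0.01, min_rel_intensity=0.2):
    """Return True if an abundant fragment near m/z 184 is present.

    `spectrum` is a list of (m/z, intensity) pairs from one MS/MS scan.
    A fragment counts as abundant when its intensity is at least
    `min_rel_intensity` times that of the base peak.
    """
    if not spectrum:
        return False
    base = max(intensity for _, intensity in spectrum)
    if base <= 0:
        return False
    return any(
        abs(mz - PC_HEADGROUP_MZ) <= tol_da
        and intensity / base >= min_rel_intensity
        for mz, intensity in spectrum
    )


# Toy spectra (hypothetical values, not measured data).
pc_like = [(184.0731, 1000.0), (86.0964, 150.0), (760.5850, 300.0)]
tg_like = [(603.5347, 800.0), (577.5190, 650.0)]

print(has_pc_fragment(pc_like))
print(has_pc_fragment(tg_like))
```

A check of this kind only supports class assignment; variables without MS/MS data cannot be tested this way, which is why the inclusion-list step is suggested for a refined identification.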

Figure 4. PCA of the lipid profile of 3D InSight™ liver microtissues upon drug treatment. (a) Scores plot of samples treated with troglitazone (red) or pioglitazone (blue). Control-vehicle samples are colored in orange. Square size reflects the days of treatment, from 7 days (smaller) to 14 days (larger). (b) Loadings plot highlighting the lipids that increase (red) or decrease (green) upon treatment with troglitazone. Their identification is reported in Table S3-2 in SI.

Figure 5. Area comparison for ten TGs, classified by sum composition, for control-vehicle samples (orange), troglitazone-treated samples (red), and pioglitazone-treated samples (blue).

CONCLUSIONS

In this paper we have presented Lipostar, a novel software tailored for untargeted lipidomics. Lipostar has been developed to process data from all of the major LC-MS instrument vendors (Agilent, Thermo, Bruker, ABSciex, Waters), thereby allowing instrument- and vendor-neutral lipidomics. Lipostar was designed as a comprehensive in silico tool, handling all the steps from raw data reading to multivariate analysis and lipid identification. Concerning the algorithms, the major innovations are the matrix-based data processing built on sample alignment, which assists the untargeted handling of isotopes and adducts, and the coexistence of two approaches for lipid identification. The first approach searches for matches in databases of fragmented lipids (generated by the DB Manager tool), and customized databases can also be generated. In addition, precompiled databases can be trained with new experimental results. The second identification approach is database-independent and is based on the interpretation of the experimental MS/MS spectra, searching for fragments that are lipid-class-specific. When the first approach is used, the unidentified lipids are also reprocessed to evaluate whether they may be oxidized forms of lipids represented in the database. Finally, Lipostar was primarily designed for pharmaceutical and medical applications. To this end, the connection with Mass-MetaSite for the identification of drug metabolites in safety assessment studies and the prediction tools associated with statistical models make Lipostar unique in this research field. The utility of Lipostar in lipidomics has been illustrated with two case studies, the first showing its application to automatic lipid identification and the second to the investigation of drug safety profiles by a multivariate statistical approach. We believe that such a high-throughput and comprehensive lipidomics platform might contribute to the successful use of untargeted lipidomics in fields ranging from pharmaceutical and medical applications to biomarker discovery.

MATERIALS AND METHODS

Chemicals.
Standard lipids were purchased from Cayman Chemical (Ann Arbor, MI, USA), Avanti Polar Lipids (Alabaster, AL, USA) and Sigma-Aldrich (St. Louis, MO, USA), with a purity grade > 98%. All solvents were purchased from Sigma-Aldrich and Biosolve (Dieuze, France). Troglitazone and pioglitazone were purchased from Sigma-Aldrich. 3D InSight™ Human Liver Microtissues (hLiMT) and 3D InSight™ Human Liver Maintenance Medium were purchased from InSphero AG.

LC-MS analysis. A Thermo Q-Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) was used. The LC system, controlled by Chromeleon X-press software, consisted of a binary pump, a thermostated autosampler and a column compartment, all Dionex UltiMate 3000 series modules (Thermo Fisher Scientific, Waltham, MA, USA). For the analysis of the standard lipids, the mixture preparation is described in the Supporting Information. A volume of 2 µL was injected. Lipids were separated by reversed-phase chromatography. In brief, a Supelco Ascentis Express F5 HPLC column (3 × 100 mm, 2.7 µm) was used, and the mobile phases consisted of A: 5 mM ammonium formate and 0.1% formic acid in water, and B: 5 mM ammonium formate and 0.1% formic acid in isopropanol. The LC flow was set at 0.65 mL/min and the gradient was varied linearly as follows: 0 min, A 80%, B 20%; 3 min, A 60%, B 40%; 16 min, A 40%, B 60%; 16.5 min, A 30%, B 70%; 24 min, A 26%, B 74%; 28 min, A 5%, B 95%; 30 min, stop run. The column was operated at a constant temperature of 45 °C. The LC effluent was introduced into the Q-Exactive mass spectrometer through an H-ESI source operated in positive and negative mode with the following settings: sheath gas flow rate 50, auxiliary gas flow rate 15, spray voltage 2.5 kV, capillary temperature 270 °C, auxiliary gas heater temperature 290 °C, and S-lens RF level 70. The Q-Exactive mass spectrometer was operated in Data Dependent Scan (DDS) mode, with a resolution of 35,000 in full MS and 17,500 in MS/MS, over the scan range m/z 100-1800, in both positive and negative ionization mode, at a collision energy of 35 V.

For the drug safety assay, sample preparation is described in the Supporting Information. A volume of 2 µL of each sample was injected. Lipids were separated by reversed-phase chromatography according to the method of Bird et al.70 with minor modifications. In brief, an Ascentis Express (Supelco) C8 column (2.1 × 75 mm, 2.7 µm) was used, and the mobile phases consisted of A: 10 mM ammonium formate + 0.1% formic acid in water:acetonitrile, 6:4 (v/v); B: 10 mM ammonium formate + 0.1% formic acid in acetonitrile:isopropanol, 2:8 (v/v); and C: 5 mM ammonium formate in isopropanol. The LC flow was set at 0.5 mL/min and the gradient was varied linearly as follows: 0 min, A 87%, B 13%; 5 min, A 35%, B 65%; 17 min, A 20%, B 80%; 20 min, A 5%, B 95%; 23 min, A 5%, B 95%; 23 min, stop run. The column was operated at a constant temperature of 35 °C.
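Because the gradient is stated only at its breakpoints, the mobile-phase composition at intermediate times follows by linear interpolation. The short Python sketch below illustrates this for the C8 gradient of the drug safety assay; the function name and the table layout are illustrative choices, and the sketch ignores instrument dwell volume, so it describes the programmed rather than the delivered composition.

```python
# (time in min, %B) breakpoints of the C8 gradient given in the text;
# %A is 100 - %B at every point.
GRADIENT = [(0.0, 13.0), (5.0, 65.0), (17.0, 80.0), (20.0, 95.0), (23.0, 95.0)]


def percent_b(t, table=GRADIENT):
    """Programmed %B at time t (min), interpolated linearly between breakpoints."""
    if t < table[0][0] or t > table[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


print(percent_b(2.5))   # 39.0
print(percent_b(11.0))  # 72.5
```

For example, halfway through the first segment (2.5 min) the programmed composition is 39% B and 61% A.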

ASSOCIATED CONTENT

Lipostar is freely available for non-profit research institutions.

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website. The material includes details on software design, algorithms and workflows, lipid mixture preparation, sample preparation for the safety study, LC-MS analysis, and the list of identified lipids in the drug safety case study.

AUTHOR INFORMATION

Corresponding Author
* Gabriele Cruciani, [email protected]

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. †These authors contributed equally.

REFERENCES

(1) Ekroos, K., Ed. Lipidomics; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2012.
(2) Wymann, M. P.; Schneiter, R. Nat. Rev. Mol. Cell Biol. 2008, 9, 162–176.
(3) Murphy, S. A.; Nicolaou, A. Mol. Nutr. Food Res. 2013, 57, 1336–1346.
(4) Wenk, M. R. Cell 2010, 143, 888–895.
(5) Harkewicz, R.; Dennis, E. A. Annu. Rev. Biochem. 2011, 80, 301–325.
(6) Fenn, J.; Mann, M.; Meng, C.; Wong, S.; Whitehouse, C. Science 1989, 246, 64–71.
(7) Han, X.; Gross, R. W. Mass Spectrom. Rev. 2005, 24, 367–412.
(8) Han, X. Nat. Rev. Endocrinol. 2016, 12, 668–679.
(9) Cajka, T.; Fiehn, O. Anal. Chem. 2016, 88, 524–545.
(10) SciFinder. https://scifinder.cas.org/scifinder (accessed Nov 17, 2016).
(11) Tsugawa, H.; Ohta, E.; Izumi, Y.; Ogiwara, A.; Yukihira, D.; Bamba, T.; Fukusaki, E.; Arita, M. Front. Genet. 2015, 5, 471.
(12) Niemelä, P. S.; Castillo, S.; Sysi-Aho, M.; Orešič, M. J. Chromatogr. B 2009, 877, 2855–2862.
(13) Yetukuri, L. R. Bioinformatics approaches for the analysis of lipidomics data; 2010.
(14) Ahmed, Z.; Mayr, M.; Zeeshan, S.; Dandekar, T.; Mueller, M. J.; Fekete, A. Bioinformatics 2015, 31, 1150–1153.
(15) Kind, T.; Liu, K.-H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O. Nat. Methods 2013, 10, 755–758.
(16) Song, H.; Ladenson, J.; Turk, J. J. Chromatogr. B 2009, 877, 2847–2854.
(17) LipidView, Sciex. http://sciex.com/products/software/lipidviewsoftware (accessed Nov 2, 2016).
(18) LipidSearch, Thermo Scientific. https://www.thermofisher.com/order/catalog/product/IQLAAEGABSFAPCMBFK (accessed Nov 2, 2016).
(19) Mass Hunter, Agilent Technologies. http://www.agilent.com/en-us/products/software-informatics/masshunter-suite/masshunter/masshunter-software (accessed Nov 2, 2016).
(20) Mass Profiler Professional, Agilent Technologies. http://www.agilent.com/en-us/products/software-informatics/masshunter-suite/masshunter-for-life-scienceresearch/mass-profiler-professional-software (accessed Nov 2, 2016).
(21) SimLipid, PREMIER Biosoft. http://premierbiosoft.com/lipid/index.html (accessed Nov 2, 2016).
(22) Katajamaa, M.; Miettinen, J.; Oresic, M. Bioinformatics 2006, 22, 634–636.
(23) Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; VanderGheynst, J.; Fiehn, O.; Arita, M. Nat. Methods 2015, 12, 523–526.
(24) Zamora, I.; Fontaine, F.; Serra, B.; Plasencia, G. Drug Discov. Today Technol. 2013, 10, e199–e205.
(25) O’Neal, J. Introduction to Signal Transmission; The Institute of Electrical and Electronics Engineers, Inc.: New York, NY, USA, 1972; Vol. 20.
(26) Bonn, B.; Leandersson, C.; Fontaine, F.; Zamora, I. Rapid Commun. Mass Spectrom. 2010, 24, 3127–3138.
(27) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 2000, 11, 320–332.
(28) James, P. F.; Perugini, M. A.; O’Hair, R. A. J. J. Am. Soc. Mass Spectrom. 2006, 17, 384–394.
(29) Dill, A. L.; Eberlin, L. S.; Zheng, C.; Costa, A. B.; Ifa, D. R.; Cheng, L.; Masterson, T. A.; Koch, M. O.; Vitek, O.; Cooks, R. G. Anal. Bioanal. Chem. 2010, 398, 2969–2978.
(30) Camera, E.; Ludovici, M.; Tortorella, S.; Sinagra, J.-L.; Capitanio, B.; Goracci, L.; Picardo, M. J. Lipid Res. 2016, 57, 1051–1058.
(31) Brulet, M.; Seyer, A.; Edelman, A.; Brunelle, A.; Fritsch, J.; Ollero, M.; Laprevote, O. J. Lipid Res. 2010, 51, 3034–3045.
(32) Han, X. Lipidomics Comprehensive Mass Spectrometry of Lipids; Desiderio, D. M.; Loo, J. A., Eds.; Cambridge University Press, 2016.
(33) Buas, M. F.; Gu, H.; Djukovic, D.; Zhu, J.; Drescher, C. W.; Urban, N.; Raftery, D.; Li, C. I. Gynecol. Oncol. 2016, 140, 138–144.
(34) Zhang, Y.; Liu, Y.; Li, L.; Wei, J.; Xiong, S.; Zhao, Z. Talanta 2016, 150, 88–96.
(35) Schwudke, D.; Hannich, J. T.; Surendranath, V.; Grimard, V.; Moehring, T.; Burton, L.; Kurzchalia, T.; Shevchenko, A. 2007, 79, 4083–4093.
(36) Jolliffe, I. In Wiley StatsRef: Statistics Reference Online; John Wiley & Sons, Ltd: Chichester, UK, 2014.
(37) Pearson, K. Philos. Mag. Ser. 6 1901, 2, 559–572.
(38) Wold, S.; Hellberg, S.; Lundstedt, T.; Sjostrom, M.; Wold, H. In Proceedings PLS-meeting; Frankfurt, Germany, 1987; pp. 1–21.
(39) Smilde, A. K.; Westerhuis, J. A.; de Jong, S. J. Chemom. 2003, 17, 323–337.
(40) Wold, S.; Sjöström, M.; Eriksson, L. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
(41) Barker, M.; Rayens, W. J. Chemom. 2003, 17, 166–173.
(42) Svensson, O.; Kourti, T.; MacGregor, J. F. J. Chemom. 2002, 16, 176–188.
(43) Esbensen, K. H.; Guyot, D.; Westad, F.; Houmøller, L. P. Multivariate Data Analysis – In Practice; 5th ed.; CAMO: Oslo, Norway, 2002.
(44) Kind, T.; Liu, K.-H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O. Nat. Methods 2013, 10, 755–758.
(45) Herzog, R.; Schuhmann, K.; Schwudke, D.; Sampaio, J. L.; Bornstein, S. R.; Schroeder, M.; Shevchenko, A. PLoS One 2012, 7, e29851.
(46) Ahmed, Z.; Mayr, M.; Zeeshan, S.; Dandekar, T.; Mueller, M. J.; Fekete, A. Bioinformatics 2015, 31, 1150–1153.
(47) McAnoy, A. M.; Wu, C. C.; Murphy, R. C. J. Am. Soc. Mass Spectrom. 2005, 16, 1498–1509.
(48) Murphy, R. C.; James, P. F.; McAnoy, A. M.; Krank, J.; Duchoslav, E.; Barkley, R. M. Anal. Biochem. 2007, 366, 59–70.
(49) Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E. A.; Glass, C. K.; Merrill, A. H.; Murphy, R. C.; Raetz, C. R. H.; Russell, D. W.; Subramaniam, S. Nucleic Acids Res. 2007, 35, D527–D532.
(50) Fahy, E. J. Lipid Res. 2005, 46, 839–862.
(51) McEvoy, J.; Baillie, R. A.; Zhu, H.; Buckley, P.; Keshavan, M. S.; Nasrallah, H. A.; Dougherty, G. G.; Yao, J. K.; Kaddurah-Daouk, R. PLoS One 2013, 8, e68717.
(52) Zhou, X.; Mao, J.; Ai, J.; Deng, Y.; Roth, M. R.; Pound, C.; Henegar, J.; Welti, R.; Bigler, S. A. PLoS One 2012, 7, e48889.
(53) Mortuza, G. B.; Neville, W. A.; Delaney, J.; Waterfield, C. J.; Camilleri, P. Biochim. Biophys. Acta - Mol. Cell Biol. Lipids 2003, 1631, 136–146.
(54) Sergent, O.; Ekroos, K.; Lefeuvre-Orfila, L.; Rissel, M.; Forsberg, G.-B.; Oscarsson, J.; Andersson, T. B.; Lagadic-Gossmann, D. Toxicol. Vitr. 2009, 23, 1305–1310.
(55) Prince, P. S. M.; Sathya, B. Eur. J. Pharmacol. 2010, 635, 142–148.
(56) Kasim, S.; Bagchi, N.; Brown, T.; Khilnani, S.; Jackson, K.; Steinman, R.; Lehmann, M. Horm. Metab. Res. 1990, 22, 385–388.
(57) Begriche, K.; Massart, J.; Robin, M. A.; Borgne-Sanchez, A.; Fromenty, B. J. Hepatol. 2011, 54, 773–794.
(58) Dehairs, J.; Derua, R.; Rueda-Rincon, N.; Swinnen, J. V. Drug Discov. Today Technol. 2015, 13, 33–38.
(59) Bell, C. C.; Hendriks, D. F. G.; Moro, S. M. L.; Ellis, E.; Walsh, J.; Renblom, A.; Fredriksson Puigvert, L.; Dankers, A. C. A.; Jacobs, F.; Snoeys, J.; Sison-Young, R. L.; Jenkins, R. E.; Nordling, Å.; Mkrtchian, S.; Park, B. K.; Kitteringham, N. R.; Goldring, C. E. P.; Lauschke, V. M.; Ingelman-Sundberg, M. Sci. Rep. 2016, 6, 25187.
(60) Sparano, N.; Seaton, T. L. Pharmacother. J. Hum. Pharmacol. Drug Ther. 1998, 18, 539–548.
(61) Watkins, P. B.; Whitcomb, R. W. N. Engl. J. Med. 1998, 338, 916–917.
(62) Jaeschke, H. Toxicol. Sci. 2007, 97, 1–3.
(63) Chojkier, M. Hepatology 2005, 41, 237–246.
(64) Tuccori, M.; Filion, K. B.; Yin, H.; Yu, O. H.; Platt, R. W.; Azoulay, L. BMJ 2016, 352, i1541.
(65) Masubuchi, Y. Drug Metab. Pharmacokinet. 2006, 21, 347–356.
(66) Yamamoto, Y.; Nakajima, M.; Yamazaki, H.; Yokoi, T. Life Sci. 2001, 70, 471–482.
(67) Cruciani, G.; Carosati, E.; De Boeck, B.; Ethirajulu, K.; Mackie, C.; Howe, T.; Vianello, R. J. Med. Chem. 2005, 48, 6970–6979.
(68) Cruciani, G.; Baroni, M.; Benedetti, P.; Goracci, L.; Fortuna, C. G. Drug Discov. Today Technol. 2013, 10, e155–e165.
(69) Rogue, A.; Anthérieu, S.; Vluggens, A.; Umbdenstock, T.; Claude, N.; de la Moureyre-Spire, C.; Weaver, R. J.; Guillouzo, A. Toxicol. Appl. Pharmacol. 2014, 276, 73–81.
(70) Bird, S. S.; Marur, V. R.; Stavrovskaya, I. G.; Kristal, B. S. Anal. Chem. 2012, 84, 5509–5517.
