Predicting the risk of Phospholipidosis with in silico models and an image based in vitro screen Lucia Fusani, Martin Brown, Hongming Chen, Ernst Ahlberg, and Tobias Noeske Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.7b00388 • Publication Date (Web): 27 Oct 2017 Downloaded from http://pubs.acs.org on October 28, 2017
Molecular Pharmaceutics is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Molecular Pharmaceutics
Predicting the risk of Phospholipidosis with in silico models and an image based in vitro screen
Lucia Fusani (1), Martin Brown (2), Hongming Chen (3), Ernst Ahlberg (1), Tobias Noeske (1)*
(1) Drug Safety & Metabolism, Innovative Medicines and Early Development Biotech Unit, AstraZeneca, Pepparedsleden 1, Mölndal, 431 83, Sweden.
(2) Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca, Cambridge Science Park, Milton Road, Cambridge, CB4 0WG, United Kingdom.
(3) Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca, Pepparedsleden 1, Mölndal, 431 83, Sweden.
KEYWORDS: Phospholipidosis, QSAR, image analysis, machine learning, consensus model.
ABSTRACT
The drug-induced accumulation of phospholipids in lysosomes of various tissues is predominantly observed in regular repeat dose studies – often after prolonged exposure – and further investigated in mechanistic studies prior to candidate nomination. The finding can cause delays in the discovery process, inflicting high costs on the affected projects. This paper presents an in vitro imaging-based method for early detection of Phospholipidosis liability and a hybrid approach for early detection and risk mitigation of Phospholipidosis that combines the in vitro readout with in silico model prediction. A set of reference compounds with Phospholipidosis annotation was used as an external validation set, yielding accuracies between 77.6% and 85.3% for the various in vitro and in silico models. Using a small set of chemically diverse known drugs with in vivo Phospholipidosis annotation, the advantages of combining different prediction methods to reach an overall improved Phospholipidosis prediction are discussed.
Introduction
Drug-induced Phospholipidosis (PLD) is a disorder characterized by the accumulation of polar phospholipids into lamellar bodies within affected tissues. Many different tissue types can be affected, but it mostly occurs in lung, liver and lymphocytes.1 PLD is reversible upon termination of drug administration and exposure, with a time course that depends on the dissociation constant of the drug from the phospholipid and its elimination rate from the tissue.2 PLD is commonly regarded as largely an adaptive rather than an adverse response to xenobiotics, yet there is some evidence suggesting that phospholipogenic compounds are associated with concurrent toxicities preclinically.3 A PLD finding at the drug discovery stage can therefore become a safety hurdle for project progression and cause serious delays.
Preclinically, PLD is usually detected after histological examination of the affected tissue. By light microscopy, PLD can be identified by the appearance of foamy macrophages or cytoplasmic vacuoles in many cell types.4 Membranous lamellar inclusions, concentric multilamellar bodies, myeloid bodies and other similar structures can be observed by Transmission Electron Microscopy (TEM).5 Although TEM is generally considered the gold standard for confirming PLD, other less invasive approaches have been examined in pre-clinical and clinical situations: Nile red staining using flow cytometry of circulating lymphocytes has been proposed6 but it identifies PLD only in one compartment – the blood – and more recently a plasma and/or urinary biomarker, bis(monoacylglycerol)phosphate (BMP), was reported.7
To date, a large number of phospholipogenic and non-phospholipogenic drugs have been reported, many of them even with TEM confirmation.8 A typical structural pattern among phospholipogenic compounds is the presence of a hydrophilic cationic side chain attached to a hydrophobic domain consisting of an aromatic and/or aliphatic ring structure; drugs bearing this moiety are often described as cationic amphiphilic drugs (CAD).9 Further investigations have shown that the PLD-inducing potential of drugs can be predicted with reasonable accuracy from physico-chemical (phys-chem) parameters such as CLogP and pKa.10 More recently, a number of studies have been published reporting in vitro prediction of PLD,11 the identification of structural alerts for PLD,12 computational predictions13-14 or even a combination of in vitro and computational approaches.15-16 In this study we report the development of a medium throughput in vitro screen to detect phospholipogenic drugs and the development of consensus computational models predicting the in vivo PLD results. Our optimal model was applied to a set of 183 registered drugs with in vivo PLD annotation taken from the FDA's reference set8 and excellent prediction accuracy was achieved.
Methods
In Vitro Screen
Routine cell culture: H4-II-E-C c3 (ECACC, Salisbury, UK) rodent hepatoma cells were selected for their compatibility with high content imaging, their usability across a panel of other toxicity profiling endpoints within the lab, and following confirmation of identical PLD predictivity to an earlier PLD in vitro model (in I13.35 murine splenocytes) published by Morelli et al.17 (data not shown). Cells were seeded (16,000 cells/well) onto Packard 96 well Viewplates (PerkinElmer, Buckinghamshire, UK) in Roswell Park Memorial Institute media (RPMI-1640) containing 10% Fetal Calf Serum (FCS), 1% (2mM) L-Glutamine and 1% Non-Essential Amino Acids (NEAA) (all reagents from Sigma-Aldrich, Hampshire, UK) and allowed to recover for 24 hours prior to compound treatment.
Cell cryopreservation as an assay-ready bank: H4-II-E c3 cells were cryopreserved in RPMI-1640 containing 7.5% Dimethyl sulfoxide (DMSO) and 50% FCS to allow direct use in the assay. Following thawing, cells were seeded (8,000 cells/well) onto Packard 96 well Viewplates and allowed to recover for 72 hours prior to compound treatment.
Fluorescence-tagged phospholipid addition: Just prior to compound treatment, wells were spiked with 20µL of media containing 6µg/mL 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(7-nitro-2-1,3-benzoxadiazol-4-yl) (NBD-PE, Avanti Polar Lipids, Alabama, USA), to give a final concentration of 1.2µg/mL.
Compound treatment: 1,731 compounds, including 194 from the Orogo dataset, were tested on two occasions. All compounds were obtained from Sigma-Aldrich, UK or the AstraZeneca compound collection. Compounds were dissolved in DMSO to give 250X stocks, at 50mM or, if solubility was lower, at the highest concentration that solubility would allow. From the stock concentration a Hamilton MicroLab STAR liquid handler (Hamilton Co., Bonaduz, Switzerland) generated a 10-point two-fold
(halving) DMSO dilution series for each compound. Each plate also contained eight positive (10µM Amiodarone) and eight negative (0.25% DMSO) control wells. Compounds were dosed to cells and a 0.25% DMSO concentration was maintained in all wells. Cells were exposed to compound for 24 hours prior to fixation.
Cell fixation and viability staining: Following compound treatment, cells were exposed to 500nM of a cell-impermeable DNA-binding fluorescent viability dye (SYTOX Orange, Invitrogen, USA) for 20 minutes. Cells were then fixed in 3.2% final paraformaldehyde (PFA, Sigma-Aldrich, UK) containing 16µg/mL Hoechst 33342 DNA-binding fluorescent dye for 45 minutes. Fixed cells were washed twice and left in PBS for imaging.
Image acquisition: Images of stained cells were captured using an ImageXpress MICRO (Molecular Devices, California, USA) automated fluorescence microscope. Acquisition used the following Brightline excitation and emission filters (Semrock, Rochester, USA): DAPI-5060B-NTE-ZERO (Hoechst), FITC-3540B-NTE-ZERO (NBD-PE), and TRITC-A-NTE-ZERO (SYTOX Orange). A single 10x image was acquired per well, representing approximately 1,000 cells.
Quantification of fluorescence images: The presence of PLD can be correlated to and inferred from the build-up of punctate cytoplasmic staining from the introduced fluorescence-tagged phospholipid. To quantify PLD levels, a Definiens Developer (www.definiens.com, Munich, Germany) image analysis algorithm first identified bright (>3-fold over background) Hoechst stained nuclei, and then NBD-PE cytoplasmic staining by a low pass background filter. Punctate NBD-PE staining was identified by a high pass background filter and the intensity summed within these regions. As similar punctate staining can also be caused by pre-apoptotic cellular
membrane changes, a second filter algorithm was used to screen out punctate staining per cell based on very bright (>100-fold over background) nuclear associated staining by the SYTOX viability dye, such that only spot intensities from viable cells were counted. No effect/negative controls (0.25% DMSO) and positive controls (10µM Amiodarone) were used to determine quality measures (such as signal to background ratio and Z’) for plate exclusion and for normalisation. Compound concentration response results were calculated from valid total PLD spot intensity values using the following calculation:
$$\text{Compound \% effect} = 100 \times \frac{X - \mathrm{min}}{\mathrm{max} - \mathrm{min}} \tag{1}$$
where X represented the duplicate mean values for the compound concentration, min was the median of the on-plate DMSO controls and max was the median of the on-plate Amiodarone controls. To identify artefacts due to non-specific cellular toxicity, the nuclei count per well was compared with the median of the on-plate negative controls. Where this fell below a threshold (>60% loss relative to control) derived from observation of responses to known cytotoxic compounds (and confirmed in a series of cytotoxicity assays as per Morelli et al.), PLD signal data were excluded from the concentration-response (CR) curve fit. Compound data were then aggregated by concentration into individual test sample compound curves and overall compound curves. Agonist CR curves were fitted using GeneData Screener (Genedata, Basel, Switzerland) to the modified Hill equation (with Hill slope $n$):
$$R = \frac{C_{top} - C_{bottom}}{1 + \left(\mathrm{EC}_{50}/[A]\right)^{n}} + C_{bottom} \tag{2}$$
where R was the response. Variables were constrained to realistic values; the curve asymptote top (Ctop) and curve asymptote bottom (Cbottom) were based on median values relative to the valid on-plate control wells, [A] was the compound concentration and EC50 was the concentration of compound producing 50% of the maximal response.
Prior to initiation of this study, the sensitivity of this method for detection of PLD was assessed with commercially available compounds that exhibited well-defined PLD in vitro responses17 and for which confirmatory in vivo data were available. A total of 28 PLD positive and 8 PLD negative compounds were tested in triplicate and, using a 15µM EC50 activity threshold, demonstrated identical PLD predictivity to that published previously (data not shown). Compounds are considered phospholipogenic (PLD positive) if the EC50 value is below 15µM, or if the EC50 value is greater than 15µM with a top effect of 10% or greater.
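The normalisation, curve model and activity call described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Hill slope default of 1 is an assumption (the curve fitting itself was done in GeneData Screener and is not reproduced here), and the handling of an EC50 of exactly 15µM is not specified in the text.

```python
from statistics import median


def percent_effect(x, dmso_controls, amiodarone_controls):
    """Normalise a raw PLD spot intensity to % of the Amiodarone control (Eq. 1)."""
    lo = median(dmso_controls)        # "min": on-plate negative (DMSO) controls
    hi = median(amiodarone_controls)  # "max": on-plate positive (Amiodarone) controls
    return 100.0 * (x - lo) / (hi - lo)


def hill_response(conc, ec50, top, bottom, slope=1.0):
    """Evaluate a four-parameter Hill curve; `slope` is an assumed Hill coefficient."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


def is_pld_positive(ec50_um, top_effect_pct, ec50_cutoff=15.0, top_cutoff=10.0):
    """Activity call as stated in the text: EC50 below 15 uM, or a top effect >= 10%.

    Assumption: an EC50 of exactly 15 uM (or a missing EC50, passed as None)
    falls through to the top-effect criterion.
    """
    if ec50_um is not None and ec50_um < ec50_cutoff:
        return True
    return top_effect_pct >= top_cutoff


# Hypothetical raw intensities, for illustration only.
effect = percent_effect(55.0, dmso_controls=[10.0, 12.0, 8.0],
                        amiodarone_controls=[100.0, 110.0, 90.0])
print(effect)                                   # half-way between the control medians
print(hill_response(5.0, ec50=5.0, top=100.0, bottom=0.0))
print(is_pld_positive(40.0, 25.0), is_pld_positive(40.0, 4.0))
```

Using the on-plate control medians for both asymptotes keeps the % effect scale comparable between plates, which is what allows a single 15µM / 10% rule to be applied across the whole screen.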
In Silico Methods
Data set: Two types of PLD data were used in the current study: a set of 1,731 AstraZeneca in-house compounds tested in the in vitro screen and a public in vivo dataset described by Orogo et al. The Orogo dataset comprises 743 compounds in total, of which 293 compounds (95 PLD positives and 198 PLD negatives) have a high confidence in vivo PLD annotation and a reported chemical structure. This set of 293 compounds is here referred to as the “Orogo full set”. Among our in-house in vitro set there were 194 compounds overlapping with the Orogo full set. Eleven of these compounds showed inconclusive results; hence, 183 compounds were used as the validation set (here referred to as the “Orogo subset”). The remaining 1,537 compounds were used for further modelling analysis. For the binary in silico model the in vitro EC50 cut-off is defined
as 15µM; compounds with an EC50 value below 15µM that were still inconclusive (e.g. EC50 > 12.5µM) were discarded. 30% of the in vitro set (458 compounds; 150 in vitro positive and 308 in vitro negative) were randomly selected as a test set and the remaining 70% (1,079 compounds; 291 positive and 708 negative) were kept as a training set for building models. Although our in silico models are based on in vitro data and not in vivo data, good performance against the in vivo data remains the ultimate goal. Therefore, the Orogo subset served as an additional test set for evaluating the performance of the in silico models on in vivo data. See Supporting Information Table S1 for the full data set.
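A random hold-out split of this kind can be sketched as follows. The helper name, the seed and the use of integer identifiers are assumptions made for illustration; the text does not state how the random selection was implemented, and an exact 30% of 1,537 compounds gives 461 rather than the reported 458, so the reported split is only approximately 30/70.

```python
import random


def split_train_test(compound_ids, test_fraction=0.30, seed=0):
    """Randomly split identifiers into (training, test) lists.

    Hypothetical helper: shuffles a copy of the identifiers with a fixed seed
    and takes the first `test_fraction` of them as the test set.
    """
    ids = list(compound_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# 1,537 modelling compounds, represented here by placeholder integer IDs.
train_ids, test_ids = split_train_test(range(1537))
print(len(train_ids), len(test_ids))
```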
Machine learning workflow: A set of 192 2D and 3D descriptors representing important molecular properties18 such as shape, lipophilicity, hydrogen bonding properties, charge, molecular weight, polar surface area, etc. was used to build in silico models. Two non-linear machine learning methods, support vector machine (SVM)19 and random forest (RF),20 were used to build the in silico models based on in vitro screen data. The models were built with the in-house machine learning package AZOrange.21
Principal Component Analysis (PCA): PCA is a statistical analysis in which a set of variables (here: descriptors) for a given data set is converted into orthogonal (i.e. linearly uncorrelated) principal components. This makes it possible to describe and plot most of the information in a high-dimensional vector space with a few principal components. In order to assess and visualize the compound similarity across the three compound sets (in vitro training set, in vitro test set and Orogo subset), a PCA was trained on the in vitro training set (1,079 compounds) using the descriptor set described above. The analysis was carried out in SIMCA 14, a commercially available
multivariate tool to extract and visualize information from large data sets (www.umetrics.com). Compounds of the in vitro test set and the Orogo subset were subsequently mapped onto the in vitro training set, that is, their information was translated into the same principal components.
Ploemen: A rule-based model published by Ploemen et al. was used as an external comparison. They observed that simple phys-chem properties play an important role in predicting PLD and proposed a simple rule-based model (shown in equation 3) containing calculated basic pKa (pKaB) and logP (ClogP):
$$\text{Ploemen} = \begin{cases} \text{positive}, & (\mathrm{p}K_{aB})^{2} + (\mathrm{ClogP})^{2} \geq 90 \\ \text{negative}, & (\mathrm{p}K_{aB})^{2} + (\mathrm{ClogP})^{2} < 90 \end{cases} \tag{3}$$
[…] 120µM). The scatter plot (Figure 1b) shows the activity distribution of all compounds with in vitro data in terms of EC50 and top effect given as percent of Amiodarone control (most PLD active compounds are assigned to the active class due to their top effect, not their EC50 value, which only denotes activity above assay background and has no relation to potential PLD liability).
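The Ploemen rule introduced in the In Silico Methods can be written as a one-line check. The sketch below assumes the threshold of 90 on the sum of squares that is commonly cited for the Ploemen rule; the function name and the example property values are hypothetical and not taken from this study.

```python
def ploemen_positive(pka_basic, clogp, threshold=90.0):
    """Return True if (pKaB)^2 + (ClogP)^2 reaches the threshold.

    Assumption: threshold of 90, as commonly cited for the Ploemen rule.
    """
    return pka_basic ** 2 + clogp ** 2 >= threshold


# Hypothetical property values, for illustration only.
print(ploemen_positive(pka_basic=9.0, clogp=4.0))  # 81 + 16 = 97
print(ploemen_positive(pka_basic=7.0, clogp=3.0))  # 49 + 9 = 58
```

Because the rule depends on only two calculated properties, it is cheap to evaluate for every compound, which is why it is a natural first-pass filter ahead of more expensive models or measurements.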
Link to Figure1
In Vitro
For the in vitro analysis three example compounds have been selected: one known PLD negative compound – Fexofenadine – and two PLD positive compounds with similar potencies: Cloralgil
and Chlorcyclizine. The fluorescence images, Fig. 2, show the presence of phospholipid accumulation at two different compound concentrations for each example. The corresponding concentration-response curves for these examples are shown in Fig. 3. For each compound an EC50 value was calculated using experimental results from two separate test occasions.
Link to Figure2
Link to Figure3
In Silico Models
Summary results for all in silico methods on the in vitro test set are presented in Table 1. The SVM model is superior to the RF model in all considered parameters. Ploemen provides a slightly better specificity than SVM (0.916 compared to 0.909) but at the cost of a substantially impaired sensitivity (0.307 versus 0.680). Based on these results the SVM model and Ploemen were considered for further analysis.
Table 1. Performance of the various in silico models on the in vitro test set (N = 458 compounds).

               In vitro   SVM     RF      Ploemen
TP (a)         150        102     96      46
FN (a)         NA         48      54      104
TN (a)         308        280     279     282
FP (a)         NA         28      29      26
Sensitivity    NA         0.680   0.640   0.307
Specificity    NA         0.909   0.906   0.916
Accuracy       NA         0.834   0.819   0.716
MCC            NA         0.613   0.575   0.287

Note: a) TP: true positives; TN: true negatives; FP: false positives; FN: false negatives.
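The summary statistics in Table 1 follow from the confusion-matrix counts. The sketch below uses the standard definitions of sensitivity, specificity, accuracy and the Matthews Correlation Coefficient, which are assumed to match the ones used for the table; the function name is hypothetical.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# SVM column of Table 1 (in vitro test set).
svm = classification_metrics(tp=102, fn=48, tn=280, fp=28)
print({name: round(value, 3) for name, value in svm.items()})
# sensitivity 0.68, specificity 0.909, accuracy 0.834, mcc 0.613
```

The MCC is the most informative single number here because the test set is imbalanced (150 positives against 308 negatives), so accuracy alone would reward a model that leans towards negative calls.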
Summary data for the Orogo subset, with both in vitro and in vivo results, are presented in Table 2. It contains in vitro and in silico predictions of the in vivo activity. What was observed for the in vitro test set holds true for the Orogo subset: Ploemen provides the best specificity (0.940) but the SVM model has a higher sensitivity than Ploemen (0.653 versus 0.571) and the highest accuracy and Matthews Correlation Coefficient (MCC) of all models (0.853 and 0.609, respectively).
Table 2. Performance of various models on the in vivo data of the Orogo subset (N = 183).

               In vivo   In vitro   SVM     Ploemen
TP             49        32         32      28
FN             NA        17         17      21
TN             134       110        124     126
FP             NA        24         10      8
Sensitivity    NA        0.653      0.653   0.571
Specificity    NA        0.821      0.925   0.940
Accuracy       NA        0.776      0.853   0.842
MCC            NA        0.455      0.609   0.570
Compound Similarity
The result of the Principal Component Analysis is presented in Figure 4. The first two components plotted here account for 58% of all information of the entire compound set. Apart from a few outliers, no separation can be made between compounds belonging to different compound sets. This indicates that compounds from all three sets can be considered similar with regard to the descriptors used for building the in silico models.
Link to Figure4
Consensus Models
Consensus approaches were evaluated to enhance the decision support to projects. The analysis suggested a decision tree approach in which the Ploemen rule is used as a first-pass filter and, if its result is negative, a decision is made on the basis of the in vitro screen or an in silico model to finally decide whether a compound should be categorised as PLD negative or PLD positive. A graphical representation of the approach is given in Figure 5.
Link to Figure5
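Read literally, the decision tree described above reduces to a short rule: a positive Ploemen call is kept, and a negative Ploemen call defers to the second model. The following is a minimal sketch of that reading; the function name is hypothetical and Figure 5 remains the authoritative description of the workflow.

```python
def consensus_call(ploemen_positive, second_model_positive):
    """Combine the Ploemen first-pass filter with a second model.

    `second_model_positive` stands for either the in vitro readout or the
    SVM prediction, depending on which consensus variant is evaluated.
    """
    if ploemen_positive:
        return True               # flagged by the first-pass filter
    return second_model_positive  # Ploemen negative: second model decides


# All four combinations of the two inputs.
for ploemen in (True, False):
    for second in (True, False):
        print(ploemen, second, "->", consensus_call(ploemen, second))
```

Under this reading a compound is only called negative when both components agree, which is consistent with the design goal of reducing false negatives at the price of some additional false positives.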
This combination approach was applied to both the in vitro test set and the Orogo subset. The results of the consensus approach are presented in Table 3, in which the Ploemen/SVM consensus model was applied to the in vitro test set and the combinations of Ploemen with either in vitro
readout or SVM in silico prediction were applied on the in vivo data set. Most noteworthy is that the consensus model using SVM and Ploemen reduces the number of false negatives to only 12 and is the model with the highest accuracy (0.853) and MCC (0.631).
Table 3. Performance of the selected consensus approaches on the in vitro test set and the Orogo subset.

               In vitro test set (N = 458)    Orogo subset (N = 183)
               In vitro   Ploemen + SVM       In vivo   Ploemen + In vitro   Ploemen + SVM
TP             150        110                 49        36                   37
FN             NA         40                  NA        13                   12
TN             308        268                 134       108                  119
FP             NA         40                  NA        26                   15
Sensitivity    NA         0.733               NA        0.735                0.755
Specificity    NA         0.870               NA        0.806                0.888
Accuracy       NA         0.825               NA        0.787                0.853
MCC            NA         0.603               NA        0.506                0.631
Discussion
Compound Set
The fact that about two thirds of all compounds constituting the in vitro data set for model building (1,026 out of 1,537) have an inconclusive EC50 value (e.g. > 60µM) led us to abandon all efforts to build a regression in silico model which returns continuous predicted activity
values. Such in vitro measurements cannot be used for building regression models since the actual EC50 value is unknown. Instead we focused on building categorical in silico models.
In Vitro Readouts
The processed images for Chlorcyclizine and Cloralgil indicate that cells already build up large amounts of phospholipids when treated with drug at a 6.25µM concentration (Figure 2a and 2c, respectively). For Chlorcyclizine the quantifiable amount of phospholipids increases at the higher drug concentration, while for Cloralgil the response is different: the number of punctate inclusions (i.e. accumulated phospholipids) actually decreases slightly at the higher concentration (Figure 2c and 2d) while the per cell intensity continues to increase due to the shrinking cytoplasmic area of the cells. This suggests concurrent cytotoxicity in the Cloralgil samples at these concentrations, similar to that observed previously with Amiodarone by Morelli et al. Such increasing cytotoxic effects of Cloralgil on the cells create an artificially enhanced PLD signal, which could contribute to the higher false positive rate seen with the in vitro screen relative to the in vivo data. Fexofenadine, in turn, is non-phospholipogenic, which is in agreement with the images, where large numbers of unaffected cells are observed at both drug concentrations (Figure 2e and 2f).
In Silico Efforts
Predicting in vivo PLD is not a trivial task, as can be seen from the results in Tables 1 to 3. For the present application, risk detection in early drug discovery, it is important to reduce the number of false negative compounds under the “fail early, fail cheap” assumption, since the later
those compounds with liabilities are identified in a drug discovery program, the (potentially) longer the delay they can cause. Table 1 summarises the in silico model selection based on the in vitro data. In this step the modelling method using SVMs was selected together with the previously defined Ploemen rule. The combination of SVM and Ploemen, presented in Table 3, provides the best model in terms of MCC and accuracy: only 27 out of 183 compounds were incorrectly predicted (12 FN and 15 FP). The ultimate goal of this study is to accurately predict in vivo Phospholipidosis. It is important to remember that the in silico model is trained on in vitro data and that the Ploemen rule was derived using in vivo data. As can be seen in Table 3, the combination provides the lowest number of false negatives, making this an effective in silico screen for projects. Based on the FDA data, it is not clear whether a follow-up test in the in vitro screen is a good next step, as the number of false positives is significantly higher. The sensitivity of the various models applied to predict in vivo results (shown in Tables 2 and 3) ranges from 0.571 (Ploemen) to 0.755 (Ploemen+SVM). Both Ploemen and the in silico method provide good positive predictive accuracies: of the compounds flagged by the models as positive, over 75% are actually positive. The sensitivity of the machine learning model is better than that of Ploemen, whereas Ploemen has a slightly higher specificity.
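The statement that over 75% of flagged compounds are actually positive can be checked directly from the Table 2 counts. The sketch below computes the positive predictive value, TP / (TP + FP); the function name is hypothetical.

```python
def positive_predictive_value(tp, fp):
    """Fraction of compounds flagged as positive that are truly positive."""
    return tp / (tp + fp)


ppv_ploemen = positive_predictive_value(tp=28, fp=8)  # Table 2, Ploemen
ppv_svm = positive_predictive_value(tp=32, fp=10)     # Table 2, SVM
print(f"Ploemen PPV: {ppv_ploemen:.3f}, SVM PPV: {ppv_svm:.3f}")
# prints "Ploemen PPV: 0.778, SVM PPV: 0.762"
```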
Advantage of Consensus Models
Consensus models have been shown to have superior prediction accuracy compared with individual component models. [Chen] The following four compounds exemplify why neither a simple phys-chem property based prediction (Ploemen) nor an in vitro screen or computational model alone is sufficient to predict in vivo PLD of the FDA compounds (Figure 6).
Chlorcyclizine is phospholipogenic, which is correctly predicted by both the in vitro screen and SVM (in silico model). The Ploemen model fails to predict this correctly. Bromhexine, in turn, is also phospholipogenic, but this time the in vitro result is wrong and both in silico predictions (SVM and Ploemen) correctly predict the in vivo result. Fexofenadine, however, is confirmed clean, which is in alignment with the in vitro model and SVM, whereas Ploemen is incorrect. All three compounds show the typical features of cationic amphiphilic drugs (CAD) which are linked to PLD. Still, no single model (Ploemen, SVM, or in vitro screen) predicts all of these compounds positive, which highlights the complexity of predicting in vivo PLD. Sulfamethoxazole, which is the only non-cationic amphiphilic drug in this list, is mispredicted only by the in vitro screen, while Ploemen and SVM correctly predict it to be negative. These results mean that every individual model can pick up some PLD-related compound properties which the other models miss. Therefore the consensus model performs better than e.g. a simple rule saying high ClogP plus high pKa = phospholipogenic. For the 183 compounds constituting the Orogo subset, the consensus model combining Ploemen's pKa and CLogP based rule and the SVM in silico model is superior in PLD prediction to any model alone.
Link to Figure6
The selected consensus approach (Figure 5b) applied to the compounds in Figure 6 shows that the first three compounds would be picked up as positive in silico. Fexofenadine, after being flagged by Ploemen, would be mitigated by the in vitro screen. Chlorcyclizine would be correctly labelled by SVM and confirmed in vitro whereas Bromhexine would be incorrectly labelled by the in vitro screen. Finally, when used as a selection filter for the in vitro screen,
Sulfamethoxazole would be flagged negative by the in silico consensus and, thus, not measured in vitro.
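The walk-through of the four example compounds can be reproduced with a small script. The individual calls below are transcribed from the description in this section; the helper name and the wording of the outcome labels are hypothetical, and the two-stage logic (in silico consensus first, in vitro follow-up only for flagged compounds) is a simplified reading of the workflow in Figure 5b.

```python
# Calls for the four example compounds as described in the text
# (True = PLD positive, False = PLD negative).
compounds = {
    "Chlorcyclizine":   {"in_vivo": True,  "in_vitro": True,  "svm": True,  "ploemen": False},
    "Bromhexine":       {"in_vivo": True,  "in_vitro": False, "svm": True,  "ploemen": True},
    "Fexofenadine":     {"in_vivo": False, "in_vitro": False, "svm": False, "ploemen": True},
    "Sulfamethoxazole": {"in_vivo": False, "in_vitro": True,  "svm": False, "ploemen": False},
}


def triage(calls):
    """Hypothetical helper: in silico consensus first, in vitro only if flagged."""
    flagged = calls["ploemen"] or calls["svm"]
    if not flagged:
        return "negative in silico, not measured in vitro"
    if calls["in_vitro"]:
        return "flagged in silico, positive in vitro"
    return "flagged in silico, negative in vitro"


results = {name: triage(calls) for name, calls in compounds.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome} (in vivo positive: {compounds[name]['in_vivo']})")
```

The output shows the trade-off discussed above: the Fexofenadine flag is resolved by the in vitro follow-up, Sulfamethoxazole never reaches the screen that would have mispredicted it, and Bromhexine is the case where the in vitro result disagrees with the in vivo annotation.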
Data Quality and Translatability
When the performance of the individual models with respect to the in vivo data of the Orogo subset was analyzed, we were surprised to learn that the SVM in silico model performed as well as or better than the in vitro screen. However, the in vitro screen displays reasonable performance metrics (Z’ > 0.3, minimum discriminatory ratio (MDR) < 3, and long term stable reference compound EC50 values). As such, these deviations are more probably related to the complexities and limitations of testing compounds in the in vitro setting, where compound precipitation in buffers and non-PLD cytotoxicity complicate the interpretation of results. It was difficult to pinpoint one particular cause, as the underlying reasons are likely to be manifold. The preprocessing of the in vitro data for the in silico training data may result in a cleaner relationship between a given compound and its in vivo PLD activity. The difference in results between the in silico and in vitro models could indicate that the in silico models are naturally better able to reach a conclusion from a single test occasion and that some of the compounds should be re-tested in the in vitro screen. Not all compounds underwent repeated testing in the in vitro screen due to compound availability issues, but for some of those whose first-occasion results differed from the in vivo data or in silico predictions we could see deviation in the in vitro experimental results on re-test, indicating that these compounds were near the limits of detection of the in vitro system. This has resulted in a modification to AstraZeneca’s internal in vitro PLD testing methodology, such that all in vitro PLD positive compounds must be re-tested to confirm activity.
Another aspect is the in vitro / in vivo correlation. Whilst the in vitro screen measures one defined endpoint, the rapid accumulation of phospholipids in a 2D rodent cell line, the process leading to PLD in an animal or human is more prolonged and complex, potentially involving multiple mechanisms such as (i) inhibition of lysosomal phospholipases or lysosomal enzyme transport systems, (ii) increased synthesis of phospholipids, and (iii) increased cholesterol synthesis, none of which are accurately represented in the in vitro system. Finally, there is always a concern about the quality, or rather the consistency, of the PLD annotation of the in vivo reference set. It should be stressed that PLD annotations for the compounds constituting the FDA reference set are the result of a keyword search across published literature and FDA archives. Compounds associated with electron microscopy related keywords (e.g. membranous lamellar inclusions) were considered phospholipogenic, whereas compounds lacking any PLD related keywords were considered clean.
This study advances our understanding of the translation of our current PLD model. Overall, the models presented give the greatest probability of identifying a potential phospholipogenic liability within the chemistry of a drug discovery project prior to confirmation in vivo, enabling projects to modify chemistry where possible or to best prepare for the additional regulatory scrutiny such a result will entail.
Acknowledgment
We thank Jianming Liu for providing a concentration-response curve image for the three example compounds Cloralgil, Chlorcyclizine and Fexofenadine.
References
1. Chatman, L. A.; Morton, D.; Johnson, T. O.; Anway, S. D., A strategy for risk management of drug-induced phospholipidosis. Toxicol Pathol 2009, 37 (7), 997-1005.
2. Reasor, M. J.; Hastings, K. L.; Ulrich, R. G., Drug-induced phospholipidosis: issues and future directions. Expert Opin Drug Saf 2006, 5 (4), 567-83.
3. Barone, L. R.; Boyer, S.; Damewood, J. R., Jr.; Fikes, J.; Ciaccio, P. J., Phospholipogenic pharmaceuticals are associated with a higher incidence of histological findings than nonphospholipogenic pharmaceuticals in preclinical toxicology studies. J Toxicol 2012, 2012, 308594.
4. Robison, R. L.; Visscher, G. E.; Roberts, S. A.; Engstrom, R. G.; Hartman, H. A.; Ballard, F. H., Generalized phospholipidosis induced by an amphiphilic cationic psychotropic drug. Toxicol Pathol 1985, 13 (4), 335-48.
5. Cartwright, M. E.; Petruska, J.; Arezzo, J.; Frank, D.; Litwak, M.; Morrissey, R. E.; MacDonald, J.; Davis, T. E., Phospholipidosis in neurons caused by posaconazole, without evidence for functional neurologic effects. Toxicol Pathol 2009, 37 (7), 902-10.
6. Halstead, B. W.; Zwickl, C. M.; Morgan, R. E.; Monteith, D. K.; Thomas, C. E.; Bowers, R. K.; Berridge, B. R., A clinical flow cytometric biomarker strategy: validation of peripheral leukocyte phospholipidosis using Nile red. J Appl Toxicol 2006, 26 (2), 169-77.
7. Liu, N.; Tengstrand, E. A.; Chourb, L.; Hsieh, F. Y., Di-22:6-bis(monoacylglycerol)phosphate: A clinical biomarker of drug-induced phospholipidosis for drug development and safety assessment. Toxicol Appl Pharmacol 2014, 279 (3), 467-76.
8. Orogo, A. M.; Choi, S. S.; Minnier, B. L.; Kruhlak, N. L., Construction and Consensus Performance of (Q)SAR Models for Predicting Phospholipidosis Using a Dataset of 743 Compounds. Mol Inform 2012, 31 (10), 725-39.
9. Lullmann, H.; Lullmann-Rauch, R.; Wassermann, O., Lipidosis induced by amphiphilic cationic drugs. Biochem Pharmacol 1978, 27 (8), 1103-8.
10. Ploemen, J. P.; Kelder, J.; Hafmans, T.; van de Sandt, H.; van Burgsteden, J. A.; Saleminki, P. J.; van Esch, E., Use of physicochemical calculation of pKa and CLogP to predict phospholipidosis-inducing potential: a case study with structurally related piperazines. Exp Toxicol Pathol 2004, 55 (5), 347-55.
11. Muehlbacher, M.; Tripal, P.; Roas, F.; Kornhuber, J., Identification of drugs inducing phospholipidosis by novel in vitro data. ChemMedChem 2012, 7 (11), 1925-34.
12. Przybylak, K. R.; Alzahrani, A. R.; Cronin, M. T., How does the quality of phospholipidosis data influence the predictivity of structural alerts? J Chem Inf Model 2014, 54 (8), 2224-32.
13. Pelletier, D. J.; Gehlhaar, D.; Tilloy-Ellul, A.; Johnson, T. O.; Greene, N., Evaluation of a published in silico model and construction of a novel Bayesian model for predicting phospholipidosis inducing potential. J Chem Inf Model 2007, 47 (3), 1196-205.
14. Choi, S. S.; Kim, J. S.; Valerio, L. G., Jr.; Sadrieh, N., In silico modeling to predict drug-induced phospholipidosis. Toxicol Appl Pharmacol 2013, 269 (2), 195-204.
15. Sun, H.; Shahane, S.; Xia, M.; Austin, C. P.; Huang, R., Structure based model for the prediction of phospholipidosis induction potential of small molecules. J Chem Inf Model 2012, 52 (7), 1798-805.
16. Bauch, C.; Bevan, S.; Woodhouse, H.; Dilworth, C.; Walker, P., Predicting in vivo phospholipidosis-inducing potential of drugs by a combined high content screening and in silico modelling approach. Toxicol In Vitro 2015, 29 (3), 621-30.
17. Morelli, J. K.; Buehrle, M.; Pognan, F.; Barone, L. R.; Fieles, W.; Ciaccio, P. J., Validation of an in vitro screen for phospholipidosis using a high-content biology platform. Cell Biol Toxicol 2006, 22 (1), 15-27.
18. Labute, P., A widely applicable set of descriptors. J Mol Graph Model 2000, 18 (4-5), 464-77.
19. Cortes, C.; Vapnik, V., Support-Vector Networks. Machine Learning 1995, 20 (3), 273-297.
20. Breiman, L., Random Forests. Machine Learning 2001, 45 (1), 5-32.
21. Stalring, J. C.; Carlsson, L. A.; Almeida, P.; Boyer, S., AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment. J Cheminform 2011, 3, 28.
Figure 1. Distribution of the 1,537 compounds tested in vitro and used for building the in silico model. The bar chart (a) sorts all compounds by the EC50 value. Dark columns denote PLD in vitro positive compounds, light columns denote PLD in vitro negative compounds. The scatter plot in logarithmic scale (b) plots the two readouts of the in vitro screen against each other. Hashed lines in both plots denote the corresponding cutoffs (15 µM EC50 and 10% top effect, respectively).
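The two cutoffs named in the Figure 1 caption can be read as a simple classification rule, sketched below. The caption gives only the cutoff values, so the direction of each comparison (EC50 at or below 15 µM, top effect at or above 10%), the handling of the boundary values, and the treatment of compounds without a fitted EC50 are assumptions made for illustration; the function name `in_vitro_call` is hypothetical.

```python
EC50_CUTOFF_UM = 15.0         # cutoff from the Figure 1 caption
TOP_EFFECT_CUTOFF_PCT = 10.0  # cutoff from the Figure 1 caption


def in_vitro_call(ec50_um, top_effect_pct):
    """Return 'positive' or 'negative' for one compound.

    Assumptions (not stated in the caption): a compound is positive only
    if BOTH readouts pass, cutoffs are inclusive, and a missing EC50
    (no curve could be fitted) is treated as negative.
    """
    if ec50_um is None:
        return "negative"
    potent_enough = ec50_um <= EC50_CUTOFF_UM
    effect_large_enough = top_effect_pct >= TOP_EFFECT_CUTOFF_PCT
    return "positive" if potent_enough and effect_large_enough else "negative"


# Hypothetical readouts, not measurements from the study.
calls = [
    in_vitro_call(5.0, 50.0),   # potent, clear effect
    in_vitro_call(30.0, 50.0),  # EC50 above the cutoff
    in_vitro_call(5.0, 5.0),    # top effect below the cutoff
    in_vitro_call(None, 0.0),   # no fitted EC50
]
```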
Figure 2. Processed images of cells showing NBD-PE after compound incubation with punctate PLD-like inclusions highlighted in blue. (a) Chlorcyclizine 6.25 µM, (b) Chlorcyclizine 12.5 µM, (c) Cloralgil 6.25 µM, (d) Cloralgil 12.5 µM, (e) Fexofenadine 7.5 µM and (f) Fexofenadine 15 µM. The blue dots in the image represent cells showing accumulated phospholipids, while the light dots are in cells that do not have phospholipid build-up.
Figure 3. Quantified phospholipogenic effects are given as percentage activity over background. Activity denotes normalised counts of build-up of punctate cytoplasmic staining from the introduced fluorescence-tagged phospholipid. Effects of three selected compounds on phospholipogenic activity are shown: Fexofenadine, Cloralgil and Chlorcyclizine. Circles, squares and triangles denote the mean of two separate test occasions.
Figure 4. Scatterplot depicting the first and second principal component of the PCA. Filled circles denote in vitro training set compounds, grey circles denote in vitro test set compounds and empty circles represent Orogo subset compounds.
Figure 5. The schemes demonstrating the decision tree based consensus models. Figure (a) combines the Ploemen rules with our own PLD in vitro screen readout and Figure (b) combines the Ploemen rules with an in silico model.
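As a reminder of the physicochemical rule that both consensus schemes build on, the sketch below encodes the Ploemen criterion in its commonly cited form (most basic pKa ≥ 8, ClogP ≥ 1 and pKa² + ClogP² ≥ 90; see reference 10). How this rule is combined with the in vitro readout or the in silico model is defined by the decision trees in Figure 5 and is not reproduced here. The function name `ploemen_positive` and the input values are hypothetical.

```python
def ploemen_positive(pka_basic, clogp):
    """Ploemen rule in its commonly cited form (assumed thresholds).

    pka_basic: pKa of the most basic centre, or None if the compound
               has no basic centre.
    clogp:     calculated logP.
    """
    if pka_basic is None:
        return False  # no basic centre, so the rule cannot fire
    return (
        pka_basic >= 8.0
        and clogp >= 1.0
        and pka_basic ** 2 + clogp ** 2 >= 90.0
    )


# Hypothetical property values, not taken from the study.
flag_a = ploemen_positive(9.0, 5.0)   # 81 + 25 = 106, all criteria met
flag_b = ploemen_positive(9.0, 2.0)   # 81 + 4 = 85, sum below 90
flag_c = ploemen_positive(None, 0.9)  # no basic centre
```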
Figure 6. Four example compounds of the Orogo subset with contradictory PLD predictions: The phospholipogenic agents Chlorcyclizine and Bromhexine and the clean drugs Fexofenadine and Sulfamethoxazole.