ADVERPred – web service for prediction of adverse effects of drugs

Application Note Cite This: J. Chem. Inf. Model. 2018, 58, 8−11

pubs.acs.org/jcim

ADVERPred−Web Service for Prediction of Adverse Effects of Drugs

Sergey M. Ivanov,*,†,‡ Alexey A. Lagunin,†,‡ Anastasia V. Rudik,† Dmitry A. Filimonov,† and Vladimir V. Poroikov†

† Department of Bioinformatics, Institute of Biomedical Chemistry, 119121, Pogodinskaya Street, 10, Moscow, Russia
‡ Medico-biological Faculty, Pirogov Russian National Research Medical University, 1179971, Ostrovityanova Street, Moscow, Russia




ABSTRACT: Application of structure−activity relationships (SARs) for the prediction of adverse effects of drugs (ADEs) has been reported in many published studies. Training sets for the creation of SAR models are usually based on drug label information, which allows for the generation of data sets for many hundreds of drugs. Since many ADEs may not be related to drug consumption, one of the main problems in such studies is the quality of data on drug−ADE pairs obtained from labels. The information on ADEs may be included in three sections of the drug labels: “Boxed warning,” “Warnings and Precautions,” and “Adverse reactions.” The first two sections, especially Boxed warning, usually contain the most frequent and severe ADEs that have either known or probable relationships to drug consumption. Using this information, we have created manually curated data sets for the five most frequent and severe ADEs: myocardial infarction, arrhythmia, cardiac failure, severe hepatotoxicity, and nephrotoxicity, with more than 850 drugs on average for each effect. The corresponding SARs were built with PASS (Prediction of Activity Spectra for Substances) software and had balanced accuracy values of 0.74, 0.7, 0.77, 0.67, and 0.75, respectively. They were implemented in the freely available ADVERPred web service (http://www.way2drug.com/adverpred/), which enables a user to predict five ADEs based on the structural formula of a compound. This web service can be applied for estimation of the corresponding ADEs for hits and lead compounds at the early stages of drug discovery.



INTRODUCTION

Adverse effects of drugs (ADEs) are one of the leading causes of death in developed countries,1,2 the second reason for stopping the development of new drugs at later stages of clinical trials, and the main reason for drug recalls from the market.3 This is due to disadvantages of animal toxicological experiments as well as clinical trials, which cannot detect all life-threatening ADEs owing to interspecies differences and the idiosyncratic nature of many undesirable effects; therefore, additional methods for ADE prediction are currently being developed.4 These methods are usually based on machine learning or analyses of biological networks and various properties of drug-like compounds: chemical descriptors, drug protein targets, and drug-induced gene expression profiles as well as the phenotypic properties of cells treated with drugs. Unlike other approaches, structure−activity relationships (SARs) require only information on structural formulas of drug candidates and, therefore, can be used for prediction of ADEs at the earliest stages of drug development.

Currently developed SAR models were built using various approaches, descriptors, and training sets,4,5 but they are mostly based on the ADE data from drug labels, which may contain ADEs that have no causal relationships to drug consumption.6 As a result, the published SAR models can yield wrong prediction results even in the case of a relatively high accuracy of prediction obtained by k-fold cross-validation procedures. This can be explained by the fact that many drugs in the training set, belonging to the same pharmacological classes and having similar structures, may have a link to an ADE. However, this link may be due to common cofactors, e.g., similar indication, similar comorbidities, comedications, etc., and unrelated to the drug itself. Thus, the ADE data may not look random and may provide a relatively high accuracy of prediction, but the corresponding SAR models cannot be used to solve practical tasks.

Since drug labels are the most available source of information on ADEs, it is necessary to distinguish causal drug−ADE relationships from noncausal ones. Typical drug labels have different sections with the description of ADEs: “Boxed warning” (BW), “Warnings and Precautions” (WP), and “Adverse reactions” (AR). The latter may contain many ADEs which are not related to drug consumption; on the contrary, the first two sections, especially BW, usually contain the most frequent and severe ADEs that have either known or probable relationships to drug consumption. Chen et al.6 used such information to classify drugs into three categories of hepatotoxicity: (1) “most concern” drugs which cause severe liver damage, e.g., acute liver failure or liver necrosis, described in the BW or WP sections of the label or withdrawn from the market owing to hepatotoxicity; (2) “less concern” drugs which

Received: September 20, 2017. Published: December 5, 2017. © 2017 American Chemical Society

DOI: 10.1021/acs.jcim.7b00568 J. Chem. Inf. Model. 2018, 58, 8−11

Application Note

Journal of Chemical Information and Modeling

those drugs which were withdrawn from the market because of those five ADEs and considered them as “actives”. All other drugs in the data sets, which did not meet the above-mentioned criteria, were considered as “inactives” (Figure 1). Additionally, we retrieved compounds from the CredibleMeds website,10 which contains the lists of drugs associated with QT interval prolongation and Torsades de Pointes, and included them as “actives” in the arrhythmia data set. To filter out false negatives, we excluded from the data sets the compounds with modes of administration other than oral and parenteral, since they may not provide a sufficient drug concentration in the blood, required for induction of the ADEs investigated. The data set for hepatotoxicity includes only oral drugs with average daily doses exceeding 10 mg, because liver damage usually appears only if the drugs are used orally in relatively high doses.11 The information about routes of administration and daily doses was taken from the ATC/DDD Index12 and drug labels. Finally, inorganic drugs, as well as the drug structures with fewer than three carbon atoms or molecular weight more than 1250 Da, were excluded from the data sets according to the solid drug-likeness criteria.13 Stereoisomers, different salts, and esters of the remaining drugs were merged, and the drug structures were reduced to neutral forms. The obtained data sets were used for the creation of the corresponding SAR models. These data sets can be accessed as sdf files via the website (http://www.way2drug.com/adverpred/definition.php) and the Supporting Information. The names of drugs, DrugBank IDs, and SMILES for all drugs are also presented in Table S1.

Creation of Structure−Activity Relationships. SARs were created for myocardial infarction, arrhythmia, cardiac failure, and hepato- and nephrotoxicity using PASS (Prediction of Activity Spectra for Substances) software.14−16 PASS is used to predict various types of biological activity including therapeutic effects, molecular mechanisms of action, interaction with enzymes of metabolism and transporters, and specific toxicities. It uses multilevel neighborhoods of atoms (MNA) descriptors and a Bayesian approach and is available as a desktop program as well as a freely available web service on the Way2Drug platform.17 The PASS Online web service has thousands of users and about two hundred independent success stories of its practical application with experimental confirmation of prediction results.16 PASS provides the user with two estimates of probabilities for each biological activity of a chemical compound: Pa, the probability to be active, and Pi, the probability to be inactive. If a compound has Pa > Pi, it can be considered as active. The larger the Pa and Pa−Pi values, the greater the probability of obtaining an activity in the experiment or clinical trials. A detailed description of PASS is presented in the Supporting Information.
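The decision rule described above (active when Pa > Pi, with larger Pa and Pa−Pi indicating more confidence) can be sketched in a few lines of Python. This is only an illustration of how a user might post-process prediction output; it is not part of PASS or ADVERPred. The `Prediction` class, the `rank_predicted_effects` helper, and the numeric Pa/Pi values are all hypothetical and were made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predicted effect with PASS-style probability estimates (hypothetical container)."""
    effect: str
    pa: float  # estimated probability to be active
    pi: float  # estimated probability to be inactive

    @property
    def is_active(self) -> bool:
        # Decision rule stated in the text: considered active when Pa > Pi.
        return self.pa > self.pi

    @property
    def margin(self) -> float:
        # Pa - Pi; a larger margin suggests a more confident "active" call.
        return self.pa - self.pi


def rank_predicted_effects(predictions):
    """Return the effects with Pa > Pi, largest Pa - Pi first (ties broken by Pa)."""
    active = [p for p in predictions if p.is_active]
    return sorted(active, key=lambda p: (p.margin, p.pa), reverse=True)


# Hypothetical Pa/Pi values, invented for illustration only.
example = [
    Prediction("myocardial infarction", pa=0.80, pi=0.05),
    Prediction("arrhythmia", pa=0.30, pi=0.28),
    Prediction("nephrotoxicity", pa=0.10, pi=0.40),
]
ranked = rank_predicted_effects(example)
print([p.effect for p in ranked])
```

In this sketch an effect whose Pa only slightly exceeds Pi still passes the threshold but is ranked last, which mirrors the advice to weigh both Pa and Pa−Pi rather than the threshold alone.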

cause less severe liver damage and/or are described in the AR section of the label; (3) “no concern” drugs which have no hepatotoxicity-related events in any section. This classification can be extended to other ADEs and used to generate high-quality training sets for the creation of SAR models.

In our study, we used the classification scheme mentioned above and PASS software (see Materials and Methods) to create the SAR models for cardio-, nephro-, and hepatotoxicity since they are the most frequent and severe ADEs, which often lead to death and require the withdrawal of the drug from the market. Since the SAR models created had a relatively high accuracy of prediction, we implemented them in a freely available web service ADVERPred (http://www.way2drug.com/adverpred/), which can be used for estimation of drug-induced myocardial infarction, arrhythmia, cardiac failure, and severe hepato- and nephrotoxicity based on the structural formula of a compound.



MATERIALS AND METHODS

Creation of Training Sets. We have generated data sets for five ADEs: myocardial infarction, arrhythmias, cardiac failure, severe hepatotoxicity, and severe nephrotoxicity using the following scheme (see Figure 1).

Figure 1. Workflow describing the classification of drugs into “actives” and “inactives” based on the information from the drug labels.



First, we retrieved drug−ADE pairs for these effects from SIDER 4.1,7,8 which accumulated corresponding information from drug labels. Each drug−ADE pair in SIDER had a link to original labels; therefore, we manually checked the sections where the ADE was mentioned: BW, WP, or AR. If myocardial infarction, arrhythmia, or cardiac failure was mentioned in the BW or WP sections of the label, we considered the corresponding drugs as “actives”. Hepato- and nephrotoxicity appear as pathologies of different degrees of severity, and they are associated with most of the drugs; therefore, we considered drugs as “actives” if they cause acute liver/kidney failure and these effects were described in the BW or WP sections of the label or on the LiverTox website.9 We also included in the data sets

RESULTS AND DISCUSSION

We have created the SAR models for myocardial infarction, arrhythmia, cardiac failure, hepatotoxicity, and nephrotoxicity using PASS software and manually curated data sets on these ADEs (see Materials and Methods). The area under the ROC curve, sensitivity, specificity, precision, and balanced accuracy were calculated for each SAR model of the corresponding ADE using a 5-fold cross-validation procedure (Table 1). Since the models created had a relatively high accuracy of prediction, we implemented them in a freely available web service ADVERPred (http://www.way2drug.com/adverpred/). To make a prediction, users should upload the 2D structural


interpretation of PASS results are presented in the Supporting Information and in the “Interpretation” menu of the ADVERPred web service.

Table 1. Number of Compounds in Data Sets and Accuracies of Prediction for Five ADEs (a)

adverse effect          comp   act   AUC    sens   spec   prec   BA
myocardial infarction   896    92    0.85   0.75   0.73   0.25   0.74
arrhythmia              911    177   0.77   0.71   0.70   0.37   0.71
cardiac failure         904    83    0.86   0.80   0.74   0.24   0.77
hepatotoxicity          684    181   0.71   0.69   0.66   0.42   0.67
nephrotoxicity          900    98    0.82   0.75   0.75   0.26   0.75



CONCLUSIONS

The present work describes the creation of manually curated training sets for the five most serious and frequent adverse effects of drugs: myocardial infarction, arrhythmia, cardiac failure, severe hepatotoxicity, and severe nephrotoxicity. Each active drug in these data sets is associated with severe and/or frequent adverse effects, and this relationship is causal. The usage of these data sets allows us to create classification models of structure−activity relationships which provide a correct prediction of five adverse effects of drugs without the bias associated with the presence of wrong drug−effect pairs in the training sets. These structure−activity relationships were implemented in the freely available web service ADVERPred (http://www.way2drug.com/adverpred/), which can be used for estimation of the ability of drug-like compounds to cause myocardial infarction, arrhythmia, cardiac failure, hepatotoxicity, and nephrotoxicity based on their structural formulas. In particular, ADVERPred can be applied for estimation of the corresponding adverse effects for hits and lead compounds at the early stages of drug discovery.

(a) All accuracy values were obtained by a 5-fold cross-validation procedure. Comp is the number of compounds in the data set; act is the number of active compounds in the data set; AUC is the area under the ROC curve; sens is sensitivity; spec is specificity; prec is precision; BA is balanced accuracy.
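The metrics in Table 1 are related to each other, and a short calculation makes that relationship explicit. The sketch below assumes the standard definitions: balanced accuracy as the mean of sensitivity and specificity, and precision as TP / (TP + FP). It also assumes that the cross-validated sensitivity and specificity can be applied to the whole-data-set counts to estimate the implied precision. Because the table values are rounded to two decimals, the recomputed numbers are approximations and can differ from the reported ones in the last digit. The function names are illustrative and are not part of ADVERPred or PASS.

```python
# Values copied from Table 1:
# (compounds, actives, sensitivity, specificity, reported precision, reported BA)
TABLE_1 = {
    "myocardial infarction": (896, 92, 0.75, 0.73, 0.25, 0.74),
    "arrhythmia": (911, 177, 0.71, 0.70, 0.37, 0.71),
    "cardiac failure": (904, 83, 0.80, 0.74, 0.24, 0.77),
    "hepatotoxicity": (684, 181, 0.69, 0.66, 0.42, 0.67),
    "nephrotoxicity": (900, 98, 0.75, 0.75, 0.26, 0.75),
}


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity (standard definition)."""
    return (sens + spec) / 2.0


def implied_precision(comp, act, sens, spec):
    """Approximate precision implied by sensitivity, specificity, and class sizes."""
    true_pos = sens * act                    # actives predicted as active
    false_pos = (1.0 - spec) * (comp - act)  # inactives predicted as active
    return true_pos / (true_pos + false_pos)


for effect, (comp, act, sens, spec, prec, ba) in TABLE_1.items():
    print(
        f"{effect:22s} "
        f"BA={balanced_accuracy(sens, spec):.3f} (reported {ba:.2f})  "
        f"precision={implied_precision(comp, act, sens, spec):.3f} (reported {prec:.2f})"
    )
```

The calculation also shows why precision is much lower than balanced accuracy here: actives make up only about 9 to 26% of each data set, so even a moderate false-positive rate among the many inactives outweighs the true positives.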

formula of a compound as a MOL file, insert it as a SMILES string, or draw it in the Marvin molecular editor and press the “Make prediction” button (Figure 2). The results of prediction can be downloaded as sdf, csv, or pdf files.

Figure 2 represents the prediction results for rofecoxib, which was withdrawn from the market because of its ability to cause myocardial infarction.18 Rofecoxib is also strongly associated with cardiac failure, although the risk of this effect is significantly higher in patients with preexisting heart diseases.19 Rofecoxib is not associated with arrhythmias and causes less severe hepato- and nephrotoxicity, which does not lead to liver or kidney failure; thus, rofecoxib is considered as “inactive” for these ADEs (see Materials and Methods). The prediction results obtained for rofecoxib using ADVERPred are in accordance with these data. Myocardial infarction and cardiac failure obtained relatively high Pa values, which are the estimates of probabilities to be active, and low Pi values, which are the estimates of probabilities to be inactive. Thus, it may be inferred that rofecoxib may cause these effects. Hepatotoxicity and arrhythmia were also predicted for rofecoxib with Pa > Pi; however, the Pa values are lower than for myocardial infarction and cardiac failure and approximately equal to the corresponding Pi values, which are relatively high in comparison with the Pi values for “myocardial infarction” and “cardiac failure.” Thus, we can conclude that rofecoxib is unlikely to cause hepatotoxicity or arrhythmia, and it is also unlikely to cause nephrotoxicity, which was not predicted according to the Pa > Pi threshold. The detailed guidelines for



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00568.

Description of the PASS approach and table with information on the five data sets used for the creation of the SAR models for adverse effects of drugs (PDF)

SDF files of the five data sets used for the creation of the SAR models for adverse effects of drugs (ZIP)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Sergey M. Ivanov: 0000-0002-3177-6237
Alexey A. Lagunin: 0000-0003-1757-8004
Anastasia V. Rudik: 0000-0002-8916-9675

Figure 2. ADVERPred web service with the prediction results for rofecoxib. Rofecoxib was excluded from all data sets, and the corresponding SAR models were rebuilt before prediction (Image reprinted from http://way2drug.com/adverpred/rofecoxib/. Copyright 2011 W2D Team).


Dmitry A. Filimonov: 0000-0002-0339-8478
Vladimir V. Poroikov: 0000-0001-7937-2621

Funding

All authors received funding for this work from Russian Science Foundation grant 14-15-00449.

Notes

The authors declare no competing financial interest.



ABBREVIATIONS

ADEs, adverse effects of drugs; AUC, area under the ROC curve; MNA, multilevel neighborhoods of atoms; PASS, prediction of activity spectra for substances; SAR, structure−activity relationship



REFERENCES

(1) Starfield, B. Is US Health Really the Best in the World? JAMA 2000, 284, 483−485.
(2) Xu, J.; Murphy, S. L.; Kochanek, K. D.; Bastian, B. A. Deaths: Final Data for 2013. Natl. Vital Stat. Rep. 2016, 64, 1−119.
(3) Hornberg, J. J.; Laursen, M.; Brenden, N.; Persson, M.; Thougaard, A. V.; Toft, D. B.; Mow, T. Exploratory Toxicology as an Integrated Part of Drug Discovery. Part I: Why and How. Drug Discovery Today 2014, 19, 1131−1136.
(4) Ivanov, S. M.; Lagunin, A. A.; Poroikov, V. V. In Silico Assessment of Adverse Drug Reactions and Associated Mechanisms. Drug Discovery Today 2016, 21, 58−71.
(5) Chen, M.; Bisgin, H.; Tong, L.; Hong, H.; Fang, H.; Borlak, J.; Tong, W. Toward Predictive Models for Drug-Induced Liver Injury in Humans: Are We There Yet? Biomarkers Med. 2014, 8, 201−213.
(6) Chen, M.; Vijay, V.; Shi, Q.; Liu, Z.; Fang, H.; Tong, W. FDA-Approved Drug Labeling for the Study of Drug-Induced Liver Injury. Drug Discovery Today 2011, 16, 697−703.
(7) Kuhn, M.; Letunic, I.; Jensen, L. J.; Bork, P. The SIDER Database of Drugs and Side Effects. Nucleic Acids Res. 2016, 44, D1075−D1079.
(8) SIDER 4.1: Side Effect Resource. http://sideeffects.embl.de/ (accessed June 1, 2017).
(9) LiverTox: Clinical and Research Information on Drug-Induced Liver Injury. https://livertox.nih.gov/ (accessed June 1, 2017).
(10) CredibleMeds. https://crediblemeds.org/ (accessed June 1, 2017).
(11) Chen, M.; Suzuki, A.; Borlak, J.; Andrade, R. J.; Lucena, M. I. Drug-Induced Liver Injury: Interactions Between Drug Properties and Host Factors. J. Hepatol. 2015, 63, 503−514.
(12) ATC/DDD Index 2017. https://www.whocc.no/atc_ddd_index/ (accessed June 1, 2017).
(13) Fourches, D.; Muratov, E.; Tropsha, A. Curation of Chemogenomics Data. Nat. Chem. Biol. 2015, 11, 535.
(14) Filimonov, D.; Poroikov, V.; Borodina, Yu.; Gloriozova, T. Chemical Similarity Assessment Through Multilevel Neighborhoods of Atoms: Definition and Comparison with the Other Descriptors. J. Chem. Inf. Comput. Sci. 1999, 39, 666−670.
(15) Filimonov, D. A.; Poroikov, V. V. In Chemoinformatics Approaches to Virtual Screening; Varnek, A.; Tropsha, A., Eds.; RSC Publishing: Cambridge, 2008; Chapter 6, pp 182−216.
(16) Filimonov, D. A.; Lagunin, A. A.; Gloriozova, T. A.; Rudik, A. V.; Druzhilovskii, D. S.; Pogodin, P. V.; Poroikov, V. V. Prediction of the Biological Activity Spectra of Organic Compounds Using the Pass Online Web Resource. Chem. Heterocycl. Compd. 2014, 50, 444−457.
(17) PASS Online. http://www.way2drug.com/PASSOnline/ (accessed June 1, 2017).
(18) Sibbald, B. Rofecoxib (Vioxx) Voluntarily Withdrawn from Market. CMAJ 2004, 171, 1027−1028.
(19) Maxwell, C. B.; Jenkins, A. T. Drug-Induced Heart Failure. Am. J. Health-Syst. Pharm. 2011, 68, 1791−1804.
