Application Note
ADVERPred – web service for prediction of adverse effects of drugs
Sergey M. Ivanov, Alexey A. Lagunin, Anastasia V. Rudik, Dmitry A. Filimonov, and Vladimir V. Poroikov
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00568 • Publication Date (Web): 05 Dec 2017
Journal of Chemical Information and Modeling is published by the American Chemical Society.
Journal of Chemical Information and Modeling
ADVERPred – web service for prediction of adverse effects of drugs
Sergey M. Ivanov*1,2, Alexey A. Lagunin1,2, Anastasia V. Rudik1, Dmitry A. Filimonov1, Vladimir V. Poroikov1
1 Department of Bioinformatics, Institute of Biomedical Chemistry, 119121, Pogodinskaya Street, 10, Moscow, Russia
2 Medico-biological Faculty, Pirogov Russian National Research Medical University, 1179971, Ostrovityanova Street, Moscow, Russia
Abstract
Application of structure-activity relationships (SARs) to the prediction of adverse effects of drugs (ADEs) has been reported in many published studies. Training sets for building SAR models are usually based on drug label information, which makes it possible to generate datasets covering many hundreds of drugs. Since many ADEs listed in labels may not be causally related to drug consumption, one of the main problems in such studies is the quality of the drug-ADE pairs obtained from labels. Information on ADEs may be included in three sections of a drug label: “Boxed warning,” “Warnings and Precautions” and “Adverse reactions.” The first two sections, especially “Boxed warning,” usually contain the most frequent and severe ADEs that have either known or probable relationships to drug consumption. Using this information, we have created
manually curated datasets for the five most frequent and severe ADEs: myocardial infarction, arrhythmia, cardiac failure, severe hepatotoxicity and nephrotoxicity, with more than 850 drugs on average for each effect. The corresponding SARs were built with the PASS (Prediction of Activity Spectra for Substances) software and had balanced accuracy values of 0.74, 0.7, 0.77, 0.67 and 0.75, respectively. They were implemented in the freely available ADVERPred web service (http://www.way2drug.com/adverpred/), which enables users to predict these five ADEs from the structural formula of a compound. This web service can be applied to estimate the corresponding ADEs for hits and lead compounds at the early stages of drug discovery.
Introduction
Adverse effects of drugs (ADEs) are one of the leading causes of death in developed countries,1,2 the second most common reason for terminating the development of new drugs in late-stage clinical trials, and the main reason for drug withdrawals from the market.3 This is because animal toxicology studies and clinical trials cannot detect all life-threatening ADEs, owing to interspecies differences and the idiosyncratic nature of many adverse effects; therefore, additional methods for ADE prediction are being developed.4 These methods usually rely on machine learning or biological network analysis applied to various properties of drug-like compounds: chemical descriptors, protein targets, drug-induced gene expression profiles, and phenotypic properties of drug-treated cells. Unlike other approaches, structure-activity relationships (SARs) require only the structural formulas of drug candidates and can therefore be used to predict ADEs at the earliest stages of drug development. Existing SAR models have been built using various approaches, descriptors and training sets,4,5 but most of them rely on ADE data from drug labels, which may include ADEs with no causal relationship to drug consumption.6 As a result, the SAR
models published so far may yield incorrect predictions even when their accuracy, estimated by k-fold cross-validation, is relatively high. This can be explained by the fact that many drugs in a training set that belong to the same pharmacological classes and have similar structures may be linked to an ADE, yet this link may be due to common confounding factors (e.g., similar indications, co-morbidities or co-medications) rather than to the drug itself. Thus, such ADE data may not look random and may yield relatively high prediction accuracy, but the corresponding SAR models cannot be used to solve practical tasks. Since drug labels are the most readily available source of information on ADEs, it is necessary to distinguish causal drug-ADE relationships from non-causal ones. Typical drug labels describe ADEs in different sections: “Boxed warning” (BW), “Warnings and Precautions” (WP) and “Adverse reactions” (AR). The AR section may contain many ADEs that are not related to drug consumption, whereas the first two sections, especially BW, usually contain the most frequent and severe ADEs that have either known or probable relationships to drug consumption. Chen et al.6 used such information to classify drugs into three categories of hepatotoxicity: (1) “most concern” drugs, which cause severe liver damage (e.g., acute liver failure or liver necrosis) described in the BW or WP section of the label, or which were withdrawn from the market owing to hepatotoxicity; (2) “less concern” drugs, which cause less severe liver damage and/or whose liver damage is described in the AR section of the label; (3) “no concern” drugs, which have no hepatotoxicity-related events in any section of the label. This classification can be extended to other ADEs and used to generate high-quality training sets for the creation of SAR models.
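The three-category scheme of Chen et al. can be written as a small decision function. The sketch below only illustrates the rules as stated above and is not the curation code used by Chen et al. or by us; the record type HepatotoxicityRecord and its fields (sections, severe, withdrawn) are hypothetical stand-ins for information that was extracted from labels manually.

```python
from dataclasses import dataclass, field


@dataclass
class HepatotoxicityRecord:
    """Hypothetical summary of what a drug label says about liver injury."""
    drug: str
    # Label sections mentioning any hepatotoxicity-related event: "BW", "WP", "AR".
    sections: set = field(default_factory=set)
    # True if the described damage is severe (e.g. acute liver failure, necrosis).
    severe: bool = False
    # True if the drug was withdrawn from the market because of hepatotoxicity.
    withdrawn: bool = False


def dili_concern(rec: HepatotoxicityRecord) -> str:
    """Assign the "most", "less" or "no concern" category of Chen et al."""
    if rec.withdrawn:
        return "most concern"
    # Severe damage described in a Boxed warning or Warnings and Precautions section.
    if rec.severe and rec.sections & {"BW", "WP"}:
        return "most concern"
    if rec.sections:
        # Less severe damage, or hepatotoxicity mentioned only in the AR section.
        return "less concern"
    return "no concern"


print(dili_concern(HepatotoxicityRecord("drug A", {"AR"}, severe=True)))
```

Encoding the categories as explicit rules makes it easy to reuse the same logic for other ADEs, which is how the scheme is extended in this work.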
In this study, we used the classification scheme described above and PASS software (see Materials and Methods) to create SAR models for cardio-, nephro- and hepatotoxicity, since these are the most frequent and severe ADEs, often leading to death and requiring the
withdrawal of drugs from the market. Since the resulting SAR models had relatively high prediction accuracy, we implemented them in the freely available web service ADVERPred (http://www.way2drug.com/adverpred/), which can be used to estimate drug-induced myocardial infarction, arrhythmia, cardiac failure, severe hepato- and nephrotoxicity from the structural formula of a compound.
MATERIALS AND METHODS
Creation of training sets
We generated datasets for five ADEs (myocardial infarction, arrhythmia, cardiac failure, severe hepatotoxicity and severe nephrotoxicity) using the scheme shown in Figure 1.
Figure 1. Workflow describing the classification of drugs into “actives” and “inactives” based on the information from the drug labels.
First, we retrieved drug-ADE pairs for these effects from SIDER 4.1,7,8 which compiles this information from drug labels. Each drug-ADE pair in SIDER is linked to the original labels, so we manually checked the section in which the ADE was mentioned: BW, WP or AR. If myocardial infarction, arrhythmia or cardiac failure was mentioned in the BW or WP section of the label, we considered the corresponding drug “active”. Hepato- and nephrotoxicity manifest as pathologies of varying severity and are associated with most drugs; therefore, we considered drugs “active” only if they cause acute liver or kidney failure and this effect is described in the BW or WP section of the label or on the LiverTox website.9 We also included drugs withdrawn from the market because of any of these five ADEs and considered them “active”. All other drugs in the datasets that did not meet these criteria were considered “inactive” (Figure 1). Additionally, we retrieved compounds from the CredibleMeds website,10 which lists drugs associated with QT interval prolongation and torsades de pointes, and included them as “actives” in the arrhythmia dataset. To filter out false negatives, we excluded compounds with routes of administration other than oral and parenteral, since such routes may not provide the blood concentration required to induce the ADEs investigated. The hepatotoxicity dataset includes only oral drugs with average daily doses exceeding 10 mg, because liver damage usually occurs only when drugs are taken orally at relatively high doses.11 Information on routes of administration and daily doses was taken from the ATC/DDD Index12 and drug labels.
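The labelling rules and exclusion filters above can be summarised in code. This is a hedged sketch, not the procedure actually used (the curation was manual): the DrugRecord type and all of its fields are hypothetical stand-ins for the curated facts (label sections from SIDER-linked labels, LiverTox, withdrawal status, CredibleMeds membership, route and average daily dose), and the handling of drugs with several routes is an assumption noted in the code.

```python
from dataclasses import dataclass, field
from typing import Optional

CARDIAC = {"myocardial infarction", "arrhythmia", "cardiac failure"}
ORGAN_FAILURE = {"hepatotoxicity", "nephrotoxicity"}


@dataclass
class DrugRecord:
    """Hypothetical summary of the curated facts about one drug."""
    name: str
    routes: set                       # e.g. {"oral"}, {"parenteral"}, {"topical"}
    ddd_mg: Optional[float] = None    # average daily dose in mg (ATC/DDD Index, labels)
    # ADE -> label sections ("BW", "WP", "AR") in which the ADE is mentioned
    label_sections: dict = field(default_factory=dict)
    # ADE -> sources ("BW", "WP", "LiverTox") describing acute liver/kidney failure
    failure_sources: dict = field(default_factory=dict)
    withdrawn_for: set = field(default_factory=set)   # ADEs that caused market withdrawal
    crediblemeds: bool = False        # listed by CredibleMeds (QT prolongation / TdP)


def label_drug(drug: DrugRecord, ade: str) -> Optional[str]:
    """Return "active", "inactive", or None when the drug is excluded from the dataset."""
    # Assumption: a drug is kept if at least one of its routes is oral or parenteral.
    if not drug.routes & {"oral", "parenteral"}:
        return None
    # Hepatotoxicity dataset: oral drugs with an average daily dose above 10 mg only.
    if ade == "hepatotoxicity" and (
        "oral" not in drug.routes or drug.ddd_mg is None or drug.ddd_mg <= 10
    ):
        return None
    if ade in drug.withdrawn_for:
        return "active"
    # Cardiac ADEs: active if mentioned in the BW or WP section.
    if ade in CARDIAC and drug.label_sections.get(ade, set()) & {"BW", "WP"}:
        return "active"
    if ade == "arrhythmia" and drug.crediblemeds:
        return "active"
    # Hepato-/nephrotoxicity: active only for acute organ failure in BW/WP or LiverTox.
    if ade in ORGAN_FAILURE and drug.failure_sources.get(ade, set()) & {"BW", "WP", "LiverTox"}:
        return "active"
    return "inactive"
```

Returning None for excluded drugs keeps the two kinds of filtering distinct: exclusion removes a compound from the dataset entirely, while "inactive" keeps it as a negative example.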
Finally, inorganic drugs and structures with fewer than three carbon atoms or a molecular weight above 1250 Da were excluded from the datasets according to established drug-likeness criteria.13 Stereoisomers, different salts, and esters of
remaining drugs were merged, and the drug structures were reduced to neutral forms. The resulting datasets were used to create the corresponding SAR models. They are available as SDF files on the website (http://www.way2drug.com/adverpred/definition.php) and in the Supporting Information; drug names, DrugBank IDs and SMILES for all drugs are also listed in Table S1.
Creation of structure-activity relationships
Structure-activity relationships (SARs) were created for myocardial infarction, arrhythmia, cardiac failure, hepato- and nephrotoxicity using PASS (Prediction of Activity Spectra for Substances) software.14-16 PASS predicts various types of biological activity, including therapeutic effects, molecular mechanisms of action, interactions with metabolic enzymes and transporters, and specific toxicities. It uses Multilevel Neighborhoods of Atoms (MNA) descriptors and a Bayesian approach, and it is available both as a desktop program and as a freely available web service on the Way2Drug platform.17 The PASS Online web service has thousands of users, and about two hundred independent studies report its successful practical application with experimental confirmation of the predictions.16 For each biological activity of a compound, PASS provides two probability estimates: Pa, the probability of being active, and Pi, the probability of being inactive. A compound with Pa > Pi can be considered active. The larger the Pa and Pa - Pi values, the greater the probability of observing the activity in experiments or clinical trials. A detailed description of PASS is given in the Supporting Information.
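The structure-level exclusion criteria used when preparing the training sets (no inorganic compounds, at least three carbon atoms, molecular weight of at most 1250 Da) are simple to reproduce. The sketch below is an illustration under stated assumptions: it works from flat Hill-notation molecular formulas without parentheses, uses a deliberately incomplete table of standard atomic weights, and treats "contains no carbon" as "inorganic". A real pipeline would compute these properties from the structure with a cheminformatics toolkit.

```python
import re

# Standard atomic weights (g/mol) for elements common in drugs; deliberately incomplete.
ATOMIC_WEIGHTS = {
    "H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "Na": 22.990, "Mg": 24.305, "P": 30.974, "S": 32.06, "Cl": 35.45,
    "K": 39.098, "Ca": 40.078, "Fe": 55.845, "Br": 79.904, "I": 126.904,
}

# One element symbol followed by an optional count, e.g. "C17", "S", "Cl2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict:
    """Count atoms in a flat molecular formula such as 'C17H14O4S'."""
    counts = {}
    for element, number in TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts: dict) -> float:
    """Sum of atomic weights; raises KeyError for elements missing from the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())


def passes_druglikeness(formula: str, min_carbons: int = 3, max_mw: float = 1250.0) -> bool:
    """Keep structures with at least min_carbons carbons and MW <= max_mw Da."""
    counts = parse_formula(formula)
    # A structure without carbon is treated as inorganic (simplification).
    if counts.get("C", 0) < min_carbons:
        return False
    return molecular_weight(counts) <= max_mw


print(passes_druglikeness("C17H14O4S"))  # rofecoxib
```

For rofecoxib (C17H14O4S) the computed molecular weight is about 314.36 Da, so it passes both criteria.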
RESULTS AND DISCUSSION
We created SAR models for myocardial infarction, arrhythmia, cardiac failure, hepatotoxicity and nephrotoxicity using PASS software and manually curated datasets on these
ADEs (see Materials and Methods). The area under the ROC curve (AUC), sensitivity, specificity, precision and balanced accuracy were calculated for each SAR model using 5-fold cross-validation (Table 1).
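The text does not state how compounds were assigned to the five folds. For imbalanced datasets such as these (roughly 9% to 27% actives, see Table 1), a common choice is stratified assignment, in which each fold keeps about the same active/inactive ratio. The sketch below shows that idea with the standard library only; the function stratified_folds is hypothetical and is not the PASS implementation.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Return a fold index (0..k-1) for each item, balancing classes across folds."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin over the k folds.
        for pos, i in enumerate(members):
            folds[i] = pos % k
    return folds


# Class sizes of the myocardial infarction dataset (92 actives, 804 inactives).
labels = [1] * 92 + [0] * 804
folds = stratified_folds(labels)
print([sum(1 for f, y in zip(folds, labels) if f == j and y == 1) for j in range(5)])
```

Each model is then trained on four folds and evaluated on the held-out fold, and the predictions for all held-out compounds are pooled to compute the metrics.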
Table 1. The number of compounds in datasets and accuracies of prediction for five ADEs. All accuracy values were obtained by 5-fold cross-validation procedure.

Adverse effect           Comp.  Act.  AUC   Sens.  Spec.  Prec.  B. Acc.
Myocardial infarction    896    92    0.85  0.75   0.73   0.25   0.74
Arrhythmia               911    177   0.77  0.71   0.70   0.37   0.71
Cardiac failure          904    83    0.86  0.80   0.74   0.24   0.77
Hepatotoxicity           684    181   0.71  0.69   0.66   0.42   0.67
Nephrotoxicity           900    98    0.82  0.75   0.75   0.26   0.75
Comp. is the number of compounds in the dataset; Act. is the number of active compounds in the dataset; AUC is the area under the ROC curve; Sens. is sensitivity; Spec. is specificity; Prec. is precision; B. Acc. is balanced accuracy.
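The threshold-based metrics in Table 1 follow their standard confusion-matrix definitions, and AUC can be computed as the probability that a randomly chosen active scores higher than a randomly chosen inactive. The sketch below also explains why precision is much lower than sensitivity and specificity: with only about 10% actives in most datasets, even a modest false-positive rate among the many inactives produces many false positives. Plugging the rounded sensitivity, specificity and class sizes from Table 1 into expected_precision gives values within about 0.01 of the reported precision. The function names are illustrative only.

```python
def threshold_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, precision and balanced accuracy from a confusion matrix."""
    sens = tp / (tp + fn)          # fraction of actives predicted active
    spec = tn / (tn + fp)          # fraction of inactives predicted inactive
    prec = tp / (tp + fp)          # fraction of predicted actives that are active
    return {"sens": sens, "spec": spec, "prec": prec, "bacc": (sens + spec) / 2}


def expected_precision(sens, spec, n_active, n_total):
    """Precision implied by sensitivity, specificity and the class sizes."""
    tp = sens * n_active
    fp = (1 - spec) * (n_total - n_active)
    return tp / (tp + fp)


def roc_auc(scores_active, scores_inactive):
    """AUC as the probability that a random active outscores a random inactive (ties = 0.5)."""
    wins = 0.0
    for a in scores_active:
        for b in scores_inactive:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(scores_active) * len(scores_inactive))


# Myocardial infarction row of Table 1: 92 actives among 896 compounds.
print(round(expected_precision(0.75, 0.73, 92, 896), 3))
```

For the myocardial infarction dataset this gives about 0.24, compared with the reported 0.25; the small gap comes from the rounding of sensitivity and specificity.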
Since the models had relatively high prediction accuracy, we implemented them in the freely available web service ADVERPred (http://www.way2drug.com/adverpred/). To make a prediction, the user uploads the 2D structural formula of a compound as a MOL file, enters it as a SMILES string or draws it in Marvin Molecular Editor, and then presses the “Make prediction” button (Figure 2). The prediction results can be downloaded as an SDF, CSV or PDF file.
Figure 2. ADVERPred web service with the prediction results for rofecoxib. Rofecoxib was excluded from all datasets, and the corresponding SAR models were rebuilt before prediction.
Figure 2 shows the prediction results for rofecoxib, which was withdrawn from the market because of its ability to cause myocardial infarction.18 Rofecoxib is also strongly associated with cardiac failure, although the risk of this effect is significantly higher in patients with pre-existing heart disease.19 Rofecoxib is not associated with arrhythmias, and the hepato- and nephrotoxicity it causes are less severe and do not lead to liver or kidney failure; thus, rofecoxib is considered “inactive” for these ADEs (see Materials and Methods). The prediction results obtained for rofecoxib with ADVERPred are consistent with these data. Myocardial infarction and cardiac failure have relatively high Pa values (estimated probabilities of being active) and low Pi values (estimated probabilities of being inactive), suggesting that rofecoxib may cause these effects. Hepatotoxicity and arrhythmia were also predicted for rofecoxib with Pa > Pi; however, their Pa values are lower than those for myocardial infarction and cardiac failure and approximately equal to the corresponding Pi values, which are relatively high compared with the Pi values for
“myocardial infarction” and “cardiac failure.” Thus, we conclude that rofecoxib is unlikely to cause hepatotoxicity, arrhythmia or nephrotoxicity; nephrotoxicity was not predicted at all under the Pa > Pi threshold. Detailed guidelines for interpreting PASS results are given in the Supporting Information and in the “Interpretation” menu of the ADVERPred web service.
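The Pa > Pi rule can be applied mechanically to a set of predictions. The sketch below keeps the effects with Pa > Pi and ranks them by Pa - Pi, following the statement that larger Pa and Pa - Pi values indicate a higher chance of confirmation. The function predicted_effects is hypothetical, and the Pa/Pi numbers are invented placeholders that mimic the qualitative pattern described for rofecoxib; they are not the values shown in Figure 2.

```python
def predicted_effects(predictions):
    """Return (effect, Pa, Pi) tuples with Pa > Pi, ranked by Pa - Pi (largest first)."""
    active = [(effect, pa, pi) for effect, (pa, pi) in predictions.items() if pa > pi]
    return sorted(active, key=lambda t: t[1] - t[2], reverse=True)


# Hypothetical Pa/Pi values for one compound (placeholders, not ADVERPred output).
example = {
    "myocardial infarction": (0.60, 0.05),
    "cardiac failure": (0.55, 0.06),
    "hepatotoxicity": (0.30, 0.25),
    "arrhythmia": (0.28, 0.24),
    "nephrotoxicity": (0.20, 0.35),
}

for effect, pa, pi in predicted_effects(example):
    print(f"{effect}: Pa={pa:.2f}, Pi={pi:.2f}, Pa-Pi={pa - pi:.2f}")
```

Effects that pass the threshold with a small Pa - Pi margin, like hepatotoxicity and arrhythmia here, end up at the bottom of the ranking. This mirrors how such borderline predictions are discounted in the rofecoxib example.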
CONCLUSIONS
The present work describes the creation of manually curated training sets for the five most serious and frequent adverse effects of drugs: myocardial infarction, arrhythmia, cardiac failure, severe hepatotoxicity and severe nephrotoxicity. Each active drug in these datasets is associated with a severe and/or frequent adverse effect, and this relationship is causal. Using these datasets, we created classification structure-activity relationship models that predict these five adverse effects without the bias caused by incorrect drug-effect pairs in the training sets. These structure-activity relationships were implemented in the freely available web service ADVERPred (http://www.way2drug.com/adverpred/), which can be used to estimate the ability of drug-like compounds to cause myocardial infarction, arrhythmia, cardiac failure, hepatotoxicity and nephrotoxicity from their structural formulas. In particular, ADVERPred can be applied to estimate the corresponding adverse effects of hits and lead compounds at the early stages of drug discovery.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information contains a description of the PASS approach, as well as a table and SDF files with information on the five datasets used to create the structure-activity relationship models for adverse effects of drugs.
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]
Funding Sources
All authors received funding for this work from Russian Science Foundation grant 14-15-00449.
ABBREVIATIONS
ADEs, Adverse Effects of Drugs; AUC, Area Under the ROC Curve; MNA, Multilevel Neighborhoods of Atoms; PASS, Prediction of Activity Spectra for Substances; SAR, Structure-Activity Relationship.
REFERENCES
(1) Starfield, B. Is US Health Really the Best in the World? JAMA, 2000, 284, 483-485.
(2) Xu, J.; Murphy, S.L.; Kochanek, K.D.; Bastian, B.A. Deaths: Final Data for 2013. Natl. Vital Stat. Rep., 2016, 64, 100-119.
(3) Hornberg, J.J.; Laursen, M.; Brenden, N.; Persson, M.; Thougaard, A.V.; Toft, D.B.; Mow, T. Exploratory Toxicology as an Integrated Part of Drug Discovery. Part I: Why and How. Drug Discov. Today, 2014, 19, 1131-1136.
(4) Ivanov, S.M.; Lagunin, A.A.; Poroikov, V.V. In Silico Assessment of Adverse Drug Reactions and Associated Mechanisms. Drug Discov. Today, 2016, 21, 58-71.
(5) Chen, M.; Bisgin, H.; Tong, L.; Hong, H.; Fang, H.; Borlak, J.; Tong, W. Toward Predictive Models for Drug-Induced Liver Injury in Humans: Are We There Yet? Biomark. Med., 2014, 8, 201-213.
(6) Chen, M.; Vijay, V.; Shi, Q.; Liu, Z.; Fang, H.; Tong, W. FDA-Approved Drug Labeling for the Study of Drug-Induced Liver Injury. Drug Discov. Today, 2011, 16, 697-703.
(7) Kuhn, M.; Letunic, I.; Jensen, L.J.; Bork, P. The SIDER Database of Drugs and Side Effects. Nucleic Acids Res., 2016, 44, D1075-D1079.
(8) SIDER 4.1: Side Effect Resource, http://sideeffects.embl.de/ (accessed June 1, 2017).
(9) LiverTox: Clinical and Research Information on Drug-Induced Liver Injury, https://livertox.nih.gov/ (accessed June 1, 2017).
(10) CredibleMeds, https://crediblemeds.org/ (accessed June 1, 2017).
(11) Chen, M.; Suzuki, A.; Borlak, J.; Andrade, R.J.; Lucena, M.I. Drug-Induced Liver Injury: Interactions Between Drug Properties and Host Factors. J. Hepatol., 2015, 63, 503-514.
(12) ATC/DDD Index 2017, https://www.whocc.no/atc_ddd_index/ (accessed June 1, 2017).
(13) Fourches, D.; Muratov, E.; Tropsha, A. Curation of Chemogenomics Data. Nat. Chem. Biol., 2015, 11, 535.
(14) Filimonov, D.; Poroikov, V.; Borodina, Yu.; Gloriozova, T. Chemical Similarity Assessment Through Multilevel Neighborhoods of Atoms: Definition and Comparison with the Other Descriptors. J. Chem. Inf. Comput. Sci., 1999, 39, 666-670.
(15) Filimonov, D.A.; Poroikov, V.V. In Chemoinformatics Approaches to Virtual Screening; Varnek, A.; Tropsha, A., Eds.; RSC Publishing: Cambridge, 2008; Chapter 6, pp 182-216.
(16) Filimonov, D.A.; Lagunin, A.A.; Gloriozova, T.A.; Rudik, A.V.; Druzhilovskii, D.S.; Pogodin, P.V.; Poroikov, V.V. Prediction of the Biological Activity Spectra of Organic Compounds Using the PASS Online Web Resource. Chem. Heterocycl. Compd., 2014, 50, 444-457.
(17) PASS Online, http://www.way2drug.com/PASSOnline/ (accessed June 1, 2017).
(18) Sibbald, B. Rofecoxib (Vioxx) Voluntarily Withdrawn from Market. CMAJ, 2004, 171, 1027-1028.
(19) Maxwell, C.B.; Jenkins, A.T. Drug-Induced Heart Failure. Am. J. Health Syst. Pharm., 2011, 68, 1791-1804.