
Article

QSAR Modeling and Prediction of Drug-Drug Interactions
Alexey V. Zakharov, Ekaterina V. Varlamova, Alexey A. Lagunin, Alexander V. Dmitriev, Eugene N. Muratov, Denis Fourches, Victor E. Kuz'min, Vladimir V. Poroikov, Alexander Tropsha, and Marc C. Nicklaus
Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.5b00762 • Publication Date (Web): 15 Dec 2015
Downloaded from http://pubs.acs.org on December 16, 2015


Molecular Pharmaceutics is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Molecular Pharmaceutics

QSAR Modeling and Prediction of Drug-Drug Interactions

Alexey V. Zakharov1, Ekaterina V. Varlamova2,3, Alexey A. Lagunin4,5, Alexander V. Dmitriev4, Eugene N. Muratov6, Denis Fourches7, Victor E. Kuz’min2, Vladimir V. Poroikov4, Alexander Tropsha6, Marc C. Nicklaus1*

1 Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles St., Frederick, MD 21702, USA.

2 Department of Molecular Structure and Cheminformatics, A.V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine; Lustdorfskaya Doroga 86, Odessa 65080, Ukraine.

3 Chemical-Technological Department, Odessa National Polytechnic University, 1 Shevchenko Ave, Odessa, 65000, Ukraine.

4 Institute of Biomedical Chemistry, 10/8, Pogodinskaya street, 119121, Moscow, Russia.

5 Medico-Biological Department, Pirogov Russian National Research Medical University, Ostrovitianov str. 1, Moscow, 117997, Russia.

6 Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Beard Hall 301, CB#7568, Chapel Hill, NC, 27599, USA.

7 Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA.

KEYWORDS: drug-drug interactions, QSAR modeling, simplex descriptors, GUSAR, QNA, DDI, toxicity, adverse drug reactions, mixtures


ABSTRACT

Severe adverse drug reactions (ADRs) are the fourth leading cause of death in the US, with more than 100,000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug-drug interactions (DDIs), typically mediated by cytochrome P450s, methods to predict DDIs from existing knowledge are important. We collected data from public sources on 1,485, 2,628, 4,371, and 27,966 possible DDIs mediated by four cytochrome P450 isoforms, 1A2, 2C9, 2D6, and 3A4, for 55, 73, 94, and 237 drugs, respectively. For each of these datasets we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: Quantitative Neighborhoods of Atoms (QNA) and Simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and Random Forest (RF) were utilized to build QSAR models forecasting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72%-79% for the external test sets, with coverage of 81.36%-100% when the conservative threshold for the model applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4,500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database.

INTRODUCTION

The average American takes three drugs per day.1 These multiple drug regimens may cause undesired effects known as drug-drug interactions (DDIs). DDIs typically lead to a reduction of the efficacy of one or both drugs, or to an enhancement of the drug effect(s) resulting in additional and/or stronger adverse effects. DDI effects can be physical (e.g., change of the pH


caused by the absorption of compounds such as ketoconazole and glipizide), chemical (e.g., ciprofloxacin is a chelator of cations such as aluminum, magnesium and iron), or biological, via complex interactions with various proteins. The last type of interaction is the most common one and therefore the most compelling2 for both the scientific and medical communities to address. The frequency of adverse drug interactions ranges between 3 and 20% for patients taking 2 to 10 drugs simultaneously.3 Serious adverse drug reactions are estimated to be the fourth leading cause of death in the United States, resulting in 100,000 deaths annually.4 As DDIs cause up to one third of all adverse drug reactions,5 it is of great importance to prevent DDIs when administering drugs to patients. Several mechanisms underlying DDIs have been described in humans, but overall they can be divided into two major categories: pharmacokinetic and pharmacodynamic ones.

Pharmacokinetic DDIs include cases where one drug affects the absorption, distribution, metabolism and excretion of another drug. Pharmacodynamic DDIs include synergistic, additive or antagonistic pharmacological effects of drugs taken together. The most frequent case of DDIs is when the co-administered drugs are substrates, inducers, or inhibitors of the same metabolizing enzyme(s), potentially altering the expected rate of metabolism of one or both compounds.6 It is known that most drugs are metabolized by cytochrome P450 (CYP) isoenzymes.7 Therefore, the majority of DDIs are also related to the inhibition of cytochrome P450s as evidenced by the data from "The Top 100 Drug Interactions 2010: A Guide to Patient Management"8 and "The Top 100 Drug Interactions 2014: A Guide to Patient Management".9 More than 50% of all DDIs described in these books were mediated by different isoforms of cytochrome P450. For instance, well-known tyrosine kinase inhibitors used for cancer treatment, such as dasatinib, erlotinib, and gefitinib, are substrates of the 3A4 isoform of cytochrome P450


(CYP3A4). If these drugs are co-administered with antimicrobials that are also CYP3A4 inhibitors, e.g., clarithromycin, erythromycin, or fluconazole, these antimicrobials may increase the plasma level of the tyrosine kinase inhibitors, leading to toxic reactions including skin rashes, anemia, hemorrhage, and gastrointestinal symptoms.8 Analysis of possible DDIs is complicated10 by the fact that a combination in which one drug is a cytochrome inhibitor and the second drug is a substrate of the same cytochrome does not always lead to clinical manifestations of drug-drug interactions. Often, these combinations of drugs are quite safe.

To address this challenge of DDI evaluation, several computational approaches have been proposed.11 They can be formally divided into network-based methods and structure-activity relationship modeling. The network-based methods entail the analysis of common targets and pathways affected by drugs that are associated with adverse drug reactions. For instance, Takarabe et al.12 applied a network-based method for the annotation and analysis of 45,180 interactions involving 1,352 drugs. Huang et al.13 constructed a protein-protein interaction network of 1,249 FDA-approved drugs, which included 4,776 associations with 1,289 targets, for the prediction of pharmacodynamic DDIs. The main limitation of these methods is that many suppositions concerning putative DDIs are based on different assumptions,14 which can significantly decrease the accuracy of the predictions made with these methods.

Another approach used to study adverse drug effects is Quantitative Structure-Activity Relationships (QSAR) modeling. For example, Bender et al.15 created Bayesian models for 70 targets related to adverse drug reactions with an overall correct classification rate of about 94%. Matthews et al.16 developed 14 QSAR models for the prediction of the cardiac adverse effects of generic pharmaceutical substances. Vilar et al.17,18 employed MACCS keys17 and interaction


profile fingerprints (IPFs)18 in similarity-based modeling for the prediction of drug-drug interactions, using the same dataset of 928 drugs and 9,454 well-established DDIs from the DrugBank database for modeling of DDIs and adverse effects of these drugs. Similarity-based approaches for predicting DDIs were reviewed in detail elsewhere.19,20 Text mining approaches for predicting DDIs were used by Percha et al.21 and Tari et al.22 These and other approaches, along with DDI data sources, have been recently reviewed by Percha and Altman.23 However, all these methods are focused on the prediction of the adverse drug reactions induced by a single drug, not a combination of drugs, which is after all at the heart of DDIs.

In this study, we have developed and rigorously validated QSAR models for the prediction of DDIs. To enable this cheminformatics analysis and modeling of DDIs, a combination of two drugs potentially liable to drug-drug interaction was uniquely represented as a binary chemical mixture. This representation allowed us to employ approaches developed previously for QSAR modeling of mixtures that we reviewed recently.24 Such models are less complicated than pathway-based approaches. In addition, we have generated all possible binary combinations of marketed drugs and employed our QSAR models to predict those combinations that are likely to cause DDIs. We have analyzed these predictions at the therapeutic and pharmacological level using the Anatomical Therapeutic Chemical (ATC) classification system and compared the predictions with independent information from the DrugBank database. Models developed in this study can be used as effective tools in support of clinical decisions concerning the choice of multiple medications for patients.

MATERIALS AND METHODS


DATA SETS

Data on drugs and their combinations were extracted from the book "The Top 100 Drug Interactions 2010: A Guide to Patient Management".8 The OpeRational ClassificAtion (ORCA)25 system for drug-drug interactions was used for DDI classification. The ORCA classification defines 5 classes of DDIs (class descriptions taken directly from ref. 25):

1. Avoid combination (Risk of combination outweighs benefit). For example, azapropazone (inhibitor of CYP2C9) blocks metabolism of warfarin and considerably increases its anticoagulating effect, leading to hemorrhage.
2. Usually avoid combination (Use only under special circumstances). For instance, inhibitors of CYP2D6 (amiodarone, haloperidol, etc.) decrease the rate of metabolism of some antiarrhythmics that are substrates of CYP2D6 (flecainide and propafenone):
   a. Interactions for which there are clearly preferable alternatives for one or both drugs;
   b. Interactions to avoid by using an alternative drug or other therapy unless the benefit is judged to outweigh the increased risk.
3. Minimize Risk (Assess risk and take one or more of the following actions if needed). For instance, inhibitors of CYP3A4 (amprenavir, cyclosporine, etc.) may decrease the rate of metabolism of some calcium channel blockers that are substrates of CYP3A4 (amlodipine, nifedipine, etc.):
   a. Consider alternatives: Alternatives may be available that are less likely to interact;
   b. Circumvent: Take action to minimize the interaction (without avoiding combination);
   c. Monitor: Early detection can minimize the risk of an adverse outcome.


4. No special Precautions (Risk of adverse outcome appears small).
5. Ignore (Evidence suggests that the drugs do not interact).

Only drug-drug interactions in Classes 1, 2 and 3 were considered important in the clinic,25 and thus were extracted for the purpose of this study. The book (2010 edition)8 included 358 drugs with approximately 5,000 DDIs belonging to one of three classes (1, 2 or 3). The book did not have any information about DDIs of Classes 4 and 5 due to their insignificance. We created four sets of drugs interacting with four respective CYP isoforms, 1A2, 2C9, 2D6, and 3A4, based on the information about DDIs related to Classes 1, 2 and 3, totaling 332 unique drugs across the three classes. Some drugs were simultaneously represented in different sets because they are metabolized by different isoforms of CYP (Table 1). For example, column 1A2 shows that the set of drugs interacting with CYP1A2 consists of 55 drugs, of which 11 also interact with 2C9, 17 interact with 2D6, and 34 interact with 3A4.

Table 1. Number of drugs for different isoforms of cytochrome P450. Data extracted from Ref. 8.

        1A2     2C9     2D6     3A4
1A2     55
2C9     11      73
2D6     17      9       94
3A4     34      39      56      237

The structures of the drugs were retrieved from the ChemIDPlus website (http://chem.sis.nlm.nih.gov/chemidplus/). Separate sets (in SD file format) for each isoform of cytochrome P450 were created for single drug molecules interacting (as inhibitors or substrates)


with each CYP isoform. Some drugs are known to be both an inhibitor and a substrate and were labeled accordingly. These data were used to form training sets, with each record including a combination of SD files for two drugs interacting with the same isoform of P450 (Figure 1).

Figure 1. Scheme of the training sets creation with DDI data.

Classes 1, 2 and 3 mentioned in the book were used as indicators of unsafe DDIs (Table 2). These unsafe combinations were assigned to class “1”. All other combinations, which were not related to Classes 1, 2, or 3, were considered safe and were assigned to class “0”. For example (see the first row in Table 2), the book includes 55 drugs interacting with CYP 1A2 (12 inhibitors, 42 substrates and 1 drug which is both a substrate and an inhibitor). In total, there are 1,485 possible pair combinations for 55 drugs: (55*54)/2 = 1485. Only 70 out of 1,485 possible


DDIs belonged to class 1, 2, or 3 in the book and were therefore considered as unsafe (class “1”) for the purpose of this study. The remaining 1,415 pair combinations were considered as safe (class “0”). This led to a training set (SD file) for CYP 1A2 containing 1,485 pairs of drugs, which included 70 unsafe and 1,415 safe binary drug combinations. The training sets for the other CYP isoforms were created in the same manner. Thus, the total number of drug pairs for each CYP isoform varied from 1,485 for CYP 1A2 to 27,966 for CYP 3A4 (Table 2).
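The pair counts quoted above follow from counting unordered pairs, n(n-1)/2. The short Python sketch below recomputes them from the drug and unsafe-DDI counts reported in Table 2. The names `DRUGS`, `UNSAFE`, `possible_pairs` and `class_summary` are illustrative only and are not part of the original workflow.

```python
# Drug counts and unsafe-DDI counts per CYP isoform, as reported in Table 2.
DRUGS = {"1A2": 55, "2C9": 73, "2D6": 94, "3A4": 237}
UNSAFE = {"1A2": 70, "2C9": 126, "2D6": 329, "3A4": 2117}


def possible_pairs(n_drugs: int) -> int:
    """Number of unordered drug pairs: n * (n - 1) / 2."""
    return n_drugs * (n_drugs - 1) // 2


def class_summary(isoform: str) -> dict:
    """Total, unsafe (class "1") and safe (class "0") pair counts for one isoform."""
    total = possible_pairs(DRUGS[isoform])
    unsafe = UNSAFE[isoform]
    return {
        "total": total,
        "unsafe": unsafe,
        "safe": total - unsafe,
        "unsafe_fraction": unsafe / total,
    }


if __name__ == "__main__":
    for isoform in DRUGS:
        print(isoform, class_summary(isoform))
```

For CYP 1A2 this gives 1,485 pairs, 70 unsafe and 1,415 safe. Across the four isoforms the unsafe class makes up only roughly 5% to 8% of all pairs, which is why the data set balancing described later is needed.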

Table 2. Investigated drug pairs.

CYP    Number of   Number of    Number of    Total number of   Number of
       drugs       inhibitors   substrates   possible DDIs     unsafe DDIs
1A2    55          13           43           1485              70
2C9    73          32           43           2628              126
2D6    94          37           76           4371              329
3A4    237         45           216          27966             2117

Number of unsafe DDIs = number of pairs of drugs with DDI found in categories 1, 2, or 3 (see text).

METHODS

General approach and software

For the development of QSAR models two programs were used: HiT QSAR26 and GUSAR (General Unrestricted Structure Activity Relationships; version 2013).27,28,29,30 HiT QSAR employs the Simplex representation of molecular structure (SiRMS) descriptors and the Random Forest (RF) method for model building. GUSAR uses Quantitative Neighborhoods of Atoms (QNA) descriptors,31,32 “biological” descriptors (PASS-based predictions),33 and whole-molecule descriptors. It uses an algorithm based on radial basis functions combined with self-consistent


regression (RBF-SCR).34 Thus, in total two types of molecular descriptors (SiRMS, QNA) and two types of machine learning approaches (RF, RBF-SCR) were used. Brief descriptions of each approach are presented below.

HiT QSAR

Simplex representation of molecular structure (SiRMS)

In the framework of the Simplex representation of molecular structure (SiRMS),26,35,36 any molecule can be represented as an ensemble of different simplexes: tetratomic fragments of fixed composition and topology. The connectivity of atoms in a simplex, their atom type and bond nature (single, double, triple or aromatic) are considered at the 2D level (Figure 2). Bonded and non-bonded 2D simplexes were used. Not only atom type, but also other physicochemical characteristics of atoms, i.e., partial charge, lipophilicity, atomic refraction and the atom's ability to be a hydrogen-bond donor and/or acceptor, were used for labeling the atoms in the simplexes. For these characteristics a binning procedure was used to transform real values (for charge, lipophilicity, and refraction) to four categories corresponding to (i) partial charge A ≤ -0.05 < B ≤ 0 < C ≤ 0.05 < D, (ii) lipophilicity A ≤ -0.5 < B ≤ 0 < C ≤ 0.5 < D, and (iii) refraction A ≤ 1.5 < B ≤ 3 < C ≤ 8 < D. Three characteristics of atom H-bond formation ability were distinguished: A (hydrogen-bond acceptor); D (hydrogen-bond donor); and I (indifferent atom, i.e., an atom that does not form H-bonds).
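The binning of real-valued atomic properties into the four categories A to D can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted above, not the HiT QSAR implementation; the names `THRESHOLDS` and `bin_property` are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of categories A, B and C as quoted in the text;
# category D is everything above the last bound.
THRESHOLDS = {
    "charge": (-0.05, 0.0, 0.05),
    "lipophilicity": (-0.5, 0.0, 0.5),
    "refraction": (1.5, 3.0, 8.0),
}
LABELS = "ABCD"


def bin_property(kind: str, value: float) -> str:
    """Map a real-valued atomic property to one of the labels A-D."""
    # bisect_left returns the index of the first bound >= value, so a value
    # equal to a bound falls into the lower category (e.g. A <= -0.05 < B).
    return LABELS[bisect_left(THRESHOLDS[kind], value)]


if __name__ == "__main__":
    for kind, value in [("charge", -0.05), ("charge", 0.03), ("refraction", 8.5)]:
        print(kind, value, "->", bin_property(kind, value))
```

Using `bisect_left` keeps the boundary handling consistent with the "≤" signs in the category definitions: a value exactly on a threshold is assigned to the category below it.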

Figure 2. All possible topological types of simplexes.


Bonded simplexes describe only single components of the mixture (compound 1 or 2), whereas non-bonded simplexes can describe both the constituent parts and the mixture as a whole (Figure 3). In this context, it is necessary to indicate whether the parts of a non-bonded simplex belong to the same molecule or to different ones. A special mark is used during descriptor generation to distinguish such simplexes. Descriptors of constituent parts (compounds 1 or 2) are weighted according to their molar fraction in the combination, whereas mixture descriptors (non-bonded simplexes describing both compounds 1 and 2 simultaneously) are multiplied by twice the smaller molar fraction, according to Equation 1.

\[
D = \begin{cases} x_A D_A + x_B D_B \\ 2\,x_A D_{A+B} \end{cases}, \tag{1}
\]

where D is the descriptor value, xA and xB are the molar fractions of components A and B (xA ≤ xB and xA + xB = 1), respectively, and DA, DB and DA+B are the descriptor values for individual compounds A and B, and for their mixture, respectively.
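A worked example may help with Equation 1. The sketch below evaluates both cases for a 1:1 mixture, where both molar fractions are 0.5; the descriptor counts (4, 2 and 6) are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def constituent_descriptor(x_a: float, d_a: float, x_b: float, d_b: float) -> float:
    """First case of Equation 1: descriptors of the individual compounds,
    weighted by their molar fractions."""
    return x_a * d_a + x_b * d_b


def mixture_descriptor(x_a: float, x_b: float, d_ab: float) -> float:
    """Second case of Equation 1: non-bonded simplexes spanning both
    compounds, multiplied by twice the smaller molar fraction."""
    return 2.0 * min(x_a, x_b) * d_ab


if __name__ == "__main__":
    x_a = x_b = 0.5  # 1:1 binary mixture, as used for the drug pairs
    print(constituent_descriptor(x_a, 4.0, x_b, 2.0))  # 0.5*4 + 0.5*2
    print(mixture_descriptor(x_a, x_b, 6.0))           # 2*0.5*6
```

For the 1:1 ratio used in this study the factor 2·xA equals 1, so mixture descriptors enter with their unweighted value, while constituent descriptors are averaged over the two drugs.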


Figure 3. Simplex descriptors for mixtures.

Random Forest

Random Forest (RF) models were built using the RF algorithm37 as implemented by Polishchuk et al.38 RF is an ensemble of single decision trees, each of which produces its own output. The outputs of all trees are aggregated to obtain one final prediction as the average of the individual tree predictions. Each tree is grown as follows: (i) a bootstrap sample, which will be the training set for the current tree, is produced from the whole training set of N compounds. Compounds that are not in the current tree training set are placed in an out-of-bag (OOB) set (∼N/3 molecules). (ii) The best split among the M randomly selected descriptors from the initial set is chosen in each node by the CART algorithm.39 The value of M is the only tuning parameter to which RF models are sensitive. (iii) Each tree is grown to the largest possible extent without any pruning. Model selection was done according to the forest's performance on the OOB set.40
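Step (i) and the final averaging can be illustrated with a minimal standard-library Python sketch. It is not the implementation of Polishchuk et al.; it only shows how a bootstrap sample leaves roughly one third of the compounds out of bag (the expected fraction is (1 - 1/N)^N, which approaches e^-1 ≈ 0.37 for large N). The function names are hypothetical.

```python
import random


def bootstrap_split(n_compounds: int, rng: random.Random):
    """Draw a bootstrap sample of size N (with replacement) and return
    (in_bag_indices, out_of_bag_indices) for one tree."""
    in_bag = [rng.randrange(n_compounds) for _ in range(n_compounds)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_compounds) if i not in drawn]
    return in_bag, out_of_bag


def average_vote(tree_predictions):
    """Aggregate per-tree outputs into one prediction by simple averaging."""
    return sum(tree_predictions) / len(tree_predictions)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    in_bag, oob = bootstrap_split(1000, rng)
    print(f"OOB fraction: {len(oob) / 1000:.3f}")
    print("forest output:", average_vote([1, 0, 1, 1]))
```

Because every tree has its own OOB set, each compound can be predicted by the trees that never saw it, which is what makes OOB performance usable for model selection without a separate validation split.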


GUSAR

Descriptors

GUSAR uses a combination of three types of descriptors: whole-molecule descriptors, QNA (Quantitative Neighborhoods of Atoms) descriptors,31 and “biological” descriptors,33 which are based on predictions from the PASS algorithm.41 Only QNA descriptors were used for the representation of mixtures, generated by GUSAR as follows. QNA descriptors are defined by two functions, P and Q. The values of P and Q for each atom i are calculated as:

\[
P_i = B_i \sum_k \left( \mathrm{Exp}\left( -\tfrac{1}{2} C \right) \right)_{ik} B_k, \tag{2}
\]

\[
Q_i = B_i \sum_k \left( \mathrm{Exp}\left( -\tfrac{1}{2} C \right) \right)_{ik} B_k A_k, \tag{3}
\]

where the k are all other atoms in the molecule and

\[
A_k = \tfrac{1}{2} \left( IP_k + EA_k \right), \qquad B_k = \left( IP_k - EA_k \right)^{-\frac{1}{2}}. \tag{4}
\]

Here IP is the ionization potential and EA is the electron affinity for each atom, and C is the connectivity matrix for the molecule. Two-dimensional Chebyshev polynomials are used for approximating the functions P and Q over all atoms of the molecule. As a result, QNA descriptors can be easily calculated for a multi-component system or a mixture of compounds. The simplest way to describe a mixture is to combine the QNA descriptors calculated for each component of the mixture (see Figure 4) into one descriptor vector. This approach was used in this study.
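The per-atom P and Q values can be sketched in plain Python as follows. This is an illustration under stated assumptions, not GUSAR's code: C is taken to be the 0/1 adjacency matrix, the sum runs over every index k as the summation is written, the matrix exponential is approximated by a truncated Taylor series, and the IP and EA values (in eV) are approximate literature values for carbon and oxygen used purely as example input. The Chebyshev approximation step is not shown.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30):
    """Matrix exponential via a truncated Taylor series (fine for small matrices)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms):
        term = [[v / order for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def qna_values(ip, ea, connectivity):
    """Return [(P_i, Q_i), ...] following the P and Q definitions above."""
    n = len(ip)
    a = [0.5 * (ip[k] + ea[k]) for k in range(n)]
    b = [(ip[k] - ea[k]) ** -0.5 for k in range(n)]
    e = mat_exp([[-0.5 * c for c in row] for row in connectivity])
    # Assumption: k runs over every atom index, as the summation is written.
    p = [b[i] * sum(e[i][k] * b[k] for k in range(n)) for i in range(n)]
    q = [b[i] * sum(e[i][k] * b[k] * a[k] for k in range(n)) for i in range(n)]
    return list(zip(p, q))


def mixture_qna(component_1, component_2):
    """Simplest mixture description: join the per-atom lists of both components."""
    return component_1 + component_2


if __name__ == "__main__":
    # Two bonded heavy atoms (C-O); IP/EA in eV, approximate illustrative values.
    fragment = qna_values([11.26, 13.62], [1.26, 1.46], [[0, 1], [1, 0]])
    single_atom = qna_values([11.26], [1.26], [[0]])
    print(mixture_qna(fragment, single_atom))
```

The final line mirrors the "mixture" construction of Figure 4: the (P, Q) pairs of the two components are simply concatenated.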


Figure 4. Example of QNA descriptors of a mixture. [Schematic: the structures of two components; their QNA descriptors (P^1, Q^1): (P^1_1, Q^1_1), (P^1_2, Q^1_2), …, (P^1_n, Q^1_n) and (P^2, Q^2): (P^2_1, Q^2_1), (P^2_2, Q^2_2), …, (P^2_m, Q^2_m); and the “mixture” of QNA descriptors (P^{1,2}, Q^{1,2}), formed by joining the two lists.]

GUSAR uses three randomly selected parameters to generate different QSAR models based on QNA descriptors: (a) calculation of the QNA descriptors for either all atoms or only the atoms in a molecule with two or more immediate neighbors; (b) adjustment of the connectivity matrix coefficient; and (c) adjustment of the parameters of the Chebyshev polynomials. The detailed algorithm is described elsewhere.27 The final QSAR model is the consensus of several different QNA-based models built in this way.

RBF-SCR

To create QSAR models, we applied the radial basis function self-consistent regression (RBF-SCR) method that we described previously and showed to provide more accurate prediction results than other machine learning approaches, including even consensus predictions of these methods.34 In the RBF-SCR method, the descriptors are weighted during the calculation of the radial basis functions by the coefficients obtained from self-consistent regression (SCR). These coefficients reflect the contribution of each particular descriptor (variable) to the final equation for the given activity. The higher the absolute value of the coefficient, the greater the contribution of the descriptor. Self-consistent regression is implemented as a regularized least-squares method that can be formulated as:

\[
a = \mathrm{ArgMin} \left[ \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{m} x_{ik} a_k \right)^2 + \sum_{k=1}^{m} v_k a_k^2 \right], \tag{5}
\]

where a is the vector of regression coefficients, n is the number of objects, yi is the response value of the ith object, m is the number of independent variables, xik is the value of the kth independent variable of the ith object, ak is the kth value of the regression coefficients, and vk is the kth value of the regularization parameters. Thus, RBF-SCR can be expressed as the equation:

\[
y(x) = \sum_{i=1}^{N} w_i \, \phi\left( \lVert a x - a x_i \rVert \right), \tag{6}
\]

where a is taken from Equation 5 (SCR). The RBF-SCR method uses linear radial basis functions because they allow modeling of diverse training sets with a high level of dissimilarity between the set’s objects. Thus, the salient features of our RBF-SCR method are: (a) the weights for each descriptor vector used for the calculation of the RBF are based on that descriptor’s importance for the given activity as determined by SCR, and (b) linear basis functions are used for a better description of diverse data sets.
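The prediction step of Equation 6 can be sketched as below. This is a minimal reading of the formula rather than GUSAR's implementation, and it rests on several assumptions: "ax" is interpreted as element-wise scaling of the descriptor vector by the SCR coefficients, the norm is Euclidean, the linear basis function is φ(r) = r, and the weights w_i are taken as already fitted. All function names are hypothetical.

```python
import math


def weighted_distance(a, x, x_i):
    """Euclidean distance between x and x_i after scaling each descriptor
    by its SCR coefficient a_k, i.e. ||a*x - a*x_i||."""
    return math.sqrt(sum((a_k * (u - v)) ** 2 for a_k, u, v in zip(a, x, x_i)))


def rbf_scr_predict(x, centers, weights, a):
    """Evaluate Equation 6 with the linear basis function phi(r) = r."""
    return sum(w_i * weighted_distance(a, x, x_i)
               for w_i, x_i in zip(weights, centers))


if __name__ == "__main__":
    centers = [[0.0, 0.0], [3.0, 4.0]]  # descriptor vectors of training objects
    weights = [2.0, 1.0]                # assumed to be fitted beforehand
    a = [1.0, 1.0]                      # SCR coefficients (illustrative)
    print(rbf_scr_predict([3.0, 4.0], centers, weights, a))
```

Setting a coefficient a_k to zero removes that descriptor from the distance entirely, which shows how SCR-based weighting lets unimportant descriptors drop out of the radial basis functions.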

Applicability domain estimation

A Local (Tree) approach was used for the estimation of the applicability domain (AD) of the random forest models.42 For GUSAR, three different approaches to define the AD of the models were used: similarity, leverage, and accuracy assessment. A test compound is considered to be inside the applicability domain of a model if it passes all three AD filters. Otherwise, the compound is considered to be outside of the applicability domain.


Similarity

For every compound, the pair-wise distance to each of its three nearest neighbors in the training set is calculated using Pearson’s correlation coefficient in the space of the independent variables obtained after SCR. The compound is considered to pass this AD filter if the average of these three distances is less than or equal to 0.7.

Leverage

Leverage calculations are a method for identifying outliers based on the contribution of each molecule to its own predicted value:

\[
\mathrm{Leverage} = x^{T} \left( X^{T} X \right)^{-1} x,
\]

where x is the vector of the descriptors for a test compound, and X is the matrix formed from the rows corresponding to the descriptors of all molecules in the training set. A compound is considered to pass this AD filter if its leverage does not exceed the 99th percentile of the distribution of the leverage values calculated for the training set.

Accuracy assessment

For every compound, the three most similar compounds in the training set are identified. This AD approach then calculates the RMSE value for these three compounds (RMSE3NN) and compares it with the RMSE value of the whole training set (RMSEtrain) using the following equation:

\[
AD_{value} = RMSE_{3NN} / RMSE_{train}.
\]

An RMSE3NN value larger than the RMSEtrain value thus yields an ADvalue larger than 1. This threshold was used to classify the prediction of a test compound as inaccurate. Thus, only compounds that had an ADvalue equal to or less than 1 passed the AD cut-off in this study.
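The accuracy-assessment filter and the rule that a compound must pass all three filters can be sketched as follows. The similarity and leverage checks are represented only by boolean inputs here, and all function names are hypothetical rather than taken from GUSAR.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def ad_value(rmse_3nn: float, rmse_train: float) -> float:
    """ADvalue = RMSE3NN / RMSEtrain."""
    return rmse_3nn / rmse_train


def passes_accuracy_filter(rmse_3nn: float, rmse_train: float) -> bool:
    """A compound passes when its ADvalue is less than or equal to 1."""
    return ad_value(rmse_3nn, rmse_train) <= 1.0


def inside_applicability_domain(passes_similarity: bool,
                                passes_leverage: bool,
                                passes_accuracy: bool) -> bool:
    """A compound is inside the AD only if it passes all three filters."""
    return passes_similarity and passes_leverage and passes_accuracy


if __name__ == "__main__":
    # Errors on the three nearest training neighbours vs. the whole training set
    # (illustrative numbers only).
    rmse_3nn = rmse([1.0, 0.0, 1.0], [0.8, 0.1, 0.7])
    print(passes_accuracy_filter(rmse_3nn, rmse_train=0.4))
```

Requiring all three filters makes the applicability domain conservative: coverage drops, but the predictions that remain are those made in well-sampled, well-predicted regions of descriptor space.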


Consensus modeling

GUSAR consensus model

In GUSAR, the consensus model was calculated using a weighted average of the predictions from all obtained QSAR models. Each model was based on a different set of QNA descriptors, and its predictions for each compound were weighted by the similarity value calculated during the Similarity AD assessment described above.

General consensus model

The final predicted value for a compound was calculated as the average of the predictions obtained from the GUSAR consensus and the HiT QSAR RF models if the compound fell in the applicability domain of both models. If the compound fell in the applicability domain of only one of the models, then the final predicted value was taken from that model.
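Both consensus rules are simple enough to write down directly. The sketch below is an illustration only: the normalization of the weighted average by the sum of the similarity weights, and the choice to return `None` when a compound is outside both applicability domains (a case the text does not specify), are assumptions, as are the function names.

```python
from typing import Optional, Sequence


def gusar_consensus(predictions: Sequence[float],
                    similarities: Sequence[float]) -> float:
    """Similarity-weighted average over the individual QNA-based models.
    Assumption: weights are normalized by their sum."""
    weighted = sum(p * s for p, s in zip(predictions, similarities))
    return weighted / sum(similarities)


def general_consensus(gusar_pred: float, gusar_in_ad: bool,
                      rf_pred: float, rf_in_ad: bool) -> Optional[float]:
    """Combine the GUSAR consensus and the HiT QSAR RF prediction."""
    if gusar_in_ad and rf_in_ad:
        return (gusar_pred + rf_pred) / 2.0
    if gusar_in_ad:
        return gusar_pred
    if rf_in_ad:
        return rf_pred
    return None  # assumption: no prediction when outside both ADs


if __name__ == "__main__":
    print(gusar_consensus([1.0, 0.0], [0.75, 0.25]))
    print(general_consensus(0.8, True, 0.4, True))
    print(general_consensus(0.8, True, 0.4, False))
```

Falling back to a single model when only one applicability domain covers the compound is what allows the general consensus to reach higher coverage than either model alone.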

Data set balancing

To account for the imbalance in the training sets (a situation when the data set has a small ratio of active compounds to inactive ones or vice versa), two different strategies were applied during model building for the two programs used. In GUSAR, the datasets were not balanced prior to modeling; instead, the “adjusting decision threshold” approach was used during model generation. This approach selects the optimum value of the decision threshold (boundary) from the set of possible values for assignment of the class memberships during the leave-many-out cross-validation procedure, in order to improve the balanced accuracy results. The selected threshold value is then used for class assignment upon prediction of test set compounds. In the HiT QSAR modeling the “one-sided random sampling” approach was used. This approach randomly selects compounds from the majority class until the total number of selected

ACS Paragon Plus Environment

Page 19 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

compounds becomes equal to the number of compounds in the minority class. As a result, the HiT QSAR model is built on the reduced and balanced training set. A detailed description of both methods can be found elsewhere.28 It is necessary to emphasize that these balancing approaches were applied only to the training sets; no balancing procedures were applied to either the test or external validation sets.
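Both balancing strategies can be sketched in plain Python. This is an illustration under stated assumptions rather than the GUSAR or HiT QSAR code: labels are coded 1 (active) and 0 (inactive), the candidate thresholds are supplied by the caller, and the scores stand in for cross-validated model outputs.

```python
import random


def one_sided_random_sampling(labels, seed=0):
    """Keep every minority-class item plus an equally sized random sample of the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    return sorted(minority + rng.sample(majority, len(minority)))


def balanced_accuracy(labels, scores, threshold):
    """Mean of sensitivity and specificity when score >= threshold means 'active'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(labels, scores, candidates):
    """Adjusting the decision threshold: pick the candidate with the highest balanced accuracy."""
    return max(candidates, key=lambda t: balanced_accuracy(labels, scores, t))
```

The first function discards majority-class data, whereas the second keeps the whole training set and only moves the class boundary.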

Validation procedure
The prepared modeling sets were divided into training and test sets according to the “Compounds out” strategy24 described below. This procedure was repeated 10 times. The obtained models were used to predict the remaining mixtures that had been excluded prior to model development.

"Compounds out" strategy
Here, all the mixtures containing selected compounds are simultaneously placed in the same external fold (Figure 5). Thus, every mixture in the external set contains at least one compound that is absent from the training set. This differs from the classical external CV algorithm in that the folds are not created randomly but are supervised in order to keep the numbers of both pure compounds and their mixtures more or less constant among the folds. The supervision is especially needed when one pure compound is found in only one mixture while another is a component of many mixtures; the classical random division algorithm is unable to handle such a situation during the creation of external folds. Moreover, despite the supervised fold creation, some folds can still be predicted poorly because of a considerable lack of information in the training set of the given fold. Every mixture is placed in the external set n times, where n is the number of components in the mixture, except when some pure compounds that are constituents of the given mixture belong to the same external fold. Thus, the final number of predictions after the "Compounds out" procedure is ~1.8 times higher than the initial number of data points. This procedure simulates the addition of a novel component to an existing matrix of mixtures. The "Compounds out" strategy has been used in other studies43,44 as the most rigorous method of external validation of QSAR models of mixtures.
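The core of the "Compounds out" split can be sketched as follows. This is a simplified illustration: compounds are assigned to folds at random, whereas the procedure described above supervises the assignment to keep fold sizes roughly constant. The function name and the pair representation are assumptions.

```python
import random


def compounds_out_folds(mixtures, n_folds=10, seed=0):
    """Split binary mixtures so that held-out compounds never appear in training.

    mixtures: list of (drug_a, drug_b) tuples.
    Returns a list of (train_indices, test_indices), one entry per fold.
    """
    compounds = sorted({c for pair in mixtures for c in pair})
    rng = random.Random(seed)
    rng.shuffle(compounds)  # simplification: random rather than supervised assignment
    folds = []
    for k in range(n_folds):
        held_out = set(compounds[k::n_folds])
        test = [i for i, pair in enumerate(mixtures) if held_out & set(pair)]
        train = [i for i, pair in enumerate(mixtures) if not held_out & set(pair)]
        folds.append((train, test))
    return folds
```

A binary mixture lands in the external set once if both of its components fall into the same fold and twice otherwise, which is why the total number of predictions exceeds the number of mixtures.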

Figure 5. "Compounds out" strategy. Pairs of drugs that are excluded from a training set to form an external test set are filled in black.

Evaluation of the model prediction accuracy
For estimating the accuracy of prediction, the following statistical parameters were calculated:

1) Sensitivity: accuracy of predicting “positive” (active) when the true outcome is positive.

$$\mathrm{Sensitivity} = \frac{TP}{FN + TP},$$

where TP: true positive and FN: false negative.

2) Specificity: accuracy of predicting “negative” (inactive) when the true outcome is negative.

$$\mathrm{Specificity} = \frac{TN}{TN + FP},$$

where TN: true negative and FP: false positive.

3) Balanced Accuracy: average of Sensitivity and Specificity.

$$\mathrm{Accuracy} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2}$$

4) Coverage: ratio of the number of compounds inside the model’s AD (N_AD) to the total number of compounds (N).

$$\mathrm{Coverage} = \frac{N_{AD}}{N} \times 100\%$$
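As a worked check of these definitions, the sketch below recomputes the four statistics from a confusion matrix. The function name is hypothetical. The counts are those reported in Table 3 for the 1A2 isoform with QNA and SCR-RBF; the total N = 2845 is inferred from the 1A2 consensus row, whose counts sum to 2845 at 100% coverage, and it assumes the confusion-matrix counts include only compounds inside the AD.

```python
def classification_metrics(tp, tn, fp, fn, n_total):
    """Sensitivity, specificity, balanced accuracy, and coverage (in percent)."""
    sensitivity = tp / (fn + tp)
    specificity = tn / (tn + fp)
    n_ad = tp + tn + fp + fn  # assumption: only in-AD compounds are counted
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "coverage": 100.0 * n_ad / n_total,
    }


metrics = classification_metrics(tp=83, tn=2271, fp=426, fn=50, n_total=2845)
print({name: round(value, 2) for name, value in metrics.items()})
```

Rounded to two decimals, the output agrees with the tabulated values for that row (0.62, 0.84, 0.73, and 99.47%).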

RESULTS

Model development and validation
Prior to model building, all initial data sets corresponding to various CYP isoforms were divided into training and test sets 10 times according to the “Compounds out” strategy described above. For each training set, we built one GUSAR consensus model based on twenty models generated with QNA descriptors and RBF-SCR, and one HiT QSAR model based on a Random Forest of 1,000 trees with Simplex descriptors. In addition, we built a consensus model integrating the GUSAR and HiT QSAR models. Statistical characteristics of these models, assessed for each isoform of cytochrome P450 via 10 rounds of the “Compounds out” external validation, are shown in Table 3. As discussed in the “Data set balancing” section under Materials and Methods, all external validation sets were imbalanced by design.

Table 3. Model accuracy estimated on external imbalanced data sets determined by the "Compounds out" procedure.

CYP isoform  Method           TP    TN     FP    FN   Sensitivity  Specificity  Accuracy  Coverage
1A2          QNA and SCR-RBF  83    2271   426   50   0.62         0.84         0.73      99.47%
1A2          Simplex and RF   63    1692   574   31   0.67         0.75         0.71      82.95%
1A2          Consensus        89    2035   677   44   0.67         0.75         0.71      100%
2C9          QNA and SCR-RBF  158   3589   540   66   0.71         0.87         0.79      86.66%
2C9          Simplex and RF   166   1463   2213  33   0.83         0.4          0.62      77.39%
2C9          Consensus        194   2573   2059  46   0.81         0.56         0.68      96.99%
2D6          QNA and SCR-RBF  380   6364   1051  231  0.62         0.86         0.74      96.17%
2D6          Simplex and RF   302   5102   1566  156  0.66         0.77         0.71      85.69%
2D6          Consensus        430   6047   1521  197  0.69         0.8          0.74      98.19%
3A4          QNA and SCR-RBF  2813  38725  3993  683  0.80         0.91         0.86      89.04%
3A4          Simplex and RF   2938  37737  6096  463  0.86         0.86         0.86      91.03%
3A4          Consensus        3366  41071  6473  621  0.84         0.86         0.85      99.29%

As one can see from Table 3, both approaches showed good external predictivity. The accuracy of prediction ranged from 0.73 to 0.86 (coverage 86-99%) for the QNA descriptors in GUSAR with the SCR-RBF method, while for Simplex and RF it ranged from 0.62 to 0.86 (coverage 77-91%). Both methods achieved their best results for the 3A4 data set, perhaps because the 3A4 training set was significantly larger than the other sets. The advantage of the QNA and SCR-RBF models over the Simplex and RF models observed for the smaller data sets is most likely explained by the different ways in which the two approaches balance the training sets. In QNA and SCR-RBF, the “adjusting decision threshold” balancing approach is used during modeling (see Methods), which keeps the entire training sets. In contrast, the Simplex and RF models were built using the “one-sided random sampling” approach, which equalized the number of DDI pairs in classes “1” and “0” by eliminating a significant fraction of the larger class. Finally, for each isoform, the developed models were combined into consensus ensembles that reached an accuracy of 0.68-0.85 with almost full coverage of 97-100%. Although the consensus approach is expected to show better predictivity than any of the individual methods,45 in this case the consensus model afforded accuracy similar to that of the GUSAR models. However, unlike these latter models, the consensus model reached nearly 100% coverage for all isoforms, which is a highly desirable outcome for any user relying on QSAR predictions.

Additional external evaluation
In addition to the “Compounds out” strategy for external validation, we evaluated the predictivity of our QSAR models using new data that became available after we finished building the models.46 This allowed us to avoid the influence of the splitting procedure on the models’ performance. There are approximately 450 drugs with 6,800 DDIs in the 2014 edition of Hansten and Horn’s book,9 and some of them are new in comparison with the DDIs in the 2010 edition.8 We therefore created an external validation set based on data on drug combinations (38 unsafe DDIs for CYP3A4) from the 2014 edition of Hansten and Horn9 and on data on fixed-dose drug combinations (20 safe DDIs) selected from the Thomson Reuters Integrity database.47 Fixed-dose drug combinations (FDCs) are combinations of two or more active drugs in a single dosage form. One of the most important aspects of FDCs is that the combination should not have supra-additive toxicity of the ingredients.48 Assuming that such single-dosage drug combinations are safe, we selected all known FDCs as safe DDIs. This evaluation set was subjected to prediction by our QSAR models (Figure 6).

[Bar chart: balanced accuracy (y-axis, 0 to 0.9) for QNA and SCR-RBF (100% AD), Simplex and RF (81.36% AD), and Consensus (100% AD); value labels recovered from the plot: 0.79, 0.77, 0.72.]

Figure 6. Accuracy and coverage of prediction for additional external validation set of 59 unsafe DDIs (the data are for the 3A4 isoform).

Both methods showed similar prediction results, with balanced accuracy exceeding 0.7 for the new data set; however, the coverage of the SiRMS model was lower. The best result was obtained for the consensus prediction: an accuracy of 0.79 with 100% coverage. This analysis provided additional evidence that the consensus ensemble of models can be applied most reliably for the prediction of DDIs.

Prediction of drug-drug interactions
We used 332 drugs with known DDI mechanisms for building our QSAR models. Obviously, the number of marketed drugs is much higher. For most of the possible pairwise drug combinations there are no published data on their interactions, or such data are anecdotal and inconclusive. To fill this gap, we decided to apply our QSAR models to predict potential DDIs for currently marketed drugs. For this purpose, we mined DrugBank49 for FDA-approved monocomponent electroneutral organic drugs that can be classified by the Anatomical Therapeutic Chemical (ATC)50 classification system. These criteria allowed us to extract 1,134 drugs from DrugBank. We generated all possible binary combinations of the extracted drugs with the same algorithm as for the training sets, 642,411 in total, to construct a comprehensive putative DDI data set. We found that a considerable part of the drugs and corresponding DDIs from our training sets were included in the set thus created from the DrugBank data. DrugBank does not use the ORCA classification and has its own definition of drug-drug interactions: “Drugs that are known to interact, interfere or cause adverse reactions when taken with this drug.” DrugBank includes data on DDIs from the following sources: Physician's Desk Reference,51 e-Therapeutics,52 Medicines Complete,53 Epocrates RX,54 and Drugs.com.55 Many unsafe DDIs from DrugBank were the same as in the training sets but without indication of the CYP isoform(s) involved. At the same time, some DDIs that were safe according to our training sets were labeled as unsafe in DrugBank. A case in point is the training set for CYP1A2. The training set of DDIs associated with CYP1A2 consisted of 55 drugs, 42 of which were present in DrugBank. Of the 70 unsafe DDI combinations in our CYP1A2 training set, none is described in DrugBank as involving CYP1A2. At the same time, DrugBank includes information about 32 of those 70 unsafe DDI combinations but without mentioning CYP1A2. In addition, 27 DDIs that were considered safe in our CYP1A2 training set are considered unsafe in DrugBank. For CYP3A4 we see a somewhat different picture. The training set of DDIs related to CYP3A4 consists of 237 drugs, 185 of which are in DrugBank. Of the 2,117 unsafe DDI combinations in our CYP3A4 training set, 224 are described in DrugBank as involving CYP3A4. At the same time, DrugBank also includes information on 509 of the 2,117 unsafe DDIs without mentioning CYP3A4. We also found that 152 DDIs that were considered safe in the CYP3A4 training set are considered unsafe in DrugBank (Table 4).

Table 4. Data overlap between training sets and DrugBank.

Training set columns: (1) number of drugs; (2) number of unsafe DDIs; (3) number of safe DDIs.
DrugBank columns: (4) number of drugs from training sets; (5) number of unsafe DDIs from training sets; (6) number of safe DDIs from training sets considered as unsafe DDIs in DrugBank.

CYP   (1)   (2)    (3)     (4)   (5)            (6)
1A2   55    70     1415    42    0* (32)**      27
2C9   73    127    2501    57    7* (11)**      34
2D6   94    329    4042    83    0* (102)**     52
3A4   237   2117   25849   185   224* (509)**   152

* Number of unsafe DDIs from training sets which are described as interacting with the same CYP isoform; ** Number of unsafe DDIs from training sets without description of the CYP isoform involved.

As many DDIs are described in DrugBank without mention of CYP isoforms, revealing a possible reason (i.e., interaction with a specific CYP) would be one of the important results of our DDI predictions. The QSAR models developed for the 3A4 isoform were used for the prediction of the generated combinations because this isoform had by far the largest training set among the studied isoforms and, furthermore, most xenobiotics are metabolized by 3A4.56 We therefore performed an exhaustive screening of binary DDIs for uncharacterized drugs.

Application of our 3A4 DDI consensus model predicted 86,164 out of 642,411 two-drug combinations as unsafe DDIs. The prediction results showed approximately the same ratio of unsafe vs. safe DDIs (0.14) as the original data (0.08) from the book,8 which indicates that the obtained QSAR models preserved this characteristic of the data distribution independent of whether the sets were balanced or not. As many as 79,138 predicted DDIs from 86,164 (91.85%) fell into the AD of the consensus model (see Methods). The sets of the top 10 and bottom 10 drugs predicted to cause unsafe DDIs when interacting with other drugs the most or least frequently, respectively, are shown in Table 5. The results for all 1,134 drugs can be found in the supplementary material (Table S1).
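The size of the screening set and the reported AD percentage follow directly from the figures given in the text, as the short Python check below shows. The helper name all_pairs is hypothetical, and unordered pairs of distinct drugs are assumed.

```python
from itertools import combinations
from math import comb


def all_pairs(drugs):
    """All unordered binary combinations of distinct drugs."""
    return list(combinations(drugs, 2))


n_drugs = 1134
n_pairs = comb(n_drugs, 2)   # number of unordered pairs of 1,134 drugs
predicted_unsafe = 86164     # pairs predicted as unsafe (from the text)
in_ad = 79138                # predicted unsafe pairs inside the consensus AD (from the text)

print(n_pairs)                                    # 642411
print(round(100 * in_ad / predicted_unsafe, 2))   # 91.85
print(n_pairs - predicted_unsafe)                 # 556247 pairs not predicted as unsafe
```

The pair count matches the 642,411 combinations quoted above, and the in-AD share matches the quoted 91.85%.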

Table 5. Top 10 and bottom 10 drugs predicted to have unsafe DDIs.

Drug name             Number of predicted DDIs
Pasireotide           1131
Anidulafungin         1126
Vapreotide            1124
Gonadorelin           1100
Polymyxin B Sulfate   1084
Octreotide            1068
Josamycin             1067
Atazanavir            1064
Terlipressin          1062
Amiodarone            1051
...
Voglibose             7
Zidovudine            7
Ethambutol            6
Ganciclovir           6
Methyldopa            6
Penciclovir           6
Telbivudine           6
Alendronate           5
Amifostine            5
Masoprocol            2

Pasireotide, Anidulafungin, Vapreotide, and Gonadorelin were predicted as the most frequently interacting drugs. The number of DDIs predicted for each of them was at least 1,100. Masoprocol, Amifostine, and Alendronate were the drugs least likely to interact according to our prediction results, with the number of predicted DDIs being less than 10 for all of them. Comparison with published clinical data could support or contradict our prediction results. Unfortunately, complete data on the effects or DDIs of marketed drugs are typically not available. Also, publications regarding DDIs deal with reported (adverse) effects, and there is usually no incentive to explicitly report that nothing special or worrisome in terms of DDIs was observed for a given drug combination. This leads to a systematic lack of published negative results (drug combinations without unsafe DDIs) in the context of our study. Therefore, we had to limit ourselves to confirming our positive (unsafe DDI) predictions using an independent source. For this purpose, we mined DrugBank for all known information about our predicted unsafe DDIs for all the binary combinations of drugs we had generated. Unfortunately, the information in DrugBank is very limited: we found at least one DDI for only 43% of all drugs in DrugBank. Therefore, in a large number of cases, we could not estimate the accuracy of our predictions. Nevertheless, as a proof of concept we calculated the sensitivity of our models' predictions of unsafe CYP3A4-related DDIs against those found in DrugBank. DrugBank contains information on 1,983 CYP3A4-related DDIs for the 1,134 drugs used in this prediction. We predicted 1,776 of these 1,983 CYP3A4-related DDIs, yielding a sensitivity of 0.90 for this set. After excluding the drugs belonging to our CYP3A4 training set, and their DDIs, from the DrugBank set, we obtained a second, more “objective,” set containing information on 1,759 CYP3A4-related DDIs for 949 drugs. Our DDI predictions included 1,552 of those 1,759 CYP3A4-related DDIs, yielding a still very high sensitivity of 0.88 for the second set. These results indicate that our models appear to be able to predict clinically reported unsafe DDIs reasonably well. In total, we found more than 4,500 confirmations of DDIs predicted by our models. Several examples of DDI predictions confirmed by DrugBank are described below.

For instance, octreotide is an octapeptide that mimics natural somatostatin pharmacologically. Octreotide is approved for the treatment of growth hormone producing tumors, pituitary tumors that secrete thyroid stimulating hormone (thyrotropinoma), diarrhea and flushing episodes associated with carcinoid syndrome, and diarrhea in patients with vasoactive intestinal peptide-secreting tumors. According to the data in DrugBank, additive QTc prolongation may occur when Octreotide is taken together with Artemether, Lumefantrine, Tacrolimus, Toremifene, Trimipramine, Voriconazole, Vorinostat, Ziprasidone, etc. Octreotide decreases the effect of Cyclosporine. Concomitant therapy with somatostatin analogs may increase the blood-glucose-lowering effect of insulin Lispro and thus increase the risk of hypoglycemia.

Josamycin is a macrolide antibiotic. According to DrugBank, Josamycin may increase the effect of Cyclosporine, Alprazolam, Buspirone, Carbamazepine, and Midazolam, and the toxicity of Statin, Lovastatin, and Theophylline. The combination of Josamycin with Moxifloxacin, Cisapride, Astemizole, or Thioridazine may increase the risk of cardiotoxicity and arrhythmias.

Ergotism and severe ischemia may occur with the combination of Josamycin with Dihydroergotamine, Ergotamine, or Methysergide.

Atazanavir is an antiretroviral drug of the protease inhibitor class. According to the data in DrugBank, it may increase the effect and toxicity of the tricyclic antidepressant Amitriptyline by decreasing its metabolism. Atazanavir may increase the anticoagulant effect of Acenocoumarol, Anisindione, Dicoumarol, and Warfarin. The strong CYP3A4 inhibitor Atazanavir may decrease the metabolism and clearance of CYP3A4 substrates such as Teniposide, Tiagabine, and Trimipramine. The interaction of Atazanavir with Lidocaine, Quinidine, and Dihydroquinidine barbiturate may increase the risk of cardiotoxicity and arrhythmias.

Voriconazole (the most DDI-prone drug according to DrugBank, where it interacts with 178 drugs) is a triazole antifungal medication that is generally used to treat serious, invasive fungal infections. Voriconazole, a strong CYP3A4 inhibitor, may increase the serum concentration of Darifenacin, Flunisolide, Cyclosporine, Calcitriol, Bortezomib, Tramadol, Erythromycin, Sildenafil, Dofetilide, Citalopram, etc. by decreasing their metabolism. Additive QTc prolongation may occur when Voriconazole is taken together with Moxifloxacin, Ziprasidone, Disopyramide, Ibutilide, Amitriptyline, Protriptyline, Loxapine, etc.

Telithromycin was the first ketolide antibiotic; it belongs to the macrolide group and is a semi-synthetic erythromycin derivative. We found in DrugBank that Telithromycin interacts with 164 drugs. Telithromycin may reduce the clearance of Cyclosporine, Calcitriol, Flunisolide, Bortezomib, Tramadol, Erythromycin, Sildenafil, Citalopram, Eletriptan, Ranolazine, etc. Telithromycin may increase the adverse effects of Lovastatin by decreasing its metabolism. The combination of Telithromycin with Dofetilide may increase the risk of cardiotoxicity and arrhythmias. All predicted DDIs can be found in the supplementary material (Table S1).

It is interesting to analyze why some drugs may interact with many different drugs while other drugs are more limited in their interactions. In other words, what kinds of DDI mechanisms correlate with these interactions? It has been shown6 that one of the possible mechanisms is the CYP inhibition potency of drugs. Thus, some drugs are stronger inhibitors of CYPs, which leads to stronger side effects during interactions with other drugs. Therefore, the half-maximal inactivation (Ki) values, which are experimentally derived as in vitro kinetic constants, are useful descriptors of the potency of a drug as an inhibitor of CYPs. To investigate how the most and the least frequently interacting drugs (Table 5) relate to inhibition potency (Ki values) against CYP3A4, we performed a literature search and database analysis. As a result, we found that Josamycin, Atazanavir, and Amiodarone (top 10 drugs, Table 5) exert a significant inhibitory effect on CYP3A4, while Zidovudine, Ethambutol, Ganciclovir, Penciclovir, Alendronate, and Amifostine (bottom 10 drugs, Table 5) have all shown57 less than 50% inhibitory activity at 10 µM concentration. Thus, our predictions correlate with one of the possible factors responsible for DDIs. It should be emphasized that in addition to inhibition potency there are many other factors that may play an important role in the causation of DDIs, e.g., P-glycoprotein transport, half-life, therapeutic dose, plasma concentration, etc. P-glycoprotein can alter the intracellular concentration of CYP3A4 inhibitors and inducers and hence the magnitude of the inhibitory and inductive responses.
Both a high therapeutic dose and a long half-life may lead to the accumulation of the drug in plasma, which may increase toxic reactions.8 Taking into account all these additional factors of the underlying mechanisms of DDIs could in principle further improve the accuracy of computational models but will require significant further investigation beyond the scope of this study.

Interpretation of the predicted drug-drug interactions using ATC classification
To interpret the predicted results from a therapeutic and pharmacological point of view, we used the Anatomical Therapeutic Chemical (ATC) classification system. For this purpose, we used fourth-level codes of the ATC hierarchical classification of drugs. The classification was applied to the predicted DDIs. Binary drug combinations were grouped by ATC terms to determine which therapeutic and pharmacological classes of drugs (ATC terms) appeared as the most and the least interacting ones. The top 10 and bottom 10 ATC terms, predicted as the most and least interacting, respectively, are shown in Table 6. The results for all ATC terms are available in the supplementary material (Table S2).
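The grouping step can be sketched in Python as follows. This is one plausible reading of the procedure, not the authors' code: each drug is assumed to map to a single fourth-level ATC term, "unique interactions" is taken to mean the number of distinct partner ATC terms, and the drug identifiers in the usage example are placeholders.

```python
from collections import defaultdict


def summarize_by_atc(predicted_pairs, atc_of):
    """Count, per ATC term, distinct partner terms and drug-level predicted interactions.

    predicted_pairs: iterable of (drug_a, drug_b) predicted as unsafe DDIs.
    atc_of: dict mapping each drug to one ATC term (assumption: one term per drug).
    Returns {term: (n_unique_partner_terms, n_drug_level_interactions)}.
    """
    partners = defaultdict(set)
    totals = defaultdict(int)
    for a, b in predicted_pairs:
        term_a, term_b = atc_of[a], atc_of[b]
        # a set of (term, partner) tuples avoids double counting when both drugs share a term
        for term, partner in {(term_a, term_b), (term_b, term_a)}:
            partners[term].add(partner)
            totals[term] += 1
    return {term: (len(partners[term]), totals[term]) for term in partners}


example = summarize_by_atc(
    [("d1", "d2"), ("d1", "d3"), ("d2", "d3")],
    {"d1": "Macrolides", "d2": "Benzodiazepine derivatives", "d3": "Benzodiazepine derivatives"},
)
print(example)
```

Under these assumptions, the first number of each tuple is bounded by the number of ATC terms, while the second grows with the number of drugs in the class.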

Table 6. Top 10 and bottom 10 ATC terms with most and least frequently predicted DDIs, respectively.

ATC term                                              Unique number of interactions by ATC terms   Total number of interactions by drugs   Most frequent interacting ATC term
Somatostatin and analogues                            301                                          2777                                    Benzodiazepine derivatives
Antibiotics                                           300                                          3370                                    Protein kinase inhibitors
Gonadotropin-releasing hormones                       295                                          922                                     Benzodiazepine derivatives
Protease inhibitors                                   294                                          6626                                    Benzodiazepine derivatives
Macrolides                                            293                                          4424                                    Benzodiazepine derivatives
Vasopressin and analogues                             292                                          1586                                    Benzodiazepine derivatives
Antiarrhythmics, class III                            290                                          1133                                    Protease inhibitors
Imidazole derivatives                                 289                                          936                                     Benzodiazepine derivatives
Polymyxins                                            287                                          867                                     Benzodiazepine derivatives
Benzothiazepine derivatives                           281                                          848                                     Benzodiazepine derivatives
...
Fourth-generation cephalosporins                      10                                           19                                      Macrolides
Bisphosphonates                                       9                                            43                                      Organic nitrates
Iodine (123I) compounds                               9                                            15                                      Somatostatin and analogues
Acridine derivatives                                  6                                            9                                       Organic nitrates
Belladonna alkaloids, tertiary amines                 6                                            8                                       Somatostatin and analogues
Heparins or heparinoids for topical use               6                                            14                                      Protease inhibitors
Adamantane derivatives                                5                                            6                                       Somatostatin and analogues
Cyclic amines                                         5                                            8                                       Organic nitrates
Dopa and Dopa derivatives                             5                                            7                                       Organic nitrates
Sympathomimetics, combinations excl. corticosteroids  5                                            7                                       Somatostatin and analogues

In total, 304 unique drug classes (ATC terms) were identified. The most interacting classes of drugs were somatostatin and analogues, antibiotics, gonadotropin-releasing hormones, protease inhibitors, and macrolides. These results are not surprising, given that antibiotics, macrolides, and protease inhibitors are known to interact with different drugs, leading to adverse drug reactions or toxic effects.6,8 As mentioned above and found elsewhere,6 the reason for this is that most of these drugs are strong inhibitors of CYP3A4. Thus, drugs belonging to highly interacting classes may be involved in DDIs with more or less any other drug class. Sympathomimetics, DOPA derivatives, adamantane derivatives, and heparins were predicted as the least interacting classes, which was also confirmed by DrugBank data. The most frequently interacting ATC terms between pairwise combinations of drugs were benzodiazepine derivatives, protein kinase inhibitors, and protease inhibitors. Thus, according to our predictions, each new drug belonging to the investigated 304 ATC terms has a high probability of interacting with the above-mentioned most frequently interacting classes of drugs (benzodiazepine derivatives, protein kinase inhibitors, and protease inhibitors).

CONCLUSIONS
We have collected information about 25,000 drug-drug interactions for four cytochrome P450 isoforms (1A2, 2C9, 2D6, 3A4). Using a binary chemical mixture representation to formally characterize the interacting pairs of drugs, we have developed models based on QNA and Simplex descriptors for mixtures, applying the RBF-SCR and Random Forest machine learning approaches, respectively. Since the validation of QSAR models for mixtures in general and DDIs in particular is more complicated than in traditional QSAR analysis, the obtained models were validated using the "Compounds out" strategy specifically developed for rigorous validation of QSAR models of mixtures. The developed models showed good accuracy of prediction (balanced accuracy exceeding 70%) while maintaining high coverage of the external test sets (80%-100%). In addition, we generated 642,411 binary combinations of a significant subset of marketed drugs (1,134) and performed comprehensive DDI prediction runs for them. As a result, 86,164 out of 642,411 binary combinations of drugs were predicted as unsafe DDIs. More than 4,500 of these predicted DDIs were confirmed by data from DrugBank. As with all predictive models based on statistical approaches, not all of the remaining 80,000+ predicted DDIs should be expected to be confirmed in clinical practice; nor can we claim that all of the more than 556,000 drug combinations not predicted to cause DDIs will truly be safe in all cases. Nevertheless, we hope that the predictions obtained in this study, as well as the models themselves, may be useful as an alert system for potentially dangerous combinations of drugs as part of pharmacovigilance.

ASSOCIATED CONTENT
Supporting Information
The DDI prediction results obtained for all possible binary combinations of approved drugs are available in the Supplementary Material.

AUTHOR INFORMATION
Corresponding Author
*National Cancer Institute, National Institutes of Health, Chemical Biology Laboratory, 376 Boyles St., Frederick, MD 21702. E-mail: [email protected], telephone +1-301-846-5903.
Author Contributions
The manuscript was written with contributions from all authors. All authors have given approval to the final version of the manuscript.
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
This project has been funded in part with Federal funds from the Frederick National Laboratory for Cancer Research, National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the U.S. Government. Creation of the data sets of drugs made by A.L., A.D. and V.P. was partly supported by the Russian Scientific Foundation grant 14-15-00449. E.M., D.F., and A.T. acknowledge partial support by NIH (grants GM66940 and GM096967), and FAPEG (grant 201310267001095). A.T. acknowledges partial support from the Russian Scientific Foundation (project 14-43-00024). V.A. thanks CNPq and the Science without Borders program for financial support for his visit to the University of North Carolina at Chapel Hill. E.V., E.M., and V.K. acknowledge partial support from STCU (Grant P407). We thank Neil B. Sandson for useful discussions.

ABBREVIATIONS DDI, drug-drug interaction; SiRMS, Simplex Representation of Molecular Structure; GUSAR, General Unrestricted Structure-Activity Relationships; PASS, Prediction of Activity Spectra for Substances; RBF-SCR, radial basis functions - self-consistent regression; QSAR, quantitative structure-activity relationships; QNA, quantitative neighborhoods of atoms (descriptors).

REFERENCES

(1) FastStats - Therapeutic Drug Use http://www.cdc.gov/nchs/fastats/drug-use-therapeutic.htm (accessed May 22, 2015).

(2) Murphy, J. E.; Malone, D. C.; Olson, B. M.; Grizzle, A. J.; Armstrong, E. P.; Skrepnek, G. H. Development of Computerized Alerts with Management Strategies for 25 Serious Drug-Drug Interactions. Am. J. Health. Syst. Pharm. 2009, 66 (1), 38–44.

(3) Hardman, J. G.; Limbird, L. E.; Gilman, A. G. Goodman & Gilman’s The Pharmacological Basis of Therapeutics, 10th ed.; McGraw-Hill, 2001.

(4) Lazarou, J.; Pomeranz, B. H.; Corey, P. N. Incidence of Adverse Drug Reactions in Hospitalized Patients: A Meta-Analysis of Prospective Studies. JAMA 1998, 279 (15), 1200–1205.

(5) Kuhlmann, J.; Mück, W. Clinical-Pharmacological Strategies to Assess Drug Interaction Potential during Drug Development. Drug Saf. 2001, 24 (10), 715–725.

(6) Zhou, S.-F.; Xue, C. C.; Yu, X.-Q.; Li, C.; Wang, G. Clinically Important Drug Interactions Potentially Involving Mechanism-Based Inhibition of Cytochrome P450 3A4 and the Role of Therapeutic Drug Monitoring. Ther. Drug Monit. 2007, 29 (6), 687–710.

(7) Guengerich, F. P. Cytochrome P450 and Chemical Toxicology. Chem. Res. Toxicol. 2008, 21 (1), 70–83.

(8) Hansten, P. D.; Horn, J. R. The Top 100 Drug Interactions 2010: A Guide to Patient Management; H&H Publications, LLP: Freeland, WA, 2010.

(9) Hansten, P. D.; Horn, J. R. The Top 100 Drug Interactions: A Guide to Patient Management, 15th ed.; H&H Publications, LLP, 2014.

(10) Wapner, J. Deadly Drug Combinations. Sci. Am. 2015, 313 (4), 29–30. http://www.scientificamerican.com/article/new-software-and-genetic-analyses-aim-to-reduce-problems-with-multiple-drug-combinations/ (accessed Oct 5, 2015).

(11) Bulusu, K. C.; Guha, R.; Mason, D. J.; Lewis, R. P. I.; Muratov, E.; Kalantar Motamedi, Y.; Cokol, M.; Bender, A. Modelling of Compound Combination Effects and Applications to Efficacy and Toxicity: State-of-the-Art, Challenges and Perspectives. Drug Discov. Today.

(12) Takarabe, M.; Shigemizu, D.; Kotera, M.; Goto, S.; Kanehisa, M. Network-Based Analysis and Characterization of Adverse Drug-Drug Interactions. J. Chem. Inf. Model. 2011, 51 (11), 2977–2985.

(13) Huang, J.; Niu, C.; Green, C. D.; Yang, L.; Mei, H.; Han, J.-D. J. Systematic Prediction of Pharmacodynamic Drug-Drug Interactions through Protein-Protein-Interaction Network. PLoS Comput. Biol. 2013, 9 (3), e1002998.

(14) Khatri, P.; Sirota, M.; Butte, A. J. Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges. PLoS Comput. Biol. 2012, 8 (2).

(15) Bender, A.; Scheiber, J.; Glick, M.; Davies, J. W.; Azzaoui, K.; Hamon, J.; Urban, L.; Whitebread, S.; Jenkins, J. L. Analysis of Pharmacology Data and the Prediction of Adverse Drug Reactions and Off-Target Effects from Chemical Structure. ChemMedChem 2007, 2 (6), 861–873.

(16) Matthews, E. J.; Frid, A. A. Prediction of Drug-Related Cardiac Adverse Effects in Humans - A: Creation of a Database of Effects and Identification of Factors Affecting Their Occurrence. Regul. Toxicol. Pharmacol. 2010, 56 (3), 247–275.

(17) Vilar, S.; Harpaz, R.; Uriarte, E.; Santana, L.; Rabadan, R.; Friedman, C. Drug-Drug Interaction through Molecular Structure Similarity Analysis. J. Am. Med. Inform. Assoc. 2012, 19 (6), 1066–1074.

(18) Vilar, S.; Uriarte, E.; Santana, L.; Tatonetti, N. P.; Friedman, C. Detection of Drug-Drug Interactions by Modeling Interaction Profile Fingerprints. PLoS One 2013, 8 (3), e58321.

(19) Vilar, S.; Uriarte, E.; Santana, L.; Friedman, C.; Tatonetti, N. P. State of the Art and Development of a Drug-Drug Interaction Large Scale Predictor Based on 3D Pharmacophoric Similarity. Curr. Drug Metab. 2014, 15 (5), 490–501.

(20) Vilar, S.; Uriarte, E.; Santana, L.; Lorberbaum, T.; Hripcsak, G.; Friedman, C.; Tatonetti, N. P. Similarity-Based Modeling in Large-Scale Prediction of Drug-Drug Interactions. Nat. Protoc. 2014, 9 (9), 2147–2163.

(21) Percha, B.; Garten, Y.; Altman, R. B. Discovery and Explanation of Drug-Drug Interactions via Text Mining. Pac. Symp. Biocomput. 2012, 410–421.

(22) Tari, L.; Anwar, S.; Liang, S.; Cai, J.; Baral, C. Discovering Drug-Drug Interactions: A Text-Mining and Reasoning Approach Based on Properties of Drug Metabolism. Bioinformatics 2010, 26 (18), i547–i553.

(23) Percha, B.; Altman, R. B. Informatics Confronts Drug-Drug Interactions. Trends Pharmacol. Sci. 2013, 34 (3).

(24) Muratov, E. N.; Varlamova, E. V.; Artemenko, A. G.; Polishchuk, P. G.; Kuz’min, V. E. Existing and Developing Approaches for QSAR Analysis of Mixtures. Mol. Inform. 2012, 31 (3-4), 202–221.

(25) Hansten, P. D.; Horn, J. R.; Hazlet, T. K. ORCA: OpeRational ClassificAtion of Drug Interactions. J. Am. Pharm. Assoc. (Wash.) 2001, 41 (2), 161–165.

(26) Kuz’min, V. E.; Artemenko, A. G.; Muratov, E. N. Hierarchical QSAR Technology Based on the Simplex Representation of Molecular Structure. J. Comput. Aided Mol. Des. 2008, 22 (6-7), 403–421.

(27) Zakharov, A. V.; Lagunin, A. A.; Filimonov, D. A.; Poroikov, V. V. Quantitative Prediction of Antitarget Interaction Profiles for Chemical Compounds. Chem. Res. Toxicol. 2012, 25 (11), 2378–2385.

(28) Zakharov, A. V.; Peach, M. L.; Sitzmann, M.; Nicklaus, M. C. QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem. J. Chem. Inf. Model. 2014, 54 (3), 705–712.

(29) Fedorova, E. V.; Buryakina, A. V.; Zakharov, A. V.; Filimonov, D. A.; Lagunin, A. A.; Poroikov, V. V. Design, Synthesis and Pharmacological Evaluation of Novel Vanadium-Containing Complexes as Antidiabetic Agents. PLoS One 2014, 9 (7), e100386.

(30) Zakharov, A. V.; Peach, M. L.; Sitzmann, M.; Filippov, I. V.; McCartney, H. J.; Smith, L. H.; Pugliese, A.; Nicklaus, M. C. Computational Tools and Resources for Metabolism-Related Property Predictions. 2. Application to Prediction of Half-Life Time in Human Liver Microsomes. Future Med. Chem. 2012, 4 (15), 1933–1944.

(31) Filimonov, D. A.; Zakharov, A. V.; Lagunin, A. A.; Poroikov, V. V. QNA-Based “Star Track” QSAR Approach. SAR QSAR Environ. Res. 2009, 20 (7-8), 679–709.

(32) Kokurkina, G. V.; Dutov, M. D.; Shevelev, S. A.; Popkov, S. V.; Zakharov, A. V.; Poroikov, V. V. Synthesis, Antifungal Activity and QSAR Study of 2-Arylhydroxynitroindoles. Eur. J. Med. Chem. 2011, 46 (9), 4374–4382.

(33) Lagunin, A.; Zakharov, A.; Filimonov, D.; Poroikov, V. QSAR Modelling of Rat Acute Toxicity on the Basis of PASS Prediction. Mol. Inform. 2011, 30 (2-3), 241–250.

(34) Zakharov, A. V.; Peach, M. L.; Sitzmann, M.; Nicklaus, M. C. A New Approach to Radial Basis Function Approximation and Its Application to QSAR. J. Chem. Inf. Model. 2014, 54 (3), 713–719.

(35) Muratov, E. N.; Artemenko, A. G.; Varlamova, E. V.; Polischuk, P. G.; Lozitsky, V. P.; Fedchuk, A. S.; Lozitska, R. L.; Gridina, T. L.; Koroleva, L. S.; Sil’nikov, V. N.; Galabov, A. S.; Makarov, V. A.; Riabova, O. B.; Wutzler, P.; Schmidtke, M.; Kuz’min, V. E. Per Aspera Ad Astra: Application of Simplex QSAR Approach in Antiviral Research. Future Med. Chem. 2010, 2 (7), 1205–1226.

(36) Kuz’min, V. E.; Artemenko, A. G.; Muratov, E. N.; Volineckaya, I. L.; Makarov, V. A.; Riabova, O. B.; Wutzler, P.; Schmidtke, M. Quantitative Structure-Activity Relationship Studies of [(biphenyloxy)propyl]isoxazole Derivatives. Inhibitors of Human Rhinovirus 2 Replication. J. Med. Chem. 2007, 50 (17), 4205–4213.

(37) Breiman, L. Random Forests. Mach. Learn. 2001, 45 (1), 5–32.

(38) Polishchuk, P. G.; Muratov, E. N.; Artemenko, A. G.; Kolumbin, O. G.; Muratov, N. N.; Kuz’min, V. E. Application of Random Forest Approach to QSAR Prediction of Aquatic Toxicity. J. Chem. Inf. Model. 2009, 49 (11), 2481–2488.

(39) Breiman, L. Classification and Regression Trees; The Wadsworth statistics/probability series; Wadsworth International Group: Belmont, CA, 1984.

(40) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1947–1958.

(41) Lagunin, A.; Filimonov, D.; Zakharov, A.; Xie, W.; Huang, Y.; Zhu, F.; Shen, T.; Yao, J.; Poroikov, V. Computer-Aided Prediction of Rodent Carcinogenicity by PASS and CISOC-PSCT. QSAR Comb. Sci. 2009, 28 (8), 806–810.

(42) Artemenko, A. G.; Muratov, E. N.; Kuz’min, V. E.; Muratov, N. N.; Varlamova, E. V.; Kuz’mina, A. V.; Gorb, L. G.; Golius, A.; Hill, F. C.; Leszczynski, J.; Tropsha, A. QSAR Analysis of Nitroaromatics’ Toxicity in Tetrahymena Pyriformis: Structural Factors and Possible Modes of Action. SAR QSAR Environ. Res. 2011, 22 (5-6), 575–601.

(43) Oprisiu, I.; Varlamova, E.; Muratov, E.; Artemenko, A.; Marcou, G.; Polishchuk, P.; Kuz’min, V.; Varnek, A. QSPR Approach to Predict Nonadditive Properties of Mixtures. Application to Bubble Point Temperatures of Binary Mixtures of Liquids. Mol. Inform. 2012, 31 (6-7), 491–502.

(44) Muratov, E. N.; Varlamova, E. V.; Artemenko, A. G.; Polishchuk, P. G.; Nikolaeva-Glomb, L.; Galabov, A. S.; Kuz’min, V. E. QSAR Analysis of Poliovirus Inhibition by Dual Combinations of Antivirals. Struct. Chem. 2013, 24 (5), 1665–1679.

(45) Zhu, H.; Tropsha, A.; Fourches, D.; Varnek, A.; Papa, E.; Gramatica, P.; Öberg, T.; Dao, P.; Cherkasov, A.; Tetko, I. V. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena Pyriformis. J. Chem. Inf. Model. 2008, 48 (4), 766–784.

(46) Sheridan, R. P. Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. J. Chem. Inf. Model. 2013, 53 (4), 783–790.

(47) Thomson Reuters Integrity https://integrity.thomson-pharma.com/integrity/xmlxsl/ (accessed May 22, 2015).

(48) Gautam, C. S.; Saha, L. Fixed Dose Drug Combinations (FDCs): Rational or Irrational: A View Point. Br. J. Clin. Pharmacol. 2008, 65 (5), 795–796.

(49) DrugBank http://www.drugbank.ca/ (accessed Sep 22, 2014).

(50) WHOCC - Home http://www.whocc.no/ (accessed Sep 22, 2014).

(51) PDR.Net http://www.pdr.net/ (accessed May 26, 2015).

(52) e-Therapeutics by CPhA http://www.e-therapeutics.ca/ (accessed May 26, 2015).

(53) MedicinesComplete https://www.medicinescomplete.com/mc/ (accessed May 26, 2015).

(54) Point of Care Medical Applications | Epocrates http://www.epocrates.com/ (accessed May 26, 2015).

(55) Drugs.com | Prescription Drug Information, Interactions & Side Effects http://www.drugs.com/ (accessed May 26, 2015).

(56) Wang, J.-F.; Zhang, C.-C.; Chou, K.-C.; Wei, D.-Q. Structure of Cytochrome P450s and Personalized Drug. Curr. Med. Chem. 2009, 16 (2), 232–244.

(57) CHEMBL1909138 Assay Report Card https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL1909138 (accessed Jul 28, 2015).

Table of Contents Graphic

QSAR Modeling and Prediction of Drug-Drug Interactions

Alexey V. Zakharov1, Ekaterina V. Varlamova2,3, Alexey A. Lagunin4,5, Alexander V. Dmitriev4, Eugene N. Muratov6, Denis Fourches7, Victor E. Kuz’min2, Vladimir V. Poroikov4, Alexander Tropsha6, Marc C. Nicklaus1*
