
Pharmaceutical Modeling

The Development of Target-specific Machine Learning Models as Scoring Functions for Docking-based Target Prediction. Mauro S. Nogueira, and Oliver Koch J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00773 • Publication Date (Web): 25 Feb 2019 Downloaded from http://pubs.acs.org on February 26, 2019




Journal of Chemical Information and Modeling

The Development of Target-specific Machine Learning Models as Scoring Functions for Docking-based Target Prediction. Mauro S. Nogueira and Oliver Koch*. Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227, Dortmund, Germany

ACS Paragon Plus Environment



ABSTRACT: The identification of possible targets for a known bioactive compound is of utmost importance for drug design and development. Molecular docking is one possible approach for in silico protein target prediction, in which a molecule is docked into several different protein structures to identify potential targets. This reverse docking approach is hampered by the limited ability of current scoring functions to discriminate correctly between targets and non-targets. This work describes the development of target-specific scoring functions that showed improved prediction performance for both actives and decoys on three validation data sets. In contrast to pure ligand-based approaches, which are in general faster and cover a larger target space, docking-based approaches can also cover unknown chemical space that lies outside the known bioactivity data. The target-specific scoring functions are based on known bioactivity data retrieved from ChEMBL and on supervised machine learning. Neural network and support vector machine (SVM) models were trained for 20 different protein targets. Our protein-ligand interaction fingerprint PADIF (Protein Atom Score Contributions Derived Interaction Fingerprint), calculated from docking poses of active and inactive compounds, serves as the input for training. Different datasets of previously unseen molecules were used for the final evaluation and analysis of the prediction performance of the created models. For a single-target selectivity dataset, the correct target model returns in most cases the highest probability scores for its active molecules, with statistically significant differences from the other targets. These probability scores were also predicted and successfully used to rank the targets for molecules of a multi-target dataset with activity data described simultaneously for two, three, and four to seven protein targets.


INTRODUCTION

Conventional virtual screening of chemical libraries has been used widely in drug discovery and development to search for new modulators of a protein target1. In contrast, in silico target prediction in general, and inverse virtual screening of a known bioactive small molecule in particular, is rarely applied, although promising results have already been achieved2,3 and experimental target validation still requires considerable effort4. Several important applications exist for target prediction approaches5: the identification of off-targets for preventing side effects, the repurposing of known drugs, the prediction of polypharmacology effects and the identification of a target in phenotypic screening approaches. All of them are of utmost importance for both industrial and academic research. Various methods are now available that address the prediction of protein targets for a molecule of interest. Based on the underlying data and approaches, they can be grouped into ligand-centric and target-centric approaches and, in more detail, into approaches that are ligand-based, network-based, based on phenotypic analysis, or based on the 3D protein structure5-7. The ligand-centric approaches are the most widely and successfully applied and rely on the comparison of 2D and/or 3D similarities between ligands or active molecules to suggest similarities in the target profile8-16. The most recently described web-based target prediction tools include the polypharmacology browser17, which uses a combination of several molecular fingerprints and bioactivity data retrieved from ChEMBL, and CSNAP3D, which combines 3D chemical similarity with network similarity analysis8. Other approaches are based on machine learning, e.g. self-organizing maps, and show impressive results18.


In contrast, structure-based approaches utilize the three-dimensional information about protein structures for target prediction19. Molecular docking is the most widely applied approach20, but binding site comparisons21,22 or pharmacophore-based approaches23 are also described. In contrast to docking-based virtual screening, an inverse docking approach can be used to find and predict targets and off-targets for a given molecule. These methods are applied in approaches such as idTarget24, TarFisDock25, iRAISE26, and DPDR-CPI27,28 and depend mainly on scoring functions for the prediction and ranking of the most likely targets for a molecule. For a detailed overview about existing ligand and structure-based target prediction methods please refer to5-7,19,26. A list of available web servers can be found in Table 1.

Table 1. Summary of available web servers for target prediction.

| Name | Ref. | Website | Principle | Input File |
|---|---|---|---|---|
| SuperPred Webserver | 9 | http://prediction.charite.de/ | Similarity distribution among the targets' ligands | PubChem/SMILES |
| SwissTargetPrediction | 10 | http://www.swisstargetprediction.ch/ | 2D and 3D similarity search | SMILES |
| ChemMapper | 11 | lilab.ecust.edu.cn/chemmapper/ | Molecules with 3D similarities may have relatively similar target association profiles | SMILES |
| ChemProt | 12 | http://potentia.cbs.dtu.dk/ChemProt/ | 2D fingerprints comparison and search for target | name/SMILES |
| Mantra 2.0 | 13 | http://mantra.tigem.it/ | Network theory and statistics on gene expression data and MoA. Integrates the novel drug in the network of compounds. | SMILES |
| Pharmaexpert | 14 | http://www.pharmaexpert.ru/passonline/ | Fingerprint based Bayesian approach; knowledge based on structure–activity relationships for more than 260,000 compounds | SMILES |
| SEA | 15 | http://sea.bkslab.org/ | Uses fingerprint similarity of ligands to search for targets from ChEMBL | SMILES |
| PharmMapper | 23 | http://lilab.ecust.edu.cn/pharmmapper/ | Pharmacophore search based on 7000 target pharmacophores derived from complex crystal structures | SMILES |
| idTarget | 24 | http://idtarget.rcas.sinica.edu.tw/ | Uses AutoDock Vina scoring functions and MeDock docking to calculate binding affinities | several types |
| TarFisDock | 25 | http://www.dddc.ac.cn/tarfisdock/ | Docking the molecule in the Potential Drug Target Database; outputs the top percent of structures ranked by energy score of the DOCK program | mol2 |
| DPDR-CPI | 27, 28 | https://cpi.bio-x.cn/drar/ | Interaction profiles of the chemical-protein interactome of library molecules, using the AutoDock Vina program + machine learning models | SMILES |
| SPiDER | 29 | http://modlabcadd.ethz.ch/software/spider/ | Self-organizing maps created with chemical similarity, e.g., pharmacophore features, physical/chemical properties | SMILES |
| TargetHunter | 30 | http://www.cbligand.org/TargetHunter/ | Fingerprint similarity (ECFPs of the ChemAxon) based on ChEMBL data | SMILES |
| HitPick | 31 | http://mips.helmholtzmuenchen.de/hitpick/ | Combines 2D fingerprints and a machine learning method based on a Laplacian-modified naive Bayesian model | SMILES |

Molecular docking is a common technique in structure-based drug design with the primary goal to predict the binding pose of a given ligand molecule to a molecular protein target by sampling possible conformations of the ligand molecule in the protein binding site. The selection of the most promising binding poses relies on scoring functions that are used to rank them to select the preferred one32,33. Popular docking algorithms perform fairly well in generating sound poses but


scoring functions most often fail to properly evaluate the binding affinity34, which also limits the relative ranking of different molecules1,35. Although it seems reasonable to use molecular docking for identifying a protein target for a known bioactive compound (“target prediction”)36, Kellenberger et al. reported that ranking targets based on scoring functions is even more problematic, which hampers the use of molecular docking for prioritizing targets in target prediction approaches37. One reason is the “endogenous” variance among proteins, i.e., the scoring variation across the binding pockets. This means that the scores of a specific molecule are not comparable between different protein targets, which limits the target ranking capabilities. A reliable application of docking in target prediction therefore requires improved scoring functions. Later analysis by Wang et al. confirmed this inter-protein scoring noise38, and they suggested a correction term to improve the target prediction performance. This correction term was based on the ratio between the hydrophobic surface area and the hydrophilic surface area in the binding site of the target protein. A few improvements regarding this effect were also implemented in DPDR-CPI, for which the docking scores are normalized by each drug and then by each protein28. Luo et al. showed that score normalization can improve inverse docking performance, but that this depends on the docking software and the scoring function39. Interestingly, the application of convolutional neural networks in protein-ligand scoring improves the performance for pose prediction and virtual screening but decreases the performance in inter-target ranking40. The use of protein-ligand interaction fingerprints (PLIFs) has been shown to facilitate the re-ranking of ligand docking poses based on their similarity to the known binding modes of relevant reference molecules. Indeed, these fingerprints often outperformed standard scoring functions in terms of identifying correct ligand binding modes and recovering active compounds in virtual screening


trials conducted on a range of target proteins41. Other studies have used PLIFs to enhance the performance of current scoring functions in binding affinity prediction42 or showed that machine learning models based on PLIFs outperform docking scores for in silico screening43,44. Kellenberger et al. also showed that PLIF-based post-processing of the docking results could improve the target prediction results, but a simple reference complex structure was not sufficient to improve the results substantially37. Li et al. utilized PLIFs to identify reliable docking poses before a final scoring of these poses was applied for target prediction45. Our group has developed PADIF (Protein Atom Score Contributions Derived Interaction Fingerprint)46, an n-bit atom-based fingerprint that contains the decomposed energy contributions of all protein binding site atoms, such as metal and hydrophobic interactions, hydrogen bonds and others. These energies, which are binding site specific, are calculated for a docking pose during docking with GOLD47. PADIF extracts these contributions from the docking pose and compares them with those of a reference ligand. PADIF has shown better performance for virtual screening than GOLD scores for most of the targets evaluated46.

Figure 1. Overview of the presented docking-based target prediction approach.

This encouraged us to explore PADIFs for inverse docking and to propose a docking-based target prediction approach by combining the calculated PADIFs and machine learning methods for a


proof of concept study (see Figure 1). This leads to target-specific models that can be applied as target-specific scoring functions with improved performance in inverse docking. Bioactivity data stored in ChEMBL48 was used to develop various datasets of active and inactive (decoy) molecules for 20 targets. These molecules were docked, and the protein-ligand interaction fingerprints (PADIFs) were calculated for the high-scoring binding poses and used throughout this approach. Based on the calculated PADIFs of the training and validation datasets, target-specific classification models were developed using neural networks and support vector machines. The classification models were finally evaluated on unseen molecules (with their PADIFs as input) in a single-target and a multi-target dataset, including inter-target selectivity and ranking. Obviously, the need for protein structures and the effort required to develop a model for each protein target hamper the application in comparison to pure ligand-based target prediction approaches. If a molecule or fragment shows a reasonable similarity to already known compounds, the ligand-based approaches are clearly preferable. In contrast, docking-based approaches have the advantage that they can be applied to molecules that represent previously unexplored chemical space. This indicates the need for reliable docking-based target prediction approaches, even when a huge amount of computational time is needed for the development of the underlying models and the evaluation of molecules of interest.


METHODS

Workflow from data collection to final datasets. Figure 2 describes the applied workflow for the creation of the different datasets for training, validation and evaluation. The ChEMBL database48 was the starting point for collecting the bioactivity data for the selected 20 protein targets. After preprocessing, the initial data was divided into datasets with molecules showing activity on one protein target and on multiple protein targets. The former dataset was used for training and validation of the models, as well as for inter-target selectivity, and the latter was used as a multi-target dataset for the final application in target prediction.

[Figure 2 (flowchart); box and arrow labels: ChEMBL; extraction; assay data for 20 protein targets; processing; actives & decoys single target data; actives & decoys multi target data; diversity selection; training & validation data sets (training: train_set, testing: test_set, external: ext_set, ext_set_tf); remaining active data (lig_left dataset); selection; inter-target selectivity dataset; multi-target dataset]

Figure 2. Workflow for dataset creation.

Data collection. The protein selection was focused on targets described previously in the work of Schomburg et al.26 and Bauer et al.49 and the availability of X-ray structures for docking (see Table 2). The protein targets consist of enzymes, receptors and ion channels. The bioactivity data of molecules assayed against these 20 proteins was extracted from the ChEMBL database48 (chembl_23) utilizing an in-house Python script. Here, only confidence class 8 and 9 data points with IC50, Ki, and EC50 were considered. The bioactivity data was processed using a KNIME50


workflow: a) the data was converted into the logarithmic scale and filtered to remove duplicates; b) molecules whose different measurements for one protein target differed by more than 1 logarithmic unit were excluded; c) molecules with reported bioactivity for more than one protein were set apart to build a multi-target set, which was used for validation of the target prediction approach (see Multi-Target-Set.xlsx, Supporting Information).

Table 2. Selected targets (ion channels, enzymes, and receptors) for modeling.

| Target | UniProt ID | Description | PDB structure |
|---|---|---|---|
| ACE | P12821 | Angiotensin-1 converting enzyme | 1uze |
| BRAF | P15056 | Serine/threonine-protein kinase B-Raf | 3skc |
| COX-2 | Q05769 | Cyclooxygenase-2 | 1cx2 |
| DHFR | P00374 | Dihydrofolate reductase | 1s3v |
| FXa | P00742 | Factor X (Prothrombinase) | 1for |
| GSK3B | P49841 | Glycogen synthase kinase 3 beta | 3i4b |
| HDAC2 | Q92769 | Histone deacetylase 2 | 3max |
| HDAC6 | Q9UBN7 | Histone deacetylase 6 | 5edu |
| HDAC8 | Q9BY41 | Histone deacetylase 8 | 3sff |
| IGF1R | P08069 | Insulin-like growth factor 1 receptor | 3nw7 |
| JAK3 | P52333 | Janus kinase 3 | 3lxl |
| PDE5 | O76074 | cGMP-specific phosphodiesterase type 5 | 1xp0 |
| PDK1 | O15530 | Pyruvate dehydrogenase kinase-1 | 2xch |
| PI3Kg | P48736 | Phosphoinositide-3-kinase gamma | 3dbs |
| PIM-2 | Q9P1W9 | (Proto-oncogene) serine/threonine-protein kinase 2 | 2iwi |
| Thrombin | P00734 | Thrombin | 3rm2 |
| TIE2 | Q02763 | Angiopoietin-1 receptor | 2oo8 |
| TPA | P00750 | Tissue plasminogen activator | 1a5h |
| Trypsin | P00760 | Trypsin | 2g5n |
| VEGFR1 | P17948 | Vascular endothelial growth factor rec. 1 | 3hng |
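Preprocessing steps a) to c) can be illustrated with a minimal standard-library Python sketch. It is not the KNIME workflow used in this work: the record layout, the function names `p_activity` and `preprocess`, and the averaging of repeated consistent measurements are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def p_activity(nanomolar):
    """Convert an activity in nM (IC50, Ki or EC50) to the -log10(molar) scale."""
    return 9.0 - math.log10(nanomolar)


def preprocess(records, max_spread=1.0):
    """Sketch of steps a) to c).

    records: iterable of (molecule_id, target, activity_nM) tuples
             (a hypothetical layout, not the ChEMBL export format).
    Returns (single_target, multi_target), two dicts that map each
    molecule_id to {target: mean p-activity}.
    """
    # a) logarithmic conversion; a set drops exact duplicate measurements
    measurements = defaultdict(set)
    for mol, target, nm in records:
        measurements[(mol, target)].add(round(p_activity(nm), 3))

    # b) exclude molecules whose measurements for one target differ
    #    by more than `max_spread` log units
    inconsistent = {
        mol
        for (mol, _target), vals in measurements.items()
        if max(vals) - min(vals) > max_spread
    }

    per_molecule = defaultdict(dict)
    for (mol, target), vals in measurements.items():
        if mol not in inconsistent:
            # averaging the remaining values is an assumption of this sketch
            per_molecule[mol][target] = sum(vals) / len(vals)

    # c) molecules with data for more than one protein form the multi-target set
    single = {m: t for m, t in per_molecule.items() if len(t) == 1}
    multi = {m: t for m, t in per_molecule.items() if len(t) > 1}
    return single, multi
```

With toy input, a molecule measured at 1 nM and 1000 nM on the same target (a spread of 3 log units) is dropped, while a molecule with data for two targets ends up in the multi-target set.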


Data processing. The remaining data was sorted by activity range and divided into “decoys” and “actives” based on a specific activity cut-off. All molecules with pIC50 ≤ 5, i.e., IC50 ≥ 10,000 nM, were considered decoys. This cut-off was determined from the frequency distribution of the activity data (see Figure S2, Supporting Information). Molecules up to 1000-fold weaker than the most potent inhibitor of their respective target were considered actives. In most cases, the resulting pIC50 values were greater than 6, as we were interested in high-affinity molecules as actives. This is also in agreement with Lenselink et al., who created models using a ChEMBL bioactivity benchmark set with actives that show a measured IC50 below 300 nM51. The final active and decoy sets were created separately for each protein using a KNIME50 workflow. All molecules with low (< 180 u) or high (> 900 u) molecular weights were also eliminated in this step. The final models should be created with a balanced number of actives and decoys. Therefore, the number of decoys determined the number of actives for each protein, since the number of decoys is considerably lower than the number of actives for all targets (see Figure S2, Supporting Information). The size of the active sets was reduced by calculating molecular Morgan fingerprints (RDKit Fingerprint node) and selecting centroid molecules until their number was close or equal to that of the decoys (RDKit Diversity Picker node). In the end, the active and decoy sets each contained around 200-400 molecules, with a different size for each target (see Active_and_Decoys_Datasets.xlsx, Supporting Information). This data was used to build the training/test and validation sets for each target. The active molecules that were left out by the centroid selection (lig_left) were also used as an external dataset for evaluating the predictive performance of the models.
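The cut-offs above can be summarized in a small labeling function. This is a sketch under stated assumptions, not the KNIME workflow itself: the function name `label_molecule` is hypothetical, the decoy rule is checked before the active rule, and molecules that satisfy neither rule are assumed to be discarded.

```python
def label_molecule(p_act, best_p_act, mol_weight):
    """Assign 'decoy', 'active' or None (discarded) to one molecule.

    p_act:      activity of the molecule on the log scale (e.g. pIC50)
    best_p_act: log-scale activity of the most potent inhibitor of the target
    mol_weight: molecular weight in u
    """
    # molecular-weight filter: < 180 u and > 900 u are eliminated
    if mol_weight < 180 or mol_weight > 900:
        return None
    # decoys: pIC50 <= 5, i.e. IC50 >= 10,000 nM
    if p_act <= 5.0:
        return "decoy"
    # actives: at most 1000-fold (3 log units) weaker than the best inhibitor
    if best_p_act - p_act <= 3.0:
        return "active"
    # assumption of this sketch: everything in between is left out
    return None
```

For a target whose most potent inhibitor has a pIC50 of 9, a molecule with pIC50 7.2 is labeled active, one with pIC50 4.5 is a decoy, and one with pIC50 5.5 falls between the two rules.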


Datasets for training and validation. The processed data was split into training and validation sets using a ‘random split’: 10% of the data was partitioned with a fixed random seed and set apart for an independent external validation (ext_set). The remaining 90% was similarly partitioned into a 75% training set (train_set) and a 25% test set (test_set). For an unbiased evaluation of the trained models, all molecules with a Tanimoto similarity ≥ 0.7 to any molecule of the train_set or the test_set were excluded from the ext_set (see ext_tf_tanimoto_coefficients.csv, Supporting Information).

Model selection. The final models were selected based on the best precision (eq 1), accuracy, F1 score (eq 3, which combines precision with recall, eq 2) and AUC values for both classes in the test_set and ext_set. These were calculated with the scikit-learn public API in Python (sklearn.metrics function).54 To evaluate differences in the performance of the SVM and NN methods, the accuracies on the validation sets (test_set, ext_set and lig_left) were compared for statistically significant differences with t-tests.

$$\mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalsePositives}} \qquad (1)$$

$$\mathrm{Recall} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}} \qquad (2)$$

$$\mathrm{F1\,Score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
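As a worked example of the selection metrics, the following standard-library sketch computes precision, recall, F1 score and accuracy from true and predicted class labels. It stands in for the scikit-learn metrics used in this work; the function names are hypothetical, and returning 0.0 for an empty denominator is an assumption of the sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the class `positive`."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # assumption of this sketch: an undefined ratio is reported as 0.0
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1 score and accuracy as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Calling `classification_metrics(y_true, y_pred, positive=0)` gives the same metrics for the decoy class, which corresponds to the evaluation "for both classes".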

Preparation of molecules and protein structures. The small molecules (ligands and decoys) were prepared for docking using the program MOE.52 They were first protonated using the “wash” function with the option “scale to reasonable bond length” enabled. After that, they were minimized using default settings and the following parameters: MMFF94x force field, option “add hydrogens” disabled, option “preserve existing chirality” enabled. Proteins were also prepared in MOE. For a final selection, all available PDB structures with high resolution for the same target


were superposed in MOE. The binding sites were visually inspected and the structure that represented the most common binding site was selected. In case the protein structures showed identical binding sites, the structure that was already used in other docking-based studies26,49 was selected. Redundant chains, water molecules and ions were deleted. Protonation was carried out using the function “protonate 3D” and manual correction using the option “correct”. The selected proteins are given in Table 2.

Docking. Docking experiments were carried out using GOLD v5.6 with the default scoring function ChemPLP.47,53 Deviating from the standard settings, the options “allow early termination” (Fitness and Search Options) and “Detect cavity - restrict atom selection to solvent-accessible surface” (Define Binding Site) were disabled. The options “flip pyramidal N”, “flip amide bonds” and “flip ring corners” (Ligand Flexibility) were enabled. ‘Search efficiency’ was set to 100% and the number of genetic algorithm (GA) runs was set to 100. The binding site was defined by the reference ligand contained in the complex structure, with a radius of 7 Å. In addition, the options “write cavity atoms to file” and “save per atom scores” were enabled in all cases, since a list of the cavity atoms and the per atom scores are needed for the PADIF scoring.46

PADIF generation. The PADIFs were generated for the best docking poses as described in Jasper et al.46 PADIFs are derived from the per atom score contributions of the GOLD default scoring function ChemPLP, obtained either from rescoring the reference ligands (experimental complexes) or from GOLD solution files (docking poses), for the binding site atoms defined in the “cavity.atoms” file written out by GOLD. The PADIFs have the dimension N x 8, where N is the number of binding site atoms that contribute to the total ChemPLP score and 8 is the number of interaction terms (ChemScore_PLP.Hbond, ChemScore_PLP.CHO, ChemScore_PLP.Metal, PLP.S(hbond), PLP.S(metal), PLP.S(buried), PLP.S(nonpolar) and PLP.S(repulsive)). Depending on the


contributions, the respective float values have different signs: for ChemScore_PLP.Hbond, ChemScore_PLP.CHO and ChemScore_PLP.Metal, positive values represent favorable interactions, whereas for the other contributions negative values represent favorable interactions. For easier processing, the signs of ChemScore_PLP.Hbond, ChemScore_PLP.CHO and ChemScore_PLP.Metal are reversed in the PADIF generation process, so that negative values always represent favorable interactions. Further details can be found in Jasper et al.46

Model Creation. The module “Gaussian Random Projection” from scikit-learn54 was applied to reduce the dimensionality of the data for faster processing times and smaller model sizes. Based on the retrieved PADIFs, the NN models were created with Keras using the Theano backend55. The NN had the following architecture: an input layer with, for instance, 200 nodes connected to 3 hidden layers of 1000, 500, and 250 rectified linear units (ReLU). The nodes of the hidden layers used a linear activation function. The following parameters were determined, together with the number of neurons in the input layer, by a grid search over several parameters and evaluated by the best accuracy and loss obtained during training: weight initialization = 5, regularization = l2 and dropout rate = 0.75. The output layer used a sigmoid function, as the problem is a two-class classification. The two classes were represented by 1 (actives) and 0 (decoys). For model compilation, “Adam” was utilized as optimizer with the default parameters. To prevent overfitting, besides the use of dropout on the hidden layers, early stopping was used to monitor the loss and accuracy on a test set (25% of the training data), and training stopped if the network did not improve within 400 epochs. Therefore, the number of epochs was variable for each model. The batch sizes ranged from 200 to 800. SVMs were trained using the SVC class from scikit-learn54.
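The PADIF sign convention and the dimensionality reduction described above can be sketched in plain Python. This is an illustration only, not the code of Jasper et al. or of scikit-learn: the function names, the row-wise flattening of the N x 8 table, and the toy values are assumptions. The projection draws its matrix entries from a normal distribution with variance 1/n_components, which is the scheme a Gaussian random projection uses.

```python
import math
import random

# The eight ChemPLP terms of one PADIF row, in the order listed in the text.
TERMS = [
    "ChemScore_PLP.Hbond", "ChemScore_PLP.CHO", "ChemScore_PLP.Metal",
    "PLP.S(hbond)", "PLP.S(metal)", "PLP.S(buried)",
    "PLP.S(nonpolar)", "PLP.S(repulsive)",
]
# Terms whose sign is reversed so that negative always means favorable.
SIGN_REVERSED = {0, 1, 2}


def padif_vector(per_atom_scores):
    """Flatten an N x 8 table of per-atom contributions into one feature vector."""
    vector = []
    for row in per_atom_scores:
        if len(row) != len(TERMS):
            raise ValueError("each binding site atom needs 8 contributions")
        for j, value in enumerate(row):
            vector.append(-value if j in SIGN_REVERSED else value)
    return vector


def gaussian_projection_matrix(n_features, n_components, seed=0):
    """Random matrix with entries drawn from N(0, 1 / n_components)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, sd) for _ in range(n_features)]
            for _ in range(n_components)]


def project(vector, matrix):
    """Map a feature vector to the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Toy example: two binding site atoms, reduced from 16 to 4 dimensions.
rows = [[1.5, 0.0, 0.0, -2.0, 0.0, -0.5, -3.0, 0.2], [0.0] * 8]
features = padif_vector(rows)
reduced = project(features, gaussian_projection_matrix(len(features), 4, seed=1))
```

The same projection matrix has to be reused for every molecule docked into a given target, which is why the sketch fixes the random seed.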
Models were parametrized as follows: radial basis function (rbf) kernel, wherein gamma was set at ranges of 3-10 and epsilon


was set at ranges of 0.001-0.009. The best combination of gamma and epsilon was optimized for each model. The output data consisted of the binary class labels 1 (actives) and 0 (decoys). For the final prediction, the output is given as probability scores, where values over 0.5 belong to class 1 and values below to class 0.

Dataset for Inter-target Selectivity. To test the inter-target selectivity of the 20 target models, a dataset was created from the lig_left dataset with sample groups of 80-100 target-specific active molecules. A diversity-based selection was used to retain the structural diversity in the sample for each target (total = 1940 molecules). This dataset was docked into all targets, the PADIFs were calculated and the probability scores were predicted by each target model. This means that, e.g., active molecules for ACE were predicted by the ACE model and by the other 19 target models. The data distribution in each target was tested with the Shapiro-Wilk normality test in R.56 This showed that the datasets were not normally distributed (data not shown). A pairwise Wilcoxon test was therefore employed on the probability scores of each group of target-specific molecules in each target model. The Wilcoxon test makes corrections for multiple testing and is used when the data is not normally distributed. This leads to a comparison based on the distribution and not only on the mean. The “paired” form was used as the probability scores for each group are independent for all targets. If a target-specific group showed probability scores in its corresponding model that were higher than and statistically significantly different (p < 0.05) from those of the other 19 target models, that target model was considered rather selective.

Multi-Target Dataset for Inter-target Ranking. A multi-target dataset was created before data selection for the training and validation sets. This dataset contained unseen active and/or inactive molecules with reported data for more than one target. For data representation and analysis of the results, we subdivided the dataset into three subsets according to the number of active/inactive

ACS Paragon Plus Environment

15

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 45

described. This leads to multi-target sets with two proteins, targets three proteins targets, and fourto-seven protein targets. All molecules were docked into the 20 proteins followed by the calculation of the respective PADIF. Using the target-specific models, the probability scores were predicted for all molecules. The probability scores of the 20 models were top-down listed for every molecule. Proteins with probability scores < 0.5 were considered as a non-target. Proteins were considered as targets if they showed probability scores > 0.5 and appeared on the top of the list, e.g., among the first top2 to top6 positions for the two targets set, among the first top3 to top6 for the three targets set, and among the first top4 to top8 for the four-to-seven targets sets. Finally, the percentages of the correctly predicted targets and non-targets were calculated. RESULTS AND DISCUSSION Machine Learning Modeling based on PADIFs and ChEMBL data. Table 3 reports the AUC for the best performing models for the 20 targets that were trained using the training data sets for each individual target. The best models were chosen based on best precision, accuracy, F1 score (a measure of test accuracy) and AUC values for both classes in the test_set and ext_set (see Table S1 for statistics on training and test set for all reliable models). In the training sets (train_set), accuracies ranged from 0.94 to 1.0. For the neural network based models, the accuracies ranged from 0.69 to 0.9 for the random test sets (test_set), from 0.71 to 0.9 for the external sets (ext_set), from 0.6 to 1 for the external set with Tanimoto selected structures (ext_tf), and from 0.69 to 1 for the data containing actives left after centroid selection (lig_left). The SVM based models showed accuracies from 0.71 to 0.88 (test_set), 0.68 to 0.91 (ext_set), 0.64 to 1 (ext_tf), and 0.75 to 1 (lig_left). 
The accuracies for the external dataset (ext_set), which consisted of 10% of the initial data that was split for validation, were also consistently comparable with that for the test (test_set) and the lig_left set.
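The selection criteria named above (accuracy, precision, F1 score and AUC) can be computed from predicted probabilities as in the following minimal sketch. It is not the authors' code: the function names and the toy labels and scores are illustrative assumptions, and only the 0.5 decision threshold is taken from the text.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN for class 1 (actives) at a probability threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 for the active class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (active, decoy) pairs ranked correctly.

    Ties between an active and a decoy count as half a correct pair.
    """
    actives = [p for t, p in zip(y_true, y_prob) if t == 1]
    decoys = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy data (hypothetical): 1 = active, 0 = decoy.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
metrics = classification_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

The pairwise definition of the AUC is used here because it needs no external library; for real datasets an established implementation would normally be preferred.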

Table 3. Summary of the performances (accuracy) for the best models created with neural networks (NN; Keras library with Theano55) and SVM (Scikit-learn54). a) AUC of the models in the prediction of both actives and decoys; train_set: training set; test_set: test set; ext_set: external set; ext_tf: external set containing only molecules structurally dissimilar from those of train_set and test_set; lig_left: data set with active molecules left after centroid selection. *In this case the accuracy is reported, as the dataset consists only of active molecules.

AUC a
           train_set     test_set      ext_set       ext_tf        lig_left*
Target     NN    SVM     NN    SVM     NN    SVM     NN    SVM     NN    SVM
ACE        0.99  0.97    0.87  0.85    0.90  0.87    0.84  0.92    0.79  0.78
BRAF       0.99  0.99    0.79  0.86    0.85  0.83    0.92  0.84    0.87  0.89
COX2       0.99  0.98    0.89  0.88    0.82  0.84    0.83  1.00    1.00  1.00
DHFR       0.97  0.99    0.77  0.74    0.77  0.86    0.70  0.80    0.86  0.90
FXa        0.84  0.96    0.76  0.78    0.78  0.82    0.86  0.74    0.69  0.82
GSK3B      0.96  1.00    0.74  0.80    0.71  0.68    0.64  0.65    0.78  0.81
HDAC2      0.98  0.95    0.78  0.78    0.79  0.75    0.84  0.80    0.86  0.83
HDAC6      0.99  0.96    0.91  0.91    0.85  0.89    0.68  0.83    0.86  0.88
HDAC8      1.00  0.97    0.85  0.82    0.80  0.84    0.77  0.67    0.84  0.75
IGF1R      1.00  1.00    0.83  0.84    0.78  0.81    0.79  0.83    0.82  0.89
JAK3       0.99  0.99    0.81  0.83    0.77  0.85    0.77  0.75    0.80  0.88
PDE5       1.00  1.00    0.73  0.82    0.86  0.84    0.86  0.93    0.81  0.88
PDK1       1.00  0.99    0.84  0.85    0.84  0.91    0.75  0.86    0.96  0.95
PI3Kg      0.91  0.93    0.72  0.75    0.77  0.84    0.72  0.82    0.76  0.76
PIM2       0.99  0.99    0.79  0.82    0.83  0.80    0.82  0.78    0.80  0.85
Thrombin   0.94  0.96    0.85  0.84    0.80  0.80    0.73  0.74    0.79  0.75
TIE2       0.94  0.99    0.80  0.78    0.88  0.91    0.80  0.77    0.86  0.88
TPA        0.94  0.97    0.69  0.76    0.73  0.81    0.60  0.65    -     -
Trypsin    1.00  0.98    0.78  0.82    0.82  0.80    0.73  0.64    0.77  0.77
VEGFR1     0.98  0.97    0.90  0.91    0.76  0.83    0.82  0.59    0.89  0.94

The comparison of accuracy and AUC-ROC between the ext_tf set and the other validation sets for each target (see Figure 3 and Figure S1, Supporting Information) showed that for 14 target models the ext_tf accuracies were equal to or even higher than those for their respective ext_set. For GSK3B,
HDAC8, Thrombin, TPA, Trypsin and VEGFR1, the models showed lower accuracies on the ext_tf (0.59 to 0.80) than on their respective ext_set, but those target models also showed overall lower accuracies on the other validation sets (Table 3 and Figure 3). The ext_tf sets for HDAC8 (N=14), TPA (N=17), Trypsin (N=14), and VEGFR1 (N=19) were indeed unbalanced with respect to the two classes (data not shown) and were small, which might have contributed to a decrease in model accuracy. The ext_tf contained only active and decoy molecules with low structural similarity to the molecules of the training and test sets (Tanimoto < 0.7). Our results therefore suggest that most of the machine learning models created with the PADIFs and ChEMBL data have good classification power for new molecules, independently of their structural similarity. These results agree with the previously described observation that interaction fingerprints of known protein-ligand complex structures should enable interaction-based comparisons that are rather independent of the small-molecule structure.46,57,58

The SVM models showed slightly higher prediction accuracies than the NN models, although the NN models showed slightly better accuracies for BRAF, HDAC2, HDAC8, and Thrombin. Except for TPA and JAK3, there were no statistically significant differences in performance between the models created with NN and SVM that would justify choosing one method over the other (Figure 3). Moreover, it was observed that the NN models return overall higher probability scores than the SVM models. As the probability scores need to be comparable among targets for the inter-target prediction, we combined the prediction power of both NN and SVM models in a consensus model. The probability scores in the consensus models, which were used to rank the targets, consisted basically of the averaged scores of the NN and SVM models.
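The consensus scoring described above can be illustrated with a short sketch. It assumes that "averaged" means the plain arithmetic mean of the two probabilities, which the text does not state explicitly; the function names and the example probabilities are hypothetical.

```python
def consensus_score(p_nn, p_svm):
    """Arithmetic mean of the NN and SVM probabilities (assumed definition)."""
    return (p_nn + p_svm) / 2.0


def classify(probability, threshold=0.5):
    """Class 1 (active) above the threshold, class 0 (decoy) otherwise."""
    return 1 if probability > threshold else 0


# Hypothetical probabilities for one docked molecule in three target models.
nn_scores = {"ACE": 0.92, "BRAF": 0.40, "COX2": 0.55}
svm_scores = {"ACE": 0.78, "BRAF": 0.30, "COX2": 0.35}

consensus = {target: consensus_score(nn_scores[target], svm_scores[target])
             for target in nn_scores}
labels = {target: classify(score) for target, score in consensus.items()}
```

In this toy case COX2 is called active by the NN model alone (0.55) but inactive by the consensus (0.45), which shows how averaging tempers the higher probabilities returned by the NN models.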

Figure 3. Comparison of the prediction performances between NN and SVM models for each target. Data consisted of the validation accuracies (test_set, ext_set and lig_left) of the best models. The same color indicates the same target. (*) Statistically significant differences between NN and SVM (p < 0.05, unpaired t-test, data approximately normally distributed).

Inter-target Selectivity Prediction. The next analysis deals with the prediction of inter-target selectivity, i.e., the correct classification of molecules as active on their respective protein target and inactive on all other targets. For this, an inter-target selectivity dataset was built out of the lig_left set with nearly 100 active molecules per target (overall number: 1940 molecules). The dataset may contain molecules that are active on more than one target, but this has unfortunately not been tested experimentally. However, it is more likely that most of the active molecules are inactive on the other targets. Thus, if the target-specific models are selective enough, it is expected that
they return the lowest probabilities for the inactive molecules (the active molecules of the other targets) and the highest probability scores for their own true active molecules. This dataset was docked in all 20 targets and the generated PADIFs were used for the prediction in each target model.

Figure 4 and Table 4 show the results for a pure ChemPLP scoring-value-based target prediction without the use of the generated PADIFs and the created prediction models. Figure 4 shows the box plots for the rank distribution of the correct target within a list sorted by the scoring values of the highest-scored pose in each target. A perfect target prediction would always lead to rank one for the correct target. The box plots, in contrast, show a quite broad rank distribution. Table 4 shows that the correct target is on rank one in only a small number of cases. In contrast, proteins like HDAC8 often produce poses with very high scoring values for all ligands. One could therefore argue that this protein has a huge influence on the target prediction. However, removing the HDACs from this analysis does not change the results dramatically; the BRAF protein then often occurs on rank one (data not shown). Based on these analyses, it can be said that a pure scoring-value-based approach fails for this dataset, which is in agreement with already discussed studies.37,38

We also tested an available machine-learning-based scoring function that showed improved performance in virtual screening.59 In contrast to our approach of target-specific scoring functions, it is a generalized scoring function that can be applied to every protein structure. Table S2 and Figure S2 show the results of the pose rescoring. Here, the protein targets IGF1R and HDAC2 are the ones with the most high-scoring poses. This is similar to the ChemPLP-based analysis, where HDAC8 exhibits docking poses with high scores, which biased the results. It is also in agreement with a described behavior for a neural-network-based scoring function, where an improved virtual screening performance can lead to a decrease in inter-target ranking performance.40
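The rank analysis behind the scoring-value-based target prediction can be sketched as follows. This is an illustration only: it assumes that a higher docking score means a better pose (consistent with "highest scored pose" in the text), and the function name and score values are hypothetical.

```python
def rank_of_target(scores, true_target):
    """1-based rank of true_target when targets are sorted by descending score.

    `scores` maps each target name to the score of the ligand's best pose
    in that target.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_target) + 1


# Hypothetical best-pose scores of two ACE ligands in four targets.
ligand_scores = [
    {"HDAC8": 95.2, "BRAF": 88.1, "ACE": 80.4, "COX2": 61.0},
    {"HDAC8": 70.3, "BRAF": 66.9, "ACE": 91.7, "COX2": 58.2},
]
ranks = [rank_of_target(scores, "ACE") for scores in ligand_scores]
rank_one_fraction = sum(1 for r in ranks if r == 1) / len(ranks)
```

A perfect scoring-based prediction would give rank 1 for every ligand; a target that scores highly for all ligands, as described for HDAC8, pushes the correct target down the list.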

Figure 4. Box plots for the rank distribution (1-20) of each ligand set of the inter-target selectivity dataset (lig_left) when docked in all targets. The targets were ranked for each ligand based on the highest score value calculated in each target. These are results for a pure ChemPLP scoring-value-based target prediction without the use of the generated target-specific models.

Table 4. Summary of the targets with the highest-scored pose for each ligand of the inter-target selectivity dataset of 1940 molecules. The first column shows the known target of the ligand set, the last column the number of molecules with known activity for this target. The columns in between show the number of times a specific target was predicted, i.e., the docking pose of a specific ligand in this target received the highest ChemPLP score value in comparison to all other targets. The red box indicates the correctly predicted target for each ligand set. These are results for a pure ChemPLP scoring-value-based target prediction without the use of target-specific models.

Distribution of predicted targets
Ligand      ACE  BRAF  COX2  DHFR  FactorXa  GSK3B  HDAC2  HDAC6  HDAC8  IGF1R  JAK3  PDE5  PDK1  PI3Kg  PIM2  TIE2  TPA  Thrombin  Trypsin  VEGFR1  No. of ligands
ACE          15    16     2     3         1      0      9      4     39      0     0     0     0      0     1     0    8         0        0       3             101
BRAF          5    23     2     3         0      0      2      4     27      0     0     6     0      1     6    17    0         0        0       4             100
COX2         13     2    27     1         1      0      0      2     38      0     0     8     0      0     2     1    0         1        0       1              97
DHFR          0     5     1     5         1      0     20      5      7      1     0     2     0      0    11     0    1         1        0       8              68
FactorXa     16     6     0     9         9      0      1      1      7      0     0     0     0      0     0     0   14        27        0       0              90
GSK3B         5     8     2     2         2      0     16     16     26      0     3     7     0      1     0     1    6         1        0       4             100
HDAC2        22     2     2     1         1      0     22     20     11      0     1     2     0      2     0     0    3         3        0       1              93
HDAC6         6     4     0     1         0      0     24     43     15      0     0     1     0      1     2     0    1         1        0       0              99
HDAC8         2     4     2     1         0      0     21     44     12      0     0     0     0      0     2     1    0         3        0       1              93
IGF1R         0    36     0     7         2      0      1      0     30      0     2     4     0      0     4     0    2         1        0      11             100
JAK3          0    10     0     8         2      0      8      3     34      0     9    12     1      1     4     2    1         3        0      11             109
PDE5          6     3     1    17         0      2      1     10     15      0     0    30     0      1     7     1    3         4        0       2             103
PDK1          5     6     2     5         1      0      6     10     51      0     5     3     0      2     6     2    4         1        0       1             110
PI3Kg         5     7     0    12         0      0     21     13     23      0     2     7     1      2     8     0    0         0        0       2             103
PIM2          0     7     1     8         1      2      0      7     21      0     4    19     0      3    18     1    1         4        0       1              98
TIE2          9    18     0    28         0      0     11      2      8      0     1     4     0      0     6     8    0         0        0       3              98
TPA           9    17     1     4         3      0      9     11      9      0     1     2     0      0     0     3    7        13        1       6              96
Thrombin     10    28     2     7         2      0      7      4     15      0     0     3     0      0     0     0    5        10        0       3              96
Trypsin       4    19     1    11         5      0      1     21      3      0     0     2     0      0     1     0   10         6        1       1              86
VEGFR1        6    17     0    13         0      0      7      2     11      0     1     0     1      0     0     0    1         0        0      41             100

Diagonal entries (predicted target = known target) are the correctly predicted cases. In the ACE row, the column assignment of the first four values (15, 16, 2, 3) is uncertain.
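A tally like the one in Table 4 can be produced from per-ligand docking scores as in the following sketch. It uses toy scores and hypothetical function names, and it assumes that the predicted target is simply the one with the highest best-pose score.

```python
from collections import Counter, defaultdict


def top_scored_target(scores):
    """Target whose best docking pose has the highest score for one ligand."""
    return max(scores, key=scores.get)


def tally_predictions(ligands):
    """Count, per known target, how often each target is top-scored."""
    table = defaultdict(Counter)
    for known_target, scores in ligands:
        table[known_target][top_scored_target(scores)] += 1
    return table


# Toy input (hypothetical scores): (known target, {target: best-pose score}).
ligands = [
    ("ACE", {"ACE": 82.0, "HDAC8": 91.5, "BRAF": 70.3}),
    ("ACE", {"ACE": 88.1, "HDAC8": 79.9, "BRAF": 75.0}),
    ("BRAF", {"ACE": 60.2, "HDAC8": 85.4, "BRAF": 84.9}),
]
table = tally_predictions(ligands)

# Diagonal entries are the ligands whose known target was top-scored.
correct = sum(table[target][target] for target in table)
top1_rate = correct / len(ligands)
```

Each row of the resulting table sums to the number of ligands of that set, which is a convenient consistency check when reading such a table.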

The results summarized in Figure 5 show that 12 target-specific models exhibited very good selectivity, as they predicted their active molecules with the highest probabilities, which were statistically significantly different from those of the other 19 targets (plots where only a single box is colored, Figure 5). A similar behavior was observed for the GSK3B, PIM2, Thrombin and Trypsin models, which showed one additional model with statistically significant false-active probability scores. The Factor Xa and TPA models performed worse. These models cannot discriminate their active molecules or predict them with the highest probability score, as other models also show similarly high probability scores (Figure 5). The TPA models showed the lowest accuracies on the validation
sets, which suggests some limitations on the overall predictions and thus on the selectivity. The prediction performance of the Factor Xa models was not as good as that of most of the other target models. This agrees with our previous study, in which the PADIF approach was evaluated for virtual screening: PADIF-based virtual screening for Factor Xa was worse than scoring-based ranking using the scoring functions of GOLD.47 As the models depend on the docking for the generation of the best poses and on the PADIF for modeling, limitations of either method for a given target may be reflected in the prediction capability of the models for that target.

The HDAC8 model showed good selectivity in correctly predicting its active molecules. However, this model also estimated high probability scores for active molecules of several other targets, i.e., FXa, GSK3B, HDAC2, HDAC6, PIM2, TPA, and Trypsin (Figure 5). This could indicate a kind of promiscuity of the target, although it is very unlikely that HDAC8 is a promiscuous target. Therefore, we concluded that the target-specific model is selective for molecules active on HDAC8, but that it also predicts activity for other molecules. This can presumably be explained by the specific zinc-binding moiety of HDAC8 inhibitors combined with mainly hydrophobic scaffolds. The HDAC8 inhibitors only find their specific interaction pattern with a high probability score in the HDAC8 model. In contrast, other hydrophobic scaffolds lacking this zinc-binding moiety can also receive a high probability score, which would explain the predicted promiscuity. In conclusion, zinc binding does not seem to be as important for scoring in the HDAC8 model as it should be to properly discriminate HDAC8 inhibitors from other molecules.

In general, the Trypsin, HDAC8 and ACE models returned high probability scores among the selected targets (Figure 5). This may lead to a prediction bias for the inter-target prediction, as they might often be ranked among the top targets, i.e., be considered more promiscuous targets.
ACE, an enzyme of the renin-angiotensin system,60 has already been considered a quite promiscuous target and is known to act in several physiological and pathophysiological processes, such as induction of cell proliferation, angiogenesis, fibrosis, blood pressure control, apoptosis, fertility, and inflammation.61,62

Protein family selectivity. The HDAC protein targets also allow protein family selectivity to be analyzed. Interestingly, the different HDAC models show different selectivity capabilities. The HDAC2 model was not selective for its active molecules when compared with HDAC6 and HDAC8. The HDAC6 model was slightly selective in comparison to HDAC2, but not in comparison to HDAC8. These results can be explained by the fact that HDAC binding sites are very similar and only a few specific inhibitors exist.63,64 This makes it considerably difficult to achieve selectivity within this family for target prediction approaches. Thus, the models might predict whether a given molecule has an HDAC among its targets rather than a specific member of the HDAC family.
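The statistical comparison used in the inter-target selectivity analysis can be sketched with a plain-Python paired Wilcoxon signed-rank test. This is an illustration under stated assumptions, not the R procedure used in the study: the p-value comes from a normal approximation without tie or continuity correction, the multiple-testing correction mentioned in the Methods is not included, "higher" is judged by the median, and all names and scores are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided paired test; returns (W+, W-, approximate p-value).

    Assumes at least one non-zero difference between the paired samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p_value


# Hypothetical probability scores of the same ten active molecules
# in their own target model and in one other target model.
own_model = [0.91, 0.85, 0.78, 0.96, 0.88, 0.73, 0.99, 0.81, 0.94, 0.69]
other_model = [0.40, 0.35, 0.50, 0.20, 0.45, 0.60, 0.10, 0.55, 0.30, 0.52]

w_plus, w_minus, p_value = wilcoxon_signed_rank(own_model, other_model)
selective = (statistics.median(own_model) > statistics.median(other_model)
             and p_value < 0.05)
```

For a full analysis this comparison would be repeated against each of the other 19 target models, with the p-values adjusted for multiple testing.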

Figure 5. Box plots for the probability scores of active molecules of the inter-target selectivity dataset (80-100 centroid-selected active molecules for each target) predicted by each target-specific model (consensus model). Colored boxes in the plots indicate no statistically significant difference between the targets, but a statistically significant difference from the grey-boxed targets (p < 0.05, paired Wilcoxon test, data depart from normal distribution). The red arrow indicates the correctly predicted target. In most cases, the models for a given target predicted the corresponding active molecules with the highest probabilities, which were statistically significantly different from those of the other 19 target models.

Multi-target Prediction. As seen in the previous section, some molecules are predicted to be active on several protein targets. To analyze this in more detail, a multi-target dataset was created. This dataset consists of molecules that were tested on more than one target, with reported activity or inactivity on each target. Therefore, the molecules are assigned as true active and/or true inactive for more than one target simultaneously. After docking the molecules into all targets, their PADIFs were calculated and the probability scores were predicted by each target model. The probability scores from all target models were used to rank the targets for each molecule. A protein is considered a predicted target for a given molecule if it appears at the top of the list with a probability score > 0.5. Targets were considered correctly predicted if the true targets appeared among the top2 to top6 positions for the two-target case, among the top3 to top6 for the three-target case, and among the top4 to top8 for the four-to-seven-target case.

The multi-target subset with two reported activities contains the largest number of molecules (N=1141; 626 are active on both targets, 468 are inactive on both targets, and 47 are active on one and inactive on the other target). For the 626 molecules in this dataset that are active on two
targets, 52.4% had both targets predicted within the top6 predictions and 64.1% had at least one target predicted among the top4 targets (Figure 6A). For 30% of the molecules active on two targets, only one target was correctly predicted at all (Figure 6B), and this target was ranked within the top4 of the predicted targets in 75% of these cases (Figure 6D). The lower selectivity observed for Factor Xa (see Figure 5) also lowers the predictive power for the two-target dataset, since Factor Xa represented one of the predicted targets in 19.4% (122) of the active molecules. Another influence on the overall performance is the difficulty of achieving selectivity among the HDAC targets. The overall percentages of molecules with their targets correctly predicted increased when the HDACs were considered as a family target (Figure 6C). For the 47 HDAC molecules in this dataset that are classified as inactive on one target and active on the other, the targets were also predicted. Their true targets were ranked among the top four targets for 64% of the molecules (Figure 6E). Interestingly, of the 468 molecules inactive on both targets, 76% had both inactive proteins predicted as non-targets (probability score under 0.5), 21% had one of the two inactive proteins predicted as a non-target (Figure 6F), and only 3% were incorrectly predicted for both targets, i.e., predicted with probability scores over 0.5. Overall, this analysis shows a reasonable performance of the models on the two-target dataset for the prediction of active and inactive molecules.
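The top-k evaluation used for the multi-target sets can be sketched as follows. The sketch makes two assumptions that go beyond the text: the function and variable names are hypothetical, and treating the HDACs "as a family target" is implemented by counting any HDAC in the top k as a hit for any true HDAC target.

```python
HDAC_FAMILY = {"HDAC2": "HDAC", "HDAC6": "HDAC", "HDAC8": "HDAC"}


def ranked_hits(prob_by_target, threshold=0.5):
    """Targets with probability above the threshold, highest score first."""
    hits = [t for t, p in prob_by_target.items() if p > threshold]
    return sorted(hits, key=prob_by_target.get, reverse=True)


def found_in_top_k(prob_by_target, true_targets, k, family=None):
    """True targets that appear among the k top-ranked predicted targets.

    If `family` is given, targets are compared by family name instead of
    by individual target name.
    """
    family = family or {}
    top_k = {family.get(t, t) for t in ranked_hits(prob_by_target)[:k]}
    return [t for t in true_targets if family.get(t, t) in top_k]


# Hypothetical consensus probabilities for one molecule of the two-target set.
probs = {"HDAC8": 0.93, "FXa": 0.88, "Thrombin": 0.81,
         "ACE": 0.77, "HDAC6": 0.45, "BRAF": 0.12}
true_targets = ["HDAC6", "Thrombin"]

strict = found_in_top_k(probs, true_targets, k=4)
as_family = found_in_top_k(probs, true_targets, k=4, family=HDAC_FAMILY)
```

In the strict evaluation only Thrombin is recovered, because HDAC6 falls below the 0.5 threshold; with the family mapping the high-ranking HDAC8 prediction counts as a hit for HDAC6 as well.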

Figure 6. Prediction of targets for active and inactive molecules and inter-target ranking for the multi-target data (results for the two-target case). The target ranking with the cumulative percentage of active molecules is shown with their targets predicted between the top1 and top6 positions: active molecules for two targets or at least one target (A), active molecules for two targets and no HDAC selectivity (C), active molecules for two targets, but only one target correctly predicted (D), and molecules active only on one target (E). The pie charts show the percentages of true and false predictions for active molecules (B) and for inactive molecules (F).

For the three-target case (N=191), all three targets and at least two targets were correctly predicted among the top6 targets for 46.5% and 74.6% of the active molecules, respectively (Figure 7A). As stated above for the two-target case, the overall percentages are influenced by the lack of selectivity among the HDACs, since 58 of the 71 active molecules have HDAC2, HDAC6 and HDAC8 as their targets simultaneously. For 28% of the molecules, only two of the three targets were correctly predicted (Figure 7B), and 75% and 100% of these had their targets ranked within the top4 and the top6 targets, respectively (Figure 7C). For the inactive molecules, 52% have their
three tested proteins classified as non-targets, while 35% of the molecules were mistakenly predicted to have one of the proteins as a target, 12% were mistakenly predicted to have two targets, and only 1% was mistakenly predicted to have three targets (Figure 7D).
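The breakdown of inactive molecules by the number of falsely predicted targets can be computed as in this minimal sketch. The data and names are hypothetical; only the 0.5 probability threshold is taken from the text.

```python
from collections import Counter


def count_false_targets(prob_by_target, tested_inactive, threshold=0.5):
    """Number of experimentally inactive proteins predicted as targets."""
    return sum(1 for t in tested_inactive if prob_by_target[t] > threshold)


# Hypothetical molecules that are experimentally inactive on three proteins.
tested = ["IGF1R", "JAK3", "PIM2"]
molecules = [
    {"IGF1R": 0.2, "JAK3": 0.1, "PIM2": 0.3},  # all three are non-targets
    {"IGF1R": 0.7, "JAK3": 0.4, "PIM2": 0.2},  # one false target
    {"IGF1R": 0.8, "JAK3": 0.9, "PIM2": 0.1},  # two false targets
    {"IGF1R": 0.1, "JAK3": 0.3, "PIM2": 0.4},  # all three are non-targets
]

distribution = Counter(count_false_targets(m, tested) for m in molecules)
percentages = {n_false: 100.0 * count / len(molecules)
               for n_false, count in distribution.items()}
```

The keys of `percentages` are the numbers of false targets per molecule, so the values correspond to the slices of a pie chart such as the one described for the inactive molecules.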

Figure 7. Prediction of active and inactive molecules and inter-target ranking for the multi-target validation set (results for the three-target case). The target ranking with the cumulative percentage of active molecules is shown with their targets predicted between the top1 and top6 positions: molecules active on three targets (A), and molecules active on three targets, but only one target correctly predicted (C). The pie charts show the percentages of true and false predictions for inactive molecules (B) and the percentages of true and false predictions for active molecules (D).

For the four-to-seven-target case, the percentages of correctly predicted targets are presented, as this dataset does not contain many molecules. For the active molecules, 82% of the targets were correctly predicted and ranked at top positions, while for the inactive molecules 78% of the targets were correctly predicted as non-targets (Table 5). Further predicted targets are also listed for each molecule in Table 5. It was observed that Trypsin, HDAC8 and ACE were often suggested as targets. As described above, the models for those targets overestimate the probability scores, and the predictions regarding those targets should therefore be treated with caution. Improvements are thus still necessary for their respective models. For the 20 selected targets, only 25 molecules were found to have been tested on four targets, seven molecules on five targets, two molecules on six targets and two molecules on seven targets. This was somewhat expected for such a diverse group of targets, as it is unlikely to find molecules in publicly available databases such as ChEMBL that have been tested on a wide range of targets. In general, researchers test selected molecules on a specific target rather than the opposite. This is the main limitation for a complete validation of an inter-target prediction, as it is not possible to know whether newly predicted targets for a given molecule are in fact true targets unless these molecules are tested experimentally.

Table 5. Prediction of active and inactive molecules and inter-target ranking for the multi-target validation set based on the trained models for each target (results for the four-to-seven targets case).

Columns: No.; Molecules ChEMBL ID; Predicted targets (prob. score > 0.7) in ranking order based on score; Targets with experimental data but mostly without activity in ChEMBL (coloring: active a; inactive b) c

No.  ChEMBL ID      Predicted targets                                Targets with experimental data
1    CHEMBL1834657  ACE, PDE5, PI3Kg, IGF1R, JAK3, TIE               IGF1R, GSK3B, JAK3, PIM2
2    CHEMBL2048912  PI3Kg, IGF1R, HDAC8, PDE5, ACE, BRAF             IGF1R, BRAF, VEGFR1, GSK3B
3    CHEMBL223460   VEGFR1, ACE, PIM2, Trypsin, GSK3B                PDK1, IGF1R, JAK3, PIM2
4    CHEMBL2348417  BRAF, JAK3, ACE, PI3Kg, Thrombin, PDE5           IGF1R, VEGFR1, GSK3B, TIE2
5    CHEMBL3606021  HDAC8, DHFR, JAK3                                PDK1, GSK3B, JAK3, PIM2
6    CHEMBL3785951  ACE, BRAF, HDAC8, Trypsin, DHFR, GSK3B, PI3Kg    PDK1, PI3Kg, GSK3B, JAK3
7    CHEMBL2001539  HDAC8                                            IGF1R, VEGFR1, JAK3, PIM2
8    CHEMBL451401   HDAC8, PDE5                                      IGF1R, VEGFR1, JAK3, PIM2
9    CHEMBL3356117  HDAC2, Trypsin, ACE, BRAF, IGF1R, GSK3B          IGF1R, BRAF, VEGFR1, GSK3B
10   CHEMBL1991782  HDAC8, VEGFR1                                    IGF1R, VEGFR1, GSK3B, JAK3

11 CHEMBL1448

PDK1

GSK3B

TIE2

VEGFR1 GSK3B

JAK3

IGF1R VEGFR1 GSK3B

JAK3

HDAC8, IGF1R

12 CHEMBL3355482 ACE, PIM2, BRAF, TIE Trypsin, PDE5, JAK3, 13 CHEMBL3818247 BRAF, Fxa, VEGFR1, PIM2 Trypsin, PI3Kg, ACE, 14 CHEMBL1614725 JAK3, HDAC8, PDE, Thrombin, PIM2 PI3Kg, PDE5, Trypsin, 15 CHEMBL2148053 HDAC8 VEGFR1, IGF1R, Fxa, 16 CHEMBL2409778 PDE5, PI3Kg, ACE, DHFR Trypsin, ACE, PI3Kg, 17 CHEMBL1929238 PDE5, VEGFR1, Fxa 18 CHEMBL3741589 HDAC8, HDAC6 Trypsin, IGF1R, PI3Kg, ACE, JAK3, BRAF, PDE5, VEGFR IGF1R, HDAC8, 20 CHEMBL3745885 VEGFR1, GSK3B 19 CHEMBL103667

PDK1

BRAF

TIE2

IGF1R

BRAF

VEGFR1 GSK3B

TIE2

IGF1R

BRAF

VEGFR1 GSK3B

TIE2

IGF1R

BRAF

VEGFR1 GSK3B

TIE2

IGF1R VEGFR1 GSK3B

JAK3

PDK1

IGF1R VEGFR1 GSK3B

IGF1R

BRAF

PDK1

IGF1R

VEGFR1 GSK3B BRAF

PIM 2 PIM JAK3 2 TIE2

JAK3 TIE2 PIM2

VEGFR1 JAK3 TIE2 PIM2

Molecules ChEMBL ID

Correctly predicted targets within top rank based on probability score

Targets with experimental data in ChEMBL (coloring: active a; inactive b)

c

21 CHEMBL281872

top2

PDK1

IGF1R VEGFR1

TIE2

22 CHEMBL1983268 top2

PDK1

IGF1R

GSK3B

JAK3

23 CHEMBL2031893 top2

IGF1R

JAK3

TIE2

PIM2

24 CHEMBL3621294 top3

VEGFR1 HDAC2 HDAC8 HDAC6

25 CHEMBL3622533 top5

PI3Kg

26 CHEMBL388978

top5

PDK1

27 CHEMBL3621296 top5

PDK1

28 CHEMBL460472

top5

IGF1R

JAK3

TIE2

PDK1

29 CHEMBL1945559 top6

IGF1R

GSK3B

JAK3

PIM2

30 CHEMBL599428

top6

IGF1R

GSK3B

JAK3

PIM2

31 CHEMBL1784637 top7

BRAF

VEGFR1

JAK3

TIE2

32 CHEMBL3827894 top7

PI3Kg

HDAC2 HDAC8 HDAC6

33 CHEMBL105819

top8

HDAC2 HDAC8 HDAC6 BRAF

PIM2

JAK3

HDAC2 HDAC8 HDAC6

Thrombin

FXa

TPA

Trypsin

34 CHEMBL2206666 top8

IGF1R

BRAF

VEGFR1 GSK3B

35 CHEMBL1254007 top8

IGF1R

BRAF

VEGFR1 GSK3B

36 CHEMBL3827281 top7

PI3Kg

IGF1R

HDAC2 HDAC8 HDAC6

37 CHEMBL3651966 top5

PDK1

IGF1R

GSK3B

JAK3

TIE2

PIM2

a Green and red boxes: molecule is experimentally active on that protein; green box: protein was correctly predicted as a target for the molecule (true prediction); red box: protein was not predicted as a target for the molecule (false prediction).

b Blue and yellow boxes: molecule is experimentally inactive on that protein (decoy); blue box: protein was correctly predicted as a non-target for the molecule; yellow box: protein was falsely predicted as a target for the molecule.

c Ranking positions (top2 to top8) for targets predicted in accordance with experimental data. Newly predicted targets are listed in decreasing ranking order, separated by commas.

SUMMARY AND CONCLUSION

Overall, the presented analysis shows that it is possible to develop target-specific scoring models based on machine learning methods, using docking and PADIFs, for a reasonable activity prediction of previously unseen molecules. Here, we used the protein-ligand interaction fingerprint PADIF46, which is based on the protein-atom score contributions of the GOLD47 scoring function ChemPLP53, in combination with ChEMBL bioactivity data. With these target-specific models it was possible not only to classify a molecule as active or inactive for a given target, but also to calculate its probability score for that target. The probability scores generated by each model were used for ranking and for the inter-target comparison. The results showed that, in most cases, the approach led to a reasonable target prediction when the top-ranked targets were selected and compared with the known target activity profile of a molecule. Furthermore, reasonable prediction capabilities were also achieved for compounds with bioactivity data for more than one of the 20 targets.

Two major problems had to be solved during the development of the protein target models: the identification and classification of active and decoy molecules, and the handling of decoy molecules in a docking-based approach, since decoys do not bind to the protein target. For the first problem, a KNIME-based workflow was developed that automatically processes the bioactivity data from ChEMBL for each target into the required datasets. The classification into actives and decoys was based on an analysis of the frequency distribution of the bioactivity data (Figure S3). All molecules with pIC50 ≤ 5, i.e., IC50 ≥ 10,000 nM, were considered real decoys with very low activity. All molecules up to 1000-fold weaker than the most potent inhibitor were considered actives, which in most cases led to active molecules with pIC50 > 6. This is also in agreement with Lenselink et al., who recently created models using a ChEMBL bioactivity benchmark set in which actives show an IC50 lower than 300 nM.51

The second problem concerned the use of decoy molecules during target model creation. In principle, inactive molecules should not show reasonable docking poses, since they do not bind to the protein target. However, due to the simplifications and resulting shortcomings of scoring functions, inactive molecules cannot be recognized reliably based on scoring values alone. As described, this calls into question the use of molecular docking for the prioritization of targets in score-based target prediction approaches.37 Therefore, the trained models have to learn the docking poses produced for inactive molecules and the underlying artificial protein-ligand interactions. For model training, the high-scoring docking poses of the decoy molecules are used, based on the assumption that the protein-ligand interactions of active molecules differ from those of decoy molecules and that discrimination is therefore possible. The successful prediction of actives and decoys supports this assumption.

Model creation and validation for each target is obviously a time-consuming step, but the data preprocessing with KNIME workflows and the model architecture created in Python allow the development of an automated process. This process can be applied in a straightforward manner to other proteins from the PDB with ChEMBL bioactivity data. Even the docking procedure could be automated using PyGOLD, a Python-based API for automated docking workflow generation.65 This would lead to a large coverage of targets and the development of a complete prediction tool. In addition, it should also be possible to train multi-target models using different protein structures of the same target to represent different conformations and protein flexibility.
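The active/decoy labelling rule summarized in this section (decoys at IC50 ≥ 10,000 nM, actives within 1000-fold of the most potent inhibitor) can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow. The function names, the example IC50 values and the "unassigned" label for intermediate compounds are assumptions made for illustration only.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def label_molecules(ic50_nm_by_id, decoy_ic50_nm=10_000.0, active_fold=1000.0):
    """Label each molecule of one target as 'decoy', 'active' or 'unassigned'.

    Decoy: IC50 >= 10,000 nM (pIC50 <= 5).
    Active: at most `active_fold` times weaker than the most potent inhibitor.
    'unassigned' is a hypothetical label for everything in between.
    """
    most_potent = min(ic50_nm_by_id.values())
    labels = {}
    for mol_id, ic50_nm in ic50_nm_by_id.items():
        if ic50_nm >= decoy_ic50_nm:
            labels[mol_id] = "decoy"
        elif ic50_nm <= most_potent * active_fold:
            labels[mol_id] = "active"
        else:
            labels[mol_id] = "unassigned"
    return labels


# Made-up IC50 values (nM) for a single hypothetical target.
example = {"mol_a": 1.0, "mol_b": 500.0, "mol_c": 5_000.0, "mol_d": 20_000.0}
labels = label_molecules(example)
for mol_id, label in labels.items():
    print(mol_id, round(pic50_from_nm(example[mol_id]), 2), label)
```

The decoy test is applied first, so a weak compound is never labelled active even when the most potent inhibitor of a target is itself weak. This ordering is a design choice of the sketch.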


Of course, the question remains whether this considerable effort pays off. Ligand-based approaches are in general very fast and independent of a protein structure. One disadvantage of these methods is their dependency on molecular bioactivity data: an unknown ligand-target interaction cannot be predicted if the molecular structure lies outside the measured bioactivity data.5 This is comparable to the applicability domain of, for example, QSAR analyses.66 With the increasing amount of available bioactivity data, the unknown chemical space will diminish. Successful ligand-based approaches should therefore be the first choice for analyzing bioactive compounds.

In contrast, docking-based target prediction has the advantage that it should be independent of the underlying molecular structure. This holds even for the creation of target-specific models based on known molecules, as shown here. Our approach evaluates only the protein-atom contributions of the underlying scoring function, which should in principle be independent of a specific ligand-atom contribution and therefore of a specific molecular structure. As shown during the validation of the trained models, the ext_tf datasets, which contain molecules dissimilar to those in the training and validation datasets, show reasonable performance. This indicates that our approach does not rely on knowledge of the underlying molecular structure or the molecular scaffold.

However, one has to be aware of the drawbacks of this approach. A reasonable amount of bioactivity data for a specific protein target is still needed, as is a three-dimensional protein structure. The latter in particular is not available for all protein targets of interest. In addition, the training of thousands of target-specific models may not be feasible and would cost a huge amount of effort and computational time. An automated workflow can ease this problem, but an initial computational effort is still needed. A huge number of targets can also make a reliable target ranking difficult. As shown for some groups of molecules (e.g., TPA), several different targets could still be predicted. However, it could be shown that the inter-protein scoring noise could be reduced.

In conclusion, the identification of a target in phenotypic screening approaches may not be the application domain of this approach. There, models for all known protein targets would have to be developed, which seems infeasible due to missing protein structures and the huge amount of computational time. In our opinion, the interesting application domain lies in the creation of a focused selection of target-specific models for specific applications. This includes the prediction of frequently occurring off-targets, the repurposing of known drugs for important protein targets, or the prediction of polypharmacology effects.
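The inter-target ranking discussed in this section can be pictured with a short Python sketch. It assumes that each target-specific model returns a probability score for the query molecule and that the consensus score is the plain mean of the NN and SVM probabilities. That averaging rule, the function names and the numbers are assumptions for illustration and are not taken from the manuscript.

```python
def rank_targets(nn_probs, svm_probs):
    """Rank targets by a consensus probability (assumed: mean of NN and SVM)."""
    consensus = {
        target: (nn_probs[target] + svm_probs[target]) / 2.0
        for target in nn_probs
    }
    # Highest consensus probability first.
    return sorted(consensus.items(), key=lambda item: item[1], reverse=True)


def first_hit_rank(ranking, known_targets):
    """Return the 1-based rank of the first experimentally known target, or None."""
    for position, (target, _score) in enumerate(ranking, start=1):
        if target in known_targets:
            return position
    return None


# Made-up probability scores for one hypothetical molecule and three targets.
nn_probs = {"PDK1": 0.9, "IGF1R": 0.6, "JAK3": 0.2}
svm_probs = {"PDK1": 0.7, "IGF1R": 0.8, "JAK3": 0.1}

ranking = rank_targets(nn_probs, svm_probs)
hit_rank = first_hit_rank(ranking, known_targets={"IGF1R"})
print([target for target, _ in ranking], hit_rank)
```

A molecule whose known target first appears at position n of such a list corresponds to a "top n" entry in the ranking table of the manuscript.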


AUTHOR INFORMATION

Corresponding Author
Dr. Oliver Koch, Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Str. 6, 44227 Dortmund, Germany. Current address: Westfälische Wilhelms-Universität Münster, Institute of Pharmaceutical and Medicinal Chemistry, Corrensstr. 48, 48149 Münster, Germany.
Email: [email protected]; [email protected]
Homepage: www.agkoch.de

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources
The authors acknowledge the Mercator Research Center Ruhr Starting Grant (AN-2015-0053) and the German Federal Ministry for Education and Research (BMBF, Medizinische Chemie in Dortmund, Grant No. BMBF 1316053).

Acknowledgements
The authors thank J. Jasper for the PADIF software and for help with the PADIF calculations.


Supporting Information.

• Supporting_Information.pdf: Supplementary Figures and Table
• Active_and_Decoys_Datasets.xlsx: List of active and decoy molecules for each target extracted from ChEMBL
• Ext-TF-Set_Tanimoto-Coefficients.csv: Data about the ext_set and the corresponding ext_tf_set with molecules showing a Tanimoto coefficient < 0.7 in relation to train_test and test_set
• Validation_1940-Active-Molecules_20-Targets.csv: Data set of selected active molecules and their probabilities from NN, SVM, and consensus models for the 20 targets. These data were used for the comparison of the average probabilities, as seen in Figure 4.
• Multi-Target-Set.xlsx: Multi-target data set used for inter-target ranking and their probability scores from ANN, SVM, and consensus models for the 20 targets

This information is available free of charge via the Internet at http://pubs.acs.org
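The Tanimoto criterion behind the ext_tf_set (coefficient < 0.7 to the training and test molecules) can be sketched in Python as follows. Fingerprints are represented here as plain sets of on-bit indices. The fingerprint type used for the supporting data is not stated in this section, so the representation, the function names and the toy fingerprints are assumptions.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def dissimilar_subset(candidates, reference, threshold=0.7):
    """Keep candidates whose Tanimoto coefficient to every reference molecule
    is below `threshold`."""
    return [
        mol_id
        for mol_id, fp in candidates.items()
        if all(tanimoto(fp, ref_fp) < threshold for ref_fp in reference.values())
    ]


# Toy fingerprints; real ones would come from a cheminformatics toolkit.
reference = {"train_1": {1, 2, 3, 4}}
candidates = {"ext_1": {1, 2, 3, 4, 5}, "ext_2": {1, 8, 9}}

kept = dissimilar_subset(candidates, reference)
print(kept)
```

Requiring the coefficient to stay below the threshold for every reference molecule, rather than on average, is the stricter reading of the criterion and is the one assumed in this sketch.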


REFERENCES

[1] Shoichet, B.K. Virtual screening of chemical libraries. Nature, 2004, 432, 862–865.
[2] Kremer, L.; Schultz-Fademrecht, C.; Baumann, M.; Habenberger, P.; Choidas, A.; Klebl, B.; Kordes, S.; Schöler, H.R.; Sterneckert, J.; Ziegler, S.; Schneider, G.; Waldmann, H. Discovery of a Novel Inhibitor of the Hedgehog Signaling Pathway through Cell-based Compound Discovery and Target Prediction. Angew. Chem. Int. Ed. Engl., 2017, 56(42), 13021-13025.
[3] Schneider, G.; Reker, D.; Chen, T.; Hauenstein, K.; Schneider, P.; Altmann, K.H. Deorphaning the Macromolecular Targets of the Natural Anti-cancer Compound Doliculide. Angew. Chem. Int. Ed. Engl., 2016, 55(40), 12408-11.
[4] Ziegler, S.; Pries, V.; Hedberg, C.; Waldmann, H. Target identification for small bioactive molecules: finding the needle in the haystack. Angew. Chem. Int. Ed. Engl., 2013, 52(10), 2744-92.
[5] Lavecchia, A.; Cerchia, C. In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov. Today, 2016, 21(2), 288-98.
[6] Zloh, M.; Kirton, S.B. The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions. Future Med. Chem., 2018, 10(4), 423-432.
[7] Schomburg, K.T.; Bietz, S.; Briem, H.; Henzler, A.M.; Urbaczek, S.; Rarey, M. Facing the challenges of structure-based target prediction by inverse virtual screening. J. Chem. Inf. Model., 2014, 54(6), 1676-86.
[8] Lo, Y.C.; Senese, S.; Damoiseaux, R.; Torres, J.Z. 3D Chemical Similarity Networks for Structure-Based Target Prediction and Scaffold Hopping. ACS Chem. Biol., 2016, 11(8), 2244-53.
[9] Nickel, J.; Gohlke, B.-O.; Ehreman, J.; Banerjee, P.; Rong, W.W.; Goede, A.; Dunkel, M.; Preissner, R. SuperPred: update on drug classification and target prediction. Nucleic Acids Res., 2014, 42, W26-31.
[10] Gfeller, D.; Grosdidier, A.; Wirth, M.; Daina, A.; Michielin, O.; Zoete, V. SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res., 2014, 42, W32-38.
[11] Jiayu, G.; Chaoqian, C.; Xiaofeng, L.; Xin, K.; Hualiang, J.; Daqi, G.; Honglin, L. ChemMapper: A Versatile Web Server for Exploring Pharmacology and Chemical Structure Association Based on Molecular 3D Similarity Method. Bioinformatics, 2013, 29(14), 1827-1829.
[12] Kringelum, J.; Kjaerulff, S.K.; Brunak, S.; Lund, O.; Oprea, T.I.; Taboureau, O. ChemProt3.0: a global chemical biology diseases mapping. Database, 2016, pii:bav123.


[13] Carrella, D.; Napolitano, F.; Rispoli, R.; Miglietta, M.; Carissimo, A.; Cutillo, L.; Sirci, F.; Gregoretti, F.; di Bernardo, D. Mantra 2.0: an online collaborative resource for drug mode of action and repurposing by network analysis. Bioinformatics, 2014, 30(12), 1787-1788.
[14] Alexey, L.; Alla, S.; Dmitrii, F.; Vladimir, P. PASS: prediction of activity spectra for biologically active substances. Bioinformatics, 2000, 16(8), 747–748.
[15] Keiser, M.J.; Roth, B.L.; Armbruster, B.N.; Ernsberger, P.; Irwin, J.J.; Shoichet, B.K. Relating protein pharmacology by ligand chemistry. Nat. Biotech., 2007, 25(2), 197-206.
[16] Peón, A.; Naulaerts, S.; Ballester, P.J. Predicting the Reliability of Drug-target Interaction Predictions with Maximum Coverage of Target Space. Sci. Rep., 2017, 7(1), 3820.
[17] Awale, M.; Reymond, J.L. The polypharmacology browser: a web-based multi-fingerprint target prediction tool using ChEMBL bioactivity data. J. Cheminf., 2017, 9, 1-11.
[18] Reker, D.; Perna, A.M.; Rodrigues, T.; Schneider, P.; Reutlinger, M.; Mönch, B.; Koeberle, A.; Lamers, C.; Gabler, M.; Steinmetz, H.; Müller, R.; Schubert-Zsilavecz, M.; Werz, O.; Schneider, G. Revealing the macromolecular targets of complex natural products. Nat. Chem., 2014, 6(12), 1072-8.
[19] Rognan, D. Structure-Based Approaches to Target Fishing and Ligand Profiling. Mol. Inform., 2010, 29(3), 176-87.
[20] Kharkar, P.S.; Warrier, S.; Gaud, R.S. Reverse docking: a powerful tool for drug repositioning and drug rescue. Future Med. Chem., 2014, 6(3), 333-42.
[21] Ehrt, C.; Brinkjost, T.; Koch, O. Impact of Binding Site Comparisons on Medicinal Chemistry and Rational Molecular Design. J. Med. Chem., 2016, 59(9), 4121-51.
[22] Ehrt, C.; Brinkjost, T.; Koch, O. A benchmark driven guide to binding site comparison: An exhaustive evaluation using tailor-made data sets (ProSPECCTs). PLoS Comput. Biol., 2018, 14(11), e1006483.
[23] Wang, X.; Shen, Y.; Wang, S.; Li, S.; Zhang, W.; Liu, X.; Lai, L.; Pei, J.; Li, H. PharmMapper 2017 update: a web server for potential drug target identification with a comprehensive target pharmacophore database. Nucleic Acids Res., 2017, 45(W1), W356-W360.
[24] Jui-Chih, W.; Pei-Ying, C.; Chung-Ming, C.; Jung-Hsin, L. idTarget: a web server for identifying protein targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach. Nucleic Acids Res., 2012, 40, W393–W399.
[25] Li, H.; Gao, Z.; Kang, L.; Zhang, H.; Yang, K.; Yu, K.; Luo, X.; Zhu, W.; Chen, K.; Shen, J.; Wang, X.; Jiang, H. TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res., 2006, 34, W219-W224.


[26] Schomburg, K.T.; Rarey, M. Benchmark Data Sets for Structure-Based Computational Target Prediction. J. Chem. Inf. Model., 2014, 54(8), 2261-2274.
[27] Luo, H.; Chen, J.; Shi, L.; Mikailov, M.; Zhu, H.; Wang, K.; He, L.; Yang, L. DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical-protein interactome. Nucleic Acids Res., 2011, 39, W492–W498.
[28] Heng, L.; Ping, Z.; Xi Hang, C.; Dizheng, D.; Hao, Y.; Hui, H.; Can, L.; Shengying, Q.; Chunling, W.; Leming, S.; Lin, H.; Lun, Y. DPDR-CPI, a server that predicts Drug Positioning and Drug Repositioning via Chemical-Protein Interactome. Sci. Rep., 2016, 6, 35996.
[29] Reker, D.; Rodrigues, T.; Schneider, P.; Schneider, G. Identifying the macromolecular targets of de novo designed chemical entities through self-organizing map consensus. Proc. Natl. Acad. Sci., 2014, 111, 4067-4072.
[30] Lirong, W.; Chao, M.; Peter, W.; Haibin, L.; Weiwei, S.; Xiang-Qun, X. TargetHunter: a web portal for predicting the therapeutic potential of small organic molecules based on chemogenomic database. AAPS J., 2013, 15(2), 395-406.
[31] Nidhi, G.; Davies, J.W.; Jenkins, J.L. Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J. Chem. Inf. Model., 2006, 46, 1124-1133.
[32] Liu, J.; Wang, R. Classification of Current Scoring Functions. J. Chem. Inf. Model., 2015, 55(3), 475-482.
[33] Kitchen, D.B.; Decornez, H.; Furr, J.R.; Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov., 2004, 3(11), 935-49.
[34] Warren, G.; Andrews, C.; Capelli, A.; Clarke, B.; LaLonde, J. A critical assessment of docking programs and scoring functions. J. Med. Chem., 2006, 49, 5912-5931.
[35] Lionta, E.; Spyrou, G.; Vassilatis, D.K.; Cournia, Z. Structure-Based Virtual Screening for Drug Discovery: Principles, Applications and Recent Advances. Curr. Top. Med. Chem., 2014, 14(16), 1923-1938.
[36] Yang, L.; Wang, K.; Chen, J.; Jegga, A.G.; Luo, H. Exploring Off-Targets and Off-Systems for Adverse Drug Reactions via Chemical-Protein Interactome - Clozapine-Induced Agranulocytosis as a Case Study. PLOS Comp. Biol., 2011, 7(3), e1002016.
[37] Kellenberger, E.; Foata, N.; Rognan, D. Ranking targets in structure-based virtual screening of three-dimensional protein libraries: Methods and problems. J. Chem. Inf. Model., 2008, 48(5), 1014-1025.


[38] Wang, W.; Zhou, X.; He, W.; Fan, Y.; Chen, Y.; Chen, X. The interprotein scoring noises in glide docking scores. Proteins, 2012, 80(1), 169-83.
[39] Luo, Q.; Zhao, L.; Hu, J.; Jin, H.; Liu, Z.; Zhang, L. The scoring bias in reverse docking and the score normalization strategy to improve success rate of target fishing. PLoS One, 2017, 12(2), e0171433.
[40] Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D.R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model., 2017, 57(4), 942-957.
[41] Liu, J.; Su, M.; Liu, Z.; Li, J.; Li, Y.; Wang, R. Enhance the performance of current scoring functions with the aid of 3D protein-ligand interaction fingerprints. BMC Bioinf., 2017, 18, 343.
[42] Cheng, T.; Liu, Z.; Wang, R. A knowledge-guided strategy for improving the accuracy of scoring functions in binding affinity prediction. BMC Bioinf., 2010, 11, 93-208.
[43] Sato, T.; Honma, T.; Yokoyama, S. Combining Machine Learning and Pharmacophore-Based Interaction Fingerprint for in Silico Screening. J. Chem. Inf. Model., 2010, 50, 170-185.
[44] Yan, Y.; Wang, W.; Sun, Z.; Zhang, J.Z.H.; Ji, C. Protein-Ligand Empirical Interaction Components for Virtual Screening. J. Chem. Inf. Model., 2017, 57(8), 1793-1806.
[45] Li, G.B.; Yu, Z.J.; Liu, S.; Huang, L.Y.; Yang, L.L.; Lohans, C.T.; Yang, S.Y. IFPTarget: A Customized Virtual Target Identification Method Based on Protein-Ligand Interaction Fingerprinting Analyses. J. Chem. Inf. Model., 2017, 57(7), 1640-1651.
[46] Jasper, J.B.; Humbeck, L.; Brinkjost, T.; Koch, O. A novel interaction fingerprint derived from per atom score contributions: exhaustive evaluation of interaction fingerprint performance in docking based virtual screening. J. Cheminf., 2018, 10(15), 1-15.
[47] Jones, G.; Willett, G.P.; Glen, R.C. Molecular Recognition of Receptor sites using a Genetic Algorithm with a Description of Desolvation. J. Mol. Biol., 1995, 245, 43-53.
[48] Bento, A.P.; Gaulton, A.; Hersey, A.; Bellis, L.J.; Chambers, J.; Davies, M.; Krüger, M.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J.P. The ChEMBL bioactivity database: an update. Nucleic Acids Res., 2014, 42, 1083-1090.
[49] Bauer, M.R.; Ibrahim, T.M.; Vogel, S.M.; Boeckler, F.M. Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 – A Public Library of Challenging Docking Benchmark Sets. J. Chem. Inf. Model., 2013, 53(6), 1447-1462.
[50] Berthold, M.R.; Cebron, N.; Dill, F.; Gabriel, T.R.; Kötter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B. KNIME: The Konstanz Information Miner. In: Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007), Ed. Springer, 2007, ISBN:978-3540-78239-1, ISSN:1431-8814.


[51] Lenselink, E.B.; Ten Dijke, N.; Bongers, B.; Papadatos, G.; van Vlijmen, H.W.T.; Kowalczyk, W.; IJzerman, A.P.; van Westen, G.J.P. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminf., 2017, 9(1), 1758-2946.
[52] Molecular Operating Environment (MOE) 2013.08.1010 Sherbooke 990 St. West, Suite #910, Montreal, QC, H3A 2R7, Canada: Chemical Computing Group ULC, 2018. https://www.chemcomp.com/Research-Citing_MOE.htm.
[53] Korb, O.; Stützle, T.; Exner, T.E. Empirical scoring functions for advanced protein–ligand docking with PLANTS. J. Chem. Inf. Model., 2009, 49, 84-96.
[54] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Louppe, G.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 2011, 12, 2825-2830.
[55] Al-Rfou, R.; Alain, G.; Almahairi, A.; Angermueller, C.; Bahdanau, D.; Ballas, N.; Bastien, F.; Bayer, J.; Belikov, A.; Belopolsky, A.; Bengio, Y.; Bergeron, A.; Bergstra, J.; Bisson, V.; Snyder, J.B.; Bouchard, N.; Boulanger-Lewandowski, B.; Bouthillier, X.; de Brébisson, A.; Breuleux, O.; Carrier, P.-L.; Cho, K.; Chorowski, J.; Christiano, P.; Cooijmans, T.; Côté, M.-A.; Côté, M.; Courville, A.; Dauphin, Y.N.; Delalleau, O.; Demouth, J.; Desjardins, G.; Dieleman, S.; Dinh, L.; Ducoffe, M.; Dumoulin, V.; Kahou, S.E.; Erhan, D.; Fan, Z.; First, O.; Germain, M.; Glorot, X.; Goodfellow, I.; Graham, M.; Gulcehre, C.; Hamel, P.; Harlouchet, I.; Heng, J.-P.; Hidasi, B.; Honari, S.; Jain, A.; Jean, S.; Jia, K.; Korobov, M.; Kulkarni, V.; Lamb, A.; Lamblin, P.; Larsen, E.; Laurent, C.; Lee, S.; Lefrancois, S.; Lemieux, S.; Léonard, N.; Lin, N.; Livezey, J.A.; Lorenz, C.; Lowin, J.; Ma, Q.; Manzagol, P.-A.; Mastropietro, O.; McGibbon, R.T.; Memisevic, R.; van Merriënboer, B.; Michalski, V.; Mirza, M.; Orlandi, A.; Pal, C.; Pascanu, R.; Pezeshki, M.; Raffel, C.; Renshaw, D.; Rocklin, M.; Romero, A.; Roth, M.; Sadowski, P.; Salvatier, J.; Savard, F.; Schlüter, J.; Schulman, J.; Schwartz, G.; Serban, J.V.; Serdyuk, D.; Shabanian, S.; Simon, E.; Spieckermann, S.; Subramanyam, S.R.; Sygnowski, J.; Tanguay, J.; van Tulder, G.; Turian, J.; Urban, S.; Vincent, P.; Visin, F.; de Vries, H.; Warde-Farley, D.; Webb, D.J.; Willson, M.; Xu, K.; Xue, L.; Yao, L.; Zhang, S.; Zhang, Y. Theano: A Python Framework for Fast Computation of Mathematical Expressions. arXiv preprint, 2016.
[56] R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2014. URL http://www.R-project.org/.
[57] Desaphy, J.; Raimbaud, E.; Ducrot, P.; Rognan, D. Encoding protein-ligand interaction patterns in fingerprints and graphs. J. Chem. Inf. Model., 2013, 53(3), 623-37.
[58] Pérez-Nueno, V.I.; Rabal, O.; Borrell, J.I.; Teixidó, J. APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening. J. Chem. Inf. Model., 2009, 49(5), 1245-60.


[59] Wójcikowski, M.; Ballester, P.J.; Siedlecki, P. Performance of machine-learning scoring functions in structure-based virtual screening. Sci. Rep., 2017, 7, 46710.
[60] Okwan-Duodu, D.; Landry, J.; Shen, X.Z.; Diaz, R. Angiotensin-converting enzyme and the tumor microenvironment: mechanisms beyond angiogenesis. Am. J. Physiol. Regul. Integr. Comp. Physiol., 2013, 305, R205-R215.
[61] Bernstein, K.E.; Ong, F.S.; Blackwell, W.L.; Shah, K.H.; Giani, J.F.; Gonzalez-Villalobos, R.A.; Shen, X.Z.; Fuchs, S.; Touyz, R.M. A modern understanding of the traditional and nontraditional biological functions of Angiotensin-converting enzyme. Pharmacol. Rev., 2013, 65, 1-46.
[62] Shen, X.Z.; Ong, F.S.; Bernstein, E.A.; Janjulia, T.; Blackwell, W.L.; Shah, K.H.; Taylor, B.L.; Gonzalez-Villalobos, R.A.; Fuchs, S.; Bernstein, K.E. Nontraditional roles of angiotensin-converting enzyme. Hypertension, 2012, 59, 763-768.
[63] Kim, H.J.; Bae, S.C. Histone deacetylase inhibitors: molecular mechanisms of action and clinical trials as anti-cancer drugs. Amer. J. Trans. Res., 2011, 3(2), 166-179.
[64] Roche, J.; Bertrand, P. Inside HDACs with more selective HDAC inhibitors. Eur. J. Med. Chem., 2016, 121, 451-48
[65] Patel, H.; Brinkjost, T.; Koch, O. PyGOLD: a python-based API for docking based virtual screening workflow generation. Bioinformatics, 2017, 33(16), 2589-2590.
[66] Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inform., 2010, 29(6-7), 476-88.


Table of Content Graphic
