Time-Domain Analysis of Molecular Dynamics Trajectories Using

Jul 19, 2019 - Extending the recent trend of using machine learning techniques to predict ... Convolutional neural networks based on the 2D tensors of...
0 downloads 0 Views 1MB Size
Subscriber access provided by IDAHO STATE UNIV

Computational Biochemistry

Time-domain Analysis of Molecular Dynamics Trajectories using Deep Neural Networks: Application to Activity Ranking of Tankyrase Inhibitors Vladimir Berishvili, Valentin O. Perkin, Andrew E. Voronkov, Eugene V. Radchenko, s Riyaz, Chittireddy Venkata Ramana Reddy, Viness Pillay, Pradeep Kumar, Yahya Essop Choonara, Ahmed Kamal, and Vladimir A. Palyulin J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.9b00135 • Publication Date (Web): 05 Jul 2019 Downloaded from pubs.acs.org on July 18, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Time-domain Analysis of Molecular Dynamics Trajectories

using

Deep

Neural

Networks:

Application to Activity Ranking of Tankyrase Inhibitors Vladimir P. Berishvili,1 Valentin O. Perkin,1 Andrew E. Voronkov,1,2 Eugene V. Radchenko,1 Riyaz Syed,3 Chittireddy Venkata Ramana Reddy,3 Viness Pillay,4 Pradeep Kumar,4 Yahya E. Choonara,4 Ahmed Kamal, 5 Vladimir A. Palyulin1*

1. Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia

2. Digital BioPharm Ltd., Hovseterveien 42 A, H0301, Oslo 0768, Norway

3. Department of Chemistry, Jawaharlal Nehru Technological University, Kukatpally, Hyderabad 500 085, India

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

4. Wits Advanced Drug Delivery Platform Research Unit, Faculty of Health Sciences, School of Therapeutic Sciences, Department of Pharmacy and Pharmacology, University of the Witwatersrand, Johannesburg, 7 York Road, Parktown 2193, South Africa

5. School of Pharmaceutical Education and Research, Jamia Hamdard, New Delhi 110 062, India

Abstract Molecular dynamics simulations provide valuable insights into the behavior of molecular systems. Extending the recent trend of using machine learning techniques to predict physicochemical properties from molecular dynamics data, we propose to consider the trajectories as multidimensional time series represented by 2D tensors containing the ligandprotein interaction descriptor values for each timestep. Similar in structure to the time series encountered in modern approaches for the signal, speech, and natural language processing, these time series can be directly analyzed using long short-term memory (LSTM) recurrent neural networks or the convolutional neural networks (CNN). The predictive regression models for the ligand-protein affinity were built for a subset of the PDBbind v.2017 database and applied to inhibitors of tankyrase, an enzyme of the poly (ADP-ribose)-polymerase (PARP) family, that can be used in the treatment of colorectal cancer. As an additional test set, a subset of the Community Structure-Activity Resource (CSAR) dataset was used. For comparison, the random forest and simple neural network models based on the crystal pose or the trajectory-averaged descriptors were used, as well as the commonly employed docking 1

ACS Paragon Plus Environment

Page 2 of 40

Page 3 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

and MM-PBSA scores. The convolutional neural networks based on the 2D tensors of ligandprotein interaction descriptors for short (2 ns) trajectories provide the best accuracy and predictive power, reaching the Spearman rank correlation coefficient of 0.73 and Pearson correlation coefficient of 0.70 for the tankyrase test set. Taking into account the recent increase in computational power of modern GPUs and relatively low computational complexity of the proposed approach, it can be used as an advanced virtual screening filter for compound prioritization.

Introduction The molecular dynamics simulations provide valuable insights into the behavior of molecular systems (in particular, biomacromolecules and ligand-target complexes) and can be used to predict various physicochemical properties of such systems.1,2 Molecular dynamics trajectories allow one to establish the mechanism and kinetics of protein-ligand binding.3,4 Another important application of molecular dynamics to drug design involves the prediction of ligand binding affinities. Methods may vary from rigorous and computationally expensive thermodynamic integration5 (TI) and free energy perturbation6 (FEP), enhanced sampling methods such as metadynamics,7 to more computationally accessible end-point approaches such as MM-PB(GB)SA8,9 and linear interaction energy10,11 (LIE) approaches. As demonstrated by a number of successful applications, these methods can compute the proteinligand binding free energies and unbinding rates with high accuracy and reasonable computational cost.9,12,13 On the other hand, methods using machine learning to predict physicochemical properties from the molecular dynamics data are gaining popularity. In one of the first

2

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

examples of this approach,14 Hopfinger et al. used molecular dynamics simulation to generate conformational ensembles of 3D molecular structures, and the grid cell occupancy measures of the atoms were used as descriptors for the 4D-QSAR models. In recent years, leveraging the rapid development of both computational hardware and machine learning techniques, the approaches involving extraction of descriptors from molecular dynamics trajectories attract a growing interest.15 Ash and Fourches16 used the molecular dynamics trajectories of the ERK2 kinase complexes with non-covalent inhibitors to calculate the molecular dynamics descriptors as the expectation values and standard deviations for various 3D conformation descriptors. It was shown that such descriptors enhance the correlation with affinity and lead to smoother structure-activity landscape compared to the static 1D, 2D, and 3D descriptors. Riniker17 also proposed to compress the molecular dynamics trajectories in vacuum and solvent into the “molecular dynamics fingerprints” (MDFP) including the average, standard deviation, and median of the time distributions of small molecule properties such as the potential energy components, radius of gyration, and solvent-accessible surface area (SASA). Combined with the 2D fingerprints, MDFP were used to create machine learning models to predict the solvation free energies and partition coefficients in various solvents. The approach was shown to be computationally inexpensive and comparable in performance to more rigorous methods such as FEP and LIE. A number of approaches using convolutional neural networks (CNN) to predict binding affinities have also been proposed based on different static ligand and protein representations. The DeepDTA model by Öztürk et al.18 uses 1D input data (ligand SMILES codes and protein amino acid sequences) that are embedded into 2D tensors for 2D CNN. On the other hand, the 3D CNN models developed by Ragoza et al.19 and Jiménez et al.20 work with multi-channel grid-based representations of 3D structures of the ligand-protein complexes taking into account various atom types or pharmacophoric centers. Simple deep

3

ACS Paragon Plus Environment

Page 4 of 40

Page 5 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

neural networks and ligand-protein interaction descriptors have also been used to derive target-specific machine-learning scoring functions.21 However, the dynamics of the ligandprotein complexes and the ligand mobility in the binding sites are not considered in these approaches. The increase of computational power associated with the modern graphics processing units (GPU) opens up a new possibility of employing molecular dynamics as one of the components of a virtual screening workflow aimed at ranking the putative ligands with respect to their target binding affinity. Using data derived from short molecular dynamics trajectories along with the appropriate machine learning methods, one can build QSAR models capable of filtering the virtual screening results obtained by simpler but less accurate methods, such as molecular docking or pharmacophore search. Instead of compressing the molecular dynamics (MD) trajectories into statistical quantities, they can be directly characterized by multidimensional time series wherein each frame of the trajectory is described by a vector of ligand-protein interaction descriptors.15 These time series are similar in structure to the time series encountered in modern approaches to the signal, speech, and natural language processing. Within the bounds of this analogy, the MD trajectory corresponds to a sentence, while the frames (timesteps) of the trajectory correspond to the words. The analogy can be further expanded by noting a correspondence between the vectors describing ligand-protein interaction and the word embedding vectors. This structure of data used to describe dynamics of a ligand-protein complex is illustrated in Figure 1.

4

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. Multidimensional time series describing the dynamics of the ligand-protein complex. Each complex is characterized by a 2D N×K tensor containing the descriptor values dij for each timestep of a trajectory (N is the number of trajectory frames, K is the number of descriptors).

A number of neural network architectures designed to work effectively with such data have been proposed. Of particular interest are the convolutional neural networks (CNN)22 and the long short-term memory (LSTM)23 recurrent neural networks capable of identifying and analyzing long-term trends in the dynamics, as well as various attention models24 that could be used, for example, to identify the most relevant conformations for estimating the binding energy of the ligand-protein complex. Herein, we use these model architectures, as well as more classical machine learning techniques such as a random forest or simple multilayer neural network, to predict the affinity of ligands on the basis of the MD data. In this work, the predictive regression models for the ligand-protein affinity based on the analysis of molecular dynamics trajectories were built for a subset of the PDBbind v.201725 database and applied to inhibitors of tankyrase, an enzyme of the poly (ADPribose)-polymerase (PARP) family involved in the canonical Wnt signaling pathway26,27 that has a negative regulatory effect on the activity of the Axin complex.28 Selective tankyrase 5

ACS Paragon Plus Environment

Page 6 of 40

Page 7 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

inhibitors showed promising results in the treatment of colorectal cancer, and multiple studies have also demonstrated that simultaneous inhibition of tankyrase and several kinases could be beneficial in terms of efficiency and resistance prevention.27,29 Thus, new virtual screening methods for inhibitors of this target are of practical interest, and the tankyrase inhibitor dataset provides a useful real-world example for the validation of the proposed approach. As an additional test set, a subset of the widely employed Community Structure-Activity Resource (CSAR) benchmark dataset was used.

Methods Data Sets The training set was based on the PDBbind v.2017 Refined set25 database consisting of approximately 4,000 PDB complexes for which the data on the target binding constants (pKi / pKd) are available. This database is among the largest public databases containing both the information on the complex structures and the data on the biological activity of ligands. Since the proposed approach is based on the molecular dynamics results, it was desirable to consider the complexes free from excessive complications during the preparation and simulation. Thus, the complexes containing metal atoms in the binding site were excluded. In addition, it is believed that currently existing systems for automatic generation of topologies and parameters for the ‘non-standard’ molecules (except amino acids and nucleotides) perform better for the uncharged molecules and the quality of topologies drops significantly with increasing absolute value of the charge.30 For this reason, in order to minimize the influence of this source of errors on the final model, it was decided to give preference to the complexes with neutral (uncharged) ligands. As a result, 356 complexes were selected for the MD simulations.

6

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

During the development of the proposed approach, much attention was paid to the formation of test sets that could objectively assess the quality of the models. The test set of 57 tankyrase inhibitors was obtained from the Protein Data Bank31 and their activity values (pIC50) ranging from 5 to 8 were obtained from the BindingDB database.32 In addition, the CSAR test set of 31 ligand-protein complexes with corresponding pKi / pKd values used in the D3R Grand Challenge33 blind affinity ranking competition was considered. These complexes were selected according to the same criteria as for the training set. It should be noted that both test sets (CSAR and tankyrase) are composed of protein targets different from the ones present in the training PDBbind data set. Therefore, they would serve as a stress test allowing to estimate the generalization abilities of the models. A complete list of the complexes used, along with the data on ligand activities, is presented in the Supporting Information.

Molecular Dynamics For each complex, the protein and ligand coordinates were extracted from the PDB file. Water molecules and other small molecule entities present in the crystal structure were ignored. For the missing residues in protein structures, the homology modeling was performed using the Modeller v.9.19 program34 and the unresolved amino acid side chains were modeled using the Dunbrack rotamer library.35 The atomic charges for the ligand molecules were calculated by the AM1-BCC method in the antechamber program from the AmberTools 16 package.36 The ligand parameters were modeled using the General Amber Force Field (GAFF).37 Amber topology and coordinate files using explicit solvent (TIP3P water model, 0.15M NaCl) and dodecahedron periodic box with 10 Å margin were created by the parmchk and tleap programs from the AmberTools package using the AMBER99SB force field38. The data files were converted into the GROMACS format using the acpype

7

ACS Paragon Plus Environment

Page 8 of 40

Page 9 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

script.39 The molecular dynamics simulations of the complexes were carried out for 2 ns by means of the GROMACS 2018.4 program.40 After the initial relaxation by the steepest descent minimization algorithm, the system was subjected to temperature stabilization (300 K) in the NVT ensemble for 100 ps (50,000 steps of 2 fs each) using the modified Berendsen thermostat. Then the molecular dynamics simulation was performed in the NPT ensemble for 100 ps (50,000 steps of 2 fs each) using the Berendsen barostat in order to stabilize the pressure at 1 atm. After the temperature and pressure were equalized, a production molecular dynamics simulation was performed in the GROMACS mdrun program on the NVIDIA GTX 1080 high-performance GPU recording the system states every 10 ps for 2 ns.

MM-PBSA In order to calculate the binding energies of tankyrase inhibitors, 2 ns trajectories were extended to 30 ns to ensure better energy convergence and sampling of the state space. The resulting trajectory files were used to calculate the binding energy by the MM-PBSA method implemented in the g_mmpbsa standalone program41 with the solute dielectric constant set to 4. To reduce the computational cost, binding energy was calculated for 251 equidistant frames from a time interval of 5–30 ns. To obtain the binding energy along with confidence intervals, a bootstrap analysis41 was performed. To estimate the contribution of the entropic term to binding energy, the interaction entropy42 approach was used. The Python script implementing this method is available in the Supporting Information.

Descriptors The choice of the ligand-protein interaction descriptors constitutes a significant research task of its own. There are at least several papers devoted to this issue.43–45 In the present work, it was decided to use the trajectory descriptors that can be obtained using the standard GROMACS 2018 functionality, as well as the descriptors based on the components of the 8

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

AutoDock Vina46 and Smina47 scoring functions that previously were successfully employed in the derivation of the target-specific machine-learning scoring functions.21 The 2D tensors containing vectors of ligand-protein interaction descriptors for each trajectory frame were used as input data for machine learning models. The total number of descriptors was 37 (in particular, 14 GROMACS and 23 Smina descriptors). The descriptors obtained using the GROMACS 2018 software package include the root mean square deviation (RMSD) values for the ligand and the binding site atoms, as well as the electrostatic and van der Waals energy values of the ligand interaction with the protein calculated using the g_mmpbsa program. In addition, the ligand radius of gyration was calculated, as well as the squared relative distance of the ligand center of mass from the binding site center of mass. For simplicity and scalability of the calculations, the binding site was defined as a set of protein atoms within 5 Å from the bound ligand in the starting frame of the trajectory. The descriptors based on the scoring function components were calculated using the Smina47 docking software (a fork of the widely used AutoDock Vina46) by rescoring the ligand-protein complex structures corresponding to each frame. In an attempt to take into account the explicit binding site water molecules present in the MD simulation system, an additional set of descriptors was generated. For each step of the trajectory, the ligand and the protein-water subsystem within 10 Å from the ligand center of mass were extracted in PDB format, and the Smina descriptors were calculated by rescoring these complex structures.

Machine Learning Random forest models The models using random forest regression were built in the scikit-learn48 software package (version 0.19). As input data, the average values of the descriptors over the trajectory or the 9

ACS Paragon Plus Environment

Page 10 of 40

Page 11 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

descriptor values corresponding to the co-crystallized ligand poses were used. The optimal model hyperparameters were determined using the grid search algorithm to minimize the mean square error (MSE) during Monte Carlo cross-validation: n_estimators=100, max_features=4, min_samples_leaf=2, bootstrap=True. The Monte Carlo cross-validation was performed as follows: in each of the 1000 iterations, the input data were split into the training and validation subsets in the 80:20 ratio, a model was built for the training subset and its performance was evaluated on the validation set. The final model quality parameters such as training and validation R2 and RMSE were estimated by averaging the values obtained on all iterations. Pearson and Spearman rank correlation coefficients were calculated by averaging model performance on the test sets. Deep neural network models The following neural network architectures were used to build the regression models: convolutional neural networks (CNN), long short-term memory (LSTM) networks, as well as simple two-layer neural networks (two hidden layers). As input data, the 2D tensor representing the last 151 frames of trajectory (corresponding to the interval from 0.5 to 2 ns) was used in the standard protocol, although other intervals were also evaluated. All models were implemented in the PyTorch 0.4.149 framework using Adam optimizer with fixed learning rate, ReLU activation function, and the mean squared error (MSE) loss function. CNN was implemented based on the previously proposed architecture50 with minor changes (see Figure 2a). Its key step involves applying a set of convolution filters to the input 2D tensor representing a trajectory (100 filters for each of the kernel lengths 5, 7, 11; kernel width is equal to the number of descriptors). The outputs for each kernel size are compressed by the max-pooling layer and concatenated into an internal representation. In the final linear activation layer, the estimated endpoint (activity) value is produced. For the regularization

10

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

purposes, the dropout algorithm with probability 0.5 is used. The LSTM networks (Figure 2b) were composed of two stacked LSTM units (with hidden size equal to 128) and linear layers. The dropout algorithm with 0.5 probability was used for the regularization purposes. Similar to the random forest models, the Monte Carlo cross-validation procedure was used involving 100 iterations and 80:20 split ratio of the training and validation subsets. At each iteration, after 3000 epochs of training, five different models showing the best performance on the validation set were selected for further evaluation. A similar protocol was applied to the training of two-layer neural networks comprising 100 and 150 neurons in the first and second hidden layers, respectively. As in the case of the Random Forest models, the optimal model hyperparameters for the CNN, LSTM and two-layer neural network models were determined using the grid search algorithm. The final model quality parameters such as RMSE and R2 were estimated by averaging the values obtained for the validation subsets on all iterations. The Pearson and Spearman rank correlation coefficients were calculated by averaging the model performance on the test sets. The source code of the models can be found in the Supporting Information. (a)

11

ACS Paragon Plus Environment

Page 12 of 40

Page 13 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(b)

Figure 2. Deep neural network architectures for the input 2D tensor representing a MD trajectory with N steps and K descriptors of ligand-protein interactions. (a) CNN model with F×K kernel size and M filters. (b) LSTM model with hidden size (H) equal to 128.

Results and Discussion Dataset Characterization For the models to be useful in real applications, they should be trained on data from diverse chemical, biological, and activity space.51,52 The test set data should also be diverse and consistent with the training set and the intended applicability domain of the models to ensure their meaningful validation.53 In order to understand the scope of diversity of the ligandprotein complexes in the training and test sets, simple methods of chemical space analysis were used: Principal Component Analysis (PCA) plots and Self-Organizing Maps54 (SOM). In addition, the distributions of the ligand physicochemical properties and activity values were analyzed. The distributions of the significant physicochemical properties of the training and test dataset compounds calculated by the RDkit version 2018.09.1 software55 are shown in Figure 12

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

3 as histogram plots. Most of the compounds lie within the ranges of molecular weight, the number of rotatable bonds, and the octanol-water partition coefficient that satisfy the Lipinski rules.56 In addition, the ranges and distributions of the properties for the test sets are mostly similar to those for the training set. The analysis of the PCA and SOM plots based on the DataWarrior default structural fingerprint descriptors FragFp57 (Figures 4 and 5) suggests that the training and test set compounds have significant structural diversity and, while different, cover roughly the same chemical space. In the first approximation, the chemical space of the training set can be considered as a superset of the chemical space of test compounds, enabling meaningful prediction of activity and evaluation of the models.

Figure 3. Histograms of the distribution of the physicochemical properties of the training and test set compounds: molecular weight (MolWeight); lipophilicity (MolLogP); topological polar surface area (TPSA); molecular volume (ComputeMolVolume). The ranges and distributions of the properties for the test sets are mostly similar to those for the training set.

13

ACS Paragon Plus Environment

Page 14 of 40

Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 4. PCA plot for the training, tankyrase, and CSAR datasets based on 512 FragFp descriptors. The first principal component explains about 26% of the total variance, the second one — 12%. Training set compounds are shown in green, tankyrase inhibitors in orange, and CSAR compounds in blue. It can be seen that the training and test set compounds have significant structural diversity and the chemical space of the training set can be considered as a superset of the test compounds chemical space.

14

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5. Self-organizing map for the training, tankyrase, and CSAR datasets based on 512 FragFp descriptors. Training set compounds are shown in green, tankyrase inhibitors in orange, and CSAR compounds in blue. It can be seen that the training and test set compounds have significant structural diversity and, while different, cover roughly the same chemical space. The chemical space of the training set can be considered as a superset of the test compounds chemical space. In order to take into account not only the ligands, but also the target protein structures, we have also built the PCA plot based on the ligand-protein interaction descriptors (Figure 6). It also shows that the training and test set ligand-protein complexes are sufficiently diverse and, while different, cover roughly the same space, enabling meaningful prediction of activity and evaluation of the models.

15

ACS Paragon Plus Environment

Page 16 of 40

Page 17 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 6. PCA plot for the training, tankyrase, and CSAR datasets based on the 14 ligandprotein interaction descriptors for the crystal poses. The first principal component explains about 40% percent of the total variance, the second one — 23%. Training set compounds are shown in green, tankyrase inhibitors in orange, and CSAR compounds in blue. It can be seen that the training and test set ligand-protein complexes are sufficiently diverse and, while different, cover roughly the same space. The distribution of activity values for the training and test sets (pKi / pKd for the training and CSAR sets, pIC50 for the tankyrase set) is shown in Figure 7. As can be seen, the training set includes ligand-protein complexes with a broad range of affinities. The ranges covered by the test sets, although somewhat more narrow, can be considered as subsets of the training set range. This enables meaningful prediction of activity and evaluation of the models.53

16

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 7. Histograms of the distribution of the activity values for the training (pKi / pKd), tankyrase (pIC50), and CSAR (pKi / pKd) datasets. Training set compounds are shown in green, tankyrase inhibitors in orange, and CSAR compounds in blue. It can be seen that the training and test set ranges are sufficiently broad and, while different, cover roughly similar activity space.

Evaluation and Optimization of the Model Quality and Reliability A rational justification for the proposed approach is the hypothesis that the nature of changes in the ligand-protein interaction descriptors over the molecular dynamics trajectory can be used to build more accurate models compared to the ones based on the static complex structures. This assumption is also discussed in the literature16,17 wherein the statistical moments of their distributions are used to characterize the chemical space and to predict the energy of solvation. In the current study, we examine the applicability of this approach to predicting the biological activity of potential enzyme inhibitors. The intended application of the method to virtual screening imposes certain limitations on the maximum length of the calculated trajectories. In this case, 2 ns molecular

17

ACS Paragon Plus Environment

Page 18 of 40

Page 19 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

dynamics was simulated for each ligand-protein complex. The trajectory of such length can be calculated within a reasonable time (less than 20 minutes per complex on the NVIDIA GTX 1080 GPU for the systems containing up to 30,000 atoms) and have been used as a source of data for predicting the relevant complex properties (for example, several short trajectories of a single ligand-protein complex could be used to estimate the binding energy).43 In the virtual screening context, ensuring the correct relative ranking of compounds by their activity values is critical while the absolute accuracy of predictions may be less important (and often more elusive).58,59 Thus, the classification and activity ranking models may have greater practical benefit, and in this study, the rank correlation coefficients are used as the main criterion for model comparison. The predictive ability of the models was examined using the diverse and reliable CSAR and tankyrase datasets. In this section we evaluate the model performance on the training and test sets as well as analyze how certain features and parameters of the modeling workflow affect the results. Comparison models In order to determine the real benefits of a dynamic description of the ligand-protein complexes, we employed several “internal” comparison models. Thus, the neural network models using the time series as input data were compared against the models using ligandprotein interaction descriptors only for the ligand pose corresponding to the crystal structure, as well as the models using the average descriptor values over the length of a trajectory. To build the machine learning models from these types of data, a random forest and two-layer artificial neural networks were used. The simple neural networks based on the crystal or average descriptors are in some ways similar to the models proposed here but do not use the information about the binding dynamics. The random forest technique was selected taking

18

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

into account its high performance and robustness as well as the relative ease (compared to neural networks) of interpreting the significance of descriptors that can be used for dimensionality reduction60 while maintaining the interpretability of the selected variables. In addition, it may be interesting to compare the efficiency of different network architectures using the same amount of input information, that is, all frame data for the full simulation. For this purpose, a simple data augmentation strategy to improve the accuracy of the two-layer NN model was attempted. Instead of using average or crystal pose descriptors, the descriptors for each trajectory frame and corresponding ligand-protein complex endpoint values were used as a single data-label pair. Performance of such models was measured by averaging the results for each complex obtained in the prediction mode (test time augmentation). Descriptor significance and selection For the purposes of interpretation, a chart of descriptor importance in the random forest model of pKi / pKd based on the averaged descriptors for the training set was obtained from the Monte Carlo cross-validation results (Figure 8). The most significant 14 descriptors identified in this way were used to build all production models based on the random forest and neural network methods. Their brief descriptions are presented in Table 1. As can be seen, in addition to the energetic properties, they reflect some of the spatial characteristics of ligand binding such as RMSD, length of a ligand molecule as well as the distance between the ligand and the binding site. They can serve as an inner normalization during model training necessitated by the significant diversity of protein targets.

19

ACS Paragon Plus Environment

Page 20 of 40

Page 21 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 8. Importance of the most significant descriptors based on the Monte Carlo crossvalidation for the random forest models using average descriptors. The mean values are shown.

Table 1. Descriptors used in the machine learning models. №

Descriptor

Source

Meaning

1

Gauss

Smina scoring function

Represents the attractive interactions via Gaussian function (with different mean and variance from ad4_solvation) of the interatomic distance

2

LigandSurfTen

GROMACS software

Ligand apolar solvation energy (SASA model)

3

GyrTotal

GROMACS software

Radius of gyration for protein-ligand complex

20

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

4

Hydrophobic

Smina scoring function

Applies in the case of two hydrophobic atoms. The hydrophobic term is equal to 1 when the interatomic distance is less than 0.5Å and is equal to 0 if the distance exceeds 1.5Å

5

VdWEnergy

Smina scoring function

This term represents a measure of the difference between the interatomic distance and the predefined optimal distance

6

DistBind

GROMACS software

Distance in nm measured between ligand and binding site center of masses

7

Ad4Solvation

Smina scoring function

Represents the desolvation energy term. Calculated as a product of Gaussian function of the interatomic distance by function of atomic volumes and partial charges

8

Electrostatic

Smina scoring function

Represents the Coulomb potential. Calculated as a product of partial charges divided by the characteristic distance

9

HBond

Smina scoring function

If a pair of atoms could form hydrogen bonds, the piecewise linear function is used to calculate this energy term

10

RMSDLigand

GROMACS software

RMSD of the ligand from the initial pose

11

ProteinLigandSurfTen

GROMACS software

Combined protein-ligand apolar solvation energy (SASA model)

12

ProteinSurfTen

GROMACS software

Protein apolar solvation energy (SASA model)

13

Repulsion

Smina scoring function

Calculated as a square of the characteristic interatomic distance

14

LigandLength

Smina scoring function

Length of a ligand molecule

21

ACS Paragon Plus Environment

Page 22 of 40

Page 23 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Trajectory length The length of an MD trajectory may be critically important for the quality of properties calculated from it.61 In order to elucidate how this parameter affects the ability of regression models to predict the pKi / pKd values, a number of computational experiments was conducted. Table 2 shows the performance of the CNN models based on different fragments of the originally simulated trajectory: the first and the last 0.5, 1, and 1.5 nanoseconds, as well as the entire 2 nanosecond trajectory. Table 2. Performance of the CNN models based on various MD trajectory fragments.

Trajectory

R² (train)

RMSE (train)

R² (val)

RMSE (val)

First 0.5 ns

0.64 ± 0.11

1.12 ± 0.19

0.43 ± 0.10

1.42 ± 0.10

First 1.0 ns

0.65 ± 0.13

1.11 ± 0.20

0.43 ± 0.09

1.44 ± 0.12

First 1.5 ns

0.65 ± 0.14

1.12 ± 0.22

0.42 ± 0.10

1.44 ± 0.10

Last 0.5 ns

0.67 ± 0.09

1.09 ± 0.16

0.43 ± 0.08

1.41 ± 0.11

Last 1.0 ns

0.68 ± 0.09

1.09 ± 0.17

0.45 ± 0.07

1.43 ± 0.12

Last 1.5 ns

0.69 ± 0.10

1.06 ± 0.15

0.45 ± 0.08

1.39 ± 0.10

Full 2 ns

0.67 ± 0.10

1.10 ± 0.17

0.42 ± 0.09

1.43 ± 0.09

fragment

The mean values of R² and root mean square error (RMSE) as well as their standard deviations (SD) are obtained by Monte Carlo cross-validation.

As can be seen, the variation of the trajectory length between 0.5 and 2 nanoseconds does not significantly affect the predictive power of the models. This can be explained by the fact that all CNN models operate in the same “regime” of small-scale conformational changes such as sidechain movement. Taking into account the large-scale conformational changes is probably possible only at the expense of a substantial increase in the length of the calculated

22

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 40

trajectory.62 Therefore, in this case the choice of the trajectory length could be viewed as an optimizable hyperparameter. In the further work we used the models built on the last 1.5 ns trajectory data due to their slightly lower validation RMSE and higher R² values. Training set performance The performance of the proposed and comparison models on the training set is summarized in the Table 3 in terms of the mean R² and root mean square error (RMSE) values for pKi / pKd as well as their standard deviations (SD) obtained by means of the Monte Carlo crossvalidation using the 80:20 training and validation subset split as described in the Methods section. These results show that the validation predictivity of all the approaches is very close and acceptable despite better accuracy of the random forest models on the training subset.

Table 3. Training set performance of various models.

Model

R² (train)

RMSE (train)

R² (val)

RMSE (val)

Random forest (crystal pose)

0.88 ± 0.01

0.66 ± 0.02

0.44 ± 0.08

1.41 ± 0.11

Two-layer NN (crystal pose)

0.57 ± 0.08

1.24 ± 0.17

0.44 ± 0.08

1.43 ± 0.18

Random forest (average)

0.88 ± 0.01

0.66 ± 0.01

0.46 ± 0.08

1.39 ± 0.10

Two-layer NN (average)

0.69 ± 0.11

1.04 ± 0.20

0.48 ± 0.09

1.36 ± 0.12

Two-layer NN (all frames)

0.79 ± 0.15

0.80 ± 0.19

0.46 ± 0.09

1.37 ± 0.10

LSTM neural network

0.60 ± 0.12

1.20 ± 0.15

0.46 ± 0.08

1.36 ± 0.12

23

ACS Paragon Plus Environment

Page 25 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

CNN

0.69 ± 0.10

1.06 ± 0.15

0.45 ± 0.08

1.39 ± 0.10

The mean values of R² and root mean square error (RMSE) as well as their standard deviations (SD) are obtained by Monte Carlo cross-validation.

Y-scrambling tests In order to ensure that these results are not caused by chance correlations, the Y-scrambling analysis was performed (Table 4). In all cases, the resulting models based on the scrambled activities are clearly unacceptable due to very low validation R2 values. The training R2 values for the neural network models also are close to zero since the training procedure selects models based on their validation performance. Although the random forest models have no such protection and were evidently overfitted, even their training R² values are significantly lower than for the true models. Thus, the Y-scrambling tests confirm the reliability of the proposed models.63 Table 4. Y-scrambling test of the performance of various models. Model

R² (train)

RMSE (train)

R² (val)

RMSE (val)

Random forest (crystal pose) Y-scrambling

0.75 ± 0.01

0.95 ± 0.14

–0.10 ± 0.08

2.05 ± 0.73

Two-layer NN (crystal pose) Y-scrambling

0.12* ± 0.15

1.78 ± 0.54

–0.02 ± 0.58

1.89 ± 0.47

Random forest (average) Y-scrambling

0.76 ± 0.01

0.93 ± 0.12

–0.10 ± 0.11

1.99 ± 0.77

Two-layer NN (average) Y-scrambling

0.05* ± 0.08

1.85 ± 0.54

–0.05 ± 0.06

1.97 ± 0.58

LSTM neural network Y-scrambling

0.04* ± 0.06

1.89 ± 0.52

0.02 ± 0.04

1.88 ± 0.21

24

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

CNN Y-scrambling

0.08* ± 0.16

1.83 ± 0.79

–0.02 ± 0.07

Page 26 of 40

1.95 ± 0.74

The mean values of R² and root mean square error (RMSE) as well as their standard deviations (SD) are obtained by Monte Carlo cross-validation. *Low training R² values reflect that the model for evaluation was selected based on its performance on the validation set.

Descriptors taking into account explicit solvent Under physiological conditions, ligand-protein complexes exist in an aqueous medium that can strongly influence the binding. A number of approaches have been proposed to account for this effect.64–66 In our case, the approaches based on the modification of the Autodock Vina docking protocol are of particular interest.67,68 They usually imply the development of new force fields or reparameterization of existing ones. It is important to note that these studies were mostly aimed at the determination of the correct ligand position in the binding site. Thus, the proposed parameterizations and force fields may be of limited help in solving a different task of scoring a ligand in a given pose. In this work we made an attempt to take into account the explicit water molecules in a binding site with minimal additional changes to the previously proposed protocol, hoping that the necessary reparameterization will occur implicitly as the network is trained. Thus, an additional set of descriptors was generated by extracting the ligand and the protein-water subsystem for each step of the trajectory and rescoring these complex structures using Smina. It was expected to take into account the formation of the hydrogen bridge bonds as well as possible steric effects associated with the presence of water molecules in the binding site. Unfortunately, these descriptors have not led to a significant improvement in the quality of the models (the results for the CNN models are shown in Table 5). This can probably be explained by the fact that the water molecules are usually removed in standard docking protocol, and the docking methods (including Vina/Smina) have been optimized to handle

25

ACS Paragon Plus Environment

Page 27 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

water effects in implicit way. The GROMACS-derived interaction energies also implicitly take into account water molecules present in the system.

Table 5. CNN model performance using standard and explicit solvent descriptors.

Model

R² (train)

RMSE (train)

R² (val)

RMSE (val)

CNN (implicit solvent)

0.69 ± 0.10

1.06 ± 0.15

0.45 ± 0.08

1.39 ± 0.10

CNN (explicit solvent)

0.64 ± 0.10

1.14 ± 0.15

0.45 ± 0.08

1.40 ± 0.12

The mean values of R² and root mean square error (RMSE) as well as their standard deviations (SD) are obtained by Monte Carlo cross-validation.

Test set performance Two external test sets were used for additional validation of the model quality. The first of these, CSAR, consists of 31 ligand-protein complexes for which Ki / Kd values are available. The second one consists of 57 complexes between the tankyrase enzyme and the ligands with known values of half maximal inhibitory concentration (IC50). The IC50 data are usually more easily available than the data for Ki / Kd, so the ability to predict and rank the compounds by their values is of great practical interest. We would like to note that the CSAR test set is rather diverse in terms of protein targets compared to the tankyrase test set. Such a diverse set may involve additional noise factors making structure-activity trends more difficult to track and decipher. However, even for the CSAR test set reasonable performance has been obtained both for the continuous and rank correlation. The performance of various models on the two test sets is summarized in Table 6 in terms of the continuous (Pearson) and rank (Spearman) correlation coefficients.

26

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 40

Table 6. Test set performance of various models. Model

CSAR dataset

Tankyrase dataset

Spearman correlation

Pearson correlation

Spearman correlation

Pearson correlation

Random forest (average)

0.65 ± 0.04

0.68 ± 0.03

0.62 ± 0.03

0.57 ± 0.02

Random forest (crystal pose)

0.68 ± 0.02

0.65 ± 0.02

0.59 ± 0.02

0.54 ± 0.02

Two-layer NN (average)

0.68 ± 0.08

0.70 ± 0.05

0.68 ± 0.07

0.65 ± 0.05

Two-layer NN (crystal pose)

0.67 ± 0.05

0.68 ± 0.04

0.51 ± 0.05

0.52 ± 0.04

Two-layer NN (all frames)

0.59 ± 0.09

0.64 ± 0.08

0.62 ± 0.07

0.60 ± 0.08

LSTM neural network

0.67 ± 0.07

0.68 ± 0.05

0.60 ± 0.11

0.58 ± 0.07

CNN

0.67 ± 0.03

0.70 ± 0.03

0.72 ± 0.03

0.70 ± 0.03

The mean values of the Spearman and Pearson correlation coefficients as well as their standard deviations (SD) are obtained for the 5 best models identified during the Monte Carlo cross-validation for the training set (500 points in total).

In order to better understand the abilities and limitations of the models, we have built plots of the predicted values vs corresponding true activity values (Figures 9–11). As can be seen, the predictions of different models have similar patterns, for example, for the tankyrase dataset all presented models somewhat underestimate the potency of relatively small compounds such as methyl-substituted pyrimidinones and quinazolinones (PDB: 4PNL and 4PNM) in the bottom left corner of the plots (highlighted in red). However, in the case of the

27

ACS Paragon Plus Environment

Page 29 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

CNN model this pattern is less pronounced compared to the simple two-layer network and the LSTM network. Similar situation is observed in CSAR dataset, e.g., for naphthalene-2carboximidamide (PDB: 4FU8). A possible reason for this bias may be related to the fact that smaller molecules tend to form fewer interactions with the binding site compared to larger compounds. We can speculate that since the interaction descriptors used in the models do not explicitly take spatial fit into account and the small molecules with high affinity are underrepresented in the training set, it might be difficult for the model to distinguish the compounds with high mobility in the binding site from the compounds forming stable interactions. Additional descriptors reflecting ligand spatial fit and mobility may be introduced to solve this problem. In this regard, the combination of the approach discussed in this paper with the use of descriptors derived from the grid-based representations of 3D structures is of certain interest.

Figure 9. Affinity (pKi) values predicted by the CNN model vs the actual pIC50 (TNKS) and pKi (CSAR) values. Linear trend line and confidence interval are shown. The outlier complexes discussed in the text are highlighted in red.

28

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 10. Affinity (pKi) values predicted by the LSTM model vs the actual pIC50 (TNKS) and pKi (CSAR) values. Linear trend line and confidence interval are shown. The outlier complexes discussed in the text are highlighted in red.

Figure 11. Affinity (pKi) values predicted by the two-layer network model built on averaged descriptors vs the actual pIC50 (TNKS) and pKi (CSAR) values. Linear trend line and confidence interval are shown. The outlier complexes discussed in the text are highlighted in red.

29

ACS Paragon Plus Environment

Page 30 of 40

Page 31 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

One can note that the use of the molecular dynamics data can be beneficial compared to the methods that do not take into account the dynamic behavior of the ligand-protein complexes. The CNN and LSTM models as well as the models built on the averaged descriptors are generally more reliable compared to the models using only the ligand-protein interaction descriptors based on the crystal pose. The neural network architecture is also important for model performance. In this case, the most productive model is based on the convolutional neural networks. The influence of the architecture on the neural network performance is an important subject of current research, especially with regard to comparison between the feedforward and recurrent networks.69 Despite greater expressiveness of the latter, their training by means of the gradient descent optimization is much more difficult and less stable, therefore, in practice it is harder to obtain a model with high performance. Moreover, one can view the max-pooling operation implemented in the CNN model as an additional regularization factor that reduces the tendency of the model to overfit and helps reassess the significance of noise factors. Another important point is the conceptual difference between the molecular dynamics trajectories and the verbal sentences commonly analyzed by LSTM networks. In contrast to the natural language processing problems, the semantic relations between the trajectory frames are much less pronounced due to a large number of independent samples. Thus, the potentially infinite fractal-like nature of a trajectory70 might be more suitable for the CNN approach.

Applicability in Virtual Screening In order to compare the proposed approach to the QSAR model construction based on the analysis of molecular dynamics trajectories against the alternative methods that are widely used in virtual screening, we decided to focus on two types of comparison approaches. The docking score-based models provide a fast and crude method to estimate the binding

30

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

affinities while the widely used MM-PBSA method represents a more rigorous but computationally expensive approach to the binding energy estimation. As a benchmark docking-based model, the Vinardo scoring function from the Smina software was used for three reasons: it shares many internal descriptors with the models proposed here, it shows a relatively good performance in various scoring benchmarks,71 and it is available as open source software. The MM-PBSA calculations were carried out for the last 25 ns of the 30 ns trajectories and the entropic contribution was estimated using the interaction entropy approach. The performance of these approaches on the tankyrase test set was evaluated in terms of the correlation coefficients between the actual inhibitor pIC50 endpoint values and the affinity estimates based on the Vinardo scoring function, the MM-PBSA calculations with and without entropy term, and the CNN model. Generally speaking, the pIC50 values are assay-dependent and only indirectly related to pKi but these quantities can be expected to correlate with each other. In fact, it was shown that mixing IC50 data from different assays only adds a moderate amount of noise.72 Since the IC50 endpoint data are usually more easily available, the ability of a model to predict and rank the compounds by their values is of great practical interest in the virtual screening context. Note that the Vinardo and MM-PBSA methods are designed to model the binding energies, leading to inverse (negative) expected correlations with pKi and pIC50. The rank (Spearman) and continuous (Pearson) correlation coefficients between the inhibitor pIC50 values and the affinity estimates are summarized in Table 7 and their correspondence is graphically illustrated in Figure 12. The Vinardo docking scoring function allows one to get a decent correlation coefficient, especially taking into account the crudity of the method and its speed. Unfortunately, the correlations based on the MM-PBSA method are very similar or even worse than those obtained from the docking scores. This situation is not in fact unusual. According to reviews,8,43 the average performance of the method ranges from

31

ACS Paragon Plus Environment

Page 32 of 40

Page 33 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

0 to 0.9 depending on the protein, but only in rare cases the correlation between the activity and calculated binding energy exceeds 0.7.43,73 The entropy term also did not lead to an improvement in this case. In contrast, the proposed method based on the CNN neural network allows one to obtain significantly better correlations compared to the benchmark methods. The predictive power of the models can be further enhanced by various ensemble-based approaches.74 In the present case, even the simple averaging of the values predicted by individual models from the Monte Carlo cross-validation leads to a slight increase in the overall performance, providing the Spearman rank correlation equal to 0.73 and Pearson correlation equal to 0.70. Table 7. The tankyrase test performance of various virtual screening approaches in terms of the rank (Spearman) and continuous (Pearson) correlation coefficients of the inhibitor pIC50 values. Method

Docking scoring function (Vinardo)

MM-PBSA calculations

MM-PBSA calculations with IE

Best NN model (CNN model)

Spearman

–0.42

–0.46

–0.41

0.73

–0.52

–0.29

–0.32

0.70

correlation Pearson correlation

32

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 12. Affinity scores based on the Vinardo scoring function, MM-PBSA calculations without and with the interaction entropy term (MM-PBSA and MM-PBSA IE), and affinity (pKi) values predicted by the convolutional neural network model vs actual pIC50 values for the set of tankyrase inhibitors.

Conclusions In this paper, a new method of constructing the affinity prediction models was developed based on the analysis of short molecular dynamics trajectories. Training of the neural network models even on a relatively small dataset (356 complexes of diverse ligands and proteins) yields results that are comparable or better than the performance of the commonly employed docking and MM-PBSA approaches. Successful ranking of tankyrase inhibitors with the CNN model can be considered as a proof of concept that the analysis of molecular dynamics

33

ACS Paragon Plus Environment

Page 34 of 40

Page 35 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

data by neural networks could be an attractive method with great predictive power. Our results suggest that the use of the convolutional neural networks, at least for short trajectories represented by 2D tensors of ligand-protein interaction descriptors, is preferred to the recurrent networks. Due to relatively low computational complexity, the proposed method can be used as an advanced virtual screening filter. However, further investigation is required in order to expand the training set as well as to identify the optimal MD simulation conditions, network architectures, and descriptors providing more effective models.

Supporting Information Supporting Information includes the following files: (1) CSV file consisting of the PDB codes along with data on resolution and ligand affinity for the training and test sets, (2) Python files containing source code needed to build neural networks based on a MD trajectory, (3) Python file implementing the Interaction Entropy approach.

Corresponding Author Vladimir A. Palyulin: [email protected]

Funding Sources This study was supported by the Russian Foundation for Basic Research (project no. 18-51580028), the Department of Science and Technology (DST) of the Government of India (project no. DST/IMRCD/BRICS/PilotCall2/CCT/2018-G), and the National Research Foundation (NRF) of South Africa (project no. 116014) under the BRICS STI cooperation program (project no. BRICS2017-236). The author VPB thanks the Russian Foundation for Basic Research for the award of the young researcher mobility grant (project no. 19-33-

34

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

50051). The author RS thanks CSIR-HRDG for the award of CSIR-SRAship (13(8906A)/2017-pool).

Conflict of Interest The authors declare that there is no conflict of interest.

References (1) De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035–4061. (2) Heifetz, A.; James, T.; Morao, I.; Bodkin, M. J.; Biggin, P. C. Guiding Lead Optimization with GPCR Structure Modeling and Molecular Dynamics. Curr. Opin. Pharmacol. 2016, 30, 14–21. (3) Buch, I.; Giorgino, T.; De Fabritiis, G. Complete Reconstruction of an Enzyme-Inhibitor Binding Process by Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 10184–10189. (4) Decherchi, S.; Berteotti, A.; Bottegoni, G.; Rocchia, W.; Cavalli, A. The Ligand Binding Mechanism to Purine Nucleoside Phosphorylase Elucidated via Molecular Dynamics and Machine Learning. Nat. Commun. 2015, 6, 6155. (5) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300– 313. (6) Zwanzig, R. W. High‐Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420–1426. (7) Brotzakis, Z. F.; Limongelli, V.; Parrinello, M. Accelerating the Calculation of ProteinLigand Binding Free Energy and Residence Times Using Dynamically Optimized Collective Variables. J. Chem. Theory Comput. 2019, 15, 743-750. (8) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA Methods to Estimate LigandBinding Affinities. Expert Opin. Drug Discov. 2015, 10, 449–461. (9) Wang, C.; Greene, D.; Xiao, L.; Qi, R.; Luo, R. Recent Developments and Applications of the MMPBSA Method. Front Mol Biosci 2018, 4, 87. (10) Hansson, T.; Marelius, J.; Aqvist, J. Ligand Binding Affinity Prediction by Linear Interaction Energy Methods. J. Comput. Aided Mol. Des. 1998, 12, 27–35. (11) Gutiérrez-de-Terán, H.; Aqvist, J. Linear Interaction Energy: Method and Applications in Drug Design. Methods Mol. Biol. 2012, 819, 305–323. (12) Schuetz, D. A.; Bernetti, M.; Bertazzo, M.; Musil, D.; Eggenweiler, H.-M.; Recanatini, M.; Masetti, M.; Ecker, G. F.; Cavalli, A. Predicting Residence Time and Drug Unbinding Pathway through Scaled Molecular Dynamics. J. Chem. Inf. Model. 2019, 59, 535–549. (13) Genheden, S.; Nilsson, I.; Ryde, U. Binding Affinities of Factor Xa Inhibitors Estimated by Thermodynamic Integration and MM/GBSA. J. Chem. Inf. Model. 2011, 51, 947–958. (14) Hopfinger, A. J.; Wang, S.; Tokarski, J. S.; Jin, B.; Albuquerque, M.; Madhav, P. J.; Duraiswami, C. Construction of 3D-QSAR Models Using the 4D-QSAR Analysis Formalism. J. Am. Chem. Soc. 1997, 119, 10509–10524. (15) Mittal, S.; Shukla, D. Recruiting Machine Learning Methods for Molecular Simulations of Proteins. Mol. Simul. 2018, 44, 891–904. (16) Ash, J.; Fourches, D. Characterizing the Chemical Space of ERK2 Kinase Inhibitors Using Descriptors Computed from Molecular Dynamics Trajectories. J. Chem. Inf. 35

ACS Paragon Plus Environment

Page 36 of 40

Page 37 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Model. 2017, 57, 1286–1299. Riniker, S. Molecular Dynamics Fingerprints (MDFP): Machine Learning from MD Data To Predict Free-Energy Differences. J. Chem. Inf. Model. 2017, 57, 726–741. (18) Öztürk, H.; Özgür, A.; Ozkirimli, E. DeepDTA: Deep Drug-Target Binding Affinity Prediction. Bioinformatics 2018, 34, i821–i829. (19) Jiménez, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein–Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks. J. Chem. Inf. Model. 2018, 58, 287–296. (20) Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942–957. (21) Berishvili, V. P.; Voronkov, A. E.; Radchenko, E. V.; Palyulin, V. A. Machine Learning Classification Models to Improve the Docking-Based Screening: A Case of PI3KTankyrase Inhibitors. Mol. Inform. 2018, 37, e1800030. (22) Matsugu, M.; Mori, K.; Mitari, Y.; Kaneda, Y. Subject Independent Facial Expression Recognition with Robust Face Detection Using a Convolutional Neural Network. Neural Netw. 2003, 16, 555–559. (23) Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. (24) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need, 2017, arxiv:1706.03762. (25) Liu, Z.; Su, M.; Han, L.; Liu, J.; Yang, Q.; Li, Y.; Wang, R. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions. Acc. Chem. Res. 2017, 50, 302–309. (26) Wu, S.; Wang, S.; Zheng, S.; Verhaak, R.; Koul, D.; Yung, W. K. A. MSK1-Mediated β-Catenin Phosphorylation Confers Resistance to PI3K/mTOR Inhibitors in Glioblastoma. Mol. Cancer Ther. 2016, 15, 1656–1668. (27) Arqués, O.; Chicote I.; Puig, I.; Tenbaum, S. P.; Argilés, G.; Dienstmann, R.; Fernández, N.; Caratù, G.; Matito, J.; Silberschmidt, D.; Rodon, J.; Landolfi, S.; Prat, A.; Espín, E.; Charco, R.; Nuciforo, P.; Vivancos, A.; Shao, W.; Tabernero, J.; Palmer, H. G. Tankyrase Inhibition Blocks Wnt/β-Catenin Pathway and Reverts Resistance to PI3K and AKT Inhibitors in the Treatment of Colorectal Cancer. Clin. Cancer Res. 2016, 22, 644–656. (28) Stamos, J. L.; Weis, W. I. The -Catenin Destruction Complex. Cold Spring Harb. Perspect. Biol. 2012, 5, a007898–a007898. (29) Solberg, N. T.; Waaler, J.; Lund, K.; Mygland, L.; Olsen, P. A.; Krauss, S. TANKYRASE Inhibition Enhances the Antiproliferative Effect of PI3K and EGFR Inhibition, Mutually Affecting β-CATENIN and AKT Signaling in Colorectal Cancer. Mol. Cancer Res. 2018, 16, 543–553. (30) Canzar, S.; El-Kebir, M.; Pool, R.; Elbassioni, K.; Mark, A. E.; Geerke, D. P.; Stougie, L.; Klau, G. W. Charge Group Partitioning in Biomolecular Simulation. J. Comput. Biol. 2013, 20, 188–198. (31) Berman, H. M.; Battistuz, T.; Bhat, T. N.; Bluhm, W. F.; Bourne, P. E.; Burkhardt, K.; Feng, Z.; Gilliland, G. L.; Iype, L.; Jain, S.; Fagan, P.; Marvin, J.; Padilla, D.; Ravichandran, V.; Schneider, B.; Thanki, N.; Weissig, H.; Westbrook, J. D.; Zardecki, C. The Protein Data Bank. Acta Crystallogr. D Biol. Crystallogr. 2002, 58, 899–907. (32) Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A Web-Accessible Database of Experimentally Determined Protein-Ligand Binding Affinities. Nucleic Acids Res. 2007, 35, D198–D201. (33) Gaieb, Z.; Liu, S.; Gathiaka, S.; Chiu, M.; Yang, H.; Shao, C.; Feher, V. A.; Walters, W. P.; Kuhn, B.; Rudolph, M. G.; Burley, S. K.; Gilson, M. K.; Amaro, R. E. D3R Grand Challenge 2: Blind Prediction of Protein-Ligand Poses, Affinity Rankings, and Relative Binding Free Energies. J. Comput. Aided Mol. Des. 2018, 32, 1–20. (34) Webb, B.; Sali, A. Comparative Protein Structure Modeling Using MODELLER. Curr. Protoc. Bioinformatics 2016, 54, 5.6.1–5.6.37. (35) Shapovalov, M. V.; Dunbrack, R. L., Jr. A Smoothed Backbone-Dependent Rotamer (17)

36

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions. Structure 2011, 19, 844–858. (36) Case, D. A.; Cheatham, T. E., 3rd; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668–1688. (37) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. (38) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins 2006, 65, 712–725. (39) Sousa da Silva, A. W.; Vranken, W. F. ACPYPE - AnteChamber PYthon Parser interfacE. BMC Res. Notes 2012, 5, 367. (40) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 1-2, 19–25. (41) Kumari, R.; Kumar, R.; Open Source Drug Discovery Consortium; Lynn, A. G_mmpbsa--a GROMACS Tool for High-Throughput MM-PBSA Calculations. J. Chem. Inf. Model. 2014, 54, 1951–1962. (42) Duan, L.; Liu, X.; Zhang, J. Z. H. Interaction Entropy: A New Paradigm for Highly Efficient and Reliable Computation of Protein-Ligand Binding Free Energy. J. Am. Chem. Soc. 2016, 138, 5722–5728. (43) Wang, C.; Nguyen, P. H.; Pham, K.; Huynh, D.; Le, T.-B. N.; Wang, H.; Ren, P.; Luo, R. Calculating Protein-Ligand Binding Affinities with MMPBSA: Method and Error Analysis. J. Comput. Chem. 2016, 37, 2436–2446. (44) Chupakhin, V.; Marcou, G.; Gaspar, H.; Varnek, A. Simple Ligand–Receptor Interaction Descriptor (SILIRID) for Alignment-Free Binding Site Comparison. Comput. Struct. Biotechnol. J. 2014, 10, 33–37. (45) Weisel, M.; Proschak, E.; Schneider, G. PocketPicker: Analysis of Ligand BindingSites with Shape Descriptors. Chem. Cent. J. 2007, 1, 7. (46) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. J. Comput. Chem. 2010, 31, 455–461. (47) Koes, D. R.; Baumgartner, M. P.; Camacho, C. J. Lessons Learned in Empirical Scoring with Smina from the CSAR 2011 Benchmarking Exercise. J. Chem. Inf. Model. 2013, 53, 1893–1904. (48) Abraham, A.; Pedregosa, F.; Eickenberg, M.; Gervais, P.; Mueller, A.; Kossaifi, J.; Gramfort, A.; Thirion, B.; Varoquaux, G. Machine Learning for Neuroimaging with ScikitLearn. Front. Neuroinform. 2014, 8, 14. (49) Paszke, A.; Gross, S.; Lerer, A. Automatic differentiation in PyTorch. 2017. (50) Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp 1746–1751. (51) Polishchuk, P. Interpretation of Quantitative Structure-Activity Relationship Models: Past, Present, and Future. J. Chem. Inf. Model. 2017, 57, 2618–2639. (52) Sushko, I.; Novotarskyi, S.; Körner, R.; Pandey, A. K.; Cherkasov, A.; Li, J.; Gramatica, P.; Hansen, K.; Schroeter, T.; Müller, K. R; Xi, L.; Liu, H.; Yao, X.; Öberg, T.; Hormozdiari, F.; Dao, P.; Sahinalp, C.; Todeschini, R.; Polishchuk, P.; Artemenko, A.; Kuz'min, V.; Martin, T. M.; Young, D. M.; Fourches, D.; Muratov, E.; Tropsha, A.; Baskin, I.; Horvath, D.; Marcou, G.; Muller, C.; Varnek, A.; Prokopenko, V. V.; Tetko I. V. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set. J. Chem. Inf. Model. 2010, 50, 2094–2111. (53) Sheridan, R. P.; Feuston, B. P.; Maiorov, V. N.; Kearsley, S. K. Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR. J. Chem. Inf. Comput. Sci. 2004, 44, 1912–1928. (54) Blasco, J.; Fueyo, N.; Dopazo, C.; Chen, J.-Y. A Self-Organizing-Map Approach to 37

ACS Paragon Plus Environment

Page 38 of 40

Page 39 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Chemistry Representation in Combustion Applications. Combust. Theory Model. 2000, 4, 61–76. (55) Landrum, G. RDKit: Open-Source Cheminformatics Software. Retrieved May 25, 2019, http://www.rdkit.org. (56) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 2001, 46, 3–26. (57) Sander, T.; Freyss, J.; von Korff, M.; Rufener, C. DataWarrior: An Open-Source Program for Chemistry Aware Data Visualization and Analysis. J. Chem. Inf. Model. 2015, 55, 460–473. (58) Wingert, B. M.; Oerlemans, R.; Camacho, C. J. Optimal Affinity Ranking for Automated Virtual Screening Validated in Prospective D3R Grand Challenges. J. Comput. Aided Mol. Des. 2018, 32, 287–297. (59) Scior, T.; Bender, A.; Tresadern, G.; Medina-Franco, J. L.; Martínez-Mayorga, K.; Langer, T.; Cuanalo-Contreras, K.; Agrafiotis, D. K. Recognizing Pitfalls in Virtual Screening: A Critical Review. J. Chem. Inf. Model. 2012, 52, 867–881. (60) Strobl, C.; Boulesteix, A.-L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional Variable Importance for Random Forests. BMC Bioinformatics 2008, 9, 307. (61) Cooke, B.; Schmidler, S. C. Statistical Prediction and Molecular Dynamics Simulation. Biophys. J. 2008, 95, 4497–4511. (62) Duan, L. L.; Feng, G. Q.; Zhang, Q. G. Large-Scale Molecular Dynamics Simulation: Effect of Polarization on Thrombin-Ligand Binding Energy. Sci. Rep. 2016, 6, 31488. (63) Gramatica, P. On the Development and Validation of QSAR Models. Methods Mol. Biol. 2013, 930, 499–526. (64) García-Sosa, A. T.; Firth-Clark, S.; Mancera, R. L. Including Tightly-Bound Water Molecules in de Novo Drug Design. Exemplification through the in Silico Generation of poly(ADP-Ribose)polymerase Ligands. J. Chem. Inf. Model. 2005, 45, 624–633. (65) Michel, J.; Tirado-Rives, J.; Jorgensen, W. L. Prediction of the Water Content in Protein Binding Sites. J. Phys. Chem. B 2009, 113, 13337–13346. (66) Lemmon, G.; Meiler, J. Towards Ligand Docking Including Explicit Interface Water Molecules. PLoS One 2013, 8, e67536. (67) Forli, S.; Olson, A. J. A Force Field with Discrete Displaceable Waters and Desolvation Entropy for Hydrated Ligand Docking. J. Med. Chem. 2012, 55, 623–638. (68) Forli, S.; Huey, R.; Pique, M. E.; Sanner, M. F.; Goodsell, D. S.; Olson, A. J. Computational Protein-Ligand Docking and Virtual Drug Screening with the AutoDock Suite. Nat. Protoc. 2016, 11, 905–919. (69) Miller, J.; Hardt, M. Stable Recurrent Models, 2019, arXiv:1805.10369. (70) Zhou, Y.-W.; Liu, J.-L.; Yu, Z.-G.; Zhao, Z.-Q.; Anh, V. Fractal and Complex Network Analyses of Protein Molecular Dynamics. Physica A: Statistical Mechanics and its Applications 2014, 416, 21–32. (71) Gaillard, T. Evaluation of AutoDock and AutoDock Vina on the CASF-2013 Benchmark. J. Chem. Inf. Model. 2018, 58, 1697-1706. (72) Kalliokoski, T.; Kramer, C.; Vulpetti, A.; Gedeck, P. Comparability of Mixed IC₅₀ Data - a Statistical Analysis. PLoS One 2013, 8, e61007. (73) Karlov, D. S.; Lavrov, M. I.; Palyulin, V. A.; Zefirov, N. S. MM-GBSA and MM-PBSA Performance in Activity Evaluation of AMPA Receptor Positive Allosteric Modulators. J. Biomol. Struct. Dyn. 2018, 36, 2508–2516. (74) Cortés-Ciriano, I.; Bender, A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J. Chem. Inf. Model. 2019, 59, 1269-1281.

38

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

39

ACS Paragon Plus Environment

Page 40 of 40