Common Hits Approach: Combining Pharmacophore Modeling and

Jan 10, 2017 - Common Hits Approach: Combining Pharmacophore Modeling and Molecular Dynamics Simulations ... The approach uses the multiple coordinate...
3 downloads 13 Views 4MB Size
Article pubs.acs.org/jcim

Common Hits Approach: Combining Pharmacophore Modeling and Molecular Dynamics Simulations Marcus Wieder,*,†,‡ Arthur Garon,† Ugo Perricone,†,¶ Stefan Boresch,‡ Thomas Seidel,† Anna Maria Almerico,¶ and Thierry Langer† †

Faculty of Life Sciences, Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090 Vienna, Austria Faculty of Chemistry, Department of Computational Biological Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria ¶ Department of Biological, Chemical and Pharmaceutical Sciences and Technologies (STEBICEF), University of Palermo, Via Archirafi 32, Palermo, Italy ‡

S Supporting Information *

ABSTRACT: We present a new approach that incorporates flexibility based on extensive MD simulations of protein−ligand complexes into structure-based pharmacophore modeling and virtual screening. The approach uses the multiple coordinate sets saved during the MD simulations and generates for each frame a pharmacophore model. Pharmacophore models with the same pharmacophore features are pooled. In this way the high number of pharmacophore models that results from the MD simulation is reduced to only a few hundred representative pharmacophore models. Virtual screening runs are performed with every representative pharmacophore model; the screening results are combined and rescored to generate a single hit-list. The score for a particular molecule is calculated based on the number of representative pharmacophore models which classified it as active. Hence, the method is called common hits approach (CHA). The steps between the MD simulation and the final hit-list are performed automatically and without user interaction. We test the performance of CHA for virtual screening using screening databases with active and inactive compounds for 40 protein−ligand systems. The results of the CHA are compared to the (i) median screening performance of all representative pharmacophore models of protein−ligand systems, as well as to the virtual screening performance of (ii) a random classifier, (iii) the pharmacophore model derived from the experimental structure in the PDB, and (iv) the representative pharmacophore model appearing most frequently during the MD simulation. For the 34 (out of 40) protein−ligand complexes, for which at least one of the approaches was able to perform better than a random classifier, the highest enrichment was achieved using CHA in 68% of the cases, compared to 12% for the PDB pharmacophore model and 20% for the representative pharmacophore model appearing most frequently. The availabilithy of diverse sets of different pharmacophore models is utilized to analyze some additional questions of interest in 3D pharmacophore-based virtual screening.



proposed by Koshland,4 and (iii) the conformational selection and population shift hypothesis proposed by Changeux and colleagues5 (different conformations involved in biomolecular recognition exist spontaneously in the absence of the regulatory ligand and the distribution of the protein conformations changes as a result of ligand binding). In the past decade it became clear that the question of molecular recognition and biological function is tightly bound to the dynamic behavior of the macromolecule. Proteins are inherently flexible systems sampling a large ensemble of

INTRODUCTION Highly specific interactions between biological macromolecules and their small molecule ligands are of fundamental importance in all domains of life. An understanding of biomolecular recognition and how the ligand/receptor interaction triggers or blocks a biological response is at the heart of every reductionist view on molecular biology.1,2 Over time, several theories with different levels of complexity for molecular recognition were established: (i) the “lock-andkey” model (the static modelconformations of the free and ligand-bound protein are essentially the same) developed by Fischer,3 (ii) the “induced-fit” hypothesis (ligands induce the receptor to adopt the conformation best suited for binding) © 2017 American Chemical Society

Received: November 2, 2016 Published: January 10, 2017 365

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling interconverting conformations by structure fluctuation over a wide range of time scales, from nanoseconds to minutes. The resulting network of motions drives biomolecular recognition and is essential for protein function and regulation.6 While the development of X-ray crystallography marked a revolution in the understanding of biology and biochemistry, it also led to a distorting emphasis on a single, static coordinate set for understanding protein function and protein ligand interaction. A more complete description, therefore, requires considering the probabilities of the conformational states (thermodynamics) and the energy barriers between them (kinetics)both can be obtained by studying the time-dependent behavior of protein motion.7,8 These insights led to the general agreement that the lockand-key model is too simplistic and, therefore, should be replaced by approaches with a higher degree of complexity that take into account protein/ligand dynamics to a greater extent.2,9,10 Kinetic studies provide evidence that the conformational selection method should be favored over the induced fit approach; that is, only an ensemble of structures can represent and describe the interaction between a protein and a ligand properly.11−13 These findings have been recognized in different ways and to different degrees in the computer-aided molecular design field. Structure-based virtual screening tools aim to identify novel molecules as starting points for discovering drug candidates among thousands and more of available molecular structures solely on the basis of the complementarity of their steric and/or electrostatic properties to the target structure. These tools are nowadays routinely used for in-silico hit discovery, hit to lead expansion, and lead optimization.14 Broadly speaking, two structure-based virtual screening approaches can be distinguished: (i) molecular docking, which aims to predict the ligand conformation, as well as its position and orientation in the binding site and assess the binding affinity using scoring functions,15 and (ii) pharmacophore-based modeling and virtual screening, which first analyzes the interactions between a bound ligand and the protein environment with the aim to construct a 3D model of the observed protein−ligand interactions and then uses this model as an efficient filter in a subsequent screening run.16 Both methods have different advantages and disadvantages with regard to virtual screening as discussed in depth in the literature.17−19 Although protein flexibility and dynamics have been widely regarded as an important issue, most structure-based drug design and molecular modeling techniques operate only on the single coordinate set resulting from X-ray crystal structures stored in the RCSB protein data bank (PDB);15,20,21 in general this approach foreseeably limits the success rate of early drug discovery campaigns.14 Molecular docking approaches that incorporate some form of protein flexibility can be distinguished into (i) methods that use a single protein conformation and (ii) methods that use grid or structure ensembles.22 The former methods comprise soft docking (which reduces the steric component of the scoring function to simulate protein flexibility),23 relaxation methods (refinement of the docked complex using Monte Carlo (MC) or molecular dynamics (MD) simulations)24,25 and docking with stable protein backbone but flexible side chains (e.g., ref 26.). Ensemble docking uses multiple conformations that were generated prior to the actual docking process. These conformations are either obtained experimentally27 or calculated, for example, by MD simulations (e.g., ref 28) or elastic

network models (e.g., ref 29). Ensemble docking can either be performed by grid-based or structure-based procedures.22,30 While molecular docking has seen rapid and increasing development in the field of receptor flexibility, the progress in pharmacophore-based virtual screening with explicit protein flexibility, so far, has been slow. Nevertheless, there have been attempts in recent years to integrate protein flexibility into pharmacophore models. For example, in one such approach, multicomplex pharmacophore models are derived from multiple crystal structures of the same protein in contact with different small molecules. The interactions present for each ligand are identified and merged in pharmacophore maps.31−33 This approach, however, is limited to protein/ligand complexes for which multiple crystal structures are available and each ligand has the same binding mode. Also, pharmacophore-based virtual screening approaches using ensemble structures obtained from MD simulations (dynamic pharmacophore modeling) as opposed to crystal structures have been developed. Most recently, Choudhury et al. derived pharmacophore models for each structure saved during MD simulations, which were then ranked based on docking and screening results.34 While avoiding ambiguities associated with the clustering of structures, this approach requires activity data for validating/ranking the pharmacophore models. Other methods rely on clustering or averaging of the protein−ligand trajectory.35−39 These methods discard the majority of dynamic information obtained during the MD simulation in favor of a handful of presumably representative coordinate sets or features. The approaches rely on the ability of the MD simulation to represent configuration space and the capability of the human operator to select the most representative conformation or cluster.40,41 Furthermore, even in the case of a “perfect” MD simulation, which samples all relevant regions of phase space, and representative clustering, the problem remains that rare or metastable protein conformations may be responsible for ligand binding, and, therefore, clustering constitutes the wrong tool for improving virtual screening results.42,43 In a recent publication we reported the stability of individual pharmacophore features during MD simulations.44 A merged pharmacophore model was constructed consisting of all features observed during the MD simulations, and their frequency was used to prioritize features in the model. This approach, while it gave interesting insight into pharmacophore model dynamics, led to problems in its application to pharmacophore-based virtual screening. In several cases the combination of features from different protein conformations resulted in a consensus model that was not experimentally accessible; that is, the combination of particular features (e.g., all the features that appeared more than 50% of the time during the MD simulation) was never present simultaneously. Also, selecting specific 3D coordinates for the same features at different time steps of the MD simulation eventually led to substantial difficulties, analogous the averaging over Cartesian coordinates. It was not possible to find general solutions for these problems without relying on human intuition. Thus, all methods developed so far for integrating protein flexibility in pharmacophore models have shortcomings, and their assumptions can be questioned at different levels. The crucial, difficult step is the weighting of states (or pharmacophore models) in some form, or phrased differently, how to reduce the number of conformations to consider. 366

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

pharmacophore model picked from a set of pharmacophore models that have the same feature vector. A RPM should be distinguished from a regular (3D) pharmacophore model; the latter is obtained in the usual manner, for example, from the experimental crystal structure or a particular set of coordinates saved during a MD simulation. Utilizing RPMs reduces the amount of raw data dramatically and makes the approach computationally feasible. Nevertheless, one typically finds several hundred RPMs for a particular protein−ligand system. As described in detail below, we tested our approach for 40 systems, which were selected, among other criteria, based on the availability of screening libraries composed of known active and calculated inactive molecules from the DUD-E database.46 When performing virtual screening runs for each RPM, one, thus, obtains on average several hundred hit-lists for each system. However, instead of trying to pick a posteriori the best performing pharmacophore model, the multiple hit-lists provide background data that make it possible to gauge the performance of several approaches to pharmacophore based virtual screening. Any such method must pass two tests: (i) it must select or construct a hit-list that is significantly enriched with true active compounds for the majority of the investigated test systems. Furthermore, (ii) the hit-lists should perform better than the median screening results obtained with the RPMs for the majority of the systems analyzed. This second test indicates whether a given approach performs better than a randomly selected RPM would. In particular, we test three approaches against the RPMbased screening results. First, as a baseline we subject the “default” approach to this analysis; that is, we investigate the performance of the “usual” 3D pharmacophore model obtained from the experimental structure of the protein−ligand complex. Second, we scrutinize the performance of the RPM based on the feature vector that appeared most frequently during the MD simulations. Finally, we introduce and test the so-called common hits approach (CHA). This method utilizes the hitlists of all RPMs and uses them to rerank the hits found in the individual virtual screenings. The more often a compound is identified as a possible hit in the individual screenings, the higher its “common-hit” rank. The data reduction based on feature vectors and RPMs, combined with the CHA, is an automated procedure, which on the one hand, maintains much of the dynamical information gained during the MD simulations, and on the other hand, does not require manual intervention/selection, as was, for example, the case in our earlier merged pharmacophore approach. Additionally, the approach just outlined leads to the availability of a diverse set of different pharmacophore models for multiple systems. This made it possible to investigate two further questions, which are presented after the comparison of the three approaches. (i) Is the virtual screening performance of the RPMs correlated with the frequency of their appearance during the MD simulation? (ii) Is the number of pharmacophore features correlated with the number of active/inactive compounds that a pharmacophore model retrieves? While the first question is rather specific to our approach, the latter is of general interest in the field of 3D pharmacophore-based virtual screening.

An easy and obvious solution for this problem is to develop a method that can use all coordinate sets obtained during the MD simulation without clustering or the need of reducing the number of structures. Such an approach represents the most comprehensive alternative to selecting representative or average structures for pharmacophore-based and docking-based virtual screening.45 But it is also computationally expensive: if N coordinate sets are saved and M molecules are investigated the problem scales as N × M. For molecular docking the number of virtual screening trials is, in principle, equal to the number of obtained coordinate sets. There is no obvious way to reduce this number since it is not possible to detect a priori which protein structure will give good results in virtual screening. From a virtual screening point of view, every coordinate set that results in differently ranked molecule lists is information that could be potentially important. How to combine the information obtained is, of course, a different question. In pharmacophore modeling, the number of pharmacophore models (one pharmacophore model for each coordinate set) is equal to the number of virtual screening runs. The same argument as for molecular docking applies: from a virtual screening point of view every model that can result in new information is important. However, structure-based pharmacophore models have by far less variability, since pharmacophore feature space is very limited compared to configuration space. If all pharmacophore vector features are regarded as sphere features this space loses again some of its complexity; for example, a pharmacophore model with five features can be fully described by 5 × 3 coordinate sets of the spheres and 5 × 1 feature descriptions (i.e., the type of pharmacophore features): this 5 × 1 vector is subsequently called feature vector. This can be reduced further: since all features are mapped on ligand atoms and the ligand atoms can visit only a very limited number of conformations, it can be argued that the real variability in the screening results will not originate from the same feature vector mapped on different ligand coordinates, but instead from different feature vectors. This line of argument is strengthened by the fact that pharmacophore features have normally considerable tolerance radii (i.e., small variation in coordinates are compensated) and that structural conformations are generated and fitted during the screening. Thus, reducing pharmacophore models with the same feature vector to only one, as we refer to it, representative pharmacophore model (RPM) should not heavily impact virtual screening results. The time and computational cost savings of reducing the number of pharmacophore models, on the other hand, are substantial. The limitations of existing approaches to incorporate flexibility into pharmacophore-based virtual screening, combined with the considerations just made, led to the approach presented in this work. Molecular dynamics simulations are used to obtain an ensemble of structures, for each of which a 3D pharmacophore model is generated. These are pooled based on their feature vector; that is, all pharmacophore models having the same feature vector are considered as belonging to the same group. Then, a single 3D structure is chosen; together with the feature vector this becomes the RPM. All technical details of these and later steps are described in the Methods section. Throughout the remainder of this manuscript, it is important to differentiate between the following two concepts: a feature vector is a reduced representation of pharmacophore models with information about the constituting features but without 3D coordinates. A RPM is a fully fledged



METHODS System Preparation. For this study 43 different protein− ligand complexes were selected from the RCSB PDB (for more details on the different systems see Table S1 in Supporting

367

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Figure 1. Generation of distinct feature vectors: four pharmacophore models obtained at different time steps during the MD simulation are reduced to four feature vectors. The four entries in the feature vector correspond to the four different pharmacophore features (H1, H2, HBD1, HBA1). The yellow spheres indicate hydrophobic features (H1 and H2), the green small spheres show hydrogen bond donors (HBD1) and the small red spheres show hydrogen bond acceptors (HBA1). In this example there are only three distinct feature vectors since feature vector 1 and feature vector 4 have the same combination of pharmacophore features.

Molecular Dynamics Simulation. The CHARMM-GUI web interface51 was used to solvate the protein−ligand complex and set up the simulations. MD simulations were carried out using CHARMM,52 utilizing the CHARMM/OpenMM coupling.53,54 Parameters and molecular topologies for the ligands were generated based on the CGenFF force field55 using the ParamChem server (cgenff.paramchem.org). The protein− ligand complexes were solvated in cubic boxes of TIP3P water.56 Ions were added to compensate any net charge of the solute and to set the ion concentration to 0.15 M KCl. The box sizes, number of atoms, and number of water molecules for each system can be found in Table S1 in Supporting Information. Electrostatic interactions were computed using the particle-mesh-Ewald method.57 SHAKE was used to keep all bonds involving hydrogen atoms rigid. After initial equilibration for 500 ps with a 1 fs time step, each system was simulated for 20 ns using Langevin dynamics at 303.15 K; the pressure was kept around 1 atm by a Monte Carlo barostat. The time step of the production calculation was 2 fs; coordinates were saved every 10 ps, resulting in 2000 coordinate sets per simulation. For every system 10 MD simulations were carried out, starting from the same initial coordinates, but different, randomly seeded velocities. This resulted in 20000 coordinate sets per system. The stability of

Information. The choice of the complexes was guided by the following considerations: size, only a single ligand, no metal ions involved in the binding pocket, and a set of active and inactive molecules available in the DUD-E database.46 The quality and correctness of the PDB structures was audited using the WHAT IF web interface to WHAT CHECK47 and vhelibs.48 The software MODELER 9.15 was used to add coordinates of missing amino acids.49 The protonation state of the protein and the ligand was assigned under the assumption of a solution pH of 7.2 with the software PropKa 3.1.50,51 The tautomeric state of the ligand was set in accordance with the isomeric SMILES string provided by the RCSB database (the structures of the ligands are displayed in Figure S1 of Supporting Information). For three protein−ligand systems, no pharmacophore model derived from the experimental structure led to meaningful results during virtual screening. Hence, in these cases additional simulations were carried out starting from PDB structures of the protein in complex with a different ligand. Specifically, 1XL2 was exchanged for 3TOF, 1MV9 was exchanged for 4RMD, and 3BQD was exchanged for 4P6W. While Table S1 provides details for all 43 simulations carried out, 1XL2, 1MV9, and 3BQD were excluded from further analysis; that is, the results and comparisons thereafter are based on 40 systems. 368

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Figure 2. Workflow from distinct feature vector to representative pharmacophore model (RPM): The conformational energy of all ligand coordinates having the same distinct feature vector and appearing more than once during the MD simulations is computed with the MMFF94 force field (cf. main text). The coordinate set having the median energy is identified and chosen to build the RPM corresponding to this distinct particular feature vector.

Pharmacophore Generation. LigandScout 4.09.1 was used to generate a structure-based pharmacophore model59 for each frame saved during the MD simulations; that is, 20 000

the simulations was monitored by computing root-mean-square deviations for the protein and ligand, using the MDAnalysis package,58 as well as visual inspection of the trajectories. 369

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

From now on, we use the term representative pharmacophore model (RPM) to denote the combination of distinct feature vector and representative 3D structural information. It is important not to confuse the term “pharmacophore model”, that is, a particular model generated by LigandScout for some specific coordinate set, with the concepts of “distinct feature vector” and RPM just introduced. Virtual Screening and Postprocessing. The screening databases for the different protein−ligand systems were generated using the known active and calculated inactive molecules (decoys) provided by the DUD-E database for the different protein−ligand systems. The molecules were prepared as libraries for virtual screening using the LigandScout command line tool idbgen. Conformers were generated using the icon best option in idbgen; this option produces a maximum number of 200 conformations for each molecule processed. The number of unique active and unique decoy molecules are shown in Table S2 in Supporting Information. Molecules were deleted from the screening database either because the conformation calculation failed or LigandScout regarded a molecule as a duplicate. To identify molecules in different hit-lists, unique numbers were assigned to every molecule in the screening database. These identifier (ID) numbers were introduced because molecule names are often not uniquefor many active molecules there are tautomeric structures in the DUD-E database, which have the same molecule name. To avoid a bias toward particular molecules, every tautomer was treated as a separate molecule. Each RPM found for a protein−ligand complex was used for virtual screening of the DUD-E derived databases for this system. The same was done with the PDB pharmacophore model. Thus, for each system multiple hit-lists were obtained. The CHA was used to merge and utilize the information contained in the results of the individual screening runs. CHA operates on the multiple hit-lists and combines them into a merged hit-list consisting only of unique compounds. The molecules in this consensus list are ranked according to the number of hit-lists the compound appeared in; for example, if a molecule is present in many hit-lists it is ranked higher than a molecule that appears just in few hit-lists. These steps are illustrated in Figure 3. If two molecules happen to appear in the same number of hit-lists, the ordering is decided based on the sum of the scaled pharmacophore hit scores. Virtual screening returns a ranked list (hit-list) of compounds (hits); a successful screening ranks active molecules higher than inactive ones. To evaluate the performance of pharmacophore models as a useful classifier for distinguishing between active and inactive molecules, the receiver−operator characteristic (ROC) graph and the area under the receiver−operator characteristics graph (ROC-AUC) were calculated.62 The ROC graph plots the true positive rate (TPR) against the false positive rate (FPR) as described in ref 63 and is an established method to judge the performance of a binary classifier. The TPR is the ratio between the number of true positive hits in the hit-list and in the screening database, the FPR is the ratio between the number of false positive hits in the hit-list and in the screening database. For an individual AUC calculation of the ROC graph the possible values lie within the interval [0,1]. A ROC-AUC value of 0.5 represents the performance of a random classifier, which cannot distinguish between active and inactive compounds; a ROC-AUC value less than 0.5 indicates that the classifier favors inactive molecules, and a ROC-AUC value greater than 0.5

pharmacophore models for every protein−ligand system were generated. Waters were discarded during this step. The default feature constraints for the generation of the pharmacophore models were used as described in the manual (http://www. inteligand.com/ligandscout/manual/). Hydrogen bond interactions were transformed from vector features to point features; for example, a potential hydrogen bond acceptor/donor interaction is represented only by a spherical region on the ligand instead of a specific direction involving also the protein. In addition, a pharmacophore model was generated for the PDB structure after preprocessing it, as described above in the System preparation section. We point this out as in some cases, the resulting pharmacophore model was slightly different to the pharmacophore model that would originate from the raw coordinate set obtained from the RCSB PDB (see also Results). The pharmacophore model generated from the (preprocessed) protein and ligand crystal structure will be referred to as “PDB pharmacophore”. Feature Vector and Representative Pharmacophore Model Generation. The pharmacophore models derived from the coordinates saved during the MD simulations, plus the PDB pharmacophore model, were processed as follows. For each pharmacophore model a feature vector (represented as a bit string) was generated. Every dimension of the feature vector represents a unique pharmacophore feature of the system. Pharmacophore features are considered unique if they differ in either pharmacophore feature type or the ligand atoms involved. The entries of the feature vectors were initialized with zero; if one of the unique features appears in a particular pharmacophore model, the value of the corresponding entry was set to 1. This procedure results in 20 001 feature vectors per protein−ligand complex; 20 000 pharmacophore models obtained from the MD simulation plus the PDB pharmacophore. Next, the feature vectors were reduced to distinct feature vectors as follows (see Figure 1 for an illustration of the process): We counted how often a particular feature vector, that is, a specific combination of pharmacophore features, was realized during the MD simulations. Subsequently, this number will be referred to as appearance count. Thus, instead of using 20 000 individual feature vectors, we obtain a smaller number of distinct feature vectors, which were observed one or more times during the MD simulations. Distinct feature vectors observed just once were discarded (and considered as random artifacts); that is, only distinct feature vectors with an appearance count ≥2 were used in the next step. The distinct feature vectors as just described, however, lack the structural information needed for virtual screening. In fact, they correspond to multiple 3D feature coordinate sets obtained during the course of the MD simulations. Therefore, the conformational energy of the ligand for all structures belonging to a distinct feature vector was calculated using the OpenBabel60 implementation of the MMFF94 force field.61 On the basis of the resulting distribution of energies, we chose the 3D pharmacophore structure corresponding to the ligand conformation with the median energy for virtual screening. This procedure is illustrated in Figure 2. The decision to use the conformational energy as selection criteria was guided by the intention to avoid extreme ligand conformations. To achieve the same goal one could also regard the conformational energy of the ligand combined with the protein, RMSD-based criteria, or more specific selections based on angles and distances of moieties at the ligand or protein. 370

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

last point in the ROC graph calculated from the actual hit-list is located at (FPR|TPR) = (0.092|0.1). Beyond this point the graph is extrapolated linearly to 1.0 on the x- and y-axis. This enables the calculation of the ROC-AUC value at, for example, 2%: the AUC is calculated when either the TPR or the FPR reaches 0.2 along the extrapolated ROC graph. This implies, however, that starting from the last hit in the hit-list, the ROCAUC value will converge to 0.5. With respect to the CHA, one should keep in mind that the merged hit-list is on average (significantly) longer than that returned by a single virtual screening run. The ROC graph was chosen out of many established metrics (enrichment factor, the sum of logarithmic ranks, RIE, BEDROC (e.g., ref 62)) because the method (i) compares the virtual screening performance of the pharmacophore models to the performance of a random classifier and (ii) because it has a published solution to deal with hit-lists that have different length, yet need to be compared to each other. Other metrics cannot easily distinguish between hit-lists with different lengtha good example to illustrate the problems that occur if hit-lists have different length (but use the same screening database) is the case that 1 pharmacophore model identifies 2 active molecules out of 2 total hits and another pharmacophore model returns 50 active molecules out of 51 total hits. Relying only on the enrichment factor as metric, the first pharmacophore model would be regarded superior to the second model, which is obviously wrong.

Figure 3. Illustration of the common hits approach (CHA). Step 1 refers to Figure 1 and Step 2 is illustrated in Figure 2. For every RPM a separate hit-list is obtained. These hit-lists are merged and a score is assigned to every molecule representing the number of hit-lists of which it was part. The higher the score, the better the molecule is ranked in the resulting hit-list.



RESULTS AND DISCUSSION Characteristics of Pharmacophore Models Generated during MD Simulations. For each of the 40 protein−ligand systems 20 000 pharmacophore models were generated. Converting these pharmacophore models to feature vectors and consequently distinct feature vectors (cf. Figure 1) enables the analysis of how many distinct feature vectors were realized during the MD simulation. For the 40 protein−ligand systems on average 300 different combinations of pharmacophore features (i.e., distinct feature vectors) were obtained during the course of the multiple MD simulations. These values range from a minimum of 11 to a maximum of 3317 (see Table 1). On average 62% of the distinct feature vectors were seen more than once during the MD simulations and were used to construct a RPM. A single distinct feature vector represents on average 18 pharmacophore models (average over median values). Figure 4 shows a bar plot of all feature vectors that appeared during the MD simulations for a protein−ligand system; for the other systems the overall trend was very similar. The height of a bar indicates the number of pharmacophore models represented by the distinct feature vector. Typically, a small number of distinct feature vectors represents most of the pharmacophore models that appeared during the MD simulation. When feature vectors are sorted according to the number of pharmacophore models they represent, as is done in Figure 4, then the first 10 usually encompass already more than 80% of all pharmacophore models. It should be noted that the distinct feature vector of the PDB pharmacophore coincides with the feature vector observed most frequently during the MD simulations in only nine cases. If sorted as in Figure 4, the average rank of the feature vector of the PDB pharmacophore is only 21. In other words, the PDB pharmacophore does not represent the most abundant distinct feature vector occurring

means that it favors active molecules. Obviously, only the latter case is useful for virtual screening. The ROC-AUC was calculated for 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the screening database.64 These seven ROCAUC values characterize the behavior of the ROC graph at the different percentages. To ease comparisons, we introduce the following two “averages”, defined as follows ⟨TE⟩ =

1 7

∑ AUCi ;

i = 1%, 2%, 5%, 10%, 20%, 50%, 100%

i

(1)

1 ⟨EE⟩ = 4

∑ AUCi ; i

i = 1%, 2%, 5%, 10% (2)

where TE stands for total enrichment and EE stands for early enrichment. A hit-list retrieved by LigandScout is a ranked subset of the screened libraries, and all compounds in the list are considered as positive hits. Thus, the TPR and FPR can only be obtained for a subset of the ROC graph. To solve this problem and still allow the calculation of the AUC despite partial data, we followed the solution given in Algorithm 1 in the work by Fawcett.63 The author proposes as a workaround to assume average TPR and FPR starting from the end of the hit-list. Consider the following example: a pharmacophore model retrieves 25 true positive and 275 false positive hits, that is, molecules wrongly classified as active compounds, from a screening database consisting of 250 active and 3000 inactive compounds. The actual ROC graph can only be constructed for the 300 hits. At this end point, the ROC graph has a TPR = 25/ 250 = 0.1 and a FPR = 275/3000 = 0.092. In other words, the 371

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

from all virtual screening runs. We start by an overview of the collected results for all approaches, summarized in Table 2. This is followed by the presentation of results/details specific for a particular approach. It is impossible to present all individual screening results, that is, the ROC-AUC values obtained for each of the individual RPMs of one system. Instead, in Table 2 we report for each protein−ligand complex the median ROC-AUC values for 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the screening database obtained with all RPMs (cf. Methods). RPMs that did not retrieve any hits (neither active nor inactive compounds in the hit-list) have ROC-AUC values of only zeros and were excluded from the calculation of the median. In addition, the analogous ROC-AUC values obtained using the PDB pharmacophore, the HRPM, and the CHA are listed in Table 2 as well. The table is complemented by Figure 5, which shows the distribution of the ROC-AUC values of the RPMs for 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the screening database for eight representative systems. Superposed on the box plots of Figure 5 are the median ROC-AUC values for all RPMS, the ROCAUC values at quartile 1 and quartile 3, as well as the ROCAUC values obtained using the PDB pharmacophore, the HRPM, and the CHA. In almost all of the following sections the median ROC-AUC values are used as reference against which we compare the performance of the PDB pharmacophore on the one hand, and the use of HRPM and CHA on the other hand. To avoid extensive repetition of text, we will use RPMmedian to abbreviate “median ROC-AUC values of the RPMs” and RPM3rdQuartile to abbreviate “ROC-AUC value of the RPMs at the third quartile”. Furthermore, the prefix TE (total enrichment) is used when we are concerned with the overall performance at 1%, 2%, 5%, 10%, 20%, 50% and 100% of the database. Similarly, EE stands for early enrichment, encompassing 1%, 2%, 5%, 10% of the database; see also Methods, in particular eqs 1 and 2. The comparison of the PDB pharmacophore, the HRPM and the CHA against the (median) performance of the individual RPMs is described in separate sections below, followed by a final comparison. It is based on two different, but intertwined criteria/questions. First, we identify the protein−ligand systems for which an approach performed better than a random classifier; that is, ⟨TE⟩ ROC-AUC value > 0.5. Any method that selects or generates hit-lists with ⟨TE⟩ ROC-AUC ≤ 0.5 is considered a failure for this particular system. In Table 3 these cases, which constitute the worst case scenario for rational drug design, are highlighted by an orange background. Second, for each of the approaches tested (PDB pharmacophore, HRPM, CHA) the ⟨TE⟩ and ⟨EE⟩ ROCAUC values were compared to the RPMmedian and RPM3rdQuartile results, that is, the distribution of the virtual screening runs with the individual RPMs. The comparison with the RPMmedian was of special interest. This particular test investigates the likelihood that a randomly selected RPM could reproduce the same results (null-hypothesis) and a one sided p-value is calculated that indicates the probability of obtaining a result as extreme as, or more extreme than, the result actually obtained when the null hypothesis is true. A significance level of α = 0.01 was used.65 Performance of the Representative Pharmacophore Models (RPMs) in Virtual Screening. The median ROC-AUC values (RPMmedian) for 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the screened database are shown in Table 2. Skimming through the rows labeled “median” in Table 2 one sees that for three systems (1J4H, 3PBL, 1E66) ⟨TE⟩ RPMmedian is far below

Table 1. Details of Pharmacophore Models Generated during the MD Simulations PDB code

Nf.v.a

Nf.v.,≥2b

mPMc

rankd

⟨Nfeat.⟩e

f ⟨NRPM feat ⟩

1E66 1J4H 1L2S 1NJS 1SJ0 1SQT 1UYG 2AZR 2ETR 2FSZ 2GTK 2HV5 2HZI 2I78 2ICA 2OF2 2OJ9 2OJG 2P54 2QD9 2RGP 2ZDT 3BZ3 3C4F 3D4Q 3DOE 3EL8 3EML 3F9M 3HMM 3KBA 3L3M 3L5D 3LAN 3LPB 3NXO 3PBL 3TOF 4P6W 4RMD

83 143 300 3317 75 254 127 60 39 91 322 1338 67 1554 23 30 144 44 252 828 88 51 118 11 99 302 309 205 39 26 42 226 635 43 38 90 194 329 229 24

84.3 67.1 76.3 48.9 89.3 81.5 80.3 70.0 84.6 75.8 69.6 62.9 74.6 57.4 82.6 96.7 75.7 84.1 75.0 57.5 71.6 100 74.6 90.9 75.8 65.2 70.2 69.8 84.6 88.5 83.3 64.6 66.5 69.8 81.6 81.1 68.6 68.7 76.4 77.4

12 9 10 5 18 13 16 23 21 11 6 5 12 5 10 130 11 83 10 7 12 14 15 12 10 12 6 9 9 32 7 10 6 51 13 20 8 11 64 38

36 16 7 320 39 6 6 7 2 3 1 14 2 81 1 4 1 15 1 1 3 2 23 1 6 36 11 4 2 3 4 2 175 1 1 4 2 30 2 1

5.1 5.9 5.3 12.9 6.1 5.0 5.3 6.9 3.7 4.8 7.3 8.7 6.9 6.3 4.9 3.5 5.0 2.9 8.6 5.6 5.6 8.8 6.2 4.9 5.1 8.3 8.5 5.0 4.7 6.2 5.1 7.4 4.7 4.8 4.6 4.6 5.1 5.8 6.4 9.4

5.4 5.7 5.4 9.8 6.0 5.1 5.2 7.0 4.0 4.8 6.2 8.5 6.5 5.9 4.8 3.9 5.0 3.7 8.2 5.4 5.5 5.6 6.0 4.3 4.9 8.0 7.6 4.9 4.7 6.1 5.1 6.4 4.7 4.8 4.6 4.7 5.2 5.3 6.4 8.8

a

Number of distinct feature vectors. bNumber of distinct feature vectors that appear more than once during the simulations (in %). c Median number of pharmacophore models per distinct feature vector. d Rank of feature vector of the PDB pharmacophore. eAverage feature number of all pharmacophore models. fAverage number of features of RPMs.

during the MD simulation in 31 out of the 40 systems investigated. Virtual Screening Results. As outlined in the Introduction, the core of this work is virtual screening runs using all RPMs obtained for a particular protein−ligand complex. These data provide the baseline against which three approaches are tested. On the one hand, we use (i) the PDB pharmacophore; this corresponds to a standard 3D-based virtual screening approach. On the other hand, we analyze (ii) the virtual screening performance using just the RPM seen most often during the MD simulations; we refer to this particular RPM as the highest appearance RPM, abbreviated from now on as HRPM. (iii) We apply the CHA, which includes information 372

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Figure 4. Distinct feature vectors sorted based on the number of pharmacophore models they represent (appearance count). This plot shows the feature vectors for 2OF2. The red bar indicates the PDB feature vector.

0.5. For some systems ⟨TE⟩ RPMmedian is close to 0.5 (3L3M, 3EL8, 2GTK, 3L5D, 2OF2, 1SQT and 2ZDT). Nevertheless, for the majority of the systems (31 out of the 40 systems) ⟨TE⟩ RPMmedian is above 0.5. In other words, in 77% of the complexes the median virtual screening performance of the RPMs is, in principle, useful for potential drug discovery. However, the RPMmedian values do not correspond to a single hit-list or an individual RPM. The median divides the performance of the RPMs in 50% that have ROC-AUC values above and 50% below its value, indicating that if a RPM were randomly chosen, chances are equally high that the RPM performs above or below the median. In fact, it should be noted in passing that for each of the reported 40 systems at least one RPM performed better in virtual screening than a random classifier. RPMmedian, however, does not provide information about the distribution of the ROC-AUC values of the RPMs. Illustrative examples for eight complexes (2I78, 2QD9, 2ICA, 2P54, 3EML, 3NXO, 3F9M and 3KBA) are shown in Figure 5. In all cases, ⟨TE⟩ RPMmedian was higher than 0.5. The box plots show that even though the median performance of the RPMs for these complexes is adequate, randomly picking a RPM could result in very poor performance. As an example consider the distribution of the ROC-AUC values for 2QD9 in Figure 5: 75% of the RPMs (all models between quartile 1 and quartile 3) have ROC-AUC values between 0.52 and 0.95 for 1% of the screening database. The remaining 25% of the models have ROC-AUC values ranging from 0 to 0.52 and from 0.95 to 1. Thus, while the majority of the RPMs performs well in virtual screening, a nonvanishing number of outliers could lead to very poor, that is, unusable, screening results. Such analyses/considerations based on, for example, box plots as shown in Figure 5 are possible only when the activity of the compounds in the databases is already known. Use of such methods is necessary to validate/develop methods, as done in this work. However, in actual applications the activity of the database entries are unknown. One could argue that at this point the sole benefit gained from MD simulations and multiple screening runs is an increase in hit-lists. This, per se, is not

useful for real-world drug discovery as there is no method that can guide a researcher to choose the best among multiple hitlists. All three methods considered next do not rely on such a priori knowledge (or, alternatively, a posteriori choosing). However, the availability of multiple hit-lists and RPMmedian constitutes a rational metric to validate whether these methods perform better than a randomly chosen RPM would. RPMmedian will be used as the central criterion to gauge the performance of the “default approach” (PDB pharmacophore) and the two newly developed methods (HRPM and CHA). Performance of the PDB Pharmacophore Model in Virtual Screening. An experimentally determined structure, normally taken from the RCSB PDB, is the usual starting point of a structural investigation of a protein−ligand system. Often, all further computational modeling is based on this single coordinate set, and this is also the case for the default manner of carrying out 3D pharmacophore-based virtual screening. Thus, any approach that is concerned with improving results of virtual screening needs to include and compare to screening results obtained using the pharmacophore model built from the PDB structure, that is, the PDB pharmacophore. Before continuing, we would like to remind the reader that the PDB pharmacophore results presented below were created using the processed/refined coordinate set obtained from the crystal structure (cf. Section System Preparation in Methods). In five cases the coordinate refinements changed the position of amino acid side chains, leading to pharmacophore models that differed from the pharmacophore model that would have been derived from the original, uncorrected crystal structure. For these protein−ligand complexes (3EL8, 2GTK, 3DOE, 1SQT, and 4P6W) either one feature was added or removed. The 3D coordinates of the other features were not changed. Conversely, for 35 out of the 40 investigated protein−ligand complexes the pharmacophore model obtained from the refined coordinates was identical (both in terms of features and 3D coordinates) to the pharmacophore model obtained from the unprocessed crystal coordinates. Starting from the raw data in Table 2, we applied two criteria to each of the 40 protein−ligand systems: (i) is the 373

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling Table 2. Detailed Information on the Virtual Screening Results of the 40 Protein−Ligand Systemsa 1E66

1J4H

1L2S

1NJS

1SJ0

1SQT

1UYG

2AZR

2ETR

2FSZ

2GTK

2HV5

2HZI

2I78

2ICA

2OF2

median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median

1%

2%

5%

10%

20%

50%

100%

0.38 0 0 0 0.06 0.48 0.48 0 1 1 0.94 1 1 0 0 1 1 1 1 1 0.50 0.31 0.93 0.65 0.78 0 0.88 1 1 1 1 1 0.98 1 1 0.88 0.65 0.29 0.03 0.74 0.50 0.36 0.47 0.88 1 1 1 1 0.77 0.49 0.49 1 1 0.99 0.98 1 0.90 0 0.97 1 0.49

0.44 0 0 0 0.21 0.49 0.49 0 1 1 0.97 1 1 0 0 0.88 1 1 1 1 0.50 0.16 0.76 0.55 0.74 0 0.79 1 1 1 1 1 0.97 1 1 0.87 0.58 0.21 0.08 0.58 0.50 0.40 0.48 0.66 0.96 1 1 1 0.65 0.50 0.50 0.91 1 0.99 0.99 1 0.82 0.08 0.89 1 0.44

0.46 0.17 0.20 0 0.31 0.50 0.50 0 1 1 0.98 1 1 0 0 0.68 1 1 1 1 0.52 0.09 0.61 0.71 0.61 0.32 0.82 0.49 1 0.98 1 1 0.90 1 1 0.81 0.53 0.32 0.08 0.34 0.50 0.45 0.49 0.56 0.80 0.85 0.98 1 0.56 0.50 0.50 0.86 0.98 0.99 0.98 1 0.67 0.17 0.85 0.95 0.44

0.48 0.32 0.33 0.05 0.40 0.50 0.50 0 0.99 1 0.95 1 1 0 0 0.59 1 1 1 1 0.51 0.24 0.56 0.81 0.58 0.51 0.86 0.30 0.99 0.97 1 1 0.85 0.99 0.99 0.85 0.52 0.40 0.07 0.25 0.50 0.48 0.50 0.47 0.66 0.70 0.82 1 0.53 0.50 0.50 0.78 0.92 0.98 0.93 1 0.58 0.22 0.55 0.57 0.47

0.49 0.41 0.42 0.07 0.44 0.50 0.50 0.01 0.88 1 0.82 0.97 1 0 0 0.54 0.97 1 0.99 1 0.51 0.48 0.53 0.86 0.53 0.53 0.89 0.43 0.90 0.90 0.87 0.80 0.71 0.85 0.92 0.86 0.51 0.45 0.07 0.19 0.50 0.49 0.50 0.45 0.58 0.60 0.67 0.94 0.51 0.50 0.50 0.68 0.77 0.92 0.82 0.99 0.54 0.25 0.46 0.36 0.49

0.50 0.47 0.47 0.12 0.48 0.50 0.50 0.05 0.67 0.97 0.63 0.92 0.90 0 0 0.52 0.72 0.78 0.75 1 0.50 0.51 0.51 0.88 0.51 0.48 0.74 0.57 0.67 0.68 0.65 0.59 0.58 0.70 0.70 0.83 0.50 0.48 0.23 0.21 0.50 0.50 0.50 0.43 0.53 0.53 0.56 0.73 0.50 0.50 0.50 0.57 0.61 0.69 0.70 0.74 0.52 0.29 0.47 0.31 0.50

0.50 0.49 0.49 0.24 0.49 0.50 0.50 0.21 0.56 0.77 0.55 0.76 0.66 0 0 0.51 0.58 0.60 0.59 0.74 0.50 0.50 0.50 0.83 0.50 0.49 0.59 0.52 0.56 0.57 0.55 0.53 0.53 0.57 0.57 0.68 0.50 0.49 0.40 0.34 0.50 0.50 0.50 0.45 0.51 0.51 0.52 0.58 0.50 0.50 0.50 0.52 0.54 0.57 0.57 0.57 0.51 0.41 0.49 0.44 0.50

374

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling Table 2. continued

2OJ9

2OJG

2P54

2QD9

2RGP

2ZDT

3BZ3

3C4F

3D4Q

3DOE

3EL8

3EML

3F9M

3HMM

3KBA

PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB

1%

2%

5%

10%

20%

50%

100%

0.43 0.39 0.11 0.96 1 1 1 0.98 0 1 1 0.98 0 1 1 0.74 0.99 0.88 0.98 0.97 1 0.84 1 0.49 0.49 0.53 0.80 0.99 1 0.93 1 0.69 0.96 0 0.74 1 1 1 1 1 1 1 1 0.49 0.42 0 0.31 0.93 1 1 1 0.96 0.91 0.87 0.99 1 1 1 1 0.94 0.84

0.50 0.43 0.11 0.95 1 1 1 0.98 0 1 1 0.98 0 1 1 0.67 0.98 0.90 0.98 0.96 0.98 0.90 1 0.49 0.49 0.41 0.57 0.99 1 0.95 1 0.64 0.97 0 0.65 1 1 1 1 1 1 1 1 0.50 0.46 0 0.27 0.89 1 0.99 0.99 0.90 0.93 0.87 0.99 0.99 1 0.96 1 0.89 0.83

0.51 0.47 0.26 0.86 1 1 1 0.99 0 1 1 0.99 0 1 1 0.58 0.95 0.90 0.95 0.84 0.65 0.93 1 0.49 0.50 0.44 0.50 0.98 1 0.85 1 0.53 0.98 0 0.66 1 0.99 1 1 0.99 0.99 1 1 0.50 0.48 0 0.34 0.75 1 0.97 0.99 0.84 0.78 0.78 0.98 0.90 1 0.95 1 0.76 0.77

0.51 0.49 0.43 0.80 0.99 0.99 1 0.97 0 0.99 1 0.97 0 0.99 1 0.54 0.87 0.80 0.82 0.71 0.41 0.83 0.99 0.50 0.50 0.46 0.51 0.92 0.95 0.70 1 0.52 0.97 0 0.65 0.98 0.97 1 1 0.83 0.83 0.95 0.94 0.50 0.49 0 0.41 0.68 1 0.94 0.99 0.78 0.67 0.75 0.96 0.75 0.99 0.90 0.94 0.62 0.67

0.51 0.49 0.53 0.70 0.98 0.97 0.99 0.95 0 0.97 1 0.95 0 0.97 1 0.52 0.73 0.66 0.70 0.61 0.31 0.66 0.89 0.50 0.50 0.48 0.45 0.78 0.78 0.60 0.95 0.53 0.81 0 0.60 0.88 0.93 1 1 0.68 0.68 0.77 0.78 0.50 0.50 0 0.45 0.61 0.99 0.90 0.99 0.68 0.58 0.66 0.96 0.64 0.95 0.74 0.87 0.56 0.59

0.50 0.50 0.56 0.60 0.90 0.80 0.93 0.84 0 0.82 0.98 0.85 0 0.82 0.98 0.51 0.60 0.56 0.67 0.54 0.19 0.55 0.61 0.50 0.50 0.49 0.43 0.60 0.60 0.54 0.68 0.51 0.62 0 0.57 0.66 0.89 0.97 0.98 0.56 0.56 0.60 0.61 0.50 0.50 0 0.48 0.54 0.87 0.77 0.91 0.57 0.53 0.56 0.95 0.55 0.70 0.59 0.78 0.52 0.53

0.50 0.50 0.53 0.54 0.67 0.61 0.68 0.63 0 0.62 0.72 0.63 0 0.62 0.72 0.50 0.53 0.52 0.65 0.51 0.34 0.52 0.51 0.50 0.50 0.50 0.49 0.54 0.54 0.51 0.54 0.51 0.54 0 0.53 0.56 0.68 0.71 0.89 0.52 0.52 0.54 0.54 0.50 0.50 0 0.49 0.51 0.64 0.60 0.76 0.52 0.51 0.52 0.85 0.52 0.57 0.53 0.61 0.51 0.51

375

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling Table 2. continued

3L3M

3L5D

3LAN

3LPB

3NXO

3PBL

3TOF

4P6W

4RMD

HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA median PDB HRPM CHA

1%

2%

5%

10%

20%

50%

100%

0.75 1 0.49 0.49 0.59 0.92 0.48 0.48 0.95 0.08 0.79 0.83 0.86 0.99 0.99 1 0.99 1 0.95 0.99 1 1 0.08 0.02 0 0 0.57 1 0.49 0.62 0.55 1 0.89 0.45 0.93 1 0.93 1

0.44 1 0.50 0.50 0.55 0.87 0.49 0.49 0.95 0.37 0.76 0.86 0.88 0.97 0.98 1 0.99 0.99 0.87 0.96 0.99 0.99 0.12 0.04 0 0 0.54 1 0.49 0.68 0.51 0.99 0.73 0.25 0.76 1 0.76 1

0.32 1 0.50 0.50 0.52 0.81 0.50 0.50 0.93 0.40 0.69 0.83 0.89 0.94 0.95 1 0.99 0.97 0.71 0.77 0.88 0.89 0.22 0.08 0.02 0 0.53 0.85 0.50 0.75 0.50 0.99 0.60 0.19 0.61 1 0.61 1

0.32 0.83 0.50 0.50 0.51 0.83 0.50 0.50 0.90 0.52 0.60 0.75 0.87 0.89 0.90 0.97 0.96 0.96 0.63 0.73 0.72 0.83 0.32 0.23 0.07 0.01 0.52 0.69 0.50 0.77 0.50 0.97 0.55 0.27 0.56 1 0.56 0.87

0.35 0.60 0.50 0.50 0.50 0.80 0.50 0.50 0.86 0.62 0.55 0.72 0.78 0.82 0.80 0.90 0.81 0.94 0.57 0.65 0.61 0.74 0.39 0.36 0.17 0.03 0.51 0.60 0.50 0.77 0.50 0.81 0.52 0.34 0.53 0.86 0.53 0.70

0.43 0.48 0.50 0.50 0.50 0.73 0.50 0.50 0.77 0.69 0.52 0.63 0.62 0.72 0.63 0.71 0.69 0.83 0.52 0.56 0.54 0.56 0.46 0.45 0.33 0.13 0.50 0.53 0.50 0.72 0.50 0.62 0.51 0.41 0.51 0.64 0.51 0.57

0.48 0.47 0.50 0.50 0.50 0.59 0.50 0.50 0.66 0.62 0.51 0.55 0.54 0.60 0.55 0.62 0.64 0.67 0.51 0.52 0.51 0.51 0.49 0.48 0.44 0.31 0.50 0.51 0.50 0.61 0.50 0.54 0.50 0.47 0.50 0.55 0.50 0.52

a

For every system the ROC-AUC value for 1%, 2%, 5%, 10%, 20%, 50%, and 100% of the screening database are shown for the different investigated approaches. This includes the ROC-AUC values obtained from the RPMmedian, the PDB pharmacophore model, the RPM with the highest appearance count (abbreviated with “HRPM”) and the CHA.

the green pentagons (which represents the PDB pharmacophore) are on the right of the dotted line indicating the 0.5 threshold, the PDB pharmacophore model clearly performs better than a random classifier. An example for a PDB pharmacophore model that fails to perform better than a random classifier is 2ICA, which is also included in Figure 5. Comparing the ⟨TE⟩ of the PDB pharmacophore with ⟨TE⟩ RPMmedian shows that the virtual screening capability of the PDB pharmacophore model is superior for 3NXO (⟨TE⟩ value of 0.74 vs 0.68); this leads to the “y” entry in Table 3, row 3NXO, column “PDB pharmacophore” and subcolumn “TE > Q2”. Since the ⟨EE⟩ of the PDB pharmacophore model is also higher than the ⟨EE⟩ RPMmedian value, the corresponding entry in the subcolumn “EE > Q2” also contains a “y”. This can also easily be seen in Figure 5. Another example for a PDB pharmacophore model, which outperforms the ⟨TE⟩ and ⟨EE⟩ RPMmedian values is 3F9M, also shown in Figure 5. While all is well in the 3NXO example, the PDB pharmacophore model failed to obtain usable screening results

performance of the PDB pharmacophore better than that of a random classifier and (ii) are the ⟨TE⟩ and ⟨EE⟩ ROC-AUC values obtained with the PDB pharmacophore higher than the ⟨TE⟩ and ⟨EE⟩ values for RPMmedian. The pertinent results are summarized in Table 3. We illustrate the steps how these were obtained from the raw data (Table 2) by considering protein− ligand complex 3NXO. In Table 2 we see that TE RPMmedian values are 0.95 (1%), 0.87 (2%), 0.71 (5%), 0.63 (10%), 0.57 (20%), 0.52 (50%) and 0.51 (100%). The ⟨TE⟩ RPMmedian is 0.68 and the ⟨EE⟩ RPMmedian is 0.79. The ROC-AUC values for the PDB pharmacophore model are 0.99 (1%), 0.96 (2%), 0.77 (5%), 0.73 (10%), 0.65 (20%), 0.56 (50%), and 0.52 (100%). For the PDB pharmacophore model the ⟨TE⟩ ROC-AUC value is 0.74 and the ⟨EE⟩ ROC-AUC value is 0.86. The first test is to check whether the ⟨TE⟩ value for the PDB pharmacophore model is higher than 0.5. This is the case, and passing this test is indicated by the entry “y” in Table 3; see row 3NXO, column “PDB pharmacophore” and subcolumn “TE > 0.5”. One also sees this in the box plot for 3NXO, shown in Figure 5. Since 376

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Figure 5. For eight systems (2I78, 2QD9, 2ICA, 2P54, 3EML, 3NXO, 3F9M and 3KBA) the distribution of the ROC-AUC values for 1%, 2%, 5%, 10%, 20%, 50% and 100% of the screening database is shown. The mean value is indicated by a red spot, the median value by a blue, vertical line. The first and third quartile are the lower (first quartile) and upper (third quartile) boundary of the green box, the whiskers are the interquartile distance (IQD) multiplied by 1.5. The whiskers do not exceed the minimum or maximum value. This means that in the case of (Q1 − (IQD × 1.5) < minimum value) or (Q3 + (IQD × 1.5) > maximum value) the whiskers are set to the minimum or maximum value. Data points exceeding the whiskers are indicated by black crosses and regarded as outliers. The ROC-AUC values of the PDB pharmacophore is indicated by a green pentagon, the AUC values of the HRPM are indicated by a blue circle, and the performance of the CHA is shown as a yellow star. The dotted line at x = 0.5 indicates the performance of a random classifier.

for 16 out of 40 systems, based on the ⟨TE⟩ ROC-AUC ≤ 0.5 criterion, that is, in 40% of all cases. These failures include models that could not retrieve any hits (1NJS and 2OJG), but also models that retrieved hit-lists that showed a random distribution of active and inactive molecules (1E66, 1J4H, 1SQT, 1UYG, 2FSZ, 2GTK, 2HZI, 2ICA, 2OF2, 2ZDT, 3EL8, 3L3M, 3L5D, and 3PBL). As already mentioned, if a model fails to deliver screening results that are superior to a random classifier, the cells are highlighted in orange in Table 3. One of the failed systems (2ICA) is also shown as a box plot in Figure 5. To facilitate orientation how many systems passed or failed a particular test, the total number of successes is listed in the last row of Table 3. For the remaining 24 protein−ligand system the ROC-AUC values obtained for different percentages of the database are uniformly above 0.5. In some cases the EE ROC-AUC values are far above 0.5, but the more molecules are considered in the

ROC-AUC calculation, the more the results approach 0.5. Such a behavior can be seen, for example, for 2QD9. Its ⟨EE⟩ ROCAUC value is 0.84, whereas the ROC-AUC value obtained for 100% of the screening database is 0.53. As already outlined in Methods, this is a consequence of extrapolating the ROC graph. The hit-list retrieved by the PDB pharmacophore of 2QD9 consists of 259 true positives (28% of all active compounds present in the data set) and 8216 false positives (23% of all inactive compounds present in the data set). Thus, once the full hit-list has been exhausted, the ROC graph is at position y = 0.28 (TPR) and x = 0.23 (FPR). Starting from this point, the graph is extrapolated to 1 on the y-axis and 1 on the x-axis (as described in the Methods section). This linear extrapolation leads to ROC-AUC values of 0.6 for 50% and 0.53 for 100% of the screening database. The PDB pharmacophore was able to obtain ⟨TE⟩ ROCAUC values for 23 systems that are higher than the ⟨TE⟩ 377

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling Table 3. Comparison of the Results of Virtual Screening Using the PDB Pharmacophore, the HRPM, and the CHAa

a The basis of this comparison is the raw data shown in Table 2. For each approach five tests were carried out: (i) is the TE above the quartile (TE > Q3), (ii) is the EE above the third quartile (EE > Q3), (iii) is the TE above the second quartile (TE > Q2), (iv) is the EE above the second quartile (EE > Q2) and finally (v) is the ⟨TE⟩ > 0.5 (TE > 0.5). If ⟨TE⟩ ≤ 0.5 the system is highlighted in orange.

RPMmedian, and for 15 systems ⟨TE⟩ ROC-AUC values that are greater than ⟨TE⟩ RPM3rd Quartile. In five (2I78, 2QD9, 2P54, 3EML, and 3NXO) out of the eight samples shown in Figure 5 the performance of the PDB pharmacophore model is better than RPMmedian. The probability of randomly selecting a RPM that is above or below the median is 0.5, so it is a trivial exercise to calculate the probability of selecting 23 out of 40 hit-lists that are higher than or equal the RPMmedian. The one-tailed p-value for 23 positive results out of 40 trials is 0.215; this indicates that this result is not statistically significant at an α level of 0.01. Analyzing the ⟨TE⟩ ROC-AUC value of the screening results of the 24 protein−ligand systems that performed better than a random classifier showed that for 20 systems the PDB pharmacophore model outperformed the RPMmedian (even though the last row in Table 3 shows the value of 23 for the number of systems with ⟨TE⟩ RPMmedianbut subtracting the “y” entry for 2ZDT, 2OF2, and 1J4H which had ⟨TE⟩ ROCAUC values ≤ 0.5 results in 20). An investigation of only the EE of the PDB pharmacophore screening results for the 24 protein−ligand systems shows that the PDB model performed better than the RPMmedian for 18 systems.

In summary, the PDB pharmacophore gave usable hit-lists in only ≈60% of the investigated cases. If a given hit-list had a ROC-AUC value >0.5, the virtual screening performance of the PDB pharmacophore was above the RPMmedian for 83% of the systems. However, if all 40 systems are taken as the baseline, the PDB pharmacophore model outperformed RPMmedian only in 58%. Performance of the HRPM in Virtual Screening. It seems reasonable to assume that the most important interactions between a protein and a ligand will occur frequently during a MD simulation, ignoring complications, such as errors in the force field, etc. If this is the case, the HRPM should represent the most relevant interaction pattern between the ligand and the protein. Adding dynamical information from MD simulations would be straightforward; the PDB pharmacophore could simply be replaced by the HRPM. In this section we, therefore, report the screening results of the HRPM. Relevant data can be found in Tables 2 and 3. The analysis presented below is identical to that for the PDB pharmacophore given in the previous section. 378

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Figure 6. ROC-AUC values (y-axis) at 1% (blue), 2% (green), 5% (red), 10% (violet), 20% (yellow), 50% (aqua), and 100% (pink) of the screening database are shown for two protein−ligand complexes (3BZ3 and 3F9M). RPMs are sorted based on descending appearance count. RPMs that appear more often during the MD simulations are located on the left, models with lower appearance count on the right of the plot.

For 13 systems use of the HRPM led to ⟨TE⟩ ROC-AUC values ≤ 0.5. Three of the failures (3EL8, 3C4F, and 1NJS) could not retrieve any hits; the remaining 10 systems retrieved hit-lists enriched with false positives. We discuss one of the systems, for which the HRPM failed (3KBA) in more detail since it represents a rare case that was observed only for this particular complex. The very early enrichment (ROC-AUC value at 1%) of the model was reasonable (AUC: 0.75), but the ROC-AUC values for larger percentages of the database all were below 0.5. The ROC-AUC values are 0.44 (2%), 0.32 (5%), 0.32 (10%), 0.35 (20%), 0.43 (50%), and 0.48 (100%), cf. Table 2. This continuous prevalence of false positive compounds justifies the classification of the model as a random classifierand the good early enrichment as an outlier. For 24 of the 40 investigated systems the HRPM obtained ⟨TE⟩ ROC-AUC values above ⟨TE⟩ RPMmedian. This is a slight improvement over the PDB pharmacophore, the one-tailed pvalue of this result is 0.134. Only for four systems did the HRPM perform worse than RPMmedian, but in these cases it was still able to discriminate inactive compounds. For 13 systems the ⟨TE⟩ ROC-AUC value was higher than the ⟨TE⟩ RPM3rd Quartile. The ⟨EE⟩ ROC-AUC value was for 22 systems higher than the ⟨EE⟩ RPMmedian. Overall, the results are similar to those obtained for the PDB pharmacophore: if the model has ⟨TE⟩ ROC-AUC > 0.5, then its performance is better than RPMmedian in most cases.

In summary the HRPM was able to retrieve hit-lists with ⟨TE⟩ ROC-AUC > 0.5 for 67.5% of all investigated protein− ligand complexes. If the virtual screening performance of the model was above the performance of a random classifier, it performed better than RPMmedian in 85% of the cases when using ⟨TE⟩ ROC-AUC value as criterion, and in 74% when using ⟨EE⟩ ROC-AUC value as criterion. When considering all systems, the approach performed better than RPMmedian for 60% of the complexes. However, the HRPM was not able to outperform a randomly chosen RPM by a statistically significant margin (α level of 0.01), as reflected by the onetailed p-value of 0.134. The virtual screening performance and, arguably, failure of the HRPM raised the more general question whether the ROCAUC values of a RPM and its appearance count are correlated. To address this question, we proceeded as follows. (i) The screening results were sorted based on the appearance count of their RPM, and the ROC-AUC was plotted for the different percentages of the database to facilitate visual inspection of the virtual screening results (see Figure 6). (ii) We calculated Spearman’s rank correlation coefficient ρ66 between the different percentages of the ROC-AUC and the appearance count of the RPM. In Figure 6 the screening results of the RPMs sorted according to their appearance count are shown for two systems; the plots for all other 38 systems are shown in Figure S3 in Supporting Information. The rank of the RPMs are plotted on the x-axis, and the respective screening result is 379

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Table 4. Average, Median, Maximum, and Minimum Value of the Spearman’s Rank Correlation Coefficient ρ between the Appearance Count and the Virtual Screening Results. Additionally, the Average, Median, Maximum, and Minimum p-Values for Each Corresponding Coefficient Are Given average ρ average p-value median ρ median p-value max ρ max p-value min ρ min p-value

1%

2%

5%

10%

20%

50%

100%

0.06 0.39 0.06 0.39 0.47 0.91 −0.27 0.0022

0.06 0.35 0.09 0.24 0.30 0.83 −0.26 0.0004

0.08 0.35 0.09 0.23 0.40 0.87 −0.29 0.0001

0.08 0.30 0.12 0.22 0.39 0.99 −0.29 0.0001

0.10 0.30 0.13 0.16 0.65 0.95 −0.42 0.0001

0.11 0.30 0.12 0.21 0.78 0.92 −0.31 0.0002

0.12 0.30 0.13 0.22 0.80 0.88 −0.39 0.0003

shown on the y-axis. The RPMs are sorted in descending order of their appearance count; that is, the model with the highest count (the HRPM) is located on the very left. The plots for the two systems shown in Figure 6 do not show any obvious dependencies on the appearance count. Similar observations were made for all other protein−ligand systems investigated. Visual inspection can be misleading, so we calculated Spearman’s rank correlation coefficients. Spearman’s ρ assesses the rank correlation of the data using a monotonic function, and unlike Pearson’s correlation factor, a positive or negative Spearman’s correlation factor does not indicate a linear relationship between the values, but a monotonic relationship between the rank of the values. Spearman’s ρ is a value on the closed interval [−1,+1], −1 indicates a perfect indirect correlation, 0 indicates no correlation, and +1 a perfect positive correlation.66 Spearman’s rank correlation coefficient was calculated per system using the appearance count as independent variable and the different ROC-AUC values of every RPM as dependent variables. Subsequently, the average, the median, the minimum, and the maximum value of ρ, as well as their corresponding p-values, were calculated for the 40 systems studied. The results are summarized in Table 4. The median values, which are between 0.06 for 1% and 0.13 for 100% of the screening library, indicate that there is no correlation between the appearance count and the virtual screening performance of the pharmacophore model. This finding suggests that choosing the RPM which appears most often during the MD simulation is not a suitable strategy to ensure that the resulting RPM will perform better than other RPMs that appear less frequently. The Common Hits Approach (CHA). As described in Methods, the CHA operates on the hit-lists returned from the individual virtual screening runs of all RPMs. The multiple hitlists are merged and reduced to unique molecules, which are sorted based on their hit-count (molecules that have a high count, i.e. that were present in multiple hit-lists, are at the beginning of the list, and molecules with a low count at the end). This results in a new, resorted hit-list in which all molecules that were retrieved by the RPMs are present. As for the PDB pharmacophore- and HRPM-based approaches just discussed, all pertinent data can be found in Tables 2 and 3 and evaluated analogously. A first look at the bottom row of Table 3 and, in particular, at the total number of cases for which ⟨TE⟩ CHA > 0.5, shows, that 32 out of 40 hit-lists obtained from the CHA passed this criterion. In other words, the CHA delivered a hit-list that is useful for further work in 80% of the systems studied. Furthermore, the CHA screening results with ⟨TE⟩ ROCAUC value greater than 0.5 outperformed the ⟨TE⟩ RPMmedian

baseline in all 32 cases. This means that the CHA screening results are above ⟨TE⟩ RPMmedian in 32 out of the total 40 systems. The one tailed p-value of 9.11 × 10−5 indicates that this result is statistically significant at an α level of 0.01. The screening results for the EE were slightly inferior; the ROCAUC values of the hit-list obtained with the CHA were above RPMmedian in 30 cases (one tailed p-value = 0.0011). For all systems shown in Figure 5 the CHA returned hit-lists with ⟨TE⟩ and ⟨EE⟩ ROC-AUC values above RPMmedian and 0.5. For 26 systems the TE of the hit-list obtained with the CHA was also better than the RPM 3rd Quartile , for EE the corresponding value is 22. The CHA result for 2ICA merits a more detailed discussion. The early enrichment AUC values are above the median and for 1%, 2%, and 5% the AUC values are even above quartile 3 of the RPM values. But at 10% the AUC value (0.57) drops down slightly below the median performance of the RPMs (0.58), and the AUC values for 20% (0.36), 50% (0.31), and 100% (0.44) are below 0.5. Overall, the mean AUC value obtained with the CHA hit-list is still above 0.5 and above RPMmedian, but 2ICA is an example for a hit-list that starts with good enrichment; yet, the more hits considered, the poorer the results. In summary, for 80% of all protein−ligand complexes investigated the CHA was able to retrieve hit-lists with ⟨TE⟩ ROC-AUC above 0.5 and above the ⟨TE⟩ RPMmedian. For 65% of the systems the screening results are better than the ⟨TE⟩ RPM3rdQuartile. The p-value of 9.11 × 10−5 for the 32 systems that performed better than the RPMmedian is statistically significant; that is, the CHA can be expected to lead to better ROC-AUC values than a randomly selected RPM. Comparison of the Virtual Screening Results. Having evaluated the virtual screening performance of the three approaches, PDB pharmacophore, HRPM, and CHA, against a random classifier and the RPMmedian independently, we now compare their screening performance against each other. We focus exclusively on the TE ROC-AUC values. While we are aware of the importance of the early enrichment in judging the usefulness of a hit-list for drug discovery, we think that percentages chosen to represent the TE (1%, 2%, 5%, 10%, 20%, 50%, and 100%) are already biased toward a strong emphasis on the early enrichment. Our presentation and discussion is based on the data summarized in Table 5. It consists of three subtables: Table 5A shows for how many systems the three approaches were able to obtain hit-lists with ROC-AUC values greater than 0.5. It presents separately for how many systems none, one, two, and all three approaches worked, including all possible combina380

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

AUC values greater than 0.5. In just two cases the CHA failed, but the other approaches performed better than a random classifier. These findings suggest that there is a “core group” of 20 protein−ligand systems for which all three approaches were able to obtain good virtual screening results, based on ROCAUC values greater than 0.5. The PDB pharmacophore model increased this number to 24, the HRPM to 27, and the CHA to 32. Within the core group of 20 systems, the highest enrichment was obtained using the PDB pharmacophore in four cases, using the HRPM in one case, and the CHA in 15 cases. The previous to last row in Table 5A lists the number of systems for which neither of the three approaches was able to obtain an enrichment with a ROC-AUC value greater than 0.5. These six protein−ligand systems are 1E66, 1J4H, 2FSZ, 2OF2, 3EL8, and 3PBL. A further investigation of these six systems might reveal possible ways to gauge in advance how likely a pharmacophore-based virtual screening approach will work or fail. The CHA performed for 32 protein−ligand systems better than a random classifier, and for 23 of these systems it performed also better than the PDB pharmacophore model and the HRPM. By comparison, the PDB pharmacophore performed better than a random classifier for 24 systems, and it had the best enrichment in 7 of these 24 cases. Finally, the HRPM was able to obtain a hit-list that was significantly enriched with active compounds for 27 systems, and it had the best enrichment in 4 out of 27 systems. These numbers can be found in the bottom row of each subtable, labeled “Sum”. For this summation only systems with ROC-AUC > 0.5 were counted since it is useless to compare the performance of random classifiers. The same trend is visible in Tables 5B and 5C, which contain the same analysis based on different criteria. For example, the number of systems for which the CHA obtained ROC-AUC values above RPMmedian is 32, compared to 23 using the PDB pharmacophore, and 24 using the HRPM. Similarly, the CHA leads to ROC-AUC values above RPM3rdQuartile for 26, the PDB pharmacophore for 15, and the HRPM for 13 systems. On the basis of these numbers it seems fair to conclude that CHA outperformed the other methods, PDB pharmacophore and HRPM, by a wide margin. To conclude, we return to the statistical significance of the results, in particular the significance of choosing or constructing a hit-list that performs better than RPMmedian. The one-tailed pvalues were 0.214 for the PDB pharmacophore, 0.134 for the HRPM, and 9.108 × 10−5 for the CHA. These numbers indicate that the results for the PDB pharmacophore and for the HRPM are not statistically significant; that is, one cannot expect these approaches to perform better than a randomly selected RPM. The opposite is true for CHA; here the probability of a randomly selected RPM leading to a better performance in virtual screening than the CHA is vanishingly small. Correlation between Number of Features and Virtual Screening Performance. In the field of pharmacophore modeling, the number of pharmacophore features is regarded as a key parameter to influence the number of hits (both true and false positive active molecules in the hit-list) a pharmacophore model will retrieve from a screening database. It has been speculated that pharmacophore models with a higher number of features will result in a lower number of hits, also possibly influencing the number of retrieved true positive and false positive molecules. There are sound arguments for

Table 5. Number of Systems for Which the Different Approaches Yield Virtual Screening Results with ROC-AUC Values > 0.5, ROC-AUC values > Q2, and ROC-AUC values > Q3 shown in Subtables A, B, and Ca

a Different combinations of the results are shown, e.g., the number of systems, for which the PDB pharmacophore and the CHA (cells highlighted in green), but not the HRPM (cell highlighted in red) returned ROC-AUC values >0.5. Within a column, cells highlighted in green indicate approaches which are added to the “Sum”, the respective last row of each subtable. The values in the previous to last row of each subtable, i.e., the cases for which all methods fail, are in parentheses to indicate that they are not added to the total. For one system (1JH4) the PDB and HRPM approach returned the same hitlist; this hit-list was counted for both approaches (indicated by the asterisk). The columns “Highest ROC-AUC values” list how many times a particular method performs best, based on the criterion of the subtable.

tions. The subtable further indicates how often one of the three approaches had the highest ROC-AUC value. Table 5B presents analogous results, using ROC-AUC values above Q2 as the criterion, and Table 5C reports the results for systems with ROC-AUC values above Q3. For 1J4H identical screening results were obtained for the PDB pharmacophore and the HRPM; therefore, this system was counted for both approaches. Instances where this matters are marked by an asterisk in Table 5. We begin by discussing the number of hit-lists with ROCAUC values greater than 0.5 (Table 5A). The PDB pharmacophore passed this test for 24 (60%), the HRPM for 27 (67.5%), and the CHA for 32 (80%) of the 40 investigated systems. Table 5A also shows that there are 20 systems for which all three approaches succeeded to obtain ROC-AUC values greater than 0.5. Further, Table 5A shows that there is no system, for which only the PDB pharmacophore worked. There is only one system, for which only the HRPM, and four systems, for which only the CHA returned hit-lists with ROC381

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

Table 6. Correlation Analysis between the ⟨NRPM feat ⟩ (Independent Variable) and the Number of Total Retrieved Hits, Retrieved True Positive Hits (Active Molecules) And False Positives (Inactive Molecules) of the RPMs (dependent variables). The Average, Median, Maximum, and Minimum Value of the Spearman’S Rank Correlation Coefficient ρ and the p-Values for the 40 Protein−ligand Systems Are Listed

this hypothesis: the more features of which the pharmacophoric representation of a specific compound consists, the more specific the model becomes for compounds that are of similar chemical structure, for example, the same molecular scaffold. Using, for example, seven features to represent the interaction between ligand and protein will lead to a very specific pharmacophore model of the original ligand, and only a small number of drug candidates will be able to fulfill the chemical and steric requirements of this model (most likely all of them will be very similar to the original ligand). Obviously, a (too) small number of features, for example, ≤3, will lead to a large number of unspecific hits, the vast majority of which is likely to prove useless during further refinement. To the best of our knowledge, this line of reasoning has never been tested rigorously, or, at least, the results were never published. In particular, possible correlations between the number of features of a pharmacophore model and the number of retrieved true/false positive hits have never been investigated in detail. The raw data generated in this study are 20 000 pharmacophore models each for 40 protein−ligand complexes. This makes it possible to address such questions in a quantitative manner. On the basis of our data, a pharmacophore model on average had 6.2 pharmacophore features; the median was 5.5. These numbers varied quite strongly for the 40 systems studied; the two extremes are the average number of features for 2OGT (2.9) and 1NJS (12.9). For 2OGT the low number of features was the result of multiple pharmacophore models that had zero, one, or two featuresnone of them usable for virtual screening. The ligand structure of 1NJS (Figure S1 in Supporting Information) helps explain the high number of features for this particular system: the ligand has many functional groups capable of forming ligand−protein interactions, and many of these potential interactions are actually realized in the course of the MD simulations. The two rightmost columns of Table 1 list the average number of features of all pharmacophore models (⟨Nfeat.⟩), as well as of the RPMs (⟨NRPM feat. ⟩). The differences between the two columns are caused by the way the averages are computed. To investigate correlations between the number of features and the number of hits, we computed Spearman’s rank correlation coefficient ρ for each system, using the number of features (⟨NRPM feat ⟩) as independent variable and the number of retrieved hits, retrieved true positive hits (active molecules) and false positives (inactive molecules) of every RPM as dependent variables. Subsequently, the average, the median, the minimum, and the maximum of the individual ρ values and corresponding p-values for all 40 systems were calculated. These results are shown in Table 6. As shown in Table 6, the median Spearman’s rank correlation coefficient factor between the number of features and the number of hits is −0.90. This finding supports the hypothesis that there is a clear indirect correlation between the ranks of these two parameters. The very low median p-value (1.37 × 10−23) indicates that this finding is statistically significant. The results are in accord with the observation of researchers in the field of virtual screening that pharmacophore models with a large number of features tend to retrieve fewer hits than models with a low pharmacophore feature count. In computer-aided drug discovery the number of true positive hits is more important than the number of total hits. As one can see in Table 6, there is also a strong and statistically significant (indicated by the low median p-values) negative correlation between the number of features and the rank of the

no. of hits average ρ average p-value median ρ median p-value min ρ min p-value max ρ max p-value

−0.85 1.87 × −0.90 1.37 × −0.33 1.50 × −0.96 4.58 ×

10−3 10−23 10−283 10−2

no. of inactives

no. of actives

−0.84 5.25 × −0.90 2.46 × −0.50 1.59 × −0.96 1.18 ×

−0.72 8.05 × −0.74 2.18 × −0.32 5.47 × −0.95 2.76 ×

10−3 10−17 10−132 10−1

10−4 10−16 10−320 10−2

number of retrieved actives/inactives. However, the correlation coefficients suggest that the influence of the number of features is different on the TPR and FPR, respectively. The median ρ between the number of features and the false positives is −0.90, in contrast to only −0.74 for the true actives. This suggests that in order to decrease false positives in the hit-list, increasing the number of features is a suitable strategy. Unfortunately, as reflected by the lower ρ, this strategy will not work as well to increase the number of true actives.



CONCLUSION The default procedure in virtual screening is the direct use of the crystal structure of the protein−ligand complex to derive a pharmacophore model. We tested the performance of this approach in order to have a baseline for our MD based methods. In 24 out of our 40 test systems, that is, in only 60% of all cases, the resulting hit-lists were significantly enriched with true active molecules (⟨TE⟩ ROC-AUC > 0.5). The high number of systems for which the PDB pharmacophore did not lead to a ⟨TE⟩ ROC-AUC value greater than 0.5 is alarming. Our findings suggest that the default PDB pharmacophorebased approach is likely to fail in four out of 10 cases. For the 24 systems, which passed this initial test, the PDB pharmacophore model led to good enrichment values, especially for the early enrichment. Using the median virtual screening performance of all RPMs obtained from the MD simulations as reference, the PDB pharmacophore-based approach beat RPMmedian in 88% of the 24 systems having ⟨TE⟩ ROC-AUC > 0.5. Considering all 40 systems, the PDB pharmacophore model scored better than RPMmedian for 23 systems. Nevertheless, the associated one-tailed p-value of 0.214 indicates that the PDB pharmacophore model does not outperform a randomly selected RPM. The two new approaches reported in this work, that is, use of the HRPM and the CHA, both increased the number of hit-lists that were significantly enriched with active compounds. Using HRPM, the number of systems with ⟨TE⟩ ROC-AUC > 0.5 was 27 out of 40 (compared to 24 for the PDB pharmacophore based approach), and for the CHA the success rate increased to 32 out of 40. There are two reasons why we hesitate to suggest the use of the HRPM as a first improvement over the PDB pharmacophore model. First, the number of systems (24) for which the HRPM performed better than RPMmedian was only slightly higher, and was not statistically significant (one-tailed p382

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling

in an automated and rational manner for a specific protein− ligand system. Since it uses protein−ligand structures obtained from MD simulations these pharmacophore models can be considered as reasonable regarding the underlying molecular structures and many of them might provide a good abstract description of the ligand interactions with the binding-site. Furthermore, the generation and screening of all RPMs allow for an in-depth analysis of the virtual screening results for a particular protein−ligand system. This enables a direct comparison of the pharmacophore models and a statistical evaluation of key properties of interest. In the future we will look for ways to identify in advance protein−ligand systems, for which the PDB pharmacophore model will not yield reasonable virtual screening results. It would obviously be very useful to know prior to validation, for which systems the more expensive CHA is likely to be necessary and in which cases the default PDB pharmacophore approach will suffice. We will also work closely with industry partners to integrate the approach developed here into commercially available software solutions (e.g., LigandScout), as well as an open-source library.

value = 0.134). Second, employing Spearman’s correlation coefficient between the appearance count and the virtual screening performance, we found no statistically significant correlation between them, as supported by their high median pvalues. By contrast, the CHA led to robust results. Our data indicate that it is a suitable and stable method to merge and resort hits obtained from multiple virtual screenings. The CHA utilizes the information gathered from a MD simulation in the form of multiple pharmacophore models in a rational manner. In 80% of the systems tested here, the CHA gave a final hit-list, which performed better than a random classifier (⟨TE⟩ ROC-AUC > 0.5) and better than RPMmedian. Compared with the two other approaches, these are impressive results. The CHA was able to reduce the number of systems for which virtual screening failed to give usable results from ≈40% for the PDB pharmacophore model to just 20%. Further, the number of systems for which the CHA beat RPMmedian is statistically significant with a p-value of 9.108 × 10−5; that is, the results obtained using the CHA are significantly better than one would obtain selecting a RPM at random. In addition, the ROC-AUC values obtained with the CHA outperformed the other approaches by far. For the 34 protein−ligand systems, for which either one, two, or all three of the approaches were able to perform better than a random classifier, the CHA gave the best result in 23 cases (cf. Table 5). By contrast, the HRPM and the PDB pharmacophore model gave the best results in just four and seven cases, respectively. Thus, the computational effort of performing MD simulations and multiple (up to some hundreds) virtual screening runs is well spent. Furthermore, the computational costs for the CHA are not unreasonably high: we ran the calculations on a small cluster (16 nodes, each equipped with 16 CPU cores and a Nvidia GeForce 980GTX graphics card); the total time required for the MD simulations and the subsequent virtual screening runs was on average 48 h per system. The Supporting Information section contains a more in depth table about the computational cost of each approach. While we believe that the number and length of MD simulations is reasonable to sample the conformational changes of the ligand in the binding pocket, we are aware of the limitations the simulation length impose on the observation of larger protein structural changes. But, since the focus in the presented work was clearly on the ligand we believe that the length and number of MD simulations is legitimat,e67 but in principal it can be adopted to any desired time scale. In computational drug discovery, it is of the utmost importance that the virtual screening step, regardless of the underlying principle, returns a hit-list that is either easily identifiable as faulty, or in which the top ranked compounds contain significantly more active than inactive compounds. Pharmacophore models that lead to hit-lists enriched with false positives waste a lot of money and time since they are not easily recognized as flawed. The costs that arise from synthesizing and testing false positive compounds are orders of magnitude higher than carrying out MD simulations and (multiple) virtual screening analyzes. Therefore, every opportunity to increase the number of true positive hits even slightly (as long as it can be reasonably explained and is statistically robust) should be pursued. It should be emphasized that the work-flow starting from the MD simulations and ending with the multiple RPMs and their screening results is, to the best of the authors’ knowledge, the first method to generate a set of unique pharmacophore models



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00674. In Figure S1 the ligand structures of all 43 investigated ligands are shown. In Table S1 various details about the 43 protein−ligand systems used in the MD simulations are given, including the resolution of the X-ray structure, the charge on the ligand during the MD, as well as the total number of atoms, the number of water molecules and the dimensions of the simulation box for the MD simulation. Table S2 provides information about the number of active and inactive molecules that were present in the DUD-E database and selected for the screening database by LigandScout. Table S3 and its associated section “Computational demand of the usage of the PDB pharmacophore model, the HRPM and the CHA” provides comparative information about the computational costs of the PDB pharmacophore approach, the HRPM and the CHA. Figure S2 shows the box-plots of the ROC-AUC values for all systems and Figure S3 shows the screening results sorted based on the appearance count of the RPMs (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Marcus Wieder: 0000-0003-2631-8415 Notes

The authors declare no competing financial interest.



REFERENCES

(1) Regenmortel, M. H. V. Reductionism and Complexity in Molecular Biology. EMBO Rep. 2004, 5, 1016−1020. (2) Boehr, D. D.; Nussinov, R.; Wright, P. E. The Role of Dynamic Conformatinal Ensembles in Biomolecular Recognition. Nat. Chem. Biol. 2009, 5, 789−796. (3) Fischer, E. Einfluss der Configuration auf die Wirkung der Enzyme. Ber. Dtsch. Chem. Ges. 1894, 27, 2985−2993.

383

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling (4) Koshland, D. E. Enzyme Flexibility and Enzyme Action. J. Cell. Comp. Physiol. 1959, 54, 245−258. (5) Monod, J.; Wyman, J.; Changeux, J. P. On the Nature of Allosteric Transitions: a Plausible Model. J. Mol. Biol. 1965, 12, 88− 118. (6) Cozzini, P.; Kellogg, G. E.; Spyrakis, F.; Abraham, D. J.; Costantino, G.; Emerson, A.; Fanelli, F.; Gohlke, H.; Kuhn, L. A.; Morris, G. M.; Orozco, M.; Pertinhez, T. A.; Rizzi, M.; Sotriffer, C. A. Target Flexibility: An Emerging Consideration in Drug Discovery and Design. J. Med. Chem. 2008, 51, 6237−6255. (7) Henzler-Wildman, K.; Kern, D. Dynamic Personalities of Proteins. Nature 2007, 450, 964−972. (8) Shi, Y. A Glimpse of Structural Biology through X-Ray Crystallography. Cell 2014, 159, 995−1014. (9) Parak, F. G. Proteins in Action: The Physics of Structural Fluctuations and Conformational Changes. Curr. Opin. Struct. Biol. 2003, 13, 552−557. (10) Feixas, F.; Lindert, S.; Sinko, W.; McCammon, J. A. Exploring the Role of Receptor Flexibility in Structure-Based Drug Discovery. Biophys. Chem. 2014, 186, 31−45. (11) Hatzakis, N. S. Single Molecule Insights on Conformational Selection and Induced Fit Mechanism. Biophys. Chem. 2014, 186, 46− 54. (12) Vogt, A. D.; Pozzi, N.; Chen, Z.; Di Cera, E. Essential Role of Conformational Selection in Ligand Binding. Biophys. Chem. 2014, 186, 13−21. (13) Changeux, J.-P.; Edelstein, S. Conformational Selection or Induced Fit? 50 Years of debate resolved. F1000 biology reports 2011, 3, 19. (14) Kokh, D. B.; Wade, R. C.; Wenzel, W. Receptor Flexibility in small-molecule Docking Calculations. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 298−314. (15) Meng, X. Y.; Zhang, H. X.; Mezei, M.; Cui, M. Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery. Curr. Comput.-Aided Drug Des. 2011, 7, 146−157. (16) Sanders, M. P. A.; McGuire, R.; Roumen, L.; de Esch, I. J. P.; de Vlieg, J.; Klomp, J. P. G.; de Graaf, C. From the Protein’s Perspective: the Benefits and Challenges of Protein Structure-Based Pharmacophore Modeling. MedChemComm 2012, 3, 28−38. (17) Bielska, E.; Lucas, X.; Czerwoniec, A.; Kasprzak, J. M.; Kaminska, K. H.; Bujnicki, J. M. Virtual Screening Strategies in Drug Design - Methods and Applications. BioTechnologia 2011, 92, 249− 264. (18) Chen, Z.; Li, H.-l.; Zhang, Q.-j.; Bao, X.-g.; Yu, K.-q.; Luo, X.-m.; Zhu, W.-l.; Jiang, H.-l. Pharmacophore-based Virtual Screening versus Docking-based Virtual Screening: a benchmark comparison against eight targets. Acta Pharmacol. Sin. 2009, 30, 1694−708. (19) Peach, M. L.; Nicklaus, M. C. Combining Docking with Pharmacophore Filtering for Improved Virtual Screening. J. Cheminf. 2009, 1, 1−15. (20) Berman, H. M.; Westbrook, J. D.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242. (21) Ferreira, L. G.; Dos Santos, R. N.; Oliva, G.; Andricopulo, A. D. Molecules 2015, 20, 13384−13421. (22) Lexa, K. W.; Carlson, H. A. Protein Flexibility in Docking and Surface Mapping. Q. Rev. Biophys. 2012, 45, 301−43. (23) Jiang, F.; Lin, W.; Rao, Z. SOFTDOCK: Understanding of Molecular Recognition through a Systematic Docking Study. Protein Eng., Des. Sel. 2002, 15, 257−263. (24) Mizutani, M. Y.; Takamatsu, Y.; Ichinose, T.; Nakamura, K.; Itai, A. Effective Handling of Induced-Fit Motion in Flexible Docking. Proteins: Struct., Funct., Genet. 2006, 63, 878−891. (25) Pencheva, T.; Lagorce, D.; Pajeva, I.; Villoutreix, B. O.; Miteva, M. a. AMMOS: Automated Molecular Mechanics Optimization Tool for In Silico Screening. BMC Bioinf. 2008, 9, 438. (26) Leach, A. R. Ligand Docking to Proteins with Discrete SideChain Flexibility. J. Mol. Biol. 1994, 235, 345−356.

(27) Bottegoni, G.; Kufareva, I.; Totrov, M.; Abagyan, R. FourDimensional Docking: A Fast and Accurate Account of Discrete Receptor Flexibility in Ligand Docking. J. Med. Chem. 2009, 52, 397− 406. (28) Hritz, J.; De Ruiter, A.; Oostenbrink, C. Impact of Plasticity and Flexibility on Docking Results for Cytochrome P450 2D6: A combined Approach of Molecular Dynamics and Ligand Docking. J. Med. Chem. 2008, 51, 7469−7477. (29) Cheng, S.; Niv, M. Y. Molecular Dynamics Simulations and Elastic Network Analysis of Protein Kinase B (Akt/PKB) Inactivation. J. Chem. Inf. Model. 2010, 50, 1602−1610. (30) Campbell, A. J.; Lamb, M. L.; Joseph-McCarthy, D. EnsembleBased Docking using Biased Molecular Dynamics. J. Chem. Inf. Model. 2014, 54, 2127−2138. (31) Yang, S.-Y. Pharmacophore Modeling and Applications in Drug Discovery: Challenges and recent Advances. Drug Discovery Today 2010, 15, 444−450. (32) Wu, F.; Xu, T.; He, G.; Ouyang, L.; Han, B.; Peng, C.; Song, X.; Xiang, M. Discovery of Novel Focal Adhesion Kinase Inhibitors Using a Hybrid Protocol of Virtual Screening Approach based on Multicomplex-Based Pharmacophore and Molecular Docking. Int. J. Mol. Sci. 2012, 13, 15668−15678. (33) Zou, J.; Xie, H. Z.; Yang, S. Y.; Chen, J. J.; Ren, J. X.; Wei, Y. Q. Towards more Accurate Pharmacophore Modeling: MulticomplexBased Comprehensive Pharmacophore Map and Most-FrequentFeature Pharmacophore Model of CDK2. J. Mol. Graphics Modell. 2008, 27, 430−438. (34) Choudhury, C.; Priyakumar, U. D.; Sastry, G. N. Dynamics based Pharmacophore Models for Screening Potential Inhibitors of Mycobacterial Cyclopropane Synthase. J. Chem. Inf. Model. 2015, 55, 848−860. (35) Park, C.; Thangapandian, S.; Lee, Y.; Son, M. Molecular Dynamic Simulation and Receptor-based Pharmacophore Modeling on Human Renin for Discovery of Novel Inhibitors. Waset. Org. 2013, 7, 33−38. (36) Sohn, Y. S.; Park, C.; Lee, Y.; Kim, S.; Thangapandian, S.; Kim, Y.; Kim, H. H.; Suh, J. K.; Lee, K. W. Multi-Conformation Dynamic Pharmacophore Modeling of the Peroxisome Proliferator-Activated Receptor γ for the Discovery of Novel Agonists. J. Mol. Graphics Modell. 2013, 46, 1−9. (37) Thangapandian, S.; John, S.; Arooj, M.; Lee, K. W. Molecular Dynamics Simulation Study and Hybrid Pharmacophore Model Development in Human LTA4H Inhibitor Design. PLoS One 2012, 7, e34593. (38) Thangapandian, S.; John, S.; Lee, Y.; Kim, S.; Lee, K. W. Dynamic Structure-Based Pharmacophore Model Development: A New and Effective Addition in the Histone Deacetylase 8 (HDAC8) Inhibitor Discovery. Int. J. Mol. Sci. 2011, 12, 9440−9462. (39) Spyrakis, F.; Benedetti, P.; Decherchi, S.; Rocchia, W.; Cavalli, A.; Alcaro, S.; Ortuso, F.; Baroni, M.; Cruciani, G. A Pipeline to Enhance Ligand Virtual Screening: Integrating Molecular Dynamics and Fingerprints for Ligand and Proteins. J. Chem. Inf. Model. 2015, 55, 2256−2274. (40) Sperandio, O.; Mouawad, L.; Pinto, E.; Villoutreix, B. O.; Perahia, D.; Miteva, M. A. How to Choose Relevant Multiple Receptor Conformations for Virtual Screening: A Test Case of Cdk2 and Normal Mode Analysis. Eur. Biophys. J. 2010, 39, 1365−1372. (41) De Paris, R.; Quevedo, C. V.; Ruiz, D. D.; Norberto De Souza, O.; Barros, R. C. Clustering Molecular Dynamics Trajectories for optimizing Docking Experiments. Comput. Intell Neurosci 2015, 2015, 916240. (42) Sinko, W.; Lindert, S.; Mccammon, J. A. Accounting for Receptor Flexibility and Enhanced Sampling Methods in ComputerAided Drug Design. Chem. Biol. Drug Des. 2013, 81, 41−49. (43) Plattner, N.; Noé, F. Protein Conformational Plasticity and Complex Ligand-Binding Kinetics Explored by Atomistic Simulations and Markov Models. Nat. Commun. 2015, 6, 7653. (44) Wieder, M.; Perricone, U.; Boresch, S.; Seidel, T.; Langer, T. Evaluating The Stability of Pharmacophore Features Using Molecular 384

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385

Article

Journal of Chemical Information and Modeling Dynamic Simulations. Biochem. Biophys. Res. Commun. 2016, 470, 685−689. (45) Alonso, H.; Bliznyuk, A. A.; Gready, J. E. Combining Docking and Molecular Dynamic Simulations in Drug Design. Med. Res. Rev. 2006, 26, 531−568. (46) Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 2012, 55, 6582−6594. (47) Vriend, G. WHAT IF: A Molecular Modeling and Drug Design Program. J. Mol. Graphics 1990, 8, 52−56. (48) Cereto-Massagué, A.; Ojeda, M. J.; Joosten, R. P.; Valls, C.; Mulero, M.; Salvado, M. J.; Arola-Arnal, A.; Arola, L.; Garcia-Vallvé, S.; Pujadas, G. The Good, the Bad and the Dubious: VHELIBS, a Validation Helper for Ligands and Binding Sites. J. Cheminf. 2013, 5, 1−9. (49) Webb, B.; Sali, A. Comparative Protein Structure Modeling Using MODELLER. Curr. Protoc. Bioinformatics 2014, 2014, 5.6.1− 5.6.32. (50) Olsson, M. H. M.; SØndergaard, C. R.; Rostkowski, M.; Jensen, J. H. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. J. Chem. Theory Comput. 2011, 7, 525−537. (51) Jo, S.; Kim, T.; Iyer, V. G.; Im, W. CHARMM-GUI: A webbased Graphical User Interface for CHARMM. J. Comput. Chem. 2008, 29, 1859−1865. (52) Brooks, B. R.; Brooks, C. L.; Mackerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30, 1545−1614. (53) Eastman, P.; Friedrichs, M. S.; Chodera, J. D.; Radmer, R. J.; Bruns, C. M.; Ku, J. P.; Beauchamp, K. A.; Lane, T. J.; Wang, L. P.; Shukla, D.; Tye, T.; Houston, M.; Stich, T.; Klein, C.; Shirts, M. R.; Pande, V. S. OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation. J. Chem. Theory Comput. 2013, 9, 461−469. (54) Lee, J.; Cheng, X.; Swails, J. M.; Yeom, M. S.; Eastman, P. K.; Lemkul, J. A.; Wei, S.; Buckner, J.; Jeong, J. C.; Qi, Y.; Jo, S.; Pande, V. S.; Case, D. A.; Brooks, C. L.; MacKerell, A. D.; Klauda, J. B.; Im, W. CHARMM-GUI Input Generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM Simulations Using the CHARMM36 Additive Force Field. J. Chem. Theory Comput. 2016, 12, 405−413. (55) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell, A. D. CHARMM General Force Field: A Force Field for Drug-Like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. J. Comput. Chem. 2010, 31, 671−690. (56) MacKerell, A.; Bashford, D.; Bellott, M.; Dunbrack, R.; Evanseck, J.; Field, M.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; JosephMcCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D.; Prodhom, B.; Reiher, W.; Roux, B.; Schlenkrich, M.; Smith, J.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (57) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577−8593. (58) Michaud-Agrawal, N.; Denning, E. J.; Woolf, T. B.; Beckstein, O. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 2011, 32, 2319−2327. (59) Wolber, G.; Langer, T. LigandScout: 3-D pharmacophores Derived from Protein-Bound Ligands and their Use as Virtual Screening Filters. J. Chem. Inf. Model. 2005, 45, 160−169.

(60) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An Open Chemical Toolbox. J. Cheminf. 2011, 3, 1−14. (61) Halgren, T. A. Merck Molecular Force Field. II. MMFF94 van der Waals and Electrostatic Parameters for Intermolecular Interactions. J. Comput. Chem. 1996, 17, 520−552. (62) Zhao, W.; Hevener, K. E.; White, S. W.; Lee, R. E.; Boyett, J. M. A Statistical Framework to Evaluate Virtual Screening. BMC Bioinf. 2009, 10, 225. (63) Fawcett, T. An Introduction to ROC Analysis. Pattern Recognition Letters 2006, 27, 861−874. (64) Jain, A. N.; Nicholls, A. Recommendations for Evaluation of Computational Methods. J. Comput.-Aided Mol. Des. 2008, 22, 133− 139. (65) Dahiru, T. P-value, a True Test of Statistical Significance? A Cautionary Note. Ann. Ib. Postgrad. Med. 2008, 6, 21−6. (66) Corder, G. W.; Foreman, D. I. Nonparametric Statistics for NonStatisticians 2009, 1−6. (67) Perez, J. J.; Tomas, M. S.; Rubio-Martinez, J. Assessment of the Sampling Performance of Multiple-Copy Dynamics versus a Unique Trajectory. J. Chem. Inf. Model. 2016, 56, 1950.

385

DOI: 10.1021/acs.jcim.6b00674 J. Chem. Inf. Model. 2017, 57, 365−385