ColDock: Concentrated Ligand Docking with All-Atom Molecular

Jul 11, 2018 - ColDock: Concentrated Ligand Docking with All-Atom Molecular ... spontaneous ligand binding to the correct binding sites within a 100 n...
1 downloads 0 Views 5MB Size
Article Cite This: J. Phys. Chem. B XXXX, XXX, XXX−XXX

pubs.acs.org/JPCB

ColDock: Concentrated Ligand Docking with All-Atom Molecular Dynamics Simulation Kazuhiro Takemura,†,#,∥ Chika Sato,‡,∞,∥ and Akio Kitao*,§ Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, and ‡Institute of Molecular and Cellular Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo 113-0032, Japan § School of Life Science and Technology, Tokyo Institute of Technology, 2 Chome-12-1, Ookayama, Meguro, Tokyo 152-8550, Japan Downloaded via UNIV OF SUSSEX on July 12, 2018 at 05:21:20 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



S Supporting Information *

ABSTRACT: We propose a simple but efficient and accurate method to generate protein−ligand complex structures, called Concentrated ligand Docking (ColDock). This method consists of multiple independent molecular dynamics simulations in which ligands are initially distributed randomly around a protein at relatively high concentration (∼100 mM). This condition significantly increases the probability of the ligand exploring the protein surface, which induces spontaneous ligand binding to the correct binding sites within a 100 ns MD. After clustering of the protein-bound ligand poses, representatives of the populationally dominant clusters are considered as predicted ligand poses. We applied ColDock to four cases starting from holo protein structures and showed that ColDock can generate “correct” ligand poses very similar to the crystal complex structures. Correct ligand poses are also well reproduced in three out of four cases started from apo structures, with the exception being a case with an initially closed binding pocket. The results indicate that ColDock can be used as a protein−ligand docking as long as the ligand binding pocket is initially open. Plausible protein−ligand complex structures can be easily generated by conducting the ColDock procedure using standard MD simulation software.



repulsive forces between the probe molecules.4,8,11 Probe binding sites are predicted by comparing the densities of probe molecules around the protein to their density in the same solution but without protein. In one of the aforementioned studies,8 enhanced sampling simulation was performed to investigate the binding of small organic solvent as well as druglike ligands. The identified binding sites were used to improve protein−ligand docking.12,15,16 Baaden and co-workers successfully used “flooding” MD (MD with oversaturated ligands) to reveal a new extracellular binding site of bromoform in GLIC (Gloeobacter violaceus receptor).18 Ligand bindings to proteins using relatively long unguided MD simulations with multiple ligands have been also reported.19,20 For example, in a study of Src kinase,20 10-μs or longer MD simulations were conducted starting from the initial conditions in which 6 dasatinib molecules (cancer drug) or 2 PP1 ligands (kinase inhibitor) were placed at random locations. In the case of dasatinib, a weak repulsive force was applied to prevent possible ligand aggregation. Without any prior knowledge of the binding sites, binding events were successfully observed. In a study of GPCR with 10 ligands,19 1 μs or longer simulations

INTRODUCTION Protein−ligand docking is a computational method to predict the structure of the complex using physicochemical principles and is an essential tool for drug discovery. Consequently, many different methods have been developed and considerably improved over the past two decades.1,2 More accurate calculations can be attained by improving scoring functions and by explicit treatment of protein flexibility and solvent molecules,2 which can be achieved using all-atom molecular dynamics (MD) simulations. However, MD simulation is timeconsuming, and thus more efficient sampling methods are required. The key idea behind our approach is to conduct MD simulations of proteins surrounded by a high concentration of ligands. Based on the idea of multiple solvent crystal structures (MSCS),3 MD simulations in mixed solvent have recently been conducted for identifying binding sites or “hot spots”.4−17 For example, binding sites have been detected by conducting MD simulations of proteins in water containing homogeneously distributed small organic solvents (probes) such as isopropyl alcohol,5,6,13,15−17 acetonitrile,4,6,7,12,17 and benzene4,8,10,11,14,15 at relatively high concentration (from 0.15 to 10 M). The concentrated condition enhances the probability of probe binding to the protein. Probe aggregation caused by the concentrated condition can be prevented by introducing © XXXX American Chemical Society

Received: March 22, 2018 Revised: June 24, 2018

A

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

This procedure permits the selection of multiple ligand poses from one MD snapshot. Each ligand pose is translated and rotated, so that the corresponding protein pose is superimposed onto a reference; then the ligand poses are clustered using the procedure of Daura et al.21 First, all the pair RMSDs (root-mean-square deviations) between the ligand poses are calculated. Second, the number of neighbors from each snapshot are counted using a defined cutoff (heavy atom RMSD of 2 Å); then the pose with the largest number of neighbors is selected as a cluster. This pose and its neighbors are considered as the cluster representative and cluster members, respectively. After excluding this cluster, the procedure is repeated for the remaining poses until no pose is left. The obtained clusters are rank-ordered by the cluster population; then the cluster representatives of the top clusters are selected as the predicted structures by ColDock. This treatment is done based on the expectation that the cluster population would be proportional to the probability of the binding pose occurrence, which should be equivalent to ranking by binding free energy. The number of poses is limited to a maximum of 40,000 to reduce the computational cost for clustering. Once the number exceeds this limit, the sampling frequency is halved. Protein−Ligand Complexes Studied. Three relatively small protein−ligand complexes were selected and examined in this study (Table 1). The first target was FK506 binding protein (FKBP), a member of a protein family exhibiting prolyl isomerase activity. FKBPs have been identified in many eukaryotes from yeast to human and function as chaperones for proteins containing proline residues. We selected human FKBP complexed with two ligands, dimethyl sulfoxide (DMS) and methyl sulfinyl-methyl sulfoxide (DSS).23 The effects of ligand concentration were investigated using 32 and 64 ligands for the FKBP/DMS complex. We also studied a complex comprising human plasminogen kringle 4 and ε-aminocaproic acid (ACA).24 The plasminogen kringle is a precursor of plasmin, a serine protease that acts to dissolve fibrin blood clots. The third target was the X-linked inhibitor of apoptosis protein that stops apoptotic cell death induced either by viral infection or by overproduction of caspases. We chose 4-(4bromo-1H-pyrazol-1-yl) piperidinium (BPP) as the ligand given the results of a fragment-based drug discovery investigation.25 This complex was selected because the holo and apo structures are significantly different around the ligand binding site and the BPP binding site is poorly exposed. Our intention was to use this test case to examine whether ColDock can generate a reasonable binding pose for a buried binding pocket. As the apo form of the protein, we used its average structure of the NMR structures,26 because nonaveraged individual NMR structures are not deposited in the Protein Data Bank and no other structure in the apo form is available. When evaluated using a Ramachandran plot, the initial structure has 8 residues in the generously allowed region and 2 residues in the disallowed region according to the criteria of PROCHECK27 whereas the holo structure does not have such residues. This implies that the apo structure might not be reliable. Table 1 lists the simulated systems, using the abbreviation for each ligand and showing the number of ligands. For example, FKBP with 32 DMS molecules was simulated in the run labeled DMS32. We examined the docking procedure by using the holo protein structures taken from the complex crystal structures for DMS32, DMS64, ACA64, and BPP64 as

resulted in spontaneous binding events observed in 21 of 82 MD simulations. These earlier studies clearly show the advantage of using concentrated ligands in MD simulations for protein−ligand docking. In the current study, we propose a protein−ligand docking procedure, which we call ColDock (Concentrated ligand Docking). In this method, ligand molecules are randomly distributed around a protein at high concentration (∼100 mM), then multiple independent MD simulations with different initial ligand distributions are performed. Repulsive force between the ligands is imposed to prevent ligand aggregation, similar to previous studies.4,8,11,20 We show that the ligands efficiently explore the protein surface and spontaneously bind to the expected positions within 100 ns and a ligand binding pose very similar to the crystal complex can be obtained as a representative of the most populated cluster of ligand distributions around the target protein.



METHOD ColDock Procedure. The ColDock procedure is shown in Figure 1. Given a pair comprising a protein and a ligand,

Figure 1. Procedure for conducting ColDock simulation. Structures were drawn using VMD.22

multiple ligands are randomly placed within 20 Å of the protein atoms. As many protein−ligand complex structures as possible are generated by performing 10 independent 100 ns MD simulations using 10 different initial distributions of the ligands. Among the MD-generated snapshots, the ligands in contact with Nthr (6 by default) or more protein residues are considered to be bound to the protein and are used for clustering. A contact is defined if the shortest heavy atom distance between the protein and the ligand is less than 5 Å. B

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B Table 1. Protein−Ligand Complexes Studied Label

Protein

Ligand

PDB IDa

Holo-apo RMSD (Å)

Mwb (g/mol)

# ligands

Conc (mM)c

Box length (Å)d

1D7H /1D6O 1D7I /1D6O 2PK4 /1PMK 5C0L /1F9X 1CEB /1PMK 1FKJ /1D6O

0.1

78.1

0.2

124.2

32 64 64

66.3 132.1 132.1

92.9 93.0 93.0

0.7

131.2

64

151.8

88.8

3.6

231.14

0.9

157.21

64 96 64

106.3 159.5 147.2

100.0 100.0 89.7

0.5

804.02

60

103.2

98.8

DMS32 DMS64 DSS64

FKBP

DMS

FKBP

DSS

ACA64

Human plasminogen kringle 4

ACA

BPP64 BPP96 AMH64

X-linked inhibitor of apoptosis protein

BPP

Human plasminogen kringle 4

AMH

FK60

FKBP

FK

a

PDB ID of the holo/apo protein. bMolecular weight of the ligand. cConcentrations of the ligand. dAverage simulation box length after MD simulations.

the initial protein structures; then, the same procedure was applied to docking started from the apo structures for DMS64, DSS64, ACA64, and BPP64. We found that DMS64 was more efficient than DMS32 (see below). Thus, we did not examine DMS32 for the apo protein structure. Instead, we added another ligand, DSS, to test our procedure for the apo protein structure. The box length and ligand concentration for each simulated system are also listed in Table 1. A simulation with 64 ligand molecules represents a concentration of 100 mM or higher. In each case, ten independent 100 ns MD simulations were performed. Since the prediction for the poorly exposed binding site (BPP/apo) was found to be difficult (see Results and Discussion), we also examined BPP96 in which ligand concentration was increased up to 159.6 mM. Using these target complexes, we tested our approach. Next, we examined the applicability of ColDock using two additional targets below. First, we investigated the effects of mutations on transaminomethyl-cyclohexanoic acid (AMH) binding to human plasminogen kringle 4. We examined the wild-type protein (WT) and mutants, K35I, D55N, D57N, and R71Q. The selected residues are situated in the binding site and were reported to change binding affinities.28 The structures of the mutants were modeled using MODELLER 9.19.29 Since the crystal structure comprising plasminogen kringle4 and AMH is not available, we used the complex structure of AMH with human plasminogen kringle 130 as a reference to calculate L-RMSD and f nat. The sequence identity of kringle 1 with kringle 4 is 40%. Second, we also applied ColDock to a bulky and flexible ligand. For this purpose, we examined FKBP in complex with FK506. We distributed 60 FK506 molecules to achieve a concertation of ∼100 mM (FK60 in Table 1). Simulation Details. Ligand aggregation was prevented during MD simulation by imposing repulsive force ELJ (Figure 2a) in the form of a Lennard-Jones potential between ligand molecules, essentially as described previously,4,8,11,20

Figure 2. (a) Lennard-Jones force f(r) = −∂ELJ(r)/∂r between ligands as a function of interligand distance r. Inset shows the same data for larger r. (b) Radial distribution functions g(r) between ligands. DMS32 and DMS64 with and without the repulsive force. (c) g(r) of ACA and BPP with the repulsive force. g(r) was calculated between the atoms indicated by arrows. The center of mass of the atoms shown in magenta was used for calculation of the repulsive force.

repulsive force becomes significantly large if two ligands approach within 10 Å (Figure 2a). This repulsive force results in a wide separation distance between the ligands as indicated by the radial distribution function g(r) (Figure 2b and Figure 2c). The flat value of g(r) around unity at r > 10 Å indicates a uniform distribution of ligands during the MD simulation at this distance range. In contrast, the ligands can aggregate without this force (Figure 2b). For AMH, the center of mass of the non-hydrogen atoms was used for distance calculations. We used four repulsive centers for each FK506 because this molecule is relatively large, flexible, and nonspherical (see Results and Discussion for details). All the MD simulations were performed using Gromacs 5.1.2.31,32 The system was brought to thermodynamic equilibrium at 300 K and 1 atm using the Nosé−Hoover thermostat and the Parrinello−Rahman barostat. The equa-

E LJ (r ) = 4ε[(σ /r )12 − (σ /r )6 ]

where ε = 10−3 kJ/mol, σ = 20 Å, and r is the distance between the centers of mass of the ligands as shown in magenta in Figure 2b and Figure 2c. The center of mass of the sulfur and carbon atoms in DSS, except for the methyl carbon in the methyl sulfide group was used for distance calculations. The extremely small value of ε results in a negligibly small attractive force compared to thermal noise (inset of Figure 2a) and the C

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 3. Probabilities of ligand contact per residue in (a) DMS32, (b), DMS64, (c) ACA64, and (d) BPP64. Gray, pink, orange, and red show the ranges of the probabilities, 0−20, 20−40, 40−60, and 60−100%, respectively. The crystal ligand poses are shown in green. Structures were drawn using VMD.22

tions of motion were integrated with a time step of 2 fs. Longrange Coulomb energy was evaluated using the particle mesh Ewald (PME) method. The FF14SB force field33 was used for proteins, and the force field for ligands was generated by the general AMBER force field (GAFF).34 The SPC/Eb water model35 with 150 mM KCl was used as the solvent.



RESULTS AND DISCUSSION Ligand Binding Sites. We examined the effects of ligand concentration by first performing ten independent 100 ns MD simulations of DMS32 and DMS64 starting from the holo protein structure. In DMS32, DMS bound to the “correct” position, very similar to that of the crystal complex structure in 5 out of 10 trials, and in 9 out of 10 cases in DMS64, indicating that the probability of occurrence of correct binding is nearly proportional to the ligand concentration. We judged that a higher ligand concentration is more suitable for observing ligand binding in each trial and therefore used 64 ligand molecules in the other simulations since simulating ligand binding using lower concentration would require longer MD runs, which is not efficient computationally. Ligand binding to several distinct positions was observed during the MD simulations. The probabilities of ligand contact with protein residues for the holo protein structures are shown in Figure 3. The residue was regarded to be in contact if any interheavy atom distance between the protein and the ligand was 5 Å or less. As expected, the residues with high contact probability are located around the crystal ligand pose. Nthr (the number of ligand contacts with residues: see Methods) is introduced as the threshold to select the ligand pose in contact with protein. The optimum value of Nthr was determined by calculating distributions of the number of residue contacts (NRC) as shown in Figure 4. We also examined the distributions only for near-native (NN) ligands, in which the

Figure 4. Distribution of the number of residue contacts (NRC). (a) DMS32 and DMS64; (b) ACA64 and BPP64. NN represents nearnative poses only.

ligand RMSD (L-RMSD), defined as the RMSD of ligand heavy atoms when the simulated and crystal holo protein structures are superposed, is less than 5 Å. Since the distributions for the NN ligands in the range 1 ≤ NRC ≤ 6 were significantly lower than the other regions in all cases, we chose Nthr = 6. The effects of Nthr on the results are discussed below. Top Rank Ligand Pose Identified by ColDock. The best poses obtained as the representative of the most populated cluster are listed in Table 2. For all cases that started from holo protein structures, the best poses were very similar to those in the crystal, as indicated by the small L-RMSD and the formation of full native contacts (fraction of native contacts f nat = 1.0 for all cases). It should be noted that the population of the first cluster (Pclst) is high in each case. Although good binding poses were also obtained with lower ligand concentration (DMS32:66 mM), a higher concentration aids more efficient computation, as already discussed. D

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

simulation length did not significantly improve the results although L-RMSD and f nat were slightly better in the case of BPP64. In some trajectories of BPP64 and BPP96, the ligands entered the binding pocket, however, did not form the correct interactions with the protein. The reason for the failure of the prediction could be due to the difficulty of the pocket opening, but might be also related to the use of the average NMR structure as the initial structure as mentioned in the Method. The representative structures of the three most populated clusters are shown in Figure S1. The second and third clusters are usually situated on protein surfaces other than the binding interface. Although binding probabilities on such surfaces can be relatively high, ligand binding is transient, more flexible, and less stable, and the ligand can take different orientations. Since we adopted the clustering using RMSD, the ligands at the similar positions with different orientations or conformations were classified into the distinct clusters. Thus, although many ligands made contacts with the protein surface residues (Figure 3c and 3d), the populations of the clusters at the other protein surfaces were not higher than that in the correct binding site. Overall, in 7 out of 8 cases, ColDock successfully predicted the complex structures with an accuracy of L-RMSD < 1.6 Å and f nat = 1.0. Comparison with AutoDock Vina. We compared the ligand docking poses obtained with apo structures using ColDock with those obtained by AutoDock Vina.36 The ColDock-generated poses were rank-ordered based on the cluster population whereas those in AutoDock Vina were ordered by the binding free energy (Table 3). The crystal structures, and top-ranked poses from ColDock and AutoDock Vina are depicted in Figure 6. ColDock outperformed AutoDock Vina in terms of LRMSD for three complexes (DMS64, DSS64, and ACA64). For DMS64, ColDock selected a very low L -RMSD pose (0.4 Å, see Figure 6a) as the most dominant cluster (43.9%) compared to the subsequent two clusters (both 6.7%). In

Table 2. Prediction Results Obtained Using ColDock Initial protein structure Holo

Apo

Label DMS32 DMS64 ACA64 BPP64 DMS64 DSS64 ACA64 BPP64 BPP64 (200 ns) BPP96 BPP96 (200 ns)

Pclst (%)a 48.2 41.1 10.9 20.9 43.9 25.7 9.7 2.4 1.7

(4591) (7839) (2719) (7869) (7980) (5748) (2804) (588) (666)

2.3 (766) 2.0 (737)

L-RMSD (Å)b

f natc

0.4 0.3 1.3 0.4 0.4 0.4 1.6 25.8 7.3

1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.4

20.0 20.7

0.0 0.0

Population of the first cluster and the number of snapshots in the cluster (values in parentheses). bL-RMSD of the ligand heavy atoms after superposing the protein structure onto the crystal structure. c Fraction of native contacts. a

The correct pose was predicted successfully using the apo structures except for BPP64, which requires a pocket opening for proper ligand binding. The very small L-RMSD values for DMS64 and DSS64 indicate that both the locations and the orientations of the ligands were correctly predicted. Also, ACA was located essentially in the correct translational position but its orientation was slightly different (see below for the predicted structures). ColDock failed to predict the pose in BPP64, in which protein structures in the apo and holo forms are significantly different (Cα RMSD: 3.6 Å). As shown in Figure 5a, the ligand binding pocket was closed in the apo

Table 3. Comparison with AutoDock Vina ColDock

Figure 5. Structures of the X-linked inhibitor of apoptosis protein in the crystal (a) holo (light-green) and apo (yellow) forms. Darker regions (green and orange) indicate the flexible loop near the binding pocket. BPP is shown in green. (b) Snapshot of the nearest-native ligands (pink carbon atoms) observed in the simulations. The BPP structure in the crystal (cyan carbon atoms) is shown again as a reference. Structures were drawn using VMD.22

Label

Ranka

Pclst (%)c

DMS64

1 2 3 Bestb 1 2 3 Bestb 1 2 3 Bestb 1 2 3 Bestb

43.9 (7980) 6.7 (1227) 6.7 (1224)

DSS64

ACA64

form due to a structural difference in a loop compared to the holo form and did not fully open during the 100 ns MD simulations preventing ligand binding in the correct position. However, the pocket was partially open in some trajectories, which enabled ligand binding near the pocket. The smallest LRMSD was 2.9 Å (Figure 5b). For this target, we also examined BPP96, in which ligand concentration was increased up to 159.6 mM, but the result was not improved (Table 2). We further conducted additional 100 ns MD simulations for BPP64 and BPP96 (200 ns in total). The extension of

BPP64

25.7 4.9 3.3 9.7 4.2 3.3

(5748) (1085) (741) (2804) (1229) (942)

2.4 (588) 2.3 (571) 2.2 (546)

AutoDock Vina

L-RMSD (Å) 0.4 16.5 7.7 0.4 0.4 18.2 18.0 0.4 1.6 14.3 14.0 1.6 25.8 20.2 16.8 3.6

(1)

(1)

(1)

(469)

L-RMSD (Å) 21.6 2.0 22.0 1.9 3.4 2.1 2.5 2.1 1.9 3.5 2.6 1.9 4.3 22.9 4.9 4.3

(6)

(2)

(1)

(1)

a

Ligand poses were rank-ordered based on the cluster population in ColDock and the binding free energy in Autodock Vina. bThe best LRMSD of all the predicted structures (numbers in parentheses indicate the rank of the best pose). cPopulation of the cluster and the number of snapshots in the cluster (values in parentheses).

E

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

binding pose for BPP64; although the best pose in ColDock (L-RMSD: 2.9 Å) was better, the L-RMSD of the top-ranked pose (4.3 Å) was much better in AutoDock Vina. Unlike the other targets in which the first poses were dominant, the populations of the first three clusters (2.4−2.2%) were comparable for BPP64, suggesting that the population differences between the clusters are not significant for BPP64. Since L-RMSD tends to be lower if the population of the first cluster is dominant, the population of the top clusters might be a good indicator for judging whether ColDock works properly. In such case, binding energy should also be considered in the ranking. The simulation time dependence of the population and L-RMSD obtained from the three most populated clusters is shown in Figure S2. We observed the increase of the population of the first cluster for the successful predictions because the ligand bound to the correct position stayed at the same position, which in turn resulted in the increase of the population. We did not observe such increase for apo BPP64. Thus, we confirmed that the population of the first cluster is a good indicator for monitoring the convergence and accuracy of the predictions. Effect of Nthr. All the results described above were obtained with Nthr = 6, determined from the distribution of the number of residue contacts for near-native poses. We also examined Nthr = 1 and 0 (Table 4). In the former, all the ligand poses in contact with the protein were considered in the clustering whereas all the poses were analyzed regardless of protein contacts in the latter. To reduce the number of poses, we selected ligand poses every 2 ns (every 1 ns for DMS 32) in the case of Nthr = 0 and used 32,000 poses for the clustering. Although the populations of the clusters drastically decreased, the selected top rank poses did not significantly depend on the choice of Nthr, which indicates that prediction performance is not affected by the choice of the Nthr value. From the practical standpoint, the use of a certain threshold (e.g., Nthr = 6 as used in this study) is recommended because this can reduce the computational time required for clustering. Possible Improvements for Proteins with Closed Pocket. The simulation of BPP64 using ColDock did not generate good poses similar to the crystal structure because the ligand binding pocket was not fully open. The simulation of BPP64 using ColDock did not generate good poses similar to the crystal structure because the ligand binding pocket was not fully open. The use of the repulsive force has two aspects: aggregation of the ligands can be prevented but interactions between the ligands cannot be fully considered. For example, ligands cannot enter the binding site at the same time, which might be related to the failure to open the pocket. One

Figure 6. Predicted and crystal structures of ligands in (a) DMS64, (b) DSS64, (c) ACA64, and (d) BPP64. Ligands shown with cyan, pink, and green carbon atoms represent the crystal, and top ColDock pose and top AutoDock Vina poses, respectively. Structures were drawn using VMD.22

contrast, AutoDock Vina predicted a highly shifted binding position for the first pose, and the lowest L-RMSD pose (2.0 Å) was obtained as the second rank. Similar to DMS64, ColDock generated a very low L-RMSD pose (0.4 Å) as the most dominant cluster (25.7%) for DSS64. The AutoDock prediction for DSS64 is acceptable because the L-RMSD of the first rank was 3.4 Å. As shown in Figure 6b, both ColDock and AutoDock predicted binding to the correct binding pocket. ColDock also predicted the orientation of the ligand correctly while the orientation obtained by AutoDock Vina is slightly different from that of the native ligand. For ACA64, the poses predicted by both methods were located near the correct position but showed different orientations from that of the native ligand, with L-RMSDs of the structures predicted by ColCock and AutoDock Vina of 1.6 and 1.9 Å, respectively. As shown in Figure 3c, ACA made contact with many protein surface residues. Unlike DMS64, the ligand bindings did not concentrate on the correct position, resulting in a lower cluster population of the first cluster (9.7%) compared to the previous two cases. Finally, both methods failed to generate a correct

Table 4. Prediction Results Obtained Using ColDock with Different Values of Nthr Nthr = 6 Holo/Apo

Label

Holo

DMS32 DMS64 ACA64 BPP64 DMS64 DSS64 ACA64 BPP64

Apo

Pclst (%) 48.2 41.1 10.9 20.9 43.9 25.7 9.7 2.4

a

(4591) (7839) (2719) (7869) (7980) (5748) (2804) (588)

Nthr = 1 Pclst (%)

L-RMSD (Å) 0.4 0.3 1.3 0.4 0.4 0.4 1.6 25.8

12.9 10.6 3.7 5.2 10.8 9.4 3.7 0.7

a

(4635) (3921) (1387) (1965) (3998) (2853) (1426) (249)

Nthr = 0 L-RMSD (Å) 0.4 0.3 1.7 0.5 0.4 0.5 1.6 19.7

Pclst (%) 0.7 0.6 0.4 0.6 0.6 0.9 0.5 0.1

a

(229) (192) (139) (192) (200) (279) (150) (36)

L-RMSD (Å) 0.5 0.4 1.7 0.7 0.3 0.4 1.8 20.0

a

Population of the cluster and the number of snapshots in it (values in the parentheses). F

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B Table 5. Effects of Mutation on ColDock Results Mutation

ΔGexpa(kcal/mol)

NBindb

NUnbindc

WT K35I D55N D57N R71Q

−6.53 −6.96 −5.05

4 8 4 4 2

0 0 0 1 0

−4.93

Pclst (%) 32.0 55.1 24.7 26.5 10.3

(5704) (11476) (3370) (4878) (1767)

L-RMSD (Å)

f natc

1.2 1.5 1.4 1.4 2.5

0.78 0.78 0.78 0.78 0.78

a

Experimental binding free energies.28 Binding is not detected for D57N. bThe number of binding events. cThe number of unbinding events. See text for definitions of binding and unbinding events.

group of the ligand. K35 interacted with D55 and D57 in the apo form, and might compete with the ligand amino group interaction with D55/D57. These results can explain the increased affinity of K35I and decreased affinities of D55N, D57N, and R71Q. From the observed results, we concluded that the effects of mutations on ligand bindings can be investigated using ColDock. Application to Bulky Ligand. As a challenge of ColDock to a larger ligand, we also examined FK506 binding to FKBP. Since FK506 is flexible, nonspherical, and much larger than the other ligands, the repulsive force using one interaction site per ligand molecule could not prevent aggregation of FK506 (Figure S4). We therefore introduced four interaction sites for each ligand molecule as shown in Figure S4 and successfully avoided aggregation of FK506. Considering that the size of the ligand is much larger, we used Nthr = 15 and conducted 20 independent 200 ns MD simulations for this target. The ligands entered the binding site in 9 out of 20 cases. If we only consider first 100 ns trajectories, this number reduces to 7 cases. These results suggest that longer MD and more trials should be used for relatively large ligands. In practice, default MD length (100 ns) and the number of independent MD (10) can be tried first, and then extension of the MD length and additional trials can be considered depending on the result. We found three significantly populated clusters in the binding site (Figure 7). The populations of these clusters were 26% (LRMSD: 8.6 Å, f nat: 0.87), 18% (1.2 Å, 1.00), and 18% (4.9 Å, 0.87). These three structures reproduced native contacts very well (>0.87), and the second-ranked pose showed the lowest L-RMSD among them. Figure 7d shows the pathway of FK506 binding, indicating that FK506 first bound to the third-ranked site (purple) before deep binding to the second-ranked pose (orange). Among nine trajectories in which ligands entered the binding site, 5, 1, and 3 cases included structures similar to the first-, second-, and third-ranked poses, respectively. Although only one trajectory reached to the pose very similar to the crystal structure, the population of the cluster was relatively high due to the stability of the pose. Overall, ColDock generated three dominant poses which were correct as the binding site and reproduced most of the native contacts, and one of them was very similar to the ligand pose in the crystal structures. Therefore, we would expect that ColDock can be applied to bulky ligands. We used different parameters for this target; 200 ns MD simulation (default: 100 ns), 20 independent runs (10), Nthr = 15 (6), and 4 repulsive centers (1) although ColDock with the default parameters worked well for all small ligands examined. For larger ligands whose end-to-end distances are greater than 8 Å, we suggest to determine the ColDock parameters as follows. First, the repulsive centers should be distributed so that all the atoms of the ligand are situated within 8 Å from one of the centers and the ligand aggregation does not occur.

possible way of improving ColDock results for proteins with closed ligand binding pockets is to use multiple initial protein structures. The use of initial structures containing variations around the pocket, including a more open pocket, might improve docking performance. For example, previous studies showed that ligand binding pockets hidden in the initial structure can be detected by analyzing the trajectories of MD simulations.37,38 This approach shares similarity with ensemble docking,39−41 in which ligands are docked to ensembles of protein conformations. Another possible choice is the combination with mixed solvent MD simulations4−17 since mixed solvent MD simulations can induce known cryptic pockets to open.8 Novel ligand binding sites detected by mixed solvent MD simulation were confirmed by biophysical assays and X-ray crystallography.14 The conformations generated by mixed solvent MD improved the results of subsequent ensemble dockings.15,16 The placement of virtual atoms into the pocket will be useful to keep the generated pocket open, such as in the example of using a balloon potential,42 in which Lennard-Jones (LJ) spheres are initially placed in possible binding pockets and then the van der Waals radii of the LJ spheres are gradually increased. Effect of Mutations on Ligand Binding. As an application of ColDock, we investigated the effects of mutations on the complex of plasminogen kringle 4 with AMH. An experimental study reported that K35I increased the binding affinity, whereas D55N and R71Q decreased the binding affinity, and the binding was not detected for D57N28. During the MD, we considered that the ligand binds to the protein if the L-RMSD is lower than 5 Å for longer than 10 ns and unbinding occurs if the L-RMSD of the bound ligand becomes higher than 10 Å. Using these definitions, we counted the numbers of ligand binding (NBind) and unbinding (NUnbind) as summarized in Table 5. We detected bindings of all mutants including D57N, for which the binding was not detected experimentally. The binding to the D57N mutant might be related to the high concentration used in ColDock compared to the experimental condition (2.5 mM).28 Interestingly, only one unbinding event among 50 MD simulations was observed in the case of D57N, which is consistent with the weakest binding to D57N. ColDock accurately predicted the ligand pose for WT (L-RMSD: 1.2 Å. See Table 5 and Figure S3). Slightly higher L-RMSD and lower f nat compared to the results obtained from DMS or DSS may be due to the use of homologous protein as the reference structure as mentioned in the Method. The populations of the first cluster anticorrelated well with the reported binding free energies because the population is expected to be proportional to the binding probability and to be anticorrelated with the binding free energies. In the WT holo structure, D55 and D57 stabilized the binding through electrostatic interactions with the amino group of the ligand, while R71 interacted with the carbonyl G

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

longer MD length may be needed if complete binding of the ligand takes a longer time. Third, the current procedure may not be suitable if multiple ligands should enter the binding pocket at a time, because the use of repulsive force might prevent multiple ligand entry to the pocket. Finally, the current procedure of ColDock is not necessarily efficient to open initially closed binding pockets as we showed in the case of BPP64. As discussed above, the combination of ColDock with a pocket expansion technique is required for such targets and is currently under investigation. Since ColDock uses standard MD simulation with an extra repulsive force, it can be easily combined with any method. In a ColDock simulation, ligands are distributed randomly in solution in the initial condition, enabling us to observe ligand binding pathways by analyzing the MD trajectories. Although the concentrated ligand condition might produce some artifacts, the pathways can provide useful information for understanding the binding mechanism. This application of ColDock is also currently under investigation in our laboratory.



ASSOCIATED CONTENT

* Supporting Information S

Figure 7. (a−c) Predicted and crystal structures of the FKBP/FK506 complex. Ligands shown with cyan, pink, orange, and purple carbon atoms represent the crystal, first-, second-, and third-ranked ColDock poses, respectively. In (d), lines represent the center-of-mass trajectory to reach the second-ranked ColDock pose, and pink, orange, and purple spheres indicate center-of-mass positions of the first-, the second-, and the third-ranked ColDock poses, respectively.

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.8b02756.

This can be confirmed by the MD simulation of the ligands in solution without a protein. Second, calculate the distribution of NRC and use Nthr = 0.8 max(NRC) for the first trial. Third, examine the convergence of the cluster populations and search optimal values of Nthr and MD length (default: 200 ns). Finally, conduct the first set of calculation with 20 independent MDs. If the obtained ligand poses are not stable, add 10 more MD runs to generate other poses.



The representative structures of the top three clusters (DMS32, DMS64, DSS64, ACA64, and BPP64), the simulation time dependence of L-RMSD, and the population of the first three clusters (DMS32, DMS64, DSS64, ACA64, and BPP64), the predicted structures of AMH64, and the radial distribution functions of FK506, and full author lists of references 14 and 25 (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: +81-3-5734-3373. ORCID



Akio Kitao: 0000-0002-5221-0806

CONCLUSION We have proposed a protein−ligand docking procedure, ColDock, based on all-atom MD simulations. We used concentrated ligand conditions to successfully enhance the probability of ligand binding and observed binding to the correct pose within relatively short (100 ns) MD simulations. We applied ColDock to four cases starting from holo structures and showed that ColDock can generate correct binding poses accurately. We then examined four cases starting from apo structures. Except for one difficult case with an initially closed binding pocket, correct binding poses were well reproduced, suggesting that ColDock can be used as a protein−ligand docking method as long as the ligand binding pocket is open. Besides, the effects of mutations on ligand binding affinities were successfully observed using ColDock. Furthermore, ColDock can be applied to systems comprising bulky ligands. This procedure can be easily conducted with standard MD simulation software, only requiring the application of an extra repulsive force between the ligand molecules. Although we would expect that ColDock can be applied to other systems, there should be possible limitations as follows. First, ColDock may require a more computational cost in the case of larger ligands because a larger number of MD trials might be needed as in the case of FKBP/FK506. Second,

Present Addresses #

K. Takemura: School of Life Science and Technology, Tokyo Institute of Technology , 2 Chome-12-1 , Ookayama, Meguro, Tokyo 152-8550 , Japan ∞ C. Sato: Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , 1-1-1 Yayoi, Bunkyo, Tokyo 113-0032, Japan Author Contributions ∥

Equal contribution.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This research was supported by MEXT/JSPS KAKENHI (No. 25104002 and 15H04357) to A.K. and by MEXT as “Priority Issue on Post-K Computer” (Building Innovative Drug Discovery Infrastructure through Functional Control of Biomolecular Systems) to A.K. The computations were partly performed using the supercomputers at the RCCS, The National Institute of Natural Science, and ISSP, The University of Tokyo. This research also used computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System H

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

(20) Shan, Y.; Kim, E. T.; Eastwood, M. P.; Dror, R. O.; Seeliger, M. A.; Shaw, D. E. How does a drug molecule find its target binding site? J. Am. Chem. Soc. 2011, 133 (24), 9181−9183. (21) Daura, X.; Gademann, K.; Jaun, B.; Seebach, D.; van Gunsteren, W. F.; Mark, A. E. Peptide folding: When simulation meets experiment. Angew. Chem., Int. Ed. 1999, 38 (1−2), 236−240. (22) Humphrey, W.; Dalke, A.; Schulten, K. Vmd: Visual molecular dynamics. J. Mol. Graphics 1996, 14 (1), 33−38. (23) Burkhard, P.; Taylor, P.; Walkinshaw, M. D. X-ray structures of small ligand-fkbp complexes provide an estimate for hydrophobic interaction energies1. J. Mol. Biol. 2000, 295 (4), 953−962. (24) Wu, T. P.; Padmanabhan, K.; Tulinsky, A.; Mulichak, A. M. The refined structure of the.Epsilon.-aminocaproic acid complex of human plasminogen kringle 4. Biochemistry 1991, 30 (43), 10589− 10594. (25) Chessari, G.; et al. Fragment-based drug discovery targeting inhibitor of apoptosis proteins: Discovery of a non-alanine lead series with dual activity against ciap1 and xiap. J. Med. Chem. 2015, 58 (16), 6574−6588. (26) Sun, C.; Cai, M.; Meadows, R. P.; Xu, N.; Gunasekera, A. H.; Herrmann, J.; Wu, J. C.; Fesik, S. W. Nmr structure and mutagenesis of the third bir domain of the inhibitor of apoptosis protein xiap. J. Biol. Chem. 2000, 275 (43), 33777−81. (27) Laskowski, R. A.; Macarthur, M. W.; Moss, D. S.; Thornton, J. M. Procheck - a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 1993, 26, 283−291. (28) Graversen, J. H.; Sigurskjold, B. W.; Thogersen, H. C.; Etzerodt, M. Tetranectin-binding site on plasminogen kringle 4 involves the lysine-binding pocket and at least one additional amino acid residue. Biochemistry 2000, 39 (25), 7414−9. (29) Sali, A.; Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234 (3), 779−815. (30) Mathews, II; Vanderhoff-Hanaver, P.; Castellino, F. J.; Tulinsky, A. Crystal structures of the recombinant kringle 1 domain of human plasminogen in complexes with the ligands epsilonaminocaproic acid and trans-4-(aminomethyl)cyclohexane-1-carboxylic acid. Biochemistry 1996, 35 (8), 2567−76. (31) Berendsen, H. J. C.; van der Spoel, D.; van Drunen, R. Gromacs: A message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 1995, 91 (1), 43−56. (32) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1−2, 19−25. (33) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. Ff14sb: Improving the accuracy of protein side chain and backbone parameters from ff99sb. J. Chem. Theory Comput. 2015, 11 (8), 3696−3713. (34) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004, 25 (9), 1157−1174. (35) Takemura, K.; Kitao, A. Water model tuning for improved reproduction of rotational diffusion and nmr spectral density. J. Phys. Chem. B 2012, 116 (22), 6279−6287. (36) Trott, O.; Olson, A. J. Autodock vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31 (2), 455−461. (37) Schmidtke, P.; Bidon-Chanal, A.; Luque, F. J.; Barril, X. Mdpocket: Open-source cavity detection and characterization on molecular dynamics trajectories. Bioinformatics 2011, 27 (23), 3276− 85. (38) Asamitsu, K.; Hirokawa, T.; Okamoto, T. Md simulation of the tat/cyclin t1/cdk9 complex revealing the hidden catalytic cavity within the cdk9 molecule upon tat binding. PLoS One 2017, 12 (2), No. e0171727. (39) Knegtel, R. M.; Kuntz, I. D.; Oshiro, C. M. Molecular docking to ensembles of protein structures. J. Mol. Biol. 1997, 266 (2), 424− 40.

Research project (Project ID: hp140031, hp150049, hp150270, hp160207, hp170254, and hp180201).



REFERENCES

(1) Sousa, S. F.; Fernandes, P. A.; Ramos, M. J. Protein−ligand docking: Current status and future challenges. Proteins: Struct., Funct., Bioinf. 2006, 65 (1), 15−26. (2) Pagadala, N. S.; Syed, K.; Tuszynski, J. Software for molecular docking: A review. Biophys. Rev. 2017, 9 (2), 91−102. (3) Allen, K. N.; Bellamacina, C. R.; Ding, X. C.; Jeffery, C. J.; Mattos, C.; Petsko, G. A.; Ringe, D. An experimental approach to mapping the binding surfaces of crystalline proteins. J. Phys. Chem. 1996, 100 (7), 2605−2611. (4) Foster, T. J.; MacKerell, A. D.; Guvench, O. Balancing target flexibility and target denaturation in computational fragment-based inhibitor discovery. J. Comput. Chem. 2012, 33 (23), 1880−1891. (5) Seco, J.; Luque, F. J.; Barril, X. Binding site detection and druggability index from first principles. J. Med. Chem. 2009, 52 (8), 2363−71. (6) Ung, P. M. U.; Ghanakota, P.; Graham, S. E.; Lexa, K. W.; Carlson, H. A. Identifying binding hot spots on protein surfaces by mixed-solvent molecular dynamics: Hiv-1 protease as a test case. Biopolymers 2016, 105 (1), 21−34. (7) Lexa, K. W.; Carlson, H. A. Full protein flexibility is essential for proper hot-spot mapping. J. Am. Chem. Soc. 2011, 133 (2), 200−202. (8) Oleinikovas, V.; Saladino, G.; Cossins, B. P.; Gervasio, F. L. Understanding cryptic pocket formation in protein targets by enhanced sampling simulations. J. Am. Chem. Soc. 2016, 138 (43), 14257−14263. (9) Tan, Y. S.; Spring, D. R.; Abell, C.; Verma, C. The use of chlorobenzene as a probe molecule in molecular dynamics simulations. J. Chem. Inf. Model. 2014, 54 (7), 1821−1827. (10) Tan, Y. S.; Sledz, P.; Lang, S.; Stubbs, C. J.; Spring, D. R.; Abell, C.; Best, R. B. Using ligand-mapping simulations to design a ligand selectively targeting a cryptic surface pocket of polo-like kinase 1. Angew. Chem., Int. Ed. 2012, 51 (40), 10078−81. (11) Guvench, O.; MacKerell, A. D., Jr Computational fragmentbased binding site identification by ligand competitive saturation. PLoS Comput. Biol. 2009, 5 (7), No. e1000435. (12) Arcon, J. P.; Defelipe, L. A.; Modenutti, C. P.; Lopez, E. D.; Alvarez-Garcia, D.; Barril, X.; Turjanski, A. G.; Marti, M. A. Molecular dynamics in mixed solvents reveals protein-ligand interactions, improves docking, and allows accurate binding free energy predictions. J. Chem. Inf. Model. 2017, 57 (4), 846−863. (13) Bakan, A.; Nevins, N.; Lakdawala, A. S.; Bahar, I. Druggability assessment of allosteric proteins by dynamics simulations in the presence of probe molecules. J. Chem. Theory Comput. 2012, 8 (7), 2435−2447. (14) Tan, Y. S.; et al. Benzene probes in molecular dynamics simulations reveal novel binding sites for ligand design. J. Phys. Chem. Lett. 2016, 7 (17), 3452−3457. (15) Uehara, S.; Tanaka, S. Cosolvent-based molecular dynamics for ensemble docking: Practical method for generating druggable protein conformations. J. Chem. Inf. Model. 2017, 57 (4), 742−756. (16) Kalenkiewicz, A.; Grant, B. J.; Yang, C. Y. Enrichment of druggable conformations from apo protein structures using cosolventaccelerated molecular dynamics. Biology (Basel, Switz.) 2015, 4 (2), 344−66. (17) Ghanakota, P.; Carlson, H. A. Moving beyond active-site detection: Mixmd applied to allosteric systems. J. Phys. Chem. B 2016, 120 (33), 8685−95. (18) Laurent, B.; Murail, S.; Shahsavar, A.; Sauguet, L.; Delarue, M.; Baaden, M. Sites of anesthetic inhibitory action on a cationic ligandgated ion channel. Structure 2016, 24 (4), 595−605. (19) Dror, R. O.; Pan, A. C.; Arlow, D. H.; Borhani, D. W.; Maragakis, P.; Shan, Y.; Xu, H.; Shaw, D. E. Pathway and mechanism of drug binding to g-protein-coupled receptors. Proc. Natl. Acad. Sci. U. S. A. 2011, 108 (32), 13118−13123. I

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B (40) Korb, O.; Olsson, T. S.; Bowden, S. J.; Hall, R. J.; Verdonk, M. L.; Liebeschuetz, J. W.; Cole, J. C. Potential and limitations of ensemble docking. J. Chem. Inf. Model. 2012, 52 (5), 1262−74. (41) Araki, M.; Kamiya, N.; Sato, M.; Nakatsui, M.; Hirokawa, T.; Okuno, Y. The effect of conformational flexibility on binding free energy estimation between kinases and their inhibitors. J. Chem. Inf. Model. 2016, 56 (12), 2445−2456. (42) Kimura, S. R.; Tebben, A. J.; Langley, D. R. Expanding gpcr homology model binding sites via a balloon potential: A molecular dynamics refinement approach. Proteins: Struct., Funct., Bioinf. 2008, 71 (4), 1919−1929.

J

DOI: 10.1021/acs.jpcb.8b02756 J. Phys. Chem. B XXXX, XXX, XXX−XXX