Exploring Complex ProteinLigand Recognition Mechanisms with

J. Phys. Chem. B 2009, 113, 4807–4816

4807

Exploring Complex Protein-Ligand Recognition Mechanisms with Coarse Metadynamics Matteo Masetti,† Andrea Cavalli,*,‡,§ Maurizio Recanatini,‡ and Francesco Luigi Gervasio*,†,| Computational Science, Department of Chemistry and Applied Biosciences, ETH Zu¨rich, USI Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Switzerland, Department of Pharmaceutical Sciences, Alma Mater Studiorum, Bologna UniVersity, Via Belmeloro 6, I-40126 Bologna, Italy, Department of Drug DiscoVery and DeVelopment, Italian Institute of Technology, Via Morego 30, I-16163 GenoVa, Italy and Computational Biophysics Group, Structural Biology and Biocomputing Programme, Spanish National Research Centre, C/Melchor Fdez. Almagro 3, 28029 Madrid, Spain ReceiVed: May 5, 2008; ReVised Manuscript ReceiVed: NoVember 5, 2008

The metadynamics method has been shown to be a valuable tool to study the mechanism of molecular recognition in atomistic detail [Gervasio, F. L.; et al. J. Am. Chem. Soc. 2005, 127, 2600]. However, it requires an a priori knowledge of all slow degrees of freedom relevant to the docking/undocking mechanism. Here we investigate a combination of docking/clustering with metadynamics performed with a subset of the necessary degrees of freedom (coarse metadynamics), and show that it provides a full mechanistic insight on the protein-ligand docking mechanism. Moreover, the proposed protocol is able to clearly distinguish between crystallographic and noncrystallographic poses of protein-ligand complexes, and also to find the transition state of the full undocking mechanism, thus giving an indication on the binding free energy. 1. Introduction Computational methods such as ligand-docking have nowadays become almost routine techniques in medicinal chemistry, and are widely used in both the lead-discovery and the leadoptimization phases in rational drug design. The general docking problem consists in the prediction of the best matching between two or more molecular entities that form an intermolecular complex.1 In this context, ligand-docking methods attempt to find the most favorable configuration of a small molecule (i.e., a conformation relative to a particular orientation, usually referred to as a docked pose) inside a binding site, whose spatial localization and extension are known. Such a challenging issue might in principle be addressed at different levels of complexity for the physical description of the system under investigation. Nevertheless, molecular docking is always concerned with two closely related subproblems,2,1 namely, the sampling of the configurational space (pose generation), and the evaluation of the stability of the intermolecular complex generated at the previous step (pose scoring). The most popular docking programs employed by the pharmaceutical community usually trade accuracy for speed. The main drawbacks of these programs can be summarized as follows: (i) limited coverage of the conformational degrees of freedom, (ii) only partial accounting for the role of the solvent, (iii) approximate estimation of the binding free energy, and (iv) lack of insight on the docking path and the transition state. Up to now, most of the methods only include the conformational space of the ligand, whereas the target protein is invariably treated as a rigid body. Some attempts to include the side chain flexibility (rotamer library, soft-docking3,4) or large domain motions (relevant normal-mode analysis5) begin * To whom correspondence should be addressed. E-mail: fgervasi@ phys.chem.ethz.ch, [email protected]. † ETH Zu¨rich. ‡ Bologna University. § Italian Institute of Technology. | Spanish National Cancer Research Centre.

to appear, even if an exhaustive modeling of the induced-fit is still far from being easily obtained. Similarly, the solvent is usually simulated by means of implicit models,2 thus seriously limiting the reliability of the search, especially when structural water molecules form a constitutive part of the intermolecular complex. Again, attempts to address this problem are beginning to emerge.4 An accurate estimation of the docking free energy and of the full docking/undocking path remains the most difficult task.6,1,7-9 To speed up the calculation of the free energy, most docking programs implement approximated scoring functions of various kinds. Nevertheless, it has been widely shown that such scoring functions have a limited reliability.2,10,1 A possible solution is to use the consensus-scoring approach11 or the geometrical cluster analysis.12,13 It has been shown that, when induced-fit and solvation effects are negligible, the correct pose (namely, the configuration that more closely reproduces the crystallographic one in an rmsd metric) always statistically lies within the most populated clusters.12,13 Although cluster analysis clearly represents a helpful filtering strategy, results may depend on the choice of parameters. A more computationally intensive alternative that is also able to provide an insight of the full docking path is to rely on molecular dynamics (MD) simulations and calculate the relative free energy of different poses with methods such as free energy perturbation,14,15 thermodynamic integration,16-18 multicanonical MD,19 umbrella sampling/weighted histogram analysis,20 force probe MD,21 and metadynamics.22,23 Among the emerging techniques, metadynamics 22,24 has been shown to be a useful method in docking simulations,25,26 allowing different binding poses to be efficiently explored, and providing the free energy profile of the docking and undocking mechanism. In particular, metadynamics was successfully applied to a set of simple intermolecular complexes in order to study both the binding and the unbinding process of the ligand with respect to the target protein.25 The method was found to be robust and as accurate as umbrella sampling with weighted histogram techniques.25

10.1021/jp803936q CCC: $40.75  2009 American Chemical Society Published on Web 03/19/2009

4808 J. Phys. Chem. B, Vol. 113, No. 14, 2009

Masetti et al.

Figure 1. Schematic representation of the applied protocol.

However, it shares with the latter some drawbacks that make it scarcely practical for routine docking tasks. In particular, it requires a careful choice of collective variables (CVs). In order to reconstruct the FES, the CVs must describe all the slow events that are relevant to the ligand binding/unbinding. As it was shown in ref 26, when a relevant CV is neglected, the reconstruction of the FES fails. This often implies the need to run multiple metadynamics simulations to understand which CVs are needed for each protein–ligand complex. Moreover, even when all the needed CVs are included, the reconstruction of the FES in more than 3 dimensions requires large computational resources. The same is true for PTmetaD, a method that combines metadynamics with parallel tempering and is able to reconstruct the FES even when relevant CVs are neglected.27 In light of the above considerations, here we present a protocol whereby we run a local metadynamics by using just two non-system-specific CVs (coarse metadynamics) to study a set of solutions generated by a standard docking method. To this aim, (i) we first perform a docking run to generate a set of putative binding modes, which are subsequently filtered by cluster analysis; (ii) an equilibration of the most populated docking poses with atomistic MD is carried out to further filter the solutions according to well established refinement techniques;28 and (iii) the selected poses are used as starting points for undocking with a coarse metadynamics (Figure 1). The final goal is to provide a reasonably fast computational protocol able to identify the lowest energy binding poses, the docking/ undocking path, and an estimate of the ligand-protein interaction free energy. Such a method was applied to a set of pharmaceutically relevant biological complexes having a different degree of complexity, encompassing superficial and buried binding sites. In particular, glycogen synthase kinase-3β bound with a selective ureidic inhibitor, estrogen receptor-R bound with 4-hydroxy tamoxifen, and E. coli enoyl-ACP reductase bound with triclosan were investigated. In all cases, new insights into the protein-ligand undocking mechanism were revealed. Anticipating our results, coarse metadynamics was able to unambiguously discriminate between the crystallographic and noncrystallographic poses. In all the studied cases, the free energy basin of the crystallographic pose turned out to be deeper than those of other docking solutions. The ∆G for the crystallographic pose calculated with respect to the transition state (TS, ∆G‡calc) was found to be in every case a few kcal mol-1 more than the module of the binding ∆G. In one case, during the metadynamics a noncrystallographic docking solution rearranged

to the crystallographic one before exiting the cavity. Furthermore, understanding the mechanism of undocking provides new insights that might be exploited in the design of more potent and selective drug candidates. 2. Methods Docking, clustering, and metadynamics are applied in sequence as depicted in Figure 1. 2.1. Docking. Docking simulations were performed using the program AutoDock 3.05.29 The starting coordinates were retrieved from the Protein Data Bank.30 The point charges of the ligands and cofactors were fitted, using the RESP scheme,31-33 from the electrostatic potential calculated at the HF/6-31G(d) level of theory. Grid maps were calculated inside a box, whose location and dimensions were adapted onto the different binding sites while using a constant resolution of 4 points Å-1. The search was performed with the default genetic algorithm. The maximum number of energy evaluations was 2.5 × 105, and the maximum number of generations was 2.7 × 104. For each complex, starting from an initial population of 50 individuals, we obtained 100 optimized configurations. 2.2. Clustering. Cluster analysis was performed using the program AClAP,13 which implements a hierarchical-agglomerative approach. The intercluster distance was evaluated using the average-linkage rule, while the KGS penalty function was used as cutting rule to prune the dendrogram. The cluster significance and the clusterability were assessed by the Chauvenet criterion and the Hopkins’ H* test, respectively. The cluster analysis was performed on the heavy atoms of the molecules, imposing chemical equivalences where needed. We ranked the clusters (Table 1) according to their relative population instead of using the scoring function, since it was shown that this procedure could lead to a better agreement with the experimental data.13 2.3. MD and Metadynamics. For each cluster the configuration closest to its geometric center (see ref 13) was chosen. The whole protein-ligand complex was relaxed with a 1 ns long MD simulation at 300 K and 1 atm with NAMD-2.6.34 The ligand bonded parameters were assigned by Antechamber/ GAFF force field.35,33 The point charges were derived with a RESP fit, as reported above. Simulations were performed using the TIP3P water model36 and the AMBER parm9937 force field. All the simulations were performed in an orthorhombic box with periodic boundary conditions. The dimensions of the box were

Protein-Ligand Recognition Mechanisms

J. Phys. Chem. B, Vol. 113, No. 14, 2009 4809

TABLE 1: Docking and Clustering Outcomes for Each of the Complexes Investigateda complex

cluster number

cardinality

rmsd [Å]

AutoDock scoring [kcal mol-1]

‡ ∆Gcalc [kcal mol-1]

∆Gexp [kcal mol-1]

1Q5K

1 2 3 4 1 2 1 2 3

27 19 10 8 40 9 21 19 11

1.35 3.35 8.79 8.14 1.80 5.25 6.40 5.98 1.40

-8.04 -7.96 -7.42 -7.75 -13.98 -12.98 -5.30 -5.51 -6.09

14

-10.143

4 16 14

-13.148

3ERT 1D8A

-16.452

11 18

a For clarity, only the most relevant clusters are reported. The RMSD refers to the distance between the representative pose of each cluster and the crystallographic one. The representative structure is defined as the configuration closest to the geometric center of the cluster.13

chosen to be at least 8 Å larger than the solute in every direction. Long-range electrostatics was computed with the PME method.38 Short range nonbonded interactions were calculated using a cutoff radius of 12 Å for both Coulomb and van der Waals potentials. The temperature was kept constant at 300 K by a Langevin thermostat with a collision coefficient of 5 ps-1. The time step was set to 2 fs. Bond lengths involving hydrogens were restrained by the SHAKE39 algorithm. We used the direct formulation of metadynamics,23 which we implemented in the MD code NAMD.34 To perform a metadynamics simulation, a set of functions of the system coordinates s(x) able to describe the process of interest have to be chosen (i.e., the collective variables, CVs). The CVs must be able to distinguish between the different states visited by the system, which in the present context were all the docked and undocked conformations as well as the stable intermediates. Moreover, CVs should include the main “slow” variables that change during the transformation, such as large scale motions of the protein or changes in the solvation shell of the ligand. The dynamics in the space of the CVs is then guided by the free energy of the system plus a history-dependent potential having the functional form of a sum of Gaussians centered along the CVs space up to the current time t:

VG(s, t) )

[

∫0t dt' W exp - (s(x) -2δss(x(t'))) 2

2

]

(1)

where W is the weight and δs is the width of the Gaussians. As with other methods that sample free energy in generalized coordinates, the reliability of the reconstructed free energy is strongly dependent upon the choice of the CVs. Since the aim of the present work was to investigate how effectively a few nonspecific CVs (coarse metadynamics) were able to discriminate between different docking conformations, we did not try to find the best set of CVs for each system. We used throughout a distance (r) and an angle (θ) where (1) r is the distance between a carefully chosen reference point on the protein pocket and the center of mass (CM) of a rigid moiety of the ligand (see Figure 2), and (2) θ is the angle vector (defined in [0:π]) between a reference point on the protein pocket (not necessarily the one chosen for r), the CM, and the major inertia axis of the rigid moiety of the ligand chosen to define r. The choice of reference point on the protein pocket plays an important role in the efficiency of the method. If the CM of a flexible portion of the pocket is chosen the performance of the method is significantly degraded and the reconstructed free energy basin will be smeared out. After several trials we found that the best performance is obtained by taking the CM of the rigid secondary structure element closest to every docked pose.

Figure 2. Chemical structures of the ligands studied. The boxed region shows the rigid moiety used in the definition of the CVs.

The rigidity of these protein portions was assessed during the unconstrained preliminary MD runs. To minimize the noise on the reconstructed FES, during the metadynamics simulations the CR of those portions of the protein defining the CVs were kept fixed.

4810 J. Phys. Chem. B, Vol. 113, No. 14, 2009 The metadynamics simulations were started from the preequilibrated complexes. The hills width was set to 0.4 Å for the distance, 0.10 rad for the angle vector, and 0.15 rad for the dihedral. The height was set to 0.075 kcal mol-1 and 0.15 kcal mol-1 for bidimensional and three-dimensional metadynamics, respectively. The deposition interval was set to 1000 fs. An upper bound of 18 and 15 Å was set on r in the bidimensional and three-dimensional metadynamics, respectively. To bypass the well recognized problem of lack of convergence in metadynamics,40,25,41 the coarse simulations were here truncated after reaching the rate limiting step for the unbinding (∆G‡calc). Hence, we focus our results on the ∆G‡calc rather than on the ∆Gcalc of binding, which differently would require full convergence on the sampling of the outer free energy basins. This allows a quite fast estimation of the binding free energy of several protein-ligand docking complexes. 3. Results and Discussion For this study, we chose three pharmaceutically relevant biological complexes having different levels of complexity, encompassing superficial and buried binding sites: glycogen synthase kinase-3β bound with a selective ureidic inhibitor, estrogen receptor-R bound with 4-hydroxy tamoxifen, and E. coli enoyl-ACP reductase bound with triclosan. For clarity, the chemical structures of the ligands are reported in Figure 2. 3.1. Glycogen Synthase Kinase-3β Complexed with a Selective Ureidic Inhibitor. Glycogen synthase kinase-3 (GSK3) is a serine-threonine kinase which plays a key regulatory role in a number of cellular functions, it being involved in the phosphorylation of metabolic, signaling, and structural proteins and transcription factors.42 GSK3 is found in cells in two isoforms encoded by different genes, R and β, and a splice variant of the latter is also known.42 A fine-tuning of GSK3β activity is essential for life,42 and its malfunctioning is related to a large number of multifactorial diseases.42,43 Accordingly, the inhibition of GSK3β represents an important therapeutic strategy, against which the drug design is unavoidably complicated by the requirement of selectivity with respect to other kinases.43 Undocking Protocol. We study the undocking of the selective ureidic inhibitor AR-A014418 (N-(4-methoxybenzyl)-N′-(5nitro-1,3-thiazol-2-yl)urea, Figure 2) bound to GSK3β (PDB accession code: 1Q5K43). In Table 1, the most populated poses of the docking/clustering procedure are reported. In this case, the most populated cluster corresponded to the crystallographic geometry with an rmsd of 1.35 Å which is well within the generally accepted limit of 2.5 Å. On the contrary, the AutoDock score value for the crystallographic structure was only marginally better than that of pose 2, which showed an rmsd of 3.35 Å. This further confirms the observation that the discrimination between crystallographic and noncrystallographic binding modes in the absence of a cluster analysis can be difficult. During the MD equilibration, only two of the four representative structures belonging to the most populated clusters proved to be stable. In particular, pose 2 spontaneously converged into pose 1, whereas pose 3 early left the binding pocket. Thus, we performed metadynamics runs only on the stable poses 1 and 4. The relaxed initial geometries are shown in the upper panel of Figure 3, along with a representation of the CVs used to study the undocking event. The reference point on the protein pocket is the CM of the CR of residues 133-134, corresponding to the hinge region. In pose 1, namely the docking solution which reproduces the experimental binding mode, the inhibitor tight binds to the hinge

Masetti et al. region of the kinase (Figure 3A, upper panel). Three main contacts can be identified: (i) the ureidic group and the thiazole ring interact with the backbone of Val135 and Asp133 forming multiple hydrogen bonds; (ii) the p-methoxy-benzyl group is stacked on the guanidine group of Arg141, and interacts with the side chain of Tyr134 via a bridging water molecule; (iii) the nitro group interacts with the side chain of Lys85 by linking a second structural water. Moreover, several van der Waals contacts contribute to the stabilization of this binding mode. Conversely, pose 4 occupies the same pocket of pose 1 with a different orientation (Figure 3B, upper panel). In particular, the nitro group of the ligand is H-bonded to the backbone of Val135, and a structural water is involved in bridging Lys85 to the carboxyl group of the ureidic moiety of AR-A014418. The main difference between the two poses was in the p-methoxy-phenyl orientation, that in pose 4 was almost solvent exposed. In the middle panel of Figure 3, the undocking FESs of the two poses are shown. The metadynamics runs were interrupted shortly after the TS was reached, that is after 6 and 3 ns of simulation (3 and 1.5 days on 8 Core2 cores) for poses 1 and 4, respectively. Since it has been shown that metadynamics is able to find the lowest TS,23 we take as the TS the lowest point on the reconstructed FES connecting the docked free energy basin to the undocked free energy basin. From Figure 3, it is clear that the free energy basin corresponding to pose 1 (the crystallographic structure) is much deeper than that of pose 4. In particular, the calculated ∆G‡calc are 14 and 4 kcal · mol-1 for pose 1 and 4, respectively, as shown in Table 1. Notably, the ∆G‡calc for pose 1 is about 4 kcal mol-1 higher than the experimental binding energy. The ∆G‡calc for pose 1 was confirmed by performing a 2-D umbrella sampling (see Supporting Information). Undocking Mechanism. As we previously pointed out, in addition to the discrimination between the crystallographic and noncrystallographic binding modes, our approach also provides a guess of the putative undocking mechanism. In the lower panel of Figure 3, three representative configurations selected along the undocking trajectory are shown. For comparison, the coordinates of such points in the CVs space are shown as white dots in the matching plot of the FES (Figure 3, middle panel). The first event observed during the undocking of pose 1 was a switch in the hydrogen bond network configuration with respect to that of the global minimum, leading to the detachment of the ureidic moiety of the ligand from the hinge region of the enzyme (Figure 3, lower panel). In particular, to reach the first metastable state, the ligand underwent a small rotation, keeping its p-methoxy-phenyl group in close contact with Arg141. Proceeding with the metadynamics, the ureidic moiety of the ligand lost its interaction with the protein, and a number of water molecules (up to three) filled the gap between the ligand and the amino acids belonging to the hinge region. During these early stages, the ligand kept pointing its most polar part toward the solvent, until the nitro group reached the side chain of Lys85, interacting by means of a hydrogen bond reinforced by charge. The stacking interaction of the p-methoxy-phenyl ring with the guanidine group of Arg141 was the last original interaction to be lost. The next step to undocking involved a large reorientation of the major inertia axis of the ligand, in order to get its most polar part (namely, the thiazole ring and the nitro group) solvent exposed. In the last metastable state, the endocyclic nitrogen and the thiazole ring itself interacted with the side chain of Lys183, while the ureidic moiety was tightly bound with the carboxyl group of Asn185. Finally, all the interactions with the protein were lost.



Figure 3. Upper panel shows relaxed geometries of the AR-A014418 ligand docked to GSK3β. (A) Pose 1, CVs coordinates: (5.49; 2.56). (B) Pose 4, CVs coordinates: (7.36; 1.40). Collective variables are shown in dark red, hydrogen bonds in yellow. Amino acids directly involved in the binding and structural water molecules are shown. The CR trace of the protein is shown as a ribbon, and the color code follows the structured secondary structure: orange for R-helices and violet for β-strands. The lower boundary of the cavity is shown as a Connoly surface. Middle panel shows FESs as a function of r and θ. The white crosses show the initial docked geometry in the CVs space for pose 1 and pose 4. The white dots refer to the selected configurations shown in the lower panel. Lower panel shows three selected configurations of the undocking trajectory of pose 1.

Careful analysis of trial undocking trajectories as well as 3D and preliminary 4D metadynamics runs showed that at least two other important slow CVs should be added to obtain a fully converged FES. A closely related problem is the absence of conversion between the different starting poses, which should be expected if the CVs were sufficient and correct. Among the variables that could be added to improve the results and the agreement of the two FESs, one is connected to the hydration of the docking cavity. Indeed, in order to reach pose 1 starting from pose 4, several structural water molecules had to be displaced from the cavity. The second is a dihedral angle that

would allow a better discrimination of different configurations during the exploration of the binding site. Unfortunately, converging the full 4D metadynamics is prohibitively expensive, and is also out of the scope of the present work. 3.2. Estrogen Receptor-r Complexed with 4-hydroxy Tamoxifen. Estrogen receptors (ERs) belong to the nuclear receptor (NR) superfamily of ligand-inducible transcriptional factors.44-46 ERs are mainly composed by five modular domains, namely either a hormone-independent and a hormonedependent transcription activator factor domain (AF1 and AF2, respectively), a DNA-binding domain via a couple of


Masetti et al.

Figure 4. Upper panel shows relaxed geometries of the tamoxifen docked to ERR. (A) Pose 1, CVs coordinates: (9.48; 2.83). (B) Pose 2, CVs coordinates: (10.60; 0.20). Collective variables are shown in dark red, hydrogen bonds in yellow. Amino acids directly involved in the binding and structural water molecules are shown. The CR trace of the protein is shown as a ribbon, and the color code follows the structured secondary structure: orange for R-helices and violet for β-strands. The lower boundary of the cavity is shown as a Connoly surface. Middle panel shows FESs as a function of r and θ. The white crosses show the initial docked geometry in the CVs space for pose 1 and pose 2. The white dots refer to the selected configurations shown in the lower panel. Lower panel shows three selected configurations of the undocking trajectory of pose 1.

zinc finger motifs, a receptor dimerization domain, and eventually a ligand-binding domain (LBD) where the hormone (17β-estradiol) binds.45,46 Up to now, two isoforms of estrogen receptors are known. ERR and ERβ share a high sequence similarity in their DNA-binding domains, whereas in the LBD the similarity is remarkably low.45 ERs regulate the differentiation and maintenance of neural, skeletal, cardiovascular, and reproductive tissues,44 and the presence of ERR correlates with estrogen-dependent breast cancer.45 For this reason, ERR antagonists are commonly used in anticancer therapy, as they can decrease tumoral cell proliferation,45 or in the treatment of osteoporosis.44 Undocking Protocol. Here, we study the undocking of 4-hydroxy tamoxifen (Figure 2) from the LBD of ERR (PDB

accession code: 3ERT43). In this case, the docking/clustering procedure resulted in only two clusters: the first (40 elements), corresponding to the crystallographic structure, and the second, significantly less populated with only 9 elements (Table 1). A few scarcely populated clusters sharing with pose 1 the same orientation of the rigid triphenylethylene core were also found. All these poses converged to pose 1 after MD equilibration. In this case, the AutoDock scoring function was able to clearly distinguish the correct crystallographic geometry from the other. This was not unexpected, since the free energy difference between the two poses is mainly enthalpic. While in pose 1 the 4-hydroxyl group made a strong H-bond with the carboxylate of Glu353, in pose 2 a weaker H-bond with Nδ of the imidazole ring of His524 was observed (Figure 4, upper panel). We note

Protein-Ligand Recognition Mechanisms in passing that the endogenous agonist, 17β-estradiol, was experimentally found to bind both to Glu353 and His524 by means of two hydroxyl groups.47 In the middle panel of Figure 4, the undocking FESs of the two poses are shown. The reference point on the protein pocket is the CM of the CR of residues 388-391, corresponding to R-helix H5. The reconstruction of the free energy was stopped after 10 and 12 ns for pose 1 and pose 2, respectively. The longer sampling time needed for the undocking was a consequence of the nature of the poses that are in depth buried in a protein cleft. While both poses were very stable, the metadynamics correctly predicted a greater stability for pose 1. In particular, the ∆G‡calc values are 16 and 14 kcal mol-1, for pose 1 and 2, respectively, whereas the experimental binding energy for this complex is -13 kcal mol-1.48 Remarkably, during the metadynamics run, pose 2 spontaneously converted to pose 1 before reaching the TS. Undocking Mechanism. In the first metastable state (Figure 4, lower panel), the triphenylethylene core of the molecule was still in depth buried in the cavity even if the primary H-bond interactions were already lost. In particular, the separation of the phenoxyl group from the side chain of Arg394 and Glu353 was assisted by the entrance of a number of water molecules from the back of the cavity, and the phenyl group was kept stacked with the imidazole ring of His524. Conversely, the dimethylamminoethyl chain was still interacting with the carboxyl group of Asp351, as well as with the side chain of Trp383 belonging to helix 5. The rate limiting step consisted of a reorientation of the drug allowing the insertion of the 4-hydroxyl-phenyl ring between the helices 11 and 8. Notably, the dimethylamminoethyl chain was still tightly bound with either Asp351 or Trp383. Once the drug left the cavity, a bound configuration far from the binding site was reached. This external pocket is formed by helices 3, 5, 11, and 12. Finally, keeping on filling the FES, the ligand achieved a fully hydrated state. The achievement of the conversion from pose 2 to pose 1 is a clear indication that in this case the chosen CVs are enough to describe the unbinding of the drug, and to provide a detailed exploration of the binding site and the peripheral regions of the receptor as well. This consideration is confirmed by a larger amount of similarity between the FESs, as a consequence of a high degree of convergence (Figure 4, middle panel). Nevertheless, by analyzing the various undocking events, we detected the important slow CVs that should be added to obtain a fully converged FES. In particular, the missing CVs should describe the large scale motions of the receptor. Indeed, NRs represent a classic example of proteins bearing buried binding sites, where the binding of the substrate is supposed to be assisted by relevant motions of secondary structure motifs. 3.3. E. coli Enoyl-ACP Reductase Complexed with Triclosan. A fatty acid synthase (FAS) is a multienzyme complex devoted to the catalysis of the biological pathway for lipogenesis.49 Two main different architectures of such a complex are present in nature: type I FAS is found in animals, and consists of a multifunctional polypeptide encoded by a single gene, whereas type II FAS consists of a dissociated system made up by several monofunctional polypeptides, each of them encoded by different genes.50,49 Type II FAS is found in plants, bacteria, fungi, and protozoa.49 For this reason, FAS-II enzymes represent targets for chemotherapeutic agents in many parasitic infections. Triclosan (5-chloro-2-(2,4-dichlorophenoxy)phenol) has been widely used both as a general antibacterial and antifungal agent, and it acts as a selective blocker of the prokaryotic lipid

J. Phys. Chem. B, Vol. 113, No. 14, 2009 4813 biosynthesis, inhibiting the enzyme enoyl-acyl carrier protein (ACP) reductase (ENR), also known as FabI.50,51 In particular, ENR catalyzes the final, regulatory step of the fatty acid synthesis, by reducing a carbon-carbon double bond in an enoyl moiety covalently linked to an acyl carrier protein.51 The binding site of ENR is located inside a very flexible cavity formed by β-sheets and short loops. At the bottom of the cavity, the nicotinamide moiety of the cofactor interacts directly with the inhibitor. It should be noted that the catalytic form of the cofactor is NADH, whereas experimental evidence suggests that the oxidized form is the one primarily involved in the binding with inhibitors.51,52 Thus, we assumed the oxidized form, and we used the parameters of ref 53. Undocking Protocol. We study the undocking of triclosan (Figure 2) from the catalytic site of E. coli ENR (PDB accession code: 1D8A51). The performance of the clustering procedure was much worse than in the previous cases. The first two clusters were almost equipopulated and were both far from the crystallographic geometry with an rmsd of more than 5 Å (Table 1). On the contrary, the AutoDock scoring function correctly ranked the clustered results, predicting for pose 3 the representative structure belonging to the cluster closer to the crystal configuration, a score of about 0.50 kcal mol-1 better than the other two. The relaxed configurations of pose 2 and pose 3 are represented in the upper panel of Figure 5. In pose 3, which best matches the crystallographic binding mode, the phenol ring of triclosan is located on the bottom of the binding site, directly facing the nicotinamide ring of the nucleotide cofactor. In particular, the positively charged nicotinamide ring is involved in a face-to-face π-π stacking with the phenol ring of the ligand, whereas the dichlorophenoxyl group blocks the inner part of the cavity mainly interacting with different amino acids by multiple van der Waals contacts. Notably, in the crystal structure a sulfur-π charge-transfer interaction can also be detected between Met158 and the dichlorophenoxyl moiety of triclosan. Given that we used a nonpolarizable force field, our simulation did not correctly reproduce this kind of interaction. Conversely, in pose 2, the dichlorophenoxyl group of triclosan points toward the nicotinamide ring and the phenol ring blocks the entrance, whereas in both poses the hydroxyl group accepts a hydrogen bond from the 2′ hydroxyl group of the NAD+ ribose. In Figure 5, middle panel, the undocking FESs for the two poses are shown. The reference point on the protein pocket is the CM of the CR of NAD+ binding residues (146, 189-192). The reconstruction of the free energy was stopped after 6 ns for both poses. It is clear that pose 2 (the noncrystallographic one), starting from a minimally stable state (white cross in the right side of Figure 5), moved to a deep basin, which is 2 Å and 0.5 rad apart from the initial configuration. Later, during the metadynamics simulation, a stable state was found lying close to the crystal structure in the space of the CVs. Unfortunately, the proximity of the two states was only apparent: a deep inspection of the structures corresponding to the newly found basin revealed that they were far from the crystal structure. In this complex, the two general-purpose CVs were not able to fully discriminate the different geometries achieved by the ligand during the undocking. Still, the free energy basin corresponding to the crystallographic pose was clearly deeper than that of the noncrystallographic one. Given the failure of the clustering procedure, the clear indication provided by metadynamics was even more important. The ∆G‡calc values are


Masetti et al.

Figure 5. Upper panel shows relaxed geometries of the triclosan docked to the E. coli ENR. (A) Pose 3, CVs coordinates: (4.69; 1.15). (B) Pose 2, CVs coordinates: (9.30; 2.31). Collective variables are shown in dark red, hydrogen bonds in yellow. Amino acids directly involved in the binding and structural water molecules are shown. The CR trace of the protein is shown as a ribbon, and the color code follows the structured secondary structure: orange for R-helices and violet for β-strands. The lower boundary of the cavity is shown as a Connoly surface. Middle panel shows FESs as a function of r and θ. The white crosses show the initial docked geometry in the CVs space for pose 3 and pose 2. The white dots refer to the selected configurations shown in the lower panel. Lower panel shows three selected configurations of the undocking trajectory of pose 3.

11 and 18 kcal mol-1, for poses 2 and 3, respectively, whereas the experimental binding free energy is 16 kcal mol-1 (see Table 1). Undocking Mechanism. The first event observed in the undocking was a sliding movement of the triclosan chlorophenoxyl ring over the nicotinamide group belonging to NAD+. This occurred while keeping the H-bond between the 2′ hydroxyl group of the cofactor ribose and the hydroxyl group of the drug. At the same time, the dichlorophenyl ring reached the side chain of Phe94, located in the outer mouth of the cavity, forming a π-π interaction. To exit from the cavity, the drug had to rotate in order to expose the hydroxyl group of triclosan to the solvent. This movement was helped by a H-bond formed between the 2′ hydroxyl of NAD+ ribose and the chlorine group of the chlorophenoxyl ring of the drug. Finally, triclosan rotated

pointing the hydroxyl group toward the adenine group of the cofactor, and left the cavity. Also in this case we analyzed multiple undocking trajectories and performed 3D metadynamics to identify the missing CVs. As in the case of GSK3β, both the hydration of the cavity and a dihedral angle should be added to obtain a well converged metadynamics run. 3.4. Three-Dimensional Metadynamics Runs. To assess the effect of adding a third CV on the performance of the method, we ran 3D metadynamics on all the complexes. The FESs were reconstructed as a function of r, θ, and ω, where ω is the dihedral angle between two reference points on the protein, and the CM and the major inertia axis of the rigid moiety of the ligand were chosen to define r and θ. When the third variable



Figure 6. 3D FES as a function of r, θ, and ω of the triclosan docking to the E. coli ENR. (A) Pose 1, CVs coordinates: (4.69; 1.15; 2.63). (B) Pose 2, CVs coordinates: (9.30; 2.31; -3.10). The blue crosses represent the docked geometries in the CVs space.

was used, the resolution was much improved at an unavoidably increased computational cost. Of the three complexes, the most interesting case, where the addition of a variable significantly improved the reconstruction of the FES, was the undocking of triclosan from ENR. In Figure 6, we show both the 3D FES (where the free energy is plotted as isocontour surfaces) and the 2D projections: θ versus r, and ω versus r, which can help understanding the plot. From the figure, it can be appreciated how the added dimension provides a better discrimination between the different docking poses. In particular, the shallow minimum, that in the 2D FES of the noncrystallographic pose (Figure 5) could be taken as the correct crystal geometry, now clearly belongs to a different free energy basin. However, the addition of a third dimension is not always sufficient to converge the metadynamics nor to achieve the conversion from the noncrystallographic pose to the crystallographic one. In the case reported here, the addition of a CV connected to the hydration of the cavity makes convergence much faster. This is a clear indication of slowly relaxing water molecules in the binding cavity. 4. Conclusions In this paper, we have shown that coarse metadynamics, i.e., a metadynamics performed with a limited subset of generalpurpose CVs, can be successfully used to investigate the undocking mechanism and to estimate protein-ligand binding free energy. In the proposed protocol, first a high number of poses was generated via a conventional docking algorithm, and then cluster analysis was carried out to prune the solutions. Finally, coarse metadynamics was performed using the representative structure belonging to the most populated clusters, to measure the depth and width of the free energy basins with respect to the TS found along the undocking mechanism (∆G‡calc).

We have observed that two CVs are not sufficient to reconstruct a free energy for the full undocking trajectory. However, in the docking cases examined here, the local depth ‡ of the free energy basins as well as the ∆Gcalc gave a clear unambiguous indication of the crystallographic docking geometry, even if a generic force field was used throughout. In all cases, the ∆G‡calc was around 2 kcal mol-1 more than the experimental binding energy. Such an energy difference could be related to the loss of entropy going from a fully hydrated free ligand to a partially bound TS. If this trend was confirmed for a greater number of cases, the ∆G‡calc itself could be used to predict the binding energy, thus eliminating the need for an expensive fully converged metadynamics. The proposed protocol clearly enhances the predictive power of standard docking techniques joined to geometrical cluster analysis. At the same time, it is computationally cheaper with respect to the plethora of enhanced sampling techniques available nowadays. Moreover, the applied procedure intrinsically provides the full putative undocking mechanism, including the TS structure. This knowledge could be used to optimize the ligand to increase its affinity and/or selectivity toward a certain biological counterpart. Acknowledgment. We thank Annarita Musacchio for technical assistance, and Professor Michele Parrinello, Davide Branduardi, and Francesco Colizzi for useful discussions. We are grateful to CSCS for providing computer time. Supporting Information Available: Additional figure. This material is available free of charge via the Internet at http://pubs.acs.org.

4816 J. Phys. Chem. B, Vol. 113, No. 14, 2009 References and Notes (1) Halperin, I.; Ma, B.; Nussinov, R. Proteins 2002, 47, 409. (2) Sousa, S. F.; Fernandes, P. A.; Ramos, M. J. Proteins 2006, 65, 15. (3) Abagyan, R.; Totrov, M.; Kuznetzov, D. J. Comput. Chem. 1994, 15, 488. (4) Osterberg, F.; Morris, G. M.; Sanner, M. F.; Olson, A. J.; Goodsell, D. S. Proteins 2002, 46, 34. (5) Cavasotto, C. N.; Kovacs, J. A.; Abagyan, R. A. J. Am. Chem. Soc. 2004, 127, 9632. (6) Leach, A. R.; Shoichet, B. K.; Peishoff, C. E. J. Med. Chem. 2006, 49, 5851. (7) Fujitani, H.; Tanida, Y.; Ito, M.; Jayachandran, G.; Snow, C. D.; Shirts, M. R.; Sorin, E. J.; Pande, V. S. J. Comput. Phys. 2005, 123, 1. (8) Wang, J.; Deng, Y.; Roux, B. Biophys. J. 2006, 91, 2798. (9) LinksGohlke, H.; Klebe, G. Angew. Chem., Int. Ed. 2002, 41, 2644– 2676. (10) Bissantz, C.; Folkers, G.; Rognan, D. J. Med. Chem. 2000, 43, 4759. (11) Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, W. P. J. Med. Chem. 1999, 42, 5100. (12) Bottegoni, G.; Cavalli, A.; Recanatini, M. J. Chem. Inf. Model. 2006, 46, 852. (13) Bottegoni, G.; Rocchia, W.; Recanatini, M.; Cavalli, A. Bioinformatics 2006, 15, e58. (14) Bash, P. A.; Singh, U. C.; Brown, F. K.; Langridge, R.; Kollman, P. A. Science 1987, 235, 574–576. (15) Bash, P. A.; Singh, U. C.; Langridge, R.; Kollman, P. A. Science 1987, 236, 564–568. (16) Sneddon, S. F.; Tobias, D. J.; Brooks, C. L., III. J. Mol. Biol. 1989, 209, 817–820. (17) P.Straatsma, T.; McCammon, J. A. Annu. ReV. Phys. Chem. 1992, 43, 407–435. (18) Kong, X.; Brooks, C. L., III. J. Chem. Phys. 1996, 105, 2414– 2423. (19) Nakajima, N.; Higo, J.; Kidera, A.; Nakamura, H. Chem. Phys. Lett. 1997, 278, 297–301. (20) Patey, G. N.; Valleau, J. P. J. Chem. Phys. 1975, 63, 2334–2339. (21) Heymann, B. A.; Grubmu¨ller, H. Biophys. J. 2001, 61, 1295–1313. (22) Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562. (23) Laio, A.; Rodriguez-Fortea, A.; Gervasio, F. L.; Ceccarelli, M.; Parrinello, M. J. Phys. Chem. B 2005, 109, 6714–6721. (24) Iannuzzi, M.; Laio, A.; Parrinello, M. Phys. ReV. Lett. 2003, 90, 23802. (25) Gervasio, F.; Laio, A.; Parrinello, M. J. Am. Chem. Soc. 2005, 127, 2600. (26) Branduardi, D.; Gervasio, F.; Cavalli, A.; Recanatini, M.; Parrinello, M. J. Am. Chem. Soc. 2005, 127, 9147. (27) Bussi, G.; Gervasio, F. L.; Laio, A.; Parrinello, M. J. Am. Chem. Soc. 2006, 128, 13435–13441.

Masetti et al. (28) Alonso, H.; Bliznyuk, A. B.; Gready, J. E. Med. Res. ReV. 2006, 5, 531. (29) Goodsell, G. M. M. D. S.; Halliday, R. S.; Huey, R.; Hart, W. E.; Belew, R. K.; Olson, A. J. Comput. Chem. 1998, 19, 1639. (30) Berman, H.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I.; Bourne, P. Nucleic Acids Res. 2000, 28, 235. (31) Bayly, C. I.; Cieplak, P.; Cornell, W. D.; Kollman, P. A. J. Phys. Chem. 1993, 97, 10269. (32) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Kollman, P. A. J. Am. Chem. Soc. 1993, 115, 9620. (33) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179. (34) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. J. Comput. Chem. 2005, 26, 1781. (35) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollmen, P. A.; Case, D. A. J. Comput. Chem. 2004, 25, 1157. (36) Jorgensen, M. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, L. M. J. Comput. Phys. 1983, 79, 926–935. (37) Wang, J. M.; Cieplak, P.; Kollman, P. A. J. Comput. Chem. 2000, 21, 1049. (38) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. J. Comput. Phys. 1995, 103, 8577–8593. (39) Ryckaert, L. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327–341. (40) Micheletti, C.; Laio, A.; Parrinello, M. Phys. ReV. Lett. 2004, 92, 170601. (41) Ensing, B.; Klein, M. L Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6755. (42) Jope, R. S.; Johnson, G. V. W. TRENDS Biochem. Sci. 2004, 29, 95. (43) Bhat, R.; et al. J. Biol. Chem. 2003, 278, 45937. (44) Shiau, A. K.; Barstad, D.; Loria, P. M.; Cheng, L.; Kushner, P. J.; Agard, D. A.; Greene, G. L. Cells 1998, 95, 927. (45) Hart, L. L.; Davie, J. R. Biochem. Cell. Biol. 2002, 80, 335. (46) Pike, A. C. W.; Brzozowski, A. M.; Hubbard, R. E. J. Steroid Biochem. Mol. Biol. 2000, 74, 261. (47) Tanenbaum, D. M.; Wang, Y.; Williams, S. P.; Sigler, P. B. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 5998–6003. (48) Perola, E.; Walters, W. P.; Charifson, P. S. Proteins 2004, 56, 235. (49) White, S. W.; Zheng, J.; Zhang, Y.-M.; Rock, C. O. Annu. ReV. Biochem. 2005, 74, 791. (50) Smith, S.; Witkowski, A.; Joshi, A. K. Prog. Lipid Res. 2003, 42, 289. (51) Levy, C. W.; Roujeinikova, A.; Sedelnikova, S.; Baker, P. J.; Stuitje, A. R.; Slabas, A. R.; Rice, D. W.; Rafferty, J. B. Nature 1999, 398, 383. (52) Sivraman, S.; Sullivan, T. J.; Johnson, F.; Novichenok, P.; Cui, G.; Simmerling, C.; Tonge, P. J. J. Med. Chem. 2004, 47, 509. (53) Ryde, U. Protein Sci. 1995, 4, 1124.

JP803936Q

Exploring Complex ProteinLigand Recognition Mechanisms with

Recommend Documents