Article pubs.acs.org/jcim
Modeling Binding with Large Conformational Changes: Key Points in Ensemble-Docking Approaches Stefano Motta and Laura Bonati* Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, 20126 Milan, Italy S Supporting Information *
ABSTRACT: Protein dynamics play a critical role in ligand binding, and different models have been proposed to explain the relationships between protein motion and molecular recognition. Here, we present a study of ligand-binding processes associated with large conformational changes of a protein to elucidate the critical choices in ensemble-docking approaches for effective prediction of the binding geometry. Two study cases were selected in which binding involves different protein motions and intermolecular interactions and, accordingly, conformational selection and induced-fit mechanisms play different roles: binding of multiple ligands to the acetylcholine binding protein and highly specific binding of D-allose to the allose binding protein. Our results indicated that the ensemble-docking technique can provide reliable predictions of the structure of ligand−protein complexes, starting from simulations of the apo systems, when suitable methodological choices are made according to the different mechanistic scenarios. In particular, accelerated molecular dynamics simulations are suitable for conformational sampling when the unbound and bound states are separated by high energy barriers, provided that the acceleration parameters are carefully set to extensively sample the relevant conformations. A strategy specifically developed for geometric clustering of the binding site proved to be effective for selecting a set of conformations relevant to binding from the MD trajectory. Specific strategies have to be selected to incorporate different degrees of ligand-induced protein flexibility into the docking or pose-refinement steps.
■
adopted in the bound state.12−15 While the induced fit processes are kinetically determined, those guided by conformational selection are thermodynamically determined.13 It is conceivable that both conformational selection and induced fit mechanisms play important roles in molecular recognition and that each has its own range of applicability for specific systems and under certain conditions. The extent of protein conformational change between the free and ligandbound states is a key element. In addition, stronger and longranged ligand−protein interactions tend to favor induced-fit mechanisms, whereas weak and short-ranged interactions favor conformational selection.16 On the other hand, the two mechanisms may coexist in the same process. In fact, in several cases, it has been shown that a protein scaffold close to the bound conformation is chosen by the ligand through conformational selection and subsequently further changes occur to optimize the intermolecular interactions by induced fit.17−19 In recent years, the induced-fit and conformational selection models have been used by a number of computational strategies to account for protein flexibility in ligand binding. These strategies have been extensively reviewed elsewhere.5−10,20 Borrowing from the induced-fit model, a first class of methods aims to incorporate different degrees of ligand-induced protein flexibility into the docking procedure. This can be accomplished
INTRODUCTION Proteins are dynamic objects that interconvert among a variety of structures with varying energies. Several experimental and computational methods have been developed to characterize the multidimensional free-energy landscape accessible to these molecules and to describe their wide spectrum of motion. These motions range from slow transitions between distinct conformational states to fast fluctuations among closely related conformations within each state.1 Protein dynamics are critical in many biological processes, including signaling mechanisms, protein−protein interactions, ligand−protein binding, enzyme catalysis, and allosteric regulation.1−4 In particular, the role of receptor flexibility in ligand binding has received specific attention for its implication in structure-based drug discovery studies.5−10 The protein flexibility associated with ligand binding ranges from relatively small side-chain rearrangements to loop fluctuations that can provide large-scale local conformational changes and to hinge-bending motions of the structural subunits about flexible linkers.2 Different models have been proposed to explain the role of conformational dynamics in these processes. The “induced fit” model relies on the hypothesis that the bound protein conformation can be induced by the ligand.11 The “conformational selection” theory describes molecular recognition as a process in which the ligand selects the most complementary receptor conformation from an ensemble of pre-existing metastable states, which in turn shifts the dynamic population equilibrium toward the conformation © 2017 American Chemical Society
Received: February 28, 2017 Published: June 15, 2017 1563
DOI: 10.1021/acs.jcim.7b00125 J. Chem. Inf. Model. 2017, 57, 1563−1578
Article
Journal of Chemical Information and Modeling
Figure 1. Different C-loop conformations in the AChBP X-ray structures here considered. PDB ID, ligand structure, and name (ID) are reported for each structure.
using “soft” van der Waals potentials to account for a small amount of receptor plasticity, by treating the side-chain rearrangements in the binding pocket, or by including backbone flexibility in the docking procedure.21−24 A postdocking refinement of the top-scored binding poses, by energy minimization, molecular dynamics (MD), or Monte Carlo (MC) simulations, may be included. A second class of methods that mimic the conformational selection model is ensemble docking.9,10,25−27 These methods consider ligand docking to predetermined multiple receptor conformations (MRC) that can be obtained from either experimental data (X-ray crystallography or NMR spectroscopy) or computational sampling (MD or MC simulations, normal-mode analysis). Some of the ensemble-based methods employ MRC in a single docking run during the pose-generation phase.25 Other methods perform a series of independent docking runs to all members of the ensemble; for example, in the relaxed complex scheme (RCS), the MRC are generated by MD simulations of the unbound protein, followed by cluster analysis to select representative conformations.28−30 Ensemble-docking approaches are particularly suitable for binding processes that involve large conformational changes, provided that all the relevant protein conformations are considered. MD simulations have been widely used to describe intrinsic protein dynamics and, in recent years, significant increases in computational power have broadened their applicability.31−33 However, due to the time scale limitations, conventional MD simulations are often unable to explore the vast and rugged free-energy landscape of a protein, and conformational states relevant for recognition processes may be separated by high energy barriers that are rarely crossed within the simulation time. To improve the efficiency of sampling and to accelerate the crossing of high energy barriers, various enhanced sampling techniques have been proposed, including accelerated MD, replica-exchange, umbrella sampling, and metadynamics.9,34,35 In particular, accelerated MD36,37 has the advantage that it does not require the a priori selection of a reaction coordinate or collective variables; therefore, it can be used when information on the holo structure is missing. In line with this characteristic, this method has been proposed as a valuable tool for ensemble-based docking and screening methods.9,34 Furthermore, the choice of a reduced set of representative conformations within the trajectory is pivotal to the efficiency of ensemble docking. The set should capture the entire structural diversity of the target with a limited number of significant conformers associated with the different functional substates. Cluster analysis has proven to be effective for this
aim,38 and some optimized clustering techniques for the analysis of conformational ensembles have been proposed.39,40 In this work, we analyze ligand-binding processes associated with large conformational changes of a protein to elucidate the critical choices in ensemble-docking strategies for effective prediction of the structure of ligand−protein complexes. We have selected two study cases, in which binding involves different conformational changes of the receptor and different ligand−protein interactions. The acetylcholine binding protein (AChBP) mechanism is a prototypical example of nonspecific binding of multiple diverse ligands at the same site, modulated by the dynamics of a loop capping the binding pocket.41 Conversely, binding to the allose binding protein (Allose BP) represents an example of hinge-bending motion of two domains that leads to highly specific binding of a single ligand.42,43 The availability of X-ray structures of both the apo and holo forms44−48 makes the above systems ideal test cases for our study. Our results not only confirm ensemble docking as an effective tool to consider protein flexibility in ligand binding but also identify the most appropriate methodological choices for the two binding mechanisms. By comparing the performances of conventional MD (cMD) and accelerated MD (aMD) simulations, we assess the choice of the best sampling technique to overcome the energy barriers between the minima of the apo system and efficiently sample the rough conformational landscape within the holo minimum region. Then, the most appropriate strategy to select a reduced set of protein conformations relevant to binding within the MD trajectories is evaluated by comparing results obtained using different clustering techniques. Finally, the role of the docking method and the importance of introducing a post-docking refinement stage are verified.
■
RESULTS Multiple-Ligand Binding to Acetylcholine Binding Protein. A group of X-ray structures of the AChBP in both the apo and holo forms was selected as an optimal structure set to study multiple-ligand binding at the same site modulated by loop motion. The AChBP is a soluble homopentamer homologous to the extracellular N-terminal ligand-binding domain of the nicotinic acetylcholine receptors (nAChRs), transmembrane proteins involved in ion gating at the basis of neuronal response and muscle activity. Detailed structural analysis of the binding of nicotine and other ligands affecting ion flow and neuronal stimulation to the nAChRs is impaired by the large size and the transmembrane spans of these 1564
DOI: 10.1021/acs.jcim.7b00125 J. Chem. Inf. Model. 2017, 57, 1563−1578
Article
Journal of Chemical Information and Modeling
closer than 2 Å (and, in three cases, closer than 1.5 Å) to each reference structure were broadly sampled in the cMD simulation (Table 1a). In all cases, the frame with the lowest RMSD was sufficiently close to the binding-site conformation adopted in the bound protein (approximately 1 Å RMSD) to make it a suitable candidate for docking studies. Compared to cMD, aMD sampled a low percentage of frames close to the holo structures (Table 1b, c). With both single- and dual-boost aMD, the number of frames within 1.5 Å from each reference holo structure was less than half the number of cMD frames. Furthermore, fewer frames near the 2PGZ and 2BYR structures were recorded in the dual-boost aMD, indicating difficulty in reproducing the correct open conformations. To analyze the effectiveness of the different simulations of the apo AChBP in sampling the space related to the fluctuations of the C-loop, the conformational state probabilities were represented in the subspace defined by two geometric variables (Figure S2): the distance between two Cα, to describe the opening motion of the loop, and the angle among three Cα at the tip of the C-loop, to describe the loop bending. The probability distribution map derived from the cMD simulation (Figure 2a) shows two frequently sampled zones, corresponding to closed and open arrangements of the loop, with a slightly higher probability for the closed state. Projection of the AChBP X-ray holo structures into this subspace showed that these structures are located in the two frequently sampled areas. In particular, the epibatidine (EPJ), hepes (EPE), and lobeline (LOB) ligands bind a C-loop closed conformation, whereas cocaine (COC) and methyllycaconitine (MLK) choose more open conformations. According to the conformational selection hypothesis, each ligand selects the most appropriate state among multiple pre-existing states in the apo protein free-energy landscape that were effectively sampled in the cMD simulation. The same map obtained using single-boost aMD (Figure 2b) shows that the conformational subspace explored by this simulation (in 60 ns) is similar to that of cMD (200 ns). The dual-boost aMD samples a different region (Figure 2c), corresponding to open states in which the loop is bent (low angle values). This explains the low percentage of frames near the open experimental structures (Table 1c). The dual-boost aMD rapidly escaped the closed conformation, evolving to an open state distinct from the experimental one. In addition to the efficiency in sampling the apo protein conformational landscape, the ability to select a reduced number of representative conformations from the MD trajectory is crucial for ensemble-docking applications. Different variables can be considered to cluster distinct macrostates, but for the purpose of ensemble docking, a good clustering protocol should be able to recognize different arrangements of the binding site. Therefore, we employed two different clustering strategies focused on the binding site: the GROMOS RMSD-based algorithm applied to the positions of the bindingsite heavy atoms and the volumetric method based on clustering of the binding-site accessible volumes (see Methods section). The binding-site RMSDs from the holo structures for the cluster representatives, obtained by the two approaches, are reported in Table 2. Both clustering methods identified representative conformations close to the holo structures in the cMD simulation, with slightly lower RMSDs for the ensemble produced by volumetric clustering. The ensemble derived from single-boost aMD showed similar characteristics,
receptors. The AChBP, which shows high similarity in structure and ligand-recognition properties, has been broadly used as a surrogate structure,41 and several X-ray structures of AChBP− ligand complexes have been resolved. These structures indicate that the pentamer subunits are arranged in a cylinder, with each subunit characterized by an N-terminal helical region and a 10strand β-sandwich core. Ligands bind the AChBP at the interfaces between each pair of subunits, in a pocket lined with aromatic residues, behind a loop, extending from one of them, known as the “C-loop”. This loop acts as a flexible gate capping the binding site, and its large opening motion governs ligand specificity. We selected six X-ray structures of the Aplysia californica AChBP, showing different arrangements of the C-loop (Figure 1): closed conformations in the complexes with alkaloid nicotinic agonists (epibatidine, hepes, lobeline),44,45 intermediate conformations in the apo form,44 and more open conformations when bound to the noncompetitive nicotinic ligand cocaine46 and the alkaloid methyllycaconitine antagonist.44 We initially assessed the possibility to sample the apo protein energy landscape efficiently to reach all five holo protein conformations by performing extensive MD simulations of the full AChBP pentamer with different sampling protocols, starting from the apo crystal structure (PDB ID 2BYN). Since the residue positions in the subunits within the pentamer are similar, each pair of subunits forming the binding interface was treated as a replica. At first, 200 ns of cMD simulation was produced for each replica. Additionally, to evaluate whether sampling was improved by the accelerated MD technique (see Methods section), a 60 ns simulation with the single-boost aMD (all degrees of freedom equally subjected to acceleration) and a 40 ns simulation with the dual-boost aMD (additional bias potential imposed on the dihedral angles) were performed for each replica. The acceleration parameters are reported in Table S1. Plots of the RMSD from each holo structure, computed on the binding-site heavy atoms (binding-site RMSD) and monitored during the simulations, are reported in Figure S1, and the percentages of trajectory frames under 1, 1.5, and 2 Å binding-site RMSD were analyzed (Table 1). Conformations Table 1. Lowest Binding-Site RMSD from Each AChBP Holo Structure and Percentage of Frames under 1, 1.5, and 2 Å RMSD Obtained in Different Simulations 2BYQ RMSD (Å) Lowest %