Docking to RNA via Root-Mean-Square-Deviation-Driven Energy

(1) However, the increasing structural knowledge of RNA molecules and awareness of the essential role of RNA in development, gene expression, viral ...
0 downloads 0 Views 2MB Size
J. Chem. Inf. Model. 2008, 48, 1257–1268

1257

Docking to RNA via Root-Mean-Square-Deviation-Driven Energy Minimization with Flexible Ligands and Flexible Targets Christophe Guilbert and Thomas L. James* Department of Pharmaceutical Chemistry - MC 2280, 600 16th Street, University of California, San Francisco, California 94158-2517 Received January 23, 2008

Structure-based drug design is now well-established for proteins as a key first step in the lengthy process of developing new drugs. In many ways, RNA may be a better target to treat disease than a protein because it is upstream in the translation pathway, so inhibiting a single mRNA molecule could prevent the production of thousands of protein gene products. Virtual screening is often the starting point for structure-based drug design. However, computational docking of a small molecule to RNA seems to be more challenging than that to protein due to the higher intrinsic flexibility and highly charged structure of RNA. Previous attempts at docking to RNA showed the need for a new approach. We present here a novel algorithm using molecular simulation techniques to account for both nucleic acid and ligand flexibility. In this approach, with both the ligand and the receptor permitted some flexibility, they can bind one another via an induced fit, as the flexible ligand probes the surface of the receptor. A possible ligand can explore a low-energy path at the surface of the receptor by carrying out energy minimization with root-mean-square-distance constraints. Our procedure was tested on 57 RNA complexes (33 crystal and 24 NMR structures); this is the largest data set to date to reproduce experimental RNA binding poses. With our procedure, the lowest-energy conformations reproduced the experimental binding poses within an atomic root-mean-square deviation of 2.5 Å for 74% of tested complexes. INTRODUCTION

There have been successful efforts made to design ligands that bind to protein receptors using three-dimensional protein structures available via X-ray and NMR methods.1 However, the increasing structural knowledge of RNA molecules and awareness of the essential role of RNA in development, gene expression, viral replication, blocking bacterial protein synthesis, and many other processes emphasizes the potential of certain RNA structural elements to be used as therapeutic targets for small molecule drug discovery. Similar to the active sites of proteins, RNA can form well-defined threedimensional structures including bulges, hairpins, junctions, asymmetries in the deep and shallow grooves around noncanonical base pair clefts, and various binding pockets, all of which constitute attractive targets for specific and selective binding agents. RNA precedes proteins in the translation pathway; inhibiting the function of a single mRNA molecule can result in inhibition of multiple protein copies. Furthermore, there is precedence for RNA to serve as a target: the mechanism of action for many clinically important antibiotics entails binding to specific sites on rRNA. Recent reviews discuss potential benefits of targeting RNA for the development of new antibacterial and antiviral drugs.2,3 Some RNA-targeting examples include drugs such as aminoglycosides, macrolides, and other antibiotics, which are used heavily to fight bacterial infection by specifically targeting the RNA in the bacterial ribosome.4,5 Their * Author to whom correspondence should be addressed. Phone: 415476-1916. Fax: 415-502-8298. E-mail: [email protected].

widespread clinical use demonstrates that RNA is an important drug target. The development of novel antibacterials is the prime response to the threat presented by increasingly widespread resistance to existing antibiotics. Among the revealing X-ray structures of ribosomal subunits, one revealed the binding of aminoglycosides to the A site of the 16S rRNA (rRNA), their apparent functional target.6 Such binding blocks the initiation of translation, elicits premature termination, and causes miscoding, ultimately leading to bacterial cell death. In addition to these well-known clinical applications of mostly natural product analogs and one FDAapproved, entirely synthetic compound, an oxazolidinone, as antibacterials, there has been some promising work targeting other RNA targets. Examples of other important targets include human immunodeficiency virus type 1 (HIV1) RNA, specifically, the trans-activating response element (TAR),7–10 and the Rev response element.11 The functions of these regulatory RNA motifs of HIV-1 are impeded by ligands from interacting with their cognate viral target proteins Tat and Rev. Another HIV-1 RNA target is the dimerization initiation site, which binds aminoglycosides and may inhibit RNA genome packaging.12 Aminoglycosides also act as inhibitors of catalytic RNAs such as the selfsplicing group I introns,13,14 the hepatitis delta virus ribozymes,15,16 and the hammerhead ribozymes of some plant viroids.17 More recently, in the development of anticancer therapies, the telomerase RNA/DNA heteroduplex formed during the catalytic cycle of telomerase was identified as a potential site for inhibition.18 There are typically two experimental approaches used to find low-molecular-weight compounds to bind to a new

10.1021/ci8000327 CCC: $40.75  2008 American Chemical Society Published on Web 05/30/2008

1258 J. Chem. Inf. Model., Vol. 48, No. 6, 2008

protein (or, rarely, RNA) target: (a) high-throughput screening of 104 to 106 compounds or (b) using the structure of a receptor with a ligand to suggest other compounds to synthesize and test. Both approaches are expensive but have certainly had some success. However, computational screening (also called virtual screening (VS) or docking) of a large database of small molecules for binding to the target structure can also be employed as the first stage in novel ligand discovery in the case that a structure of the receptor, either with or without the ligand bound, is available. Kuntz et al. were the first to use the flexible-ligand docking program DOCK to identify compounds in the Available Chemicals Directory (ACD) that would bind to double-helical RNA.19 We successfully screened the ACD to identify novel ligands binding to TAR RNA using rigid-ligand docking with DOCK followed by fully flexible-ligand docking via ICM.8,20 ICM uses Monte Carlo minimization in torsion angle space with progressively more detailed conformational sampling on a progressively smaller list of top-ranking compounds.21 Use of the MCSS method, which took advantage of the nucleic acid parameters developed in the CHARMm force field, was also proposed.22 Studies were also carried out using a combination of docking methods and molecular dynamics simulations.23,24 In the case of TAR RNA, docking and modeling RNA with bound ligands has resulted in characterization of and enhancement of ligand binding features.25–29 Recently, automated docking methods specifically parametrized for docking to RNA were proposed by various authors: Morley and Afshar developed a new RNA-specific regression-based scoring function called RiboDock,26 which was trained and validated on a limited set of 10 RNA-ligand complexes. Detering and Varani systematically tested automated docking tools developed for proteins using another set of experimental three-dimensional structures of RNAsmall molecule complexes.30 The results show that the native structures can generally be reproduced to within 2.5 Å about 50-60% of the time on a data set of 16 RNA-ligand complexes when using the original force field parameters developed for protein binders. Moitessier et al. recently reported docking to hydrated and flexible RNA.31 However, this approach required extensive experimental information such as cocrystal structures of several ligands bound to the same target. It was tested for 11 aminoglycosides bound to the ribosomal A-site RNA. More recently, Pfeffer and Gohlke reported a new knowledge-based scoring function, DrugScoreRNA, to predict RNA-ligand interactions on the basis of a data set of 31 RNA-ligand complexes by deriving energetic information from the statistical analysis of structural parameters for known biomolecular complexes.32 They used the Autodock program,33 which holds the RNA receptor rigid during the docking process, and they obtained a success rate of 42% using a 2 Å cutoff. Success rates of 70-90% are typically obtained on protein-ligand test sets using nearly any of the many docking programs. However, the few comparable studies on smaller RNA-ligand test sets yield success rates of 40-60%. Two main factors help explain the relative lack of VS accuracy for RNA compared to proteins: (a) the lack of a robust empirical scoring function, which can discriminate real binders from nonbinders during the docking process and which can estimate the free binding energy for ranking the best ligands, and (b) the necessity to include RNA flexibility

GUILBERT

AND JAMES

in ligand docking. Good scoring functions and flexibility are probably the two biggest issues for protein docking, but they pose a significant problem for RNA docking because there are relatively few experimental RNA-ligand structures for the development of scoring functions, and functional RNA targets can be notoriously flexible. In the study presented here, we are primarily concerned with addressing the issue of the malleability of RNA upon ligand binding. Several docking programs treat the receptor as a rigid object and cannot account for conformational changes upon binding. This simplification results in relatively poor crossdocking results and low enrichment factors in virtual screening studies.34 Analyses of RNA have established that a complex and its free component molecules often differ in structural details. Examples for which an induced fit has been demonstrated upon small ligand binding include HIV-1 TAR RNA and bacterial A-site rRNA, which has been discussed.2 This reflects the conformational plasticity of RNA often observed upon binding to a protein partner. A single structure, or even the weighted average provided by a crystal structure or NMR structure, may not adequately describe the conformations (substates) accessible to the target RNA. The “structure” determined for RNA structural elements other than double helix typically represents an average or possibly the lowest-energy conformer among several that differ only slightly in energy.35,36 The populations of conformational states depend not only on conditions such as pH, temperature, and ionic strength but also, very importantly, on the introduction of a ligand into the system. Upon ligand binding, many systems undergo rearrangements, which range from local motions of side chains in proteins or base pairing in nucleic acids to large domain movements. We may envision that the initial interaction between a ligand and the receptor in solution would occur with preference for partly fitting receptor and ligand conformations among an ensemble of conformers followed by readjustments to the final stable complex. A combination of conformational selection and induced fit would seem to be the best description of the interaction between molecules that do not fit optimally via their unbound conformations. However, considering the kinetics and thermodynamics of a binding reaction, the induced fit is possible only if the match between the interacting sites is strong enough to provide the initial complex with enough strength and longevity that the subsequent induced fit can take place within a reasonable time.37 For protein-based drug design, the challenge of docking a flexible ligand to a receptor with varying degrees of flexibility has been addressed by a few groups. A plethora of recent good reviews,38–41 and descriptions of methodology,21,31,42–44 for flexible receptor docking have been published. However, in most of the actual docking programs used in these schemes dealing with receptor flexibility, the receptor is fixed for all of the calculations. For RNA-based drug design, there was one RNA docking study where some receptor flexibility was permitted for the best-scoring ligands resulting from rigidreceptor docking.8 One approach to accommodate small changes in conformation utilizes a “soft” scoring function, which permits some overlap between the ligand and the protein, allowing for a small, limited plasticity of the receptor.45 Soft-docking also has the benefit of being fast. Significant improvement over single structure docking has been achieved through the use

DOCKING

RMSD-DRIVEN ENERGY MINIMIZATION

J. Chem. Inf. Model., Vol. 48, No. 6, 2008 1259

of several receptor structures to mimic flexibility in the docking process; rigid receptor docking is simply carried out for several different receptor conformations. These different conformations could come from multiple experimental structures when available.38,46–49 Alternatively, they could be simulated via molecular dynamics50–52 or normal mode calculations.53–55 However, docking many individual structures will be computationally limited with a large database to screen. For efficiency, multiple conformations would need to be combined and incorporated into the VS procedure, which is a nontrivial task. Autodock provides one means of doing so:33 it uses grids that can be combined to approximate an ensemble of rigid conformations. One might also question whether or not the structures selected to represent the ensemble of receptor conformations truly represent those found in solution. Some docking programs use stochastic methods, such as Monte Carlo, which allow side-chain flexibility,21,43 and even minor backbone rearrangements through molecular minimization.8,43 Although incorporating side-chain flexibility was a big step forward, screening using induced fitting of both the ligand and the receptor simultaneously has still not been reported. In the present paper, we describe a new procedure for docking ligands to RNA, which includes the flexibility of the receptor over the course of the binding search using molecular minimization techniques. This enables an induced fit of the ligand and receptor. The principle of the method is basically to add an atomic root-mean-square-deviation (rmsd) constraint energy term to the potential energy, which drives the ligand to probe the surface of the receptor using energy minimization. The use of a rmsd penalty in addition to the regular potential energy in molecular simulations is not new.56,57 Such a rmsd-type force can be used to explore the conformational space of a protein, simulate large-scale molecular motions such as hinge bending in proteins, and simulate the transition pathway between two known structures.57 The program we have developed, MORDOR (MOlecular Recognition with a Driven dynamics OptimizeR), allows induced-fit binding of small molecule ligands with RNA targets via flexible-receptor, flexible-ligand docking. Rigorous molecular force fields are used: CHARMM-27 is used in the calculations for the receptor,58 and the general AMBER force field is used for ligand molecules.59 The protocol for using MORDOR is outlined. The importance of including the internal energy of the receptor in addition to ligandreceptor interaction energy in distinguishing valid ligand binding poses is also demonstrated. To evaluate MORDOR, we have applied it to a data set of 57 RNA-ligand complex structures. The validities of our sampling procedure and our scoring function were assessed in terms of the ability to reconstruct the native RNA-ligand complex geometries. Our data set is the largest published so far and includes nearly all of the complexes used by other groups to validate their VS procedures; hence, our results can be compared to those studies.

molecule and a small-molecule ligand from the protein data bank (PDB). We reviewed them all by hand and initially selected 40 of them (20 crystal and 20 NMR structures) for our study. Our selection criteria included (a) diversity of the receptors and ligands; (b) resolution of X-ray structures in cases where more than one structure was available for a complex; and (c) avoidance of structures with certain problems, such as structures solved with incorrect chiral centers, complexes with significant intermolecular contacts between ligands due to crystal packing, and structures with missing residues in the binding pocket due to missing electron density. For comparison purposes, we later added 17 more complexes used in previous studies. RNA structures were prepared using the following principles. All RNA receptor-ligand complex structures were retrieved from the Protein Data Bank.60 Since our data set uses three-dimensional RNA structures solved by NMR techniques, which unlike some crystal structures do not reveal the positions of metal ions and water, all counterions and water molecules in the PDB files were removed. For crystal structures solved with multiple occupancies of individual atoms, the highest occupancy was selected. For structures with multiple binding sites, we used the first site encountered in the PDB file. In cases where multiple experimental NMR structures were available or there were crystal structures with multiple independent molecules in the crystallographic unit cell, again, we took the first one encountered in the PDB file. Preparation of the Ligands. The ligands were stripped from the receptor and “regularized” using OpenEye’s convert.py tool (OpenEye Scientific Software, http://www. eyesopen.com/), which protonates the ligand at pH 7.0. Next, the topology and parameters using the general GAFF force field and AM1-BCC partial charges were calculated using ANTECHAMBER.61 The ligands were ready for subsequent docking after structures were minimized in vacuo and randomly rotated two times. Preparation of the Receptor. Hydrogen atoms were added to the receptor and optimized in the presence of the ligand using the algorithm Hbuild in CHARMM.62 The complexes were then minimized with CHARMM c34b1 using topology and parameters of CHARMM and GAFF for the receptors and ligands, respectively. The receptors were ready for docking after deleting the ligand. After minimization, all of the receptors had an average rmsd of 0.4 Å (ranging from 0.11 to 0.6 Å) from their experimental counterpart (data not shown). The final data set had a total of 57 structures, which are listed in Table S1 of the Supporting Information. Our list includes 33 X-ray and 24 NMR structures. The resolution of the 33 experimental crystal structures ranges from 1.3 to 3.3 Å. MORDORsCreation of Binding Site and Docking. All calculations reported in this work were performed using a new software tool we developed in our laboratory called MORDOR. It is written in C++ and uses Python for script interface. MORDOR contains all of the tools for virtual screening. The software can perform the main docking using its own energetic and minimization routines, but it is also interfaced with ANTECHAMBER for building ligand databases,61 which include ligand topology and parameters, and CHARMM for some state-of-the-art molecular simulation

TO

RNA

VIA

METHODS

Data Set. In order to test the performance of the MORDOR docking procedure, we compiled a list of complexes that were labeled as containing both an RNA

1260 J. Chem. Inf. Model., Vol. 48, No. 6, 2008

features,62 such as calculating the solvation energy. MORDOR can also find the binding sites at the surface of the receptor to initially place the ligands. Docking results from VS can be displayed in the VIEWDOCK module of the visualization software CHIMERA when both the ligand and the flexible receptor are displayed.63 CHIMERA generates a table from MORDOR results, grouping various useful parameter columns, for example, binding energy, charge, and molecular weight. Each column in the table may be used for sorting, and the receptor-ligand complexes are visualized according to that order by selecting the row corresponding to the desired complexes. Inspection of the docking results is consequently very convenient. Sphere Generation. In order to identify the regions on the receptor most likely to constitute a useful surface for the binding of small molecules, that is, a binding pocket, we generate spheres, which are used for the initial placement of the ligand to a “hot spot” before starting the docking. After the initial placement, the ligand is perfectly free to move away from the spheres during the docking protocol described below. The procedure of sphere generation consists of mapping the entire receptor with a probe that is represented by a dummy atom of varying radius (between 1 and 4.0 Å with a step of 0.2 Å) and varying formal charge (from -1 to +1 with a step of 0.1). The receptor is completely embedded in a three-dimensional grid with 1 Å spacing. The sphere with a given radius and charge is initially placed on one node of the grid, and its position is energy-minimized (van der Waals and electrostatic), so that the sphere can find the lowest-energy pocket in the vicinity at the surface of the receptor. This step is carried out repeatedly for all nodes of the grid. The receptor is held fixed throughout this process. This calculation is repeated for each combination of sphere radii and charge values. Finally, all of the resulting spheres are clustered and ranked according to their interaction energy with the receptor, assuring that they are at least 1.4 Å one from each other. We define the active site by selecting the 10 lowest-energy spheres found to be within a radius of 10 Å from any nonproton atom of the ligand in the experimental structure. We find that these spheres are remarkably able to find the experimental binding pocket but can sometimes identify another nearby “hot spot” as well (data not shown). The details of this approach to identify the hot spot regions will be more fully explained elsewhere. Docking Via MORDOR. The MORDOR method of docking a ligand to the receptor entails three steps. These are as follows: The first step consists of placing the ligand at one of the “hot spots” on the receptor, which are defined by the spheres, by positioning one atom of the ligand in the center of one of the spheres. While the receptor is held rigid, the complex is very quickly minimized to place the ligand globally by removing any clashes between the receptor and the ligand. If no clashes are found, that is, no atom of the ligand is < 1.5 Å from any atom of the receptor, the complex goes to the second step. The same procedure was repeated 120 times with different orientations of the ligand. The initial orientation of the ligand was obtained by aligning the x axis of the ligand with one of the 120 vectors uniformly distributed on the trigonometric sphere. The receptor was held rigid during this procedure.

GUILBERT

AND JAMES

The second step entails energy minimization of the receptor-ligand complex; it allows the ligand to explore the receptor surface using a Path Exploration with Distance Constraints (PEDC) algorithm,57 enabling an induced fit of both the ligand and the receptor. In this process, PEDC forces the ligand to probe the receptor, now permitted to have flexibility, thus exploring its surface. PEDC adds an atomic rmsd penalty to the potential energy. This rmsd penalty, or “distance constraint”,57 is used as a driving force to constantly change the position of the ligand; the rmsd is calculated between the current and the reference ligand positions, where the reference is one of the previous ligand positions. The ligand movement is induced by setting a target distance (or target rmsd) value, which will “constrain” the ligand to move (using minimization) from its current position to a position where the rmsd from the reference structure will match the target rmsd value. The potential is applied in such a way that the ligand is being “pushed away”, beyond the target rmsd value, from the reference position. With a force constant of 500 kcal/(mol · Å2), the penalty decreases quadratically as the rmsd approaches the target value; that is, we pay a penalty when the rmsd is less than the target value; the penalty becomes zero when the rmsd is greater than the target value. The target rmsd is incrementally increased in small steps (0.1 Å); the receptor-ligand complex is minimized for each target rmsd. If the ligand rmsd reaches or exceeds the target rmsd, this minimum is stored as a binding pose for a subsequent rescoring in step 3; the reference is reset to the previous ligand position, and the target rmsd is reset to 0.1 Å. Otherwise, the target rmsd is incremented, and the complex is minimized again. The docking is terminated, and the whole procedure is repeated starting with the next orientation or sphere in step 1, when one of the following conditions is fulfilled: five minima have already been restored or the target rmsd exceeds 10 Å. In this way, the ligand explores many potential binding pockets, which could be as far as 20 Å from its initial position defined in step 1. Because the energy minimization is performed in vacuo, the ligand while probing the surface of the receptor could favor electrostatic interactions and lead to unrealistic receptor conformations that, once established, are almost irreversible using minimization. To avoid this problem, the receptor was restrained during docking with another flat-well atomic rmsd potential, which allowed the receptor’s heavy atoms to change freely within an overall rmsd of 0.3 Å from the initial minimized structure but added a penalty with a force constant of 800 kcal/(mol · Å2) beyond 0.3 Å. The flat-well width of 0.3 Å and the force constants for the receptor and ligand rmsd potentials were chosen as a compromise to keep the receptor close enough to the minimized native structure and yet allow enough local flexibility to accommodate changes associated with the induced fit. Practically, with the protocol and force constant used in this study, the total receptor’s rmsd from the minimized experimental structures is in the range of 0.1-0.6 Å for the best-scoring poses but can be as high as 1.0 Å in some minima (stored poses). During the docking, the penalty driving movement of the ligand is strong enough to overcome the receptor restraint, so rmsd values for local conformational deviations may exceed the total rmsd of the receptor. For the binding interface (defined as all residues of the receptor within 5 Å of any ligand’s atom),

DOCKING

TO

RNA

VIA

RMSD-DRIVEN ENERGY MINIMIZATION

J. Chem. Inf. Model., Vol. 48, No. 6, 2008 1261

Figure 1. Success rate for MORDOR to predict the experimental binding pose for the data set of 57 RNA-ligand complexes as a function of the rmsd cutoff criterion chosen. The rmsd is calculated between the experimental structure and the best-scoring MORDOR prediction for all nonproton atoms of the ligand. Scoring was based on either the interaction energy (square) or total energy (diamond).

the local rmsd is within 0.1-0.8 Å for the best-scoring poses, and it can be as high as 2.1 Å in some minima, and even transiently reach 4.25 Å in the process of minimization. Allowing such local flexibility is especially important, if there is an induced fit of the receptor associated with ligand binding. Steps 1 and 2 of the procedure are repeated for each heavy atom of the ligand for that initially chosen sphere. The whole procedure is then repeated for each of the spheres. (Typically, there are 10 spheres.) In this way, the ligand has been placed in various orientations at several different hot spots on the surface of the defined receptor (step 1), with numerous explorations of induced fitting of the ligand and receptor (step 2), thus enabling a broad sampling of induced-fit ligand binding to the receptor. The third and last step of the procedure consists of rescoring the ensemble of docking poses for a given ligand with a more advanced scoring function (Vide infra), which includes solvation energy, in an effort to improve the affinity prediction. The best total energy of the complex was used to rank and discriminate among the poses. Potential Energy and Scoring Function. The docking simulations were conducted using the all-atom CHARMM27 force field for the RNA receptors58 and the general AMBER force field for ligand molecules.59 Both share the same classical functional form of potential energy. The AM1-BCC model charges were computed using ANTECHAMBER for the ligands.61 Since the docking calculations are done in vacuo, for the sake of efficiency, we used a distancedependent dielectric model with ε(r) ) r, where r is the distance between two atoms. We also used a nonbonded cutoff distance of 12 Å, a shifted function of the electrostatic interaction that began at 8 Å and ended at 10 Å, and a switched function for the Lennard-Jones potential.62 The scoring function, which helps to distinguish the correct docking poses from incorrect ones during the rescoring step, uses the total energy of the complexes. The components of the energy are in the classic forms found in CHARMM or AMBER and are composed of electrostatic, van der Waals, dihedral angle, torsion angle, bond, and Urey-Bradley terms in addition to the more sophisticated implicit solvent treatment: Generalized Born with simple switching (GMSW),64 in CHARMM. The default GBSW parameters were used.

A small video illustrating the MORDOR procedure can be found at http://mondale.ucsf.edu/jcim2008. Typically, a docking run using MORDOR took from 0.5 to 3 h (depending on the receptor and ligand size) on a single AMD Athlon 64 3200 GHz CPU. This combination of software and hardware can typically provide virtual screening of a small database ligand library of 50 000 compounds within a few weeks using current technology clusters. RESULTS AND DISCUSSION

We explored the ability of MORDOR to reconstruct native RNA-ligand complex geometries in docking experiments. The data set used for docking consists of 57 RNA-ligand complexes. To assess how well MORDOR docking was able to reproduce the experimental RNA-ligand complex structures, rmsd values were calculated to characterize the difference between the ligand bound to the experimental structure and the conformation predicted by the docking program. For this purpose, the experimental and predicted complexes were first superimposed on the basis of the heavy atoms of the binding pockets of the receptors. The binding pocket was defined by all the residues that have at least one heavy atom within 5 Å of any heavy atom of the ligand. Then, the rmsd value was calculated using the heavy atoms of experimental and predicted positions of the ligand. The individual rmsd values are reported for each tested complex in the Supporting Information(Table S1), and the data for all 57 complexes are summarized in Figure 1. These rmsd values can be used to assess the success of the docking. Many groups using rigid receptor docking consider a rmsd cutoff of 2 Å between the docking and experimental poses as the metric for success, although more recently for docking to RNA, a cutoff of 2.5 Å was used.30 Although the choice of this cutoff is quite arbitrary, ideally, it should reflect the errors in determining the experimental structures (which are method-dependent) and real fluctuations in the ligand pose expected in solution at room temperature (which may be receptor- and ligand-specific). Typically, molecular structures determined by NMR are represented by an ensemble of structures, each of which is deemed to satisfy the experimental data. Analysis of such ensembles in the Protein Data Bank revealed that on average the position of

1262 J. Chem. Inf. Model., Vol. 48, No. 6, 2008

GUILBERT

AND JAMES

Figure 2. Examples of some correct poses predicted by MORDOR: the best pose with the highest score is shown in all-atom color, and the ligand in yellow is the experimental structure. The binding site of the experimental receptor is shown with the accessible surface colored blue (60% transparency). The Protein Data Bank accession number for the experimental structure is shown in the lower left corner. (a) X-ray structure of paromomycin bound to A-site rRNA. (b) X-ray structure of biotin bound to its aptamer. (c) NMR structure of malachite green A bound to its aptamer. (d) NMR structure of paromomycin bound to A-site rRNA.

a ligand is determined within a rmsd of 1.8 Å in NMR structures (Supporting Information, Table S2). Strictly speaking, this figure reflects the degree of indetermination of the structures by experimental NMR data (i.e., experimental errors), because each member of a NMR ensemble represents an independent structure determination attempt and not an individual solution conformer. However, the real flexibility of the complex in solution contributes indirectly to this figure, because increased flexibility leads to a paucity or even to internal conflicts in observed NMR parameters,65 both of which would lead to an increase in the apparent imprecision of the structure. Importantly, rmsd in the ligand positions can be >3 Å in individual NMR structures (PDB codes 1UTS, 1QD3, 1FMN, 1AJU, 1NEM, 1FYP, and 1AM0) and as high as 8.5 Å (PDB 1EI2, Table S2). Most of these highly variable structures are complexes entailing aminoglycoside antibiotics, where the internal flexibility of the ligand contributes significantly to the overall rmsd (Table S2). In the case of crystal structures, the experimental variability in the ligand position was estimated by analyzing structures where the binding pocket-ligand pairs were independently determined more than once. They included symmetric RNA constructs with multiple binding sites, structures where crystallographic unit cells contained more than one independent molecule, and complexes solved in multiple crystal forms. Although generally lower than in NMR structures, the variability of the ligand position within the binding site

was also quite significant for crystal structures (Table S2), with an average rmsd of 0.7 Å and, in some cases, with rmsd as high as 2.8 Å (for the magnesium and barium crystal forms of a streptomycin complex with RNA aptamer;66 PDB codes 1NTA and 1NTB). For simplicity and to facilitate comparison with published work, we will mostly use a cutoff of 2.5 Å as the criterion of a successful docking in the discussion. As shown in Figure 1, 74% (42 out of 57) of the ligands in the 57 complexes are docked within 2.5 Å when using the total energy of the complex as the scoring function. The success rate drops to 65% (37 out of 57 structures) for the cutoff of 2.0 Å. Generally, at the cutoff of 2.5 Å, MORDOR reproduces the native X-ray structures better than NMR structures: 30 out of 33 and 12 out of 24, respectively (Table S1, Supporting Information), which correlates strongly with the greater variability of ligand positions in the experimental structures determined by NMR. Examples of the successful performance of MORDOR on different sets of ligands are provided in Figure 2. Figure 2a-d show the superposition of the experimental structures and the best docking poses for four complexes. The examples from X-ray crystallography are paromomycin bound to A-site rRNA (Figure 2a, rmsd of 1.05 Å) and biotin bound to its aptamer (Figure 2b, 0.96 Å), and examples from NMR are malachite green A bound to its aptamer (Figure 2c, 1.90 Å) and paromomycin bound to A-site rRNA (Figure 2d, 1.64 Å). rmsd values between the experimental and best docking

DOCKING

TO

RNA

VIA

RMSD-DRIVEN ENERGY MINIMIZATION

J. Chem. Inf. Model., Vol. 48, No. 6, 2008 1263

Figure 3. Examination of docking details for one example: paromomycin bound to A-site rRNA (X-ray crystal structure with PDB code 1j7t). For the 1j7t docking run, there were 1531 docking poses generated. (a) The total energy of docking as a function of the rmsd value between experimental and predicted structures for the RNA-ligand complex. (b) Comparison of the binding interaction energy as a function of ligand rmsd for each of the docking poses; the red dot represents the docking pose with lowest total complex energy.

poses are shown in the Supporting Information (Table S1) for all of the complexes in the test set. Total Energy versus Interaction Energy as Scoring Criterion. Very often, the internal energy of the receptor is neglected in the scoring function since most of the current programs keep the receptor fixed during docking. With a rigid receptor, it is possible to use the receptor-ligand interaction energy exclusively (usually, van der Waals energy and electrostatic energy including the energy of hydrogen bonds), disregarding any internal receptor energy change due to interactions with the ligand. To investigate the effect of the internal energy of the receptor, the same docking poses previously calculated were rescored using the interaction energy between the receptor and the ligand instead of the total energy of the complexes. MORDOR had a much lower success rate at reproducing the experimental conformations when the internal energy was omitted from scoring (Figure 1). Only 46% of the docked structures lie within the cutoff value of 2.5 Å from the experimental structures, compared to 74% when the internal energy of the receptor was added to the interaction energy; that success rate goes down to 40% for a cutoff of 2.0 Å. This shows the importance of taking into account the internal energy of the receptor during flexible docking, because of the ligand-induced change in the receptor’s conformation. Overall, out of the 57 RNA-ligand complexes in our data set, the average rmsd between the

experimental and the best predicted pose is 2.19 Å using the total energy scoring function, compared to 3.34 Å using only the interaction energy. In all but a few cases, the rmsd value is equivalent or better when we use the total energy as the scoring criterion. However, using solely the interaction energy as the scoring criterion would have provided the correct docking poses in three cases where the total energy criterion failed: diaminopurine bound to a mutant G riboswitch (2b57), gentamicin C1A bound to A-site rRNA (1byj), and acetylpromazine bound to TAR RNA (1lvj). There is no obvious reason why these three cases would be better predicted by interaction energy alone, but they do not change the conclusion that the total energy was a much better predictor. Detailed Docking Example with One Structure. As an example, we will show some details of the A site on 16S rRNA from Escherichia coli complexed with the aminoglycoside paromomycin (PDB code 1j7t). Figure 2a compares the structure of the experimental complex with that predicted by MORDOR, and Figure 3a shows the total energy of the complex as a function of the rmsd between the predicted and experimental ligand positions; the data result from the PEDC-driven docking procedure. As expected, the best total energies of the complex are observed when the docked ligand is about 1.1 Å from the experimental structure. The best energy is -5261.13 kcal/mol, which corresponds to a rmsd

1264 J. Chem. Inf. Model., Vol. 48, No. 6, 2008

GUILBERT

AND JAMES

Figure 4. Examples of incorrect docking poses generated by docking with MORDOR. The best-scoring pose is depicted in all-atom color, and the ligand in yellow is the experimental structure. The binding site of the experimental receptor is show with the accessible surface colored blue (60% transparency). The Protein Data Bank accession number for the experimental structure is shown in the lower left corner. (a) X-ray structure of diaminopurine bound to a mutant riboswitch. (b) X-ray structure of gentamicin C1A bound to A-site rRNA. (c) NMR structure of gentamicin C1A bound to A-site rRNA. (d) NMR structure of a substituted methoxybenzene with two charged sidechains (termed RBT203) bound to HIV-1 TAR RNA.

of 1.05 Å. However, among all of the docking poses, the closest ligand has a rmsd of 0.85 Å with an energy 40 kcal/ mol higher (-5227.18 kcal/mol) than the best-ranked pose. This illustrates well the ruggedness of the potential energy surface when small changes in conformation can dramatically change the energy values. One can also see in Figure 3a how widely the algorithm allows the ligand to search the surface of the receptor; in the case of 1j7t, the ligand explored the receptor as far as 18 Å away from the experimental structure. During docking, the rmsd fluctuation of the A-site rRNA receptor binding pocket was found to fluctuate between 0.37 and 1.07 Å from the experimental structure, with the average fluctuation of the 1531 docking poses being 0.62 Å. Initially before starting any docking calculation, the minimized receptor structure was 0.37 Å from the crystal structure. In order to illustrate why MORDOR could not find the correct pose using the interaction energy as a scoring function, we report the binding interaction energy of the docking solution as a function of the ligand rmsd from the native ligand in Figure 3b. In general, if we compare Figure 3a and b, we can see that the correlation between low interaction energy and correct docking pose is not as sharp, which implies that the lowest interaction energy does not necessarily correspond to the best ligand pose. Moreover, if

we look in detail at the Figure 3b plot, the highest score would have picked a ligand 4.2 Å away from the experimental structure, well above the cutoff of 2.5 Å. The orange dot in Figure 3c represents the best docking pose using the best total complex energy as the scoring function; this pose is about 86 kcal/mol higher than the best score when considering only the interaction energy. In other words, use of the interaction energy as the criterion for selecting the best pose would not only yield a pose at substantial variance with the experimental structure but would have distorted the receptor in the process at a cost to the internal energy of the receptor. Docking Failures. We have found it difficult to identify why all docking failures occur, because there are different kinds of docking failures with different causes. Figure 4 presents some examples of docking failures. Figure 4a and b show two examples of docking failures where the ligands, diaminopurine bound to a mutant riboswitch (2b57) and gentamicin C1A bound to A-site rRNA (2et3), are somewhat above the 2.5 Å cutoff, having rmsd values of 3.39 and 3.61 Å, respectively; both structures were determined by X-ray crystallography. In the case of 2b57, the small diaminopurine ligand is essentially flipped 180° such that stacking interactions with the purine rings and H-bonding capabilities inside the binding pocket are not so different compared to the

DOCKING

TO

RNA

VIA

RMSD-DRIVEN ENERGY MINIMIZATION

J. Chem. Inf. Model., Vol. 48, No. 6, 2008 1265

Table 1. Analysis of All Docking Failures Using MORDOR PDBa

ligand

rmsd,b Å

rankc

rmsd,d Å

number of posese

∆E,f kcal/mol

1B57 2FCX 2ET3 1TOB 1BYJ 1UTS 1LVJ 1EI2 1UUD 1FYP 1AJU 1RAW

diaminopurine neomycin A gentamicin C1A tobramycin gentamicin C1A P13 RBT550 acetylpromazine neomycin P14 RBT203 paromomycin arginineamide adenosine monophosphate spermine P12 RBT205 arginineamide

3.39 5.42 3.61 2.63 5.37 4.90 3.61 3.23 8.85 2.54 4.51 2.87

2 2 2 13 2 22 3 3 3 2 2 62

0.17 0.82 0.89 2.46 0.98 1.98 1.37 1.97 1.79 0.97 1.48 2.49

581 884 885 1285 618 1473 1158 1456 922 1082 535 785

-4.23 -4.74 -0.18 -18.00 -1.11 -9.91 -2.12 -5.48 -1.18 -2.99 -0.52 -7.64

5.26 8.35 3.57

28 4 2

2.02 2.11 2.07

678 858 685

-15.24 -1.46 -1.64

1XPF 1UUI 1NBK a

Protein Data Bank (PDB) code; crystal structures are shown in bold. b Deviation from the experimental ligand position of the best-scoring pose resulting from MORDOR; the rmsd was calculated as described in the Methods section. c Rank of the best-scoring pose among those that fall within the deviation cutoff of 2.5 Å. d Deviation from the experimental ligand position of the best-scoring pose among those that fall within the deviation cutoff of 2.5 Å. e Total number of poses stored during docking. f Energy difference between the best-scoring pose and the pose that scored best among those that fall within the deviation cutoff of 2.5 Å.

experimental structure. It might be noted that the crystal structure (2b57) contains nine Co(III) ions and numerous structural water molecules, which are not taken into account in our MORDOR docking (Vide infra). For 2et3, the flexible gentamicin ligand placement is mostly correct, with the exception of the aminomethyl ring, which has flipped 180° compared to the experimental structure, preferring to form a strong electrostatic interaction with the RNA backbone. There are also a few docked ligands displaying large rmsd values, such as the NMR structure of gentamicin C1A bound to A-site rRNA (1byj; Figure 4c) and a substituted methoxybenzene with two charged side chains termed RBT203 bound to HIV-1 TAR RNA (1uud; Figure 4d). Both structures were solved using experimental restraints from NMR. The deviations from the experimental ligand positions were 8.85 and 5.37 Å for 1uud and 1byj, respectively. The RBT203 ligand is one of three found by scientists at Ribotargets as good ligands for TAR RNA, and for which they published convincing model structures based on NMR restraint data.67,10 Surprisingly, MORDOR docked all three comparatively poorly, on the basis of comparing the bestscoring pose from MORDOR with the reported structure models (see Table S1, Supporting Information). In the case of RBT203 (1uud; Figure 4d), a guanidino moiety is in approximately the correct position, but the rest of the ligand is reoriented deeper in the major groove than in the experimental ligand pose, which is more accessible to the solvent. In the case of gentamicin C1A (1byj; Figure 4c), the ligand simply found a better spot along the major groove. Given the large number of hydrogen-bonding possibilities with aminoglycosides binding to RNA, this is not too surprising. In order to assess whether missed poses are due to insufficient conformational sampling during the docking, we reviewed all 15 failures with predicted poses deviating more than 2.5 Å from the experimental positions. Details are reported in Table 1. In particular, we list the rank of the best docking pose with deviation below 2.5 Å. In other words, the lowest-energy pose had a rmsd of more than 2.5 Å for these 15 ligands, but perhaps another pose with a

reasonably low score may be closer to the experimental pose. Table 1 also lists the energy difference between the docking pose with 90%, of the cases. One of those four, tobramycin (1tob), even had a fairly decent rmsd of 2.63 Å for its bestscoring pose. The last column of Table 1 lists the total energy difference between the docking pose and the top-ranking pose. An average of 5.1 kcal/mol is seen; however, the energy gap could be