2214
J. Chem. Inf. Model. 2008, 48, 2214–2225
Protein-Ligand Docking against Non-Native Protein Conformers Marcel L. Verdonk,* Paul N. Mortenson, Richard J. Hall, Michael J. Hartshorn,§ and Christopher W. Murray Astex Therapeutics Ltd., 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom Received July 7, 2008
In the validation of protein-ligand docking protocols, performance is mostly measured against native protein conformers, i.e. each ligand is docked into the protein conformation from the structure that contained that ligand. In real-life applications, however, ligands are docked against non-native conformations of the protein, i.e. the apo structure or a structure of a different protein-ligand complex. Here, we have constructed an extensive test set for assessing docking performance against non-native protein conformations. This new test set is built on the Astex Diverse Set (which we recently constructed for assessing native docking performance) and contains 1112 non-native structures for 65 drug targets. Using the protein-ligand docking program GOLD, the Astex Diverse Set and the new Astex Non-native Set, we established that, whereas docking performance (top-ranked solution within 2 Å rmsd of the experimental binding mode) is ∼80% for native docking, this drops to 61% for non-native docking. A similar drop-off is observed for sampling performance (any solution within 2 Å): 91% for native docking vs 72% for non-native docking. No significant differences were observed between docking performance against apo and nonapo structures. We found that, whereas small variations in protein conformation are generally tolerated by our rigid docking protocol, larger protein movements result in a catastrophic drop-off in performance. Some docking performance and nearly all sampling performance can be recovered by considering dockings produced against a small number of non-native structures simultaneously. Docking against non-native structures of complexes containing ligands that are similar to the docked ligand also significantly improves both docking performance and sampling performance. INTRODUCTION
Advances in technologies like X-ray crystallography and NMR have meant that 3-dimensional (3D) structures are available for an ever-increasing number of drug-targets. Protein-ligand docking takes a central role in most modern structure-based drug discovery projects, where it can be used for virtual screening and interactive design. A range of protein-ligand docking programs has been reported in the literature, and an increasing number of these tools are available to the community, e.g. DOCK,1 GOLD,2,3 GLIDE,4,5 and ICM.6 The performance of a docking program is normally measured by its ability to reproduce the ligand binding modes for a test set of protein-ligand complexes from the Protein Data Bank7 (PDB). A number of such test sets have been constructed over the years, but most suffer from issues around data quality, diversity, and relevance to drug discovery. Recently, we constructed the Astex Diverse Set: a new test set containing 85 protein-ligand complexes that addresses the issues with the existing sets.8 In brief, this set (i) only includes structures of targets that are relevant to drug discovery or agrochemistry and contain druglike ligands; (ii) contains a diverse set of protein-ligand complexes with no target represented more than once; and (iii) contains only high-quality structures for which structure factors have been * Corresponding author e-mail:
[email protected]. § Present address: Dotmatics Ltd., 1-2 Thorley Hall Stables, Bishop’s Stortford CM23 4BE, United Kingdom.
deposited in the PDB and for which the electron density accounts for all parts of the experimental binding mode of the ligand. Although other measures for docking success have been proposed,9,10 the most commonly used performance indicator for a docking program is the percentage of ligands for which the top-ranked solution produced by the docking program is within 2.0 Å Root-Mean-Square Distance (rmsd) of the experimental binding mode; we will refer to this as the docking performance. The docking performance of good quality docking protocols tested against large test sets of protein-ligand complexes, as reported in the literature is approximately 70-80%.5,8 From such success rates, one might be inclined to conclude that there is limited scope for further improving docking protocols. However, the success rates quoted are for docking each ligand against its native protein conformation, i.e. the 3D coordinates of the protein are taken from the same structure that contained the ligand. Hence, any protein conformational changes induced by the ligand are reflected in the protein structure that the ligand is docked against (we will refer to this as “native docking”). In real-life applications where the binding modes of newly designed compounds are predicted, the protein structure used to dock against will be that of a complex containing another ligand or that of the apo form of the protein (we will refer to this as “non-native docking”). In reality therefore, when applied to actual drug discovery projects, the success rates of docking programs will be lower than those quoted above. In addition, there is a real danger that docking protocols will
10.1021/ci8002254 CCC: $40.75 2008 American Chemical Society Published on Web 10/28/2008
DOCKING
AGAINST
NON-NATIVE PROTEIN CONFORMERS
be overtrained on validation sets containing only native protein conformations. Several authors have studied the performance of docking programs against non-native conformations of the targets. However, the numbers of targets investigated in these studies has been limited. Murray and co-workers reported on the performance of the docking program PRO_LEADS11 against a set of 21 protein-ligand structures for thrombin (6), thermolysin (9), and neuraminidase (6).12 The authors found that, although success-rate for native docking was 76%, this dropped to 49% for non-native docking. Birch and colleagues also investigated neuraminidase and found that the sensitivity of the docking performance to small induced fits was strongly dependent on the scoring function used.13 Whereas Goldscore performed well for non-native docking on a test set of 15 ‘clean’ neuraminidase complexes (89% success), Chemscore performed poorly (49% success). A test set of 73 complexes of trypsin (10), thrombin (26), and HIV-1 protease (37) was used by Erickson et al.14 to assess the effect of ligand-induced fit on the docking performance of CDOCKER.15 Native docking performance was 67% for trypsin, 36% for thrombin, and 50% for HIV-1 protease, but a significant drop-off in performance was observed when docking was done against the “average” conformation (60%, 27%, and 35%, respectively) or against the apo conformation of the protein (37%, 9%, and 4%, respectively). Recently, the same group reported a study on a wider range of 8 drug targets,16 in which they showed a drop-off in docking performance from 39% for native dockings to 26% for non-native dockings (results are averages over the 8 targets for CDOCKER). Cavasotto and Abagyan reported on the performance of their docking program ICM6 against a set of four protein kinases.17 These authors also observed a significant drop-off in success rates for non-native dockings, when compared to native dockings against these targets. The native docking performance reported is remarkably high: 100% of ligands are docked within 2 Å of the experimental binding mode. Unfortunately, the fraction of non-native dockings for which the ligand is predicted within 2 Å can only be derived from this paper for two targets: 50% for CDK2 and 56% for cAPK. Corbeil et al. tested their docking protocol FITTED against a test set of 33 complexes for five drug targets and observed a drop-off from 67% for native dockings to 49% for non-native dockings.18 There are a host of other studies in the literature reporting on test sets for non-native docking,19-22 particularly for the evaluation of the performance of docking protocols taking into account protein flexibility. The most diverse of these test sets was reported recently by Bottegoni et al.,23 who constructed a test set of 32 complexes for 16 drug targets to test the performance of their flexible protein docking protocol SCARE. Particularly now an increasing number of docking programs attempt to account for protein flexibility (e.g., FlexE,24 GLIDE,21 SCARE/ICM,23 etc.), there is a real need for a substantial, high-quality, relevant, and diverse test set for non-native docking performance. Here, we report on the construction of a large test set for assessing docking performance against non-native protein conformations. This new set, the Astex Non-native Set, is built on the Astex Diverse Set and contains 1112 structures for 65 drug targets. The Astex Non-native Set has undergone quality checks similar to those used in the construction of the original native-
J. Chem. Inf. Model., Vol. 48, No. 11, 2008 2215 Table 1. Filters Applied to Construct the Astex Non-Native Seta
total number of relevant active sites resolution e2.5 Å protein sequence identical for binding site one site per PDB entry (lowest B-factor) filters applied during setup of structures visual inspection
non-native structures
targets
2842 2340 1541 1287 1142 1112
83 81 75 75 66 65
a The numbers of targets and non-native structures available at each stage are shown.
docking Astex Diverse Set and will be made publicly available via the Cambridge Crystallographic Data Centre Web site (http://www.ccdc.cam.ac.uk). Using this new test set, we investigate the effects of ligand-induced fit on rigidprotein docking performance using the protein-ligand docking program GOLD.2,3 In addition, we investigate the potential of docking protocols taking into account multiple protein conformations simultaneously. METHODOLOGY
The Astex Diverse Set was used as a basis for the construction of the Astex Non-native Set. For each entry in the Astex Diverse Set, we set out to find all entries in the PDB where the target is the same, but the structure is either an apo structure or a complex with a different ligand. The non-native docking validation then consisted of docking each ligand from the Astex Diverse Set against all its non-native protein conformers. Hence, the Astex Non-native Set only contains non-native structures of the protein targets in the Astex Diverse Set and does not contain any new ligands. We believe this strategy is useful for the following reasons: (i) The approach allows a direct comparison of non-native docking against the Astex Non-native Set to native docking against the Astex Diverse Set as both sets consist of exactly the same targets and ligands. (ii) It allows us to separate the problem of ligand-induced protein conformation from other ligand-dependent factors like protonation states and tautomers in protein residues; although these are clearly important issues, we feel the purpose of a non-native docking set is rather to assess the influence of protein movement on docking performance. (iii) The newly added ligands and their structures would need to be of the same quality, diversity, and relevance as the structures in the Astex Diverse Set itself; only a limited number of targets have additional liganded structures that conform to these criteria. Selecting the Structures. In order to construct the Astex Diverse Set, we had clustered all protein chains in the PDB into separate targets, based on their sequence homology.8 We then selected, where possible, one representative from each PDB cluster, where the target is relevant to drug discovery or agrochemistry. This resulted in a test set of 85 complexes for 85 different targets. We reused this clustered version of the PDB (after updating it with structures deposited up to August 2007) in order to construct the Astex Non-native Set: for each entry in the Astex Diverse Set in turn, we retrieved all other structures from the PDB cluster that contained that entry. Only structures determined at resolutions e2.5 Å were kept. Table 1 lists the filters applied to the Astex Non-native Set and the number of remaining structures and targets at each stage.
2216 J. Chem. Inf. Model., Vol. 48, No. 11, 2008
Superimposing the Structures. For each target, every non-native structure was superimposed onto the corresponding structure in the Astex Diverse Set (we will refer to this as the reference structure). This is an important and nontrivial task. When the ligands are docked against non-native conformations of the target, it is critical that the non-native structures have been carefully superimposed onto the reference structure in order to get a reliable measure of the rmsd of the docked ligand against the experimental binding mode of the ligand in the corresponding entry in the Astex Diverse Set (the “reference ligand”). For each superimposition of a non-native structure against its reference structure, our approach was as follows. First each chain in the non-native structure was sequence-aligned against each contact chain in the reference structure (a contact chain being a chain that contains atoms within 4 Å of a non-hydrogen atom in the ligand); chain pairs “match” when they have >75% sequence identity. For each permutation of matching chain pairs, superimposition was attempted. The non-native structure was discarded if no permutation could be found for which all protein residues in the reference structure within 6 Å of the reference ligand had a matching residue in the non-native structure; i.e. all binding site residues need to be present and the same as in the reference structure. A large number of mutated structures were removed by this process. The superimposition of a particular chain permutation started off with a list of residue pairs, consisting of all residues in the reference structure within 6 Å of the ligand, and their corresponding residues in the non-native structure. The actual superimposition was done by the proprietary tool KFIT, using an iterative superimposition algorithm.25 In the first step of this algorithm, all protein backbone and Cβ atoms for the residues in the superimposition list were superimposed. In the second step, all atom pairs which, after superimposition, were >1.0 Å apart were removed from this list, and the superimposition was repeated with the new list of atom pairs. In subsequent steps, atom pairs were either removed if they were >1.0 Å apart or added from the original list of atom pairs (binding site residues only) if they had been deleted in a previous step, but were now within 1.0 Å. This process was repeated until the list of superimposed atom pairs no longer changed. This procedure ensured that the final superimposition was done on the parts of the binding site that are structurally most conserved. Chain permutations for which the final list of superimposed atom pairs corresponded to fewer than 6 residues were discarded (this only affected 9 structures). Of the remaining chain permutations, the superimposition with the lowest B-factor (averaged over the binding site residues) was selected to be included in the Astex Non-native Set. Only one non-native conformation was kept for each PDB entry. Identifying Ligands. Ligands were extracted from the PDB files and atom typed using a proprietary protocol as previously described.8 Only ligands that, after superimposition, were in close proximity to the reference ligand were considered. Next, all ligands were classified as solVent, cofactor, or ligand. The solVent class contains small ions, solvents, etc., which were assigned from a list of 443 HET group names (see the Supporting Information). Cofactors were assigned from target-specific HET-group lists (see the Supporting Information), depending on whether the site occupied by the reference ligand coincides with the substrate/
VERDONK
ET AL.
cofactor site or not (e.g., ADP was regarded a cofactor for the pantothenate kinase entry in the Astex Diverse Set but not for the p38 entry). All remaining ligands were put in the ligand class. Binding sites containing no ligands in the ligand class were flagged as apo structures. Setting up the Structures. The previous steps resulted in 1287 non-native structures, each superimposed onto its reference structure from the Astex Diverse Set. Each of these non-native structures now needed to be set up for docking the reference ligand from the corresponding entry in the Astex Diverse Set. A lot of thought and effort has gone into the preparation of the target structures in the Astex Diverse Set in terms of protonation states, tautomers, and side chain flips for histidine, asparagine, and glutamine side chains. Hence, the strategy we adopted here was toswhere appropriatesprepare the binding sites of the non-native structures in exactly the same way as we prepared those of the corresponding reference structures. The automated protocol we developed for this process started off by defining the binding site in both the reference and non-native structures as all residues that were within 6 Å of any heavy atom of the reference ligand. The term “residue” here includes cofactors, ions, and any other moieties that were retained in the reference structure in the Astex Diverse Set. Where the list of binding site residues included a metal ion, any additional residues within 3 Å of the ion were also included, and analogously residues within 3.5 Å of a retained water molecule were added. For each residue in the reference binding site, we attempted to find the matching residue in the non-native structure (not necessarily within the non-native binding site). If no matching residue could be found, or if the residue in the reference structure contained atoms that were absent or different in the non-native structure, then that non-native structure was discarded. For each protein residue in the non-native binding site, the same procedure was applied, i.e. the structure was removed if no appropriate match could be found in the reference structure. For nonprotein residues in the non-native structure (e.g., cofactors, metal ions, waters, etc.), if no matching residue was found in the reference structure, then the unmatched residue was removed, but this was not considered sufficient reason to discard the structure as a whole. In cases where protein side chains were disordered in the non-native structure, the following approach was taken to select one of the conformers: (i) where possible, the conformer with the highest occupancy was selected; (ii) for side chains with equal occupancy, in cases where one of the conformers clashed with the ligand, the other conformer was selected; and (iii) if no decision could be made based on rules (i) and (ii), then the conformer listed first in the PDB file was selected. A total of 3123 residues in 322 structures across 44 targets showed multiple conformations. Of these, in 1530 cases the conformation was chosen based on occupancy, i.e. rule (i), and in the remaining 1593 cases it was chosen arbitrarily, i.e. by applying rule (iii). Interestingly, although part of the algorithm dealing with selecting the conformers, rule (ii) was never actually used in practice. Next, the protocol dealt with side chain flips for asparagine, glutamine, and histidine residues. If, for the relevant matching atom pairs in these residues (terminal carboxamide group for asparagine and glutamine ring atoms for histidine),
DOCKING
AGAINST
NON-NATIVE PROTEIN CONFORMERS
J. Chem. Inf. Model., Vol. 48, No. 11, 2008 2217
the superimposition onto the reference structure was good (rmsd