Improved Docking of Polypeptides with Glide - Journal of Chemical

*E-mail: [email protected]. ... mode of small molecules in protein complexes and identifying active compounds in virtual screening campaig...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/jcim

Improved Docking of Polypeptides with Glide Ivan Tubert-Brohman, Woody Sherman, Matt Repasky, and Thijs Beuming* Schrödinger, Inc., 120 West 45th street, New York, New York 10036, United States S Supporting Information *

ABSTRACT: Predicting the binding mode of flexible polypeptides to proteins is an important task that falls outside the domain of applicability of most small molecule and protein−protein docking tools. Here, we test the small molecule flexible ligand docking program Glide on a set of 19 non-α-helical peptides and systematically improve pose prediction accuracy by enhancing Glide sampling for flexible polypeptides. In addition, scoring of the poses was improved by post-processing with physics-based implicit solvent MMGBSA calculations. Using the best RMSD among the top 10 scoring poses as a metric, the success rate (RMSD ≤ 2.0 Å for the interface backbone atoms) increased from 21% with default Glide SP settings to 58% with the enhanced peptide sampling and scoring protocol in the case of redocking to the native protein structure. This approaches the accuracy of the recently developed Rosetta FlexPepDock method (63% success for these 19 peptides) while being over 100 times faster. Cross-docking was performed for a subset of cases where an unbound receptor structure was available, and in that case, 40% of peptides were docked successfully. We analyze the results and find that the optimized polypeptide protocol is most accurate for extended peptides of limited size and number of formal charges, defining a domain of applicability for this approach.



INTRODUCTION Discovery of small-molecule therapeutics has benefited from the availability of accurate in silico docking tools that predict the binding mode of ligands to their protein targets. With the recently increased focus on the development of peptide-based therapeutics, there is a growing need to extend docking technologies to the prediction of peptide−protein binding modes. The interest in polypeptides as therapeutics has reemerged in recent years with the rising need to inhibit protein−protein interactions and other difficult drug targets coupled with advances in technologies that make peptides viable as drug candidates. The increasing interest in peptide therapeutics can be seen from the growing number of peptides entering the clinic and getting approved as drugs.1−3 Advances in experimental approaches to reduce proteolytic cleavage and improve half-life, such as antibody Fc fusions and binding to serum albumin, have proven successful at mitigating some peptide liabilities.4−6 While experimental approaches have made significant advances related to peptide therapeutics, less progress has been made on structure-based computational approaches. Several studies describe computational approaches for binding mode prediction of peptide−protein complexes based on Monte Carlo or molecular dynamics sampling schemes.7−13 However, most of these approaches are target-specific (e.g., developed specifically for PDZ9 or MHC domains14) and often require the input of an approximate bound-state conformation of the peptide,15 which may not be available in drug discovery projects. A more general tool for peptide docking, called Rosetta FlexPepDock ab initio (henceforth referred to as © 2013 American Chemical Society

FlexPepDock), was recently developed and shows good performance across a range of targets and polypeptides.12 More recently, a version of the HADDOCK protein−protein docking algorithm was shown to also perform very well on this data set.16 Both of these methods (FlexPepDock and HADDOCK) have approached the peptide-docking problem by adapting strategies developed for protein−protein docking. While the results are encouraging, the methods are computationally intensive, thereby limiting the applicability to a small number of docking calculations run on a large number of processors. For example, FlexPepDock requires thousands of CPU hours per polypeptide docking calculation. Here, we have approached the problem by adapting a smallmolecule docking program to better treat peptides. The molecular docking program Glide14,17,18 has proven to be reliable for predicting the binding mode of small molecules in protein complexes and identifying active compounds in virtual screening campaigns.19,20 The Glide SP algorithm is computationally efficient, generally yielding results in 15 s for a typical drug-like molecule. This speed allows large databases of compounds to be screened for putative binders with modest computational recourses. However, to date, Glide has been developed and optimized for docking drug-like small molecules and not polypeptides. To assess the applicability of Glide SP to the more challenging case of polypeptide docking, we predicted the binding mode of all non-α-helical peptides studied in the Received: February 26, 2013 Published: June 26, 2013 1689

dx.doi.org/10.1021/ci400128m | J. Chem. Inf. Model. 2013, 53, 1689−1699

Journal of Chemical Information and Modeling

Article

Table 1. Peptides Used in the Current Docking Benchmarka

a All peptides from the FlexPepDock12 benchmark except for those with α-helical content were included. For cases where both holo and apo structures were available, both PDB codes are given. Secondary structures are indicated as β-strand (b) or random coil (C). Residues that form the peptide−protein interface are indicated in bold font and are used in the iRMSD calculations (see Experimental Section for details).

Table 2. Number of Test Cases with at Least One Good Pose after Each Stage of Docking for Glide Calculations Using Different Sampling Settings (see Supporting Information Table S1 for a detailed overview of all parameter values used)a 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

experiment

ConfGen

rough scoring

SP min

PDM

final pose

default settings (default funnel, 302 directions, 25 rotation steps, 1.0 grid density) input conformation wide funnel wider funnel widest funnel widest funnel + 642 directions widest funnel + 1002 directions widest funnel + 1962 directions 1002 directions wider funnel + 1002 directions wider funnel + 1002 directions + 50 rotation steps wider funnel + 1002 directions + 100 rotation steps widest funnel + 1002 directions + 50 rotation steps widest funnel + 1002 directions + 100 rotation steps dense grid widest funnel + dense grid widest funnel + dense grid +1002 directions widest funnel + rigid input conformation widest funnel + input conformation + 1002 directions rigid docking rigid docking + widest funnel rigid docking + widest funnel + 1002 directions rigid docking + widest funnel + 1002 directions + dense grid rigid docking + widest funnel + 1002 directions + 50 rotation steps rigid docking + widest funnel + 1002 directions + 100 rotation steps

17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17

10 11 10 10 11 12 11 13 9 10 11 11 11 12 9 10 11 12 13 6 6 8 9 8 8

9 11 10 9 9 10 10 9 10 11 10 12 10 10 9 9 9 11 12 10 10 11 11 11 12

5 7 8 8 8 8 8 9 5 8 8 9 9 9 4 9 9 10 12 10 10 11 11 11 12

4 5 3 3 3 5 7 6 4 6 6 4 6 4 3 5 4 6 11 10 10 11 11 11 12

a

A pose is considered accurate when it has iRMSD < 2.0 Å, where iRMSD is the RMSD of the backbone atoms for interface residues (see Table 1 for interface residues). The iRMSD was determined for a maximum of 100 poses from a single Glide run. The best result that did not include the input conformer is shown in bold (experiment 7). Only 17 out of the 19 peptides were used in this section of the benchmark, as two peptides (1NLN and 1RXZ) exceeded the maximum number of rotatable bonds (50) in ConfGen and required amide bond constraints.

on the prediction accuracy of Glide SP was systematically evaluated. The optimized polypeptide docking protocol, which includes enhanced sampling plus post-processing of poses with

FlexPepDock benchmark. Docking was performed using both the holo and, when available, the apo structure of the protein. The effect of various docking settings and post-processing steps 1690

dx.doi.org/10.1021/ci400128m | J. Chem. Inf. Model. 2013, 53, 1689−1699

Journal of Chemical Information and Modeling

Article

number of cases with at least one good pose decreases at each stage of the Glide docking funnel, with the largest drop occurring during the first docking stage (rough scoring). Encouragingly, the ConfGen stage always succeeds in producing at least one conformer with a good iRMSD. However, the iRMSD metric is not sufficient for evaluating conformers produced by ConfGen, as one could have a perfect backbone conformation (iRMSD = 0.0 Å) that is undockable because of poor side-chain conformations. Fortunately, the RMSD for all heavy atoms also shows good performance of ConfGen, with all but one case (i.e., 94%) having at least one good conformer (data not shown). To test the effects of conformation generation, we performed rigid docking using the crystallographic polypeptide conformation to give an upper bound for the docking accuracy if the ConfGen algorithm were able to perfectly reproduce the bioactive conformation of all molecules. Using otherwise default settings, rigid docking (experiment 20) succeeds in finding good final poses for 10 test cases. Interestingly, only six test cases have good poses after the rough scoring stage. In fact, in only 11 of the cases was it possible to find at least one pose that passed rough scoring, meaning that the remaining six tests did not return any pose. The reason for this failure could be that rough scoring is not accurate enough to correctly identify the good poses, rotational/translational sampling is not fine enough, or a combination of these two factors. Indeed, as ligands become longer it is necessary to sample the rotational space much more finely to account for the lever arm effect. The remaining rigid docking experiments (experiments 21− 25) attempt to answer whether using a denser grid, more ligand directions, and increased funnel width can increase the number of cases with a good pose. After rough scoring the number of good poses is increased to nine, but this is still below the number for the default flexible docking run. As a final test of the initial conformation generation, we performed flexible docking in which the input conformation is appended to the automatically generated conformational ensemble to give an indication of how an improved ConfGen algorithm could affect docking accuracy. Default settings (experiment 2), including the input conformer, only increase the success rate from four to five, indicating that other parts of the Glide docking funnel are responsible for the incorrect results. Next, we explored various parameters associated with the docking funnel after the initial conformation generation stage. Increasing funnel width (experiments 3−5) with the other parameters at their default values degrades final results slightly (three cases produce a good pose in the top 10). The difference is not statistically significant. Therefore, it is not possible to draw conclusions, but the lack of improvements emphasizes the challenges of polypeptide docking and that simply tuning a single parameter may not lead to improved results. However, the number of cases with good poses at the post-docking minimization stage after increasing the funnel width actually increases from 5 to 8, showing that there are cases where a good pose exists in the top 100 poses but does not get selected in the final top 10 poses. Indeed, optimization of the scoring function (see below) suggests that the current form of Emodel is not an optimal scoring function for polypeptide pose selection. Decreasing the grid spacing by a factor of 2 for finer sampling (experiment 15) with default Glide SP settings degrades the final results slightly, resulting in three good predictions. However, combined with increased funnel width (experiment

physics-based implicit solvent MM-GBSA calculations, greatly improves the prediction accuracy compared to default Glide SP docking and approaches the accuracy of FlexPepDock at a fraction of the computational cost. Analysis of the features of peptides that are docked well with the new peptide docking workflow identify smaller size, absence of charged residues, and a high degree of extendedness correlating with accurate reproduction of native protein−ligand poses.



RESULTS AND DISCUSSION The FlexPepDock12 data set has 26 polypeptides with diverse secondary structure, including β-strand, random-coil, and αhelical peptides. While the ConfGen21 conformational sampling algorithm used in Glide has been shown to perform well for small molecules,22 it fails to generate α-helical conformations for polypeptides unless backbone torsional constraints are added. Therefore, the seven α-helical polypeptides were excluded in this work, and we focus on modifying the Glide docking algorithm to improve predictions on the remaining 19 peptides with an aim to enable ConfGen to generate helical polypeptide conformers in future work. Table 1 lists the final test set of 19 peptides used in this study along with some of their properties. The peptides range from 4 to 11 residues in length and have between 12 and 62 rotatable bonds. Optimization of Glide SP Sampling for Large Polypeptides. The Glide SP docking algorithm has been described in detail previously.17 Briefly, Glide uses a series of hierarchical filters to search for possible locations of a ligand in the binding site using a pre-defined grid representation of the rigid receptor. The series of filters can be thought of as a funnel, where the shape and properties of the protein are represented on a grid by different sets of fields that provide progressively more accurate scoring. Initial ligand conformations are generated by the ConfGen algorithm,21 and a large number of candidate initial poses are generated using an approximate description of both protein and peptide (rough scoring stage). The poses passing this rough scoring stage are partially optimized using the standard OPLS_200523−25 molecular mechanics force field (SP minimization stage). Finally, all poses are subjected to post-docking minimization (PDM stage) in the gridded protein field. Top poses are rescored and ranked by a custom scoring function (Emodel). In this work, the accuracy of each pose at various points in the Glide docking funnel was checked after (1) ConfGen, (2) rough scoring, (3) SP minimization, (4) post-docking minimization, and (5) the final top pose by Emodel is determined. In order to improve pose prediction accuracy for polypeptides, we first optimized sampling by systematically varying parameters that control pose generation and evaluation during the Glide sampling process. These parameters are described in detail in the Experimental Section. In brief, they control the Glide funnel width, ligand diameter orientation, ligand diameter angle, grid density, and van der Waals scaling. Results from all the tests described in methods other than those involving the van der Waals scaling are summarized in Table 2. All holo structures were considered at this stage, with the exception of structures 1NLN and 1RXZ, which were not used in this parameter exploration due to their large number of rotatable bonds. They are, however, included in the scoring optimization benchmark described in the next section. With default Glide SP settings (referred to as experiment 1), 4 out of 17 test cases result in a good iRMSD (defined as the interface residues having a backbone RMSD under 2.0 Å). The 1691

dx.doi.org/10.1021/ci400128m | J. Chem. Inf. Model. 2013, 53, 1689−1699

Journal of Chemical Information and Modeling

Article

Table 3. Number of Test Cases Producing Good Poses (columns 6−9) for Various Values of the Ligand and Receptor vdW Scaling Factors (columns 2−5)a

a

experiment

ligand charge cutoff

ligand scaling

receptor charge cutoff

receptor scaling

rough all

rough

SP min

final

rigid flexible rigid/scale ligand A rigid/scale ligand B rigid/scale ligand C rigid/scale ligand D rigid/scale ligand E rigid/scale ligand F rigid/scale ligand G rigid/scale receptor A rigid/scale receptor B flexible/scale ligand flexible/scale ligand/experiment 7

0.15 0.15 0.15 0.15 0.15 1 1 1 1 0.15 0.15 1 1

0.8 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.8 0.4 0.4

0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

1 1 1 1 1 1 1 1 1 0.9 0.8 1 1

11 17 13 14 14 12 14 15 16 13 13 17 17

6 10 7 7 7 7 8 9 10 5 5 11 11

10 9 11 11 12 11 11 12 12 7 7 9 11

10 4 10 10 9 10 10 12 10 7 6 4 5

“Rough all” is the number of test cases that produce at least one pose (good or bad) after rough scoring.

16), the number of good predictions improved to five. It is possible that increased grid density without a corresponding increase in funnel width causes the set of rough poses to be less comprehensive, thus increasing the chances that a good pose will be “crowded out” based on the approximate rough score when in fact it could rank better with a more accurate scoring function. Increasing the number of directions for the ligand diameter from the default value of 302 while keeping other settings at the default Glide SP values (experiment 9) does not affect the final number of good poses. However, combined with an increased funnel width (experiments 6−8), there is an increase in the number of cases with good final poses, reaching a maximum of seven cases for 1002 directions (increasing the directions further deteriorates the result, and importantly, increases the CPU/memory usage significantly). Experiment 7 shows the best results obtained from tuning the docking funnel parameters and is henceforth referred as the SP-PEP method. Including the crystallographic conformation in the initial ConfGen conformational ensemble with the settings from experiment 7 increases the number of cases with good final poses to 11, indicating the best result that can be achieved with the current Glide docking and scoring protocol if the ConfGen algorithm were able to perfectly reproduce the bioactive conformation for all polypeptides studied here. Increasing the number of steps used to rotate the ligand around its diameter did not improve docking accuracy for flexible docking. This setting was only tested in conjunction with an increased number of directions and increased funnel width (experiments 11−14). In fact, increasing the number of steps degraded the result (i.e., 100 rotation steps was worse than 50 rotation steps, which was worse than the default 25). For rigid docking, however, the end result did improve slightly and the best result from Table 2 (experiment 25) is the one for rigid docking with “widest funnel”, 1002 directions, and 100 rotation steps (12 cases resulted in a good final pose). The effect of changing the vdW scaling factor for ligand and receptor was tested first in the rigid docking experiment. Changing the ligand scaling factor to reduce vdW radii resulted in improvements in the number of cases with good poses after rough scoring (Table 3). The best results for rigid docking were obtained from scaling all atoms instead of only the nonpolar atoms and setting the scaling value to 0.4 (experiment “rigid/ scale ligand F”). This is a rather extreme reduction in vdW

radii, and applying it to flexible docking indeed decreased the benefit of increasing the number of directions and funnel width (e.g., changes equivalent to experiment 7, Table 2). Changing the receptor scaling factor did not improve the overall docking outcome. The vdW scaling results are shown in Table 3. Because the greatest loss of good poses happens at the rough scoring stage, especially for rigid docking, we performed additional vdW scaling at this stage. VdW scaling was applied to all atoms regardless of their partial charge. The results of these experiments are shown in Table 4. While the additional vdW scaling improved rigid docking somewhat, flexible docking accuracy degraded. Table 4. Number of Test Cases Producing Good Poses (columns 3−6) for Various Values of the Rough ScoreSpecific vdW Scaling Factora experiment

ligand scaling

rough all

rough

SP min

final

rigid rigid/scale ligand rigid/scale ligand rigid/scale ligand rigid/scale ligand flexible flexible/scale ligand + experiment 7 flexible/scale ligand + experiment 7 + input conformation

1 0.8 0.6 0.4 0.2 0.6 0.6 0.6

11 14 14 15 16 17 17 17

6 7 9 10 10 11 11 14

10 11 11 12 12 10 12 14

10 10 11 12 12 4 3 10

“Rough all” is the number of test cases that produce at least one pose (good or bad) after rough scoring. a

Physics-Based Post-Processing and Rescoring (MMGBSA). As stated above, the best possible combination of Glide docking settings identified here are in experiment 7 (Table 2) and referred to as SP-PEP. Given the improved sampling of this algorithm, we next tested whether application of postprocessing and rescoring steps could further improve the identification of correct poses from the ensemble of predictions. Leveraging a dependence of Glide on the input bond lengths and angles of the ligand,26 we ran the SP-PEP method 10 times for each of the holo structures with different starting conformations, collecting a total of 100 poses for each run. The 10 starting conformations were generated using a MacroModel sampling method recently developed to efficiently 1692

dx.doi.org/10.1021/ci400128m | J. Chem. Inf. Model. 2013, 53, 1689−1699

Journal of Chemical Information and Modeling

Article

Table 5. Accuracy of Docking Using Multiple Independent Default and SP-PEP Glide Runs (columns 2−4) and Accuracy of Pose Ranking Using Various Scoring Functions (columns 5−9)a number of good poses (