Adaptive BP-Dock: An Induced Fit Docking ... - ACS Publications

Mar 14, 2016 - Tushar Modi , Jonathan Huihui , Kingshuk Ghosh , S. Banu Ozkan. Philosophical Transactions of the Royal Society B: Biological Sciences ...
0 downloads 0 Views 5MB Size
Subscriber access provided by ORTA DOGU TEKNIK UNIVERSITESI KUTUPHANESI

Article

Adaptive BP-Dock: an induced fit docking approach for full receptor flexibility Ashini Bolia, and Sefika Banu Ozkan J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00587 • Publication Date (Web): 14 Mar 2016 Downloaded from http://pubs.acs.org on March 14, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM

Adaptive BP-Dock: an Induced Fit Docking Approach for Full Receptor Flexibility Ashini Bolia# and Sefika Banu Ozkan†*

#

Department of Chemistry and Biochemistry, Arizona State University Tempe AZ, USA# † Department of Physics, Center for Biological Physics, Arizona State University Tempe AZ, USA†

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ABSTRACT We present an induced fit docking approach called Adaptive BP-Dock that integrates Perturbation Response Scanning (PRS) with the flexible docking protocol of RosettaLigand in an adaptive manner. We first perturb the binding pocket residues of a receptor and obtain a new conformation based on the residue response fluctuation profile using PRS. Next, we dock a ligand to this new conformation by RosettaLigand, where we repeat these steps for several iterations. We test this approach on several protein test sets including difficult unbound docking cases such as HIV-1 Reverse transcriptase and HIV1 Protease. Adaptive BP-Dock results show better correlation with experimental binding affinities compared to other docking protocols. Overall, the results imply that Adaptive BP-Dock can easily capture binding induced conformational changes by simultaneous sampling of protein and ligand conformations. This can provide faster and efficient docking of novel targets for rational drug design.

ACS Paragon Plus Environment

Page 2 of 49

Page 3 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

INTRODUCTION Protein-ligand interactions play an important role in regulating diverse biological processes and diseases.1 Computational tools such as molecular docking provide faster and efficient method to study protein-ligand interactions and facilitate the design of potentially active drugs.1,2 The docking process involves: (i) the prediction of ligand conformation and pose within a target protein-binding site and (ii) the estimation of binding affinities of such protein-ligand complexes. Thus, molecular docking facilitates high-throughput virtual screening by estimating (scoring) and ranking the binding affinities of thousands of lead molecules against a protein target; making it an essential component of drug discovery programs.3 Current docking protocols offer full ligand flexibility by sampling different conformations and orientations, while inherently keeping the protein backbones rigid for algorithmic simplicity and computational efficiency.1,4 However, these protocols may fail in certain cases, particularly docking to an unbound (apo) structure of a protein, or a bound receptor structure crystallized with a different ligand (cross-docking). Virtual screening for novel drug design based on an apo structure decreases the accuracy of docking programs.5 The decrease in accuracy arises due to the fact that large backbone conformational changes upon ligand binding are fairly common (Gutteridge & Thornton, 2005).6 Indeed, protein-ligand binding energetics can change substantially, even when minor conformational changes are observed at the binding site.7 Moreover, in the case of cross-docking experiments where they start with the bound conformations, the same receptor can undergo significant conformational change when it binds to a different ligand,8-10 which may lead to inaccuracies in docking predictions.11-13 The strength and

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 49

quality of modeling binding interactions is sensitive to minute details of the receptor structure. Therefore, docking protocols that model the conformational changes of proteins may have much higher accuracy.1,14 It still remains challenging to account for backbone conformational changes during molecular docking due to the high dimensionality of conformational space and complexity of scoring functions.15 There are two major categories of docking methods that allow for a certain degree of receptor flexibility: (i) induced fit docking and (ii) ensemble docking. Induced fit docking (IFD) allows the search for a new conformational space by explicitly modeling protein flexibility such as side-chain or limited backbone variations during the docking simulation.16,17 Examples of IFD methods are Autodock4,18-20

AutoDock

Vina,21

Dynasite/GOLD,22,23

RosettaLigand,24,25

FLIPDock,26,27 Glide/Prime28, ICM,29,30 DOCK6.0.,31 PC-RELAX,32,33 FITTED,34-36 Flesky,37 FiberDock,38 GalaxyDock39 or hinge bent docking algorithms40-44 etc. All of these methods allow for modest backbone changes and side-chain sampling but do not allow for sampling of large-scale backbone conformational changes such as the closing of flaps in HIV-1 Protease upon ligand binding45 or the opening of the two subdomains (fingers/thumb subdomains) to accommodate the substrate in HIV-1 reverse transcriptase.46 On the other hand, ensemble-docking methods make use of a limited number of discrete protein conformations to account for protein flexibility prior to docking, such as RosettaBackrub,47 MedusaDock,48 AutoDock49 and IFREDA.29 Moreover, few of the IFD docking methods also use a pre-existing ensemble of conformations such as FlexXEnsemble,50 MDock51 (method based on DOCK4.052 and ITScore,53,54 that uses ensemble

ACS Paragon Plus Environment

Page 5 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

docking algorithm). The docking time for serial ensemble docking scales linearly with the number of receptor conformations in the ensemble.55 Moreover, the success of ensemble docking depends on an effective and intelligent sampling strategy while generating the ensemble (using either X-Ray or NMR protein structures,56-60 computationally derived protein conformations from MD simulations,61,62 Normal Mode analysis30,63-65 or homology models62) that can mimic the binding site conformations realized in nature. A complete overview of the available molecular docking software packages, their advantages and issues associated with them has been discussed extensively in recent reviews.1,2,4,14,16 Previously, we have developed an ensemble docking approach BP-Dock (Backbone Perturbation-Docking),66 that can predict bound-like conformations in a computationally efficient way. BP-Dock is based on Perturbation Response Scanning (PRS)63,67 that couples Elastic Network Models (ENM)68 with Linear Response Theory.69,70 In BP-Dock approach, we generate an ensemble of receptors by calculating the fluctuation responses of all the residues in a protein upon exerting random external unit force on every single α-carbon atom of the chain sequentially, which is a first order approximation of the forces exerted by the ligand while approaching a receptor. Thus, in BP-Dock we simulate the natural course of a binding event by computing the ligand induced mean-square fluctuation profile of the protein backbone. The ensemble is later docked with rigidbackbone protocol of RosettaLigand to account for ligand flexibility.25 This multi-scale docking approach was tested on 5 diverse sets of protein-ligand complexes and 20 individual protein complexes with available bound and unbound experimental structures. The docking results from BP-Dock showed that ensemble docking using multiple

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

receptor conformations (MRCs) generated from unbound conformation using PRS can provide good correlation with the experimental binding affinities, similar to those obtained from “bound” docking, i.e. when a ligand-bound crystal structure is re-docked. In this study, we present an induced fit docking approach called Adaptive BP-Dock that integrates BP-Dock with the flexible-backbone protocol of RosettaLigand24 in an adaptive manner. The “flexible-backbone” protocol of RosettaLigand allows full ligand flexibility by changing side-chain torsional angles of the ligand and provides limited backbone flexibility by: (a) adding restrained minimization on the receptor backbone at the end of every docking trajectory; (b) performing side-chain optimization around the binding pocket. In Adaptive BP-Dock, as a first step we perturb all the binding site residues simultaneously with a random unit force (i.e. Brownian kicks) and obtain a new receptor conformation using Perturbation Residue Response Scanning,66 then optimize the ligand orientation in this new perturbed receptor conformation through flexiblebackbone protocol of RosettaLigand. We generate 1000 docked (receptor-ligand) trajectories after each perturbation step and select the lowest energy docked pose for next perturbation for a total of 10 perturbations. PRS provides a fast and efficient way to incorporate receptor flexibility as shown earlier.66,71 Therefore, we combine the strength of BP-Dock in providing full backbone protein flexibility along with full ligand, limited backbone and full side-chain flexibility around the binding pocket of RosettaLigand, which overcomes the computational expense of incorporating full receptor flexibility during the docking simulation. To benchmark our approach, we use the same 5 protein test sets (HIV-1 Protease, Carbonic Anhydrase II, Alcohol Dehydrogenase, Alpha-Thrombin and Cytochrome C

ACS Paragon Plus Environment

Page 6 of 49

Page 7 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Peroxidase)66 that were used to validate BP-Dock. We also analyze glycan-cyanovirin interactions from our previous work.71 Our analysis shows better correlation with experimental binding affinities for Adaptive BP-Dock than the BP-Dock approach. We attribute this success in mimicking the ligand-induced conformational changes by perturbing the binding site residues as a first order approximation upon ligand binding. Adaptive BP-Dock also shows improved performance in self and cross-docking tests compared to the flexible-backbone protocol of RosettaLigand24 and AutoDock 4.018,19 for 10 bound protein pairs. To further determine the accuracy and sensitivity of our docking method, we perform docking for more challenging cases such as HIV-1 Reverse Transcriptase and Urokinase. Overall, our analysis implies that Adaptive BP-Dock provide fast and accurate predictions of binding affinities by capturing binding-induced conformational changes through simultaneous sampling of protein and ligand conformations. METHODS Benchmark As a retrospective assessment of our new Adaptive BP-Dock approach, we analyze 5 different diverse sets of protein-ligand complexes and glycan-cyanovirin interaction from our previous work.66,71 The 5 protein test sets includes flexible receptor of HIV-1 Protease (N = 20), Carbonic Anhydrase II (N = 9), Alcohol Dehydrogenase (N = 8), Alpha-Thrombin (N = 13) and Cytochrome C Peroxidase (N = 18); where N is the number of bound (protein-ligand) complexes for each protein set. The different bound (holo) structures and one representative unbound (apo) structure for each protein set are retrieved from Protein Data Bank (PDB).72 The initial ligand conformation was obtained

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

from the crystallographic complex of the bound protein from PDB. The glycancyanovirin study includes estimating binding affinities of CVN and its single point mutants (N = 9) for dimannose sugar. Moreover, we also perform self and cross docking for Meiler and Baker set of 10 pairs of co-crystallized receptor-ligand (bound) structures.24,25 Furthermore, as the last test we attempt the docking of the highly flexible receptor of HIV-1 Reverse Transcriptase (N = 25) and anti-cancer drug target Urokinase (N = 20). For all the protein test cases, we compare the performance of our Adaptive BPDock approach with the flexible-backbone protocol of RosettaLigand. Overall, we have a large and diverse data set of 162 different protein-ligand complexes to evaluate the performance of Adaptive BP-Dock. The complete list of all the proteins, PDB codes of their corresponding bound and unbound structures, chain length, RMSD between bound and unbound protein and name of binding ligand/peptide are reported in Table S1.

Adaptive BP-Dock Approach In this paper, we present a new molecular docking approach called Adaptive BPDock, which is an induced fit docking approach that incorporates Perturbation Response Scanning (PRS) with “flexible-backbone” protocol of RosettaLigand version 3.5. RosettaLigand performs protein-ligand docking and is a protocol in the Rosetta program suite developed by the Baker lab.24,25,73 The “flexible-backbone” protocol of RosettaLigand extends from its previous version that incorporated ligand flexibility by changing side-chain torsional angles of the ligand along with side-chain repacking of the binding pocket residues of the protein. However, the “flexible-backbone” protocol also provides backbone flexibility by adding restrained minimization on the receptor

ACS Paragon Plus Environment

Page 8 of 49

Page 9 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

backbone, thereby providing full ligand flexibility with limited protein backbone flexibility through minimization of the protein backbone. Thus, RosettaLigand can incorporate minimal protein flexibility but cannot sample large backbone conformational changes with constrained minimization alone. Therefore, in our Adaptive BP-Dock approach, we simultaneously perturb the protein backbone using the PRS methodology that incorporates large backbone conformational changes while also performing ligand docking using RosettaLigand, providing full ligand flexibility. Adaptive BP-Dock perturbs only the binding pocket residues that helps us in achieving better correlations as compared to our previous BP-Dock approach where perturbing all the residues in the protein sequentially might have averaged out the overall response. Another advantage of Adaptive BP-Dock approach over BP-Dock is the simultaneous optimization of the ligand inside the binding pocket during docking that helps in accurate prediction of protein-ligand docked pose. In summary, compared to our previous ensemble docking approach, we limited perturbations only around binding pocket in Adaptive BP-Dock to generate the perturbed conformations similar to the conformations sampled during binding. This step was incorporated into the new limited flexible-backbone version of RosettaLigand, where docking took place after the computation of new conformation of the receptor upon the perturbations. Adaptive BP-Dock is based on PRS that couples the Elastic Network Model (ENM) approach with Linear Response Theory (LRT)63,67-70 to compute the residue response fluctuation profile of the amino acids of a protein upon exerting a Brownian kick to a single residue. In our docking approach, we start with coarse-graining the protein structure, i.e., represent each residue in the protein structure as a C-alpha atom. Coarse-

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

graining helps in reducing the computational cost of calculations by decreasing the number of degrees of freedom associated with N amino acids in the protein model.74 In the original ENM method, the C-alpha model of protein can also be viewed as a threedimensional elastic network (ENM) composed of nodes represented by the C-alpha atoms of each amino acid. If these nodes are located within an interaction range, or cutoff distance, rc, then they are subjected to a uniform, single-parameter harmonic potential. The overall potential (E) of the protein network is given by the sum of all harmonic potentials among interacting nodes (C-alpha atoms) such that 

   =  − 

 

 (1) 2 ,

where γ is the spring constant,  and 

are the instantaneous and equilibrated distances

between residues i and j respectively.68 The summation is performed over the pairs of residues/nodes within the cutoff distance rc through the Heaviside unit step function  

with  

= 1 if 

≤ rc, and 0 otherwise.68,75,76 Therefore, the potential (E)

near the equilibrium state can also be written in matrix formation as:

1  = ∆ ∆ (2) 2

where ∆R is the 3N-dimensional vector of all residue fluctuations, and H is the Hessian, a 3N × 3N matrix composed of second derivatives of potential with respect to the x, y, z components of position vectors of length N (total number of residues). However, in our ENM approach, we weight the interaction strength between all residue pairs by using the inverse of the square distance instead of using arbitrary cutoff distances.63,65,77-79 This approach has been successfully tested to capture the conformational changes upon binding in 25 unbound protein structures.63

ACS Paragon Plus Environment

Page 10 of 49

Page 11 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Once we have the Hessian (H), we perturb the pre-specified binding pocket residues simultaneously with a random unit force, F (i.e. a random Brownian kick). Here, by perturbing only the binding pocket residues, we are mimicking the forces exerted on the binding site when a ligand approaches to the protein as a first order approximation with random Brownian kicks. Since the nodes are all connected within an elastic network, the perturbations applied to binding pocket residues cascades through the chain and causes other residues to exhibit residue fluctuation response, introducing conformational changes in the protein. The residue fluctuation response of all the positions in the chain is calculated using linear response theory:

 ∆ !× (3) ∆ × = ×

where ∆R is the is the 3N-dimensional vector of fluctuations of all residues, H-1 is the inverse of Hessian (H) that represents a 3N × 3N matrix composed of the second derivatives of the potential with respect to the components of the position vector. H-1 can also be represented as a matrix composed of N x N super-elements of size 3 x 3. The ijth off-diagonal super-element of H-1 has the cross-correlations between the x,y,z components of ∆Ri and ∆Rj, whereas the iith super-element of H-1 has the selfcorrelations between the x,y,z components of ∆Ri. ∆F is a 3N × 1 vector that contains the components of the externally applied force on the selected binding pocket residues. The final perturbed coordinates, Rper, for each residue can be calculated after obtaining the residue fluctuation response using, # $%& '× =  !× + )∆!× (4) where R0 are the initial coordinates of the residues before perturbation, α is a scaling factor and ∆R is the residue response fluctuation vector obtained from equation 3.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

As the perturbed conformation is coarse grained, we add the side chains of the initial structure to the perturbed backbone conformations and perform an energy minimization to relax the structure and relieve any steric clashes in the system. The perturbed structure is energy minimized with 50 steepest descent iterations followed by 1000 conjugate gradient iterations using the AMBER14 forcefield,80 along with a GB (Generalized Born) solvation model.81 After the energy minimization, all side chains are repacked with Rosetta’s ligand_rpkmin algorithm,24,25,73 so that any pre-existing clashes (according to Rosetta's energy function) can be resolved. The repacked protein structure is then docked with RosettaLigand to compute 1000 trajectories/models and the bestdocked pose among the models is used for generating a new perturbed structure for the next round of docking iteration. After 1000 docking steps, the docked models are sorted based on total energy score and top 5% of the models are ordered again based on their interface delta energy scores as described earlier.82 The interface delta energy score is the total energy of the bound protein-ligand complex minus the total energy of the unbound protein when the ligand is separated from the binding site (i.e. 500 Å away from the unbound protein). The best model with the lowest interface delta score is then selected as the lowest energy docked pose of the first iteration and used for perturbation in the second round of iteration. The process of perturbation and docking with RosettaLigand is performed iteratively for 10 cycles, until we observe a consistent decrease in the binding energy scores and formation of a well-converged distinct binding funnel in energy score/RMSD plots, which is considered as a successful indication of the convergence of the docking simulation. We also performed 5 individual docking runs with iteration parameters: 5, 10 and 20 for 2 protein datasets and observed that a minimum of 10

ACS Paragon Plus Environment

Page 12 of 49

Page 13 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

iterations is sufficient for convergence of the docking runs (See Table S2). The ligand RMSD is recorded in the output along with the energy scores computed by RosettaLigand. For ligand RMSD, heavy-atom positions in each docked pose vs. the initial pose of the ligand is computed without superimposition of the corresponding binding pocket. After 10 successful iterations, the final docked lowest energy pose is selected and re-scored using the X-Score version 1.2, which is an empirical scoring function developed to re-rank the protein-ligand complex to give a more accurate estimation of the binding free energies.83 The overall outline of Adaptive BP-Dock approach is shown in Figure 1. The CPU times for generating 10000 protein-ligand complexes using a single processor for a 200 amino-acid protein with Adaptive BP-Dock are 08:54 hours and RosettaLigand are 08:32 hours. The small increase in CPU time for Adaptive BP-Dock can be attributed mostly to AMBER minimization performed after each iteration. These CPU times can be decreased significantly by parallelization of the software.

RESULTS AND DISCUSSION Docking Results for Five Protein-Ligand Test Sets: Previously, we have shown that BP-Dock approach gives better correlation with experimental binding affinities as compared to rigid bound and unbound docking for five different test sets of protein-ligand complexes: HIV-Protease (PR), carbonic anhydrase II (CA II), alcohol dehydrogenase (AD), alpha-thrombin (AT) and cytochrome C peroxidase (CCP).66 Therefore, we want to check if we could still capture similar or better results with our new Adaptive BP-Dock approach. As described earlier,66 we still

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

have several bound (i.e. different ligands) structures for all these test cases but only one unbound structure. For both PR and CCP, we use the modeled unbound structure due to point mutations in their bound structures (See Methods of ref (66)). Moreover, for PR, we use a wide-open conformation for the unbound structure to test if we can capture closed bound-like binding affinities with our Adaptive BP-Dock approach. Interestingly, Adaptive BP-Dock is able to capture large backbone conformational changes even for the highly flexible receptor of PR. As shown in Figure 2A, there is a large conformational change between the unbound structure of PR in open conformation (green) and the bound structure in closed form (cyan) with an all-atom RMSD of 3.946 Å in the flap region (allatom RMSD of the full chain is 1.490 Å). In the Adaptive BP-Dock approach, we start with the open unbound conformation, and capture the closed flap conformation by perturbing the binding site residues of the unbound structure and computing the residue fluctuation response in an iterative fashion (magenta). The top view of the receptors (Figure 2B) clearly shows the closing of flaps in unbound structure upon perturbation. This indicates the capability of Adaptive BP-Dock to correctly predicting the bindinginduced conformational changes even when we start with an unbound protein conformation. Furthermore, we also check the corresponding interface delta energy score/RMSD plot by the Adaptive BP-Dock. The actual docking consists of 10000 protein-ligand docked complexes from 10 iterations (i.e. every iteration samples 1000 snapshots). Figure 2C shows the interface delta energy score/RMSD plots energy scores for the first, fifth and tenth iterations. For each iteration, a nice docking funnel in the protein-ligand energy landscape is formed. More importantly, we observe a decrease in energy scores from 1st → 5th → 10th iteration along with a decrease in RMSD of the

ACS Paragon Plus Environment

Page 14 of 49

Page 15 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

ligand in the docked pose compared to the position of native ligand obtained from the bound experimental structure. Thus, as the unbound structure gets closer to the bound form in each perturbation, it enables us to obtain better ligand placement inside the binding pocket. For comparison, we also performed 10 iterations of RosettaLigand with 1000 docking steps using unbound structures (i.e. generating 1000 protein ligand complexes per iteration) for the two datasets of HIV-1 Protease and Alcohol Dehydrogenase, which is a similar approach to our Adaptive BP-Dock method. The results (Table S3) show that the iterative approach with RosettaLigand yields similar results to 10000 docking steps in a single iteration. Figure S1 shows the RosettaLigand energy score vs. RMSD plots for the 10th iteration of docking for a representative protein from each protein data set. Moreover, in Figure S2 we also provide the plot of side-chain RMSDs of binding pocket residues between the docked poses and the bound experimental structures vs. corresponding energy binding scores (X-Scores) for each iteration of (a) Adaptive BPdock along with those of (b) Iterative RosettaLigand docking where we repeated 1000 steps of RosettaLigand docking with 10 iterations. This analysis suggests that incorporation of full backbone flexibility in Adaptive BP-dock enables it to sample correct orientation of the side-chains in the binding pocket, thus better sampling increases the accuracy in Adaptive BP-Dock. Overall, these results highlight the advantage of incorporating backbone flexibility to improve the accuracy of unbound docking. We have applied our approach for different bound complexes of five different proteins. Furthermore, we also performed docking for the bound and unbound crystal structures using RosettaLigand to provide a rigorous comparison with Adaptive BP-

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Dock. The lowest interface delta energy scores and X-Scores for the lowest energy docked poses obtained from the bound and unbound docking by RosettaLigand and those obtained from unbound docking for all the five protein test sets are reported in Table 1. The experimental binding affinities of all the test cases obtained from LPDB84 and Astex databases85 are also reported in Table 1. For all the 5 test sets, the X-Score energies of Adaptive BP-Dock unbound docking give better correlation with experimental binding energies than unbound docking using RosettaLigand. Adaptive BP-Dock unbound docking yields similar or better correlation compared to bound docking with RosettaLigand. The correlation plots of X-Score energies vs. the experimental binding energies are plotted in Figure 3 for (i) bound, (ii) unbound docking with RosettaLigand and (iii) unbound docking with Adaptive BP-Dock. For AD and CCP, the X-Score energies for unbound docking with RosettaLigand have surprisingly negative Pearson correlation coefficients (AD: R = -0.25, P=0.164; CCP: R = -0.25, P=0.0007). However, incorporating full receptor flexibility with Adaptive BPDock improved these correlations significantly (AD: R = 0.28, P=0.01891; CCP: R = 0.66, P