Interaction Entropy for Computational Alanine Scanning - Journal of

Apr 13, 2017 - The theoretical calculation of protein-protein binding free energy is a grand challenge in computational biology. Accurate prediction o...
0 downloads 0 Views 1MB Size
Article pubs.acs.org/jcim

Interaction Entropy for Computational Alanine Scanning Yuna Yan,† Maoyou Yang,‡ Chang G. Ji,*,†,§ and John Z.H. Zhang*,†,§,∥ †

State Key Laboratory for Precision Spectroscopy, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China ‡ College of Mathematics & Physics, Shandong Institute of Light Industry, Jinan, Shandong 250353, China § NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China ∥ Department of Chemistry, New York University, New York, New York 10003, United States S Supporting Information *

ABSTRACT: The theoretical calculation of protein−protein binding free energy is a grand challenge in computational biology. Accurate prediction of critical residues along with their specific and quantitative contributions to protein−protein binding free energy is extremely helpful to reveal binding mechanisms and identify drug-like molecules that alter protein−protein interactions. In this paper, we propose an interaction entropy approach combined with the molecular mechanics/generalized Born surface area (MM/GBSA) method for solvation to compute residue-specific protein−protein binding free energy. In the current approach, the entropic loss in binding free energy of individual residues is explicitly computed from moledular dynamics (MD) simulation by using the interaction entropy method. In this approach the entropic contribution to binding free energy is determined from fluctuation of the interaction in MD simulation. Studies for an extensive set of realistic protein−protein interaction systems showed that by including the entropic contribution, the computed residue-specific binding free energies are in better agreement with the corresponding experimental data. Binding Interface Database17 (BID), and SKEMPI18 are three database collecting information on hot-spot residues. These databases facilitates the theoretical research. Various algorithms including MD simulation based methods,19−25 knowledge-based methods,26−31 empirical formula based methods,29,32−36 and network topology analysis based methods37−40 have been developed to predict the hot-spots. The last three approaches are more suitable for qualitative analysis which is more computational efficiency. In order to quantitative investigate the contribution of one single residue to binding free energy, alanine scanning mutagenesis using MD simulations can be done to discover of energetically crucial determinants for protein association that have been defined as hot spots. Fully atomistic models are more accurate and suitable for detailed analysis of protein−protein interactions at the atomic level. Rigorous approaches such as free energy perturbation41−46 (FEP) and thermodynamic integration47,48 methods have been used to calculate the PPI interaction energy. But both methods are time costly and very difficult to converge numerically as many nonphysical intermediate states of the system should be simulated. The linear interaction energy (LIE)49 approach is another type of method to calculate binding free energy. It is suitable to use this method for systems with similar interaction characteristics. In recent years, the MM/PBSA(GBSA) method has become popular for comput-

1. INTRODUCTION Protein−protein interaction (PPI) plays a very important role in biological process. It is essential for signal transduction, gene expression control, enzyme inhibition, etc.,1−9 and is closely related to human disease. PPIs disorders are one of the most important causes of many diseases, mainly due to the abnormal changes including weakening of hydrophobic interaction, lacking of salt bridge, the formation of steric hindrance. The variation of hotspot areas is the most important factor causing PPI disorder. The development of small molecule drugs with PPIs as the target has become a hot top in the treatment of diseases. Drug-like small molecules prefer to bind to hot spots at protein−protein interfaces.10−13 Accurate computation of protein−protein binding free energies could lead to more reliable prediction of protein−protein associations and help elucidate cellular pathways and the effects of crucial mutations. Identification of key interaction sites is important to reveal protein−protein binding mechanisms and identify drug-like molecules that can alter protein−protein interactions. Although protein−protein interfaces usually involves a large number of residues, it has been demonstrated that only a few residues in the protein−protein interface contribute to the stability of binding,14,15 the so-called hot-spots. A hot-spot is usually defined by binding free energy difference (ΔΔGbind) higher than 2.0 kcal/mol, and null-spot is defined by binding free energy difference (ΔΔGbind) less than 2.0 kcal/mol. Experimental alanine scanning mutagenesis is expensive and time-consuming. Theoretical methods are needed. ASEdb,16 © 2017 American Chemical Society

Received: December 6, 2016 Published: April 13, 2017 1112

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling

Here, “A” and “B” represent, respectively, the individual proteins of the protein−protein complex. In the above equations, the superscript “x” denotes a specific residue of protein A at the protein−protein interface, and “a“ denotes alanine that replaces residue x upon mutation. ΔGxbind is the binding free energy of the wild type complex and ΔGabind the binding free energy of the mutant type (replacement of x by alanine) complex. A(x) denotes the wide type of protein A and A(a) denotes the mutant type of protein A. ΔG(A(x)−B) and ΔG(A(x)−B) represent, respectively, the binding free energy of the wide type and mutation type complex. Equation 1 can then be further expanded to give

ing protein−protein binding free energy.50−56 Furthermore, MM/PBSA(GBSA) method has been successfully used to predict hot-spots in the protein−protein interface.19,22−25 In these methods, entropy changes of the wild type and its mutant complexes upon binding are usually assumed to be equal and are therefore neglected in most calculations. Few researches have been done to study the influence of entropy to hot-spots prediction systematically. Standard normal node method is used in current MM/PBSA approach to calculate entropy change. However, the normal mode calculation of entropy is only theoretically approximate. Furthermore, in the normal mode approach, the Hessian matrix of the energy (second derivative of energy) is required, when the analytical second derivatives of the energy are not available, numerical calculation of the second derivative can be extremely costly when many degrees of freedom are involved especially for large systems such as PPI systems. Recently, a new approach based on interaction entropy (IE) was proposed for protein−ligand binding calculation. The interaction entropy was recently shown to be efficient for protein−ligand binding free energy calculation.57 In this paper, the IE method is proposed to calculate entropic contribution of residue-specific free energy contribution to protein−protein binding. To validate the accuracy of the present approach, we performed numerical studies of 24 protein−protein complexes including 433 mutants. The accuracy of the present approach was compared to that from MM/PBSA(GBSA) method without including the entropic contribution.

x→a ΔΔG bind = ΔGgas(A(a)−B) + ΔGsol(A(a)−B)

− [ΔGgas(A(x)−B) + ΔGsol(A(x)−B)] = ΔGgas(A(a)−B) − ΔGgas(A(x)−B) + [ΔGsol(A(a)−B) − ΔGsol(A(x)−B)] x→a x→a = ΔΔGgas + ΔΔGsol

(4)

In computing the above equation, we can make the following simplification,

2. THEORETICAL METHODS Resume of the methodological approach for calculating alanine scanning mutagenesis binding free energy difference is shown in Figure 1.

x→a ΔΔGgas = ΔGgas(A(a)−B) − ΔGgas(A(x)−B)

≈ ΔGgas(a−B) − ΔGgas(x−B)

(5)

An assumption is made that the difference in interprotein gasphase interaction energies between the wild type A(x)−B and the mutant A(a)−B complexes is equivalent to that between residue x−B and residue a−B. In the above equation, single trajectory approach was used and free energy change is calculated without the rearrangement of the surrounding environment. Since the solvation part of free energy in eq 4 is given by

Figure 1. Definition of binding free energy difference in alanine scanning mutagenesis process.

ΔGsol(A(x)−B) = Gsol(A(x)∧B) − Gsol(A(x)) − Gsol(B) (6)

2.1. Residue-Specific Free Energy Contribution. Free energy contributions of individual residues to protein−protein binding are typically calculated by binding free energy difference (ΔΔGx→a bind ) upon alanine mutation, which is defined as x→a ΔΔG bind

=

a ΔG bind



x ΔG bind

where Gsol(A(x)∧B) is the solvation energy of the A(x)−B complex and similarly, ΔGsol(A(a)−B) = Gsol(A(a)∧B) − Gsol(A(a)) − Gsol(B) (7)

(1)

where

we have

x ΔG bind = ΔG(A(x)−B)

= ΔGgas(A(x)−B) + ΔGsol(A(x)−B)

x→a ΔΔGsol = ΔGsol(A(a)−B) − ΔGsol(A(x)−B)

(2)

= Gsol(A(a)∧B) − Gsol(A(x)∧B)

and similarly, a ΔG bind

− [Gsol(A(a)) − Gsol(A(x))]

= ΔG(A(a)−B) = ΔGgas(A(a)−B) + ΔGsol(A(a)−B)

(8)

2.2. Gas-Phase Component ΔΔGx→a gas . The “gas-phase” component of the binding free in eq 5 is given by

(3) 1113

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling ⎡Q ⎤ ΔGgas(x−B) = −KT ln⎢ xB ⎥ ⎢⎣ Q xB ′ ⎥⎦

by using the popular MM/PBSA(GBSA) approach, in which the solvation free energy is split into two terms int

ΔGsol = ΔGpb/gb + ΔGnp

int

∫ dqw(xB) dqx dqBe−β(Ex + EB + Ex−B + Ew(xB)+ ExB−w)

= −KT ln

where the first term in the above equation ΔGpb/gb is the electrostatic solvation free energy. The second term ΔGnp is the nonpolar solvation free energy obtained by using an empirical solvent-accessible surface area (SASA) formula

int

∫ dqw(xB) dqx dqBe−β(Ex + EB + Ew(xB)+ ExB−w)

⎡ ⎤ int 1 ⎥ = −KT ln⎢ = KT ln[⟨e βEx−B⟩] int E β ⎢⎣ ⟨e x−B⟩ ⎥⎦ = KT ln[e

β⟨Exint −B⟩

⟨e

ΔGnp = γSASA + β

int β(Exint −B −⟨Ex−B⟩) int

= ⟨Exint−B⟩ − T ΔSx−B (9)

and similarly for ΔGgas(a−B). For calculation of eq 9, we define the interaction entropy (IE) as int

−T ΔS = KT ln⟨e βΔE ⟩

(10)

The calculation of the interaction entropy by eq 10 involves the int natural log of an ensemble average of eβΔEpr , which can be extracted along with MD simulation without extra computational cost. The interaction entropy of eq 10 can be computed by numerical integration over simulation time, e.g. N

1 N

int

∑ e βΔE

int

(ti)

(11)

i=1

where ΔE = E − ⟨E ⟩ and the average interaction energy is obtained by int

⟨Eint⟩ =

int

1 T

∫0

int

T

Eint(t ) dt =

1 N

N

∑ Eint(ti)

(12)

i=1

Alternatively, one could also perform integration over the interaction energy distribution in eq 10 by the integration in interaction energy int

⟨e β ΔE ⟩ =



∫−∞ e βΔE

int

ρ(ΔEint) d(ΔEint)

(13)

where ρ is the distribution of the interaction energy. If the 1

energy distribution is Gaussian, namely, ρ = 2π σ e−(ΔE then the above integral gives the analytical result int

⟨e β ΔE ⟩ =



∫−∞ e βΔE 2 2

= eσ β

int

int 2

) /2σ 2

,

1 −(ΔEint)2 /2σ 2 e d(ΔEint) 2π σ

/2

(14)

which yields the analytical result for IE

−T ΔS =

σ2 2kT

(17)

The values γ and β we used in the calculation are the standard values of 0.00542 kcal/mol·Å2 and 0.92 kcal/mol, respectively. Equation 16 is used to calculate individual terms in eq 8. The PBSA(GBSA) program in AMBER suite58−60 is used to perform the calculation. 2.4. Molecular Dynamics Simulations. In the current study, a total of 24 PPI systems from the protein database were used for computational study. All MD simulations were performed using pmemd.cuda in AMBER14 with ff14SB force field for the protein complex. All of the simulations were carried out performed in TIP3P61 explicit water molecules with a truncated octahedron box. The closest distance between any atom originally present in solute and the edge of the periodic box is 12 Å. The particle mesh Ewald (PME)62 is used to treat the long-range electrostatic interactions. The nonbonded interactions are truncated with 8 Å cutoff. Periodic boundary condition (PBC) is imposed on the system during the calculation of nonbonded interactions. The time-step is set at 2 fs and SHAKE63 is used to constrain the bonds involving hydrogen atom. Langevin thermostat with the collision frequency 2.0 is applied to control the temperature. First, the system is minimized with protein constrained to equilibrate the solvent. Second, the system is minimized with protein backbone constrained to equilibrate the amino acid side chains and the solvent. Third, protein is released to minimize the whole simulation system. Fourthly, the system is slowly heated to 300 K, followed by a 7 ns equilibration of the whole system in an NPT ensemble at an interval of every 10 fs. To avoid unnecessary structural drift due to inaccuracy of the force field, we restrained protein Cα atom with 1 kcal/mol·Å2. 2.5. Binding Free Energy Difference (ΔΔGx→a bind) in Alanine Scanning. The last 2 ns trajectory is selected to calculate enthalpy and entropy contribution. In the application of this method, the selection is not so strict, as long as the trajectory is equilibrated. Configurational sampling is taken from 2 ns trajectories with a time step of 10 fs which is sufficient for IE calculation as is shown in following sections. Thus, a total of 200 000 configurations or snapshots are extracted from the MD trajectories for the calculation of residue-specific interaction. The perl script mm_pbsa.pl script integrated into AMBER14 package was used to calculate the enthalpy contribution to binding free energy. Two different solvation models were used to calculate the solvation free energy: GB model64,65 and PB model.66,67 When using GB model, “igb” is set to 2 in the MM/ GBSA protocol in Amber 14. A total of 100 snapshots equally distributed in the selected 200 000 configurations are used to calculate the enthalpy contribution using MM/PBSA(GBSA). Entropies are calculated using all of the selected configurations by eqs 10 and 15 described above. In actual numerical integration of IE using eq 10, we applied a cutoff limit for energy points larger than a limit to eliminate the energy

⟩]

= ⟨Exint−B⟩ + KT ln⟨e βdEx−B⟩

⟨e β ΔE ⟩ =

(16)

(15)

Here σ is the width of Gaussian distribution. However, extreme caution must be exercised when using the above formula. Unless the sampling is sufficiently converged, which is very hard for large interaction energies such as protein−protein interaction, the use of eq 15 could lead to large over estimation of the entropic term if energy distribution is not converged. 2.3. Calculation of Solvation Components ΔΔGx→a sol . In the present approach, we calculate solvation free energy in PPI 1114

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling “noises”. The cutoff limit used here is >3σ. Then the specific residue is replaced by the alanine residue, and entropy and enthalpy are calculated in the same way. Single trajectory approach is used to calculate the binding free energy difference. The single trajectory approach has been used by several groups.22−24 By using a single trajectory, error cancellation will overcome the insufficient sampling of the conformational space. The mutant trajectories were obtained by a single truncation of the amino acid side chain, replacing Cγ with a hydrogen atom and setting the Cβ-H direction to that of the former Cβ−Cγ based on the wild type trajectory.

calculate the entropy contribution difference. (f) Calculate ΔΔGbind, which is the sum of entropy and enthalpy contribution difference. 3.1. Influence of Protein’s Internal Dielectric Constant. Evaluation of protein solvation energy using implicit model such as MM/PBSA and MM/GBSA is generally sensitive to the value of internal dielectric constant used for a protein.88−90 It was found that different types of residues prefer different internal dielectric constants.22,24,25 Here we employ different dielectric constants in calculating the binding free energy change. To evaluate the influence of the value of the dielectric constant, we calculated the ΔΔGbind values for all studied residues with dielectric constants ranging from 1 to 10 and analyzed the mean absolute error between the calculated and experimental ΔΔGbind values for each of the ten internal dielectric constant values. We divide the 20 alpha amino acids into four groups according to the structure of the side chain: nonpolar (phenylalanine, alanine, leucine, methionine, isoleucine, tryptophan, proline, and valine), polar (cysteine, glycine, glutamine, asparagine, serine, tyrosine, threonine), positively charged (histidine, arginine, lysine), and negatively charged (aspartic acid and glutamic acid). As shown in Tables 1 and 2, when using GB model to calculate the solvation free energy the corresponding dielectric constants to obtain the minimal mean absolute errors (MAEs) for polar residues is about at 3, hydrophobic residues at 1, and charged residues at 5. There is little difference using dielectric constant 5, 6, or higher for charged residues; here, we chose 5 when using GB model. When using PB model to calculate the solvation free energy the corresponding dielectric constants to obtain the minimal mean absolute errors for polar residues is about at 3, hydrophobic residues at 2, and charged residues at 9 shown in Table S3 and S4. We can see that higher dielectric constants are normally attributable to charged groups.91,92 In the following discussions, we use 1 for nonpolar residues, 3 for polar residues, and 5 for charged residues when using GB model to get the overall minimal MAE, while using 1 for nonpolar residues, 3 for polar residues, and 9 for charged residues when using the PB model. Using the selected dielectric constants for different residue types, the mean absolute errors are 1.0 kcal/mol (GB, include entropy), 1.1 kcal/mol (PB, include entropy), 1.4 kcal/mol (GB, not include entropy), and 1.5 kcal/mol (PB, not include entropy). The average standard deviations for the calculated binding free energy difference using the MM/GBSA method is 0.12 kcal/mol. Although the polar solvation term was typically obtained by solving the PB equation numerically originally, the GB approach which is faster and pairwise decomposable is becoming a more common continuum-solvation method. Several studies have compared the performance of MM/ PBSA and MM/GBSA. Compared with MM/GBSA results, the MM/PBSA results can be better,93,94 worse,95−98 or of a similar quality93,99−101 depending on the studied system. In our work, results got from GBSA are also slightly better than that from PBSA. Considering the accuracy and efficiency, we use the GB model to calculate the binding free energy in the following discussions. 3.2. Statistical Metrics. To give a qualitative analysis, we evaluated the accuracy (ACC, eq 18a), the true positive rate (TPR, eq 18b), the false positive rate (FPR, eq 18c), the precision (PPV, eq 18d), the specificity (SPE, eq 18e), and the F1 score (F1 score, eq 18f).

3. RESULT AND DISCUSSION Calculations of free energy contributions of individual amino acid residues in protein−protein binding are very useful in protein interface based drug design. Most computational alanine scanning methods are based on the popular MM/ PBSA(GBSA) approach and usually neglect entropic contribution to binding free energy in order to avoid expensive normal mode calculation for entropy. However, free energy calculations without entropic contribution are inherently problematic because one can never be sure how many errors are contained in the predicted free energies. In particular, the predicted free energies are typically overrated which can result in false hot spots when entropic contributions are not included. The proposed IE method described above is applied to specifically calculate entropic contribution to residue-specific binding free energies. In the following, we applied the IE method to alanine scanning calculations involving a total of 433 mutations in 24 protein−protein complexes. The experimental values for these mutations are collected from database KEMPI.18 The native structures of these 24 protein−ligand complexes (Protein Databank ID: 1a22,68 1ahw,69 1ak4,70 1brs,71 1cho,72 1dqj,73 1emv,74 1f47,75 1fc2,76 1fcc,77 1gc1,78 1jrh,79 1jtg,80 1ktz,81 1tm1,82 1vfb,83 1z7x,84 2gox,85 3hfm,86 3sgb87) from the PDB database are taken as the starting structures. The results of our computations are compared directly to the experimental data as well as to those from the standard MM/PBSA(GBSA) calculations without the entropic contribution for these systems. Among these 433 mutations, there are 88 residues with more than 2 kcal/mol free energy contribution to the overall protein−protein binding free energy ΔΔGbind, known as “hot spot” residues. Among all 433 residues mutated, 92 are nonpolar, 168 polar, 94 positively charged, and 79 negatively. The overall workflow is shown in Figure 2. The generic workflow for each target includes the following steps: (a) Run

Figure 2. Workflow of the present approach.

MD simulations. (b) Do alanine scanning mutation for the trajectories. (c) Calculate the wild type and mutant type enthalpy binding free energies using MM/PBSA(GBSA) script integrated into the AMBER14 package. (d) Calculate the gasphase binding free energy of a single interface residue (wide residue and mutant alanine). (e) Use the IE method to 1115

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling

Table 1. Mean absolute error (MAE)a from Experimental Data in Calculated Enthalpy Changes (ΔΔHbind) for Different Residue Types Using MM/GBSAb dielectric constant

ε=1

ε=2

ε=3

ε=4

ε=5

ε=6

ε=7

ε=8

ε=9

ε = 10

nopolar polar positive negative

1.3 1.3 4.1 3.8

1.5 1.2 2.8 2.2

1.6 1.2 2.4 1.7

1.6 1.2 2.2 1.4

1.6 1.2 2.1 1.2

1.6 1.3 2.1 1.2

1.7 1.3 2.0 1.1

1.7 1.3 2.0 1.1

1.7 1.3 2.0 1.1

1.7 1.3 2.0 1.1

MAE = ⟨|ΔΔGbind(cal) − ΔΔGbind(exp)|⟩. bThe experental values are extracted from the SKEMPI database.18 The units of MAE are kilocalories per mole.

a

Table 2. MAEa from Experimental Data in Calculated Free Energy Change ΔΔGbind of Different Amino Acid Types Using MM/ GBSA Combined with IE Methodb dielectric constant

ε=1

ε=2

ε=3

ε=4

ε=5

ε=6

ε=7

ε=8

ε=9

ε = 10

nopolar polar positive negative

1.0 1.7 8.8 9.0

1.2 1.0 3.0 3.3

1.3 0.9 1.7 1.8

1.4 1.0 1.3 1.2

1.4 1.0 1.2 1.0

1.4 1.0 1.2 1.0

1.4 1.1 1.3 1.0

1.4 1.1 1.3 0.9

1.5 1.1 1.4 0.9

1.5 1.1 1.4 1.0

MAE = ⟨|ΔΔGbind(cal) − ΔΔGbind(exp)|⟩. bThe experental values are extracted from the SKEMPI database.18 The units of MAE are kilocalories per mole.

a

ACC =

TP + TN N

(18a)

TPR =

TP TP + FN

(18b)

FPR =

FP FP + TN

(18c)

PPV =

TP TP + FP

(18d)

SPE =

TN FP + TN

(18e)

F1 =

2TP 2TP + FP + FN

RI67 in system 1TM1, LA12 in system 1F47, and YH50 in system 3HFM. Here, 37 residues which are falsely predicted by MM/GBSA method to be hot-spots are detected by the inclusion of entropy. The improvements of accuracy and precision are encouraging. However, a very small difference in free energy can turn a hot-spot into a false negative, or a null-spot into a false positive. For example, the inclusion of entropy which is 0.43 kcal/mol for residue LA12 in system 1F47 makes this residue to be false negative. Another example, the exclusion of entropy which is 0.28 kcal/mol for residue TB51 in system 1KTZ makes this residue to be false positive. Considering this situation, quantitative analysis is needed to give more comparison. 3.3. Accuracy and Error Analysis. To give a quantitative analysis, we used three types of error estimation method to evaluate the performance our new method compared to MM/ GBSA. The mean signed error (MSE) is defined by

(18f)

In all these equations, N stands for all the residues, TP stands for true positive (predicted hot-spots that are actual hot-spots), FP stands for false positive (predicted hot-spots that are not actual hot-spots), FN stands for false negative (nonpredicted hot-spots that are actual hot-spots), and TN stands for the true negatives (correctly predicted null-spots). As shown in Table 3,

MSE =

ACC

TPR

FPR

PPV

SPE

F1 score

MM/GBSAa MM/GBSA_IEb

0.79 0.86

0.87 0.83

0.24 0.13

0.48 0.61

0.77 0.87

0.62 0.70

(19a)

The root mean square error (RMSE) is defined by RMSE =

⟨((ΔΔG bind(cal) − ΔΔG bind(exp))2 ⟩

(19b)

and the mean absolute error (MAE) is defined by

Table 3. Statistical Performance Using MM/GBSA Method with and without the Inclusion of Entropic Contributions method

⟨(ΔΔG bind(cal) − ΔΔG bind(exp))⟩

MAE = |ΔΔG bind(cal) − ΔΔG bind(exp)|

(19c)

Since the MM/GBSA method is employed to compute the ΔΔHbind in both the standard MM/GBSA approach and the present interaction entropy approach, the only difference in result from these two different calculations is the entropic component of the binding free energy. As is shown in Table S1, the average binding free energy difference for these 24 systems using the present approach are generally in better agreement with the experiment than those from the MM/GBSA calculations. To give a visual assessment, we plot 3 types of error estimations in Figure 3 for systems of 24 complexes. For all systems, results from the IE method are not worse than those generated when only the MM/GBSA method is used. For example, in the system 3SGB, seven mutations have experimental values and two mutations are hot-spots whose

a Binding free energy difference calculated using MM/GBSA method without the inclusion of entropic contribution. bBinding free energy difference calculated using MM/GBSA method with the inclusion of entropic contributions. The definitions of the quantities are given in the text.

the new method shows a better predictive power compared with MM/GBSA only. We can see that using the new method, although the recall rate decreased a little, the overall accuracy and precision are improved, the false positive rate also decreased. Residues which induce the decrease of recall rate of the new method include residues DA51 in system 1EMV, 1116

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling

Figure 3. Three types of error estimates (compared to the experimental values) in our calculated binding free energy difference ΔΔGx→a bind for each of the 24 systems. The blue bars represent the errors using MM/GBMA result of enthalpy only and red bars represent the errors using IE method with entropic contribution to free energy. (top) Mean signed error (MSE). (middle) Root mean square error (RMSE). (bottom) Mean absolute error (MAE). 1117

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling

very hot hot-spot, or the calculated enthalpy contribution is too small. The difference between theoretically computed binding free energies and the corresponding experimental values could arise from several factors. For example, the accuracy of the force field is one possible source of error.102 The GBSA calculation of solvation energy based on the implicit solvent model is another possible source of error. 3.4. Discussion of IE Calculation. It is useful to provide some discussion on numerical properties of the IE calculation. For this purpose, we chose four residues that have relatively large entropic contributions for our analysis. The gas-phase interaction energy, the time-averaged interaction entropy or IE versus simulation time, and the energy distribution for these four residues are shown in Figures 5a, b, and c, respectively. In Figure 5b, the black line denotes IE calculated by eq 10 with a time step of 10 fs, which is used in all calculations of IE in this study. To examine whether 10 fs is small enough for numerical integration, 10, 50, 100, and 200 fs time steps are also tested. As can be seen, in all of the four cases, they all converge to each other, indicating that the sampling is sufficient. The IEs calculated by eqs 10 and 15 are 3.0 and 2.9 kca/mol, respectively, for residue RA83 in system 1BRS. A similar result is obtained: 2.6 versus 2.7 kcal/mol, for residue DD35 in system 1BRS; 1.3 versus 1.1 kcal/mol for residue YA55 in system 1EMV; 1.9 versus 2.1 kcal/mol for residue WA92 in system 1VFB. In these four cases, the sampling is sufficiently converged and the energy distribution is a single Gaussian, and results from eqs 10 and 15 are statistically identical. In Table 4, we listed IE values calculated from eqs 10 and 15, respectively, for a total of 88 residues that are experimentally determined to be “hot spots”. In most cases, the ΔΔIEs calculated using two different methods are very close to each other. The average values of ΔΔIEs for these 88 systems using eqs 10 and 15 are 1.2 and 1.3 kcal/mol, respectively. So the difference is within statistical uncertainly to each other. In a few cases, there are relatively large deviations between the two calculations with results from eq 15 being overestimated. This usually indicate that the distribution of the interaction energy or energy

binding free energy difference are higher than 2 kcal/mol. The mean absolute error decreased from 2.3 to 1.5 kcal/mol by including entropic contributions. The MAEs from experimental values of these seven mutations decreased by the inclusion of entropy. As shown in Figure 4, the MSE error for all residues of

Figure 4. Average error estimates for all the systems (a total of 433 mutations from 24 protein−protein systems) using MM/GBSA and IE results, respectively. The blue bars represent the error using MM/ GBSA result of enthalpy only, and red bars represent the error using IE method with entropic contribution to free energy.

ΔΔGbind with respect to the experimental values is 1.0 kcal/mol from the MM/GBSA method versus 0.2 kcal/mol from the IE method. Similarly, the RMSE is 2.1 kcal/mol versus 1.5 kcal/ mol and MAE is 1.4 kcal/mol versus 1.0 kcal/mol. The result of these comparisons is very encouraging and validates the reliability of the IE method. We used the t test implemented in the R package to see whether the differences between ΔΔH (binding free energy difference with entropy not included) and ΔΔG (binding free energy with entropy included) are meaningful within a 0.95 confidence level. The calculated p-value is 6.9 × 10−8 indicating that the differences between DDH and DDG are meaningful. However, there is still difference between the calculated and experimental values. Especially, in some cases the calculated enthalpy contribution is too large, which turns a null-spot to a

Figure 5. Gas-phase interaction energy, time-averaged interaction entropy or IE (kcal/mol) and the distribution of the interaction energy of four residues whose entropic contribution is relatively large. (a) Gas-phase interaction energy during 2 ns simulation time. (b) Time-averaged interaction entropies calculated by eq 10 using time steps of (black) 10, (red) 50, (green) 100, and (blue) 200 fs. (c) Corresponding interaction energy distributions sampled from the last 2 ns. 1118

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling Table 4. Comparation of IE Using Two Different Methods: Time Integration and Gaussiana PDB

mutation

ΔΔIE_Ib

ΔΔIE_Gc

PDB

mutation

ΔΔIE_Ib

ΔΔIE_Gc

1A22 1A22 1AK4 1AK4 1BRS 1BRS 1BRS 1BRS 1BRS 1BRS 1BRS 1BRS 1BRS 1BRS 1CBW 1CHO 1CHO 1CHO 1CHO 1CHO 1DQJ 1DQJ 1DQJ 1DQJ 1DQJ 1DQJ 1DQJ 1DQJ 1DQJ 1EMV 1EMV 1EMV 1EMV 1EMV 1EMV 1EMV 1EMV 1EMV 1F47 1F47 1F47 1FCC 1FCC

KA172A IB365A VD486A HD487A KA27A NA58A RA59A EA73A RA83A RA87A HA102A YD29A DD35A DD39A KI15A TI17A LI18A EI19A YI20A RI21A NA31A NA32A YA50A DB32A YB33A WB98A YC20A KC96A KC97A LA33A VA34A EA41A SA50A DA51A YA54A YA55A NB75A FB86A IA8A FA11A LA12A KC31A NC35A

0.91 0.15 0.24 0.92 1.87 0.06 3.33 0.43 1.84 1.38 0.89 1.25 2.18 3.43 2.22 −0.12 0.49 1.43 0.46 3.73 0.2 0.76 0.79 2.9 0.97 1.14 0.44 1.09 2.87 0.28 0.36 3.55 0.47 0.98 0.84 1.25 0.49 0.92 0.69 1.22 0.43 1.51 1.13

0.89 0.14 0.28 0.73 1.69 0.08 4.3 0.44 1.99 1.17 0.75 1.01 2.28 3.59 2.97 0.01 0.6 1.5 0.53 4.16 0.21 0.63 0.68 2.86 0.86 1.29 0.4 0.99 3.44 0.25 0.34 3.37 0.39 1.76 0.75 1.01 0.44 0.97 0.68 1.33 0.4 1.6 0.94

1FCC 1JCK 1JCK 1JTG 1JTG 1JTG 1JTG 1JTG 1JTG 1JTG 1JTG 1JTG 1JTG 1KTZ 1KTZ 1KTZ 1KTZ 1TM1 1TM1 1TM1 1TM1 1TM1 1VFB 1VFB 1VFB 1Z7X 1Z7X 1Z7X 1Z7X 2JEL 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3HFM 3SGB 3SGB

WC43A VB91A FB176A FB36A HB41A YB53A KB74A WB112A FB142A HB148A WB150A RB160A WB162A RA94A LB27A FB30A IB50A TI58A EI60A YI61A RI65A RI67A WA92A DB100A QC121A WW263A YW434A DW435A YW437A EP70A YH33A YH50A YH53A WH98A NL31A NL32A YL50A YL96A YY20A KY96A KY97A TI17A LI18A

0.46 0.43 2.79 1.82 0.11 0.91 1.96 0.61 1.81 0.34 0.71 1.9 1.63 4.81 0.4 0.85 0.41 0.51 0.89 0.68 0.47 0.77 1.5 2.75 0.64 0.89 1.71 2.71 1.47 1.39 0.98 1.04 1.27 1.47 0.63 0.58 0.97 0.67 1.08 1.26 2.16 0.06 0.61

0.48 0.36 2.41 2.44 0.12 0.78 1.72 0.59 4.04 0.29 0.67 1.86 2.41 6.17 0.32 0.8 0.35 0.47 0.92 0.72 0.47 0.83 1.61 2.62 0.65 0.93 1.49 3.04 1.38 1.23 0.83 0.82 1.05 1.68 0.55 0.5 0.95 0.6 0.96 1.21 2.33 0.11 0.87

Hot-spot residues are listed. All values are in kilocalories per mole. bIE_I is calculated by eq 10 with time step 10 fs, ΔΔIE_I = ΔIE_I⟨wild⟩ − ΔIE_I⟨ala⟩. cIE_G is calculated by eq 15, ΔΔIE_G = ΔIE_G⟨wild⟩ − ΔIE_G⟨ala⟩. a

fluctuation is not fully converged from simulation and the assumption of a single Gaussian in energy distribution gives overly larger width σ. Thus, use of eq 15 could lead to large over estimation of the entropic term if energy distribution is not converged from simulation. However, the numerical integration formula of eq 10 still gives a good result. Detailed results using the GB and PB models with and without entropic contributions for all 24 complexes are shown in Tables S1 and S2 in the Supporting Information.

changes of the wild type and its mutant complexes upon binding are assumed to be equal. However, the systematical research about whether the inclusion of entropy is helpful when the hot-spot prediction is unknown. In this study, the IE method for efficient computation of residue-specific entropic contribution to the protein−protein binding free energy is proposed. In this approach, one calculates the gas-phase residue−protein interaction energy whose fluctuation in time or distribution in energy is used to generate IE or entropic change due and there is no additional computational cost beyond the standard MD simulation of the protein−protein complex. A total of 433 mutations for which accurate experimental data was available are tested in this study using MM/GBSA

4. CONCLUSIONS The popular computational alanine scanning method of MM/ PBSA(GBSA) for hot-spot prediction usually neglects entropic contribution for protein−protein binding. Generally, entropy 1119

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling

(11) Cukuroglu, E.; Engin, H. B.; Gursoy, A.; Keskin, O. Hot spots in protein-protein interfaces: Towards drug discovery. Prog. Biophys. Mol. Biol. 2014, 116, 165−173. (12) Thangudu, R. R.; Bryant, S. H.; Panchenko, A. R.; Madej, T. Modulating protein-protein interactions with small molecules: The importance of binding hotspots. J. Mol. Biol. 2012, 415, 443−453. (13) Wells, J. A.; McClendon, C. L. Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 2007, 450, 1001−1009. (14) Bogan, A. A.; Thorn, K. S. Anatomy of hot spots in protein interfaces. J. Mol. Biol. 1998, 280, 1−9. (15) Clackson, T.; Wells, J. A. A hot spot of binding energy in a hormone-receptor interface. Science 1995, 267, 383−386. (16) Thorn, K. S.; Bogan, A. A. ASEdb: A database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 2001, 17, 284−285. (17) Fischer, T. B.; Arunachalam, K. V.; Bailey, D.; Mangual, V.; Bakhru, S.; Russo, R.; Huang, D.; Paczkowski, M.; Lalchandani, V.; Ramachandra, C.; Ellison, B.; Galer, S.; Shapley, J.; Fuentes, E.; Tsai, J. The binding inteference database (BID): A compilation of amino acid hot spots in protein interfaces. Bioinformatics 2003, 19, 1453−1454. (18) Moal, I. H.; Fernández-Recio, J. SKEMPI: a Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics 2012, 28, 2600−2607. (19) Huo, S.; Massova, I.; Kollman, P. A. Computational alanine scanning of the 1:1 human growth hormone-receptor complex. J. Comput. Chem. 2002, 23, 15−27. (20) Kim, D. E.; Chivian, D.; Baker, D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 2004, 32, W526−W531. (21) Landon, M. R.; Lancia, D. R., Jr; Yu, J.; Thiel, S. C.; Vajda, S. Identification of hot spots within druggable binding regions by computational solvent mapping of proteins. J. Med. Chem. 2007, 50, 1231−1240. (22) Martins, S. A.; Perez, M. A. S.; Moreira, I. S.; Sousa, S. F.; Ramos, M. J.; Fernandes, P. A. Computational Alanine Scanning Mutagenesis: MM-PBSA vs TI. J. Chem. Theory Comput. 2013, 9, 1311−1319. (23) Massova, I.; Kollman, P. A. Computational Alanine Scanning To Probe Protein−Protein Interactions: A Novel Approach To Evaluate Binding Free Energies. J. Am. Chem. Soc. 1999, 121, 8133−8143. (24) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Computational alanine scanning mutagenesis - An improved methodological approach. J. Comput. Chem. 2007, 28, 644−654. (25) Simões, I. C. M.; Costa, I. P. D.; Coimbra, J. T. S.; Ramos, M. J.; Fernandes, P. A. New Parameters for Higher Accuracy in the Computation of Binding Free Energy Differences upon Alanine Scanning Mutagenesis on Protein−Protein Interfaces. J. Chem. Inf. Model. 2017, 57, 60−72. (26) Cho, K. I.; Kim, D.; Lee, D. A feature-based approach to modeling proteinprotein interaction hot spots. Nucleic Acids Res. 2009, 37, 2672−2687. (27) Darnell, S. J.; LeGault, L.; Mitchell, J. C. KFC Server: interactive forecasting of protein interaction hot spots. Nucleic Acids Res. 2008, 36, W265−W269. (28) Darnell, S. J.; Page, D.; Mitchell, J. C. An automated decisiontree approach to predicting protein interaction hot spots. Proteins: Struct., Funct., Genet. 2007, 68, 813−823. (29) Tuncbag, N.; Gursoy, A.; Keskin, O. Identification of computational hot spots in protein interfaces: Combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics 2009, 25, 1513−1520. (30) Liu, Q.; Li, J. Protein binding hot spots and the residue-residue pairing preference: a water exclusion perspective. BMC Bioinf. 2010, 11, 244. (31) Xia, J. F.; Zhao, X. M.; Song, J.; Huang, D. S. APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility. BMC Bioinf. 2010, 11, 174.

without entropic contrition and with entropic contribution using IE method. This data set involves diverse sets of mutations and is relatively large. The polar solvation term is calculated using both GB and PB method, and the GB method gave a better result. The calculated residue-specific binding free energy is dependent on the values of dielectric constant used. The optimal values are found to be 1 for nonpolar residues, 3 for polar residues, and 5 for charged residues. By using the MM/GBSA and IE method, an overall MAE of 1.0 kcal/mol relative to the experiment was achieved in 433 mutations.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00734. Tables S1−S4 as mentioned in the text (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected] (J.Z.H.Z.). *E-mail: [email protected] (C.G.J.). ORCID

John Z.H. Zhang: 0000-0003-4612-1863 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (Grant nos. 21433004), Ministry of Science and Technology of China (Grant no. 2016YFA0501700), NYU Global Seed Grant, and Shanghai Putuo District (Grant 2014-A-02). We thank the Supercomputer Center of East China Normal University for providing us computer time.



REFERENCES

(1) Keskin, O.; Gursoy, A.; Ma, B.; Nussinov, R. Principles of Protein−Protein Interactions: What are the Preferred Ways For Proteins To Interact? Chem. Rev. 2008, 108, 1225−1244. (2) Lo Conte, L.; Chothia, C.; Janin, J. The atomic structure of protein-protein recognition sites. J. Mol. Biol. 1999, 285, 2177−2198. (3) Muegge, I.; Schweins, T.; Warshel, A. Electrostatic contributions to protein-protein binding affinities: Application to Rap/Raf interaction. Proteins: Struct., Funct., Genet. 1998, 30, 407−423. (4) Northrup, S. H.; Erickson, H. P. KINETICS OF PROTEIN PROTEIN ASSOCIATION EXPLAINED BY BROWNIAN DYNAMICS COMPUTER-SIMULATION. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 3338−3342. (5) Pazos, F.; HelmerCitterich, M.; Ausiello, G.; Valencia, A. Correlated mutations contain information about protein-protein interaction. J. Mol. Biol. 1997, 271, 511−523. (6) Schreiber, G.; Fersht, A. R. ENERGETICS OF PROTEINPROTEIN INTERACTIONS - ANALYSIS OF THE BARNASEBARSTAR INTERFACE BY SINGLE MUTATIONS AND DOUBLE MUTANT CYCLES. J. Mol. Biol. 1995, 248, 478−486. (7) Schreiber, G.; Fersht, A. R. Rapid, electrostatically assisted association of proteins. Nat. Struct. Biol. 1996, 3, 427−431. (8) Schreiber, G.; Haran, G.; Zhou, H. X. Fundamental Aspects of Protein-Protein Association Kinetics. Chem. Rev. 2009, 109, 839−860. (9) Teichmann, S. A. Principles of protein-protein interactions. Bioinformatics 2002, 18, S249−S249. (10) Arkin, M. R.; Wells, J. A. Small-molecule inhibitors of proteinprotein interactions: Progressing towards the dream. Nat. Rev. Drug Discovery 2004, 3, 301−317. 1120

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling (32) Geppert, T.; Hoy, B.; Wessler, S.; Schneider, G. Context-based identification of protein-protein interfaces and ″hot-spot″ residues. Chem. Biol. 2011, 18, 344−353. (33) Guney, E.; Tuncbag, N.; Keskin, O.; Gursoy, A. HotSprint: Database of computational hot spots in protein interfaces. Nucleic Acids Res. 2007, 36, D662−D666. (34) Krüger, D. M.; Gohlke, H. DrugScorePPI webserver: Fast and accurate in silico alanine scanning for scoring protein-protein interactions. Nucleic Acids Res. 2010, 38, W480−W486. (35) Pavelka, A.; Chovancova, E.; Damborsky, J. HotSpot Wizard: A web server for identification of hot spots in protein engineering. Nucleic Acids Res. 2009, 37, W376−W383. (36) Shulman-Peleg, A.; Shatsky, M.; Nussinov, R.; Wolfson, H. J. Spatial chemical conservation of hot spot interactions in proteinprotein complexes. BMC Biol. 2007, 5, 43. (37) Tuncbag, N.; Salman, F. S.; Keskin, O.; Gursoy, A. Analysis and network representation of hotspots in protein interfaces using minimum cut trees. Proteins: Struct., Funct., Genet. 2010, 78, 2283− 2294. (38) Li, J. Y.; Liu, Q. ’Double water exclusion’: a hypothesis refining the O-ring theory for the hot spots at protein interfaces. Bioinformatics 2009, 25, 743−750. (39) Pons, C.; Glaser, F.; Fernandez-Recio, J. BMC Bioinf. 2011, 12, 378. (40) del Sol, A.; O’Meara, P. Small-world network approach to identify key residues in protein-protein interaction. Proteins: Struct., Funct., Genet. 2005, 58, 672−682. (41) Bash, P. A.; Field, M. J.; Karplus, M. FREE-ENERGY PERTURBATION METHOD FOR CHEMICAL-REACTIONS IN THE CONDENSED PHASE - A DYNAMICAL-APPROACH BASED ON A COMBINED QUANTUM AND MOLECULAR MECHANICS POTENTIAL. J. Am. Chem. Soc. 1987, 109, 8092− 8094. (42) Jorgensen, W. L.; Thomas, L. L. Perspective on free-energy perturbation calculations for chemical equilibria. J. Chem. Theory Comput. 2008, 4, 869−876. (43) Kita, Y.; Arakawa, T.; Lin, T. Y.; Timasheff, S. N. CONTRIBUTION OF THE SURFACE FREE-ENERGY PERTURBATION TO PROTEIN SOLVENT INTERACTIONS. Biochemistry 1994, 33, 15178−15189. (44) Kollman, P. FREE-ENERGY CALCULATIONS - APPLICATIONS TO CHEMICAL AND BIOCHEMICAL PHENOMENA. Chem. Rev. 1993, 93, 2395−2417. (45) Rao, B. G.; Singh, U. C. A FREE-ENERGY PERTURBATION STUDY OF SOLVATION IN METHANOL AND DIMETHYLSULFOXIDE. J. Am. Chem. Soc. 1990, 112, 3803−3811. (46) Rao, S. N.; Singh, U. C.; Bash, P. A.; Kollman, P. A. Free energy perturbation calculations on binding and catalysis after mutating Asn 155 in subtilisin. Nature 1987, 328, 551. (47) Beveridge, D. L.; Dicapua, F. M. Free energy via molecular simulation: applications to chemical and biomolecular systems. Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 431−492. (48) Zacharias, M.; Straatsma, T. P.; Mccammon, J. A. Separationshifted scaling, a new scaling method for Lennard-Jones interactions in thermodynamic integration. J. Chem. Phys. 1994, 100, 9025−9031. (49) Aaqvist, J. Comment on ″Transferability of Ion Models″. J. Phys. Chem. 1994, 98, 8253−8255. (50) Huo, S.; Massova, I.; Kollman, P. A. Computational alanine scanning of the 1:1 human growth hormone-receptor complex. J. Comput. Chem. 2002, 23, 15−27. (51) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S. H.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E. Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc. Chem. Res. 2000, 33, 889−897. (52) Massova, I.; Kollman, P. A. Computational alanine scanning to probe protein-protein interactions: A novel approach to evaluate binding free energies. J. Am. Chem. Soc. 1999, 121, 8133−8143.

(53) Gohlke, H.; Case, D. A. Converging free energy estimates: MMPB(GB)SA studies on the protein−protein complex Ras−Raf. J. Comput. Chem. 2004, 25, 238−250. (54) Obiol-Pardo, C.; Granadino-Roldan, J. M.; Rubio-Martinez, J. Protein-protein recognition as a first step towards the inhibition of XIAP and Survivin anti-apoptotic 190 proteins. J. Mol. Recognit. 2008, 21, 190−204. (55) Ylilauri, M.; Pentikainen, O. T. MMGBSA As a Tool To Understand the Binding Affinities of Filamin-Peptide Interactions. J. Chem. Inf. Model. 2013, 53, 2626−2633. (56) Zuo, Z.; Gandhi, N. S.; Arndt, K. M.; Mancera, R. L. Free energy calculations of the interactions of c-Jun-based synthetic peptides with the c-Fos protein. Biopolymers 2012, 97, 899−909. (57) Duan, L.; Liu, X.; Zhang, J. Z. H. Interaction Entropy: A New Paradigm for Highly Efficient and Reliable Computation of Protein− Ligand Binding Free Energy. J. Am. Chem. Soc. 2016, 138, 5722−5728. (58) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber biomolecular simulation programs. J. Comput. Chem. 2005, 26, 1668−1688. (59) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham, T. E.; DeBolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput. Phys. Commun. 1995, 91, 1−41. (60) Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. An overview of the Amber biomolecular simulation package. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2013, 3, 198−210. (61) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935. (62) Darden, T.; York, D.; Pedersen, L. Particle mesh Ewald: An N· log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089−10092. (63) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (64) Onufriev, A.; Bashford, D.; Case, D. A. Exploring protein native states and large-scale conformational changes with a modified generalized born model. Proteins: Struct., Funct., Genet. 2004, 55, 383−394. (65) Tsui, V.; Case, D. A. Theory and applications of the generalized Born solvation model in macromolecular Simulations. Biopolymers 2000, 56, 275−291. (66) Sitkoff, D.; Ben-Tal, N.; Honig, B. Calculation of Alkane to Water Solvation Free Energies Using Continuum Solvent Models. J. Phys. Chem. 1996, 100, 2744−2752. (67) Sitkoff, D.; Sharp, K. A.; Honig, B. Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models. J. Phys. Chem. 1994, 98, 1978−1988. (68) Clackson, T.; Ultsch, M. H.; Wells, J. A.; de Vos, A. M. Structural and functional analysis of the 1:1 growth hormone:receptor complex reveals the molecular basis for receptor affinity1. J. Mol. Biol. 1998, 277, 1111−1128. (69) Huang, M.; Syed, R.; Stura, E. A.; Stone, M. J.; Stefanko, R. S.; Ruf, W.; Edgington, T. S.; Wilson, I. A. The mechanism of an inhibitory antibody on TF-initiated blood coagulation revealed by the crystal structures of human tissue factor, Fab 5G9 and TF·5G9 complex1. J. Mol. Biol. 1998, 275, 873−894. (70) Gamble, T. R.; Vajdos, F. F.; Yoo, S.; Worthylake, D. K.; Houseweart, M.; Sundquist, W. I.; Hill, C. P. Crystal Structure of Human Cyclophilin A Bound to the Amino-Terminal Domain of HIV1 Capsid. Cell 1996, 87, 1285−1294. (71) Buckle, A. M.; Schreiber, G.; Fersht, A. R. PROTEINPROTEIN RECOGNITION - CRYSTAL STRUCTURAL-ANALYSIS OF A BARNASE BARSTAR COMPLEX AT 2.0-ANGSTROM RESOLUTION. Biochemistry 1994, 33, 8878−8889. 1121

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122

Article

Journal of Chemical Information and Modeling (72) Fujinaga, M.; Sielecki, A. R.; Read, R. J.; Ardelt, W.; Laskowski, M.; James, M. N. G. Crystal and molecular structures of the complex of α-chymotrypsin with its inhibitor Turkey ovomucoid third domain at 1.8 Å resolution. J. Mol. Biol. 1987, 195, 397−418. (73) Li, Y.; Li, H.; Smith-Gill, S. J.; Mariuzza, R. A. ThreeDimensional Structures of the Free and Antigen-Bound Fab from Monoclonal Antilysozyme Antibody HyHEL-63. Biochemistry 2000, 39, 6296−6309. (74) Kühlmann, U. C.; Pommer, A. J.; Moore, G. R.; James, R.; Kleanthous, C. Specificity in protein-protein interactions: the structural basis for dual recognition in endonuclease colicin-immunity protein complexes1. J. Mol. Biol. 2000, 301, 1163−1178. (75) Mosyak, L.; Zhang, Y.; Glasfeld, E.; Haney, S.; Stahl, M.; Seehra, J.; Somers, W. S. The bacterial cell-division protein ZipA and its interaction with an FtsZ fragment revealed by X-ray crystallography. EMBO J. 2000, 19, 3179−3191. (76) Deisenhofer, J. Crystallographic refinement and atomic models of a human Fc fragment and its complex with fragment B of protein A from Staphylococcus aureus at 2.9- and 2.8-A resolution. Biochemistry 1981, 20, 2361−70. (77) Sauer-Eriksson, A. E.; Kleywegt, G. J.; Uhlén, M.; Jones, T. A. Crystal structure of the C2 fragment of streptococcal protein G in complex with the Fc domain of human IgG. Structure 1995, 3, 265− 278. (78) Kwong, P. D.; Wyatt, R.; Robinson, J.; Sweet, R. W.; Sodroski, J.; Hendrickson, W. A. Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature 1998, 393, 648−659. (79) Sogabe, S.; Stuart, F.; Henke, C.; Bridges, A.; Williams, G.; Birch, A.; Winkler, F. K.; Robinson, J. A. Neutralizing epitopes on the extracellular interferon gamma receptor (IFNgammaR) alpha-chain characterized by homolog scanning mutagenesis and X-ray crystal structure of the A6 fab-IFNgammaR1−108 complex. J. Mol. Biol. 1997, 273, 882−97. (80) Lim, D.; Park, H. U.; De Castro, L.; Kang, S. G.; Lee, H. S.; Jensen, S.; Lee, K. J.; Strynadka, N. C. J. Crystal structure and kinetic analysis of [beta]-lactamase inhibitor protein-II in complex with TEM1 [beta]-lactamase. Nat. Struct. Biol. 2001, 8, 848−852. (81) Hart, P. J.; Deep, S.; Taylor, A. B.; Shu, Z.; Hinck, C. S.; Hinck, A. P. Crystal structure of the human T[beta]R2 ectodomain-TGF[beta]3 complex. Nat. Struct. Biol. 2002, 9, 203−208. (82) Radisky, E. S.; Kwan, G.; Karen Lu, C.-J.; Koshland, D. E. Binding, Proteolytic, and Crystallographic Analyses of Mutations at the Protease−Inhibitor Interface of the Subtilisin BPN‘/Chymotrypsin Inhibitor 2 Complex. Biochemistry 2004, 43, 13648−13656. (83) Bhat, T. N.; Bentley, G. A.; Boulot, G.; Greene, M. I.; Tello, D.; Dall’Acqua, W.; Souchon, H.; Schwarz, F. P.; Mariuzza, R. A.; Poljak, R. J. Bound water molecules and conformational stabilization help mediate an antigen-antibody association. Proc. Natl. Acad. Sci. U. S. A. 1994, 91, 1089−1093. (84) Johnson, R. J.; McCoy, J. G.; Bingman, C. A.; Phillips, G. N.; Raines, R. T. Inhibition of Human Pancreatic Ribonuclease by the Human Ribonuclease Inhibitor Protein. J. Mol. Biol. 2007, 368, 434− 449. (85) Hammel, M.; Sfyroera, G.; Ricklin, D.; Magotti, P.; Lambris, J. D.; Geisbrecht, B. V. A structural basis for complement inhibition by Staphylococcus aureus. Nat. Immunol. 2007, 8, 430−437. (86) Padlan, E. A.; Silverton, E. W.; Sheriff, S.; Cohen, G. H.; SmithGill, S. J.; Davies, D. R. Structure of an antibody-antigen complex: crystal structure of the HyHEL-10 Fab-lysozyme complex. Proc. Natl. Acad. Sci. U. S. A. 1989, 86, 5938−5942. (87) Read, R. J.; Fujinaga, M.; Sielecki, A. R.; James, M. N. Structure of the complex of Streptomyces griseus protease B and the third domain of the turkey ovomucoid inhibitor at 1.8-A resolution. Wadsworth Pub. Co., 1983; pp 4420−33. (88) Schutz, C. N.; Warshel, A. What are the dielectric “constants” of proteins and how to validate electrostatic models? Proteins: Struct., Funct., Genet. 2001, 44, 400−417.

(89) Sheinerman, F. B.; Norel, R.; Honig, B. Electrostatic aspects of protein−protein interactions. Curr. Opin. Struct. Biol. 2000, 10, 153− 159. (90) Wagoner, J.; Baker, N. A. Solvation forces on biomolecular structures: A comparison of explicit solvent and Poisson−Boltzmann models. J. Comput. Chem. 2004, 25, 1623−1629. (91) Li, L.; Li, C.; Zhang, Z.; Alexov, E. On the Dielectric “Constant” of Proteins:Smooth Dielectric Function for Macromolecular Modeling and Its Implementationin DelPhi. J. Chem. Theory Comput. 2013, 9, 2126−2136. (92) Warshel, A.; Sharma, P. K.; Kato, M.; Parson, W. W. Modeling electrostatic effects in proteins. Biochim. Biophys. Acta, Proteins Proteomics 2006, 1764, 1647−1676. (93) Rastelli, G.; Rio, A. D.; Degliesposti, G.; Sgobba, M. Fast and accurate predictions of binding free energies using MM-PBSA and MM-GBSA. J. Comput. Chem. 2009, 31, NA. (94) Weis, A.; Katebzadeh, K.; Söderhjelm, P.; Nilsson, I.; Ryde, U. Ligand Affinities Predicted with the MM/PBSA Method: Dependence on the Simulation Method and the Force Field. J. Med. Chem. 2006, 49, 6596−6606. (95) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the performance of the MM/PBSA and MM/GBSA methods: II. The accuracy of ranking poses generated from docking. J. Comput. Chem. 2011, 32, 866−877. (96) Oehme, D. P.; Brownlee, R. T. C.; Wilson, D. J. D. Effect of atomic charge, solvation, entropy, and ligand protonation state on MM-PB(GB)SA binding energies of HIV protease. J. Comput. Chem. 2012, 33, 2566−2580. (97) Wang, J.; Hou, T. Develop and Test a Solvent Accessible Surface Area-Based Model in Conformational Entropy Calculations. J. Chem. Inf. Model. 2012, 52, 1199−1212. (98) Yang, T.; Wu, J. C.; Yan, C.; Wang, Y.; Luo, R.; Gonzales, M. B.; Dalby, K. N.; Ren, P. Virtual Screening Using Molecular Simulations. Proteins: Struct., Funct., Genet. 2011, 79, 1940−1951. (99) Genheden, S. MM/GBSA and LIE estimates of host−guest affinities: dependence on charges and solvation model. J. Comput.Aided Mol. Des. 2011, 25, 1085−1093. (100) Sun, H.; Li, Y.; Shen, M.; Tian, S.; Xu, L.; Pan, P.; Guan, Y.; Hou, T. Assessing the performance of MM/PBSA and MM/GBSA methods. 5. Improved docking performance using high solute dielectric constant MM/GBSA and MM/PBSA rescoring. Phys. Chem. Chem. Phys. 2014, 16, 22035−22045. (101) Sun, H.; Li, Y.; Tian, S.; Xu, L.; Hou, T. Assessing the performance of MM/PBSA and MM/GBSA methods. 4. Accuracies of MM/PBSA and MM/GBSA methodologies evaluated by various simulation protocols using PDBbind data set. Phys. Chem. Chem. Phys. 2014, 16, 16719−16729. (102) Yao, X.; Ji, C.; Xie, D.; Zhang, J. Z. H. Interaction specific binding hotspots in Endonuclease colicin-immunity protein complex from MD simulations. Sci. China: Chem. 2013, 56, 1143−1151.

1122

DOI: 10.1021/acs.jcim.6b00734 J. Chem. Inf. Model. 2017, 57, 1112−1122