New Parameters for Higher Accuracy in the Computation of Binding

Dec 12, 2016 - Knowing how proteins make stable complexes enables the development of inhibitors to preclude protein–protein (P:P) binding. The ident...
0 downloads 10 Views 2MB Size
Article pubs.acs.org/jcim

New Parameters for Higher Accuracy in the Computation of Binding Free Energy Differences upon Alanine Scanning Mutagenesis on Protein−Protein Interfaces Inês C. M. Simões,‡ Inês P. D. Costa,‡ Joaõ T. S. Coimbra, Maria J. Ramos, and Pedro A. Fernandes* UCIBIO, REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal S Supporting Information *

ABSTRACT: Knowing how proteins make stable complexes enables the development of inhibitors to preclude protein− protein (P:P) binding. The identification of the specific interfacial residues that mostly contribute to protein binding, denominated as hot spots, is thus critical. Here, we refine an in silico alanine scanning mutagenesis protocol, based on a residue-dependent dielectric constant version of the Molecular Mechanics/Poisson−Boltzmann Surface Area method. We have used a large data set of structurally diverse P:P complexes to redefine the residue-dependent dielectric constants used in the determination of binding free energies. The accuracy of the method was validated through comparison with experimental data, considering the per-residue P:P binding free energy (ΔΔGbinding) differences upon alanine mutation. Different protocols were tested, i.e., a geometry optimization protocol and three molecular dynamics (MD) protocols: (1) one using explicit water molecules, (2) another with an implicit solvation model, and (3) a third where we have carried out an accelerated MD with explicit water molecules. Using a set of protein dielectric constants (within the range from 1 to 20) we showed that the dielectric constants of 7 for nonpolar and polar residues and 11 for charged residues (and histidine) provide optimal ΔΔGbinding predictions. An overall mean unsigned error (MUE) of 1.4 kcal mol−1 relative to the experiment was achieved in 210 mutations only with geometry optimization, which was further reduced with MD simulations (MUE of 1.1 kcal mol−1 for the MD employing explicit solvent). This recalibrated method allows for a better computational identification of hot spots, avoiding expensive and time-consuming experiments or thermodynamic integration/ free energy perturbation/ uBAR calculations, and will hopefully help new drug discovery campaigns in their quest of searching spots of interest for binding small drug-like molecules at P:P interfaces.



research in the last decades.12−14 The constitution of P:P complexes is governed by the establishment of van der Waals interactions between nonpolar residues. Although hydrophobic effects have been attributed, a major responsibility in the formation of stable P:P complexes, electrostatic interactions, and hydrogen bonds at the interface may also play an important role.15 Significant progress has been made since it was shown that small drugs bind preferentially to specific residues at protein interfaces.16 These reports highlighted that not all residues at the protein interface are critical for P:P binding but rather a small number of residues that tend to be clustered within densely packed regions.17 These residues, defined as hot spots (HS), typically comprise 10% of the residues at the interface.18 HS are enclosed by energetically less important amino acids that enable the occlusion of solvent molecules, which allows for a tight network of interactions in HS regions.19−21 Thus, these locally and structurally conserved residues act as binding site anchors, contributing to the bulk of

INTRODUCTION Proteins often establish protein:protein (P:P) complexes, and such complexes are involved in processes such as growth, differentiation, intercellular communication, or even in programed cell death.1−4 The inadequate interaction between proteins often contributes to many diseases. In particular, this is important in the case of cancer where P:P interactions link networks that rely on oncogenic signals that can promote tumorigenesis, tumor progression, and metastasis.5 Therefore, modulation of such interactions has a great potential in applications for protein engineering and rational drug discovery.6,7 Despite their importance, the targeting of P:P interactions has been extremely challenging and only a few compounds have reached the market. One of the first clinically successful P:P interaction inhibitors belongs to a class of antiplatelet drugs that inhibit the transmembrane receptor integrin.8 Since then, potent inhibitors of cell function, including cell death and immunity have been developed, and some have already moved into clinical trials.9−11 To understand how a P:P complex associates, as well as its architecture, protein interaction sites have been a matter of debate and intense © 2016 American Chemical Society

Received: June 26, 2016 Published: December 12, 2016 60

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

protein force field. We validated the accuracy of the determination through comparison with experimental determinations and have shown that the recalibrated method provides results that are superior to the original one. We were also able to provide evidence of the applicability of the new parameters in molecular dynamics (MD) simulations with different types of solvent representations (explicit and implicit solvent models) and using an accelerated MD approach. The description is further improved when using MD simulations employing explicit solvent models for generating multiconformations for cASM calculations.

the binding energy, thus stabilizing P:P complexes.22 Therefore, the unravelling of the HS at the protein interface will contribute to the future development of novel P:P interaction inhibitors.23 Alanine scanning mutagenesis (ASM) is commonly employed to measure the per residue binding free energy at protein interfaces and to identify HS. HS are typically defined as residues contributing to an increase in the ΔΔGbinding of at least 4.0 kcal mol−1 upon alanine mutation.24 However, this cutoff value is not consensual in the literature. More recently, some authors proposed that to be considered an HS, the ΔΔGbinding upon alanine mutation should be greater than 1.5 kcal mol−1 while others have considered a 2.0 kcal mol−1 threshold.19,22,25,26 Several approaches have been described for the calculation of ΔΔGbinding. Despite the valuable information that ASM can provide, determining HS residues by experimental ASM techniques is costly and laborious, as each of the mutated proteins needs to be analyzed individually with biophysical methods.24 In the last decades, the growth of computational resources has enabled the study of more complex systems through computational approaches. P:P complexes are among these, and numerous computational methodologies are presently employed in their characterization. We can now perform computational ASM (cASM) with very satisfactory accuracy.27,28 These methods are based on a thermodynamic cycle to obtain free energy differences concerning two similar systems with small differences in their chemical structures.29,30 In this context, an approach represented by the Molecular Mechanics/Poisson−Boltzmann Surface Area (MM/PBSA) was initially applied by Srinivasan et al.31 and was later extended to cASM by Massova et al.29 This method combines an explicit molecular mechanical model to be applied to the solute with a continuum method for the solvation free energy.32 In its original implementation, this method was not as accurate as other approaches, such as thermodynamic integration (TI) or free energy perturbation (FEP). However, its implementation is less demanding than the latter, hence showing more promise for the screening of a large number of structural perturbations.33 It is essential to develop an accurate and predictive in silico protocol that provides fast quantitative estimates for ASM. Thus, efforts have been made to achieve a good compromise between the computational time required and an accuracy capable of reproducing the quantitative free energies obtained experimentally.28 In this context, a variation of the standard cASM methodological approach proposed by Massova et al.29 has been previously developed.28 This method comprises the use of the MM/PBSA approach optimized in a continuum solvent using the Generalized Born model and considering distinct internal dielectric constants depending on the type of amino acid that is mutated by an alanine.28 In this method, simulations were performed by solving the Poisson−Boltzmann (PB) equation with the software DelPhi.34 This methodological approach has been shown to be a very good predictive model to anticipate experimental data, achieving an accuracy equivalent to TI at a fraction of the computational cost.30 Within the spirit of the previous work, we decided to recalibrate the MM/PBSA method using a much larger data set of complexes to better test the method, optimize it, and make the results more universal. We have used the new MM/PBSA implementation in Amber12 (in the form of MMPBSA.py).35 In this study, we have adopted the default parameters to solve the PB equation with MMPBSA.py and employed the ff12SB



METHODOLOGY 1.1. Computational Alanine Scanning Mutagenesis. The MM/PBSA approach was originally developed for the Amber software, and several automated versions have been presented. In this study, the MMPBSA.py script included in the Amber12 software was used to calculate the ΔΔGbinding. The method was originally defined by Kollman et al.,32 and later it was improved by our group28 to implement a more accurate ASM protocol. MM/PBSA calculations can be performed on structures resulting from geometry optimization protocols or from ensembles of structures generated through MD simulations.36 In the MM/PBSA method, the ΔΔGbinding is estimated as the sum of the gas-phase energies, solvation free energies, and entropic contributions, which can be averaged over a set of frames from MD trajectories. The calculations are based on the thermodynamic cycle presented in Scheme 1. Scheme 1. Thermodynamic Cycle To Calculate P:P Complex Binding Free Energy

In Scheme 1, ΔGgas represents the free energy of interaction between proteins A and B in the gas phase, while ΔGpAsolv, ΔGpBsolv, and ΔGcpxsolv represent the solvation free energies of proteins A, B and the complex formed by both. The same cycle is applied for the wild type and the mutant P:P complexes. The ΔΔGbinding is then calculated by the difference between the mutant and the wild-type complexes, as represented in eq 1. ΔΔG binding = ΔG binding

mut

− ΔG binding

wt

(1)

Contributions to the free energy of a molecule include the internal energy (Einternal) of the solute, the electrostatic (Eelec) interactions, the van der Waals (Evdw) interactions, the free energy of solvation calculated as a sum of a polar (Gpolar solvation) and a nonpolar (Gnonpolar solvation) term, and a solute entropic (TS) contribution for the system (eq 2). Gmolecule = E internal + Eelec + Evdw + Gpolar solvation + Gnonpolar solvation − TS 61

(2) DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling Table 1. Data Set of 15 P:P Complexes Considereda complex(PDB ID)

resolution/Å

protein family

HS

mutations

1FCC44 3HFM45 1EMV46 1DQJ47 1VFB48 1JTG49 1TM150 1Z7X51 1CHO52 3SGB53 1KTZ54 1F4755 1BRS56 2GOX57 1AK458

3.20 3.00 1.70 2.00 1.80 1.73 1.70 1.95 1.80 1.80 2.15 1.95 2.00 2.20 2.36

complex (antibody−antigen) complex (antibody−antigen) immune system immune system/hydrolase immunoglobulin/hydrolase hydrolase hydrolase hydrolase/hydrolase inhibitor hydrolase/hydrolase inhibitor serine proteinase−inhibitor cytokine/cytokine receptor cell division proteins endonuclease cell adhesion/toxin viral protein/isomerase

1 10 13 7 5 13 5 5 4 3 9 3 10 2 2

5 20 42 16 26 25 6 14 5 7 19 7 14 2 2

The number of HS is given considering the definition of residues that, upon mutation by alanine, increase the binding free energy by 1.5 kcal mol−1. The total number of mutations is also shown.

a

assessments provided by this study (Table S1, Supporting Information (SI)). The main variable under study (as it has been shown that its careful choice yields more accurate results) is the internal solute dielectric constant. This parameter is not a universal value; it is instead an adjusted value that depends on the interacting residues in a protein.38 As pointed out by some studies, the dielectric constant should be 2 when applying MM/PBSA,39 while in a pKa’s calculation for instance, values higher or equal to 4 are recommended.40,41 In this study, the different dielectric constants will account for the different degrees of relaxation of the interface when different types of residues are mutated. According to previous studies,27,28 we vary the dielectric constant ranging from 1 to 20 to evaluate the sensitivity of the calculations with this parameter. A very important aspect upon complex formation is the solute’s (protein) entropy term. The entropy term of the solute includes translational, vibrational, and rotational components. These calculations are computationally expensive because they require a large number of microstates for quasi-harmonic analysis or well energy minimized structures for normal-mode analysis.35 For this reason and assuming that the ΔΔGbinding is calculated between very similar wild-type and mutants structures upon binding to the same “receptors”, the entropic contribution is considered to mostly cancel when one calculates free energy differences (residual differences may be absorbed into the calibrated dielectric constant). This assumption compensates for the computational time taken in the calculation of the entropic contributions and allows obtaining free energy values that correlate relatively well with the experiment.42 1.2. Model Setup. The models used to perform this work were selected from the SKEMPI database.43 This database comprises a total of 63 P:P complexes and a total of 922 mutations. In SKEMPI, specific residues were substituted by alanine and the corresponding ΔΔGbinding were evaluated experimentally. From these, a total of 15 distinct P:P complexes with a total of 210 mutations were evaluated here using the cASM protocol described previously. HS were defined based in the threshold value of 1.5 kcal mol−1. Table 1 illustrates the data set under study.

In this approach, and for each individual thermodynamic cycle (wild type and mutant), the internal potential term equals zero because the energy of interaction is calculated with the bound and unbound states in the same conformation. This is a consequence of the single trajectory/structure protocol. The polar solvation term was calculated using the PB equation, and the nonpolar solvation term was based on the solvent accessible surface area (SASA).35 The PB equation is given by eq 3, where ε is the dielectric constant, Φ is the electrostatic potential, ρsolute is the fixed charge density, k is the Boltzmann constant, and T is the temperature. Here, ε, ρ, and Φ are all functions of the vector r.37 −∇·[ε( r ⃗)∇ϕ( r ⃗)] = 4πρsolute ( r ⃗) +

⎛ −ziϕ( r ⃗) ⎞ ⎟ ⎝ kT ⎠

∑ zini 0exp⎜ i

(3)

The nonpolar solvation term was estimated based in eq 4, where A is the solvent-accessible surface area, and ο• and b are empirical constants (0.00542 kcal Å−2 mol−1 and 0.92 kcal mol−1, respectively). ΔEnonpolar = o•A + b

(4)

The default method for calculating polar solvation energies includes ionic strength and a different external (solvent) and internal (solute) dielectric constant. As the accuracy of the method is also influenced by the parameters used to solve the PB equation, the default values of f ill-ratio and scale in MMPBSA.py script were tested and shown to be adequate. The f ill-ratio is defined as the portion of the grid occupied by the molecule. The scale is the number of PB grid points per Å, being the resolution of the grid a way to modulate the accuracy of the methodology. The default values are 4.0 and 2.0 for f illratio and scale, respectively. We tested the values of 3.0, 4.0, and 5.0 for f ill-ratio and 2.0, 2.5, and 3.0 for scale. We show that there are no significant differences between the results employing these parameters. Considering that the increment of these values implicates a higher computational effort, and in contrast, the decrease in the values may eventually compromise the accuracy of the method, we have established the default values as the standards to be employed in the subsequent 62

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

long-range electrostatic interactions.68 Covalent bonds comprising hydrogen atoms were constrained using SHAKE, which allowed an integration time step of 2 fs.69 The simulations were then divided into five slabs (10 ns each) from which a total of 250 frames from the last 1.25 ns of each slab were considered for cASM analysis.70 Although the 250 structures of each slab were taken from the last 1.25 ns, the comparison of the structures between the different slabs was representative of the dynamics of the system, as it involved a comparison of structures very much spread in time. To avoid any underestimation of the standard deviation of the free energy calculations due to possible correlation within the data taken from the last 1.25 ns of each slab, we started by averaging the ΔΔGbinding within each of the five slabs and then calculated a confidence interval for the 50 ns average of the binding free energy using the five (safely independent) slabs as input for a t test. We opted to use a 68% confidence interval to be comparable with the typical reports that employ the standard deviation to quantify the statistical convergence. Individual data for the confidence interval of each mutation are presented in detail in Table S3 of the SI. 1.4.2. Implicit Solvent Simulations: Method B. In this method, the solvent was modeled through a modified Generalized born solvation model, with no long-range cutoff. The geometry optimization, MD simulations, and cASM were performed as described in method A. 1.4.3. Accelerated MD: Method C. In this method, the equilibrated structures from method A were subjected to an aMD simulation. This method includes a modification of the potential that, by reducing the heights of potential energy surface barriers, allows the system to evolve more rapidly from minima or metastable conformations where it could become trapped. This enables faster conformational transitions.71 The settings used were based on Pierce et al.’s work.72 The boost energy was applied considering all dihedral angles. We have specified the aMD parameters based on the average dihedral and total potential energies obtained from a short conventional MD simulation, specifically, the first slab (10 ns) of method A. MD simulations and cASM were performed as described in method A. 1.5. cASM Performance. For all the mutations, we compared the ASM experimental values obtained a priori from the SKEMPI database with the calculated values using each of the methods described above. The results have been characterized by UE (unsigned error) and MUE (mean unsigned error) expressed in kcal mol−1. The performance of cASM was further analyzed through the use of two functions. Accuracy is defined as the fraction of correctly predicted residues to total number of residues, as represented in eq 5. In addition, we used another function to evaluate the performance of the methodthe specificity (eq 6).

The selection of the 15 complexes was based on the existence of experimental ΔΔGbinding values upon alanine mutation and on distinctive features such as size and the chemical and structural character of the complexes in order to achieve the maximum diversity within a still manageable set of P:P complexes. The initial models were modeled from the existent X-ray structures on the RCSB Protein Data Bank.59 All residues were included in their physiological state (pH 7), which were determined using the PDB2PQR server60 at http://nbcr-222. ucsd.edu/pdb2pqr_2.0.0/. This server employs the PROPKA methodology, a method that is used in the calculation of pKa residue values.61 In specific cases, some larger discrepancies between experimental and computed values were detected. For these cases, the protein was visually inspected and the protonation states were chosen taking into account the local chemical environment (Table S2, SI). 1.3. Geometry Optimization Protocol. All molecular simulations were performed using the Amber12 software package, with the ff12SB force field and the TIP3P water model.62 The effect of the solvent was described through the use of Hawkins et al.63,64 pairwise Generalized Born model with parameters described by Tsui et al.65 The dissulfide bridges and the hydrogen atoms were added with the LEAP program. Each complex was solvated with explicit TIP3P water molecules that extended 12 Å from any edge of the box to the solute atoms. When necessary, counterions of Cl− or Na+ were added to balance the net charge to zero. All complexes were energy minimized by 1500 steps of steepest descent followed by 1500 steps of conjugated gradient and employing a nonbonded cutoff of 10 Å. After the 3000 steps of geometry optimization, the final structures were subjected to cASM. The MMPBSA.py script implemented in Amber12 was used to compute the ΔΔGbinding for the wild-type complexes and the alanine mutants. The mutants were generated by the simple truncation of the mutated side chain, replacing Cγ with a hydrogen atom. In this analysis, the influence of the modulation of default values of scale and f ill-ratio parameters was evaluated. As the study of the sensitivity of the method with respect to the internal dielectric constant values was one of our main objectives, we have determined the ΔΔGbinding with dielectric constants ranging from 1 to 20. 1.4. Molecular Dynamics Simulations. To study whether the cASM protocol, validated with energy minimized structures, holds comparable accuracy to the one carried out in an ensemble of microstates (provided by MD simulations), we applied the adjusted PB parameters calculated with the geometry optimization protocol to three sets of simulations: (1) one using TIP3P explicit water molecules, (2) another with an implicit solvation model, and (3) a third where we have carried out an accelerated MD with explicit water molecules. 1.4.1. Explicit Solvent Simulations: Method A. In this method, each complex was solvated according to the settings of the geometry optimization protocol. The previously energy minimized complexes were subjected to heating (100 ps) and NPT equilibration (500 ps) stages, prior to a NPT production of 50 ns. The temperature of the systems was raised gradually from 0 to 300 K, followed by constant pressure dynamics (1 atm), employing periodic boundary conditions. The temperature was regulated by the Langevin thermostat, and an isotropic position scaling using the “weak-coupling” variety algorithm was employed to control the pressure.66,67 The Particle Mesh Ewald (PME) method was applied to calculate

Accuracy =

Specificity =

TP + TN TP + FP + TN + FN

TN TN + FP

(5)

(6)

In eqs 5 and 6, TP stands for true positive (predicted HS that are truly HS), FP stands for false positive (predicted HS that are not HS), TN stands for true negative (predicted non-HS that are not HS), and FN for false negative (predicted non-HS that are HS). We have analyzed the uncertainty of the two 63

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

ΔΔGbinding upon alanine mutation (−2.54 kcal mol−1 ≤ ΔΔGbinding ≤ 7.74 kcal mol−1). The majority of our data set comprises polar and charged residues (ca. 78%) which is also beneficial, as these residues constitute most of the identified HS. For this particular study, we have considered that HS have a ΔΔGbinding higher than 1.5 kcal mol−1 upon mutation, as discussed previously. This lower threshold grants a more challenging character to the cASM prediction and increases the number of HS for later interpretation of the accuracy of the protocol. Larger threshold values (2.0 and 4.0 kcal mol−1) would significantly lower the number of HS (Figure S1, SI). Geometry Optimization Protocol Results. Initially, we set out to study the implementation of the method as previously described by our group28 in the defined data set by considering the MMPBSA.py script. In this study, the default Adaptative Poisson−Boltzmann Solver (APBS) was used as an alternative PB solver (instead of the Delphi software) to estimate solvation and nonpolar energy contributions in the calculation of the ΔΔGbinding. As discussed previously and taking into account our previous work, we have set out to evaluate the effects of the choice of the internal (solute) dielectric constant in the cASM prediction. In this method, the solute and the solvent are represented by two distinct dielectric constants. The solvent is normally characterized by a dielectric constant close to a value of 80. However, a protein dielectric constant is not an universal value, and distinct dielectric constants could be employed to reflect the adjustment of the protein to the different degrees of relaxation of the interface when different types of residues are mutated.73 Previously, we have determined that the values of 2 for nonpolar residues, 3 for polar residues, and 4 for charged residues (and histidine) provide results in close agreement with experiment.28,30 However, several studies have suggested that the dielectric constant for proteins could be even larger depending on its nature, and dielectric constants of 10, 11, and 20 are also commonly employed.74−76 Therefore, we tested a set of values ranging from 1 to 20 for the mutations in the 15 complexes composing our data set. In Table 2, we show the results of the MUE between the experimental and computed ΔΔGbinding values, considering the geometry optimization protocol. Only the values of the standard error of the MUE for the set of dielectric constants (2,3,4) and (7,7,11) are

scoring functions through the calculation of standard errors of the mean. For the geometry optimization protocol, standard errors were obtained by dividing the total set of 210 mutations in 21 slabs of 10 mutations each. For the MD protocols, the standard error was obtained based in the value of the ASM performance functions of each time slab (a total of five slabs per simulation protocol).



RESULTS AND DISCUSSION A structurally diverse set of 15 P:P complexes, which comprised a total of 210 mutations was selected for the purposes of this study. These mutations are rich in terms of amino acid nature, as displayed in Figure 1, and in the range of experimental

Figure 1. Representation of the data set employed in this study. A total of 15 complexes and 210 mutations have been considered and distributed according to the alanine mutation in nonpolar, polar, and charged residues. The abundance of each type of residue is represented in percentage and by the number of residues in each classification. The number of HS is given considering the definition of an increase in the ΔΔGbinding by 1.5 kcal mol−1. The residues were classified as HS or non-HS according to their experimental ΔΔGbinding result.

Table 2. Influence of Internal Dielectric Constant in the Computed ΔΔGbinding valuesa mean unsigned error (MUE) of the ΔΔGbinding dielectric constant

total

1.5 kcal mol‑1

residue type

1

2

3

4

5

7

9

11

13

15

20

nonpolar polar charged nonpolar polar charged nonpolar polar charged

2.15 3.07 8.44 1.26 2.48 6.25 2.94 3.99 11.31

1.45 ± 0.14 1.73 4.39 1.12 1.48 3.21 1.74 2.14 5.92

1.33 1.45 ± 0.19 3.11 1.11 1.20 2.34 1.53 1.85 4.11

1.32 1.40 2.53 ± 0.30 1.14 1.10 1.98 1.47 1.86 3.26

1.30 1.31 2.18 1.13 1.05 1.77 1.44 1.72 2.72

1.29 ± 0.14 1.28 ± 0.17 1.81 1.15 1.00 1.57 1.41 1.70 2.13

1.29 1.28 1.60 1.17 0.99 1.44 1.40 1.74 1.82

− − 1.57 ± 0.14 − − 1.45 − − 1.72

− − 1.49 − − 1.36 − − 1.66

− − 1.42 − − 1.24 − − 1.65

− − 1.44 − − 1.24 − − 1.70

a A separate analysis for HS and non-HS was also made (considering a threshold of 1.5 kcal mol−1 for HS definition). The results are presented in terms of the MUE between experimental and computed ΔΔGbinding values obtained from energy minimized structures. We also depict the values of the standard error of the MUE for the set of dielectric constants (2,3,4) and (7,7,11). The results are expressed in kcal mol−1.

64

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

Figure 2. Correlation between computed and experimental ΔΔGbinding values, considering each type of residue. cASM results were obtained with the geometry optimization protocol. The values of dielectric constants used are (A) (7,7,11) and (B) (2,3,4) for nonpolar, polar, and charged residues, respectively. A linear regression for each data set is represented, along with the slope value and correlation coefficient (R2). A linear regression with the slope of unit represents the perfect agreement between experimental and computed ΔΔGbinding values, and dashed lines represent the interval where chemical accuracy (error below 1.0 kcal mol−1) is achieved. As a side note, a charged residue with the coordinates (7.74, 21.32) in (B) was out of the range interval of the y axis representation.

allowed to vary in a way to capture the energetic consequences of the local conformational restructuring concomitant with the mutation.40 An increase in the polarity of the residues is associated with a more extensive conformational reorganization in the neighborhood of the mutated residue. This is mostly significant for charged residues, where the use of alternative methods to calculate ΔΔGbinding outcomes in errors by over 4− 10 kcal mol−1, as reported in several studies.80 HS are mainly constituted by arginine, tryptophan, and tyrosine, and the rest of the residues is found with nearly identical frequencies.19 Considering the relevance of polar and charged residues, it is important to maximize the agreement between the computed and experimental ΔΔGbinding values. This can be achieved and observed by analyzing Figure 2, which depicts the correlation between computed and experimental ΔΔGbinding values. An improved correlation was obtained through the implementation of a higher dielectric constant to nonpolar, polar, and charged residues, with the effect being more pronounced for the charged residues with the approximation of the slope value to the unit (Figure 2). These results support the notion that the organization degree in the vicinity of the mutated residue is influenced by the nature of the residue’s side chain. However, conformational rearrangements caused by the mutation are not treated explicitly in the calculations performed, being the organization of the surrounding environment around this

presented. The errors for the remaining dielectric constants are detailed in Table S4 of the SI. According to the literature, the highest dielectric constants are normally attributable to charged groups, while lower dielectric constants are used for the other residues.77,78 In fact, we observed the same trend in Table 2. For the total nonpolar residues, the MUE appeared to be constant between the dielectric constants of 3 and 9, reaching its lowest value for the dielectric constant of 7. For polar residues, MUE values appeared to be stabilized around the dielectric constant of 7 (the lowest MUE value was reached for the same value). We have decided not to test dielectric constant values higher than 9 for polar and nonpolar residues since above the dielectric constant value of 5 their MUE did not suffer significant modifications. However, MUE values for charged residues appeared to stabilize only for dielectric constant values greater than 15, in accordance to literature information. We can see also in Table 2 that larger MUE values are more commonly associated with residues with experimental ΔΔGbinding values higher than 1.5 kcal mol−1 (experimentally determined HS). The solute’s dielectric constant varies as a function of the interacting residues in a protein. In fact, protein environments of ionizable residues tend to stabilize and to yield more accurate pKas with a protein dielectric constant of 20 (rather than using lower values).79 In this study, when different types of residues are mutated by alanine, the dielectric constant of the protein is 65

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

Table 3. cASM Performance of Different Sets of Internal Dielectric Constant Values in Terms of Accuracy and Specificitya 1.5 kcal mol−1 FP

65% 69% 72% 75%

41 37 33 30

b

TP 35% 31% 28% 25%

66 65 72 69

c

FNd 72% 71% 78% 75%

26 27 20 23

accuracy 28% 29% 22% 25%

68% 70% 75% 75%

± ± ± ±

4% 4% 3% 4%

specificity 65% 69% 72% 75%

± ± ± ±

5% 4% 4% 4%

a This analysis considered the geometry optimization protocol and a threshold value of 1.5 kcal mol−1 for HS definition. We also present the standard error values for accuracy and specificity (in percentage). A t-test shows that the differences in the values of specificity and accuracy between the set of constants (2,3,4) and (7,7,11) are statistically significant. bTN, true negative. cFP, false positive. dTP, true positive. eFN, false negative.

caused a 4% increase in the specificity value. This is probably justified by the values of ΔΔGbinding for charged residues, as these are tendentiously overestimated at lower dielectric constants, which would maintain more or less the same number of predicted TP. As we expected, however, what did change when we increased the dielectric constant from 4 to 11 was the number of predicted TN, as many FP predictions due to the overestimation of the ΔΔG binding became TN predictions. In fact, the best performance of the cASM protocol was achieved only when we changed also the dielectric constants for nonpolar and polar residues. In fact, when we increased the dielectric constant from 2 for nonpolar and 3 for polar residues to 7, maintaining the value of 11 for charged residues, the MUE reached the value of 1.4 kcal mol−1, while the accuracy and specificity reached the values of 75% and 72%, respectively. This was due to the number of FN predictions, which decreased when we incremented the dielectric constant values for this type of residues, which became correctly predicted as TP. Since the specificity is based on TN and FP predictions, this parameter was improved from 65% to 72% with the new set of dielectric constants, (7,7,11), when compared to the previous (2,3,4) set. We should note that the validation of the set of dielectric constants that best mimicked the experimental data was not only based on the analysis of the accuracy, specificity, and MUE results but also in the capacity of the method to correctly predict HS. On the basis of these results, we should note that the set of dielectric constants (7,7,11) demonstrated to give better results than (2,3,4) and (2,3,11). Moreover, from the analysis of Table 3, we showed that the (7,7,20) set caused an increase from 72% to 75% in the specificity value compared to the set of dielectric constants (7,7,11). We verified that this specificity corresponds to a higher capacity of the method to correctly predict the nonHS residues. However, this tendency is also accompanied by a lower capacity of the method to correctly predict HS. We questioned how statistically significant the difference would be between the results obtained with the new proposed dielectric constants values (7,7,11) and with the previous set of dielectric constants (2,3,4).28 A statistical hypothesis test was performed, and we showed that the difference between the ΔΔGbinding obtained with both sets of constants was statistically significant (P value of 0.05) for nonpolar residues (considering the dielectric constant value of 2 versus 7) and charged residues (considering the dielectric constant value of 4 versus 11) (Table S6, SI). This difference probably emanated from the much broader test set. On the basis of these results, we decided to establish the set of dielectric constants (7,7,11) as the optimal set for the subsequent analysis. Molecular Dynamics Simulation Protocol. As the validation of dielectric constants was performed on energy-

mutated residue satisfactorily mimicked by the use of the internal dielectric constant.40 As can be noticed, even after the implementation of higher dielectric constants, some ΔΔGbinding results continue to have a much higher value than the experimental one. We have made box-and-whiskers plots (Figure S2, SI) to define the free energy threshold for outliers. The resulting thresholds were 5.1 and 4.4 kcal mol−1 for the set of dielectric constants (2,3,4) and (7,7,11), respectively (Figure S2.A, SI). The number of outliers was reduced from 15 to 7 when we employed the recalibrated (7,7,11) set of dielectric constants, instead of the (2,3,4) set. The remaining seven outliers could be a consequence, at least in part, of the lack of sampling in the geometry optimization protocol. We also evaluated the performance of the cASM protocol based on its capacity for correctly predicting HS. In Table 3, we show the results for sets of dielectric constants of interest. We compared the performance of the (2,3,4) set of constants that was previously established with the set of constants (2,3,11), (7,7,11), and (7,7,20). The choice of these new sets was based on the performance of each dielectric constant and the respective MUE value. The results for the performance of each dielectric constant are presented in detail in Table S5 of the SI. Therefore, the dielectric constant of 7 showed the lowest MUE (1.29 kcal mol−1) and the highest accuracy (76%) for nonpolar residues. For polar residues, the lowest MUE (1.28 kcal mol−1) and an accuracy of 80% was achieved with a dielectric constant of 7. However, the differences between dielectric constants of 3 and 7 are not statistically significant (Table S6, SI). On the basis of this, both values could have been adopted. Furthermore, there was an increase in the ASM performance for a dielectric constant of 9. However, for this dielectric constant, the method failed in its high capacity to correctly predict HS. For charged residues, the dielectric constant of 11 has neither shown the best accuracy nor the lowest MUE values. However, it is for this constant that the method has shown a better performance on HS prediction (number of TP) compared to the higher dielectric constant values tested. The analysis of Table 3 shows that the set of dielectric constants (2,3,4) resulted in an accuracy and a specificity for the method of about 68% and 65%, respectively, which is accompanied with a MUE value of 1.8 kcal mol−1. Initially, we thought that the poorer results with the (2,3,4) set could be related to the large errors associated with charged residues. In order to test if considering only a higher dielectric constant value for charged residues would improve the predictive capacity of the cASM results, we calculated the accuracy of the method with the set of dielectric constants (2,3,11) for nonpolar, polar, and charged residues, respectively. This alteration improved the accuracy from 68% to 70% and 66

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling Table 4. MUE Results for cASM Data Obtained with Geometry Optimization Protocol (min) and MD Simulationsa mean unsigned error (MUE) of ΔΔGbinding values set of dielectric constants

residue type

min

(7,7,11)

nonpolar polar charged total nonpolar polar charged total

1.36 1.42 1.26 1.35 1.30 1.43 2.70 1.81

(2,3,4)

cMD [0−50] ns 1.01 1.15 1.10 1.09 0.84 1.23 1.94 1.34

± ± ± ± ± ± ± ±

aMD [0−50] ns

0.03 0.02 0.02 0.02 0.02 0.04 0.06 0.02

0.90 1.21 0.93 1.01 0.68 1.24 2.34 1.42

± ± ± ± ± ± ± ±

0.02 0.08 0.02 0.03 0.04 0.11 0.09 0.02

cMD imp [0−50] ns 1.29 1.12 1.28 1.23 1.53 1.59 2.72 1.95

± ± ± ± ± ± ± ±

0.06 0.03 0.03 0.03 0.10 0.07 0.04 0.06

a

The results are presented also per variations of the MD protocol: explicit solvent (cMD), accelerated molecular dynamics (aMD), and implicit solvent (cMD imp). We show the MUE values for the average of the last 1.25 ns of all 10 ns slabs from a total of 50 ns simulations per protocol. We also represent the values of the standard error of the MUE for nonpolar, polar, and charged residues for the set of dielectric constants (2,3,4) and (7,7,11) and for each MD protocol. The results are expressed in kcal mol−1.

Table 5. cASM Performance for Geometry Optimization Protocol (min) and Different MD Protocols: Explicit Solvent (cMD), Accelerated Molecular Dynamics (aMD), and Implicit Solvent (cMD imp)a 1.5 kcal mol−1 b

24% 18%−27% 12%−21% 30%−33%

TP

c

83% 65%−79% 65%−78% 52%−74%

accuracy (7,7,11)

FNd

accuracy (2,3,4)

mean

17% 21%−35% 22%−35% 26%−48%

79% 75%−81% 77%−80% 63%−70%

79% 77% ± 1% 79% ± 0.4% 65% ± 1%

mean 77% 68%−77% 70%−77% 54%−68%

77% 73% ± 1% 75% ± 1% 59% ± 2%

a We show the minimum and maximum prediction values in the five slabs, as well as the accuracy values obtained for the MD protocols (considering the full 50 ns simulation). The analysis was based in the threshold value of 1.5 kcal mol−1 for the HS definition. We also present the standard error for the accuracy (in percentage). A t-test was performed and the differences between the accuracy results obtained with the sets of dielectric constants (2,3,4) and (7,7,11) are statistically significant. bTN, true negative. cFP, false positive. dTP, true positive. eFN, false negative.

Despite the lower MUE obtained with MD protocols, we observed in Table 5 that the geometry optimization protocol still provided the highest (or equal) accuracy (79%) when compared to the average of accuracy for cMD exp (77%), aMD (79%), and cMD imp (65%) (when considering the (7,7,11) set of dielectric constants). The accuracy values corresponding to each 10 ns slab are presented in Table S8 of the SI. The same trend is observed for the set of dielectric constants (2,3,4) to a larger extent. For this set of dielectric constants, the accuracy of the geometry optimization protocol was 77%, while the average accuracy values for cMD exp, aMD, and cMD imp were 73%, 75%, and 59%, respectively. From the analysis of the MUE and accuracy values in MD simulations, it is worth stressing that the dielectric constants that continued to give the best results was the (7,7,11) set, considering nonpolar, polar, and charged residues, respectively. This was shown for the geometry optimization protocol and all MD approaches under evaluation. It has been supported also by the analysis of the ΔΔGbinding values’ distribution for each type of residue (Figure 3). The major difference was observed by the decrease in the MUE values for charged residues when we applied the dielectric constant of 11. Considering the correlation between computed and experimental ΔΔGbinding values, we observed a decrease in the slope of the trend line to values around the unit for all the MD simulations with the (7,7,11) set of dielectric constants. With the set of dielectric constants (7,7,11), we found some differences in the analysis of each type of simulation, with MUE values ranging from 1.04 to 1.15 kcal mol−1 for cMD exp, 0.94 to 1.13 kcal mol−1 for aMD, and 1.07 to 1.30 kcal mol−1 for cMD imp during the 50 ns of MD simulation. The results for aMD led to smaller errors and to a better average accuracy

minimized structures, we set out to check if these dielectric constants did hold as good estimators for cASM in a MD simulation protocol. Three different MD protocols were employed to evaluate the performance of the method: one using explicit water molecules, another with an implicit solvation model, and a third in which we have carried out an accelerated MD with explicit water molecules. Due to the different MD settings to be tested, and due to the computational effort these calculations demanded, we have only analyzed three P:P complexes (1VFB, 1BRS, 1KTZ) from distinct protein families. It is important to stress that the three complexes and the universe of 59 mutations (28% of our initial data set) includes 24 mutations whose residues are experimentally determined HS. This number of HS constitute 41% of the total number of mutations in the subset, which is close to the percentage (42%) of the HS in the initial data set of 15 complexes. On the basis of this, the subset can be considered as a substantial data set. The MUE results considering the different MD protocols are shown in Table 4 (more detailed information for each slab is shown in Table S7 of the SI). Again, we have compared computed and experimental ΔΔGbinding values upon alanine mutation. The results showed in Table 4 are in close agreement with experiment, considering all the MD simulation protocols analyzed. When we compared geometry optimization versus MD, the results showed a lower average deviation from the experimental values for the latter. The previously established dielectric constants (2,3,4) also decreased the MUE of the ΔΔGbinding values in an MD setting (with the exception of the implicit simulation protocol). Still, charged residues on average continued to present a MUE close to 2.0 kcal mol−1. 67

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

Figure 3. Representation of ΔΔGbinding values obtained with all MD simulations: explicit solvent (cMD exp), accelerated molecular dynamics (aMD), and implicit solvent (cMD imp). The results show the comparison for each method between the set of dielectric constants (A) (7,7,11) and (B) (2,3,4). A linear regression with the slope of unit represents the perfect agreement between experimental and computed ΔΔGbinding values, and dashed lines represent the interval where chemical accuracy (error below 1.0 kcal mol−1) is achieved.

Almost 50% of the outliers that occurred during the geometry minimization protocol disappeared with the explicit cMD simulation protocolmore specifically a total of 2 in 5 outliers. This is inherently related with the sampling provided by MD simulations. Even with this improvement, a few outliers continue to persist, and we should note that their UE values were barely altered with the increment of the dielectric constant value during the geometry minimization protocol. A detailed analysis of each of the three outlier cases (a nonpolar residue and two polar residues) led us to suspect that the cause can be related to the rearrangement of the mutated residue at the interface during the simulation or/and to the influence of the presence/ absence of structural waters.81 In fact, the presence of explicit waters may stabilize an unfavorable interaction between two proteins, in particular, polar groups that may interact through water-mediated hydrogen bonds.81 Despite this limitation, the validated method continues to present a good way of performing HS prediction in the majority of the outlier cases, as it frequently overestimates the ΔΔGbinding values. Chemical accuracy is used as a gold standard in the comparison between theoretical and experimental values since it is appropriate to account for the weakest interactions involvedvan der Waals interactions.28 In this work, we have to consider an average experimental error of ca. 0.5 kcal mol−1 for experimental ΔΔGbinding values prediction43,82 and a computational error of 1.1 kcal mol−1 for computational ΔΔGbinding calculations with a cMD approach with explicit

(79%), closely followed by those for cMD (with a 77% average accuracy). Previous work has shown that MD simulations with implicit solvent provided more accurate results. This fact could be due to the smaller simulation time length that was performed then. This in essence could result in a sufficient sampling of the conformational space owing to the absence of viscous damping forces and the slow equilibration of water molecules.65 Here, as much larger simulation time lengths were performed, this method resulted in larger MUE values and an accuracy of only 65%. Although the solvation of the P:P complex with a water box leads to an increase in the CPU time, our results indicated that this is the preferred method. Thus, and for this improved protocol, MD with explicit solvent seems to be the most capable type of MD simulation to reproduce ΔΔGbinding differences. Still, from the analysis of Figure 3, a few residues continued to present ΔΔGbinding values higher than the experimental value. From the analysis of box-and-whiskers plots (Figure S2.B, SI), we established the threshold for an outlier for each set of the dielectric constants tested for the cMD simulation with explicit solvent (only considering the 40−50 ns slab). The resulting thresholds were 3.8 and 2.8 kcal mol−1 for the set of dielectric constants (2,3,4) and (7,7,11), respectively. Again, we observed a reduction in the number of outliers, from 4 to 3 outliers when we applied the new set of dielectric constants. 68

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling solvent. Although the uncertainty in computational approaches is still larger than the experimental one, this error is not solely attributed to the quality of the force fields or the amount of simulation but is partly attributed to the inaccuracies in the reference experimental data. With this scenario, it is difficult to predict the highest expected value of accuracy. If the calculations were exact, the differences between theory and experiment would still lie around 0.2−0.7 kcal mol−1, the value of experimental error reported.82 In this situation, it is difficult to envisage how much would be the computational error because we do not have uncertainties for the individual mutations, just a general expectation that they should lie around 0.2−0.7 kcal mol−1. Even though the sum of the experimental to the computational error is of larger magnitude than the value of 1.0 kcal mol−1 established for the definition of chemical accuracy, we manage to achieve an accurate methodological cASM approach that fulfills the requirements of a good level of accuracy in HS prediction.

understand protein−protein interactions, and this knowledge advances protein engineering and rational drug design.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00378. cASM results considering the variation of scale and fillratio parameters (Table S1). Modulation of the protonation state of specific residues (Table S2). Individual data for confidence interval (68%) per mutation of each complex with MD simulations (Table S3). Representation of the data set employed in this study (Figure S1). Standard error of the mean unsigned error (MUE) of ΔΔGbinding values obtained from energy minimized structures (Table S4). cASM performance for each dielectric constant value and for each type of residue, in terms of accuracy and specificity (Table S5). Statistical significance of the differences between the dielectric constants (Table S6). MUE results obtained with molecular dynamics simulation protocols: explicit solvent (cMD), accelerated molecular dynamics (aMD), and implicit solvent (cMD imp) (Table S7). cASM performance (in terms of accuracy) for the different MD protocols (Table S8). Representation of the unsigned error (UE) results for the cASM data in a box-andwhiskers plot (Figure S2). (PDF)



CONCLUSION We have performed a refinement of the cASM approach28 with the implementation of the MMPBSA.py script. The study was performed on a data set that represents 20% of the mutations present in the SKEMPI database, which makes the conclusions of this study particularly relevant. We have checked the default values of the PB equation parameters scale and f ill-ratio and concluded that they are robust and accurate. We proposed the use of larger values for the residue-dependent dielectric constants in order to better mimic the conformational reorganization when a residue is mutated by an alanine. Compared to the previously used set of dielectric constants (2,3,4), the higher accuracy, smaller MUE, and HS predictive capacity achieved with the new set of dielectric constants(7,7,11)justifies its implementation in the cASM methodology. Geometry optimization leads to a MUE value of 1.4 kcal mol−1 and a success rate of 75% in terms of accuracy. Obviously, we cannot advise the implementation of a geometry optimization protocol to every P:P complex structure. In this case, we have always started from X-ray structures, which, in principle, represent a thermally averaged conformation of the complex. In this context, the calibration of the internal dielectric constant based on energy minimized structures is meaningful, as shown by the subsequent MD simulations. In fact, three different approaches of MD simulation lead to comparable MUE results. The limitations and accuracies of each method varies. MD with explicit solvent was the most accurate method, and the cMD and aMD results were relatively similar. We obtained a minimum MUE of 1.04 kcal mol−1 and a maximum accuracy of 81% for cMD and a minimum MUE of 0.94 kcal mol−1 and a maximum accuracy of 80% for aMD. One of the greatest problems with parametrized calculations is the emergence of outliers, in particular, when the propensity for failure cannot be, at least, suspected from the beginning. Here, very few outliers were found (3 outliers from 59 mutations), which renders the method comfortably robust. With this recalibration, we have achieved an accurate methodological cASM approach that is capable of anticipating experimental results. This protocol can easily be implemented to study various P:P complexes in order to provide more information relatively to the protein’s interface and to promote the capacity of HS prediction. This study is also important to



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected] (P.A.F.). ORCID

Inês C. M. Simões: 0000-0002-9701-9431 Author Contributions ‡

Inês C. M. Simões and Inês P. D. Costa contributed equally.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work received financial support from the following institutions: European Union (FEDER funds POCI/01/ 0145/FEDER/007728) and National Funds (FCT/MEC, Fundação para a Ciência e Tecnologia (FCT) and Ministério da Educação e Ciência) under the Partnership Agreement PT2020 UID/MULTI/04378/2013; NORTE-01-0145FEDER-000024, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF); FCT through project EXCL-II/QEQ-COM/0394/2012. J.T.S.C. thanks FCT for his Ph.D. Grant (SFRH/BD/87434/2012).



ABBREVIATIONS ΔΔGbinding, binding free energy; aMD, accelerated MD; APBS, adaptative Poisson−Boltzmann solver; ASM, alanine scanning mutagenesis; cASM, computational ASM; FEP, free energy perturbation; FN, false negative; FP, false positive; HS, hot spot; MD, molecular dynamics; MM/PBSA, molecular mechanics/Poisson−Boltzmann surface area; MUE, mean unsigned error; P:P, protein−protein; PB, Poisson−Boltzamnn; PME, particle mesh Ewald; SASA, solvent accessible surface 69

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling

(22) Clackson, T.; Wells, J. A. A Hot Spot of Binding Energy in a Hormone-Receptor Interface. Science 1995, 267, 383−386. (23) Cukuroglu, E.; Gursoy, A.; Keskin, O. HotRegion: A Database of Predicted Hot Spot Clusters. Nucleic Acids Res. 2012, 40, D829− 833. (24) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Hot Spots - a Review of the Protein-Protein Interface Determinant Amino-Acid Residues. Proteins: Struct., Funct., Genet. 2007, 68, 803−812. (25) Liu, Q.; Ren, J.; Song, J.; Li, J. Co-Occurring Atomic Contacts for the Characterization of Protein Binding Hot Spots. PLoS One 2015, 10, e0144486. (26) Cheung, L. S.; Kanwar, M.; Ostermeier, M.; Konstantopoulos, K. A Hot-Spot Motif Characterizes the Interface Between a Designed Ankyrin-Repeat Protein and Its Target Ligand. Biophys. J. 2012, 102, 407−416. (27) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Unravelling Hot Spots: a Comprehensive Computational Mutagenesis Study. Theor. Chem. Acc. 2006, 117, 99−113. (28) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Computational Alanine Scanning Mutagenesis - an Improved Methodological Approach. J. Comput. Chem. 2007, 28, 644−654. (29) Massova, I.; Kollman, P. A. Combined Molecular Mechanical and Continuum Solvent Approach (MM-PBSA/GBSA) to Predict Ligand Binding. Perspect. Drug Discovery Des. 2000, 18, 113−135. (30) Martins, S. A.; Perez, M. A.; Moreira, I. S.; Sousa, S. F.; Ramos, M. J.; Fernandes, P. A. Computational Alanine Scanning Mutagenesis: MM-PBSA vs TI. J. Chem. Theory Comput. 2013, 9, 1311−1319. (31) Srinivasan, J.; Cheatham, T. E.; Cieplak, P.; Kollman, P. A.; Case, D. A. Continuum Solvent Studies of the Stability of DNA, RNA, and Phosphoramidate−DNA Helices. J. Am. Chem. Soc. 1998, 120, 9401−9409. (32) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., 3rd Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Acc. Chem. Res. 2000, 33, 889−897. (33) Foloppe, N.; Hubbard, R. Towards Predictive Ligand Design with Free-Energy Based Computational Methods? Curr. Med. Chem. 2006, 13, 3583−3608. (34) Honig, B.; Nicholls, A. Classical Electrostatics in Biology and Chemistry. Science 1995, 268, 1144−1149. (35) Miller, B. R.; Mcgee, T. D., Jr.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E. MMPBSA.Py: an Efficient Program for End-State Free Energy Calculations. J. Chem. Theory Comput. 2012, 8, 3314−3321. (36) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA Methods to Estimate Ligand-Binding Affinities. Expert Opin. Drug Discovery 2015, 10, 449−461. (37) Holst, M. J. The Poisson-Boltzman Equation - Analysis and Multilevel Numerical Solution, 1994. (38) Zhu, K.; Shirts, M. R.; Friesner, R. A. Improved Methods for Side Chain and Loop Predictions Via the Protein Local Optimization Program: Variable Dielectric Model for Implicitly Improving the Treatment of Polarization Effects. J. Chem. Theory Comput. 2007, 3, 2108−2119. (39) Gilson, M. K.; Honig, B. The Inclusion of Electrostatic Hydration Energies in Molecular Mechanics Calculations. J. Comput.Aided Mol. Des. 1991, 5, 5−20. (40) Karp, D. A.; Gittis, A. G.; Stahley, M. R.; Fitch, C. A.; Stites, W. E.; Garcia-Moreno, E. B. High Apparent Dielectric Constant Inside a Protein Reflects Structural Reorganization Coupled to the Ionization of an Internal Asp. Biophys. J. 2007, 92, 2041−2053. (41) Georgescu, R. E.; Alexov, E. G.; Gunner, M. R. Combining Conformational Flexibility and Continuum Electrostatics for Calculating pk(A)s in Proteins. Biophys. J. 2002, 83, 1731−1748. (42) Huo, S.; Massova, I.; Kollman, P. A. Computational Alanine Scanning of the 1:1 Human Growth Hormone-Receptor Complex. J. Comput. Chem. 2002, 23, 15−27.

area; TI, thermodynamic integration; TN, true negative; TP, true positive; UE, unsigned error



REFERENCES

(1) Sowmya, G.; Breen, E. J.; Ranganathan, S. Linking Structural Features of Protein Complexes and Biological Function. Protein Sci. 2015, 24, 1486−1494. (2) Yook, S. H.; Oltvai, Z. N.; Barabasi, A. L. Functional and Topological Characterization of Protein Interaction Networks. Proteomics 2004, 4, 928−942. (3) Castelli, C.; Rivoltini, L.; Rini, F.; Belli, F.; Testori, A.; Maio, M.; Mazzaferro, V.; Coppa, J.; Srivastava, P. K.; Parmiani, G. Heat Shock Proteins: Biological Functions and Clinical Application as Personalized Vaccines for Human Cancer. Cancer Immunol. Immunother. 2004, 53, 227−233. (4) Xenarios, I.; Eisenberg, D. Protein Interaction Databases. Curr. Opin. Biotechnol. 2001, 12, 334−339. (5) Ivanov, A. A.; Khuri, F. R.; Fu, H. Targeting Protein-Protein Interactions as an Anticancer Strategy. Trends Pharmacol. Sci. 2013, 34, 393−400. (6) Thangudu, R. R.; Bryant, S. H.; Panchenko, A. R.; Madej, T. Modulating Protein-Protein Interactions with Small Molecules: the Importance of Binding Hotspots. J. Mol. Biol. 2012, 415, 443−453. (7) Kar, G.; Kuzu, G.; Keskin, O.; Gursoy, A. Protein-Protein Interfaces Integrated into Interaction Networks: Implications on Drug Design. Curr. Pharm. Des. 2012, 18, 4697−4705. (8) Hartman, G. D.; Egbertson, M. S.; Halczenko, W.; Laswell, W. L.; Duggan, M. E.; Smith, R. L.; Naylor, A. M.; Manno, P. D.; Lynch, R. J. Non-Peptide Fibrinogen Receptor Antagonists. 1. Discovery and Design of Exosite Inhibitors. J. Med. Chem. 1992, 35, 4640−4642. (9) Gao, Y.; Wang, R.; Lai, L. Structure-Based Method for Analyzing Protein-Protein Interfaces. J. Mol. Model. 2004, 10, 44−54. (10) Ford, M. L.; Larsen, C. P. Translating Costimulation Blockade to the Clinic: Lessons Learned from Three Pathways. Immunol. Rev. 2009, 229, 294−306. (11) Dubrez, L.; Berthelet, J.; Glorian, V. IAP Proteins as Targets for Drug Development in Oncology. OncoTargets Ther. 2013, 9, 1285− 1304. (12) Janin, J.; Henrick, K.; Moult, J.; Eyck, L. T.; Sternberg, M. J.; Vajda, S.; Vakser, I.; Wodak, S. J. CAPRI: a Critical Assessment of Predicted Interactions. Proteins: Struct., Funct., Genet. 2003, 52, 2−9. (13) Spirin, V.; Mirny, L. A. Protein Complexes and Functional Modules in Molecular Networks. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 12123−12128. (14) Keskin, O.; Gursoy, A.; Ma, B.; Nussinov, R. Principles of Protein-Protein Interactions: What are the Preferred Ways for Proteins to Interact? Chem. Rev. 2008, 108, 1225−1244. (15) Guo, W.; Wisniewski, J. A.; Ji, H. Hot Spot-Based Design of Small-Molecule Inhibitors for Protein-Protein Interactions. Bioorg. Med. Chem. Lett. 2014, 24, 2546−2554. (16) Arkin, M. R.; Wells, J. A. Small-Molecule Inhibitors of ProteinProtein Interactions: Progressing Towards the Dream. Nat. Rev. Drug Discovery 2004, 3, 301−317. (17) Sharma, S. K.; Ramsey, T. M.; Bair, K. W. Protein-Protein Interactions: Lessons Learned. Curr. Med. Chem.: Anti-Cancer Agents 2002, 2, 311−330. (18) Carbonell, P.; Nussinov, R.; Del Sol, A. Energetic Determinants of Protein Binding Specificity: Insights into Protein Interaction Networks. Proteomics 2009, 9, 1744−1753. (19) Bogan, A. A.; Thorn, K. S. Anatomy of Hot Spots in Protein Interfaces. J. Mol. Biol. 1998, 280, 1−9. (20) Fernandez, A.; Scheraga, H. A. Insufficiently Dehydrated Hydrogen Bonds as Determinants of Protein Interactions. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 113−118. (21) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Hot Spot Computational Identification: Application to the Complex Formed Between the Hen Egg White Lysozyme (HEL) and the Antibody Hyhel-10. Int. J. Quantum Chem. 2007, 107, 299−310. 70

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling (43) Moal, I. H.; Fernandez-Recio, J. SKEMPI: a Structural Kinetic and Energetic Database of Mutant Protein Interactions and Its Use in Empirical Models. Bioinformatics 2012, 28, 2600−2607. (44) Sauer-Eriksson, A. E.; Kleywegt, G. J.; Uhlen, M.; Jones, T. A. Crystal Structure of the C2 Fragment of Streptococcal Protein G in Complex with the Fc Domain of Human Igg. Structure 1995, 3, 265− 278. (45) Padlan, E. A.; Silverton, E. W.; Sheriff, S.; Cohen, G. H.; SmithGill, S. J.; Davies, D. R. Structure of an Antibody-Antigen Complex: Crystal Structure of the Hyhel-10 Fab-Lysozyme Complex. Proc. Natl. Acad. Sci. U. S. A. 1989, 86, 5938−5942. (46) Kuhlmann, U. C.; Pommer, A. J.; Moore, G. R.; James, R.; Kleanthous, C. Specificity in Protein-Protein Interactions: the Structural Basis for Dual Recognition in Endonuclease ColicinImmunity Protein Complexes. J. Mol. Biol. 2000, 301, 1163−1178. (47) Li, Y.; Li, H.; Smith-Gill, S. J.; Mariuzza, R. A. ThreeDimensional Structures of the Free and Antigen-Bound Fab from Monoclonal Antilysozyme Antibody Hyhel-63(,). Biochemistry 2000, 39, 6296−6309. (48) Bhat, T. N.; Bentley, G. A.; Boulot, G.; Greene, M. I.; Tello, D.; Dall’Acqua, W.; Souchon, H.; Schwarz, F. P.; Mariuzza, R. A.; Poljak, R. J. Bound Water Molecules and Conformational Stabilization Help Mediate an Antigen-Antibody Association. Proc. Natl. Acad. Sci. U. S. A. 1994, 91, 1089−1093. (49) Lim, D.; Park, H. U.; De Castro, L.; Kang, S. G.; Lee, H. S.; Jensen, S.; Lee, K. J.; Strynadka, N. C. Crystal Structure and Kinetic Analysis of Beta-Lactamase Inhibitor Protein-II in Complex with TEM-1 Beta-Lactamase. Nat. Struct. Biol. 2001, 8, 848−852. (50) Radisky, E. S.; Kwan, G.; Karen Lu, C. J.; Koshland, D. E., Jr. Binding, Proteolytic, and Crystallographic Analyses of Mutations at the Protease-Inhibitor Interface of the Subtilisin BPN′/Chymotrypsin Inhibitor 2 Complex. Biochemistry 2004, 43, 13648−13656. (51) Johnson, R. J.; Mccoy, J. G.; Bingman, C. A.; Phillips, G. N., Jr.; Raines, R. T. Inhibition of Human Pancreatic Ribonuclease by the Human Ribonuclease Inhibitor Protein. J. Mol. Biol. 2007, 368, 434− 449. (52) Fujinaga, M.; Sielecki, A. R.; Read, R. J.; Ardelt, W.; Laskowski, M., Jr.; James, M. N. Crystal and Molecular Structures of the Complex of Alpha-Chymotrypsin with Its Inhibitor Turkey Ovomucoid Third Domain at 1.8 A Resolution. J. Mol. Biol. 1987, 195, 397−418. (53) Read, R. J.; Fujinaga, M.; Sielecki, A. R.; James, M. N. Structure of the Complex of Streptomyces Griseus Protease B and the Third Domain of the Turkey Ovomucoid Inhibitor at 1.8-A Resolution. Biochemistry 1983, 22, 4420−4433. (54) Hart, P. J.; Deep, S.; Taylor, A. B.; Shu, Z.; Hinck, C. S.; Hinck, A. P. Crystal Structure of the Human Tbetar2 Ectodomain-TGF-Beta3 Complex. Nat. Struct. Biol. 2002, 9, 203−208. (55) Mosyak, L.; Zhang, Y.; Glasfeld, E.; Haney, S.; Stahl, M.; Seehra, J.; Somers, W. S. The Bacterial Cell-Division Protein Zipa and Its Interaction with an Ftsz Fragment Revealed by X-Ray Crystallography. EMBO J. 2000, 19, 3179−3191. (56) Buckle, A. M.; Schreiber, G.; Fersht, A. R. Protein-Protein Recognition: Crystal Structural Analysis of a Barnase-Barstar Complex at 2.0-A Resolution. Biochemistry 1994, 33, 8878−8889. (57) Hammel, M.; Sfyroera, G.; Ricklin, D.; Magotti, P.; Lambris, J. D.; Geisbrecht, B. V. A Structural Basis for Complement Inhibition by Staphylococcus Aureus. Nat. Immunol. 2007, 8, 430−437. (58) Gamble, T. R.; Vajdos, F. F.; Yoo, S.; Worthylake, D. K.; Houseweart, M.; Sundquist, W. I.; Hill, C. P. Crystal Structure of Human Cyclophilin A Bound to the Amino-Terminal Domain of HIV1 Capsid. Cell 1996, 87, 1285−1294. (59) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242. (60) Dolinsky, T. J.; Nielsen, J. E.; Mccammon, J. A.; Baker, N. A. PDB2PQR: an Automated Pipeline for the Setup of PoissonBoltzmann Electrostatics Calculations. Nucleic Acids Res. 2004, 32, W665−667.

(61) Bas, D. C.; Rogers, D. M.; Jensen, J. H. Very Fast Prediction and Rationalization of pKa Values for Protein-Ligand Complexes. Proteins: Struct., Funct., Genet. 2008, 73, 765−783. (62) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. Ff14sb: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99sb. J. Chem. Theory Comput. 2015, 11, 3696−3713. (63) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Pairwise Solute Descreening of Solute Charges from a Dielectric Medium. Chem. Phys. Lett. 1995, 246, 122−129. (64) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized Models of Aqueous Free Energies of Solvation Based on Pairwise Descreening of Solute Atomic Charges from a Dielectric Medium. J. Phys. Chem. 1996, 100, 19824−19839. (65) Tsui, V.; Case, D. A. Theory and Applications of the Generalized Born Solvation Model in Macromolecular Simulations. Biopolymers 2000, 56, 275−291. (66) Berendsen, H. J. C.; Postma, J. P. M.; Van Gunsteren, W. F.; Dinola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684−3690. (67) Izaguirre, J. S. A.; Catarello, D. P.; Wozniak, J. M.; Skeel, R. D. Langevin Stabilization of Molecular Dynamics. J. Chem. Phys. 2001, 114, 2090. (68) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: an N × Log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089. (69) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N-Alkanes. J. Comput. Phys. 1977, 23, 327−341. (70) Li, T.; Froeyen, M.; Herdewijn, P. Computational Alanine Scanning and Free Energy Decomposition for E. Coli Type I Signal Peptidase with Lipopeptide Inhibitor Complex. J. Mol. Graphics Modell. 2008, 26, 813−823. (71) Hamelberg, D.; Mongan, J.; Mccammon, J. A. Accelerated Molecular Dynamics: a Promising and Efficient Simulation Method for Biomolecules. J. Chem. Phys. 2004, 120, 11919−11929. (72) Pierce, L. C.; Salomon-Ferrer, R.; Augusto F. de Oliveira, C.; Mccammon, J. A.; Walker, R. C. Routine Access to Millisecond Time Scale Events with Accelerated Molecular Dynamics. J. Chem. Theory Comput. 2012, 8, 2997−3002. (73) Moreira, I. S.; Fernandes, P. A.; Ramos, M. J. Accuracy of the Numerical Solution of the Poisson-Boltzmann Equation. J. Mol. Struct.: THEOCHEM 2005, 729, 11−18. (74) Warwicker, J. pKa Predictions with a Coupled Finite Difference Poisson-Boltzmann and Debye-Huckel Method. Proteins: Struct., Funct., Genet. 2011, 79, 3374−3380. (75) Spassov, V. Z.; Yan, L. A Fast and Accurate Computational Approach to Protein Ionization. Protein Sci. 2008, 17, 1955−1970. (76) Talley, K.; Ng, C.; Shoppell, M.; Kundrotas, P.; Alexov, E. On the Electrostatic Component of Protein-Protein Binding Free Energy. PMC Biophys. 2008, 1, 2. (77) Warshel, A.; Sharma, P. K.; Kato, M.; Parson, W. W. Modeling Electrostatic Effects in Proteins. Biochim. Biophys. Acta, Proteins Proteomics 2006, 1764, 1647−1676. (78) Li, L.; Li, C.; Zhang, Z.; Alexov, E. On the Dielectric ″Constant″ of Proteins: Smooth Dielectric Function for Macromolecular Modeling and its Implementation in Delphi. J. Chem. Theory Comput. 2013, 9, 2126−2136. (79) Antosiewicz, J.; Mccammon, J. A.; Gilson, M. K. The Determinants of pKas in Proteins. Biochemistry 1996, 35, 7819−7833. (80) Reyes, C. M.; Kollman, P. A. Investigating the Binding Specificity of U1A-RNA by Computational Mutagenesis. J. Mol. Biol. 2000, 295, 1−6. (81) Zhu, Y. L.; Beroza, P.; Artis, D. R. Including Explicit Water Molecules as Part of the Protein Structure in MM/PBSA Calculations. J. Chem. Inf. Model. 2014, 54, 462−469. (82) Gapsys, V.; Michielssens, S.; Seeliger, D.; De Groot, B. L. Accurate and Rigorous Prediction of the Changes in Protein Free 71

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72

Article

Journal of Chemical Information and Modeling Energies in a Large-Scale Mutation Scan. Angew. Chem., Int. Ed. 2016, 55, 7364−7368.

72

DOI: 10.1021/acs.jcim.6b00378 J. Chem. Inf. Model. 2017, 57, 60−72