New Computational Approach for External Entropy ... - ACS Publications

May 6, 2016 - notation qR:RL (qL:RL) is used for the part R (L) in the complex. RL since the ...... (13) Ratkova, E. L.; Palmer, D. S.; Fedorov, M. V...
0 downloads 0 Views 1MB Size
Article pubs.acs.org/JCTC

New Computational Approach for External Entropy in Protein− Protein Binding Song-Ho Chong and Sihyun Ham* Department of Chemistry, Sookmyung Women’s University, Cheongpa-ro 47-gil 100, Yongsan-Ku, Seoul 04310, Korea S Supporting Information *

ABSTRACT: Molecular recognition through the noncovalent association of biomolecules is of central importance in biology and pharmacology. Developing reliable computational methods for estimating binding thermodynamic parameters is therefore of great practical value. However, considerable uncertainty remains regarding the external entropy that is associated with the reduction in the external (positional and orientational) degrees of freedom upon complex formation. Here, we present a novel statistical mechanical method for computing the external entropy by extending the energetic approach we have developed for unimolecular processes to association processes. We find that, in contrary to what is postulated in most of the previous methods, intrinsic couplings between the internal and external degrees of freedom of bound complex cannot in general be neglected in the determination of the external entropy. Nevertheless, there exists the best choice of the external coordinates with which those couplings are minimized. With such a choice of the external coordinates, the lowest upper bound of the external entropy is obtained from a tractable expression, which serves as an estimate of the external entropy. Our method can be implemented in a straightforward manner with molecular dynamics simulations, and its applicability is demonstrated through the application to the barnase-barstar complex. ΔSext of biomolecular association.22−33 Yet, many of the previous approaches rely on postulates whose adequacy is not fully examined. For example, although it seems to be taken for granted that ΔSext reflects molecule’s overall translational and rotational motions, how those motions affect ΔSext is not obvious. Indeed, within the classical statistical thermodynamics, the standard binding free energy as well as its enthalpy and entropy components are entirely configuration-space properties,34 and hence, ΔSext should not depend on molecule’s inertia parameters such as the total mass and principal moments of inertia. Furthermore, it is conventionally assumed from the onset that there is no coupling between the internal and external degrees of freedom and that it suffices to take into account the latter in the determination of ΔSext. This might be justified under the rigid rotor approximation35 with which molecule’s partition function can be factored into the internal and external terms. However, this approximation is valid only for infinitesimal vibrations,36 and its applicability to flexible macromolecules such as proteins is questionable. In this paper, we develop a novel method for computing the external entropy for biomolecular complexes. This is done by extending the energetic approach37−39 that we have recently developed for unimolecular processes such as protein folding to association processes. We find from the derived expression that intrinsic couplings between the internal and external degrees of freedom cannot in general be neglected in the determination of the external entropy. On the other hand, the presence of those couplings hinders the calculation of the external entropy

A

ccurate prediction of the standard binding free energy involving complex biomolecules such as proteins is one of the central problems in computational chemistry.1−3 Rigorous methods have been developed to this end such as the free energy perturbation simulations4,5 and the potential-of-meanforce based approaches.6,7 Even more challenging is the determination of each of the enthalpy and entropy components of the standard binding free energy ΔG0bind 0 ΔG bind = (ΔEu + ΔHsolv ) − T (ΔSsolv + ΔSconfig + ΔSext)

(1)

whose knowledge will further advance our understanding of the molecular mechanisms behind the noncovalent biomolecular association. The gas-phase solute energy term (Eu) can be easily obtained from the molecular force field based on ensembles of (typically, simulated) solute configurations at the initial free and final bound states. A variety of calculation methods are available for the solvation free energy (Gsolv = Hsolv − TSsolv) and its enthalpy (Hsolv) and entropy (Ssolv) components such as continuum solvation models,8−10 molecular integral equation theories,11−13 and approaches based on molecular dynamics simulations.14,15 New methodologies are being developed in recent years for the configurational entropy (Sconfig) that go beyond the harmonic approximation and take into account the multiple-minimum structure of the biomolecular configuration space.16−21 However, there remains a challenge in developing computational methods for the external entropy (ΔSext), which is associated with the reduction in the external (positional and orientational) degrees of freedom upon complex formation. Several methods have been proposed and applied to compute © XXXX American Chemical Society

Received: February 16, 2016

A

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

this case, one can interpret M ≡ ∫ dqX IX(qX) as the total number of relevant simulated configurations. For the description of the binding thermodynamics, one must adopt the same M for all the species: the quantity −kBT log[ZRL/ (ZRZL)] can then be understood as the free energy necessary for making M complex molecules from M receptor molecules and M ligand molecules.37 With the factor 1/(8π2V0) required for the standard state correction,34 this indeed leads to the standard binding free energy given in eq 2. We adopt the bond-angle-torsion coordinates as the internal coordinates qX that consist of bond lengths (di), bond angles (θj), and torsion angles (ϕk).40 For each of R and L, these coordinates are defined by referring to three arbitrarily chosen atoms: these atoms in R shall be referred to as atoms a, b, and c, and those in L as atoms A, B, and C. The Jacobian for X = R and L takes the form JX(qX) = ∏i di2 ∏j sin θj, i.e., it consists of “hard” variables only (di and θj) that do not depend significantly on the values of “soft” variables (ϕk).40 For the internal coordinates qRL of the complex RL, one needs to introduce additional six coordinates qext, to be referred to as the “external” coordinates, to specify the position and orientation of L relative to R, such that qRL = (qR:RL, qL:RL, qext). Here, the notation qR:RL (qL:RL) is used for the part R (L) in the complex RL since the conformation of R (L) may change upon complex formation. For the external coordinates, we adopt qext = (raA, θA, θB, ϕA, ϕB, ϕC) using the six chosen atoms from R and L as depicted in Figure 1.41 The Jacobian for the complex is then given by

because of the necessity of evaluating a high-dimensional integral over both the internal and external coordinates. To overcome this difficulty, we also derive a tractable expression for the upper bound for the external entropy that involves an integration over the external coordinates only. We argue that there exists the best choice of the external coordinates with which the couplings between the internal and external degrees of freedom are minimized. Such a choice of the external coordinates provides the lowest upper bound of the external entropy, which serves as an estimate of the external entropy. We implement our method with molecular dynamics simulations, and its applicability to protein−protein binding is demonstrated through the application to the barnase-barstar complex.



THEORY Standard Binding Free Energy. We are interested in the process in which a receptor R and a ligand L form a noncovalently bound complex RL. (In the present article, both R and L refer to proteins, but our formulation applies to arbitrary receptor and ligand molecules.) The statistical mechanics expression for the standard binding free energy ΔG0bind of such a process is well established:34 ⎛ 1 Z ⎞ 0 ΔG bind = −kBT log⎜ 2 0 RL ⎟ ⎝ 8π V Z R Z L ⎠

(2)

Here, kB and T are Boltzmann’s constant and the temperature, respectively; V0 = 1661 Å3 is the standard volume that corresponds to the standard concentration of 1 M; and ZX (X = R, L, or RL) denotes the configuration integral expressed as a function of the molecule’s internal coordinates qX, ZX =

∫X dqX JX (qX) e−βf (q )

JRL (qRL) = JR (qR:RL)JL (qL:RL)Jext (q ext)

(5)

with40,41 Jext (q ext) = raA 2 sin θA sin θB

X

(3)

(6)

In this expression, JX(qX) is the Jacobian for changing variables from Cartesian to internal coordinates.40 f = Eu + Gsolv, often referred to as the effective energy, is a sum of the gas-phase protein energy (Eu) and the solvation free energy (Gsolv), and β = 1/(kBT). The subscript X in the integral symbol (∫ X) means that the integration over qX is restricted to the configurationspace region relevant to each species X. For example, when R and L refer to folded proteins, the integration is restricted to the configuration-space region corresponding to the folded state. For the complex RL, the integration must be carried out only over the configuration-space region associated with the bound state, and the region where R and L are separated and are not forming a complex should be excluded. In practice, this is done by introducing certain criteria for defining the bound state in terms of the root-mean-square-deviation (RMSD) value to a reference structure, the number of native contacts, or the center-of-mass distance between R and L. It is convenient to express the restriction on the configuration-space integral in terms of a step function IX(qX) defined such that it is unity if a configuration qX is relevant (i.e., it satisfies a criterion adopted) and zero otherwise.34 We then rewrite ZX as ZX =

∫ dqX IX(qX)JX (qX) e−βf (q ) X

(4)

Figure 1. External coordinates qext = (raA, θA, θB, ϕA, ϕB, ϕC) in which the distance raA, the angles θA and θB, and the torsion angles ϕA, ϕB, and ϕC are defined with three atoms a, b, and c from receptor R and three atoms A, B, and C from ligand L.

In practical applications, the function IX(qX) is determined by simulated configurations: IX(qX) is unity if qX is a relevant configuration accessed by simulations and zero otherwise. In B

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation Energetic Approach. The energetic approach37−39 we have developed for unimolecular processes such as protein folding aims to compute the ratio of two configuration integrals ZX/ZX′ for two states X and X′ (e.g., folded and unfolded states) of a protein, which is usually a formidable task. The key point of the approach is to introduce the normalized distribution functions WX(f) and WX′(f) of the effective energy f (hence, the “energetic” approach), with which one can define the ratio of the “normalized” configuration integrals Ẑ X/Ẑ X′. Unlike the unnormalized densities of states nX(f) and nX′( f) necessary for ZX/ZX′ that involve protein configurations accessible in states X and X′, one can easily obtain the normalized distributions WX( f) and WX′( f) based on protein configurations accessed by simulations. By defining the step functions IX and IX′ using the same number (= M) of protein configurations, one can derive the relation ZX/ZX′ = Ẑ X/Ẑ X′.38 This allows us to compute the free energy difference between two states via −kBT log(Ẑ X/Ẑ X′). In particular, this approach provides a practical means to compute the change in protein configurational entropy ΔSconfig (see below). We start from rewriting the configuration integral given in eq 4 as ZX =

∫ df nX(f ) e−βf

TSX,config =

∫ df WX(f ) e−βf nX (f )

∫ df nX (f )

=

(8)

Two comments are in order on the derived expression for the external entropy. First, separating the configuration qRL of the complex RL into the internal (qR: RL and qL: RL) and the external (qext) degrees of freedom is always possible with arbitrarily chosen six atoms that define the latter. Furthermore, we have introduced so far no requirement on the internal and external degrees of freedom in deriving eq 17. Therefore, one can conclude that the external entropy as given by eq 17 is independent of the choice of the external coordinates. In particular, it has nothing to do with the center of mass position nor with the principal moments of inertia. Second, while the Jacobian JRL(qRL) for the complex can be factored into the receptor, ligand, and external terms (eq 5), such a factorization does not in general hold for the function IRL(qRL). Thus, the internal and external degrees of freedom are intrinsically coupled in the determination of the external entropy. Upper Bound for the External Entropy. The derived expression for ΔSext, however, is not of practical value since it requires an evaluation of quite a high-dimensional integral. We shall derive here a tractable expression for the upper bound of ΔSext that can be used in practical applications to estimate the external entropy. To this end, we recall qRL = (qR:RL, qL:RL, qext) and introduce IR:RL(qR:RL), IL:RL(qL:RL), and Iext(qext), which are the marginal functions of IRL(qRL). All the marginal functions also take a value of either unity or zero. IR:RL(qR:RL) is unity if a configuration qR:RL for the receptor part is the one found in the bound complex, irrespective of whether the configuration qL:RL for the ligand part and the external coordinates qext are relevant to the bound state or not. IL:RL(qL:RL) and Iext(qext) have a similar physical meaning. Let us choose three arbitrary configurations q(i) RL (i = 1, 2, 3) of the bound state for which IRL(q(i) ) = 1. By the definition of RL (i) the marginal functions, we have IR:RL(q(i) R:RL) = IL:RL(qL:RL) = (i) Iext(qext) = 1. Now, we consider a configuration q̃ RL = (q(1) R:RL, (3) q(2) L:RL, qext ). In general, there is no guarantee that IRL(q̃ RL) = 1.

(9)

nX (f )

∫ dqX IX(qX)JX (qX)

(10)

The two configuration integrals ZX and Ẑ X are related via

∫ dqX IX(qX)JX (qX)]ẐX

ZX = [

(11)

Substituting this into eq 2 yields ⎛ Ẑ ⎞ ⎛ ΔV ⎞ 0 ΔG bind = −kBT log⎜ RL ⎟ − kBT log⎜ 2 ext0 ⎟ ⎝ 8π V ⎠ ⎝ Ẑ R Ẑ L ⎠

(12)

where ΔVext =

∫ dqRL IRL(qRL)JRL (qRL) ∫ dq R IR(qR)JR (qR) ∫ dqL IL(qL)JL (qL)

(13)

We now show that the second term on the right-hand side of eq 12 can be identified as the external entropy. To see this, we first notice the following result that can be derived using the cumulant expansion method (see refs 37 and 38 for the details of the derivation and for the discussion on the configurational entropy): −kBT log Ẑ X = −kBT log

∫ df WX(f ) e−βf

= f ̅ − TSX,config

(16)

⎤ ⎡ ∫ dqRL IRL(qRL)JRL (qRL) 1 ⎥ ΔSext = kB log⎢ 2 0 ⎢⎣ 8π V ∫ dq IR (q )J (q ) ∫ dq IL(q )J (q ) ⎥⎦ R R R R L L L L (17)

with the normalized distribution function W(f): WX(f ) =

(15)

By referring to eq 1 and remembering that f = Eu + Gsolv = Eu + Hsolv − TSsolv (we notice here that f and its components are conformation-dependent and the quantities in eq 1 refer to the averaged quantities), we obtain the following expression for the external entropy:

Let us introduce the normalized configuration integral Ẑ X =

n=3

( −β)n − 1 (Δf )nc n!

⎛ ΔV ⎞ 0 ΔG bind = Δf ̅ − T ΔSconfig − kBT log⎜ 2 ext0 ⎟ ⎝ 8π V ⎠

(7)

∫ dqX IX(qX)JX (qX)δ(f − f (qX))





Here, g ̅ = ∫ df WX( f) g denotes the average taken with respect to WX(f), Δf = f − f ̅, and (Δf )nc is the nth cumulant average.42 For the systems we have investigated previously, including unfolded and intrinsically disordered proteins, the distribution function WX( f) is well approximated by Gaussian, and this holds thanks to the central limit theorem.37,43,44 In this case, the cumulant averages of n ≥ 3 vanish, and hence, the configurational entropy is simply given by TSX,config = (β /2)Δf 2 , providing a simple recipe to compute the configurational entropy. One then obtains from eqs 12 and 14

in terms of the density of states of the effective energy f: nX (f ) =

β 2 Δf − 2

(14)

in which the configurational entropy is given by C

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation (1) (2) On the other hand, there holds IR:RL(qR:RL ) IL:RL(qL:RL ) (3) Iext(qext ) = 1. We therefore obtain the following inequality:

IRL(qRL) ≤ IR:RL(qR:RL)IL:RL(qL:RL)Iext(q ext)

(18)

Since log x is a monotonically increasing function (i.e., log x ≤ log y if x ≤ y), one obtains from eqs 5, 17, and 18 ⎡ ∫ dq )J (q )⎤ I (q R:RL R:RL R:RL R R:RL ⎥ ΔSext ≤ kB log⎢ ⎢⎣ ⎥⎦ ∫ dqR IR(qR)JR (qR) ⎡ ∫ dq )J (q ) ⎤ I (q L:RL L:RL R:RL L L:RL ⎥ ⎢ + kB log ⎢⎣ ⎥⎦ ∫ dqL IL(qL)JL (qL) ⎡ 1 + kB log⎢ 2 0 ⎣ 8π V



⎤ dq ext Jext (q ext)Iext(q ext)⎥ ⎦

Figure 2. X-ray structure (PDB entry 1BRS46) of the barnase-barstar complex.

(19)

We now argue that the first and second terms on the righthand side of eq 19 are vanishingly small as far as we use the same number (= M) of configurations for all the species as we stated above. (In the so-called single-trajectory approach in which the configurations of the receptor R and the ligand L are taken from the complex RL, there hold qR = qR:RL and qL = qL:RL, and hence, the first and second terms are identically zero. Here, we are considering a more general situation in which the configurations of the receptor R and the ligand L may change upon complex formation.) We first notice that JX̃ ≡

1 M

∫ dqX IX(q)JX (qX)

taken from the X-ray measurement (PDB entry 1BRS46). Three crystal structures were reported for the complex (barnase chains are labeled A, B, and C, and the corresponding barstar chains are labeled D, E, and F), and we chose chains A and D for barnase and barstar, respectively. Since Cys40 and Cys82 in the wild-type barstar were mutated to Ala in the crystallographic study, these two Ala residues were mutated back to Cys using the Swiss PDB Viewer.47 Missing residues 1 and 2 in barnase chain A were modeled using those present in chain B, whereas missing residues 64 and 65 in barstar chain D were modeled with those in chain F. The barnase-barstar complex was solvated by 23 477 water molecules and neutralized by 4 counter Na+ ions in a cubic box of the initial side length 95.4 Å. The FF99SB force field48 was used for proteins, and the TIP3P model49 was adopted for water. After the standard minimization and equilibration steps, three independent 1 μs production runs were conducted with different random initial velocities. Short-range nonbonded interactions were cut off at 12 Å. Long-range electrostatic interactions were treated with the particle mesh Ewald algorithm.50 Bonds including hydrogen atoms were constrained using the SHAKE algorithm,51 and the simulations were done with a time step of 2 fs. Temperature and pressure were maintained with Berendsen’s method.52 From each 1 μs trajectory, 200 000 configurations were saved with a 5 ps time interval, which were then subjected to the analysis. The three independent trajectories were used to compute averages (a) and standard errors (ϵ) to be reported in the form of a ± ϵ in the following. Computation of the External Entropy. The computation of the external entropy based on the simulated complex configurations proceeds as follows. We first choose three atoms a, b, and c from barnase (considered as the receptor R) and three atoms A, B, and C from barstar (considered as the ligand L). With these six atoms, we define the external coordinates qext = (raA, θA, θB, ϕA, ϕB, ϕC) as depicted in Figure 1. We then sample the values of qext from the simulated complex configurations and compute ΔS′ext via

(20)

can be considered as an average of the Jacobian over the M configurations. Since the Jacobians for the receptor R and the ligand L are given by hard variables (bond lengths and angles) only, which are usually constrained by harmonic potentials present in the gas-phase energy term (Eu), the average JX̃ for X = R and L should be very close to the Jacobian given by the equilibrium bond lengths and bond angles. In addition, since hard variables do not significantly depend on the values of the soft variables,40 this will hold true even when the receptor and ligand configurations are altered upon binding. Therefore, the ratio appearing in the first and second terms on the right-hand side of eq 19 is close to unity. We thus obtain the following expression for ΔS′ext that provides the upper bound for ΔSext ⎡ 1 ΔSext ≤ kB log⎢ 2 0 ⎣ 8π V



∫ dq ext Jext (q ext)Iext(q ext)⎥⎦ ≡ ΔSext′ (21)

which involves an integration over the external coordinates only. Unlike the general expression (eq 17) for the external entropy ΔSext, the value of ΔSext ′ depends on the choice of qext. Since ΔSext ′ is the upper bound for ΔSext, the best choice for qext is the one that minimizes ΔS′ext. Such a minimum value of ΔS′ext provides the lowest upper bound of ΔSext and will serve as a practical estimate of the external entropy.



⎡ 1 ′ = kB log⎢ 2 0 ΔSext ⎣ 8π V

COMPUTATIONAL METHODS Molecular Dynamics Simulations. We apply our method to compute the external entropy associated with the formation of the barnase-barstar complex (Figure 2). To obtain necessary inputs for this purpose, we carried out molecular dynamics simulations at the temperature T = 300 K and the pressure P = 1 bar using the AMBER14 suite.45 The initial coordinates were

∫0 ∫0 D

∫r

π

sin θB dθB× 2π

rmax

raA 2 draA

min

∫0



dϕA

∫0

∫0

π

sin θA dθA



dϕB

⎤ dϕCIext(raA , θA , θB , ϕA , ϕB , ϕC)⎥ ⎦

(22)

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation which is simply a rewriting of eq 21 with the use of eq 6. Here, rmin and rmax refer to the minimum and maximum of raA values sampled from the simulations. Each integral was evaluated after discretization with equal size bins: the range for raA was divided with bins of 0.2 Å, whereas the range for the angles (θA and θB) and torsion angles (ϕA, ϕB, and ϕC) was discretized with angular bins of 4π/180 (4°). A typical computational time of the six-dimensional integral is ∼10 min using a single core of the Intel Xeon processor (2.6 GHz). We recall here that Iext(qext) is a step function: if two or more configurations yield the values of qext within the specified bin sizes, only one of those configurations contributes to ΔS′ext. For this reason, ΔS′ext is expected to converge if a sufficiently large number of boundstate configurations are taken from the simulations. This procedure yields a value of ΔSext ′ for a particular choice of the six atoms, (a, b, c, A, B, C). This computation is repeated for a number of choices of the six atoms to find (a, b, c, A, B, C) that minimizes ΔSext ′ , which is the best estimate for ΔSext.

Figure 4. Convergence plot of the temperature-weighted external ′ (t) as a function of the simulation time t computed with entropy TΔSext the external coordinates defined with the N-terminal−N-terminal pair (filled black circles) and with the residue pair R83−D39 (filled red squares). Solid lines denote the power-law fit discussed in the main text.



RESULTS The X-ray structure of the barnase-barstar complex was stable during the simulations. In fact, the Cα root-mean-square deviation from the X-ray structure stays around ∼1 Å in all the three independent 1 μs simulation trajectories (Figure S1). Hence, all the simulated structures were considered to belong to the bound state, from which we extracted values of the external coordinates qext necessary to compute ΔS′ext. To gain insights on how ΔSext ′ depends on the choice of the six atoms (three from barnase and three from barstar) that define the external coordinates, we investigated the following two different choices: (i) three main-chain atoms (named N, CA, C in the PDB file) in the N-terminal residue of barnase and the same three atoms in the N-terminal residue of barstar and (ii) three side-chain atoms (NE, CZ, NH) of R83 in barnase and three side-chain atoms (OD, CG, CB) of D39 in barstar forming an interprotein salt-bridge at the binding interface (Figure 3).

entropy in units of energy (kcal/mol), ΔS′ext(t) denotes the value of ΔSext ′ in which the complex structures up to the simulation time t were taken into account, and the curves shown are based on the single representative trajectory. We find that, while TΔS′ext(t) curves exhibit a tendency of convergence at long times, they are not fully converged even with the 1 μs length trajectory. To estimate the long-time limit of TΔSext ′ (t), we fitted each curve with a function of the form at−b + c with the parameters a, b, and c.53 The numerical value of c after the least-squares fit was then taken as the estimate for TΔSext ′ = limt→∞ TΔSext ′ (t). We obtain TΔSext ′ = −2.0 ± 0.1 kcal/mol with the flexible external coordinate and TΔS′ext = −6.8 ± 0.1 kcal/mol with the rigid external coordinates. Since ΔS′ext is the upper bond for the external entropy ΔSext, a smaller (more negative) ΔSext ′ provides a better estimate of ΔSext. Thus, these results indicate that a better estimate of the external entropy is gained with more rigid external coordinates. This is reasonable since there is a multiplicative factor raA2 in eq 22 for ΔSext ′ , whereas no such factor is present for the angle integrals. The smaller and more restricted range for the distance raA (see Figure 1) in turn follows when two atoms a and A are more rigidly bound, e.g., via a hydrogen bond. In this way, our “selection rule” that the more rigid part of the complex provides a better choice of the external coordinate set can be rationalized. Since the interfacial residues are expected to provide the most rigid parts connecting barnase and barster, we examined a number of residue pairs located at the binding interface in search of the external coordinates that minimize ΔS′ext. For this purpose, we selected the residues pairs listed in Table 1 each forming an interprotein hydrogen bond or salt-bridge of substantial (>70%) population during the simulations. We then computed ΔS′ext for each residue pair by defining the external coordinates such that they include atoms involved in the hydrogen bond or salt-bridge. Numerical results of similar magnitude (−6.8 to −5.6 kcal/mol) were obtained for TΔS′ext with those selected residue pairs (Table 1). A somewhat large (less negative) value is obtained, e.g., for the residue pair R59− E76, and we found that this is because this residue pair is located at the edge of the interface and, hence, is rather solvent exposed (Figure S2). From the results listed in Table 1, the best estimate (−6.8 kcal/mol) of the external entropy is provided by the R83−D39 residue pair. Our numerical estimate for the external entropy is in good agreement with the one (−6.6 kcal/

Figure 3. Locations of the N-terminal residues of barnase (colored cyan) and barstar (orange) used in the definition of the flexible external coordinates, and the locations of R83 of barnase (blue) and D39 of barstar (red) forming a salt-bridge used in the definition of the rigid external coordinates.

These two choices shall be referred as the flexible and rigid external coordinates since the terminal residues are among the most flexible parts in proteins and the R83−D39 salt-bridge was found to be quite stable (99.6 ± 0.2% population) during the simulations. We show in Figure 4 the cumulative TΔSext ′ (t) for the flexible and rigid external coordinates as a function of the simulation time t. Here, the temperature T is multiplied to report the E

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

the external coordinates with which those intrinsic couplings between the internal and external degrees of freedom are minimized and (ii) with such a choice the best estimate of the external entropy is obtained from a tractable expression involving the integration over the external coordinates only. The presence of the intrinsic couplings between the internal and external degrees of freedom and the existence of the best external coordinates in the determination of the external entropy can be schematically understood with the help of Figure 5. When the external coordinates are chosen as in Figure

Table 1. Hydrogen-Bond Population and External Entropy for Selected Residue Pairs residue paira

population (%)

K27−T42 R59−D35 R59−E76 R83−Y29 R83−D39 R83−G43 R87−D39 H102−G31 H102−N33 H102−D39 Y103−D39

92.8 ± 1.1 99.0 ± 0.5 85.8 ± 5.8 72.0 ± 11.7 99.6 ± 0.2 91.7 ± 3.8 100.0 ± 0.0 93.1 ± 1.5 97.3 ± 0.6 99.3 ± 0.3 75.6 ± 1.2

TΔS′ext (kcal/mol) −6.1 −5.6 −5.6 −5.7 −6.8 −6.4 −6.4 −6.4 −6.3 −6.4 −6.7

± ± ± ± ± ± ± ± ± ± ±

0.1 0.2 0.2 0.2 0.1 0.0 0.3 0.1 0.1 0.1 0.0

a

Residue pair A−B from barnase (A) and barstar (B) forming an interprotein hydrogen bond or salt-bridge.

mol) computed for the barnase-barstar complex54 using the system-frame method developed in ref 27.



DISCUSSION Understanding entropy of biomolecular systems is a complex and long-standing issue.34,41,55,56 It is associated with the equilibrium probability distribution of the configurations of the whole solute and solvent molecules present in the system. Decomposing the total entropy of the system into the solvent (Ssolv), solute internal (Sconfig), and solute external (Sext) parts and separately handling each of them as if it were an independent component have been widely employed. However, such treatment already involves certain assumptions on the separability of the solvent and solute motions as well as that of the solute internal and external degrees of freedom, which are often not fully spelled out. The separability of the solvent and solute contributions to the entropy generally holds to a good approximation because the time scales for the solvent dynamics (picoseconds for water) are typically much shorter than those for sensible protein conformational changes (nanoseconds or longer). In fact, this time-scale separation is already incorporated in our starting equation (eq 3) in which the solvent degrees of freedom are averaged out and their effects are entirely embedded in the solvation free energy part of the effective potential f = Eu + Gsolv. On the other hand, the separability of the solute internal and external contributions is not obvious. Each of the receptor and the ligand molecules has its own six external degrees of freedom. Since the bound complex has only six external degrees of freedom as a whole, the six degrees of freedom which were originally the external degrees of freedom turn into the internal ones of the complex. (Those six degrees of freedom are referred to as the “external” degrees of freedom with the notation qext in the present article.) There is no obvious reason to assume the time-scale separation of the internal and such “external” degrees of freedom, implying that their intrinsic couplings cannot be neglected in general. We indeed find that, while the solute entropy can formally be decomposed into the internal (Sconfig) and external (Sext) contributions, those intrinsic couplings are retained in the general expression (eq 17) for the external entropy. Of course, introducing certain approximations is inevitable in practical applications since one cannot evaluate a high-dimensional integral involving both the internal and external coordinates. In this regard, our primary findings in the present paper are that (i) there exists the “best” choice of

Figure 5. Schematic illustration demonstrating how the degree of couplings between the internal and external coordinates depends on the choice of the external coordinates. (a) In this choice of the external coordinates defined by three atoms from the receptor (colored cyan) and three atoms from the ligand (orange), one also must inquire the value of the internal coordinate ϕ to judge whether the complex is in the bound state or not. (b) On the other hand, in this alternative choice of the external coordinates, one needs not pay attention to the value of the internal coordinate ϕ′ to make such a judgment.

5a, one also needs to inquire the value of the internal coordinate ϕ in order to judge whether the complex is in the bound state or not. This is an example of the aforementioned couplings because of which, using the notation introduced before, the knowledge on the full coordinates qRL = (qR:RL, qL:RL, qext) is necessary to judge whether IRL(qRL) is unity or not. With such IRL(qRL) that enters into the general expression (eq 17), the value of the external entropy is independent of the choice of the external coordinates. Overlooking this point and inferring that the judgment on the bound state can be done solely in terms of the external coordinates through the marginal function Iext(qext) may lead to a misunderstanding that the external entropy depends on the choice of the external coordinates, and hence, there is ambiguity in this concept. If the external coordinates are chosen as in Figure 5b, on the other hand, one needs not pay attention to values of the internal coordinate ϕ′. In this sense, this is a “better” choice and the resulting Iext(qext) holds the information closer to IRL(qRL). The best external coordinates are the ones with which one needs to pay the least attention to the other parts of the complex.



CONCLUSIONS In this paper, we report a new computational method for estimating the external entropy associated with protein−protein F

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

(10) Homeyer, N.; Gohlke, H. Mol. Inf. 2012, 31, 114−122. (11) Kovalenko, A. In Molecular Theory of Solvation; Hirata, F., Ed.; Kluwer Academic: Dordrecht, The Netherlands, 2003; p 169. (12) Imai, T.; Harano, Y.; Kinoshita, M.; Kovalenko, A.; Hirata, F. J. Chem. Phys. 2006, 125, 024911. (13) Ratkova, E. L.; Palmer, D. S.; Fedorov, M. V. Chem. Rev. 2015, 115, 6312−6356. (14) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. J. Chem. Phys. 2003, 119, 5740−5761. (15) Jorgensen, W. L.; Thomas, L. L. J. Chem. Theory Comput. 2008, 4, 869−876. (16) Chang, C.-E. A.; McLaughlin, W. A.; Baron, R.; Wang, W.; McCammon, J. A. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 7456−7461. (17) Meirovitch, H. J. Mol. Recognit. 2009, 23, 153−172. (18) Killian, B. J.; Kravitz, J. Y.; Somani, S.; Dasgupta, P.; Pang, Y.-P.; Gilson, M. K. J. Mol. Biol. 2009, 389, 315−335. (19) Hensen, U.; Lange, O. F.; Grubmüller, H. PLoS One 2010, 5, e9179. (20) Suárez, E.; Díaz, N.; Suárez, D. J. Chem. Theory Comput. 2011, 7, 2638−2653. (21) Fenley, A. T.; Killian, B. J.; Hnizdo, V.; Fedorowicz, A.; Sharp, D. S.; Gilson, M. K. J. Phys. Chem. B 2014, 118, 6447−6455. (22) Yu, Y. B.; Privalov, P. L.; Hodges, R. S. Biophys. J. 2001, 81, 1632−1642. (23) Luo, H.; Sharp, K. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 10399−10404. (24) Swanson, J. M. J.; Henchman, R. H.; McCammon, J. A. Biophys. J. 2004, 86, 67−74. (25) Minh, D. D. L.; Bui, J. M.; Chang, C.-E.; Jain, T.; Swanson, J. M. J.; McCammon, J. A. Biophys. J. 2005, 89, L25−L27. (26) Carlsson, J.; Åqvist, J. Phys. Chem. Chem. Phys. 2006, 8, 5385− 5395. (27) Irudayam, S. J.; Henchman, R. H. J. Phys. Chem. B 2009, 113, 5871−5884. (28) Deng, N.-J.; Zhang, P.; Cieplak, P.; Lai, L. J. Phys. Chem. B 2011, 115, 11902−11910. (29) Silver, N. W.; King, B. M.; Nalam, M. N. L.; Cao, H.; Ali, A.; Reddy, G. S. K. K.; Rana, T. M.; Schiffer, C. A.; Tidor, B. J. Chem. Theory Comput. 2013, 9, 5098−5115. (30) Huggins, D. J. J. Chem. Theory Comput. 2014, 10, 3617−3625. (31) Huggins, D. J. J. Comput. Chem. 2014, 35, 377−385. (32) Fogolari, F.; Corazza, A.; Fortuna, S.; Soler, M. A.; VanSchouwen, B.; Brancolini, G.; Corni, S.; Melacini, G.; Esposito, G. PLoS One 2015, 10, e0132356. (33) Fogolari, F.; Foumthuim, C. J. D.; Fortuna, S.; Soler, M. A.; Corazza, A.; Esposito, G. J. Chem. Theory Comput. 2016, 12, 1−8. (34) Gilson, M. K.; Given, J. A.; Bush, B. L.; McCammon, J. A. Biophys. J. 1997, 72, 1047−1069. (35) McQuarrie, D. A. Statistical Mechanics; Harper & Row: New York, 1976. (36) Potter, M. J.; Gilson, M. K. J. Phys. Chem. A 2002, 106, 563− 566. (37) Chong, S.-H.; Ham, S. J. Phys. Chem. B 2014, 118, 5017−5025. (38) Chong, S.-H.; Ham, S. J. Phys. Chem. B 2015, 119, 12623− 12631. (39) Chong, S.-H.; Ham, S. Acc. Chem. Res. 2015, 48, 956−965. (40) Go̅, N.; Scheraga, H. A. Macromolecules 1976, 9, 535−542. (41) Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. J. Phys. Chem. B 2003, 107, 9535−9551. (42) van Kampen, N. G. Stochastic Processes in Physics and Chemistry, 3rd ed.; North-Holland: Amsterdam, The Netherlands, 2007. (43) Chong, S.-H.; Park, M.; Ham, S. J. Chem. Theory Comput. 2012, 8, 724−734. (44) Chong, S.-H.; Ham, S. J. Phys. Chem. B 2013, 117, 5503−5509. (45) Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham, T. E., III, Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, S.; Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Monard, G.; Needham, P.; Nguyen, H.;

binding. For this purpose, we extend the energetic approach originally developed for unimolecular processes to association processes. Through such an extension, we obtain the general expression for the external entropy that takes into account intrinsic couplings between the internal and external degrees of freedom. Because of the presence of those couplings, the external entropy is well-defined irrespective of the choice of the external coordinates. On the other hand, the general expression is practically useless because of the difficulty in evaluating a high-dimensional integral over both the internal and external coordinates. To circumvent this problem, we also derive a tractable expression for the upper bound of the external entropy that involves the integration over the external coordinates only. We argue that there exists the best choice of the external coordinates with which the couplings between the internal and external degrees of freedom are minimized. The lowest upper bound of the external entropy can then be computed based on such external coordinates, which serves as an estimate of the external entropy. Our method can be easily implemented with molecular dynamics simulations, and its applicability to other biomolecular association processes such as protein−ligand and protein−DNA bindings is obvious. It will therefore find a wide range of applications including the computer-aided drug design.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00174. Cα root-mean-square deviation from the X-ray structure for the three simulation trajectories and the figure showing the locations of R59 of barnase and E76 of barstar (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: +82 2 710 9410. Fax: +82 2 2077 7321. Funding

This work was supported by Samsung Science and Technology Foundation under Project Number SSTF-BA1401-13. Notes

The authors declare no competing financial interest.



REFERENCES

(1) Deng, Y.; Roux, B. J. Phys. Chem. B 2009, 113, 2234−2246. (2) Wereszczynski, J.; McCammon, J. A. Q. Rev. Biophys. 2012, 45, 1−25. (3) Hansen, N.; van Gunsteren, W. F. J. Chem. Theory Comput. 2014, 10, 2632−2647. (4) Straatsma, T.; McCammon, J. Annu. Rev. Phys. Chem. 1992, 43, 407−435. (5) Kollman, P. Chem. Rev. 1993, 93, 2395−2417. (6) Woo, H.-J.; Roux, B. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 6825−6830. (7) Doudou, S.; Burton, N. A.; Henchman, R. H. J. Chem. Theory Comput. 2009, 5, 909−918. (8) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc. 1990, 112, 6127−6129. (9) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E. Acc. Chem. Res. 2000, 33, 889−897. G

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation Nguyen, H. T.; Omelyan, I.; Onufriev, A.; Roe, D. R.; Roitberg, A.; Salomon-Ferrer, R.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; York, D. M.; Kollman, P. M. AMBER 14; University of California: San Francisco, CA, 2014. (46) Buckle, A. M.; Schreiber, G.; Fersht, A. R. Biochemistry 1994, 33, 8878−8889. (47) Guex, N.; Peitsch, M. C. Electrophoresis 1997, 18, 2714−2723. (48) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (49) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926−935. (50) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089−10092. (51) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327−341. (52) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684−3690. (53) Killian, B. J.; Kravitz, J. Y.; Gilson, M. K. J. Chem. Phys. 2007, 127, 024107. (54) Ulucan, O.; Jaitly, T.; Helms, V. J. Chem. Theory Comput. 2014, 10, 3512−3524. (55) Brady, G. P.; Sharp, K. A. Curr. Opin. Struct. Biol. 1997, 7, 215− 221. (56) Zhou, H.-X.; Gilson, M. K. Chem. Rev. 2009, 109, 4092−4107.

H

DOI: 10.1021/acs.jctc.6b00174 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX