Rotamer-Specific Potentials of Mean Force for Residue Pair Interactions

of Excellence, UniVersite´ de Montre´al, C.P. 6128, Succ. Centre-Ville, Montre´al, Que´bec H3C 3J7, Canada. ReceiVed: June 10, 1999; In Final Form...
5 downloads 0 Views 183KB Size
J. Phys. Chem. B 2000, 104, 1097-1107

1097

Rotamer-Specific Potentials of Mean Force for Residue Pair Interactions Alexandre S. Lemak and John R. Gunn* De´ partment de Chimie, Centre de Recherche en Calcul Applique´ , and Protein Engineering Network of Centers of Excellence, UniVersite´ de Montre´ al, C.P. 6128, Succ. Centre-Ville, Montre´ al, Que´ bec H3C 3J7, Canada ReceiVed: June 10, 1999; In Final Form: October 6, 1999

We present an approach for the development of knowledge-based mean-force potentials for representing residue-residue interactions in proteins at the amino acid level of resolution which take into account the rotamer states of the residues. The rotamer-specific potentials which consist of an additional correction term to the current database-derived statistical potentials are expressed in terms of distance-dependent mean force potentials (MFP). To describe the relative geometry of two residues we used a set of six distances between different pairs of atoms. The distant-dependent MFPs are calculated from computer simulations of each pair of amino acids at the atomic level using the weighted-histogram analysis method. The umbrella sampling for each of 52003 different rotamer pairs was performed with a force-bias Monte Carlo algorithm using molecular mechanics potentials with statistically derived restraint terms. Our simulations show that the MFPs are very sensitive to the rotamer states of the amino acid residues. Preliminary tests of the potential’s performance show that it can reproduce side-chain packing with reasonable accuracy.

1. Introduction Simplified, reduced models of protein molecules have been employed for many years in structure predictions and protein folding studies.1 These models have been developed to achieve the extensive sampling of conformational space required in protein folding simulations and structure predictions with currently available computer resources. Using reduced models is fraught with the problem of devising corresponding effective potentials suitable for representing interactions at a low level of resolution. There are a few different effective potentials2-13 for residue-residue interactions of proteins which are widely used in computer algorithms to fold or recognize protein structure. They are database potentials derived from statistical analysis of known protein structures. Although these statistical potentials demonstrate a rather good performance in protein modeling and folding simulations, they exhibit at the same time certain limitations. In particular, current potentials have often failed to select the native fold or correct fold of an amino acid sequence among many incorrect alternatives. One of the reasons for this is that the current statistical potentials acting between pairs of residues do not take into account the specific conformations of the residue side chains. However, it is well-known that the specific interactions between different side chains are crucial in determining the protein structure and an accurate treatment of these interactions is clearly important in protein structure prediction and protein folding simulations. Here we propose an approach to extend current residue-residue database potentials in order to represent effectively the possible side-chain specific interactions at the same level of resolution. The new potential is therefore still a residue-based backbone energy, but one which takes into account the ability of the backbone conformation to accommodate the packing of the side chains. As the goal is to use the potential in folding simulations with the reduced model, the complication of calculating explicit sidechain coordinates (and the distances among them) is avoided. In this approach, it is supposed that side chains may occupy * Corresponding author.

one of a small number of discrete conformations, called rotamers.14-16 The potential, acting between a pair of residue types whose side chains are in particular rotamer states may be expressed in terms of two contributions: an average, homogeneous part, the same for all possible rotamer states of the residue pair in question, and a rotamer-specific contribution. The nonspecific contribution depends on the residue types and on their relative position in space, and is presumed to be known. The extra contribution reflecting the rotamer-specific interaction energy of two residues is proposed to be calculated by the following procedure. First, for each rotamer pair the distancedependent potentials of mean force (PMF) are calculated by carrying out umbrella sampling simulations of the corresponding dipeptide pairs. Second, a rotamer-specific part is extracted from the calculated PMF and is then used as additional correction term to the current statistical potential. We believe that the way in which the rotamer states of residues make themselves evident in the case of the PMF is the same as it would be in the case of a statistical potential. The primary goal of this paper is to present the approach and to show what the calculated PMF looks like. It also contains the results of two tests of potential performance. 2. Rotamer-Specific Potentials for a Reduced Model of Proteins A. The Reduced Model. In the reduced model of a protein molecule introduced previously,17,18 each residue is described by the positions of six atoms N, H, CR,Cβ,C′, and O, and thus a residue side chain is represented as a single interaction site which coincides with the Cβ atom. The relative orientation in space of two rigid residues i and j can be specified completely by six distances h1 ) |rCR(i) - rCR(j)|, h2 ) |rCβ(i) - rCβ(j)|, h3 ) |rCβ(i) - rC′(j)|, h4 ) |rC′(i) - rCβ(j)|, h5 ) |rCβ(i) - rCβ(j)|, and h6 ) |rN(i) - rCβ(j)|. In this paper the reduced model is extended in order to take into account effectively an internal conformational state of the residue side chains which are not considered explicitly. For this purpose, we make use of the common simplifying assumption that the internal side-chain conforma-

10.1021/jp9919157 CCC: $19.00 © 2000 American Chemical Society Published on Web 01/05/2000

1098 J. Phys. Chem. B, Vol. 104, No. 5, 2000

Lemak and Gunn where Pijrs(h1, ..., h6) stands for the normalized probability of finding two rotamers with distances hi, and Cijrs is an additive constant. In our case the probability Pijrs(h1, ..., h6) is represented as a product of independent probabilities:

TABLE 1: Number of Rotamer States for Each Residue Type residue name

number of rotamers

residue name

number of rotamers

Ala Cys Ser Pro Ile Val Thr Trp Asn Asp

1 3 3 2 9 3 3 9 9 9

His Phe Tyr Leu Glu Gln Met Lys Arg Gly

6 6 6 9 27 27 27 81 81 1

Pijrs(h1, ..., h6) ) Pijrs(h1)‚Pijrs(h2) ... Pijrs(h6)

tions exist in a limited number of canonical shapes, called rotamers. A rotameric state is assigned to each residue by introducing a discrete rotamer degree of freedom for each residue. We make use of the rotamer set defined in refs 19 and 20. The total number of rotameric states of a residue varies from 1 to 81, depending on the residue type (see Table 1), producing a total of 322 different rotamers. B. Rotamer-Specific Potentials. In the extended reduced model of a protein molecule there are 52003 different types of rotamer pairs instead of 210 different types of residue pairs. Now it seems to be unrealistic to derive distance dependent potentials for all rotamer-rotamer interactions by carrying out a statistical analysis3,20 of the database of known protein structures (PDB), because the present body of data is not of sufficient size to derive from it a reliable estimate of such a large set of parameters. We propose an alternative to a statistical analysis approach for developing potentials for rotamer-rotamer interactions in proteins at the residue level of resolution. For this purpose, the interaction energy between two residues is represented as a sum of two terms. The first term corresponds to the average interaction energy of the residues with no rotamer state specified (rotamer nonspecific energy), and the second one is a correction term that reflects the rotamer specificity of a residue-residue interaction. The rotamer nonspecific energy is presumed to be known and can be calculated from commonly used database-derived statistical potentials, while the rotamerspecific correction energy is proposed to be extracted from the distance-dependent potentials of mean force. The PMF for each pair of rotamers can be obtained by carrying out umbrella sampling computer simulations of the corresponding dipeptide pair at atomic resolution (see section 3). These Monte Carlo (MC) simulations provide the probability distributions for each of the six distances h1, h2, ..., h6. The potential of mean force for the hk distance, W(hk), is determined up to an additive constant from the corresponding probability distribution P(hk) for a canonical ensemble in configurational phase space in accordance with the relation W(hk) ) -kBT‚ln P(hk). What is needed is the specific information contained in the PMF of the rotamer pair that distinguishes it from the average, rotamernonspecific interaction of the corresponding residues. The redundant information can be captured from the calculated PMF by a suitably defined reference state2,3 where there is no specific rotamer-rotamer interaction. We choose to define a reference state according to the normalization scheme carried out in the following two steps. (1) Let us consider two residues of type i and j, which are in rotamer states r and s, respectively. The distance-dependent potential of mean force of the residues, which have a relative orientation in space specified by six distances h1, ..., h6, is in general given by

Wijrs(h1, ..., h6) ) -kBT ‚ ln(Pijrs(h1, ..., h6)) + Cijrs

(1)

(2)

where the probabilities Pijrs(hk), k ) 1, ..., 6, are obtained from MC simulation (see section 3). The probabilities Pijrs(hk) are assumed to be normalized, i.e., ∑hk Pijrs(hk) ) 1. Choosing a nonzero value of the additional constant in eq 1 means a renormalization of these probabilities in order to meet some boundary condition. Certain assumptions and approximations are usually made to perform the renormalization.10,21,22 One possible way is to represent the constant Cijrs by Cijrs ) kBT‚ln P ijrs where P ijrs is the reference-state residue pair distribution at infinite separation where the residue interaction is zero. As is shown by our simulations, two dipeptides at a separation distance of more then 15 Å practically do not interact with one another, and it seems to be reasonable to choose the following boundary condition

Pijrs(hm1 , ..., hm6 ) ) P ijrs

(3)

where hmi ) 15 Å for i ) 1, ..., 6. The boundary condition given by eq 3 can be satisfied by rescaling the known probabilities in accord with

P h ijrs(hk) ) Pijrs(hk)/Pijrs(hmk )

(4)

and by representing the overall modified probability in the following form:

h ijrs(h1) ‚ P h ijrs(h2) ... P h ijrs(h6) Pijrs(h1, ..., h6) ) P ijrs ‚ P

(5)

In line with our assumption, the probability corresponding to zero interaction, P ijrs, is the probability of adopting the given rotamer pair out of the equilibrium ensemble of all possible non-inteacting rotamers pairs. In MC simulations of an isolated dipeptide pair these reference probabilities are in fact equal to 1, and thus the corresponding constants in eq 1 are equal to 0. In the case of a protein molecule it is not evident how to choose properly the reference state ensemble. We propose here to calculate P ijrs from the observed frequencies of the rotamer pair in question in different proteins from the PDB. This can be justified with a simple geometrical argument. Since the number of pairs grows quadratically with separation distance, interacting pairs represent a small fraction of the total. Our analysis of the PDB show that non-local pairs are even further depleted at distances of less than 10 Å. This means that although the observed frequency P ijrs is a result of a summation of the corresponding distance-dependent frequencies over all possible separation distances, it represents, in fact, probabilities for rotamer pairs at large separation distances. Therefore, it seems to be reasonable to use the observed frequencies as a right hand side of eq 3. In this work, in section 4, we approximate the probability P ijrs by a product of independent probabilities P ir and P js that residue i occupies rotamer state r and that residue j occupies rotamer state s, respectively. The values of P ir and P js are available from the rotamer library. This approximation is known as the random-mixing approximation,2,10 which assumes that in the absence of interactions, the residues would be uniformly distributed throughout the available volume, and the reference

Mean-Force Potentials for Residue Pair Interactions

J. Phys. Chem. B, Vol. 104, No. 5, 2000 1099

probabilities are proportional to the relative concentrations of the residue types. But this approximation does not take into account the fact that the residues in protein are covalently linked in specific sequence, and more accurate calculation of the probabilities P ijrs is now under way. (2) The relative free energy ∆Wijrs(h1, ..., h6), which contains specific information about the rotamer-rotamer interaction, is calculated in a convenient way as follows:3

∆Wijrs(h1, ..., h6) ) -kBT ‚ ln

(

)

Pijrs(h1, ..., h6) Pij(h1, ..., h6)

(6)

where Pij(h1, ..., h6) is the corresponding reference probability of observing the residue pair (i - j) at a given relative geometry independently of the rotamer states of the residues. The reference probability is given by

Pij(h1, ..., h6) ) )

Pijrs(h1, ..., h6) ∑ r,s

(7)

P ijrs ‚ P h ijrs(h1) ‚ P h ijrs(h2) ... P h ijrs(h6) ∑ r,s

Now, using eqs 4-7 one can express the correction free energy ∆Wijrs(h1, ..., h6) as a difference between Wijrs(h1, ..., h6) and the rotamer non-specific free energy Wij(h1, ..., h6) in accord with

∆Wijrs(h1, ..., h6) 6

)

Wijrs(hk) - Wij(h1, ..., h6) + Wijrs ∑ k)1 )

δWijrs(h1,

..., h6) +

W ˆ ijrs

Figure 1. Schematic diagram of the isoleucine dipeptide with the φ, ψ, χ1, and χ2 dihedral angles labeled.

A. A Dipeptide. A blocked dipeptide contains two peptide bonds and one side chain. We have considered all 20 dipeptide types that differ from one another by their side chains. For example, a schematic diagram of the isoleucine dipeptide is shown in Figure 1. A united-atom representation of a dipeptide was modeled using the CHARMM24 force field. All bond lengths and bond angles in a dipeptide were fixed according to the force-field parameter set. It is also assumed that the torsions ω are fixed, so that both the backbone atom groups CH3-CONH-CH- and -CH-CO-NH-CH3 make a rigid unit within the dipeptide. The torsions φ, ψ, and all χ dihedral angles are the only internal degrees of freedom of a dipeptide that are allowed to be varied. For a given dipeptide, its rotamer state k is fixed “softly” by applying to its variable torsion angles a database-derived restraint potential of the following form:

Ukrot(φ, ψ, χ1, χ2, ..., χnr)

(8) nr

where

) Wijrs(hk)

) -kBT ‚

ln(P h ijrs(hk))

W h ijrs ) -kBT ‚ ln(P ijrs)

(9) (10)

Wij(h1, ..., h6) ) -kbT ‚ ln(Pij(h1, ..., h6)) ) -kBT ‚ ln(

∑ r,s

6

∑ Wrs(hk)/kBT P ijrs ‚ e-k)1 ) ij

(11)

We are reminded that P h ijrs(hk) are probabilities calculated from MC simulations and are rescaled such that P h ijrs(hmk ) ) 1 at m ij m m hk ) 15 Å, so that δWrs(h1 , ..., h6 ) ) 0. Thus, having both the PMF for each of the six distances hk, Wijrs(hk), and the probabilities P ijrs one can calculate the rotomer-specific PMF ∆Wijrs(h1, ..., h6). 3. Simulation Details We have performed a large set of MC simulations with a simple molecular system consisting of two blocked dipeptides considered at the atomic level of resolution. In this section we describe in detail the systems that have been simulated and the method used to extract from these simulations the probability distributions along each of six distances of interest.

∑ i)1

kBTr 2(δχki )2

‚ (χi - χki )2 - kBTr ‚ ln

(

)

Nk(φ,ψ) + n0

Nk(φm,ψm) + n0

(12)

Here, kB is Boltzmann’s constant, Tr, is the temperature, nr is the number of χ angles in the dipeptide, χki and δχki stand for the average value and standard deviation of χi, respectively, Nk(φ,ψ) is a statistical distribution of φ and ψ angles for given rotamer k, (φm, ψm) denotes the backbone conformation at which the function Nk(φ, ψ) has its maximum, and n0 is a constant used to treat sprase data in Nk(φ,ψ). The parameters χki and δχki of the restraint potential were taken from a backboneindependent rotamer library, while the distributions Nk(φ,ψ) were derived from a backbone-dependent rotamer library. For this purpose the rotamer libraries of Dunbrack and Karplus23 were used. The value of Nk(φ,ψ) is obtained from the number of residues of a given type in a database (PDB) which are observed in the kth rotamer state with local backbone conformation (φ, ψ). The restraining potential Ukrot has minimum equal 0 at the dipeptide configuration specified by torsions (φm, ψm, χk1, ..., χnk r). In our simulations the following parameter values were used: Tr ) 5000 K, and n0 ) 10-3. In this case, the χ torsions were restrained by quadratic restraint potentials which have the force constants within the range of 15 to 500 kcal/ (mol radian2), depending on the value of δχki . The restraint potential Ukrot with chosen values of its parameters turns out to be sufficient to preserve a given side-chain rotamer state, as well as to provide a proper sampling of (φ,ψ) angles, at a simulation temperature of 500 K.

1100 J. Phys. Chem. B, Vol. 104, No. 5, 2000

Lemak and Gunn into the region of interest in conformational space. We used the distance h1 ) |R| between the CR atoms of two dipeptides as the degree of freedom along which the umbrella potential is applied. In order to distinguish the distance to which an umbrella potential is applied from the other distances introduced in section 2.A in this and the next section the following designation is used: d ) h1. The umbrella potential was represented as a harmonic restoring potential

Vs(d) )

Figure 2. Schematic representation of the alanine and valine dipeptides along with the molecule-embedded coordinate systems O1x1y1z1 and O2x2y2z2. R is the separation vector between the CR atoms of the dipeptides. The orientation of Val with respect to Ala is defined in terms of the three Euler angles Θ, Φ, and Ψ.

B. A Dipeptide Pair. In order to describe the relative orientation in space of two dipeptides, for each dipeptide a molecule-embedded coordinate system, O1x1y1z1 and O2x2y2z2 for the first and second dipeptides, respectively, is defined (see Figure 2). The origin of the molecule-embedded frame is assumed to be on the CR atom of the dipeptide. The z1 axis is taken along the CR-Cβ bond of the first dipeptide. The y1 axis is chosen such that it is orthogonal to the plane containing CRCβ and CR-N bonds. The x1 axis completes a right-handed system. The same recipe is adopted for defining the axes z2 and y2 of the system O2x2y2z2. The first dipeptide is assumed to be fixed in space. The position in space of the second one is determined uniquely by a set of six coordinates. Our choice of these six coordinates consists of three components Rx, Ry, and Rz of the vector R connecting the CR atoms of the two dipeptides, expressed in the frame O1x1y1z1, and three Euler angles Θ, Φ, and Ψ describing the orientation of the system O2x2y2z2 with respect to O1x1y1z1. Thus, specifying a set of variables {R, Θ, Φ, Ψ; Q1, Q2}, where Q1 and Q2 denote the sets of internal degrees of freedom (φ, ψ, and χ torsions) of the first and second dipeptides, respectively, fully defines a given configuration of the dipeptide pair. The overall configurational energy of the dipeptide pair, E(R, Θ, Φ, Ψ; Q1, Q2), consists of two different components. The first component includes the energy of non-bonded (van der Waals and electrostatic) interactions between all atoms of the system. For these interactions a set of parameters of the CHARMM force field for a united-atom representation of the dipeptides was used. In order to simulate the protein environment, the dielectric constant for charge-charge interactions was set to 4. All non-bonded interactions were truncated at 10.5 Å. The second component of the configurational energy E includes the database-derived restraint potentials Ukrot for each of the two dipeptides. This component serves to preserve the rotamer states of the dipeptides. C. Umbrella Sampling. For each pair of dipeptides with given rotamer states, a number of MC simulations were performed using the umbrella sampling25,26 method. In umbrella sampling, the potential energy of the system E is changed by adding an umbrella potential Vs in order to direct the system

k0 (d - ds)2 2

(13)

where k0 is a force constant. This potential restricts the range of d sampled in a given run to a small value ∆d around the preselected value, ds. In order to sample adequately the configurational space of interest, several simulations with different values of ds that sample different regions of the coordinate of interest have to be performed. To sample configurations {R, Θ, Φ, Ψ; Q1, Q2} from an ensemble defined by the energy, E{R, Θ, Φ, Ψ; Q1, Q2} + Vs(d), and temperature, Ts, a Monte Carlo (MC) walk was performed with the force-bias method.27 The temperature Ts was set to 500 K, and the force constant k0 of the umbrella potential was set to 1 kcal/(mol Å2). This yields windows having a value of ∆d ≈ 2 Å. For a given rotamer pair, eleven separate runs were performed choosing a series of different values of ds for each simulation, namely, ds ) 18, 16, 14, 12, 10, 8, 7, 6, 5, 4, and 3 Å, so that successive simulations sample overlapping regions along d. For each run, the starting configuration was taken from the previous run (for the first run the starting configuration is random). Then, 10000 MC steps were performed for equilibration, followed by 50000 MC steps during which statistics on the variables of interest were collected. D. The WHAM Technique. The weighted histogram analysis method28 (WHAM), which is discussed in this section, correctly combines the results of different simulations which sample overlapping ranges of selected variables in order to obtain the probability distributions of these variables. We have six variables of interest, namely the six distances h1, h2, ..., h6. Fortunately, we have no need to apply the umbrella potential to each of the distances in order to enhance their sampling. It turns out (see section 4.A) that successive simulations with the umbrella potential applied to the distance d provide a proper sampling of the conformational space spanned by all six distances of interest. Therefore, the simulations described in the previous section are suitable for deriving all six probability distribution functions we are interested in, because using the WHAM technique it is possible to obtain reliable estimates for not only the distributions of the degrees of freedom to which the umbrella potential is applied, but also for the distributions of other degrees of freedom that have been sufficiently sampled as well.29 For this purpose the following scheme is employed. Consider M simulations with the sth simulation being carried out at a temperature Ts with the potential energy E{R, Θ, Φ, Ψ; Q1, Q2} + Vs(d) (perturbed system). The variables of interest E, d, h2, ..., h6 are partitioned into bins. To simplify notation we assume that these variables take a discrete set of values that consists of the bin midpoints, so that a particular value of a variable will henceforth denote the corresponding bin. For each simulation s, the indices of the bins E(ti), d(ti), h2(ti), ..., h6(ti) in which the system is found at MC step ti are saved for each step, forming a “trajectory” of the system in the conformational space spanned by the variables E, d, h1, ..., h6. Our simulations of different rotamer pairs show that at a temperature of 500 K

Mean-Force Potentials for Residue Pair Interactions

J. Phys. Chem. B, Vol. 104, No. 5, 2000 1101

the perturbed system samples conformational potential energies E within the range of -15 to 20 kcal/mol. On the other hand, we are interested in the PMF profiles within the distance range of 3 to 18 Å. Approximately uniform sampling within this distance range is achieved with a seies of 11 successive simulations as described in the previous sections. Therefore, in order to represent discretely the variables which are saved in the trajectory during the simulations, 100 bins of equal size of 0.2 Å within the distance range 1 to 21 Å, and 40 bins of equal size of 1 kcal/mol in the energy range -15 to 25 kcal/mol were used for all distance variables and for the energy, respectively. We are interested in the probabilities of finding the system in a particular bin in an ensemble defined by the potential energy E(R, Θ, Φ, Ψ; Q1, Q2) (unperturbed system) and temperature T, set here to T ) 300 K. The probabilities of interest, P(d), P(h2), ..., P(h6), are approximately given by

P(d) ) Z-1 0

∑E h ∑ ,...,h 2

P(hj) ) Z-1 0

Ω(E, d, h2, ..., h6) exp{-E/kBT}

(14)

6

∑E ∑d h∑*h Ω(E, d, h2, ..., h6) × exp{-E/kBT} k

j

M

∑ ∑ENs(E, d) s)1

Ω*(d) )

-V (d)/k T nsz-1 ∑ s e s)1 s

zs )

s

2

Ω(E, d, h2, ..., h6) exp{-E/kBT} (16)

where Ω(E, d, h2, ..., h6) is a generalized density of states of the unperturbed system with given values (E, d, h2, ..., h6), Z0 is a normalization factor (the partition function of the unperturbed system) which satisfies the condition that the probability of finding the system in any of the bins be to equal to 1, and summations are over all corresponding bins. The generalized density of states can be estimated from each simulation s of the perturbed system. Ω(E, d, h2, ..., h6) is actually a geometrical property of the configurational space of the unperturbed system and, therefore, it is independent of the umbrella potential applied, Vs, and the simulation temperature, Ts. An optimal estimate of the density of states can be derived by combining the results of all M simulations, and is determined from the WHAM equations,28 given in our case by

Ns(E, d, h2, ..., h6) ∑ s)1 M

∑ s)1 zs )

E ∑δd(td ) ‚ δE(t ) i

ti)1

(21)

i

where the summation runs over all MC steps ti of simulation s. A self-consistent solution Ω*(d) of the coupled eqs 19, 20 can be determined by using a convenient iterative scheme. Then, Ω*(d) is used to define the following “weight” function M

∑ ∑ENs(E, d) s)1 (22)

Ω*

Thus, to determine Γ(d) we need to collect statistics only on variables E and d. All probabilities of interest can then be calculated from one pass through all saved trajectories by weighting each configuration by a corresponding factor in accord with the folowing expressions

P(d) ) Z-1 0

M ns

(1/k T -1/k T)E(t ) d δd(t ‚ Γ(d(ti)) ∑ ∑ )e s)1t )1 B s

B

i

i

(23)

i

P(hk) ) Z-1 0 ‚

M ns

δhh (t ) ‚ e(1/k T -1/k T)E(t ) ‚ Γ(d(ti)) ∑ ∑ s)1t )1 k k i

B s

B

i

(24)

i

and M ns

Z0 )

e(1/k T -1/k T)E(t ) ‚ Γ(d(ti)) ∑ ∑ s)1t )1 B s

B

i

(25)

i

M

Ω(E, d, h2, ..., h6) )

(20)

ns

Ns(E, d) )

(15)

6

B s

Here Ns(E, d) ) ∑h2,...,h6Ns(E, d, h2, ..., h6) is the number of counts, which is calculated from a trajectory saved in simulation s as follows:

Γ(d) )

∑E ∑d h ∑ ,...,h

B s

∑d Ω*(d) e-V (d)/k T , s ) 1, ..., M

and

Z0 )

(19)

M

(17) -(E+Vs(d))/kBTs nsz-1 s e

∑E ∑d allh ∑ Ω(E, d, h2, ..., h6)e-(E+V (d))/k T s

B s

(18)

k

where s ) 1, ..., M and Ns(E, d, h2, ..., h6) is the number of times in which the perturbed system is found in a particular bin (E, d, h2, ..., h6) during simulation s, and ns is the number of snapshots taken for the sth simulation. Ns(E, d, h2, ..., h6) can be calculated from a saved trajectory (see below). Because of the umbrella potential Vs is applied only to one variable, the distance d, eqs 17-18 can be rewritten in a simpler form which allows the calculations to be performed efficiently. By multiplying eq 17 by e-E/kBTs, and summing over all bins corresponding to variables h2, ..., h6, and E we obtain

From these probabilities the corresponding potentials of mean force were generated. E. The PMF Fitting. From the results of our MC simulations a set of 52003 different PMF profiles, each represented by 100 discrete values, were obtained. In order to keep and use this very large collection of data more efficiently, a functional form was fitted to each PMF profile. It turns out that the commonly used Morse function does not fit well the collection of data which exhibits a wide variety of PMF profiles. A better fitting was achieved for the following functional form

f(x) ) e-(a1+a2‚x) - e-(a3+a4‚x) + a5

(26)

where a1, ..., a5 are fitting parameters. Thus, after fitting each PMF profile is represented by only five parameters. 4. Results and Discussion A. The Sampling. Figure 3 shows the sampling histograms for the six coordinates R, Θ, Φ, Ψ which describe the relative

1102 J. Phys. Chem. B, Vol. 104, No. 5, 2000

Lemak and Gunn

Figure 3. Observed distributions of the six variables which describe the relative geometry of isoleucine and tyrosine dipeptides, during three different MC simulation runs with the umbrella potential Vs(d) ) 0.5(d - ds)2 applied to h1, the distance between CR atoms. Each run consists of 60000 steps and differs from the others in the value of the distance ds in the umbrella potential: ds ) 17 Å (solid line), 7 Å (dashed line), and 3 Å (dotted line). Distributions are shown for (A) the distance h1, (B) the polar angle alpha, (C) the azimuthal angle γ, (D) the Euler angle Θ, (E) the Euler angle φ, and (F) the Euler angle ψ.

geometry of the dipeptide pair Ile-Tyr in three runs with different umbrella potentials. Here the vector R is represented by its spherical coordinates h1, R, and γ defined in the O1x1y1z1 frame. In the run with an umbrella potential which keeps the two dipeptides at a large separation distance (ds ) 18 Å), the dipeptides are practically not in contact and the sampling of the all angle variables R, γ, Θ, Φ, Ψ is to a quite good approximation uniform. The sampling distributions for these variables begin to deviate noticeably from uniform only in the simulation runs with the dipeptides kept at a separation distance of less than 8 Å. During such runs the system still visits all regions of conformational space spanned by these variables. Finally, in the runs with closely-spaced dipeptides (ds ≈ 3-5 Å) there are regions of conformational space which are impossible for the system to visit to because of the steric clashes. However, even during these runs, the system appears not to be trapped for a long time in a particular conformation. It seems that in this case during a simulation run the system has visited all allowed regions of the conformational space. This indicates that in our simulations with the chosen values of the parameters we achieved a proper sampling of the region of conformational space spanned by the relative orientations. Let us now discuss another related question, the sampling of the distance variables h1, h2, ..., and h6. As we have already noted, an approximately uniform sampling of the distance h1 within the range of 3 to 18 Å is achieved by carrying out 11 successive simulations with different umbrella potentials applied to the coordinate h1. To make sure that this series of simulations provides a proper uniform sampling of all other five distances of interest as well, we perform the following test. For a given rotamer pair of Ile and Tyr dipeptides, we have carried out five additional series of simulations that differ from one another in the choice of variable to which the umbrella potential is applied. We have tried each of the six distances as the chosen variable

for the umbrella potential. From each of this series of simulations the PMFs for all six distances were calculated. It was found that for a given variable, the calculated PMF profile is sufficiently similar in all six series of simulations. This suggests that the proper uniform sampling of all six distances of interest within the considered range of 3 to 18 Å is very likely to be achieved in the simulations with the umbrella potential being applied to the distance h1. This is what is expected, since in the dipeptide model used the atoms CR, Cβ, N, C′ make a rigid tetrahedron, and, therefore, fixing the distance h1 restricts all other five distances h2, ..., and h6 to within some range that does not differ noticeably from that of h1, and the shifting of the system along h1 yields an ensemble which has also been shifted in a similar way along each of the other five distances. B. Distance-Dependent Potentials of Mean Force. Figure 4 shows a comparison of the potentials of mean force along different types of inter-atomic distances, giving an idea of what the potentials of mean force Wijrs(hk) look like. The obtained PMFs demonstrate a strong dependence on the rotameric state of the two dipeptides considered. To characterize the variety of PMFs more concretely we describe a PMF profile by both the separation distance at which the potential takes its minimum, Xmin, and the value of this minimum, 0. Figures 5A and 5B show representative plots of Xmin and 0 as functions of the rotameric states of two interacting dipeptides. One can see from these plots that for a particular dipeptide pair the optimal separation distance for one rotamer pair can differ by more then 2 Å from that for another rotamer pair. The potential depth also can vary considerably (by a factor of about 8-10) as the rotamer state of the interacting dipeptides is changed. The rotamer dependence of the PMFs can be described in compact form by calculating for a particular dipeptide pair the standard deviations δXmin ) (〈X2min〉 - 〈Xmin〉2)0.5 and δ0 ) (〈20〉 - 〈0〉2)0.5, where

Mean-Force Potentials for Residue Pair Interactions

J. Phys. Chem. B, Vol. 104, No. 5, 2000 1103

Figure 5. Rotamer dependence of the parameters Xmin and 0 of a potential of mean force for h1. Xmin is the distance at which the potential has its minimum 0. The values of (A) Xmin and (B)0 are plotted for all possible rotamer pairs of the Ile and Gln dipeptides. Figure 4. Potentials of mean force for the isoleucine and tyrosine dipeptides. (A) Fitted PMFs Wrs(h1) as a function of h1, the distance between the CR atoms of the dipeptides are plotted for different rotamer pairs: r ) 2 and s ) 1 (thick line), r ) 2 and s ) 6 (dot-dash line), r ) 3 and s ) 2 (dashed line), r ) 3 and s ) 3 (dotted line), r ) 3 and s ) 4 (thin line). (B) Data for PMFs W12(hi) along different interatomic distances h, between CR atoms (dotted line), between CR atoms (solid line), between Cβ and C′ atoms (dashed line), and between N and Cβ atoms (dot-dashed line).

averaging is performed over all possible rotamer pairs of the dipeptides. For example, in the case of the Ile-Gln dipeptide pair presented in Figures 5A and 5B, we have δXmin ) 0.38 Å and δ0/〈0〉 ) -0.25. Figures 6A and 6B show both the mean values and the standard deviations of Xmin and 0 calculated from the corresponding PMFs along the separation distance h1 for all possible dipeptide pairs. As for the PMFs along other five distances of interest, we note that explanation of the different PMF profiles has revealed the following general feature: for a given particular rotamer pair, the PMF profiles along different distances differ from one another by the values of Xmin, while having practically the same value of 0 (see Figure 4B). Namely, for a given rotamer pair, Xmin can vary with the distance type by about 2 Å, while the value of the relative change in 0 is on average about 0.05. As we have pointed out in section 2, the PMFs, Wijrs(hk), discussed above contain redundant information, and we have proposed to use in calculations not these PMFs but their rotamerspecific, correction part, δWijrs(h1, ..., h6) defined by eq 8 (see also Figure 7A). Since the average probability distribution Pij(h1, ..., h6) cannot be factorized, the rotamer-specific potential δWijrs(h1, ..., h6) cannot be represented as a sum of corresponding independent potentials for each of the distances hk, k )

1, ..., 6. To give an idea of what the correction, rotamer-specific PMF looks like, different one-dimensional sections along the h1 coordinate of the different six dimensional hypersurfaces δWijrs(h1, ..., h6) are presented in Figures 7B and 7C. One can see that the PMF profiles along h1 obtained for two dipeptides with the same relative orientation have very different shapes depending on the rotamer states of the dipeptides. On the other hand, for a given rotamer pair, a change in relative orientation of the dipeptides results in large changes in the PMF profile only for small separation distances (