Characterization of Hydrogen Bonding in a Continuum Solvent Model

A simple approach for calculating hydrogen bonding (H-bonding) effects in molecular mechanics simulations of peptides and proteins is presented for us...
4 downloads 0 Views 124KB Size
6490

J. Phys. Chem. B 2000, 104, 6490-6498

Characterization of Hydrogen Bonding in a Continuum Solvent Model Sergio A. Hassan, Frank Guarnieri, and Ernest L. Mehler* Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029 ReceiVed: NoVember 3, 1999; In Final Form: March 7, 2000

A simple approach for calculating hydrogen bonding (H-bonding) effects in molecular mechanics simulations of peptides and proteins is presented for use in the framework of the continuum solvent model described in the previous paper. In this approach, the solvated macromolecule is treated as a three-component dielectric system consisting of the solvent, bulk protein, and proton acceptor media. The hydrogen bond (H-bond) interaction is identified from the interpenetration of the van der Waals spheres of the polar hydrogen and the proton acceptor. The H-bond geometry is characterized by the ideal orientation of the electron lone pairs in the acceptor atom and the directionality of the proton donor bond, as observed in experimental and ab initio studies, and classified according to the hybridization state of the acceptor atom. The algorithm was implemented into CHARMM using the PAR22 force field. By introducing the concept of a Born radius of a polar hydrogen immersed in an acceptor environment, the stabilization of H-bond energies can be introduced by means of a simple fitting procedure. This H-bonding description is easily implemented in standard force fields, with virtually no additional computing time requirements. Monte Carlo simulations were carried out on two peptides with this H-bonding treatment and the continuum solvent model. The results clearly demonstrate the need for an explicit treatment of H-bonding with the proposed continuum model, and its reliability to predict peptide structures from the primary sequence that are in agreement with experimental results.

Introduction It is well established that hydrogen bonding (H-bonding) interactions can affect the physical properties of gases, liquids, and solids, and their fundamental role in molecular biology and the chemistry of life is now widely recognized. Many classical textbooks and excellent reviews on the subject can be found in the literature.1-5 The concept of H-bonding is attributed to Huggins6 and independently to Latimer and Rodebush,7 and it was soon realized to be an important interaction for the formation and stabilization of secondary and tertiary structure in peptides and proteins. However, it was only after the classical work by Pauling, Corey, and Branson8,9 on the R-helix and the β-pleated sheet conformations, as well as the base-pairing in DNA double helix proposed by Watson and Crick,10 that the central importance of this interaction for the structure of biological molecules was finally established. A hydrogen-bond (H-bond) is formed when the electronegativity of the proton donor (PD) enables it to withdraw electrons from the proton causing partial deshielding. This deshielding allows the proton to penetrate into the van der Waals sphere of the proton acceptor (PA) and to interact with the lone pair electrons. Other kinds of H-bonds can form in biomolecules when the acceptor atom has no electron lone pairs but polarizable π electrons, as in aromatic rings. Despite this elementary and practical description, the conceptual understanding of the electronic and quantum nature of H-bonding is still a matter of active research.11,12 The first attempts to understand the fundamental nature of H-bonding go back to the classical works by Coulson13,14 who identified the components of the H-bond energy as electrostatic, delocalization, repulsion, and dispersion. Later, ab initio calculations were reported15-18 that decomposed the total H-bonding energy into electrostatic, polarization, exchange repulsion, charge-transfer, and coupling contributions. These studies along with all ab initio calculations performed to

date, have shown that the major stabilizing component of the bonding is electrostatic, and that the other contributions drop to zero at distances >4-5 Å.4,19 Several approaches have been developed to study H-bonding in biological macromolecules at various levels of approximation, including the empirical valence bond method,20,21 several semiempirical approaches,22-24 and methods suitable for ab initio treatments.25,26 In contrast to the covalent bond, the H-bond energy is known to be nonadditive and nontransferable; for example, the H-bond energy of N-H‚‚‚OdC in an R-helix is different from that in a β-sheet.3,4 Similarly, the stabilizing energies of the guanineguanine and thymine-thymine dimers are approximately -29 kcal/mol and -16 kcal/mol, respectively, although they have the same N-H‚‚‚OdC proton donor and acceptor.27 Other examples include water4,28 in which the O-H‚‚‚O bond energy in the dimer (H2O)2 is different than that in the trimer (H2O)3. Because of the complex nature of H-bonding and the experimental difficulties of making accurate determinations of H-bond energies, reliable measurements are mostly limited to small molecular complexes (molecule-ion and dimers, for example). However, attempts have been made to quantify H-bond stabilization energies in macromolecular systems.29 Extensive reviews of the available experimental data of H-bonds for smalland medium-sized systems, like nucleotides and amino acids, have been reported.27,30 Theoretical studies based on ab initio calculations are necessarily restricted to fairly small systems, with water clusters being among the most extensively studied. Current ab initio quantum mechanics has now reached a level of sophistication that calculations of equilibrium geometries and energies of H-bonded dimers and trimers in the gas phase are more reliable than their experimental determination.4 It is widely accepted that H-bonding in biomolecules belongs to the class of so-called weak/moderate interactions, with

10.1021/jp9938967 CCC: $19.00 © 2000 American Chemical Society Published on Web 06/20/2000

H Bonding in a Continuum Solvent Model energies less than ∼15 kcal/mol.4 However, since H-bonding strength is reduced in water-accessible regions because of the competition with solvent molecules,3 it is generally assumed that most of the H-bonds present in biological molecules are stabilized by ∼3-5 kcal/mol.3,4 Strong H-bonds (>15 kcal/mol), involving charged species, have also been recognized to be important in several biological mechanisms, such as enzyme catalysis or formation of salt bridges. The so-called low-barrier H-bonds (LBHB), characterized by a more covalent nature and more disperse charge distribution, were also suggested to be important in enzyme catalysis.31-33 A theoretical study of LBHBs was carried out by Warshel and Papazyan,34 who made efforts to elucidate the reasons for their large catalytic effects. Fersht and co-workers29 carried out studies on enzymesubstrate complexes using site-directed mutagenesis to produce mutant enzymes that differ in their abilities to form H-bonds with their substrates. The interaction energies could be calculated from kinetic data obtained from the modified enzymes. The authors concluded that deletion of side chains in the enzyme that form H-bonds with an uncharged group on the substrate weakens binding energy by 0.5-1.5 kcal/mol, and deletion of uncharged side chains that form H-bonds with charged groups weakens binding by 3.5-4.5 kcal/mol. However, as pointed out by the authors, mutations of this type may have further structural consequences superimposed on the energetics of H-bonding. Nevertheless, these experiments helped to understand the relationship between H-bonding and biological specificity, but are far from a systematic study of all the types of H-bonds that occur in proteins. Ab initio calculation of H-bond energies and geometries between amino acids or nucleic acids immersed in an explicit representation of solvent molecules is a difficult task and has only been reported for a few cases.35,36 Moreover, as far as we know, systematic ab initio calculations of these systems immersed in a continuum solvent representation have not been reported. However, these ab initio calculations are fundamental to elucidate H-bonding in macromolecules because these do not exist in the gas phase.37 An important characteristic of the H-bond is its directionality. The geometrical features, as revealed by structural analysis in the crystal state or in the gas phase, have been studied in great detail.3,4,38,39 In the solid state, where most of the molecules of biological importance are studied, the packing of the molecules is determined by a variety of intermolecular forces, with the H-bond interaction being only one of them. In this case, the environment plays a fundamental role in determining the geometry, because the additional forces lead to more complex potential surfaces with multiple minima, as revealed by the spread of donor positions around acceptor atoms observed in crystals.39 In the gas phase, the H-bond properties refer to the isolated complex and, therefore, are intrinsic properties of the H-bonded system and free of the interactions introduced by the environment.12 An accurate description of H-bonding interactions is of key importance in molecular mechanics modeling. Several approaches to H-bonding within the molecular mechanics framework have been reported.3,40-46 Some of these approaches use explicit functions to describe this interaction whereas others implement the description implicitly in the electrostatic and van der Waals terms of the potential. In this latter type description, the directional properties of the H-bonds are lost except when dipole-dipole interaction terms or lone pairs terms are added to the empirical force field.47 In general, the treatment of H-bonding interactions is optimized in a vacuum and the

J. Phys. Chem. B, Vol. 104, No. 27, 2000 6491 parameters are usually adjusted by fitting H-bond energies obtained from experimental data or ab initio calculation of small molecules in the gas phase.3,4,40,41 The goal of this work is to formulate a reliable and computationally economical algorithm for incorporating Hbonding effects, observed experimentally and theoretically, into the screened Coulomb potential-based implicit solvent model (SCP-ISM) described in the previous paper.48 The formulation does not constitute a model for predicting H-bond energies or geometries, but provides a technique for incorporating Hbonding in hydrated macromolecules for use in combination with an implicit solVent model for structure calculations. Because the initial application of our approach is the calculation of peptide structures from sequence, only H-bonds between amino acid residues have been defined in this paper. Extension to other molecules, such as ligands, nucleic acids, or explicit water molecules, is straightforward with the method presented below. As discussed in the previous paper,48 an explicit description of H-bonding is required because the SCP-ISM modifies the nonbonded term in the PAR22 force field.49 This modification comes about for two reasons: first, the screening function reduced the electrostatic contribution of the interactions, and second, the way that the Born radii were defined assumed that the van der Waals spheres of neighboring atoms do not interpenetrate. This assumption is correct except for the interaction between the proton and the PA in the H-bond. In this case, the interpenetration of the van der Waals spheres is substantial and to account for this interpenetration, the definition of the Born radius of the interacting proton will be modified as described in Section I. It was also found that for accurate structure prediction, the geometric characteristics of H-bonding must be explicitly included in the force field. In Section II the energies and geometries of H-bonds that have been estimated from experimental and theoretical studies on peptide-peptide (and related) interactions3,4,12,39 are incorporated into the algorithm by modifying the Born radii. In Section III two conformational searches using Monte Carlo simulations are reported to demonstrate the reliability of the SCP-ISM and the H-bonding algorithm to predict correct structures and H-bond patterns. I. Modification of the Born Radii To evaluate the Born radii (RBs) of atoms in a macromolecular environment the effectiVe Born radius of an atom was defined as48

RBs) Rwξ + Rp(1 - ξ)

(1)

where Rw and Rp are the Born radii in solvent and bulk protein phases, respectively, ξ ) SASA/4π(RvdW + Rpr)2 is the degree of exposure of the atom to the solvent [where SASA is its solvent accessible surface area, (1 - ξ) is its buried fraction or degree of exposure to the protein interior, RvdW is the van der Waals radius of the atom, and Rpr is the probe radius]. Despite the simplicity of the definition given in eq 1 it was found to be reliable for calculating the self-energy contribution to the solvation free energy as demonstrated in the previous paper48 by comparison with Poisson-Boltzmann results. However, with the Born radii defined by eq 1, the SCP-ISM failed to provide an adequate description of H-bonding. Analysis indicated that the origin of the failure was in part due to the assumption, inherent in the way Rw and Rp were determined, that the van der Waals spheres of two interacting atoms do not interpenetrate. This assumption does not hold in the special case of H-bond

6492 J. Phys. Chem. B, Vol. 104, No. 27, 2000

Hassan et al. TABLE 1: Classification of Donor and Acceptor Groups According to Atom Type and Hybridization Statesa acceptor groups

Osp2 Osp3 Nsp2 Ssp3

donors groups

Osp2 Osp3 Nsp2

Nsp3

Backbone Asn, Gln Asp-, GluTyrb Ser, Thr His Cys, Met Tyr Ser, Thr Backbone Arg,c His, Trp His+ Asn, Gln Arg+d Lys+

a The split of Osp2 acceptor and Nsp2 donor groups are based on similar geometric characteristics and ioization states; representative residues of each group are underlined. b Classified as sp2 because of the double bond-like character of the CsO bond in the hydroxil group.38 c Neutral NH group. d Charged NH + group. 2

Figure 1. Schematic diagram of hydrogen bonding in the continuum model representation showing the overlapping of the van der Waals spheres with radii Ri,vdW. The degree of exposure of the proton (H) to the solvent is denoted by ξ and to the proton acceptor (PA) region by σ; PD is not shown and Rpr is the probe radius. The degree of exposure to the bulk protein medium is given by 1 - ξ - σ ) σ′ (see eq 2).

interactions. To account for the characteristics of this interaction, the definition of the effective Born radius of the interacting proton was modified as follows: Reference to Figure 1 shows the penetration of the polar hydrogen into the van der Waals sphere of the PA, yielding, at equilibrium, the observed result rH-PA< RvdW(H) + RvdW(PA). To account for this penetration, the definition of the effective Born radius of the polar hydrogen is extended as follows (cf. Figure 1)

RBs ) Rwξ + RAζ + Rp(1 - ξ - ζ)

(2)

where ζ is the fraction of proton buried in the acceptor solvation sphere, and RA is the Born radius of the proton in a hypothetical bulk PA solvent. For a nonpolar hydrogen, σ is zero by definition, and then RBs of eq 1 is recovered. Expressions for Rw and Rp based on the expected cavity size created by the atom in aqueous and protein phases, respectively, are given in ref 48. In analogy with the definitions of Rw and Rp, it is proposed that the Born radius of a proton immersed in a hypothetical protein acceptor medium, is given by

RA ) RCOV + gA

(3)

where gA represents the enlargement of the size of the proton relative to its covalent radius. With this definition, the electrostatic interaction at interparticle distances r < rc, where

rc ) RvdW(H) + RvdW(PA) + 2Rpr

(4)

will be shown to yield the correct behavior of the H-bonding interaction in the framework of the SCP-ISM. Appropriate values of gA for each polar hydrogen involved in the interaction need to be determined, as described in the next section. It is important to note that with the approach just described, the H-bonding modulation of the electrostatic energy is in effect only at distances r < rc. In the continuum description for the solvent we are dealing with, this situation could be thought of as one where the bulk solvent between the particles is completely removed. For larger distances (r > rc), the electrostatic behavior

of the system is not perturbed at all because σ is null and the effective Born radius is calculated from eq 1. It is worth noting that with the introduction of a third continuum medium in the system, (i.e., the PA sphere environment), the implementation of H-bond interactions becomes just a simple fitting procedure that makes the technique quite flexible for introducing further improvements in the energetics of H-bonding. II. The Cavity Size in the Proton Acceptor Environment The assignment of values to gA for all the polar hydrogens in the 20 naturally occurring amino acid residues is based on atom types and hybridization states of the acceptor and donor groups as shown in Table 1. The PA and the PD hybridization states have been divided into subgroups that might present different characteristics (e.g., the acceptor group of Asn is neutral, whereas the corresponding group of Asp- is charged; the oxygen atom in Tyr has only one available lone pair, whereas the two lone pairs in Gln could support a bifurcated H-bond). All residues within a subgroup will be assigned the same values of the H-bond parameters, as evaluated for the underlined residues (see Table 1). This subdivision yields 6 acceptor groups and 7 donor groups for the side chains, giving a total of 42 different kinds of side chain-side chain (SS) H-bond interactions to be considered. The peptide bond increases the total number of cases to be analyzed to 56 because backbone groups can interact with each other [i.e., backbone-backbone interactions (BB)] and with side chains [i.e., side chain-backbone interaction (SB)], either as acceptors (-CO) or as donors (-NH). The proton shared in an H-bond is expected to have a different Born radius, depending on the particular PA-PD pair forming the H-bond. Thus, the correct value of gA in eq 2 has to be determined for each of the 56 different H-bonds that can be formed. Hydrogen-Bonding Geometries. From ab initio and experimental studies of small molecules in the gas phase, the intrinsic geometric properties of H-bonding have been quantified.12 From these studies, two geometrical rules have been established and will be used in this article: (a) the polar hydrogen is oriented toward the lone pair orbitals in the acceptor atom; and (b) the H-bond angle, PA‚‚‚H-PD, is close to 180°. These conclusions are also supported by studies of crystal structures of small organic molecules, peptides, and proteins,3,4,38,39 although in these cases the deviations from the optimum geometry are larger than in the gas phase for reasons outlined in the Introduction.

H Bonding in a Continuum Solvent Model

J. Phys. Chem. B, Vol. 104, No. 27, 2000 6493

TABLE 2: Orbital Orientations and Accessible Regions for Hydrogen Bondinga orientation/accessible region

φb

θb

angle intervalc

(60°



B

sp ; all backbone carbonyl oxygens, side chain carbonyl oxygens in Asn and Gln, side chain carboxylate oxygens in Asp- and Glusp2, hydroxyl oxygen in Tyr

60°



C

sp2, unprotonated ring nitrogen in His





D

sp3, side chain hydroxyl oxygens in Ser and Thr, sulfur atoms in Cys and Met



(54.75°

-75° < φ < 75° -30° < θ < 30° 45° < φ < 75° -30° < θ < 30° -30° < φ < 30° -30° < θ < 30° -30° < φ < 30° -75° < θ < 75°

set A

2

a Spherical angles, θ and φ, as defined in Figure 2. b Angles defining the ideal orientation of the orbitals at the acceptor atoms, as given by the standard sp2/sp3 hybridization state. c Angle intervals defining the H-bond accessible regions of a proton around the acceptor atom (angles at the proton require θH > 135°, see Figure 2).

Figure 2. Parameters defining the H-bond geometry: the coordinates of a proton H, covalently bounded to the proton donor, PD, are given by the polar angles θ and φ, where the acceptor PA defines the origin. The azimutal angle φ is measured in the acceptor plane, defined by the acceptor, the first antecedent A1, and one of the second antecedents, A2 (or A2′). The polar angle θ measures the displacement of the proton from the acceptor plane. The H-bond angle θH is also shown.

Because of the difference in geometric preferences of the acceptor for bonding protons, the different types of acceptor atoms have been classified in four sets, as shown in Table 2. The parameters defining the H-bond geometry are most simply expressed in terms of spherical coordinates, as shown in Figure 2. Here the acceptor defines the origin and the reference plane (x,y) is defined by the PA and acceptor antecedents (acceptor plane). The polar angle φ is defined in the plane, whereas the azimuthal angle θ measures the deviations of the proton above and below the plane. The ideal orientations of the lone pair electrons in the acceptor atom, as determined by the hybridization states, are shown in the third column of Table 2. The assignment of angle intervals, shown in the fourth column of Table 2, defines the accessible regions in which the proton can be oriented to form an H-bond with the acceptor. The assignment of these intervals was found to be a good compromise between experimental and theoretical findings and simplicity in the implementation. Simplicity is a fundamental requirement for allowing the algorithm to be efficiently incorporated into a molecular mechanics program [i.e., with competitive CPU time]. The experimental findings of the range of orientations comes from statistical investigations on the crystal structures of small organic molecules,12 peptides,50 and proteins38 obtained from structural databases.4 In Appendix A the details of the calculation of the angles and the accessible regions shown in Table 2 are obtained in terms of the atomic coordinates. It is worthwhile to note that in cases where two lone pairs exist, the entire intervening interval was considered to be accessible to the proton (as is the case of the angle φ in set A and θ in set B, see Table 2). This assumption is the simplest way to account for the experimental fact that for bifurcated H-bonds (two donors H-bonded to the same acceptor), each shared proton tends to lie in the direction of one of the lone pair orbitals, whereas for simple H-bonds (only one donor H-bonded to the acceptor), it tends to be oriented between the orbitals.50 This orientation can

be intuitively interpreted in terms of the interaction of the proton with the lone pairs. For the single-bond case, the two lone pairs of the acceptor atom tend to attract the proton with similar strength, and then the optimal orientation will be somewhere between the two lone pairs. On the other hand, for bifurcated H-bonds, each lone pair attracts one proton and thus their orientations will be nearer the ideal orbital directions. Finally, the H-bond angle PA‚‚‚H-PD is required to satisfy θH > 135° (see Figure 2). Hydrogen Bonding Energies. At a given H‚‚‚PA distance r, the potential energy between two molecules X and Y is V(r) ) VX + VY + VXY(r), where VX and VY are the total potential energies of each molecule at infinite separation, and VXY(r) is the interaction energy between them. At equilibrium, the potential energy has a minimum, V(r0), at a distance r0, and the H-bond energy as defined throughout this paper is given by EH ) V(r0) - V(∞) ) VXY(r0). In addition, the H-bond length will be defined by the position of the minimum of the potential energy, (i.e., by r0). It is clear, however, that because of the thermal fluctuations of real molecules in solution, the observed H-bond lengths are expected to be larger than the quantity obtained from the minimum of the potential energy. The following estimates3,4,29 of H-bond energies, depending on charge state, will be used: for interactions between charged (C) groups, EC-C ) -4.5 kcal/mol; between charged groups and neutral (N) groups, EC-N ) -3.5 kcal/mol; and for neutral group interactions, EN-N ) -2.5 kcal/mol. Within the framework of the SCP-ISM, the parameters for H-bonding will be assigned to reproduce these target H-bond energies. Although this original assignment of H-bonding energies may require modifications on refinements of the SCP-ISM, the fundamental sufficiency of this assignment will be demonstrated next in actual simulations on biological systems (see also refs 51-53). Side Chain-Side Chain H-Bond Interactions. To determine the values of gA that yield the target values of the H-bond energies for each proton involved in the SS H-bonds, the 21 amino acid residues were constructed in CHARMM (both the neutral and protonated forms of the histidine were considered) with neutralized carboxy and amino termini. All the 42 possible SS H-bond pairs were explicitly considered. For a given donoracceptor interaction, the corresponding two-molecule complex was constructed in the ideal orbital geometry as determined by the angles θ and φ in the PA (see third column in Table 2), and with θH ) 180° (the molecules were further rotated around the H-bond direction to minimize artificial van der Waals repulsions). The total potential energy was calculated as a function of the PA distances, r, with all other geometric parameters fixed, and the electrostatic part of the interactions was calculated using the SCP-ISM48 with the Born radii defined by eq 2. Potential curves were calculated for several values of gA until the value yielding the preassigned H-bond energy, corresponding to the

6494 J. Phys. Chem. B, Vol. 104, No. 27, 2000

Hassan et al. TABLE 3: Enlargement gA of the Covalent Radii of the Polar Hydrogens in Side Chain-Side Chain H-Bond Interactionsa ASPASN TYR SER CYS HIS 〈gA〉a

TYR

SER

HIS

ASN

LYS+

HIS+

ARG+

〈gA〉d

0.590 0.760 0.905 1.100 1.150 0.740 0.874

0.640 0.870 0.840 0.905 0.935 0.660 0.808

0.430 0.830 0.920 1.220 1.080 0.500 0.830

0.100 0.380 0.410 0.400 0.480 0.190 0.327

0.570 0.820 1.000 1.050 0.700 0.500 0.773

0.575 0.795 0.915 1.050 0.900 0.692 0.821

0.430 0.560 0.645 0.595 0.700 0.430 0.560

0.476 0.716 0.805 0.903 0.849 0.530 0.713b

a Donors (columns) and acceptors (rows) groups are labeled with the name of the amino acid residues to which they belong; radii in Angstroms. b Double average 〈〈gA〉a〉d in eq 7.

Figure 3. Potential energy curves for the TyrsSer H-bond interaction for different values of gA. The correct stabilization energy of 2.5 kcal/ mol is attained with gA ) 0.840 Å. The geometry of the complex is shown schematically.

interaction type, could be determined. Figure 3 shows potential curves for the Tyr-Ser side chain H-bond interaction shown schematically. Curves are given for four values of gA with the potential for gA ) 0.84 Å yielding the target H-bond energy of -2.5 kcal/mol. For the value of gA making RA ) Rp, only a very shallow minimum of the potential energy is seen, but for values