
May 10, 2005 - Biophysics Division, Saha Institute of Nuclear Physics, 1/AF Bidhan Nagar, Kolkata 700 064, India. Received: November 25, 2004; In Fina...

J. Phys. Chem. B 2005, 109, 10484-10492

Role of Hydrogen Bonds in Protein-DNA Recognition: Effect of Nonplanar Amino Groups

Shayantani Mukherjee, Sudipta Majumdar,† and Dhananjay Bhattacharyya*

Biophysics Division, Saha Institute of Nuclear Physics, 1/AF Bidhan Nagar, Kolkata 700 064, India

Received: November 25, 2004; In Final Form: March 17, 2005

Amino groups are one of several types of hydrogen bond donors and are abundant in protein main chains, protein side chains, and DNA bases. The polar hydrogen atoms of these groups form short-ranged, specific, and directional hydrogen bonds, which play a decisive role in the specificity and stability of protein-DNA complexes. To date, only planar amino groups have been considered in analyses of protein-DNA interfacial hydrogen bonds. This assumption about hydrogen atom positions may be why the expected role of hydrogen bonds in protein-DNA recognition has not been established. We have performed ab initio quantum chemical studies on amino acid side chains and DNA bases containing amino groups, as well as on specific hydrogen-bonded residue pairs selected from high-resolution protein-DNA crystal structures. Our results suggest that pyramidal amino groups are more probable than the usually adopted planar geometry, and that pyramidalization improves the quality of the existing hydrogen bonds in almost all cases. Further, a detailed analysis of protein-DNA interfacial hydrogen bonds in 107 crystal structures using the in-house program "pyrHBfind" indicates that considering the energetically preferred nonplanar amino groups improves hydrogen bond geometry and also gives rise to new contacts amounting to nearly 14.5% of the existing interactions. The largest improvements are observed for the amino group of guanine, which faces the DNA minor groove; this helps to resolve the problem of insufficient directional contacts observed in many minor groove binding complexes. Apart from guanine, the improvements observed for asparagine, glutamine, adenine, and cytosine also indicate that considering nonplanar amino groups gives a more realistic picture of the hydrogen bonds formed between protein and DNA residues.

* Corresponding author. Fax: 091-33-2337-4637. E-mail: [email protected].
† Present address: Chemistry Department, Wesleyan University, Middletown, CT 06459.

Introduction

Hydrogen bonds are among the most specific interactions in molecular recognition processes. They are known to play a dominant role in maintaining the three-dimensional structures of nucleic acids and proteins. Numerous efforts have been made to understand the function of hydrogen bonds in protein-DNA interactions, as these short-ranged yet highly specific and directional contacts are expected to play a decisive role in sustaining the specificity and stability of protein-DNA complexes.1-3 However, various studies of protein-DNA complexes based on X-ray crystal structures have failed to detect enough hydrogen bonds for them to be considered the major determinant of the stability and specificity of such complexes.4-8 Additionally, studies on drug-DNA9 and ligand-protein complexes10 have indicated that a large majority of the detected bonds deviate considerably from the optimum distance and angle criteria of a good hydrogen bond. Such deviations from linearity, also widely observed in protein-DNA complexes, together with an overall smaller number of hydrogen bonds, have led to the consideration of alternative mechanisms as decisive in stabilizing these complexes. These include CH···O and water-mediated hydrogen bonds, electrostatic and hydrophobic interactions between the components, structural complementarity of the protein and DNA molecules, and conformational flexibility of the interacting macromolecules.3,11-19 Despite these important contributions to the recognition process, hydrogen bonds remain a significant source of specific and directional contacts and could contribute greatly to resolving the question of sequence specificity and recognition in such complexes. However, the "relative lack of directional contacts"5 seems to undermine their role in maintaining the specificity and stability of protein-DNA complexes.

In protein-DNA complexes, some of the most important and abundant hydrogen bond donors are the amino groups found in DNA bases and in protein side chains and main chains. Studies of hydrogen bonds in protein-DNA complexes model the amino group hydrogen atoms in the ideal planar configuration.20-22 This is the conventional geometry of these groups in DNA bases and protein chains, and it is assumed to arise from conjugation of the nitrogen lone pair with the π electron cloud in the vicinity of the nitrogen atom. However, there is increasing evidence that the amino groups of DNA bases23,24 and small molecules25-28 can adopt a nonplanar geometry. Moreover, a large number of nonplanar peptide groups have been found in protein structures solved by neutron diffraction,29 where the positions of the exchangeable deuterium atoms are determined more precisely than those of polar hydrogens in X-ray crystal structures. Such nonplanar amino groups can also explain some unique base-base interactions in RNA,30 cross-strand bifurcated hydrogen bonds between successive base pairs, and DNA deformations observed in DNA crystal structures.31-34 However, direct experimental evidence of nonplanar amino groups in DNA or protein is generally not available in the huge X-ray crystal structure database. Very few high-resolution crystal structures provide hydrogen atom positions (58 protein structures solved at 1.0 Å or better resolution are found to report them), and their analysis reveals ideally planar peptide hydrogen atoms in most cases (47 of the 58 structures). The remaining 11 structures, in addition to the neutron diffraction data, provide direct evidence for nonplanar amino group hydrogen atoms.35 Moreover, hydrogen atom positions in X-ray crystal structures are often unreliable, because (a) hydrogen atoms are generally more dynamic and (b) polar hydrogens bonded to nitrogen or oxygen atoms carry little electron density and therefore contribute little to the X-ray diffraction pattern. In the absence of sufficient experimental evidence, ab initio quantum chemical methods, which are free from the bias introduced by parametrization (as in force field or semiempirical methods), are perhaps the most accurate means of describing the hybridization state of the amino group nitrogen.

In this study, we have primarily used ab initio quantum chemical methods to investigate the geometry of amino groups in DNA bases and in amino acid side chains. These methods were also applied to selected hydrogen-bonded residue pairs from protein-DNA complexes, to gain insight into the actual contribution of nonplanar amino group hydrogen atoms to stabilizing such pairs. Finally, we have attempted to understand the role of nonplanar amino groups in the formation of protein-DNA interfacial hydrogen bonds by studying 107 such crystal structures with the in-house program "pyrHBfind". Unlike available software,20-22 this program models amino group hydrogen atoms as planar or nonplanar by optimizing neighboring hydrogen bond interactions. The changes in the hydrogen bond pattern caused by nonplanar amino groups across a large number of protein-DNA interfaces help to redefine the role of these short-ranged, directional interactions in biomolecular recognition and specificity.

10.1021/jp0446231 CCC: $30.25 © 2005 American Chemical Society Published on Web 05/03/2005

Methods

(a) Ab Initio Quantum Chemical Studies. The geometries of the amino group containing nucleotide bases, viz., adenine, guanine, and cytosine, of the amino acid side chains of asparagine, arginine, histidine, and tryptophan, and of the glycine-glycine dipeptide were optimized at the MP2/6-31G(2p, 2d), HF/6-31G(2p, 2d), HF/6-311G(2p, 2d), and HF/6-311+G(2p, 2d) levels using the GAMESS software.36 The amino group containing side chain of glutamine was not optimized, as it closely resembles the asparagine side chain apart from an extra -CH2- group. For each case, two starting geometries were used: (i) one with a perfectly planar amino group (indicated as PLN) and (ii) one with a nonplanar amino group (indicated as PYR). To characterize each optimized structure as a minimum or a saddle point, harmonic vibrational analyses were carried out for both sets of optimized structures. All structure modeling and visualization were done using the MOLDEN software.37 The extent of nonplanarity (θ) of the two types of amino groups shown in Figure 1 is measured as follows.35 For primary amino groups as in Figure 1a:

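$$\theta = \cos^{-1}\left[(\mathbf{X}_{C} - \mathbf{X}_{N}) \cdot \{(\mathbf{X}_{H1} - \mathbf{X}_{N}) \times (\mathbf{X}_{H2} - \mathbf{X}_{N})\}\right] - 90^\circ$$

where $\mathbf{X}_{C}$, $\mathbf{X}_{N}$, $\mathbf{X}_{H1}$, and $\mathbf{X}_{H2}$ are the coordinates of C, N, H1, and H2, respectively.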
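The θ measure can be computed directly from Cartesian coordinates. The sketch below is an illustrative NumPy version, not the authors' code. It assumes, as the arccos requires, that the C-N vector and the normal to the plane through N are normalized to unit length before the dot product; the printed formula leaves this implicit. The function name `nonplanarity_deg` is hypothetical. For a secondary amino group, the same function applies with the arguments (H, N, C′, Cα), mirroring the formula for Figure 1b.

```python
import numpy as np


def nonplanarity_deg(x_c, x_n, x_h1, x_h2):
    """Return the amino group nonplanarity angle theta in degrees.

    Primary (-NH2) group: pass the coordinates of C, N, H1, H2.
    Secondary (-NH-) group: pass H, N, C', C-alpha instead.
    Assumption: the bond vector and the plane normal are normalized,
    so that the argument of arccos is a true cosine.
    """
    x_c, x_n, x_h1, x_h2 = (np.asarray(p, dtype=float) for p in (x_c, x_n, x_h1, x_h2))
    bond = x_c - x_n
    bond /= np.linalg.norm(bond)
    # Normal to the plane spanned by the two vectors leaving N.
    normal = np.cross(x_h1 - x_n, x_h2 - x_n)
    normal /= np.linalg.norm(normal)
    cos_angle = np.clip(np.dot(bond, normal), -1.0, 1.0)
    # 0 for a planar group; the magnitude grows with pyramidalization.
    # The sign only reflects the order in which the last two atoms are given.
    return float(np.degrees(np.arccos(cos_angle))) - 90.0


# Planar group: C, H1 and H2 all lie in the xy plane around N at the origin.
planar = nonplanarity_deg([1.35, 0, 0], [0, 0, 0], [-0.5, 0.866, 0], [-0.5, -0.866, 0])
# Same group with both hydrogens pushed 0.3 A below the plane (pyramidal).
pyramidal = nonplanarity_deg([1.35, 0, 0], [0, 0, 0], [-0.5, 0.866, -0.3], [-0.5, -0.866, -0.3])
print(round(planar, 3), round(pyramidal, 1))
```

Swapping the two hydrogens only flips the sign of θ, so comparisons of pyramidalization are best made on its magnitude.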


Figure 1. Types of amino groups studied: (a) primary amino group; (b) secondary amino group.

For secondary amino groups as in Figure 1b:

$$\theta = \cos^{-1}\left[(\mathbf{X}_{H} - \mathbf{X}_{N}) \cdot \{(\mathbf{X}_{C'} - \mathbf{X}_{N}) \times (\mathbf{X}_{C\alpha} - \mathbf{X}_{N})\}\right] - 90^\circ$$

where $\mathbf{X}_{C'}$, $\mathbf{X}_{N}$, $\mathbf{X}_{H}$, and $\mathbf{X}_{C\alpha}$ are the coordinates of the C′, N, H, and Cα atoms, respectively. This measure gives 0° for an ideally planar amino group, and its value increases with increasing pyramidalization.

To examine hydrogen bonds between DNA and protein residues with amino groups as hydrogen bond donors, we selected all 107 protein-DNA complexes available in the Protein Data Bank (PDB)38 (October 2003) that were solved by X-ray crystallography at a resolution better than 2 Å. The identifiers of the selected structures are pdt025 (obtained from the Nucleic Acid Data Bank (NDB),39 October 2003), 1a1g, 1a1h, 1a1i, 1a1j, 1a1k, 1a73, 1aay, 1azp, 1azq, 1azo, 1b94, 1b97, 1bc8, 1bf4, 1bgb, 1bnz, 1brn, 1bss, 1bsu, 1c8c, 1cdw, 1cyq, 1d2i, 1dc1, 1dfm, 1dsz, 1d02, 1dp7, 1duo, 1e3o, 1egw, 1emh, 1emj, 1eo3, 1eo4, 1eon, 1esg, 1eyu, 1fiu, 1fjl, 1g2f, 1g38, 1g9z, 1gd2, 1gu4, 1h6f, 1hcr, 1i6j, 1jb7, 1jgg, 1jk1, 1jk2, 1jx4, 1k3w, 1kx3, 1kx5, 1l1t, 1l1z, 1l2d, 1l3l, 1l3s, 1l3t, 1l3u, 1l3v, 1l5u, 1lv5, 1lat, 1lau, 1lmb, 1m5r, 1mnn, 1mus, 1n3f, 1n4l, 1nkp, 1nlw, 1nnj, 1oe4, 1orn, 1owf, 1p71, 1puf, 1qn3, 1qn4, 1qn5, 1qn9, 1qna, 1qne, 1qum, 1rnb, 1rva, 1ssp, 1tro, 1ytb, 2bam, 2bdp, 2bop, 2dnj, 2hdd, 2nll, 2pvi, 3bam, 3bdp, 3hts, 3pvi, and 4bdp.

From this data set, we selected a few specific examples of hydrogen-bonded protein-DNA interfacial residue pairs for ab initio geometry optimization. These pairs represent different types of hydrogen bonds, namely, (ring)NH···O=C, (ring)NH···O(−), (ring)NH···N, NH2···O=C, NH2···O(−), NH2···N, and so forth (Table 1). Before optimization, some of these pairs showed moderately good hydrogen bonds in terms of bond distance and angle, while others showed poor-quality hydrogen bonds with planar amino groups.
The hydrogen atom positions of these pairs were then optimized at the HF/6-31G(2p, 2d) and HF/6-311+G(2p, 2d) levels in two ways: (i) with the internal coordinates of all heavy atoms and the dihedral angles of the hydrogen atoms of the particular amino group frozen using the IFREEZ option of GAMESS, so that the amino group remained planar during optimization, and (ii) with only the internal coordinates of the heavy atoms frozen and all internal coordinates of the hydrogen atoms free, so that the amino groups could become nonplanar. The interaction energies of all these optimized structures were evaluated with the basis set superposition error (BSSE) correction by the Morokuma method40 using the HF/6-31G(2p, 2d) basis set.


TABLE 1: Specific Hydrogen Bonds in Protein-DNA Complexes Selected for ab initio Quantum Chemical Calculations along with the Corresponding Donors, Acceptors, and the Distance between Them

Identifier^a | PDB ID | Donor (atom) | Donor residue no., chain | Acceptor (atom) | Acceptor residue no., chain | Heavy atom distance (Å)
--- | --- | --- | --- | --- | --- | ---
CABDOS | 1A1I | C (N4) | 55, C | Asp (OD2) | 148, A | 2.93
CABGOM | 1BGB | C (N4) | 909, D | Gly (O) | 182, A | 2.83
GABVOM1 | 1AZP | G (N2) | 115, C | Val (O) | 26, A | 3.18
GABVOM2 | 1AZP | G (N2) | 115, C | Val (O) | 26, A | 3.10
AABNOS | 1C8C | A (N6) | 5, C | Asn (OD1) | 185, B | 2.90
NASANB1 | 1EO3 | Asn (ND2) | 175, B | A (N7) | 5, C | 2.94
AABAOM | 2BDP | A (N6) | 26, T | Ala (O) | 707, A | 2.90
RASANB | 1A73 | Arg (NE) | 61, B | A (N7) | 15, C | 3.07
RASTOB | 1AZQ | Arg (NH1) | 42, A | T (O2) | 106, B | 3.20
RASAOP | 1EO4 | Arg (NH1) | 221, A | A (O1P) | 5, C | 3.41
RASTOP1 | PDT025 | Arg (NH1) | 56, A | T (O1P) | 221, D | 2.96
RASTOP2 | PDT025 | Arg (NH2) | 56, A | T (O2P) | 221, D | 2.95
NASCOB | 1BSS | Asn (ND2) | 70, B | C (O2) | 909, C | 2.86
NASGOP | 1DSZ | Asn (ND2) | 1185, A | G (O1P) | 1540, D | 3.06
NASANB2 | 1B94 | Asn (ND2) | 70, B | A (N7) | 5, D | 3.43
HNSGNB | 1DC1 | His (NE2) | 253, B | G (N7) | 7, W | 3.03
HNSGOP1 | 1JK1 | His (ND) | 125, A | G (O1P) | 7, B | 2.94
HNSGOP2 | 1JK1 | His (ND) | 125, A | G (O2P) | 7, B | 3.84
HNSGOB | 1NLW | His (NE) | 506, D | G (O6) | 807, J | 3.29
NAMGOP | 1EYU | Asn (N) | 73, A | G (O1P) | 11, D | 3.06
NAMGOB | 1BRN | Asn (N) | 58, M | G (O6) | 2, B | 2.79
RAMANB | 1EGW | Arg (N) | 3, D | A (N7) | 13, G | 3.14

^a The hydrogen bond identifiers follow this rule. The six letters of each identifier give (i) the single-letter code of the hydrogen bond donor residue; (ii) "A" for an -NH2 type amino group or "N" for an -NH- type amino group; (iii) a subscript letter for the donor location: "B" for a base atom, "S" for an amino acid side chain atom, or "M" for a protein main chain atom; (iv) the single-letter code of the hydrogen bond acceptor residue; (v) "O" for an oxygen or "N" for an imino nitrogen acceptor; and (vi) a subscript letter for the acceptor location: "B" for a base atom, "P" for a DNA backbone atom, "S" for an amino acid side chain atom, or "M" for a protein main chain atom. A number at the end of the six-letter code distinguishes identifiers that occur more than once.
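Because the identifier scheme is strictly positional, it can be decoded mechanically. The Python sketch below is a hypothetical helper (the names `decode_hbond_id` and `HBondId` are not from the paper) that splits an identifier into the six fields of the rule above plus the optional occurrence number. It deliberately leaves the residue letters unresolved, since a single-letter code such as "A" stands for adenine as a donor in AABAOM but for alanine as the acceptor.

```python
import re
from typing import NamedTuple, Optional

# Lookup tables transcribed from the naming rule for the Table 1 identifiers.
DONOR_GROUP = {"A": "-NH2", "N": "-NH-"}
DONOR_SITE = {"B": "DNA base", "S": "amino acid side chain", "M": "protein main chain"}
ACCEPTOR_ATOM = {"O": "oxygen", "N": "imino nitrogen"}
ACCEPTOR_SITE = {"B": "DNA base", "P": "DNA backbone",
                 "S": "amino acid side chain", "M": "protein main chain"}

# Six positional letters followed by an optional occurrence number.
_ID_PATTERN = re.compile(r"([A-Z])([AN])([BSM])([A-Z])([ON])([BPSM])(\d*)")


class HBondId(NamedTuple):
    donor_residue: str
    donor_group: str
    donor_site: str
    acceptor_residue: str
    acceptor_atom: str
    acceptor_site: str
    occurrence: Optional[int]


def decode_hbond_id(identifier: str) -> HBondId:
    """Split an identifier such as 'GABVOM1' into its positional fields."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid hydrogen bond identifier: {identifier!r}")
    donor, group, dsite, acceptor, atom, asite, number = match.groups()
    return HBondId(
        donor_residue=donor,
        donor_group=DONOR_GROUP[group],
        donor_site=DONOR_SITE[dsite],
        acceptor_residue=acceptor,
        acceptor_atom=ACCEPTOR_ATOM[atom],
        acceptor_site=ACCEPTOR_SITE[asite],
        occurrence=int(number) if number else None,
    )


print(decode_hbond_id("HNSGOP2"))
```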

Interaction energies without BSSE correction were also calculated at the MP2/6-31G(2p, 2d) and HF/6-311+G(2p, 2d) levels.

(b) Analysis of Protein-DNA Crystal Structures. The improvement in the hydrogen bonding pattern between DNA and protein residues arising from nonplanar amino groups was assessed in all 107 protein-DNA crystal structures, which form an exhaustive list of cocrystals solved at a resolution better than 2 Å. The analysis was also repeated on a subset of 79 cocrystals obtained by eliminating redundant structures. To build this subset, protein structures with less than 25% sequence identity were first selected using culledPDB.41 However, this procedure does not address DNA sequence redundancy, even though a single base change at the ligand binding interface can drastically alter specificity. Since we are interested in the recognition process from the standpoint of both the proteins and the DNA molecules, the crystal structures rejected for more than 25% protein sequence identity were sorted again on the basis of their DNA. Structures whose DNA sequences differed by even one base pair at the protein binding interface (ignoring changes at the 3′ and 5′ ends) were also included in the final subset. Hydrogen bonds were analyzed with the in-house program pyrHBfind, written in FORTRAN-77, which first detects possible hydrogen bonds between an amino group nitrogen and any hydrogen bond acceptor within 3.5 Å of the donor nitrogen atom. The amino group hydrogen atoms are modeled with the standard bond lengths and angles of the CHARMM-27 parameter set.42,43 The torsion angles are then varied over a range of ±40° in steps of 2° to locate the best possible bond geometry.
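The torsion scan described above can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not pyrHBfind itself: each hydrogen is placed from three reference atoms with the standard internal-coordinate (NeRF) construction; the N-H length of 1.0 Å and C-N-H angle of 120° are placeholder values rather than the actual CHARMM-27 parameters; each hydrogen is scanned independently; and "best" is taken to mean the most linear D-H···A angle, with ties broken by the shorter H···A distance. The cutoffs are the program defaults stated in the text (D···A ≤ 3.5 Å, H···A ≤ 3.0 Å, D-H···A ≥ 90°). All function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

MAX_DA = 3.5      # donor...acceptor distance cutoff (Angstrom)
MAX_HA = 3.0      # hydrogen...acceptor distance cutoff (Angstrom)
MIN_ANGLE = 90.0  # D-H...A angle cutoff (degrees)


@dataclass
class HBond:
    torsion: float     # torsion used to place H (degrees)
    h_pos: np.ndarray  # Cartesian position of H
    h_dist: float      # H...A distance (Angstrom)
    angle: float       # D-H...A angle (degrees)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Place atom D bonded to c: |cD| = bond, angle b-c-D, dihedral a-b-c-D (NeRF)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    th, ph = np.radians(angle_deg), np.radians(torsion_deg)
    local = np.array([-bond * np.cos(th),
                      bond * np.sin(th) * np.cos(ph),
                      bond * np.sin(th) * np.sin(ph)])
    return c + local[0] * bc + local[1] * m + local[2] * n


def hbond_geometry(donor, h, acceptor):
    """Return (H...A distance, D-H...A angle in degrees)."""
    hd, ha = donor - h, acceptor - h
    dist = float(np.linalg.norm(ha))
    cos_a = np.dot(hd, ha) / (np.linalg.norm(hd) * dist)
    return dist, float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))


def best_hydrogen(ref, parent, donor, acceptor, planar_torsion,
                  bond=1.0, angle=120.0, window=40.0, step=2.0) -> Optional[HBond]:
    """Scan the H torsion over planar_torsion +/- window and keep the most linear bond."""
    donor, acceptor = np.asarray(donor, float), np.asarray(acceptor, float)
    if np.linalg.norm(acceptor - donor) > MAX_DA:
        return None
    best = None
    for delta in np.arange(-window, window + step / 2, step):
        torsion = planar_torsion + float(delta)
        h = place_atom(ref, parent, donor, bond, angle, torsion)
        dist, ang = hbond_geometry(donor, h, acceptor)
        if dist > MAX_HA or ang < MIN_ANGLE:
            continue  # this torsion fails the hydrogen bond criteria
        if best is None or (ang, -dist) > (best.angle, -best.h_dist):
            best = HBond(torsion, h, dist, ang)
    return best
```

Setting `window=0.0` reproduces the conventional planar placement, so the same function gives both the planar and the nonplanar geometry for comparison.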
The distance and angle criteria for an interaction to be considered a hydrogen bond are set in pyrHBfind as follows: D···A (donor-acceptor distance) ≤ 3.5 Å, H···A (hydrogen bond distance) ≤ 3.0 Å, and D-H···A (hydrogen bond angle) ≥ 90°. These cutoffs can, however, be modified by the user as required. When multiple occupancies of an atom were reported in a PDB file, the position with the maximum occupancy was selected for analysis (the first position was selected when both positions are reported to be equally probable). The program executable can be downloaded from http://www.saha.ac.in/biop/www/db/local/bioinformatics.html.

The "improvement index" was calculated as a quantitative measure of the net improvement in hydrogen bonds upon consideration of nonplanar amino groups. It is defined as $\sum_i f_i x_i$, where $f_i$ is the frequency of occurrence in the $i$th bin and $x_i$ is the improvement in hydrogen bond distance or angle, obtained from the normalized frequency distribution of a specific case. This weighting gives more importance to bonds that undergo larger changes in bond distance and angle. The index was then expressed as a percentage of the maximum value of $\sum_i f_i x_i$ obtained across the different cases. The final value was obtained by averaging the percentages calculated from the normalized histograms of hydrogen bond distance and of hydrogen bond angle for that case.

Results

(a) Structures of Nucleotide Bases, Amino Acid Side Chains, and Dipeptide Backbone. The amino group geometries of the nucleotide bases, viz., adenine, guanine, and cytosine, of the amino acid side chains of asparagine, arginine, tryptophan, and two protonation states of histidine (Nε or Nδ protonated), and of the glycine-glycine dipeptide, optimized with different basis sets from the two types of starting geometries, PLN and PYR, are listed in Table 2. In most cases, optimization starting from the PLN geometry converged to structures with planar amino groups, the exception being asparagine. On the other hand, optimization starting from PYR geometries generally resulted in nonplanar amino groups, with the exception


TABLE 2: Geometric Parameters of Single Bases or Amino Acids Optimized from PLN and PYR Starting Structures by Different Basis Sets along with ∆E(PLN-PYR) for Each Case

molecule Ade Gua Cyt Asn Arg

MP2/6-31G(2p, 2d) HF/6-31G(2p, 2d) HF/6-311G(2p, 2d) HF/6-311+G(2p, 2d) amino θ (deg) θ (deg) θ (deg) θ (deg) group ∆E ∆E ∆E ∆E nitrogen PLN PYR (kcal/mol) PLN PYR (kcal/mol) PLN PYR (kcal/mol) PLN PYR (kcal/mol) N6 N2 N4 ND2 NE NH1 NH2 ND1

0.01 0.00 0.01 29.14 0.00 0.00 0.00 0.00

36.05 48.20 36.77 29.17 5.24 0.40 3.34 0.09

NE2

0.00

Trp

NE1

0.00

Gly-Gly dipeptide

N

His

0.441 1.797 0.485 0.000 0.000 0.000

0.01 0.00 0.01 24.35 0.00 0.00 0.00 0.00

27.79 39.64 26.28 24.36 0.02 0.00 0.02 0.03

0.12

0.000

0.00

0.01

0.000

15.22 15.16

0.000

0.161 0.800 0.124 0.000 0.000 0.000

0.01 0.00 0.01 24.08 0.00 0.00 0.00 0.00

27.77 39.20 25.99 24.26 0.01 0.02 0.01 0.03

0.000

0.09

0.000

0.00

0.09

0.000

0.00

0.01

0.000

0.00

0.01

0.000

3.62

3.74

0.000

4.68

4.83

0.000

of arginine, two protonation states of histidine, and tryptophan. The dipeptide showed approximately planar amino groups in most cases, the exception being the MP2 optimized structure, which was nevertheless less pyramidal than residues such as guanine and asparagine. The planar structures, though fully optimized, are energetically less favorable than the pyramidal structures by significant amounts (Table 2). Harmonic vibrational analysis of these higher-energy PLN geometries showed one large imaginary frequency, corresponding to the pyramidal out-of-plane wagging motion of the amino group hydrogen atoms. The geometries obtained by optimizing PLN structures are therefore not minima but saddle points, that is, energy maxima along the wagging coordinate, as shown by the single large negative eigenvalue found in the vibrational analysis. In contrast, optimization of the same molecules starting from PYR geometries gives minimum-energy structures with all real frequencies. The four levels of theory do not produce comparable extents of nonplanarity: MP2/6-31G(2p, 2d) gives the greatest nonplanarity, followed by HF/6-311G(2p, 2d), HF/6-31G(2p, 2d), and HF/6-311+G(2p, 2d). The diffuse s and p functions on heavy atoms in the HF/6-311+G(2p, 2d) basis set decrease the nonplanarity of amino groups, while HF/6-311G(2p, 2d) and HF/6-31G(2p, 2d) give comparable results. Furthermore, the amino groups of different nucleotides and protein side chains show different extents of pyramidalization and energy stabilization when optimized with the same basis set. Among all residues, guanine shows the largest nonplanarity together with a large energy gain, while the other DNA bases, cytosine and adenine, also show substantial nonplanarity along with a considerable energy gain on pyramidalization. On the other hand, asparagine shows the largest nonplanarity among the protein side chains, in both the PLN and PYR optimized structures, indicating a greater propensity than the others to adopt a pyramidal configuration. The nonplanarity of the dipeptide amino group is the smallest among the nonplanar residues, while arginine, histidine, and tryptophan never showed any pyramidalization (Table 2). This trend in pyramidalization is also reflected in the length and bond order of the N-C bond connecting the amino group to the rest of the molecule in the PYR optimized structures (Table 2). In general, the bond orders and bond lengths of this N-C bond reflect the correlation of the decrease

0.169 0.795 0.128 0.000 0.000

25.36 37.78 37.85 0.01 23.52 21.85 21.98 179.9 0.01 179.9 0.00 179.9 0.01 0.00 0.02

0.000 0.093 0.000 0.000 0.000

0.00

0.05

0.000

3.55

3.61

0.000

Bond type | Bond order (MP2/PYR) | Bond distance (MP2/PYR) (Å)
--- | --- | ---
C6-N6 | 1.126 | 1.367
C2-N2 | 1.069 | 1.388
C4-N4 | 1.116 | 1.373
CG-ND2 | 1.133 | 1.377
CZ-NE | 1.276 | 1.329
CZ-NH1 | 1.219 | 1.338
CZ-NH2 | 1.226 | 1.335
CG-ND1 | 1.117 | 1.379
CE1-ND1 | 1.193 | 1.366
CD2-NE2 | 1.141 | 1.376
CE1-NE2 | 1.196 | 1.365
CD1-NE1 | 1.133 | 1.381
CE2-NE1 | 1.087 | 1.376
C-N | 1.128 | 1.374
CA-N | 1.004 | 1.437

in nitrogen lone pair conjugation with the extent of sp3 hybridization of the amino group: more pyramidal amino groups, such as that of guanine, have longer N-C bonds and smaller bond orders. Arginine, with extended lone pair conjugation, has large bond orders for all three of its N-C bonds, which implies that making any of its amino groups nonplanar requires breaking the extended conjugation; this is energetically not feasible for an isolated arginine molecule. Similarly, in histidine and tryptophan side chains, pyramidalization of the ring nitrogen may induce ring puckering, which is unfavorable compared with the energy gained by (ring)NH pyramidalization.

(b) Nonplanarity of Amino Groups under Hydrogen Bonded Conditions. To analyze the geometries of hydrogen bonds between protein and DNA residues that involve amino groups as hydrogen bond donors, several hydrogen-bonded residue pairs were isolated from the crystal structures of protein-DNA complexes (Table 1). The positions of the hydrogen atoms in these systems were then optimized with or without planarity constraints on the amino group hydrogens. Because amino group pyramidalization depends on the basis set, we chose HF/6-31G(2p, 2d) to optimize these systems, as it is reasonably adequate for predicting sp3 hybridization of the nitrogen lone pair: it neither imparts extreme nonplanarity, as the MP2 method probably does, nor reduces it, as HF/6-311+G(2p, 2d) does (Table 2). These systems were also optimized with HF/6-311+G(2p, 2d) to assess the basis set dependence of the amino group nonplanarity. After optimization with completely flexible amino groups, all of these pairs showed much better hydrogen bonds with both basis sets, an improvement attributable to the nonplanar amino groups.
In these cases, moving the hydrogen atoms away from the traditionally accepted planar geometry improved the hydrogen bonds, increasing the hydrogen bond angle (closer to 180°) and decreasing the hydrogen bond distance (Figure 2 and Table 3). Moreover, in some cases nonplanar amino groups led to the formation of stronger three-centered hydrogen bonds in addition to improving the previously existing bond (PDB ID: 1jk1). Interestingly, for pairs with arginine amino groups as hydrogen bond donors, substantial nonplanarity occurred in many cases, leading to stronger hydrogen bonds. This is because a strong hydrogen bond acceptor situated near one of the amino groups can break the extended conjugation of the nitrogen lone pair, allowing that amino group to become nonplanar upon optimization. This breaking of conjugation could not be achieved for an isolated arginine molecule because no other stabilizing interactions were available. Some pairs with histidine amino groups as hydrogen bond donors were also studied, but the hydrogen atoms of these systems did not deviate considerably from planarity. Although some nonplanarity was noticed in a few cases, a notable improvement of the hydrogen bond was observed in only one case (PDB ID: 1jk1), where the protonated Nδ formed bifurcated hydrogen bonds with two negatively charged DNA phosphate oxygen atoms (Table 3).

It is further noted that the amount of nonplanarity in the optimized complexes does not depend on the choice of basis set. This indicates that the basis set dependence of the sp3 hybridization of the amino nitrogen lone pair becomes insignificant when there is a prospect of forming a directional contact. It also implies that, in the presence of a hydrogen bond acceptor, amino groups can become nonplanar and stay so, whereas in isolation such a geometry is not always sustained because of the low energy barriers. The differences in interaction energy between complexes with planar and nonplanar amino groups were evaluated with the basis set superposition error (BSSE) correction (Table 3). They indicate a gain in stability of up to about 5 kcal/mol from the improved quality of the hydrogen bonds formed by nonplanar amino groups. In some cases where the initial hydrogen bonds are very poor, nonplanar amino groups allow reasonably strong hydrogen bonds to form, with a significant increase in bonding energy. It was further noted that the interaction energies of complexes forming weak hydrogen bonds, even with nonplanar amino groups, were around 4-5 kcal/mol, which is substantial when the cumulative contribution of such bonds to the stability of these complexes is considered. This interaction energy can increase manyfold for charged residues such as the DNA phosphate backbone or arginine, reaching a stabilization of about 100 kcal/mol for systems containing both positively and negatively charged residues (Table 3).

Figure 2. Optimized structures of a hydrogen bonded protein-DNA residue pair (PDB ID: 1azp) showing (a) nonplanar amino groups forming better hydrogen bonds than (b) planar amino groups.

(c) Analysis of Protein-DNA Complexes. The results of our ab initio calculations suggested that nonplanar amino groups can drastically alter the hydrogen bonding between protein and DNA residues in the cocrystals. Statistical analysis of all the selected structures (referred to as set 1) revealed 2539 hydrogen bonds between protein and DNA residues with an amino group nitrogen as the hydrogen bond donor. The amino groups of residues such as histidine and tryptophan were excluded, as these did not show much geometric improvement in our ab initio calculations. Furthermore, the results of our program obtained

TABLE 3: Final Interaction Energies, ∆E (PLN interaction − PYR interaction), and Optimized Geometries of Hydrogen Bonded Residue Pairs

Energies are in kcal/mol, hydrogen bond distances in Å, and hydrogen bond angles and θ in degrees.

Identifier | Final energy | ∆E MP2/6-31G(2p, 2d) | ∆E HF/6-311+G(2p, 2d) | ∆E (BSSE) HF/6-31G(2p, 2d) | Planar: H-bond distance | Planar: H-bond angle | Planar: θ | Nonplanar HF/6-31G(2p, 2d): H-bond distance | H-bond angle | θ | Nonplanar HF/6-311+G(2p, 2d): H-bond distance | H-bond angle | θ
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
CABDOS | -24.11 | 2.45 | 2.5 | 2.64 | 2.02 | 148.4 | 0.01 | 1.9 | 169.8 | 23.17 | 1.92 | 170.8 | 23.81
CABGOM | -5.96 | 1.95 | 2.09 | 2.05 | 1.98 | 141.7 | 0.00 | 1.86 | 164.7 | 26.89 | 1.86 | 164.6 | 27.18
GABVOM1 | -3.61 | 2.74 | 2.72 | 2.74 | 2.59 | 118 | 0.04 | 2.32 | 143.2 | 41.68 | 2.32 | 143.1 | 41.21
GABVOM2 | -4.02 | 2.22 | 2.2 | 2.25 | 2.5 | 118.9 | 0.01 | 2.26 | 140.6 | 37.47 | 2.26 | 140.4 | 36.83
AABNOS | -12.85 | 1.95 | 1.9 | 2.09 | 1.94 | 159.5 | 0.02 | 1.92 | 165.5 | 9.75 | 1.92 | 165.6 | 10.09
NASANB1 | -12.85 | 1.95 | 1.9 | 2.09 | 2.03 | 149.7 | 0.01 | 1.94 | 168.6 | 14.75 | 1.94 | 168.8 | 14.33
AABAOM | -2.69 | 2.96 | 3.1 | 2.34 | 2.1 | 136.3 | 0.01 | 1.92 | 167.6 | 34.44 | 1.92 | 168.1 | 34.29
RASANB | -4.48 | 2.63 | 2.43 | 2.74 | 2.55 | 112.8 | 0.00 | 2.23 | 140.3 | 20.32 | 2.24 | 139.4 | 19.64
RASTOB | -12.35 | 2.94 | 2.8 | 3.59 | 2.74 | 108.8 | 0.03 | 2.3 | 148.7 | 26.78 | 2.31 | 147.7 | 27.13
RASAOP | -72.74 | 2.96 | 2.52 | 2.85 | 2.54 | 145.3 | 0.00 | 2.42 | 168.3 | 20.03 | 2.42 | 166.9 | 21.11
RASTOP1 | -100.75 | 3.9 | 3.31 | 3.89 | 1.98 | 161.1 | 0.00 | 1.95 | 171.1 | 18.56 | 1.95 | 170.8 | 17.00
RASTOP2 | -100.75 | 3.9 | 3.31 | 3.89 | 2.09 | 141.8 | 0.00 | 1.95 | 165.4 | 16.60 | 1.96 | 164.6 | 17.62
NASCOB | -6.56 | 1.52 | 1.64 | 1.99 | 1.97 | 147.4 | 0.06 | 1.89 | 164 | 20.94 | 1.88 | 165.5 | 21.16
NASGOP | -5.12 | 2.61 | 2.49 | 2.55 | 2.17 | 146.7 | 0.00 | 2.06 | 171.1 | 43.82 | 2.06 | 170.9 | 42.42
NASANB2 | -3.15 | 1.84 | 1.73 | 1.8 | 2.64 | 137.3 | 0.00 | 2.45 | 167.9 | 30.74 | 2.46 | 166.4 | 28.66
HNSGNB | -3 | 1.09 | 1.29 | 1.22 | 2.14 | 148.3 | 0.00 | 2.06 | 163.5 | 12.48 | 2.05 | 165 | 13.32
HNSGOP1 | -15.29 | 4.89 | 4.48 | 5.00 | 2.54 | 103.1 | 0.00 | 2.19 | 129.9 | 25.82 | 2.2 | 129 | 25.00
HNSGOP2 | -15.29 | 4.89 | 4.48 | 5.00 | 3.08 | 134.2 | 0.00 | 2.96 | 147.2 | 25.82 | 2.96 | 146.9 | 25.00
HNSGOB | -6.03 | 0.33 | 0.38 | 0.36 | 2.42 | 146.2 | 0.00 | 2.38 | 152.3 | 6.29 | 2.37 | 153.2 | 6.98
NAMGOP | -18.69 | 2.62 | 2.58 | 2.7 | 2.27 | 135.6 | 0.07 | 2.11 | 159 | 14.58 | 2.11 | 158.6 | 14.81
NAMGOB | -1.04 | 1.6 | 1.54 | 2.01 | 2.52 | 95.2 | 0.00 | 2.22 | 115.5 | 17.73 | 2.23 | 114.1 | 16.10
RAMANB | -2.46 | 1.24 | 1.3 | 1.38 | 2.3 | 142.6 | 0.00 | 2.18 | 163.5 | 15.96 | 2.17 | 164.5 | 16.67

Role of Hydrogen Bonds in Protein-DNA Recognition

with planar amino groups were cross-checked against the output of similar available software20,21 and found to be consistent. The number of hydrogen bonds increased to 2939 when nonplanar amino groups were considered during detection. This is an increase of about 14.5% in the number of such favorable interactions relative to planar amino groups. A similar analysis of a subset of 79 crystal structures (referred to as set 2) found 1856 hydrogen bonds with planar amino groups and 2153 with nonplanar amino groups, which is again a rise of 16%. Apart from increasing the number of strong and weak hydrogen bonds, nonplanar amino groups also introduce bifurcated hydrogen bonds in many cases. These bifurcations are important for maintaining the flexibility of interacting protein and DNA residues. Histograms of the hydrogen bond distance and angle for the 2539 set 1 bonds obtained with both planar and nonplanar amino groups strongly indicate that nonplanar amino groups can strengthen existing bonds by improving their geometry (Figure 3). In both cases, introducing nonplanarity at the donor atoms decreases the hydrogen bond distance and increases the hydrogen bond angle toward the ideal linear value. In addition, many new contacts form that qualify as weak hydrogen bonds in terms of distance and angle. Our ab initio calculations on DNA-protein residue pairs indicate that these weak bonds can also have a cumulative effect on the stability of such complexes (Table 3).

The improvement of hydrogen bonds already detected with planar amino groups was analyzed for different types of DNA bases, protein side chains, and the protein backbone in both sets 1 and 2. Improvement index values were calculated for each case from the respective normalized frequency distributions (Table 4, top part). They indicate that hydrogen bonds involving the amino groups of guanine improve the most, while those of arginine and the protein backbone improve the least. Figures 4 and 5 show the normalized frequency distributions of the improvement in hydrogen bond distance and angle, respectively, for guanine, asparagine, arginine, and the protein backbone in set 1. The trend in hydrogen bond improvement correlates well with the amount of nonplanarity obtained when these residues are optimized in isolation with several basis sets (Table 2). This perhaps indicates how hydrogen bond acceptors are positioned relative to the amino group donors of residues such as guanine, cytosine, and asparagine: in most cases, better hydrogen bonds can form with nonplanar amino groups.
Thus, the hydrogen bonds detected with planar amino groups of these residues do not represent the best achievable geometry, and nonplanar amino groups give hydrogen bonds that are closer to linear. Moreover, the ab initio results for several systems suggest that these improved hydrogen bonds also increase the interaction energy substantially (Table 3). Among these residues, guanine shows the largest energy gain upon pyramidalization (Table 2). This indicates that hydrogen bonds with nonplanar amino groups are more probable in most cases involving guanine. In contrast, the amino groups of arginine and the protein backbone show little tendency to pyramidalize. This is also reflected in protein-DNA crystal structures, where donors and acceptors are positioned to form good hydrogen bonds with largely planar amino groups.

Figure 3. Histograms showing distributions of improvement in (a) hydrogen bond distance and (b) hydrogen bond angle with planar and nonplanar amino group donors in 107 protein-DNA structures.
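A toy calculation can illustrate the geometric argument above: once the hydrogens leave the plane, the H...A distance gets shorter and the D-H...A angle gets closer to linear. The sketch below is not pyrHBfind. It assumes idealized values (C-N 1.34 Å, N-H 1.01 Å, H-N-C 120° in the planar case). It also assumes a simple pyramidal model in which both N-H bonds tilt by the same angle toward one side of the plane. The coordinates are hypothetical, with the acceptor placed above that plane. The helper names `amino_hydrogens` and `hbond_geometry` are illustrative.

```python
import math

# Small 3-vector helpers on tuples of floats.
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def amino_hydrogens(c, n, ref, tilt_deg=0.0, nh=1.01):
    """Place the two H atoms of an amino nitrogen N bonded to C (toy model).

    `ref` is any third atom that, with C and N, defines the molecular plane.
    tilt_deg = 0 gives the planar geometry (both H in plane, H-N-C = 120 deg);
    tilt_deg > 0 tilts both N-H bonds to the same side, i.e. a pyramidal N.
    """
    u = unit(sub(n, c))                      # unit vector along C -> N
    r = sub(ref, c)
    v = unit(sub(r, scale(u, dot(r, u))))    # in-plane, perpendicular to C-N
    w = cross(u, v)                          # normal to the plane
    t = math.radians(tilt_deg)
    hydrogens = []
    for side in (1.0, -1.0):                 # one H on each side of the C-N axis
        in_plane = add(scale(u, 0.5), scale(v, side * math.sqrt(3.0) / 2.0))
        direction = add(scale(in_plane, math.cos(t)), scale(w, math.sin(t)))
        hydrogens.append(add(n, scale(direction, nh)))
    return hydrogens

def hbond_geometry(donor, h, acceptor):
    """Return (H...A distance in angstrom, D-H...A angle in degrees)."""
    hd, ha = sub(donor, h), sub(acceptor, h)
    dist = math.sqrt(dot(ha, ha))
    cos_angle = dot(hd, ha) / (math.sqrt(dot(hd, hd)) * dist)
    return dist, math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Hypothetical coordinates in angstrom: the amino group lies in the z = 0 plane
# and the acceptor sits above it.
C, N, REF = (0.0, 0.0, 0.0), (1.34, 0.0, 0.0), (-0.70, 1.20, 0.0)
ACCEPTOR = (2.62, 2.21, 1.48)

planar = hbond_geometry(N, amino_hydrogens(C, N, REF)[0], ACCEPTOR)
pyramidal = hbond_geometry(N, amino_hydrogens(C, N, REF, tilt_deg=20.0)[0], ACCEPTOR)
print("planar:    H...A = %.2f A, N-H...A = %.1f deg" % planar)
print("pyramidal: H...A = %.2f A, N-H...A = %.1f deg" % pyramidal)
```

With these made-up coordinates, the tilted hydrogen ends up both closer to the acceptor and better aligned with it. For an acceptor lying in the plane, the comparison would go the other way. For this reason, the outcome depends on where the acceptors actually sit in each crystal structure.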

TABLE 4: Improvement Index Values for Hydrogen Bond Donor Groups of Different Bases and Amino Acid Side Chains and Those of Hydrogen Bonds Occurring through the Minor and Major Grooves of DNA in Protein-DNA Complexes

| specific amino group | new bonds with nonplanar amino group, set 1 (a) | new bonds with nonplanar amino group, set 2 (b) | bonds existing with planar amino group, set 1 | bonds existing with planar amino group, set 2 | improvement index (c), set 1 | improvement index (c), set 2 |
|---|---|---|---|---|---|---|
| Gua | 17 | 16 | 60 | 52 | 90.89 | 100.00 |
| Ade | 22 | 20 | 138 | 105 | 66.65 | 68.57 |
| Cyt | 25 | 18 | 151 | 96 | 74.01 | 79.96 |
| Asn | 48 | 30 | 295 | 184 | 84.48 | 67.18 |
| Gln | 25 | 22 | 119 | 98 | 64.48 | 75.75 |
| Arg | 224 | 168 | 1124 | 888 | 44.97 | 46.35 |
| protein backbone | 46 | 27 | 663 | 429 | 47.39 | 58.14 |
| minor groove | 52 | 42 | 298 | 215 | 100.00 | 100.00 |
| major groove | 125 | 101 | 783 | 570 | 72.83 | 77.16 |

(a) Considering all 107 protein-DNA structures. (b) Considering a subset of 79 protein-DNA structures. (c) Calculated with bonds existing with planar amino groups.
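The counts in Table 4 and in the text can be turned into simple percentage gains. The sketch below does two things. For each donor group in set 1, it computes the number of new bonds relative to the bonds already present with planar amino groups. It also computes the whole-set gain for set 2 from the totals quoted in the text (1856 and 2153 bonds). This ratio is only a count-based measure. It is not the improvement index of Table 4, which is derived from the distributions of distance and angle improvement. The dictionary and function names are illustrative.

```python
# Set 1 columns of Table 4: (new bonds with nonplanar amino group,
#                            bonds existing with planar amino group)
TABLE4_SET1 = {
    "Gua": (17, 60),
    "Ade": (22, 138),
    "Cyt": (25, 151),
    "Asn": (48, 295),
    "Gln": (25, 119),
    "Arg": (224, 1124),
    "protein backbone": (46, 663),
}

def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Count-based gain per donor group (not the improvement index).
for group, (new, existing) in TABLE4_SET1.items():
    gain = percent_increase(existing, existing + new)
    print(f"{group:<17} {existing:>5} -> {existing + new:>5} bonds (+{gain:.1f}%)")

# Whole-set gain for set 2, using the totals quoted in the text.
print(f"set 2: 1856 -> 2153 bonds (+{percent_increase(1856, 2153):.1f}%)")
```

On this crude measure, guanine again gains the most (about 28% more bonds) and the protein backbone the least (about 7%). The set 2 totals reproduce the 16% rise quoted in the text.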

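Figures 3 to 7 are normalized frequency distributions of the per-bond improvement. The sketch below shows one way to build such a distribution from paired planar and nonplanar values. It is an illustration only and rests on several assumptions. A distance is assumed to improve when it shrinks, and an angle when it grows toward 180°. The bin width and the four angle pairs are hypothetical. The function names are illustrative and not taken from pyrHBfind.

```python
import math
from collections import Counter

def improvements(pairs, kind):
    """Per-bond improvement from (planar, nonplanar) value pairs.

    Assumed sign convention: a distance improves when it gets shorter,
    an angle improves when it gets closer to 180 deg (i.e. larger).
    """
    if kind == "distance":
        return [planar - nonplanar for planar, nonplanar in pairs]
    if kind == "angle":
        return [nonplanar - planar for planar, nonplanar in pairs]
    raise ValueError(f"unknown kind: {kind!r}")

def normalized_histogram(values, bin_width):
    """Map each bin's lower edge to the fraction of values in [edge, edge + bin_width)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return {k * bin_width: n / total for k, n in sorted(counts.items())}

# Hypothetical (planar, nonplanar) N-H...A angles in degrees for four bonds.
angle_pairs = [(136.2, 164.7), (150.0, 158.0), (160.0, 161.0), (170.0, 169.0)]
print(normalized_histogram(improvements(angle_pairs, "angle"), bin_width=5))
# -> {-5: 0.25, 0: 0.25, 5: 0.25, 25: 0.25}
```

Normalizing by the number of bonds lets groups of very different size, such as Gua (60 bonds) and Arg (1124 bonds) in set 1, be compared on the same axis.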

Figure 4. Histograms showing distributions of improvement in the hydrogen bond distance of amino group donors of (a) guanine, (b) asparagine, (c) arginine, and (d) protein main chain in 107 protein-DNA structures.

Figure 5. Histograms showing distributions of improvement in the hydrogen bond angle of amino group donors of (a) guanine, (b) asparagine, (c) arginine, and (d) protein main chain in 107 protein-DNA structures.
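The groove-wise split in the next paragraph is summarized in the bottom rows of Table 4. It keeps only base atoms that face a groove and discards backbone and sugar contacts. The sketch below shows that bookkeeping. It assumes standard PDB atom names and the textbook assignment of groove-facing atoms in Watson-Crick paired B-DNA. The atom table, helper names, and contacts are illustrative and not taken from pyrHBfind.

```python
# Groove-facing H-bond donor/acceptor atoms of Watson-Crick paired B-DNA bases
# (standard PDB atom names). Backbone and sugar atoms are deliberately absent,
# so contacts made through them are dropped.
GROOVE_ATOMS = {
    ("G", "N2"): "minor", ("G", "N3"): "minor", ("A", "N3"): "minor",
    ("C", "O2"): "minor", ("T", "O2"): "minor",
    ("G", "O6"): "major", ("G", "N7"): "major", ("A", "N6"): "major",
    ("A", "N7"): "major", ("C", "N4"): "major", ("T", "O4"): "major",
}

def groove_of(residue_name, atom_name):
    """Return 'minor', 'major', or None; accepts both 'DG' and 'G' residue names."""
    base = residue_name.strip().upper()
    if len(base) == 2 and base.startswith("D"):
        base = base[1:]
    return GROOVE_ATOMS.get((base, atom_name.strip().upper()))

def split_by_groove(contacts):
    """contacts: iterable of (DNA residue name, DNA atom name, label) tuples."""
    groups = {"minor": [], "major": []}
    for residue, atom, label in contacts:
        groove = groove_of(residue, atom)
        if groove is not None:               # backbone/sugar contacts are discarded
            groups[groove].append(label)
    return groups

# Hypothetical contacts, labelled by the interacting protein atom.
contacts = [
    ("DG", "N2", "Asn OD1"),   # guanine amino group, minor groove
    ("DA", "N6", "Gln OE1"),   # adenine amino group, major groove
    ("DC", "N4", "Asp OD2"),   # cytosine amino group, major groove
    ("DT", "OP1", "Arg NH1"),  # phosphate oxygen: backbone, dropped
]
print(split_by_groove(contacts))
```

The table makes explicit the point used in the next paragraph. Among the amino groups, only guanine N2 lies on the minor groove side, while adenine N6 and cytosine N4 face the major groove.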

The improvement in hydrogen bonds was also analyzed separately for bonds formed through the DNA minor and major grooves, in both set 1 and set 2. For this analysis, only hydrogen bond donors and acceptors of DNA bases facing the minor or major groove were considered. Interactions through the DNA backbone and deoxyribose sugar were discarded. The improvement indices of these two subsets indicate that hydrogen bonds formed through the minor groove improve more than those formed through the major groove (Table 4, bottom part). The normalized frequency distributions of hydrogen bond distance (Figure 6) and angle (Figure 7) show that, in the minor groove subset, the largest share of bonds undergoes a large improvement in angle or distance. The amino group of guanine, which shows the maximum improvement, always faces the minor groove in Watson-Crick base-paired DNA. The amino groups of adenine and cytosine, by contrast, face the major groove. This partly accounts for the higher improvement index of the minor groove subset. This result therefore has an important implication: the nonplanar amino groups of guanine help improve the geometry of hydrogen bonds formed through the DNA minor groove.

Figure 6. Histograms showing distributions of improvement in the hydrogen bond distance of amino group donors facing (a) the minor groove of DNA and (b) the major groove of DNA in 107 protein-DNA structures.

Figure 7. Histograms showing distributions of improvement in the hydrogen bond angle of amino group donors facing (a) the minor groove of DNA and (b) the major groove of DNA in 107 protein-DNA structures.

Discussion

Although hydrogen bonding in protein-DNA complexes is a widely studied area, several aspects still remain inadequately understood. Because hydrogen atoms are absent from most X-ray crystallographic data, amino group hydrogens have so far been modeled with the widely adopted planar geometry. This has failed to reveal expected hydrogen bonds whose presence could account for vital contacts in regions responsible for the sequence specificity and recognition of protein-DNA complexes. Moreover, for many minor groove binding ligands and drugs, the reported hydrogen bonds obtained with planar amino groups are of poor quality because they deviate strongly from the ideal linear geometry. Thus, considering only planar amino groups probably underestimates the contribution of hydrogen bonds to biomolecular recognition. Even after several advances, the question of sequence recognition by minor groove binding proteins remains unanswered in many cases.5,44 However, our analysis of protein-DNA interfacial hydrogen bonds also considers nonplanar amino groups. It indicates that these strong, short-ranged, and directional interactions play a greater role in biomolecular recognition and binding than suggested by previous studies. Nonplanar amino groups improve the overall
quality of hydrogen bonds and give rise to many new interactions, which enhance the stability of such complexes. The study also reveals different extents of geometric improvement for different residues. This may extend the current understanding of sequence specificity and of the insufficient directional contacts observed particularly in the minor groove. Positively charged residues that are abundant at protein-DNA interfaces, such as arginine, are largely responsible for nonspecific contacts with the negatively charged DNA backbone atoms.45 On the other hand, residues such as asparagine, glutamine, adenine, guanine, and cytosine are involved in sequence-specific interactions. We found that nonplanar amino groups improve the quality of hydrogen bonds involving arginine to some extent. They greatly strengthen those involving asparagine, glutamine, guanine, adenine, and cytosine. Further, the highest degree of improvement is observed for guanine, whose amino group faces the DNA minor groove. This suggests that nonplanar amino groups strengthen directional contacts in the minor groove, which have otherwise been found to be poor in minor groove binding ligand-DNA complexes. Hence, considering nonplanar amino groups while optimizing neighboring hydrogen bonds gives a more realistic picture of the interactions between protein and DNA residues. This indicates that hydrogen bonds play a larger role in maintaining the specificity and stability of protein-DNA complexes than previously thought. Furthermore, the role of hydrogen bonds revealed in the present study may provide additional guidelines for developing better docking algorithms, molecular representations, and interaction potentials46,47 aimed at predicting recognition patterns in macromolecular complexes.

Acknowledgment. We thank the Council of Scientific and Industrial Research (CSIR), India, for financial support.

Note Added after ASAP Publication. Figures 5a and 5b have been corrected. This paper was published ASAP on 5/3/05. The corrected version was posted on 5/10/05.

References and Notes
(1) Jen-Jacobson, L. Biopolymers 1997, 44, 153-180.
(2) Pardo, L.; Campillo, M.; Bosch, D.; Pastor, N.; Weinstein, H. Biophys. J. 2000, 78, 1988-1996.
(3) Mandel-Gutfreund, Y.; Margalit, H. Nucleic Acids Res. 1998, 26, 2306-2312.
(4) Mandel-Gutfreund, Y.; Schueler, O.; Margalit, H. J. Mol. Biol. 1995, 253, 370-382.
(5) Werner, M. H.; Gronenborn, A. M.; Clore, M. Science 1996, 271, 778-784.
(6) Nadassy, K.; Wodak, S. J.; Janin, J. Biochemistry 1999, 38, 1999-2017.
(7) Jones, S.; Heyningen, P. V.; Berman, H. M.; Thornton, J. M. J. Mol. Biol. 1999, 287, 877-896.
(8) Luscombe, N. M.; Laskowski, R. A.; Thornton, J. M. Nucleic Acids Res. 2001, 29, 2860-2874.
(9) Tabernero, L.; Bella, J.; Aleman, C. Nucleic Acids Res. 1996, 24, 3458-3466.
(10) Sarkhel, S.; Desiraju, G. R. Proteins 2004, 54, 247-259.
(11) Janin, J. Structure 1999, 7, R277-R279.
(12) Norberg, J. Arch. Biochem. Biophys. 2003, 410, 48-68.
(13) Nadassy, K.; Oliviera, I. T.; Alberts, I.; Janin, J.; Wodak, S. J. Nucleic Acids Res. 2001, 29, 3362-3376.
(14) Mandel-Gutfreund, Y.; Margalit, H.; Jernigan, R. L.; Zhurkin, V. B. J. Mol. Biol. 1998, 277, 1129-1140.
(15) Rooman, M.; Lievin, J.; Buisine, E.; Wintjens, R. J. Mol. Biol. 2002, 319, 67-76.
(16) Gromiha, M.; Santhosh, C.; Ahmad, S. Int. J. Biol. Macromol. 2004, 34, 203-211.
(17) Kono, H.; Sarai, A. Proteins 1999, 35, 114-131.
(18) Gromiha, M.; Siebers, J. G.; Selvaraj, S.; Kono, H.; Sarai, A. J. Mol. Biol. 2004, 337, 285-294.
(19) Siggers, T. W.; Silkov, A.; Honig, B. J. Mol. Biol. 2005, 335, 1027-1045.
(20) McDonald, I. K.; Thornton, J. M. J. Mol. Biol. 1994, 238, 777-793.
(21) Hooft, R. W. W.; Sander, C.; Vriend, G. Proteins 1996, 26, 363-376.
(22) Word, J. M.; Lovell, S. C.; Richardson, J. S.; Richardson, D. C. J. Mol. Biol. 1999, 285, 1735-1747.
(23) Gould, I. R.; Kollman, P. A. J. Am. Chem. Soc. 1994, 116, 2493-2499.
(24) Sponer, J.; Hobza, P. J. Phys. Chem. 1994, 98, 3161-3164.
(25) Kydd, R. A.; Mills, I. M. J. Mol. Spectrosc. 1972, 42, 320-326.
(26) Kydd, R. A.; Krueger, P. Chem. Phys. Lett. 1977, 49, 539-543.
(27) Larsen, N. W.; Hansen, E. L.; Nicolaisen, F. M. Chem. Phys. Lett. 1976, 43, 584-586.
(28) Christen, D.; Norbury, D.; Lister, D. G.; Palmiery, P. J. Chem. Soc., Faraday Trans. 1975, 2, 438-446.
(29) Dasgupta, A.; Majumdar, R.; Bhattacharyya, D. Indian J. Biochem. Biophys. 2004, 41, 233-240.
(30) Sponer, J.; Mokdad, A.; Sponer, J. E.; Pacjkova, N.; Leszczynski, J.; Leontis, N. B. J. Mol. Biol. 2003, 330, 967-978.
(31) Sponer, J.; Hobza, P. J. Am. Chem. Soc. 1994, 116, 709-714.
(32) Sponer, J.; Florian, J.; Hobza, P.; Leszczynski, J. J. Biomol. Struct. Dyn. 1996, 13, 827-833.
(33) Sponer, J.; Leszczynski, J.; Hobza, P. J. Biomol. Struct. Dyn. 1996, 14, 117-135.
(34) Bhattacharyya, D.; Majumdar, R. Indian J. Biochem. Biophys. 2001, 38, 16-19.
(35) Bhattacharyya, D.; Sen, K.; Mukherjee, S. Indian J. Chem., Sect. A, in press.
(36) Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. J.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J. Comput. Chem. 1993, 14, 1347-1363.
(37) Schaftenaar, G.; Noordik, J. H. J. Comput.-Aided Mol. Des. 2000, 14, 123-134.
(38) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235-242.
(39) Berman, H. M.; Olson, W. K.; Beveridge, D. L.; Westbrook, J.; Gelbin, A.; Demeny, T.; Hsieh, S.-H.; Srinivasan, A. R.; Schneider, B. Biophys. J. 1992, 63, 751-759.
(40) Morokuma, K. J. Chem. Phys. 1971, 55, 1236-1244.
(41) http://dunbrack.fcc.edu/PISCES.php.
(42) MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586-3616.
(43) Foloppe, N.; MacKerell, A. D., Jr. J. Comput. Chem. 2000, 21, 86-104.
(44) Wemmer, D. Nat. Struct. Biol. 1998, 5, 169-171.
(45) Jones, S.; Shanahan, H. P.; Berman, H. M.; Thornton, J. M. Nucleic Acids Res. 2003, 31, 7189-7198.
(46) Deremble, C.; Lavery, R. Curr. Opin. Struct. Biol., in press.
(47) Havranek, J. J.; Duarte, C. M.; Baker, D. J. Mol. Biol. 2004, 344, 59-70.