
J. Phys. Chem. B 2009, 113, 11274–11292

Probability Distributions of Hydration Water Molecules around Polar Protein Atoms Obtained by a Database Analysis

Daisuke Matsuoka† and Masayoshi Nakasako*,†,‡

Department of Physics, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan, and RIKEN Harima Institute, 1-1-1 Kouto, Mikaduki, Sayo, Hyogo, Japan

Received: March 19, 2009; Revised Manuscript Received: June 13, 2009

Hydration structures on protein surfaces can be visualized by high-resolution cryogenic X-ray crystallography. We calculated the probability distributions of 4 831 570 hydration water molecules found around 4 214 227 polar atoms in main chains and hydrophilic side chains from 17 984 crystal structures in the Protein Data Bank. The structures were refined against diffraction data collected below 150 K at resolutions better than 2.2 Å. The calculated distributions were nonrandom and condensed into a few clusters. The clusters were decomposed into distance and angular distributions by viewing them in a polar coordinate system. The major peaks in the clusters were located almost along the directions of the N-H and O-H bonds or of the lone pairs of oxygen atoms. Gaussian fitting was applied to the distribution profiles to evaluate the peak positions and widths quantitatively. The parameters characterizing the distributions apparently depended on the hydrogen-bond partners of the water molecules and on whether the water molecules acted as proton donors or acceptors. This led us to propose different roles for NHn (n = 1-3), OH, and C=O groups in protein hydration and possibly in protein-ligand and protein-protein interactions: C=O groups appear to control H-bond distances, whereas NHn groups likely limit the angular range of H-bonds. OH groups have both characteristics. In addition, polar protein atoms were shown to be arranged so as to satisfy the tetrahedral hydrogen-bond geometry of water molecules, suggesting essential roles of water molecules in the folding process and in the stabilization of protein structures. These probability distributions probably constitute fundamental data for better understanding the roles of hydration water molecules in the folding process and the stability of proteins in solution.

1. Introduction

The structures at protein-water interfaces, so-called hydration structures, are the subject of experimental and theoretical studies and of much discussion aimed at understanding why proteins fold and function in water.1–5 It has been argued that at least enough water molecules to form a monolayer around a protein are likely necessary for the internal motions that give rise to its biological functions.6–9 X-ray crystallography10–12 and neutron crystallography13–15 have made important contributions to visualizing ensemble-averaged hydration structures, including the geometries and interactions of the hydrogen bonds (H-bonds)16,17 formed between water molecules and protein atoms.

Studies focusing on the hydration structures of proteins postulate their relevance to protein function. Hydration water molecules have been argued to act as a glue that stabilizes tertiary18 and higher-order19,20 structures and molecular complexes.21,22 In large-scale conformational changes of proteins, water molecules may work as a lubricant that assists smooth domain movements23 or as a ratchet that drives or hinders directional movements.24 Some enzymatic reactions use water molecules as proton donors or acceptors.25 In a light-driven proton pump, a cluster of water molecules in the transmembrane region serves as a medium for proton transfer.26,27

* To whom correspondence should be addressed. Phone: +81-45-5661679. Fax: +81-45-566-1672. E-mail: [email protected]. † Keio University. ‡ RIKEN Harima Institute.

Soluble proteins are generally thought to be covered by a monolayer of water molecules in a nonrandom distribution.28 In this picture, water molecules are localized at hydration sites near polar atoms on hydrophilic surfaces and show directional ordering.29,30 Hydrophobic surfaces may promote clathrate-like arrangements of water molecules that probably depend on the hydration patterns induced by nearby polar groups.31,32 The water molecules and polar protein atoms appear to form H-bond networks extending over much of the protein surface.33–35 Cryogenic X-ray diffraction techniques,36 which suppress radiation damage to protein crystals, have contributed greatly to high-resolution crystal structure analyses. At resolutions better than 2.2 Å, cryo-crystallography provides protein structures together with many hydration water molecules. The number of water molecules experimentally observed around a protein at 100 K is more than twice that found at ambient temperature.32,33,37 A large number of protein structures solved at cryogenic temperatures, together with a huge number of water molecule positions, have now accumulated in the Protein Data Bank (PDB; http://www.pdb.org/).38 It is therefore timely to determine the distributions and interactions of hydration water molecules around proteins from the "hydrated" structural models in the databank. As a first step, we focus on the local hydration structures around polar protein atoms. Polar atoms, which occupy a large fraction of the surface area of soluble proteins, provide the reaction fields of enzymes and the platforms for molecular interactions. About 20 years ago, systematic analyses were carried out to visualize the

10.1021/jp902459n CCC: $40.75 © 2009 American Chemical Society Published on Web 07/21/2009


TABLE 1: Statistics of the Selected Polar Atoms and Hydration Water Molecules Used in the Calculation

amino acid | total no. of residues | atoms used in the superimposition onto the ideal geometry^a | polar atom (average B^b, Å²) | hybridization/geometry^c | lone pairs^d/bonds with H^e | no. of polar atoms used in the calculation | no. of water molecules used in the calculation (B < 45 Å²)
peptide bond | 9 145 366 | C, O, N | O (20.01) | sp2/T.p. | 2/0 | 1 821 038 | 1 989 000
| | | N (21.15) | sp2/T.p. | 0/1 | 664 650 | 668 188
Glu | 588 397 | CD, OE1, OE2 | OE1 (27.08) | sp2/T.p. | 2/0 | 167 027 | 198 901
| | | OE2 (27.12) | sp2/T.p. | 2/0 | 167 189 | 202 893
Asp | 529 113 | CG, OD1, OD2 | OD1 (23.52) | sp2/T.p. | 2/0 | 157 946 | 188 175
| | | OD2 (23.95) | sp2/T.p. | 2/0 | 188 267 | 235 447
Arg | 450 353 | NE, CZ, NH2 | NE (25.58) | sp2/T.p. | 0/1 | 46 184 | 46 910
| | | NH1 (25.73) | sp2/T.p. | 0/2 | 92 497 | 102 030
| | | NH2 (26.08) | sp2/T.p. | 0/2 | 73 012 | 79 556
Lys | 499 412 | CD, CE, NZ | NZ (27.88) | sp3/Tet. | 0/3 | 120 938 | 151 717
His | 224 809 | ND1, CE1, NE2 | ND1 (21.76) | sp2/T.p. | 0/1 | 34 900 | 35 602
| | | NE2 (21.37) | sp2/T.p. | 0/1 | 32 700 | 33 834
Trp | 140 029 | CD1, NE1, CE2 | NE1 (19.39) | sp2/T.p. | 0/1 | 17 825 | 18 086
Ser | 525 979 | CA, CB, OG | OG (21.99) | sp3/Tet. | 2/1 | 151 156 | 174 132
Thr | 512 327 | CA, CB, OG1 | OG1 (20.16) | sp3/Tet. | 2/1 | 163 944 | 191 686
Tyr | 329 373 | CE2, CZ, OH | OH (21.21) | sp2/T.p. | 1/1 | 110 067 | 128 362
Gln | 328 705 | CD, OE1, NE2 | OE1 (24.68) | sp2/T.p. | 2/0 | 89 490 | 103 694
| | | NE2 (24.21) | sp2/T.p. | 0/2 | 66 345 | 74 546
Asn | 384 810 | CG, OD1, ND2 | OD1 (22.41) | sp2/T.p. | 2/0 | 93 973 | 106 353
| | | ND2 (21.61) | sp2/T.p. | 0/2 | 90 969 | 102 458

^a The root-mean-square deviations of the atom positions used in the superimpositions from the standard geometry are less than 0.07 Å. ^b The frequencies of the B-factors are shown in Figure S1, Supporting Information. ^c The trigonal planar and tetrahedral geometries are abbreviated as "T.p." and "Tet.", respectively. ^d The number of lone pairs available for H-bonding. ^e The number of O-H or N-H bonds available to act as donors in H-bonds.

distributions of hydration water molecules around polar protein atoms in 16 nonhomologous high-resolution crystal structures determined at ambient temperatures.39,40 The large number of water molecules now in the databank is likely sufficient for a quantitative description that improves our understanding of protein hydration structures. Here, we show the probability distributions of hydration water molecules around polar protein atoms in main chains and in the side chains of 11 types of amino acid residues (glutamate, aspartate, arginine, lysine, histidine, tryptophan, serine, threonine, tyrosine, glutamine, and asparagine). The distributions are calculated from millions of water molecules that form H-bonds with the polar atoms in high-resolution crystal structures determined at cryogenic temperatures. We also analyze the H-bond geometry of hydration water molecules in the vicinity of protein surfaces. Through quantitative analyses of the distributions, we discuss common characteristics that may relate to the roles of water molecules in the structures and functions of proteins in solution.

2. Computational Methods

2.1. Calculation of the Probability Distribution of Hydration Water Molecules. To analyze the hydration structures of proteins in the PDB systematically, we developed the program suite "PD_hydra" by incorporating the basic functions implemented in the program suite "FESTKOP".32,33 PD_hydra selects PDB files according to given criteria on resolution, data-collection temperature, refinement program, and crystallographic R-factor. The program first selects all hydration water molecules within 2.4-3.4 Å of polar protein atoms (oxygen and nitrogen atoms). In a preliminary analysis, we obtained a distance distribution for a number of possible H-bond pairs between water molecules and polar protein atoms. The resulting distribution displayed a peak at ca. 2.9 Å with a half width of ca. 0.2 Å.
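The distance-window selection can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the PD_hydra code: the function name, the tuple-based coordinates, and the choice to keep the closest polar atom lying inside the 2.4-3.4 Å window are ours.

```python
import math

R_MIN, R_MAX = 2.4, 3.4  # Å, H-bond distance window used for selection


def select_hbond_candidates(waters, polar_atoms, r_min=R_MIN, r_max=R_MAX):
    """For each water oxygen, find the closest polar protein atom (N or O)
    lying inside [r_min, r_max].

    waters, polar_atoms: sequences of (x, y, z) tuples in Å (hypothetical input).
    Returns a list of (water_index, polar_atom_index, distance); waters with
    no polar atom inside the window are skipped."""
    hits = []
    for wi, w in enumerate(waters):
        best = None  # (polar_atom_index, distance) of the closest partner so far
        for pi, p in enumerate(polar_atoms):
            d = math.dist(w, p)
            if r_min <= d <= r_max and (best is None or d < best[1]):
                best = (pi, d)
        if best is not None:
            hits.append((wi, best[0], best[1]))
    return hits


if __name__ == "__main__":
    waters = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    polar = [(2.9, 0.0, 0.0), (0.0, 3.2, 0.0)]
    print(select_hbond_candidates(waters, polar))
```

The double loop is quadratic in the number of atoms; a survey over thousands of PDB entries would more plausibly use a grid or k-d tree neighbor search, but the selection rule stays the same.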
Taking into account also the coordinate errors (less than 0.3 Å) expected in structures determined at 2.2 Å resolution,41 we set the lower and upper limits for picking up possible H-bond pairs at 2.4 and 3.4 Å. In the next stage, the program searches for the polar protein atom closest to each selected water molecule and picks up the group containing that polar atom from those listed in Table 1. Then, the rotation matrix and the translation vector are calculated to superimpose the group onto its stereochemically standard structure, which carries an array of 0.25 × 0.25 × 0.25 Å³ voxels describing the probability distribution of hydration water molecules. The hydration water molecule is assigned to one of the voxels after transformation by the matrix and the vector. This procedure is carried out iteratively for hydration water molecules with B-factors below a given threshold in the selected set of protein structures. Finally, the probability distributions are obtained by counting the number of water molecules in each voxel. The program also examines whether the tetrahedral arrangement of bulk water42,43 is retained in the vicinity of protein surfaces by picking up hydration water molecules having four H-bond partners. The partners found are fitted to the standard tetrahedral geometry by using the algorithm developed by Kabsch.44

2.2. Description of the Probability Distribution. To describe the probability distribution, we use a polar coordinate system (r, θ, φ) as defined in Figure 1a. In this coordinate system, we set the reference bond axis in the xy-plane except for the side chains of serine, threonine, and lysine. For these three, another polar coordinate system is used, with the reference bond axis in the yz-plane, because of the sp3 hybridization of their oxygen or nitrogen atoms. To characterize the probability distributions quantitatively, the profiles are approximated as sums of Gaussians

$$f(x) = \sum_i A_i \exp\left[-\frac{(x - x_{0i})^2}{2\sigma_i^2}\right]$$
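The fitting model can be evaluated directly, as in the sketch below. The helper name `gaussian_sum` is hypothetical; the example parameters are the ξ2 and η2 θ-components of cluster 2 around the peptide bond reported in Table 2. The authors performed the fits in Origin8; a generic least-squares routine such as `scipy.optimize.curve_fit` could fit the same model, but that substitution is our assumption, not the published procedure.

```python
import math


def gaussian_sum(x, components):
    """f(x) = sum_i A_i * exp(-(x - x0_i)**2 / (2 * sigma_i**2)).

    components: iterable of (A_i, x0_i, sigma_i) tuples."""
    return sum(a * math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
               for a, x0, sigma in components)


# theta-components of cluster 2 around the peptide bond (Table 2):
# a broad xi2 Gaussian at 94.5 deg and a narrow eta2 Gaussian at 112.0 deg.
PEPTIDE_C2_THETA = [(29251, 94.5, 29.80), (23351, 112.0, 6.23)]

if __name__ == "__main__":
    for theta in (60.0, 94.5, 112.0, 140.0):
        print(theta, round(gaussian_sum(theta, PEPTIDE_C2_THETA)))
```

Each component contributes its full amplitude A_i only at x = x0i, so the printed profile shows the narrow η2 peak riding on the broad ξ2 background.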


Figure 1. (a) Diagrammatic illustration of the polar coordinate systems used to describe the probability distributions of hydration water molecules for serine, threonine, and lysine (right) and for the other groups (left). (b) A plot showing the correlation between the averaged B-factors of the polar atoms (oxygen atoms in C=O groups, red symbols; oxygen atoms in OH groups, blue; nitrogen atoms in NHn (n = 1-3) groups, green) and those of the water molecules used in the analysis.
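For concreteness, the conversion of a water position into (r, θ, φ) might look like the sketch below. Because the exact axes of Figure 1a are not reproduced in the text, the frame is an assumption chosen to agree with the ideal angles in Tables 2 and 3 for sp2 groups: origin at the target polar atom, x along the reference bond axis, z normal to the group plane, θ measured from +z (so θ = 90° is in-plane), and φ measured in the xy-plane from +x (so the carbonyl lone pairs lie near φ = ±60°). The function name is hypothetical.

```python
import math


def to_polar(water, origin, x_axis, z_axis):
    """Return (r, theta, phi) of `water` in a local frame; angles in degrees.

    Assumed frame: origin at the target polar atom, x_axis along the reference
    bond axis, z_axis normal to the group plane (x_axis and z_axis must be
    orthonormal unit vectors). theta is measured from +z, phi in the xy-plane
    from +x."""
    # y = z cross x completes a right-handed frame
    y_axis = (z_axis[1] * x_axis[2] - z_axis[2] * x_axis[1],
              z_axis[2] * x_axis[0] - z_axis[0] * x_axis[2],
              z_axis[0] * x_axis[1] - z_axis[1] * x_axis[0])
    v = [w - o for w, o in zip(water, origin)]
    x = sum(a * b for a, b in zip(v, x_axis))
    y = sum(a * b for a, b in zip(v, y_axis))
    z = sum(a * b for a, b in zip(v, z_axis))
    r = math.sqrt(x * x + y * y + z * z)
    cos_theta = max(-1.0, min(1.0, z / r))  # guard against rounding outside [-1, 1]
    theta = math.degrees(math.acos(cos_theta))
    phi = math.degrees(math.atan2(y, x))
    return r, theta, phi
```

With this convention, a water lying in the group plane 2.75 Å from the origin and 60° from the reference axis maps to (2.75 Å, 90°, 60°).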

The amplitude (Ai), center position (x0i), and standard deviation (σi) of the ith Gaussian are calculated with the program Origin8 (OriginLab Co., USA). The probability distributions of water molecules are output in the map.dn6 format45 for easy visualization with graphics software.

3. Results

3.1. Overview of the Probability Distributions of Hydration Water Molecules. From the PDB, 17 984 protein structures are selected to calculate the probability distributions of hydration water molecules (Table S1, Supporting Information). These crystal structures were solved at resolutions better than 2.2 Å using diffraction data collected below 150 K and have crystallographic R-factors below 0.23. At resolutions better than this limit, hydration water molecules with small positional fluctuations appear as discrete, spherical densities that can be assigned unambiguously.33,37 The three criteria ensure errors in the refined coordinates of less than ca. 0.3 Å.41 Table 1 lists the numbers of polar protein atoms that satisfy the H-bond distance criterion (2.4-3.4 Å) with water molecules among the 9 145 366 residues in the 17 984 structural models. In calculating the probability distributions, we excluded hydration water molecules with thermal factors larger than 45 Å² (Figure 1b), corresponding to a mean square displacement of 1.7 Å², to ensure the statistical reliability of the analysis (Figure S1, Supporting Information). The calculated probability distributions are basically consistent with those from the 10 high-resolution crystal structures at 100 K.32 The profiles and density maps calculated from millions of water molecules display far clearer details than those in the previous study.
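The quoted correspondence between the 45 Å² cutoff and a mean square displacement of 1.7 Å² is reproduced by the isotropic relation B = (8π²/3)⟨u²⟩, where ⟨u²⟩ is the total three-dimensional mean square displacement; that this is the convention intended is our inference, since the text does not state it. With the per-axis convention B = 8π²⟨u_x²⟩, the same cutoff corresponds to about 0.57 Å². The function names below are hypothetical.

```python
import math


def b_to_total_msd(b_factor):
    """Total isotropic mean square displacement <u^2> (Å^2) from B (Å^2),
    assuming B = (8 * pi**2 / 3) * <u^2>."""
    return 3.0 * b_factor / (8.0 * math.pi ** 2)


def b_to_axis_msd(b_factor):
    """Per-axis mean square displacement <u_x^2> (Å^2), assuming B = 8 * pi**2 * <u_x^2>."""
    return b_factor / (8.0 * math.pi ** 2)


B_CUTOFF = 45.0  # Å^2, water B-factor cutoff used in the analysis
print(f"total MSD at B = {B_CUTOFF}: {b_to_total_msd(B_CUTOFF):.2f} Å^2")   # prints 1.71 Å^2
print(f"per-axis MSD at B = {B_CUTOFF}: {b_to_axis_msd(B_CUTOFF):.2f} Å^2")  # prints 0.57 Å^2
```

The factor of 3 between the two functions is simply the sum over three equivalent axes for isotropic motion.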
The distributions are illustrated as density maps (Figures 2-8), and Gaussian fitting is applied to describe quantitatively the distance and angular distributions in the polar coordinate systems centered at the target polar atoms (Tables 2 and 3). The selected set of 17 984 protein structures contains proteins of high structural homology, such as the same proteins crystallized under different conditions, proteins with a few mutations from the wild type, and the liganded and unliganded forms of the same proteins. Care is therefore needed, because the set may be biased by many redundant components. We examined whether the redundant components influence the probability distributions by the following two approaches. In the first, we applied the algorithm to 1265 proteins, selected under the three criteria from the 3873 proteins with structural homologies less than 25% listed in the WEB server46–48 (Table S2, Supporting Information). The resulting probability distributions are slightly noisy but essentially consistent with those from the whole set regarding the peak positions and standard deviations in their Gaussian fitting (Figures S2-S8 and Tables S3-S4, Supporting Information). This result implies that the hydration structures around polar protein atoms are independent of the folding patterns of proteins. In the second approach, we examined whether the positions of hydration water molecules are preserved around polar atoms within a partial set of redundantly selected protein structures. As this partial set, we selected 42 crystal structures of T4 lysozyme and superimposed them using the CA atoms of residues 1-162 (Table S5, Supporting Information). In the superimposed set, the spatial distributions of hydration water molecules are nonrandom and their positions are almost conserved (Figure S9, Supporting Information). This result confirms that the redundant sets have little influence on the probability distributions derived from the whole set. Consistent with this, Bottoms et al. reported that proteins in six families had solvent sites with more than 90% conservation and that these sites were invariably located in regions of very high sequence conservation.18 The probability distributions are almost independent of the refinement programs used (Tables S6 and S7 and Figures S10-S14, Supporting Information). On the other hand, the widths of the profiles become narrower when the analysis is restricted to water molecules with B-values smaller than 25 Å² or to structures solved at resolutions better than 1.5 Å (Tables S8 and S9 and Figures S10-S14, Supporting Information). The differences are more prominent in the distance (r-) distributions than in the angular (θ- and φ-) distributions.

3.2. Distribution around Peptide Bond. More than 2 million water molecules found around peptide bonds are condensed in three clusters, 1-3 (Figure 2).
The number of water molecules in cluster 2 is about twice that in cluster 1 or 3. Cluster 1 is localized in line with the N-H bond, and its distance distribution (r-distribution) has a single peak centered at 2.92 Å (Figure 2b). Its θ- (Figure 2c) and φ-distributions (Figure 2d) are limited to narrow angular ranges, as indicated by the standard deviations (σθ and σφ) of the Gaussian approximations, which are smaller than those of the other clusters (Table 2). Clusters 2 and 3 display semicircular shapes around the sp2-hybridized carbonyl oxygen atom. The r-distributions in these clusters display single peaks centered at 2.74-2.75 Å, and cluster 3 has a longer tail than cluster 2 (Figure 2b). The peaks in the φ-distributions shift toward the axis of the


Figure 2. Probability distribution of hydration water molecules around the peptide bond. (a) A stereoplot showing the distribution as fishnets contoured at the 1551 water molecules/voxel level, drawn with the PyMOL graphics program.61 The red spheres indicate the positions of the peaks in the θ-distribution. When a θ-distribution is approximated by three Gaussians, the components are named ξ, η, and ζ. The subcomponents in the φ-distribution are named α and β. The coordinates used to describe the distributions are drawn schematically on the right. Panels b, c, and d show the r-, θ-, and φ-distributions, respectively. The observed profiles of the distributions (open circles) are approximated by Gaussians. The red lines are the sums of all Gaussians, indicated by green and blue lines. The parameters of the major Gaussians (green) are given in Table 2. (e) An example of a water molecule belonging to the η2-component. The water molecule forms an H-bond with the C=O group of the main chain in an α-helix of the LOV1 domain of phototropin1 (PDB accession code 2Z6C).22

C=O bond from the directions of the lone-pair electrons of the oxygen atom (Figure 2d and Table 2). At about φ = 0°, a significant number of water molecules are distributed in the direction of the C=O bond. The θ-distribution in cluster 2 is approximated by one broad Gaussian centered at θ = 94.5° (ξ2-component) and a narrow one at θ = 112.0° (η2) (Figure 2c). The water molecules of the η2-component frequently hydrate the carbonyl oxygen atoms,

which form the backbones of α-helices, through bifurcated H-bonds (Figure 2e).49,50 Such hydration patterns were found in the systematic analysis of main-chain solvation.40 The θ-distribution in cluster 3 is approximated by three Gaussians (ξ3, η3, and ζ3 in Figure 2c). Water molecules in the η3- and ζ3-components are probably stabilized by van der Waals interactions with the oxygen atom and with the side chain of the next residue.
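The density maps in Figure 2 and the later figures rest on superimposing each observed group onto its standard geometry and counting the transformed water positions in 0.25 Å voxels (Section 2.1). A minimal NumPy sketch of that step follows. Using the Kabsch least-squares superposition for the group fit is our assumption (the text names Kabsch only for the tetrahedral fitting), as are the names `kabsch`, `voxel_index`, and `accumulate_waters` and the choice of a voxel grid anchored at the frame origin.

```python
from collections import Counter

import numpy as np

VOXEL_EDGE = 0.25  # Å, voxel edge length stated in Section 2.1


def kabsch(mobile, target):
    """Rotation r and translation t that best map `mobile` onto `target`
    (paired N x 3 arrays) in the least-squares sense: target ≈ mobile @ r.T + t."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    p_c, q_c = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_c).T @ (q - q_c)                # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # proper rotation (det = +1)
    t = q_c - r @ p_c
    return r, t


def voxel_index(point, edge=VOXEL_EDGE):
    """Integer indices of the voxel containing `point` (grid assumed to start at 0)."""
    return tuple(int(np.floor(c / edge)) for c in point)


def accumulate_waters(group_atoms, standard_atoms, waters, counts=None, edge=VOXEL_EDGE):
    """Superimpose one observed group onto its standard geometry, move its
    hydration waters with the same transform, and count them per voxel."""
    counts = Counter() if counts is None else counts
    r, t = kabsch(group_atoms, standard_atoms)
    for w in np.asarray(waters, dtype=float):
        counts[voxel_index(r @ w + t, edge)] += 1
    return counts
```

Passing the same `Counter` into successive calls accumulates counts over many groups and structures, which is the voxel histogram from which the probability maps are drawn.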


TABLE 2: Statistics of the Data Used for Calculating the Distribution Functions and the Parameters for the Gaussian Fitting of the Probability Distributions of the Water Molecules in 17 984 Crystal Structures Solved Using Data Collected below 150 K at Resolutions Better than 2.2 Å

Gaussian fitting parameters for the probability distributions^c: r-distribution (A_r; r0 and σr in Å), θ-distribution (A_θ; θ0, θideal^d, and σθ in degrees), φ-distribution (A_φ; φ0, φideal^d, and σφ in degrees).

amino acid | cluster^a | no. of water molecules (average B^b, Å²) | A_r | r0 | σr | θ comp.^f | A_θ | θ0 | θideal | σθ | φ comp.^e | A_φ | φ0 | φideal | σφ
peptide bond | 1 | 668 188 (27.56) | 40 799 | 2.92 | 0.111 |  | 47 244 | 93.6 | 90.0 | 10.51 | α | 127 292 | 4.6 | 0.0 | 4.78
| | | | | | | | | | | β | 48 577 | -1.7 | 0.0 | 8.84
| 2 | 1 260 894 (27.48) | 81 449 | 2.74 | 0.103 | ξ | 29 251 | 94.5 | 90.0 | 29.80 | α | 79 963 | -42.1 | -60.0 | 12.70
| | | | | | η | 23 351 | 112.0 |  | 6.23 | β | 102 028 | -55.2 | -60.0 | 6.18
| 3 | 728 106 (27.56) | 42 314 | 2.75 | 0.106 | ξ | 12 737 | 91.0 | 90.0 | 24.97 |  | 72 967 | 32.5 | 60.0 | 9.14
| | | | | | η | 9845 | 131.8 |  | 16.68
| | | | | | ζ | 5977 | 45.8 |  | 16.87
Glu | 1 | 110 936 (28.66) | 6400 | 2.69 | 0.109 |  | 2686 | 90.2 | 90.0 | 30.69 | α | 10 282 | -33.0 | -60.0 | 8.14
| | | | | | | | | | | β | 5873 | -55.8 | -60.0 | 7.35
| 2 | 87 965 (29.62) | 4890 | 2.70 | 0.108 |  | 2129 | 86.2 | 90.0 | 31.94 |  | 9295 | 55.5 | 60.0 | 11.59
| 3 | 86 435 (29.65) | 4748 | 2.69 | 0.110 |  | 2006 | 89.9 | 90.0 | 32.04 |  | 7320 | -53.5 | -60.0 | 13.91
| 4 | 116 458 (28.72) | 6777 | 2.68 | 0.106 |  | 3292 | 89.8 | 90.0 | 24.38 | α | 4056 | 36.8 | 60.0 | 10.35
| | | | | | | | | | | β | 14 156 | 56.7 | 60.0 | 7.40
Asp | 1 | 92 030 (28.13) | 5397 | 2.71 | 0.106 | ξ | 1867 | 87.6 | 90.0 | 34.15 | α | 7446 | -33.4 | -60.0 | 13.64
| | | | | | η | 521 | 59.3 |  | 22.42 | β | 4970 | -56.9 | -60.0 | 7.08
| 2 | 96 145 (28.88) | 5628 | 2.71 | 0.103 | ξ | 1772 | 91.6 | 90.0 | 25.91 |  | 8961 | 55.7 | 60.0 | 13.78
| | | | | | η | 1229 | 134.6 |  | 15.98
| | | | | | ζ | 818 | 53.0 |  | 15.26
| 3 | 103 871 (29.07) | 6160 | 2.69 | 0.109 |  | 2352 | 93.6 | 90.0 | 32.05 |  | 9580 | -53.5 | -60.0 | 13.03
| 4 | 131 576 (28.09) | 7949 | 2.70 | 0.104 |  | 3886 | 90.1 | 90.0 | 23.08 | α | 6481 | 47.4 | 60.0 | 11.94
| | | | | | | | | | | β | 16 000 | 57.8 | 60.0 | 6.21
Arg | NE | 46 910 (30.04) | 1997 | 2.85 | 0.148 |  | 2405 | 90.7 | 90.0 | 12.63 |  | 5170 | -63.1 | -62.0 | 12.06
| NH1 | 102 030 (29.92) | 4289 | 2.91 | 0.166 |  | 5431 | 89.7 | 90.0 | 12.56 | α | 5203 | -43.4 | -60.0 | 8.35
| | | | | | | | | | | β | 11 433 | -45.8 | -60.0 | 3.86
| | | | | | | | | | |  | 3630 | 59.7 | 60.0 | 13.3
| NH2 | 79 556 (30.74) | 2973 | 2.90 | 0.188 |  | 3815 | 89.4 | 90.0 | 14.29 |  | 3639 | -57.0 | -60.0 | 13.59
| | | | | | | | | | |  | 2200 | 56.6 | 60.0 | 14.84
Lys | 1 | 49 893 (30.91) | 2087 | 2.78 | 0.143 |  | 2883 | 72.5 | 70.6 | 11.94 |  | 3159 | -153.9 | -150.0 | 23.08
| 2 | 48 368 (31.05) | 2067 | 2.78 | 0.132 |  | 2784 | 72.2 | 70.6 | 12.02 |  | 2991 | -26.4 | -30.0 | 23.54
| 3 | 53 456 (32.02) | 2270 | 2.80 | 0.136 |  | 2967 | 78.8 | 70.6 | 11.67 |  | 3582 | 89.8 | 90.0 | 20.93
His | 1 | 35 602 (27.65) | 2121 | 2.77 | 0.109 |  | 1958 | 89.9 | 90.0 | 12.33 |  | 5086 | 53.8 | 54.0 | 9.89
| 2 | 33 834 (28.64) | 1775 | 2.79 | 0.112 |  | 2326 | 89.7 | 90.0 | 9.57 |  | 3353 | 123.4 | 126.0 | 14.13
Trp | 1 | 18 086 (28.87) | 1023 | 2.92 | 0.120 |  | 1276 | 89.8 | 90.0 | 10.39 |  | 2333 | 49.4 | 54.0 | 10.21
Ser | 1-3 | 174 132 (27.56) | 10 113 | 2.71 | 0.101 |  | 10 867 | 70.4 | 70.6 | 9.79 | cluster 1 | 4775 | 1.0 | -30.0 | 17.54
| | | | | | | | | | | cluster 2 | 3204 | 99.68 | 90.0 | 17.23
| | | | | | | | | | | cluster 3 | 3737 | -177.6 | -150.0 | 13.09
Thr | 1-3 | 191 686 (27.00) | 12 112 | 2.72 | 0.104 |  | 14 197 | 64.3 | 70.6 | 9.67 | cluster 1 | 5853 | -4.2 | -30.0 | 13.04
| | | | | | | | | | | cluster 2 | 4597 | 99.1 | 90.0 | 16.69
| | | | | | | | | | | cluster 3 | 2562 | -165.7 | -150.0 | 25.68
Tyr | 1 | 64 736 (27.07) | 3875 | 2.68 | 0.105 |  | 2333 | 91.0 | 90.0 | 18.68 |  | 11 474 | -62.2 | -60.0 | 5.08
| 2 | 63 626 (27.09) | 3835 | 2.68 | 0.105 |  | 2409 | 89.1 | 90.0 | 16.98 |  | 11 747 | 62.3 | 60.0 | 5.23

^a The cluster numbers are defined in Figures 2-6. ^b The frequencies of the B-factor values are shown in Figure S1, Supporting Information. ^c The parameters in the Gaussian fitting are described in the Computational Methods section. ^d The ideal values are expected from the geometries of the lone pairs and the O-H and N-H bonds viewed from the polar coordinates defined in Figures 2-6. ^e The Gaussian components in the φ-distributions are defined and shown in Figures 2d, 3d and h, and 4d. ^f The Gaussian components in the θ-distributions are defined and shown in Figures 2c and 3g.

3.3. Distribution around Carboxyl Groups of Glutamate and Aspartate.

The carboxyl groups of glutamate and aspartate side chains are expected to be deprotonated under most protein crystallization conditions. However, X-ray crystallography can hardly determine which oxygen atom of a carboxyl group is protonated. Thus, the probability distributions shown here are ensemble averages over the protonated and deprotonated states. (i) Glutamate. Pairs of discrete clusters appear around the OE1 and OE2 atoms (Figure 3a). Although the density maps of the clusters are broad along θ, the peak positions lie in the plane of the lone-pair electrons of the sp2-hybridized atoms (Table 2). The r-distributions of the four clusters (Figure 3b) are very similar to those around the peptide bond (Figure 2b and Table 2). All θ-distributions (Figure 3c) are approximated well by single Gaussians with σ-values similar to those of the main peaks in clusters 2 and 3 around the peptide bond (Table 2). The φ-distributions in clusters 2 and 3 are approximated by single Gaussians, whereas those in clusters 1 and 4 are composed of two components (α and β in Figure 3d). The fine structures occurring only in clusters 1 and 4 reflect distortions of the H-bond potentials by the CG-H2 groups near the clusters. (ii) Aspartate. The probability distributions are very similar to those of glutamate (Figure 3e) in the r- and φ-distributions of clusters 1-4 and in the θ-distributions of clusters 3 and 4 (Figure 3f-h and Table 2). The θ-distributions around the OD1 atom have additional components out of the lone-pair plane (η1, η2, and ζ2 in Figure 3g).


TABLE 3: Statistics of the Data Used for Calculating the Distribution Functions and the Parameters for the Gaussian Fitting of the Probability Distributions of the Water Molecules in 17 984 Crystal Structures Solved Using Data Collected below 150 K at Resolutions Better than 2.2 Å

Gaussian fitting parameters for the probability distributions^b: r-distribution (A_r; r0 and σr in Å), θ-distribution (A_θ; θ0, θideal^c, and σθ in degrees), φ-distribution (A_φ; φ0, φideal^c, and σφ in degrees).

amino acid | cluster^a | no. of water molecules (average B, Å²) | r comp.^e | A_r | r0 | σr | θ comp.^f | A_θ | θ0 | θideal | σθ | φ comp.^d | A_φ | φ0 | φideal | σφ
Gln | 1 | 51 876 (27.99) |  | 2872 | 2.72 | 0.109 |  | 1499 | 91.8 | 90.0 | 23.81 | α | 3716 | -36.7 | -60.0 | 9.76
| | | | | | | | | | | | β | 3211 | -57.4 | -60.0 | 8.12
| 2 | 51 818 (28.78) |  | 2752 | 2.72 | 0.109 |  | 1542 | 88.2 | 90.0 | 23.21 |  | 5546 | 49.6 | 60.0 | 11.00
| 3 | 36 787 (30.45) | ε | 1261 | 2.95 | 0.156 | ξ | 1280 | 90.5 | 90.0 | 10.67 |  | 4590 | -54.1 | -60.0 | 10.60
| | | µ | 640 | 2.70 | 0.086 | η | 499 | 89.7 | 90.0 | 31.80
| 4 | 37 759 (29.43) | ε | 1064 | 2.92 | 0.143 | ξ | 1129 | 88.9 | 90.0 | 11.12 | α | 1764 | 42.6 | 60.0 | 18.18
| | | µ | 945 | 2.71 | 0.105 | η | 577 | 89.9 | 90.0 | 31.20 | β | 3805 | 60.2 | 60.0 | 6.79
Asn | 1 | 45 557 (28.07) |  | 2308 | 2.74 | 0.115 |  | 1328 | 87.6 | 90.0 | 23.22 | α | 3175 | -40.2 | -60.0 | 16.42
| | | | | | | | | | | | β | 2278 | -60.2 | -60.0 | 6.99
| 2 | 60 796 (28.20) |  | 3236 | 2.73 | 0.112 | ξ | 1610 | 90.1 | 90.0 | 15.24 |  | 6503 | 49.4 | 60.0 | 12.05
| | | | | | | η | 859 | 130.7 |  | 14.43
| | | | | | | ζ | 554 | 49.1 |  | 17.67
| 3 | 48 331 (29.84) | ε | 2002 | 2.96 | 0.126 | ξ | 1936 | 89.7 | 90.0 | 10.50 |  | 6611 | -54.9 | -60.0 | 9.81
| | | µ | 837 | 2.72 | 0.083 | η | 600 | 90.3 | 90.0 | 31.43
| 4 | 54 127 (29.10) | ε | 2110 | 2.92 | 0.155 | ξ | 2626 | 89.5 | 90.0 | 10.20 | α | 2214 | 47.9 | 60.0 | 13.62
| | | µ | 601 | 2.71 | 0.084 | η | 509 | 92.4 | 90.0 | 32.05 | β | 8247 | 61.3 | 60.0 | 6.03

^a The cluster numbers are defined in Figure 7. ^b The parameters in the Gaussian fitting are described in the Computational Methods section. ^c The ideal values are expected from the geometries of the lone pairs and N-H bonds viewed from the polar coordinates defined in Figure 7. ^d The Gaussian components in the φ-distributions are defined and shown in Figure 7d and h. ^e The Gaussian components in the r-distributions are defined and shown in Figure 7b and f. ^f The Gaussian components in the θ-distributions are defined and shown in Figure 7c and g.

TABLE 4: Statistics of Data Used in Examining the Tetrahedral H-Bond Geometry of Hydration Water Molecules and the Gaussian Fitting Parameters for the Probability Distributions

r-distribution (Gaussian fitting parameters^b):

cluster^a | H-bond partners of water molecules | no. of partners | A_r | r0 (Å) | σr (Å)
1 | polar protein atom (N) | 261 123 | 12 048 | 2.93 | 0.117
| polar protein atom (O) | 565 407 | 29 125 | 2.76 | 0.112
| water oxygen atom | 61 390 | 3035 | 2.76 | 0.109
2 | polar protein atom (N) | 253 469 | 10 074 | 2.95 | 0.129
| polar protein atom (O) | 376 655 | 16 909 | 2.75 | 0.114
| water oxygen atom | 257 796 | 12 763 | 2.76 | 0.115
3 | polar protein atom (N) | 97 687 | 3974 | 2.94 | 0.134
| polar protein atom (O) | 142 486 | 6494 | 2.76 | 0.118
| water oxygen atom | 647 747 | 28 808 | 2.76 | 0.123
4 | polar protein atom (N) | 98 435 | 4286 | 2.94 | 0.125
| polar protein atom (O) | 138 310 | 6370 | 2.76 | 0.121
| water oxygen atom | 651 175 | 29 185 | 2.76 | 0.122

θ-distribution (Gaussian fitting parameters^b; distributions θtotal, θpwp, θpww, and θwww; components ξ and η^c; θideal^d = 109.5°):

A_θ | θ0 (°) | σθ (°)
183 627 | 108.7 | 16.51
54 323 | 70.7 | 12.60
46 640 | 108.5 | 18.53
22 815 | 66.6 | 12.00
35 564 | 54.2 | 2.63
88 853 | 110.3 | 16.03
26 222 | 74.6 | 11.76
49 358 | 106.0 | 15.35

^a The cluster numbers are defined in Figure 8. ^b The parameters in the Gaussian fitting are described in the Computational Methods section. ^c The Gaussian components in the θ-distributions are defined in Figure 8c. ^d The ideal θ-value for the tetrahedral geometry of H-bonds formed by water molecules.
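The θ-distributions in Table 4 concern angles subtended at a hydration water by pairs of its four H-bond partners. A minimal sketch of how such angles can be collected and sorted into protein-water-protein (pwp), protein-water-water (pww), and water-water-water (www) classes is shown below; reading the subscripts of θpwp, θpww, and θwww this way is our interpretation, and the function names are hypothetical. The ideal tetrahedral angle, arccos(-1/3) ≈ 109.47°, corresponds to the 109.5° listed in the table.

```python
import itertools
import math

# Ideal tetrahedral angle, arccos(-1/3) in degrees (about 109.47).
IDEAL_TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))


def bond_angle(a, center, b):
    """Angle a-center-b in degrees."""
    u = [p - c for p, c in zip(a, center)]
    v = [p - c for p, c in zip(b, center)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def classify_partner_angles(water, partners):
    """Group the partner-water-partner angles of one hydration water.

    partners: list of (kind, xyz) with kind "p" for a polar protein atom and
    "w" for a water oxygen. Angles are keyed "pwp", "pww", or "www" according
    to the number of protein atoms in each pair (2, 1, or 0)."""
    groups = {"pwp": [], "pww": [], "www": []}
    for (kind1, a), (kind2, b) in itertools.combinations(partners, 2):
        n_protein = (kind1 == "p") + (kind2 == "p")
        key = {2: "pwp", 1: "pww", 0: "www"}[n_protein]
        groups[key].append(bond_angle(a, water, b))
    return groups
```

For a water bound to two protein atoms and two waters at the corners of a perfect tetrahedron, the six angles split into one pwp, four pww, and one www angle, each equal to `IDEAL_TETRAHEDRAL`; the spread of real angles around that value is what the θ-distributions summarize.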

3.4. Distributions around Side Chains of Arginine, Lysine, Histidine, and Tryptophan.

The N-H and N-H2 groups in arginine and the N-H3 group in lysine act as proton donors in H-bonds with water molecules under neutral pH conditions. The imidazole group of histidine switches between the neutral and positively charged states around neutral pH. (i) Arginine. The guanidinium group, stabilized by electron resonance among the NH1, NH2, and CZ atoms, remains protonated under most circumstances. Hydration water molecules around the group are condensed in five clusters (Figure 4a). The centers of the clusters are located in line with the N-H bonds of the sp2-hybridized NE, NH1, and NH2 atoms. The r-distributions of the clusters show single peaks at 2.85-2.91 Å (Figure 4b) and are broader than that of cluster 1 around the peptide bond (Table 2). The θ-distributions of the clusters are similar to each other (Figure 4c). The φ-distribution of the isolated cluster 5 is as sharp as that around the N-H group of the peptide bond, whereas clusters 1-4 show φ-distributions with σ-values about twice that of cluster 5. (ii) Lysine. The reactive amino group is ordinarily protonated under neutral pH conditions. The distribution of hydration water molecules is condensed in three clusters (Figure 4e). The clusters are arranged with approximate 3-fold rotational symmetry in φ (Figure 4h and Table 2), and the peaks are in line with the NZ-H bonds expected from the conformation of the NZ-H3 group relative to the CD-CE bond (Figure 4h). The broad φ-distribution of each cluster may arise from bifurcated H-bonds such as those found in the crystal structures of amino acids.51 The significant tailing beyond 3 Å in the r-distribution (Figure 4f) suggests that the NZ-H3 group is more attractive than the N-H groups of the peptide bond and arginine and/or reflects the positional fluctuations of the water molecules hydrating the tip of the flexible side chain. The major peaks in the θ-distributions have widths similar to those in arginine (Figure 4g and Table 2). (iii) Histidine. The basic site NE2 of the imidazole group accepts a proton to form the conjugate acid. Because the protonation states of the ND1 and NE2 atoms are unknown in most X-ray analyses, the distributions described here are ensembles of the two states. Near neutral pH, a nuclear magnetic resonance titration is necessary to elucidate the protonation states of imidazole groups, as demonstrated in our previous study.52 The distributions of hydration water molecules are condensed in the directions of the ND1-H and NE2-H bonds (Figure 5a and Table 2). The r-distributions, with peaks at 2.77-2.79 Å, resemble those of the NZ-H3 group in lysine (Figure 5b). The φ- and θ-distributions, each approximated by a single Gaussian, are similar to those of cluster 1 around the peptide bond (Figure 5c and d). (iv) Tryptophan. The NE1 atom of the indole ring provides one hydration site (Figure 5e). The r-distribution has a peak at 2.92 Å (Figure 5f), and its σ-value lies between those of lysine and histidine (Table 2). The φ-distribution resembles that of cluster 1 in histidine, whereas the θ-distribution is similar to that of cluster 2 in histidine (Figure 5g and h and Table 2).

Hydration Structure around Polar Protein Atoms


Figure 3. Probability distributions around the carboxyl groups of glutamate (panels a-d) and aspartate residues (e-h). The fishnets visualizing the spatial distributions in panels a and e are contoured at the 126/voxel and 133/voxel levels, respectively. The distributions with respect to r (panels b and f), θ (c and g), and φ (d and h) are approximated by one or more Gaussians (Table 2). The other details are as in Figure 2.
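The contour levels quoted in the figure captions (for example, 126/voxel) appear to be counts of water molecules per voxel of a three-dimensional grid built around the polar group. A minimal sketch of that accumulation step follows. It assumes the water positions have already been transformed into the local frame and assumes a cubic voxel edge of 0.2 Å; the grid spacing actually used for the maps is not stated in this section, and all helper names are hypothetical.

```python
import math
from collections import Counter


def voxel_counts(points, voxel=0.2):
    """Count local-frame water positions (Å) per cubic voxel of edge `voxel`.

    Keys are integer (i, j, k) voxel indices; 0.2 Å is an assumed spacing.
    """
    counts = Counter()
    for x, y, z in points:
        key = (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        counts[key] += 1
    return counts


def above_contour(counts, level):
    """Keep voxels whose count reaches a contour level such as 126 per voxel."""
    return {key: n for key, n in counts.items() if n >= level}


def voxel_center(key, voxel=0.2):
    """Center of voxel `key` in Å, e.g. for plotting or peak picking."""
    return tuple((i + 0.5) * voxel for i in key)
```

Using `math.floor` rather than `int` keeps voxels on both sides of zero the same size, since `int` truncates toward zero and would merge the bins at -0.1 Å and +0.1 Å.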

3.5. Distribution around Hydroxyl Groups of Serine, Threonine, and Tyrosine. (i) Serine. The hydroxyl group is known to be only very weakly acidic. Water molecules are distributed in a semicircle around the OG atom (Figure 6a). The r-distribution (Figure 6b) displays a profile similar to those around the oxygen atoms of the peptide bond, glutamate, and aspartate (Table 2). The θ-distribution is well approximated by a single Gaussian whose σ-value is closer to those of arginine, lysine, histidine, and tryptophane than to those of glutamate and aspartate (Table 2).
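The r- and θ-profiles are summarized by Gaussian peak positions and widths (σ-values). The fitting procedure itself is not detailed in this section, so the sketch below uses a swapped-in, lightweight technique: Caruana's log-parabola method, which fits ln y with a quadratic by ordinary least squares and reads off the amplitude, center, and σ of a single Gaussian peak from histogram bins. It is not necessarily the authors' method. It handles only one peak and gives noisy tail bins a large weight, so multicomponent profiles such as the 10-Gaussian φ-distribution of serine would instead need nonlinear least-squares fitting.

```python
import math


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m @ u = v by Cramer's rule."""
    d = _det3(m)
    sol = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = v[i]
        sol.append(_det3(mj) / d)
    return sol


def fit_gaussian(xs, ys):
    """Estimate (amplitude, center, sigma) of a single Gaussian peak.

    Caruana's method: fit ln y = a + b*u + c*u**2 with u = x - x0 by
    ordinary least squares; bins with y <= 0 are skipped.
    """
    pts = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three populated bins")
    x0 = sum(x for x, _ in pts) / len(pts)  # centering improves conditioning
    us = [(x - x0, ly) for x, ly in pts]
    s = [sum(u ** k for u, _ in us) for k in range(5)]
    t = [sum((u ** k) * ly for u, ly in us) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    a, b, c = _solve3(normal, t)
    if c >= 0:
        raise ValueError("profile is not peaked; no Gaussian fit possible")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    center = x0 - b / (2.0 * c)
    amplitude = math.exp(a - b * b / (4.0 * c))
    return amplitude, center, sigma
```

The parameters follow from ln y = ln A - (x - μ)²/(2σ²): the quadratic coefficient c equals -1/(2σ²), and the center and amplitude come from completing the square.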

The φ-distribution is approximated by 10 Gaussians, of which clusters 1-3 are the major components, probably because of the sp3 hybridization of the OG atom and the conformation of the OG-HG bond relative to the CA-CB bond (Figure 6d). The φ-peaks of clusters 1 and 3 are shifted toward cluster 2 from the ideal φ-angles defined by the hybridization of the OG atom (Table 2). The main-chain atoms probably restrict rotation of the OG-HG bond, hinder hydration water molecules from occupying the ideal positions, and distort the hydration potential.
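Because φ is periodic, cluster centers near 0°/360° and shifts from ideal φ-angles are best computed with wrap-aware arithmetic. The sketch below estimates a cluster's mean direction with the circular mean, a circular spread comparable to σφ for narrow clusters, and the signed shift of a center from a reference angle. The function names are hypothetical, and any reference angles passed to them are placeholders rather than the ideal values behind Table 2.

```python
import math


def circular_mean_deg(angles):
    """Mean direction (degrees, in [0, 360)) of periodic angles such as phi."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    if math.hypot(s, c) < 1e-12:
        raise ValueError("angles are uniformly spread; mean direction undefined")
    return math.degrees(math.atan2(s, c)) % 360.0


def circular_spread_deg(angles):
    """Circular standard deviation sqrt(-2 ln R) in degrees (R = mean resultant length)."""
    n = len(angles)
    s = sum(math.sin(math.radians(a)) for a in angles) / n
    c = sum(math.cos(math.radians(a)) for a in angles) / n
    r = min(1.0, math.hypot(s, c))  # clip rounding above 1
    if r < 1e-12:
        return math.inf
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


def signed_shift_deg(observed, ideal):
    """Smallest signed difference observed - ideal, in (-180, 180]."""
    d = (observed - ideal) % 360.0
    return d - 360.0 if d > 180.0 else d
```

For example, the circular mean of 350° and 10° is 0°, whereas a naive arithmetic mean would give 180°, on the opposite side of the OG atom.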


(ii) Threonine. The r- and θ-distributions are quite similar to those of serine (Figure 6e-g and Table 2). The φ-distribution is approximated by nine Gaussians. Subcomponent α near cluster 2 is larger than the corresponding component in serine, and the number of water molecules in cluster 3 decreases in threonine (Figure 6h), probably because of the perturbation from the CG2-H3 group. (iii) Tyrosine. Two clusters appear around the hydroxyl group. They appear to be related by 2-fold rotational symmetry about the CZ-OH bond (Figure 6i), and their centers lie in the plane of the phenyl ring (Table 2). The profiles of the r-distributions in the two clusters are similar to those in serine, threonine, glutamate, and aspartate (Figure 6j and Table 2). The sharp φ-distributions (Figure 6l and Table 2) are ascribed to H-bonds in line with the OH-HH bond. A number of water molecules are located out of the plane of the phenyl ring (Figure 6k). The σ-values of the θ-distributions are about twice that of the major peak in the φ-distribution of threonine or serine. Van der Waals contacts of water molecules with the CE1-H or CE2-H atoms may be responsible for this broad angular distribution.

Figure 4. Probability distributions around the guanidyl group of arginine (panels a-d) and the N-H3 group of lysine (e-h). The density maps in panels a and e are contoured at the 58/voxel and 64/voxel levels, respectively. The distributions with respect to r (panels b and f), θ (c and g), and φ (d and h) are approximated by one or more Gaussians (Table 2). For the overlapping densities of the closely contacting cluster pairs 1-2 (and 3-4) around the guanidyl group, we simply separated the r-, θ-, and φ-distributions at the midplane between the clusters for the Gaussian fitting. The other details are as in Figure 2.

3.6. Distribution around Amide Groups in Glutamine and Asparagine. (i) Glutamine. The amide group is not acidic but is polar and able to participate in H-bonds. Two clusters of water molecules lie in the directions of the lone-pair electrons of the OE1 atom (clusters 1 and 2), and two lie in line with the N-H bonds of the NE2-H2 group (clusters 3 and 4). The profiles of the r-, θ-, and φ-distributions in clusters 1 and 2 resemble those around glutamate and aspartate (Figure 7a and Table 3), except for the long tail in the r-distribution of cluster 2, which arises from the confusion of side-chain conformations described below. Clusters 3 and 4 show r-distributions approximated by two Gaussians (Figure 7b and Table 3). This is possibly ascribed to confusion of the OE1 and NE2 atoms during model building, with the ε- and µ-components coming from water molecules hydrating oxygen and nitrogen atoms, respectively. This confusion could also be the major cause of the θ-distributions being approximated by two Gaussians (ξ and η in Figure 7c and Table 3). The ξ-component has a σ-value similar
to those in the other N-H groups, and the σ-value of the η-component is comparable to those of the oxygen atoms in glutamate. The φ-distributions of clusters 1 and 4 are composed of two peaks, as in cluster 1 of glutamate and aspartate and cluster 2 of the carbonyl oxygen in the peptide bond (Figure 7d and Table 3). (ii) Asparagine. The four clusters have shapes and sizes similar to those around glutamine (Figure 7e and Table 3). The r- and θ-distributions around the ND2 atom also suggest the same modeling confusion as for glutamine (Figure 7f and g). The φ-distributions of all sites are similar to those in glutamine, but cluster 2 has its θ-distribution approximated by three Gaussians centered at 60, 90, and 120° (Figure 7h). This fine structure resembles those of cluster 3 of the peptide bond and cluster 2 of aspartate.

Figure 5. Probability distributions around the N-H groups of histidine (panels a-d) and tryptophane (e-h). The density maps in panels a and e are contoured at the 44/voxel and 23/voxel levels, respectively. The distributions with respect to r (panels b and f), θ (c and g), and φ (d and h) are approximated by one or more Gaussians (Table 2). The other details are as in Figure 2.

3.7. Probability Distribution of H-Bond Partners around Hydration Water Molecules. A water molecule is generally considered to have four potential H-bonding sites placed approximately tetrahedrally around the oxygen atom.42,43 This geometry is the structural basis for three-dimensional H-bond networks.16,43 We selected hydration water molecules having four H-bond partners and compared the positions of the partners with the standard tetrahedral geometry to examine statistically whether that geometry is retained in the vicinity of protein surfaces (Figure 8a and Table 4). The probability distribution is quite similar to the spatial distribution function of the first-neighbor environment in liquid water visualized by neutron scattering experiments.53 The r-distributions for polar protein atom-water molecule pairs appear as the sum of those shown in Figures 2-7 (Figure 8b). The large tails are probably due to the inclusion of all four H-bond partners in the calculation, because three of the four are not the closest neighbors of the targeted water molecule. The r-distributions for water-water pairs have peaks at 2.76 Å. The angular distributions are evaluated separately for the three combinations of H-bond partners defined in Figure 8c. Each profile has a major component centered at 106-110°, as expected from the standard geometry (Table 4). The σ-values of 15-19° probably come from positional errors in the structural analyses and intrinsic positional fluctuations. While the θwww-distribution is almost approximated by a single Gaussian (Figure 8c), the θpww- and θpwp-distributions have two major Gaussian components (ξ and η). The profile of the ξ-component is similar to that of the θwww-distribution, and the η-component is composed of water molecules in the H-bond geometry illustrated in Figure 8d or in the trigonal arrangement.43 The ζpwp-component in the θpwp-distribution includes, for instance, water molecules located at the midpoints of the NH1 and NH2 atoms of arginine (Figure 8d). Thus, except for the subcomponents in the θpwp- and θpww-distributions, the present results confirm that water molecules engaged in protein hydration retain the tetrahedral H-bond geometry, as speculated in previous studies.28,32

4. Discussion

The probability distributions of hydration water molecules around polar protein atoms were obtained through a systematic analysis of a huge number of water molecules in 17 984 high-resolution crystal structures solved by using diffraction data
collected at cryogenic temperatures. Here, we discuss the common characteristics found in the distributions, their relevance to the structures and functions of proteins, and the possible use of the distributions for studying the hydration structures of proteins.

Figure 6. Probability distributions around the OH groups of serine (panels a-d), threonine (e-h), and tyrosine (i-l). The density maps in panels a, e, and i are contoured at the 98/voxel, 120/voxel, and 81/voxel levels, respectively. The distributions with respect to r (panels b, f, and j), θ (c, g, and k), and φ (d, h, and l) are approximated by one or more Gaussians (Table 2). The other details are as in Figure 2.

4.1. Common Characteristics in the Probability Distributions. Figure 9a shows a correlation between the peak positions in the r-distributions and the widths of the angular distributions (Table 2). The angular distributions are apparently narrower around N-H and O-H groups acting as proton donors than around C=O groups acting as acceptors. Water oxygen atoms tend to be distributed along the directions of the N-H (or O-H) bonds donating protons, because the donated protons are located in a limited angular range around the bond axes (Figure 9a). Around the C=O groups, the position of the proton donated by the water molecule is likely more important than the orientation of the O-H bond of the water molecule relative to the C=O bond axis (Figure 9a). The peak positions in the r-distributions around C=O and O-H groups fall in a narrow range between 2.65 and 2.75 Å (Figure 9b) and are independent of the hybridization states of the oxygen atoms (Table 1). Thus, the C=O and O-H groups contribute to close contacts of water molecules at the protein surface. In contrast, NH, NH2, and NH3 groups are
advantageous for attracting water molecules, as seen in the tails of their r-distributions, which extend to longer distances than those of the C=O and O-H groups. This may reflect differences in the potential minima and profiles of the O-H···O and N-H···O bonds (Figure 9b). Thus, the N-H groups may control the positions of water oxygen atoms within a narrow angular range around the bond axis. On the other hand, the C=O groups seem to induce closer contacts of water molecules than the N-H groups but exert less control over the angular distribution of the water oxygen atoms. These tendencies may be limited to the H-bonds between polar protein atoms and hydration water molecules within the scope of the present study. However, if they also hold in other molecular interactions, C=O and N-H groups will play distinct roles in molecular recognition. For instance, in a complex of an enzyme and a DNA fragment, the NH groups of the DNA bases would be advantageous for controlling the directions of H-bonds with the C=O groups in the enzyme and/or with hydration water molecules mediating the H-bonds.

4.2. Fine Structures in the Angular Distributions. Basically, hydration sites are expected to form along the directions of the lone-pair electrons of oxygen atoms and of the N-H or O-H bonds. Some exceptions are found in the present analysis as fine structures in the angular distributions around C=O groups in the peptide bond (Figure 2), aspartate (Figure 3), and asparagine (Figure 7). A possible reason for the fine structures is the van der Waals interactions among a water molecule, the C=O group, and neighboring main- and/or side-chain groups. Because the H-bonds between a C=O group and a water molecule are robust with regard to the angular distribution of water molecules, as described above, water molecules are able to accommodate hydration potentials deformed by environmental perturbations. The fine structures in the H-bonds may couple with the conformational variations of the side chains with C=O groups
on the protein surface. For instance, the θ-distributions around the OD1 atoms of aspartate and asparagine split into three components. As reported previously,54 these side chains sometimes preferentially form H-bonds with main-chain atoms so that they cap the N-terminal ends of α-helices. In this conformation, the OD1 atoms are known to form H-bonds with water molecules out of the plane of the side chain. Such side-chain conformations may be collectively induced and stabilized by water molecules capable of occupying the deformed hydration sites.

Figure 7. Probability distributions around the amide groups of glutamine (panels a-d) and asparagine (e-h). The density maps in panels a and e are contoured at the 56/voxel and 66/voxel levels, respectively. The distributions with respect to r (panels b and f), θ (c and g), and φ (d and h) are approximated by one or more Gaussians (Table 3). The r-distributions around the N-H2 groups are approximated by two Gaussians designated ε and µ. The other details are as in Figure 2.

4.3. Probability Distributions at Ambient Temperatures. The present results were obtained from crystal structures determined below 150 K. To understand the local hydration structures at ambient temperatures, the same protocol was applied to the 3558 crystal structures solved above 273 K (Figures S10-S14 and Tables S10 and S11, Supporting Information), although these contain fewer hydration water molecules than the cryogenic data. Figure 9c compiles some points of the comparison. In the r-distributions, the peak positions are independent of temperature, but the profiles at ambient temperatures tend to be broader, as indicated by σ-values about 1.1-1.3 times larger than those below 150 K. The peak positions in the angular distributions are almost independent of temperature. The σθ-values at ambient temperatures are slightly larger than those below 150 K, and the σφ-values are almost independent of temperature except for clusters 1-3 of lysine and cluster 3 of threonine. These results suggest that the H-bond potentials are sensitive to temperature with regard to the distances between the polar protein atoms and water molecules rather than to the angular distributions.
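One simple way to read the 1.1-1.3-fold increase in the σ-values of the r-distributions is to assume that the extra broadening at ambient temperature is an independent Gaussian contribution that adds in quadrature, σ_amb² = σ_cryo² + σ_extra². This model is an assumption made here for illustration, not an analysis performed in the paper. Under it, a ratio k = σ_amb/σ_cryo implies σ_extra = σ_cryo √(k² − 1); the function name below is hypothetical.

```python
import math


def extra_broadening(sigma_cryo, ratio):
    """Width of an added, independent Gaussian broadening (same units as sigma_cryo).

    Assumes sigma_amb = ratio * sigma_cryo and
    sigma_amb**2 = sigma_cryo**2 + sigma_extra**2 (quadrature addition).
    """
    if ratio < 1.0:
        raise ValueError("ratio < 1 means narrowing, which this model cannot describe")
    return sigma_cryo * math.sqrt(ratio ** 2 - 1.0)
```

For k = 1.1 and 1.3 this gives σ_extra ≈ 0.46 σ_cryo and 0.83 σ_cryo, respectively, so even a modest-looking ratio corresponds to an added spread of roughly half the cryogenic width or more.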

Figure 8. Probability distributions of the four H-bond partners around a hydration water molecule. (a) Stereoplot illustrating the spatial distributions of polar protein atoms (blue fishnets) and water molecules (pink), each contoured at the 915/voxel level. (b) The r-distributions of protein oxygen atoms (open circles for the observed frequencies and the red line for the Gaussian approximation of the main peak), nitrogen atoms (triangles for the observed frequencies and the blue line for the Gaussian approximation), and water molecules among the H-bond partners (dots for the observed frequencies and the green line for the Gaussian approximation). (c) The θ-angles are defined separately for the combinations [polar atom]-[water]-[polar atom] (abbreviated as pwp), [polar atom]-[water]-[water] (pww), and [water]-[water]-[water] (www). In each combination, the observed profile (open circles) is approximated by Gaussians. The red lines are the sums of all Gaussians (green and blue lines). The parameters of the major Gaussians (green; ξ, η, and ζ in the θ-distributions) are compiled in Table 4. (d) Arrangements of water molecules and polar protein atoms composing the η- and ζ-components in the θ-distribution.
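The θpwp, θpww, and θwww angles in Figure 8c are angles at the central water oxygen between pairs of its H-bond partners. The sketch below selects partners with the 2.4-3.4 Å heavy-atom distance window used for H-bond assignment in this analysis, computes every partner-water-partner angle, and labels each angle by partner type so it can be compared with the ideal tetrahedral angle of about 109.47°. The function names and the input format (a kind flag 'p' or 'w' plus coordinates) are hypothetical, and the actual partner selection in the paper may apply further criteria.

```python
import math
from itertools import combinations

TETRAHEDRAL_DEG = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees


def _dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def angle_at(center, a, b):
    """Angle a-center-b in degrees."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    cos_ab = sum(p * q for p, q in zip(u, v)) / (_dist(a, center) * _dist(b, center))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_ab))))


def hbond_partners(water, atoms, r_min=2.4, r_max=3.4):
    """Keep (kind, xyz) atoms whose distance to the water oxygen lies in [r_min, r_max] Å."""
    return [(kind, xyz) for kind, xyz in atoms if r_min <= _dist(water, xyz) <= r_max]


def partner_angles(water, partners):
    """Label and measure every partner-water-partner angle.

    kind is 'p' for a polar protein atom and 'w' for a water oxygen;
    labels follow Figure 8c: 'pwp', 'pww', or 'www'.
    """
    labels = {"pp": "pwp", "pw": "pww", "ww": "www"}
    out = []
    for (k1, x1), (k2, x2) in combinations(partners, 2):
        key = "".join(sorted(k1 + k2))
        out.append((labels[key], angle_at(water, x1, x2)))
    return out
```

With four partners, each water contributes six angles; for two protein and two water partners these split into one pwp, four pww, and one www angle.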

The hydration sites observed at cryogenic temperatures are roughly consistent with the probability distributions obtained from simulations of an explicit water system at 300 K.29 Thus, when the temperature dependence of the distributions is taken into account, the probability distributions at cryogenic temperatures are useful for considering the hydration structures of proteins in solution.

Figure 9. (a) The left panel shows the correlation between the peak positions of the major Gaussian components in the r-distributions and the σ-values of the major components in the θ-distributions (Table 2). The red, blue, and green symbols indicate the values in the distributions around C=O, O-H, and N-Hn (n = 1-3) groups, respectively. The right panel schematically illustrates why the correlation in the left panel is observed. (b) The left panel shows the correlation between the peak positions and the σ-values of the major Gaussian components (Table 2). The values are plotted in the same way as in panel a. The right panel schematically illustrates the potential curves causing the correlation in the left panel. (c) The parameters of the Gaussian approximations for the major peaks in the probability distributions at ambient temperature (Table S11, Supporting Information) are plotted against those at cryogenic temperature (Table 2). The plotted parameters are the peak positions and σ-values in the r-distributions, the σ-values in the φ-distributions, and the σ-values in the θ-distributions. The symbols are as in panel a.

For a more rigorous analysis of the temperature-dependent variations of the probability distributions, the three-dimensional profiles of the effective H-bond potentials could be obtained, for instance, from the following equations:

$$\rho(\mathbf{r}) = A \exp\left[-\frac{W(\mathbf{r})}{k_{\mathrm{B}} T}\right]$$

$$W(\mathbf{r}) = \int \rho(\mathbf{r}')\, V(\mathbf{r}, \mathbf{r}')\, d^{3}r'$$

where ρ(r) is the probability distribution, A is a constant, kB is Boltzmann's constant, T is the absolute temperature, and V(r, r′) is the H-bond potential. However, to apply these equations, the coordinate errors in the crystal structures41 must be taken into account. Coordinate error is one of the causes of increased σ-values in the probability distributions. Indeed, the σ-values tend to be smaller when the analysis is limited to structure models solved at higher resolutions (Figures S10-S14 and Table S9, Supporting Information). In addition to X-ray data, geometric data including hydrogen atoms from neutron crystallography may aid a more detailed description of H-bond potentials in hydration structures, as in the database analyses16,55 of crystal structures of small organic molecules or in the crystal structure analysis of the hydration of a vitamin B12 molecule.56

4.4. Tetrahedral Interaction Geometry of Water H-Bonds. Many pairs of polar protein atoms are distributed so as to satisfy the tetrahedral H-bond geometry around a water molecule on the protein surface, as demonstrated by the ξpwp-component in the θpwp-distribution (Figure 8c). This result suggests that a hydration water molecule may tune the positions of two polar atoms among its H-bond partners even when the residues containing those atoms are far apart in the primary sequence. Thus, in the folding process of proteins, water molecules hydrating polar atoms may act as anchor points that stabilize the surface structures of proteins and assist the local folding of surface residues, in addition to the possible role of water molecules in inducing the aggregation of hydrophobic cores in soluble proteins.
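The Boltzmann-type relation between the probability distribution and the effective potential in Section 4.3 can be inverted to give the effective potential up to an additive constant, W(r) − W_min = −k_B T ln[ρ(r)/ρ_max], because the unknown constant A cancels when every bin is referenced to the most probable one. The sketch below applies this to a binned distribution. It does not attempt the harder step of recovering V(r, r′) from the integral relation, and, as noted in Section 4.3, coordinate errors (and, for flash-cooled crystals, the choice of an effective temperature) would have to be addressed before the numbers could be interpreted physically. The units (kcal/mol) and names are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant per mole (gas constant), kcal mol^-1 K^-1


def relative_effective_potential(density, temperature):
    """W(bin) - W_min = -k_B T ln(rho(bin) / rho_max), in kcal/mol.

    `density` maps bin keys to counts or probabilities; the unknown constant A
    cancels by referencing to the most populated bin. Empty bins map to +inf.
    """
    rho_max = max(density.values())
    if rho_max <= 0:
        raise ValueError("density has no populated bins")
    kt = K_B * temperature
    return {key: (-kt * math.log(rho / rho_max) if rho > 0 else math.inf)
            for key, rho in density.items()}
```

A bin that is e times less populated than the peak lies k_B T above the minimum, which is about 0.6 kcal/mol at 300 K.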
A method to evaluate the stability of globular proteins rationally was proposed by focusing on H-bonds between polar protein atoms and hydration water molecules.57 The present distribution functions for H-bonds formed by water molecules at the protein surface may provide useful data for discussing, from the point of view of H-bond interactions, why proteins are stabilized in an aqueous environment.

4.5. Utilization of the Probability Distribution. Hydration water molecules are identified from discrete, spherical densities in crystal structure analyses. Errors in the diffraction intensity data and the estimated phase sets sometimes produce ghost density peaks resembling those of water molecules with large positional fluctuations. The probability distributions may help avoid misassigning such ghost peaks as water molecules: the positions of density peaks can be mapped onto the distributions obtained in the present study, analogous to the validation of protein conformations,58 although this check is limited to density peaks around polar atoms. In addition, the peak positions and widths in the distributions may be incorporated at the structure refinement stage as restraints for hydration water molecules, with appropriate weighting factors, like those used for protein conformations.59 The distributions could also be used in knowledge-based prediction of the most likely distributions of hydration water molecules around proteins, as in the program AQUARIUS2.60 The development of a program suite with a novel algorithm is under way.

Conclusions

To elucidate general characteristics of the interactions of water molecules in protein hydration, the probability distributions of hydration water molecules were calculated from 4 831 570 hydration water molecules around 4 214 227 polar protein atoms in 17 984 high-resolution protein structures solved using diffraction data collected below 150 K. The large number of water molecules enabled us to evaluate quantitatively the distributions around the polar atoms of the main chains and of the side chains of 11 types of hydrophilic amino acid residues. The probability distributions are nonrandom and condensed into clusters located in line with the N-H and O-H bonds and the lone pairs of the C=O groups. Quantitative analyses of the profiles of the probability distributions by the Gaussian fitting method demonstrated substantial differences in the hydrogen-bond geometries around the NHn (n = 1, 3), OH, and CO groups. The analysis of the hydrogen-bond geometry of water molecules demonstrated that most polar protein atoms are generally arranged to satisfy the tetrahedral hydrogen-bond geometry of water molecules, as water molecules do in liquid water, suggesting a geometrical role for water molecules in the folding process and structural stability of proteins. The probability distributions and their parameters are useful for better understanding the interactions between water molecules and proteins. Because a water molecule can be thought of as a nanoscopic probe of the capacity for H-bond formation on a molecular surface, the probability distributions obtained here may be applicable to investigating the surface properties of molecules with C=O, O-H, and N-Hn groups as well as proteins.

Acknowledgment. We are grateful to Professor Haruki Nakamura of the Protein Research Institute of Osaka University for his kind advice on obtaining the structurally nonhomologous protein structures and to Dr. Yasumasa Joti of the University of Tokyo for his fruitful comments on the manuscript.
This work was supported by Grants-in-Aid from MEXT, Japan (grants 15076210 and 20050030) and JSPS, Japan (grant 1920402) to M.N.

Supporting Information Available: The frequencies of thermal factors of the polar protein atoms and hydration water molecules used in the analysis are plotted in Figure S1. Figures S2-S8 show the probability distributions obtained for 1265 structurally nonhomologous proteins, and the fitting parameters are listed in Tables S3 and S4. Figure S9 demonstrates the conservation of hydration sites among 42 crystal structures of T4 lysozyme. Figures S10-S14 depict the profiles of the probability distributions calculated under various conditions for all main chains and the side chains of aspartate, lysine, and tyrosine residues. The PDB accession codes of the crystal structures used are listed in Tables S1, S2, S5, and S10. The Gaussian fitting parameters for Figures S10-S14 are compiled in Tables S6-S9 and S11. This material is available free of charge via the Internet at http://pubs.acs.org.

References and Notes

(1) Finney, J. L. Philos. Trans. R. Soc. London, Ser. B 1977, 278, 3–32.
(2) Edsall, J. T.; MacKenzie, H. A. Water and Proteins. II. The Location and Dynamics of Water in Protein Systems and Its Relation to Their Stability and Properties. In Advances in Biophysics; Kotani, M., Noda, H., Eds.; Japan Scientific Society Press: Tokyo, 1983; Vol. 16, pp 53–182.
(3) Baker, E. N.; Hubbard, R. E. Hydrogen Bonding in Globular Proteins. In Prog. Biophys. Mol. Biol.; Noble, D., Blundell, T. L., Eds.; Pergamon Press: Oxford, U.K., 1984; Vol. 44, pp 97–179.
(4) Chaplin, M. Nat. Rev. Mol. Cell Biol. 2006, 7, 861–866.
(5) Rupley, J. A.; Careri, G. Protein hydration and function. In Advances in Protein Chemistry; Anfinsen, C. B., Edsall, J. T., Richards, F. M., Eisenberg, D. S., Eds.; Academic Press: London, 1991; Vol. 41, pp 37–172.
(6) Ferrand, M.; Dianoux, A. J.; Petry, W.; Zaccai, G. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 9668–9672.
(7) Chen, S.-H.; Liu, L.; Fratini, E.; Baglioni, P.; Faraone, A.; Mamontov, E. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 9012–9016.
(8) Graber, M.; Bousquet-Dobouch, M.-P.; Lamare, S.; Legoy, M.-D. Biochim. Biophys. Acta 2003, 1648, 24–32.
(9) Lind, P. A.; Daniel, R. M.; Monk, C.; Dunn, R. V. Biochim. Biophys. Acta 2004, 1702, 103–110.
(10) Watenpaugh, K. D.; Margulis, T. N.; Sieker, L. C.; Jensen, L. H. J. Mol. Biol. 1978, 122, 175–190.
(11) Blake, C. C.; Pulford, W. C.; Artymiuk, P. J. J. Mol. Biol. 1983, 167, 693–723.
(12) Baker, E. N.; Blundell, T. L.; Cutfield, J. F.; Cutfield, S. M.; Dodson, E. J.; Dodson, G. G.; Hodgkin, D. M.; Hubbard, R. E.; Isaacs, N. W.; Reynolds, C. D.; Sakabe, K.; Sakabe, N.; Vijayan, N. M. Philos. Trans. R. Soc. London, Ser. B 1988, 319, 369–456.
(13) Savage, H.; Wlodawer, A. Determination of Water Structure around Biomolecules Using X-Ray and Neutron Diffraction Methods. In Methods in Enzymology; Packer, L., Ed.; Academic Press: London, 1986; Vol. 127, pp 162–183.
(14) Finer-Moore, J. S.; Kossiakoff, A. A.; Hurley, J. H.; Earnest, T.; Stroud, R. M. Proteins: Struct., Funct., Genet. 1992, 12, 203–222.
(15) Niimura, N.; Chatake, T.; Osterman, A.; Kurihara, K.; Tanaka, I. Z. Kristallogr. 2003, 218, 96–107.
(16) Savage, H. Water Structure in Crystalline Solids: Ices to Proteins. In Water Science Review; Franks, F., Ed.; Cambridge University Press: New York, 1986; Vol. 2, pp 67–184.
(17) Morozov, A. V.; Kortemme, T. Potential functions for hydrogen bonds in protein structure prediction and design. In Advances in Protein Chemistry; Baldwin, R. L., Baker, D., Eds.; Academic Press: New York, 2005; Vol. 72, pp 1–38.
(18) Bottoms, C. A.; White, T. A.; Tanner, J. Proteins: Struct., Funct., Bioinf. 2006, 64, 404–421.
(19) Nakasako, M.; Odaka, M.; Yohda, M.; Dohmae, N.; Takio, K.; Kamiya, N.; Endo, I. Biochemistry 1999, 38, 9887–9898.
(20) Rodier, F.; Bahadur, R. P.; Chakrabarti, P.; Janin, J. Proteins: Struct., Funct., Bioinf. 2005, 60, 36–45.
(21) Bhat, T. N.; Bentley, G. A.; Boulot, G.; Greene, M. I.; Tello, D.; Dall’Acqua, W.; Souchon, H.; Schwarz, F. P.; Mariuzza, R. A.; Poljak, R. J. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1089–1093.
(22) Nakasako, M.; Zikihara, K.; Matsuoka, D.; Katsura, H.; Tokutomi, S. J. Mol. Biol. 2008, 381, 718–733.
(23) Nakasako, M.; Oka, T.; Masumo, K.; Takahashi, H.; Shimada, I.; Yamaguchi, Y.; Kato, K.; Arata, Y. J. Mol. Biol. 2005, 351, 627–640.
(24) Nakasako, M.; Fujisawa, T.; Adachi, S.; Kudo, T.; Higuchi, S. Biochemistry 2001, 40, 3069–3079.
(25) Zhang, Z.; Komives, E. A.; Sugio, S.; Blacklow, S. C.; Narayana, N.; Xuong, N. H.; Stock, A. M.; Petsko, G. A.; Ringe, D. Biochemistry 1999, 38, 4389–4397.
(26) Luecke, H.; Schobert, B.; Richter, H. T.; Cartailler, J. P.; Lanyi, J. K. J. Mol. Biol. 1999, 291, 899–911.
(27) Garczarek, F.; Gerwert, K. Nature 2005, 439, 109–112.
(28) Nakasako, M. Philos. Trans. R. Soc. London, Ser. B 2004, 359, 1191–1206.
(29) Higo, J.; Nakasako, M. J. Comput. Chem. 2002, 23, 1323–1336.
(30) Merzel, F.; Smith, J. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 5378–5383.
(31) Teeter, M. M. Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 6014–6018.
(32) Nakasako, M. Cell. Mol. Biol. 2001, 47, 767–790.
(33) Nakasako, M. J. Mol. Biol. 1999, 289, 547–564.
(34) Yokomizo, T.; Higo, J.; Nakasako, M. Chem. Phys. Lett. 2005, 410, 31–35.
(35) Smolin, N.; Oleinikova, A.; Brovchenko, I.; Geiger, A.; Winter, R. J. Phys. Chem. B 2005, 109, 10995–11005.
(36) Garman, E. F.; Schneider, T. R. J. Appl. Crystallogr. 1997, 30, 211–237.
(37) Dunlop, K. V.; Irvin, R. T.; Hazes, B. Acta Crystallogr., Sect. D 2005, 61, 80–87.
(38) Henrick, K.; Feng, Z.; Bluhm, W. F.; Dimitropoulos, D.; Doreleijers, J. F.; Dutta, S.; Flippen-Anderson, J. L.; Ionides, J.; Kamada, C.; Krissinel, E.; Lawson, C. L.; Markley, J. L.; Nakamura, H.; Newman, R.; Shimizu, Y.; Swaminathan, J.; Velankar, S.; Ory, J.; Ulrich, E. L.; Vranken, W.; Westbrook, J.; Yamashita, R.; Yang, H.; Young, J.; Yousufuddin, M.; Berman, H. M. Nucleic Acids Res. 2008, 36, D426–D433.
(39) Thanki, N.; Thornton, J. M.; Goodfellow, J. M. J. Mol. Biol. 1988, 202, 637–657.
(40) Thanki, N.; Umrania, Y.; Thornton, J. M.; Goodfellow, J. M. J. Mol. Biol. 1991, 221, 669–691.
(41) Luzzati, P. V. Acta Crystallogr. 1951, 5, 803–810.
(42) Eisenberg, D.; Kauzmann, W. The Structure and Properties of Water; Oxford at the Clarendon Press: London, 1969.
(43) Finney, J. L. Philos. Trans. R. Soc. London, Ser. B 2004, 359, 1145–1165.
(44) Kabsch, W. A. Acta Crystallogr., Sect. A 1976, 32, 922–923.
(45) Brünger, A. T. X-PLOR Version 3.1: A System for X-ray Crystallography and NMR; Yale University Press: New Haven, CT, 1992.
(46) Hobohm, U.; Sander, C. PDBselect - selection of a representative set of PDB chains. Available at http://bioinfo.tg.fh-giessen.de/pdbselect/recent.pdb_select25 (accessed March 28, 2009).
(47) Hobohm, U.; Scharf, M.; Schneider, R.; Sander, C. Protein Sci. 1992, 1, 409–417.
(48) Hobohm, U.; Sander, C. Protein Sci. 1994, 3, 522.
(49) Taylor, R.; Kennard, O.; Versichel, W. J. Am. Chem. Soc. 1984, 106, 244–248.
(50) Preißner, R.; Egner, U.; Saenger, W. FEBS Lett. 1991, 288, 192–196.
(51) Jeffrey, G. A.; Mitra, J. J. Am. Chem. Soc. 1984, 106, 5546–5553.
(52) Nakasako, M.; Takahashi, H.; Shimba, N.; Shimada, I.; Arata, Y. J. Mol. Biol. 1999, 291, 117–134.
(53) Finney, J. L. Struct. Chem. 2002, 13, 231–246.
(54) Penel, S.; Hughes, E.; Doig, A. J. J. Mol. Biol. 1999, 287, 127–143.
(55) Steiner, T. H.; Saenger, W. Acta Crystallogr., Sect. B 1992, 48, 819–827.
(56) Savage, H. Biophys. J. 1986, 50, 967–980.
(57) Savage, H. J.; Elliot, C. J.; Freeman, C. M.; Finney, J. L. J. Chem. Soc., Faraday Trans. 1993, 89, 2609–2617.
(58) Laskowski, R. A.; MacArthur, M. W.; Moss, D. S.; Thornton, J. M. J. Appl. Crystallogr. 1993, 26, 283–291.
(59) Engh, R. A.; Huber, R. Acta Crystallogr., Sect. A 1991, 47, 392–400.
(60) Pitt, W. R.; Murray-Rust, J.; Goodfellow, J. M. J. Comput. Chem. 1993, 14, 1007–1018.
(61) DeLano, W. L. The PyMOL Molecular Graphics System; DeLano Scientific: Palo Alto, CA, 2002 (available at http://www.pymol.org).

JP902459N