Prediction of Hydration Structures around Hydrophilic Surfaces of

Mar 4, 2010 - Empirical Hydration Distribution Functions from a Database Analysis† .... probability density calculated by summing up the EHDF of pol...
0 downloads 0 Views 10MB Size
4652

J. Phys. Chem. B 2010, 114, 4652–4663

Prediction of Hydration Structures around Hydrophilic Surfaces of Proteins by Using the Empirical Hydration Distribution Functions from a Database Analysis† Daisuke Matsuoka‡ and Masayoshi Nakasako*,‡,§ Department of Physics, Faculty of Science and Technology, Keio UniVersity, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan, and RIKEN Harima Institute, 1-1-1 Kouto, Mikaduki, Sayo, Hyogo, Japan ReceiVed: October 20, 2009; ReVised Manuscript ReceiVed: January 6, 2010

We developed a knowledge-based program to predict hydration structures around hydrophilic surfaces of proteins as probability density. In the program, we assume that the three-dimensional distribution of hydration water molecules on a hydrophilic surface is reconstructed by summing up the empirical hydration distribution function of each solvent-exposed polar atom composing the surface. The probability functions of polar atoms in the CO, NHn (n ) 1,3), and OH groups were calculated from the 17 984 protein structures solved by X-ray crystallography better than resolutions of 2.2 Å (Matsuoka, D.; Nakasako, M. J. Phys. Chem. B 2009, 113, 11274-11292). The program was first tested for human lysozyme. The predicted probability density enveloped more than 85% of crystal water sites found in the crystal structure refined at a resolution of 0.95 Å, and the density peaks suggested as hydration sites were located within 1.5 Å from more than 75% of the crystal water sites. The density reproduced the hydration structure in a solvent accessible narrow channel from the surface to the lysozyme interior. We also tested the feasibility of the program to predict the water clusters existing in the transmembrane channels of bacteriorhodopsin and aquaporin. In bacteriorhodopsin, the distributions were distinct between the ground state and the photoreaction intermediate indispensable for its function. The program reproduced the interfacial hydration in Per-Arnt-Sim-related protein-protein complex and the hydration of metastable conformations in domain motion of glutamate dehydrogenase. Taking the results for the various types of protein hydration, the present program may be a useful tool to characterize the surface properties of proteins and discuss the relevance of hydration structures to the biological functions of proteins. In addition, it will be used to predict hydration structures of proteins available at resolutions insufficient to identify water molecules. 1. Introduction Most biochemical and biophysical processes by proteins occur in an aqueous environment in living cells, and protein-water interactions have been argued to be necessary for the folding and the stabilities of proteins.1-4 It has been proposed that protein hydration correlates with the internal molecular motions of proteins, inducing their biological functions.5,6 Since the discovery of hydration water molecules in the crystal structures of proteins, so-called crystal water molecules,7-9 various experimental studies (such as nuclear magnetic resonance spectroscopy, dielectric measurement, neutron scattering, time-resolved fluorescence measurement, and Fourier-transform infrared spectroscopy) have been carried out to better understand how water molecules interact with proteins.10-14 Crystallographic studies contribute to reveal the molecular details of hydration structures over the surfaces of proteins.15,16 In the past decade, most crystal structures of proteins were determined by cryogenic † Abbreviations: 3D-RISM, three-dimensional reference interaction site model; ACE, acetylcholine esterase; AQP, aquaporin; Arnt, aryl hydrocarbon receptor nuclear translocator; ASA, accessible solvent area; BR, bacteriorhodopsin; EHDF, empirical hydration distribution function; GDH, glutamate dehydrogenase; HEWL, hen-egg-white lysozyme; HIF-2R, isoform of hypoxia-inducible factor alpha; H-bond, hydrogen bond; LOV, light-oxygenvoltage sensing domain; MD, molecular dynamics; PAS, Per-Arnt-Sim; phot2, phototropin 2. * To whom correspondence should be addressed. Phone: +81-45-5661679. Fax: +81-45-566-1672. E-mail: [email protected]. ‡ Keio University. § RIKEN Harima Institute.

X-ray crystallography. Protein structures at cryogenic temperatures and at high resolution accompany a number of hydration watermoleculesmorethantwiceofthoseatambienttemperatures.17,18 The integrated structural data on protein hydration are used to characterize quantitatively protein-water interactions19 and then to predict protein hydration through knowledge-based approaches.20-22 Besides experimental studies, molecular dynamics (MD) simulations have shown the potential to provide information on protein hydration at high temporal and spatial resolution through the improvements of the force field and the executable time scales.23-27 In addition, protein hydration is now in a scope of the statistical mechanical theory for liquids through the progress in the three-dimensional reference interaction site model (3D-RISM)28 derived from the Ornstein-Zernike equation. To date, millions of crystal water molecules identified in cryogenic X-ray crystallography have been accumulated in the Protein Data Bank (PDB),29 and the amount is at a level to bear quantitative and statistical analyses. Indeed, we have proposed the probability distribution of hydration water molecules, named “empirical hydration distribution function (EHDF)”, around polar atoms (oxygen and nitrogen atoms) in main chains and in the side chains of 11 types of hydrophilic amino acid residues (Figure 1a and Table 1).30 In the previous study, we calculated the EHDFs from the 17 984 protein structures solved by using cryogenic X-ray diffraction data at resolutions of better than 2.2 Å. Each EHDF is an ensemble average for the possible

10.1021/jp9100224  2010 American Chemical Society Published on Web 03/04/2010

Prediction of Protein Hydration

Figure 1. (a) EHDF for the NZ atom of the lysine side chain.30 The density was calculated by using 120 938 hydration water molecules selected from 17 984 protein structures (Table 1). (b) A histogram shows the frequencies on the ratio of ASA composed of polar atoms against the total of proteins. The ASA values are calculated for 17 984 crystal structures by the algorithm proposed by Lee and Richards.31 Panel a is prepared by using PyMol.74

hydration patterns around the polar atom, and describes quantitatively the anisotropic spreads from the ideal geometry in a hydrogen bond (H-bond) (Figure 1a). It should be noted that EHDFs from 1265 protein structures with structural homologies less than 25% were basically consistent with those from the set of 17 984 protein structures containing many protein families with high structural homologies.30 Thus, EHDFs are independent of the bias from the protein families in the whole set. In this study, we have developed a program to predict hydration structures on the hydrophilic protein surface as probability density calculated by summing up the EHDF of polar atoms composing the surface. Polar atoms occupy about 35-50% of the accessible solvent area (ASA) of soluble proteins (Figure 1b), and work frequently as important building blocks for chemical reactions in enzymes and molecular interactions in protein-protein complexes, and so on. The program was first applied to human lysozyme, and the predicted hydration structure was compared with the crystal water sites identified in a high-resolution crystal structure analysis. Then, we examined whether the program predicts the hydration structures in the interior of membrane proteins, at the interfaces of protein-protein complexes, and of the metastable states in the domain motion of an enzyme. We believe that this simple and knowledge-based program is a useful tool to characterize the protein surfaces from the point of view of hydration and helpful to discuss the relevance of hydration structures to the biological functions and dynamics of proteins. 2. Computational Methods Probability Density Distribution of Hydration Water Molecules. We have developed the program suite “DM_hydra” written with the FORTRAN77 language. The program predicts hydration structures on hydrophilic protein surfaces as the probability density of hydration water molecules by using the stereochemically standard structures of the main-chain peptide bond and the side chains in 11 hydrophilic amino acids (Table 1), and the EHDFs around the polar atoms (Figure 1a).30 Our previous paper reported the method to calculate EHDFs for hydration water molecules, which are located within 2.4-3.4 Å from polar protein atoms and display B-factors less than 45.0 Å2.30 Each EHDF is normalized so that the sum is equal to the number of possible H-bonds with hydration water molecules (Table 1), and the density values are assigned to the arrays of 0.05 × 0.05 × 0.05 Å3 voxels fixed to the standard structure. In the case of glutamine and asparagine, the obtained probability

J. Phys. Chem. B, Vol. 114, No. 13, 2010 4653 distribution functions are smeared by the confusion of the amide conformations in the model building.30 Thus, the EHDF for OE1 of glutamate and OD1 of aspartate are replaced with those of the glutamine and asparagine, respectively. The EHDF around the amide N atom of the main chain is used for those around NE2 atoms of glutamine and ND2 atoms of asparagine residues. In addition, the EHDF for the N-terminal and the C-terminal ends are approximated by those around the side chains of lysine and glutamate, respectively. The computational procedure for predicting the hydration structure of a protein is composed of the steps described below. The program reads the coordinates of the target protein in the PDB format. The space occupied by the protein is partitioned into 0.5 × 0.5 × 0.5 Å3 voxels to calculate the probability distribution density of a hydration water molecule. For every polar atom with a nonzero ASA value calculated by the Lee-Richards algorithm,31 the program selects a standard group including the polar atom (Table 1), and computes the rotation matrix R and the translation vector bt to superimpose the selected standard group {r bstandard} onto the group including the polar atom {r bobserve} in the protein structure as

b r observe ) R(b r standard + b t)

(1)

By using the matrix and the vector, the EHDF of the standard structure and its peak positions are transformed to the protein surface. The transformed density is accumulated in the arrays of the voxels fixed on the target protein. This procedure is iteratively carried out for all polar atoms with nonzero ASA values. When the center of a voxel is located within 1.0 Å from any polar atom or 2.3 Å from any nonpolar atom, the density of the voxel is set to zero. Finally, the calculated probability density is output in the map.dn6 format32 for the convenience in visualizing the density distribution. The local maxima of the probability density are picked up as plausible hydration sites. In this process, we survey three voxel layers around the central voxel, which includes the transformed highest peak in the original EHDF. Voxels are excluded from the calculation when their centers are located within 2.0 Å from any polar atom or within 2.5 Å from any apolar atom. The positions of local maxima are given as the density-weighted center of gravity among the surveyed voxels. The list of the calculated peak positions is written in the PDB format. We define a prediction score (P-score) to evaluate how close the local maxima in the predicted density are to the crystal water sites as

P)

1 N

N

∑ |br icrystal - br ipredict| i

and b ripredict are the coordinates of the ith crystal where b r crystal i water and its nearest peak in the density map, respectively. N is the number of crystal water sites. 3. Results 3.1. A Brief Description of the Program. The program suite “DM_hydra” computes the hydration probability density on hydrophilic protein surfaces under the assumption that the density is reproduced approximately by summing up threedimensionally the EHDFs of polar atoms composing the surface (Figure 1a and Table 1). The P-score is used to evaluate how

4654

J. Phys. Chem. B, Vol. 114, No. 13, 2010

Matsuoka and Nakasako

TABLE 1: Geometries of the Polar Groups in Amino Acid Residues and the Number of Selected and Used Atoms in the Calculation amino acid

atom

hybridization/geometrya

lone pairsb/bonds with Hc

number of possible H-bonds

number of atoms used in the density calculation

peptide bond

O N OE1 OE2 OD1 OD2 NE NH1 NH2 NZ ND1 NE2 NE1 OG OG1 OH OE1 NE2 OD1 ND2

sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp3/Tet. sp2/T.p. sp2/T.p. sp2/T.p. sp3/Tet. sp3/Tet. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p. sp2/T.p.

2/0 0/1 2/0 2/0 2/0 2/0 0/1 0/2 0/2 0/3 0/1 0/1 0/1 2/1 2/1 1/1 2/0 0/2 2/0 0/2

2 1 2 2 2 2 1 2 2 3 1 1 1 3 3 2 2 2 2 2

1 821 038 664 650 167 027 167 189 157 946 188 267 46 184 92 497 73 012 120 938 34 900 32 700 17 825 151 156 163 944 110 067 89 490 66 345 93 973 90 969

Glu Asp Arg Lys His Trp Ser Thr Tyr Gln Asn

c

a The trigonal planar and tetrahedral geometries are abbreviated as “T.p.” and “Tet.”. The number of OH or NH groups to act as donors in a H-bond.

b

The number of lone pairs available in H-bonding.

TABLE 2: Performance of the Program for Predicting the Probability Distributions around Proteins with Various Sizes proteins (PDB ID)

resolution (Å)

residues

ASAa (Å2)

human lysozyme LOV1 of phot2 (2Z6D) BR (1C3W) acetylcholine esterase (1MAH) AQP SoPIP2;1 (1Z98) GDH cytochrome c oxidase (1V55)

0.95 2.10 1.55 3.20 2.10 1.80 1.90

130 225 248 × 3 (trimer) 533 251 × 4 (tetramer) 2496 3550

6391 11 463 24 542 19 654 30 502 85 540 123 844

average B of water moleculesb (Å2)

Pc

CPU timed (s)

203 143 23

20.0 29.5 32.0

1.14 1.04 0.99

100 × 4 1269 1943

39.1 31.0 45.5

1.28 0.94 1.14

95 167 317 332 565 2653 3209

crystal waters

a ASA is calculated by the algorithm developed by Lee and Richards with a probe sphere of 1.4 Å. b The averaged B-factors are calculated for water molecules in H-bonds with polar protein atoms. c The P score is defined in the Computational Methods section. d The calculations were carried out by using a computer system composed of CPU Xeon 5160 (3.0 GHz) (Intel).

close the crystal water sites and the peak positions in the density are. The possible molecular size to be processed by the program depends only on the memory size equipped in a computer system used. The computation time is approximately proportional to the number of residues or the ASA of the target molecules (Table 2). 3.2. Predicted Hydration Distribution Density for Human Lysozyme. To examine and demonstrate the feasibility, we first applied the program to predict the hydration structure of human lysozyme in referring the crystal structure refined at a resolution of 0.95 Å to provide crystal water sites identified unambiguously (Table S1 and Figure S1, Supporting Information). Figure 2a compares the predicted probability density with the crystal water sites. The EHDFs overlap around the clusters of polar atoms, and the predicted density distributes over the molecule or penetrates to the grooves except the hydrophobic area. The probability density proposes possible hydration patterns on the surface engaged in crystal contacts. Figure 2b shows a histogram on the distances between crystal water sites and their nearest density peaks. For more than 75% of crystal water sites, density peaks are located within 1.5 Å. The close appearance is likely independent of the surface shape, as illustrated in the magnified view in Figure 2b, but depends on the polar protein atoms as discussed later. The predicted density penetrates into the narrow channel formed by residues Thr40, Tyr54-Ile56, and Gln86-Ala92

(Figure 2c). The density envelops the five crystal waters bound strongly to the residues,27 and the peaks are located within 1.1 Å from crystal water sites “1”, “2”, “4”, and “5”. Site “3” is located in the minor component of the EHDF of the main-chain carbonyl group of Gln86 and at the middle of two predicted peaks. Thus, this result suggests the possibility that the probability densities of hydration calculated for the solventexposed surface are applicable to such a narrow solvent channel in the protein interior. 3.3. Overview of the Predicted Probability Density of Proteins. Taking the results in the hydration prediction for lysozyme, we next applied the program to other proteins having hydration structures relevant to their dynamics, stability, and functions. Figure 3 illustrates the predicted probability densities for two soluble proteins (the light-oxygen-voltage sensing domain 1 (LOV1)33 and glutamate dehydrogenase (GDH)34) and two membrane proteins (bacteriorhodopsin (BR)35 and aquaporin (AQP)36). The densities envelop crystal water sites over the surfaces of the soluble proteins, but those of the membrane proteins are absent from their hydrophobic surfaces buried in cell membranes. In the following sections, the details of the predicted hydration structures of the four proteins are described in comparison with their crystal water sites. The histogram in Figure 3e shows that predicted density peaks appear within 1.5 Å for more than 75% of the crystal water sites in GDH and cytochrome c oxidase (CcO).37 Because the

Prediction of Protein Hydration

J. Phys. Chem. B, Vol. 114, No. 13, 2010 4655

Figure 2. Probability density predicted for human lysozyme. (a) Two different views of the predicted probability distributions of water molecules around human lysozyme determined at a resolution of 0.95 Å (Supporting Information). Human lysozyme is illustrated as a surface-rendered model, and the calculated probability density is shown as green fishnets contoured at a level of 0.001 25 water molecules/0.5 × 0.5 × 0.5 Å3 voxel. The red spheres indicate the positions of the crystal water sites in a H-bond (distance 2.4-3.4 Å) with polar protein atoms, and the blue spheres are the crystal water molecules in van der Waals contacts with apolar atom groups (distance 3.4-3.7 Å). The yellow spheres are the peak positions in the predicted density around polar atoms. (b) A histogram of the distances between the crystal water sites and the nearest peak positions in the predicted density. The inset is a magnified view around Ala90 illustrating the distribution of the density peaks and the crystal water sites. Each ellipse encircling a pair of a predicted and a crystal water site is colored according to the distance range in the histogram. (c) The probability density for a channel penetrating into the core region from the lysozyme surface. The blue fishnet is the 2Fo - Fc difference Fourier electron density map calculated at a resolution of 0.95 Å (Supporting Information) and contoured at a 2.0 standard deviation level from the average. Dashed lines indicate possible H-bonds between the crystal water sites and the residues forming the channel. Panels a and c were drawn by using PyMol.74

amount of the crystal water molecules in the two protein assemblies is statistically significant, the results are representative examples in the application of the program. The P-scores of less than 1.3 Å for the tested proteins (Table 2) are comparable with that from the MD-based prediction of hydration structures for acetylcholine esterase (ACE).26 3.4. Water Clusters in Membrane Proteins. Because the program reproduces the hydration in the narrow channel of human lysozyme, the program is likely able to predict water clusters occupying a hydrophilic cavity (or a channel) equipped in membrane proteins. To examine this idea, we applied the program to the two membrane proteins, BR35 and AQP,36 solved at high resolution.

(i) Water Clusters in BR. BR is a light-driven proton pump composed of a bundle of seven R-helices and one chromophore retinal (Figure 3c).38 Upon absorbing light, BR undergoes a unique photoreaction cycle through metastable intermediates (K, L, M1, M2, and so on), and pumps one proton per cycle from the cytoplasmic to the extracellular side.39 Charged residues and the water cluster confined near the chromophore are argued to be involved actively in the vectorial proton transport.14 In the ground state (Figure 4c),35 the predicted probability density in the extracellular side of the chromophore is partitioned by the Arg82 side chain into a space surrounded by Tyr57, Arg82, Asp85, Asp212, and Lys216 (designated as the upper cavity) and another by Arg82, Ser193, Glu194, and Glu204 (the

4656

J. Phys. Chem. B, Vol. 114, No. 13, 2010

Matsuoka and Nakasako

Figure 3. Predicted probability densities of hydration water molecules around (a) the dimer of the light-oxygen-voltage sensing domain from phototropin233 (LOV1, the PDB accession code: 2Z6D), (b) hexameric glutamate dehydrogenase34 (GDH, see the Supporting Information), (c) the trimer of bacteriorhodopsin (BR) in the ground state35 (1C3W), and (d) the tetramer of plant aquaporin36 (AQP, 1Z98). The protein structures are schematically shown in the left column, and the probability densities of hydration water molecules are illustrated with the surface-rendered protein models in the right column. The densities shown as green fishnets are contoured at a level of 0.001 25 water molecules/0.5 × 0.5 × 0.5 Å3 voxel. Panels a-d were prepared by using PyMol.74 Panel e shows histograms for distances between the predicted peak positions and the crystal water sites in cytochrome c oxidase (CcO)37 and GDH.34

Prediction of Protein Hydration

J. Phys. Chem. B, Vol. 114, No. 13, 2010 4657

Figure 4. Stereoplots illustrating the hydration probability densities in the interior of the wild-type BR at the ground state (the PDB accession code: 1C3W)35 (a), the D96N-mutated BR at the M2-intermediate (1C8S)40 (b), and plant AQP (1Z98)36 (c). The protein structures are illustrated by the ribbon and stick models. In each panel, the green fishnets are contoured at 0.001 25 water molecules/0.5 × 0.5 × 0.5 Å3 voxel. The red and yellow spheres indicate the positions of crystal water molecules and those of predicted density peaks, respectively.

4658

J. Phys. Chem. B, Vol. 114, No. 13, 2010

lower cavity), and envelops the crystal water sites in the cavities. The number of density peaks is comparable with that of the crystal water sites, and their positions are located close (