Molecular Anatomy of Preferential Interaction Coefficients by

Singapore−MIT Alliance, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Bioprocessing Technology Institute, 20 Biopolis W...
0 downloads 3 Views 2MB Size
J. Phys. Chem. B 2009, 113, 11743–11753

11743

Molecular Anatomy of Preferential Interaction Coefficients by Elucidating Protein Solvation in Mixed Solvents: Methodology and Application for Lysozyme in Aqueous Glycerol Vincent Vagenende,† Miranda G. S. Yap,†,‡ and Bernhardt L. Trout*,‡,§ Singapore-MIT Alliance, National UniVersity of Singapore, 4 Engineering DriVe 3, Singapore 117576, Bioprocessing Technology Institute, 20 Biopolis Way #06-01 Centros, Singapore 138668, and Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts AVenue, Cambridge, Massachusetts 02139 ReceiVed: April 14, 2009; ReVised Manuscript ReceiVed: June 30, 2009

Preferential interaction coefficients of proteins in mixed solvents are bulk thermodynamic parameters that relate molecular characteristics of protein solvation with solvent effects on protein thermodynamics. Because of their bulk nature, they give no insight in the molecular level nature of protein solvation. In this study, we develop a methodology which provides insight into the molecular anatomy of preferential interaction coefficients by elucidating protein solvation in mixed solvents. Our methodology makes use of molecular simulations and reveals intricacies of solvent-protein interactions which are not accounted for by less detailed models for solvent effects on protein thermodynamics. This is demonstrated for lysozyme in 30 vol % aqueous glycerol. We find that solvent regions near protein O- and N-atoms that favor the formation of multiple hydrogenbonds with glycerol positively contribute to the preferential interaction coefficient (15 ( 4) due to the preferential solvation by glycerol molecules with long residence times (>2 ns). Yet, the overall value of the preferential interaction coefficient is negative as solvent regions near protein surface loci with similar affinities for glycerol and water have a stronger negative contribution (-22 ( 4). On the basis of these results, we discuss the current scope and future prospects of our methodology to understand solvent effects on protein thermodynamics. 1. Introduction In addition to proteins and water, protein solutions often consist of cosolvents including salts, sugars, polyols, and denaturants. Interactions of such cosolvents with the protein are of similar magnitude as water-protein interactions, and if present in sufficient amounts (several volume percents), they affect protein thermodynamics.1 Hence, cosolvents are widely used in protein processes relevant to biology and biotechnology.2-7 Although experimental data regarding cosolvent effects on protein thermodynamics have been accumulated over many decades, insight into the mechanisms of solvent effects on protein thermodynamics is lacking. Over the past decades, a thermodynamic framework has been established that accounts for solvent effects on protein thermodynamics based on the preferential interaction coefficient, ΓXP.1 The preferential interaction coefficient of a protein in a mixed solvent is not only a thermodynamic quantity for solvent effects on protein thermodynamics, but it is also a measure of the excess number of cosolvent molecules near the protein surface:8



ΓXP ) nXL -

nXB nBW

nLW



(1)

In the above equation, n is the number of molecules of a specific type (subscript X for cosolvent and W for water) in a * Author to whom correspondences should be addressed. E-mail: [email protected]. Phone: (617) 258-5021. Fax (617) 253-2272. † Bioprocessing Technology Institute. ‡ Singapore-MIT Alliance. § Massachusetts Institute of Technology.

certain domain around the protein (subscript L for local domain and subscript B for bulk domain). The local domain is the solvent region near the protein surface where the solvent composition is altered due to solvent interactions with the protein, whereas the bulk domain comprises the rest of the solvent. Recently, a number of theoretical studies have refined expressions of ΓXP based on statistical thermodynamics.9-13 Yet, the underlying punch line remains: ΓXP is determined by the average local solvent composition near the protein surface. Since the local solvent composition is determined by protein solvation characteristics, ΓXP relates protein solvation with solvent effects on protein thermodynamics. Since all solvent molecules in the local domain contribute to ΓXP (eq 1), the solvent composition in the local domain needs to be characterized comprehensively to gain molecular level understanding of ΓXP. Experimental studies have pointed out interesting aspects of protein solvation in mixed solvents. For example, Roche et al.14 used fluorescence to probe the extent to which glycerol is preferentially excluded from a specific site of hemoglobin. However, no quantitative information on preferential interactions could be derived from the measured emission intensities. Another insightful study by Sinibaldi et al.15 characterized solvation properties of lysozyme in glycerolwater mixtures by small-angle neutron scattering. ΓXP was estimated based on a solvent exchange model and good agreement with experimental ΓXP values was reported. Unfortunately, models like the solvent exchange model16 incorporate assumptions that ignore the intricacies of preferential interactions and compromise molecular level understanding of solvent effects on protein thermodynamics. For example, the heterogeneity of the contributions of local solvent regions at distinct surface loci of the protein1 is not accounted for by such models.

10.1021/jp903413v CCC: $40.75  2009 American Chemical Society Published on Web 08/04/2009

11744

J. Phys. Chem. B, Vol. 113, No. 34, 2009

Insight into protein solvation has also been obtained from computational studies.17 Computational studies for a protein in pure water have generated insight in the structure and dynamics of water near the protein surface. Merzel and Smith18 showed that the local water concentration is significantly different from the bulk concentration up to radial distances of about 4 Å from the protein van der Waals surface. This distance delimits a local solvent region comprising of two hydration layers. Water diffusion rates decrease up to 7 times in the vicinity of the protein,17,19 and characteristic residence times of water near the protein surface range from 10 ps to several nanoseconds.20 Water tends to have particularly long residence times near protein surface loci in the cavities and clefts of the protein, whereas hydrogen bonding and hydrophobic interactions with the protein play a lesser role in determining the residence times.21 In addition to studies on protein solvation in pure water, several computational studies of proteins in mixed solvents have investigated the relationship between protein solvation and solvent effects on protein thermodynamics.22-28 Solvent effects on protein thermodynamics inferred from these studies are mostly qualitative, and limited molecular level understanding of ΓXP is gained because the characterization of the solvent composition in the local domain is incomprehensive. A breakthrough was pioneered by Baynes and Trout, who established a methodology for the calculation of preferential interaction coefficients for proteins in cosolvent-water mixtures from molecular dynamics (MD) simulations.29 Yet, the successful computation of preferential interaction coefficients for proteins,29-31 as well as for peptides32 and small molecules,33-35 has been hindered by computational limitations with regard to the simulation time and the system size. In a study that was carried out recently in our group, it was demonstrated that ΓXP values converge using extended simulation times (20 ns).36 Computed ΓXP values for various proteins and cosolvents determined in this way agree well with experiment.36 This indicates that protein solvation characteristics which are relevant to ΓXP can be realistically represented by MD simulations. Hence, MD simulation studies appear promising for gaining an understanding of the molecular anatomy of preferential interaction coefficients. In particular, such studies could provide answers to the following questions: how do interactions of water and the cosolvent with the protein surface determine ΓXP? How do different solvent regions contribute to ΓXP? To address these questions, we develop a methodology that enables anatomizing ΓXP on the molecular level. Computational bottlenecks with respect to simulation time and system size are addressed, and we present a number of methods to characterize protein solvation and to differentiate contributions of local solvent regions to ΓXP. The methodology is applied to a series of 40 ns simulations for lysozyme in a 30 v/v % glycerol-water mixture. Lysozyme is chosen since it is one of the most well studied proteins, and glycerol is a cosolvent which is widely used in protein solutions.37-41 Solvent-protein interactions are characterized and contributions of distinct solvent regions to ΓXP are quantified. Finally, we evaluate the scope and future prospects of anatomizing preferential interaction coefficients to gain molecular understanding of solvent effects on protein thermodynamics. 2. Methodology 2.1. Molecular Dynamics Simulations. The setup of each simulation is carried out with CHARMM42 version c32b2. A high resolution crystallographic structure of HEL (PDB code

Vagenende et al. TABLE 1: Simulation Details sima

proteinb

constraintc

nWd

nXd

mX (molal)

time (ns)

U1 U2 B1 B2 F1 F2 F3 E1 S W

194L 194L 194L U1:5 ns 194L U1:5 ns 194L 194L -

bb bb fix fix fix fix -

7520 8991 4482 7520 4482 7520 6336 6181 2986 4253

794 950 474 794 474 794 0 653 315 0

5.87 5.87 5.87 5.87 5.87 5.87 0 5.87 5.86 0

20 40 40 40 40 40 40 20 40 10

a Simulation identifiers: U for simulations with unconstrained protein coordinates; B for simulations for which the protein backbone was constrained; F for simulations with fixed protein coordinates; E for simulations for which electrostatic charges of the protein are switched off; S for solvent without protein; and W for pure water. b Protein structures of HEL at the start of the simulation. 194L: according PDB structure of HEL. U1: x ns: structure sampled in sim U1 after x ns. c Constraining of the protein coordinates: bb refers to harmonically constrained coordinates with a force constant of 2 kcal/(mol Å2) of C-, N-, and O-atoms of the protein backbone; fix refers to fixed protein coordinates. d The total number of water (nW) and glycerol (nX) molecules in the simulation box.

194L) is taken as a starting structure. The charges of Arg, Lys, and the N-terminus are set positive, charges for Asp, Glu, and the C-terminus are set negative, and all other amino acids are neutral. The protein is solvated in a 30 v/v % glycerol-water mixture (i.e., 5.87 M glycerol43), and the protein charge (+8) is neutralized by Cl-atoms. To verify that the box size does not influence results, box sizes are varied between different simulations, albeit preserving a minimum distance of 7 Å between the protein van der Waals surface and the boundary of the box at any time for each of the simulations. The CHARMM2244 parameter set is used to model protein atoms, and water is modeled by the TIP3 model.45 Similar to Baynes and Trout,29 we use force field parameters for glycerol which are based on the carbohydrate hydrate parameters developed by Brady and co-workers46 and modified for optimal compatibility with the CHARMM22 field (these parameters are available on the website of Alexander Mackerell http:// mackerell.umaryland.edu/CHARMM_ff_params.html in the file par_all22_sugar.inp). The solvated protein system is minimized for 1000 steps in NAMD, version 2.6.47 Further simulations are performed by NAMD for 20 to 40 ns at constant pressure (1 atm) and constant temperature (298 K) using a 2 fs step size while keeping the bond lengths of hydrogen constant with the SHAKE algorithm. For all simulations, periodic boundary conditions are used, and images and nonbonded lists are updated every 10 steps using a 12 Å cutoff distance. The van der Waals potential energy is smoothly switched off between 8 and 10 Å, and the particlemesh Ewald method is used to calculate long-range electrostatic interactions. To verify whether conformational changes affect the preferential interaction coefficient, different modes of constraining protein coordinates are used: protein coordinates are unconstrained (sim U1 and U2), protein backbone coordinates are harmonically constrained (sim B1 and B2), or protein coordinates are fixed (sim F1, F2, and F3) with respect to the minimized crystal structure 194L. The role of electrostatic interactions on protein solvation is assessed by switching off protein partial charges for one simulation (sim E1). Besides, one simulation is run for a binary mixture of glycerol and water (sim S) and one simulation is run for pure water (sim W). Simulation details are listed in Table 1.

Preferential Interaction Coefficients of Lysozymes

J. Phys. Chem. B, Vol. 113, No. 34, 2009 11745

2.2. Theory of Preferential Interaction Coefficients. Depending on the concentration units and the thermodynamic ensemble, preferential interaction coefficients can be defined in different ways.11-13 This study focuses on the preferential interaction coefficient at constant chemical potentials of cosolvent (µX) and water (µW), at constant temperature T and at infinitely dilute protein concentration (mP f 0): m,mPf0 ΓXP,µ ≡ W,µX,T

( ) ∂mX ∂mP

mPf0

(2) µX,µW,T

In the above equation, mX and mP are the molalities of m,mPf0 is of general cosolvent X and protein P respectively. ΓXP,µ W,µX,T interest since it quantifies the change of the chemical potential of the protein when adding cosolvent to an infinitely dilute aqueous protein solution.13,29,48 In the following, we use the m,mPf0 . short-hand notation ΓXP when referring to ΓXP,µ W,µX,T Several recent publications have applied Kirkwood-Buff theory to preferential interactions in three-component solutions.9-13,49,50 Herein, Kirkwood-Buff integrals GRβ play a central role:

GRβ ≡ 4π

∫0∞ [〈gRβ(r)〉µVT - 1]r2 dr

〈cRβ(r)〉µVT 〈cR〉µVT

(4)

In eq 4, cRβ(r) is the local concentration of R at radial distance r from molecule β, and cR is the bulk concentration of R. Among other authors, Smith13 derived an expression for ΓXP for infinitely dilute protein solutions in function of KirkwoodBuff integrals:

ΓXP ) 〈cX〉µVT(GXP - GWP)

(5)

Combining eqs 3-5 results in the following expression for the preferential interaction coefficient:

ΓXP ) 4π

∫0∞

[

〈cXP(r)〉µVT -

]

〈cX〉µVT 〈c (r)〉 r2 dr 〈cW〉µVT WP µVT

(6)

2.3. Calculation of the Preferential Interaction Coefficient from Molecular Dynamics Simulations. Smith51,52 introduced two approximations to calculate ΓXP from MD simulations. First, since MD simulations in the µVT ensemble are impractical, local concentrations are estimated from simulations in more straightforward ensembles such as the NPT or NVT ensemble. Second, the size of the solvent box in which the protein is placed is limited, and therefore, the integral in eq 6 is truncated to a final radial distance R which falls within the boundaries of the solvent box. Equation 6 then becomes

ΓXP(R) ) 〈nXP(r < R)〉 -

B 〈nXP 〉 B 〈nWP 〉

τrun/τrec

ΓXP )

(3)

In eq 3, 〈gRβ (r)〉µVT is the radial distribution function (RDF) of molecule R with respect to molecule β in the µVT ensemble:

〈gRβ(r)〉µVT ≡

The brackets 〈〉 refer to the ensemble average in which the MD simulation is performed; nXP(r < R) and nWP(r < R) are the number of cosolvent and water molecules for which the center of mass falls within the radial distance R from the protein van B B and nWP are the number of cosolvent der Waals surface and nXP and water molecules in the bulk domain B. Truncation of the integral will lead to minimal error if the integrand in eq 6 is close to zero for radial distances beyond the cutoff distance. This is the case for radial distances R beyond B B 〉/〈nWP 〉. The which 〈cXP(r)〉/〈cWP(r)〉 ) 〈nXP(r)〉/〈nWP(r)〉 = 〈nXP minimum cutoff distance R for which this condition is fulfilled can be interpreted as the boundary between the local domain where the solvent composition is altered due to solvent-protein interactions, and the bulk domain where the composition of the solvent remains unperturbed.13,29,48 The number density ratio in B B 〉/〈nWP 〉, is calculated based on all solvent the bulk domain, 〈nXP molecules at radial distance r > R.30 The ensemble average ΓXP can then be estimated as

〈nWP(r < R)〉

(7)

ΓXP(τ) ) nXP(r < R, τ) -



ΓXP(τ)

τ)1

(8)

τrun /τrec

nX - nXP(r < R, τ) n (r < R, τ) nW - nWP(r < R, τ) WP (9)

ΓXP(τ) is the instantaneous preferential interaction coefficient calculated from simulation frame τ, τrun is the total simulation time, and τrec is the recording period of the simulation. nXP(r < R,τ) and nWP(r < R,τ) are the number of cosolvent and water molecules for which the center of mass falls within the distance R of the protein van der Waals surface for frame τ, and nX and nW are the total number of cosolvent and water molecules in the simulation box. To calculate ΓXP and related quantities, a recording period of 10 ps proves sufficient: including coordinate frames at shorter intervals (0.1 or 1 ps) does not affect averages and standard deviations; however, values of ΓXP and related quantities vary when increasing the recording period to 100 ps. 2.3.1. Statistical Analysis. Statistical errors reported in this study are calculated via the method described by Allen and Tildesley.53 This method allows systematic estimation of statistical errors of any quantity A based on the analysis of block time averages 〈A〉τB. The simulation trajectory is divided in blocks of varying length τB and averages of A are calculated for subsequent time blocks. This enables statistical analysis based on a plot of p(τB): 2 τB σ 〈A〉τB p(τB) ) τrec σ2〈A〉

(10)

τrec

In the above equation, τrec is the recording period of atom coordinates, and σ2〈A〉τrec and σ2〈A〉τB are the variances calculated from the respective block time averages. For large block times, block time averages 〈A〉τB eventually become statistically independent and p(τB) remains constant. The plateau value of p(τB) equals the statistical inefficiency s and the minimum block time τB beyond which p(τB) remains constant is the correlation time τc. We find that the minimum simulation time to clearly distinguish a plateau region is about

11746

J. Phys. Chem. B, Vol. 113, No. 34, 2009

Vagenende et al. TABLE 2: Local Molecular Diffusion Coefficientsa for Water and Glycerol protein surface 5 < r < 6 Å r > 20 Å Dwat [10-9 m2/s] Dglyc [10-9 m2/s]

Figure 1. Preferential interaction coefficient ΓXP(R) in function of the upper integration limit R for sim F1 (1), sim B2 (9), sim U1 (b), and sim U2 (2). Error bars correspond with the standard deviation.

10τc. Hence, for simulation times τrun > 10τc, the variance of the quantity A can be estimated as

σ2〈A〉τrun ) s

τrec 2 σ 〈A〉τrec τrun

(11)

2.3.2. Extent of the Local Domain. Generally, the boundary between the local and the bulk domain is determined as the minimal distance R beyond which ΓXP(R) (eq 7) remains constant.29,30,32-36 Figure 1 shows the values of ΓXP(R) for four simulations up to 18 Å. For each simulation, values of ΓXP(R) vary significantly for R < 5 Å. For example, ΓXP(R) is significantly lower at 5 Å than at 4 Å for all simulations. This indicates that the solvent composition in the solvent region 4 < r < 5 Å is significantly different from the solvent composition in the bulk, and therefore, the solvent in this region contributes to ΓXP. In contrast, values of ΓXP(R) do not change significantly beyond 5 Å (Figure 1). For example, for none of our simulations is ΓXP(5 Å) significantly different from ΓXP(6 Å). At larger radial distances, drift of ΓXP(R) is observed. Yet, standard deviations increase concomitantly and differences of ΓXP(R) from ΓXP(5 Å) are not significant. Since 5 Å is the minimum distance beyond which differences of ΓXP(R) are no longer significant, the boundary between the local and the bulk domain is set at 5 Å and by combining eq 8 and eq 9, we get



ΓXP = nXP(r < 5 Å) -

nX - nXP(r < 5 Å) n (r < 5 Å) nW - nWP(r < 5 Å) WP



τrun

(12)

In the above equation 〈〉τrun refers to the average over the simulation time τrun. Notably, the cutoff R ) 5 Å is close to the thickness of the local domain (∼5.8 Å) estimated by a smallangle neutron scattering study of HEL in water-glycerol mixtures.15 Another noteworthy point is that values of ΓXP(5 Å) differ significantly between different simulations (Figure 1). This observation is addressed at a later point in this study. In several studies,30,32-35 drift of ΓXP(R) in the bulk domain has been ascribed to insufficient simulation times or system sizes. These studies were likely right by pinpointing these reasons; however, for our study, this is not the case as drift of ΓXP(R) beyond 5 Å is not significant. Moreover, we find that, as long as the minimum distance between the protein van der Waals surface and the boundary of the simulation box is greater than 7 Å, increasing the system size does not affect drift of ΓXP(R). Nevertheless, the investigation of molecular origins of the drift of ΓXP(R) in the bulk domain deserves further

binary mixture

pure water

0.6

1.3

1.5

1.5 (0.8b) 2.6 (2.3b)

0.1

0.3

0.5

0.5 (0.3b)

-

a Diffusion coefficients are calculated from sim F2 (protein surface, 5 < r < 6 Å and r > 20 Å), sim S (binary mixture) and sim W (pure water) as discussed in Section 2.4. b Experimental diffusion coefficients in 30 vol % aqueous glycerol and pure water reported by d’Errico et al.61

investigation as it will aid computational studies on solvation dynamics and preferential interaction coefficients. Drift of ΓXP(R) is caused by fluctuations of the number ratio 〈nXP(r)〉/〈nWP(r)〉 with respect to the average number ratio in the bulk (eq 7). For r > 5 Å, fluctuations of the number ratio are inconsistent between different simulations (Figure S1A of the Supporting Information). Such fluctuations diminish when averaged over longer block times and will eventually vanish for extended simulation times (Figure S1B and C). Moreover, fluctuations of the number ratio follow a random distribution (Figure S1D). All these observations indicate that, for our simulations, fluctuations of the number ratio 〈nXP(r)〉/〈nWP(r)〉 do not contribute to ensemble averaged properties of the solvent composition in the bulk domain. Therefore, drift of ΓXP(R) at R > 5 Å is not accounted for. Fluctuations of the molecule number ratio originate at the protein surface and move to larger radial distances by molecular diffusion (Figure S2 of the Supporting Information). To estimate time-scales of molecular diffusion, diffusion coefficients of water and glycerol are calculated at various radial distance with respect to the protein surface (Table 2). Diffusion coefficients gradually increase as the radial distance from the protein surface increases, and at r > 20 Å, they equal diffusion coefficients in a binary mixture of 30 vol % aqueous glycerol. Calculated diffusion coefficients are higher than experimental values; yet, simulation results agree with experiments in that the diffusion rate for glycerol is about 3 times smaller than that for water. According to the diffusion coefficients listed in Table 2, it would take several nanoseconds for a glycerol molecule to diffuse over the edge length of the simulation box. Moreover, we find that characteristic residence times of solvent molecules at the protein surface amount to several nanoseconds (see section 3.1). Thus, time-scales for molecular exchange between the local domain and the bulk domain and molecular diffusion are substantial with respect to the simulation time. Moreover, equilibration of solvent concentrations at the boundaries of the is slowed down because molecules diffusing out of the simulation box end up in the same bin at the opposite side of the simulation box because of periodic boundary conditions. As a result, fluctuations of the molecule number ratio persist after averaging over 40 ns simulations. 2.3.3. Minimum Simulation Time to Calculate ΓXP. The correlation time of ΓXP ranges between 1 and 3 ns (Figure 2). Consequently, the minimum simulation time to estimate the confidence interval of ΓXP varies between 10 and 30 ns. To identify the factors determining the minimum simulation time, we investigate time fluctuations of the three terms that make up ΓXP: nXP(r < 5 Å), nWP(r < 5 Å), and [nX - nXP(r < 5 Å)]/ [nW - nWP(r < 5 Å)] (eq 12). If the two last terms are statistically independent, temporal differences of block time averages 〈ΓXP〉τB

Preferential Interaction Coefficients of Lysozymes

J. Phys. Chem. B, Vol. 113, No. 34, 2009 11747

with respect to 〈ΓXP〉τrun, i.e. ∆〈ΓXP〉τB, can be expressed as follows:

∆〈ΓXP〉τB ) ∆〈nXP(r < 5 Å)〉τB -



nX - nXP(r < 5 Å) nW - nWP(r < 5 Å)



τrun



〈nWP(r < 5 Å)〉τrun∆

∆〈nWP(r < 5 Å)〉τB -

nX - nXP(r < 5 Å) nW - nWP(r < 5 Å)



(13)

τB

For block times of 1 ns, the sum of the three terms in eq 13 corresponds well with fluctuations ∆〈ΓXP〉1ns (Figure 3). Fluctuations ∆〈ΓXP〉1ns are mainly caused by variations of the number of glycerol molecules in the local domain, ∆〈nXP(r < 5 Å)〉1ns, andsto a lesser extentsby variations of the number of water molecules in the local domain, ∆〈nWP(r < 5 Å)〉1ns. Contributions of the third term in eq 13, caused by fluctuations of the number ratio in the bulk domain, are only minor. This is expected as simulation boxes are such that the number of solvent molecules in the bulk domain (r > 5 Å) is much larger than the local domain (r < 5 Å). Since temporal fluctuations of ΓXP are mainly caused by fluctuations of the number of solvent molecules in the local domain, the minimum simulation time to calculate ΓXP must be determined by characteristic residence times of solvent molecules in the local domain. Characteristic residence times are estimated as described in the following section. 2.4. Characterization Modes of Protein Solvation. To estimate the lower limit of the molecular diffusion rates in the local domain and the range of diffusion rates in the bulk domain, local diffusion coefficients are computed at the protein surface (r < 2 Å), at the start of the bulk domain (5 < r < 6 Å), and further in the bulk domain (r > 20 Å). Diffusion coefficients are calculated according to the Einstein equation from the limiting slope of the mean-square displacement curves:54

lim 〈| b(t r + ∆t) - b(t)| r 2〉 ) 6D∆t

∆tf∞

Figure 3. Temporal fluctuations of the three terms in eq 13 which determine fluctuations of block time averages 〈ΓXP〉1ns with respect to 〈ΓXP〉40ns (thick black line).

The residence time of a solvent molecule in the local domain (r < 5 Å) is defined as the difference between the exit and the entrance time in the local domain. Residence times are monitored for all solvent molecules and the survival functions NXP(t) and NWP(t), which represent the average number of glycerol and water molecules with a residence time longer than t,20,55 are calculated. Notably, NXP(0) and NWP(0) equal the average total number of glycerol and water molecules in the local domain, i.e. 〈nXP(r < 5 Å)〉τrun and 〈nWP(r < 5 Å)〉τrun. NXP(t) and NWP(t) are fitted by a combination of simple exponential functions. For glycerol, the sum of two exponential functions allows good fitting for t > 50 ps (R2 ) 0.998 and residuals 5 ns, respectively (Table 4). We find that the fits of the survival functions (eqs 15 and 16) are equally good for any local domain r < R with 3 < R < 8 Å. Thus, the classification of solvent molecules according to their characteristic residence times is robust with respect to the selection of the extent of the local domain. Characteristic residence times τ1W and τ1X gradually increase with increasing R and are almost 4 times larger for r < 8 Å than for r < 3 Å (Figure S3 of the Supporting Information). This increase can be attributed to the increase of diffusion times between the bulk domain and the protein surface as the extent of the local domain increases. In contrast, the characteristic residence times τ2W, τ3W, and τ2X remain relatively constant as the extent of the local domain increases from 3 to 8 Å (Figure S3 of the Supporting Information). This indicates that the characteristic residence times of the corresponding solvent classes (i.e., class 2 and 3 water molecules and class 2 glycerol molecules) are mainly

Preferential Interaction Coefficients of Lysozymes determined by intermolecular interactions in the solvent region r < 3 Å where protein-solvent interactions occur. To gain further insight in solvation characteristics of different solvent classes, we compare molecule numbers and characteristic residence times between different simulations. Side chain motions of the protein significantly decrease the characteristics residence time τ2X of class 2 glycerol molecules (compare sim B1/B2 with sim F1/F2 in Table 3). In contrast, the decrease of characteristics residence times of glycerol molecules caused by backbone motions of the protein is much smaller (compare sim U1/U2 with sim B1/B2 in Table 3). This suggests that class 2 glycerol molecules concurrently interact with different flexible protein segments including at least one side chain. For water, the decrease of characteristic residence times caused by protein side chain and backbone motions is rather small (Table 4). This suggests that water molecules mostly interact with one protein segment at a time. Interestingly, characteristic residence times τ1W and τ2W are considerably higher for HEL in 30 vol % aqueous glycerol compared to HEL in pure water (compare sim F1 and F3 in Table 4). The relative increase corresponds well with the increase of the water diffusion coefficient in pure water compared to 30 vol % aqueous glycerol (Table 2). This suggests that residence times of class 1 and class 2 water molecules depend on transport properties of the solvent. In contrast, the characteristic residence time of class 3 water molecules is not considerably affected by the presence of glycerol. When protein charges are switched off, only one characteristic residence time occurs for glycerol and water respectively (sim E1; Table 3 and Table 4). For glycerol, this characteristic residence time is intermediate between the characteristic residence times of class 1 and class 2 glycerol molecules (Table 3). This indicates that electrostatic solvent-protein interactions either increase or decrease the characteristic residence time of glycerol molecules. Thus, electrostatic solvent-protein interactions are important for both classes of glycerol molecules. For water, the characteristic residence time when protein charges are switched off is even smaller than the characteristic residence time of class 1 water molecules. Thus, electrostatic solvent-protein interactions are essential for class 2 and class 3 water molecules. This is in good agreement with the electrostatic nature of class 2 water-protein interactions in pure water.57 As pointed out in the Methodology section, the minimum simulation time to calculate ΓXP is determined by characteristic residence times of solvent molecules in the local domain. For block times between 1 and 3 ns, temporal fluctuations of 〈ΓXP〉τB become independent. Therefore, the correlation time and the minimum simulation times are determined by fluctuations of the number of solvent molecules in the nanosecond time-scale. For block time averages of 2 ns, temporal fluctuations of 〈ΓXP〉2ns are mainly determined by fluctuations of NWP(2 ns) and NXP(2 ns). Taking into account the values of NWP(2 ns) and NXP(2 ns) and their respective contributions to ΓXP, one finds that fluctuations of 〈ΓXP〉2ns are mainly determined by temporal fluctuations of NXP(2 ns). Since NXP(2 ns) mainly consists of class 2 glycerol molecules, fluctuations of 〈ΓXP〉2ns are mainly determined by fluctuations of the number of class 2 glycerol molecules in the local domain. We conclude therefore that the correlation time and the minimum simulation time to calculate ΓXP are determined by the characteristic residence time of class 2 glycerol molecules. 3.2. Solvent Molecules with Long Residence Times. For class 2 and 3 water and class 2 glycerol molecules, residence times in the local domain (Tables 3 and 4) are much longer

J. Phys. Chem. B, Vol. 113, No. 34, 2009 11749

Figure 5. Glycerol and water molecules remaining at the protein surface locus formed by Arg 45, Gly 49, and Arg 68 for the entire trajectory of sim F2. Snapshots are taken at 36 (A) and 40 ns (B). Glycerol, water, and Arg 45 are represented as ball-and-stick models, and atoms of other protein residues are represented as van der Waals spheres. O-, N-, C-, and H-atoms are colored in red, blue, cyan, and white, respectively, and hydrogen-bonds formed by glycerol are indicated by dotted lines.

than typical diffusion times in and out of the local domain (Table 2). If solvation dynamics of these glycerol molecules cannot be explained by molecular diffusion, then what else causes such long residence times? To answer this question, we inspect solvent molecules with long residence times. Sim F2 is the only simulation for which a glycerol molecule stays within the local domain for the entire simulation (Table 3). This glycerol molecule resides in the superficial cleft formed by Arg 45, Gly 49, and Arg 68 where it forms one hydrogenbond with one of the two water molecules remaining in the same cleft for the entire simulation time (Figure 5). Two more hydrogen-bonds are formed with the guanidinium group of Arg 45 (Figure 5). Most of the time, the glycerol conformation is such that it allows the formation of an intramolecular hydrogenbond (Figure 5A). Occasionally, the conformation of glycerol is more extended and the O-atoms which form hydrogen-bonds with Arg 45 are swapped (Figure 5B). In similar ways, other glycerol molecules with long residence times (> 5 ns) reside at loci at the protein surface where they from multiple hydrogen-bonds with protein N- and O-atoms. Generally, protein N- and O-atoms forming such loci belong to side chains or the backbone of adjacent protein residues. This is in good agreement with glycerol orientations at the protein surface determined from X-ray crystallography.58 Long-residing glycerol molecules generally stay at the same locus before leaving the local domain and frequently change hydrogenbonding partners within the locus. We refer to such loci as glycerol-binding loci. In contrast, most water molecules with long residence times typically move between different protein surface loci before leaving the local domain. Notable exceptions are water molecules which are trapped by long-residing glycerol molecules (Figure 5) and water molecules buried inside the protein interior near Ile 55. Given the specificity of relative orientations of protein Oand N-atoms to form multiple hydrogen-bonds with glycerol, protein conformational changes are expected to lead to the creation or deletion of glycerol-binding loci. Indeed, the glycerol-binding locus at Arg 45 for the protein conformation in sim F2 (Figure 5) is not present in sim F1 as Arg 45 forms hydrogen-bonds with the guanidinium group of Arg 68 such that the cleft is inaccessible for glycerol molecules. For simulations for which the protein side chains can move freely, repetitive hydrogen-bond formation and breaking between the guanidinium groups of Arg 45 and Arg 68 occurs. As a result, residence times of glycerol molecules residing near Arg 45 are

11750

J. Phys. Chem. B, Vol. 113, No. 34, 2009

Figure 6. Local concentration maps for glycerol (red) and water (blue) for HEL in 30 vol % aqueous glycerol (sim F1; parts A, C, and D) and for HEL in pure water (sim F3; part B). Isosurfaces show the solvent regions where gXP(r b) > 1.5 and gWP(r b) > 1.25 (A); gWP(r b) > 1.12 (B); gXP(r b) > 1.5 and gWP(r b) > 1.5 (C); gXP(r b) > 1.25 and gWP(r b) > 1.12 (D).

relatively short (2 ns) (Figure 8D). For example, the glycerol concentration regions on the right top corner of Figure 8A consist of one glycerol molecule which forms three hydrogen-bonds with the protein surface for 30 ns (consecutive). Other high concentration regions of glycerol are mainly made up of glycerol atoms located between 2 and 5 Å from the protein surface (Figure 8B and C). Such glycerol concentration regions do not have distinguishable contributions of long-residing glycerol molecules (Figure 8D). In conformance with the upper boundary of the local domain at 5 Å, only minor contributions are found beyond 5 Å (Figure 8C). The distinction between long-residing glycerol molecules which are hydrogen-bonded to the protein surface and more

Preferential Interaction Coefficients of Lysozymes

Figure 8. (A-C) Local concentration maps for glycerol (red) and water (blue) atoms at radial distances [R - 1 Å, R] for R ) 1 (A), 3 (B), and 6 Å (C) from the protein surface. Contributions of atoms in bins at smaller radial distances are represented in other colors: green/black, yellow/gray, and orange/cyan for glycerol/water atoms in [R - 4 Å, R - 3 Å], [R - 3 Å, R - 2 Å], and [R - 2 Å, R - 1 Å], respectively. (D) Local concentration maps calculated for glycerol and water atoms with residence times longer than 2 ns. Cutoff values for isosurfaces are like those in Figure 6A.

Figure 9. Radial distribution functions of the center of mass of water and glycerol molecules and the respective glycerol atoms.

mobile glycerol molecules which are more distant from the protein surface also appears from the radial distribution functions (RDFs) of glycerol atoms. Glycerol atoms are named as in the inset of Figure 9. For water and glycerol O-atoms, the first peak of the RDF occurs at 1 Å (Figure 9). This distance is determined by the average distance of solvent-protein hydrogen-bonds. At 3.7 Å, a second peak is observed for O2 but not for O1/O3. The distance between the first peak and the second peak of O2 corresponds with the average intramolecular distance between O2 and O1/O3 (2.87 ( 0.10 Å). This indicates that the second peak originates from glycerol molecules for which either O1 or O3 is in contact with the protein surface and for which the vector connecting O1/O3 and O2 is almost perpendicular to the protein surface. Such glycerol conformations form at most one hydrogen-bond with the protein surface, and therefore, the corresponding residence time is expected to be short. Indeed, we find that the second peak for O2 is not present in the RDF calculated for glycerol molecules with residence times longer than 2 ns (data not shown). Taken together, the above information gives rise to a picture whereby glycerol molecules involved in class 2 glycerol-protein interactions (τ2X > 2 ns) stick at specific loci at the protein surface where multiple hydrogen-bonds between a glycerol molecule and the protein can be formed, whereas glycerol molecules involved in class 1 glycerol-protein interactions (τ1X < 1 ns) mostly reside at further radial distances from the protein surface (2 < r < 5 Å) and regularly move to and from the protein surface to form hydrogen-bonds involving one O-atom (Figure

J. Phys. Chem. B, Vol. 113, No. 34, 2009 11751

Figure 10. Characteristics and schematic representation of class 1 and 2 glycerol-protein interactions. Dotted lines represent hydrogen-bonds to the protein surface. A distinctive orientation of class 1 glycerol molecules is such that either O1 or O3 is hydrogen-bonded to the protein surface, and the vector connecting O1/O3 and O2 is almost perpendicular to the protein surface (bold). The double arrow indicates the average distance between O1/O3 and O2 (2.8 Å).

10). The strength of such hydrogen-bonds is expected to be comparable to water-protein hydrogen-bonds. This hypothesis is corroborated by the similarity of characteristic residence times of class 1 glycerol molecules and characteristic residence times of class 2 water molecules (Tables 3 and 4). This suggests that class 1 glycerol molecules and class 2 water molecules compete for the same surface loci where they form single hydrogenbonds. 3.5. Contributions of Solvent Regions to the Preferential Interaction Coefficient. Class 2 glycerol molecules from multiple hydrogen-bonds with N- and O-atoms which generally belong to adjacent residues at the protein surface. Such cooperative interactions are not accounted for by residue-based local preferential interaction coefficients. Instead, we differentiate solvent regions based on the type of solvent-protein interactions: I. Solvent regions near protein surface loci with relative orientations of N- and O-atoms which favor the formation of multiple hydrogen-bonds with O-atoms of the same glycerol molecule. Such solvent regions are predominantly occupied by class 2 glycerol molecules and glycerol molecules residing in the local domain for the entire simulation time. II. Solvent regions which are mainly occupied by water molecules with long characteristic residence times (>5 ns), i.e. class 3 water molecules and water molecules residing in the local domain for the entire simulation time. This type of solvent regions includes narrow clefts at the protein surface which are accessible to water but not to glycerol. III. Solvent regions where glycerol and water molecules compete to form single hydrogen-bonds with the protein surface. Class 1 glycerol molecules and class 2 water molecules form single hydrogen-bonds with the protein surface. Since glycerol molecules are larger than water and occupy solvent regions that extend to further radial distances compared to water (Figure 8C and Figure 10), exchange of glycerol by water at such protein surface loci involves several class 2 water molecules (which form hydrogen-bonds with the protein surface) as well as class 1 water molecules (which do not form hydrogen-bonds with the protein surface). Hence, type III solvent regions are occupied by class 1 glycerol molecules and class 1 and 2 water molecules. IV. Solvent regions near protein surface loci that do not interact favorably with either glycerol or water. Preferential

11752

J. Phys. Chem. B, Vol. 113, No. 34, 2009

Figure 11. Types of solvent regions and their contributions to ΓXP are calculated from the average number of solvent molecules in the corresponding classes (eq 19, Tables 3 and 4). Atom colors for glycerol are as in Figure 5, water O-atoms are represented as dark blue spheres, the protein surface is colored gray and the dotted lines represent hydrogen-bonds.

hydration/solvation of such solvent regions is determined by mobile solvent moleculessi.e., class 1 glycerol molecules and class 1 and 2 water moleculesswhich travel to and from neighboring loci of the protein surface. On the basis of the average number of molecules in the respective classes (Tables 3 and 4), contributions of different types of solvent regions to ΓXP are estimated (Figure 11). The contribution of type I solvent regions to ΓXP is strongly positive (15 ( 4), and the contribution of type II solvent regions ΓXP is minor. The combined contribution of type III and type IV solvent regions to ΓXP is strongly negative (-22 ( 4), even though the affinity of protein surface loci in these solvent regions is similar for water and glycerol. The negative contribution of these solvent regions is in accord with the fact that atoms of class 1 glycerol molecules are located at larger radial distances than water (Figure 8). Presumably, glycerol is preferentially excluded in these solvent regions because of the difference of the molecular size of glycerol and watersi.e. the excluded volume effect16sand because of the preferential orientation of class 1 glycerol molecules whereby glycerol atoms are further excluded from the protein surface as would be the case for randomly oriented glycerol molecules (Figure 10). Besides, a number of solvent molecules with small residence time in the local domain (2 ns), and contributions of type II solvent regions to ΓXP are minor. Yet, the overall value of ΓXP is negative as the combined contribution of type III and type IV solvent regions to ΓXP is more negative (-22 ( 4). This study has demonstrated for the first time how the thermodynamic quantity ΓXP can be anatomized on the molecular level by elucidating protein solvation in mixed solvents. In contrast to previous models that emulate molecular understanding of ΓXP, our methodology fully accounts for the heterogeneity of contributions of solvent regions to ΓXP at distinct protein surface loci and for cooperative interactions of solvent molecules with different protein residues. Our methodology can be extended to study preferential interactions of other proteins and cosolvents; such studies will enable further understanding of preferential interactions and solvent effects on protein thermodynamics. Remarkably, we found that local conformational changes of the protein lead to significant differences in ΓXP. We are currently investigating the effects of protein conformational changes on ΓXP and the consequences for protein thermodynamics in mixed solvents. Acknowledgment. The authors acknowledge funding support from the Singapore-MIT Alliance. Supporting Information Available: Additional details are available regarding the characterization of fluctuations of the number density ratio of glycerol and water in the bulk domain, the statistical analysis of the radial distribution of molecule number fluctuations, and the change of the characteristic residence times with respect to the extent of the local domain. This material is available free of charge via the Internet at http:// pubs.acs.org. References and Notes (1) Timasheff, S. N. AdV. Protein Chem. 1998, 51, 355. (2) Yancey, P. H.; Clark, M. E.; Hand, S. C.; Bowlus, R. D.; Somero, G. N. Science 1982, 217, 1214. (3) Wang, W. Int. J. Pharm. 2005, 289, 1. (4) Chi, E. Y.; Krishnan, S.; Randolph, T. W.; Carpenter, J. F. Pharm. Res. 2003, 20, 1325. (5) Curtis, R. A.; Lue, L. Chem. Eng. Sci. 2006, 61, 907. (6) Clark, E. D. Curr. Opin. Biotechnol. 2001, 12, 202. (7) Frokjaer, S.; Otzen, D. E. Nat. ReV. Drug DiscoV. 2005, 4, 298. (8) Tanford, C. J. Mol. Biol. 1969, 39, 539. (9) Shimizu, S.; Boon, C. L. J. Chem. Phys. 2004, 121, 9147.

Preferential Interaction Coefficients of Lysozymes (10) Smith, P. E. J. Phys. Chem. B 2004, 108, 18716. (11) Schurr, J. M.; Rangel, D. P.; Aragon, S. R. Biophys. J. 2005, 89, 2258. (12) Shulgin, I. L.; Ruckenstein, E. J. Chem. Phys. 2005, 123. (13) Smith, P. E. J. Phys. Chem. B 2006, 110, 2862. (14) Roche, C. J.; Guo, F.; Friedman, J. M. J. Biol. Chem. 2006, 281, 38757. (15) Sinibaldi, R.; Ortore, M. G.; Spinozzi, F.; Carsughi, F.; Frielinghaus, H.; Cinelli, S.; Onori, G.; Mariani, P. J. Chem. Phys. 2007, 126. (16) Schellman, J. A. Biophys. J. 2003, 85, 108. (17) Makarov, V.; Pettitt, B. M.; Feig, M. Acc. Chem. Res. 2002, 35, 376. (18) Merzel, F.; Smith, J. C. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 5378. (19) Marchi, M.; Sterpone, F.; Ceccarelli, M. J. Am. Chem. Soc. 2002, 124, 6787. (20) Sterpone, F.; Ceccarelli, M.; Marchi, M. J. Mol. Biol. 2001, 311, 409. (21) Makarov, V. A.; Andrews, B. K.; Smith, P. E.; Pettitt, B. M. Biophys. J. 2000, 79, 2966. (22) Roccatano, D. Curr. Protein Peptide Sci. 2008, 9, 407. (23) Bennion, B. J.; Daggett, V. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 5142. (24) Bennion, B. J.; Daggett, V. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 6433. (25) Caflisch, A.; Karplus, M. Struct. Folding Des. 1999, 7, 477. (26) TiradoRives, J.; Orozco, M.; Jorgensen, W. L. Biochemistry 1997, 36, 7313. (27) Micaelo, N. M.; Soares, C. M. Febs J. 2007, 274, 2424. (28) Lins, R. D.; Pereira, C. S.; Hunenberger, P. H. ProteinssStruct., Funct., Bioinf. 2004, 55, 177. (29) Baynes, B. M.; Trout, B. L. J. Phys. Chem. B 2003, 107, 14058. (30) Kang, M.; Smith, P. E. Fluid Phase Equilib. 2007, 256, 14. (31) Smolin, N.; Winter, R. J. Phys. Chem. B 2008, 112, 997. (32) Abui, M.; Smith, P. E. J. Phys. Chem. B 2004, 108, 7382. (33) Chitra, R.; Smith, P. E. J. Phys. Chem. B 2001, 105, 11513. (34) Shah, P. P.; Roberts, C. J. J. Phys. Chem. B 2007, 111, 4467. (35) Ghosh, T.; Kalra, A.; Garde, S. J. Phys. Chem. B 2005, 109, 642. (36) Diwakar, S.; Shinde, C.; Trout, B. L. J. Phys. Chem. B, submitted for publication. (37) Knubovets, T.; Osterhout, J. J.; Connolly, P. J.; Klibanov, A. M. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 1262. (38) Gekko, K.; Timasheff, S. N. Biochemistry 1981, 20, 4667. (39) Bolen, D. W. Methods 2004, 34, 312. (40) Wu, L. Z.; Ma, B. L.; Sheng, Y. B.; Wang, W. J. Mol. Struct. 2008, 891, 167.

J. Phys. Chem. B, Vol. 113, No. 34, 2009 11753 (41) Davis-Searles, P. R.; Saunders, A. J.; Erie, D. A.; Winzor, D. J.; Pielak, G. J. Annu. ReV. Biophys. Biomol. Struct. 2001, 30, 271. (42) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187. (43) Bosart, L. W.; Snoddy, A. O. Ind. Eng. Chem. 1927, 19, 5. (44) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; JosephMcCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586. (45) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926. (46) Ha, S. N.; Giammona, A.; Field, M.; Brady, J. W. Carbohydr. Res. 1988, 180, 207. (47) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. J. Comput. Chem. 2005, 26, 1781. (48) Record, M. T.; Zhang, W. T.; Anderson, C. F. AdV. Protein Chem. 1998, 51, 281. (49) Shimizu, S. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 1195. (50) Shimizu, S.; Matubayasi, N. Chem. Phys. Lett. 2006, 420, 518. (51) Smith, P. E. J. Phys. Chem. B 2004, 108, 16271. (52) Pierce, V.; Kang, M.; Aburi, M.; Weerasinghe, S.; Smith, P. E. Cell Biochem. Biophys. 2008, 50, 1. (53) Allen, M. P.; Tildesley, D. J. Computer simulation of liquids; Clarendon Press; Oxford University Press: Oxford [Oxfordshire] New York, 1987. (54) Brooks, C. L.; Karplus, M. J. Mol. Biol. 1989, 208, 159. (55) Impey, R. W.; Madden, P. A.; McDonald, I. R. J. Phys. Chem. 1983, 87, 5071. (56) Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graph. 1996, 14, 33. (57) Pizzitutti, F.; Marchi, M.; Sterpone, F.; Rossky, P. J. J. Phys. Chem. B 2007, 111, 7584. (58) Adler, M.; Davey, D. D.; Phillips, G. B.; Kim, S. H.; Jancarik, J.; Rumennik, G.; Light, D. R.; Whitlow, M. Biochemistry 2000, 39, 12534. (59) Courtenay, E. S.; Capp, M. W.; Anderson, C. F.; Record, M. T. Biochemistry 2000, 39, 4455. (60) Schneider, C. P.; Trout, B. L. J. Phys. Chem. B 2009, 113, 2050. (61) D’Errico, G.; Ortona, O.; Capuano, F.; Vitagliano, V. J. Chem. Eng. Data 2004, 49, 1665.

JP903413V