Extensive Assessment of Various Computational ... - ACS Publications

Jun 23, 2017 - ABSTRACT: A series of computational methods for pKa shift prediction are extensively tested on a set of benchmark protein systems, aimi...
1 downloads 8 Views 3MB Size
Article pubs.acs.org/jcim

Extensive Assessment of Various Computational Methods for Aspartate’s pKa Shift Zhaoxi Sun,*,† Xiaohui Wang,† and Jianing Song*,‡,§ †

State Key Laboratory of Precision Spectroscopy, School of Physics and Material Science, East China Normal University, Shanghai 200062, China ‡ NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China § School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China S Supporting Information *

ABSTRACT: A series of computational methods for pKa shift prediction are extensively tested on a set of benchmark protein systems, aiming at identifying pitfalls and evaluating their performance on high variants. Including 19 ASP residues in 10 protein systems, the benchmark set consists of both residues with highly shifted pKa values as well as those varying little from the reference value, with an experimental RMS free energy differences of 2.49 kcal/mol with respect to blocked amino acid, namely the RMS pKa shift being 1.82 pKa units. The constant pH molecular dynamics (MD), alchemical methods, PROPKA3.1, and multiconformation continuum electrostatics give RMSDs of 1.52, 2.58, 1.37, and 3.52 pKa units, respectively, on the benchmark set. The empirical scoring method is the most accurate one with extremely low computational cost, and the pH-dependent model is also able to provide accurate results, while the accuracy of MD sampling incorporating alchemical free energy simulation is prohibited by convergence achievement and the performance of conformational search incorporating multiconformation continuum electrostatics is bad. Former research works did not define statistical uncertainty with care and yielded the questionable conclusion that alchemical methods perform well in most benchmarks. In this work the traditional alchemical methods are thoroughly tested for high variants. We also performed the first application of nonequilibrium alchemical methods to the pKa cases.

I. INTRODUCTION Recently the pH-dependent stability and functionality of proteins have been recognized as being very important.1−11 Given this information, greater emphasis has been attached to the area of determining the pKa values of ionizable groups.12−16 Interactions between buried charges fall off slowly as a result of the low dielectric constant of the protein region, so that the ionizations of sites are interdependent. The degree of charge burial and the local flexibility influence the free energy of protonation for titratable residues.17 Research on pKa predictions can be widely seen in the computational biophysics community. In most studies the calculated pKa shifts deviate little from experimental values, having RMSD of no more than 1 pKa unit,18−24 which is somewhat triggered by the fact that these data sets are filled with surface residues or groups with no strong neighboring intramolecular interactions. Substantially pKa-shifted ionizable groups are difficult to model accurately and precisely, and the RMSD of prediction values would rise significantly. Hence, appropriate benchmark sets should be divided into three groups: strongly perturbed situations, low variants, and a combination of both. Although the last ones and middle ones are studied more often, the conclusions are biased and cannot be extended to the first ones. In highly shifted situations the accuracy, precision and relative rank are of importance. © XXXX American Chemical Society

The pitfalls and major deficiency could then be identified. Here we choose a highly pKa-shifted benchmark set to evaluate the performance of modern accessible pKa prediction methods. An increasing number of pKa prediction methods are devised. Most of them require a significant computational effort, which bars their application to industry. However, being parametrized or trained like scoring functions in protein−ligand binding systems, empirical methods such as PROPKA serve as an efficient alternative for high-throughput screening and drug design.25 They are highly efficient with relatively poor accuracy.26−28 In a research on predictions of relative stability of inhibitors’ with thermodynamic integration as well as a series of good rapid scoring functions by Pearlman et al., they conclude that precise free energy simulations provide reliable and accurate predictions while the predictions obtained from the scoring functions are of low reliability but still qualitatively agree with the experimental results with little demand on computational resources. Methods that are not based on free energy simulations are prone to provide inaccurate predictions.29 A variety of alchemical free energy simulation methods are applicable for pKa shift calculation. They are divided into Received: March 27, 2017

A

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Aside from the sampling problem, the coupling between spatial neighboring titratable groups also influences simulation results. The pKa shift value is determined by the electrostatic environment, the explicit symbols of which include atomic description of protein conformation and protonation states of other titratable groups. Neglecting the tight coupling between conformation and protonation states or the pH of the environment, the preset and fixed protonation state model triggers severe problems: assigning the protonation states of any ionizable groups without accurate knowledge induces strong artificial factors in the simulation, and the dynamic-protonation coupling, where the ionization equilibrium should be represented by the ensemble of protonation states, is totally destroyed. For accurate pK a shift calculations and pH-coupled simulations, various methods have been devised, tested, and developed in the computational community.14,87−94 Some methods concentrate on single/multiple structure(s) obtained from crystal structure or averaged ones,13,15−17,25,95 while others use hybrid MD/MC (Monte Carlo) schemes in implicit or explicit (hybrid) solvent with a continuous92,96,97 or discrete91,93,98,99 description of protonation states. As single conformation has been proven to be insufficient for determination of pKa values, constant-pH MD simulations are strongly recommended for accurate pKa shift predictions. A periodic abrupt switch of MC step in the protonation states introduces a discontinuity in energy and force, thus simulation in implicit solvent without the need of relaxation of solvent being preferred. Note that the charge of the simulated system is generally nonzero and changes with the protonation states of ionizable groups, which may present a problem in an explicit solvent with periodic boundary conditions. Almost all published pKa calculations only assessed their results with RMSE or other error values with respect to experimental results, which only reflects one aspect of the question. In this work, we extensively assessed the performance of various methods including alchemical free energy simulation with statistical inefficiency being fully considered. Various rank correlations and absolute errors between the predictions and the experimental references are used as estimators.

two groups: (a) thermodynamic integration (TI) and its variants30−32 and (b) perturbation based methods of free energy perturbation of exponential average33 and acceptance ratio methods (BAR,34,35 UBAR,36 RBAR,36 MBAR37). Slow growth38−40 is a combination of both types but its performance is poor. Their semiempirical extensions of linear response approximation (LRA)41 and linear interaction energy (LIE)42−44 are often applied in pKa shift calculation as the electrostatic response is theoretically linear.41,45−53 Sample size hysteresis54 bars the use of exponential averaging, while variance propagation in TI is not theoretically rigorous and the TI estimate varies with the integration method used.55 Bidirectional estimator of BAR provides asymptotically unbiased results with the lowest variance in the large sample regime.35,56 In the slow growth method, the forward switching places an upper bound on free energy difference,57,58 while the corresponding reverse process represents a lower one on free energy difference. Minimizing the difference of the upper and lower bounds with the variational path optimization scheme in conjunction with a metric scaling strategy,59 one can improve the efficiency of slow growth dramatically.60 But the general applicability of slow growth is still worse than other equilibrium alchemical methods. Traditional free energy simulation methods along selected reaction coordinates are based on equilibrium dynamics or nonequilibrium trajectories, the latter of which such as metadynamics61 and self-healing umbrella sampling62 are principally needed to reach an equilibrium sampling in the subspace of the order parameters. As a result, they often refer to the equilibrium regimes or quasi-equilibrium ones, while the truly nonequilibrium scenarios were introduced as Jarzynski’s identity (JI)56,63 and Crooks’ equation (CE).64 Originally derived in a Hamiltonian formulation63 and extended to a Langevin case and a Markov one,65 the nonequilibrium work (NEW) method of Jarzynski’s identity is widely applied into various systems.66−79 Provided the relationship between nonequilibrium ensemble averages and equilibrium thermodynamic properties, path integral (PI) theory is obviously suitable for path averages of NEW.80 Field-theoretic proof of the nonequilibrium work relations for a space-dependent field with stochastic dynamics was provided, where JI and CE are directly obtained from the supersymmetry invariance of the Langevin equation. Microscopic reversibility leads to the supersymmetry, which gives Jarzynski’s theorem.81 The JI proof was also given under a hidden supersymmetry in a classical Hamiltonian system,82 while nonequilibrium PI for quantum free energy was also presented.83,84 Enhanced sampling methods are also coupled with NEW, and nonequilibrium umbrella sampling and forward flux sampling are applied to systems in nonequilibrium steady states.85 NEW BAR is also derived with maximum likelihood methods, showing its correspondence to Crooks’ equation.34 To our best knowledge, NEW is never used to perform pKa predictions. In this work, we perform the first application of NEW on pKa shift calculation. Non-pH-coupled alchemical methods are widely applied to pKa shift predictions. In many benchmarks researchers come to the conclusion that they are able to provide qualitatively acceptable and predictions within a several-hundred picosecond simulation. However, the simulation time is too short to get a converged picture in the subspace of the interaction energy between the TI region and the rest of the system. Hence the results are questionable and involve conformational bias or luck as these methods do in the case of protein ligand binding affinities.86 Statistical inefficiency is never calculated, and statistical uncertainty is always estimated by block averaging.

II. METHODOLOGY Understanding, predicting, and modulating protein pKa values is of significant importance in molecule recognition and protein engineering, estimating the pH optimum for acid−base catalytic proteins with ionizable groups via pKa calculation, for instance. An ionizable group with intrinsic pKa is perturbed by its surroundings in protein system, showing different electrostatic behavior and thus different pKa values pK a,protein = pKa ,model + ΔpK a

(1)

where all terms are self-explainable. As the pKa is an explicit function of free energy difference between protonated state and deprotonated state pKa ,protein − pKa ,model =

β (ΔGprotein − ΔGmodel) ln 10

(2)

the pKa shift calculation turns to estimation of double free energy difference of the transformation in Figure 1. Programs such as PROPKA25,95 and MCCE17 are able to provide good predictions.23,24 PROPKA utilizes an empirical method and MCCE uses a continuum solvent model based on B

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

standard Single Conformation continuum electrostatics calculations. And finally, the pKa shift is obtained from the bestfitting Henderson−Hasselbaich curve. Constant pH MD. As the protonation of ionizable groups are always coupled with each other and the changes in protonation states incur the reorganization of the system, the hybrid MD/MC method of constant pH is devised to perform pH-coupled dynamics mostly in implicit solvent to avoid a treatment of the need for solvent relaxation equilibration steps. Being able to catch important factors mentioned above, the method has been developed for years and we use the discrete protonation state method combined with implicit solvent of GB model,101 which replaces numerical solution of Poisson equation with approximate analytical expression. The titratable residues are described with the constph force field (AMBER14SB102 in conjunction with modifications). All atoms have some differences in their charges between protonated and deprotonated forms, and the largest changes in these charge sets are on areas near the titratable proton. The reference pKa value of aspartate is 4.0.103 The transition free energy is divided into molecular mechanics (MM) part and the non-MM part, the latter of which is assumed to be approximately the same in the protein environment and in the model compound which have the reference pKa value.

Figure 1. Thermodynamic cycle for pKa shift calculation. The free energy difference is calculated from protonated aspartate (ASH) to deprotonated aspartate (ASP).

Poisson−Boltzmann equation. Alchemical methods include conformational sampling, and constant pH provides pH-couple picture of protein dynamics. In the following part we briefly describe the methods we use to calculate pKa shifts. PROPKA.95 Using an effective perturbation functional to describe the environmental perturbation from the protein groups, the expression in the current implementation of PROPKA3 is ΔpK a, i = =

ΔpK a,self i

+

(ΔpK a,desolv i

ΔG = ΔGMM + ΔGnon‐MM MM non‐MM ΔGprotein = ΔGprotein + ΔGprotein

ΔpK a,QQ i +

ΔpK a,HB i

+

ΔpK a,REi )

+

MM MM = ΔGprotein − ΔGmodel +

ΔpK a,QQ i

1 ln 10(pH − pK a,model) β (4)

(3)

offset

The offset values E used in each Metropolis Monte Carlo attempt are obtained following the approach of Mongan et al.98

where ΔpKQQ a,i is the coulomb contribution of protein charge− charge interaction with all other ionizable groups and ΔpKself a,i is the self-energy or intrinsic contribution obtained by calculating the remaining terms with all other ionizable groups in neutral form, consisting of another three terms, namely the desolvation penalty when the protein replace ambient water ΔpKdesolv , a,i the contribution of hydrogen bond interactions ΔpKHB a,i , and the contribution due to unfavorable electrostatic reorganization energies ΔpKRE a,i . Empirically optimized parameters lead to high performance and efficiency for PROPKA predictions. Previous version of PROPKA is tested to perform less well for ASP than for GLU and LYS,24 while the current PROPKA3 provides a moderate but significantly improvement.25 The vast majority of protein titratable residues remaining their reference pKa values, the effect of the protein environment is usually small and triggers little pKa shift. As the environment induced free energy differences of 1.36 kcal/mol per pKa unit (at 298 K) is relative small compared to solvation energies differences of about 70 kcal/mol.25 For the cases with large pKa shift values and where protonation are coupled with significant structural rearrangement, single conformation based empirical methods fail. Conformational sampling is needed in these cases. Note that the pKa for model ASP is 3.8.25 Because of the efficiency and accuracy of PROPKA, this method is the most recommended one in several pKa benchmark studies.23,24 MCCE.17,100 Multiconformation continuum electrostatics (MCCE) explores different conformational degrees of freedom (side chain rotamers around Asp and Glu termini) as well as ionization states with Monte Carlo sampling with continuum electrostatics based energy functions for pKa shift calculations. The method include conformational search (but only side chain rotations) and improves the calculations compared with

MM E offset(pH) = ΔGmodel −

ln 10 (pH − pK a,model) β

(5)

The reference energies we use are precomputed values in AMBER, which are obtained at 0.1 M salt concentration with a cutoff of 30 Å under AMBER 99SB102 force field in GB solvent. If the protonation fraction is a monotonic function of pH, the pKa shift value or pKa value can be obtained with the HH (Henderson−Hasselbaich) equation94,96 fdeprot =

1 1 + 10

n(pK a − pH)

(6)

where fdeprot represents the deprotonation fraction and n denotes Hill coefficient. If the protonation of a titratable group is strongly coupled with others, the Hill coefficient differs from unity and the fitting of HH equation may probably be poor. Benefiting from expanded ensemble method, the constant pH MD is highly efficient and information about all ionizable groups can be obtained in a single simulation. Note that in expanded ensemble simulations the decorrelation of simulation data is improved and statistical inefficiency is not calculated. The λ-dynamics based continuous constant pH method is not tested here, but it is proved to be a robust pKa prediction method.104,105 It is the only method that can calculate pKa for transmembrane proteins currently.106,107 The PME-based continuous CpHMD108 and hybrid-solvent continuous CpHMD109 are also implemented and give predictions with almost the same accuracy. Various Nonphysical Intermediate State Based Equilibrium Alchemical Methods. Alchemical methods estimate C

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

The asymptotically unbiased estimator of BAR providing minimum variance in large sample size regime, the need for overlap between neighboring windows is relative small than other overlap based methods, such as FEP. Being the most efficient scheme for two states transformation, BAR provides the optimal estimate for free energy differences with fixed numbers of samples. Practically one solves the following equation in a self-consistent way to get ΔG and its variance. The term f denotes a Fermi function.

the free energy difference between two end states, whose phase spaces are bridged with alchemical intermediate states. These methods include TP (thermodynamic perturbation) or FEP (free energy perturbation), TI (thermodynamic integration), acceptance ratio methods (BAR, MBAR), and nonequilibrium methods of JI and CE. (The generalized ensemble based replica exchange method is included in equilibrium methods.) The nonequilibrium ones are demonstrated in the next section as their thermodynamic foundations differ somehow, thus having different behaviors in realization, convergence, and efficiency. Thermodynamic integration method is widely applied to investigate various systems.110,111 Ensemble averages of the derivative of potential energy function with respect to a coupling parameter are calculated at a series of intermediate states and the free energy difference is obtained by integration with the trapezoid rule (the normal TI) or a cubic spline (TI cubic)30 ΔG =

∫0

1

∂U ∂λ

K

dλ = λ

∑ ωi i=1

∂U ∂λ

ΔGij = ln

σ =

⟨f (Uj − Ui − Cij)⟩i

+ Cij

⎛ NQ ⎞ j i ⎟ Cij = ln⎜⎜ ⎟ Q N ⎝ j i⎠ σΔGij 2 =

λi

K 2

⟨f (Ui − Uj + Cij)⟩j

∑ ωi 2σi 2

⟨f 2 (Ui − Uj + Cij)⟩j Nj⟨f (Ui − Uj + Cij)⟩j 2 1 1 − − Nj Ni

+

⟨f 2 (Ui − Uj + Cij)⟩i Ni⟨f (Ui − Uj + Cij)⟩i 2

(7)

(9)

where ωi denotes the weighting factor used in all integration regime, K designates the total number of intermediate states,

The statistical optimal estimator of MBAR is an extension of original BAR. In this method information contained in configurations sampled from all intermediate states are used to calculate the free energy difference of the alchemical transformation, and the cross correlations of non-neighboring states are also taken into consideration. Thus, MBAR variance is always larger than that of BAR for the overall free energy difference. The dimensionless free energy for ith state is calculated by

i=1

and σi2 is the variance of

∂U . ∂λ λ i

The variance of the free energy

estimate can be obtained by variance propagation. The weighted sum nature of TI makes it curvature-dependent (∂U ), namely ∂λ alchemical path chosen strongly influencing the TI results. Linear interpolation and other polynomial fitting regimes30,31 introduce integration bias which is difficult to quantify, and corresponding results differ if ∂U varies nonlinearly between end ∂λ states. Thus, often the TI derivatives underestimate the overall variance and are unreliable compared with bidirectional perturbation-based acceptance ratio methods. Perturbation-based techniques include exponential averaging of TP33,112 and acceptance ratio methods.34−37 Using potential energies between two adjacent states, exponential averaging schemes are based on the Zwanzig relationship33

K

qi(x jn) Nk K j = 1 n = 1 ∑k = 1 Q qk (x jn) k

gî = −ln ∑ ∑ K

j=1 n=1

N ⟨x⟩2

, with x = ⟨e−β ΔU (qn)ij⟩λi

e−Ui(x jn) K

∑k = 1 Nke−Uj(x jn)

(10)

where Nk is the number of samples from state k, Qk denotes the partition function of state k, xjn represents the coordinate vector for nth configuration in state j, and qi is the weight in the ith state.37 And uncertainty of the free energy estimate is calculated from the covariance matrix of free energy of each state. Differences of correlation time in these intermediate states result in different numbers of samples in the equal time protocol, triggering significantly unbalanced statistical uncertainty along the alchemical path and large variance in the overall free energy estimate. Hence the efficient CPU time distribution scheme should be considered with care. In this work our intermediate states sampling is guided by the Optimum BAR scheme, which finds the steepest decent direction/window for free energy estimate and samples along it.116 To evaluate the quality of window spacing, we use the overlap matrix, which is a quantitative estimator for phase space overlap. Its element Oij is the average probability of finding a sample from state j at state i. The relationship between uncertainty of free energy difference and the overlap matrix is hard to write explicitly, but its element is somewhat negatively correlated with the variance of the corresponding perturbationbased transformation estimator. Oij should also be substantial with no zero in the main diagonal and its neighbors, the

N

σx 2

Nj

= −ln ∑ ∑

1 ΔGij = − ln(N ∑ e−β ΔU (qn)ij) β n=1 σi 2 =

Nj

(8)

where N represents the number of samples for the transformation of intermediate states i and j, σ designates the variance, and β represents the reciprocal temperature. A Gaussian estimate of exponential averaging is also proposed.113 Estimators based on the Zwanzig relationship always produce direction-dependent results, namely the free energy estimates being biased, which originates from inadequate sampling in the tail regions of ΔUij distributions.114,115 Forward directional and backward directional perturbations are termed as desert exponential averaging and insert exponential averaging for the sign of entropy change. Suffering from sample size hystersis,54 convergence is hard to reach and systematic bias is large. In this manuscript the perturbation in the forward direction is referred as TP (F) and that in the reverse direction is called TP (B). D

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

successive perturbations should also be performed. Then we write estimates of slow growth and fast growth together

nonzeros of which should be appreciably different from zero and larger than 0.03.115 Nonequilibrium Methods. Rather than using a series of alchemical states to enhance phase space overlap, nonequilibrium methods directly include walk in Hamiltonian space. Being a variant of thermodynamic integration,38 the original pulling method named slow growth or adiabatic switching evolves the system with an controlled parameter λ being changed within infinite time, driving the system from the initial equilibrium state to the final one, after which the free energy difference is obtained with a basic result of statistical mechanics:57,58 Reversibly and isothermally pulling the system from initial state A to the final state B, one finds that the free energy difference equals the external work, as is shown in the following equation ΔF = Wτ

W τa , N =

(12)



ΔF =

(13)

(14)

Simulation begins with sampling of the initial microstates from canonical ensemble corresponding to state A, after which the system evolves in time under a thermostatting scheme holding the temperature fixed at T, with the value of an external variable λ switched. The work can be calculated via the following expression for discrete-time evolution τ−1

Wτ =

∑ (H λ t=0

(q t) − H λt(q t))

t+1

τ ,n

n=1

⎞ ⎟⎟ ⎠

(16)

ωn n!

(Waτ

(17)

where ωn is the nth cumulant of the ensemble distribution of ΔV/W. If the underlying distribution is approximately Gaussian, then one gets the expression of linear response theory, where only the first two terms survive. And the relationship between slow growth, linear response and fast growth can be understood as different realization with different truncation of cumulant expansion terms. For specific distribution multiple Gaussian functions fitting scheme is also reported.121 Accuracy Assessments. As the reference pKa for different methods differs, we use the pKa shift value to estimate their performances. As is noted above, at 298 K the free energy difference is larger than pKa shift (1 pKa unit = 1.36 kcal/mol) and hence more sensitive, and the convergence is normally determined by small standard deviation of free energy difference. We then use relative free energy change for comparison. Determining the performance of a method is also of significance, where both accuracy and precision should be considered. Well-converged results can be inaccurate if errors are harbored by force field and model building. The mean signed error (MSE), the mean absolute error (MAE), the root-meansquared error (RMSE), Pearson’s correlation coefficient (R2), Kendall’s rank correlation coefficient (τ),122 Pearlman’s predictive index (PI),29 and the slope of the best correlation line are most used measurements. Pearlman’s predictive index, which evaluates the reliability of a prediction method, is broadly used to characterize the estimates of various theatrical methods for protein ligand binding, such as MM/PBSA, SIE, MM/QMSA, MM/QM-COSMO, and TI.29,123,124 1.0 means perfect correct prediction, −1.0 is perfect incorrect prediction, and 0.0 is random.29 Kendall’s rank correlation coefficient (τ)122 is another estimator for

where Wx,N represents the exponential average of Wτ over τ N simulations. Thus, the free energy difference can be calculated via performing a series of switching for arbitrary duration. We obtain the finite number of realizations formula, and the number of realizations is where the convergence check is required. ΔF ≈ W τx , N

∑ (−β)n− 1 n=1

W τx , N

1 W τx , N = − ln(e−βWτ ) β

N

∑ e − βW

The ordinary average overestimates ΔF ≥ ΔF), while the exponential average Wxτ provides better estimates of ΔF but still is subject to systematic bias, according to Jensen’s inequality117 of e x ≥ e x ̅ . Both averages reduce to the same value for reversibly switching. For each simulation, the work W gives an estimate of ΔF, which is subject to statistical error and systematic error.54 The statistical error leads to the spread of work while the systematic one adds a bias to the estimate, which is the same as FEP. While the statistical error vanishes with infinite realization sampling, according to a thermodynamic consideration the bias remains.57,58,63,118,119 The size of errors in the estimate of ΔF depends only on Nτ.120 Small τ triggers a wide distribution of work value, thus resulting in slow convergence of the exponential average with N.54 Also, as the sample size in nonequilibrium simulation is much smaller than that in equilibrium simulation, sample size hysteresis is relatively significant. Hence convergence check is necessary. Cumulant expanding the exponential average of FEP/JI, one obtains a general form of ΔF estimates33

which has been proven valid for arbitrary duration of the switching process with initial configurations sampled from the starting equilibrium state A as well as for a variety of thermostatting schemes.56,64 In practice the average is actually done with respect to the number of realizations N →∞

n=1

Waτ

where ΔF represents free energy difference and Wτ is the external work performed on the system within the simulation time of τ. Practically the infinite duration of reversible process can never be achieved, and the above finite time formula is used. Finite duration always wreck the estimate because of irreversibility, thus being exposed to intrinsic biases. Thus, a theoretical rigorous alternative prescription using nonequilibrium work is Jarzynski’s identity56,63

ΔF = lim

N

∑ Wτ ,n

1 ⎛1 W τx , N = − ln⎜⎜ β ⎝N

(11)

e−βWτ = e−β ΔF

1 N

(15)

where qt denotes the coordinate vector of the system and τ represents the duration of the whole pulling process. Since the perturbation per step also influence the outcome of NEW, the second convergence check term lies here. Another convergence check on the length of relaxation time between E

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

a period of 600 ps from 0 to 298 K in a NVT ensemble. With pressure regulated by isotropic position scaling method, pre-equilibrating the system in a NPT ensemble for 4 ns, we start 12 ns equilibrate state dynamics production simulations with a sampling frequency of 8 ps. MBAR, FEP, TI, BAR calculations are then performed to estimate the free energy differences and the pKa shift values. Another simulation for with 0.1 λ increments is performed, with the same overall simulation time (sum of equilibration and production time for all windows equal the initial 0.2 increments’ MD time of 96 ns), other settings being the same with 0.2 set. For identification of the initial point where the analysis begins, we monitor value ∂V of ∂λ (λi). If no sudden change followed by equilibrium value is observed, we opine that the equilibrium is reached and start data collection. Atoms’ creation or annihilation triggering VDW singularity,39,154−162 although the vdW parameters of the hydrogen of −COOH(ASP) are zeros in the AMBER force field, the nonlinear separation-shifted scaling regime of softcore potential39,156−158,163 defined with the following equation is used.

measurement for degree of correspondence between two rankings and assessment for the significance of this correspondence, and the most widely applied estimator of Pearson’s correlation coefficient (R2) combined with least-squares regression analysis provides the degree of theoretical values’ conformance to the experimental values. Note that in MD simulations uncertainty equals variance when all samples are uncorrelated, thus one needs to perform Timeseries analysis. Identifying and retaining independent samples125,126 with autocorrelation analysis, statistical inefficiency (g) is calculated with the following equation g = 1 + 2τ

(18)

where τ denotes the autocorrelation time. Performing Timeseries autocorrelation analysis127 and subsampling the whole data set with the interval of g, a set of independent samples is obtained.115 To get an reliable estimate of τ the simulation time need to exceed 50τ.127 Note that in the nonequilibrium framework g is defined as the sum of the period of the NEW transformation and the correlation time in the initial equilibrium ensemble, namely gNEW = gpulling + ginitial configuration.

V0 − vdW,disappearing ⎡ ⎢ 1 = 4ε(1 − λ)⎢ ⎢⎡ ⎢ ⎢αλ + ⎣⎣

III. SIMULATION PROTOCOL Structures of 1BEO,128 1BVI,129 1BVV,130 1C8C,131 1CVO,132 1I0V,133 1KXI,134 1LNI,135 1RGG,136 and 1XOA137 are selected as starting crystal structures. (see Table 1 for the no. residues to be titrated) The reason that we choose these systems is that their pKa shifts are high, and the RMS pKa shift of the data set is 1.82 pKa units. A model system with residue sequence of ACE-ASP-NME providing the offset free energy difference of change in the protonation state is also simulated. All ligands and DNA chains are stripped. The first configuration is chosen, and only monomer is included in calculation. Histidine groups are assigned to be in its neutral form with hydrogen atom at ε position, while the disulfide bridges in cysteine residues are defined according to information in pdb files. Terminal residues are in their charged states. The corresponding experimental pKa values are from the following references: 1BEO,138 1BVI,139−142 1BVV,143 1C8C,144 1CVO,145,146 1I0V,139−142 1KXI,145,146 1LNI,147 1RGG,147 1XOA.148,149 The average experimental pKa shift for the benchmark set is 1.82 pKa units, which is regarded as strongly perturbed systems with complex intraprotein interactions. In alchemical transformation with window sampling regime to enhance convergence, the simulated system is composed of a protein system and TIP3P150,151 water molecules located in a truncated octahedron cell replicated in whole space by periodic boundary conditions, with the solvation protocol of setting the minimum distance between any atoms originally present in solute and the edge of the periodic box to be 18 Å) Nonpolarizable spherical counterions of Na+ or Cl− parametrized for TIP3P water by Joung and Cheatham152,153 are then added to neutralize the system. Alchemical windows/ intermediate states are controlled by the parameter of λ from 0.0 to 1.0 at 0.2 increments (for model system at 0.1 increments) as we opine that the simulation time should be long enough for validation of ensemble average and insufficient sampling in each state is not theoretical rigorous. We are not using optimal λ values and corresponding weights for thermodynamic integration as such a spacing regime may not ensure sufficient phase space overlap for perturbation based methods. For each window, first being energy-minimized for 20 000 cycles with steepest decent algorithm, the system is heated over

2 rij 6 ⎤



rij

6 ⎤2



( σ ) ⎥⎦

V0 − ele,disappearing = (1 − λ)

rij

(σ )

( σ ) ⎥⎦

V1 − vdW,appearing ⎡ ⎢ 1 = 4ελ⎢ ⎢⎡ ⎢ ⎢α(1 − λ) + ⎣⎣

1 ⎡ ⎢⎣αλ +

⎤ ⎥ ⎥ 6⎤ ⎥ ⎥⎦ ⎥ ⎦

1 ⎡ ⎢⎣α(1 − λ) +

rij

⎤ ⎥ ⎥ 6⎤ ⎥ ⎥⎦ ⎥ ⎦

(σ )

qiqj 4πε0 βλ + rij 2

α = 0.5 and β = 12 A2 (19)

And, the mixing scheme for potential energy function for alchemical intermediate states is used V (λ) = (1 − λ)k V0 + [1 − (1 − λ)k ]V1

(20)

with k equaling 1 as the soft-core potential used. Nonequilibrium trajectories are also simulated with the initial states sampled from the equilibrium trajectories with a sample frequency of 100 ps at 0.0 and 1.0 states for forward pulling and reverse process, respectively. The normal pulling process lasts 200 ps with a change in λ of 0.001 each 200 fs, which is called 1000 set. A series of convergence check simulations for the magnitude of change and the speed of change (the relaxation time per change) are performed: (a) λ changes 0.01 each time with relaxation time of 200 fs, (b) λ changes 0.002 each time with relaxation time of 200 fs, (c) λ changes 0.001 each time with relaxation time of 400 fs, which are named the 100, 500, and 2000 sets, respectively. The convergence check is a little different from the normal same time protocol one, as here we concentrate on whether the overall pulling simulation is long enough rather than identifying the difference between free energy estimates via different pulling speed. Thus, the F

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

N/A 3 76 25 76 25 33 20 26 16 35 42 59 4 101 21 30 72 42 59

model 1BVI

G

exp

N/A −0.63 −4.80 1.19 −4.80 1.19 −2.21 0.00 4.80 −1.51 −1.92 −1.10 −2.33 −1.37 −2.75 −2.06 −2.06 −1.92 −1.10 −2.33

MCCE

N/A N/A N/A N/A N/A 4.54 1.21 N/A N/A 2.68 4.25 0.77 2.35 3.62 5.16 1.10 3.14 2.97 N/A N/A 4.79 4.53 4.53 0.05 0.05 0.03 0.25

PROPKA

N/A −1.54 −0.82 0.25 −0.48 0.26 −0.05 −0.41 4.01 −2.91 0.04 −1.14 −2.07 −0.84 0.21 −2.62 −0.70 −1.95 0.29 −0.84 1.83 0.76 1.39 0.02 0.30 0.37 0.42

SD 0.18 3.05 0.85 0.42 1.04 0.57 1.57 0.76 1.40 0.76 0.98 1.55 0.57 0.55 0.63 1.32 2.27 1.28 0.39 1.14

TI −1.21 −7.81 −2.51 −8.22 −3.55 −7.78 −2.83 −7.66 7.19 −2.66 −5.32 −1.37 −9.44 −5.94 −5.82 −4.37 −4.50 −11.69 −9.18 −3.13 5.29 −3.60 4.20 −0.15 −0.01 0.09 0.59

TI-cubic −0.49 −5.05 −1.01 −5.68 −2.06 −5.22 −1.01 −5.51 8.91 −0.65 −3.13 0.15 −7.69 −3.66 −5.13 −2.47 −2.21 −9.55 −7.38 −1.20 7.34 −5.89 6.19 0.04 0.28 0.13 0.65

SD 0.18 3.42 0.89 0.39 1.06 0.49 1.73 0.83 1.48 0.74 0.98 1.67 0.57 0.55 0.66 1.17 2.51 1.16 0.38 0.95

TP(F) −0.64 −8.82 −7.34 −6.13 −6.97 −4.65 −7.68 −7.95 7.79 −5.29 −7.91 −5.40 −7.54 −5.58 −15.77 −8.17 −6.92 −20.91 −9.52 −8.19 7.34 −5.89 6.19 0.04 0.28 0.35 1.52

SD 0.20 0.91 0.81 0.70 0.75 0.79 0.82 0.81 0.90 0.64 0.75 0.86 0.82 0.73 0.47 0.88 0.81 0.77 0.91 0.69

TP(B) 0.47 4.00 2.22 2.46 8.81 0.79 7.64 1.55 18.60 4.54 6.75 13.04 0.68 3.30 5.28 4.28 4.98 6.11 1.80 9.17 7.67 6.35 6.39 −0.27 0.09 0.07 0.57

SD 0.23 0.95 0.94 0.51 0.77 0.62 0.77 0.83 0.52 0.84 0.88 0.77 0.66 0.62 0.99 0.97 1.03 0.43 0.51 0.93

BAR 0.00 −1.61 −0.67 −3.02 0.25 −3.08 0.48 −2.99 11.63 0.73 −1.17 4.30 −4.68 −1.78 −3.37 −1.56 0.86 −7.51 −5.24 −0.01 3.50 0.36 2.92 −0.19 0.19 0.18 0.79

SD 0.07 1.01 0.76 0.35 0.56 0.30 0.77 0.52 0.69 0.50 0.84 0.88 0.46 0.37 0.47 0.85 0.80 0.62 0.37 0.84

MBAR −0.01 −1.61 −0.67 −3.01 0.25 −3.08 0.50 −2.98 11.63 0.72 −1.18 4.27 −4.68 −1.78 −3.37 −1.55 0.86 −7.51 −5.24 −0.01 3.50 0.36 2.92 −0.19 0.19 0.18 0.79

SD 0.08 1.21 0.78 0.35 0.90 0.30 1.03 0.57 0.85 0.51 2.20 2.93 0.50 0.37 7.57 0.89 0.87 9.24 0.39 1.38

JIF −0.57 −6.81 −6.13 0.16 −5.11 −3.73 −0.95 −2.31 7.04 −1.92 −2.61 1.41 −7.35 −1.88 −10.36 −2.88 −1.39 −7.60 −7.12 0.75 3.51 −1.68 2.66 −0.11 0.23 0.11 0.62

SD 0.26 0.71 0.60 0.60 0.92 0.31 0.95 0.63 0.45 0.50 0.86 0.53 0.23 0.18 0.57 0.29 0.47 0.90 0.17 0.59

JIR 0.34 −1.92 0.72 −2.65 3.23 −3.68 2.94 −4.34 16.99 1.82 −0.94 5.68 −2.31 −1.20 9.26 −0.37 5.12 3.55 −3.55 4.58 5.79 2.95 4.63 −0.44 −0.15 0.37 1.15

SD 0.36 0.94 0.65 0.36 0.74 0.39 0.67 0.95 0.62 0.36 0.96 0.55 0.79 0.50 0.81 0.57 0.92 0.76 0.39 0.76

CE 0.18 −3.32 −3.16 −2.45 −1.20 −4.01 0.11 −3.34 11.78 −0.43 −0.94 3.64 −4.36 −1.12 −1.36 −1.44 2.66 −3.80 −5.06 1.17 3.27 0.46 2.74 −0.25 0.10 0.23 0.85

SD 0.11 0.41 0.56 0.14 0.43 0.18 0.42 0.22 0.83 0.19 0.86 0.36 0.45 0.21 0.53 0.20 0.51 0.75 0.19 0.32

CPH N/A −1.06 −1.37 0.10 −1.14 0.62 −4.90 −0.97 0.12 −5.67 −2.36 −1.59 −3.80 −2.65 −2.46 −3.16 −1.98 −2.20 −2.29 −2.62 2.07 −0.72 1.50 0.12 0.23 0.22 0.35

n N/A 1.41 0.65 0.99 0.95 0.48 13.32 0.55 0.74 3.19 1.10 1.02 0.95 0.84 1.54 1.88 1.07 1.04 2.61 3.14

a All values are listed in kcalories per mol. The mean signed deviation (MSD), the mean absolute deviation (MAD), the root-mean-squared deviation (RMSD), Pearson’s correlation coefficient (R2), Kendall’s rank correlation coefficient (τ), and Pearlman’s predictive index (PI), as well as the slope of the best correlation line serve as quality measurements. For systems with no value available the correlation analysis is not performed. SD is standard deviation. n represents the Hill coefficient. We give the reference pKa and Hill coefficient as N/A as the reference pKa for PROPKA is 3.8 while that for others is 4.0.

RMSE MSE MAE τ PI R2 slope

1CVO

1BEO

1BVV

1KXI

1C8C

1XOA

1RGG 1I0V 1LNI

no. residue

PDB ID

Table 1. pKa Shift Free Energy Differences Computed via Various Methods as Well as the Corresponding Experimental Valuesa

Journal of Chemical Information and Modeling Article

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling 100 and 500 sets are used to check the magnitude of change with the same relaxation time, and the 2000 set is for validation of relaxation time in the 1000 set. In constant pH calculations we use the following workflow. As the charge scheme of AMBER 99SB102 is the same with that of AMBER 14SB,102 the system is described with constph force field (leaprc.constph in Amber16), which is a combination of AMBER14SB and modifications. With implicit solvent of GBOBC model164,165 using a salt concentration of 0.1 mol/L to represent the effects of solvent, energy-minimization of 20 000 cycles with the steepest decent algorithm followed by 10 000 cycles with conjugate gradient algorithm166 are carried out. Heating the system from 0 to 298 K over a period of 600 ps with 10 kcal mol−1 Å−2 harmonic restraints on all alpha-C atoms of proteins, we perform 600 ps pre-equilibration without restraints to get the equilibrated starting structure for pH window sampling simulations. With aspartates, histidine, and glutamates titrating, the pH windows being separated equally from 0.0 to 8.0 at 1.0 pH increments with a total of 9 windows, further equilibration for 40 ns with MC for protonation states and 100 ns constant pH simulation are performed for each pH window respectively, with a time step of integer (188/the number of titratable residues) femtoseconds between Monte Carlo steps, namely about 99 MD steps the protonation state of each residue being attempted to change. In the expanded ensemble simulation, the autocorrelation times are not calculated and the sampling frequency is set as 1000 MD steps. Protonation states of terminal residues are fixed at their most likely forms of protonated N-terminus and deprotonated C-terminus, which are expected to have little effect on the titrating side chains. No enhanced sampling method is coupled with CpHMD as this research focus on conventional MD based simulation. Titration curves are fitted to HH plot, the initial guess being set as pKa = 3.5 and (Hill’s coefficient) n = 1.0. In all simulations, the SHAKE167 algorithm are used to perform bond-length constraints for bonds involving hydrogen atoms in both protein and water molecules.168 The equation of motions is integrated with a time step of 2 fs. The simulation is performed at 298 K, and Langevin dynamics169 with collision frequency of 4 ps−1 are implemented for temperature regulation. A cutoff of 12 Å for nonbonded interactions in the real space is used with long-range electrostatics treated with PME method170 for periodic simulations, while there is no cut off for nonperiodic simulation in implicit solvent since the salt concentration introduces additional screening in addition to GB dielectric. All MD simulations are performed with AMBER16 suite,171 and postprocess free energy estimates are obtained with pymbar37,55 and homemade codes. MCCE calculations are not performed and the results listed in this paper are from a pKa database.17

MCCE as well as non-pH-coupled alchemical methods are large. Correlation analysis shows that alchemical methods randomly deviate from experimental value and MCCE overestimates pKa and does not predict the right rank. Even with small deviations, the PROPKA and constant pH do not predict correct rank of these shift values, thus also performing poorly. Constant pH MD. Constant pH MD simulations for 100 ns in each pH environment with 40 ns equilibration should achieve the long time limit under the conventional MD based force field framework and their corresponding results are assumed precise with zero variance. (Of course the fitting procedure introduced uncertainty, but here we concentrate on the most precise prediction with force field.) The titration curves are given in Figures 2 and S1. With assumed ergodicity the coupling of titratable groups is well sampled. Note that our results are obtained with the GBOBC model. The performance of GBHCT may be different. Non-HH Behaviors Arise from Nonconvergence. The running averages of deprotonation fraction for each residue at pH 1.0, 4.0, and 7.0 are given in Figure S2, while those at all simulated pHs are in Figures 3 and S3. In the running average in Figure 3, the deprotonation fraction of 1CVO-42 at pH 3.0 is not converged and that at pH 4.0 converges better. Hence the convergence of 1CVO-59 is better achieved than 1CVO-42 at pH 3.0 and pH 4.0, which should be the cause of the non-HH behavior of 1CVO-42. An enhanced constant pH research also suggests this.99 Thus, 100 ns conventional MD based GB sampling is still not long enough, which emphasizes the importance of sampling enhancement. As the expanded ensemble used in CpHMD enhances sampling process and the coupling between ionizable groups are taken into account, the precision and accuracy is best among all these methods, but there is still much room for improvement. Most of Hill’s coefficients are close to unity value. Those smaller than 1.0 and those close to 1.0 indicates weak cooperativity and those being larger than 1 verify the existence of cotitrating and heterogeneity in the response.172 The unreasonably large value of 1LNI-33 arises from nonconvergent MD results. Note that 1BVI and 1I0V are the same protein which are crystallized under different conditions and so are 1RGG and 1LNI. Hence comparison of their results gives some insights into the convergence. Discussions about these systems are included in the Supporting Information. Single Site Alchemical Free Energy Simulation. In traditional alchemical simulation, the free energy change for deprotonation of model compound is estimated about −75 kcal/mol via TI while that by BAR is about −74 kcal/mol. As the acceptance ratio method serves as the best estimator giving unbiased result when the sample size is large and TI is the most accurate method only when the number of

IV. RESULT AND DISCUSSION As the free energy difference is more sensitive than pKa shift value (ΔΔG = 1.36ΔpKa (kcal/mol)), here we use the free energy difference as the estimator. Free energy estimates with all methods are listed in Table 1. As the data set contains highly shifted pKa values which may result from coupling between ionizable groups, the deviation of theoretical prediction from experimental values are large for all methods. Discrete-protonation-state constant pH with implicit solvent and empirical single-conformation based PROPKA3.1 give the smallest deviations of RMSE, MSE, and MAE from experimental values. Deviations for conformational sampling incorporated

windows approaches infinite or

∂U ∂λ λ i

varies linearly along the

alchemical control parameter, obviously TI based estimates are unable to give appreciably reliable and well-converged result in pKa shift calculations. Inconsistency of Various Alchemical Free Energy Estimators. Accuracy of TI is not directly related with the overlap in energy functions but is determined by the curvature of

∂U ; ∂λ λ i

meanwhile, perturbation-based estimators depend

on the overlap in energy distributions. In the vicinities of the rapid change in the H

∂U ∂λ λ i

derivative, more windows should be DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 2. Titration curves for (a) 1KXI and (b) 1CVO. Green lines are best fitted HH curves.

Figure 3. Running averaged deprotonation fraction for 1CVO (all pH values). The window size is 10 000 time steps, and the averaging frequency is 2000 steps.

based on same data providing inconsistent free energy differences indicates that the sampling may be insufficient and the phase space overlap may be poor. In Figures 4 and S4, we give the overlap matrices, the element of which may be too small to give enough phase space overlap for perturbation based methods. And

placed. When the phase space overlap diminishes, Zwanzig’s relation based approaches are generally expected to break down first, and BAR-based estimates are the most accurate results.115 In the large sample size regime and sufficient phase space overlap occasion, all free energy estimators should give the same prediction. Thus, checking the consistency across these methods can give some insights in identifying analysis or sampling problems.115 As in the TI method the free energy difference is calculated via integrating along the control parameter with finite increments, the statistical uncertainty that propagates from each ∂U ∂λ λ i

the nonlinear behavior of

∂U ∂λ λ i

also indicates poor convergence

for curvature based TI. Thus, λ increments of 0.2 are too large for perturbation-based pKa calculation. In the 0.1 increment simulation, the neighboring states have larger overlaps as is shown in Figures 4 and S5, but estimates via various methods in Table 2 neither match with their experimental components nor agree well with each other. Hence convergence is really hard to achieve in traditional alchemical staging regimes, and previous picosecondsimulation-based conclusions are biased and questionable. ∂U Monitoring the time series of ∂λ (λi), we identify the equilibrium point, where we start data collecting. The well-equilibrated

is always underestimated, making the result unreliable.

Using a cubic spline to improve the integration gives better result but still with strong bias. BAR and MBAR gives the most reliable results while FEP suffers from sample size hysteresis and is strongly biased and hard to converge. Alchemical methods I

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

stable states, thus no data is excluded in the data analysis. The ∂U time series of the cumulant average of ∂λ (λi) is also given in Figure S7, giving the same information. Alternatively one can use the time reverse cumulant analysis for identification of nonequilibrium data.115,173 Results Not Consistent with Previous Research. From the Table 1, we can see that for system 1XOA-20 the PROPKA is able to predict a pKa shift which is in accordance with

occasion is shown in Figure 5 while the nonequilibrated data given in Figure 6a is excluded from free energy analysis, from the 500th sample for 1BVV-101 in the λ 1.0 state for instance. All plots are summarized in Figure S6. For 1CVO-59 in Figure 6b the evolution of

∂U (λ i ) ∂λ

with time shows the existence of two

Figure 5. Time series of

∂U ∂λ λ i

for 1KXI-59 including data of

equilibration steps (from ASP to ASH). From top to bottom: λ = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0.

Figure 4. Overlap matrices for 1CVO with 0.2λ increments (top) as well as those with 0.1 increments (bottom).

Table 2. Free Energy Differences Computed via Alchemical Methods with 0.1λ Increments as Well as the Corresponding Experimental Valuesa PDB ID

no. residue

exp

TI

SD

TI-cubic

SD

TP(F)

SD

TP(B)

SD

BAR

SD

MBAR

SD

1BVI

3 76 25 76 25 33 20 26 16 35 42 59 4 101 21 30 72 42 59

−0.63 −4.80 1.19 −4.80 1.19 −2.21 0.00 4.80 −1.51 −1.92 −1.10 −2.33 −1.37 −2.75 −2.06 −2.06 −1.92 −1.10 −2.33

−4.1 −4.1 −4.6 −3.7 −4.7 0.4 −4.0 10.2 −1.7 −3.3 2.3 −7.3 −2.1 −7.2 −2.8 −1.6 −5.3 −7.8 −0.4 3.65 −1.37 3.02 −0.18 0.16 0.27 0.98

1.2 1.2 0.5 1.5 0.4 1.0 1.3 0.9 0.7 1.5 1.4 0.5 0.5 0.7 1.2 1.3 0.8 0.6 1.4

−3.2 −3.5 −3.7 −3.1 −3.9 1.0 −3.2 10.6 −0.4 −2.5 3.1 −6.7 −1.5 −6.5 −1.8 −0.8 −4.6 −6.9 0.4 3.38 −0.60 2.88 −0.20 0.20 0.28 0.98

1.2 1.2 0.5 1.5 0.4 1.0 1.3 0.9 0.7 1.6 1.4 0.5 0.4 0.7 1.3 1.3 0.8 0.7 1.3

−3.9 −5.8 −4.7 −4.6 −4.7 −0.3 −3.5 10.3 −2.3 −4.6 0.7 −8.4 −1.9 −5.0 −3.9 −2.6 −6.1 −8.5 −0.4 3.70 −1.82 3.01 −0.11 0.16 0.34 1.11

0.7 0.6 0.3 0.6 0.4 0.6 0.7 0.6 0.5 0.6 0.7 0.4 0.6 0.6 0.7 0.6 0.6 0.5 0.8

−1.3 −0.6 −2.5 −3.3 −3.0 2.2 −2.2 12.5 0.8 0.8 3.2 −5.2 −0.2 −5.9 0.3 1.8 −3.0 −5.8 4.3 3.79 0.98 3.35 −0.16 0.23 0.23 0.94

0.8 0.5 0.4 0.8 0.4 0.6 0.6 0.6 0.8 0.7 0.9 0.5 0.4 0.5 0.7 0.7 0.7 0.6 0.5

−2.8 −3.1 −3.3 −3.0 −3.4 1.1 −2.3 11.1 0.0 −2.9 2.8 −6.3 −1.2 −6.4 −1.3 −1.1 −3.9 −6.7 1.2 3.28 −0.30 2.82 −0.20 0.24 0.30 1.02

0.4 0.3 0.2 0.4 0.2 0.3 0.3 0.3 0.3 0.5 0.4 0.2 0.2 0.2 0.5 0.3 0.3 0.2 0.5

−2.8 −2.8 −3.3 −3.0 −3.4 1.2 −2.3 11.2 −0.1 −2.9 2.4 −6.3 −1.2 −6.3 −1.2 −1.5 −3.8 −6.7 1.2 3.27 −0.31 2.80 −0.17 0.24 0.30 1.01

0.4 0.4 0.2 0.5 0.2 0.4 0.4 0.4 0.3 0.5 0.4 0.2 0.2 0.2 0.5 0.3 0.3 0.3 0.5

1RGG 1I0V 1LNI 1XOA 1C8C 1KXI 1BVV 1BEO

1CVO RMSE MSE MAE τ PI R2 slope a

All values are given in kilocalories per mole. The mean signed deviation (MSD), the mean absolute deviation (MAD), the root-mean-squared deviation (RMSD), Pearson’s correlation coefficient (R2), Kendall’s rank correlation coefficient (τ), and Pearlman’s predictive index (PI), as well as the slope of the best correlation line serve as quality measurements. J

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 6.

∂U (λi) ∂λ

as a function of time for (a) 1BVV-101, (b) 1CVO-59 (from ASP to ASH). Curves from top to bottom are from λ = 0.0−1.0.

Figure 7.

∂U (λi) ∂λ

as a function of time for 1XOA (from ASP to ASH). Curves from top to bottom are from λ = 0.0−1.0.

Figure 8.

∂U ∂λ λ i

as a function of λ values for the model compound, 1KXI and 1CVO (from ASP to ASH). λ increments equal 0.2.

experimental result, and earlier work shows TI can also provide a good result,45 but the TI results in this work dramatically deviate from the reference value. The reason for this should be their lack of sampling. For 1XOA-26, our results match theirs, indicating that the deviation from the experimental value ∂U should be triggered by a Hamiltonian. The timeseries of ∂λ (λi) in Figure 7 shows that sufficient equilibration is needed before gaining adequate sampling. Note that their conclusion also neglects statistical inefficiency, thus being biased and question∂U able. For system with dramatic fluctuation in ∂λ (λi), long simulation time is needed to get a converged result. For 1XOA-20 statistical uncertainty of MBAR and BAR both converge to kBT, while those of 1XOA-26 do not because of its largely fluctuating value. Is LRA Applicable? As a protonation state change only includes electrostatic interaction changes, theoretically the

curvature of

∂U ∂λ λ i

varies linearly with λ, leading to the well-

known LRA approximation. For a linear responded electrostatic change, the relaxation free energy can be calculated and dielectric medium presentation of protein environment can be explored.

∂U ∂λ λ i

of model compound and protein systems are

given in Figures 8 and S8. The curves nonlinearly changes from (λ) 0.0 to about 0.3 and, then, vary linearly to 0.8. At last the nonlinear behavior shows again. This nonlinearity arises from the nonlinear potential used in this research and does not show the cooperativity in titration. Even though the free energies of initial states under soft-core potential are the same with those under normal electrostatic potential function, the potential of mean force (PMF) or potential derivative profile along the reaction coordinate differs. Hence LRA is no longer valid here and the performance of TI, of course, is poorer than BAR. K

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling agree to some extent, as is shown in Figure 9.

∂U ∂λ λ i

values

are given in Figures 10 and S9, which agree with those in the 0.2 set. However, from the timeseries of

∂U ∂λ λ i

as a function of λ values for 1KXI and 1CVO (from

ASP to ASH). λ increments equal 0.1.

Note that the artificial potential function does not change the relative free energy of the electrostatic response. 0.1 Increments vs 0.2 Increments. The smaller SDs of the free energy estimates in the 0.1 set are shown to be better converged compared with those of the 0.2 set, and their values

Figure 11.

∂U (λi) ∂λ

in

Figures 11 and S10, we know that the convergence of free energy estimates should be highly questionable, as the large fluctuations last too long and the assumption of ergodicity is questionably valid. The statistical inefficiencies provided in Table 3 are too long for an accurate and precise estimate. As such large ϕ may also result from insufficient equilibration, results in 0.1 set are nonconvergent. Thus, the smaller SDs in 0.1 set do not lead to the conclusion of better convergence. Cumulant averages are given in Figure S11. Note that in the 0.1 set, ΔΔG for 1XOA-26 agrees well with previous work.45 As in alchemical free energy simulations the preassigned protonation states lead to wrong interactions in the simulated systems, we plotted RMSDs for all system at alchemical windows of 0.0, 0.5 and 1.0, as is shown in Figure S12. Most RMSDs are small in value while only those of 1CVO have values larger than 3. For 1CVO-42 the RMSD of 1.0 state is significantly smaller than others, which may arise from the favorable protonated aspartate rather than wrongly assignments of protonation states. The RMSD of 1CVO-59 is too large in value and should be triggered by wrongly protonated ionizable groups. Hence the 1CVO results may be questionable. For most systems the RMSDs fluctuate around their average values. Hence the simulations are generally correct. The influence of removed ligands and nucleotides will be studied in future work. Nonequilibrium Alchemical Results. The convergence of the nonequilibrium work method is checked, as is shown in Figure 12 and Table S1. The ordinary average tends to overestimate the free energy difference, with strong bias when a fast switching speed is used. The bias is removed gradually when the speed of the pulling process becomes slower and the magnitude of change per step becomes smaller. Namely plenty of equilibration steps are performed during each switch and a sufficiently small perturbation is added in each step. JI estimates are relatively well-converged while slow growth results are biased and with large variances. Some of the converged CE results are identical with their equilibrium equivalents while others deviate obviously. Such a phenomenon arises from nonconverged free energy estimates, and thus, they are not comparable. In wellconverged occasions, the nonequilibrium results are almost the

Figure 9. Comparison of BAR estimates with 0.1 and 0.2 increments (free energies in kilocalories per mole).

Figure 10.

∂U ∂λ λ i

as a function of time for 1KXI and 1CVO (from ASP to ASH). λ increments equal 0.1. L

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Table 3. Statistical Inefficiencies in Unit of 8 Picosecond in 0.1 Set statistical inefficiency PDB ID

no. residue

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1BVI

3 76 25 76 25 33 20 26 16 35 42 59 4 101 21 30 72 42 59

1.49 1.79 1.02 4.99 1.08 3.32 20.25 6.31 1.65 1.51 14.41 1.71 1.57 1.70 6.30 1.36 1.19 1.10 42.12

1.85 1.82 1.04 4.29 1.13 8.42 1.00 7.14 1.50 34.43 2.31 1.45 1.03 1.45 24.65 1.57 6.85 1.06 1.96

16.66 4.23 1.20 1.87 1.15 13.56 1.10 4.40 1.08 2.12 1.18 1.10 1.38 1.13 1.24 62.34 3.99 1.15 1.28

23.31 3.43 1.28 4.24 1.57 8.67 1.25 3.70 13.92 1.66 1.09 1.00 1.53 17.41 21.83 1.00 1.37 1.85 30.86

1.00 47.67 1.14 28.62 1.00 1.71 1.54 1.16 1.02 1.29 7.34 1.02 1.16 1.93 1.66 1.99 1.99 1.16 1.53

1.10 1.36 1.21 4.18 1.00 2.69 1.01 1.87 1.24 1.26 1.00 1.00 1.00 1.22 7.83 1.00 1.06 1.12 1.00

1.14 2.06 1.19 1.46 1.32 3.21 1.08 3.44 1.00 1.06 2.37 1.44 1.19 1.27 1.91 2.33 1.38 1.18 2.48

3.42 2.22 1.00 1.72 1.00 4.88 3.30 2.32 1.42 2.75 1.00 1.91 1.00 1.49 7.72 1.50 4.14 1.56 1.08

2.28 2.80 1.27 26.21 1.04 7.18 5.72 10.84 1.99 1.35 32.84 1.40 1.00 2.10 1.27 1.17 10.57 1.21 5.85

12.72 4.90 2.63 14.53 1.58 1.12 25.23 8.19 1.00 1.39 29.90 1.31 1.29 1.00 3.44 2.66 1.68 9.12 17.15

10.14 3.30 2.17 12.62 1.07 1.39 1.19 7.33 13.56 31.17 4.03 1.13 1.70 1.18 2.35 1.55 2.47 4.50 1.12

1RGG 1I0V 1LNI 1XOA 1C8C 1KXI 1BVV 1BEO

1CVO

Figure 12. Ordinary average (Wa) and the exponential average (Wx) of the work values as a function of the magnitude of change for (a) 1KXI and (b) 1CVO. 100 relaxation MD steps are used before each change in λ for simulation times of 100, 500, 1000, and 200 MD steps for 2000. The pulling direction is from ASP to ASH.

Figure 13. Histogram of work, with marks of forward and reverse JI estimates and ordinary averages, CE result, and corresponding experimental value. (left) 1BEO-21, (right) 1I0V.

same with BAR and MBAR and yield smaller variances, which in return validate the convergence of equilibrium results. In Figures 13 and S13, the work distributions for forward and backward processes are plotted, the intercept of which is the

CE result. JI and slow growth values are placed on both sides of CE, giving its upper bound and lower bound, respectively. Is Alchemical FES Really Useful in pKa Shift Calculation? Alchemical estimates show poor agreement with their experimental M

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

and the sampling is insufficient, which also explain why the results do not match their experimental components. But for alchemical transformations in explicit solvent for protein systems, the simulation length for each intermediate state is always less than 10 ns because of the computational cost. And our simulation is long enough to verify that such non-pH-coupled window sampling method is not applicable for accurate pKa shift calculations. There are systems which converge fast and show great agreement with their corresponding experimental values, but broadly speaking, the effects of initial assignments of protonation states, reorganization, and coupling between titratable groups are large and alchemical methods do not perform well. Hence instead of traditional alchemical methods, the enhanced alchemical strategy of CpHMD is more robust. Performance of all Methods. According to identical MSE and MAE in Table 1 and correlation plot of Figure 14, tending to overestimate pKa shifts for high variants with almost zero correlation coefficients, MCCE predictions give largest errors among all methods (BAR, PROPKA, MCCE, and CPH). Such a mismatch never happens in previous benchmarks for low variants. Normally high variants are harder to model than low variants and predicted free energy differences deviate much larger from experimental values, as complex interactions are included. MCCE overestimates shift values. BAR estimates just fluctuate randomly around the y = x line. The constant pH method and PROPKA are much better than others. A recent research by Ernest Awoonor-Williams and Christopher N. Rowley concentrating on the pKa calculation of cysteine residues comes to the conclusion that general ensemble

Figure 14. Comparison of theoretical results and experimental values (free energies in kilocalories per mole). (BAR results agree well with those of MBAR, so here we use BAR to represent the best alchemical prediction.).

components, which is caused by the coupling between protonation states of spatial neighboring ionizable groups and conformational changes which may be not well-sampled. Initial assignments of nontitrating ionizable groups also have a strong influence on the outcome. For several systems MBAR provides much larger uncertainty than original BAR, triggered by the covariance of free energies between intermediate states. Such correlation suggests that the system is not well equilibrated for several intermediate states

Table 4. pKa Values Computed via Various Methods As Well As the Corresponding Experimental Valuesa system

exp

SD

H++

MCCE

PROPKA

CHARMM

SD

AMBER

SD

α-1-AT AhpC ACBP-M46C ACBP-S65C ACBP-T17C DJ-1b HMCK HMCK-S285A Mb-G124C Mb-A125C MmsrAc MmsrA-E115Q O6-AGT papaine ppΩf PTP1B YopH YopH-H402A RMSE MSE MAE τ PI R2 slope

6.86 5.9 8.2 9.0 9.8 5.4 5.6 6.7 6.53 8.43 7.2 8.2 5.3 3.32 2.88 5.6 4.7 7.35

0.05 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.05 0.03 0.2 0.1 0.2 0.01 0.02 0.1 0.2 0.04

7.3 9.4 8.8 8.8 8.4 11.3 9.1 9.3 8.1 8.3 >12 >12 9.5 9.3 9.4 3.9 4.0 3.3 3.53 1.95 2.86 −0.15 −0.07 0.00 0.09

8.3 9.1 8.8 9.4 8.8 12.6 6.8 6.6 8.5 8.8 16.3 15.4 8.3 8.8 7.6 −0.7 −1.0 −0.6 4.72 1.38 3.71 0.18 0.19 0.06 0.64

9.1 9.1 9.0 9.6 8.9 12.3 10.4 11.2 8.4 9.2 13.1 11.4 10.6 10.5 7.5 8.5 7.6 7.5 3.90 3.15 3.25 0.02 −0.14 0.01 0.08

7.6 8.2 8.7 8.1 9.0 5.4 7.1 6.5 8.0 8.7 7.5 7.0 8.7 4.4 0.7 1.2 2.9 0.3 2.40 −0.40 1.66 0.27 0.36 0.32 0.89

0.4 0.5 0.2 0.2 0.4 1.0 0.4 0.8 0.8 0.8 0.6 0.7 0.5 0.6 0.8 0.4 0.7 1.0

9.4 9.7 7.0 7.6 10.3 8.8 7.1 9.4 8.8 8.5 10.7 13.2 10.9 5.0 −0.2 1.3 4.6 1.4 2.95 2.12 2.51 0.10 0.10 0.26 0.99

0.7 0.6 0.3 0.6 0.9 0.3 1.2 0.4 0.6 0.5 1.0 0.7 0.7 0.8 0.7 0.3 0.8 0.4

a

For reference, see the text for more information. The mean signed deviation (MSD), the mean absolute deviation (MAD), the root-mean-squared deviation (RMSD), Pearson’s correlation coefficient (R2), Kendall’s rank correlation coefficient (τ), and Pearlman’s predictive index (PI), as well as the slope and intercept of the best correlation line serve as quality measurements. For systems with a value of >12, the value of 12 is used in correlation analysis. AMBER and CHARMM results with different force fields used in the reference. The data are from ref 174. N

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Journal of Chemical Information and Modeling



based methods enhance the convergence of pKa calculation and implicit models are highly unreliable, the results of which are shown in Table 4.174 From the analysis in the article and the correlation analysis done in this work, single site TI outperforms the other methods. However, according to the large SDs of pKa values and the fact that the SDs of free energies equal 1.36SDs of pKa, almost all of their free energy simulations are nonconvergent. Thus, the validation of single site protonation calculation is still questionable. In our alchemical free energy simulation, the assignments for protonation states of histidines may also influence the predictions. Thus, in order to perform alchemical FES to calculate pKa values of ASP groups, one need to determine the pKa of other ionizable groups, with PROPKA for instance. But considering the fact that the results in Table 4 are also not that satisfactory, we believe that the performance of alchemical FES is still bad. While alchemical FES is susceptible to model construction, CpHMD obviously is a more robust and powerful tool to perform pKa shift predictions. For all systems in our case, we can see that the alchemical transformation fails to give reliable and accurate pKa shift values, while constant pH MD with implicit solvent is able to provide better predictions and to obtain results for all ionizable residues in a single simulation, thanks to the pH-coupled dynamics. Hence for calculations of pKa shift value, alchemical transformation should not be used exclusively but should be coupled with expanded ensemble methods, leading to constant pH MD. As the conformational sampling still bars the applicability of CpHMD, enhanced sampling should also be considered to further improve performance, such as pH-REMD.

AUTHOR INFORMATION

Corresponding Authors

*[email protected] (Z.S.). *[email protected] (J.S.). ORCID

Zhaoxi Sun: 0000-0001-8311-7797 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by graduate scientific research innovation funding project of East China Normal University and National Natural Science Foundation of China (Grant No. 21433004), National Natrual Science Foundation of China (Grant No. 21603144), and Shanghai Sailing Program (Grant No. 2016YF1408400). We thank Prof. Ye Mei (ECNU), Dr. Ya Gao (ECUST), and the anonymous reviewers for valuable comments.



REFERENCES

(1) Bullough, P. A.; Hughson, F. M.; Skehel, J. J.; Wiley, D. C. Structure of Influenza Haemagglutinin at the Ph of Membrane Fusion. Nature 1994, 371, 37−43. (2) Clippingdale, A. B.; Wade, J. D.; Barrow, C. J. The Amyloid? Peptide and Its Role in Alzheimer’s Disease. J. Pept. Sci. 2001, 7, 227− 249. (3) Howell, E. E.; Villafranca, J. E.; Warren, M. S.; Oatley, S. J.; Kraut, J. Functional Role of Aspartic Acid-27 in Dihydrofolate Reductase Revealed by Mutagenesis. Science 1986, 231, 1123−1128. (4) Kelly, J. W. Amyloid Fibril Formation and Protein Misassembly: A Structural Quest for Insights into Amyloid and Prion Diseases. Structure 5, 595−600. Structure 1997, 5, 595−600. (5) Matthew, J. B.; Gurd, F. R.; Garcia-Moreno, B.; Flanagan, M. A.; March, K. L.; Shire, S. J. Ph-Dependent Processes in Protein. Crit. Rev. Biochem. Mol. Biol. 1985, 18, 91−197. (6) O’Keefe, D. O.; Cabiaux, V.; Choe, S.; Eisenberg, D.; Collier, R. J. Ph-Dependent Insertion of Protein into Membranes: B-Chain Mutation of Diphtheria Toxin That Inhibits Membrane Translocation, Glu349lys. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 6202−6206. (7) Girvin, M. E.; Rastogi, V. K. Structural Changes Linked to Proton Translocation by Subunit C of the Atp Synthase. Nature 1999, 402, 263−268. (8) Alonso, D. O. V.; DeArmond, S. J.; Cohen, F. E.; Daggett, V. Mapping the Early Steps in the Ph-Induced Conformational Conversion of the Prion Protein. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 2985−2989. (9) Ripoll, D. R.; Vila, J. A.; Villegas, M. E.; Scheraga, H. A. On the Ph-Conformational Dependence of the Unblocked Sypyd Peptide. J. Mol. Biol. 1999, 292, 431−440. (10) Ripoll, D. R.; Vorobjev, Y. N.; Liwo, A.; Vila, J. A.; Scheraga, H. A. Coupling between Folding and Ionization Equilibria: Effects of Ph on the Conformational Preferences of Polypeptides. J. Mol. Biol. 1996, 264, 770−783. (11) Vila, J. A.; Ripoll, D. R.; Scheraga, H. A. Influence of Lysine Content and Ph on the Stability of Alanine-Based Copolypeptides. Biopolymers 2001, 58, 235−246. (12) Antosiewicz, J.; Mccammon, J. A.; Gilson, M. K. Prediction of Ph-Dependent Properties of Proteins. J. Mol. Biol. 1994, 238, 415− 436. (13) Bashford, D.; Gerwert, K. Electrostatic Calculations of the P K a Values of Ionizable Groups in Bacteriorhodopsin. J. Mol. Biol. 1992, 224, 473−486. (14) Sham, Y. Y.; Tao Chu, Z.; Warshel, A. Consistent Calculations of Pka’s of Ionizable Residues in Proteins: Semi-Microscopic and Microscopic Approaches. J. Phys. Chem. B 1997, 101, 4458−4472.

V. CONCLUSION In this paper we do a benchmark test containing hard-to-model highly pKa-shifted residues to assess the performances of various free energy simulation methods for pKa shift prediction and identify pitfalls. As initial assignments of protonation states, the coupling between ionizable groups and reorganization dominates the protonation process, the non-pH-coupled conventional MD based equilibrium and nonequilibrium free energy simulation methods are not capable of giving wellconverged and accurate results. They are also computationally expensive. All pKa prediction methods perform poorly in the correlation analysis and constant pH method gives the lowest mean deviation from experimental values. Even with single conformation in consideration, the empirical method of PROPKA3 shows the best performance as well as the best efficiency. In MD simulation, pH-coupled behavior needs to be considered and expanded ensemble based CpHMD would be the best method to give the pH-dependent picture. For the specific case of pKa shift prediction, PROPKA should be the most accurate and efficient method.



Article

ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00177. Titration curves, time series of deprotonation fraction, ∂V ∂V (λ ) as a function of λ values, time series of ∂λ (λi), ∂λ i ∂V

time series of cumulant average of ∂λ (λi), nonequilibrium work analysis, nonequilibrium convergence check data, and overlap matrices (PDF) O

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling (15) van Vlijmen, H. W. T.; Schaefer, M.; Karplus, M. Improving the Accuracy of Protein Pk a Calculations: Conformational Averaging Versus the Average Structure. Proteins: Struct., Funct., Genet. 1998, 33 (33), 145−158. (16) Koumanov, A.; Karshikoff, A.; Friis, E. P.; Borchert, T. V. Conformational Averaging in Pk Calculations: Improvement and Limitations in Prediction of Ionization Properties of Proteins. J. Phys. Chem. B 2001, 105, 9339−9344. (17) Song, Y.; Mao, J.; Gunner, M. R. Mcce2: Improving Protein Pkacalculations with Extensive Side Chain Rotamer Sampling. J. Comput. Chem. 2009, 30, 2231−2247. (18) Mehler, E. L.; Guarnieri, F. A Self-Consistent, Microenvironment Modulated Screened Coulomb Potential Approximation to Calculate Ph-Dependent Electrostatic Effects in Proteins. Biophys. J. 1999, 77, 3−22. (19) Wisz, M. S.; Hellinga, H. W. An Empirical Model for Electrostatic Interactions in Proteins Incorporating Multiple Geometry-Dependent Dielectric Constants. Proteins: Struct., Funct., Genet. 2003, 51, 360−377. (20) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical Treatment of Solvation for Molecular Mechanics and Dynamics. J. Am. Chem. Soc. 1990, 112, 6127−6129. (21) Archontis, G.; Simonson, T. Proton Binding to Proteins: A Free-Energy Component Analysis Using a Dielectric Continuum Model. Biophys. J. 2005, 88, 3888−3904. (22) Pokala, N.; Handel, T. M. Energy Functions for Protein Design I: Efficient and Accurate Continuum Electrostatics and Solvation. Protein Sci. 2004, 13, 925−936. (23) Davies, M. N.; Toseland, C. P.; Moss, D. S.; Flower, D. R. Benchmarking Pka Prediction. BMC Biochem. 2006, 7, 18. (24) Stanton, C. L.; Houk, K. N. Benchmarking Pka Prediction Methods for Residues in Proteins. J. Chem. Theory Comput. 2008, 4, 951−966. (25) Olsson, M. H.; Søndergaard, C. R.; Rostkowski, M.; Jensen, J. H. Propka3: Consistent Treatment of Internal and Surface Residues in Empirical Pka Predictions. J. Chem. Theory Comput. 2011, 7, 525−537. (26) Murcko, M. A. Computational Methods to Predict Binding Free Energy in Ligand-Receptor Complexes. J. Med. Chem. 1995, 38, 4953− 4967. (27) Leach, A. R. The in Silico World of Virtual Libraries. Drug Discovery Today 2000, 5, 326−336. (28) Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, W. P. Consensus Scoring: A Method for Obtaining Improved Hit Rates from Docking Databases of Three-Dimensional Structures into Proteins. J. Med. Chem. 1999, 42, 5100−5109. (29) Pearlman, D. A.; Charifson, P. S. Are Free Energy Calculations Useful in Practice? A Comparison with Rapid Scoring Functions for the P38 Map Kinase Protein System†. J. Med. Chem. 2001, 44, 3417− 3423. (30) Bruckner, S.; Boresch, S. Efficiency of Alchemical Free Energy Simulations. Ii. Improvements for Thermodynamic Integration. J. Comput. Chem. 2011, 32, 1320−1333. (31) Resat, H.; Mezei, M. Studies on Free Energy Calculations. I. Thermodynamic Integration Using a Polynomial Path. J. Chem. Phys. 1993, 99, 6052−6061. (32) Resat, H.; Mezei, M. Studies on Free Energy Calculations. Ii. A Theoretical Approach to Molecular Solvation. J. Chem. Phys. 1994, 101, 6126−6140. (33) Zwanzig, R. W. High Temperature Equation of State by a Perturbation Method. J. Chem. Phys. 1954, 22, 1420−1426. (34) Shirts, M. R.; Bair, E.; Hooker, G.; Pande, V. S. Equilibrium Free Energies from Nonequilibrium Measurements Using MaximumLikelihood Methods. Phys. Rev. Lett. 2003, 91, 140601. (35) Bennett, C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245−268. (36) Fenwick, M. K.; Escobedo, F. A. On the Use of Bennett’s Acceptance Ratio Method in Multi-Canonical-Type Simulations. J. Chem. Phys. 2004, 120, 3066−3074.

(37) Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. (38) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300. (39) Pearlman, D. A.; Kollman, P. A. The Lag between the Hamiltonian and the System Configuration in Free Energy Perturbation Calculations. J. Chem. Phys. 1989, 91, 7831−7839. (40) Straatsma, T. P.; Mccammon, J. A. Treatment of Rotational Isomers in Free Energy Calculations. Ii. Molecular Dynamics Simulation Study of 18-Crown-6 in Aqueous Solution as an Example of Systems with Large Numbers of Rotational Isomeric States. J. Chem. Phys. 1989, 91, 3631−3637. (41) Lee, F. S.; Chu, Z. T.; Bolger, M. B.; Warshel, A. Calculations of Antibody-Antigen Interactions: Microscopic and Semi-Microscopic Evaluation of the Free Energies of Binding of Phosphorylcholine Analogs to Mcpc603. Protein Eng., Des. Sel. 1992, 5, 215−228. (42) Aqvist, J.; Medina, C.; Samuelsson, J. E. A New Method for Predicting Binding Affinity in Computer-Aided Drug Design. Protein Eng., Des. Sel. 1994, 7, 385−391. (43) Carlson, H. A.; Jorgensen, W. L. An Extended Linear Response Method for Determining Free Energies of Hydration. J. Phys. Chem. 1995, 99, 10667−10673. (44) Wang, W.; Wang, J.; Kollman, P. A. What Determines the Van Der Waals Coefficient  in the Lie (Linear Interaction Energy) Method to Estimate Binding Free Energies Using Molecular Dynamics Simulations? Proteins: Struct., Funct., Genet. 1999, 34, 395−402. (45) Simonson, T.; Carlsson, J.; Case, D. A. Proton Binding to Proteins: Pka Calculations with Explicit and Implicit Solvent Models. J. Am. Chem. Soc. 2004, 126, 4167−4180. (46) Sham, Y. Y.; Chu, Z. T.; Tao, H.; Warshel, A. Examining Methods for Calculations of Binding Free Energies: Lra, Lie, Pdld-Lra, and Pdld/S-Lra Calculations of Ligands Binding to an Hiv Protease. Proteins: Struct., Funct., Genet. 2000, 39, 393−407. (47) Lee, F. S.; Chu, Z. T.; Warshel, A. Microscopic and Semimicroscopic Calculations of Electrostatic Energies in Proteins by Polaris and Enzymix Programs. J. Comput. Chem. 1993, 14, 161− 185. (48) Muegge, I.; Tao, H.; Warshel, A. A Fast Estimate of Electrostatic Group Contributions to the Free Energy of Protein-Inhibitor Binding. Protein Eng., Des. Sel. 1997, 10, 1363−1372. (49) Warshel, A.; Tao, H.; Fothergill, M.; Chu, Z. T. Effective Methods for Estimation of Binding Energies in Computer-Aided Drug Design. Isr. J. Chem. 1994, 34, 253−256. (50) Štrajbl, M.; Hong, G.; Warshel, A. Ab Initio Qm/Mm Simulation with Proper Sampling: First Principle Calculations of the Free Energy of the Autodissociation of Water in Aqueous Solution. J. Phys. Chem. B 2002, 106, 13333−13343. (51) Plotnikov, N. V.; Kamerlin, S. C.; Warshel, A. Paradynamics: An Effective and Reliable Model for Ab Initio Qm/Mm Free-Energy Calculations and Related Tasks. J. Phys. Chem. B 2011, 115, 7950− 7962. (52) Rosta, E.; Klähn, M.; Warshel, A. Towards Accurate Ab Initio Qm/Mm Calculations of Free-Energy Profiles of Enzymatic Reactions. J. Phys. Chem. B 2006, 110, 2934−2941. (53) Simonson, T. Gaussian Fluctuations and Linear Response in an Electron Transfer Protein. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 6544−6549. (54) Wood, R. H.; Muhlbauer, W. C. F.; Thompson, P. T. Systematic Errors in Free Energy Perturbation Calculations Due to a Finite Sample of Configuration Space: Sample-Size Hysteresis. J. Phys. Chem. 1991, 95, 6670−6675. (55) Paliwal, H.; Shirts, M. R. A Benchmark Test Set for Alchemical Free Energy Transformations and Its Use to Quantify Error in Common Free Energy Methods. J. Chem. Theory Comput. 2011, 7, 4115−4134. (56) Jarzynski, C. Equilibrium Free-Energy Differences from Nonequilibrium Measurements: A Master-Equation Approach. Phys. P

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1997, 56, 5018−5035. (57) Hunter, J. E.; Reinhardt, W. P.; Davis, T. F. A Finite-Time Variational Method for Determining Optimal Paths and Obtaining Bounds on Free Energy Changes from Computer Simulations. J. Chem. Phys. 1993, 99, 6856. (58) Reinhardt, W. P.; Hunter, J. E. Variational Path Optimization and Upper and Lower Bounds to Free Energy Changes Via Finite Time Minimization of External Work. J. Chem. Phys. 1992, 97, 1599. (59) Miller, M. A.; Reinhardt, W. P. Efficient Free Energy Calculations by Variationally Optimized Metric Scaling: Concepts and Applications to the Volume Dependence of Cluster Free Energies and to Solid−Solid Phase Transitions. J. Chem. Phys. 2000, 113, 7035. (60) Jarzynski, C. Equilibrium and Nonequilibrium Foundations of Free Energy Computational Methods. Computational Methods for Macromolecules: Challenges and Applications; Springer: Berlin Heidelberg, 2002; Vol. 24, pp 287−303. (61) Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566. (62) Marsili, S.; Barducci, A.; Chelli, R.; Procacci, P.; Schettino, V. Self-Healing Umbrella Sampling: A Non-Equilibrium Approach for Quantitative Free Energy Calculations. J. Phys. Chem. B 2006, 110, 14011−14013. (63) Jarzynski, C. A Nonequilibrium Equality for Free Energy Differences. Phys. Rev. Lett. 1997, 78, 2690−2693. (64) Crooks, G. E. Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems. J. Stat. Phys. 1998, 90, 1481−1487. (65) Crooks, G. E. Path-Ensemble Averages in Systems Driven Far from Equilibrium. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 2000, 61, 2361−2366. (66) Procacci, P.; Marsili, S.; Barducci, A.; Signorini, G. F.; Chelli, R. Crooks Equation for Steered Molecular Dynamics Using a NoséHoover Thermostat. J. Chem. Phys. 2006, 125, 164101. (67) Hummer, G. Fast-Growth Thermodynamic Integration: Error and Efficiency Analysis. J. Chem. Phys. 2001, 114, 7330. (68) Hess, B.; Peter, C.; Ozal, T.; van der Vegt, N. F. Fast-Growth Thermodynamic Integration: Calculating Excess Chemical Potentials of Additive Molecules in Polymer Microstructures. Macromolecules 2008, 41, 2283−2289. (69) Chelli, R.; Marsili, S.; Barducci, A.; Procacci, P. Generalization of the Jarzynski and Crooks Nonequilibrium Work Theorems in Molecular Dynamics Simulations. Phys. Rev. E 2007, 75, 050101. (70) Braga, C.; Muscatello, J.; Lau, G.; Müller, E. A.; Jackson, G. Nonequilibrium Study of the Intrinsic Free-Energy Profile across a Liquid-Vapour Interface. J. Chem. Phys. 2016, 144, 044703. (71) Chernyak, V. Y.; Chertkov, M.; Malinin, S. V.; Teodorescu, R. Non-Equilibrium Thermodynamics and Topology Of currents. J. Stat. Phys. 2009, 137, 109−147. (72) Chelli, R.; Gellini, C.; Pietraperzia, G.; Giovannelli, E.; Cardini, G. Path-Breaking Schemes for Nonequilibrium Free Energy Calculations. J. Chem. Phys. 2013, 138, 214109. (73) Schöllpaschinger, E.; Dellago, C. A Proof of Jarzynski’s Nonequilibrium Work Theorem for Dynamical Systems That Conserve the Canonical Distribution. J. Chem. Phys. 2006, 125, 054105−054105−054105. (74) Chelli, R.; Marsili, S.; Barducci, A.; Procacci, P. Recovering the Crooks Equation for Dynamical Systems in the Isothermal-Isobaric Ensemble: A Strategy Based on the Equations of Motion. J. Chem. Phys. 2007, 126, 044502. (75) Wu, D.; Kofke, D. A. Rosenbluth-Sampled Nonequilibrium Work Method for Calculation of Free Energies in Molecular Simulation. J. Chem. Phys. 2005, 122, 204104. (76) Hudson, P. S.; Woodcock, H. L.; Boresch, S. Use of Nonequilibrium Work Methods to Compute Free Energy Differences between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850−4856.

(77) Imparato, A.; Peliti, L. Work Distribution and Path Integrals in General Mean-Field Systems. Europhysics Letters (EPL) 2005, 70, 740−746. (78) Zimanyi, E. N.; Silbey, R. J. The Work-Hamiltonian Connection and the Usefulness of the Jarzynski Equality for Free Energy Calculations. J. Chem. Phys. 2009, 130, 171102. (79) Zuckerman, D. M.; Woolf, T. B. Extrapolative Analysis of FastSwitching Free Energy Estimates in a Molecular System. arXiv preprint physics/0107066, 2001. (80) Minh, D. D.; Adib, A. B. Path Integral Analysis of Jarzynski’s Equality: Analytical Results. Phys. Rev. E Stat Nonlin Soft Matter Phys. 2009, 79, 021122. (81) Mallick, K.; Moshe, M.; Orland, H. Supersymmetry and Nonequilibrium Work Relations. arXiv preprint arXiv:0711.2059, 2007. (82) Gozzi, E.; et al. Hidden Brs Invariance in Classical Mechanics. Phys. Rev. D: Part. Fields 1989, 40, 3363−3377. (83) van Zon, Z. R.; Hernández de la Pena, L.; Peslherbe, G. H.; Schofield, J. Quantum Free-Energy Differences from Nonequilibrium Path Integrals. I. Methods and Numerical Application. Phys. Rev. E 2008, 78, 2517−2530. (84) van Zon, R.; Hernández de la Pena, L.; Peslherbe, G. H.; Schofield, J. Quantum Free-Energy Differences from Nonequilibrium Path Integrals. II. Convergence Properties for the Harmonic Oscillator. Phys. Rev. E 2008, 78, 561−570. (85) Dickson, A.; Dinner, A. R. Enhanced Sampling of Nonequilibrium Steady States. Annu. Rev. Phys. Chem. 2010, 61, 441−459. (86) Rami Reddy, M.; Erion, M. D. Free Energy Calculations in Rational Drug Design; Springer: 2001; ISBN 9780306466762. (87) Baptista, A. M.; Martel, P. J.; Petersen, S. B. Simulation of Protein Conformational Freedom as a Function of Ph: Constant-Ph Molecular Dynamics Using Implicit Titration. Proteins: Struct., Funct., Genet. 1997, 27, 523−544. (88) Baptista, A. M.; Teixeira, V. H.; Soares, C. M. Constant-Ph Molecular Dynamics Using Stochastic Titration. J. Chem. Phys. 2002, 117, 4184−4200. (89) Bürgi, R.; Kollman, P. A.; Gunsteren, W. F. V. Simulating Proteins at Constant Ph: An Approach Combining Molecular Dynamics and Monte Carlo Simulation. Proteins: Struct., Funct., Genet. 2002, 47, 469−480. (90) Dlugosz, M.; Antosiewicz, J. M. Constant-Ph Molecular Dynamics Simulations: A Test Case of Succinic Acid. Chem. Phys. 2004, 302, 161−170. (91) Khandogin, J.; Brooks, C. L. Constant Ph Molecular Dynamics with Proton Tautomerism. Biophys. J. 2005, 89, 141−157. (92) Lee, M. S.; Salsbury, F. R.; Brooks, C. L. Constant-Ph Molecular Dynamics Using Continuous Titration Coordinates. Proteins: Struct., Funct., Genet. 2004, 56, 738−752. (93) Swails, J. M.; York, D. M.; Roitberg, A. E. Constant Ph Replica Exchange Molecular Dynamics in Explicit Solvent Using Discrete Protonation States: Implementation, Testing, and Validation. J. Chem. Theory Comput. 2014, 10, 1341−1352. (94) Onufriev, A.; Case, D. A.; Ullmann, G. M. A Novel View of Ph Titration in Biomolecules. Biochemistry 2001, 40, 3413−3419. (95) Søndergaard, C. R.; Olsson, M. H. M.; Rostkowski, M.; Jensen, J. H. Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of Pkavalues. J. Chem. Theory Comput. 2011, 7, 2284−2295. (96) Börjesson, U.; Hünenberger, P. H. Explicit-Solvent Molecular Dynamics Simulation at Constant Ph: Methodology and Application to Small Amines. J. Chem. Phys. 2001, 114, 9706−9719. (97) Dobrev, P.; Donnini, S.; Groenhof, G.; Grubmüller, H. An Accurate Three States Model for Amino Acids with Two Chemically Coupled Titrating Sites in Explicit Solvent Atomistic Constant Ph Simulations and Pka Calculations. J. Chem. Theory Comput. 2017, 13, 147−160. (98) Mongan, J.; Case, D. A.; McCammon, J. A. Constant Ph Molecular Dynamics in Generalized Born Implicit Solvent. J. Comput. Chem. 2004, 25, 2038−2048. Q

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling (99) Lee, J.; Miller, B. T.; Damjanović, A.; Brooks, B. R. Enhancing Constant-Ph Simulation in Explicit Solvent with a Two-Dimensional Replica Exchange Method. J. Chem. Theory Comput. 2015, 11, 2560− 2574. (100) Georgescu, R.; Alexov, E.; Gunner, M. Combining Conformational Flexibility and Continuum Electrostatics for Calculating Pkas in Proteins: Biophysical Journal. Biophys. J. 2002, 83, 1731−1748. (101) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved Generalized Born Solvent Model Parameters for Protein Simulations. J. Chem. Theory Comput. 2013, 9, 2020−2034. (102) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (103) Bashford, D.; Case, D. A.; Dalvit, C.; Tennant, L.; Wright, P. E. Electrostatic Calculations of Side-Chain Pk(a) Values in Myoglobin and Comparison with Nmr Data for Histidines. Biochemistry 1993, 32, 8045−8056. (104) Alexov, E.; Mehler, E. L.; Baker, N.; Baptista, A.; Huang, Y.; Milletti, F.; Nielsen, J. E.; Farrell, D.; Carstensen, T.; Olsson, M. H. M.; et al. Progress in the Prediction of Pka Values in Proteins. Proteins: Struct., Funct., Genet. 2011, 79, 3260−3275. (105) Wallace, J. A.; Wang, Y.; Shi, C.; Pastoor, K. J.; Nguyen, B. L.; Xia, K.; Shen, J. K. Toward Accurate Prediction of Pka Values for Internal Protein Residues: The Importance of Conformational Relaxation and Desolvation Energy. Proteins: Struct., Funct., Genet. 2011, 79, 3364−3373. (106) Chen, W.; Huang, Y.; Shen, J. Conformational Activation of a Transmembrane Proton Channel from Constant Ph Molecular Dynamics. J. Phys. Chem. Lett. 2016, 7, 3961. (107) Huang, Y.; Chen, W.; Dotson, D. L.; Beckstein, O.; Shen, J. Mechanism of Ph-Dependent Activation of the Sodium-Proton Antiporter Nhaa. Nat. Commun. 2016, 7, 12940. (108) Huang, Y.; Chen, W.; Wallace, J. A.; Shen, J. All-Atom Continuous Constant Ph Molecular Dynamics with Particle Mesh Ewald and Titratable Water. J. Chem. Theory Comput. 2016, 12, 5411. (109) Wallace, J. A.; Shen, J. K. Continuous Constant Ph Molecular Dynamics in Explicit Solvent with Ph-Based Replica Exchange. J. Chem. Theory Comput. 2011, 7, 2617. (110) Martinez-Veracoechea, F. J.; Mladek, B. M.; Tkachenko, A. V.; Frenkel, D. Design Rule for Colloidal Crystals of DNA-Functionalized Particles. Phys. Rev. Lett. 2011, 107, 045902. (111) Free Energy Calculations: Theory and Applications in Chemistry and Biology; Chipot, C.; Pohorille, A., Eds.; Springer Series in Chemical Physics; Springer: New York, 2007. (112) Shirts, M. R.; Pande, V. S. Comparison of Efficiency and Bias of Free Energies Computed by Exponential Averaging, the Bennett Acceptance Ratio, and Thermodynamic Integration. J. Chem. Phys. 2005, 122, 144107−144107. (113) Ytreberg, F. M.; Swendsen, R. H.; Zuckerman, D. M. Comparison of Free Energy Methods for Molecular Systems. J. Chem. Phys. 2006, 125, 184114. (114) Pohorille, A.; Jarzynski, C.; Chipot, C. Good Practices in FreeEnergy Calculations. J. Phys. Chem. B 2010, 114, 10235−10253. (115) Klimovich, P. V.; Shirts, M. R.; Mobley, D. L. Guidelines for the Analysis of Free Energy Calculations. J. Comput.-Aided Mol. Des. 2015, 29, 397−411. (116) Sun, Z.; Wang, X.; Zhang, J. Z. H. Bar-Based Optimum Adaptive Sampling Regime for Variance Minimization in Alchemical Transformation. Phys. Chem. Chem. Phys. 2017, 19, 15005. (117) Chandler, D. Introduction to Modern Statistical Mechanics; Oxford University Press: New York, Oxford, 1987. (118) Tsao, L.-W.; Sheu, S.-Y.; Mou, C.-Y. Absolute Entropy of Simple Point Charge Model Water by Adiabatic Switching Processes. J. Chem. Phys. 1994, 101, 2302. (119) Hendrix, D. A.; Jarzynski, C. A “Fast Growth” Method of Computing Free Energy Differences. J. Chem. Phys. 2001, 114, 5974. (120) Crooks, G. E. Excursions in Statistical Dynamics. Ph.D. Thesis, University of California, Berkeley, 1999.

(121) Procacci, P. Unbiased Free Energy Estimates in Fast Nonequilibrium Transformations Using Gaussian Mixtures. J. Chem. Phys. 2015, 142, 154117. (122) Kendall, M. G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81−93. (123) Anisimov, V. M.; Ziemys, A.; Kizhake, S.; Yuan, Z.; Natarajan, A.; Cavasotto, C. N. Computational and Experimental Studies of the Interaction between Phospho-Peptides and the C-Terminal Domain of Brca1. J. Comput.-Aided Mol. Des. 2011, 25, 1071−1084. (124) Mikulskis, P.; Cioloboc, D.; Andrejic, M.; Khare, S.; Brorsson, J.; Genheden, S.; Mata, R. A.; Soderhjelm, P.; Ryde, U. Free-Energy Perturbation and Quantum Mechanical Study of Sampl4 Octa-Acid Host-Guest Binding Energies. J. Comput.-Aided Mol. Des. 2014, 28, 375−400. (125) Flyvbjerg, H. Error Estimates on Averages of Correlated Data. Lecture Notes in Physics 1998, 501, 88−103. (126) Straatsma, T. P.; Berendsen, H. J. C.; Postma, J. P. M. Free Energy of Hydrophobic Hydration: A Molecular Dynamics Study of Noble Gases in Water. J. Chem. Phys. 1986, 85, 6720−6727. (127) Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Use of the Weighted Histogram Analysis Method for the Analysis of Simulated and Parallel Tempering Simulations. J. Chem. Theory Comput. 2007, 3, 26−41. (128) Boissy, G.; de La Fortelle, E.; Kahn, R.; Huet, J. C.; Bricogne, G.; Pernollet, J. C.; Brunie, S. Crystal Structure of a Fungal Elicitor Secreted by Phytophthora Cryptogea, a Member of a Novel Class of Plant Necrotic Proteins. Structure 1996, 4, 1429−1439. (129) Langhorst, U.; Loris, R.; Denisov, V. P.; Doumen, J.; Roose, P.; Maes, D.; Halle, B.; Steyaert, J. Dissection of the Structural and Functional Role of a Conserved Hydration Site in Rnase T1. Protein Sci. 1999, 8, 722−730. (130) Sidhu, G.; Withers, S. G.; Nguyen, N. T.; McIntosh, L. P.; Ziser, L.; Brayer, G. D. Sugar Ring Distortion in the Glycosyl-Enzyme Intermediate of a Family G/11 Xylanase. Biochemistry 1999, 38, 5346− 5354. (131) Su, S.; Gao, Y. G.; Robinson, H.; Liaw, Y. C.; Edmondson, S. P.; Shriver, J. W.; Wang, A. H. Crystal Structures of the Chromosomal Proteins Sso7d/Sac7d Bound to DNA Containing T-G Mismatched Base-Pairs. J. Mol. Biol. 2000, 303, 395−403. (132) Singhal, A. K.; Chien, K. Y.; Wu, W. G.; Rule, G. S. Solution Structure of Cardiotoxin V from Naja Naja Atra. Biochemistry 1993, 32, 8036−8044. (133) Deswarte, J.; De Vos, S.; Langhorst, U.; Steyaert, J.; Loris, R. The Contribution of Metal Ions to the Conformational Stability of Ribonuclease T1: Crystal Versus Solution. Eur. J. Biochem. 2001, 268, 3993−4000. (134) Sun, Y. J.; Wu, W. G.; Chiang, C. M.; Hsin, A. Y.; Hsiao, C. D. Crystal Structure of Cardiotoxin V from Taiwan Cobra Venom: PhDependent Conformational Change and a Novel Membrane-Binding Motif Identified in the Three-Finger Loops of P-Type Cardiotoxin. Biochemistry 1997, 36, 2403−2413. (135) Sevcik, J.; Lamzin, V. S.; Dauter, Z.; Wilson, K. S. Atomic Resolution Data Reveal Flexibility in the Structure of Rnase Sa. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2002, 58, 1307−1313. (136) Sevcik, J.; Dauter, Z.; Lamzin, V. S.; Wilson, K. S. Ribonuclease from Streptomyces Aureofaciens at Atomic Resolution. Acta Crystallogr., Sect. D: Biol. Crystallogr. 1996, 52, 327−344. (137) Jeng, M. F.; Campbell, A. P.; Begley, T.; Holmgren, A.; Case, D. A.; Wright, P. E.; Dyson, H. J. High-Resolution Solution Structures of Oxidized and Reduced Escherichia Coli Thioredoxin. Structure 1994, 2, 853−868. (138) Gooley, P. R.; Keniry, M. A.; Dimitrov, R. A.; Marsh, D. E.; Keizer, D. W.; Gayler, K. R.; Grant, B. R. The Nmr Solution Structure and Characterization of Ph Dependent Chemical Shifts of the BetaElicitin, Cryptogein. J. Biomol. NMR 1998, 12, 523−534. (139) Spitzner, N.; Löhr, F.; Pfeiffer, S.; Koumanov, A.; Karshikoff, A.; Rüterjans, H. Ionization Properties of Titratable Groups in Ribonuclease T1. I. Pka Values in the Native State Determined by R

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Two-Dimensional Heteronuclear Nmr Spectroscopy. Eur. Biophys. J. 2001, 30, 186. (140) Giletto, A.; Pace, C. N. Buried, Charged, Non-Ion-Paired Aspartic Acid 76 Contributes Favorably to the Conformational Stability of Ribonuclease T1†. Biochemistry 1999, 38, 13379. (141) Koumanov, A.; Spitzner, N.; Rüterjans, H.; Karshikoff, A. Ionization Properties of Titratable Groups in Ribonuclease T1. Eur. Biophys. J. 2001, 30, 198−206. (142) Inagaki, F.; Kawano, Y.; Shimada, I.; Takahashi, K.; Miyazawa, T. Nuclear Magnetic Resonance Study on the Microenvironments of Histidine Residues of Ribonuclease T1 and Carboxymethylated Ribonuclease T11. J. Biochem. 1981, 89, 1185−1195. (143) Joshi, M. D.; Hedberg, A.; Mcintosh, L. P. Complete Measurement of the Pka Values of the Carboxyl and Imidazole Groups in Bacillus Circulans Xylanase. Protein Sci. 1997, 6, 2667− 2670. (144) Consonni, R.; Arosio, I.; Belloni, B.; Fogolari, F.; Fusi, P.; Shehi, E.; Zetta, L. Investigations of Sso7d Catalytic Residues by Nmr Titration Shifts and Electrostatic Calculations. Biochemistry 2003, 42, 1421−1429. (145) Chiang, C. M.; Chien, K. Y.; Lin, H. J.; Lin, J. F.; Yeh, H. C.; Ho, P. L.; Wu, W. G. Conformational Change and Inactivation of Membrane Phospholipid-Related Activity of Cardiotoxin V from Taiwan Cobra Venom at Acidic Ph. Biochemistry 1996, 35, 9167− 9176. (146) Chiang, C. M.; Chang, S. H.; Wu, W. G.; Lin, H. J. The Role of Acidic Amino Acid Residues in the Structural Stability of Snake Cardiotoxins. Biochemistry 1996, 35, 9177−9186. (147) Laurents, D. V.; Huyghues-Despointes, B. M. P.; Bruix, M.; Thurlkill, R. L.; Schell, D.; Newsom, S.; Grimsley, G. R.; Shaw, K. L.; Treviño, S.; Rico, M.; et al. Charge−Charge Interactions Are Key Determinants of the P K Values of Ionizable Groups in Ribonuclease Sa (Pi = 3.5) and a Basic Variant (Pi = 10.2). J. Mol. Biol. 2003, 325, 1077−1092. (148) Langsetmo, K.; Fuchs, J. A.; Woodward, C. The Conserved, Buried Aspartic Acid in Oxidized Escherichia Coli Thioredoxin Has a Pka of 7.5. Its Titration Produces a Related Shift in Global Stability. Biochemistry 1991, 30, 7603−7609. (149) Forsyth, W. R.; Antosiewicz, J. M.; Robertson, A. D. Empirical Relationships between Protein Structure and Carboxyl Pka Values in Proteins. Proteins: Struct., Funct., Genet. 2002, 48, 388−403. (150) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926−935. (151) Price, D. J.; Brooks, C. L. A Modified Tip3p Water Potential for Simulation with Ewald Summation. J. Chem. Phys. 2004, 121, 10096−10103. (152) Joung, I. S.; Cheatham, T. E. Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B 2008, 112, 9020−9041. (153) Joung, I. S.; Cheatham, T. E. Molecular Dynamics Simulations of the Dynamic and Energetic Properties of Alkali and Halide Ions Using Water-Model-Specific Ion Parameters. J. Phys. Chem. B 2009, 113, 13279−13290. (154) Ravishanker, G.; Mezei, M.; Beveridge, D. L. Conformational Stability and Flexibility of the Ala Dipeptide in Free Space and Water: Monte Carlo Computer Simulation Studies †. J. Comput. Chem. 1986, 7, 345−348. (155) Cross, A. J. Influence of Hamiltonian Parameterization on Convergence of Kirkwood Free Energy Calculations. Chem. Phys. Lett. 1986, 128, 198−202. (156) Steinbrecher, T.; Mobley, D. L.; Case, D. A. Nonlinear Scaling Schemes for Lennard-Jones Interactions in Free Energy Calculations. J. Chem. Phys. 2007, 127, 214108. (157) Zacharias, M.; Straatsma, T. P.; Mccammon, J. A. SeparationShifted Scaling, a New Scaling Method for Lennard-Jones Interactions in Thermodynamic Integration. J. Chem. Phys. 1994, 100, 9025−9031. (158) Beutler, T. C.; Mark, A. E.; Schaik, R. C. V.; Gerber, P. R.; Gunsteren, W. F. V. Avoiding Singularities and Numerical Instabilities

in Free Energy Calculations Based on Molecular Simulations. Chem. Phys. Lett. 1994, 222, 529−539. (159) Pitera, J. W.; van Gunsteren, W. F. A Comparison of NonBonded Scaling Approaches for Free Energy Calculations. Mol. Simul. 2002, 28, 45−65. (160) Bitetti-Putzer, R.; et al. Generalized Ensembles Serve to Improve the Convergence of Free Energy Simulations. Chem. Phys. Lett. 2003, 377, 633−641. (161) Chipot, C.; Rozanska, X.; Dixit, S. B. Can Free Energy Calculations Be Fast and Accurate at the Same Time? Binding of LowAffinity, Non-Peptide Inhibitors to the Sh2 Domain of the Src Protein. J. Comput.-Aided Mol. Des. 2005, 19, 765−770. (162) Fowler, P. W.; Jha, S.; Coveney, P. V. Grid-Based Steered Thermodynamic Integration Accelerates the Calculation of Binding Free Energies. Philos. Trans. R. Soc., A 2005, 363, 1999−2015. (163) Levitt, M. Protein Folding by Restrained Energy Minimization and Molecular Dynamics. J. Mol. Biol. 1983, 170, 723−764. (164) Onufriev, A.; Bashford, D.; Case, D. A. Exploring Protein Native States and Large-Scale Conformational Changes with a Modified Generalized Born Model. Proteins: Struct., Funct., Genet. 2004, 55, 383−394. (165) Onufriev, A.; Bashford, D.; Case, D. A. Modification of the Generalized Born Model Suitable for Macromolecules. J. Phys. Chem. B 2000, 104, 3712−3720. (166) Debye, P. Näherungsformeln Für Die Zylinderfunktionen Für Große Werte Des Arguments Und Unbeschränkt Veränderliche Werte Des Index. Mathematische Annalen 1909, 67, 535−558. (167) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of N -Alkanes. J. Comput. Phys. 1977, 23, 327−341. (168) Miyamoto, S.; Kollman, P. A. Settle: An Analytical Version of the Shake and Rattle Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952−962. (169) Pastor, R. W.; Brooks, B. R.; Szabo, A. An Analysis of the Accuracy of Langevin and Molecular Dynamics Algorithms. Mol. Phys. 1988, 65, 1409−1419. (170) York, D. M.; Darden, T. A.; Pedersen, L. G. The Effect of Long-Range Electrostatic Interactions in Simulations of Macromolecular Crystals: A Comparison of the Ewald and Truncated List Methods. J. Chem. Phys. 1993, 99, 8345−8348. (171) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Alexey, O.; Carlos, S.; Bing, W.; Woods, R. J.; et al. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668−1688. (172) Goutelle, S.; Maurin, M. F.; Barbaut, X.; Bourguignon, L.; Ducher, M.; Maire, P.; Rougier, F. The Hill Equation: A Review of Its Capabilities in Pharmacological Modelling. Fundam. Clin. Pharmacol. 2008, 22, 633−648. (173) Yang, W.; Bitettiputzer, R.; Karplus, M. Free Energy Simulations: Use of Reverse Cumulative Averaging to Determine the Equilibrated Region and the Time Required for Convergence. J. Chem. Phys. 2004, 120, 2618−2628. (174) Awoonor-Williams, E.; Rowley, C. N. Evaluation of Methods for the Calculation of the Pka of Cysteine Residues in Proteins. J. Chem. Theory Comput. 2016, 12, 4662−4673.

S

DOI: 10.1021/acs.jcim.7b00177 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX