Accuracy Comparison of Generalized Born Models ... - ACS Publications

California 94080, Department of Pharmaceutical Sciences, University of Maryland School of. Pharmacy, Baltimore, Maryland 21201, Institute of Molecular...
1 downloads 14 Views 2MB Size
Subscriber access provided by READING UNIV

Article

Accuracy Comparison of Generalized Born Models in the Calculation of Electrostatic Binding Free Energies Saeed Izadi, Robert C Harris, Marcia O Fenley, and Alexey V Onufriev J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00886 • Publication Date (Web): 29 Jan 2018 Downloaded from http://pubs.acs.org on January 31, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Accuracy Comparison of Generalized Born Models in the Calculation of Electrostatic Binding Free Energies Saeed Izadi,† Robert C. Harris,‡ Marcia O. Fenley,∗,¶ and Alexey V. Onufriev∗,§,k Early Stage Pharmaceutical Development, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306-3408, Department of Computer Science and Physics, Virginia Tech, Blacksburg, VA 24061, and Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, VA 24061 E-mail: [email protected]; [email protected]

Abstract The need for accurate yet efficient representation of the aqueous environment in biomolecular modelling has led to the development of a variety of generalized Born (GB) implicit solvent models. While many studies have focused on the accuracy of available GB models in predicting solvation free energies, a systematic assessment of the quality of these models in binding free energy calculations, crucial for rational drug design, has not been undertaken. Here, we evaluate the accuracies of eight common GB flavors (GB-HCT, GB-OBC, GBneck2, GBNSR6, GBSW, GBMV1, GBMV2 and GBMV3), available in major molecular dynamics packages, in predicting the electrostatic binding free energies (∆∆Gel ) for a diverse set of 60 biomolecular complexes belonging to four main classes: protein-protein, protein-drug, RNA-peptide and small complexes. The GB flavors are examined in terms of their ability to reproduce the results from the Poisson-Boltzmann (PB) model, commonly used as accuracy reference in this context. We show that the agreement with the PB of ∆∆Gel estimates varies widely between different GB models and also across different types of biomolecular complexes, with R2 correlations ranging from 0.3772 to 0.9986. A surface-based “R6” GB model recently implemented in AMBER shows closest over-all agreement with reference PB (R2 =0.9949, RMSD=8.75 kcal/mol). The RNApeptide and protein-drug complex sets appear to be most challenging for all but one model, as indicated by the large deviations from the PB in ∆∆Gel . Small neutral complexes present the least challenge for most of the GB models tested. The quantitative demonstration of the strengths and weaknesses of the GB models across the diverse complex types provided here can be used as a guide for practical computations and future development efforts.

INTRODUCTION

challenging. 1–9 Experimental approaches for assessing the free energy of binding are difficult and prohibitively expensive on large scale. 10 Therefore, considerable effort has been devoted to develop in sil-

Accurate determination of the binding affinities of small molecules with a biological target is crucial in molecular design and drug discovery but remains 1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 25

the Poisson–Boltzmann (PB) model, 14,29,38,45,51–57 or, in case where computational efficiency is paramount, the generalized Born (GB) approximation. 36,58–63 The GB, of which a great variety of flavors and implementations are available, 63–92 has become popular due to a reasonable balance between accuracy and speed and the availability of a diverse class of GB implementations in major molecular modeling packages. 93–97 While arguably most widely used in molecular dynamics simulations, the relatively high efficiency of the GB model makes it also very promising in large-scale estimates of solvation free energies and, hence, in estimates of receptor-ligand binding interactions, 16,30,34,61,98,99 especially in conjunction with approaches, such as MM-PB(GB)SA, 14–16 where large numbers of snapshots need to be processed. In the past, systematic studies were performed to examine the accuracy and efficiency of computing the electrostatic solvation free energies (∆Gel ) with various implicit solvent models, including the GB and PB. 18,19,64,100–103 However, even a reasonably good accuracy of a GB (or the PB, for that matter) model in reproducing the electrostatic solvation free energy alone may not necessarily translate into the same level of accuracy for ligand binding calculations. 56,63,104 One reason is that the values of binding free energies are relatively much smaller than the free energy terms corresponding to the bound and unbound states (e.g. tens of kcal/mol for binding affinities vs. thousands of kcal/mol for solvation free energies), which leads to high sensitivity of the method to parameters used in the calculations of binding affinities. 56,105–107 To the best of our knowledge, no systematic assessment of the GB accuracy in receptor-ligand binding was ever performed on a large and diverse set of complexes, simultaneously for most GB models available in major modeling packages. This study is intended to fill this critical gap. Here, we study the accuracies of the electrostatic binding free energies (∆∆Gel ) computed for a set of 60 biomolecular complexes using eight commonly used GB models available in the major molecular modeling packages AMBER 93 and CHARMM. 94 Several of these GB models originally developed for these packages have since been adopted by other major modeling

ico approaches with the aim of reducing time and cost in modern drug design processes, namely free energy perturbation (FEP), 11 thermodynamic integration (TI), 12 linear response (LR) 13 and molecular mechanics-Poisson-Boltzmann-surface area (MMPBSA) 14,15 methods. Among all, molecular simulations have gained significant popularity in estimating binding affinities due to substantial methodological improvements and advanced computational power. 14,16,17 In essence, the free energy of binding is largely determined from the combined effect of the delicate balance between receptor-ligand, solvent-ligand, solventreceptor and solvent-solvent interactions. The quality of the underlying solvent environment can therefore have a profound effect to the outcomes of receptorligand binding calculations. 18–20 Traditionally, the effects of aqueous solvation have been accounted for by placing a sufficiently large number of individual water molecules around the solute, and simulating their motion on an equal footing with the molecule of interest. 21–23 While arguably the most realistic of the current theoretical approaches, it suffers from considerable computational costs, which often becomes prohibitive, for example for the receptor-ligand docking problem in which one searches for the best position of the ligand molecule (drug candidate) on the target receptor. 24,25 The cost of such calculations increase dramatically with the system size. Moreover, it is not apriori clear at this point which, if any, of the numerous available explict water models can be used as an accuracy standard for estimates of receptor-ligand binding affinities. For example, the popular TIP3P model can lead to 40 % (6 kcal/mol) error relative to experiment in binding enthalpy estimates of even small hostguest systems; 26 differences between binding free energy estimates based on the widely used TIP3P and TIP4P-Ew models can be as high as 9 kcal/mol. 27 The implicit solvation methodology 28,29,29–50 provides significant computational advantages by representing the discrete solvent molecules by a uniform continuous medium with the average properties of the solvent. Currently, the practical “engine” of the implicit solvent models – responsible for the estimation of the key electrostatic interactions – is either 2

ACS Paragon Plus Environment

Page 3 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

METHODS

packages. 95–97 The sets of biomolecular complexes tested here are diverse with respect to the biomolecule type (e.g. nucleic acids, proteins and drugs), structure sizes, net charges, as well as ∆∆Gel values. The objective is to provide a quantitative assessment of the accuracies of common GB models across diverse complex types. All of the GB models (flavors) considered here have been developed in the context of simulating biological macromolecules; the PB model has been used as their natural accuracy reference, including previous largescale estimates of the GB performance across multiple models 64,100 focused on solvation free energy. We adopt the same strategy here in terms of the PB reference, and compare the predictions of ∆Gel and ∆∆Gel by different GB models to those given by the numerical PB. That is we focus on the electrostatic component of the binding free energy, since it is the electrostatics that the GB approximation is designed to estimate. We fully acknowledge the approximate nature of the PB; 108 further justification for the use of the numerical PB reference for the GB models considered here is provided in “Methods”. Implications for practical calculations of binding affinity are discussed in “Conclusions”. We expressly avoid comparing computational efficiencies of different models from multiple packages – interested readers are referred to previous comprehensive studies, 64,100 and , for the newest models, to the original manuscripts. 109,110 The remainder of the paper is organized as follows. First, we present the specifics of the receptor-ligand complexes used for the comparative studies, along with a brief overview of the GB flavors and the reference PB model used in this study. Second, we present the quality of the calculated ∆Gel and ∆∆Gel with respect to reproducing the values predicted by the reference PB models. Our findings, including implications of the GB model accuracy relative to the PB in the context of practical calculations are summarized and discussed in the Conclusions.

Test Structures Four test sets containing 60 biomolecular complexes were collected for this study: a set of small complexes (data set 1), a set of protein-drug complexes (data set 2), a set of protein-protein complexes (data set 3), and a set of RNA-peptide complexes (data set 4). The structures of these molecules were taken from the RCSB protein databank 111 (PDB), and the PDB identification codes of these structures can be found in Table 1. These sets are diverse with respect to the biomolecule type (e.g. nucleic acids, proteins and drugs), structure sizes and net charges (see Table 2), as well as their calculated ∆∆Gel (see the Results section). The structures in Set 1 (small neutral complexes) were originally selected 27,112 to contain no more than 2000 heavy atoms and no missing heavy atoms. They were protonated as described previously. 27 For the proteins and RNA molecules (except for data set (1)), the atomic charges and hydrogens were assigned using the PDB2PQR utility. 113,114 The ff99bsc0 parameters and the GAFF force field, both part of AMBER, 115 were used for preparing the topology and coordinate files, which includes partial charges. The atomic charges for the drugs in data set (4) were assigned with the methods outlined elsewhere. 63 For the sake of consistency, we used the Bondi radii set for all of the implicit solvent models studied here, as these radii are recommended for specific AMBER GB models. 94 However, to test the sensitivities of the calculated ∆Gel and ∆∆Gel to the choice of atomic radii, some of the calculations were repeated with the atomic radii other than Bondi. The PQR files as well as the corresponding ∆Gel values for the test sets are available in the Supporting Information.

The Thermodynamic Cycle for Electrostatic Binding Free Energies The thermodynamic cycle shown in Figure 1 is used to calculate ∆∆Gel : first, the receptor and the ligand are transferred from solvent into vacuum, second, the 3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

complex is formed in vacuum, and then the complex is transferred from vacuum into the solvent. The ∆Gel of the receptor, the ligand and the complex are designated as ∆GRel , ∆GLel , and ∆GRL el , respectively. The electrostatic component of the binding free energy in vacuum is ∆EelC , which is calculated similarly for all models. Using this cycle,

Table 1: PDB ID of the complexes in the four data sets.

data set 1

data set 2

data set 3

data set 4

(small complexes) 1B11 1BKF 1F40 1FB7 1FKB 1FKF 1FKG 1FKH 1FKJ 1FKL 1PBK 1ZP8 1ZPA 2FKE 2HAH 3FKP

(protein-drug) 1JTX 1JTY 1JUM 1JUP 1QVT 1QVU 3BQZ 3BR3 3BTC 3BTI 3BTL 3PM1

(protein-protein) 1B6C 1BEB 1BVN 1E96 1EMV 1FC2 1GRN 1HE1 1KXQ 1SBB 1FMQ 1UDI 2BTF 2PCC 2SIC 2SNI 7CEI

(RNA-peptide) 1A1T 1A4T 1BIV 1EXY 1HJI 1I9F 1MNB 1NYB 1QFQ 1ULL 1ZBN 2A9X 484D

small complexes protein-drug protein-protein RNA-peptide

Complex 1769.9 6203.4 6147 1209.5

Ligand 101.4 44.9 2475.1 393.5

(1)

In all the calculations presented here we assume no conformational changes to the structures and do not consider the nonpolar component of the solvation free energy or the entropic contributions to binding. While estimating these quantities would be necessary for a direct comparison with experiment, the associated uncertainties in their theoretical estimates can be large, 9,14,15,104,116–119 and may obfuscate the errors in ∆∆Gel , which is the only quantity that the GB model predicts.

Average |Charge| (e) Complex 0.00 3.615 6.00 16.615

 R L C ∆∆Gel = ∆GRL el − ∆Gel − ∆Gel + ∆Eel

Figure 1: The thermodynamic cycle used to compute the electrostatic component of the binding free energy. The surrounding dielectric medium is shaded for water and is white for vaccum.

Table 2: The average sizes (in atoms) and average absolute total charges of the complexes and ligands for the four test sets. Average Size (atoms)

Page 4 of 25

Ligand 0.00 1.154 4.667 7.769

Reference Poisson-Boltzmann Calculations The finite-difference solution of the 3D linear PB equation was solved using the Adaptive PoissonBoltzmann Solver (APBS). 120 To obtain the ∆Gel of a molecule the linear PB equation was solved twice, first 4

ACS Paragon Plus Environment

Page 5 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

in salt solution and then in vacuum, then the difference between the two electrostatic free energies was taken. Each molecule was treated as a low dielectric region with a dielectric constant (εin ) of 1. The solvent (here a salt solution) region was treated as a high dielectric region with a dielectric constant (εout ) of 80 and NaCl concentration of 0.145 M, with the latter representing a physiological environment. The solvent temperature was set to 298.15 K. The dielectric boundary that separates the low dielectric molecule from the solvent region was defined using the solvent excluded molecular surface obtained with a 1.4 Å spherical water probe radius. No ion exclusion region was considered for consistency with the GB models here examined. We used an autofocusing protocol with a fine grid 20 Å larger than the molecule in each dimension, and a coarse grid 1.7 times the size of the molecule in each dimension, and a fine grid spacing of 0.3 or 0.5 Å (to check for convergence with respect to grid spacing 105,107 ). The default values of all other APBS parameters were used.

Apart from a “dielectric” correction to Eq. (2) used by one of the models (see below), the GB Green function Eq. (2) is the same for all of the GB models considered in this work. That Green function can be rigorously derived from the Poisson equation under a certain set of assumptions, 36 which further supports 124 the use of the numerical PB for testing of the GB accuracy, at least in the context of the GB flavors tested here. The latter were developed primarily for modeling of biological macromolecules. At the same time, we acknowledge the existence of a view 108 that the GB approximation in general is not a direct approximation to the PB, see e.g. Ref. 124 for a discussion. The way the effective Born radii (Ri ) are calculated differ significantly between implementations of the GB models; in that aspect, the GB models (flavors) that are investigated here represent a variety of classes. Full details of these GB models are provided in the original papers; here, we briefly review the main differences between them. While the GB does not employ the dielectric boundary in exactly the same manner as the PB model does, the solute/solvent boundary plays the same conceptual role in the GB; technically, the dielectric boundary enters 125 the GB model through the effective Born radii. In all of the models tested here the effective Born radii are computed as approximate or exact integrals over the suitably chosen solute volume or surface. The twodielectric model considered here is defined by specifying the values of the internal (solute) and external (solvent) dielectrics, and the postion of the sharp boundary between them. Thus, within this model it is natural to assume the same dielectric constants for the GB when testing it directly against the PB. Within this limited scope testing, the precise value of the dielectric constant used to describe the low dielectric interior of the macromolecule is not very relevant. However, we argue that using the lowest physically meaningful value of εin = 1 is best if the goal is to highlight the GB versus PB error – the error is maximized when the solute and solvent dielectrics differ the most, and disappears when the two are equal. 126 For all the GB models, we used interior and exterior dielectric constants of (εin = 1) and (εout = 80) respectively, a NaCl concentration of 0.145 M, the temper-

The Generalized Born Models The “canonical” generalized Born model approximates the Green function of the Poisson equation via a simple closed form expression, first introduced by Still et al., 60 and later augmented to account for the presence of monovalent salt: 121 1 ∆Gel = − 2



1 exp(−κ fGB ) − εin εout



qi q j

∑ fGB(ri j , Ri, R j ) ij

(2) where ri j is the distance between atoms i and j, qi and q j are the respective (partial) charges, the Ri and R j are the so-called effective Born radii, and κ is the Debye-Hückel screening parameter used to incorporate 121 the electrostatic screening effects of (monovalent) salt. While alternatives have been proposed, 71,89,122,123 the most widely used functional form of fGB is fGB (ri j , Ri , R j ) = [ri2j + Ri R j exp(−ri2j /4Ri R j )]1/2 . (3) 5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ature was set to 298.15 K and a solvent probe radius of 1.4 Å where appropriate. To make correspondence to the reference PB calculations, we used the same intrinsic atomic radii for the GB and PB calculations, see below. Most of the reported results are based on Bondi radii, but we have explored the sensitivity of the conclusions to several other radii sets as described below. Appropriate radii were incorporated into the prmtop files using a variation of the AMBER2CHARMM tool.

Page 6 of 25

in AMBER by setting igb=8. GBNSR6 109,133,136 (“Numerical Surface R6 GB") is one of the latest additions to the AMBER family of GB models, now also implemented in MM_PBSA 137,138 module and Free Energy Workflow (FEW) 139 tool of AmberTools17 136 In GBNSR6 model, the effective Born radii are computed via the “R6” numerical integration over the molecular surface of the solute, defined by the solute intrinsic radii and the solvent probe. GBNSR6 includes a single additional (to the PB) parameter: a constant offset B to the inverse effective Born radii. The value of the offset is fixed for a given probe radius; here we used B = 0.028 Å−1 as proposed by Mongan et al 135 for the probe radius of 1.4 Å. That is, as long as the solvent probe radius is set to the default 1.4 Å, and agreement with the PB is the goal, GBNSR6 model becomes parameter-free in the same sense as the PB. Note that in addition to input parameters that are generally used by the PB, such as the intrinsic atomic radii, other GB models rely on multiple additional parameters to approximate the PB energies, the optimum values of these parameters generally depend on the atomic radii. By default, ∆Gel in GBNSR6 is calculated by the ALPB model, 134 which introduces a physically correct dependence on the dielectric constants into the original GB model (Eq. (2)), while maintaining the efficiency of the original:

AMBER GB models Four variants of the GB model were used: GB-HCT, 127–129 GB-OBC, 130,131 GBneck2 88,132 and GBNSR6. 109,133 Except for GBNSR6, all of these GB models are based on the Coulomb field approximation and differ mainly in how the effective Born radii are computed. The analytical linearized PB (ALPB) correction 134 was used with the GBNSR6 model model only. GB-HCT uses the pairwise descreening approach of Hawkins, Cramer and Truhlar, 127,128 with parameters described by Tsui and Case, 129 which leads to a GB model implemented in Amber with igb=1. The 3D integral used in the estimation of the effective radii is performed over the van der Waals spheres of solute atoms, which implies a definition of the solute volume in terms of a set of spheres, rather than the more complex molecular surface. In GB-OBC, 131 the effective Born radii are re-scaled to account for the interstitial spaces between atom spheres missed by the GB-HCT approximation. In that sense, GB-OBC is intended to be a closer approximation to the true molecular volume, albeit in an average sense. GB-OBC can be used in AMBER by setting igb=5 or 2, here we used the igb=5 version. GB-neck2 is the latest, most successful version of an earlier GBn model, 135 with different additional parameters. 88,132 GB-neck2 uses a pairwise correction term to GB-OBC to approximate 135 a molecular surface dielectric boundary; that is to eliminate interstitial regions of high dielectric smaller than a solvent molecule. This correction affects all atoms and is geometry-specific, going beyond the geometry-free, “average” re-scaling approach of GB-OBC, which mostly affects buried atoms. GB-neck2 can be used

1 ∆Gel = − 2



1 1 − εin εout



1 qi q j 1+βα ∑ ij   αβ 1 + , (4) f GB A

where β = εin /εout , α = 0.571412, and A is the electrostatic size of the molecule, which is essentially the overall size of the structure; A is estimated analytically. 134 CHARMM GB models Four GB models available in the CHARMM software package 94 were evaluated in this study: GBMV1 and GBMV2 are analytical methods in which the molecular volume is 6

ACS Paragon Plus Environment

Page 7 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

constructed from a superposition of atomic functions. 73,140 GBMV1 is the oldest and least accurate of these two analytical GB models. GBMV3 73,100 is a grid-based method that uses a Cartesian grid representation of the molecular volume similar to that used in traditional Poisson-Boltzmann calculations, and GBSW 87,141 uses a van der Waals-based surface with a simple polynomial switching function to smooth the dielectric boundary. A more detailed description of these models is available in a recent review. 100 To compute these energies, CHARMM topology and coordinate files were created with a script. The notation used here for naming the different CHARMM GB parameters follows the CHARMM manual (see the documentation for the gbmv and gbsw modules). For GBSW, the molsurf option was used to mimic the molecular surface, conc=0.145 M, sw=0.2, and the parameters aa0 and aa1 were set to 1.2045 and 0.1866, respectively. For GBMV1 beta was set to -100, the shift parameter was set to -0.5, lambda1 was set to 0.1, p1 to 0.44, tt to 2.92, corr to 0, bufr to 0.5, mem to 20, cuta to 20, p3 to 0, wtyp to 0, and eshift to 0. For GBMV2, tol was set to 10−10 , mem to 20, cuta to 20, dn to 1, bufr to 0.2, beta to -12, shift to -0.1, slope to 0.9, lambda1 to 0.5, p1 to 0.45, p2 to 1.25, p3 to 0.65, p6 to 8, onx to 1.9, offx to 2.1, corr to 1, son to 1.2, soff to 1.5, nphi to 5, sa to 0, sb to 0, and kappa to 0.1239950497. For GBMV3, dn was set to 0.2, p6 to 4.0, nphi to 38, shift to -0.007998, slope to 0.9026, wtyp to 2 and kappa to 0.1239950497. All other parameters used their default values.

pends on the kind of comparison one wants to make, and the specifics of the models used. In principle, one could consider using different radii sets for different models; such strategy would make sense if the accuracy metric were an experimental observable such as the binding free energy. In this case, one could choose a set of parameters for each model that gives best agreement with experiment, and then compare the outcomes to determine the most accurate model, GB or PB, with respect to the chosen metric. However, for reasons outlined above, our goal here is more limited and focused: to evalute only the electrostatic component of solvation energy relative to the PB reference. We argue that in this case using, for all of the GB models, one and the same radii set employed to obtain the PB reference energies makes sense on the grounds of physical consistency. Indeed, all of the GB models reduce to the Born formula for a single spherical ion, which also corresponds to the exact solution of the Poisson equation within the twodielectric formalism shared by all of the models considered here. Thus, it makes sense that all the models, GB or PB, are required to give identical result for a single isolated spherical ion in water, which implies the same radii set. Here we propose to use the well established Bondi radii set: it is physically wellgrounded and has only 5 main atom types, all of which bodes well for low probability of an unwanted bias towards a specific training set and type of reference used in radii set optimizations. Note that more complex radii sets optimized against hydration free energies of small molecules or amino-acids are more likely to be poorly transferable to computation of proteinligand binding energies. 148 Unfortunately, even the seemingly straightforward choice of Bondi radii needs to be carefully justfied for the diverse set of practical GB models tested here. This is because most of these GB models contain multiple fitting parameters used to optimize the agreement with the respective accuracy reference, which most often includes the numerical PB solvation energies. These GB models are intended to be used with specific atomic radii set employed during the parameter optimization, e.g. modifications of Bondi set mbondi, mbondi2 and mbondi3 for GB-HCT, GB-OBC and GB-neck2, respectively

Choice of the Input Atomic Radii Set Within the implicit solvation framework, the solvation energies are extremely sensitive to the intrinsic atomic radii values, 107,125,142 because these determine the position of the dielectric boundary between the low dielectric interior and the high dielectric solvent. Both the GB and the PB models take the intrinsic atomic radii as input – many different radii sets exist. 143–147 Thus, one has to carefully consider the choice of the input radii set for any accuracy analysis of these models, including the one presented here. As we shall see below, the appropriate choice is not obvious, and de7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 25

Results

(AMBER), and CHARMM radii for many CHARMM GB models. Among all of the GB models tested here, GBNSR6 is the only model with no recommended radii set, because the model is essentially agnostic to the input intrinsic radii in the same sense as the PB reference is. For all the other GB models, changing the input atomic radii away from the recommended set without an extensive re-parametrization effort – which is out of scope here – may result in deterioration of the accuracy of the model relative to the PB reference values based on the same radii set; a possible appearance of accuracy improvement for some structural classes would likely be due to a fortuitous cancellation of error for this class of structures only. To address this subtle issue, we have confirmed that the computed ∆∆Gel values are not sensitive to the use of Bondi set instead of the recommended set for the GB models where Bondi set was not listed as one of the recommended options, see "Results" section for details.

Poisson-Boltzmann Convergence To determine whether our PB results were converged with respect to grid spacing, we computed 20 estiRL L mates of each of ∆GR el , ∆Gel , and ∆Gel at grid spacings of 0.5 Å with the centers of the solution grid shifted by a random fraction of a grid spacing in each direction and averaged these results. We then compared these numbers to similar numbers obtained at a grid spacing of 0.3 Å. By averaging over these solution grids we attempted to eliminate the error due to grid placement. The root mean square deviation (RMSD) between the APBS results with grid spacings of 0.3 Å and those with grid spacings of 0.5 Å are 12.6, 7.9, 2.7 and 2.8 kcal/mol for the RNApeptide, protein-protein, protein-drug and small complexes data sets, respectively. These RMSD’s between the APBS calculations with grid spacings of 0.3 Å and those with grid spacings of 0.5 Å may at first appear to be large compared to those between the GB models and the APBS calculations at 0.3 Å, but the energies with grid spacings of 0.3 Å are always systematically smaller than those with a grid spacing of 0.5 Å because of the bias present in such calculations. 105,107 The correlations between the two sets of APBS calculations are therefore much higher than would be expected from the RMSD alone. The square (R2 ) of the Pearson’s correlation coefficients between ∆Gel given by a grid spacing of 0.3 Å and those given by a grid spacing of 0.5 Å were greater than 0.999 for all four data sets. For ∆∆Gel R2 was greater than 0.992 for all four data sets. For any of the GB models, weaker correlations can be attributed to errors in the GB model.

Perfect effective Born radii To compute the perfect 122 effective Born radii Ri , a Poisson problem is set up and solved separately for each atom using the numerical PB solver PEP, 30 which is especially well suited for the task. In the process, the dielectric boundary of the full molecule is present, but only the charge of that particular atom is non-zero (all other charges are set to zero). The process yields the self-energy ∆Giiel for each atom i, from which the perfect effective Born radius Ri is obtained by inverting the Born equation ∆Giiel = − 21 (1/εin − 1/εout )R−1 i . We set solute dielectric εin = 1, and use εout = 1000 for the solvent to mimic the conductor limit as discussed previously. 126 Here, the perfect radii are used only to further explore the accuracy of GBNSR6; for consistency with the GBNSR6 results without the perfect radii, the corresponding ∆Gel (perfect radii) values in Table 6 are computed using (Eq. (4)). The corresponding PB reference values of ∆∆Gel reported in Table 6 are computed using APBS (rather than PEP) for consistency with the rest of the paper; we verified that these values change insignificantly if computed with PEP instead.

Electrostatic (∆Gel )

Solvation

Free

Energies

In this section, we briefly review the accuracies of the different GB models in predicting ∆Gel for the four sets of complexes (including the complex and receptor and ligand components). Table 3 and Figure 2(a) summarize the deviations of the ∆Gel given by the different GB models from those given by the PB equa8

ACS Paragon Plus Environment

Page 9 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

tion. The root mean square deviation (RMSD), average signed error (< error >), square of the Pearson correlation coefficient (R2 ), and RMSD of the 20% of complexes with the worst agreement with the reference ∆Gel were calculated for each of the four data sets. Overall, the ∆Gel values from all the GB models highly correlate with the values from the PB equation. The lowest correlation is for GBMV1 for data set 3 (R2 =0.9957), but even this correlation is good. Although the R2 between ∆Gel given by the PB equation and that given by the different GB models was good for each GB model, the RMSD and average deviations are significant, and significantly different, between different GB models (Table 3 and Figure 2(a)). For instance, while the RMSD of GBNSR6 from the PB is smaller than 39.5 kcal/mol over all four sets, the RMSD of GBMV1 from the PB can be as large as 895.5 kcal/mol (data set 2). That GBMV1 is less accurate than GBMV2 agrees with similar GB calculations done elsewhere. 73 The RMSD of GBOBC varies significantly over the four sets, from 24.7 kcal/mol for data set 1 to 351.7 kcal/mol for data set 4. Among the AMBER GB models GBNSR6 has the smallest RMSD over-all (all four sets), and among the CHARMM GB models GBMV2 had the smallest over-all RMSD from the PB; the largest RMSD in ∆Gel for these models are 39.5 kcal/mol (data set 3) and 59.8 kcal/mol (data set 4), respectively. However, for one structure set, namely set 1 (small neutral complexes), it is GB-neck2 that shows the best agreement with the PB. The non-zero average deviations (Table 3) point to the existence of noticeable systematic shifts between the estimates of ∆Gel obtained with the PB equation and those obtained with the GB models, which can be due to a number of factors, including the following: (i) Differences in the dielectric boundary definitions used by each model. The PB model typically employs a grid-resolution dependent numerical approximation to the molecular surface, while the analytical GB models mimic this surface to varying degrees of accuracy. The exception is GBNSR6 – the version used here employs essentially the same molecular surface, but still subtle details, including surface discretization, differ between it and what is used by the PB solver used here.

(ii) The GB Green function is only an approximation to the correct Green function of the Poisson problem for a given dielectric boundary. The errors introduced by the approximate Green function can be tested via perfect (PB-based) effective Born radii, as discussed below. One should notice that since binding free energies depend on the “difference” in ∆Gel values of the receptor and ligand components, large systematic shifts in ∆Gel can cancel out in the binding free energy calculations. In general, we find that GB models that show lower RMSD with respect to the PB ∆Gel values are also in better agreement with the PB ∆∆Gel values, as discussed below.

Electrostatic (∆∆Gel )

Binding

Free

Energies

Figure 2(b), and Table 5 (bottom) summarize the deviation of different GB models from the PB reference in estimating ∆∆Gel , while the breakdown by individual structure test sets is given in Figure 3 and Table 4. We calculate RMSD, average signed error < error >, R2 , and the RMSD of the 20% of complexes with the largest deviations between ∆∆Gel given by the PB equation and those given by the GB model. Over-all, a wide range of accuracies is seen across the GB models tested, as judged by the different accuracy metrics. Some models (GBNSR6 and GBMV3) have very little over-all systematic deviation from the PB, Table 5 (bottom), while others deviate substantially, e.g. the older GB-HCT and GBMV1 models. By far the lowest RMS error and the highest R2 correlation are seen for GBNSR6 model. With regards to specific structures, the GB models also demonstrate a wide range of accuracies. For data set 1 (small, net neutral complexes), all of the GB models show good correlation with the reference PB values, except for GBMV1 for which R2 is 0.46. The low correlation of GBMV1 with the PB ∆∆Gel values in set 1 is consistent with the large RMSD and average error of the ∆Gel values (RMSD=135.35 kcal/mol, Table 3). The average error in ∆∆Gel for all of AMBER GB models (GBNSR6, GB-HCT, GB-OBC and GBneck2) is negative (also see Figure 3), suggesting that

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 25

Table 3: Deviation of GB ∆Gel values (kcal/mol) from the PB reference for the four data sets. Error metrics calculated for each set of predictions include root mean square deviation (RMSD), average signed error (< error >), square of the Pearson correlation coefficient (R2 ), and RMSD of the 20% of complexes with the worst agreement with the reference. Data Set 1 (small complexes)

Data Set 2 (protein-drug)

R2

R2

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 21.8 33.6 24.7 6.7 11.5 135.4 14.0 25.0

< error > RMS of worst 20% 0.999987 18.6 31.0 0.999703 -25.3 52.5 0.999904 20.8 36.8 0.999919 -0.2 12.5 0.999799 5.4 20.0 0.999537 -109.2 201.2 0.999964 12.0 23.3 0.999945 20.7 37.3 Data Set 3 (protein-protein)

RMSD 21.4 260.6 172.8 36.1 148.0 895.7 19.9 80.7

< error > RMS of worst 20% 0.999974 9.3 42.7 0.999938 -211.1 358.3 0.999764 134.7 271.8 0.999902 1.5 70.7 0.999937 -119.1 216.8 0.999854 -731.9 1164.2 0.999971 2.5 37.1 0.999954 62.6 131.0 Data Set 4 (RNA-peptide)

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 39.5 358.2 93.0 101.4 112.8 701.0 39.0 82.1

R2 0.999944 0.996962 0.999094 0.999391 0.999709 0.995759 0.999786 0.999759

RMSD 14.5 141.7 351.7 153.5 67.5 176.6 59.8 44.2

R2 0.999997 0.999835 0.999343 0.999701 0.999902 0.999571 0.999947 0.99996

< error > 33.8 -275.7 -26.5 -62.2 -89.8 -610.7 13.0 64.3

RMS of worst 20% 66.9 674.3 193.4 205.6 211.8 1167.1 71.5 148.5

< error > 11.1 95.0 276.1 -111.06 -46.5 -144.1 34.0 -28.0

RMS of worst 20% 25.3 256.8 578.0 292.8 126.3 301.5 111.4 85.9

Table 4: Deviation of GB ∆∆Gel values (kcal/mol) from the PB reference for the four data sets. The error metrics are described in Table 3. Data Set 1 (small complexes)

Data Set 2 (protein-drug)

R2

R2

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 3.9 10.7 7.3 6.2 4.0 9.6 5.0 3.0

< error > RMS of worst 20% 0.979509 -3.6 6.71 0.876606 -10.2 15.4 0.946172 -6.7 10.6 0.910121 -5.5 10.6 0.924397 3.0 6.61 0.463363 6.9 15.6 0.935667 -4.2 7.7 0.913558 0.2 6.0 Data Set 3 (protein-protein)

RMSD 4.2 27.7 23.0 13.5 12.8 9.5 7.0 24.4

< error > RMS of worst 20% 0.949986 1.2 5.9 0.764202 -26.0 40.3 0.377215 -19.6 37.0 0.776368 -10.8 24.3 0.774716 -9.53 22.2 0.822145 -5.81 16.5 0.88332 1.8 11.4 0.638866 -20.0 39.2 Data Set 4 (RNA-peptide)

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 13.6 50.8 26.2 24.0 19.0 49.1 25.9 16.8

R2 0.985424 0.921176 0.927651 0.952807 0.948775 0.94559 0.99063 0.981712

RMSD 7.7 100.5 57.3 65.7 69.6 116.9 80.9 48.2

R2 0.998602 0.91455 0.950862 0.95664 0.892278 0.920653 0.921114 0.924991

< error > -9.2 -41.8 -11.9 -11.1 4.3 -44.2 -24.6 12.4

RMS of worst 20% 24.7 87.1 46.8 43.9 32.2 71.4 36.4 30.0

10

ACS Paragon Plus Environment

< error > 4.7 -89.2 -45.0 -56.5 -43.8 -107.0 -64.5 -10.5

RMS of worst 20% 12.4 148.1 87.8 99.1 95.4 152.6 123.3 86.3

Page 11 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(a)

(b)

Figure 2: a) The root-mean-square deviations (RMSD) between the electrostatic solvation free energies (∆Gel ) predicted by the different generalized Born (GB) models and those predicted by the reference PB for different protein-ligand complex sets. b) The RMSD between the electrostatic binding free energies (∆∆Gel ) predicted by the different GB models and those predicted by the reference PB for different complex sets. the deviation of AMBER GB models from the used PB reference contains a systematic component for this test set. For data set 1, the best correlations with the PB reference is seen for the GBNSR6 (R2 =0.98) and GBMV3 (R2 = 0.91) predictions. Overall, the lowest R2 correlation with the PB ∆∆Gel across different GB models is seen for data set 2 (protein-drug). Perhaps not surprisingly, GB-OBC, which is among the oldest commonly used generalpurpose GB models, yields a very low correlation with the PB (R2 =0.37). The correlation (R2 ) is only 0.63 for GBMV3, which is surprising for a model shown to be superior to other CHARMM GB models. 73,140 Only GBNSR6 maintains a high correlation with the PB for this data set, (R2 ≈ 0.95) (Table 4 and Figure 3), suggesting that challenging structures are less forgiving to the approximations made by multi-parameter GB models; among the tested models, GBNSR6 makes the fewest approximations relative to the PB. On average, the best agreement with the PB across all of the GB models is achieved for data set 3 (proteinprotein). The lowest correlation with the PB is again for the oldest models, GB-HCT and GB-OBC, with (R2 ≈ 0.92), while R2 is as good as ≈0.99 for GBMV2 and GBNSR6. GBNSR6 also yields the smallest RMSD (13.62 kcal/mol). The good agreement with the PB for this set is not unexpected given that for most of the GB models protein structures were always

a large part of the training set against which parameters of the models were fit. Data set 4 (RNA-peptide) appears to be the most challenging for all the GB models, with the exception of GBNSR6, with respect to RMSD from the PB (Table 4 and Figure 2). The average R2 is the second worst among all of the data sets, after data set 2. Again, GBNSR6 is an exception for this set: it shows a relatively small deviation from the PB (RMSD=7.64 kcal/mol, R2 =0.999). Overall, the large average deviation (and RMSD) from the PB reference in set 4 and the lowest deviation for set 1 correlate well with the average net charge of these two data sets, which carry the largest and smallest average net charges, respectively (Table 2). Our explanation for the correlation is as follows: errors in the GB Green function are scaled by the partial atomic charges, Eq. (2). Note that all structures in set 4 contain fragments of nucleic acids (RNA), which carry high partial charges on the phosphate groups, absent from all other sets. The correlation with the PB is worst for set 2 (not set 4, with the highest average charge), which suggests a more nuanced story. As discussed above, we conjecture that the GB Green function performs the worst for set 2, leading to the poorest correlation with the PB, but the resulting error values (RMSD and average) are not as high because the structures are not as highly 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 25

Table 5: Overall deviation of the GB ∆Gel and ∆∆Gel values (kcal/mol) from those computed with the numerical PB equation for 60 complexes. The error metrics are described in Table 3.

charged (in the limit of zero partial charges, even very low correlation with the PB can still result in small average and RMS errors).

Correlation between accuracy in ∆∆Gel and ∆Gel calculations

∆Gel

In general, we find a moderate correlation between the accuracies of ∆Gel and those of ∆∆Gel , Figure 4. Namely, when comparing a group of GB models that agree most with the PB to the group that agrees the least, there is a clear correlation between the accuracy of ∆Gel and ∆∆Gel between these broad groups (but not within each group), whether the accuracy is measured by RMSD or R2 (Figure 4 (a) and (b)). Thus, a very poor agreement of ∆Gel with the PB suggests that a good agreement of ∆∆Gel is unlikely. Even though the range of R2 in GB ∆Gel is very narrow (all R2 values are fairly close to 1 for all the GB models), Figure 4 (b) , there is still a correlation between R2 of ∆Gel and R2 of ∆∆Gel . This correlation suggests that, when interpreting R2 for ∆Gel , decimal numbers up to the fourth or even fifth place can be important for distinguishing between the accuracies of different GB models in terms of their ability to predict ∆∆Gel . In addition, Figure 4 (c) shows that the accuracy of ∆Gel and ∆∆Gel measured by RMSD and R2 , respectively, also correlate to some extent (Figure 4 (c)). Given that RMSD in ∆Gel from the PB can significantly differ across GB models (Table 3), the data shown in Figure 4 (c) suggests that, for GB models on the “opposite ends" of the accuracy spectrum, RMSD can be used instead of R2 to distinguish between the models. One can not reliably predict the accuracy in ∆∆Gel based on that of ∆Gel for models where the difference is not very large. For example, while the RMSD in ∆Gel for GBMV2 (37 kcal/mol) is smaller than that of GBMV3 (63.4 kcal/mol); the RMSD in ∆∆Gel for GBMV2 is larger (40.4 kcal/mol) than that of GBMV3 (26.8 kcal/mol) (Table 5). Another example is the GB-OBC model which is the third least accurate model in ∆Gel values (RMSD=189.8 kcal/mol), yet it is the third most accurate model in ∆∆Gel values (RMSD=32.3 kcal/mol, R2 =0.956) (Table 5).

R2

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 27.2 240.5 189.8 92.1 97.9 577.0 37.0 63.4

0.9999 0.9945 0.9974 0.9994 0.9995 0.9862 0.9999 0.9996

< error > 19.5 -114.6 86.6 -42.4 -61.4 -402.1 15.0 32.3 ∆∆Gel

RMS of worst 20% 50.2 471.8 395.9 198.1 194.2 1123.3 75.2 123.4

GBNSR6 GB-HCT GB-OBC GB-neck2 GBSW GBMV1 GBMV2 GBMV3

RMSD 8.7 56.2 32.3 34.0 34.6 61.0 40.4 26.8

R2 0.994889 0.89914 0.955511 0.949725 0.91599 0.804674 0.913058 0.953638

< error > -2.4 -40.2 -19.4 -19.4 -9.4 -35.9 -22.1 -2.8

RMS of worst 20% 17.3 113.2 66.0 70.8 73.6 191.2 69.7 54.4

Source of the remaining error in GBNSR6 The error of the GB relative to the exact solution of the corresponding Poisson problem can be decomposed into two independent components: the error in approximating the effect of the same (in our case, molecular surface) dielectric boundary, and the error due to the approximate GB Green function. 125 The first error can be eliminated by the use of the perfect, that is PB-based effective Born radii – while computationally ineffective, these effective radii ensure that the self terms in Eq. (2) and the corresponding PB values are identical, so that the remaining error is only due to the cross terms, i.e. the GB Green function itself. As the errors of GBNSR6 on our test sets are the lowest, we chose this model to explore the origin of the remaining error. Due to computational constraints, we focused on set 1 – the set of smallest structures among the four sets used here. We calculated perfect effective Born radii 63,122 for the two complexes from set 1 that showed the largest deviation of GBNSR6 ∆∆Gel values from the PB reference. The results, Figure 5, show that, for the two complexes tested here, most

12

ACS Paragon Plus Environment

Page 13 of 25

GB ∆∆Gel (kcal/mol) GB ∆∆Gel (kcal/mol)

(a) Data set 1 (small complexes) 30 GBNSR6

GB-HCT

GB-OBC

GB-neck2

GBSW

GBMV1

GBMV2

GBMV3

15

0 30

15

0

30

15

0

30

15

0

30

15

0

30

15

0

PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol)

GB ∆∆Gel (kcal/mol) GB ∆∆Gel (kcal/mol)

(b) Data set 2 (protein-drug) 75

GBNSR6

GB-HCT

GB-OBC

GB-neck2

GBSW

GBMV1

GBMV2

GBMV3

60 45 30 15 0 75 60 45 30 15 0 75

60

45

30

15

0

75

60

45

30

15

0

75

60

45

30

15

0

75

60

45

30

15

0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol)

Figure 3: Correlation between electrostatic binding free energies (∆∆Gel ) computed by different GB models relative to the PB. The diagonal lines are x = y. 13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

GB ∆∆Gel (kcal/mol) GB ∆∆Gel (kcal/mol)

(c) Data set 3 (protein-protein) 200

GBNSR6

GB-HCT

GB-OBC

GB-neck2

GBSW

GBMV1

GBMV2

GBMV3

150 100 50 200 150 100 50 200

150

100

50

200

150

100

50

200

150

100

50

200

150

100

50

PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol)

GB ∆∆Gel (kcal/mol) GB ∆∆Gel (kcal/mol)

(d) Data set 4 (RNA-peptide) 300 250

GBNSR6

GB-HCT

GB-OBC

GB-neck2

GBSW

GBMV1

GBMV2

GBMV3

200 150 100 50 0 300 250 200 150 100 50 0 300

250

200

150

100

50

0

300

250

200

150

100

50

0

300

250

200

150

100

50

0

300

250

200

150

100

50

0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 25

PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol) PB ∆∆Gel (kcal/mol)

Figure 3: Correlation between electrostatic binding free energies (∆∆Gel ) computed by different GB models relative to the PB.The diagonal lines are x = y. (cont.) 14 ACS Paragon Plus Environment

Page 15 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(a)

(b)

(c)

Figure 4: Correlation between the accuracy of the GB models in predicting electrostatic binding affinities (∆∆Gel ) and their accuracy in predicting electrostatic solvation free energy components (∆Gel ) over all of the four complex sets. a) RMSD in ∆∆Gel versus RMSD in ∆Gel b) R2 correlations in ∆∆Gel versus R2 correlations in ∆Gel c) R2 correlations in ∆∆Gel versus RMSD in ∆Gel .

(a)

(b)

Figure 5: Comparing the inverse effective Born radii calculated using GBNSR6 and PB (perfect radii) for two complexes from data set 1: a) 1bkf , b) 1fkl. The diagonal lines are x = y.

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

of the effective Born radii from GBNSR6 correlate well with the perfect (PB) radii, with the discrepancies seen mostly for “buried” atoms. We attribute most of the discrepancy to subtle differences between the dielectric boundary employed by this version of GBNSR6 (the dielectric boundary is the molecular surface calculated by MSMS 149 ) and the approximation of the boundary employed by typical PB solvers, see Ref. 136 for further details. One may excpect that reducing/eliminating the difference between the dielectric boundaries used by the GB and PB should further reduce the discrepancy between the respective ∆∆Gel values. This appears to be the case: an algorithm that produces a molecular surface very similar to that used by the PB solvers was recently implemented 136 for GBNSR6 in AMBER, which reduced the ∆∆Gel RMSD error with respect to PB by more than a factor of two, from 3.9 to 1.43 kcal/mol for data set 1 for which the evaluation was performed. 136 One might conjecture that direct use of the perfect effective Born in the GB equation would bring the GB accuracy into an even closer accord with the PB, but that may not be the case, at least based on the two structures we tested here. Namely, when the perfect radii were used instead of the R6 radii in the GB Eq. (2), we found no reduction of the GB. vs. PB error in ∆∆Gel , Table 6. The fact that for GBNSR6 the

GB vs. PB gap in ligand binding calculations. Insensitivity to the input radii set We have explicitly confirmed that our key conclusions are not affected by the use of Bondi intrinsic atomic radii instead of recommended defaults for the models where this choice may matter. Here we tested CHARMM GB models (i. e. GBSW, GBMV1, GBMV2, GBMV3) on our data set 2, using CHARMM atomic radii instead of Bondi radii set. We chose this set of structures for the sensitivity test because this set shows the poorest R2 correlation with the reference, so that if a “native” radii set were to lead to an improvement, we would most likely see it on this set. We used the same logic to also evaluate the sensitivity of GB-neck2 to input radii. Here we used the other “worst performing” data set – set 4. We chose to test GB-neck2 on set 4 because GB-neck2 uses a different set of parameters for nucleic-acids, 150 and our data set 4 is the only one that contains nucleic acid fragments. For the test we employed mbondi3 radii (the suggested default for GB-neck2) instead of Bondi radii. As before, we used the matching set of atomic radii for the reference PB calculations. Our results show that the RMS deviations and R2 correlations change only slightly when using these different atomic radii other than Bondi (Table 7). Thus, all of our conclusions based on the Bondi radii set remain valid.

Table 6: PB ∆∆Gel values (kcal/mol) compared to GBNSR6 (Bondi radii) and the GB using perfect effective Born radii. When the latter are used in the GB formalism, the remaining error relative to the PB comes from the GB Green function itself. PDB ID 1BKF 1FKL

PB -0.76 12.28

GBNSR6(Bondi radii) -9.19 6.23

Page 16 of 25

Table 7: Sensitivity of some GB models to the choice of atomic radii set. Deviations of ∆∆Gel values (kcal/mol) relative to the PB values using the same set of radii for both GB and PB. Included here are root mean square deviation (RMSD) and square of the Pearson correlation coefficient (R2 ).

GB(Perfect radii) -10.72 3.98

Data Set 3 (protein-protein) ∆∆Gel (RMSD) ∆∆Gel (R2 )

agreement with the PB does not necessarily improve with the use of the perfect Born radii, suggests that for this GB model a significant component of the remaining error in ∆∆Gel likely comes from the approximate GB Green function itself, and not from the way the effective Born radii are approximated 109,133 within this model. It remains to be seen whether more sophisticated GB Green functions 123 can further reduce the 16

GBSW GBMV1 GBMV2 GBMV3

Bondi radii 19.0 49.1 25.9 16.8

GB-neck2

Bondi radii 65.7

CHARMM radii Bondi radii CHARMM radii 29.6 0.949 0.972 43.7 0.946 0.937 18.4 0.991 0.971 20.5 0.982 0.960 Data Set 4 (RNA-peptide) ∆∆Gel (R2 ) ∆∆Gel (RMSD)

ACS Paragon Plus Environment

mbondi3 radii 64.7

Bondi radii 0.957

mbondi3 radii 0.953

Page 17 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Conclusions

better agreement with PB. Among the CHARMM GB models and considering all biomolecular complexes together, GBMV3 shows the highest correlation with the PB results for ∆∆Gel , whereas GBMV2 is the most accurate for ∆Gel . Moreover, GBMV1, which is an older analytical model, is less accurate than the other CHARMM GB models when all biomolecular complexes are combined into one data set. Among the GB models available in AMBER, GB-neck2, which has the largest number of fitting parameters, shows a good performance (second to GBNSR6) cross different test cases. In contrast to the other GB models tested here, GBNSR6 uses only a single additional (to the PB) fitting parameter, yet it shows the best agreement with reference PB among all of the models tested. The closer agreement of GBNSR6 with PB consistently holds over all the four complex sets. Moreover, the accuracy of GBNSR6 ∆∆Gel estimates does not change significantly if the perfect (PB-based) Born radii are used instead of the R6 radii calculated within GBNSR6. Therefore, we conjecture that the majority of the remaining error in GBNSR6 estimates of ∆∆Gel are due to the approximate GB Green function itself, rather than the way the effective Born radii is calculated. Future work will assess if this superior accuracy of the GBNSR6 model in Amber translates into better binding affinity ranking when performing MM(GB)SA studies of biomolecular systems for which solid experimental data is available. We have also explored how the accuracy of the GB models in predicting ∆Gel correlates with their ability to predict the binding affinity ∆∆Gel . To summarize our finding with respect to the PB accuracy metric: (1) GB models that tend to be considerably more accurate in electrostatic solvation free energies generally perform better in binding affinity calculations. (2) The GB models that lead to poor correlations with PB in electrostatic solvation free energies are least accurate in terms of electrostatic binding free energies. (3) Nevertheless, the same accuracy rank order of GB models in the electrostatic solvation free energies does not necessarily hold in the corresponding electrostatic binding affinities predictions; 56,63 especially when the differences in accuracies are relatively small.

High computational efficiency of the methods used in calculation of binding free energies is important, since these calculations involve large structural ensembles. The GB models offer a good compromise: they offer efficient methods to account for solvent without explicitly modeling individual water molecules. Of course, the accuracy of the models are absolutely critical. This study is an attempt to systematically investigate the accuracy of several commonly used GB models implemented in major molecular simulation packages in estimating the electrostatic component of the binding free energy, ∆∆Gel . The accuracy was evaluated in terms of agreement with the numerical PB reference across a diverse set of 60 biomolecular complexes. Our findings demonstrate that different GB models lead to very different binding affinity predictions, with R2 correlations varying largely among different complex sets, as well as among different GB models. An assessment of the literature reveals a similar behavior of R2 correlations for other biomolecular complexes with varying energy range and spacing (e.g., 151–155 ). Future studies comparing the GB-predicted solvation free energies directly to experiment will need to evaluate to what extend performance of the promising GB models holds in realistic problems where binding free energies differ by only a few kcal/mol. Among all different complex sets tested in this study, the best agreement with the PB was achieved for the proteinprotein set, most likely because protein structures have been conventionally used as a large part of the training sets used in the parametrization of most GB models. The correlation with PB is also good for the small complexes, which are relatively small and not highly charged. However, the RNA-peptide and protein-drug complex sets appeared to be most challenging. The RNA-peptide complexes are highly charged, which can directly scale up the errors in GB Green function. The force field parameters for “non-standard” drug components seem to be least compatible with the GB parameters, leading to a relatively low correlation with PB for the protein-drug test. In general, we found that newer GB models are in 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

We stress that a close agreement of a GB model with the numerical PB, while being an important metric of the GB accuracy, 124 does not, by itself, guarantee an equally good performance of the model compared to experiment. There are several reasons for the lack of a simple direct correlation here. First, the PB itself is a serious approximation to reality – a mean field, linear response treatment of atomistic electrostatic interactions in the presence of discrete water molecules and mobile ions replaced by a structureless continuum within the PB. Equally importantly, the PB accounts for the electrostatics only, while the total binding free energy includes other important contributions, most notably non-polar and entropic, the latter being particularly difficult to estimate computationally – these contributions have not been considered here. At the same time, a significant disagreement of a GB model with the PB on a certain class of structures is a serious indication of potential problems in practical calculations. Indeed, many publications based on MM(GB)SA and MM(PB)SA have shown a large variability in their predictive power when compared against experimental binding data. 156–160 Based on the very different accuracies of the GB models tested here and elsewhere (e.g., 151,161–163 ), we think this variability may stem largely from the very different electrostatic binding free energies predicted by the different GB models. The results presented here can therefore be used as a guide for the MM(GB)SA/MM(PB)SA community in deciding which GB models should be avoided, and which are more likely than others to lead to more accurate results. The much improved level of accuracy in more recent GB methods provides an encouraging basis for more reliable GB simulations and opens a whole new range of possibilities for reaching larger system sizes in ligand binding calculations. Future studies will need to evaluate the most promising GB models against appropriate experimental data. In this sense, “electrostatic" mutants in which a charged amino acid is mutated into a neutral one may be particularly useful, as the electrostatic effects should dominate the corresponding free energy changes to the binding free energy. Finally, the detailed description of parameters of many practical GB flavors provided

Page 18 of 25

here may serve as a useful reference guide for those wishing to utilize these models in ligand binding computations. Acknowledgement A.V.O. thanks the NIH (GM076121) for partial support. All authors thank Michael Feig for many helpful discussions of the GBMV family of models and the associated best practices; Carlos Simmerling for suggestions regarding best radii set for GB-neck2; Wonpil Im for a comment on GBSW model. M.O.F thanks Dr. Ilisa Lee for doing a literature search on MMPB(GB)SA studies and Dr. Michael E. Zawrotny for maintaining and helping with computer resources at the Institute of Molecular Biophysics in Florida State University. Supporting Information Available: A README file, one folder for each of the four data sets and *.dat files describing the names of the files in the folder are provided. The Amber topology (*.top) and coordinate (*.crd) are included in each of the four folders. The *.pqr files are also included in each of the four folders. This material is available free of charge via the Internet at http://pubs.acs.org/.

References (1) Chodera, J. D.; Mobley, D. L.; Shirts, M. R.; Dixon, R. W.; Branson, K.; Pande, V. S. Curr. Opin. Struct. Biol. 2011, 21, 150–160. (2) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T. et al. J. Am. Chem. Soc. 2015, 137, 2695–2703. (3) Jorgensen, W. L. Science 2004, 303, 1813– 1818. (4) Mobley, D. L.; Dill, K. A. Structure 2009, 17, 489–498. (5) Shirts, M. R.; Mobley, D. L.; Brown, S. P. In Structure Based Drug Design, 1st ed.; Merz, K. M., Ringe, D., Reynolds, C. H., Eds.; 18

ACS Paragon Plus Environment

Page 19 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Lecture Notes in Computer Science; Cambridge University Press: Cambridge, New York USA, 2010; pp 61–85.

(19) Mobley, D. L.; Bayly, C. I.; Cooper, M. D.; Shirts, M. R.; Dill, K. A. J. Chem. Theory Comput. 2009, 5, 350–358.

(6) Kastritis, P. L.; Bonvin, A. M. J. J. J. R. Soc. Interface 2012, 10.

(20) Shivakumar, D.; Deng, Y.; Roux, B. J. Chem. Theory Comput. 2009, 5, 919–930.

(7) Lau, A. Y.; Roux, B. A. Nat. Struct. Mol. Biol. 2011, 18, 283–287.

(21) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926–935.

(8) Lin, Y.-L.; Meng, Y.; Jiang, W.; Roux, B. Proc. ˙ 2013, 110, 1664–1669. Natl. Acad. Sci. U. S.A. (9) Gilson, M. K.; Zhou, H.-X. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21–42.

(22) Horn, H. W.; Swope, W. C.; Pitera, J. W.; Madura, J. D.; Dick, T. J.; Hura, G. L.; HeadGordon, T. J. Chem. Phys. 2004, 120, 9665– 9678.

(10) Pollard, T. D. Mol. Biol. Cell 2010, 21, 4061– 4067.

(23) Izadi, S.; Anandakrishnan, R.; Onufriev, A. V. 2014, 5, 3863–3871.

(11) Kollman, P. Chem. Rev. 1993, 93, 2395–2417.

(24) Söderhjelm, P.; Tribello, G. A.; Parrinello, M. Proc. Natl. Acad. Sci. USA 2012, 109, 5170– 5175.

(12) Beveridge, D. L.; ; DiCapua, F. M. Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 431–492. (13) Åqvist, J.; Medina, C.; Samuelsson, J.-E. Protein Eng. 1994, 7, 385–391.

(25) Proctor, E. A.; Yin, S.; Tropsha, A.; Dokholyan, N. V. Biophys. J. 2012, 102, 144 – 151.

(14) Srinivasan, J.; Cheatham, T. E., III.; Cieplak, P.; Kollman, P. A.; Case, D. A. J. Am. Chem. Soc. 1998, 120, 9401–9409.

(26) Gao, K.; Yin, J.; Henriksen, N. M.; Fenley, A. T.; Gilson, M. K. J. Chem. Theory Com˘ S4564. put. 2015, 11, 4555–âA ¸

(15) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., III. Acc. Chem. Res. 2000, 33, 889–897.

(27) Izadi, S.; Aguilar, B.; Onufriev, A. V. J. Chem. Theory Comput. 2015, 11, 4450–4459. (28) Cramer, C. J.; Truhlar, D. G. Chem. Rev. 1999, 99, 2161–2200. (29) Honig, B.; Nicholls, A. Science 1995, 268, 1144–1149.

(16) Gohlke, H.; Kiel, C.; Case, D. A. J. Mol. Biol. 2003, 330, 891–913.

(30) Beroza, P.; Case, D. A. Methods Enzymol. 1998, 295, 170–189.

(17) Hu, Y.; Sherborne, B.; Lee, T.-S.; Case, D. A.; York, D. M.; Guo, Z. J. Comput. Aided Mol. Des. 2016, 30, 533–539.

(31) Madura, J. D.; Davis, M. E.; Gilson, M. K.; Wade, R. C.; Luty, B. A.; McCammon, J. A. Rev. Comp. Chem. 1994, 5, 229–267.

(18) Mobley, D. L.; Dill, K. A.; Chodera, J. D. J. Phys. Chem. B 2008, 112, 938–946.

(32) Gilson, M. K. Curr. Opin. Struct. Biol. 1995, 5, 216–223. 19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(33) Scarsi, M.; Apostolakis, J.; Caflisch, A. J. Phys. Chem. A 1997, 101, 8098–8106.

Page 20 of 25

(48) Mackoy, T.; Harris, R. C.; Johnson, J.; Mascagni, M.; Fenley, M. O. Commun. Comput. Phys. 2013, 13, 195–206.

(34) Luo, R.; David, L.; Gilson, M. K. J. Comput. Chem. 2002, 23, 1244–1253.

(49) Boschitsch, A. H.; Fenley, M. O. In Computational Electrostatics for Biological Applications: Geometric and Numerical Approaches to the Description of Electrostatic Interaction Between Macromolecules; Rocchia, W., Spagnuolo, M., Eds.; Springer International Publishing: Switzerland, 2015; pp 73–110.

(35) Simonson, T. Rep. Prog. Phys. 2003, 66, 737– 787. (36) Onufriev, A. In Modeling Solvent Environments, 1st ed.; Feig, M., Ed.; Wiley: USA, 2010; pp 127–165.

(50) Fenley, M. O.; Mascagni, M.; McClain, J.; Silalahi, A. R. J.; Simonov, N. A. J. Chem. Theory Comput. 2009, 6, 300–314.

(37) Labute, P. J. Comput. Chem. 2008, 29, 1693– 1698. (38) Nicholls, A.; Honig, B. J. Comput. Chem. 1991, 12, 435–445.

(51) Baker, N. A. Curr. Opin. Struct. Biol. 2005, 15, 137–143.

(39) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; ˙ McCammon, J. A. Proc. Natl. Acad. Sci. U. S.A. 2001, 98, 10037–10041.

(52) Boschitsch, A. H.; Fenley, M. O. J. Chem. Theory Comput. 2011, 7, 1524–1540.

(40) Bashford, D.; Karplus, M. Biochemistry 1990, 29, 10219–10225.

(53) Chen, D.; Chen, Z.; Chen, C.; Geng, W.; Wei, G.-W. J. Comput. Chem. 2011, 32, 756– 770.

(41) Im, W.; Beglov, D.; Roux, B. Comput. Phys. Commun. 1998, 111, 59–75.

(54) Wang, J.; Tan, C.; Chanco, E.; Luo, R. Physical chemistry chemical physics : PCCP 2010, 12, 1194–1202.

(42) Madura, J. D.; Davist, M. E.; Gilson, M. K.; Wades, R. C.; Luty, B. A.; McCammon, J. A. Reviews in Computational Chemistry; John Wiley and Sons, Inc., 2007; pp 229–267.

(55) Sarkar, S.; Witham, S.; Zhang, J.; Zhenirovskyy, M.; Rocchia, W.; Alexov, E. Communications in computational physics 2013, 13, 269–284.

(43) Altman, M. D.; Bardhan, J. P.; White, J. K.; Tidor, B. J. Comput. Chem. 2009, 30, 132–153.

(56) Harris, R. C.; Mackoy, T.; Fenley, M. O. J. Chem. Theory Comput. 2015, 11, 705–712.

(44) Li, B.; Cheng, X.; Zhang, Z. SIAM J. Appl. Math. 2011, 71, 2093–2111.

(57) Fenley, M. O.; Harris, R. C.; Mackoy, T.; Boschitsch, A. H. J. Comput. Chem. 2015, 36, 235–243.

(45) Simonov, N. A.; Mascagni, M.; Fenley, M. O. J. Chem. Phys. 2007, 127, 185105.

(58) Hoijtink, G.; De Boer, E.; Van der Meij, P.; Weijland, W. Recl. Trav. Chim. Pays-Bas 1956, 75, 487–503.

(46) Boschitsch, A. H.; Fenley, M. O.; Zhou, H.-X. J. Phys. Chem. B 2002, 106, 2741–2754. (47) Boschitsch, A. H.; Fenley, M. O. J. Comput. Chem. 2004, 25, 935–955.

(59) Tucker, S. C.; Truhlar, D. G. Chem. Phys. Lett. 1989, 157, 164–170. 20

ACS Paragon Plus Environment

Page 21 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(60) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc. 1990, 112, 6127–6129.

(75) Romanov, A. N.; Jabin, S. N.; Martynov, Y. B.; Sulimov, A. V.; Grigoriev, F. V.; Sulimov, V. B. J. Phys. Chem. A 2004, 108, 9323–9327.

(61) Bashford, D.; Case, D. A. Annu. Rev. Phys. Chem. 2000, 51, 129–152.

(76) Dominy, B. N.; Brooks III., C. L. J. Phys. Chem. B 1999, 103, 3765–3773.

(62) Chen, J.; Brooks III, C. L.; Khandogin, J. Curr. Opin. Struct. Biol. 2008, 18, 140 – 148.

(77) David, L.; Luo, R.; Gilson, M. K. J. Comput. Chem. 2000, 21, 295–309.

(63) Harris, R. C.; Mackoy, T.; Fenley, M. O. Mol. Based Math. Biol. 2013, 1, 63–74.

(78) Tsui, V.; Case, D. A. J. Am. Chem. Soc. 2000, 122, 2489–2498.

(64) Feig, M.; Brooks III, C. L. Curr. Opin. Struct. Biol. 2004, 14, 217–224.

(79) Calimet, N.; Schaefer, M.; Simonson, T. Proteins: Struct., Funct., Bioinf. 2001, 45, 144– 158.

(65) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Chem. Phys. Lett. 1995, 246, 122–129.

(80) Spassov, V. Z.; Yan, L.; Szalma, S. J. Phys. Chem. B 2002, 106, 8726–8738.

(66) Haberthür, U.; Caflisch, A. J. Comput. Chem. 2007, 29, 701–715.

(81) Simmerling, C.; Strockbine, B.; Roitberg, A. E. J. Am. Chem. Soc. 2002, 124, 11258–11259.

(67) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. 1996, 100, 19824–19839.

(82) Wang, T.; Wade, R. C. Proteins: Struct., Funct., Bioinf. 2003, 50, 158–169.

(68) Schaefer, M.; Karplus, M. J. Phys. Chem. 1996, 100, 1578–1599.

(83) Nymeyer, H.; García, A. E. Proc. Natl. Acad. ˙ 2003, 100, 13934–13949. Sci. U. S.A.

(69) Qiu, D.; Shenkin, P. S.; Hollinger, F. P.; Still, W. C. J. Phys. Chem. A 1997, 101, 3005– 3014.

(84) Onufriev, A.; Bashford, D.; Case, D. A. Proteins: Struct., Funct., Bioinf. 2004, 55, 383– 394.

(70) Edinger, S. R.; Cortis, C.; Shenkin, P. S.; Friesner, R. A. J. Phys. Chem. B 1997, 101, 1190– 1197.

(85) Gallicchio, E.; Levy, R. M. J. Comput. Chem. 2004, 25, 479–499.

(71) Jayaram, B.; Liu, Y.; Beveridge, D. L. J. Chem. Phys. 1998, 109, 1465–1471.

(86) Lee, M. C.; Duan, Y. Proteins: Struct., Funct., Bioinf. 2004, 55, 620–634.

(72) Ghosh, A.; Rapp, C. S.; Friesner, R. A. J. Phys. Chem. B 1998, 102, 10983–10990.

(87) Im, W.; Lee, M. S.; Brooks III, C. L. J. Comput. Chem. 2003, 24, 1691–1702.

(73) Lee, M. S.; Salsbury, F. R.; Brooks III, C. L. J. Chem. Phys. 2002, 116, 10606–10614.

(88) Nguyen, H.; Perez, A.; Bermeo, S.; Simmerling, C. J. Chem. Theory Comput. 2015, 11, 3714–3728.

(74) Felts, A. K.; Harano, Y.; Gallicchio, E.; Levy, R. M. Proteins: Struct., Funct., Bioinf. 2004, 56, 310–321.

(89) Lange, A. W.; Herbert, J. M. J. Chem. Theory Comput. 2012, 8, 1999–2011.

21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 25

(90) Tjong, H.; Zhou, H.-X. J. Phys. Chem. B 2007, 111, 3055–3061.

(102) Helgaker, T.; Klopper, W.; Tew, D. P. Molec. Phys. 2008, 106, 2107–2143.

(91) Xu, Z.; Cheng, X.; Yang, H. J. Chem. Phys. 2011, 134, 064107.

(103) Fennell, C. J.; Kehoe, C. W.; Dill, K. A. Proc. ˙ 2011, 108, 3234–3239. Natl. Acad. Sci. U. S.A.

(92) Izadi, S.; Anandakrishnan, R.; Onufriev, A. V. J. Chem. Theory Comput. 2016, 12, 5946–5959, PMID: 27748599.

(104) Merz, K. M. J. Chem. Theory Comput. 2010, 6, 1769–1776. (105) Harris, R. C.; Boschitsch, A. H.; Fenley, M. O. J. Chem. Theory Comput. 2013, 9, 3677–3685.

(93) Case, D. A.; Cheatham III, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz Jr, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. J. Comput. Chem. 2005, 26, 1668– 1688.

(106) Harris, R. C.; Boschitsch, A. H.; Fenley, M. O. J. Chem. Phys. 2014, 140. (107) Sørensen, J.; Fenley, M. O.; Amaro, R. E. In Computational Electrostatics for Biological Applications: Geometric and Numerical Approaches to the Description of Electrostatic Interaction Between Macromolecules; Rocchia, W., Spagnuolo, M., Eds.; Springer International Publishing: Switzerland, 2015; pp 39–71.

(94) Brooks, B. R.; Brooks III, C. L.; MacKerell, A. D., Jr.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M. et al. J. Comput. Chem. 2009, 30, 1545–1614. (95) Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. Bioinformatics 2013, 29, 845–854.

(108) Cramer, C. J.; Truhlar, D. G. Acc. Chem. Res. 2008, 41, 760–768. (109) Aguilar, B.; Onufriev, A. V. J. Chem. Theory Comput. 2012, 8, 2404–2411.

(96) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. J. Comput. Chem. 2005, 26, 1781–1802.

(110) Nguyen, H.; Maier, J.; Huang, H.; Perrone, V.; Simmerling, C. J. Am. Chem. Soc. 2014, 136, 13959–13962.

(97) Eastman, P.; Pande, V. Comput. Sci. Eng. 2010, 12, 34–39.

(111) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235–242.

(98) Qin, S.; Zhou, H.-X. Biopolymers 2007, 86, 112–118.

(112) Katkova, E. V.; Onufriev, A. V.; Aguilar, B.; Sulimov, V. B. J. Mol. Graphics Mod. 2017, 72, 70 – 80.

(99) Dong, F.; Zhou, H.-X. Proteins: Struct., Funct., Bioinf. 2006, 65, 87–102.

(113) Dolinsky, T. J.; Nielsen, J. E.; McCammon, J. A.; Baker, N. A. Nucleic Acids Res. 2004, 32, W665–W667.

(100) Knight, J. L.; Brooks III., C. L. J. Comput. Chem. 2011, 32, 2909–2923. (101) Bartlett, R. J.; Musiał, M. Rev. Mod. Phys. 2007, 79, 291–352. 22

ACS Paragon Plus Environment

Page 23 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(114) Dolinsky, T. J.; Czodrowski, P.; Li, H.; Nielsen, J. E.; Jensen, J. H.; Klebe, G.; Baker, N. A. Nucleic Acids Res. 2007, 35, W522–W525.

(126) Sigalov, G.; Scheffel, P.; Onufriev, A. J. Chem. Phys. 2005, 122, 094511. (127) Hawkins, G.; Cramer, C.; Truhlar, D. J. Phys. Chem. 1996, 100, 19824–19839.

(115) Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham III, T. E.; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, S.; Janowski, P.; Kaus, J.; Kovalenko, A. et al. AMBER 2015; University of California, San Francisco, 2015.

(128) Schaefer, M.; Van Vlijmen, H.; Karplus, M. Adv. Protein Chem. 1998, 51, 1–57. (129) Tsui, V.; Case, D. Biopolymers (Nucl. Acid. Sci.) 2001, 56, 275–291. (130) Onufriev, A.; Bashford, D.; Case, D. J. Phys. Chem. B 2000, 104, 3712–3720.

(116) Hu, C. Y.; Kokubo, H.; Lynch, G. C.; Bolen, D. W.; Pettitt, B. M. Prot. Sci. 2010, 19, 1011–1022.

(131) Onufriev, A.; Bashford, D.; Case, D. Proteins: Struct., Funct., Bioinf. 2004, 55, 383–394.

(117) Kokubo, H.; Hu, C. Y.; Pettitt, B. M. J. Am. Chem. Soc. 2011, 133, 1849–1858.

(132) Nguyen, H.; Roe, D. R.; Simmerling, C. J. Chem. Theory Comput. 2013, 9, 2020–2034.

(118) Kokubo, H.; Harris, R. C.; Asthagiri, D.; Pettitt, B. M. J. Phys. Chem. B 2013, 117, 16428– 16435.

(133) Aguilar, B.; Shadrach, R.; Onufriev, A. V. J. Chem. Theory Comput. 2010, 6, 3613–3630. (134) Sigalov, G.; Fenley, A.; Onufriev, A. J. Chem. Phys. 2006, 124, 124902.

(119) Harris, R. C.; Pettitt, B. M. Proc. Natl. Acad. ˙ 2014, 111, 14681–14686. Sci. U. S.A.

(135) Mongan, J.; Simmerling, C.; A. McCammon, J.; A. Case, D. A.; Onufriev, A. J. Chem. Theory Comput. 2007, 3, 156–169.

(120) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; ˙ McCammon, J. A. Proc. Natl. Acad. Sci. U. S.A. 2001, 98, 10037–10041.

(136) Forouzesh, N.; Izadi, S.; Onufriev, A. V. J. Chem. Inf. Model. 2017, 57, 2505–2513, PMID: 28786669.

(121) Srinivasan, J.; Trevathan, M. W.; Beroza, P.; Case, D. A. Theor. Chem. Acc. 1999, 101, 426– 434.

(137) Homeyer, N.; Gohlke, H. Mol. Inf. 2012, 31, 114–122.

(122) Onufriev, A.; Case, D. A.; Bashford, D. J. Comput. Chem. 2002, 23, 1297–1304.

(138) Miller III, B. R.; McGee Jr, T. D.; Swails, J. M.; Homeyer, N.; Gohlke, H.; Roitberg, A. E. J. Chem. Theory Comput. 2012, 8, 3314–3321.

(123) Onufriev, A. V.; Sigalov, G. J. Chem. Phys. 2011, 134, 164104. (124) Onufriev, A. V.; Izadi, S. Wiley Interdisciplinary Reviews: Computational Molecular Science e1347–n/a, e1347.

(139) Homeyer, N.; Gohlke, H. J. Comput. Chem. 2013, 34, 965–973. (140) Lee, M. S.; Feig, M.; Salsbury, F. R., Jr.; Brooks III, C. L. J. Comput. Chem. 2003, 24, 1348–1356.

(125) Onufriev, A. V.; Aguilar, B. J. Theor. Comput. Chem. 2014, 13, 1440006.

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(141) Im, W.; Feig, M.; Brooks III, C. L. Biophys. J. 2003, 85, 2900–2918.

Page 24 of 25

(156) Wright, D. W.; Hall, B. A.; Kenway, O. A.; Jha, S.; Coveney, P. V. J. Chem. Theory Comput. 2014, 10, 1228–1241, PMID: 24683369.

(142) Tjong, H.; Zhou, H.-X. J. Chem. Theory Comput. 2008, 4, 507–514.

(157) Mikulskis, P.; Genheden, S.; Rydberg, P.; Sandberg, L.; Olsen, L.; Ryde, U. J. Comput. Aided Mol. Des. 2012, 26, 527–541.

(143) Nina, M.; Beglov, D.; Roux, B. J. Phys. Chem. B 1997, 101, 5239–5248.

(158) Suenaga, A.; Okimoto, N.; Hirano, Y.; Fukui, K. PLOS ONE 2012, 7, 1–9.

(144) Nina, M.; Im, W.; Roux, B. Biophysical Chemistry 1999, 78, 89 – 96.

(159) Panel, N.; Sun, Y. J.; Fuentes, E. J.; Simonson, T. Frontiers in Molecular Biosciences 2017, 4, 65.

(145) Swanson, J. M. J.; Adcock, S. A.; McCammon, J. A. J. Chem. Th. Comp. 2005, 1, 484– 493.

(160) Genheden, S. J. Computer-Aided Mol. Des. 2011, 25, 1085–1093.

(146) Nicholls, A.; Mobley, D. L.; Guthrie, J. P.; Chodera, J. D.; Bayly, C. I.; Cooper, M. D.; Pande, V. S. J. Med. Chem. 2008, 51, 769–779.

(161) Ren, X.; Tortorella, M.; Wang, J.; Wang, C. Sci.Rep. 2017, 7, 14934.

(147) Chen, J. Journal of Chemical Theory and Computation 2010, 6, 2790–2803. (148) Rankin, K. N.; Sulea, T.; Purisima, E. O. J. Comput. Chem. 2003, 24, 954–962.

(162) Ylilauri, M.; PentikÃd’inen, O. T. J. Chem. Inf. Model. 2013, 53, 2626–2633, PMID: 23988151.

(149) Sanner, M. F.; Olson, A. J.; Spehner, J. C. Biopolymers 1996, 38, 305–320.

(163) Wittayanarakul, K.; Hannongbua, S.; Feig, M. J. Comput. Chem. 2008, 29, 673–685.

(150) Nguyen, H.; Pérez, A.; Bermeo, S.; Simmerling, C. J. Chem. Theory Comput. 2015, 11, 3714–3728. (151) Vorontsov, I. I.; Miyashita, O. J. Comput. Chem. 2011, 32, 1043–1053. (152) Carra, C.; Saha, J.; Cucinotta, F. A. Journal of Molecular Modeling 2012, 18, 3035–3049. (153) Genheden, S.; Luchko, T.; Gusarov, S.; Kovalenko, A.; Ryde, U. J. Phys. Chem. B 2010, 114, 8505–8516, PMID: 20524650. (154) Delgado-Soler, L.; Pinto, M.; Tanaka-Gil, K.; Rubio-Martinez, J. J. Chem. Inf. Model. 2012, 52, 2107–2118, PMID: 22794663. (155) Rastelli, G.; Rio, A. D.; Degliesposti, G.; Sgobba, M. J. Comput. Chem. 2010, 31, 797– 810. 24

ACS Paragon Plus Environment

Page 25 of 25

TABLE OF CONTENTS GRAPHIC

!"#$%&'()*+,'%*-%./

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

25

ACS Paragon Plus Environment