OPLS3: A Force Field Providing Broad Coverage of Drug-like Small

Nov 17, 2015 - POVME 3.0: Software for Mapping Binding Pocket Flexibility .... Virus NS5B Replicase Palm Site Allosteric Inhibitor (BMS-929075) Advanc...
0 downloads 4 Views 3MB Size
Article pubs.acs.org/JCTC

OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins Edward Harder,*,† Wolfgang Damm,† Jon Maple,† Chuanjie Wu,† Mark Reboul,† Jin Yu Xiang,† Lingle Wang,† Dmitry Lupyan,† Markus K. Dahlgren,† Jennifer L. Knight,† Joseph W. Kaus,† David S. Cerutti,† Goran Krilov,† William L. Jorgensen,§ Robert Abel,† and Richard A. Friesner‡ †

Schrodinger, Inc., 120 West 45th Street, New York, New York 10036, United States Department of Chemistry, Columbia University, 3000 Broadway, New York, New York 10027, United States § Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States ‡

S Supporting Information *

ABSTRACT: The parametrization and validation of the OPLS3 force field for small molecules and proteins are reported. Enhancements with respect to the previous version (OPLS2.1) include the addition of offatom charge sites to represent halogen bonding and aryl nitrogen lone pairs as well as a complete refit of peptide dihedral parameters to better model the native structure of proteins. To adequately cover medicinal chemical space, OPLS3 employs over an order of magnitude more reference data and associated parameter types relative to other commonly used small molecule force fields (e.g., MMFF and OPLS_2005). As a consequence, OPLS3 achieves a high level of accuracy across performance benchmarks that assess small molecule conformational propensities and solvation. The newly fitted peptide dihedrals lead to significant improvements in the representation of secondary structure elements in simulated peptides and native structure stability over a number of proteins. Together, the improvements made to both the small molecule and protein force field lead to a high level of accuracy in predicting protein−ligand binding measured over a wide range of targets and ligands (less than 1 kcal/mol RMS error) representing a 30% improvement over earlier variants of the OPLS force field.

I. INTRODUCTION Realistic atomistic simulation of complex molecular systems is contingent upon the availability of an accurate and reliable molecular mechanics force field. The treatment of systems of interest in biology and materials science typically requires models on the order of thousands to millions of atoms and time scales from nanoseconds to seconds; even at the low end of both of these ranges, purely quantum mechanical dynamics is extremely expensive (and not necessarily more accurate, as quantum chemical methods that are tractable for systems of this size have deficiencies of their own). Hence, molecular mechanics force field development has been a central goal of computational chemistry for the past four decades. Force field development is a highly challenging, laborintensive effort; it takes large groups working together, over an extended time period, to produce a model that is useful in addressing practical problems. Beginning in the early 1980s, a number of force fields have emerged as the result of such collective efforts. In the biomolecular simulation area (which will be the main focus of the present work), these force fields include a number of variants of the OPLS,1−4 AMBER,5−10 CHARMM,11,12 and Gromos13,14 potential energy functions. All of these force fields share certain basic characteristics: they rely on a fixed charge (as opposed to explicitly polarizable) © 2015 American Chemical Society

electrostatic model, they employ a standard (Lennard-Jones 6− 12) van der Waals potential to model electronic repulsion and dispersive nonbonded interactions, and they utilize harmonic stretching and bending terms, and dihedral angle based torsional potentials, to model the valence component of the energy. Ultimately, explicit inclusion of polarization effects will be required to achieve the best fidelity with the actual underlying physics of the system; however, parametrization of a polarizable force field adds considerable additional complexity, and fixed charge models have proven to be capable of far more precision, in a wide range of applications, than one might have originally surmised.15−22 Hence the aforementioned force field development groups have continued to focus on producing the best possible fixed charge model, as manifested by comparison to experimental data.2,8,10,12,23,24 As well as being deployed as the workhorse solutions in current modeling projects, such models serve as benchmarks for the coming generations of polarizable force fields.25−28 A truly compelling polarizable model must reproduce the (very extensive) successes of fixed charge Received: September 7, 2015 Published: November 17, 2015 281

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation

Section IV presents FEP calculations for a large and diverse set of ligands and receptors, comparing with experimental binding affinity data. The data was not used to optimize the force field parameters and hence constitutes an unbiased evaluation of both protein and small molecule force field quality. We compare results to earlier variants of the OPLS force field (OPLS_2005, OPLS2.1), enabling the accuracy of the results to be assessed as various upgrades to the force field are implemented. This data shows very significant improvement as the functional form and parametrization of the model are improved; furthermore, in many cases, large outliers are eliminated by the modifications of the force field and afford a straightforward physical interpretation. The results for the most recent version of the force field and sampling protocol display an RMS error suggesting that, for the first time, the errors in the simulation are approaching estimates, perhaps within a factor of 2, of the experimental error. Further improvements in both the force field and the sampling protocol will be required to achieve results at this level. However, the robustness of the methodology and systematic reduction in errors obtained via force field modification are highly encouraging with regard to deployment of FEP and related methods in projects such as drug discovery and materials simulation. We consider in more detail the theoretical and practical implications of the current results in section V, the Discussion. Section VI, the Conclusion, summarizes the results and discusses future directions.

models, as well as improving the results in cases where explicit polarization is important to achieving high accuracy. In the present paper, we describe the development of a new generation of models based on the OPLS force field of Jorgensen and co-workers.3,4 OPLS was one of the first models in which parameters were extensively optimized to reproduce liquid state thermodynamic properties for a variety of small organic molecules; this approach has been subsequently adopted by many of the force field development groups noted above. These efforts yielded a core set of nonbonded (van der Waals) parameters, which still remain at the heart of OPLS development, including the OPLS_2005,1 OPLS2.1,22,29 and OPLS3 parametrizations discussed below. However, other components of the force field and associated development protocols have been substantially enhanced based on improvements in the underlying software and technologies used in their parametrization and the exponential growth of computing power. The key improvements discussed below are as follows: (1) Extensive use of high quality quantum chemical calculations to generate more accurate valence and torsional parametrizations. The OPLS3 model contains nearly 2 orders of magnitude more explicitly fit torsional parameters than alternative small molecule force fields. (2) Improvements of the charge model, including the use of a CM1A-BCC methodology30,31 to compute partial charges, and incorporation of virtual (off-atom centered) sites to better represent lone pair and sigma hole charge distributions. (3) The development of an improved protein force field. (4) Extensive testing of the models against large data sets of small molecule, protein, and protein−ligand conformations and energetics. These comparisons guide key heuristic assumptions made in the models and ultimately provide validation of the model without further parameter adjustment. A novel aspect of the present work is the use of free energy perturbation (FEP) theory to evaluate the ability of the force field to predict protein−ligand binding affinities. The FEP data set used here includes hundreds of ligands and multiple protein target classes and enables us to benchmark both the protein and small molecule models in the force field. The paper is organized as follows. In section II, we review the functional form of the OPLS force field family, briefly discuss the heuristic assumptions inherent in using a functional form of this type, and outline the protocol used for parameter development, again discussing the rationale for the approaches we have taken as well as the mechanics. Section III presents the major development efforts focused on the valence parametrization and the nonbonded parametrization, including the new charge model. These sections include a discussion of the small molecule quantum chemical and experimental data sets used for the parametrization and accuracy benchmarks achieved for different versions of the force field in both training sets and test sets. This section also includes a discussion of the parametrization protocol used in the development of the protein force field including investigations of protein stability under simulation for a wide variety of structures, examining a significant fraction of the standard test cases used for this purpose in the literature. Although, improvement of the OPLS protein force field has received less attention than for CHARMM and AMBER over the past decade,2,23,24 the major revision of the protein force field in OPLS3 reported here appears to display performance that is competitive with the state of the art.

II. OUTLINE OF THE OPLS FORCE FIELD: FUNCTIONAL FORM, PARAMETRIZATION, AND KEY APPROXIMATIONS The functional form of the OPLS force field is given by E=

∑ [qiqje 2/rij + 4εij(σij12/rij12 − σij6/rij6)]fij i 1.0 and < 2.0 kcal/mol

% > 2.0 kcal/mol

MMFF OPLS_2005 OPLS2.1 OPLS3

33 33 89 88

32 33 8 9

35 32 3 3

RMS error distributions for the OPLS3 training set. As expected, the OPLS3 RMS errors are very small in the great majority of cases. The consistency with OPLS2.1 reflects a training set that is largely the same between the respective models. Both OPLS_2005 and MMFF display a high fraction of cases in which the RMS error is large (greater than 2 kcal/ mol). These results reveal the major deficiencies in the older force fields; a very high fraction of molecules is going to have at least one very inaccurate torsional parameter, and many will have more than one. Parameter transferability is assessed against additional data sets that enumerate relative energies between conformational minima. Three such sets of molecules are investigated: 1. Molecules are in common with the OPLS3 training set. 2. Molecules that differ from the training set with constituent torsion types fully covered by OPLS3. 3. Molecules that contain torsion types not well covered by OPLS3. Structures are developed from force field based conformational searches followed by unrestrained B3LYP/6-31G* optimizations. QM energies are evaluated using the LMP2 level discussed above. Torsion restraints are applied to the respective force field based minimizations. Corresponding RMS errors are reported in Table 3. Not surprisingly the OPLS3 error correlates with data 285

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation

primarily based on liquid state simulations of small organic molecules.3,43−45 OPLS3 has a total of 124 atom types with different van der Waals parameter sets (Lennard-Jones σ and ε values). Parameters for sulfides and disulfides are based on the Kaminski et al. revision of OPLS-AA,1 while ion parameters were taken from Jensen and Jorgensen.46 These parameters are used in all three OPLS variants that we discuss herein. E. Charge Model. The OPLS3 force field employs a CM1A-BCC based charge model that was initially introduced with OPLS2.0.29 The model is based on a combination of Cramer-Truhlar CM1A charges31 with specifically fit bond charge correction terms (BCC). Up through OPLS2.1,22 this model has employed atom-centered partial charges to represent the molecular charge distribution. In OPLS3, we introduce virtual (off-atom centered) sites for aromatic ring nitrogens and aryl halogens excluding fluorine; we show below that this augmentation of the charge model yields important quantifiable improvements in the prediction of binding free energy. The BCC corrections are fit to reproduce both QM electrostatic potentials (using HF/6-31G* over the 11,845 model compounds used for torsional fitting and an additional 357 aromatic rings) as well as solvation free energies for 153 molecules for which experimental data is available. BCC’s can range in magnitude from close to zero to nearly a full elementary charge. Protein charges, treated specially, conform to values published by Kaminski et al. for OPLS-AA/L.1 For the presently covered functional groups, the geometry of the off-site charge is defined in 2 ways. For aryl halides the offsite charge is located along the carbon-halide bond vector, distal to the aromatic ring. The distance between the halide atom and the off-site charge is 1.6 Å for chlorine and bromine and 1.8 Å for iodine. For the aromatic divalent nitrogens (n) bonded to atoms x1 and x2, the off-site charge resides in the plane formed by these three atoms, 0.4 Å from the nitrogen along the bisector of the x1−n−x2 angle. The geometries are consistent with those developed by Jorgensen et al.32 for aryl halides and by Harder et al. for the aryl nitrogens28 and are illustrated in Figure 1.

Table 3. Conformational Energy RMS Error with Respect to QM (kcal/mol)a

a

model

training molecules (6500)

test molecules: well covered (1600)

test molecules: missing coverage (3400)

MMFF OPLS_2005 OPLS2.1 OPLS3

2.2 2.1 0.9 0.9

2.2 2.0 1.2 1.2

2.1 2.3 2.0 2.0

Number of molecules in each set is shown in parentheses.

set similarity to the torsion training set. Good accuracy is seen when the constituent torsions of a molecule are fully covered by OPLS3 fitted torsion types (RMSE ∼ 1 kcal/mol), and poor accuracy is seen otherwise (RMSE ∼ 2 kcal/mol). An analysis of the 6 M compound database of drug-like molecules discussed above indicates that 93% of the torsional bonds found in drug-like molecules are well covered by OPLS3 fitted types. Poor accuracy (similar to when OPLS3 has poor coverage) is seen across all data sets for both OPLS_2005 and MMFF. The results shown above demonstrate that OPLS3 is much better able to represent the conformational propensities of organic molecules relevant to medicinal chemistry than older force fields such as OPLS_2005 and MMFF (or any molecular mechanics force field that does not contain tens of thousands of torsional parameters). However, the coverage of OPLS3 is not complete; we estimate that the probability that a typical new molecule of interest in a drug discovery program will have at least one missing torsion is 33%. To provide coverage for these cases, we have developed an automated version of our fitting protocol, which can be deployed by users to generate high quality parameters for such “missing” torsions (i.e., those which our atom typing software identifies as not possessing a good match in our parameter database). We have tested this module, referred to herein as the FFBuilder, on a wide variety of molecules. Validation results for FFBuilder are summarized in Table 4. Table 4. Torsional Profile RMS Errors with Respect to QM (kcal/mol) before and after Applying the FFBuildera data set

no. ligands

no. missing torsions

RMS error

before FFBuilder after FFBuilder

199 199

43 0

1.6 0.6

a The 199 ligands are from the protein−ligand binding accuracy validation set documented in Table 11.

The combination of OPLS3 and the FFBuilder provides an accurate and reliable solution to the problem of assigning torsional parameters across a wide range of organic compounds. The CPU time required to run the FFBuilder for a single torsion is typically less than 1 h in with a modest computational investment (12 CPUs). This is quite small when compared to, for example, the computational effort required for an FEP simulation. It is important to note that without the extensive torsional parametrization found in OPLS3, it would be much more difficult (requiring an order of magnitude more computation) to use the FFBuilder technology to complete the force field in an automated fashion. D. van der Waals Parameters. The van der Waals parameters are those developed by the Jorgensen group,

Figure 1. Examples illustrating the position of the off-site charge (pink circle) for the nitrogen of pyridine and the chlorine of chlorobenzne. 286

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation A BCC between the parent real atom and the off-site charge defines the partial charge assigned to the virtual site. That BCC parameter along with additional intraring BCC’s are fit to minimize the difference in the electrostatic potential (ESP) with respect to HF/6-31G*. The fitted parameters are subject to restraints (with respect to a BCC values of zero). Comparison against experimental condensed phase data is used to calibrate the strength of the restraint for a handful of test cases (pyridine, pyrimidine, pyridazine, pyrazole, oxazole, chlorobenzene). A total of 357 ring fragments including 81 monocycles and 256 fused rings were used in the present parametrization. The present round of charge fitting also included aromatic heterocycles not containing virtual site parameters (like thiophene and furan). The average ESP RMSD over these training set molecules drops to 1.2 kcal/mol/e for OPLS3 versus 2.2 kcal/mol/e for OPLS2.1. Preliminary testing with adding virtual sites to hydroxyl oxygen indicated only marginal impact, which is consistent with the observations concerning alcohols made previously.28 This work also found that off-site charges added little to improve model accuracy of amide carbonyl oxygens. These results suggest that the partial charge distribution around oxygen is sufficiently close to spherical for the off-site charges to be of limited importance. The semiempirical quantum chemistry in conjunction with the parametrized BCC corrections does a reasonable job in many cases in reorganizing the charge distribution in response to these sorts of modifications. For applications where rapid computation of the force field parameters is required, the speed of a semiempirical calculation as compared to ab initio methods is highly relevant. For cases where the force field model is to be used in an expensive simulation such as FEP, one could employ ab initio quantum mechanics to compute charge distributions directly (rather than making BCC corrections). We intend to pursue this direction in future work. For the present, it is useful to calibrate the accuracy and reliability of the BCC model by comparing the FEP results to experiment, as is done below and in section IV. A useful initial test of the accuracy of the charge model is obtained by computing the aqueous solvation free energies of small molecules via FEP. The aggregate results obtained with OPLS3 are very similar to those published previously for OPLS2.0 (unchanged in OPLS2.1);29 the accuracy versus experiment is summarized in Table 5 and compared to OPLS_2005, CHARMM, and AMBER. Table 6 highlights a small number of cases where the solvation free energy result is impacted by the addition of virtual sites; significant improvement in these cases is observed. Overall, OPLS3 yields accurate

Table 6. Hydration Free Energies for Small Molecules Impacted by New Virtual Site Based Chargesa

MUE (kcal/mol)

RMSE (kcal/mol)

%>2 kcal/mol

ChelpG/CHARMM AM1-BCC/GAFF OPLS_2005 OPLS2.0/OPLS2.1 OPLS3

1.93 1.17 1.10 0.73 0.72

2.28 1.39 1.33 0.88 0.87

44.7 15.0 8.5 2.1 2.1

exp.

OPLS2.1

OPLS3

−1.1 −1.5 −1.7 −1.4 −1.0 −4.7 −4.4 −4.0 −9.6 −8.4 −5.7

−1.4 −1.5 −1.5 −1.4 −1.9 −4.2 −3.9 −2.9 −8.0 −4.3 −5.0 1.4

−0.9 −1.4 −1.7 −1.3 −0.7 −4.3 −4.8 −3.5 −9.4 −7.4 −5.5 0.4

a

Free energies are reported in kcal/mol. The N-methyl imidazole outlier significantly impacts the RMS error of OPLS2.1. Removing it from the set gives an RMS error of 0.8 kcal/mol.

solvation free energies for an expansive set of molecules, suggesting a capacity for high performance in protein−ligand binding (where a reliable description of solvation is important). Consistent with the OPLS2.0 work cited above,29 we employ the SPC water model for all simulation-based results presented herein though previous work has shown that there is relatively little sensitivity to the choice of water model in computed aqueous hydration free energies.47 As noted above, the fixed charge form of the charge model employed in OPLS3 represents a significant heuristic approximation. History has shown that fitting nonbonded parameters for fixed charge models to condensed phase properties is a reliable means of creating transferable parameters, the lineage of water models being the bestknown example. However, such systems comprise only a small fraction of the chemical space OPLS3 aims to cover. In order to validate the computational fitting methods discussed above, we turn to FEP calculations of protein−ligand binding systems as described in section IV. F. Protein Force Field Development. For predictive biomolecular simulation, the protein force field must be refined to a very high degree of accuracy and stability. Small errors in, for example, the torsional potential of the protein dihedral angles, can translate into significant disagreements with experiment, ranging from discrepancies in the prediction of NMR spectra to incorrect prediction of the stability of proteins and peptides. Over the past decade, the CHARMM and AMBER force fields have incorporated a greatly increased amount of quantum chemical data, and training to simulation data as compared with experiment, to produce protein force fields of substantially higher quality and reliability than those available 20 years ago. The OPLS3 protein force field represents a similar upgrade of the OPLS force field and employs similar data sets and techniques to achieve improved fidelity. We briefly describe the methodology and results in what follows. The protein partial charges and van der Waals parameters are retained from older version of OPLS; we use the parameters reported in Kaminski et al.1 These parameters were developed over many years, primarily by fitting to small molecule liquid state data. This parametrization to experiment is important in the context of simulation of a marginally stable protein structure, in which an unbalanced set of nonbonded terms

Table 5. Error Summary for Hydration Free Energy Results for 239 Small Moleculesa force field

compound Cl-benzene Br-benzene I-benzene 1,2 Cl-benzene 1,4 Cl-benzene pyridine 2,Cl-pyridine 3,Cl-pyridine imidazole N-methyl imidazole quinoline RMS error

a

Values for ChelpG/CHARMM and AM1-BCC/GAFF, OPLS_2005, and OPLS2.0/2.1 are those reported in Shivakumar et al.29 287

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation can easily lead to poor stabilization of appropriate secondary and tertiary structures. The largest changes were implemented in the torsional parameters. First after some experimentation, we reverted to a 1−4 scale factor of 0.5 (0.833 was used in OPLS2.0 and OPLS2.1). Second, we refit the backbone torsions to a larger set of quantum chemical data with better coverage of phase space. The side chain torsions were also refit to new quantum chemical data, with additional refinement so as to reproduce torsion angle distributions observed in PDB structures as discussed below. Nonglycine/proline backbone torsions were fit against alanine dipeptide QM energy data over a set of 2391 structures. This set includes 324 states spanning the phi/psi space in regular 20-degree increments. An additional 2067 structures were sampled from a high temperature (T = 340 K) molecular dynamics simulation of alanine dipeptide in explicit water. All conformations were energy-minimized at the B3LYP/6-31G* level with constraints applied to the phi and psi dihedrals followed by single point LMP2/cc-pVTZ(-f). All nonglycine/ proline residues employed the same set of backbone phi/psi torsion parameters. Table 7 presents results comparing the RMS error for the 2391 structures in the training set for OPLS3 and OPLS2.1. We

Table 8. Side Chain Conformational Energy RMS Error with Respect to QMa backbone conformation

OPLS2.1 (kcal/mol)

OPLS3 (kcal/mol)

alpha beta

1.7 1.9

1.4 1.2

Error is computed over all available χ1,χ2 rotamers over representative dipeptides of all amino acid residues. a

OPLS3 as compared to the quantum chemical benchmark data. As in the case of the backbone, there is significant reduction of the RMS error. Figure 2 compares the torsional distributions for the key residues Asp, Arg, and Glu for OPLS3 and OPLS2.1 to

Table 7. Backbone Conformational Energy RMS Errors with Respect to QMa fragment

OPLS2.1 (kcal/mol)

OPLS3 (kcal/mol)

ala2 ala4

1.3 4.0

0.97 1.8

a

Error is computed over 2391 alanine dipeptide (ala2) conformations and 1700 alanine tetrapeptide conformations (ala4).

also used a set of 1700 alanine tetrapeptide structures as a test set; the RMS error for these structures for both force fields is also shown in Table 7. The large improvement in RMS error in OPLS3 is apparent. Further reduction of the error is possible for example by using a numerical two-dimensional form of the backbone torsional potential; however, as discussed in detail in ref 48, it appears as though at some point there are diminishing returns to fitting quantum chemical data with increasing accuracy. Glycine backbone torsions were fit against 703 glycine dipeptide structures spanning the phi/psi space in regular 10 degree increments with energies calculated at the LMP2/ccpVTZ(-f) level from B3LYP/6-31G* optimized structures. Proline backbone phi torsion parameters were based on an energy scan of the proline dipeptide. Three distinct sets of psi torsion parameters were defined and parametrized based on tripeptides where psi interfaces between: two prolines, prolineglycine, and proline-alanine. Residue specific side chain torsions were predominantly fit against chi1 and chi2 dipeptide torsion scans (30-degree increments) averaging over two one-dimensional scans with the backbone phi/psi in either the alpha or beta backbone state. Charged residues were based on fits to the beta conformer alone. The fits of ASP chi1, ARG chi1, and GLU chi2 were additionally tuned by comparing simulated propensities (averaging over all ASP, ARG, GLU’s in gb3, ubiquitin, crambin, bpti, sumo2, lysozyme, and trp-cage) to propensities observed in 5307 high resolution PDB structures. Table 8 displays RMS errors over all side chains for OPLS2.1 and

Figure 2. Histograms of the chi1 torsion angle developed from simulation and a survey of the PDB. The OPLS2.1 and OPLS3 populations are taken over matching residues in the 7 protein systems studied herein over three 200 ns simulations. The PDB histogram is developed form 5307 high-resolution crystal structures.

experimental distributions observed in the PDB. As can be seen, these distributions are much improved in OPLS3. A consequence of these improved distributions can be seen in the simulation of GB3 using OPLS3 and OPLS2.1. There is a key Asp residue in one of the GB3 helices, where the native rotamer state is disfavored (incorrectly) by OPLS2.1. This inaccuracy in the torsional potential plays a significant role in the destabilization of the helix in the OPLS2.1 simulation. A comparison of the problematic structure observed in the OPLS2.1 simulation versus the more native-like structure observed in the OPLS3 simulation is shown in Figure 3. 288

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation

Figure 3. Distributions at left contrast simulated propensity of ASP chi1 across 7 protein simulations against propensities seen across 5307 X-ray crystal structures. OPLS2.1 overstabilizes the chi1 = 90 rotamer which leads to competition with intrahelix hydrogen bonding in ASP36 of GB3 (center) and a concerted unfolding of the helix after 50 ns of simulation. In contrast the same instability is not seen with OPLS3, which has well balanced ASP rotamer populations.

To assess the fidelity of simulated protein structures we investigated structural propensities of three short peptides and structural stability (as measured by RMSD to experiment) of a set of seven small proteins. The short peptides include two helix forming peptides (AAQAA,49 K1950,51) and CLN025,52 which forms a hairpin-like structure at lower temperatures. In total, 3.2 μs of simulated tempering MD simulations were run for AAQAA and CLN025. Results for K19 are based on three independent 200 ns simulations. The helical content of AAQAA, K19, and folded fraction of CLN025 as compared to experiment is summarized in Table 9. The OPLS_2005 and

Table 10 summarizes results for the protein structural stability test. Each system was run in triplicate for 200 ns starting from different random seeds. The RMSD’s reported in the table are averaged over the last 100 ns of the respective simulations. The OPLS3 results are significantly improved compared to OPLS_2005 and OPLS2.1. The same CHARMM (C22*)53 and Amber force fields (AMBER99sb-ILDN)7 discussed above are also included, against which OPLS3 is fully competitive. While there is much more work to be done in calibrating the performance of the OPLS3 protein force field, the present results provide preliminary evidence that it now performs at the state of the art for protein models. Additional testing of OPLS3 is presented below when we discuss the dependence of FEP results upon the protein force field. These comparisons represent a true unbiased test of the model, as no adjustment of protein force field parameters was performed in response to FEP data.

Table 9. Secondary Structure Propensitiesa protein

exp.

OPLS2.1

OPLS3

C22*

AMBER99sb-ILDN

K19 (aaqaa)3 cln025

50% 43% 91%

17% 2.5% 40%

74% 51% 91%

62% 30% 90%

21% 9% 80%

a Fraction helix for K19 at 277 K and Ac-(AAQAA)3-NH2 at 280 K and fraction folded for hairpin forming cln025 at 300 K. Amber (aaqaa)3 is taken from Lindorff-Larsen et al.23

IV. APPLICATION TO PROTEIN−LIGAND BINDING In ref 22, we presented a large set of FEP calculations of protein−ligand binding affinities using the OPLS2.1 force field. In addition to the improvements in the small molecule parametrization present in OPLS2.1, a number of other important advances were combined to enable a remarkable improvement in the consistency and accuracy of FEP results. These include implementation on a GPU platform, the use of the REST approach to accelerate conformational sampling of

OPLS2.1 force fields clearly underestimate the ordered propensities to a significant degree not seen with recent variants of the CHARMM (C22*)53 and Amber force fields (AMBER99sb-ILDN).7 In contrast, OPLS3 fairs reasonably well in line with the relatively good agreement observed with CHARMM (C22*).53

Table 10. Average RMS Difference with Respect to X-ray or NMR Structure protein name

PDB

OPLS_2005

OPLS2.1

OPLS3

C22*

AMBER99sb-ILDN

trpcage GB3 ubiquitin Sumo2 BPTI crambin lysozyme average

1L2Y 1P7E 1UBQ 1WM3 1BPI 1CRN 6LYT

5.3 1.7 1.1 1.9 1.7 1.7 1.6 2.1

2.7 1.8 1.1 1.8 1.1 0.9 1.6 1.6

1.2 0.8 0.9 1.3 1.2 0.8 1.3 1.1

1.7 0.7 0.8 1.3 1.1 0.7 1.1 1.1

1.7 0.8 0.9 1.4 1.1 0.8 1.1 1.1

289

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation Table 11. Relative Binding Free Energy Resultsa OPLS_2005

OPLS2.1

OPLS2.1 + new proteinFF

OPLS3

system

no. compds

series ref

no. pert.

R2

RMSE

R2

RMSE

R2

RMSE

R2

RMSE

BACE CDK2 JNK1 MCL1 P38 PTP1B thrombin Tyk2 weighted average

36 16 21 42 34 23 11 16

54 55 56 57 58 59 60 61, 62

58 25 34 71 56 49 16 24

0.01 0.48 0.75 0.46 0.32 0.55 0.21 0.86

1.35 0.98 0.87 1.77 0.95 1.55 1.35 0.75 1.28 ± 0.06

0.56 0.07 0.74 0.62 0.54 0.50 0.40 0.80

1.03 1.27 0.87 1.44 0.97 1.05 0.97 0.98 1.11 ± 0.05

0.58 0.34 0.76 0.33 0.46 0.51 0.40 0.84

0.79 1.04 0.81 1.48 1.01 0.91 1.03 0.93 1.04 ± 0.05

0.64 0.51 0.76 0.37 0.57 0.79 0.38 0.84

0.89 0.86 0.62 1.40 1.05 0.57 0.83 0.98 0.95 ± 0.04

a Includes root-mean-square error (RMSE) reported in kcal/mol and square of the correlation coefficient (R2). Uncertainty in weighted average RMSE calculated based on bootstrapping method.

Figure 4. Experimental versus FEP-predicted binding free energies over 6 BACE ligands distinguished by their R1 substituent. Left panel shows the binding mode of the methoxy derivative indicating the position of the functional group varied across the series. Binding affinities as a function of the R1 functional group shown in the middle panel. At right, representative structures, taken from simulations of OPLS2.1 and OPLS3, of the Ala396 residue and its associated helix, which enclose the S3 pocket of the BACE1 binding site.

Figure 5. Representative configurations of the 1h1s complex taken from simulations of OPLS2.1 (left) and OPLS3 (middle). Histogram of the distance between the sulfonamide nitrogen of the ligand and carboxylate carbon of Asp86 is shown at right.

force field and addition of virtual sites coupled with a more extensive parametrization of heterocyclic systems. All of these factors are significant across the entire data set; for a given molecule, one or two may be dominant. As our purpose here is to highlight the advances in OPLS3, we will focus on the effects of the protein force field and virtual sites below. Table 11 also includes one additional model which combines the new protein force field with the small molecule OPLS2.1 model (OPLS2.1+new_proteinFF) to delineate the statistical improvements due to the new protein force field and virtual sites individually for each series in the data set; it can be seen that both make notable contributions to reducing the errors, albeit with significant variation in impact depending upon which series is being studied.

the complex, and an automated interface for set up of FEP jobs and error analysis via cycle closure calculations. Details of these various aspects of our FEP methodology are provided in ref 22 and will be expanded upon in upcoming publications. Our primary objective in the present paper is to examine the effects of improvements in the force field on the accuracy of FEP calculations. To that end, we present a statistical summary of FEP performance across a wide range of data sets for OPLS_2005, OPLS2.1, and OPLS3, presented in Table 11. The change in RMS error from OPLS_2005 to OPLS2.1 and then to OPLS3 is substantial. The principal sources of improvement in OPLS2.1, as compared to OPLS_2005, are the ligand partial charges and torsional parameters. In OPLS3, improvement is driven by enhancements made to the protein 290

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation We demonstrate the effects of the protein force field on ligand binding affinity prediction with a couple of examples. From Table 11, the RMSE over the BACE series improves from 1.03 kcal/mol with OPLS2.1 to 0.89 kcal/mol with OPLS3. As illustrated in Figure 4, the main driving force behind this improvement is better modeling of ligand perturbations that specifically explore the S3 subpocket of the protein.54 The substitution position that probes this pocket (R1) is illustrated in the left panel of Figure 4. The plot in the middle gives the experimental and FEP-predicted binding affinity over a ligand subseries that solely varies R1. OPLS3 captures the trend well, whereas OPLS2.1 overestimates the favorability of larger R1 groups (methoxy, nitrile, methyl-alkyne) by 1−2 kcal/mol relative to their smaller counterparts (hydrogen and methyl). The reason for the overly favorable OPLS2.1 predictions is illustrated in the right panel. In OPLS2.1, the helix that partially encloses the S3 pocket preferentially adopts an alternate conformation translating the pocket occluding Ala396 residue leading to an overly permissible protein environment for the larger R1 substituents. A substantial improvement is also seen in CDK2, where the RMSE is 0.86 kcal/mol in OPLS3 and 1.27 kcal/mol in OPLS2.1. This level of improvement is also seen with the OPLS2.1+new_proteinFF model, which indicates that a significant portion of this gain in accuracy is attributable to improved modeling of the protein-binding site. More specifically, as illustrated in Figure 5, poor fidelity of the Asp86 rotamer states in OPLS2.1 looks to be the underlying cause of the discrepancy. The figure provides a histogram of the distance between the sulfonamide nitrogen of 1h1s and the carboxylate carbon of Asp86 derived from the simulations. Distances less than 4.5 Å indicate hydrogen bond forming configurations between the respective groups. In OPLS2.1, Asp86 preferentially adopts rotameric states that prevent sulfonamide−carboxylate hydrogen bonds which is inconsistent with the hydrogen bonding state indicated by the 1h1s X-ray structure. In contrast, a hydrogen bond is formed between these groups in approximately 80% of sampled OPLS3 configurations. The CDK2 series also contains two additional ligands that form similar hydrogen bonds to Asp86 (1oiy and 1oi9 shown in Figure 6). The relative predicted affinity between these three ligands with respect to the unsubstituted variant 1h1q is summarized in Table 12. On average, OPLS2.1 underestimates the affinity of these hydrogen-bond forming ligands by approximately 1 kcal/mol which is consistent with

Table 12. Experimental and FEP Predicted Relative Binding Free Energies for CDK2 Ligandsa ligand

ΔΔG(exp.)

ΔΔG(OPLS2.1)

ΔΔG(OPLS3)

1h1q → 1h1s 1h1q → 1oiy 1h1q → 1oi9

−3.07 −1.61 −1.56

−1.70 −0.61 −0.75

−2.58 −1.38 −2.10

a

Specified ligands are illustrated in Figure 6. Free energies are in kcal/ mol.

the lack of hydrogen bonding observed in Figure 5. In contrast, the OPLS3 predicted relative binding affinities are consistent with experiment. We now turn to illustrative examples where the use of virtual sites provides major improvement in FEP predictions for specific ligand series. Without virtual sites, the JNK1 series in Table 11 gives a RMSE of 0.87 and 0.81 kcal/mol using the OPLS2.1 and OPLS2.1+new_proteinFF models, respectively. With the addition of virtual site based charges in OPLS3 the error improves to 0.62 kcal/mol. In part, this improvement is a consequence of improved treatment of halogen bonding effects with the virtual site based model. Halogen bonds consist of favorable interactions between Lewis bases and aryl halogens owing to a positive electrostatic potential that arises from a depletion of electron density at the tip of the halide.63 As has been discussed elsewhere,32,64−66 capturing this effect in a force field necessitates breaking charge symmetry inherent to atomcentered partial charge models which can be accomplished using virtual sites. Figure 7 provides an illustration contrasting

Figure 7. A) Illustrates two ligands from the JNK1 series forming a halogen bond contact with the backbone carbonyl oxygen of Met111. B) The chlorobenzene dimer interaction energy versus intermolecule distance. The distance is defined between the chlorine and carbonyl oxygen in the illustrated orientation.

the modeling of a halogen bond between chlorobenzene and the carbonyl oxygen of N-methylacetamide. The atom centered OPLS2.1 models this interaction as unfavorable at all distances. In contrast OPLS3 models the interaction as favorable at a distance of approximately 3 Å, which is consistent with QM. As shown in Figure 7, ligands 6f and 6q consist of aryl halogen groups that can form halogen bonds with the backbone carbonyl oxygen of Met111. Table 13 gives the associated predicted binding affinity for both ligands. Consistent with the lack of halogen bonding in the model, OPLS2.1 underestimates the free energy of binding by 1.2 kcal/mol (averaged over the pair). In contrast, the binding affinity of the pair is predicted to

Figure 6. Subset of CDK2 series ligands illustrating hydrogen-bond to carboxylate of Asp86 for ligand 1h1s, 1oi9, and 1oiy. Binding free energies relative to the unsubstituted 1h1q are tabulated in Table 12. 291

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation

Table 14. Relative Binding Free Energy Error (RMSE)a

Table 13. Experimental and Predicted Relative Binding Free Energies for JNK1a

a

ligand

ΔΔG(exp.)

ΔΔG(OPLS2.1)

ΔΔG(OPLS3)

6f 6q

−8.11 −7.51

−6.64 −6.53

−7.33 −7.36

receptor BACE MCL1 CHK1 FXA weighted average

Ligand pair is illustrated in Figure 7. Free energies are in kcal/mol.

no. series no. compds 3 5 7 2

no. perts

OPLS2.1

OPLS3

17 15 16 14

1.05 1.04 1.52 2.05 1.39

1.08 0.88 1.30 1.27 1.13

14 15 18 8

a

Free energies, reported in units of kcal/mol, are calculated for perturbation pairs that probe changes in the composition of aryl rings. Full set covers a broad range of chemical perturbation types and receptor environments. An example ligand for each receptor is given in Figure 8. Perturbation sets for retrospective data sets are given in the Supporting Information. Experimental reference for BACE is ref 54, for MCL1 is ref 57, for CHK1 is refs 67−72, and for FXA is ref 73.

be more favorable by ∼0.8 kcal/mol, as well as more consistent with experiment using OPLS3. Figure 8 displays additional test series that were developed to specifically probe changes in protein−ligand affinity due solely to changes in the composition and position of heteroatoms in aromatic rings; this is the type of problem for which we expect the inclusion of virtual sites and to have the greatest impact. Table 14 summarizes the RMS error over the various examples. In total, the composite system includes 55 ligands, 62 FEP perturbations across 4 protein targets. The overall improvement in the error is significant (approximately 20%). The most significant improvement in this test set is seen in the FXA series where the RMSE drops from 2.05 kcal/mol in OPLS2.1 to 1.27 kcal/mol in OPLS3. The improvement in this series is driven by improved modeling of aryl nitrogen substitutions in the chloroheteroaryl ring of the series that probe the S1 pocket of the binding site.73 As shown in Figure 9 and Table 15, placement of aryl nitrogens and their associated lone pairs (ligands 17i and 17h) in the electron rich receptor environment formed by the backbone carbonyl Gly219 and the carboxylate of Asp18972 leads to a significant drop in affinity relative to the biaryl thiophene derivative (17d). Experimentally the drop in affinity is between 4 and 5 kcal/mol. In OPLS2.1 the drop is marginal by comparison (0.5−1 kcal/mol). The introduction of virtual sites in the representation of the thiazole derivatives more accurately captures the inhospitable nature of this regime of the binding site leading to a significant improvement in the binding affinity trend (drop in affinity of 2−4 kcal/mol).

Figure 9. Subset of FXA series ligands illustrating electron rich receptor environment from the carboxylate of Asp189 and the backbone carbonyl of Gly219 coordinating aryl nitrogen lone pairs of 17i and 17h. Binding free energies relative to 17d are tabulated in Table 15.

Table 15. Experimental and Predicted Relative Binding Free Energies for FXA Ligandsa ligand

ΔΔG(exp.)

ΔΔG(OPLS2.1)

ΔΔG(OPLS3)

17d → 17i 17d → 17h

3.9 4.9

0.5 1.0

2.1 3.9

a Free energies are in kcal/mol. Specified ligands are illustrated in Figure 9. Ligand names are taken from Chan et al.71

V. DISCUSSION As was discussed above in section II, the current state of the art in force field development entails the use of significant heuristic approximations (classical treatment of the nuclei, lack of explicit polarization and the associated models for the effective charge distribution, fitting to gas phase quantum chemical potential energy surfaces) as well as specific choices within these approximations (e.g., the precise level of quantum chemistry used to generate model torsional profiles). The central issue with regard to any such development protocol is the degree of transferability of the models, beyond explicit fitting to experiment. Fitting to experiment is at present necessary at various key points in the development process

(e.g., optimizing van der Waals parameters to fit liquid state thermodynamic data, modifying charge distributions to fit aqueous solvation free energies, adjusting torsional parameters to reflect experimental rotamer distributions in the protein force field). The question then becomes how well the overall process translates into predictive capability for experimental observables that are not directly fitted. For the protein force field, there are numerous benchmarks that have historically been used to calibrate performance in this regard. These include simulations assessing the stability of peptides and small proteins, calculations of observables in NMR experiments (NOEs and dipolar couplings), and computation of time scales for dynamical events (e.g., protein

Figure 8. Example compounds illustrating aryl heterocycle perturbations (perturbed rings highlighted in orange) representing each FEP series in Table 14. 292

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation

force field error observed in the large scale FEP binding affinity testing. This admittedly approximate analysis suggests the underlying accuracy of the OPLS3 force field to be ∼0.5 kcal/ mol or of comparable accuracy to the underlying experimental target affinity data. The preceding not withstanding, it is not possible to rigorously separate the force field and phase space sampling components of the error. Based on our experience investigating outliers in the FEP simulations up to this point, we estimate the RMS error in binding affinity predictions due to the force field to be in the range of 0.5−0.75 kcal/mol, roughly in agreement with the above first order analysis. This is a remarkably small value, particularly given the approximate treatment of polarization which some have believed to fundamentally limit force field accuracy to the point where it might not be useful for applications like a priori binding affinity prediction. Furthermore, our belief is that this error can and will be significantly reduced as we address other details of the force field model, such as improved protocols for assigning partial charges. It seems likely that the point can be reached where the RMS error in the simulation is comparable to the experimental RMS error (0.5 kcal/mol). At that point, higher precision experiments would be required to make progress in calibrating further improvements in the theoretical methodology. From the point of view of practical drug discovery projects, there is also the question of correlation between in vitro and in vivo inhibition; many factors other than in vitro potency affect activity in the cell (some of which are very difficult to model theoretically). It is important to point out that the binding of a small molecule to a protein is only one among many biophysically important events that one would like to predict using biomolecular simulation. This particular event benefits from cancellation of errors (between the ligand in solution and the ligand in the protein) for various types of interactions. For example, if a hydrogen bond strength between a ligand and protein group is overestimated, the error may in large part cancel if the corresponding hydrogen bonds between the ligand and water (and the protein and water) are proportionately overestimated. Furthermore, the ligand and protein have a small number of total interactions as compared to for example a typical protein−protein interface or the self-interactions of a folded protein; in the latter, a large number of small errors may add up to produce nonphysical behavior in binding or folding. Hence, the conclusions presented here with regard to the observed errors do not guarantee that other quantities of interest can be computed to a similar precision. From this more general perspective on biomolecular simulation, perhaps the greatest reason for optimism is the observation that the RMS errors are systematically reduced as the force field quality is improved. Table 11 reports the estimated RMS error due to the force field for OPLS_2005, OPLS2.1, and OPLS3. The improvement across this series is 30%, which is well above the level of noise in the calculations. Furthermore, the reduction in error was achieved without any empirical fitting to protein−ligand binding data. Ultimately force field improvement will include technically difficult features such as explicit polarization, and this should continue the trend in Table 11 further; and it may well be that explicit polarization has a greater qualitative impact on important properties such as relative protein conformational energetics (for example in comparing folded and unfolded states) than it apparently does on protein−ligand binding. However, the

folding, conformational changes, diffusion through ion channels). We have reported results for some of these standard benchmarks in section III.F above; we are currently performing simulations of NMR spectra, which will be discussed in a future publication. The protein−ligand binding affinity FEP results described in section IV constitute a sufficiently large and diverse data set to enable a rigorous assessment of the performance of both the protein and small molecule ligand force fields. As none of the parameters were adjusted to improve the prediction of binding affinities, the RMS errors obtained enable a fair evaluation of transferability and accuracy. Of equal importance is the fact that all of the simulations were run using an identical, fully automated sampling protocol. These aspects of the data generation protocol ensure that the comparisons of theory to experiment are completely unbiased and allow an objective comparison of the accuracy of the three force fields investigated (OPLS_2005, OPLS2.1, OPLS3). The first step in quantitatively analyzing the results is to consider the various sources of error, with respect to experiment, in computed free energies, i.e. what are the various contributions to the observed RMSE of the prediction method versus the experimental data. We enumerate these sources of error as follows: (1) Errors in the force field (2) Errors in the experimental measurements (3) Random errors in the precision of the sampling (defined further below) (4) Systematic errors in the sampling (i.e., failure to fully explore the phase space of the complex or the protein and ligand in solution). Components 1, 2, and 3 are straightforward. One can understand the precision of the sampling as the variance in the reported binding affinity one would typically obtain if a large number of simulations were performed using the standard protocol but with slight variations in the starting geometry or the initially assigned velocities. Errors (3) and (4) together constitute the total sampling error, but it is useful to separate precision and adequacy of phase space coverage as they would in many cases in practice be addressed by different approaches; longer sampling times in the first case, some sort of change in the sampling algorithm in the second (although simulating for an infinitely long time would solve both problems). Interestingly, the recently introduced cycle-closure method of estimating the sampling error can partially estimate the systematic sampling error, so long as all relevant states occur in at least one perturbation of the closed cycle.74 We estimate the error in the experiments (averaged over many different data sets) as on the order of 0.5 kcal/mol RMS error.75 The error in precision of the sampling and a partial estimate of the systematic error in the sampling can be estimated from a cycle closure analysis of the FEP data and is here observed to be 0.66 kcal/mol.74 We further expect that the total observed error must grow as the quadrature of the individual sources of the error. Thus, the total observed error can be related to the individual sources of the error through the following equation 0.95 kcal/mol = sqrt([0.5 kcal/mol]2 + [0.66 kcal/mol]2 + X2) (2)

where 0.95 kcal/mol is the observed RMSE versus the experimental data of the OPLS3 force field in this work, and the remaining variable X can be interpreted to be the estimated 293

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation Notes

results presented herein also set a bar for the performance that should be expected of a polarizable force field that represents an actual improvement over the state of the art fixed charge models. It is not enough to simply include polarization explicitly; the functional form of the remainder of the force field, parametrization of the entire model, and the sampling protocols have to be good enough to enable the benefits of the augmented physical representation to be realized. A demonstration that this is the case in turn requires presenting results for large data sets; a few anecdotal cases cannot be used to draw conclusions about accuracy or robustness. We provide force field parameters for the OPLS3 protein force field and for all molecules relevant to FEP calculations, in the Supporting Information. These data sets should enable benchmarking of alternative force fields, FEP protocols, and other types of binding affinity calculations. The OPLS3 force field, DESMOND molecular dynamics simulation package, and FEP+ module are available from Schrodinger, Inc.

The authors declare the following competing financial interest(s): W.L.J. is a consultant to Schrodinger, Inc. and is on its Scientific Advisory Board. R.A.F. has a significant financial stake in, is a consultant for, and is on the Scientific Advisory Board of Schrodinger, Inc.



VI. CONCLUSION We have shown that the OPLS3 force field delivers state of the art performance in protein simulations and that it is capable of predicting protein−ligand binding affinities to high accuracy over a wide range of targets and ligands (less than 1 kcal/mol RMS error). We have also demonstrated that the accuracy of protein−ligand binding affinity can be systematically improved by improving the force field. This data is highly encouraging with regard to the ability of computational methodology to directly impact drug discovery via high accuracy binding affinity prediction. Many such applications are currently in progress in a number of pharmaceutical and biotechnology companies, as well as via Schrodinger collaborations with various experimental groups. The impact of the methodology on projects remains to be established, but the initial results in this regard are encouraging. In the immediate future, our intention is to carefully investigate the relatively small number of outliers that we have observed in our binding affinity predictions and from this deduce the most important improvements that need to be made in both the force field and the FEP sampling protocols. In parallel, we will continue to work on the protein force field, for example by comparing simulation results with NMR spectra. The combination of systematic force field improvement, better sampling algorithms, and more efficient hardware (via Moore’s law) will continue to increase the utility of computational methods in a wide range of biomolecular simulation applications. In the long run, as such calculations become less expensive, more convenient, and more reliable, theory will evolve toward a full partnership with experiment, to the benefit of both approaches.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.5b00864. FEP results, input structures, and OPLS3 parameters for each series (ZIP)



REFERENCES

(1) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. J. Phys. Chem. B 2001, 105, 6474. (2) Robertson, M. J.; Tirado-Rives, J.; Jorgensen, W. L. J. Chem. Theory Comput. 2015, 11, 3499−3509. (3) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. J. Am. Chem. Soc. 1996, 118, 11225−11236. (4) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657−1666. (5) Cornell, W. D.; Cieplak, P.; Christopher I. Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179− 5197. (6) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (7) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Proteins: Struct., Funct., Genet. 2010, 78, 1950−1958. (8) Cerutti, D. S.; Rice, J. E.; Swope, W. C.; Case, D. A. J. Phys. Chem. B 2013, 117, 2328−2338. (9) Li, D.-W.; Bruschweiler, R. Angew. Chem., Int. Ed. 2010, 49, 6778−6780. (10) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K.; Simmerling, C. J. Chem. Theory Comput. 2015, 11, 3696− 3713. (11) MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586−3616. (12) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A. D., Jr. J. Chem. Theory Comput. 2012, 8, 3257−3273. (13) Schuler, L. D.; Daura, X.; van Gunsteren, W. F. J. Comput. Chem. 2001, 22, 1205−1218. (14) Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. J. Comput. Chem. 2004, 25, 1656−1676. (15) Simmerling, C.; Strockbine, B.; Roitberg, A. E. J. Am. Chem. Soc. 2002, 124, 11258−11259. (16) Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. J. Mol. Biol. 2007, 371, 1118−1134. (17) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. Science 2011, 334, 517−520. (18) Shan, Y.; Kim, E. T.; Eastwood, M. P.; Dror, R. O.; Seeliger, M. A.; Shaw, D. E. J. Am. Chem. Soc. 2011, 133, 9181−9183. (19) Pande, V. S.; Rokhsar, D. S. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 9062−9067. (20) Zagrovic, B.; Snow, C. D.; Shirts, M. R.; Pande, V. S. J. Mol. Biol. 2002, 323, 927−937. (21) Fujitani, H.; Tanida, Y.; Ito, M.; Jayachandran, G.; Snow, C. D.; Shirts, M. R.; Sorin, E. J.; Pande, V. S. J. Chem. Phys. 2005, 123, 084108. (22) Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. J. Am. Chem. Soc. 2015, 137, 2695−2703. (23) Lindorff-Larsen, K.; Maragakis, P.; Piana, S.; Eastwood, M. P.; Shaw, D. E. PLoS One 2012, 7, e32131.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. 294

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation (24) Beauchamp, K. A.; Lin, Y.-S.; Das, R.; Pande, V. S. J. Chem. Theory Comput. 2012, 8, 1409−1414. (25) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J. W.; Ren, P. J. Chem. Theory Comput. 2013, 9, 4046−4063. (26) Lopes, P. E. M.; Huang, J.; Shim, J.; Luo, Y.; Li, H.; Roux, B.; MacKerell, A. D., Jr. J. Chem. Theory Comput. 2013, 9, 5430−5449. (27) Kaminski, G. A.; Stern, H. A.; Berne, B. J.; Friesner, R. A.; Cao, Y. X.; Murphy, R. B.; Zhou, R.; Halgren, T. A. J. Comput. Chem. 2002, 23, 1515−1531. (28) Harder, E.; Anisimov, V. M.; Vorobyov, I. V.; Lopes, P. E. M.; Noskov, S. Y.; MacKerell, A. D., Jr.; Roux, B. J. Chem. Theory Comput. 2006, 2, 1587−1597. (29) Shivakumar, D.; Harder, E.; Damm, W.; Friesner, R. A.; Sherman, W. J. Chem. Theory Comput. 2012, 8, 2553−2558. (30) Jakalian, A.; Bush, B. L.; Jack, D. B.; Bayly, C. I. J. Comput. Chem. 2000, 21, 132−146. (31) Storer, J. W.; Giesen, D. J.; Cramer, C. J.; Truhlar, D. G. J. Comput.-Aided Mol. Des. 1995, 9, 87−110. (32) Jorgensen, W. L.; Schyman, P. J. Chem. Theory Comput. 2012, 8, 3895−3901. (33) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. J. Phys. Chem. 1987, 91, 6269−6271. (34) Rocklin, G. J.; Mobley, D. L.; Dill, K. A. J. Chem. Theory Comput. 2013, 9, 3072−3083. (35) Faver, J. C.; Yang, W.; Merz, K. M., Jr. J. Chem. Theory Comput. 2012, 8, 3769−3776. (36) Palazzesi, F.; Prakash, M. K.; Bonomi, M.; Barducci, A. J. Chem. Theory Comput. 2015, 11, 2−7. (37) Mobley, D. L.; Guthrie, J. P. J. Comput.-Aided Mol. Des. 2014, 28, 711−720. (38) Beauchamp, K. A.; Behr, J. M.; Rustenburg, A. S.; Bayly, C. I.; Kroenlein, K.; Chodera, J. D. J. Phys. Chem. B 2015, 119, 12912− 12920. (39) Dahlgren, M. K.; Schyman, P.; Tirado-Rives, J.; Jorgensen, W. L. J. Chem. Inf. Model. 2013, 53, 1191−1199. (40) DuBay, K. H.; Hall, M. L.; Wu, C.; Reichman, D. R.; Friesner, R. A. J. Chem. Theory Comput. 2012, 8, 4556−4569. (41) Bochevarov, A. D.; Harder, E.; Hughes, T. F.; Greenwood, J. R.; Braden, D. A.; Philipp, D. M.; Rinaldo, D.; Halls, M. D.; Zhang, J.; Friesner, R. A. Int. J. Quantum Chem. 2013, 113, 2110−2142. (42) Murphy, R. B.; Beachy, M. D.; Friesner, R. A.; Ringnalda, M. N. J. Chem. Phys. 1995, 103, 1481−1490. (43) Damm, W.; Frontera, A.; Tirado−Rives, J.; Jorgensen, W. L. J. Comput. Chem. 1997, 18, 1955−1970. (44) Jorgensen, W. L.; McDonald, N. A. J. Mol. Struct.: THEOCHEM 1998, 424, 145−155. (45) Rizzo, R. C.; Jorgensen, W. L. J. Am. Chem. Soc. 1999, 121, 4827−4836. (46) Jensen, K. P.; Jorgensen, W. L. J. Chem. Theory Comput. 2006, 2, 1499−1509. (47) Shivakumar, D.; Williams, J.; Wu, Y.; Damm, W.; Shelley, J.; Sherman, W. J. Chem. Theory Comput. 2010, 6, 1509−1519. (48) MacKerell, A. D., Jr.; Feig, M.; Brooks, C. L., III J. Comput. Chem. 2004, 25, 1400−1415. (49) Shalongo, W.; Dugad, L.; Stellwagen, E. J. Am. Chem. Soc. 1994, 116, 8288−8293. (50) Doig, A. J.; Baldwin, R. L. Protein Sci. 1995, 4, 1325−1336. (51) Song, K.; Stewart, J. M.; Fesinmeyer, M.; Andersen, N. H.; Simmerling, C. Biopolymers 2008, 89, 747−760. (52) Honda, S.; Akiba, T.; Kato, Y. S.; Sawada, Y.; Sekijima, M.; Ishimura, M.; Ooishi, A.; Watanabe, H.; Odahara, T.; Harata, K. J. Am. Chem. Soc. 2008, 130, 15327−15331. (53) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. Biophys. J. 2011, 100, L47−L49. (54) Cumming, J. N.; Smith, E. M.; Wang, L.; Misiaszek, J.; Durkin, J.; Pan, J.; Iserloh, U.; Wu, Y.; Zhu, Z.; Strickland, C.; Voigt, J.; Chen, X.; Kennedy, M. E.; Kuvelkar, R.; Hyde, L. A.; Cox, K.; Favreau, L.; Czarniecki, M. F.; Greenlee, W. J.; McKittrick, B. A.; Parker, E. M.; Stamford, A. W. Bioorg. Med. Chem. Lett. 2012, 22, 2444−2449.

(55) Hardcastle, I. R.; Arris, C. E.; Bentley, J.; Boyle, F. T.; Chen, Y.; Curtin, N. J.; Endicott, J. A.; Gibson, A. E.; Golding, B. T.; Griffin, R. J.; Jewsbury, P.; Menyerol, J.; Mesguiche, V.; Newell, D. R.; Noble, M. E.; Pratt, D. J.; Wang, L. Z.; Whitfield, H. J. J. Med. Chem. 2004, 47, 3710−3722. (56) Szczepankiewicz, B. G.; Kosogof, C.; Nelson, L. T. J.; Liu, G.; Liu, B.; Zhao, H.; Serby, M. D.; Xin, Z.; Liu, M.; Gum, R. J.; Haasch, D. L.; Wang, S.; Clampit, J. E.; Johnson, E. F.; Lubben, T. H.; Stashko, M. A.; Olejniczak, E. T.; Sun, C.; Dorwin, S. A.; Haskins, K.; AbadZapatero, C.; Fry, E. H.; Hutchins, C. W.; Sham, H. L.; Rondinone, C. M.; Trevillyan, J. M. J. Med. Chem. 2006, 49, 3563−3580. (57) Friberg, A.; Vigil, D.; Zhao, B.; Daniels, R. N.; Burke, J. P.; Garcia-Barrantes, P. M.; Camper, D.; Chauder, B. A.; Lee, T.; Olejniczak, E. T.; Fesik, S. W. J. Med. Chem. 2013, 56, 15−30. (58) Goldstein, D. M.; Soth, M.; Gabriel, T.; Dewdney, N.; Kuglstatter, A.; Arzeno, H.; Chen, J.; Bingenheimer, W.; Dalrymple, S. A.; Dunn, J.; Farrell, R.; Frauchiger, S.; La Fargue, J.; Ghate, M.; Graves, B.; Hill, R. J.; Li, F.; Litman, R.; Loe, B.; McIntosh, J.; McWeeney, D.; Papp, E.; Park, J.; Reese, H. F.; Roberts, R. T.; Rotstein, D.; San Pablo, B.; Sarma, K.; Stahl, M.; Sung, M.-L.; Suttman, R. T.; Sjogren, E. B.; Tan, Y.; Trejo, A.; Welch, M.; Weller, P.; Wong, B. R.; Zecic, H. J. Med. Chem. 2011, 54, 2255−2265. (59) Wilson, D. P.; Wan, Z.-K.; Xu, W.-X.; Kirincich, S. J.; Follows, B. C.; Joseph-Mccarthy, D.; Foreman, K.; Moretto, A.; Wu, J.; Zhu, M.; Binnun, E.; Zhang, Y.-L.; Tam, M.; Erbe, D. V.; Tobin, J.; Xu, X.; Leung, L.; Shilling, A.; Tam, S. Y.; Mansour, T. S.; Lee, J. J. Med. Chem. 2007, 50, 4681−4698. (60) Baum, B.; Mohamed, M.; Zayed, M.; Gerlach, C.; Heine, A.; Hangauer, D.; Klebe, G. J. Mol. Biol. 2009, 390, 56−69. (61) Liang, J.; Tsui, V.; Van Abbema, A.; Bao, L.; Barrett, K.; Beresini, M.; Berezhkovskiy, L.; Blair, W. S.; Chang, C.; Driscoll, J.; Eigenbrot, C.; Ghilardi, N.; Gibbons, P.; Halladay, J.; Johnson, A.; Kohli, P. B.; Lai, Y.; Liimatta, M.; Mantik, P.; Menghrajani, K.; Murray, J.; Sambrone, A.; Xiao, Y.; Shia, S.; Shin, Y.; Smith, J.; Sohn, S.; Stanley, M.; Ultsch, M.; Zhang, B.; Wu, L. C.; Magnuson, S. Eur. J. Med. Chem. 2013, 67, 175−187. (62) Liang, J.; van Abbema, A.; Balazs, M.; Barrett, K.; Berezhkovsky, L.; Blair, W.; Chang, C.; Delarosa, D.; DeVoss, J.; Driscoll, J.; Eigenbrot, C.; Ghilardi, N.; Gibbons, P.; Halladay, J.; Johnson, A.; Kohli, P. B.; Lai, Y.; Liu, Y.; Lyssikatos, J.; Mantik, P.; Menghrajani, K.; Murray, J.; Peng, I.; Sambrone, A.; Shia, S.; Shin, Y.; Smith, J.; Sohn, S.; Tsui, V.; Ultsch, M.; Wu, L. C.; Xiao, Y.; Yang, W.; Young, J.; Zhang, B.; Zhu, B.-y.; Magnuson, S. J. Med. Chem. 2013, 56, 4521− 4536. (63) Politzer, P.; Murray, J. S.; Clark, T. Phys. Chem. Chem. Phys. 2010, 12, 7748−7757. (64) Kolár,̌ M.; Hobza, P. J. Chem. Theory Comput. 2012, 8, 1325− 1333. (65) Ibrahim, M. A. J. Comput. Chem. 2011, 32, 2564−2574. (66) Mobley, D. L.; Liu, S.; Cerutti, D. S.; Swope, W. C.; Rice, J. E. J. Comput.-Aided Mol. Des. 2012, 26, 551−562. (67) Lin, N.-H.; Xia, P.; Kovar, P.; Park, C.; Chen, Z.; Zhang, H.; Rosenberg, S. H.; Sham, H. L. Bioorg. Med. Chem. Lett. 2006, 16, 421− 426. (68) Huang, S.; Garbaccio, R. M.; Fraley, M. E.; Steen, J.; Kreatsoulas, C.; Hartman, G.; Stirdivant, S.; Drakas, B.; Rickert, K.; Walsh, E.; Hamilton, K.; Buser, C. A.; Hardwick, J.; Mao, X.; Abrams, M.; Beck, S.; Tao, W.; Lobell, R.; Laura Sepp-Lorenzino, L.; Yan, Y.; Ikuta, M.; Murphy, J. Z.; Sardana, V.; Munshi, S.; Kuo, L.; Reillyd, M.; Mahand, E. Bioorg. Med. Chem. Lett. 2006, 16, 5907−5912. (69) Oza, V.; Ashwell, S.; Almeida, L.; Brassil, P.; Breed, J.; Deng, C.; Gero, T.; Grondine, M.; Horn, C.; Ioannidis, S.; Liu, D.; Lyne, P.; Newcombe, N.; Pass, M.; Read, J.; Ready, S.; Rowsell, S.; Su, M.; Toader, D.; Vasbinder, M.; Yu, D.; Yu, Y.; Xue, Y.; Zabludoff, S.; Janetka, J. J. Med. Chem. 2012, 55, 5130−5142. (70) Zhao, L.; Zhang, Y.; Dai, C.; Guzi, T.; Wiswell, D.; Seghezzi, W.; Parry, D.; Fischmann, T.; Siddiqui, M. A. Bioorg. Med. Chem. Lett. 2010, 20, 7216−7221. 295

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296

Article

Journal of Chemical Theory and Computation (71) Tao, Z.-F.; Wang, L.; Stewart, K. D.; Chen, Z.; Gu, W.; Bui, M.H.; Merta, P.; Zhang, H.; Kovar, P.; Johnson, E.; Park, C.; Judge, R.; Rosenberg, S.; Sowin, T.; Lin, N.-H. J. Med. Chem. 2007, 50, 1514− 1527. (72) Huang, X.; Cheng, C. C.; Fischmann, T. O.; Duca, J. S.; Yang, X.; Richards, M.; Shipps, G. W., Jr. ACS Med. Chem. Lett. 2012, 3, 123−128. (73) Chan, C.; Borthwick, A. D.; Brown, D.; Burns-Kurtis, C. L.; Campbell, M.; Chaudry, L.; Chung, C-w.; Convery, M. A.; Hamblin, J. N.; Johnstone, L.; Kelly, H. A.; Kleanthous, S.; Patikis, A.; Patel, C.; Pateman, A. J.; Senger, S.; Shah, G. P.; Toomey, J. R.; Watson, N. S.; Weston, H. E.; Whitworth, C.; Young, R. J.; Zhou, P. J. Med. Chem. 2007, 50, 1546−1557. (74) Wang, L.; Deng, Y.; Knight, J. L.; Wu, Y.; Kim, B.; Sherman, W.; Shelley, J. C.; Abel, R. J. Chem. Theory Comput. 2013, 9, 1282−1293. (75) Brown, S. P.; Muchmore, S. W.; Hajduk, P. J. Drug Discovery Today 2009, 14, 420−427.

296

DOI: 10.1021/acs.jctc.5b00864 J. Chem. Theory Comput. 2016, 12, 281−296