A Comparison of Quantum and Molecular Mechanical Methods to

May 9, 2017 - Department of Discovery Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States. J. Chem. Inf. Model...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/jcim

A Comparison of Quantum and Molecular Mechanical Methods to Estimate Strain Energy in Druglike Fragments Benjamin D. Sellers,* Natalie C. James, and Alberto Gobbi Department of Discovery Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States S Supporting Information *

ABSTRACT: Reducing internal strain energy in small molecules is critical for designing potent drugs. Quantum mechanical (QM) and molecular mechanical (MM) methods are often used to estimate these energies. In an effort to determine which methods offer an optimal balance in accuracy and performance, we have carried out torsion scan analyses on 62 fragments. We compared nine QM and four MM methods to reference energies calculated at a higher level of theory: CCSD(T)/CBS single point energies (coupled cluster with single, double, and perturbative triple excitations at the complete basis set limit) calculated on optimized geometries using MP2/6-311+G**. The results show that both the more recent MP2.X perturbation method as well as MP2/CBS perform quite well. In addition, combining a Hartree−Fock geometry optimization with a MP2/CBS single point energy calculation offers a fast and accurate compromise when dispersion is not a key energy component. Among MM methods, the OPLS3 force field accurately reproduces CCSD(T)/CBS torsion energies on more test cases than the MMFF94s or Amber12:EHT force fields, which struggle with aryl−amide and aryl−aryl torsions. Using experimental conformations from the Cambridge Structural Database, we highlight three example structures for which OPLS3 significantly overestimates the strain. The energies and conformations presented should enable scientists to estimate the expected error for the methods described and we hope will spur further research into QM and MM methods.



number of publications7−11 present findings after mining these databases, and some have developed empirical rules for predicting the preferred conformation of various molecular fragment geometries.12−15 The potential energy of molecular conformations can be compared with ab initio, semiempirical, and force field methods. Quantum mechanical (QM) ab initio methods solve the Schrödinger equation and, in the limit to infinite computational resources, can yield the exact gas phase energy of a conformation. Semiempirical methods solve the Schrödinger equation more quickly by introducing parameters which can yield deviations from the true energy. Force fields replace the quantum mechanical description of nuclei and electrons with a molecular mechanical (MM) model derived from classical mechanics where bond distances, bond angles, and dihedral angles are assigned energy profiles, and their contributions are combined to yield the conformational energy. Force field based energy calculations are extremely fast, but the accuracy depends strongly on the mathematical functions and the parametrization of the force field for a given molecule. In addition to the potential energy, the entropy of binding needs to be considered to evaluate the binding free energy that determines the affinity of a drug. Recently, several groups have described methods that

INTRODUCTION The optimization of the binding affinity (free energy) of a drug for a receptor is a major effort in drug design, as doing so can improve the efficacy for a given dose and potentially reduce side effects.1 The free energy of binding consists of multiple components: the ligand/receptor interaction energy, desolvation energies, and internal conformational energies. Thus, potency is improved if the sum of the contributions is optimized. During the binding event, the internal conformational energy of the ligand changes because the ligand/receptor interactions favor a ligand conformation that may be higher in energy than the conformation of the ligand in solution. This energy has been termed “strain energy.” Minimizing the change in conformational energy upon binding has been a successful strategy in optimizing or explaining potency.2−4 Thus, being able to quickly and accurately compute strain energy can be a powerful tool in improving lead potency. There are a number of approaches to identifying conformational preferences for a molecule. On the empirical side, the Protein Data Bank (PDB)5 and the Cambridge Structural Database (CSD)6 offer a wealth of experimental evidence of small molecule conformations, many of which may be assumed to be in a low energy conformation. While a number of methods and force fields derive protein and ligand parameters from experimental data, a review is beyond the scope of this work, and we therefore focus on methods which identify conformational preferences of druglike, small molecules. A © 2017 American Chemical Society

Received: October 10, 2016 Published: May 9, 2017 1265

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling include an entropic component to the strain energy.16−18 This paper focuses on the potential energy component only. In order to compare the accuracy and resource requirements of QM and MM methods, it is important to reduce the problem to a set of simple test cases. Typical drugs, for example from the DrugBank19 with molecular weights between 200 and 500 Da, contain about five rotatable bonds on average (data not shown). If we were to compare methods using whole drugs, errors in these methods may be hidden or compounded by the many degrees of freedom, making it difficult to assess the causes of a method’s failure. In contrast, torsion energy analysis, the estimation of the energy profile of a 360° rotation around a single rotatable bond within small, druglike fragments enables the evaluation of methods across a range of energies and chemo-types with a reduced set of degrees of freedom. What is a meaningful difference in torsion energy between methods? We can start by asking, “What is a meaningful free energy difference in binding of a drug?” A free energy difference of 4.2 kcal/mol in binding affinity between two drugs results in a 1000-fold difference in their IC50. Thus, given a drug molecule with five rotatable bonds, if each rotatable bond contributes a strain free energy of 0.84 kcal/mol/rotatable bond (4.2 kcal/mol total), a very meaningful decrease in potency can be attributed to the internal strain. Thus, ∼0.8 kcal/mol was chosen as an approximate guide when comparing torsion scans with various methods below. The present study builds on a number of previous works which have investigated the torsional preferences and strain energies within small molecules. Hao et al.9 compared torsion angle observations of common fragments from the PDB to QM-calculated energies (B3LYP20/6-311++G**) and found good agreement. Brameld et al.7 summarized observed torsion angle preferences in the CSD and PDB. Halgren21 compared a number of MM force fields to QM (HF22/6-31G*) energy calculations. Allen et al.23 found low strain in a small set of small-molecule X-ray crystal structures while Perola and Charifson10 found significant strain using force field methods in a larger set of crystal structures. Valdes et al.24 performed a large comparison of QM methods on small peptides and in particular identified the need to properly account for dispersion when noncovalent, intramolecular interactions are involved. In this work, we present a test set of 62 fragments with druglike functional groups (Table 1) and perform a torsion scan analysis around a single rotatable bond in each fragment using several QM and MM methods. The highest level of theory used was the coupled-cluster with singles, doubles, and

perturbative triples, or CCSD(T),25 using the complete basis set extrapolation (CBS).26 Using this baseline method, we performed relaxed torsion scans. The torsion scans were executed by fixing the torsion of interest while optimizing all other geometrical degrees of freedom. In order to ensure that the lowest energy conformation at each step was identified, we used multiple starting conformations if other rotatable bonds were present in the molecule (including torsions involving hydroxyl and methyl groups). To compare accuracy and resource requirements of commonly used ab initio, hybrid functional, and semiempirical QM methods, as well as MM force fields, we evaluated differences between the CCSD(T)/ CBS torsion energy profiles and the profiles computed with 13 such methods. In addition, we report generally on the effects of solvation estimations in torsion calculations as well as method performance. The test set, including all minimized conformations and corresponding energies, is available in the Supporting Information. It is our hope that these results will help guide computational chemistry research groups as well as benefit the QM and MM force field development communities.



METHODS Test Set. A test set of 62 simple fragments containing rotatable bonds was constructed from the examples in Brameld et al.7 and Hao et al.9 The four atoms defining the torsion to be scanned were selected to match the torsions in the source publications. Additional fluorine modified fragments were added to expand the set with small perturbations (Table 2). Generation of QM Baseline Geometries and Energies. The programs and scripts developed at Genentech and used to perform this analysis have been released as part of the Chemalot open-source package.27 For each test molecule, a single starting conformation was manually constructed using the Builder interface in MOE.28 In some cases, multiple starting conformations were generated with bonds outside the fixed torsion rotated in order to ensure that the global minimum conformation was identified given the torsion constraint (see below). Thirty-six conformers at 10° increments around the torsion bond were generated for each starting conformer using the qTorsionMultiplexer.pl program. We then performed an energy minimization of each conformer with Gaussian 0929 using the MP230−35/6-311+G**36−38 method and basis set. The evaluated torsion was held fixed with the “D atom1 atom2 atom3 atom4 F” command. An example Gaussian input file is included in the Supporting Information (Figure S1). Each geometry-optimized conformer was then passed to the opensource QM package, Psi4,39 where a single point energy calculation was performed using the coupled-cluster with single, double, and perturbative triple excitations, CCSD(T),25 with a complete basis-set (CBS) approximation.26 The CCSD(T) energy with the aug-cc-pvdz basis set26 is extrapolated to the CBS limit using the MP2/aug-cc-pv[tq]z40 level of theory. The lowest CCSD(T)/CBS energy of all minimizations for each test case was taken as the global minimum and relative torsion profiles were generated by computing the relative energies to the global minimum at each torsion angle. The final energies were inspected in Vortex41 as a scatter plot of energy versus torsion angle. These CCSD(T)/CBS//MP2/6-311+G** profiles represent our gas-phase baseline calculations. We also generated a water-phase baseline set using MP2/6-311+G** with the Polarizable Continuum Method (PCM).42,43 CCSD(T)/CBS energies were not calculated for the water-phase baseline due to computational resource limitations.

Table 1. Summary of Torsion Test Cases

1266

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling

Figure 1. Distribution of mean gas-phase energy differences ⟨ΔΔEMethod⟩ in kcal/mol between each method and the CCSD(T)/CBS baseline for the 62 test cases. Sixty-two points are plotted for each method. Each point is the difference in energy for a single test case between the listed gas phase method and the CCSD(T)/CBS baseline, averaged across the 36 conformations of that test case. Green boxes are QM single point energy calculations. Orange boxes are QM geometry optimizations, and blue boxes are MM geometry optimizations. The bold, horizontal, black line represents the median. The lower and upper bounds of the colored boxes demark the 25−75% range, and the “whiskers” denote either the min/max values or 1.5 times the interquartile range, whichever is closer to the median. The (+6) and (+1) labels indicate that there are six and one additional outliers (not shown for clarity) above 4.0 kcal/mol for the MMFF94s and Amber12_EHT methods, respectively. All energies are given in Table 2.

define the SDF tag with the torsion atom indices. Details of the parameters used by sdfMMMinimize.csh specific to each method are specified below. MOE (Chemical Computing Group):28 Energy minimizations were carried out using the sdminimize28 script provided by the Chemical Computing Group using the Amber12:EHT force field (extended Hückel theory48) with and without the generalized Born volume integral (GBVI)49 solvation model. Torsion angles were held fixed by specifying -restrainforce 5000 kcal/mol and -restrainDihAtoms parameters of sdminimize. Macromodel (Schrodinger, Inc.):50 Energy minimizations were carried out using the bmin program, configured with OPLS3.051 and OPLS200552 force fields with and without the generalized Born53 solvation model. Torsion angles were held fixed using the FXTA command with a force constant of 5000 kJ/mol. SZYBKI16 (OpenEye): Energy minimizations were carried out using the SDFTorsionScanner program, which employs the OEChem54 and SZYBKI toolkits16 from OpenEye to (1) read an SDFile containing the conformer which contained an SDTag with the indices of the atoms that define the torsion angle and (2) perform an energy minimization with the torsion angle held fixed using the SetTorsionConstraint function with a force constant of 15,700 kcal/mol. The evaluations were performed using the MMFF94s force field with and without the Sheffield solvation model. Data Analysis. We first converted the total energy at each angle, EtotAngle, into energies relative to the lowest energy of the 360° torsion scan ΔEAngle. Differences in relative energy between each method and the baseline method, ΔΔEAngle,method = ΔEAngle,method − ΔEAngle,baseline, were calculated at each torsion angle. We then calculated mean deviations for each method across all angles, ⟨ΔΔE Me thod ⟩ = ( 1 / 36 ) × ∑ 36angle s ΔΔEAngle,Method. These averages were then plotted as box plots using the R programming language55 with the median as a bold black line, 25−75% quartile extents as a solid box, and either the min/max values or 1.5 times the interquartile range, whichever is closer to the median, as “whiskers” (R default).

When additional rotatable bonds are present in the molecule, the minimization can result in the identification of a local minimum as opposed to the global minimum. This occurred when internal hydrogen bonds were not formed or when steric clashes were not relieved, even in simple cases where the additional rotatable bond was a hydroxyl or methyl rotor. In these instances, we computed the torsional scans with multiple starting conformations that differed in the dihedral angle of the additional rotatable bonds. Amides were flipped cis and trans; hydroxyls and methyls were rotated every 120°. The profiles from multiple starting conformations were combined by taking the single lowest energy conformer of all profiles at each angle, producing a single profile of 36, low-energy conformers at 10° increments. Torsion Profiles for Other Evaluated Methods. For each test case, the 36 conformations of the torsion energy profile at the CCSD(T)/CBS//MP2/6-311+G** baseline level served as input to either single point energy calculations or geometry optimizations using QM and MM methods with the torsion angle held fixed. The final energies and conformations of each minimization defined the torsion profile for each method, and then a deviation between each method profile and the CCSD(T)/CBS profile is reported at each angle. Evaluated QM Methods. All QM evaluations were performed with Psi4.39 The following QM methods were evaluated. Single point energies: MP2.X44/CBS (see Supporting Information Figure S4 for example input file), MP2/CBS, MP2/6-311+G**; Geometry optimizations: HF/6-311+G**,22 B3LYP/6-311+G**, 20 B3LYP-D3/6-311+G**, 45 AM1, 46 PM3;47 Hybrid: MP2/CBS//HF/6-31G*.30 As with the MP2(PCM) water-phase baseline, PCM was used with a subset of the above methods when assessing solvation. Evaluated Molecular Mechanics Methods. Minimizations were carried out using the chemalot program, sdfMMMinimize.csh, which is capable of running MM minimizations using MOE (Chemical Computing Group), Macromodel (Schrodinger), or SZYBKI (OpenEye). Torsion angles were held rigid by applying a strong harmonic potential using the “-fixTorsion true” and “-fixedAtomTag” options to 1267

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling

Figure 2. Distribution of mean gas-phase energy differences ⟨ΔΔEMethod⟩ in kcal/mol between each method and the CCSD(T)/CBS baseline for the 62 test cases, by chemotype. Energies and methods are identical to those in Figure 1 but are divided into chemical classes. The MMFF94s ArylAmide box plot was cut off for clarity (third quartile = 6.5 kcal/mol, maximum = 8.6 kcal/mol).

B3LYP, and B3LYP-D345 are also quite good with increasing accuracy below 0.5 kcal/mol median average deviations from the baseline. The D3 dispersion correction did not increase accuracy significantly. The correct description of dispersion interactions has been shown to be important to describe noncovalent interactions of aromatic side chains.24 However, the rigidity of the model systems in this paper does not allow for this, and the D3 correction has a small effect on the torsion profiles in these examples. Interestingly, MP2/6-311+G** showed slightly poorer performance at just over 0.5 kcal/mol median average error. When considering the good results of MP2/CBS, we surmise that the correlation correction in MP2 must be more sensitive to the smaller 6-311+G** basis set than HF or B3LYP. All ab initio methods tested maintained a median average energy deviation less than the 0.8 kcal/mol per rotatable bond “significant difference” we discussed in the Introduction. Our analysis of the semiempirical methods shows larger average energy differences with CCSD(T)/CBS (⟨ΔΔEMethod⟩) than the ab initio methods mentioned above. The medians of the ⟨ΔΔEMethod⟩ for AM1 and PM3 are both above 1 kcal/mol, with the largest outlier ⟨ΔΔEMethod⟩ approaching 4 kcal/mol. These deviations are even more significant in comparison to our 0.8 kcal/mol per rotatable bond cutoff. The semiempirical method errors appear to derive primarily from the aryl test cases, as seen in Figure 2. We then questioned whether the methods are more or less accurate at low energies. One could argue that the most “important” conformations are those within low-energy wells because compounds rarely visit higher energy geometries. To help answer this question, we performed the same analysis on a truncated data set in which all conformers with a calculated energy >4.2 kcal/mol were removed. The value 4.2 kcal/mol was chosen as an arbitrary cutoff representing the energy difference between two states with a 1:1000 occupancy ratio at 298 K. As shown in Figure S2, the QM methods produced highly similar distributions in energy differences when only low energy conformers were analyzed as compared to the entire

OPL3 Strain Energy Evaluation for CSD Examples. In the CSD full-molecule strain energy calculations, “strain” in these cases is defined as the difference in energy between the global energy minimum and a minimized conformer close to the input conformation. The calculation was performed using the Chemalot script, sdfMMConfAnalysis.pl. The script first finds the global energy minimum by generating conformers using Omega13 (version 2.5.1.4) followed by minimization with Macromodel50 (OPLS3/GBSA). The lowest energy conformer identified was retained as a global minimum. The energy for the input conformer was calculated by minimizing the crystallographic conformation using Macromodel (OPLS3/GBSA) with the addition of a flat bottom restraint on the atom positions (FiX ATom function with 0.6 Å half-width and 21 kJ mol−1 Å−2).



RESULTS

Reliable and significant statistical comparisons between methods are difficult for several reasons: (1) Even with 62 test cases, we do not believe we are significantly sampling the chemical space of torsions. (2) Our sampling of fragments is biased, as some are chemically similar (e.g., collection of arylamides) and will produce similar results, thus biasing the conclusions. (3) Within a single fragment, we evaluate energies at torsion angle increments which are highly correlated; e.g., the energy at 10° is highly correlated to the energy calculated at 20°. Comparing ab Initio, Hybrid Functional, and Semiempirical QM Methods to CCSD(T)/CBS: Gas Phase. Our QM goals were (1) to identify an accurate “mid-range” ab initio method by evaluating the more recent MP2.X and MP2/CBS methods as well as the standard Hartree−Fock (HF) method and B3LYP functional and (2) to find a fast but still reasonably accurate method among the semiempirical methods, AM1 and PM3. The distributions of ⟨ΔΔEMethod⟩ in Figure 1 show that MP2.X and MP2/CBS offer the best reproduction of the CCSD(T)/CBS baseline energies with a median average deviation less than 0.3 kcal/mol across all test cases. HF, 1268

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling Table 2. Mean Deviation of Torsion Profile Energiesa

1269

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling Table 2. continued

The ⟨ΔΔEMethod⟩ energies are in kcal/mol relative to the CCSD(T)/CBS method and averaged over the 36 steps of the torsion profile. The torsion evaluated is highlighted by bold bonds. a

data set in Figure 1. This suggests that the errors in reproducing CCSD(T)/CBS baseline energies are distributed equally between low and high energy conformations. Comparing Molecular Mechanics Methods to CCSD(T)/CBS: Gas Phase. The goal in evaluating the molecular mechanics force fields was to identify a satisfactory approach for fast energy evaluations toward estimating internal strain energy. OPLS3 showed excellent reproduction of CCSD(T) energies, with a median energy difference close to that of MP2/6311+G**; however, over 1 kcal/mol average energy differences (⟨ΔΔEOPLS3⟩) were found for some cases. OPLS2005 and MMFF94s (szybki) had larger median energy differences over OPLS3 but with a median still below 1 kcal/mol. MMFF94s and Amber12:EHT produced more high-energy outlier cases than the OPLS and the ab initio QM methods. The Amber12:EHT force field in MOE produced slightly higher energy differences, with a median of over 1 kcal/mol and with even more high-energy (>2 kcal/mol) outliers. All force fields tested had increased errors with the aryl amide cases (Figure 2), especially MMFF94s. This appears to be due to overly large energy penalties when the cis amide geometry is preferred at some torsion angles. More discussion can be found below. The Amber12:EHT force field additionally produced increased ⟨ΔΔEAmber12:EHT⟩ energies in the aryl test cases. As with the QM methods, we looked at the distribution in energy differences for only low energy conformations. In contrast to the QM methods, the force fields (Figure S2) show reduced errors when only the low energy conformers are considered. In particular, Amber12:EHT shows a large reduction of nearly 0.4 kcal/mol median average differences in energy when considering low energy conformations only. The reduction suggests that these force fields are producing larger deviations from CCSD(T)/CBS at high energies than at low energies. This may be attributed to the use of low energy conformations in the parametrization of the force fields or oversimplification of training use cases which do not take into account complex steric interactions. Role of Implicit Solvent. A secondary goal was to evaluate the ability of QM and MM methods to approximate torsion energies from geometry optimization using MP2 with the polarizable continuum model (PCM) to approximate the role of water. We note that the evaluation is not ideal due to a lack of consistent solvent models across the MM force fields (see Methods), and we used MP2/6-311+G** instead of CCSD(T)/CBS calculations with solvation as a baseline to conserve computing resources. As seen in Figure S3, the rank order of methods reproducing the MP2/6-311+G**(PCM) baseline remains largely the same as with the gas phase CCSD(T)/CBS evaluation. The Amber12:EHT with GBVI in particular shows a reduction in the number of outliers with ⟨ΔΔEAmber12:EHT⟩ differences over 3 kcal/mol. 2-(Methylsulfonyl)thiazole Gas Phase Example. We highlight an example test case with the 2-(methylsulfonyl)thiazole. The torsion is defined by the ring nitrogen, ring carbon, sulfonyl sulfur, and the methyl carbon. In Figure 3A, the CCSD(T)/CBS baseline energies clearly identify two symmetric minima close to −50 and +50° which correspond to the sulfonyl oxygens lying in plane with the thiazole ring sulfur

Figure 3. Torsion profile for 2-(methylsulfonyl)thiazole. (A) Relative energy ΔΔEAngle,Method (kcal/mol) versus torsion angle for a subset of methods tested. The evaluated torsion is highlighted by bold bonds in the inset. The CCSD(T)/CBS baseline energies are in black triangles. (B) 3D depictions of the CCSD(T)/CBS global minimum energy conformation at −50° and the higher energy conformation at 0°. (C) Small-molecule X-ray structure containing the 2-(methylsulfonyl)thiazole substructure from the CSD (ID: PAVDOF).

(Figure 3B). These low energy conformations have been identified previously56 as a favorable sulfur−oxygen interaction. The B3LYP QM method and the OPLS3 MM methods correctly reproduce the baseline energies with little deviation. However, the MMFF94s and Amber12:EHT force fields both show large energy differences to CCSD(T)/CBS but also identify a single minimum at 0° (Figure 3B). The −50° minimum identified by CCSD(T)/CBS, MP2, B3LYP, and OPLS3 is observed in the only 2-(methylsulfonyl)thiazole small-molecule X-ray structure in the Cambridge Structure Database (CSD; Figure 3C). N-Phenylacetamide Gas-Phase Torsion Scan Example. When generating the baseline CCSD(T)/CBS conformations and energies, we found in many cases that a simple relaxed torsion scan, in which the molecule is minimized at each orientation of the fixed torsion, would not always find the lowest possible energy. Instead, the minimization frequently was trapped in a local minimum. Therefore, we additionally sampled rotatable bonds outside of the torsion being evaluated in order to identify the lowest energy conformer at each torsion angle. An example is N-phenylacetamide. In Figure 4A, we observe two torsion scans using CCSD(T)/CBS//MP2/6311+G**, one starting with a trans amide geometry and one with a cis. We observed two very different energy profiles. Interestingly, for angles close to 90° out of plane, the lower energy conformer is a cis amide. Thus, the torsion “profile” for this test case is constructed using the lowest energy conformer between the trans and cis profiles at each angle. This requires a flip of the amide bond as we rotate around the phenyl carbon− nitrogen torsion. This was also observed in several of the other aryl-amide test cases. We see varying results with the QM and MM methods attempting to reproduce the N-phenylacetamide trans/cis amide 1270

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling

Figure 4. Example torsion profiles for N-phenylacetamide. (A) Torsion scans at the CCSD(T)/CBS level for trans and cis amide conformations. The resulting lowest-energy scan is thus the minimum energy at each angle (solid triangles). (B) Relative energies ΔΔEAngle,Method (kcal/mol) are shown for a subset of methods. The black CCSD(T)/CBS profile in B is identical to the lowest energy profile in A.

CCSD(T)/CBS baseline scan. In Figure 4b, the low energy CCSD(T)/CBS baseline from Figure 4a is shown (black line). While all methods capture the overall baseline profile, including global minima, some do better than others at reproducing energies when the amide prefers the cis orientation between −120° and −60°. MMFF94s greatly overestimates the energetic barrier. B3LYP overestimates the barrier to lesser degrees. MP2/6-311+G**, OPLS3, and Amber12:EHT slightly underestimate these cis amide energies. Performance Evaluation. In addition to our desire to find accurate methods, we would like to identify methods which are fast and sufficiently accurate for most cases. We have two drugdiscovery use cases when we perform energy evaluations: (1) Less frequently, knowledge of the relative energy of two conformers will influence a decision with a high experimental resource cost (e.g., synthesizing a core change in a drug), in which case we are willing to wait for more accurate results that take minutes to hours to compute. (2) During real-time, interactive design meetings, many ideas need to be prioritized, and results from one evaluation influence the next idea. In this case, we want results in seconds and delivered within our molecular modeling tool.57 Some loss of accuracy is acceptable in this latter case. All timings should be viewed as a rough estimate for relative comparison. While they have been measured on the same modern hardware, timings depend on the application used, for example, on how the method scales with the number of atoms and what basis function and starting geometry (for optimization) are used. As seen in Figure 5A, MP2/CBS is roughly 1.5 times faster than MP2.X/CBS and shows comparable results as seen in Figure 1. In turn, a Hartree−Fock geometry optimization took the same amount of time as single point energy calculations using MP2/CBS. HF is more than 3 times faster (ratio of medians) than a B3LYP geometry optimization for the N-phenylacetamide case in our hands. As expected, the B3LYP and B3LYP with D3 correction have equal median times since the D3 correction is much faster than the DFT energy calculation. Interestingly, however, we observed differences in variability between the two methods,

Figure 5. Energy calculation times per torsion angle for the Nphenylacetamide case. For each QM method (A), the distribution of CPU times is plotted in minutes for calculating the single point energy (green) or geometry optimization (orange) at each of the 36 angles. The CCSD(T)/CBS single-point energy calculation took a median of 16.1 h per torsion for this test case but is not shown for clarity. For the MM methods (B), the CPU time in seconds, averaged over 10 executions and over the 36 torsion angles, is reported. 1271

DOI: 10.1021/acs.jcim.6b00614 J. Chem. Inf. Model. 2017, 57, 1265−1275

Article

Journal of Chemical Information and Modeling

conformations highlighting “OPLS limitations” and are shown in Figure 6. In the FOPDAQ (CSD code) case, OPLS3(GBSA)

which we found were due to variation in the number of optimization steps between the two methods. Finally, in most real-world use cases, we will need to perform a geometry optimization, and so we evaluated a hybrid approach where Hartree−Fock is used for geometry optimization and MP2/ CBS for computing the final energies. The method is described below. The molecular mechanics methods tested show a similar story. The more accurate OPLS3 is roughly 5 times slower (ratio of medians) than the SZYBKI implementation of MMFF94s and MOE’s Amber12:EHT on the N-phenylacetamide case. Though all energy minimizations complete in less than a second, these differences in timing become meaningful with larger molecules or with large libraries of molecules. Additionally, in the case of OPLS3, our perexecution license limits the number of minimizations we can run in parallel. As a result, the total execution time for a torsion scan with OPLS3 is increased, thus hindering a medicinal chemist’s ability to evaluate conformations in the real-time design use case mentioned above. A Hybrid ab Initio Method: Hartree−Fock Geometry Optimization with MP2/CBS Single-Point Energy Calculation. As a result of the increased performance of the Hartree−Fock method over B3LYP, we evaluated whether a fast geometry optimization using Hartree−Fock with a smaller basis set, followed by a single-point energy calculation using MP2/CBS, would further improve on the HF results at a reasonable performance. Using a geometry optimization at the HF/6-31G* level followed by a single-point energy calculation at MP2/CBS (MP2/CBS//HF/6-31G* abbreviated as MP2/ CBS//HF), we found that this hybrid method very closely reproduces the energies from the CCSD(T)/CBS baseline, with less than a 0.3 kcal/mol median average error (Figure 1) and with significantly improved performance (Figure 5). MP2/ CBS//HF/6-31G* is roughly twice as slow as HF/6-311+G** but still twice as fast as B3LYP/6-311+G**. In summary, this hybrid approach is a fast compromise with energies consistent with CCSD(T)/CBS. Computing Strain Energies with OPLS3 on Druglike Molecules in the CSD. To ascertain whether the good performance of OPLS3 may extend to more complex molecules, we computed strain energies (see Methods) for druglike molecules in conformations observed in small molecule crystals. Two reasons could explain high, calculated strain energies within small molecule crystal structures: (1) a crystal packing interaction that induces a high energy conformation and (2) inaccuracies of the OPLS3 force field. Most examples showed strain energies