COSMO-RS

Publication Date (Web): November 28, 2017 ... Finally, our data sets based on the rigorous quantum chemical approach allow us to formulate a composite...
13 downloads 9 Views 2MB Size
Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Article

Macrocycle Conformational Sampling by DFT-D3/COSMO-RS Methodology Ondrej Gutten, Daniel Bím, Jan #ezá#, and Lubomir Rulisek J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00453 • Publication Date (Web): 28 Nov 2017 Downloaded from http://pubs.acs.org on November 29, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Macrocycle Conformational Sampling by DFTD3/COSMO-RS Methodology Ondrej Gutten, Daniel Bím, Jan Řezáč, Lubomír Rulíšek* The Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Gilead Sciences Research Center & IOCB, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic

ABSTRACT. To find and calibrate a robust and reliable computational protocol for mapping conformational space of medium-sized molecules, exhaustive conformational sampling has been carried out for the series of seven macrocyclic compounds of varying ring size and one acyclic analogue. While five of them were taken from the MD/LLMOD/force-field study by Shelley and coworkers (Watts, K. S.; Dalal, P.; Tebben, A. J.; Cheney, D. L.; Shelley, J. C.: Macrocycle Conformational Sampling with MacroModel. J. Chem. Inf. Model. 2014, 54, 2680–2696), three represent potential macrocyclic inhibitors of human cyclophilin A. The free energy values (GDFT/COSMO-RS) for all conformers of each compound were obtained by the composite protocol based on the in vacuo quantum mechanics (DFT-D3 method in a large basis set), standard gasphase thermodynamics and the COSMO-RS solvation model (QM+COSMO-RS). The GDFT/COSMO-RS values were used as the reference for evaluating performance of conformational sampling algorithms: standard and extended MD/LLMOD search (simulated annealing

ACS Paragon Plus Environment

1

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 48

molecular dynamics with the low-lying eigen-vector following algorithms; employing OPLS2005 force field + GBSA solvation) available in Macromodel and the plain molecular dynamics sampling at high temperature (1000K) using the semiempirical potential, SQM(PM6D3H4/COSMO) followed by the energy minimization of the snapshots. It has been shown that the former protocol (MD/LLMOD) may provide more complete set of initial structures that ultimately leads to identifying a greater number of low-energy conformers (assessed by the GDFT/COSMO-RS) than the latter, i.e. plain SQM MD. The CPU time needed to fully evaluate one medium-sized compound (~100 atoms, typically resulting in several hundreds or few thousands of conformers generated and treated quantum mechanically) is approximately 10 000 – 100 000 CPU hours on today’s computers which transforms to 1-7 days on a small-sized computer cluster with few hundred nodes. Finally, our data sets based on the rigorous quantum chemical approach allow us to formulate a composite conformational sampling protocol with multiple checkpoints and truncation of redundant initial structural data that offers superior insights at affordable computational cost.

ACS Paragon Plus Environment

2

Page 3 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

1. Introduction Macrocycle-containing molecules have been recognized as an interesting, yet underexploited, class of molecules for medicinal applications.1,2,3,4 They account for more than one hundred drugs available on the market.5 Considerable effort has been exerted to rationalize the protocol for design of macrocyclic compounds6 and computational chemistry has been often used (as the only tool) to map the conformational space of macrocycles which is one of key ingredients of the design.7,8,9 From the physico-chemical point of view the binding of macrocycles to target proteins may, in general, be accompanied by lower entropic penalty in comparison with the acyclic- analogues of the same size, which are presumably more flexible. Therefore, the overall free energy change associated with the macrocycle binding can be more favorable. The price to be paid is, however, that only a limited number of conformations of the ring may fit into the binding site. As a consequence, more elaborate design effort might be needed which can be greatly assisted by modern computational techniques.6 A need for effective conformational sampling methods and approaches to tackle related problems have been recently reviewed.10 Specifically, for macrocycles with the typical ring sizes of several tens of atoms (20-30) and many side chains outside of the ring pose, tens to hundreds of distinct conformations can be found in a relatively narrow energy window (0-40 kJ.mol-1 from the global minimum).8,9 Thus, an efficient exploration of exponentially growing coordinate space requires robust and smart sampling algorithm,11 as well as reliable evaluation of conformers’ energy.12 Exhaustive search methods are only feasible for small systems.13 For larger systems (including macrocycles) stochastic methods are preferable,14 such as various Monte Carlo-based algorithms15 exploring physically relevant parts of conformational space by perturbing low

ACS Paragon Plus Environment

3

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 48

energy structures. Complementary approaches, sometimes denoted as deterministic, are based on molecular dynamics (MD) methods, including steered or replica exchange molecular dynamics or quite popular methods exploring soft (low-lying) vibrational modes.16,17,18 These and many other problems concerning the efficient sampling of conformational space of macrocycles by force field methods were recently addressed in excellent studies carried out by Shelley and coworkers8 and by Chen and Follope.9 Both studies concluded that the MD/LLMOD (simulated annealing molecular dynamics followed by the low-lying eigen-vector algorithms coupled with pruning of the redundancies) is presumably superior to other approaches. Moreover, various technical parameters of the MD-based protocols (temperature, number of steps in the molecular dynamics, RMSD between the distinct structures, etc.) were thoroughly investigated in these studies. Last but not least, an emphasis was given on deletion of the structural memory of the studied systems which was realized by the 3D → 2D conversion of existing crystal structures and reconstruction of the structural information by the standard 2D → 3D convertors. However, from the rigorous quantum-chemical point of view, the missing component in both theoretical studies was the lack of benchmark values to compare the calculated energies of conformers with. Essentially, the only qualitative criterion were histograms showing how often the ‘bioactive’ conformation (i.e., conformation found in the crystal structure of the macrocyle+protein complex) was identified among low-lying conformers, assuming that it might represent or at least be close to the global energetic minimum (equilibrium geometry) of the isolated ligand. Also, none of the analyses involved a gas-phase thermodynamics that can be obtained from the normal-mode analysis and that gives certain indication about “local conformational entropy” of the given conformer.

ACS Paragon Plus Environment

4

Page 5 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In this contribution, a careful calibration of the macrocycle conformational sampling algorithms is presented. The single-most important quantity used throughout this work is the free energy of the conformer defined as: GDFT/COSMO-RS = Eel(DFT-D3) + Gsolv(COSMO-RS) + EZPVE - RTln(qtransqrotqvib) + pV

(1)

where Eel(DFT-D3) is the electronic energy of the molecule in vacuo (throughout this work computed at the DFT level with empirical dispersion terms) Gsolv is the COSMO-RS solvation energy, EZPVE zero-point vibrational energy whereas RTln(qtransqrotqvib) are the entropic terms obtained from the rigid-rotor/harmonic oscillator (RRHO) or RRFRHO (FR, free rotor applied for low-lying vibrational modes), whereas pV (= RT) term is constant for each conformer. Allowing for variability in electronic structure and solvation, it represents a standard computational protocol that has been successfully used for calculations of many thermodynamic properties, such as activation energies and reaction thermodynamics,19 protein-ligand interaction energies,20 reduction potentials,21,22 pKa values,23,24 conformationally averaged QM calculations of NMR spectra (employing the SQM composite protocol with DFT-D3/COSMO-RS QM posttreatment)25 and often provided values near quantitative accuracy. As can be inferred from the benchmark studies of, among others, Grimme,26 Hobza,27 or Truhlar28 one may expect that errors in the above composite protocol can be lowered to few kcal.mol-1 (10-15 kJ.mol-1) and more likely even less when it comes to the comparison of neutral closed-shell molecules with the same chemical structure - conformers of the particular compound. Presumably, such free energy values should represent currently available benchmarks in computational studies of medium-sized systems and will be used as such in this work.

ACS Paragon Plus Environment

5

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 48

After calibration of the computed GDFT/COSMO-RS reference values, carried out on the set of 120 conformers (15 of each of the studied compounds), we compare two protocols to generate the initial set of conformers. In particular, the MD/LLMOD with the recommended and extended parameter set and plain MD at 1000K followed by collection of 2500 snapshots and their subsequent energy minimization. In the latter, potential energy is calculated on-the-fly at the SQM(PM6-D3H4/COSMO) level. In all searches, including the QM-post-processing, the sets of equilibrium geometries obtained are pruned for redundancies. By providing solid reference quantum chemical data to evaluate the currently used protocols for conformational sampling of medium-sized systems of ~100 atoms, we attempt to answer few questions related (macrocycle) conformational sampling, e.g.: (i) how many distinct local minima can we expect in the energy window of several tens of kJ.mol-1 from the global minimum, (ii) what might be the probability of missing one or several low-lying structures, (iii) what can be said about accuracy of the force field and SQM approaches when compared with the quantum chemistry, (iv) can we propose a robust, yet computationally affordable protocol that might be generally used in modelling studies where the accuracy of several kJ.mol-1 is required?

2. Computational Details Studied Systems. Eight compounds (34 to 116 atoms) are used throughout the study and are depicted in Figure 1. Five macrocyclic compounds of varying ring size were taken from the study of Shelley and coworkers8 and can be found in the Cambridge Structural Database (CSD). They are further denoted by their CSD codes: POXTRD, CAMVES, CHPSAR, COHVAW, YIVNOG. Of remaining three, two macrocycles are potential candidates for the inhibition of

ACS Paragon Plus Environment

6

Page 7 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

CypA, to some extent inspired by cyclosporin: sanglifehrin A (denoted here as SANGLI, which differs from the sanglifehrin A by truncation of the R1; i.e. R1 = H),29 and recently published developmental compound denoted here as Cpd_A (6 in Ref. 30) with KD = 64 nM in TR-FRET Cyp A binding assay. The last compound is the acyclic analogue (more precisely, synthetic precursor of Cpd_A), denoted as Cpd_B (KD > 10 µM).31

CAMVES

Cpd_A

CHPSAR

POXTRD

COHVAW

SANGLI

Cpd_B

YIVNOG

Figure 1. Chemical formulas of studied compounds.

MD/LLMOD Conformational Search. The conformational sampling (basic protocol) as implemented in MacroModel (Schrodinger 2014-2 suite32) has been performed using the default set of parameters: OPLS2005 force-field, GB/SA solvation model, 5000 quench cycles at 1000K/300K followed by up to 5000 large-scale low-mode search steps; energy window of 41.84 kJ.mol-1 (10 kcal.mol-1). For more technical details see Ref. 8, Table S5. The single exception to the default parameter set was the loosened threshold for conformer redundancy being set to RMSD = 1.0 Å (instead of default 0.75 Å). This version is referred to as the default

ACS Paragon Plus Environment

7

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 48

MD/LLMOD protocol (or “defMD/LLMOD”). The experimental crystal structures were used as starting structures for CSD systems (CAMVES, CHPSAR, COHVAW, POXTRD, YIVNOG and SANGLI) whereas Cpd_A and Cpd_B were converted from their chemical formulas (2D) to 3D using Maestro 2014-2 software. For the extended MD/LLMOD (or “extMD/LLMOD”), MacroModel (Schrodinger 2015-3 suite) and the following set of parameters was used: 10000 simulation cycles at 1000K/300K followed by up to 10000 large-scale low-mode search steps; energy window of 15 kcal.mol-1 (~62.76 kJ.mol-1). Moreover, all eight 3D structures were converted to 2D and back to 3D using Maestro 2015-3 and used as starting structures to delete the original structural information. Molecular Dynamics with Semiempirical (PM6-D3H4/COSMO) Potential. An alternative conformational sampling protocol was based on molecular dynamics using a semiempirical QM potential. For all the calculations, the PM6 method33 augmented with the D3H4 empirical corrections for London dispersion and hydrogen bonding (PM6-D3H4) was used.34 It was complemented with COSMO solvation model35, as implemented in MOPAC 2012.36 500 ps long molecular dynamics (velocity-Verlet algorithm, 1fs timestep and Berendsen thermostat) at high temperature of 1000K was used to generate 2500 equally spaced snapshots. Each snapshot was then optimized at the same level. This protocol was fully automated using the Cuby framework37 which also provided the D3H4 corrections and the MD engine. In all cases except Cpd_A and Cpd_B, crystal structures were used for starting geometry (the same as the defMD/LLMOD approach above). This protocol is referred to as PM6-MD throughout this work. Quantum Chemical Calculations. All quantum chemical calculations reported in this work were carried out employing the density functional theory (DFT) within the resolution-of-the-identity

ACS Paragon Plus Environment

8

Page 9 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(RI, alternatively termed the density fitting) approximation using the TURBOMOLE 7.0 program.38 The geometry optimizations were carried out using the Becke-Perdew 86 (BP86) exchange-correlation functional,39,40,41 Ahlrichs’ def-TZVP basis set,42 the empirical dispersion correction with zero-damping43 (denoted as D3) and conductor-like screening model (COSMO) for implicit solvation. Normal mode analyses (vibrational frequency calculations) were carried out at the BP86-D3/def2-SVP level and they were used to obtain gas-phase thermochemical data (EZPVE - RTln(qtransqrotqvib) + pV). For the test set of 120 conformers, vibrational frequencies were also computed numerically, using def-TZVP and def2-SVP basis sets and COSMO solvation model, since analytical second derivatives are not implemented for COSMO solvation model in TURBOMOLE 7.0. The single-point energies were calculated using the BP86-D3 and BLYPD339,44 functionals and the def2-TZVPD basis set (Eel term in Eq. 1).42 Solvation (free) energies of all studied species, Gsolv, were calculated using the COSMO-RS method45,46 (conductor-like screening model for real solvents) as implemented in the COSMOthermX16 program,47 employing

“BP_TZVPD_FINE_C30_1601.ctd”

parametrization

file,

FINE

cavities

($cosmo_isorad keyword), and the BP86-D3/def-TZVP//COSMO(εr = 80) optimized geometries (vide supra). The use of the so-called FINE parametrizations and the $cosmo_isorad keyword in the COSMO input is critical to prevent rare numerical failures of the protocol for conformers with very small ring cavity. The COSMO-RS calculations were performed according to the recommended protocol which includes BP86-D3/def2-TZVPD COSMO calculation with εr = ∞ (ideal conductor) and BP86-D3/def2-TZVPD in vacuo calculation as a prerequisite for final calculation in water. The Gibbs free energy was subsequently calculated according to Eq. 1 (vide supra) at p = 1 atm, T = 298.15 K. The collection of the last three terms in (1) is collectively referred to as μ⦵ throughout the rest or the study.Definition of a test-set. Various approaches and

ACS Paragon Plus Environment

9

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 48

computational protocols to calculate GDFT/COSMO-RS values (variations in the DFT functional used, gas-phase thermochemistry, solvation methods) were tested

on data set comprising 120

conformers, 15 per each of the 8 systems studied. The conformers were selected from the PM6MD runs, and cover the full energy spectrum of ~100 kJ.mol-1, thus avoiding a bias to the lowenergy structures. The data calculated for each conformer include various DFT gas-phase single-point energies (BP86, BLYP, B3LYP, TPSS, TPSSH; with and without D3 correction; def2-TZVPD basis set), BP86-D3/COSMO single-point energy, COSMO-RS solvation energy, ZPVE and entropic contributions based on various normal mode analyses. The frequencies are studied employing two basis set choices (def2-SVP, def-TZVP), either in gas-phase or implicit solvent (COSMO) environment, which implies that free energy values computed for a single conformer involve several geometry optimizations. If the normal mode with imaginary frequency could not be eliminated and its magnitude (modulus) greater than |i50| cm-1, the data point has been excluded from the analysis. The full account of computed data is deposited in the supporting information (Table SI_test.data) whereas the most important correlations and conclusions are discussed below (Section 3.1). A thorough calibration of the electronic structure methods against the CCSD(T)/CBS (or DLPNO-CCSD(T)/CBS) benchmark values for a more extensive set of medium-sized macrocycles and peptides, including the 120 conformers studied in this test set (denoted MP-CONF196 data set) has been presented elsewhere.48 In short, it has been shown that the accuracy of the computed Egp values is quite similar for most of the standard dispersioncorrected (D3) functionals with the error bar of approximately 5 kJ.mol-1. Energy filters. Two energy filters were used in case of extended MD/LLMOD composite protocol in order to reduce the number of conformers to be processed. The first of these was a

ACS Paragon Plus Environment

10

Page 11 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

100 kJ.mol-1 energy window of PM6-D3H4/COSMO energy, prior to DFT optimization and applied for YIVNOG system only due to the largest amount of conformers (4364). The other filter was applied after DFT optimization on all systems and consisted of an energy window of 65 kJ.mol-1 of BP-86/COSMO energy. Pruning methods. Elimination of redundant conformers has been performed using two criteria. Firstly, two conformers could only be considered similar if their energy did not differ by more than 1 kcal.mol-1 (4.184 kJ.mol-1). The energy function in question depended on context, either OPLS2005/GBSA as implemented in Schrodinger suite; PM6-D3H4/COSMO as implemented in MOPAC 2012, or Gibbs free energy as defined in (1). The second criterion is structure similarity given by RMSD measure of heavy atoms and polar hydrogens (effectively hydroxyl hydrogens). The invariance to renumbering of atoms of some systems (CHPSAR, YIVNOG) was recognized by the alignment algorithm. Unless stated otherwise, the threshold for similarity of these compounds was set to 1.0 Å. The pruning implemented in Schrodinger 2016-3 suite was used. Quantities. The most important quantity for further discussions of a set of conformers is the relative energy, i.e. difference in energy between a pair of conformers. We report mean absolute errors defined as:  =





 ∑  |∆ − ∆ |

(2)

where ∆ is a difference in a quantity in question for a pair of conformers (depending on context either energy from OPLS2005/GBSA, PM6-D3H4/COSMO, or μ⦵ from an alternative 

normal mode analysis method). ∆ method (usually  or μ⦵ /

is an equivalent quantity obtained from a reference

 ).

ACS Paragon Plus Environment

11

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 48

In one case, where a comparison for an individual conformer, rather than a pair, was desirable an alternative quantity named absolute centered error was used: ! = |( − # ) − (



− #



)|

(3).

Structures are compared based on RMSD of optimally superimposed structures. Only heavy atoms and hydroxyl hydrogens are used for fitting and calculation. Renumbering symmetry of systems (CHPSAR, YIVNOG) was respected. 

%&'(,* = + ∑ ||,( − ,* || .

(4)

3. Results and Discussion The central idea behind this contribution is formulation of the quantitatively accurate yet computationally efficient and affordable conformational sampling protocol via linking the low-cost conformer generating methods with the high-level DFT(+PCM) post-processing of the conformers.

ACS Paragon Plus Environment

12

Page 13 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 2. Schematic roadmap of a composite protocol. A circle represents an individual conformer. A rectangle represents a procedure performed on each individual structure. A vertical rounded box represents a procedure performed on a set of conformers.

The general outline of such protocol is presented in Figure 2 and comprises five major steps: (1) initial generation of conformers; (2) elimination of redundant and ‘unpromising’ conformers generated in step 1; (3) DFT-D3/COSMO(PCM) optimization; (4) normal mode analysis and thermochemistry; (5) elimination of redundant conformers using structures and energies from 3 and 4. The cornerstone of the protocol is represented by the high-level quantum chemical (+solvation) calculations (steps 3 and 4 above), and therefore, accuracy considerations and technical details are discussed in section 3.1 below. These calculations are used as benchmark values for evaluating three different sampling methods - step 1. The performance of sampling protocols is discussed in section 3.2. Section 3.3 discusses a proposed method for evaluating the state of convergence of a conformational sampling. Problems related to elimination of redundant conformers (steps 2 and 5) are discussed in section 3.4. Finally, we summarize the most

ACS Paragon Plus Environment

13

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 48

important choices to be done in fine-tuning the protocol and briefly discuss the computational cost of the combined protocol in section 3.5. 3.1. Calibration of the DFT-D3/COSMO-RS Gibbs free energy values. DFT calculations constitute a high-level part of the protocol which comprises molecular geometry optimization, calculations of the gas-phase single point energy and solvation free energy, and frequency analysis that provides zero-point vibrational energy and thermal contributions to the overall free energy. The discussion of these components (terms) of the final GDFT/COSMO-RS is based on a test set defined in Section 2. Their interplay is discussed below, while a more detailed discussion is discussed separately in Supporting Information (Section S3.1). Low-lying vibrational modes and the treatment of imaginary frequencies. Thermal contributions are traditionally computed using a rigid-rotor harmonic oscillator (RRHO) model. The assumption of harmonicity is especially inappropriate for soft (low-lying) vibrational frequencies, grossly overestimating the vibrational partition function. An alternative approach, examined herein, is to apply the free rotor model for these frequencies, with a smooth switching function, e.g. as implemented by Grimme.49 A further common complication (~20% cases) in computing vibrational frequencies for larger and more complex systems is an occurrence of low imaginary frequencies (vide supra), quite often a numerical noise appearing as the consequence of standard DFT grids and cutoffs implemented in TURBOMOLE. A quick and dirty trick that circumvents the need for costly reoptimization with finer grids and weighting of the derivatives is to simply flip an imaginary frequency into its positive value. This approximation seems to introduce only a minor error (around 1 kJ.mol-1, see Section S3.1 for quantitative analysis and further discussion) and is

ACS Paragon Plus Environment

14

Page 15 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

therefore adapted throughout the. Only imaginary frequencies of 0-50 cm-1 are treated in this manner, as higher values cannot be attributed to numerical noise. Caution should be taken if multiple imaginary frequencies are present. Gas-phase versus solvent vibrational frequencies. A practical, commonly used, and straightforward, though not 100% rigorous, approach for calculating thermal contributions is to perform the frequency analysis in the PCM-like approximation at the equilibrium solvent geometry using identical potential (method) to the one used for optimization of the structure. This approach is relatively costly, especially if analytical second derivatives for the solvent model are not implemented, as has been the case for TURBOMOLE 7.0 and COSMO. An alternative approach is to re-optimize the ‘solvent’ structure in vacuo with subsequent frequency analysis. This can be performed with a smaller basis set to decrease computational cost. A comparison of these approaches with numerical def-TZVP/COSMO frequency analysis is shown in Figure 3.

ACS Paragon Plus Environment

15

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 48

Figure 3: The absolute centered error (ACE, Eq. 3) between various approaches in normal mode analysis and a reference (TZVP/COSMO numerical frequencies). As each approach requires a separate geometry optimization, the structures diverge from their reference. The orange and yellow lines represent MAE by only estimating or even neglecting μ⦵ , respectively, see below. This is done without using structural information, hence the values are depicted as horizontal lines. The “fitting” lines serve merely to demonstrate the general tendency for error to increase as the structures diverge.

ACS Paragon Plus Environment

16

Page 17 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

The mean ACE and MAE values between these approaches are reasonably low (~15% of the range) and comparable. Thus, the choice can be driven by the computational cost, which is lowest for gas-phase frequency analysis in a smaller basis sets (even at the extra cost of an e.g. def2-SVP geometry re-optimization). In the case of solvent frequency analysis with different basis sets, there is a clear relation between the magnitude of an error and divergence of the structures. In all cases, however, divergent structures with RMSD greater than 1.0 Å leads to significant increase of error and can be used as a diagnostic for reliability of the frequency analysis. While this does not pose a serious threat in our case, this observation is especially important for charged and highly polar neutral systems, where the gas-phase analysis of frequencies can turn out to be irrelevant to the solvent structure. Interplay of individual components of the free energy. A polar group can find a profitable intramolecular interaction - reflected by improved electronic (gas-phase) energy, or is exposed to the solvent, improving solvation energy. This dichotomy reflects in significant negative correlation (average correlation coefficient of -0.7; Figure 4) between gas-phase energy (Egp) and the solvation energy (Gsolv). Similarly, a tight intramolecular interaction improves electronic energy, but also increases curvature of the potential well, leading to inaccessible vibrational states and diminished partition function. This is similarly reflected in non-negligible negative correlation (average correlation coefficient of -0.4) between the Egp and μ⦵ . The third pair (Gsolv and , μ⦵ ) can be understood as a combination of the two effects mentioned above (average correlation of 0.3).

ACS Paragon Plus Environment

17

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 48

Figure 4: Pearson correlation coefficients between individual components of free-energy. The values are averaged over all eight systems in the test set.

This could be exploited for quickly estimating one of the quantities, while only investing resources into calculating the other. The requirements are high correlation and small range of the quantity being estimated. These conditions are met for the pair, Egp and μ⦵ , with the latter being the estimated component. Observing that a difference between the highest and lowest value of μ⦵ for a conformer of the particular compound is ~15 kJ.mol-1 and assuming a simple linear relationship with the electronic gas-phase energy (i.e., hypothetical correlation coefficient of -1), this approximation may result in MAE of 4.0 kJ.mol-1, compared to the “full” def-

ACS Paragon Plus Environment

18

Page 19 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

TZVP/COSMO normal mode analysis. This error is similar to cases, where frequency-analysis structure diverges beyond 1.0 Å of RMSD. Comparing this to other approximate, yet significantly more expensive frequency analysis methods discussed above, or disregarding the calculation altogether (MAE of 4.9 kJ.mol-1), this back-of-an-envelope approach merits for a competitive alternative (Figure 3) when the computational resources are limited. 3.2. Comparison of the conformational sampling protocols. Three distinct protocols for generating initial set of conformers are tested in terms of number of conformers found, ability to find low energy conformers and similarity to the crystal structure. The protocols are referred to as default MD/LLMOD, extended MD/LLMOD and PM6-MD, see Methods for details.

ACS Paragon Plus Environment

19

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 48

Figure 5: Total number of distinct conformers found by individual (non-composite) protocols after elimination of redundant conformers (step 2 in Figure 2). An energy window used was 10 kcal.mol-1 for defMD/LLMOD and PM6-MD, and 15 kcal.mol-1 for extMD/LLMOD.

ACS Paragon Plus Environment

20

Page 21 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Number of conformers found by sampling protocols. In order to simplify further discussion, we classify the systems based on their apparent flexibility (total number of conformers found) as small (POXTRD, CAMVES), medium (CHPSAR, COHVAW, Cpd_B) and large (Cpd_A, SANGLI, YIVNOG). This grouping is loosely related to the size of the systems. In order of increasing number of atoms: POXTRD, Cpd_B, COHVAW, CAMVES, CHPSAR, Cpd_A, SANGLI, YIVNOG. For a given system, the number of distinct conformers found by individual methods (vide Figure 5) is related to its size for smaller and middle-sized systems, but there are significant differences for the large systems - for MD/LLMOD sampling methods there is an expected increase in the number of conformers, but this is not the case for sampling by PM6MD. This is most likely due to insufficient sampling produced by the molecular dynamics, as discussed below. Conformational space characterization. High number of distinct conformers is merely an indirect indicator of the explored size of the conformational space. In most of the contexts, lowenergy conformers are of interest. To understand how quantity and “quality” are related, we examine the energy profiles (relative GDFT/COSMO-RS) of conformers found by individual protocols.

ACS Paragon Plus Environment

21

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 48

Figure 6: Representative energy spectrum shown as a cumulative distribution of GDFT/COSMO-RS values (left) and OPLS2005/GBSA initial energies (middle). The former results from convolution (*) of the latter with an approximatively normally distributed error function (right). The units are kJ.mol-1.

The shape of the spectrum results from convolution of two distributions. First is the preselection of conformers using lower precision method (OPLS2005/GBSA in this case). In general, there is a relatively low number of low-lying conformers (usually tens of conformers), but the number rises rapidly with increasing energy window. This profile can be rationalized using the model of additive force field, where individual terms are viewed as idealized geometry constraints. Respecting most of the constraints leaves only a few possible geometries, all with low relative energy. As we allow more and more structural parameters to stray from their ideal values, both the energy and the number of possible structures, rapidly grow. The same profiles are, however, obtained from PM6-MD protocol despite the fact that PM6-D3H4/COSMO energy is a nonadditive function. The “explosion” of conformers seems to stop at higher energy in some of the

ACS Paragon Plus Environment

22

Page 23 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

extended MD/LLMOD samplings, which is probably related to the design of the protocol which discriminates high-energy structures. The second factor that contributes to the shape of energy profile is the post-processing by a different method (DFT-D3/COSMO-RS). The energy functions of the sampling method (OPLS2005/GBSA or PM6-D3H4/COSMO) and the post-processing method (DFT-D3/COSMORS) are related by approximately normally distributed error. Thus, the resulting energy profile (vide Figure 6) can be modelled by convolution of the sampling method energy profile and a gaussian function. Low energy conformers. In many applications, one might be satisfied with identifying the global energy minimum of a particular compound. However, due to expected inaccuracy of the protocol, looking at all structures in some energy window above the global minimum for each system should provide a more robust overview. The window of 20 kJ.mol-1 was chosen, reflecting the expected accuracy of our composite protocol and allowing for a reasonable level of tolerance. The qualitative conclusions reached are not dependent on this choice.

ACS Paragon Plus Environment

23

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 48

Figure 7: Number of low-lying conformers found by individual composite sampling protocols. Discovery of a global minimum (i.e. lowest GDFT/COSMO-RS) is indicated by a diamond mark below the bars. An energy window used was 10 kcal.mol-1 for defMD/LLMOD and PM6-MD, and 15 kcal.mol-1 for extMD/LLMOD.

ACS Paragon Plus Environment

24

Page 25 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Measured by the number of low-energy conformers the extended MD/LLMOD protocol is superior to the other approaches, although only marginally compared to the default version (vide Figure 7). The use of the DFT/COSMO filter (and PM6-D3H4/COSMO filter in case of YIVNOG) was not visibly detrimental to its performance. However, it does miss the global minimum in case of the larger SANGLI and YIVNOG systems. Considering the significant disadvantage to the other two protocols, which were initialized with the crystal structures, this is a solid result. The PM6-MD protocol arguably fails in both respects. Global minimum is only found in a single case. Even for the smallest POXTRD system the global minimum was not found and although the number of low energy conformers is similar to that of the other two protocols, all of these are more than 15 kJ.mol-1 above the global minimum. The picture is even worse for the large systems, where, as already suspected from the low total amount of conformers, only a handful of low-energy conformers are found altogether. It should be noted that all three protocols use some flavor of molecular dynamics for generation of conformers, but the PM6-MD protocol uses 100times longer trajectory (compared to the default MD/LLMOD protocol). We conclude that despite the appeal of conceptual simplicity of pure molecular dynamics (combined with structural optimization of snapshots), this approach becomes impractical for conformational exploration even for medium-sized systems. Comparison to crystal structures. Given the expected error of the protocol, looking merely at the global minimum is not robust. Alternatively, we evaluate the protocols based on the ability to reproduce crystal structures. In absence of strong crystal effects (such as crystal packing), the crystal structure should be presumably quite close to a global minimum conformation. We use these cases for benchmarking purposes.

ACS Paragon Plus Environment

25

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 48

The crystal structures are available either as standalone molecules (CAMVES, CHPSAR, COHVAW, POXTRD, YIVNOG) or conformations of compounds in protein-ligand complexes (SANGLI, Cpd_A), sometimes called ‘bioactive conformations’. No crystal structure of the system Cpd_B is available. In line with previous procedure, we define a conformer to be similar to a reference crystal structure if it lies in the 20 kJ.mol-1 energy window and the mutual RMSD is within 1.0 Å. The performance of the standalone sampling protocols and the composite protocol is compared in Table S1. The default and the extended MD/LLMOD protocol are comparably successful. The composite versions are capable of identifying several conformers similar to crystal structures. This is encouraging, as the starting structures for the extended protocol were generated from 2D formulas using Schrodinger 2015-3 suite, rather than from crystal structures themselves, as was the case for the default protocol. The standalone protocols (without DFT post-processing) manage to identify some of these similar structures as well. However, there are a number of false positive hits, as well as false negative hits. The benefit of the composite protocol thus lies not only in providing more reliable results, but also in providing fewer candidates for relevant conformers, which simplifies interpretation of the results. he composite PM6-MD protocol is overall less successful than MD/LLMOD in identifying conformers similar to crystal structures. However, this is still encouraging, as the standalone PM6-MD protocol manages to find only a single similar structure. Considering that the crystal structures were used as starting structures for the molecular dynamics, this result is very disappointing. In conjunction with previous results we conclude that the potential energy surface

ACS Paragon Plus Environment

26

Page 27 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

of PM6-D3H4/COSMO method is inappropriate for conformational sampling. A similar observation has been made for conformational energies of small carbohydrates.50 It is obviously “unfair” to compare the size of a set (similar structures found by composite protocol) to its subset (similar structures found by both the composite and the original sampling protocol). However, even if we relax the criteria for the standalone sampling protocols (vide Table S1), the conclusion stands. The composite protocol is more successful in finding lowenergy conformers similar to crystal structures. Given the available experimental data, this is the strongest benchmark argument we can present for applying the composite protocol for conformational sampling. There are, on average, hundreds of conformers found for a system. Despite the conformational space being larger still, the fact that a structure (and often more than one) that is similar to the crystal structure is found is a significant result. New PRIME macrocycle conformational sampling algorithm. A final remark in this section concerns a new macrocycle conformational sampling algorithm implemented in the PRIME module of Schrödinger 2017-1 suite that became available at the time of the finalization of this work and is documented in the paper of Sindhikara and coworkers.51 The performance of the algorithm was tested on two compounds of our series: CAMVES and SANGLI with the “Thorough” option, with and without the option of sampling peptide bonds. For CAMVES, the performance of the algorithm was comparable to the Macromodel MD/LLMOD algorithm whereas in the case of SANGLI it failed to locate low-energy minima by ~20 kJ.mol-1 (measured by GDFT/COSMO-RS). Further testing of the new algorithm might be therefore needed. All structural information and energetics concerning the PRIME MCS algorithm (for CAMVES and SANGLI) are included in the Supporting Information.

ACS Paragon Plus Environment

27

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 48

3.3 Estimating size of conformational space. A method to estimate the size of the low-energy part of the conformational space based on hypergeometric distribution of repeatedly discovered conformers is provided and used to discuss the convergence of the sampling. Ideally, we would like to obtain all possible (or at least all low-energy) conformers from a conformational sampling. In such a case, a repeated search would provide the same conformers as the initial one. However, this is not the case even for methodologically similar searches (such as default and extended MD/LLMOD), where the number of structures found by both of the protocols is very low, indicating that the search has not converged. For rigorous estimation of the number of conformers using the overlap of two searches, we would need a probability distribution of discovering individual minima. This information is practically impossible to obtain. However, we can simplify the consideration by assuming that at least for low-energy structures this probability distribution is uniform. The assumption is justified by the following observation. The goal of the sampling method is to discover low-energy conformers. Hence, it is reasonable to expect that the probability of finding a specific conformation is a function of its energy, with low-energy conformers being more likely to be discovered than high-energy conformers. This expectation can be tested by comparing the ratio of low energy conformers in the overlap subset of two searches to the same ratio for individual searches. This dimensionless ratio is almost universally in the range of 1-4 (data not shown). In other words, low-energy conformers (i.e. within 20 kJ.mol-1 of the lowest conformer found) are up to four times more likely to be found repeatedly in independent searches than high-energy structures. Considering that the protocols aim to find primarily low-energy structures such a bias is to be expected. The bias does not seem

ACS Paragon Plus Environment

28

Page 29 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

to be particularly strong, however. This encourages us to explore the assumption that the bias within the group of low-energy conformers is even lower and, possibly, non-existent. Exploiting the assumption of uniform probability of drawing a low-energy conformer, the total number of low-energy conformers of a system and the confidence intervals can be inferred from hypergeometric distribution.52 These intervals, together with the actual number of conformers found, are shown in Figure 8.

Figure 8. Number of unique low-energy conformers found by default MD/LLMOD (dashed line) and extended MD/LLMOD (full line). A Chapman estimate (colored full line) together with 95% confidence intervals52 of the total number of low-energy conformers is indicated. The upper estimate for SANGLI and the lower estimate for YIVNOG are out of bounds of the graph.

ACS Paragon Plus Environment

29

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 48

The intervals and estimates of the total amount of low-energy conformers differ quite significantly for individual pairs of samplings, see Tables S2-S6. The PM6-MD protocol has been previously concluded as inferior, but it appears that it also samples a different part of a conformational space altogether, resulting in small overlap of discovered conformers. This points out to a breakdown of our initial assumption that all low-energy structures have equal probability of being discovered. Making inference from overlaps of substantially different protocols is dubious. On the other hand, defMD/LLMOD and extMD/LLMOD overlaps can provide insight into convergence of these samplings. The total number of low-energy conformers found by these two samplings is below the 95% confidence interval for all but the smallest systems (CAMVES and POXTRD). If the objective is to examine all low-lying conformers, further sampling is advised in all of these cases. In other words, even a combination of a default and extended protocol is unlikely to discover all low-energy conformations even for medium-sized systems. Although the actual total number of conformers is unknown, these conclusions correspond to the apparent flexibility and size of the systems as discussed in 3.2, providing indirect support for this method of inference. Since none of the individual protocols seems to be capable of discovering all low-energy structures, we propose an analogous inference method to be evaluated on-the-fly during a sampling and used as a criterion for termination of a conformational search. Even if this method is based on only partially justified assumptions, an approximate termination criterion based on the actual shape of a conformational space should be superior to a fixed number of steps. This is our recommendation for further improvements and implementations of conformational sampling protocols.

ACS Paragon Plus Environment

30

Page 31 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

3.4 Elimination of redundant conformers. Interpretation of sampling is dependent on the definition of what should constitute a unique structure. Ironically, there are many possible definitions of uniqueness. Throughout this work we have used a definition of similarity as RMSD of aligned heavy atoms and polar hydrogens being below a threshold of 1.0 Å, while at the same time, the energy difference is required to be below 1 kcal.mol-1. Here we discuss several possible aspects of these definitions and how they affect the quality of our sampling. However, we have also run a new set of MD/LLMOD samplings with 20,000 simulation cycles, 10 kcal.mol-1 energy window, and a redundancy criterion set at 0.01 Å. Thus, virtually all conformers found by the sampling are retained. Only CAMVES and CHPSAR systems were subjected to this procedure. Examining a relation between RMSD of two conformers and their difference in GDFT/COSMO-RS (Figure S1) shows that the RMSD of up to ca. 0.4 Å implies a small energy difference (below 10 kJ.mol-1) in vast majority of cases. Above 0.4 Å, the energy differences are usually larger and it is in these cases that the energy criterion takes effect.

Table 1: Lowest RMSD ( in Å ) to crystal structure throughout individual steps of the composite protocol. All

pre-DFT

DFT

post-DFT

conformers

pruning

optimization

pruning

CAMVES

0.51

0.70

0.71

0.77

CHPSAR

0.37

0.37

0.39

0.39

ACS Paragon Plus Environment

31

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 48

Table 1 shows minimum RMSD to crystal structure of our ensemble at different stages of the composite protocol (vide Figure 2). In the case of CAMVES, our default pruning leads to elimination of lowest RMSD structures both before and after DFT optimization step. Although other similar structures still survive in our ensemble, this does raise a question whether there is a favorable trade-off between the number of eliminated relevant and redundant structures. The number of retained unique low-energy conformers as a function of this threshold drops visibly in the range of 0.6 – 0.8 Å (vide Figure S2 for details), while the total number of uneliminated conformers decreases approximately linearly. We conclude that our choice of 1.0A was too generous and a threshold of 0.6 Å would have been more appropriate. An interesting observation is that the RMSD to crystal structure of the most similar structure (CAMVES set) actually increases after DFT optimization (from 0.51 to 0.80 Å). However, the number of dihedral angles of heavy atoms differing by more than 60º compared to a crystal structure actually drops for this structure from 4 (OPLS2005 geometry) to 0 (DFT geometry). Rather than interpreting small differences in RMSD, looking at number of differing dihedral angles can provide better structural insight. Furthermore, the number of dihedral angles that change during DFT optimization is very low (02 out of ca. 70) in ~90% of our cases. On the other hand, more than 50% of cases “wander” by more than 0.5 Å during DFT optimization with respect to the starting structure. Dihedral angles thus appear to be more invariant to optimization than RMSD. This is despite the fact, that both CAMVES and CHPSAR possess only a few heavy atoms outside of the macrocycle ring, which makes the correlation between the number of differing dihedrals and RMSD quite high (around 0.7), but is unlikely to be transferrable to other systems in general.

ACS Paragon Plus Environment

32

Page 33 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Number of differing dihedrals thus appears to provide a more robust criterion for uniqueness, both in comparing two different conformers and for correlating the force-field and DFT geometries. The increased resolution does, however, come with a price of increased number of conformers to be processed. 3.5. Energy filters and computational costs. As shown in section 3.2, the low-energy conformers comprise only a few percent of the total number of conformers generated. Processing all of the conformers represents a massive computational overhead. Significant savings can be achieved by reevaluating the relevance of a conformer after every computational step by setting up energy filters, i.e. thresholds for quantities obtained during individual computational steps. These criteria can eliminate a significant number of high-energy conformers while removing very few or no low-energy ones. The levels for these thresholds is system-dependent, but is explored herein to provide general guidelines. The decisions concerning the workflow can be divided into two broad categories – before and throughout the DFT treatment. Decisions prior to DFT treatment. Prior to DFT, we decide on what sampling protocol is used and potential application of energy filters that use affordable force-field or semiempirical methods. The problem with the low-level methods is the lack of robustness, rather than the magnitude of error. For example, an energy window of as little as 30 kJ.mol-1 using the OPLS2005/GBSA force field covers most of the low-lying conformers in most of the cases, whereas in other cases (CHPSAR, SANGLI, YIVNOG), there is very low or zero correlation between the OPLS2005/GBSA and final GDFT/COSMO-RS values. A possible explanation of this problem is the idea that the force-field surfaces may be flat but rough, resulting in multiple structurally similar minima with comparable energies. Thus, the

ACS Paragon Plus Environment

33

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 48

structures should still be qualitatively correct. This would result in significant reduction of the number of unique conformers after DFT optimization (vide step 5 of Figure 2). However, only around 10% of structures are eliminated by this step. Moreover, a reoptimization of DFT structures by a low-level method should tend to yield lower energies (compared to the initial low-level optimization). However, this is not observed either and, thus, we cannot support such a hypothesis. In summary, we do not recommend any elimination of conformers based on the low-level energy methods. This sampling step of the composite protocol should aim for maximum diversity, as opposed to accurate ranking of conformers. DFT Energy filters. The DFT portion of the protocol consists of 1) DFT optimization 2) DFT single-point calculations (including gas-phase and solvation energies) 3) gas-phase optimization and 4) frequency analysis. The relative computational cost of these steps is approximately 25% : 15% : 10% : 50%. 1)

DFT/COSMO energy in a def-TZVP basis set is available after optimization. MAE for

DFT/COSMO (with GDFT/COSMO-RS as a reference) ranged from 5.2 kJ.mol-1 (for COHVAW system) to 13.0 kJ.mol-1 (for YIVNOG system). The threshold of 65 kJ.mol-1 was tested for the extended MD/LLMOD protocol whereas for small and medium sized systems a 50 kJ.mol-1 threshold would have been sufficient. 2)

DFT/gas-phase and DFT/COSMO single-point calculations using a large basis set

(def2-TZVPD). The sum of gas-phase and solvation energy differs from the reference, GDFT/COSMO-RS, only by μ⦵ obtained from frequency analysis. MAE for this quantity (with GDFT/COSMO-RS as a reference) ranged from 3.1 kJ.mol-1 (for POXTRD system) to 6.3 kJ.mol-1

ACS Paragon Plus Environment

34

Page 35 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(for YIVNOG system), see section 3.1 for further discussion. This filter was not used, but a level of 30 kJ.mol-1 would have been appropriate, as can be seen in Figure 9. 3)

Gas-phase optimization does not provide further quantity for improving the estimate of

GDFT/COSMO-RS. However, should the structure diverge significantly from the solvent structure (beyond ~2.0 Å, see section 3.1) or change chemically (e.g. zwitterionic vs neutral form, intramolecular hydrogen transfer) it is reasonable to abandon the frequency analysis, as it is unlikely to provide reliable information. The effect of the first two thresholds on the number of processed conformers and number of discovered low-energy conformers is shown in Figure 9. The number of conformers for which the frequency analysis is performed can be reduced immensely, especially for the large systems. Number of low-energy structures lost to this approximation is almost zero. The overall effect of using the DFT/COSMO energy filter can be seen in Figure 10, which shows that the total computational cost of the extended MD/LLMOD protocol was comparable or even lower than the default MD/LLMOD.

ACS Paragon Plus Environment

35

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 48

Figure 9: Number of conformers to undergo normal mode analysis in the default MD/LLMOD composite protocol (10 kcal.mol-1 energy window) as a function of what energy filters are applied. The drop is most significant for the large systems. The portion of low-energy conformers (black outlines) is almost unaffected by the application of energy filters.

ACS Paragon Plus Environment

36

Page 37 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 10: Number of low-energy conformers as a function of total computational cost (in CPU days) for individual systems and composite protocols. The “fitting lines” serve merely to group identical systems.

ACS Paragon Plus Environment

37

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 48

CPU-time Considerations. The total time needed for a conformational analysis of a system naturally depends on its size. The time necessary for fully processing one conformer ranged from ca. 1.5 CPU hours (POXTRD) to 1.5 CPU days (YIVNOG). The full composite protocol cost will depend not only on the size of a system but also on details of sampling procedure and application of filters described above. For a computational cluster of ~300 CPUs a single day should be sufficient for analyzing a system of ~60 atoms. For larger systems (100-120 atoms), we face not only the increase of cost to several “cluster days” but also problems with convergence of sampling (see Section 3.4). These systems likely represent the frontier of accessibility of the described approach.

4. Conclusions In this work, we presented results of a comprehensive conformational study carried out on eight model systems: seven of them representing small- to medium-sized macrocycles and one acyclic analogue. To the best of our knowledge, this is the first study that have used the high-level free energy value provided by the composite protocol comprised by the calibrated DFT-D3 method, normal mode analysis and the subsequent thermochemistry, and COSMO-RS solvation energies (a state-of-the-art solvation method). We argue that such GDFT/COSMO-RS represent the top-notch values that contemporary quantum chemistry can provide (for the data sets comprised of several thousands of medium-sized systems). Thus, the immense amount of primary computational data deposited in the supplementary materials can be then used as the invaluable material for calibrating various lower-level (force field and semiempirical) methods in the field of conformational sampling. Last but not least, the GDFT/COSMO-RS values are the ‘objective’ values

ACS Paragon Plus Environment

38

Page 39 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(not burdened by system dependent parametrizations) that can be used to validate and compare various force-fields or SQM-based approaches that on their own suffer from the non-transferability. The proposed compound protocol consists of MD/LLMOD conformational sampling using e.g. OPLS2005/GBSA energy method as implemented in Schrodinger suite. Generated conformers are subsequently optimized with BP86-D3 in implicit solvent (COSMO model). The resulting structure is then subjected to single point DFT and COSMO-RS calculations, in order to obtain gas-phase electronic and solvation energy, respectively. In parallel, the structure is reoptimized in vacuum using a smaller basis set and subjected to frequency analysis in order to obtain zero-point vibrational energy and thermochemical data. Conformers are discarded with high reliability along the way based on newly obtained information and similarity to already obtained structures, which leads to significant decrease of computational cost. While frequency analysis in solvent is probably more relevant it might be computationally unfeasible. However, it might be necessary for highly polar and charged structures, cases where gas-phase and solvent structures diverge significantly. In either case, divergence of the two structures can be used for estimate of the error introduced in this way. In case of most neutral molecules, gas-phase frequency analysis, even with lower basis sets is sufficient. A common complication during frequency analysis is the presence of low-lying imaginary vibrations, arising most commonly due to numerical inaccuracies. Finer grids and calculation of weight derivatives is an extremely expensive solution to the problem and can be circumvented by flipping an imaginary frequency to a real positive value, either fixed or equal to the original

ACS Paragon Plus Environment

39

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 48

imaginary value. This approximation in combination with use of a free-rotor model, instead of the simple harmonic oscillator, prevents interpretation of numerical noise. The number of conformers grows rapidly with the size of an energy window. Yet, even systems with hundreds of conformers have only a handful of relevant, low-energy conformers. A crystal structure of a small molecule is likely to be similar to its unbound free solvent structure and this can be used as a criterion for evaluating conformational sampling methods. This conjecture is reproduced by our compound protocol in most cases. Individual sampling protocols, without the help of DFT treatment, perform significantly worse in this respect. The MD/LLMOD sampling protocol is more successful than PM6-MD protocol in the number of low-energy conformers found, discovery of global minima, and similarity to available crystal structures. Both OPLS2005/GBSA and PM6-D3H4/COSMO energy surfaces are quite different from GDFT/COSMO-RS. Thus, we attribute this difference in performance to the emphasis on locating diverse structures in MD/LLMOD, rather than the applied energy function. Assuming uniform probability of discovering low-energy conformers, we can estimate their total amount based on frequency of repeatedly discovered conformers. This is in turn useful for estimating state of convergence of a sampling procedure. Using this inference, we observe that even for the more efficient extMD/LLMOD procedure, the sampling is not converged for all but the smallest systems. Medium-sized systems (60-70 atoms) are expected to have approximately twice as many low-energy conformers in total, while the largest systems (100-120) atoms 10 or more times as many.

ACS Paragon Plus Environment

40

Page 41 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Sampling procedures use structural and energetic criteria for eliminating redundant conformers. RMSD only appears to be robust if small values (around 0.5 Å) are used. On the other hand, number of differing dihedral angles of heavy atoms offers more intuitive picture of similarity and also appears to be more invariant during reoptimization. The increased resolution has to be weighted against increased number of conformers to be processed. We conclude that the sampling part of the composite protocol should emphasize structural diversity, as opposed to accuracy in calculating energy and structure prior to DFT treatment. To maintain computational feasibility of the composite protocol it is beneficial to discard conformers that are unlikely to end up with low relative free energy. There are highly predictive information generated after every step of the DFT part of a protocol, i.e. DFT/COSMO energy obtained after optimization, single-point DFT energies and even gas-phase optimization. This allows for substantial savings, since a frequency analysis, performed at the very end of the workflow, constitutes approximately one half of the computational cost. In summary, the presented study clearly demonstrates that employing high-level quantum chemical methods together with the state-of-the-art solvation methods is conditio sine qua non for obtaining meaningful set of low lying conformers of medium-sized molecules. At the same time, we demonstrate that such a task is within the grasp of contemporary computational chemistry.

ASSOCIATED CONTENT

ACS Paragon Plus Environment

41

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 48

Supporting Information. Equilibrium geometries of all studied conformers, all primary energetic data, including thermochemistry data, solvation energies. This material is available free of charge via the Internet at http://pubs.acs.org. AUTHOR INFORMATION Corresponding Author *E-mail: [email protected] Notes The authors declare no competing financial interest. Author Contributions The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Funding Sources Grant Agency of the Czech Republic (Grant No. 17-24155S), Czech Academy of Sciences (RVO 61388963) ACKNOWLEDGMENT The project was initiated upon short-term scientific visit of LR to Gilead Sciences, Inc., and many insightful comments from Uli Schmitz, Petr Jansa, Johannes Voigt, William A. Lee, and Tomáš Cihlář are acknowledged. ABBREVIATIONS

ACS Paragon Plus Environment

42

Page 43 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

DFT, density functional theory; COSMO(-RS), conductor-like screening model (for real solvents); MD/LLMOD, simulated annealing molecular dynamics with the low-lying eigenvector following algorithms; SQM, semiempirical quantum mechanics. REFERENCES (1) Giordanetto, F.; Kihlberg, J.: Macrocyclic Drugs and Clinical Candidates: What Can Medicinal Chemists Learn from their Properties? J. Med. Chem. 2014, 57, 278-295. (2) Canales, A.; Nieto, L.; Rodriguez-Salarichs, J.; Sanchez-Murcia, P. A.; Coderch, C.; Cabrera, A. C.; Paterson, I.; Carlomagno, T.; Gago, F.; Andreu, J. M.; Altmann, K.-H.; Jimenez-Barbero, J.; Diaz, J. F. The Molecular Recognition of Epothilones by Microtubules and Tubulin Dimers Revealed by Biochemical and NMR Approaches. ACS Chem. Biol. 2014, 9, 1033-1043. (3) Schneider, G.; Reker, D.; Chen, T.; Hauenstein, K.; Schneider, P.; Altmann, K.-H.: Deorphaning the Macromolecular Targets of the Natural Anticancer Compound Doliculide. Angew. Chem. Int. Ed. 2016, 55, 12408-12411. (4) Steinmetz, H.; Li, J.; Fu, C.; Zaburannyi, N.; Kunze, B.; Harmrolfs, K.; Schmitt, V.; Herrmann, J.; Reichenbach, H.; Höfle, G.; Kalesse, M.; Müller, R.: Isolation, Structure Elucidation, and (Bio)Synthesis of Haprolid, a Cell-Type-Specific Myxobacterial Cytotoxin. Angew. Chem. Int. Ed. 2016, 55, 10113-10117. (5) Driggers, E. M.; Hale, S. P.; Lee, J.; Terrett, N. K.: The Exploration of Macrocycles for Drug Discovery—an Underexploited Structural Class. Nat. Rev. Drug Discov. 2008, 7, 608–624. (6) Coutsias, E. A.; Lexa, K. W.; Wester, M. J.; Pollock, S. N.; Jacobson, M. P.: Exhaustive Conformational Sampling of Complex Fused Ring Macrocycles Using Inverse Kinematics. J. Chem. Theory Comput. 2016, 12, 4674–4687. (7) Bonnet, P.; Agrafiotis, D. K.; Zhu, F.; Martin, E. J.: Conformational Analysis of Macrocycles: Finding What Common Methods Miss. J. Chem. Inf. Model. 2009, 49, 2242-2259. (8) Watts, K. S.; Dalal, P.; Tebben, A. J.; Cheney, D. L.; Shelley, J. C.: Macrocycle Conformational Sampling with MacroModel. J. Chem. Inf. Model. 2014, 54, 2680–2696. (9) Chen, I. J.; Foloppe, N.: Tackling the Conformational Sampling of Larger Flexible Compounds and Macrocycles in Pharmacology and Drug Discovery. Bioorg. Med. Chem. 2013, 21, 7878-7920. (10) Hawkins, P. C. D. Conformation Generation: The State of the Art. J. Chem. Inf. Model., 2017, 57, 1747–1756.

ACS Paragon Plus Environment

43

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 48

(11) Bruccoleri, R. E.; Karplus, M. Prediction of the Folding of Short Polypeptide Segments by Uniform Conformational Sampling. Biopolymers 1987, 26, 137-168. (12) Sitzmann, M.; Weidlich, I. E.; Filippov, I. V.; Liao, C.; Peach, M. L.; Ihlenfeldt, W.-D.; Karki, R. G.; Yulia V. Borodina, Y. V.; Cachau, R. E.; Nicklaus, M. C.: PDB Ligand Conformational Energies Calculated Quantum-Mechanically. J. Chem. Inf. Model. 2012, 52, 739–756. (13) Porta, J. M.; Ros, L.; Thomas, F.; Corcho, F.; Canto, J.; Perez, J. J. Complete Maps of Molecular Loop Conformational Spaces. J. Comput. Chem. 2007, 28, 2170–2189. (14) Leach, A. R. Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules. In ReViews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B., Eds.; VCH: New York, 1991. (15) Nakajima, N.; Nakamura, H.; Kidera, A.: Multicanonical Ensemble Generated by Molecular Dynamics Simulation for Enhanced Conformational Sampling of Peptides. J. Phys. Chem. B 1997, 101, 817–824. (16) Kolossváry, I.; Guida, W. C.: Low Mode Search. An Efficient, Automated Computational Method for Conformational Analysis:  Application to Cyclic and Acyclic Alkanes and Cyclic Peptides. J. Am. Chem. Soc. 1996, 118, 5011–5019. (17) Kamachi, T.; Yoshizawa, K.: Low-Mode Conformational Search Method with Semiempirical Quantum Mechanical Calculations: Application to Enantioselective Organocatalysis. J. Chem. Inf. Model. 2016, 56, 347–353. (18) Labute, P. Lowmodemd-Implicit Low-Mode Velocity Filtering Applied to Conformational Search of Macrocycles and Protein Loops. J. Chem. Inf. Model. 2010, 50, 792-800. (19) Bím, D.; Svobodová, E.; Eigner, V.; Rulíšek, L.; Hodačová, J.: Copper(II) and Zinc(II) Complexes of Conformationally Constrained Polyazamacrocycles as Efficient Catalysts for RNA Model Substrate Cleavage in Aqueous Solution at Physiological pH. Chem. Eur. J. 2016, 22, 10426-10437. (20) Ryde, U.; Söderhjelm, P. Ligand-Binding Affinity Estimates Supported by QuantumMechanical Methods. Chem. Rev. 2016, 116, 5520–5566. (21) Marenich, A. V.; Ho, J.; Coote, M. L.; Cramer, C. J.; Truhlar, D. G. Computational Electrochemistry: Prediction of Liquid-Phase Reduction Potentials. Phys. Chem. Chem. Phys. 2014, 16, 15068-15106.

ACS Paragon Plus Environment

44

Page 45 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(22) Bím, D.; Rulíšek, L.; Srnec, M. Accurate Prediction of One-Electron Reduction Potentials in Aqueous Solution by Variable-Temperature H-Atom Addition/Abstraction Methodology. J. Phys. Chem. Lett. 2016, 7, 7-13. (23) Alongi, K. S; Shields, G. C. Theoretical Calculations of Acid Dissociation Constants: A Review Article. In Annual Reports in Computational Chemistry (Volume 6); Wheeler, R. A.; Ed., Elsevier, 2010, p. 113. (24) Napagoda, M.; Rulíšek, L.; Jančařík, A.; Klívar, J.; Šámal, M.; Stará, I. G.; Starý, I.; Šolínová, V.; Kašička, V.; Svatoš, A. Azahelicene Superbases as MAILD Matrices for Acidic Analytes. ChemPlusChem 2013, 78, 937-942. (25) Grimme, S.; Bannwarth, C.; Dohm, S.; Hansen, A.; Pisarek, J.; Pracht, P.; Seibert, J.; Neese, F. Fully Automated Quantum-Chemistry-Based Computation of Spin–Spin-Coupled Nuclear Magnetic Resonance Spectra Angew. Chem. Int. Ed.2017, 56, 14763-14769. (26) Goerigk, L.; Grimme, S. A Thorough Benchmark of Density Functional Methods for General Main Group Thermochemistry, Kinetics, and Noncovalent Interactions. Phys. Chem. Chem. Phys. 2011, 13, 6670-6688. (27) Řezáč, J.; Hobza, P. Benchmark Calculations of Interaction Energies in Noncovalent Complexes and Their Applications. Chem. Rev. 2016, 116, 5038–5071. (28) Zheng, J. J.; Zhao, Y.; Truhlar, D. G. Representative Benchmark Suites for Barrier Heights of Diverse Reaction Types and Assessment of Electronic Structure Methods for Thermochemical Kinetics J. Chem. Theory Comput. 2007, 3, 569–582. (29) Zenke, G.; Strittmatter, U.; Fuchs, S.; Quesniaux, V. F. J.; Brinkmann, V.; Schuler, W.; Zurini, M.; Enz, A.; Billich, A.; Sanglier, J.-J.; Fehr, T. Sanglifehrin A, A Novel CyclophilinBinding Compound Showing Immunosuppressive Activity with a New Mechanism of Action. J. Immunol. 2001, 166, 7165-7171. (30) Steadman, V. A.; Pettit, S. B.; Poullennec, K. B.; Lazarides, L.; Keats, A. J.; Dean, D. K.; Stanway, S. J.; Austin, C. A.; Sanvoisin, J. A.; Watt, G. M.; Fliri, H. G.; Liclican, A. C.; Jin, D.; Wong, M. H.; Leavitt, S. A.; Lee, Y.-J.; Tian, Y.; Frey, C. R.; Appleby, T. C.; Schmitz, U.; Jansa, P.; Mackman, R. L.; Schultz, B. E.: Discovery of Potent Cyclophilin Inhibitors Based on the Structural Simplification of Sanglifehrin A. J. Med. Chem. 2017, 60, 1000-1017. (31) Schultz, B. E. personal communication. (32) Schrödinger Release 2014-2: MacroModel, Schrödinger, LLC, New York, NY, 2014. Schrödinger Release 2014-2: Maestro, Schrödinger, LLC, New York, NY, 2014

ACS Paragon Plus Environment

45

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 48

(33) Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods V: Modification Of NDDO Approximations and Application To 70 Elements. J. Mol. Model. 2007, 13, 1173. (34) Řezáč, J.; Hobza, P. Advanced Corrections of Hydrogen Bonding and Dispersion for Semiempirical Quantum Mechanical Methods. J. Chem. Theory Comput. 2012, 8, 141-151. (35) Klamt, A.; Schüümann, G.: COSMO: A New Approach to Dielectric Screening in Solvents with Explicit Expressions for the Screening Energy and its Gradient. J. Chem. Soc. Perkin Trans. 2, 1993, 799-805. (36) MOPAC2012, James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, CO, USA, HTTP://OpenMOPAC.net (2012); (accessed 2012). (37) Řezáč, J. Cuby: An Integrative Framework for Computational Chemistry. J. Comput. Chem. 2016, 37, 1230; http://cuby4.molecular.cz. (accessed Dec 15, 2015) (38) TURBOMOLE V7.0 2015, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; (accessed Aug 15, 2015) . (39) Becke, A. D. Density-Functional Exchange-Energy Approximation with Correct Asymptotic Behavior. Phys. Rev. A 1988, 38, 3098-3100. (40) Vosko, S. H.; Wilk, L.; Nusair, M. Accurate Spin-Dependent Electron Liquid Correlation Energies for Local Spin Density Calculations: a Critical Analysis. Can. J. Phys. 1980, 58, 1200-1211. (41) Perdew, J. P. Density-Functional Approximation for the Correlation Energy of the Inhomogeneous Electron Gas. Phys. Rev. B, 1986, 33, 8822-8824. (42) Weigend, F.; Ahlrichs, R. Balanced Basis Sets of Split Valence, Triple Zeta Valence and Quadruple Zeta Valence Quality For H To Rn: Design an assessment of accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297. (43) Grimme, S.; Antony, J., Ehrlich, S., Krieg, H. A Consistent and Accurate Ab Initio Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (44) Lee, C.; Yang, W.; Parr, R. G.: Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785-789. (45) Klamt, A. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena. J. Phys. Chem. 1995, 99, 2224-2235.

ACS Paragon Plus Environment

46

Page 47 of 48 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(46) Klamt, A.; Jonas, V.; Buerger, T.; Lohrenz, J. C. W. Refinement and Parametrization of COSMO-RS. J. Phys. Chem. A 1998, 102, 5074-5085. (47) Eckert, F.; Klamt A. COSMOtherm, Version C3.0, Release 15.01; COSMOlogic GmbH & Co. KG, Leverkusen, Germany, 2014. (48) Řezáč, J.; Bím, D.; Gutten, O.; Rulíšek, L.: Towards Accurate Conformational Energies of Smaller Peptides and Medium-Sized Macrocycles: MPCONF196 Benchmark Energy Data Set. J. Chem. Theor. Comput. 2017, submitted. (49) Grimme, S.: Supramolecular Binding Thermodynamics by Dispersion-Corrected Density Functional Theory. Chem. Eur. J. 2012, 18, 9955-9964. (50) Marianski, M.; Supady, A.; Ingram, T.; Schneider, M.; Baldauf, C. Assessing the Accuracy of Across-the-Scale Methods for Predicting Carbohydrate Conformational Energies for the Examples of Glucose and α-Maltose J. Chem. Theory Comput., 2016, 12 , 6157–6168. (51) Sindhikara, D.; Spronk, S. A.; Day, T.; Borrelli, K.; Cheney, D. L.; Posy, S. L. Improving Accuracy, Diversity, and Speed with Prime Macrocycle Conformational Sampling. J. Chem. Inf. Model., 2017, 57 , 1881–1894. (52) Wang, W. Exact Optimal Confidence Intervals for Hypergeometric Parameters. J. Am. Stat. Assoc. 2015, 110, 1491-1499.

ACS Paragon Plus Environment

47

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 48

TOC GRAPHIC

ACS Paragon Plus Environment

48