Improving Accuracy, Diversity, and Speed with Prime Macrocycle Conformational Sampling

Article

Improving Accuracy, Diversity, and Speed with Prime Macrocycle Conformational Sampling
Dan Sindhikara, Steven A. Spronk, Tyler Day, Ken Borrelli, Daniel L. Cheney, and Shana L. Posy
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00052 • Publication Date (Web): 20 Jul 2017

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Improving Accuracy, Diversity, and Speed with Prime Macrocycle Conformational Sampling

Dan Sindhikaraʃ*, Steven A. Spronk§, Tyler Dayʃ, Ken Borrelliʃ, Daniel L. Cheney§, Shana L. Posy§

ʃ Schrödinger, Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States

§ Bristol-Myers Squibb Research and Development, Computer-Assisted Drug Design, Molecular Discovery Technologies, PO BOX 5400, Princeton, New Jersey 08543, United States

ABSTRACT

A novel method for exploring macrocycle conformational space, Prime Macrocycle Conformational Sampling (Prime-MCS), is introduced and evaluated in the context of other available algorithms (Molecular Dynamics, LowModeMD in MOE, and MacroModel Baseline Search). The algorithms were benchmarked on a dataset of 208 macrocycles which was curated for diversity from the Cambridge Structural Database, the Protein Data Bank, and the Biologically Interesting Molecule Reference Dictionary. The algorithms were evaluated in terms of accuracy (ability to reproduce the crystal structure), diversity (coverage of conformational space), and computational speed. Prime-MCS most reliably reproduced crystallographic structures for RMSD thresholds >1.0 Å, most often produced the most diverse conformational
ensemble, and was most often the fastest algorithm. Detailed analysis and examination of both typical and outlier cases were performed to reveal characteristics, shortcomings, expected performance, and complementarity of the methods.

INTRODUCTION

Cyclic macromolecules, or macrocycles, comprise a diverse set of compounds with a wide range of physicochemical properties and biological activities.1 Macrocycles are a structurally broad class, including small molecules with cyclized linkages as well as large cyclic peptides that resemble protein loops.2 Their potential therapeutic utility has long been recognized since macrocycles are found in natural products such as macrolide antibiotics (e.g., erythromycin) and polyene antifungals (e.g., amphotericin B).3 More recently, synthetic macrocycles have achieved recognition for their ability to bind protein targets traditionally categorized as “undruggable” due to their comparatively flat, featureless surfaces, which interact poorly with smaller drug-like molecules.4, 5 Large macrocycles can form extensive interactions with a target binding site, and X-ray structural data demonstrate that macrocycle binding sites tend to be larger and more fully ligand-occupied than sites that bind smaller drug-like molecules.6 As a consequence, macrocycles have the potential to access an increased range of targets compared to small molecules. For example, cyclic peptides have been identified that inhibit difficult protein-protein interaction targets4 such as the Ras-effector complex7 and transcription factor C-terminal binding protein dimerization.8

Due to their cyclic structures, macrocycles have unique topologies that may provide differentiated ADME profiles compared to acyclic small molecules or linear peptides. Cyclic peptides can permeate cellular membranes by masking polar surface area and forming intramolecular hydrogen bonds that stabilize permeable conformations.9-12 They may be more
resistant to proteolytic cleavage13-15 and can have improved metabolic stability.16 As a result, macrocycles have been identified with excellent pharmacokinetic properties and oral bioavailability, including molecules with high molecular weights and polar features that violate typical “Rule-of-Five” measures of druglikeness.2, 9, 12, 17-20 Many macrocycles currently in clinical development were discovered using structure-based design approaches.19

Structure-based modeling of macrocycles typically requires diverse, if not complete, sampling of lower energy conformations. These conformations comprise the input for lead optimization via docking/binding pose prediction as well as ADME models that analyze surface properties. In order to generate such high-quality conformations, a conformational sampling engine must efficiently sample the universe of potential conformations and select a spanning set of low-energy structures that represents the more likely conformational states. With large numbers of rotatable bonds and ring closure constraints, macrocycles pose sampling challenges for conformation generation methods that are not present for linear molecules. Recent studies have benchmarked the macrocycle sampling performance of methods that apply distance geometry,21 inverse kinematics (IK),22,23 Monte Carlo torsional sampling,24 and molecular dynamics combined with normal-mode or low-mode search steps.25, 26

Here, we present a novel algorithm, Prime-MCS, which avoids dependence on the input backbone structure geometry and efficiently generates diverse conformations of even very large macrocycles. We investigate Prime-MCS’s performance on several challenging macrocycle datasets and compare the new method to a set of other macrocycle sampling algorithms available in commonly used molecular modeling software packages. The benchmarking results and case studies highlight the potential of Prime-MCS as an
engine for generating diverse, high-quality conformational ensembles for challenging structures and also illustrate the method’s potential limitations.

METHODS

Prime Macrocycle Conformational Sampling algorithm

Algorithm overview

The Prime macrocycle conformational sampling algorithm (Prime-MCS) is designed specifically to overcome the kinetic barriers to torsional rotation within the closed ring of a macrocycle. The algorithm divides the macrocycle into two “half-loops” and independently semi-exhaustively samples each half. This workflow is an adaptation of the Prime loop sampling workflow.27 A related adaptation was applied to polyketide permeability prediction and macrocycle RRCK permeability.10,11 Here, we use the Schrödinger implementation of this algorithm, Prime-MCS, as detailed below and summarized in Figure 1.
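The idea of sampling two half-loops independently and then pairing them can be illustrated with a deliberately simplified toy model. The sketch below is not the Prime-MCS implementation: it works in two dimensions with unit bond lengths and planar turn angles instead of three-dimensional torsions, it omits clash checks and cross-link restraints, and all function names (`walk`, `close_ring`) and parameter values are hypothetical. It only shows why halving helps: each half is enumerated over k^(n/2) states, and closed rings are recovered by matching half-loop end points.

```python
import itertools
import math


def walk(start_heading_deg, turns_deg, bond=1.0):
    """Return the end point of a planar chain grown from the origin.

    The first bond points along start_heading_deg; every later bond is
    rotated by the next entry of turns_deg relative to the previous bond.
    """
    x = y = 0.0
    heading = start_heading_deg
    for turn in (0.0,) + tuple(turns_deg):
        heading += turn
        x += bond * math.cos(math.radians(heading))
        y += bond * math.sin(math.radians(heading))
    return x, y


def close_ring(turn_options, bonds_per_half, heading_a, heading_b, tol=1e-6):
    """Enumerate both half-loops independently, then pair matching ends."""
    n_turns = bonds_per_half - 1
    combos = list(itertools.product(turn_options, repeat=n_turns))
    ends_a = {t: walk(heading_a, t) for t in combos}
    ends_b = {t: walk(heading_b, t) for t in combos}
    closed = []
    for turns_a, end_a in ends_a.items():
        for turns_b, end_b in ends_b.items():
            # Toy closure criterion: the two free ends must coincide.
            if math.dist(end_a, end_b) <= tol:
                closed.append((turns_a, turns_b))
    return closed


# Six-membered planar ring: both halves start at the shared root atom and
# leave it in directions 120 degrees apart (assumed root angle).
closures = close_ring((-60.0, 60.0), bonds_per_half=3,
                      heading_a=0.0, heading_b=120.0)
print(closures)
```

With these toy settings the only closed pairing is the regular hexagon, in which one half turns by +60° at each atom and the other by -60°.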

Figure 1. Prime-MCS workflow summary diagram.

1. Characterize the topology and bonds (see “Macrocycle topology detection”)
   a. Identify the connected chain representing the macrocycle “backbone”
   b. Identify R-groups branching off from the backbone
   c. Detect cross-links
   d. Detect multicycles
   e. Assign accessible torsions for each bond using pre-defined rotamer libraries (see “Rotamer libraries”)
2. Split backbone and sample half-loops
   a. Arbitrarily split the backbone into two half-loops (see “Spinroot”)
   b. Independently sample half-loops according to coarse-grained rotor types with backbone self-clash and cross-link restraint dead-end elimination (see “Dead-end elimination”). Coarse graining is determined according to an adaptive resolution scan, starting at 180° and going down to as little as 10° (see step 3a below)
3. Re-form and cluster backbone conformations
   a. Attempt to pair half-loops to form a closed cycle with no backbone clashes or cross-link restraint violations (see “Closure conditions” below); if not possible, incrementally increase resolution and return to step 2b
   b. Cluster closed macrocycle backbones using distances of cross-backbone atom pairs (distance from each backbone atom to another atom halfway around the backbone ring)
4. R-group sampling and minimization
   a. Sample R-groups for each backbone cluster representative using rotamers determined by a branch-length-dependent resolution algorithm (see “Branch-length-dependent resolution”)
   b. Choose lowest-energy R-group conformation
   c. Minimize entire system (see “Minimization”)
   d. Repeat once from step 4a using minimized backbone conformation

Macrocycle topology detection

For the algorithm to function properly, the macrocycle topology needs to be identified so it can be sampled appropriately. The macrocycle backbone is identified by iterating over a list of all “smallest set of smallest rings” (SSSR) rings from largest to smallest28 with modifications as discussed in ref. 26. Macrocyclic rings are defined by iterating over all SSSR rings of size 7 or greater, merging any rings with intersecting members and keeping the non-intersecting portion as the “major ring” and the intersecting portion as the “cross-link.” Any remaining “major rings” with 10 or more members are considered macrocycles. For sampling purposes, any cross-links are sampled as R-groups but are factored into the backbone determination as a distance restraint. If multiple large (≥ 10 atom) rings with no intersection are identified, the compound is termed a “multicycle.” In the current implementation, only one macrocycle within a multicycle is considered the “true” macrocycle backbone and the other(s) are considered part of a “side-chain” and therefore are not sampled as backbone atoms in Prime-MCS. In the rare case where the “true” macrocycle cannot be distinguished by size for multicycles or cross-linked macrocycles, we arbitrarily select the one containing the lowest atom index.

Rotamer libraries

The rotamer libraries assign predetermined sets of dihedral angles to atom sets that match corresponding SMARTS patterns. Assigning these libraries to recognized bonds dramatically decreases the search space for the algorithm. For example, an N-methylated peptide bond would be recognized by the SMARTS pattern [!#1][N,n;!H]-[C,c;!r6;!r5;!r4](=O)[*], and the bond between the second and third matching atoms would be assigned two possible rotational states (0° or 180°, representing cis and trans amide bonds).
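The effect of a rotamer library on the size of the search space can be sketched with a small lookup table. This is an illustration only: the dictionary key `n_methyl_amide`, the function names, and the choice to start the free-rotor grid at -180° are assumptions made for the example, and SMARTS matching itself is not performed here. Only the two amide states (0° and 180°) come from the text.

```python
# Hypothetical library: bond class name -> allowed torsion angles (degrees).
# Only the amide states (0 and 180) come from the text; the key is invented.
ROTAMER_LIBRARY = {
    "n_methyl_amide": (0.0, 180.0),
}


def torsion_states(bond_class, resolution_deg):
    """Return the candidate torsion angles for one bond."""
    if bond_class in ROTAMER_LIBRARY:
        return list(ROTAMER_LIBRARY[bond_class])
    # Unrecognized bond: treated as freely rotatable, so the only limit is
    # the sampling resolution. The grid origin (-180) is an assumption.
    steps = int(round(360.0 / resolution_deg))
    return [-180.0 + i * resolution_deg for i in range(steps)]


def search_space_size(bond_classes, resolution_deg):
    """Number of torsion combinations for a list of bonds."""
    size = 1
    for bond_class in bond_classes:
        size *= len(torsion_states(bond_class, resolution_deg))
    return size


# Four free rotors versus two free rotors plus two library-matched amides.
free = search_space_size(["unmatched"] * 4, 30.0)
restricted = search_space_size(["n_methyl_amide"] * 2 + ["unmatched"] * 2, 30.0)
print(free, restricted)
```

At 30° resolution, replacing two free rotors with two-state amide bonds reduces the number of combinations from 12^4 to 2 × 2 × 12 × 12.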

An initial set of patterns and corresponding rotamer libraries was taken from a statistical analysis of torsional tendencies of canonical amino acids.30 Additional patterns were added to eliminate redundant sampling of symmetric torsions and to appropriately sample heterocyclic biaryl groups. In total, 33 specific patterns were included in the final algorithm (these patterns and the assigned rotamers are available as Supporting Information). Bonds not recognized in the library are treated as freely rotatable and thus the torsions are restricted only by the sampling resolution.

Spinroot

To remove sampling bias arising from the arbitrarily chosen “split point” (see step 2a of the algorithm), an option was added to “spin” the root of the split point. Effectively, multiple jobs are spawned with the root spun evenly around the macrocycle backbone. These multiple jobs can be run in parallel with the final results being merged. In this study, although spinroot was applied, the multiple jobs were run serially. Initial benchmarking with the CSD subset demonstrated that parallelizable runs using spinroot 10 with 100 conformers per spinroot yielded RMSDmin values that were comparable to the values produced with a single run (no spinroot) and the same total number of conformers (1000). These settings (spinroot 10, 100 conformations per spinroot) were applied for all benchmarking runs.

Dead-end elimination

Dead-end elimination is a technique to increase computational efficiency of permutative algorithms by eliminating all permutations containing any disallowed property as soon as it is detected.31-33 In the case of macrocycles, the number of permutations of all allowed rotamers is combinatorially vast. Here, dead-end elimination is implemented by detecting disallowed combinations of rotamers (e.g., ones that cause a clash or violate cross-link restraints), and curtailing computation of any further permutations containing those combinations. Cross-link restraints are implemented by setting a maximum distance between the two atoms at which the cross-link and macrocycle ring intersect to the maximum distance the cross-link would permit given unstretched bond angles.

Closure conditions

Half-loop pairs are accepted as “closed” backbones when multiple criteria for closure are met. Paired half-loops are first conjoined on one end. If the pairs sterically overlap, the closure is rejected. The angle of closure, between the three backbone atoms forming the opposite closure point, is rejected if it deviates by more than 25 degrees from the angle formed by the minimized input (approximating the energetic minimum of the angle). The two torsions whose bond atoms include the shared closure atom are rejected if they are not within 25 degrees of an available torsion according to the assigned rotamer states. Strained bonds are relieved during the minimization step (Step 4).

Branch-length-dependent resolution

R-groups are sampled according to an angular resolution determined by how many bonds separate the rotor of interest from the backbone. Rotors closer to the backbone are given higher sampling resolution relative to ones further away. This branch-length-dependent resolution reduces the combinatorial explosion of available torsions while allowing finer motion where
clashes are more likely to occur (i.e., closer to the backbone). First-rank rotors are preset to values with 20° resolution, second-rank to 30°, and third- and fourth-rank to 45°. Fifth- and higher-rank rotors are kept fixed to their input values (the only dependence on input conformation in the Prime-MCS algorithm).

Minimization

Structures are minimized with the OPLS2005 force field in vacuum. The minimization starts with conjugate gradient up to 5 iterations if the maximum gradient on any atom is greater than 105 kcal/mol/Å. Truncated Newton minimization is then used with an energy convergence cutoff of 0.1 kcal/mol and energy gradient cutoff of 0.01 kcal/mol/Å. Stereoisomer constraints are implemented automatically based on input geometry using strong torsional constraints (5 kcal/mol/radian). An energy cutoff of 100 kcal/mol was used to prevent output of distorted geometries.

MOE conformational search (MOE-LMMD)

Sampling was performed as described below using MOE2014. The LowModeMD sampling algorithm, which was developed for sampling of complex molecules such as macrocycles,25 was run via moebatch with mostly default parameters passed to the ConfSearch function. The energy cutoff was increased from the automatic default (7 kcal/mol) to the largest standard option available in the MOE GUI (10 kcal/mol) to enable broader sampling of macrocyclic structures. The parameters used were as follows: maximum iterations: 10000, maximum conformations returned: 10000, RMSD cutoff for similar structures: 0.25, force field: Amber10:EHT, solvation: Reaction field model. By default, the conformational search terminates if no new conformation is discovered after 100 consecutive iterations. The LowModeMD protocol incorporates an all-atom
minimization, which was run with the default parameters of maximum 500 iterations and RMS gradient of 0.005 kcal/mol/Å.

MacroModel baseline search (MM-Base)

The MM-Base conformational search calculations were performed with MacroModel using the combined MD/LowMode search algorithm designed for macrocycles as implemented in Schrödinger Suite 2015-4.34 The OPLS2005 force field and the GB/SA aqueous solvent model were used along with other default parameters (as optimized in Watts et al.24) as follows: 10000 search steps, a 50% quench cycle (0.5 ps at 1000 K then 0.5 ps at the target temperature of 300 K). The low-mode search eigenvectors were updated at each global minimum and search step sizes were limited to between 3 and 18 Å. A 0.75 Å cutoff was used for distinguishing conformers and a 10 kcal/mol energy window was used.

MD conformational searching

Sampling by molecular dynamics (MD) was performed to establish a baseline for a simple and readily available approach for conformation generation. The simulations were run using the version of Desmond contained within Schrödinger Suite 2014-4.35 Solvation of the system into a cubic box was performed using all the default parameters available in the System Builder interface of Maestro. The box was sized to provide a buffer of at least 10 Å in all directions around the macrocycle and filled with SPC water. When necessary, sodium or chloride ions were added in random locations to neutralize the system. The OPLS2.1 force field was used. The dynamics were performed using all the default parameters available in the Molecular Dynamics interface, including the default equilibration protocol. The simulations were run for 24 ns, with frames saved every 24 ps, yielding 1000 conformations for each macrocycle, which were extracted without minimization. No clustering or redundancy elimination was performed. No
effort was made to optimize the simulation length or time between snapshots in order to maximize accuracy and diversity for a given amount of CPU time, nor was parallelization or GPUs used. Consequently, the CPU times for the MD simulations in this study likely overestimate the wall-clock times that may actually be found in practice.

Relevant metrics

The purpose of developing a new macrocycle sampling algorithm is to create one that will yield robust results under realistic conditions (e.g., available CPU power or lack of 3D structural data) in a manner that enables downstream applications such as docking, permeability prediction, SAR analysis, and refinement of experimental structures. We therefore chose several metrics that relate to this goal. Broadly, the selected metrics quantify reproduction of experimentally determined structures (accuracy), diversity of ensembles (coverage), and speed.

Atomic RMSD

One of the most common metrics for characterizing molecular conformational similarities is RMSD (root-mean-square deviation) of atoms of interest. Typically, the heavy-atom RMSD (RMSDheavy) is a useful measure, and RMSDheavy values are generally deemed large at some heuristic incremental thresholds (1 Å, 1.5 Å, etc.). Here, we focus on the subset of atoms comprising the macrocycle backbone (RMSDback) as a metric for conformational similarity. This is useful for macrocycles as the side-chain conformations can often differ dramatically due to subtle differences in the environment. Throughout this work, we calculate the minimum RMSD among the output ensemble, denoted as RMSDheavy,min and RMSDback,min, to measure a method’s ability to produce an ensemble that contains the experimentally determined structure. The calculation of RMSDheavy,min and RMSDback,min accounted for the topological symmetry found in some macrocycles in the set. The reported values are the lowest RMSD for all heavy atoms and
the macrocycle backbone subset, respectively, resulting from any SMARTS match involving all heavy atoms in the molecule.

Backbone torsional state analysis

Expressing macrocycle conformations in terms of their backbone torsional state can be useful in capturing conformational properties such as how substituents are projected from the backbone. Here, we consider energetic wells derived from torsional profiles to define discrete states for each torsion. A conformation of the entire molecule can then be represented as a combination of torsion states. This representation functions as a conformational fingerprint and allows for additional analysis.

Obtaining torsional profiles

Using force field torsional parameters as independent terms is misleading since they are fit to match the quantum mechanical torsion profile in concert with other energetic components of the force field. To obtain the entire force field profile, we use a method derived by Lupyan and Sindhikara (unpublished) that has been in use for the torsional analysis portion of Desmond’s35 Simulation Interaction Diagram (SID) report. The procedure is detailed below and described in Figure 2.

Figure 2. Torsional profiling algorithm.

1. Identify the bond of interest within the molecule
2. Truncate the molecule about the bond of interest to the smallest representative fragment
3. Conformationally sample the fragmented molecule using a short hybrid MD/MC calculation in Prime
4. Obtain the energy profile for 6 conformations (the top 5 output and 1 reference conformation) by performing constrained optimizations in which the torsion of interest is fixed in 10° increments from -180° to 170°
5. Output the lowest energy of any conformer at each torsion as the representative profile
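Steps 4 and 5 of the procedure above amount to taking a lower envelope over several per-conformer torsion scans. The following minimal sketch shows only that bookkeeping; the function names are hypothetical and the energies are made-up numbers standing in for the constrained-optimization results, which are not reproduced here.

```python
def scan_angles(start=-180, stop=170, step=10):
    """Torsion values at which the constrained optimizations are run."""
    return list(range(start, stop + 1, step))


def representative_profile(conformer_profiles):
    """Keep, at each scan angle, the lowest energy found over all conformers.

    conformer_profiles is a list of dicts mapping angle (degrees) to energy
    (kcal/mol), one dict per sampled conformer.
    """
    angles = sorted(conformer_profiles[0])
    return {a: min(profile[a] for profile in conformer_profiles) for a in angles}


angles = scan_angles()

# Two made-up conformer scans (illustrative values only).
flat = {a: 1.0 for a in angles}
step_like = {a: (0.0 if a < 0 else 2.0) for a in angles}

profile = representative_profile([flat, step_like])
print(len(angles), profile[-180], profile[0])
```

In the procedure described in Figure 2, six such scans (five sampled conformations plus the reference) would be combined in the same way.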

Utilizing these profiles, one can see how an individual conformation or ensemble of conformations relates to the underlying torsional energy potential. The underlying potential is
not identical to the full molecular potential and thus the individual torsional minimum may not reflect the global minimum. This is especially apparent in macrocyclic backbones where the cyclization introduces (unrelievable) molecular strain (Figure 3). These profiles may depend significantly on the force field. Here, we have used OPLS3 profiles generated in vacuum.36

Figure 3. An example torsional profile analysis of a conformational ensemble. The light blue line represents the calculated profile. Foreground bars represent histograms across the ensemble and are colored from likely to unlikely (green to red) based on the expected Boltzmann probability of the profile. A red reference line is shown for the state of the crystal reference conformation. Colored background fills represent three distinct energy wells.

Torsional binning and fingerprints

These profiles enable identification of the energy wells for each torsion, calculated as the torsional space between maxima in the profile (Figure 3). Since the profiles are occasionally rough, a minimum well size of 30° was enforced. Further, each torsion was separated into, at most, four wells. We define the torsional fingerprint as the ordered set of N base-4 numbers, where N is the number of macrocycle backbone rotatable bonds, in which each of the N items identifies the energy well that contains a specific rotatable bond. To ensure appropriate treatment of topologically symmetric macrocycles, a canonical fingerprint for a given conformation of such a macrocycle was taken as the first one of the sorted list of all equivalent fingerprints that
describe it. Two conformations are considered the same if they have identical torsional fingerprints. Fingerprint identity can be used to eliminate redundant conformations during conformation generation as well as to determine unambiguously if a conformation matches a reference. The lack of ambiguity is advantageous compared to RMSD, which requires an arbitrary cutoff to define whether two conformations are close enough to be considered the “same.” RMSD also may mask changes in conformation that are significant in a modeling context. For example, rotation of an amide group or a backbone alkyl chain with a substituent can produce a significantly different presentation of molecular recognition elements while having little effect on RMSD. Two conformations with identical torsional fingerprints generally have a low RMSD, but two low-RMSD conformations do not necessarily have identical fingerprints.

It should be noted that if one conformation has a specific torsion near a well boundary (energy maximum), another conformation could be nearly identical in structure but have a nonidentical torsional fingerprint if that torsion falls on the other side of the maximum. This happens infrequently because torsions rarely occur near energy maxima, although it may occur more often in macrocycles because of conformational restriction due to the macrocyclic ring closure. Such cases were occasionally observed in this study (for one example, see the Supporting Information).

Torsional fingerprints can also enable unambiguous comparison of the overall conformational space explored by each method. Conformations in the union and intersection of the spaces as well as conformations unique to each set are well defined by their fingerprints. For a given compound, let F be the set of unique torsional fingerprints generated by a method. The Tanimoto similarity T of F1 and F2 generated by methods 1 and 2, respectively, is defined as
$$T = \frac{|F_1 \cap F_2|}{|F_1 \cup F_2|} \qquad \text{(Equation 1)}$$
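The fingerprint construction and the Tanimoto similarity of Equation 1 can be sketched in a few lines. The sketch rests on several assumptions that are not stated in the text: wells are indexed in order of increasing angle starting from the first maximum, an angle lying exactly on a maximum is assigned to the well that starts there, and symmetry equivalence is supplied by the caller as index permutations that include the identity. All function names are hypothetical.

```python
from bisect import bisect_right


def well_index(angle_deg, maxima_deg):
    """Index of the energy well that contains angle_deg.

    maxima_deg is the sorted list of profile maxima in [-180, 180). Well i
    runs from maximum i to the next maximum; the last well wraps through
    +/-180 degrees back to the first maximum.
    """
    return (bisect_right(maxima_deg, angle_deg) - 1) % len(maxima_deg)


def fingerprint(torsions_deg, maxima_per_torsion):
    """Ordered tuple of well indices, one per backbone rotatable bond."""
    return tuple(well_index(a, m) for a, m in zip(torsions_deg, maxima_per_torsion))


def canonical_fingerprint(fp, symmetry_permutations):
    """First entry of the sorted list of symmetry-equivalent fingerprints."""
    equivalents = [tuple(fp[i] for i in perm) for perm in symmetry_permutations]
    return min(equivalents)


def tanimoto(f1, f2):
    """Equation 1 for two sets of unique fingerprints."""
    union = f1 | f2
    if not union:
        raise ValueError("both fingerprint sets are empty")
    return len(f1 & f2) / len(union)


maxima = [-120.0, 0.0, 120.0]  # made-up maxima giving three wells
fp = fingerprint([-150.0, 60.0, 150.0], [maxima] * 3)
canon = canonical_fingerprint((2, 1, 0), [(0, 1, 2), (2, 1, 0)])
similarity = tanimoto({(0, 1), (1, 1)}, {(1, 1), (2, 2)})
print(fp, canon, similarity)
```

In this example the torsions at -150° and 150° fall in the same wrapped well, which is why both receive index 2.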

A second metric, the coverage difference (C) between methods, is a measurement of how well one method explored conformational space that was not explored by the other method. For a given compound, the C value of method 1 compared to method 2 is defined as

$$C = \frac{|F_1| - |F_2|}{|F_1 \cup F_2|} \qquad \text{(Equation 2)}$$

where F1 and F2 are the sets of unique torsional fingerprints of the conformations generated by methods 1 and 2, respectively. In other words, the C value between two methods is the difference between the numbers of fingerprints generated by each of them individually normalized by the number of fingerprints in the union. C is a number between -1 and 1, where a value closer to 1 implies that the method generated more fingerprints (covered more conformational space) than the other, and a value closer to -1 implies that the method generated fewer fingerprints (covered less conformational space). Two methods generating exactly the same number of fingerprints results in a C value of zero.

Span of Radius of Gyration

A more concrete way of examining conformational diversity is to look at the range in sizes of conformers for a given molecule. This can be considered a proxy for the breadth of sampling within the ensemble and can more directly be considered a measure of the ability of the method to sample the molecule’s most compact and most extended states. Here, we use the span of radius of gyration, that is, the difference between the largest radius of gyration in the ensemble and the smallest.

$$\mathrm{span}(r_{gyr}) = \max(r_{gyr}) - \min(r_{gyr}) \qquad \text{(Equation 3)}$$

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 50
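As a minimal sketch of Equations 1 and 2, assuming each torsional fingerprint is stored as a hashable tuple of bin labels, T and C reduce to a few set operations. The bin labels and function names below are illustrative and are not taken from any of the benchmarked programs.

```python
def tanimoto_similarity(f1, f2):
    """Equation 1: |F1 ∩ F2| / |F1 ∪ F2| for two sets of fingerprints."""
    union = f1 | f2
    if not union:
        raise ValueError("at least one fingerprint set must be non-empty")
    return len(f1 & f2) / len(union)


def coverage_difference(f1, f2):
    """Equation 2: (|F1| - |F2|) / |F1 ∪ F2|, the C value of method 1 vs. method 2."""
    union = f1 | f2
    if not union:
        raise ValueError("at least one fingerprint set must be non-empty")
    return (len(f1) - len(f2)) / len(union)


# Hypothetical fingerprints: one torsional bin label per backbone torsion.
method_1 = {("g+", "t", "g-"), ("t", "t", "g-"), ("g-", "g-", "t")}
method_2 = {("t", "t", "g-")}

print(tanimoto_similarity(method_1, method_2))  # 1 shared fingerprint / 3 in the union
print(coverage_difference(method_1, method_2))  # (3 - 1) / 3
print(coverage_difference(method_2, method_1))  # same magnitude, opposite sign
```

Swapping the two methods leaves T unchanged and flips the sign of C, which is why a pairwise table of C values is antisymmetric.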

Figure 4 shows an example of a macrocycle whose conformational ensemble contains radii of gyration spanning 0.65 Å.

Figure 4. Example of span of radius of gyration for two Prime-MCS conformations (green/purple) of CSD compound PIKZEO (27 backbone atoms) with a relatively large difference in radii of gyration (4.92 vs. 5.57 Å). The value for span(rgyr) for this compound is 0.65 Å.
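The span in Equation 3 can be sketched in a few lines. The example assumes an unweighted radius of gyration computed from Cartesian coordinates; the text does not state whether mass weighting or only heavy atoms were used, so that choice and the function names are assumptions made for illustration.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates (assumption: no mass weighting)."""
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((atom[i] - centroid[i]) ** 2 for i in range(3)) for atom in coords
    ) / n
    return math.sqrt(mean_sq)


def span_rgyr(rgyr_values):
    """Equation 3: largest minus smallest radius of gyration in an ensemble."""
    return max(rgyr_values) - min(rgyr_values)


# Radii of gyration (Å) of the two PIKZEO conformations quoted in Figure 4.
print(round(span_rgyr([4.92, 5.57]), 2))  # 0.65
```

In practice the list passed to `span_rgyr` would hold one radius of gyration per conformer in the output ensemble of a method.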

Energy

Within the limits of the particular force field potential (and implicit solvent model, if present), the computed energy of individual conformations can be used to infer the relative probability of each conformation. A method that can consistently sample low-energy structures is assumed to be more likely to yield relevant output conformations, even though the lowest-energy structures may not be the most accurate compared to a particular experimental reference. Since the methods were applied with different force field environments, we calculated the minimized energy using a consistent force field; minimization was performed with OPLS3 in vacuum.36 Minimization was performed with Schrödinger’s Python minimizer API using default settings: BFGS if there were fewer than 500 atoms or conjugate gradient otherwise, with a 0.05 kcal/mol/Å gradient cutoff, a 5E-9 kcal/mol energy cutoff, and a maximum of 1500 steps.

Dataset preparation

Here we consider a macrocyclic compound to be one with a ring structure containing at least 10 backbone atoms. We also eliminated the few compounds containing multicycles, since the multiplicity of their macrocycle backbones makes comparison of performance with monocyclic molecules non-trivial. In particular, we refrain from quantifying the relevance of predicting the relative positioning of multiple cyclic systems in the absence of an environmental model. Macrocycle datasets were procured from three sources (as described below) to obtain as much diversity as possible. There is no overlap between any of the subsets.

CSD subset

A structurally diverse subset of 130 macrocycles from the Cambridge Structural Database (CSD) was prepared as follows.37, 38 Carbon-containing molecules were extracted from the CSD using Conquest (version 2013, update 1), excluding polymers, organometallic molecules, and entries described as having errors or “powders.”39 The resulting 122581 molecules were exported keeping the largest fragment (eliminating solvent molecules and counterions) and using the “normalize hydrogens” option. In this set, 5118 macrocycles were identified; these were filtered both visually and with SMARTS strings to remove undruglike molecules, such as boronates, allenes, potentially reactive alkyl halides, poly-ynes, highly strained systems, duplicates, most polyethers (while retaining a small representative set), and charged systems. The latter were excluded to eliminate the issue of conformations stabilized by counterions. The resulting 1631 macrocycles were binned according to ring size, and 130 representative structures were selected either visually (for macrocycles with >35 atom rings, of which the number was small; see the Supporting Information) or on the basis of clustering using the Canvas Similarity and Clustering utility in Maestro with the following parameter settings: fingerprints: topological torsions; atom typing scheme: Daylight invariant atom types; precision: 32 bit; similarity metric: Tanimoto; clustering: average linkage method.40

BIRD subset

The BIRD database (Biologically Interesting Molecule Reference Dictionary) was downloaded on September 19, 2014 from ftp.wwpdb.org/pub/pdb/data/bird/prd/ (filename prd-all.cif.gz). The database lists biologically interesting peptide-like molecules in the PDB. For each PDB code in the BIRD database, chains of molecular weight 500-2000 containing macrocycles of 10 atoms or more were retained, yielding 107 candidate structures. Each PDB file was then visually inspected and curated as follows. Each PDB entry was split into its constituent chains and molecules. Using the Maestro Protein Prep Wizard,41 hydrogens were added, bond orders were assigned, and alternative positions and missing atoms were identified. Macrocycles with disorder, relatively high temperature factors, or poor resolution (>2.1 Å) were generally not considered. An exception was 1OKX, with a resolution of 2.8 Å, which exhibited uniformly low temperature factors throughout the macrocycle structure. In cases where multiple copies of a macrocycle were present with comparable temperature factors, a single representative was selected for benchmarking. Due to the scarcity of high-quality structures in the BIRD dataset, macrocycles with ionizable groups were not automatically excluded, in contrast to the CSD set. When such groups were present, the protonation state was assessed based on the reported pH of the crystallization conditions, the estimated basicity or acidity of the functional groups in question, and the chemical context. For example, in 1QOW, there is a close intramolecular contact between a carboxyl and a primary amine of the ligand. Crystallization was undertaken at pH 7.0, and so it was assumed that both groups were ionized. For 4M6E, the experimental pH was not reported; its sole ionizable group, a primary amine, was assumed to be ionized. In all, after curation, 18 structures were selected for this benchmarking study on the basis of quality and structural diversity.

PDB subset

In a previous study,24 Watts et al. curated a combined set of CSD and PDB structures for training and testing of the MM-Base algorithm. The set was curated to contain diverse and challenging macrocyclic topologies, including polyglycines, cyclodextrins, and peptidic macrocycles. Here we utilized only the PDB structures from that set (67 compounds) and eliminated those with either fewer than 10 ring atoms or multicycles, leaving 60 compounds.

Reference and input structure preparation

Reference structures taken from the respective databases were prepared using Schrödinger software. Bond orders were assigned based on input geometry, and structures were prepared using LigPrep,42 with options to preserve input stereochemistry and chirality. Epik was used to predict ionization and tautomerization states using default settings. No minimization was performed on the input references. To prepare input structures that are agnostic of the reference geometry, the prepared reference structures were converted to SMILES strings and then back to 3D using LigPrep. These structures were then forced to have trans peptide and cis ester bonds.

Datasets overview

Table 1 shows an overview of statistics for the three macrocycle datasets used here. There are some characteristic differences between the sets, and the BIRD set is the most distinctive. It has dramatically more cyclic peptides (83%, compared to 22% and 8% for the CSD and PDB sets, respectively), the most cross-links (28% vs 7% and 5%), the largest median molecular weight, the largest median backbone size, and the largest median number of heavy side-chain atoms. Unless otherwise specifically stated, results are shown for the entire pooled set. The full list of macrocycles (including identifier, source, and relevant topology information), the full results for all methods for all compounds, and an overview of the performance of subsets of interest (as noted in Table 1) are included in the Supporting Information.

Table 1. Statistics for composition of included data sets.1

                   CSD              BIRD             PDB              All
N                  130              18               60               208
N cyclic pep       29 (22%)         15 (83%)         5 (8%)           49 (24%)
N cross-linked     9 (7%)           5 (28%)          3 (5%)           17 (8%)
MW                 154-1202 (432)   497-1727 (964)   357-1665 (705)   154-1727 (536)
#BB (heavy)        10-40 (17)       15-42 (21)       11-43 (19.5)     10-43 (18)
#SC (heavy)        0-54 (11)        16-77 (49)       10-71 (31)       0-77 (18.5)
#SC/#BB (heavy)    0-2.6 (0.7)      1-3.1 (2.0)      0.5-3.8 (1.6)    0-3.8 (1.1)
#BB (rot)          7-40 (15.5)      12-39 (19.5)     8-33 (16)        7-40 (16)
#SC (rot)          0-28 (1.5)       5-45 (18)        0-36 (10)        0-45 (4)
#SC/#BB (rot)      0-2 (1.2)        0.4-2.2 (0.8)    0-2 (0.6)        0-2.2 (0.3)

1 N represents number of compounds in that class while # represents the quantity calculated for each compound. BB and SC represent portions of the macrocycle backbone and side-chains respectively. “Heavy” and “rot” represent the # of heavy atoms or rotatable bonds respectively. For statistics within compounds (#), the min-max (median) values are shown. For each metric, the maximum value (% or median) across all datasets is boldfaced.

RESULTS

Structural diversity

Characterization of the full range of accessible conformations is critical for sampling macrocycles. While conformational sampling algorithms usually attempt to optimize “accuracy” relative to a reference structure, often there is no single “correct” structure. It has also been demonstrated that for large, conformationally flexible macrocycles, the environment may cause a population to shift from one accessible conformation to another.43 For example, conformations that maximize internal hydrogen bonds may be preferred in a hydrophobic (e.g., membrane) environment, while conformations that expose more polar groups to solvent may be preferred in an aqueous state; other structural adaptations may be required to fit a receptor active site. Knowledge of the complete ensemble of diverse accessible conformations is key in discovering biologically relevant conformations. We therefore benchmarked the algorithms under consideration by calculating the number of unique conformations they were able to generate across our complete dataset as a reflection of the coverage of conformational space.

Diversity of output ensembles

Number of conformers and torsional fingerprints

The distributions over all compounds of the total numbers of conformations generated by the methods are reported in Figure 5A. Each of the algorithms under consideration has different settings controlling the number of output conformations (see Methods; the algorithms differ in their maximum number of output conformations and in whether they output redundant conformations or cluster conformations and output a nonredundant set).

Figure 5. Box and whisker plots of distributions of the number of output conformations (A) and number of unique backbone torsional fingerprints (B) for each method. Box limits encapsulate the interquartile range. Medians are represented with red lines with black text labels. Outliers are points more than 1.5 times the interquartile range above the third quartile or below the first quartile. Differences in distributions between panels A and B reflect the relative uniqueness (or conversely redundancy) of the conformations produced by each method.

Because of variation in the retention or elimination of redundant conformations in the methods studied here, the absolute number of conformations generated by each method may not be the most relevant way to measure diversity of the output conformations. As an alternative measure, Figure 5B shows the distributions of the numbers of unique backbone conformations, determined by macrocycle backbone torsional fingerprints, that were generated by each method. It is apparent from comparing Figure 5A and Figure 5B that many of the conformations generated by MD were highly redundant, since the number of torsional fingerprints is far lower than the number of output conformations. In contrast, for the other three methods, especially Prime-MCS, the numbers of torsional fingerprints are closer to the numbers of overall conformations. According to Figure 5B, Prime-MCS produces significantly more diverse conformations than MM-Base, MOE-LMMD, and MD. The breakdown of diversity by size is shown in the Supporting Information. The superior performance of Prime-MCS is most apparent in macrocycles of relatively small size (backbone size less than ~20 atoms). MOE-LMMD and MM-Base generated more conformations for some of the larger macrocycles, although MOE-LMMD tended to produce significantly fewer for cyclic peptides (see Table S1 in the Supporting Information). This is probably due to the maximum number of output structures for Prime-MCS being set lower by default than for the other two methods.

Radius of gyration span

Another metric for determining the conformational diversity within an ensemble is the span of the radius of gyration (rgyr) of the given molecule within that ensemble. The span(rgyr) is defined as the difference between the largest and smallest rgyr of that ensemble. Figure 6 shows the distributions of span(rgyr) for each method among the entire dataset of macrocycles. Again, Prime-MCS tends to have slightly larger span(rgyr) values than the other methods, indicating that it produces a more diverse set of conformations.


Figure 6. Box and whisker plots representing distributions of spans of radii of gyration within conformers across compounds. Larger numbers represent larger variation in size across conformers for a specific compound. Box-and-whisker definitions are the same as in Figure 5.

Comparative diversity of output ensembles

Torsional fingerprints of the conformations were used to assess the similarities and differences in conformational space explored by each method. The overlap of conformations generated by the different methods, defined by the Tanimoto similarity (T) of the generated torsional fingerprints, is always very low (see the Supporting Information for distributions). The similarity distribution for every pair of methods (over all compounds) has a very low median value of 0.03 or below. Even for the smallest macrocycles, there are very few individual compounds for which the similarity for any pair of methods exceeds 0.4. Low T could arise as a consequence of the large number of possible conformations available to some compounds. If the number of possible conformations is large compared to the number generated, it is statistically unlikely that fingerprints will appear in both sets, and the Tanimoto similarity will be low. This is the case for many compounds in this study. For a macrocycle with the median number of rotatable bonds (16), even if each has only two torsional bins (resulting in 2^16 ≈ 65,000 possible fingerprints) and only 10% of them are actually accessible, the number of possible conformations (~6500) still exceeds the number of output conformations most methods typically produced (see Figure 5A). Alternatively, low T, especially for the small macrocycles, could be due to the disparity in the numbers of conformations generated by the different methods. If, for one compound, one method generates 1000 fingerprints and another generates 50 fingerprints, the similarity can never exceed 0.05. To differentiate these two scenarios, the coverage difference (C) was calculated to complement T as a useful metric for quantifying the disparity in set sizes. The median C values are reported in Table 2, and the distributions of C values comparing each pair of methods across all compounds are shown in the Supporting Information. Overall, the relative position of the methods is closely related to the number of fingerprints generated (Figure 5B). Prime-MCS is superior to the other three in that its distributions fall almost entirely in the upper half of the range and have medians of 0.64, 0.59, and 0.74 against MOE-LMMD, MM-Base, and MD, respectively. For the other three methods compared with each other, the C values are more evenly distributed around 0, with medians of 0.16 (MM-Base over MOE-LMMD), 0.37 (MM-Base over MD), and 0.16 (MOE-LMMD over MD). While these three methods contribute approximately equally to the overall set of conformations, they can be ordered MM-Base > MOE-LMMD > MD.
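The two numerical arguments above can be checked with a short sketch. The function names are illustrative; the bound follows from the definition of T, since the intersection holds at most the smaller set and the union holds at least the larger one.

```python
def max_tanimoto(n1, n2):
    """Upper bound on T for fingerprint sets of sizes n1 and n2.

    |F1 ∩ F2| <= min(n1, n2) and |F1 ∪ F2| >= max(n1, n2),
    so T <= min(n1, n2) / max(n1, n2).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("set sizes must be positive")
    return min(n1, n2) / max(n1, n2)


def possible_fingerprints(n_torsions, bins_per_torsion):
    """Number of distinct fingerprints if every torsion can occupy every bin."""
    return bins_per_torsion ** n_torsions


print(max_tanimoto(1000, 50))             # 0.05
print(possible_fingerprints(16, 2))       # 65536
print(possible_fingerprints(16, 2) // 10) # 6553, i.e. roughly 6500 if 10% are accessible
```

The bound is reached only when the smaller set is entirely contained in the larger one, so observed T values are usually lower still.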

Table 2. Median values of distributions of coverage difference values, C, of each pair of methods.

             vs Prime-MCS   vs MOE-LMMD   vs MM-Base   vs MD
Prime-MCS                   0.64          0.59         0.74
MOE-LMMD     -0.64                        -0.16        0.16
MM-Base      -0.59          0.16                       0.37
MD           -0.74          -0.16         -0.37

Structural accuracy

RMSD

Figure 7A shows the distribution of the lowest RMSD structure (relative to crystallographic reference structures) of a molecular conformational ensemble for macrocycle backbone atoms (denoted as RMSDback,min). Plots for the RMSDheavy,min values (the equivalent calculation except that all heavy atoms, including side-chains, are considered) are shown in the Supporting Information and are qualitatively similar. MD has the worst median RMSDback,min, at 0.56 Å, while the other three methods perform similarly, with medians around 0.4 Å. However, in terms of outliers (compounds where the best conformation is still very far from the reference), there is a qualitative difference. For the other methods, the worst outliers had RMSDback,min values up to 3.8 Å. In contrast, the worst Prime-MCS structure had an RMSDback,min of only 2.1 Å. Thus, considering both median RMSDback,min and outlier magnitude, Prime-MCS appears to be the most consistent and robust.


Figure 7. Performance of algorithms in generating structures similar to crystal references. (A) Box and whisker plots for the RMSDback,min for each algorithm. (B) Scatter plot of RMSDback,min vs. backbone size for each algorithm. (C) Normalized cumulative distribution functions (CDFnorm) of RMSDback,min as a function of backbone RMSD threshold for each method. For each method, the plot shows the fraction of compounds with at least one output structure with RMSDback,min below the threshold.

The robustness of Prime-MCS can also be seen by observing the individual RMSDback,min values as a function of backbone size (Figure 7B), which gives an overview of the RMSDback,min range expected for each method for macrocycles of a given size. Prime-MCS consistently samples near the crystallographic structure (< 1.0 Å) for macrocycles with backbone sizes up to about 25 atoms. Even for larger macrocycles, where performance for all methods declines due to the compounds’ greater complexity, Prime-MCS tends to have much lower variation; the RMSDback,min values do not exceed 1.5 Å until the macrocycle size reaches 40. It is also notable that, unlike MOE-LMMD and MD, Prime-MCS and MM-Base show very little decline in performance for the more complicated cross-linked macrocycles (both with median RMSDback,min values near 0.5 Å) compared to non-cross-linked macrocycles (both with median RMSDback,min values near 0.4 Å; see Table S1 in the Supporting Information); the slight difference is consistent with the larger sizes of the cross-linked macrocycles.

An important factor with regard to accuracy is obtaining structures that are “good enough” to enable structural insight and downstream processing. Depending on the case, certain thresholds may be used to determine whether the conformation is useful. Figure 7C shows the cumulative distribution function (CDF) of the number of compounds satisfying RMSDback,min thresholds (breakdown by dataset shown in the Supporting Information). For a cutoff of 0.5 Å, MM-Base performs best, with the RMSDback,min for ~65% of compounds under the threshold. In contrast, for all thresholds of 1.0 Å and above, Prime-MCS performs best, achieving 100% of compounds with RMSDback,min below the threshold at ~2.1 Å. The other three methods achieve 100% at thresholds of 3 Å or above. Performance did vary by dataset (see the Supporting Information). For both the CSD and PDB sets, MM-Base performed best at the 0.5 Å threshold, with Prime-MCS surpassing it at thresholds of ~1 Å and above. However, for the complex compounds in the BIRD set, Prime-MCS was better by about 15% for all thresholds.

Torsional identity

Comparison of torsional fingerprints from the output ensemble to the reference structure can also be used to assess structural accuracy. Figure 8A shows the number of compounds for which each method generated a fingerprint that exactly matched the reference. By absolute count, Prime-MCS is slightly better than MOE-LMMD and MM-Base, and all three are significantly better than MD.
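Both accuracy criteria used above, the RMSD threshold behind Figure 7C and the exact fingerprint match behind Figure 8A, reduce to counting compounds. The following is a minimal sketch with hypothetical data and function names; whether the original analysis used a strict or inclusive inequality at the threshold is not stated, so the strict comparison here is an assumption.

```python
def fraction_below(rmsd_min, threshold):
    """One point of the cumulative distribution: the fraction of compounds
    whose lowest RMSD to the reference is below the threshold."""
    if not rmsd_min:
        raise ValueError("no compounds supplied")
    hits = sum(1 for value in rmsd_min.values() if value < threshold)
    return hits / len(rmsd_min)


def count_exact_matches(reference_fp, generated_fps):
    """Number of compounds whose reference fingerprint appears among the
    fingerprints of the generated ensemble."""
    return sum(
        1
        for compound, ref in reference_fp.items()
        if ref in generated_fps.get(compound, set())
    )


# Hypothetical results for one method on four compounds.
rmsd_min = {"cpd1": 0.21, "cpd2": 0.48, "cpd3": 0.95, "cpd4": 2.10}
reference_fp = {"cpd1": (0, 1, 2), "cpd2": (1, 1, 0), "cpd3": (2, 0, 1), "cpd4": (0, 0, 0)}
generated_fps = {
    "cpd1": {(0, 1, 2), (0, 1, 1)},
    "cpd2": {(1, 0, 0)},
    "cpd3": {(2, 0, 1)},
}

for threshold in (0.5, 1.0, 2.5):
    print(threshold, fraction_below(rmsd_min, threshold))
print(count_exact_matches(reference_fp, generated_fps))  # 2
```

A compound can pass a loose RMSD threshold without an exact fingerprint match (cpd2 here), which is why the two criteria are reported separately.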


Figure 8. (A) For each method, the number of compounds for which at least one conformation had a torsional fingerprint exactly matching that of the reference structure. (B) For each method, the fraction of macrocycles of a given size for which at least one conformation had a torsional fingerprint exactly matching that of the reference structure. The number of compounds with each backbone size is also shown as a histogram. The size range extends to only 30 atoms because no method generated a fingerprint match for any macrocycle larger than 30.

However, there are some differences in the performance of the various methods with respect to macrocycle size, as shown in Figure 8B. Prime-MCS performs especially well for macrocycles in the small end of the range, obtaining an exact fingerprint for 53 of the 65 macrocycles (82%) with a ring size of 14 atoms or smaller, including all 10 compounds with a ring size of exactly 10 atoms. The good performance of Prime-MCS in this size regime is probably a consequence of the fact that, for the smaller macrocycles, Prime-MCS was able to generate far more conformations than the other methods and provided more exhaustive sampling. For both Prime-MCS and MOE-LMMD, there was a precipitous dropoff in performance at a ring size of 20 atoms. Interestingly, MM-Base and MD did not display the same dropoff. Of macrocycles with 20 or more heavy atoms, Prime-MCS generated an identical fingerprint for only three compounds (two of which are at the small end of this range), compared to 7, 10, and 12 compounds for MOE-LMMD, MD, and MM-Base, respectively. While even MM-Base does not reproduce the reference fingerprint for a majority of compounds (only 19% of the 64 compounds in the size range 20-28), it is the best method in this size regime by this metric. Representative structures where MM-Base matched the reference torsional fingerprint exactly are shown in the Supporting Information. MM-Base was trained on the PDB set, which generally contains larger macrocycles than the CSD set, possibly resulting in default parameters better suited to larger macrocycles. However, its superior performance does not simply reflect an unfair advantage gained by training, that is, disproportionate reproduction of the PDB set reference structures: MM-Base performs about equally well in this size regime for both the PDB (6 identical fingerprints out of 24 compounds with size 20-28) and CSD sets (6 identical fingerprints out of 34 compounds). In no case did any method produce an exact fingerprint for a macrocycle with more than 30 backbone atoms. This is not surprising given the vast conformational space available to large macrocycles.

Energies calculated for output structures


OPLS3-minimized energies were calculated for all structures as a metric for how often the algorithms produced “reasonable” conformations.36 Table 3 shows statistics of the minimum energy conformations produced by each method, and distributions are reported in the Supporting Information. Median values and interquartile ranges (IQRs) among the compounds are shown. For these calculations, since MD, MM-Base, and Prime-MCS all use versions of the OPLS force field during sampling, MOE-LMMD might be expected to be at a disadvantage and produce higher-energy conformations. However, this disadvantage is not apparent when observing the minimized energies. Energies shown are all relative to the lowest energy conformation coming from any method for that compound.
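The bookkeeping behind these relative energies can be sketched as follows. The data layout, method names, and energies are hypothetical, and the quartile convention of Python's `statistics.quantiles` (its default "exclusive" method) is an assumption, since the text does not say how the IQRs were computed.

```python
import statistics


def relative_min_energies(min_energy):
    """Shift each method's lowest minimized energy per compound so that the
    best value found by any method for that compound is zero."""
    compounds = {c for per_method in min_energy.values() for c in per_method}
    best = {
        c: min(per_method[c] for per_method in min_energy.values() if c in per_method)
        for c in compounds
    }
    return {
        method: {c: energy - best[c] for c, energy in per_method.items()}
        for method, per_method in min_energy.items()
    }


def median_and_iqr(values):
    """Median and interquartile range of a list with at least two values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# Hypothetical minimized energies (kcal/mol) of each method's best conformer.
min_energy = {
    "method_a": {"cpd1": -120.0, "cpd2": -85.5, "cpd3": -40.0},
    "method_b": {"cpd1": -118.0, "cpd2": -86.0, "cpd3": -33.0},
}
relative = relative_min_energies(min_energy)
for method, per_compound in relative.items():
    print(method, median_and_iqr(list(per_compound.values())))
```

With this definition every compound has a relative energy of exactly zero for at least one method, so a median near zero means that a method often finds the lowest-energy structure of all methods.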

Table 3. Median and interquartile ranges (IQRs) of relative energies2 (kcal/mol) obtained for energy-minimized structures among the dataset.

                                          Prime-MCS   MOE-LMMD    MM-Base     MD
Median/IQR for minimum minimized energy   0.70/5.03   0.04/4.30   0.02/1.04   4.1/9.75

2 Energies are relative to the lowest energy obtained across all methods for each compound.

Speed

Figure 9 shows the timing comparisons for all methods. The four methods were not all run on the same architecture, so speed comparisons are not exact. MOE-LMMD and Desmond were run at Bristol-Myers Squibb (BMS) on a cluster containing AMD Opteron(tm) 8435 (2.6 GHz) processors. Though Desmond is optimized for GPU, CPUs were used to compare calculation times directly with the other CPU-based methods. Nonetheless, the numbers may not be realistic, as GPU implementations are much more commonly used for MD. MM-Base and Prime-MCS were run at Schrödinger using Six-Core AMD Opteron(tm) Processor 2427 with hyperthreading (2.211 GHz) and 32-Core AMD Opteron(TM) Processor 6274 (2.2 GHz). Since the BMS and Schrödinger clusters are roughly in the same speed range, we believe it is fair to compare the calculation run times as “ballpark” estimates.

Figure 9. Distributions of serial CPU calculation times for each method. Dashed horizontal lines indicate significant thresholds (10 minutes, 1 hour, 1 day). These times do not reflect potential speedups from parallelization or from GPU implementation.

Case examination

In addition to the benchmarking results described above, it is useful to highlight specific examples that represent realistic use cases for diverse macrocycles. Here, we examine some of the lowest-RMSD conformations identified by Prime-MCS to illustrate the successful sampling for macrocycles of varying sizes. (For all examples, RMSDs are reported for all heavy atoms and for macrocycle backbone atoms only. We report comparative RMSDs for MOE-LMMD as a representative second method.) Figure 10A shows two examples of small macrocycles where Prime-MCS succeeds in sampling conformations that fully recapitulate the geometries of the reference crystal structures. For both KUFPUW and HEBHUR, structures with 14-15 atom macrocycles, Prime-MCS identifies conformations with RMSDs of ~0.5 Å or less compared to the crystal structures (0.54 Å/0.23 Å and 0.49 Å/0.18 Å, respectively). These structures are not challenging to sample given their relatively small sizes, and MOE-LMMD identifies equally accurate conformations. The low-RMSD Prime-MCS and MOE-LMMD conformations are also exact matches by the more stringent measure of torsional fingerprint identity.


Figure 10. Sample structures and database identifiers from the CSD (A-E) and BIRD (F) sets. In each case, the reference crystal structure is shown in gray and the Prime-MCS conformation with the lowest RMSD compared to the reference is shown with cyan carbons.

Figure 10B provides two examples of larger non-peptidic structures with macrocycle sizes of 36-40 atoms (POWYEG and SUQJET). Prime-MCS again performs well, though here the conformations closest to the reference structures have RMSDs of 1.41 Å/ 1.41 Å and 1.15 Å/ 0.88 Å (for MOE-LMMD, the closest conformations have RMSDs of 2.2 Å/ 2.1 Å and 1.3 Å/ 1.0 Å, respectively). The overlays of the best conformations with the crystal structures show that while some torsions deviate from the reference values, overall the geometries match well.

For mid-sized peptidic macrocycles, sampling with Prime-MCS can identify conformations with very low RMSDs compared to the reference. Two examples are shown in Figure 10C, both with 24-atom macrocycles; one (UZUKUW, RMSD 0.37 Å/ 0.17 Å) has a globular shape and the other (DUVGOQ10, 0.53 Å/ 0.29 Å) is more beta-turn-like, with a disulfide cross-link spanning the center. In both cases, the best Prime-MCS conformation displays the same intramolecular hydrogen bonds as in the crystal structures. While MOE-LMMD succeeds equally well for the elongated DUVGOQ10 structure (minimum RMSD 0.64 Å/ 0.32 Å), it fails to sample the globular UZUKUW peptide adequately, achieving a minimum RMSD of 2.6 Å/ 1.6 Å.

For larger peptidic macrocycles, where sampling is sparser compared to the much larger number of possible conformations, Prime-MCS can still produce reasonable conformations that reproduce the overall shape of the reference structures. However, in each case individual torsions deviate significantly, leading to local conformational distortions and side-chains that do not align well. RIVLUE (Figure 10D) is an extended beta-turn-like structure with a 28-atom macrocycle.
Prime-MCS identifies a conformation with an RMSD of 1.77 Å/ 0.64 Å (compared to 3.2 Å/ 2.73
Å for the best MOE-LMMD conformation). While the backbone closely traces the reference macrocycle, some of the backbone torsions are rotated relative to the reference, leading to local misalignment of some side-chains. Similarly, for the SAFVOM structure (Figure 10D), a challenging 27-atom macrocycle that forms a beta-turn with a Pro-Pro motif, the best Prime-MCS conformation has an RMSD of 2.02 Å/ 0.97 Å (2.48 Å/ 1.31 Å for MOE-LMMD), and there are similar local distortions.

In some cases, unusual features of the reference crystal structures may influence the quality of the results. YACWIK, for example, is a large 40-atom macrocycle with an extended conformation that both Prime-MCS and MOE-LMMD have difficulty replicating (the best RMSDs are 2.1 Å/ 1.57 Å and 2.3 Å/ 1.49 Å, respectively). The crystal structure conformation is highly symmetric, and its calculated energy is unusually large; the conformation may be induced by crystal packing effects that promote a relatively high-energy conformation not sampled by the tested algorithms (Figure 10E).

The BIRD dataset comprises a subset of large, complex cyclic peptides, for which achieving accuracy remains challenging. One example is 4KEL, a 42-atom macrocycle with a disulfide bridge and a Pro-Pro motif. Prime-MCS’s best RMSD is only 3.2 Å/ 1.9 Å (4.6 Å/ 2.9 Å for MOE-LMMD), reflecting the difficulty of this class (Figure 10F). Another difficult BIRD example is 1MIK. While the 33-atom macrocycle backbone is not as large as that of 4KEL, the structure contains multiple N-methylated residues, and there are few intramolecular contacts to guide sampling (only one intramolecular H-bond is present) (Figure 10F). The RMSD of the best Prime-MCS conformation is 2.8 Å/ 1.24 Å (3.5 Å/ 1.8 Å for MOE-LMMD).

DISCUSSION

Comparison of supplemental approaches to improve diversity
The low Tanimoto similarities (T) of the sets of torsional fingerprints generated by the methods suggest that none of the methods, using the default parameters applied in this study, exhaustively samples the conformational space for even modestly sized macrocycles. Consequently, it is worthwhile to consider how one could increase diversity. One approach would be to perform larger calculations using Prime-MCS (the best individual method for generating diversity), potentially generating more conformations than the current default of 1000. To investigate this approach, we performed Prime-MCS calculations for the CSD set requesting 2000 conformers instead, referred to as Prime-MCS (2K). However, if Prime-MCS were biased away from otherwise reasonable conformations that another method were able to identify, an alternative approach could be to supplement the Prime-MCS sets with those generated by the other methods, referred to as supplemented MOE-LMMD, MM-Base, and MD. We compared the diversity generated for the CSD macrocycles by these approaches. Distributions of C values comparing the different approaches are provided in the Supporting Information. Trivially, all approaches of supplementing Prime-MCS with other conformations were superior to Prime-MCS itself, but supplementing with the Prime-MCS (2K) conformations yielded the most additional diversity (with a median C over Prime-MCS of 0.50, compared to 0.19, 0.22, and 0.14 for supplemented MOE-LMMD, MM-Base, and MD, respectively). The other three methods have wider distributions, including some compounds with C as high as 0.9 (from the larger macrocycles where MOE-LMMD and MM-Base produce several thousand conformations), but the median C values are significantly lower, indicating that in general the additional calculations provide only 10-20% more fingerprints than Prime-MCS alone. 
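To make the set-based comparison concrete, the following minimal Python sketch computes a Tanimoto similarity between two sets of torsional fingerprints and the fractional gain obtained by supplementing one set with another. It is illustrative only: the encoding of a fingerprint as a tuple of torsion-bin labels, the function names, and the toy data are all assumptions, and `extra_fraction` is one plausible reading of "additional fingerprints relative to the base set" rather than the definition of C used in this study.

```python
from typing import Hashable, Iterable


def tanimoto(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprints."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


def extra_fraction(base: Iterable[Hashable], supplement: Iterable[Hashable]) -> float:
    """Fingerprints found only in `supplement`, as a fraction of the base set size.

    Hypothetical stand-in for a coverage-style metric; NOT the paper's C.
    """
    set_base, set_supp = set(base), set(supplement)
    if not set_base:
        raise ValueError("base set must contain at least one fingerprint")
    return len(set_supp - set_base) / len(set_base)


# Toy fingerprints (assumed encoding): one bin label per ring torsion.
base = {
    ("g+", "t", "g-"),
    ("t", "t", "t"),
    ("g-", "g-", "t"),
    ("t", "g+", "t"),
}
supplement = {
    ("t", "t", "t"),      # already present in the base set
    ("g+", "g+", "g+"),   # new fingerprint contributed by the supplement
}

print("Tanimoto similarity:", tanimoto(base, supplement))
print("Extra fraction over base:", extra_fraction(base, supplement))
```

Because both quantities depend only on exact fingerprint identity, two conformations that differ by a single torsion bin count as fully distinct, which is worth keeping in mind when interpreting low similarity values.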
When comparing all supplemental methods to each other, supplementing with Prime-MCS (2K) also produced the most diversity in general compared to the other methods. Although the median C
values are not very high (0.04, 0.18, and 0.29 for supplemented Prime-MCS (2K) compared to supplemented MOE-LMMD, MM-Base, and MD, respectively), indicating that there is considerable variation between different compounds, the distributions clearly favor supplemented Prime-MCS (2K). Thus, there is no clear evidence that a supplemental approach is required to attain maximum diversity; rather, it appears that running Prime-MCS with a larger number of output conformations generally offers diversity that is as good or better.

Improving sampling with prior structural knowledge

Sampled macrocyclic conformations can be used for multiple downstream analyses such as docking, rationalizing SAR of congeneric series, and design of novel analogs that can maintain a bioactive conformation. However, we have observed that in the absence of a crystallographic reference structure, it is difficult to identify the specific bioactive conformations using calculated energy values alone. While the force field energy can be a useful tool for identifying reasonable conformations, it does not correlate with RMSD and cannot reliably identify the crystal structure from within a conformational ensemble (see the Supporting Information). Even if the force field were perfect, the experimental environment (such as crystal packing or a protein binding pocket) reshapes the energy landscape and is not considered by the force field model. For example, as noted in the case studies, the crystal reference for the YACWIK structure (Figure 10E) had an extraordinarily large force field energy (28 kcal/mol higher than the lowest energy sampled structure even after minimization). Without prior knowledge to identify candidate bioactive conformations, Prime-MCS sampling will generate a diverse ensemble of conformations where the lowest-energy structures are not necessarily the closest to the “true” bioactive structure (see the Supporting Information).
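The gap between energy ranking and closeness to a reference can be checked for any ensemble with a simple rank analysis. The sketch below is a hedged illustration, not the analysis performed in this work: the function names are hypothetical and the energies and RMSDs are invented toy values. It computes the Spearman rank correlation between energy and RMSD, and the energy rank of the conformation closest to the reference.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def energy_rank_of_closest(energies: Sequence[float], rmsds: Sequence[float]) -> int:
    """1-based energy rank (1 = lowest energy) of the lowest-RMSD conformation."""
    closest = min(range(len(rmsds)), key=lambda i: rmsds[i])
    return 1 + sum(e < energies[closest] for e in energies)


# Invented toy ensemble: relative energies (kcal/mol) and RMSDs to a reference (Å).
energies = [0.0, 1.2, 2.5, 3.1, 4.8]
rmsds = [2.4, 1.9, 2.8, 0.6, 2.1]

print("Spearman(energy, RMSD):", spearman(energies, rmsds))
print("Energy rank of closest conformation:", energy_rank_of_closest(energies, rmsds))
```

In this toy ensemble the conformation closest to the reference ranks fourth of five by energy, so an energy cutoff that kept only the best few conformers would discard it.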
Postprocessing steps may be necessary to filter an ensemble output by Prime-MCS and reduce the
number of conformations taken forward in the next steps of the analysis. These steps may include mapping known SAR onto the conformational ensemble. For example, if a side-chain is known from activity data to interact directly with the receptor, conformations where that side-chain is buried inside the ligand structure can be discarded. Alternatively, if a crystal structure of an analog has been solved or if NMR experiments have been performed, Prime-MCS sampling can be constrained to fit distance restraints derived from experimental data. The more information that is available, the higher the quality of the output conformations will be.

Even with prior information, for very large macrocycles it is difficult to generate conformations that are within 1 Å all-atom accuracy of known crystallographic reference structures. We have demonstrated that conformations within 2-3 Å (all heavy atoms) can be confidently generated (and within 1 Å for the macrocycle backbone), which could be accurate enough for coarse docking and analysis of how side-chains may be projected towards a receptor binding site. High-resolution modeling, however (for example, predicting the impact of side-chain substitution or rigidification of a molecule via cross-linking), may require some prior knowledge in order to bias sampling.

CONCLUSIONS

Macrocycles present an excellent opportunity for pharmaceutical design as well as a significant computational challenge. Here we introduced a methodology, Prime-MCS, to perform computational sampling of macrocycles and analyzed its performance in the context of relevant compounds compared to other available methodologies. Measured by both span of radius of gyration and torsional fingerprints, Prime-MCS produces the most diverse structures. Prime-MCS was the best method in reproducing experimental structures by most metrics and was also the fastest in serial CPU calculations. Overall, Prime-MCS provides a rapid method for
generating diverse, high-quality macrocycle conformations that are suitable for modeling applications, with the capability of being further enhanced for more challenging applications. Future developments will include integration with docking protocols, incorporation of advanced structural restraints, and otherwise expanded capabilities for handling more challenging topologies.

ASSOCIATED CONTENT

Supporting Information

The following files are available free of charge.

Example of torsional fingerprint identity missing a closely related conformation, scatter plots of analysis metrics compared to macrocycle backbone size, distributions of T and C in pairs of ensembles, figures reporting heavy-atom RMSD, RMSD CDF plots broken down by dataset, distributions and scatter plots of energies, examples of relatively large macrocycle backbones generated with torsional fingerprints exactly matching the reference, distributions of C for pairs of diversity-generating approaches, scatter plot comparing energy and RMSD for selected compounds (Figures S1-S11) and performance of all methods for selected compound subsets as measured by all analysis metrics (Table S1) (PDF)
Identifier, source, and topology information for all macrocycles (backbone size, molecular weight, and classification into size bins, cross-linked, and cyclic peptide subsets) (CSV.ZIP)
Input structures used for conformational sampling (MAE.ZIP)
Full table of results (radius of gyration, RMSDs, torsional fingerprints, energies and other properties for all conformations generated by all methods for all macrocycles) (CSV.ZIP)
Lowest backbone RMSD structures for each method (MAE.ZIP)


SMARTS-pattern-based rotamer libraries (TXT)

AUTHOR INFORMATION

Corresponding Author

*[email protected]

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

ACKNOWLEDGEMENTS

The authors wish to thank Matthew P. Jacobson for contribution of the original FORTRAN sampling engine code as well as Debbie Loughney and Steve Johnson for helpful discussions.

ABBREVIATIONS

ADME, absorption, distribution, metabolism, and excretion; Prime-MCS, Prime macrocycle conformational sampling; MOE-LMMD, MOE low-mode molecular dynamics; MD, molecular dynamics; MM-Base, MacroModel baseline search; RMSD, root-mean-square deviation; C, coverage value; T, Tanimoto coefficient; CSD, Cambridge Structural Database; BIRD, Biologically interesting molecule reference dictionary; PDB, protein databank; BMS, Bristol-Myers Squibb; GPU, graphics processing unit; CPU, central processing unit; Prime-MCS (2K), Prime macrocycle conformational sampling with 2000-conformation default; IQR, interquartile range; CDF, cumulative distribution function

REFERENCES

(1) Heinis, C. Drug Discovery: Tools and Rules for Macrocycles. Nat. Chem. Biol. 2014, 10, 696-698.
(2) Driggers, E. M.; Hale, S. P.; Lee, J.; Terrett, N. K. The Exploration of Macrocycles for Drug Discovery—An Underexploited Structural Class. Nat. Rev. Drug Discovery 2008, 7, 608-624.


(3) Ganesan, A. The Impact of Natural Products upon Modern Drug Discovery. Curr. Opin. Chem. Biol. 2008, 12, 306-317.
(4) Cardote, T. A. F.; Ciulli, A. Cyclic and Macrocyclic Peptides as Chemical Tools To Recognise Protein Surfaces and Probe Protein–Protein Interactions. ChemMedChem 2016, 11, 787-794.
(5) Doak, B. C.; Zheng, J.; Dobritzsch, D.; Kihlberg, J. How Beyond Rule of 5 Drugs and Clinical Candidates Bind to Their Targets. J. Med. Chem. 2016, 59, 2312-2327.
(6) Villar, E. A.; Beglov, D.; Chennamadhavuni, S.; Porco Jr, J. A.; Kozakov, D.; Vajda, S.; Whitty, A. How Proteins Bind Macrocycles. Nat. Chem. Biol. 2014, 10, 723-731.
(7) Upadhyaya, P.; Qian, Z.; Selner, N. G.; Clippinger, S. R.; Wu, Z.; Briesewitz, R.; Pei, D. Inhibition of Ras Signaling by Blocking Ras–Effector Interactions with Cyclic Peptides. Angew. Chem., Int. Ed. Engl. 2015, 54, 7602-7606.
(8) Birts, C. N.; Nijjar, S. K.; Mardle, C. A.; Hoakwie, F.; Duriez, P. J.; Blaydes, J. P.; Tavassoli, A. A Cyclic Peptide Inhibitor of C-terminal Binding Protein Dimerization Links Metabolism with Mitotic Fidelity in Breast Cancer Cells. Chem. Sci. 2013, 4, 3046-3057.
(9) Bockus, A. T.; Lexa, K. W.; Pye, C. R.; Kalgutkar, A. S.; Gardner, J. W.; Hund, K. C. R.; Hewitt, W. M.; Schwochert, J. A.; Glassey, E.; Price, D. A.; Mathiowetz, A. M.; Liras, S.; Jacobson, M. P.; Lokey, R. S. Probing the Physicochemical Boundaries of Cell Permeability and Oral Bioavailability in Lipophilic Macrocycles Inspired by Natural Products. J. Med. Chem. 2015, 58, 4581-4589.


(10) Leung, S. S. F.; Sindhikara, D.; Jacobson, M. P. Simple Predictive Models of Passive Membrane Permeability Incorporating Size-Dependent Membrane-Water Partition. J. Chem. Inf. Model. 2016, 56, 924-929.
(11) Wang, Q.; Sciabola, S.; Barreiro, G.; Hou, X.; Bai, G.; Shapiro, M. J.; Koehn, F.; Villalobos, A.; Jacobson, M. P. Dihedral Angle-Based Sampling of Natural Product Polyketide Conformations: Application to Permeability Prediction. J. Chem. Inf. Model. 2016, 56, 2194-2206.
(12) White, T. R.; Renzelman, C. M.; Rand, A. C.; Rezai, T.; McEwen, C. M.; Gelev, V. M.; Turner, R. A.; Linington, R. G.; Leung, S. S. F.; Kalgutkar, A. S.; Bauman, J. N.; Zhang, Y.; Liras, S.; Price, D. A.; Mathiowetz, A. M.; Jacobson, M. P.; Lokey, R. S. On-resin N-methylation of Cyclic Peptides for Discovery of Orally Bioavailable Scaffolds. Nat. Chem. Biol. 2011, 7, 810-817.
(13) Lawson, K. V.; Rose, T. E.; Harran, P. G. Template-constrained Macrocyclic Peptides Prepared from Native, Unprotected Precursors. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, E3753-E3760.
(14) Sham, H. L.; Bolis, G.; Stein, H. H.; Fesik, S. W.; Marcotte, P. A.; Plattner, J. J.; Rempel, C. A.; Greer, J. Renin Inhibitors. Design and Synthesis of a New Class of Conformationally Restricted Analogs of Angiotensinogen. J. Med. Chem. 1988, 31, 284-295.
(15) Tyndall, J. D. A.; Reid, R. C.; Tyssen, D. P.; Jardine, D. K.; Todd, B.; Passmore, M.; March, D. R.; Pattenden, L. K.; Bergman, D. A.; Alewood, D.; Hu, S.-H.; Alewood, P. F.; Birch, C. J.; Martin, J. L.; Fairlie, D. P. Synthesis, Stability, Antiviral Activity, and Protease-Bound Structures of Substrate-Mimicking Constrained Macrocyclic Inhibitors of HIV-1 Protease. J. Med. Chem. 2000, 43, 3495-3504.
(16) Hu, X.; Nguyen, K. T.; Jiang, V. C.; Lofland, D.; Moser, H. E.; Pei, D. Macrocyclic Inhibitors for Peptide Deformylase: A Structure-Activity Relationship Study of the Ring Size. J. Med. Chem. 2004, 47, 4941-4949.
(17) Doak, B. C.; Over, B.; Giordanetto, F.; Kihlberg, J. Oral Druggable Space beyond the Rule of 5: Insights from Drugs and Clinical Candidates. Chem. Biol. 2014, 21, 1115-1142.
(18) Bell, I. M.; Gallicchio, S. N.; Abrams, M.; Beese, L. S.; Beshore, D. C.; Bhimnathwala, H.; Bogusky, M. J.; Buser, C. A.; Culberson, J. C.; Davide, J.; Ellis-Hutchings, M.; Fernandes, C.; Gibbs, J. B.; Graham, S. L.; Hamilton, K. A.; Hartman, G. D.; Heimbrook, D. C.; Homnick, C. F.; Huber, H. E.; Huff, J. R.; Kassahun, K.; Koblan, K. S.; Kohl, N. E.; Lobell, R. B.; Lynch, J. J.; Robinson, R.; Rodrigues, A. D.; Taylor, J. S.; Walsh, E. S.; Williams, T. M.; Zartman, C. B. 3-Aminopyrrolidinone Farnesyltransferase Inhibitors: Design of Macrocyclic Compounds with Improved Pharmacokinetics and Excellent Cell Potency. J. Med. Chem. 2002, 45, 2388-2409.
(19) Giordanetto, F.; Kihlberg, J. Macrocyclic Drugs and Clinical Candidates: What Can Medicinal Chemists Learn from Their Properties? J. Med. Chem. 2014, 57, 278-295.
(20) Marsault, E.; Peterson, M. L. Macrocycles Are Great Cycles: Applications, Opportunities, and Challenges of Synthetic Macrocycles in Drug Discovery. J. Med. Chem. 2011, 54, 1961-2004.


(21) Bonnet, P.; Agrafiotis, D. K.; Zhu, F.; Martin, E. Conformational Analysis of Macrocycles: Finding What Common Search Methods Miss. J. Chem. Inf. Model. 2009, 49, 2242-2259.
(22) Coutsias, E. A.; Lexa, K. W.; Wester, M. J.; Pollock, S. N.; Jacobson, M. P. Exhaustive Conformational Sampling of Complex Fused Ring Macrocycles Using Inverse Kinematics. J. Chem. Theory Comput. 2016, 12, 4674-4687.
(23) Tran, H. L.; Lexa, K. W.; Julien, O.; Young, T. S.; Walsh, C. T.; Jacobson, M. P.; Wells, J. A. Structure–Activity Relationship and Molecular Mechanics Reveal the Importance of Ring Entropy in the Biosynthesis and Activity of a Natural Product. J. Am. Chem. Soc. 2017, 139, 2541-2544.
(24) Watts, K. S.; Dalal, P.; Tebben, A. J.; Cheney, D. L.; Shelley, J. C. Macrocycle Conformational Sampling with MacroModel. J. Chem. Inf. Model. 2014, 54, 2680-2696.
(25) Labute, P. LowModeMD—Implicit Low-Mode Velocity Filtering Applied to Conformational Search of Macrocycles and Protein Loops. J. Chem. Inf. Model. 2010, 50, 792-800.
(26) Chen, I. J.; Foloppe, N. Tackling the Conformational Sampling of Larger Flexible Compounds and Macrocycles in Pharmacology and Drug Discovery. Bioorg. Med. Chem. 2013, 21, 7898-7920.
(27) Jacobson, M. P.; Friesner, R. A.; Xiang, Z.; Honig, B. On the Role of the Crystal Environment in Determining Protein Side-chain Conformations. J. Mol. Biol. 2002, 320, 597-608.


(28) Figueras, J. Ring Perception Using Breadth-First Search. J. Chem. Inf. Model. 1996, 36, 986-991.
(29) Downs, G. M. Ring Perception. In Handbook of Chemoinformatics; Wiley-VCH Verlag GmbH: 2008, pp 161-177.
(30) Jacobson, M. P.; Pincus, D. L.; Rapp, C. S.; Day, T. J.; Honig, B.; Shaw, D. E.; Friesner, R. A. A Hierarchical Approach to All-atom Protein Loop Prediction. Proteins 2004, 55, 351-367.
(31) Desmet, J.; De Maeyer, M.; Hazes, B.; Lasters, I. The Dead-End Elimination Theorem and its Use in Protein Side-Chain Positioning. Nature 1992, 356, 539-542.
(32) Goldstein, R. F. Efficient Rotamer Elimination Applied to Protein Side-Chains and Related Spin Glasses. Biophys. J. 1994, 66, 1335-1340.
(33) Pierce, N. A.; Spriet, J. A.; Desmet, J.; Mayo, S. L. Conformational Splitting: A More Powerful Criterion for Dead-End Elimination. J. Comput. Chem. 2000, 21, 999-1009.
(34) Schrödinger Release 2015-4: MacroModel, Schrödinger, LLC, New York, NY, 2015.
(35) Schrödinger Release 2014-4: Desmond Molecular Dynamics System, D. E. Shaw Research, New York, NY, 2014. Maestro-Desmond Interoperability Tools, Schrödinger, New York, NY, 2014.
(36) Harder, E.; Damm, W.; Maple, J.; Wu, C.; Reboul, M.; Xiang, J. Y.; Wang, L.; Lupyan, D.; Dahlgren, M. K.; Knight, J. L.; Kaus, J. W.; Cerutti, D. S.; Krilov, G.; Jorgensen, W. L.; Abel, R.; Friesner, R. A. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J. Chem. Theory Comput. 2016, 12, 281-296.


(37) Allen, F. The Cambridge Structural Database: A Quarter of a Million Crystal Structures and Rising. Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 2002, 58, 380-388.
(38) Groom, C. R.; Bruno, I. J.; Lightfoot, M. P.; Ward, S. C. The Cambridge Structural Database. Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 2016, 72, 171-179.
(39) Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. New Software for Searching the Cambridge Structural Database and Visualizing Crystal Structures. Acta Crystallogr., Sect. B: Struct. Sci., Cryst. Eng. Mater. 2002, 58, 389-397.
(40) Schrödinger Release 2015-4: Canvas, Schrödinger, LLC, New York, NY, 2015.
(41) Schrödinger Release 2015-4: Maestro, Schrödinger, LLC, New York, NY, 2015.
(42) Schrödinger Release 2015-4: LigPrep, Schrödinger, LLC, New York, NY, 2015.
(43) el Tayar, N.; Mark, A. E.; Vallat, P.; Brunne, R. M.; Testa, B.; van Gunsteren, W. F. Solvent-Dependent Conformation and Hydrogen-Bonding Capacity of Cyclosporin A: Evidence from Partition Coefficients and Molecular Dynamics Simulations. J. Med. Chem. 1993, 36, 3757-3764.
