
Dec 17, 2015 - J. Chem. Inf. Model., 2016, 56 (1), pp 1–5
Application Note pubs.acs.org/jcim

Torsion Library Reloaded: A New Version of Expert-Derived SMARTS Rules for Assessing Conformations of Small Molecules

Wolfgang Guba,*,† Agnes Meyder,‡ Matthias Rarey,*,‡ and Jérôme Hert†

† Molecular Design and Chemical Biology, Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, CH-4070 Basel, Switzerland

‡ Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, D-20146 Hamburg, Germany

*S Supporting Information

ABSTRACT: The Torsion Library contains hundreds of rules for small molecule conformations which have been derived from the Cambridge Structural Database (CSD) and are curated by molecular design experts. The torsion rules are encoded as SMARTS patterns and categorize rotatable bonds via a traffic light coloring scheme. We have systematically revised all torsion rules to better identify highly strained conformations and to minimize the number of false alerts for CSD small molecule X-ray structures. For this new release, we added or substantially modified 78 torsion patterns and reviewed all angles and tolerance intervals. The overall number of red alerts for a filtered CSD data set with 130 000 structures was reduced by a factor of 4 compared to the predecessor. This is a clear advantage in 3D virtual screening, where hits should be removed by a conformational filter only if they are in energetically inaccessible conformations.

We have previously published the Torsion Library,1 a hierarchical collection of rules for the assessment of small molecule conformations. Using rule sets with preferred torsion angles has a long tradition in molecular modeling, going back to the development of MIMUMBA.2,3 For the Torsion Library, molecular design experts created a systematic torsion pattern hierarchy and derived rules from the frequency distributions of torsion angles in structural databases such as the Cambridge Structural Database (CSD)4 or the Protein Data Bank (PDB).5 The rules consist of SMARTS patterns describing a rotatable bond within its chemical context, together with associated frequency distribution information for the respective torsion angle. Peak position(s) as well as first and second tolerance intervals are assigned from these distributions. Frequency distributions are a good estimate of the energy profile of torsion angles: the more frequently a value is observed experimentally, the more likely it corresponds to a low energy geometry. To a first approximation, the conformation of a molecule is a function of its dihedrals, and the overall conformational strain can therefore be assessed by examining individual torsion angles.

The Torsion Library is at the core of the TorsionAnalyzer, a program that colors the rotatable bonds of a 3D molecule using the rules of the library. If the angle of a torsion pattern lies within the first tolerance interval of a peak (i.e., it was frequently observed in CSD molecules), the corresponding rotatable bond is colored green. If the angle lies within the second tolerance interval of a peak, the bond is colored orange. If it lies outside of the second tolerance interval, the bond is colored red, indicating that this dihedral angle was not observed in CSD structures and that the conformation is questionable. Both the Torsion Library and TorsionAnalyzer have been previously described in detail.1

We have been using the Torsion Library and TorsionAnalyzer for the interactive assessment of small molecule structures and for filtering out strained conformations in the output of virtual screening applications (docking, shape searches, 3D pharmacophore queries). In practice, we encountered cases where rotatable bonds were flagged as red even though the angle was just slightly outside its tolerances. In such cases, an energetically favorable conformation could usually be obtained by a small alteration of the torsion angle. We also identified torsion patterns that did not properly reproduce previously published conformational rules.6 These observations led us to conduct a systematic analysis of the alerts produced by the Torsion Library.

The revision of the Torsion Library is based on the tenet that the conformations of the CSD X-ray structures, which are assumed to be close to their energetic minima and are used to derive the torsion rules, should produce a minimal number of red flags. Some X-ray structures may adopt particular conformations due to, e.g., sterically bulky substituents or crystal packing effects which are not captured by the torsion rules. Putting the Torsion Library categorization scheme into the context of a confusion matrix, torsion angles observed in the CSD structures and categorized as green or orange correspond to true positives, those not observed in the CSD

Received: August 19, 2015
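The traffic light coloring described in the introduction can be illustrated with a minimal Python sketch. It is not the TorsionAnalyzer implementation: the `Peak` class, the `classify` function, and the treatment of each tolerance as a symmetric half-width in degrees around a peak are assumptions made for illustration only, and the example peak values are invented rather than taken from the Torsion Library.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Peak:
    """Hypothetical rule entry: a preferred torsion angle and two tolerances.

    Assumption: both tolerances are symmetric half-widths in degrees.
    """
    angle: float
    tol1: float  # first tolerance (green zone)
    tol2: float  # second tolerance (orange zone), tol2 >= tol1


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def classify(torsion: float, peaks: Iterable[Peak]) -> str:
    """Return 'green', 'orange', or 'red' for a measured torsion angle."""
    color = "red"  # default: outside the second tolerance of every peak
    for peak in peaks:
        d = angular_distance(torsion, peak.angle)
        if d <= peak.tol1:
            return "green"  # within the first tolerance of some peak
        if d <= peak.tol2:
            color = "orange"  # keep looking; another peak may give green
    return color


# Invented example rule with peaks at 0 and 180 degrees.
example_peaks = [Peak(0.0, 20.0, 30.0), Peak(180.0, 20.0, 30.0)]
for angle in (175.0, -155.0, 90.0):
    print(angle, classify(angle, example_peaks))
```

The distance is computed on the circle, so that an angle of -170° is treated as 20° away from a peak at 170° rather than 340° away. A bond just outside the second tolerance falls into the red class exactly as one far outside it, which is the behavior that motivated the review of angles and tolerance intervals.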


DOI: 10.1021/acs.jcim.5b00522


Journal of Chemical Information and Modeling

Figure 1. Percentage of red alerts per pattern is plotted against the total number of occurrences for each respective SMARTS pattern in the filtered CSD data set. The following coloring pattern is used: red (>40% red flags), orange (between 10 and 40% red flags), green (