Simulation-Based Algorithm for Two-Dimensional Chemical Structure

Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram Generation of Complex Molecules and Ligand–Protein Interactions. Tomasz Fr...
0 downloads 13 Views 3MB Size
Subscriber access provided by Macquarie University

Article

Simulation-Based Algorithm for 2D Chemical Structure Diagram Generation of Complex Molecules and Ligand-Protein Interactions. Tomasz Fr#czek J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00391 • Publication Date (Web): 28 Nov 2016 Downloaded from http://pubs.acs.org on November 28, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Simulation-Based Algorithm for 2D Chemical Structure Diagram Generation of Complex Molecules and Ligand-Protein Interactions Tomasz Frączek Institute of Applied Radiation Chemistry, Lodz University of Technology, Zeromskiego 116, 90924 Lodz, Poland

ACS Paragon Plus Environment

1

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 49

Abstract

Computer programs for structure diagram generation (SDG) are indispensable cheminformatic tools that translate 1D or 3D chemical structure data stored in electronic formats to humanreadable 2D depictions. Although many such programs are known, only a moderate part of chemical space can be handled by existing algorithms. For many classes of natural and synthetic compounds the results obtained with current SDG methods are illegible. A new algorithm based solely on a physical simulation of a molecule has been developed. The method allows for a general and global approach to avoid overlapping atoms and substituents. While the algorithm shows no advantage for trivial molecules, it shows superior performance over existing methods in depicting the most challenging compounds. The algorithm can generate chemically correct and legible 2D structure diagrams of many classes of natural and synthetic compounds that are intractable with existing SDG algorithms. The use of the method for generating schematic ligand-receptor interaction diagrams is also discussed.

Introduction Two-dimensional structure diagrams are a standard method of a concise and comprehensive representation of molecular structures of chemical compounds. The conventions of drawing chemical diagrams evolved over the years and are currently regulated by IUPAC.1 2D representation of molecules helps humans to perceive structures of natural and synthetic organic and inorganic compounds that are inherently three dimensional, and often very complex. A glimpse of an eye on a well prepared structure diagram is usually enough for a skilled chemist to understand the basic structure of a compound. Therefore, 2D chemical structure diagrams are

ACS Paragon Plus Environment

2

Page 3 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

widely used in publications, as well as in the most of molecular modeling software, chemical structure editors/viewers, medicinal chemistry tools and chemical databases.2 Manual drawing of 2D chemical diagrams requires skill, and may be cumbersome for some large and complex molecules, and depiction of large datasets is unfeasible. Electronic file formats used to store and process chemical data contain 3D coordinates of atoms or atom connectivity information – both of which are not easily understandable to humans. Electronic databases engines prefer linear notations, such as SMILES,3 to effectively store and browse chemical data, but they usually use 2D diagrams to present query results. For these reasons, there is a need for automated 2D structure depiction methods. The problem of structure diagram generation (SDG) has a long history in cheminformatics, and dates back to 1960-1970s when first algorithms were developed.4,5,6,7 Since that time, a number of computer programs and libraries with SDG capabilities appeared, some of which are open-source (e.g., Open Babel, Indigo, CDK),8,9,10,11 but most algorithms are proprietary and undisclosed. Despite the apparent abundance of such methods, virtually all algorithms fall into one of the two categories: sequential placement and optimization-based. A mix of these two approaches is also possible, nevertheless, the actual variety of SDG algorithms is very small. In the sequential placement approach, the structure to be drawn is analyzed and separated into drawing units (atoms, rings, etc.), then a starting fragment is selected, and the rest of drawing units are sequentially added using a set of priority rules. Various heuristics can be used to analyze a molecular graph, decide a substituent placement, construct ring systems, etc.12,13,14,15 Optimization-based (also known as model-based or dynamic) methods are less popular, and only a limited amount of data on their operating principles is available. In such approaches, a sort of specialized force field is used to find a layout of atoms with a minimum potential energy, as a function of atom distances, bond

ACS Paragon Plus Environment

3

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 49

lengths and bond angles. A simplified model of a molecule is created, and atom positons are iteratively updated to minimize the model’s energy.6,13,15 Despite a large number of possible implementations, the applicability and limitations of existing SDG algorithms are similar. While most of SDG programs are capable of generating correct and readable structure diagrams for simple drug-like molecules, many natural

and

synthetic compounds show a level of complexity that exceeds drawing capabilities of most of existing algorithms (Fig. 1).

Figure 1. Examples of failed depictions of typical natural and synthetic compounds. a) aconitine – Open Babel9, b) paclitaxel – Molegro Molecular Viewer,16 c) punicalagin – Cactvs17,18 d) rifampicin – Indigo10 (note that all double bonds in the macrocycle should have E configuration), e) C60 fullerene – Maestro19, f) metal coordination compound – Marvin20. Depiction failures occur predominately when a normal placement (with consistent bond lengths and angles between bonds) of molecule fragments leads to overlapping atoms or intersecting bonds (hereafter referred to as conflicts). Usually, a simple solution to the problem is possible, such as changing an orientation of substituents, or loosening constraints on bond lengths and angles. In some cases, however, attempting to resolve a conflict at one location in a

ACS Paragon Plus Environment

4

Page 5 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

molecule will result in creating conflicts in other parts of the molecule. An inherent limitation of algorithms based on the sequential placement is that they require to decompose a structure to smaller fragments, and are unable to perceive the molecule as a whole. Each placement decision is therefore only locally optimal.21 Sometimes the consequences of a suboptimal decision may not be realized until the very last stage of a diagram building. An algorithm can use backtracking or test all combinations of substituent placement (both of exponential complexity!),13 but there are situations when this approach will also fail (e.g., when many bond angles/lengths must be modified simultaneously in a coordinated fashion). A good example to illustrate such a situation are cucurbituril compounds:22 to achieve the symmetrical layout shown on Figure 2, a sequential algorithm would have to use a very particular ring placement with many modified bond lengths and angles, from the very beginning of the drawing procedure. It would require the algorithm to somehow envision the final layout of the molecule and to plan its actions accordingly. A sequential building algorithm will rather end up with a linear arrangement of rings, and fix the structure with absurdly elongated bonds (Fig. 2). Another problem is conflict handling. When an SDG algorithm detects a conflict in an initial layout of a molecule, and tries to resolve the conflict, it faces several problems: what is the minimum necessary number of modifications to the structure to resolve all conflicts? Is it better to make one large change, or several smaller adjustments? Where to start with fixing the structure to not limit its degrees of freedom later on? How to make changes to the structure to avoid creating more conflicts? And most importantly: is the initial layout a good starting point to obtain a readable depiction to begin with? All these problems are non-trivial and require an advanced artificial intelligence.

ACS Paragon Plus Environment

5

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 49

Figure 2. 2D representation of cucurbit[5]uril. Left: pre-drawn template - Indigo, right: algorithmically generated - Cactvs The problem of locally optimal molecule layout, inherent to sequential placement algorithms, is to a large extent eliminated in optimization-based methods. These methods use energy minimization for both molecule layout and conflict handling. However, a discrimination must be made if the energy minimization is applied before, or after the initial layout of atoms. In the latter case, the optimization can usually only find one of nearby energy minima (due to the limited degrees of freedom of the initial layout), but it can improve some depictions and resolve simple conflicts. Several sequential-based algorithms use such post-layout energy minimization routines.15,20,23 In order to find an optimal layout with a minimum number of conflicts, an unconstrained molecule has to be subjected to a global energy minimization. Only a handful of such methods were published, namely Carhart’s,6 Hibbert’s24 (both described shortly below), and there are also a few unpublished SDG algorithms of this type included in some larger programs.16,25 However, finding a molecule layout with a low potential energy is not always equivalent to an ideal 2D depiction. The main difficulty for optimization-based methods is how to adhere to the many rules of correct and aesthetic representation of chemical structures, and how, at the same time, give atoms a sufficient freedom of movement to find a conflict-free

ACS Paragon Plus Environment

6

Page 7 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

layout. The apparent lack of a method that addresses these contradictive objectives was the main motivation behind this work. In this paper, a novel, general, optimization-based algorithm for automated 2D structure diagram generation is presented. The method is based entirely on a representation of a structure diagram as a physical object that is subjected to a simulation. During the simulation, a simplified physical model allows the structure diagram to achieve a relaxed state, that in many cases is equivalent to an ideal depiction of the molecule. The physical representation of the structure diagram is chosen to allow the algorithm to avoid most of possible conflicts, while retaining reasonable aesthetics. The basic concept of the described method is not new, in fact it is very similar to the approach used by Carhart,6 although the presented algorithm was not inspired by Carhart’s work. In both methods atom positions are iteratively updated, first in a three-dimensional space, then after flattening, on a plane. However, this work’s algorithm uses different formulas for calculation of atoms positions, with additional terms for bond angles, and atom-bond interactions. In addition, force constants and vector scaling change several times during the simulation (see methods section), and even though the simulated molecule is flattened by an applied force, it is still capable of making substituent rearrangements in the three-dimensional space. It should be noted that the main part of Carhart’s algorithm focused on a transforming of an initial atom layout into a representation suitable for a teletyping device, which makes it now almost completely obsolete (some of the modern programs still support text-based outputs of 2D structure diagrams, e.g., Open Babel). Another approach for an optimization-based SDG was presented by Hibbert, where he used a genetic algorithm to find a set of bond angles that gave a maximum distance atom layout.24

ACS Paragon Plus Environment

7

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 49

However, the method was only a proof of concept, and did not receive an appropriate development. The main drawback of the algorithm was that a molecule was easily trapped in a local energy minimum. In addition, the long simulation time and the difficulty to encode various structural features of chemical compounds into “genomes” make genetic algorithm-based SDG of a low practical use. Other optimization-based algorithms known to the author, such as the one in Molegro Molecular Viewer, perform reasonably well for small molecules, but larger and more complex structures usually end up tangled, and the aesthetics of generated diagrams is very poor (see Comparison with other algorithms in Results section). In overall, to the best of the author’s knowledge, the algorithm described in this work is a first successful implementation of a solely optimization-based, general method for 2D chemical structure diagram generation, applicable to unusual and complicated molecules. The main purpose of the algorithm is to obtain chemically correct and comprehensible depictions of molecules too complex to be handled properly by currently existing SDG methods. An applicability of the new method is not limited only to chemical structure diagrams. A similar but more complex area of molecular graphics is the schematic, flat representation of ligand-protein interactions.2,26 Such diagrams are composed of a 2D representation of a ligand (which is sometimes constrained to a particular conformation), surrounded by amino acid residues which form interactions with the ligand, such as hydrogen bonds, salt bridges, or π-π stacking. On the diagram, all its components must be arranged sparsely, with no overlaps, and with as few collisions and crossing interaction lines as possible. It was realized, that this work’s algorithm can effectively handle both tasks of SDG and amino acid placement simultaneously, even without the need to specially modify the method. As it will be shown in the Results section, a

ACS Paragon Plus Environment

8

Page 9 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

specially prepared input is sufficient to use the presented algorithm for solving the layout problem in protein-ligand interaction diagrams. The presented algorithm can be accessed through Omnidepict online molecule depiction service (www.omnidepict.p.lodz.pl). On the web page, it is possible to upload a chemical file (many formats are supported), and a graphical 2D representation of the molecule will be generated.

Methods Overview of the algorithm The algorithm finds a 2D depiction of a molecule by representing a molecular graph as a system of spring connected, ‘charged’ points repelling each other. The system is simulated in a three-dimensional space with varying force parameters that help the system to reach a global energy minimum. At a certain point of the simulation, the molecule is flattened by an applied force. The design of the system and its ‘physics’ is chosen so to lead the system to a state that resembles a typical 2D chemical diagram. At every iteration, the forces that act on each atom are calculated, and atoms positions are updated. To dissipate the system’s potential energy, atom velocities are scaled down at the beginning of a new iteration. The algorithm does not use any structural templates, and at any point, it does not analyze the molecular graph to search for rings, chains, or any specific structural features. It also does not require initial placement of atoms, and can start with any set of random atom coordinates.

System preparation

ACS Paragon Plus Environment

9

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 49

The algorithm starts with analyzing the atom connectivity table, and adding/removing atoms to the physical representation of a molecular graph: -

All hydrogen atoms from methyl, hydroxyl, thiol, and first-order amino groups are removed.

-

One hydrogen atom is removed from second- and third-order carbon atoms (Fig. 3a).

-

A ‘lone pair’ atom is added to all heteroatoms that are required to attain a 120° angle between bonds in the final depiction (e.g., oxygen in diethyl ether, and nitrogen in pyridine, Fig. 3b), but not if the heteroatom already has an appropriate valence (e.g., nitrogen in triethylamine).

Secondly, special bonds are added to control geometry of double bonds. These additional springs link corresponding neighbors of double-bonded carbon atoms (Fig. 3c). To control bond angles, all atoms having a common neighbor are connected (Fig 3c). Finally, atom coordinates are randomized.

Figure 3. Preparation of a molecule for the simulation. a) removal of redundant hydrogens (circled), b) addition of a special atom to heteroatoms (shown as ‘q’, see text), c) linking atoms with additional bonds (dashed lines) to control angles (left) and geometry of double bonds (right) Simulation

ACS Paragon Plus Environment

10

Page 11 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

The simulation is iterative; at each iteration atom positions are updated, according to the vector sum of acting forces. These forces are: -

All-atom, repulsive ‘electrostatic’ force:  =

 

where crep is a force parameter (initial value equals 0.04), and rij is a distance between an atom pair. This expression is different from the Coulombic potential, where the force is inversely proportional to a square of atom distance. It was found that the used potential performs better than Coulombic, and also has the advantage of omitting calculation of square roots in acceleration vectors:

=

  ∑ 

where d is a distance between atoms in k-th dimension, n is the number of dimensions. This saves a significant amount of time, since the repulsion force is calculated between all atom pairs, and is the longest step in each iteration. If the distance between an atom pair is greater than a certain threshold (which changes several times as the simulation progresses), no force is applied. If the distance between atoms is smaller than a set limit (0.1), an arbitrary value of force is used (1.0). -

Bond spring force:  =   − 

where cbond is a force parameter (initial value equals 0.1), blength is a bond length value for a given bond type, and r is a distance between connected atoms. The blength value for normal bonds is set arbitrarily to 5.0 and for coordination bonds to 8.0. For an atom with a valence greater than four all its bonds are treated as ‘coordination bonds’. For the sake of generality, the algorithm permits any valence to any atom, so there is no periodic table checkup.

ACS Paragon Plus Environment

11

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

-

Page 12 of 49

Angle force: angles between atoms are controlled by virtual bonds that connect all atoms having a common neighbor (Fig. 3c). These bonds are treated like normal bonds, but with a different blength parameter. In the current version of the algorithm, the angle forces are calculated in such a manner only for 120° (√3 × blength) and 180° angles (0.65 × blength). The cbond value is 0.3. For 90° angles, a different approach is used: the repulsion force between all atoms having a common 4-ordered neighbor is doubled.

-

Flattening force: at a set moment of the simulation, a constant force (0.2) is applied to all atoms, which pushes them onto the xy plane. Because the atom acceleration vectors parallel to the z-axis are scaled down more than for the remaining axes (acceleration decay factors are 0.65 and 0.85 for z and x, y axes respectively), the model automatically arranges itself with its widest dimensions on the xy plane. Therefore, the flattening force vector is always perpendicular to the xy plane.

-

Other: to prevent atom-bond overlaps, in the later stage of the simulation an additional repulsive force (crep = 0.5) is applied to atoms that are closer than a set distance (blength/2) to a bond between other atoms. The force vector is perpendicular to the bond.

If a sum of acceleration vectors resulting from all forces exceeds a maximal allowed dislocation per iteration (1.5), it is truncated down to that limit. The number of iterations is constant (5000), regardless of the number of atoms in a molecule. The simulation can be divided into several stages: Stage 1. The simulation starts with only bond and electrostatic forces applied (crep = 0.04, cbond = 0.1). A rough shape of a molecule is obtained. Stage 2. The repulsive force is set to 1.0. A sparse atom layout is obtained.

ACS Paragon Plus Environment

12

Page 13 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Stage 3. A constant flattening force (0.2) is applied up to the end of the simulation. Most molecules become completely flat. Stage 4. The bond spring force is doubled (cbond = 0.2). blength is changed to an average bond length of the simulated model. Angle forces are activated. The distance threshold for repulsive interactions is progressively reduced (from an initial value of 80.0 × blength to 1.0 × blength). The model starts to look like a proper molecule. Stage 5. The repulsive force between overlapping bonds and atoms is added. Hydrogen atoms and ‘lone pairs’ atoms are excluded from repulsive forces calculations. An atom reversing procedure for incorrect angles is applied (explained in the Results section for macrocycles). The model attains its final shape. Problem detection and correction – if the resulting model contains intersecting bonds or incorrectly oriented double bonds, the following automatic correction procedure is used. First, all atoms are marked as ‘frozen’, which means that their positions are not updated during the simulation. Every atom within a set radius (2.4 bond lengths) around each bonds intersection (or incorrect double bond) is unfrozen, the largest set of connected unfrozen atoms is retained, and all the remaining atoms are frozen back. The coordinates of unfrozen atoms are randomized, and the system is subjected to a new round of simulation. During the third stage of the simulation (see above), at regular intervals, all frozen atoms (and all their neighbors) that have at least one non-frozen neighbor are unfrozen, and placed randomly within one bond length radius of the non-frozen neighbor atom. Currently four such unfreezing epochs are used, but a molecule sizedependent number of cycles may be used in future versions. Before progressing to the fourth stage, the simulated (non-frozen) fragment is centered at the original position from before the simulation, and all the remaining atoms are unfrozen with unchanged coordinates. The

ACS Paragon Plus Environment

13

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 49

simulation then proceeds normally. Intersecting bonds are once again detected, and if their number is equal or greater than before the correction procedure, the changes are undone, otherwise the new coordinates are retained. Final processing – the final diagram is rotated by a minimal angle that will maximize the number of bonds which angles are a multiple of 30 degrees. The website version of the program contains also a simple routine for placement of hydrogen atoms (for the downloadable result files), but it will not be discussed here. Results and discussion Simple molecules The algorithm is capable of handling any molecule that can be represented as an atom connectivity table. Simple drug-like molecules can be drawn by the algorithm very reliably. An ideal, high-quality depiction of a molecule requires, beside the chemical correctness, equal bond lengths with regular angles. The described algorithm is able to fulfill these requirements for most of simple molecules (Fig. 4). Additional rules exist that define how a molecule and various substituents should be oriented on a plane (e.g., standard ring orientations), but it should be noted that this task can be handled in most cases by an independent algorithm, after the initial layout process. Such methods do exist,14 and they lie outside the scope of the layout algorithm described in this work. Currently, the algorithm uses a rotation to maximum bond alignment (RMBA),13 as described in the Methods section. For the publication purposes all the obtained depictions were manually rotated around the z axis, if necessary. In many molecules, especially larger ones, some disorder can be seen in the form of non-regular bond angles, which results from a slow convergence near an energy minimum (Fig. 4, lower right structure). Increasing the number of iterations can reduce this problem.

ACS Paragon Plus Environment

14

Page 15 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 4. Examples of depictions of simple molecules obtained with the algorithm. Avoiding of conflicts in simple molecules is achieved automatically, and results from the nature of the method. The potential energy-driven orientation of substituents makes the conflict occurrence not likely, unless a significant congestion of substituents is present in a molecule. There are situations when a conflict cannot be avoided with any choice of substituent placement under the constraints of equal bond lengths and angles (Fig. 5). In such cases, an SDG algorithm can loosen the constraints in order to solve the conflict. The quality of depictions obtained with either approach is largely subjective (Fig. 5). According to IUPAC recommendations, an elongation of bonds should be the first choice to resolve conflicts in most cases.1 However, the bond elongation in SDG programs creates a risk of an overuse that can lead to unpleasing results (see Fig. 2). Moreover, from the physicochemical point of view, the modification of bond angles is a more reasonable approach, because bond angles in real molecules are very often non-regular due to a strain or steric interactions, while any significant stretching of bonds would lead to breaking them. For these reasons, the described algorithm generally prioritizes uniform bond lengths over ideal angles between bonds.

ACS Paragon Plus Environment

15

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 49

Figure 5. Two possible approaches to resolve a conflict of overlapping atoms. Macrocycles Simple macrocycles usually yield a regular polygon near the end of the simulation (Fig. 6a). Such a representation of macrocycles is technically correct, and is common in many structure depiction programs, but not esthetically appealing, and for larger rings it can lead to illegible results. To handle this problem, the algorithm inverts positions of atoms bound to the vertices of a macrocycle, which have angles close to 180° (atom chains are also handled by this procedure, if needed). The routine starts by flagging all atoms that fulfill the following criteria: valence equals 3 (which means that they are connected to three other atoms which are a part of the model, not a chemical valence; see model preparation in Methods section), one (and only one) neighboring atom has a valence 1, the angle between bonds to the two neighboring atoms with the valence greater than 1 is >150°. Subsequently, the procedure attempts to rotate the monovalent neighbors by 180° around the flagged atoms (hereafter called ‘inversion’). The flagged atoms can form rings or chains of various lengths. Chains are processed from their ends, and the inversion of monovalent atoms is decided using the following rules: length one chains (single atoms) have the inversion; two atom chains have the inversion on the first atom; length three chains on the middle atom; length four and longer chains have the inversion on the first atom, the following two atoms are skipped, and the chain length is reduced by three – the procedure is repeated until the whole chain is processed. The same procedure is used for closed

ACS Paragon Plus Environment

16

Page 17 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

chains (rings), in this case the starting atom is a heteroatom (if any) or the atom with the lowest index in the data table. A special situation are double bonds – in this case both monovalent atoms are rotated around the double bond simultaneously to preserve its configuration (unless the configuration is incorrect, in such case only one atom is inverted). The subsequent molecular simulation relaxes the structure. This simple procedure usually yields acceptable results for most macrocyclic systems (Fig. 6).

Figure 6. Examples of macrocyclic compounds depictions generated with the algorithm a) after an initial simulation (all hydrogens shown), b) after applying the atom reversing procedure (see text), c) simple cyclic peptide, d) atypical macrocycle with embedded small rings. Fused ring systems Acceptable layouts of complex fused and bridged rings systems in most cases cannot be obtained by combining single, regular rings, therefore sequential placement SDG algorithms

ACS Paragon Plus Environment

17

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 49

usually must use a library of ring templates.13 Because there is a large number of possible combinations of fused rings, and there may be several different representations of a given ring system, this approach is limited only to the most common cases. In contrast, the simulation based algorithm in this work is able to reliably find ring layouts with as small a number of intersecting bonds as it is reasonably possible (Fig. 7). This approach also allows to avoid the use of nonuniformly sized rings (with one or more significantly elongated bonds) which is a quite common and generally accepted drawing convention, though not chemically correct.27

Figure 7. Depictions of compounds with complex fused ring systems. Aconitine – lower right structure. 3D ring systems Some fused ring systems are typically drawn in a stylized pseudo-3D manner. This is achieved by most SDG algorithms with the use of pre-drawn templates, at the expense of generality. In the present work, the simulation performed in three dimensions usually yields caged ring systems with a shape very similar to the actual 3D geometry in a real molecule. It is therefore possible to generate very good representations of such compounds by the presented approach. However, the flattening force applied at some point of the simulation can cause a collapse of the 3D ring system, and the output is strongly dependent on the parameterization of the method. To obtain

ACS Paragon Plus Environment

18

Page 19 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

optimal results, different sets of parameter for 2D and pseudo-3D diagrams would be required. While of course possible, it is better to find parameters that will give reasonable results for the most of the chemical space. In the current parameterization of the algorithm, smaller threedimensional caged ring systems that are difficult to flatten, such as cubane, retain their shape and result in a desirable depiction (Fig. 8a). Less rigid systems, such as adamantane, become completely flat, but interestingly they adapt a shape similar to a 2D projection of the non-flat molecule, consistent with recommended drawing conventions (Fig. 8a). Large 3D ring systems, like fullerenes, depending on their size and substitution may be partially or completely flattened. Buckyball fullerenes up to C60 do not become planar (Fig 8b). Figure 8c demonstrates the effect of a full flattening, obtained with a different parameterization of the method. This non-standard depiction has maximized atom-atom distances, and though it may be less visually appealing, it is arguably more readable, because no atoms are overlapping.

Figure 8. a) example depictions of ring systems that resemble conventionally used pseudo-3D representations b) C60 can resist being flattened, depending on a parameters set c) deliberately flattened fullerene retains readability. Metal coordination complexes

ACS Paragon Plus Environment

19

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 49

Coordination compounds are especially difficult for 2D depiction algorithms. Metal atoms in such compounds are often multivalent, which results in a congestion of ligands. Polynuclear complexes can form unusual and intricate ring systems. To handle these problems, the described algorithm uses longer bonds for atoms with valence greater than four. Also, in the later stage of the simulation, the spring force for these bonds is reduced, which helps them to attain an optimal length. While this approach breaks the rule of uniform bond lengths, it is often the only way to obtain a legible, conflict-free depiction. In many cases, it is difficult or not feasible for this method to accurately represent coordination spheres of metal atoms in terms of spatial relations, shape, and optical isomerism. This feature would require a prior knowledge of a compound structure and, in many cases, a stylized three-dimensional representation of certain ligands. Instead, the algorithm offers a way of clearly representing the composition and connectivity of a coordination complex (Fig. 9). Some ligand geometries around a central atom (e.g., square planar or octahedral) lead to a coordination isomerism. To conserve spatial relations between ligands and prevent generation of random isomers, the algorithm use virtual bonds that link ligand atoms bound to the central atom. In the current version of the algorithm, only the square planar geometry is supported.

ACS Paragon Plus Environment

20

Page 21 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 9. Examples of metal coordination complexes depicted with the algorithm (charges and counterions omitted for clarity).

Double bonds An important issue in generation of structure diagrams is a correct representation of the double bonds stereochemistry. The presented algorithm has a three-stage control mechanism to ensure a correct arrangement of substituents around a double bond. During the simulation, the cis/trans geometry of double bonds is controlled by two additional bonds connecting the neighbors of double-bonded atoms (Fig. 3c). This forms a system of two fused, four-membered rings of sufficient rigidity to preserve the correct orientation of atoms around the double bond. If, however, the system ends up twisted at the end of the simulation, this fault is detected and corrected, by moving one single-valent atom to the opposite side of the double bond, followed by a short simulation to minimize energy. If it is not possible to move the atom (when none of the

ACS Paragon Plus Environment

21

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 49

atoms neighboring the double bond have a valence one) the bond is marked as erroneous, and the structure correction procedure is called (see Methods section).

Performance The main part of the described algorithm is the simulation phase. In the current version, the number of iterations is constant, regardless of the atom count. The slowest part of the simulation step is the all-atom repulsive force calculation, which results in a second-order polynomial computational complexity of the algorithm. The plot on Figure 10 shows the simulation time for an increasing number of atoms (calculations performed with a desktop computer on a single core of Intel i5-3570K, 3.40 GHz). If the structure correction procedure needs to be applied, the computation time is roughly doubled.

Figure 10. Time required by the algorithm for a given number of simulated atoms. It should be noted that the required time is reported for simulated atoms only; the actual atom count in a molecule can be higher, because some of the redundant hydrogen atoms are removed in the system preparation step. As can be seen, the algorithm can produce more than ten small molecules per second, which makes it suitable for real-time applications. What is also important,

ACS Paragon Plus Environment

22

Page 23 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

the execution time is independent of the complexity of a molecule. Moreover, the shown results come from a development version of the algorithm, and numerous program-level optimizations are possible, which can further boost the performance. To assess the performance of generation of structure diagrams of complex molecules, PubChem database28 was searched with the following settings: molecular weight 4000. The complexity score used in PubChem database is based on the formula of Hendrickson et al.29 A total of 8121 compounds were retrieved. The dataset was processed with the algorithm, and unsuccessful (or probably unsuccessful) depictions were automatically detected on the basis of presence of intersecting bonds. After a single pass, over 70% of the dataset was depicted without detected conflicts, after two more passes this number increased to 80%. The remaining set, consisting mainly of compounds with ring systems that cannot be depicted without intersecting bonds (e.g., fullerene derivatives), was manually examined. Less than 50 cases were identified where the results were unsatisfactory, or could not be obtained with a reasonable probability. In overall, the algorithm was able to handle over 99% compounds of this difficult dataset. However, only a handful of the remaining 1% of unsuccessful depictions were found to be intractable with the algorithm, most of those compounds were successfully depicted with a different parameter set, or multiple passes of the correction procedure.

Comparison with other algorithms To compare the new algorithm with existing SDG methods, a test set of 85 natural, synthetic and hypothetical compounds without a trivial depiction (with regular bond lengths and angles) was prepared. Each test case was chosen to pose one or more problems for a depiction algorithm.

ACS Paragon Plus Environment

23

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 49

The spectrum of problems was chosen to identify most of the limitations of 2D structure depiction programs, and to test their effectiveness to solve conflicts that may appear during the SDG process. Eight programs with an SDG capability, freely available for non-commercial or academic use, were selected for this comparison. (Table 1.) No commercial programs were included in this comparison, mainly due to license limitations which often forbid publication of such benchmarks. However, several commercial programs were tested, but none of them showed a marked advantage over the used non-commercial programs. The new algorithm described in this work will be called PSM (physical simulation method) to discern it from other algorithms.

Table 1. Programs with SDG capabilities used in the comparison (the operating principles of non-open source algorithms are the author’s best guess, based on an analysis of obtained results) Program

Algorithm type Open Babel Sequential placement Indigo Sequential placement JChemPaint Sequential placement Molegro Molecular Viewer Optimization (MMV) based Cactvs Sequential placement Maestro Sequential placement Marvin Sequential placement DataWarrior Sequential placement PSM Optimization based

Open source Yes

Energy optimization No

Reference 8,9

Yes

No

10

Yes (CDK) No

No

30

Yes

16

No

No

17,18

No

Yes

19

No

Yes

20

Yes

No

31,32

No

Yes

This work

ACS Paragon Plus Environment

24

Page 25 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

To not supply any clues to the tested algorithms, the test compounds were prepared in an atom coordinates-free SMILES or MDL SDfile format. The test compounds were submitted to each program, and the obtained results were manually assessed and scored on the basis of chemical correctness and readability. As mentioned earlier, depiction aesthetics are difficult to measure quantitatively, especially for unusual structure diagrams, therefore the visual quality was not assessed here. The structures of test compounds (1-85) and detailed scoring rules can be found in Supporting Information. In short, each generated depiction was awarded with 0 points for a failure, 2 points for a suboptimal depiction, or 3 points for a success. MMV and PSM were able to generate multiple results for each compound. A successful depiction was considered only if it could be generated at least 30% of times. The obtained scores and successful depiction rates for all methods are summarized on Figure 11. The number of ideal depictions (scored with 3 points) can be calculated from the formula:   !"#"$%& = $'%(&$)%+ , 2.55 − &'&&0' !"#"$%&)%+ , 1.73

ACS Paragon Plus Environment

25

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 49

Figure 11. Results of the SDG algorithms comparison. Left column - % of obtained points (max. 255 points), right column - % of successful depictions (regardless of score).

Selected results of the comparison can be seen in Table 2 (additional results and a more detailed discussion can be found in Supporting Information, Table S3, discussion – pages S37S43). Compounds 1-13 tested the ability of SDG methods to detect and correct (if possible) overlapping atoms and intersecting bonds. All conflicts in this subset could be removed by shortening/elongating one or more bonds, or rotating a part of a diagram. All of the tested programs were capable of making at least one of these modifications, but only PSM correctly handled all the compounds in this subset, the remaining methods performed moderately to poorly. The most common faults were: ignoring the occurrence of a conflict or an incomplete/inconsiderate handling that usually resulted in more conflicts. The new method’s advantage lies in the fact that the conflict detection and handling is not an optional feature but an intrinsic property of the SDG process.

Table 2. Selected depictions of compounds 1-85 obtained with the tested SDG algorithms.

1 Marvin

Open Babel

JChemPaint

PSM

ACS Paragon Plus Environment

26

Page 27 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

2

Cactvs

Marvin

Open Babel

JCP

PSM

3

Cactvs

Open Babel

PSM

14

Marvin

DataWarrior

MMV

PSM

19

Indigo

Maestro

PSM

ACS Paragon Plus Environment

27

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 49

24

Cactvs

JChemPaint

PSM

31

OBabel

JChemPaint

PSM

Maestro

Indigo

PSM

33

37

JChemPaint

Maestro

PSM

ACS Paragon Plus Environment

28

Page 29 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

42 Cactvs

Indigo

DataWarrior

JChemPaint

PSM

47 Cactvs

Maestro

DataWarrior

PSM

51

Marvin

MMV

PSM

53

Cactvs

DataWarrior

OBabel

PSM

55

Indigo

JChemPaint

PSM

ACS Paragon Plus Environment

29

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 49

60

Marvin

DataWarrior

MMV

PSM

63 Indigo

OBabel

JChemPaint

MMV

PSM

64

Cactvs

DataWarrior

MMV

PSM

68

Cactvs

OBabel

JChemPaint

PSM

ACS Paragon Plus Environment

30

Page 31 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

77

Cactvs

Marvin

PSM

82

Cactvs

JChemPaint

PSM

Macrocycles (14-29) were predominately drawn as regular polygons or, in simpler cases, with a use of templates. In the polygonal representation of a macrocycle, all double bonds have imposed the cis configuration, which is an obvious limitation, and leads to incorrect depictions in many cases. For 14 Marvin and DataWarrior used a template, but failed to consider a different arrangement of substituents and heteroatoms (Table 2). In compounds where a small ring (or a system of small rings) was fused with a macrocycle with more than one common edge, many algorithms used the following drawing priority: start with a small ring (or a system of fused

ACS Paragon Plus Environment

31

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 49

small rings), add a macrocycle, then add remaining substituents/rings. The effect of such an approach was an unacceptable distortion of other small rings fused with the macrocycle (e.g., 19 and 33 – Indigo, 31 – OBabel and JChemPaint, Table 2). Relatively good results were obtained with the method employed in PSM, described in the Methods section. The routine performs very well for simple rings; for unusual macrocycles it may fail to produce an ideal layout, but it allows for the preservation of double bonds and small rings geometry, and increases the number of bond angles close to 120° (14, 19, 24 – Table 2, Fig. 6). A major flaw in most of the tested algorithms is neglecting the importance of correctly representing the cis/trans stereochemistry of compounds with carbon-carbon double bonds. Cis and trans isomers are essentially different compounds, hence an SDG algorithm must unconditionally be able to generate an appropriate layout of substituents around a double bond. This task is usually trivial in case of aliphatic compounds, because conflicts around double bonds rarely occur. Some algorithms, however, adjust the geometry of double bonds in the last stage of SDG, which can leave an output with an unresolved conflict (Fig. 12a). Most of the tested programs do not have any procedure that allows to handle the geometry of double bonds in rings – their cis/trans orientation is imposed by the shape of a ring. PSM includes a triple control mechanism to ensure a correct representation of the cis/trans stereochemistry, and was able to produce a valid result in all test cases. As discussed in the introduction, cucurbituril-like compounds (37-38) cannot be depicted with greedy (locally optimal), sequential placement algorithms. Adding one ring at a time proceeds smoothly until the next to the last ring. At this point an algorithm reaches a dead-end, and its only option is to use enormously elongated bonds (Fig. 2). A simple energy minimization was able to find layouts with normal-length bonds, but with multiple conflicts (e.g., 37 – Maestro,

ACS Paragon Plus Environment

32

Page 33 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 2). PSM was able to find conflict-free depictions of these compounds, though with a rather low reliability (43% and 31% of generated depictions for 37 and 38, respectively) and visual quality (37, Table 2). The calyxarene 39 with bulky substituents on both edges, required placing of smaller t-butyl groups inside the macrocycle, and preferably elongating the methylene bridges to reduce crowding of substituents in the center. None of the tested algorithms, including PSM, were able to achieve such a layout (Fig. 12c). Compounds 40-59 checked the effectiveness of ring placement routines of SDG algorithms. When two rings are fused with more than two edges, at least one of the rings must be drawn in a non-regular manner. For a larger number of rings, there may be multiple solutions to this problem. In the absence of a pre-drawn template, an algorithm must be able to find an optimal solution with a minimal number of distorted rings and crossing bonds. Additionally, it must take into account substituents of the ring system, so larger groups are preferentially connected to the outermost edges.

ACS Paragon Plus Environment

33

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 49

Figure 12. Selected outputs of the tested programs. a) retrospective fixing of double bond orientation by Open Babel for compound 12 (left - initial layout, right – final output) b) the method of fixing double bonds orientation in macrocycles by Marvin (15) c) failed depiction of 39 by PSM d) progressive deformation of rings due to their sequential addition (40, JChemPaint) e) the effect of flat drawing of non-flattenable molecules such as fullerenes (59 - Indigo) The results for this subset are diversified. Most of algorithms found solutions with more crossing bonds than it was necessary. Creating bridges resulted in many cases with atom-bond overlaps. Some of the sequential placement algorithms (e.g., Indigo, Open Babel) used a „bruteforce” approach, adding one ring after another, without any consideration of an optimal solution (53, 55, Table 2). In some situations, the sequential addition of one ring at a time caused a progressive deformation of each new ring (Fig. 12d). Energy minimization used by Marvin, PSM and MMV was quite effective in reducing the number of intersecting and elongated bonds, however, most rings were distorted as a consequence. The final quality and correctness of depictions obtained with those algorithms depended mostly on the parameterization of their force fields (51, Table 2). A template or a good 3D minimization method was required for some ring systems, such as fullerenes, because no other approach could give acceptable results (Fig. 12e). Metal coordination compounds (63-77) were found to be the most challenging cases in the test set. To deal with the ligand congestion it was necessary to modify many bond lengths and angles, and carefully choose placement of substituents. Most of the algorithms were able to depict correctly only 1-2 compounds. The elongation of coordinate bonds was an effective method to obtain a good result for at least six test cases (63-65, 68-70) but this possibility was generally not recognized by the tested algorithms besides PSM.

ACS Paragon Plus Environment

34

Page 35 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

85 is a large dendrimer-fullerene hybrid (the structure can be found in Supporting Information). The number of possible orientations of substituents grows exponentially with each branching of the dendrimer. A sequential building of its diagram requires sampling of a large number of combinations, and is computationally very expensive. Among sequential placement algorithms, only Open Babel successfully found a conflict-free layout, but it required over 16 minutes on a 3.40 GHz Intel i5-3570K processor. PSM which also was able to find a proper layout for 85 has a polynomial time complexity (quadratic to the number of simulated atoms), and needed 7-14 seconds for the task (with a 64% success rate, see Supporting Information, Table S4). In overall, this concise benchmark of SDG utilities showed that non-trivial molecules cannot be reliably handled by older algorithms. Most of the tested SDG algorithms are based on a sequential placement of drawing units. It was found that such an approach lacked the ability to make globally optimal decisions for the placement of substituents. The lack of consideration of future consequences of each placement decision, usually caused problems in complex structure diagrams with a limited number of degrees of freedom of substituent positions. In addition, conflict handling was often ineffective and there was little or no attention given to macrocyclic compounds, fused ring systems and coordination complexes. A different approach to the SDG problem, presented by the new algorithm, showed an unmatched robustness in this comparison. PSM’s potential energy driven layout of atoms was inherently non-local. The method was able to depict complex molecules which no other method could handle, and proves that there are viable alternatives to the sequential placement approach.

Ligand-protein interaction diagrams

ACS Paragon Plus Environment

35

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 49

2D ligand-protein interaction diagrams are generated by specialized programs included in most molecular modeling environments,19,33,34 and a few standalone tools.35,36,37 Generation of ligandprotein interaction diagrams is a multi-step process that can be divided into three main stages: 1. analysis of an input 3D structure of a ligand-protein complex, 2. generation of a layout for ligands and interacting amino acids, 3. creating a graphical representation of the complex. The first and the last stage are not in the scope of this work. The second stage is the most difficult task in the whole process. It can be further divided into two phases: a ligand representation, which is a typical SDG problem, and an amino acids arrangement. In the latter process the goal is to obtain a sparse, clear layout, with no overlaps or crossing interaction lines. The complexity of the problem stems from the fact that both phases can generate conflicts. Moreover, an optimal result for the first stage, in terms of structure diagrams ideals, can be inappropriate for a conflict free layout in the second stage.35 It is apparent that the problem cannot be reliably solved using locally optimal heuristics of typical SDG algorithms. It was realized, that the unique robustness of the new algorithm can be employed for the generation of optimal (sparse layout, minimum number of conflicts) ligand-protein interaction diagrams. The nature of the algorithm allows it to handle both SDG and amino acids arrangement simultaneously. Interestingly, no special modifications of the algorithm are necessary, only a specially prepared input, and an appropriate post-processing of the output. The basic method for preparing an input is as follows: all interactions are added as explicit bonds; dummy atoms (marked as ‘X’ on Fig. 13) are added to increase the distance between interacting atoms; if amino acids residues are not to be displayed explicitly, dummy atoms are used instead. Examples of a raw output illustrate the method of input preparation (Fig. 13a-b, left structures). Some comments are needed for the π-π interactions representation in the input. On Figure 13a it can be seen that the interacting aromatic rings are

ACS Paragon Plus Environment

36

Page 37 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

connected by a dummy atom located at an arbitrary position, because the input was prepared manually. For an automatic preparation of the input, it could be assumed that there would be some procedure that looks for a most suitable attachment point. In reality, if the program would to be used for the purpose of automatic generation of ligand-protein interaction diagrams, a special non-interacting atom class could be added to the program. Such special atoms would be placed at the center of interacting rings and connected with a bond. The approach used here was chosen only as a proof of concept. Additional modifications can be used to improve results: on Figure 13a one oxygen atom was replaced by a dummy atom (‘Z’) to force the use of 180° instead of 120° angles between atoms; bulky substituents, such as cyclopentyl, can be attached to terminal atoms of amino acid side chains (in place of backbone atoms) to direct them outwards, etc. After generating the atom layout, all unnecessary bonds and atoms need to be removed, and with some graphical post-processing, the final diagram is obtained (Fig. 13a-b, right structures). It should be noted that hydrophobic and other interactions (such as solvent exposure) are not included in this process. Hydrophobic interactions are non-specific, and are visualized by existing programs in different manners. PoseView35 uses spline segments drawn around ligand parts that are located near hydrophobic amino acid residues, while LigPlot36 uses arcs with radiating spokes between interacting atoms and residues. The majority of these other interactions can be visualized with specialized algorithms independently of the layout procedure, therefore they are not discussed here. PoseView35 is probably the most popular program for the automatic generation of interaction diagrams of ligand-protein complexes, and it is used by the RCSB Protein Data Bank (PDB) website.38 While it can supposedly handle 92% of drug-like complexes deposited in PDB, only 80% of them are collision-free.39,40 Indeed, many diagrams found on the PDB website show

ACS Paragon Plus Environment

37

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 49

problems with PoseView’s SDG utility (e.g., TI4 in 5BRX, TXL in 1IA0, 5QO in 5EOL, 5R2 in 5EPN) or layout procedure (e.g., P06 in 5CSW, 02Z in 3RZB, GDP in 5CA1). The new algorithm, provided that an appropriate input can be prepared, has no practical limitations to handle any molecular complex. In cases where multiple interactions are present along with significant crowding of chemical moieties, the new algorithm is able to find conflict-free layouts, while PoseView usually results in multiple collisions (Fig. 13c).

ACS Paragon Plus Environment

38

Page 39 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 13. Use of the algorithm for generation of ligand-protein interaction diagrams. a-b) raw output (left) and processed image (right). c) comparison of outputs by this work’s algorithm (left) and PoseView (right). Dashed lines and arrows – hydrogen or coordination bonds, dashed line with two black dots – π-π stacking. Limitations of the algorithm The algorithm starts with random initial atom coordinates, therefore the SDG process has a stochastic nature, and successful depictions for a given molecule are obtained with a certain probability. The success rate for simpler molecules is in most cases equal or close to 100%, but for very difficult structures it can be significantly lower (Table S4). The simulation can be repeated with different initial coordinates, until a satisfactory result will be obtained, but from a practical point of view, the success rate should not be lower than 10% or so. Very large compounds with multiple branching substituents, or elongated molecules with a tendency to twist have a high probability of one or multiple faults to occur somewhere in their structure diagrams. Some of the failed depictions contain randomly occurring fragments stuck in a local energy minimum (Fig. 14a). A partial decomposition of large molecular graphs and independent treatment of fragments would probably increase the chances of obtaining better results. Because of the general character of the algorithm and the lack of any size or structure dependent optimizations, the same set of parameters is applied to every molecule. Given the large diversity of chemical structures, it is not possible to obtain a set of parameters that will yield a good result for every existing or possible molecule. To achieve its robustness, the algorithm must make some compromises, which often result in less than ideal (in terms of aesthetics) depictions of simple molecules or their fragments. Typical examples are long alkyl chains. Because the algorithm starts with random initial coordinates of atoms, the resulting depiction is a chain with several

ACS Paragon Plus Environment

39

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 49

random folds (Fig. 14b). While this depiction is chemically correct, it is aesthetically inferior to results obtained with even the simplest of SDG algorithms. An obvious solution to this problem would be a post-processing routine which detects and corrects such fragments, using a typical sequential-placement algorithm. Another problem affecting the visual quality of results is that there is often a low energy barrier between different orientations of some substituents, and a molecule may not end up in a global, but in a nearby local energy minimum (Fig. 14c). This leads to a conflicting placement of substituents, which could be avoided by choosing a different orientation of one of the substituents. The algorithm will ultimately resolve the conflict by the usual means, but ideally no modifications to a structure should be made if there is an alternative, better depiction. Also in this case a possible solution would be a post-processing routine that tries to reorient some of the substituents, to find a lower energy layout. The bond angle control method is best suited for 120° angles. In situations where mixed angles should be used, as in the case of substituted 3-membered or fused 4-membered rings, the method causes a distortion of the rings (Fig. 14d-e).

ACS Paragon Plus Environment

40

Page 41 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 14. Limitations of the algorithm a) the most common faults found in failed depictions, b) random folding of long alkyl chains, c) suboptimal local placement of substituents, d) deformation of substituted 3-membered rings e) deformation of fused 4-membered rings f) displacement of a phenyl ring caused by combined repulsive forces.

A known class of problematic molecules are macrocyclic compounds with bulky substituents, that need to be placed inside a macrocycle. The repulsive potential of atoms in the first stage of the simulation causes a sparse distribution of substituents, which are pushed away from the center of a molecule. The return of larger substituents to the inside of a macrocycle is hindered in the later stages of the simulation. In simpler cases, the correction procedure is able to solve this problem, there are, however, molecules that cannot be drawn reliably with the current version of the algorithm, e.g., some substituted calyxarenes (Fig. 12c), or the cyclophane on Figure 14f. The problem in the latter example results also from a low rigidity of ring systems in the physical model used by the algorithm. Another limitation is that the model does not allow for a significant elongation of bonds (except coordination bonds) which in some cases would give better results than extensive or multiple modifications of bond angles (e.g., 37, Table 2). The problem is that loosening of bond length constraints cannot be done for all bonds, because the effects would cancel each other out. A special algorithm that decides which bonds should be loosened needs to be developed.

Extensions and future development Further development of the algorithm is possible, particularly addressing depiction aesthetics, and increasing the rate of successful depictions. Due to the simulation-based, uncontrolled nature

ACS Paragon Plus Environment

41

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 49

of the algorithm, the final output can have non-optimal placements of substituents in terms of aesthetics, or structure diagram drawing conventions. In many simple cases the result can be a perfectly readable and chemically correct depiction, yet visually less satisfying than an output obtained with other algorithms. Therefore, it would be reasonable to create a hybrid algorithm that combines the ability of existing SDG methods to handle simple, non-demanding molecules, with the power of this new, simulation-based algorithm, to depict complex molecules for which other methods fail. To improve the aesthetics of depictions, it would be also possible to add a post-processing routine that analyzes an output, and attempts to fix poorly placed substituents and other faults. Depiction of some molecules with the described algorithm is sensitive to the choice of simulation parameters. To increase the success rate of depictions of various molecules, it would be possible to take an advantage of a multiple-core architecture of modern processors, and simultaneously perform simulations with different sets of parameters, then choose the best result. A function that scores the output on the basis of a chemical correctness and visual quality needs to be developed for that purpose. The correction procedure for failed depictions yields in many cases a better result than the initial simulation output. Some molecules cannot be reliably drawn without invoking the procedure. However, in many cases the correction method does not significantly increase the rate of successful depictions. Therefore, a more sophisticated correction procedure is needed, and it will be the main issue for the future development of the algorithm. The algorithm can be used for a generation of ligand-protein interactions diagrams, however, for that purpose it requires a specially prepared input. To create a fully functional utility for

ACS Paragon Plus Environment

42

Page 43 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

depiction of molecular complexes, the algorithm can be extended with procedures that automatically analyze PDB files, prepare an input and visualize an output.

Conclusions The new algorithm is a simple, yet surprisingly effective approach for the generation of 2D chemical structure diagrams. Without the use of any templates, and with the lack of structuredependent limitations, the method can be applied to any molecule that can be represented as an atom connectivity table. The physical simulation-based nature of the algorithm allows it to overcome the limitations of other SDG methods to find a globally optimal layout of atoms. The algorithm outperforms every other SDG method known to the author, in generating chemically correct and readable depictions of complex molecules. The new method allows to depict a much broader class of compounds that it was previously possible with automatic SDG utilities. The algorithm can also be used to solve the problem of an optimal layout in generation of ligandprotein interaction diagrams. There are some issues that need to be addressed in the future versions of the algorithm, particularly improving the visual quality of generated diagrams. The new algorithm can be used as a standalone SDG method, or in combination with other algorithms as a “last resort”, when all other attempts to produce a readable depiction fail. The described method is written in C++ programming language and is freely available as an online SDG tool at http://omnidepict.p.lodz.pl/. There are many aspects of the described algorithm that can be improved. The author wishes to further develop the algorithm alone or in a cooperation with interested researchers from the industry or academia.

ACS Paragon Plus Environment

43

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 49

Supporting information Scoring rules for the benchmark, structures of test compounds, score table, selected results of the benchmark, extended discussion of results. This material is available free of charge via the Internet at http://pubs.acs.org.

Corresponding Author *E-mail: [email protected]; Phone: +48 42 631 31 98; Conflict of interests The author declares no competing financial interest.

Acknowledgements The author wishes to thank Prof. Piotr Paneth from the Institute of Applied Radiation Chemistry, Lodz University of Technology, for his useful comments on the manuscript. The author would like to express his gratitude to Dr. Lesław Sieroń for creating the Omnidepict web page. References (1) Brecher, J. Graphical Representation Standards for Chemical Structure Diagrams. Pure Appl. Chem. 2008, 80, 277–410. (2) Zhou, P.; Shang, Z. 2D Molecular Graphics: a Flattened World of Chemistry and Biology. Brief. Bioinform. 2009, 10, 247-258.

ACS Paragon Plus Environment

44

Page 45 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(3) Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36. (4) Thomson, L. H.; Hyde, E.; Matthews, F. W. Organic Search and Display Using a Connectivity Matrix Derived from Wiswesser Notation. J. Chem. Doc. 1967, 7, 204-209. (5) Zimmerman, B. L. Computer-Generated Chemical Structural Formulas with Standard Ring Orientations, PhD Dissertation, University of Pennsylvania, Philadelphia, PA, 1971. (6) Carhart, R. E. A Model-Based Approach to the Teletype Printing of Chemical Structures. J. Chem. Inf. Comput. Sci. 1976, 16, 82–88. (7) Dittmar, P. G.; Mockus, J.; Couvreur, K. M. An Algorithmic Computer Graphics Program for Generating Chemical Structure Diagrams. J. Chem. Inf. Comput. Sci. 1977, 17, 186–192. (8) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An Open Chemical Toolbox. J. Cheminformatics 2011, 3, 33-46. (9) Open Babel 2.3.2, (www.openbabel.org, accessed 01.09.2013), 2012 (10) Indigo 1.1.12, GGA Software, 2013. (11) Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK):

An Open-Source Java Library for Chemo- and

Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493-500. (12) Shelley, C. A. Heuristic Approach for Displaying Chemical Structures. J. Chem. Inf. Comput. Sci. 1983, 23, 61-65. (13) Helson, H. E. Structure Diagram Generation. Rev. Comput. Chem. 1999, 13, 313–398.

ACS Paragon Plus Environment

45

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 49

(14) Fricker, P. C.; Gastreich, M.; Rarey, M. Automated Drawing of Structural Molecular Formulas Under Constraints. J. Chem. Inf. Comput. Sci. 2004, 44, 1065–1078. (15) Clark, A. M.; Labute, P.; Santavy, M. 2D Structure Depiction. J. Chem. Inf. Model. 2006, 46, 1107–1123. (16) Molegro Molecular Viewer 2.5, CLC bio (www.clcbio.com, accessed 17.01.2015), 2012. (17) CACTVS 3.420, Xemistry GmbH, 2013 (18) Ihlenfeldt, W. D.; Takahashi, Y.; Abe, H.; Sasaki, S. Computation and Management of Chemical Properties in CACTVS: An Extensible Networked Approach toward Modularity and Flexibility. J. Chem. Inf. Comp. Sci. 1994, 34, 109–116. (19) Schrödinger Suite 2014, Schrödinger, LLC, New York, NY, 2014. (20) Marvin View 6.2.2, ChemAxon, 2014. (21) An algorithm can use special metrics to determine how deeply a particular atom is buried in a molecular graph, like in Ref. 12, but such metrics are practically useless when there are many equivalent atoms, e.g., in some symmetrical compounds. (22) Assaf, K. I.; Nau, W. M.; Cucurbiturils: From Synthesis to High-Affinity Binding and Catalysis. Chem. Soc. Rev. 2015, 44, 394–418. (23) Weininger, D. Smiles. 3. Depict. Graphical Depiction of Chemical Structures. J. Chem. Inf. Comput. Sci. 1990, 30, 237–243. (24) Hibbert, D. B. Generation and Display of Chemical Structures by Genetic Algorithms. Chemometr. Intel. Lab. Syst. 1993, 20, 35–43.

ACS Paragon Plus Environment

46

Page 47 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(25) Spark 10.3.0, Cresset BioMolecular Discovery Ltd., 2014. (26) Stierand, K.; Rarey, M. Flat and Easy: 2D Depiction of Protein-Ligand Complexes. Mol. Inf. 2011, 30, 12–19. (27) Compounds such as ajmaline or aconitine are often drawn in that manner. (28) National Center for Biotechnology Information. PubChem Compound Database, pubchem.ncbi.nlm.nih.gov. (accessed 08.05.2015) (29) Hendrickson, J. B.; Huang, P.; Toczko, A. G. Molecular Complexity: A Simplified Formula Adapted to Individual Atoms. J. Chem. Inf. Comput. Sci. 1987, 27, 63-67. (30) JChemPaint 3.3-1210, The CDK Project (www.sourceforge.net/projects/cdk, accessed 24.04.2015), 2012. (31) OSIRIS DataWarrior 4.1.1, Actelion Pharmaceuticals Ltd. 2015 (32) Sander, T.; Freyss,

J.; von Korff,

M.; Rufener, C. DataWarrior: An Open-Source

Program for Chemistry Aware Data Visualization and Analysis. J. Chem. Inf. Model. 2015, 55, 460–473. (33) Molecular Operating Environment (MOE), 2013.08; Chemical Computing Group Inc., 1010 Sherbooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2016 (34) Discovery Studio Modeling Environment, Release 3.5, Accelrys Software Inc., San Diego, CA, 2012 (35) Stierand, K.; Maass, P.; Rarey, M. Molecular Complexes at a Glance: Automated Generation of Two-Dimensional Complex Diagrams. Bioinformatics 2006, 22, 1710–1716.

ACS Paragon Plus Environment

47

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 49

(36) Wallace, A.; Laskowski, R.; Thornton, J. LIGPLOT: A Program to Generate Schematic Diagrams of Protein-Ligand Interactions. Protein Eng., Des. Sel. 1995, 8, 127–134. (37) Laskowski, R. A.; Swindells, M. B. LigPlot+: Multiple Ligand-Protein Interaction Diagrams for Drug Discovery. J. Chem. Inf. Model. 2011, 51, 2778-2786. (38) Berman, H.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.; Weissig, H.; Shindyalov, I.; Bourne, P. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. (39) Stierand, K.; Rarey, M. Drawing the PDB: Protein-Ligand Complexes in Two Dimensions. Med. Chem. Lett. 2010, 1, 540–545. (40) It should be noted that this number is inflated by small and simple ligands, or complexes with only 1-2 interactions.

ACS Paragon Plus Environment

48

Page 49 of 49

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

For Table of Contents use only

ACS Paragon Plus Environment

49