GRAIL: GRids of phArmacophore Interaction fieLds - ACS Publications

Viet-Khoa Tran-NguyenFranck Da SilvaGuillaume BretDidier RognanViet-Khoa Tran-Nguyen, Franck Da Silva, Guillaume Bret, and Didier Rognan. Journal of ...
2 downloads 0 Views 43MB Size
Subscriber access provided by University of Winnipeg Library

Biomolecular Systems

GRAIL: GRids of phArmacophore Interaction fieLds Doris Alexandra Schuetz, Thomas Seidel, Arthur Garon, Riccardo Martini, Markus Körbel, Gerhard F. Ecker, and Thierry Langer J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00495 • Publication Date (Web): 03 Aug 2018 Downloaded from http://pubs.acs.org on August 7, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

GRAIL: GRids of phArmacophore Interaction fieLds

Doris A. Schuetz1‡, Thomas Seidel2‡,*, Arthur Garon2, Riccardo Martini1,2, Markus Körbel2, Gerhard F. Ecker2, Thierry Langer1,2

1

2

Inte:Ligand GmbH, Mariahilferstrasse 74B/11, A-1070 Vienna, Austria

University of Vienna, Department of Pharmaceutical Chemistry, UZA 2, Althanstrasse 14, 1090 Vienna, Austria

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 49

Abstract In the absence of experimentally derived, three-dimensional structures of receptors in complex with active ligands, it is of high value to be able to gain knowledge about energetically favorable interaction sites solely from the structure of the receptor binding site. For de novo ligand design as well as for lead optimization, this information retrieved from the protein is inevitable. The herein presented method called GRAIL combines the advantages of traditional grid based approaches for the identification of interaction sites and the power of the pharmacophore concept. A reduced pharmacophoric abstraction of the target system enables the computation of all relevant interaction grid maps in short amounts of time. This allows to extend the utility of a grid based method for the analysis of large amounts of coordinate sets obtained by long-time MD simulations. In this way it is possible to assess conformation dependent characteristics of key interactions over time. Furthermore, conformational changes of the protein can be taken into account easily and information thus obtained well guide a rational ligand design process. A study employing MD trajectories of the oncology target Heat shock protein 90, showcases how well our novel approach GRAIL performs for a set of different inhibitors bound to their target protein and how molecular features of the inhibitors are subject to optimization.

Introduction Since the advent of computers in medicinal chemistry, a plethora of computer-based methods and algorithms have been developed that aim to help human researchers in the cost and time efficient rational design of new drugs for a particular target of interest. Depending on the nature of the input data they are operating on, computer-aided drug design methods (CADD) methods can be broadly classified as either being structure-based or ligand-based. Structure-based

ACS Paragon Plus Environment

2

Page 3 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

methods have the advantage that available knowledge about key residues and their spatial arrangement can be taken into account and thus often deliver models with a higher predictive power.

Among

the

structure-based

approaches,

3D

pharmacophore

modeling

and

pharmacophore-based virtual screening,1,2 which can also be applied to purely ligand-based data, have become increasingly popular in the last decades and matured to valuable and efficient tools for a wide variety of CADD projects.3 When it comes to the development of high quality structure-based pharmacophore models, information about the 3D structure of a ligand-receptor complex from NMR or X-ray crystallography experiments is usually required. However, experimentally determined structures of a ligand-target complex represent just a single snapshot of a dynamic system and thus do not provide information about the conformational flexibility of the ligand. They neither capture motions of the residues in or near the binding pocket.4,5 This leads to the thought, that pharmacophore models derived from afore mentioned crystal structures might include features that are temporary artifacts, caused either by the crystal specific conformation or due to the limitation of looking at a single set of coordinates of the structure. Models can be improved significantly by introducing molecular dynamics (MD) simulations. Multiple plausible conformations of the system can be obtained, and used as a basis for pharmacophore models. In this way, the dependency on a single set of coordinates is avoided. Incorporating dynamic

aspects

from

MD

simulations

into

classical

structure-based

pharmacophore models has proved overall beneficial.6–10 Structure-based pharmacophores derived from specific co-crystals are always limited to representing ligand-target interactions observable within the system studied. If various systems have been investigated, a number of common ligand-target interactions can be taken into consideration. However, this approach still lacks the ability to clearly depict how the ligand needs to be modified to successfully establish

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 49

additional interactions and therefore increase the ligands’ binding affinity (in terms of KD values). To tackle structure-based design and optimization of ligands without having threedimensional information of a ligand-receptor complex at hand, several approaches have been developed. They can also be utilized when only the structural information of the drug target, but not the co-crystallized structure is accessible. LUDI,11,12 for instance, is a software tool which aims at the automated de novo design of ligands in protein binding sites. The ligands are designed and positioned in a way that hydrogen bonds and ionic interactions can be formed with the protein environment and hydrophobic regions on the protein site will be filled with hydrophobic moieties of the ligand. This is done in three steps: Firstly, interaction sites are calculated. These interaction sites are discrete positions within the binding site which point out suitable regions for the formation of hydrogen bonds or hydrophobic pockets to be filled. The interaction site calculation is guided by information derived from a statistical analysis of energetically favorable non-bonded contacts in crystal packings of molecules from the Cambridge Structural Database.13 An alternative way to generate interaction sites is utilizing built-in rules. Secondly, molecular fragments are placed into those interaction sites. And thirdly the connection of those fragments is obtained by employing suitable bridge fragments to result in actual molecules. The ligands generated by this procedure provide valuable suggestions how protein binding sites can possibly be complemented. The suggested chemical structures should most likely provide high affinity and thus significantly ease the

process

of

novel

ligand

design

and

optimization.

The GRID program developed by Goodford similarly offers a possibility to identify energetically favorable interaction sites, however it approaches the challenge differently,14–16 compared to LUDI.

ACS Paragon Plus Environment

4

Page 5 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

GRID evaluates a classical empirical energy function for a given probe molecule at each point of a regular grid that covers the region of interest within the target protein. Volumetric areas of the grid that have probe/target interaction energies above or below a certain threshold can then be outlined by corresponding iso-surfaces using a suitable graphical display program. Areas with large negative energies indicate energetically favorable regions for the particular probe employed, while those with large positive energies correspond to regions in which the probe would encounter repelling forces. During the calculation of the interaction energies, the target protein remains stationary although the mobility of hydrogen atoms and lone pairs are taken into account. As a consequence, each GRID map can describe the stereoelectronic properties of only one particular heavy-atom conformation of the target protein. Enabling to capture dynamic characteristics of the investigated system, the grid calculation would need to be repeated multiple times for a representative set of conformations which adequately describe large structural movements. To obtain physically meaningful and accurate interaction energy values, GRID uses an energy function (eq 1) that is composed of multiple terms: an explicit hydrogen bonding term (Ehb), a Lennard-Jones term (Elj) and an electrostatic term (Eel). The total energy (E) is calculated as the sum of the pairwise interactions between the probe and all atoms of the target molecule:

 = ∑  + ∑  + ∑ 

(1)

where E is the total Energy. Ehb stands for an explicit hydrogen bonding term, Elj for the Lennard-Jones term and Eel for the electrostatic term. This energy function is essentially the same for every type of probe molecule. By generating grid maps for multiple probes that are orthogonal in characteristics like size, charge distribution and ability to form hydrogen bonds it is possible to identify regions of the target molecule in

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 49

which specific structural features of a ligand are more favorable or less favorable. This information can be exploited for the design of novel ligands with high putative affinity as well as for the optimization of existing lead structures. This approach works remarkably well as demonstrated by several successful drug-design projects17,18 and derived applications19–21 in which grid map calculations are an integral part of the workflow. One drawback of the approach is that the grid maps are strongly biased to the molecular structure of the probe and the calculated interaction energies always represent a mix of several contributions. These contributions cannot be separated easily and it is challenging to deduce which physicochemical property of the probe is the largest contributor to the energy calculated at a particular grid point. Therefore, for being able to derive a classical pharmacophoric representation of favorable ligand features,22 many grid calculations for different types of probes and substructures of the target usually need to be performed and substantial post-processing is required. The calculation of physically meaningful interaction energy values is computationally expensive and does not scale well with the size of the investigated system. Including dynamic information about the target system (i.e. obtained by long-time MD-simulations) is even more time consuming and practically limited to smaller targets if the results need to be obtained within a reasonable time frame. The method introduced in this paper implements the calculation of interaction scores for an array of grid points in a defined region of interest. In contrast to GRID, our approach called GRAIL works solely on a pharmacophoric representation of the target system and consequently, the probe itself is also a single pharmacophoric feature of a particular type (i.e. hydrogen bond donor, hydrogen bond acceptor, aromatic feature, positive or negative ionizable group, hydrophobic). The calculated scores do not represent physical interaction energies as displayed using the GRID approach but rather reflect how well geometric constraints (i.e. allowed distance

ACS Paragon Plus Environment

6

Page 7 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

and angle ranges) are met when the probe feature is placed at a set grid point. It furthermore gives insight in how well the probe feature interacts with complementary features within the protein environment. Performing grid map calculations on a pharmacophoric level has two main advantages: (i) The calculation of pharmacophore-based grid maps is considerably faster than grid calculations performed on the atomistic level. This is due to the much smaller number of pharmacophoric features derived from diverse parent structures. Therefore dynamic information obtained by MD simulations can be implemented, even for larger systems. (ii) Since the number of different feature types and their interaction counterparts are limited, the number of corresponding grid maps that characterize all typically observed ligand-target interactions in biological systems (H-bonding interactions, π-stacking, coulomb interactions, ...) is also limited. Consequently, performance of only a few grid calculations is required to capture all probable, relevant interactions for a specific target. The resulting calculated grid maps can be considered as ‘pharmacophore interaction fields’ that fully exploit the power of the pharmacophore concept. The following sections will focus on the algorithmic details that are involved in the generation of pharmacophore grid maps. Furthermore, the applicability of GRAIL will be demonstrated and discussed using the oncology target Heat shock protein 90 (Hsp90) as a test system, showcasing how information obtained by MD simulations can be included in this approach.

Algorithm Calculation of Pairwise Feature Interaction Scores Calculated feature interactions scores have to reflect the quality of a grid point for the placement of ligand fragments that can be associated with a given probe feature type. Especially for ligand design and lead optimization, a reasonable measure of quality is the likeliness for the

ACS Paragon Plus Environment

7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 49

presence of energetically favorable interactions with complementary chemical features of the target. In general, the presence and strength of a non-bonded interaction between two complementary interaction partners depends on characteristic stereoelectronic properties, their distance, and for directed interactions (i.e. H-bond acceptor/donor interactions) also on their mutual orientation. Consequently, the overall interaction score FISij of two complementary features i and j (eq 2) was modeled as a product of distance and angle dependent score contributions DSij and ASij weighted by a factor Cj which accounts for interaction ‘strength’ determining stereoelectronic properties:

 =   ∙   ∙ 

(2)

where FISij is the interaction score of the features i and j. DSij stands for the distance and ASij for the angle dependent score contributions. Cj is the interaction strength weighting factor for feature j.

For each specific type of non-bonded interaction, optimum empirical distance and angle ranges exist which can be used to parameterize the distance and scoring functions. If the feature distances and orientations are within the optimum value ranges, the corresponding scores should be at maximum, and if outside, drop towards zero. A continuous mathematical function which fulfills this requirements is the Generalized Bell Function (GBF) displayed in Equation 3:

  =





(3)

   

where a controls the width of the curve at GBF(x) = 0.5, b controls the slope of the curve at x = c - a and x = c + a and c represents the center of the bell curve.

ACS Paragon Plus Environment

8

Page 9 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Unlike the well-known Gaussian Function, this type of bell function does not drop drastically when deviating from the centerline of the curve, but stays at its maximum for a configurable range of x values (Figure 1). This behavior is more appropriate for the scoring of supposed interactions involving a broad variety of chemical functionalities with the same associated pharmacophoric feature types, yet different distance and angle demands. Due to this, more realistic convergence, the GBF was eventually selected as the basis for all angle and distance scoring functions in Equation 2.

Figure 1. Shape of the Generalized Bell Function plotted for the parameters a = 2, b = 10 and c =5

To employ the GBF for the actual calculation of distance and angle scores, the parameter c needs to be set to the midpoint of the value range and the parameter a to the range half-width. An appropriate choice for the slope of the curve b is 10 which was used in all performed calculations throughout this work. In the herein presented approach, for every fundamental non-bonded interaction type a separate grid map is calculated. Every interaction type is characterized by a specific pair of probe and complementary target feature type. A summary of all currently implemented interaction

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 49

types which cover most of the relevant non-bonded interactions that typically occur in biological macromolecules is listed in Table 1. For every interaction type, Table 1 provides information about the actual form of the distance and angle scoring functions. Furthermore, associated distance and angle ranges are covered, as well as how the distances and angles are related to the positions and orientations of the probe and target features. The probe feature always possesses point geometry and it is assumed that a virtual ligand fragment matching the probe’s feature type can rotate freely to ensure optimal interactions with complementary chemical features of the target. The geometry of the target feature can be either point, vector or plane, depending on the nature of the interaction. Actual spatial orientations of plane and vector features are derived from the underlying target fragments during the feature perception process, described in detail in the section “Pharmacophoric Feature Perception”. The weighting factor Cj in equation 2 has been introduced to allow for a differentiation of target features of the same type but distinct underlying chemical structure. Differences in chemical structure and, as a consequence, physicochemical and steric properties may significantly affect the interaction energy. An appropriate structure specific choice of Cj for the target features thus allows to account for this fact by scaling the maximum achievable score values. However, in the current implementation Cj is set to 1 for all interaction types listed in Table 1 except for hydrophobic interactions (H-H). For the latter type of interaction, Cj ist set to the summed lipophilicity value (see section “Pharmacophoric Feature Perception”) of the atom set covered by a hydrophobic target feature.

Table 1. Distance and angle scoring functions, associated value ranges and interaction geometries for all implemented interaction types.

ACS Paragon Plus Environment

10

Page 11 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Complementary Feature Pair

Interaction Geometry

Distance Scoring Function/ Distance Range

Angle Scoring Function/ Angle Range

Probe Feature Type

Target Feature Type

Geometry

H

H

Point

GBF(d) d = 2.0 - 6.0

-

NI

PI

Point

GBF(d) d = 1.5 - 5.5

-

PI

NI

Point

GBF(d) d = 1.5 - 5.5

-

AR

PI

Point

GBF(d) d = 3.5 - 5.5

-

PI

AR

Plane

GBF(d) d = 3.5 - 5.5

GBF(a) a = -60° - 60°

HBA

HBD

Vector

GBF(d) d = 1.2 - 2.8

GBF(a) a = -50° - 50°

HBD

HBA

Vector

GBF(d) d = 1.2 - 2.8

GBF(a) a = -85° - 85°

Plane

GBF(dv)*GBF(dh) dv = 3.5 - 6.0 dh = 0.0 - 2.8

-

AR

AR

PF = Probe Feature, TF = Target Feature, GBF = Generalized Bell Function (eq 3), H = Hydrophobic, NI = Negative Ionizable, PI = Positive Ionizable, HBA = H-Bond Acceptor, HBD = H-Bond Donor, AR = Aromatic Ring; The position of the hydrogen atom in HBA-HBD interactions is calculated using an average H-bond length of 1.05Å.

Pharmacophoric Feature Perception

ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 49

The annotation of target system substructures with pharmacophoric feature types is the first step of the grid map calculation workflow. Currently the perception of hydrogen-bond donors (HBD), hydrogen-bond acceptors (HBA), positive ionizable groups (PI), negative ionizable groups (NI), hydrophobic areas (H) and aromatic systems (AR) is supported. This basic feature set suffices to cover the most important types of non-bonded interactions which typically occur in biomolecules. For the perception of HBD, HBA, PI and NI features a simple pattern-based approach has been implemented in which for each feature type a collection of include and exclude SMARTS-patterns has been defined (Table 2). “Include patterns” describe substructures of a molecule that have to be assigned the corresponding feature type they were defined for and are usually formulated quite generic to match as many characteristic substructures as possible. However, too generic patterns might also match fragments that shall not be attributed with a particular feature type. To handle this issue, “exclude patterns” were provided, which define substructures that are covered by an “include pattern” but shall not be assigned the given feature type. For the perception and placement of hydrophobic and aromatic features a procedural approach was chosen. Hydrophobic features are perceived and placed according to the algorithm developed by Greene et al.23 which is described elsewhere. Aromatic features are placed on every ring in the smallest set of smallest rings of the target structure that fulfills Hückel’s rule and the feature orientation corresponds to the normal vector of the ring plane.

Table 2. SMARTS-Patterns used to perceive a subset of the supported pharmacophoric features. Feature Type

Geometry

Patterns

Exclude Patterns

PI

Point

[NX3:3]([CX4])([CX4,#1])[CX4,#1] [NX4:3]([CX4])([CX4,#1])([CX4,#1])[CX4,#1] [N:3]=[CX3:3]([N:3])[!N] [N:3]=[CX3:3]([N:3])[N:3]

[N:1]-[C,S,P]=O

ACS Paragon Plus Environment

12

Page 13 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[+,+2,+3,+4,+5,+6,+7;↴ !$(*~[-,-2,-3,-4,-5,-6,-7]):3] NI

Point

[c;!$(c[#7][#6]);!$(c[#7][#7][#6]):3]1[n:3][n:3]↴ [ n:3][n:3]1 [P,S:3](~[O:3])(~[O:3])(~[O:3])[OH1,O-:3] [P,S:3](~[O:3])(~[O:3])[OH1,O-:3] [NR;H1,-1:3](-C(=O))-[SX4](=O)(=O) [C:3](=[O:3])[N:3][OH1,O-:3] [C:3]([O:3])[N:3][OH1,O-:3] [C:3][O:3](=[N:3][OH1,O-:3]) [S,C,P:3](~[O:3])[OH1,O-:3] [N:3]-[SX4](=O)(=O)[CX4](F)(F)F [-,-2,-3,-4,-5,-6,-7;↴ !$(*~[+,+2,+3,+4,+5,+6,+7]):3]

C(=O)[OH0+0] C(O)O [P,S](~O)(~O)(~O)N

HBA

Vector

[*:8]~[#8,#7,S;X2:7]~[*:8] [*:8]=[N;X2:7]-[#1] [*:8]#[N;X1:7] [*:8]-[O,S;X2:7]-[#1] [*:8]-,=[O,S;X1:7] [O,S;X2:3] [#7;X3:3] [*:8]-[F:7]

[O,N,S:1]-[a] [O,S:1](-C)-C=O [N:1]-[P,S,C]=O

HBD

Vector

[CX2:7][#1:9] [*](=,:[a,O,S,N])-,:[#7:7][#1:9] [*]=,:[#7:7][#1:9] [#7:7]([*])([*])[#1:9] [#7,O,S:3][#1]

[O:1]-[C,P,S]=O c1[n:1][n:1][n:1]↴ [n:1]1 [N:1]-[SX4](=O)↴ (=O)[CX4](F)(F)F

NI = Negative Ionizable, PI = Positive Ionizable, HBA = H-Bond Acceptor, HBD = H-Bond Donor; Atom map class numbers correspond to the decimal value of a bitset with flags that specify the role of matched target atoms in feature type annotation and the calculation of feature positions and orientations: bit 0 = atom is part of the feature atom set, bit 1 = atom position is used for the calculation of the feature position (centroid), bit 2 = atom position is used in the calculation of the orientation vector origin (centroid), bit 3 = atom position is used in the calculation of the orientation vector target position (centroid) Calculation of Atom Density Grids. The feature interaction scores calculated by equation 2 are atom agnostic and grid points that are close to or lie within the van der Waals sphere of a target atom may obtain unreasonable high scores. The calculated scores thus have to be scaled down by a factor reflecting atom proximity,

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 49

therefore only grid points in the unoccupied space of cavities allowing for a collision free placement of ligand atoms actually obtain high interaction scores. This is achieved by the calculation of an atom density ADi for every grid point i according to the following equation (eq 4):

 = max   % 

j = 1, , NA

(4)

where dij is the distance between atom j and the grid point i and NA is the number of atoms to consider. The parameter a of the bell function GBFj (eq 3) corresponds to the van der Waals radius of atom j and parameter b is set to a value of 10 for all atom types. Atom densities can be calculated on the fly during the generation of the feature interaction grids or calculated and stored as a separate grid map for later use. Separate grid maps have the advantage, that redundant atom density calculations - which are identical for every feature interaction grid - can be avoided and thus a dramatic decrease of overall computation time may be achieved. Furthermore, separate density grids allow the visualization of the van der Waals surface of the target structure by the display of contour surfaces at a level of 0.5 which is of great help in the analysis and interpretation of the obtained interaction grid maps. Atom density grids also serve as an input for the classification of water molecules according to the properties of the surrounding target environment which will explained in more detail in the methods section.

Calculation of Feature Interaction Grids Depending on the probe feature type and the number of complementary target features, a vector of multiple pairwise feature interaction scores (see eq 2) is usually obtained at every grid point. To arrive at a single interaction score FISi for every point i of the final grid map, we selected the maximum pairwise feature interaction score FISij calculated among all

ACS Paragon Plus Environment

14

Page 15 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

complementary target features j. An alternative approach to derive a single score value is to use the sum of the FISij vector elements. In contrast to the maximum element, a summation will favor interaction “quantity” over “quality” which might also be desirable because multiple complementary target features around a grid point increase the overall chance of interactions with an actual ligand feature that has a significantly limited degree of rotational freedom. As described in the previous section, final interaction scores furthermore have to reflect any steric clashes that might occur due to the presence of target atoms at or near the grid points. All of these aspects can be summarized by the following equation (eq 5):

 = max    ∙ 1 −  

j = 1, , NF

(5)

where FISi is the scalar feature interaction score at grid point i, FISij is the pairwise feature interaction score (see eq 2) of the probe feature and a complementary target feature j, ADi is the atom density (see eq 4) at grid point i and NF is the number of target features complementary to probe feature type. The final (optional) step in the calculation of feature interaction grid maps is the normalization of the scores calculated by equation 5 to the range [0, 1] (eq 6):

,)*+,-./ =

0123 4012536

0125 4012536

(6)

where FISi,normalized is the normalized feature interaction score at grid point i, FISi is the feature

interaction score at grid point i, FISmin is the minimum feature interaction score among all grid points and FISmax is the maximum feature interaction score among all grid points.

Methods Molecular Dynamics simulations for Hsp90

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 49

All production runs and the system preparation of the apo structures were done in GROMACS 5.0.4 (single precision).24–27 Integration of the equations of motion was done for all simulations using the velocity Verlet algorithm.28 Periodic boundary conditions and the minimum image convention were applied and the Verlet cut-off scheme for neighbor list generation set with a 1.1 nm cut-off. Coulombic interactions were calculated with the Smooth Particle Mesh Ewald29 method using cubic cardinal B-spline interpolation, a maximal FFT grid spacing of 0.16 nm and a real-space cut-off of 1.1 nm. van der Waals interactions were treated with a cut-off at 1.1 nm. All protein and ligand bonds were constrained with the LINCS algorithm30 the SETTLES algorithm31 was used for water molecules. Interaction parameters were taken from the AMBER99SB force field.32 The NVT ensemble was generated with the v-rescale method, pressure-coupling was performed with the Parrinello-Rahman barostat.33,34

Protein-Ligand Complex Simulations Preparation Parametrization of the ligands and preparation of the simulation boxes of Hsp90 in complex with ligands 1 to 4 were done in BIKI 1.3.5.35 Force field parameters were generated from ab initio quantum chemical/molecular mechanics calculations (zero net charge, multiplicity 1 and charge method 'Resp' in BIKI). The initial coordinates were minimized using the steepest descent method. Three 100 ps equilibration steps in a NVT ensemble followed, each with increasing temperature (100 K, 200 K and 300 K) and integration step (1 fs, 2 fs and 2 fs). A force of 1000 kJ mol-1 nm-1 was applied to the protein main chain atoms in the first two equilibrations. A final 1 ns simulation in a NPT ensemble (300 K, 1 atm) with an integration step of 1 fs was performed and the resulting simulation box length was used for the production runs.

Production Runs

ACS Paragon Plus Environment

16

Page 17 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The production runs of the protein-ligand complexes were carried out with an integration timestep of 2 fs in the NVT ensemble at 300 K. Coordinates were saved every 10 ps with a precision of 0.001 nm. Simulation length was 200 ns for each of the four replica. The simulations were performed on an in house high performance computing cluster (using 256 CPU cores in total and 16 GeForce GTX 980 Ti GPUs).

Calculation of Feature Interaction Score and Atom Density Grids for MD Trajectories All required steps to perform feature interaction score and atom density grid calculations have been implemented as basic modules in our in-house C++ chemoinformatics toolkit CDPKit which is freely available online36 under the terms of the LGPL V2.1. CDPKit provides bindings for the Python programming language which allows to access the whole functionality of the toolkit flexibly from simple scripts while maintaining a high computational performance and memory efficiency. For this reason, the overall workflow (Figure 2) has been implemented as as series of Python scripts which are also available online free of charge.37 The first step in the workflow after MD simulation is concerned with the conversion of the obtained trajectory data to an internal CDPKit molecule file format for a more efficient processing in later steps. The import of trajectory data has been implemented using the MDAnalysis38 Python package and therefore allows the computation of grids for a wide range of input file formats. In the next step, the individual frames of the trajectory are aligned on a set of selected residues. This needs to be done because the target system must not move or rotate relative to the location of the grid box so that the for each frame calculated grids can be overlaid and compared. The alignment reference residues were selected according to the mobility of their backbone atoms relative to the first frame throughout the whole MD simulation which had to be below a configurable threshold. Using just a subset of ‘static’ residues ensures, that larger

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 49

movements of side chains and secondary structure rearrangements are not averaged-out and thus reflected accordingly in the generated grids while only slightly moving residues result in a less pronounced fluctuation of the scores of proximal grid points. Technically, the alignment of the frames has been carried out by means of a CDPKit implementation of the Kabsch algorithm39 using the centroid of the residue backbone atom positions of the first output frame as a spatially fixed reference point set for the alignment of the remaining frames.

Figure 2: Schematic representation of the overall workflow for the generation and visualization of feature interaction score and atom density grids for MD trajectories.

ACS Paragon Plus Environment

18

Page 19 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Once the trajectory has been prepared as previously described, the computation of the interaction and atom density grids can be carried out. As a prerequisite, the grid calculation requires the definition of a grid box that covers the region of interest of the target system. The grid box dimensions and location are selected in a way that the grid will be large enough to cover a specified list of residues in every frame of the trajectory. This is done in an automatized manner by calculating the overall bounding box of the residues of interest using the residue atom coordinates provided by each frame as an input. Division of the bounding box dimensions by a predefined grid resolution finally results in the desired size of the grid box along each axis. For target feature perception and atom density grid calculations, only residues that overlap the grid box with at least one atom are considered. Feature interaction grids are calculated for each of the supported interaction types given in Table 1 separately, without consideration of water molecules and the ligand. This eventually results in a total of eight output interaction grid maps that characterize the covered target region. In the current implementation, atom density grids are computed individually for residue atoms, water atoms and (if present) ligand atoms. Multiple atom density grids offer somewhat more flexibility in the visualization and analysis of the grid maps and also for the calculation of derived grid maps as described in “Visualization and Analysis”. The resulting set of 11 grid maps for each frame is then stored to a specified output file in a CDPKit specific binary format for further processing in subsequent steps.

Calculation of Score Average and Standard Deviation Grids In a post processing operation, for each type of interaction and atom density grid, average score and score standard deviation values are computed. This is done for every grid point over all frames of the MD simulation. As a result, 22 unique output grids, 11 for both the score average and standard deviation are obtained. Those grids allow to assess the dynamic behavior of the

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 49

target system in a way such that “stable” and “unstable” grid map regions can be easily identified. In the remaining text, these grid maps are referred to as “averaged MD grids”.

Visualization and Analysis LigandScout 4.240 was mainly used to display grid maps together with the target structure for individual frames (see Figure 3 for examples). For the visualization in LigandScout, the grids maps were converted to the GRID *.kont file format by means of a Python script. For an animated visualization of complete MD trajectory grid maps the open-source data visualization software ParaView41 was used. ParaView allows a versatile three-dimensional visualization of grid data in various styles and is able to fluently display time series data. For this purpose, the grid maps were first converted to a native structured grid data format of the Visualization Toolkit (VTK)42 readable by ParaView with the help of the PyEVTK Python package43. Since ParaView does not natively support the display of molecular structures, the trajectory of the target structure was likewise converted to a VTK specific format for the storage of geometric line data. In this way it is possible to execute an animated visualization of the grid maps along with the target structure for complete MD trajectories (see Figure 4 for examples).

ACS Paragon Plus Environment

20

Page 21 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 3. Examples for the visualization of grid contours enclosing interaction hot spots around selected residues of Hsp90 using LigandScout 4.2. Residue Phe 138: (A) aromatic - aromatic (AR-AR) grid in blue, (B) hydrophobic - hydrophobic (H-H) grid in yellow, and (C) positive ionizable - aromatic (PI-AR) grid in purple. Positively charged residue Lys 100: (arthur 83) (D) negative ionizable - positive ionizable (NI-PI) grid in pink, (E) aromatic - positive ionizable

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 49

(AR-PI) grid in light blue, and (F) hydrogen bond acceptor - hydrogen bond donor grid (HBAHBD) in red. Negatively charged residue Asp 85. (G) hydrogen bond donor - hydrogen bond acceptor grid (HBD-HBA) grid in green, and (H) positive ionizable - negative ionizable (PI-NI) grid in turquoise.

More interestingly, ParaView allows an on the fly evaluation and visualization of mathematical expressions that involve other grids as variables. Any combination of such in situ generated and loaded input grids can also be displayed dynamically. For example, areas of the grid box where water molecules are in a hydrophobic region without potential H-bonding interactions can be identified and visualized within ParaView as a combination of the water atom density grid and three interaction score grids by the following expression (eq 7):

78 9:);-