Impact of Binding Site Comparisons on Medicinal Chemistry and


Perspective

The impact of binding site comparisons on medicinal chemistry and rational molecular design. Christiane Ehrt, Tobias Brinkjost, and Oliver Koch J. Med. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jmedchem.6b00078 • Publication Date (Web): 05 Apr 2016 Downloaded from http://pubs.acs.org on April 7, 2016


Journal of Medicinal Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Medicinal Chemistry

The impact of binding site comparisons on medicinal chemistry and rational molecular design.
Christiane Ehrt†, Tobias Brinkjost†,‡, Oliver Koch*,†
† Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
‡ Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, 44224 Dortmund, Germany

ABSTRACT

Modern rational drug design not only deals with the search for ligands binding to interesting and promising validated targets but also aims to identify the function and ligands of yet uncharacterized proteins that have an impact on different diseases. Additionally, it contributes to the design of inhibitors with distinct selectivity patterns and the prediction of possible off-target effects. The identification of similarities between binding sites of various proteins is a useful approach to cope with those challenges. The main scope of this perspective is to describe applications of different protein binding site comparison approaches to outline their applicability and impact on molecular design. The article deals with various substantial application domains and provides some outstanding examples to show how various binding site comparison methods can be applied to promote in silico drug design workflows. In addition, we will briefly introduce the fundamental principles of different protein binding site comparison methods.


INTRODUCTION

The knowledge of 3D structures of proteins and protein-ligand complexes is one prerequisite for rational structure-based drug design, which uses these structural data to generate ideas for new promising compounds displaying a certain bioactivity. Furthermore, it supports the development from hit to lead to drug in various other ways. The exploitation of known protein structures publicly available in the Protein Data Bank (www.rcsb.org)1 becomes more and more complex due to the exponentially growing number of new structures being published daily. Although a plateau can now be recognized in the emergence and discovery of novel folds (Figure 1), the number of identified novel druggable small molecule binding sites is still increasing, especially owing to the development of useful computational methods for the detection of new binding sites.2,3


Figure 1. PDB statistics on the number of released PDB entries (blue bars) and the number of new CATH4 folds (red bars) and SCOP5 folds (green bars) per year.6

Apart from that, a recent study suggests a finite nature of possible ligand binding pocket properties and shows the impact of binding site comparison to shed light on the question of ligand promiscuity.7 The fact that the binding of one ligand to different proteins is by no means an exceptional case is illustrated by the statistics of the protein-ligand complex database sc-PDB8,9 (Figure 2). Sturm et al.10 also showed that ligand promiscuity is a direct consequence of the presence of similar binding cavities by performing a large-scale analysis of 247 promiscuous ligands extracted from 8,166 sc-PDB8,9 entries. The methods SiteAlign4.011, Volsite/Shaper12, and FuzCav13 were used for binding site comparison. Nevertheless, the authors also found certain similar chemotypes that adapt to different protein environments (especially compounds of very low or very high selectivity), examples in which none of these binding site comparison methods succeeded in predicting promiscuity.


Figure 2. sc-PDB8,9 statistics on the number of unique targets per ligand illustrating the impact of ligand promiscuity.14 Cofactors and ions are not considered as ligands.

Today, the comparison of whole proteins is quite easily achievable on different levels of protein structure (primary structure: sequence alignments; tertiary and quaternary structure: structural alignments). Nevertheless, the comparison of protein binding sites, that is, the parts of a protein that are crucial for structure-based drug design approaches, remains challenging. The elucidation of similarities and differences between protein pockets enables researchers to cope with selectivity issues, off-target effects, and the prediction of binding sites, and it supports the elucidation of yet uncharacterized protein functions. It even facilitates the analysis of protein-protein interactions (PPIs). Over the last few decades, it became obvious that similar binding sites can be found in proteins with low or even without any overall similarity. The focus was drawn towards the elucidation of local similarities within proteins on the structural level. Therefore, it becomes increasingly interesting to exploit this “pocket space” as a kind of dictionary to accelerate modern in silico drug design processes. This leads to a need for highly efficient protein binding site comparison methods. Several methods exist to elucidate similarities between protein binding sites on the basis of structural fold similarities or sequence motifs common to proteins with similar functions, and these were successfully applied in several cases. However, similar ligands can bind to proteins with little or no overall structural similarity and without common sequence patterns. Other methods are required to explain such scenarios.
An article providing an exhaustive analysis using CATH (Class, Architecture, Topology, Homologous superfamily)4 data showed that on the one hand, fold is closely related to ligand type, but on the other hand, the EC (enzyme commission) classification is not related to this kind of similarity.15 Additionally, many case studies show that the binding of similar ligands cannot be deduced from fold but rather from local similarities between two proteins. Examples of such binding site pairs of unrelated proteins in the PDB are provided in an article by Barelier et al.16 It might be useful to apply a ligand-based approach to elucidate similarities between protein binding sites by looking for chemically similar ligands binding to different proteins.17 An article by Li et al.18 is only one example of a successful application of this approach. In addition to structural databases such as the PDB, the huge knowledge stored in chemogenomics databases such as ChEMBL19, DrugBank20, PubChem BioAssay21, or MATADOR22 can be exploited. In the latter cases, the basic problem is that the ligand’s binding site is often unknown. Therefore, it becomes difficult and tedious to compare protein structures on the basis of ligand-derived data. A drawback of searching the PDB for similar ligands which bind to different enzymes is the restricted number of such examples as compared to the protein ligand binding space.16 Nevertheless, the analysis of the available data led to the development of useful publicly available binding site databases such as sc-PDB8,9 or Pocketome23. Unfortunately, those databases only include binding sites of already solved protein-ligand complexes. A myriad of publicly available binding site detection and prediction methods such as LIGSITE24, Q-SiteFinder25, Molsite26, Rate4Site27, or MetaPocket28 enables researchers not only to rely on known binding sites but also to identify and compare yet uncharacterized ligand binding pockets of apo structures or homology models.
The reader is referred to a comprehensive article for more details on surface-based, energy-based or conservation-based binding site detection algorithms.29 For the reasons outlined above, the comparison of proteins on the binding site level using their solved 3D structures is a promising tool to reveal similarities between proteins which do not seem to have any evolutionary relationship or even do not share a similar fold or function.


Moreover, some binding site comparison methods are suitable for the elucidation of crucial differences between binding sites of highly similar proteins. Here, we will not focus on methods using 2D descriptors to represent and compare binding sites30,31 but will introduce methods to compare binding sites in terms of the spatial arrangement of binding site features. The scope of this review is to give a short overview of the basic approaches developed to compare protein binding sites in 3D space and, in particular, to provide examples of some useful applications of those methods. Therefore, the text is divided into two main parts that deal with binding site comparison methods and their applications. The first part starts with an introduction to the different basic approaches for binding site comparison and a classification scheme. Then, it gives a brief summary of the discussed methods. Readers who are more interested in the applications and application domains described in the second part can skip this brief method overview and refer back later for more details on a certain comparison method.

(1) BINDING SITE COMPARISON METHODS

The comparison of ligand binding sites in proteins is motivated by the basic assumption that similar proteins or protein binding sites bind similar ligands. This statement can be interpreted in various ways depending on the description of ligand and target space, which is nicely outlined in an article by Rognan32. Typically, it is true that similar physicochemical properties of binding sites lead to the binding of ligands with high overall structural similarity33 as determined with metrics such as the Tanimoto coefficient and various other measures34. Binding site similarity regarding physicochemical properties represents the basis for many in silico drug design methods.
A review on distinct ligand binding pockets focuses on this interpretation of similarity.7 Nevertheless, there are known examples where the local similarity of proteins in terms of structural similarity of the protein backbone leads to a binding of similar scaffolds.35,36 Therefore, the question arises: How do we define binding site similarity? Unfortunately, a unique definition of binding site similarity is not possible; it always depends on the way binding sites are compared with each other. On the basis of a study of the data stored in the PDB, Barelier et al.16 identified at least three classes of recognition of an identical ligand by different proteins: proteins of different families with related binding site residues, proteins of different families that bind common ligand functional groups via dissimilar residues, and different (or similar) proteins recognizing different functional groups of a similar ligand. Furthermore, they state that their detailed analysis did not include all binding events of similar ligands to different targets. So, how should we compare proteins and especially protein binding sites? To shed light on this question, the next two paragraphs deal with basic approaches towards the description of ligand binding sites and different available comparison methods.

INTRODUCTION INTO DIFFERENT BASIC APPROACHES

Binding sites are compared by considering the proteins’ surfaces, physicochemical properties, interaction profiles, or backbone structure. In general, the published methods can be differentiated on the basis of the modeling of the binding site, the comparison algorithm, and the way the developers score or rank the identified similarities. While some of the methods use whole structures and are able to find local similarities, others rely on predefined binding sites, which makes it difficult to use apo structures of proteins. The automated identification of protein binding sites on its own is a challenging task with several different approaches. Therefore, we would like to refer to a comprehensive review focusing on the different approaches to characterize protein binding sites.37 Here, the following questions should be answered: ‘What is compared?’, ‘How is it compared?’, and ‘What does similarity mean?’.38


Due to the different facets of binding site comparison, the presented methods are classified according to the model used to represent the ligand binding pocket features. A multitude of methods uses graph models. Objects such as atoms, functional groups, or surface points are represented as nodes connected by edges that can be labeled with distances or other characteristics of the relation between two nodes. Other comparison algorithms use purely geometric representations of binding site atoms, for example as points, feature clouds, or even volumes. The use of pharmacophore-based features to describe binding sites often results in fingerprints that are compared with each other saving a huge amount of computational time. Furthermore, vectors or grids can be used to represent binding sites leading to more complex comparisons. The computationally most demanding methods compare binding sites on the basis of distributions leading to high accuracy but also to high sensitivity in terms of protein flexibility. Figure 3 summarizes all methods in a comprehensive manner. It has to be pointed out that the classification scheme is a very simplified one in view of the high diversity of binding site comparison algorithms. Some methods compare binding sites in multiple consecutive steps using different comparison types. Others model binding sites in various ways to reduce runtimes or to introduce implicit fuzziness, for example to account for the flexibility of proteins and uncertainties in the protein structure models. Nevertheless, the methods are grouped accordingly to provide a crude measure of runtime and computational costs.
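As a rough illustration of the graph model described above, the following Python sketch stores labeled points as nodes and the distances between them as edge labels. The function name `build_site_graph`, the optional `max_dist` cutoff, and the example labels and coordinates are assumptions made for this illustration; they are not taken from any of the cited methods.

```python
import math
from itertools import combinations


def build_site_graph(points, max_dist=None):
    """Build a simple distance-labeled graph from labeled 3D points.

    points   -- list of (label, (x, y, z)) tuples (hypothetical input format)
    max_dist -- optional cutoff; pairs farther apart get no edge
    Returns (nodes, edges): nodes maps index -> label,
    edges maps (i, j) with i < j -> Euclidean distance.
    """
    nodes = {i: label for i, (label, _) in enumerate(points)}
    edges = {}
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(points), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if max_dist is None or d <= max_dist:
            edges[(i, j)] = d
    return nodes, edges


# Toy binding site with three feature points (made-up coordinates in Å).
site = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 4.0, 0.0)),
    ("aromatic", (0.0, 0.0, 12.0)),
]
nodes, edges = build_site_graph(site, max_dist=6.0)
print(nodes)
print(edges)
```

Graph-based comparison methods then search two such graphs for common substructures; that matching step is not shown here.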


Figure 3. Different levels of binding site representation for subsequent comparisons. The figure is based on a crystal structure of matriptase in complex with an inhibitor (pdb 4r0i) and was generated using MOE 2013.0839.

Table 1 summarizes the presented methods and provides information about their availability as well as their respective applications described below. An overview of those and alternative approaches, their availability, and the respective web sites can be found in the supplemental material.

Table 1. Binding site comparison methods, their availability, and their fields of successful application. Links to the respective web servers, mail servers, and web sites with the downloadable software are included in the Supporting Information. Because PSSC40 cannot be regarded as an independent algorithm, it is not included in this table.

| method | availability | model | application |
| --- | --- | --- | --- |
| APF41 | software (commercial) | grid | Off-Target Prediction42; Homolog Analysis43 |
| Cavbase44 | software (commercial) | graph | Protein-Ligand Interactions45; Virtual Screening46; Evolutionary Relationships47; Drug Repurposing48 |
| CMASA49 | software (free), mail server (free) | geometric (points) | Homolog Analysis50 |
| COFACTOR51 | web server (free) | geometric (points) | Function Prediction52 |
| CPASS53 | web server (free for academic users) | geometric (points) | PPIs54 |
| eF-site55 | web server (free) | graph | Function Prediction56 |
| FLAP57 | software (commercial) | fingerprint | Prediction of Compound Selectivity Profiles and Affinities58 |
| fPOP59 | web server (free) | geometric (points) | Evolutionary Relationships60 |
| FuzCav13 | software (free) | fingerprint | Protein-Ligand Interactions10 |
| IFP61 | software (free) | fingerprint | Virtual Screening62 |
| IsoCleft Finder63 | web server (free) | graph | Drug Repurposing64 |
| KRIPO65 | not publicly available | fingerprint | Off-Target Prediction66 |
| Method of Brakoulias and Jackson67 | not publicly available | geometric (points) | Off-Target Prediction68 |
| Method of Wu et al.69 | not publicly available | fingerprint | Identification of Novel Targets70 |
| PESD71 | web server (open) | distribution | Prediction of Compound Selectivity Profiles and Affinities72 |
| PocketFEATURE73 | web server (free for academic users), software (free for academic users) | fingerprint | Protein-Ligand Interactions74 |
| PocketMatch75 | web server (free), software (free) | fingerprint | Function Prediction76; Polypharmacology Prediction77; Homolog Analysis78 |
| PoLiMorph79 | not publicly available | graph | Binding Site Prediction80 |
| ProBiS81 | web server (free) | graph | Function Prediction82; Off-Target Prediction83 |
| PSIM84 | software (commercial) | grid | Off-Target Prediction85 |
| Query3d86 | web server (free) | geometric (points) | Evolutionary Relationships87 |
| Shaper12 | software (free) | grid | Protein-Ligand Interactions10 |
| SiteAlign11 | software (free) | fingerprint | Protein-Ligand Interactions88; Protein-Ligand Interactions10 |
| SiteEngine89 | web server (free for academic users), software (free for academic users) | geometric (points) | PPIs90 |
| SMAP91 | software (free for academic users) | graph | Polypharmacology Prediction92; Drug Repurposing33; Drug Repurposing93 |
| SOIPPA94 | not publicly available | graph | Drug Repurposing95; Off-Target Prediction96 |
| SurfaceScreen97 | not publicly available | distribution | Binding Site Prediction98 |
| TM-align99 | web server (free), software (free) | geometric (points) | Drug Repurposing100 |

(2) THE DIFFERENT LEVELS OF BINDING SITE COMPARISON

We do not intend to explain the underlying algorithms in detail but rather aim to provide some basic information on the different methods and to place emphasis on their unique properties. Due to the large number of available tools, we focus on a small set of algorithms which were successfully applied in the field of medicinal chemistry. The Supporting Information contains a more comprehensive list of developed methods and their availability.

Binding Site Comparisons Based on Fingerprints

FLAP (Fingerprints for Ligands And Proteins)57 is based on a binding site analysis using GRID101 molecular interaction fields. A limited series of distinct chemical probes is used to scan the protein cavity. This analysis results in the identification of favorable and unfavorable interactions. The application of a weighted energy-based and space-coverage function allows the condensation into fewer pharmacophore property-based points. For each protein or binding site of interest, all possible 4-point pharmacophores are calculated. Target pair comparisons are performed as many times as there are binding sites under investigation. Two proteins are assumed to be similar if the 4-point pharmacophores of proteins A and B are equivalent and their cavity shapes are similar. The similarity is calculated by multiplying the total number of shape coincidences T(i,j) by the number of shape coincidences of the inverted match T(j,i) and dividing this product by T(i,i)*T(j,j) to normalize the corresponding similarity values.
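The normalization described above can be written compactly. The following minimal Python sketch only restates the formula from the text; the function name `flap_like_similarity` and the example counts are hypothetical, and it should not be read as the FLAP implementation.

```python
def flap_like_similarity(t_ij, t_ji, t_ii, t_jj):
    """Normalized similarity T(i,j)*T(j,i) / (T(i,i)*T(j,j)).

    t_ij, t_ji -- shape coincidences for the match and the inverted match
    t_ii, t_jj -- self-match coincidences used for normalization
    """
    if t_ii <= 0 or t_jj <= 0:
        raise ValueError("self-match counts must be positive")
    return (t_ij * t_ji) / (t_ii * t_jj)


# Made-up coincidence counts for two sites i and j.
print(flap_like_similarity(t_ij=30, t_ji=25, t_ii=50, t_jj=40))  # 0.375
```

With this normalization, comparing a site with itself gives a value of 1, because the numerator and denominator are then identical.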


Another method called IFP (Interaction FingerPrints)61 translates the 3D information about protein-ligand interactions into a one-dimensional bit vector representation. Interactions (hydrogen bonds, weak hydrogen bonds, ionic, π-cation, metal, and hydrophobic interactions, as well as face-to-face and edge-to-face aromatic interactions) are determined using different geometric rules. A modified form of the Tanimoto coefficient is used as a measure of similarity.

KRIPO (Key Representation of Interaction in POckets)65 analyzes ligand fragments and local binding sites that were derived using a set of transformation rules for bond cleavage. Altogether, 299,591 fragments were extracted and included in the KRIPO database with their respective local protein binding sites, which are represented by residues with at least one atom within a radius of 6 Å of any ligand atom. The features to derive pharmacophore-based fingerprints are based on interactions at certain positions relative to the residues (hydrogen bond donor or acceptor, aromatic T- or π-stacking interaction, hydrophobic contacts, interaction of positively or negatively charged moieties). After feature assignment to the pocket residues, only pharmacophore features within 2.5 Å of any of the ligand fragment atoms are used for binding site representation. The distances between pharmacophore features are binned into discrete distance ranges according to a set of binning schemes. Three fingerprint schemes (two-point, three-point, and four-point pharmacophores) were compared to optimize the method. As a result of the optimization procedure, fuzzified 3-point fingerprints are applied to represent local binding sites. The feature opposite the shortest distance is prioritized to calculate pairwise distances. As a similarity measure, the developers use a modified Tanimoto similarity based on the mean bit density.
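Both IFP and KRIPO score fingerprints with modified forms of the Tanimoto coefficient, and the exact modifications are not given here. For orientation only, the sketch below computes the standard, unmodified Tanimoto coefficient for two bit fingerprints represented as sets of on-bit positions. The function name and the example bits are assumptions for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Standard Tanimoto coefficient |A & B| / |A | B| for two bit sets.

    bits_a, bits_b -- iterables of on-bit positions
    Returns 0.0 when both fingerprints are empty (a convention chosen here).
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


# Two made-up interaction fingerprints: 2 shared bits, 5 bits set in total.
fp_1 = {0, 3, 5, 8}
fp_2 = {3, 5, 9}
print(tanimoto(fp_1, fp_2))
```

The coefficient ranges from 0 (no shared bits) to 1 (identical fingerprints).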


SiteAlign11 makes use of a projection of eight topological and physicochemical properties from cavity-lining protein residues to an 80 triangle-discretized sphere placed at the center of the binding site of interest. A sphere with a radius of 1 Å is placed at the active site center and three topological (distance between Cβ and the sphere center, side chain orientation, size) as well as five chemical descriptors (hydrogen bond donor or acceptor, aromatic, aliphatic, charged) are used to represent the cavity residues. The cavity descriptors are projected onto the sphere according to the respective amino acids. This is achieved by deriving a geometrical vector from the Cα atom of the active site residues to the sphere center and a respective descriptor assignment to the 80 triangles. The resulting fingerprint of 640 integers (8 descriptors, 80 triangles) is subsequently used to compare binding sites with each other. The similarity score is calculated according to the sum of normalized differences for each triangle descriptor and the number of triangles with non-null values.

FuzCav13 uses generic 4,833-integer vectors describing druggable protein-ligand binding sites. In the first step, the Cα atom coordinates of residues lining the binding site are extracted and annotated with six pharmacophoric properties depending on the corresponding residues (hydrogen bond donor or acceptor, positive or negative ionizable, aromatic, aliphatic). The vector’s integers register the count of unique pharmacophore feature triplets (three properties and three respective distances between Cα atoms). The vector size is decreased by distance binning and removal of redundant triplets. In contrast to most other methods, this approach uses exclusively Cα atoms for the structural representation of the binding site. Another publication102 underlines the applicability of this type of pocket representation.
The developers calculate the binding site similarity via division of the number of common non-null counts in both fingerprints by the minimum number of non-null counts for the fingerprints of protein A and B.
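The similarity calculation described above can be sketched as follows, assuming that both fingerprints are integer count vectors of equal length. The function name and the toy six-element vectors are hypothetical (the vectors described in the text hold 4,833 counts), and the sketch illustrates the stated formula rather than the original code.

```python
def fuzcav_like_similarity(fp_a, fp_b):
    """Common non-null positions divided by the smaller non-null count.

    fp_a, fp_b -- equal-length sequences of integer triplet counts
    Returns 0.0 if either fingerprint has no non-null entries
    (a convention chosen for this sketch).
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    nonnull_a = sum(1 for x in fp_a if x != 0)
    nonnull_b = sum(1 for y in fp_b if y != 0)
    common = sum(1 for x, y in zip(fp_a, fp_b) if x != 0 and y != 0)
    smallest = min(nonnull_a, nonnull_b)
    return common / smallest if smallest else 0.0


# Toy count vectors: 3 and 4 non-null entries, 2 of them at shared positions.
site_a = [2, 0, 1, 0, 3, 0]
site_b = [1, 1, 0, 0, 4, 2]
print(fuzcav_like_similarity(site_a, site_b))
```

Dividing by the smaller of the two non-null counts means that a small site fully contained in a larger one still reaches the maximum score of 1.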


The following two methods do not use fingerprints but make use of 2D representations of geometric binding site properties in the form of lists and strings. Those approaches lead to comparable or even lower computational costs as compared to fingerprint-based methods. PocketMatch75 is an example of the comparison of binding sites based on a one-dimensional representation of the binding site by a set of all distance pairs between points derived from the chemical nature of the cavity residues. By default, the 20 natural amino acids are divided into five distinct groups. A pair of sorted distance sequences is aligned and similarities are scored according to the average of the number of matching distance elements.

The method of Wu et al.69 compares local structures in proteins on the basis of their one-dimensional structural sequence according to a so-called structural alphabet103. The backbone of a set of non-redundant protein structures is represented by consecutive five-residue segments. They are described by a vector of the respective eight backbone dihedral angles $V(\psi_{n-2}, \varphi_{n-1}, \psi_{n-1}, \varphi_{n}, \psi_{n}, \varphi_{n+1}, \psi_{n+1}, \varphi_{n+2})$. The dissimilarity between two of those vectors is determined by calculating the rmsd of the particular dihedral angles. Altogether, 16 distinct protein blocks were identified with the help of an unsupervised cluster analyzer. They were defined as the structural alphabet. This enables the translation of 3D protein structures into strings of structural letters by the use of a five-residue sliding window and the assignment of the closest structure out of the structural alphabet. Recurring structural patterns (occurring more than three times in a set of non-redundant proteins) are then classified as potential 3D motifs, avoiding the necessity of a query motif or the knowledge of essential residues. Furthermore, this method does not consider any local physicochemical similarities.

Binding Site Comparisons Based on Grids


Using continuous pharmacophore-based Atomic Property Fields (APF) as proposed for ligands104, seven atomic properties (hydrogen bond donor or acceptor, lipophilicity, size, electronegativity, charge, hybridization) are represented as continuous potentials projected in 3D space from atom centers using Gaussian functions.41 Grids are generated with seven APF potential components. One binding site is defined as a collection of receptor atoms carved by a sphere around the ligand of the protein structure and is used to generate seven-component APF potentials on a grid. The second site (regarded as rigid) is placed in these grid potentials and subjected to a Monte Carlo minimization procedure to elucidate the optimal superposition. The similarity of atomic property distributions is represented by the APF pseudoenergy for the optimal alignment. Distance measures are derived for the calculation of distance matrices and subsequent clustering.

PSIM (Protein SIMilarity)84 is a binding site comparison method based on the morphological similarity between protein pockets. It is derived from a comparison method for small molecules105. Observation points are used to measure the surface of the binding site as seen from the ligand’s perspective. Originally, the presence of a bound ligand in the structure was a prerequisite for comparison. This necessity could be circumvented by the introduction of an additional binding site detection method106. Each binding site is represented by a tessellation with so-called observation points as determined from bound molecules. The distance to the nearest steric, positively charged, and negatively charged surface is measured. The similarity function is based on the comparison of distances to the molecular surface from these points. Subsequently, an alignment that maximizes the similarity function is searched for. For this purpose, the bound ligand is placed in a uniform grid.
Molecule grid points within a chosen distance are weighted on the basis of their minimum distance to the molecular surface. Because the so-called

observation points will perceive the same things in the optimal alignment when the molecular surfaces are highly similar in shape and polarity, the alignment is optimized. A transformation is applied to move each observation point set of the opposing protein onto the reference pocket and the differences in the corresponding surface measurements are minimized. The similarity is calculated as the normalized sum of weighted Gaussian-like functions of differences in distances from observation points to molecules.

A tool called Shaper12 compares protein binding sites with respect to their physicochemical properties. A set of regularly spaced pharmacophore-based grid points represents the pocket, thereby creating an inverse image of the binding site (as perceived by the ligand). The annotation of cavity grid points is realized by a projection of a protein-ligand complex on a grid. A property is assigned to each cell according to its location with respect to the protein (hydrophobic, aromatic, hydrogen bond donor or acceptor, positively or negatively ionizable, null). A shape-based alignment method is used to align cavities by an optimization of the annotated shapes’ volume overlap. The similarity is calculated by a Tversky index. In contrast to the Tanimoto index, this similarity measure allows different weights to be assigned to the reference and the fit pocket.

Binding Site Comparisons Based on Graph Models

In 2002, Schmitt et al. published a binding site comparison method that relies on a database called Cavbase44. This assembly of cavities, identified using the cavity detection method LIGSITE24, was created for the data stored in the Relibase107,108 database. The physicochemical properties of the cavity-flanking residues are mapped on so-called pseudocenters corresponding to five properties (hydrogen bond donor or acceptor, mixed donor and acceptor, hydrophobic aliphatic, aromatic), leading to the generation of a graph model. The Bron-Kerbosch clique
detection algorithm is applied to find a maximum common subgraph. Similarity scoring is achieved by a calculation of the overlap in surface points which are assigned to identical physicochemical properties.

eF-site (electrostatic-surface of Functional site)55 is a database of protein surfaces for the comparison of functional sites of known protein structures. The electrostatic potential of the protein’s binding site and hydrophobic scales were used to label the vertices of the graph models used for comparison. Functional sites were selected according to bound ligands or active site annotation. The molecular Connolly surfaces were labeled with electrostatic potentials as well as hydrophobic scales and represented as graphs. Finally, the binding site comparison was performed using the Bron-Kerbosch clique search algorithm. The similarity score was derived from the nodes and edges of the largest clique (based on electrostatic potentials, hydropathy values, distances, angles, and dihedral angles).

The method IsoCleft63 derives pocket atoms using the SURFNET algorithm109 or the conservation scores based on the ConSurf-HSSP database110. The cleft is represented as a graph model. The largest common subgraph isomorphism is sought by constructing an association graph (nodes are atom pairs with chemical similarity and edges are drawn if atom pairs are geometrically similar). Eight atom types are used to define chemical similarity (hydrophilic, hydrogen bond donor or acceptor, hydrophobic, aromatic, neutral, neutral donor or acceptor). The largest subset of atoms with identical types is sought in two consecutive steps to reduce computational costs. First, an initial superposition is achieved by detecting the largest clique in an association graph by the Bron-Kerbosch algorithm using only Cα atoms of equivalent residues.
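The clique-based matching shared by Cavbase, eF-site, and IsoCleft can be illustrated with a minimal sketch. It is not any of the published implementations: the point labels, the distance tolerance `tol`, and the helper names `association_graph` and `bron_kerbosch` are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def association_graph(site_a, site_b, tol=1.0):
    """Build the association graph of two sites.

    Each site is a list of (label, (x, y, z)) tuples. A node is a pair
    (i, j) of points with identical labels; two nodes are connected when
    the intra-site distances they imply agree within `tol` (assumed value).
    """
    nodes = [(i, j)
             for i, (label_a, _) in enumerate(site_a)
             for j, (label_b, _) in enumerate(site_b)
             if label_a == label_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a point may be matched only once
        d_a = dist(site_a[i][1], site_a[k][1])
        d_b = dist(site_b[j][1], site_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def bron_kerbosch(adj):
    """Return one maximum clique (sorted) using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)
```

The node pairs in the returned clique give the point correspondences from which a superposition could then be computed, for example by least-squares fitting.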
Atoms are then superimposed on the basis of the transformation matrix and translation vectors of the largest clique by a least-squares fitting procedure. In a second step, all
non-hydrogen atoms are matched with two restrictions: atoms have to be of the same type, and the spatial distances after the first superimposition have to be within a certain threshold. Cα atoms are not used in this step. Similarity is determined by three measures: modified Tanimoto scores, clique size, and an additional measure which reflects the binding site’s size.

Xie and Bourne94 developed a sequence-order independent profile-profile alignment (SOIPPA) that can be used to detect the local similarities between unknown binding sites by comparing whole proteins. The approach is based on a protein structure representation using Delaunay tessellation of Cα atoms characterized by geometric potentials. A mesh surface is generated around the Cα atoms so that a normal vector that is perpendicular to the mesh surface can be assigned to each Cα atom. It also encodes a graph representation in 3D space where the Cα atoms represent nodes which are connected by edges derived from the connections in the regular tessellation. This graph representation is used to find the maximum weight common subgraph between two proteins. The weight is represented by chemical similarity using the McLachlan chemical similarity matrix111 or evolutionary correlation represented by the BLOSUM45 substitution matrix112. Later, a unified statistical model was introduced to assess the statistical significance of found similarities. SOIPPA and the established extreme value distribution model were combined in the software SMAP.91

ProBiS (Protein Binding Sites)81 was developed to compare patterns of physicochemical properties on proteins’ surfaces. Using graph models and a maximum clique detection method113, the software was designed to find local structural similarities between a query protein and other known protein structures. This algorithm compares complete proteins without previous definition of binding sites. Therefore, the method can also be applied to detect yet unknown binding sites.
Finally, four scoring measures are provided: surface vectors angle (outer-pointing surface vector originating in the geometric center and perpendicular to the protein’s surface for each superimposed set of vertices), surface patch rmsd, surface patch size, and E-value.

Another graph-based comparison method is PoLiMorph (Pocket-Ligand Morphing)79. Proteins are represented by self-organizing graphs that fill the cavity volume. Instead of clique detection, a fast heuristic algorithm was developed. The vertices of these 3D models are labeled with information about the local ligand-receptor interaction potential coded by fuzzy property labels to replace a crisp property assignment to the vertices by smooth functions. The properties (buriedness, solvatability, electrostatics, and hydrogen bond terms) are directly derived from the structure and projected onto the vertices. On the basis of the original GRID101 approach, the vertex labels are integrated from five grid potentials of their surrounding grid points. After finishing these steps, a 3D fuzzy pocket graph is fitted into the pocket. The grid point coordinates are used as input vectors. The final graph comparison is based on an error-tolerant heuristic for the matching of 3D graphs. The scoring function includes similarities with respect to property distributions and protein-ligand interaction patterns.

Binding Site Comparisons Based on Geometry (Points)

The method of Binkowski et al.114 identifies all surface pockets and interior voids of proteins stored in the PDB. A set of protein surface patterns called pvSOAR patterns (pocket and void surface patterns of amino acid residues) was built for each protein pocket and void in the CASTp database115. The similarity of the patterns of pocket-forming residues is assessed in terms of sequence, spatial arrangement, and directional arrangement. Protein pockets and interior voids are calculated using weighted Delaunay triangulation and the α shape method. A set of protein
surface patterns is derived. Their sequence patterns are used to assess the similarity relationship among protein surfaces using the Smith-Waterman algorithm as implemented in SSEARCH116 and the BLOSUM50 substitution matrix112. Shape comparison is performed to obtain an optimal alignment in terms of rmsd values. One point is used to represent a residue (for residues with more than one surface atom, the geometric center of these atoms). Additionally, a different scoring measure for the dissimilarity in pocket shape called oriented rmsd (ormsd) was introduced to overcome the drawbacks of coordinate rmsd (it increases with the number of aligned residues and can be dominated by a few outliers). The method combines two levels of binding site comparison: structural binding site comparison and binding site sequence comparison.

SiteEngine89 matches binding sites or proteins on the basis of a geometric representation of their physicochemical features. In comparison to computationally demanding clique detection algorithms, this method relies on efficient hashing and matching of triangles of centers with different properties (e.g. hydrogen bond donor or acceptor, mixed donor and acceptor, hydrophobic aliphatic, aromatic). The modeling of the binding sites’ Connolly surfaces is similar to that of the Cavbase44 approach. All possible transformations for the superposition of two pockets are calculated and scored in a hierarchical manner. The final scoring function for the optimized match includes goodness-of-fit between matched pseudocenters as well as the size and shape of the common overlap region. In the following year, two web servers were provided for the comparison of binding sites and protein-protein interfaces with SiteEngine.117

A fast comparison of proteins at atomic level is also possible with a method developed by Brakoulias and Jackson67. A series of triplets is generated by building “seed” matches consisting of three atoms forming a triangle. Atom-to-atom distance cut-offs are introduced to restrict the number of triangles and their size (upper and lower boundary). Atom types (carbon, nitrogen,
oxygen, sulfur, phosphorus atoms) and distances are stored. For the reference binding site, a discretized 3D image is generated. It constitutes the reference frame where atom types are stored at indices in a Cartesian grid (grid size controls resolution). The comparison is achieved by systematically matching all possible triplets between two objects. For each match within a predefined maximum length difference a transformation is performed using a least-squares fitting procedure. The resulting transformation (translation vector and rotation matrix) is applied to the second object by converting the x,y,z coordinates. Then, a score of unity is given to each atom type match. These steps are repeated to find the match with the highest score such that the rmsd of the superposed atoms is below 1 Å. Finally, the transformation matrix with the highest score is stored. As compared to an additionally implemented Bron-Kerbosch algorithm for clique detection, this matching procedure was shown to be two orders of magnitude faster. Nevertheless, it is stated that the definition of the binding site is crucial for the outcome of an all-against-all comparison as minor shifts introduce shifted alignments of identical environments.

The method TM-align99 was originally developed to identify the best structural alignment between protein pairs. It is not, by definition, an explicit binding site comparison approach. Nevertheless, it is often used to find local similarities between proteins and was successfully applied in various scenarios. Furthermore, it is well suited to explain an algorithm for the spatial alignment of proteins. In the first step, an initial alignment of secondary structures of two proteins is generated using dynamic programming where the element of the score matrix is assigned to be 1 or 0 depending on the identity of the aligned secondary structure elements using the DSSP118 assignment.
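The initial secondary-structure alignment described above can be illustrated with a small dynamic-programming sketch. It is not the TM-align code: the function name `align_secondary_structure`, the single-letter state strings, and the linear penalty of 1 per gap position are assumptions chosen to keep the example short. Only the 1-or-0 scoring of identical versus different states is taken from the text.

```python
def align_secondary_structure(ss_a, ss_b, gap=-1.0):
    """Globally align two secondary-structure strings (e.g. 'HHHEEC').

    Identical states score 1, different states score 0, and every gap
    position costs `gap` (an assumed, simplified linear penalty).
    Returns (score, pairs), where pairs lists aligned index pairs (i, j).
    """
    n, m = len(ss_a), len(ss_b)
    # score[i][j]: best score for aligning ss_a[:i] with ss_b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1.0 if ss_a[i - 1] == ss_b[j - 1] else 0.0
            score[i][j] = max(score[i - 1][j - 1] + match,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = 1.0 if ss_a[i - 1] == ss_b[j - 1] else 0.0
        if score[i][j] == score[i - 1][j - 1] + match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]
```

The aligned index pairs would serve as the starting correspondence for a subsequent superposition and refinement; the string inputs stand in for per-residue DSSP states.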
Additionally, a gapless threading against the larger structure is performed as in another method called SAL119. The methods differ in the assignment of scores. Whereas in SAL119 the rmsd is used as a measure, TM-align uses a different score (TM-score) to
select the optimal alignment. The third alignment is again obtained by dynamic programming using a gap-opening penalty of -1. The scoring matrix is a combination of secondary structure and distance score matrix from the second alignment. A heuristic iterative algorithm is used to refine these initial alignments. The structures are rotated by the TM-score rotation matrix on the basis of the aligned residues. Again, similarity scores are calculated. A new alignment can be obtained by implementing dynamic programming on the similarity matrix. This process is iterated until a stable alignment is reached. The alignment with the highest TM-score is returned.

Query3d86 compares binding sites with respect to three similarity criteria: all residues in a set of matching amino acids have to be neighbors of at least one of the other residues in the set; the rmsd of the Cα atoms and the coordinates of the geometric average of all residue atoms has to be below a certain threshold; and the similarity between two residues has to be above 0.3 according to the Dayhoff substitution matrix120 (average substitution value of the complete structural match has to be below 1.2). For two protein structures an exhaustive depth-first search is used to find the largest set of matching residues. All matches of a length of one residue are created. Each single amino acid of the reference structure is matched to each amino acid of the target structure. For a reference protein consisting of m amino acids and a target protein with n residues, this step results in n·m matches. All matches violating one of the three above-mentioned criteria are discarded, leading to a reduction of the number of possible matches. Then, match extension is performed using length two matches. All neighboring amino acids of the previously matched residues are taken into account.
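The seeding and extension steps just described can be sketched as a depth-first search. This is a simplified illustration and not the Query3d implementation: residues are reduced to a type and a single point, the published criteria are replaced by two stand-ins (a caller-supplied `compatible` function for residue similarity and a distance tolerance `tol` in place of the rmsd test), extension proceeds only from the most recently matched pair, and `cutoff` is an assumed neighbor distance.

```python
from math import dist


def largest_match(ref, tgt, compatible, cutoff=8.0, tol=1.0, max_len=10):
    """Depth-first search for the largest set of matching residues.

    ref, tgt: lists of (residue_type, (x, y, z)).
    compatible(a, b): True when two residue types may be matched
    (hypothetical stand-in for a substitution-matrix criterion).
    """
    def neighbors(site, idx):
        return [k for k in range(len(site))
                if k != idx and dist(site[idx][1], site[k][1]) <= cutoff]

    def consistent(match, i, j):
        # Stand-in for the rmsd criterion: distances to all residues
        # matched so far must agree within `tol` in both structures.
        return all(abs(dist(ref[i][1], ref[a][1]) -
                       dist(tgt[j][1], tgt[b][1])) <= tol
                   for a, b in match)

    best = []

    def extend(match):
        nonlocal best
        if len(match) > len(best):
            best = list(match)
        if len(match) >= max_len:
            return  # stop extending once max_len residues are matched
        used_i = {a for a, _ in match}
        used_j = {b for _, b in match}
        a, b = match[-1]
        # i neighbors times j neighbors candidate extensions
        for i in neighbors(ref, a):
            for j in neighbors(tgt, b):
                if (i not in used_i and j not in used_j
                        and compatible(ref[i][0], tgt[j][0])
                        and consistent(match, i, j)):
                    extend(match + [(i, j)])

    # Length-one seeds: every reference residue against every target residue.
    for i in range(len(ref)):
        for j in range(len(tgt)):
            if compatible(ref[i][0], tgt[j][0]):
                extend([(i, j)])
    return best
```

Because invalid partial matches are discarded before they are extended, the search tree stays far smaller than the full set of residue combinations, which is the point of the filtering step in the text.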
Assuming that the residue of the reference protein has i neighbors, while the matching residue has j neighbors, this leads to i·j new matches for the initial length one match. For length two matches that were considered valid according to the predefined exclusion criteria, matches of three residues each are generated. The procedure is
repeated as long as valid matches are obtained. The algorithm automatically stops when 10 residues are matched to save computation time. Such a large match hints at globally similar proteins, which are of no interest for this binding site comparison method.

The algorithm CPASS (Comparison of Protein Active Site Structures)53 aims to compare ligand-defined protein active sites by means of optimal sequence and structural alignment without maintaining sequence connectivity. A maximization of the rmsd-weighted BLOSUM62 substitution matrix112 is performed during the alignment. The rmsd between two aligned residues is additionally corrected by 1 Å to account for experimental inaccuracies of solved protein structures. Furthermore, the score is scaled by the shortest distance from a residue at the active site to the ligand to account for small structural changes at the boundaries of the active sites.

The fPOP (footprinting Pockets Of Proteins)59 database was generated by a binding site comparison based on 38,900 split pockets as generated by the SplitPocket121 approach. The superimposition was performed via geometrical matching of spatial patterns of the templates with putative pockets in the unbound form. Site-specific spatial patterns were compared by the application of the Smith-Waterman algorithm (generation of local pairwise alignments) and shape matching to compare an unbound pocket against canonical functional surfaces of all split pockets. A surface conservation index is determined to evaluate the results. The assessment of the statistical significance for a surface alignment was achieved using a similar approach to that of Binkowski et al.114.

CMASA (Contact MAtrix based local Structural Alignment)49 is an algorithm which performs a fast and sensitive binding site comparison via various steps. Cα atoms and the residue atom furthest away from these atoms for each residue in the protein structure represent the binding
site. All local structures that match with the query structure in terms of BLOSUM62 similarity112 are aligned. Most local matches are removed with respect to the Contact Matrix Average Deviation, a measure of the distances between each two atoms belonging to the template and those two of the matched local structure. Only those matches above a certain cut-off are used. Finally, the rmsd, p-values, and ranks are calculated.

The PocketFEATURE algorithm73 first assigns microenvironments to proteins. Local spherical regions with a radius of 7.5 Å are extracted from the structure. Altogether, 80 physicochemical properties are collected over six concentric spherical shells centered on a predefined functional center. The FEATURE microenvironments122,123 of two binding sites are compared by an exhaustive calculation of raw Tanimoto scores of all available microenvironment pairs. A search for the best scoring microenvironment pairs (with a score cut-off) is performed. Finally, a microenvironment similarity score based on a normalized Tanimoto score is calculated as a measure for binding site similarity. A rigid geometric matching criterion is avoided, enabling a flexible matching between two microenvironments.

COFACTOR51 allows for the comparison of proteins to other proteins with known protein-ligand binding interactions, EC or GO (gene ontology) classifications. The protein of interest is compared to a database of structures using TM-align99 to reveal global structural similarities which might help to identify family or fold relationships. Non-random structural similarities are investigated in more detail in a subsequent local functional site identification step. By means of a heuristic algorithm, query and template structures are compared to identify the best local match. Evolutionarily conserved residues according to a multiple sequence alignment are identified and excised from the query structure. The resulting set of local structural motifs is superimposed onto known functional site residues of the template structures. A subsequent iterative superposition
refinement similar to TM-align99 is applied. Finally, the match with the highest score in terms of Cα distances and BLOSUM62 substitution scores112 is retained.

A different approach, called PSSC (Protein Structure Similarity Clustering)40, compares proteins on the basis of secondary structure elements. Unlike all other methods presented here, it is not an independent algorithm. Instead, a workflow is applied to find similarities in available structural data. The alignment of the binding site secondary structure elements has to be done manually according to the interesting matches. The workflow consists of a structural alignment that involves a combinatorial extension of an alignment path defined by aligned fragment pairs124, the calculation of the rmsd of all Cα atoms, and the visual inspection of the alignment to ensure its relevance. The applicability of this approach is based on the assumption that a similar spatial arrangement of secondary structure elements that are part of the ligand binding site enables the binding of similar scaffolds. Conversely, the molecular framework, as the definition of the scaffold125, is able to recognize targets whose binding sites are similar in terms of secondary structure architecture. A review discusses the impact of secondary structure element information on the binding of similar ligands and its impact on drug design.36

Binding Site Comparisons Based on Shape Distributions

A further development of the pvSOAR methodology introduced earlier was achieved by the introduction of the SurfaceScreen algorithm97 to optimize the matching of global shape and local physicochemical features to evaluate the similarities between surface pairs. A 3D object recognition algorithm is used to find surface shape similarities, while the similarity of physicochemical features is assessed using a spatial alignment of conserved residues between surfaces. The metric SSS (Surface Shape Signatures) was introduced to represent the signature of
an object as a probability distribution sampled from a shape function that measures global geometric properties. Compared to shape matching, this approach reduces complexity because only two probability distributions are analyzed. SurfaceAlign was developed to compare the spatial arrangement of localized surface patterns and to examine side chain orientations. The 3D representation of a surface residue is achieved by a single point at the center of mass of all residue atoms. Alignment of two surfaces is achieved by combinatorially identifying the best superposition of the maximum subset of conserved residues. A library of annotated surfaces, the Global Protein Surface Survey (GPSS), is used to reveal undiscovered similarities between proteins. A surface volume overlap Tanimoto score was introduced to score found similarities.

The PESD (Property-Encoded Shape Distributions)71 method compares binding sites with respect to shape and property distributions on their surface. Cavities are represented by property-mapped triangulated Gauss-Connolly surfaces. Property-encoded shape distributions are calculated for all binding site surfaces to provide a measure of the probability that a particular property will be located at a specific distance to another one. For the comparisons, D2 shape distributions are used. Each vertex of the triangulated surface is represented by its Cartesian coordinates and a property-based color code (RGB) that encodes the magnitude of its surface properties. The Euclidean distance between two randomly chosen points is calculated and the respective color codes are recorded. Afterwards, distances are binned in 19 bins of 1 Å where each bin is further divided into 4,096 sub-bins for the different possible color combinations. Binding site surfaces were created using MOE Active LP color coding and electrostatics color coding. The weighted sum of chi-squared statistic for electrostatic and Active LP properties is
used as a dissimilarity measure. The developers also provide a webserver for the comparison of user-defined binding sites against a surface database of 106,796 sites.126

(2) APPLICATIONS OF BINDING SITE COMPARISON ALGORITHMS

The following section is dedicated to numerous successful applications to outline the impact of binding site comparison methods on medicinal chemistry and rational drug design in general. Certainly, one could argue that most of the previously cited publications concerned with the development of binding site comparison algorithms also provide various interesting applications. Nevertheless, the more interesting question remains whether those methods proved to be useful in independent projects. Most of the algorithms presented above are used to explain polypharmacology, to annotate proteins, and to predict functions. However, their applicability domains clearly exceed those three examples. The section is roughly divided into two parts to address the different potential readerships: The first part deals with applications with impact on a medicinal chemistry program (hit identification, hit-to-lead or lead optimization), whereas the second part deals with more general applications within a broad drug design context.

APPLICATIONS WITH IMPACT ON A MEDICINAL CHEMISTRY PROGRAM

Similarity-Based Virtual Screening Approaches

Protein binding site similarity can be exploited as an idea generator for new potential ligands or the improvement of existing ligands. On the basis of the assumption that similar ligands bind to similar binding sites, different virtual screening approaches were developed, most of them relying on known ligands (e.g. bioisosteric replacements, scaffold hopping, focused libraries, virtual screening of ligands with high similarity towards a known ligand). The major
disadvantage of all ligand-based methods is the fact that they rely on the knowledge of at least one bioactive ligand of the target of interest. If nothing is known about small molecules binding to the target, a ligand-based approach becomes useless. For such challenges, binding site comparison comes into play. Once a similarity between the binding site to be addressed and a binding site of a well-characterized target is found, scientists can look for compounds displaying bioactivity on the similar target and test them on their own protein of interest.

A promising example is the application of Cavbase44 to design novel peptide aldehyde-based reversible inhibitors of the SARS coronavirus main protease, SARS-CoV Mpro.46 Peptide-bound crystal structures of this protease were analyzed. Subsequently, a modeling study was performed using Cavbase44 searches and molecular docking. The binding pocket of SARS-CoV Mpro (pdb 1uk4) was divided into the different amino acid recognition subpockets which were then subjected to single Cavbase44 searches for similar subpockets. A detailed analysis of the obtained results is explained in another article47. The information about bound ligand fragments in the related subpockets was then used as a guideline for the design of suitable side chains of a potent protease peptide inhibitor. In total, 1,230 natural and non-natural amino acids were used for docking studies covering the S1, S2 and S4 binding pockets while the localization of the peptide backbone as observed in known protein-peptide complexes was retained. The analysis suggests that the S1 pocket can be addressed by a glutamine, the S2 pocket prefers aromatic side chains superior to the native leucine, and the S4 pocket is able to complex various anionic and uncharged polar amino acids. Indeed, four of the rationally designed peptides exhibit the highest potential of all peptide aldehydes tested. They all comprise glutamine residues in the P1 position (AcAsn-Ser-Thr-Asp-Gln-H, AcAsn-Ser-Thr-Ser-Gln-H, AcGlu-Ser-Thr-Leu-Gln-H, AcAsp-Ser-Thr-Leu-Gln-H) and show IC50 values of 20, 7.5, 7.5, and 10 µM, respectively. Glutamate
seems to be preferred over aspartate in the P5 position, while introduction of a histidine proved to decrease activity (IC50 for AcHis-Ser-Thr-Leu-Gln-H > 50 µM). Introduction of hydrophobic and aromatic side chains at position P3 addressing the S2 pocket led to only slight improvements. Crystal structures of four enzyme-bound peptide aldehydes allow for a more detailed analysis.127 Additionally, Ki values of 40, 72, 8, and 41 µM were reported for the four peptide aldehydes AcAsn-Ser-Thr-Ser-Gln-H, AcAsn-Ser-Phe-Ser-Gln-H, AcGlu-Ser-Thr-LeuGln-H, and AcAsp-Ser-Phe-Asp-Gln-H, respectively. Those results show that hydrophilic serine or charged aspartate residues in P2 position instead of the hydrophobic residues in P3 position occupy the hydrophobic S2 pocket and represent an example for unexpected methioninecarboxylate interactions. Additionally, the preference for a glutamate in P5 position over an aspartate residue could be verified. The PSSC approach40 was used to find potential inhibitors of lysine-specific demethylase 1 (LSD1), an enzyme removing mono- and dimethyl marks from lysine 4 or 9 of histone H3. The protein in association with the androgen receptor controls androgen-dependent gene expression and prostate tumor cell proliferation.128 The authors extracted the LSD1 binding site from the reported crystal structure (pdb 2ejr) and performed a Dali129 similarity search against all proteins stored in the PDB. The resulting alignments were visually inspected to verify the presence of a similar ligand-sensing core. Besides the identification of structurally related proteins, similarities between LSD1 and the members of the monoamine oxidase (MAO) family MAO-A (pdb 2bxr) and MAO-B (pdb 1gos) were found. Inspired by the finding that γ-pyrones are a novel class of reversible MAO-A and MAO-B inhibitors130, a library of 705 compounds was screened for LSD1 inhibition. Consequently, namoline (1)128 was found to inhibit LSD1 demethylase activity with an IC50 of 51 µM. 
Finally, the compound was tested on MAO enzymes, which are inhibited by the related molecules N-(3-(2,4-dichlorophenoxy)propyl)-N-methylprop-2-yn-1-amine (clorgyline)128 and N-benzyl-N-methylprop-2-yn-1-amine (pargyline)128. Compound 1 inhibited neither MAO-A nor MAO-B because the sequence similarity between the ligand-sensing cores of LSD1 and both enzymes is only 19%. Figure 4 gives a detailed picture of the superimposed binding sites of LSD1 and MAO-B after an alignment according to the common secondary structure elements. This example shows how the presence of common ligand-sensing cores can facilitate the search for new ligands without forfeiting the possibility of designing compounds with a promising selectivity profile. To further validate the applicability of 1 to treat prostate cancer, evidence for the compound’s antiproliferative effect on LNCaP cells (androgen-sensitive human prostate adenocarcinoma cells) was provided. In vivo studies with xenograft tumors (subcutaneous implantation of LNCaP cells into nude mice) were performed to analyze the effect of 1 on tumor cell proliferation. Despite some adverse effects (slight weight loss, minor liver toxicity), treatment with 1 was shown to be beneficial in terms of proliferation inhibition.


Figure 4. Common ligand-sensing core of LSD1 and MAO-B. (a) Structure of compound 1. (b) Common ligand-sensing core of LSD1 in green (pdb 2ejr) and MAO-A with bound inhibitor in orange in ball-and-stick representation (pdb 2bxr). (c) Active sites of both enzymes with cavity-lining residues represented as sticks and names according to the crystal structures. Figures (b) and (c) were generated using UCSF Chimera 1.10.1131.

The last example62 discusses a successful combination of an IFP comparison61 and a molecular docking approach for the selective virtual screening for ligands with a specific functional effect. The signaling activity of GPCRs can be stimulated (full or partial agonist), blocked (antagonist), or reduced (inverse agonist) by various small molecules. To date, finding a new potential agonist or antagonist remains demanding. Known complexes of different small molecules with various effects in 31 β1 and β2 adrenoceptor crystal structures were used to derive unique protein-ligand IFPs. Altogether, the IFPs of 33 β1 adrenoceptor and 15 β2 adrenoceptor binding sites were
calculated. Additionally, physicochemically similar decoys were included for subsequent docking studies. After post-processing of the complexes of 1,920 unique effector molecules, function classification on the basis of the protein-ligand interaction profiles was performed. The enrichment of specific ligand types was strongly influenced by differences between receptor conformations and reference IFPs for docking pose post-processing. Optimum structure-IFP combinations could be established to successfully discriminate between antagonists, inverse agonists, and partial or full agonists. The predicted IFP for the full agonist norepinephrine gave the highest enrichment of agonists over antagonists for all structures. Additionally, the so-called biased ligands in the dataset were analyzed. GPCRs transfer signals via G-proteins and β-arrestins132. Therefore, the dataset was screened for compounds that induce signaling via the β-arrestin pathway and those that do not bias signaling. The IFPs were grouped according to biased and unbiased ligands. The difference in the abundance of each interaction with each pocket residue was calculated. Consequently, four interactions that are abundant (≥ 50%) and unique for biased ligands were identified. In summary, this study shows how the comparison of binding sites facilitates the search for GPCR ligands that show a desired effect. IFP comparison might become a suitable tool for the elucidation of the biased signaling mechanism in GPCRs.

Drug Repurposing

The challenging and laborious process of discovering a small molecule that exhibits the desired pharmacological effect without leading to severe side effects can be facilitated by exploiting the knowledge about approved drugs. The choice of compounds with known toxicity profiles, efficacies, and pharmacokinetic and pharmacodynamic properties leads to reduced attrition rates. Drug repurposing (also called repositioning, redirecting, reprofiling) approaches
benefit from this knowledge. They make use of a plethora of computational methods.133 Generally speaking, most strategies for repositioning are based on high-throughput in vitro and in vivo screens using a panel of approved drugs. Binding site comparison methods can help to find new indications for old drugs by comparing the new target to well-characterized targets with known approved drugs.134 The predictivity of this approach is shown by the large number of articles published about the repositioning of prominent drugs. A review135 gives information about successful drug repositioning strategies. Here, we show a few examples underlining how the comparison of binding sites can facilitate this process.

An exhaustive study33 published in 2013 focused on the reasons for drug promiscuity as a prerequisite for drug repositioning. The analysis investigated the subject from two perspectives: ligand features and binding site features leading to the promiscuity of a drug. A workflow comprising three steps was conducted to address the second question, whether promiscuous drugs have targets with similar binding sites: alignment of all pairs of binding sites for all promiscuous drugs using SMAP91 without considering bound ligands, removal of redundant targets, and exclusion of sites with inconsistent ligand binding modes. For the third step, the authors implemented LigandRMSD to identify matches in which the ligands occupy similar positions. For two conformers, the maximum common subgraph is determined using OpenBabel136, or the Small Molecule Subgraph Detector137 if no isomorphism was found by OpenBabel136. The rmsd was calculated with respect to the SMAP91 binding site alignment and the optimal alignment of both ligand conformers. This method allowed for result analysis without tedious visual inspection of all aligned binding sites, which becomes impractical for large datasets (after removal of redundant targets the dataset consisted of 3,948 alignments).
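The ligand-position check described above can be illustrated with a minimal sketch. It is not the LigandRMSD implementation of the cited study: it assumes that the matched atom pairs (from the maximum common subgraph) and the rigid-body transform of the binding site alignment are already available, and it covers only the rmsd with respect to the site alignment. The function names and the `cutoff` parameter are hypothetical; the 3 Å default follows the threshold reported for the study.

```python
import math


def apply_transform(coords, rotation, translation):
    """Apply a rigid-body transform (3x3 rotation, 3-vector translation) to xyz points."""
    return [
        tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for point in coords
    ]


def ligand_rmsd(coords_a, coords_b, atom_pairs):
    """RMSD over matched atoms; atom_pairs holds (index in a, index in b) tuples."""
    if not atom_pairs:
        raise ValueError("no matched atoms")
    squared = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in atom_pairs)
    return math.sqrt(squared / len(atom_pairs))


def keep_match(coords_a, coords_b, atom_pairs, rotation, translation, cutoff=3.0):
    """Keep a site match only if ligand a, moved by the site alignment, lies near ligand b."""
    moved = apply_transform(coords_a, rotation, translation)
    return ligand_rmsd(moved, coords_b, atom_pairs) <= cutoff
```

Because the transform comes from the binding site alignment alone, a low rmsd indicates that both ligands occupy equivalent positions in the two pockets, which is the property that would otherwise require visual inspection.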
Matches with a ligand rmsd above 3 Å were rejected. The application of this analysis led to a final set of 1,628 binding site matches
that revealed a correlation between binding site or structural similarity and promiscuity. In more detail, from a ligand-based perspective, 71% of the drugs have at least one target pair with a similar binding site, and for 18% of the drugs all of their targets are similar. The top four promiscuous drugs in terms of the number of targets are those with the most similar binding sites among their targets: benzimidamide (benzamidine)33, staurosporine (2)33, (2R,3R,4S)-3-acetamido-4-hydroxy-2-((1R,2R)-1,2,3-trihydroxypropyl)-3,4-dihydro-2H-pyran-6-carboxylic acid (DANA)33, and (2S,5S)-2,5-diamino-6-((2R,3S,4R,5R)-5-(6-amino-9H-purin-9-yl)-3,4-dihydroxytetrahydrofuran-2-yl)hexanoic acid (sinefungin)33. Additionally, the influence of global structural similarity was assessed using TM-align99. The investigations revealed that 55% of the protein pairs are structurally dissimilar and 15% of the pairs with similar binding sites are significantly dissimilar (TM-score below 0.5). Although local and global protein similarity correlated with ligand promiscuity, it was concluded that some off-target effects cannot be detected by conventional global sequence and structure alignment tools. This kind of similarity can only be identified by efficient binding site comparison methods, which underlines their impact on repurposing approaches. The authors concede that their dataset, built from data stored in the PDB and BindingDB138, is limited in terms of chemical space coverage of drugs and protein structure space. A comparison of their results to those of previous promiscuity studies shows that the number of promiscuous drugs in their analysis is much smaller. Nevertheless, the inclusion of structural data in the analysis led to new observations. In conclusion, the degree of drug promiscuity (number of off-targets) correlates only weakly with ligand flexibility, whereas structural similarity and, even more so, binding site similarity account for the promiscuity of known drugs. Altogether, 71% of the analyzed drugs
possess at least one pair of similar target binding sites. In conclusion, the results shed light on the impact of binding site comparison for the repurposing of known drugs.

The knowledge that the NSAIDs (nonsteroidal anti-inflammatory drugs) celecoxib (3)48 and valdecoxib (4)48 possess an unsubstituted arylsulfonamide moiety provided the basis for the first approach presented here. As this group is known to be present in a wide range of carbonic anhydrase (CA) inhibitors, similarities between the target of both inhibitors, COX-2 (cyclooxygenase-2), and CA were investigated48. A nanomolar affinity of the COX-2 specific inhibitors 3 and 4 for CA family members I, II, IV, and IX could be shown with the help of enzyme kinetics and crystallographic studies. Intriguingly, 4-(4-(methylsulfonyl)phenyl)-3-phenylfuran-2(5H)-one (rofecoxib)48, a COX-2 inhibitor harboring a methyl sulfone moiety, has no effect on those enzyme isoforms. Furthermore, compounds 3 and 4 show a much higher selectivity towards hCA II and hCA IX than towards hCA I and bovine CA IV as compared to known CA inhibitors. This finding was confirmed by results suggesting that the oral administration of 3 and 4 to glaucomatous rabbits leads to a lowered intraocular pressure. For CA II, a crystal structure was solved in complex with 3 (pdb 1oq5), showing a coordinative binding to the catalytic zinc ion in the enzyme’s active site. A COX-2 complex structure with a related p-bromo derivative of 3 (4-(4-(4-bromophenyl)-3-(trifluoromethyl)-1H-pyrazol-1-yl)benzenesulfonamide (SC-558)48) was also known (pdb 6cox), which allowed for a detailed comparison of the binding site geometries. The Cavbase44 approach was applied to represent the COX-2 sulfonamide anchor binding cavity by 25 pseudocenters according to the properties of the amino acids flanking the binding site. On the basis of this representation, a set of 9,433 ligand-containing cavities was screened with Cavbase44.
After ranking of the similarities and removal of COX cavities, the first CA cavity was retrieved at rank 38, with more following on
subsequent ranks. The subcavity bound to the trifluoromethyl substituent was also subjected to a Cavbase44 search. Again, 41 CAs were found within the 200 best ranked hits. The last screen was performed for the bromophenyl group accommodating subcavity of COX-2 and did not lead to an enrichment of CA pockets in the best ranked results. This was explained by different physicochemical properties of those cavities in both enzymes. However, the study allowed a matching of equivalent residues between COX-2 (pdb 6cox) and CA II (pdb 1bn4) with respect to the binding of compound 3. It provided insights into the residues crucial for binding of this inhibitor type, thus facilitating the creation of structure-based pharmacophores. Finally, the analysis presents new indications for COX-2 selective NSAIDs with respect to glaucoma and anticancer therapy.

The second example95 represents an approach that uses repositioning to treat multi-drug resistant (MDR) and extensively drug resistant (XDR) tuberculosis. A chemical systems biology approach was applied to reveal the off-targets of pharmaceuticals on a proteome-wide scale. It was predicted that the drugs entacapone (5)95 and tolcapone (6)95, originally prescribed for the treatment of Parkinson’s disease, might be useful for the treatment of MDR and XDR tuberculosis. The investigations of this effect led to the assumption that the drugs inhibit the enzyme InhA by directly binding to its substrate binding site. The known drugs isonicotinohydrazide (isoniazid)95 and 2-ethylpyridine-4-carbothioamide (ethionamide)95 are prone to resistance mechanisms, while others that require no enzymatic activation avoid the resistance mechanism of MDR Mycobacterium tuberculosis (Mtb). To validate the finding that S-adenosylmethionine (SAM)-dependent methyltransferases and NAD-binding enzymes share similar binding site properties, the binding site of catechol-O-methyltransferase (COMT) was compared to NAD-binding enzymes.
This protein is involved in the breakdown of
catecholamine-based neurotransmitters such as dopamine, with impact on Parkinson’s disease. Known drugs are the COMT substrate (S)-2-amino-3-(3,4-dihydroxyphenyl)propanoic acid (levodopa)95, 5 and 6. A novel polypharmacology prediction process was developed to assess the possibility that the latter two drugs might bind to other interesting targets. The applied workflow includes the extraction of the binding site of a commercially available drug or its prediction based on a 3D structure or homology model of the target, the prediction of off-targets with similar ligand binding sites across the proteome, the evaluation of the interactions between the drugs and putative off-targets using protein-ligand docking, and a final optimization step to increase the compound’s potency, selectivity, and ADME properties. Using the SOIPPA94 algorithm, it was shown that the NAD binding site of the Rossmann fold and the SAM binding site of SAM methyltransferases share significant similarities. Compounds 5 and 6 were docked into 215 non-redundant NAD-binding proteins from the PDB to identify similar ligand binding sites in the cofactor binding site’s vicinity. The results of those studies suggested that enoyl-acyl carrier protein reductases (InhA) from several organisms show favorable binding affinities for those drugs. In particular, the protein from Mtb (pdb 4tzk) attracted the researchers’ attention. The enzyme is essential for type II fatty acid biosynthesis and the synthesis of the bacterial cell wall. Therefore, it represents a promising drug target. The next step was the search for known InhA inhibitors. The 2D molecular similarities between the 22 known inhibitors that were found and compounds 5 and 6 were calculated. This analysis revealed that the 2D similarities between 5 and known InhA inhibitors were statistically insignificant.
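A 2D similarity comparison of this kind can be sketched as follows. The text does not state which similarity measure or fingerprint was used in the cited study; the Tanimoto coefficient on fingerprint bit sets is assumed here as a common choice, and the function names and the dictionary of reference fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: similarity defined as 0 by convention here
    return len(a & b) / len(union)


def max_similarity_to_references(query_bits, reference_fps):
    """Highest Tanimoto similarity of a query to any reference (name -> bit collection)."""
    return max(tanimoto(query_bits, ref) for ref in reference_fps.values())
```

A low maximum similarity to all known inhibitors, as reported for compound 5, means that a purely ligand-based 2D search would probably not have retrieved the compound.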
As a crystal structure of human COMT, the original target of those drugs, was not available, a structure of brown rat COMT (pdb 2cl5) sharing 81% sequence similarity with the human protein was used for docking of existing and potential InhA inhibitors together with nine InhA structures. On the basis of the resulting
binding poses, the authors discussed several similarities between the binding poses of 5 and 6 and those of known InhA inhibitors despite the low overall 2D similarity. Additionally, a binding pose analysis of putative InhA inhibitors and a calculation of logP and logD values of existing and putative antitubercular drugs were performed. Although both 5 and 6 had higher logP and logD values than existing antitubercular drugs, it was argued that their higher hydrophobicity allows for easier permeation of the Mtb cell envelope. A validation of this prediction by in vitro and InhA kinetic assays showed that 5 has a MIC99 of about 260 µM when tested on Mtb. It inhibits InhA activity by 47% at a concentration of 80 µM. With respect to the results, the authors suggest the inclusion of their pipeline in drug discovery to find new drug leads with desired safety profiles.

In another study100 it was shown that the tyrosine kinase inhibitor pazopanib (7)100 inhibits AChE activity at submicromolar concentrations, with impact on the reversal of memory and cognitive deficits in a rat model of neurodegeneration. A combination of 2D virtual screening, ligand screening for 3D shape, and binding site similarity comparison led to the finding that 7 is able to restore memory loss and cognitive dysfunction to a similar extent as the marketed AChE inhibitor donepezil (8)100. This is not only an impressive example of how computational tools can facilitate the process of repositioning but also shows the impact of previous studies using binding site comparison methods on drug design. The authors performed a docking study of 1,385 FDA-approved small molecule drugs for the binding to AChE and used the resulting scores for compound ranking. The top hits resulting from this approach were three tyrosine kinase inhibitors (7, sorafenib (9)100, and sunitinib (10)100) aside from the known AChE inhibitor 8.
Owing to the availability of known compounds inhibiting AChE activity, a ligand-based approach could be applied. A 3D shape comparison between 8 and the 1,385 FDA-approved
drugs was performed using Tanimoto scores and electrostatic similarity. The final hits were reranked using a combination of shape, Tanimoto, and electrostatic similarity. Nearly one third of the hits belong to the class of antipsychotic and antidepressant drugs. Compounds 7, 9, and 10 were also among the best scored hits. Additionally, pairwise binding site comparison of AChE (pdb 1eve) and 1,105 PDB X-ray structures of 377 FDA-approved drug targets using TM-align99 revealed that 14 out of the 20 top ranked identified targets belong to the protein kinase family (mainly tyrosine kinases). In conclusion, all three methods (docking, ligand-based, and structure-based screening) led to consistent results. Furthermore, the results were confirmed by an enzymatic assay. IC50 values of 0.93 and 5.87 µM were determined for 7 and 10, respectively. Altogether, 13 tyrosine kinase inhibitors were tested. Compound 9 and some others proved to be weak AChE binders. An MD simulation revealed that compounds 7 and 8 do not directly interact with the catalytic triad but bind via hydrophobic contacts with adjacent residues. To substantiate its in vitro efficacy, compound 7 was tested in vivo on cognition impairment in rat models, on spatial memory impairment, and in a test measuring the short-term working memory of model rats treated with different compounds. The results suggest that the compound restored memory loss and cognition impairment in rat models. It reduced learning and memory deficits in a dose-dependent manner. This study provides an outstanding example of the benefits of drug repositioning in terms of pharmacodynamic and pharmacokinetic properties.

Another study93 based on the SMAP91 algorithm shows that repurposing of known kinase inhibitors is a promising tool to cope with alternative disease targets.
Encouraged by a previous finding139, the binding site of β-secretase (BACE-1) as a potential drug target to cope with Alzheimer’s disease (AD) was compared against 24 well-characterized receptor tyrosine kinases using TM-align99. EGFR was predicted to have the highest global structural similarity to
BACE-1. SMAP91 was used to compare both binding sites with each other, highlighting highly similar functional sites. Afterwards, 13 FDA-approved drugs were screened against BACE-1. It could be shown that the EGFR inhibitor gefitinib (11)93 had the strongest effect on BACE-1 activity (IC50 = 20 µM), while the remaining inhibitors showed IC50 values above 100 µM. Cell viability assays with 11 had good outcomes. In vitro studies suggest that compound 11 has an impact on the metabolic products of the amyloid precursor protein, some of which are components of amyloid plaques in brains of AD patients. The binding of 11 was further analyzed via docking studies and MD simulations, providing a thorough study of this inhibitor and sufficient evidence for further testing.

The last example64 shows the use of metabolic network construction and binding site comparison for the identification of promising targets and drug repurposing. An accurate metabolic reconstruction of the Gram-positive anaerobic bacterium Clostridium difficile led to the identification of enzymes essential for survival. The identified essential genes were compared to those of Bacillus subtilis homologs (for which experimental data exist). Comparative models were generated for 123 essential genes. The largest binding cavities were identified. Those binding sites were compared to a set of 7,339 non-redundant small molecule binding sites using IsoCleft Finder63. For each protein, the closest match was extracted and analyzed in more detail. For 29 predicted essential C. difficile targets, 41 homologous proteins were found in the database DrugBank20. Altogether, 125 molecules were identified (22 of them approved) that might bind to one of the analyzed potential targets. Given that those molecules could also bind to the C. difficile homolog, the analysis provides starting points for further evaluation.

Polypharmacology Prediction


The principle of polypharmacology relies on the idea that one drug is able to modulate multiple targets leading to desired effects. Therefore, this drug is more effective than drugs designed according to the one-drug-one-target paradigm. A single drug could fulfill tasks that were previously achieved by combination therapies that bear a higher risk of undesired adverse effects. But the question is: Can promiscuous drugs be rationally designed? The multitude of computational approaches useful to predict the activity profile of ligands to a set of targets is discussed in an article by Rastelli and Pinzi140. The impact of binding site comparison approaches and protein-ligand interaction patterns on the design of molecules acting on more than one target is examined in a review by Salentin et al.141. Protein binding site comparison can facilitate the design of new small molecular modulators of bioactivity as indicated by the following examples.

The first example92 deals with the HIV protease inhibitor nelfinavir (12)92, which exhibits pleiotropic effects in cancer cells. By a SMAP91 binding site comparison of the compound’s binding pocket in the HIV protease dimer (pdb 1ohr) against 5,985 PDB structures of human proteins or homologs of human proteins, 126 similar structures were found. Starting from the superimposed binding sites, compound 12 was docked into all identified similar binding sites. After removal of those proteins for which the docked ligand clashes, 92 potential off-targets were left to be investigated. The seven top ranked proteins belong to the aspartyl protease family, whereas 51 are protein kinases and 17 are nucleotide binding proteins. Of all identified protein kinases, mainly belonging to the tyrosine kinase, cAMP- and cGMP-dependent kinase, or protein kinase C families, the 12 best ranked were subjected to exhaustive docking studies.
Further investigations using MD simulations and MM/GBSA free energy calculations led to the conclusion that 12 directly interacts with EGFR, while it is unlikely to bind to FGFR (fibroblast growth factor receptor), EphB4, and Abl. For six other kinases (IGF-1R (insulin-like growth factor 1 receptor),
FAK (focal adhesion kinase), Akt2, CDK2, ARK (β-adrenergic receptor kinase), PDK1 (phosphoinositide-dependent kinase 1)) it was concluded that binding might be possible. A high-throughput screening with a compound concentration of 20 µM was performed for EGFR, ErbB2, ErbB4, and all Akt isoforms. A low inhibition was detected for ErbB2, whereas EGFR inhibition was more pronounced. The authors argue that the anticancer activity is not due to aggregation of 12. Compared to most cellular studies that required higher inhibitor concentrations, the compound exhibited specific anticancer activity without non-specific binding. Furthermore, they refer to an article by Gills et al.142 who showed that 20 µM 12 reduced the activation of EGFR, IGF-1R, and Akt signaling pathways. It was concluded that the compound might inhibit other protein kinases through weaker interactions. Finally, the computational findings were discussed in the context of biological network and signal transduction pathway analysis. All these facts underline the applicability of compound 12 as a potent anticancer agent in terms of polypharmacology. Data provided by the National Toxicology Program, which can be found in the ChEMBL database19, now substantiate this finding. An IC50 of approximately 56 µM was measured for EGFR inhibition by compound 12 (kinase substrate: Poly(Glu-Tyr)).

The method PocketMatch75 proved to be useful to explain the reasons for polypharmacology observed for the competitive antagonist of serotonin type 2B/2C metabotropic receptors (5-HT2B/2CR) SB-206553 (13)143.77 This drug also acts as a positive allosteric modulator of the ionotropic α7 nAcChR with an EC50 of 1.5 µM for potentiation of calcium responses to EC20 nicotine143. Homology models of the extracellular and transmembrane domain of α7 nAcChR and 5-HT2CR were generated.
Explicit MD simulations of the protein structures in hydrated palmitoyl-oleyl-phosphatidyl-choline bilayer membranes were performed for all final models to
evaluate the model quality. Blind docking of 13 into α7 nAcChR, 5-HT2CR, and 5-HT2BR without the definition of a binding site and further MD simulations of the obtained complexes led to the identification of putative binding sites in all three receptor proteins. This enabled the authors to perform an exhaustive binding site comparison using the PocketMatch algorithm with minor modifications. The binding site was represented as a sorted list of distances between amino acids at a distance range of 3.5 to 8 Å from the docked ligand. Each residue was classified according to its physicochemical properties. The binding sites were aligned using a threshold of 0.5 Å to account for the dynamics of macromolecular systems. The binding sites of 5-HT2BR and 5-HT2CR were highly similar, as expected given their homology. This explains the similar affinities of 13 for both proteins144. The comparison of the α7 nAcChR extracellular domain binding site with those of both serotonin receptors revealed that the binding sites are similar with respect to size and chemical nature of the pocket-lining residues. No significant similarity to the serotonin receptor binding sites was found for the putative binding site of compound 13 in the transmembrane domain of nAcChR. Altogether, the study revealed the reasons for the polypharmacological properties of the investigated drug and facilitates the structure-based design of novel drugs acting at both receptor types.
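The idea of comparing binding sites through sorted distance lists can be illustrated with a simplified sketch. It is not the PocketMatch implementation or the modified variant used in the study: the use of one representative point per residue, the greedy pairing of sorted distances, and the normalization by the larger site are assumptions made for illustration, and all function names are hypothetical. Only the 0.5 Å tolerance is taken from the text.

```python
import math
from collections import defaultdict
from itertools import combinations


def distance_lists(site):
    """site: list of (group_label, (x, y, z)). Returns {(g1, g2): sorted distances}."""
    bins = defaultdict(list)
    for (g1, p1), (g2, p2) in combinations(site, 2):
        key = tuple(sorted((g1, g2)))  # order-independent pair of residue groups
        bins[key].append(math.dist(p1, p2))
    return {key: sorted(values) for key, values in bins.items()}


def count_matches(d1, d2, tol=0.5):
    """Walk two sorted distance lists in parallel, pairing values that differ by <= tol."""
    i = j = matched = 0
    while i < len(d1) and j < len(d2):
        diff = d1[i] - d2[j]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    return matched


def site_similarity(site_a, site_b, tol=0.5):
    """Fraction of matched distances, normalized by the larger site (an assumption)."""
    lists_a, lists_b = distance_lists(site_a), distance_lists(site_b)
    matched = sum(
        count_matches(lists_a[key], lists_b[key], tol)
        for key in lists_a.keys() & lists_b.keys()
    )
    total = max(sum(map(len, lists_a.values())), sum(map(len, lists_b.values())))
    return matched / total if total else 0.0
```

Because only distances within the same pair of residue groups are compared, such a representation needs no prior superposition of the two sites, and the tolerance absorbs small coordinate differences such as those arising from protein dynamics.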


Figure 5. Structures of compounds analyzed in repurposing and polypharmacology studies.

Analysis of Protein-Ligand Interactions

In recent years, the generation and prediction of protein-ligand interaction networks became a promising tool for drug design. Various approaches deal with the construction and mapping of drug-oriented interaction networks. A comprehensive review145 introduces the structural bioinformatics of the interactome and its application in drug development. In this section, the impact of binding site comparison on this field is discussed. Yang et al.45 constructed an in silico drug-target network for ten AD drugs, 47 randomly chosen molecules from DrugBank20, and a set of 401 human protein pockets. Their docking results suggested that drugs targeting AChE (the only validated AD target) might also act on HDACs and members of the estrogen receptor family. The pocket comparison algorithm
developed by Schmitt et al.44 revealed some promising similarities between HDAC7 (pdb 3c0y) and AChE (pdb 1f8u) active sites, indicating a similar nature of contacts to their respective ligands. Inverse docking and additional docking studies were performed to support those findings. Intriguingly, the authors state that the relationship between both enzymes is not determined by the shape of the pocket or sequence similarities, but by the interactions with AD drugs and HDAC inhibitors.

Cross-reactivity with various ATP-binding targets could also be shown using SiteAlign11 for the similarity measurement between ligand-annotated binding sites.88 The authors extracted all druggable protein-ligand binding sites from the database sc-PDB8,9 and screened them for their similarity to the ATP-binding site of Pim-1 kinase (pdb 1yhs). Protein kinases were removed from the dataset to exclude obvious similarities. Proteins which were present in a single copy were also eliminated. SiteAlign11 was applied to the remaining complexes. The synapsin I (pdbs 1aux, 1px2) binding site showed the highest similarity to Pim-1 kinase. A detailed analysis revealed a relationship between binding site-lining residues (Figure 6). The docked pan-kinase inhibitor 2 showed promising binding modes for synapsin I. In fact, compound 2 binds to synapsin I, inhibits ATP-binding, and even affects the synapsin I-dependent F-actin bundling. Further analysis indicated similarities of synapsin I to the kinases CDK2 and CKII (casein kinase II), whereas other ATP-binding proteins were classified as being dissimilar (e.g. PKA, HSP90α, CHK1, DNA topoisomerase II, diacylglycerol kinase). To confirm those in silico results, eight high affinity inhibitors for the eight representative targets were purchased and tested for in vitro inhibition of synapsin I. The Pim-1 kinase inhibitor 2-(3,4-dihydroxyphenyl)-3,5,6,7-tetrahydroxy-4H-chromen-4-one (quercetagetin)88 proved to be the most potent effector (IC50 = 0.15 µM).
Roscovitine (14)88 and 2-(4,5,6,7-tetraiodo-1,3-dioxoisoindolin-2-yl)acetic acid (70159800251)88, both inhibitors of CDK2 and CKII, also acted ATP-competitively, but with higher IC50 values for synapsin I (1.0 and 0.5 µM, respectively). The activities of the other protein kinase inhibitors were negligible. Altogether, this binding site comparison approach significantly contributed to the extension of the protein kinase inhibitor interactome.
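
The dataset preparation described for the synapsin I example (removing kinases and single-copy proteins, then ranking the remaining sites by their similarity to the query site) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `SiteEntry` record, the `shortlist` function, and all identifiers and scores are hypothetical placeholders, and a real screen would take its scores from SiteAlign.

```python
from collections import Counter
from typing import List, NamedTuple


class SiteEntry(NamedTuple):
    site_id: str     # hypothetical identifier of a ligand-annotated binding site
    protein: str     # hypothetical protein name
    is_kinase: bool  # True if the protein is annotated as a protein kinase
    score: float     # placeholder similarity to the query site (higher = more similar)


def shortlist(entries: List[SiteEntry], top_n: int = 3) -> List[SiteEntry]:
    """Drop kinases and single-copy proteins, then rank by similarity score."""
    # Step 1: remove protein kinases to exclude obvious similarities.
    non_kinase = [e for e in entries if not e.is_kinase]
    # Step 2: remove proteins that occur only once in the remaining set.
    copies = Counter(e.protein for e in non_kinase)
    kept = [e for e in non_kinase if copies[e.protein] > 1]
    # Step 3: rank the remaining sites by decreasing similarity.
    return sorted(kept, key=lambda e: e.score, reverse=True)[:top_n]


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    toy = [
        SiteEntry("s1", "protein_A", False, 0.80),
        SiteEntry("s2", "protein_A", False, 0.75),
        SiteEntry("s3", "protein_B", False, 0.90),  # single copy, dropped
        SiteEntry("s4", "kinase_C", True, 0.95),    # kinase, dropped
        SiteEntry("s5", "kinase_C", True, 0.93),    # kinase, dropped
        SiteEntry("s6", "protein_D", False, 0.60),
        SiteEntry("s7", "protein_D", False, 0.85),
    ]
    for entry in shortlist(toy):
        print(entry.site_id, entry.protein, entry.score)
```

Whether single-copy proteins are counted before or after the kinases are removed is an assumption of this sketch; the text does not specify the order.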

Figure 6. Alignment of synapsin I to the protein kinase Pim-1 with respect to the ATP-binding site. The structure of Pim-1 is shown in forest green (pdb 3a99) with the bound ATP analog represented as sticks. Synapsin I (pdb 1aux) is colored orange. (a) Alignment of both proteins, which differ in fold. (b) Detailed view of the aligned ATP-binding sites and the cavity-lining residues. Figures were generated using UCSF Chimera 1.10.1 (ref 131).

The algorithm PocketFEATURE73 was applied to explain the mechanism of action of the antimicrobial allosteric inhibitor PC190723 (15)146.74 The compound inhibits Staphylococcus aureus GTPase FtsZ activity in vitro and showed in vivo antimicrobial activity against Gram-positive bacteria.146 Its mechanism in the different bacterial species is still unknown. The use of PocketFEATURE73 and MD simulations provided insights into the inhibitor’s properties. Initially, proteins from different species were aligned and pocket residues identified. In a second step, microenvironments around the functional centers of a residue were defined and physicochemical properties calculated. Afterwards, a modified Tanimoto coefficient was calculated for each pocket pair. The scores obtained were evaluated against a set of equivalent microenvironment pairs from 1,116 non-redundant PDB structures. An optimum binding pocket of compound 15 was defined as the microenvironment of 20 residues within a 6 Å radius of the bound compound in a co-crystal structure of the enzyme from Staph. aureus (pdb 4dxd). After performing MD simulations with the FtsZ structures of Staph. aureus (pdb 3vo8), Staphylococcus epidermidis (pdb 4m8i), and B. subtilis (pdb 2rhl), amino acid pocket scores were calculated for the equilibrated structures. It could be shown that the conformation of the FtsZ binding pocket is strongly influenced by the species, allosteric binding, genetic perturbations, and polymerization state. In particular, oligomerization and allosteric binding of GTP seem to stabilize the pocket. Although the proteins from different organisms show a high overall structural similarity, the binding site differences lead to different affinities and experimental outcomes. Varying responses of SaFtsZ and BsFtsZ could be explained by different effects of bound nucleotides on the binding site architecture. One study suggested that the enzymatic activity of SaFtsZ is increased by 15, while no such behavior could be observed for BsFtsZ147. In contrast, the compound’s antimicrobial properties in Staph. aureus and B. subtilis agreed with the in vitro results of another study (the drug acts through excess polymer stabilization)148. The computational approach revealed that 15 might bind to another site of the B. subtilis enzyme or act through another mechanism on BsFtsZ, as its binding site architecture is different from known binding pockets of 15 (see also Figure 7). Resistance variants were analyzed likewise.
The similarity score to the “optimal pocket” considerably decreased after introduction of the mutations. This study underlines the impact of binding site comparison and similarity measurements on the evaluation of binding affinities. MIC values for the compound from the ChEMBL database19 for different organisms support these findings.
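
To make the pocket scoring step more concrete, the sketch below computes a standard continuous Tanimoto coefficient between two microenvironment feature vectors and averages the best matches over a pocket. It rests on several assumptions: PocketFEATURE uses a modified Tanimoto coefficient whose exact form is not given in the text, so the standard formula stands in for it here, and the function names, the best-match averaging, and the toy vectors are hypothetical.

```python
from typing import Sequence


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def pocket_score(pocket_a: Sequence[Sequence[float]],
                 pocket_b: Sequence[Sequence[float]]) -> float:
    """Mean best-match similarity of pocket_a's microenvironments in pocket_b.

    Assumed scoring scheme for illustration; note that it is not symmetric
    in its two arguments.
    """
    if not pocket_a or not pocket_b:
        return 0.0
    best = [max(tanimoto(env_a, env_b) for env_b in pocket_b) for env_a in pocket_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Invented two-feature microenvironments, for illustration only.
    reference = [[1.0, 0.0], [0.0, 1.0]]
    candidate = [[1.0, 0.0], [1.0, 1.0]]
    print(pocket_score(reference, candidate))
```

In a study like the one described, such a score would be computed against a reference pocket for each species or resistance variant and compared with a background distribution of scores.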

Figure 7. Comparison of FtsZ from Staph. aureus, Staph. epidermidis, and B. subtilis with respect to the SaFtsZ binding site of compound 15. (a) Structure of 15. (b) Overall alignment of SaFtsZ with bound 15 in ball-and-stick representation in green (pdb 4dxd), SeFtsZ in purple (pdb 4m8i), and BsFtsZ in orange (pdb 2rhl). (c) Binding site alignment of all three proteins. Their structures share highly similar binding site residues (represented as sticks, names according to crystal structures). Nevertheless, they differ with respect to their binding site geometry. Figures (b) and (c) were created using UCSF Chimera 1.10.1 (ref 131).

Analysis of Protein-Protein Interactions

Host-pathogen interactions, signal cascades, and regulation mechanisms of all organisms rely heavily on PPIs. This led to a huge interest in understanding the nature of those interactions and to the emergence of various methods for their computational analysis. Two reviews149,150 give an overview of the structural bioinformatics of PPIs as well as of the methods currently applied to predict protein-protein interfaces.

In the following example90 the binding site comparison algorithm SiteEngine89 was used to scan molecular surfaces of all PDB protein chains with a template ubiquitin-binding domain (UBD) to screen for new potential UBDs. Ubiquitylation signals play a significant role in trafficking of endogenous and retroviral transmembrane proteins. It was shown that the blocking of distinct UBDs in vivo can influence retroviral budding. The workflow leading to those results included: i) choice of a template UBD (pdb 3k9p), ii) binding site comparison, and iii) superimposition of the template UBD on the candidate binding patch and search for energetically favorable binding patches in the immediate vicinity. The authors identified the domain ALIX-V (pdb 2ojq) as a potential new UBD and confirmed the in silico findings by biophysical affinity measurements. Later, another study151 proved this hypothesis. Additionally, it could be shown that the yeast Alix homolog Bro1 functions as a ubiquitin receptor for protein sorting into multivesicular endosomes.152

The binding site comparison method CPASS53 enabled the elucidation of a structural and functional similarity between the type III secretion system needle protein PrgI from Salmonella typhimurium and the eukaryotic apoptosis Bcl-2 proteins.54 The authors combined NMR ligand affinity screening using a fragment-based functional library and bioinformatics methods to reveal relationships between otherwise unrelated proteins. Chemical shift perturbations in the 2D 1H-15N HSQC experiments between free PrgI and the complex of PrgI and its previously identified ligand N-decyl-N,N-dimethyldecan-1-aminium bromide (DDAB)54 helped to identify residues constituting the binding site. Docking and active site comparison revealed that PrgI is similar to the anti-apoptosis regulating protein Bcl-xL (pdb 1ysn) complexed to (R)-4-(4-((4'-chloro-[1,1'-biphenyl]-2-yl)methyl)piperazin-1-yl)-N-((4-((4-(dimethylamino)-1-(phenylthio)butan-2-yl)amino)-3-nitrophenyl)sulfonyl)benzamide (ABT-737)54, an acylsulfonamide-based inhibitor. Although both ligands are quite dissimilar from a purely chemical point of view, they show similarities in their target interactions. This finding led to the prediction of 1,2-dimethoxy-12-methyl-[1,3]dioxolo[4',5':4,5]benzo[1,2-c]phenanthridin-12-ium (chelerythrine)54 as a potential PrgI binder. Docking of this compound into a binding site identified by NMR studies of PrgI suggested that both proteins bind the ligand at similar binding sites and share a locally limited structural similarity. Interestingly, both binding sites are essential for PPIs, pointing to their impact on protein oligomerization.

Off-Target Prediction

Given the huge space of protein binding sites, the prediction of potential off-targets is a crucial step for target validation as well as for the choice of a suitable lead. Usually, demanding experimental studies are required during lead optimization. Although prominent off-targets are already known for many compounds, the identification of distant drug off-targets153 is a valuable tool to give hints towards compound optimization.

Paolini et al.154 analyzed the pharmacological relationships of an assembly of annotated pharmacological data. The dataset contained 4.8 million non-redundant chemical structures. About 275,000 biologically active compounds, more than 600,000 SARs of molecular binding from Pfizer’s internal screening files, commercial screening data, competitive intelligence on approved and investigational drugs, and key components of 25 years of published medicinal chemistry data were analyzed. The binding affinity threshold was set to 10 µM and compounds violating at least one of Lipinski’s rule-of-five criteria were excluded from further analysis. At least one binding molecule is known for 727 targets, whereas 529 of them have at least one compound with a binding affinity below 100 nM. After the exclusion of non-specific aggregation inhibitors, 276,122 compounds were analyzed with regard to known targets. This study revealed that 65% of them have recorded activity for a single target, whereas 35% hit more than one target. The results show that the activity of a compound on various targets can not only be exploited for designing more efficient drugs, but can also lead to severe side effects. Importantly, the various possible combinations between 276,122 compounds and at least 727 human targets leave room for various computational big data approaches to address this challenge. An analysis of PubChem21 data revealed that 57.7% of the compounds with confirmatory bioassays showed single-target activity and that the number of compounds drops exponentially when the number of targets increases.155 Nevertheless, we have to admit that this kind of analysis is speculative as long as the drug-target matrix is not completely filled. Another study attempted to fill this matrix with the help of docking on high-performance computing machines, supporting the idea that the available drug on-target interaction data is not sufficient to statistically assess the off-target space.156 The following paragraph is dedicated to examples where binding site comparison methods assisted in the prediction of off-targets.

Coping with cross-reactivity of protein kinase inhibitors is a well-known challenge in medicinal chemistry. Kinnings and Jackson68 tried to shed light on the question of how binding site comparison can help to identify potential cross-reactivity of certain well-known kinase inhibitors. They compared the ATP-binding sites of 354 structures by using a geometric hashing algorithm67. Protein kinases were extracted from the PDB on the basis of their PFAM157 families Pkinase (PF00069) and Protein tyrosine kinase (PF07714).
All protein chains were superimposed onto a cAMP-dependent protein kinase structure (pdb 1atp) as a typical protein kinase in terms of backbone architecture. Although the algorithm takes no ligand atoms into consideration, all ATP-binding sites and their respective ligands were successfully aligned. Afterwards, a “compound ligand” composed of all the ligand atoms of the aligned structures was created. To exclude less populated regions, atoms were removed if fewer than 50 atoms of other ligands were within 1.5 Å. The binding site was defined by choosing all atoms within a 5 Å radius of the “compound ligand”. The resulting binding sites were subsequently compared to each other using geometric hashing. Similarity scores were calculated for atom-atom correspondences (same element and relative spatial orientation). Additionally, the sizes of the two binding sites were taken into account. Binding sites were clustered according to the final similarity score. This structure-based comparison was evaluated against a sequence-based similarity analysis. It could be shown that various structural similarities exist between protein kinases sharing low sequence identity. Additionally, the authors provide evidence that several differences in the binding site conformations exist, although the respective kinases show high sequence similarity. Therefore, one can conclude that binding site comparison is strongly dependent on the state of the respective kinases. For three of nine kinase inhibitors (p38α inhibitors 4-(4-(4-fluorophenyl)-2-(4-(methylsulfinyl)phenyl)-1H-imidazol-5-yl)pyridine (SB-203580)68 and 1-(3-(tert-butyl)-1-(p-tolyl)-1H-pyrazol-5-yl)-3-(4-(2-morpholinoethoxy)naphthalen-1-yl)urea (doramapimod)68, CDK5 inhibitor 14 (ref 68)) the in silico findings in terms of enrichment factors were in line with an in vitro screening, while high enrichment factors were also observed for the remaining compounds. The same comparison for another dataset led to similar results. Nevertheless, the achieved and highest achievable enrichment factors for imatinib (16)68, N-(3-chloro-4-((3-fluorobenzyl)oxy)phenyl)-6-(5-(((2-(methylsulfonyl)ethyl)amino)methyl)furan-2-yl)quinazolin-4-amine (lapatinib)68, 9, and (R)-4-(1-aminoethyl)-N-(pyridin-4-yl)cyclohexane-1-carboxamide (Y27632)68 deviate to a huge extent. Additionally, the binding of the EGFR inhibitor N-(3-ethynylphenyl)-6,7-bis(2-methoxyethoxy)quinazolin-4-amine (erlotinib)68 to c-Src, Lck, and c-Abl could be rationalized. A consensus scoring procedure was performed for the c-Abl inhibitor 16 to explain its inhibitory profile (calculation of the mean similarity score of the c-Abl binding site to all kinase binding sites). The results were re-ranked accordingly. This procedure is described in more detail elsewhere158. The most similar binding site was that of c-Src, although this protein kinase is not inhibited by compound 16 because it does not adopt the DFG-out conformation (inactive state) required for binding. Lck, c-Kit, and Hck also show significant similarities to c-Abl. Inhibition constants for 16 are in the nanomolar range for those three proteins. The similarity between Lck and c-Abl hints at a possible immunosuppressant effect of compound 16. Altogether, it can be concluded that this case study of binding site comparison for kinases led to reasonable results. Application of the method of Brakoulias & Jackson67 could facilitate the search for potential protein kinase inhibitor cross-reactivity.

The next example deals with the off-target prediction for cannabinoid receptor 1 (CB1R) antagonists used for the treatment of obesity.66 The first antagonist, 5-(4-chlorophenyl)-1-(2,4-dichlorophenyl)-4-methyl-N-(piperidin-1-yl)-1H-pyrazole-3-carboxamide (rimonabant)66, was withdrawn due to psychiatric side effects. This led to the development of the 3,4-diarylpyrazoline derivative ibipinabant (17)66. Muscle toxicity of this drug was observed in a preclinical dog study. Furthermore, it caused mitochondrial dysfunction as measured by cellular generation of reactive oxygen species and mitochondrial ATP production. It was analyzed whether compound 17 inhibits one of the enzymes of the respiratory chain (Complexes I-V) by measuring the oxygen consumption in C2C12 myoblasts after preincubation with the drug. In comparison to control cells, no differences were observed. It was concluded that drug exposure leads to decreased ADP availability in the mitochondrial matrix, as suggested by measurement of the mitochondrial membrane potential. Supported by those results, an in silico study followed using the pharmacophore-based binding site comparison method KRIPO65. A homology model of CB1R was built. From the list of similar targets with potential impact on mitochondrial ADP/ATP exchange, ANT1 (adenine nucleotide translocase 1) and VDAC (voltage-dependent anion channel) were manually selected. A subsequent docking approach with ANT1 (pdb 2c3e) suggested that binding of 17 to ANT1 is likely to occur. A comparable observation for binding to VDAC was not possible due to its large pore, which allows the transport of various small molecules. The results were verified by measuring the ADP uptake into isolated bovine heart mitochondria. The application of compound 17 led to a decrease of mitochondrial VDAC-dependent and ANT-dependent ADP import. C2C12 mitoplasts were isolated to shed light on the question of which of the two membrane transporters is inhibited. Maximal complex I- and complex II-specific respiratory rates were measured in the presence and absence of the drug. The results suggest a reduced respiration due to the inhibition of the ANT-mediated ADP/ATP exchange. Additionally, a negligible effect of a closely related 3,4-diarylpyrazoline derivative was observed. Together, those experimental results support the computationally obtained results.

Another example of successful off-target prediction was the finding that peroxisome proliferator-activated receptor α (PPARα) and COX enzymes share similarities leading to the binding of similar ligands.85 A combination of different computational methods was applied to a known PPARα ligand. Data from chemical structures of ligands of the identified new target, the textual patient package insert for the query ligand, drugs that modulate the new target, and crystallographic structures were analyzed.
The applied methods produce a set of scores (ligand structural similarity, package insert similarity) or a single score (docking and protein pocket similarity) for known drugs, and the final score is calculated from all scores including p-values. First, 602 drugs and 91 diverse protein targets were chosen to investigate drug-target relationships by deriving a drug-target matrix. The uncharacterized potential drug-target interactions were scored using a combination of 3D ligand comparison and package insert similarity. The top ranked results had HIV reverse transcriptase as predicted target and viral or human polymerases as the intended targets. Among the analyzed compounds are examples with known HIV reverse transcriptase activities: 2-amino-9-((1S,3R,4S)-4-hydroxy-3-(hydroxymethyl)-2-methylenecyclopentyl)-3,9-dihydro-6H-purin-6-one (entecavir)85, 1-((2R,3R,4S,5R)-3,4-dihydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-1H-1,2,4-triazole-3-carboxamide (ribavirin)85, 4-amino-1-((2R,4R,5R)-3,3-difluoro-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)pyrimidin-2(1H)-one (gemcitabine)85, and 2-((2-amino-6-oxo-3,6-dihydro-9H-purin-9-yl)methoxy)ethyl L-valinate (valacyclovir)85. The next best scored examples included a relationship between PPARα, COX-1, and COX-2 with regard to the known drug gemfibrozil (18)85. Additionally, the free acids of two other fibrates (clofibric acid (19)85, fenofibric acid (20)85) were included in subsequent calculations. An alignment of the PPARα agonist 2-(1-(4-chlorobenzoyl)-5-methoxy-2-methyl-1H-indol-3-yl)acetic acid (indomethacin)85 and compound 20 showed a good overlay in terms of chemical features such as surface or electrostatic similarity. The PPARα agonists 18, 19, and 20 were predicted as potential COX ligands. COX protein structures (pdbs 2oyu, 3kk6, 3n8x, 3n8z for COX-1; pdbs 1pxx, 3ln1, 3nt1, 3rr3, 4cox for COX-2) were used for docking of the three ligands. After the alignment of all binding sites, docking into multiple structures was performed. The docking results were similar for COX-1 and COX-2. Compound 18 yielded the most significant scores. The created alignment between the new potential ligands and the known NSAIDs supported the hypothesis that PPARα agonists are COX ligands. PSIM84 was applied to compare the binding sites of both proteins, which share no significant sequence similarity and belong to different CATH4 and SCOP5 families. The overall structural comparison using TM-align99 yielded a TM-score below 0.4. All human PPARα structures (pdbs 1i7g, 1k7l, 1kkq, 2npa, 2p54, 2rew, 2znn, 3et1, 3fei, 3g8i, 3kdt, 3kdu, 3sp6, 3vi8) were used for the comparison to nine COX structures. It could be shown that some hydrogen bond and hydrophilic features of the binding sites overlap well. COX enzyme assays were performed to validate the in silico results for compounds 18, 19, and 20. The presence of 250 µM of 18, 19, and 20 led to 18, 14, and 48% inhibition of COX-1, respectively. Compound 20 showed dose-dependent inhibition when applied at different concentrations. The analysis of the effect of 18 and 19 on COX-1 and COX-2 led to inconclusive results. Therefore, only IC50 values for the COX-1 and COX-2 inhibition by 20 were determined. The IC50 value for COX-1 was 950 µM, with the inhibition of COX-2 being at least twofold weaker. The known NSAIDs N-(4-hydroxyphenyl)acetamide (acetaminophen)85 and 2-hydroxybenzoic acid (salicylic acid)85 have IC50 values of about 200 and 500 µM, respectively, in a microsomal assay for COX-1. In this example, the exploitation of existing medicinal chemistry data led to the prediction of already known and unexpected cross-reactivity.

In another study83 the validity of an interesting target with regard to potential off-targets was assessed using ProBiS81. The antiviral compound 2-([1,1'-biphenyl]-4-yl)quinoline-4-carboxylic acid (RK424)83 was identified by a library screening in a cell-based assay. It inhibits many subtypes of influenza A virus in vitro and partially protected mice against a lethal dose of A/WSN/1933 (H1N1) virus in vivo.
The compound inhibits the viral ribonucleoprotein (NP) complex by binding to a small pocket of the viral NP, leading to its accumulation in the cell nucleus. Additionally, the molecule disrupts NP-RNA and NP-NP interactions and inhibits oligomerization. Binding site similarity analyses were applied to two ligand binding sites: one of the identified novel target NP (pdb 2iqh) and one of a potential novel antiviral target, the polymerase protein PA (pdb 4e5e). The pockets of both proteins were excised and used to search for similar binding sites in the ProBiS81 protein structure library. As a result, the NP pocket was shown to have a unique surface structure, whereas twenty similar pocket structures were identified for PA, some of them involved in host cell function. Therefore, the authors reasoned that it is more beneficial to target the NP pocket than the PA structure due to potential off-target effects. This analysis shows how binding site comparison might help to find a suitable target with a low risk of designing compounds with potential side effects.

The same was achieved in another study42 dealing with the methylmalonyl CoA mutase-associated GTPase MeaB, an enzyme essential for the growth of many pathogenic bacteria such as Mtb. The Mtb protein Rv1496 and its homologs in Mycobacterium smegmatis and Mycobacterium thermoresistibile (pdbs 3md0, 3nxs, 3tk1) were shown to be MeaB enzymes. The crystal structures were solved and compared to similar enzymes from other organisms using the method of Totrov et al.41. The comparison of their nucleotide binding sites revealed that Rv1496 is more similar to its mycobacterial orthologs and more distinct from the human homolog MMAA (methylmalonic aciduria associated protein A, pdb 2www). Figure 8 illustrates the main differences between the four binding sites. Mutations in the MeaB homolog MMAA cause fatal methylmalonic aciduria. Therefore, research towards the design of potential MeaB inhibitors should always consider their potential activity on the human homolog of MeaB.


Figure 8. Comparison between three mycobacterial MeaB enzymes and human MMAA. (a) Overall alignment of the different enzymes’ structures (forest green: Mtb MeaB with bound GDP molecule in sticks representation, pdb 3md0; orange: M. smegmatis MeaB, pdb 3nxs; purple: M. thermoresistibile MeaB, pdb 3tk1; magenta: human MMAA protein, pdb 2www). (b) Detailed view of the major differences between the mycobacterial proteins and the human MMAA protein in the nucleotide binding site. Respective residues are represented as sticks and named according to the crystal structures. This figure was created using UCSF Chimera 1.10.1 (ref 131).

The last example does not provide a successful prediction of side effects by binding site comparison methods but points towards their impact on the design of molecules with reduced side effects. Xie et al.96 used their method SOIPPA94 to analyze the cross-reactivity of selective estrogen receptor modulators (SERMs), for example 1-(4-(2-(azepan-1-yl)ethoxy)benzyl)-2-(4-hydroxyphenyl)-3-methyl-1H-indol-5-ol (bazedoxifene)96, (5R,6S)-6-phenyl-5-(4-(2-(pyrrolidin-1-yl)ethoxy)phenyl)-5,6,7,8-tetrahydronaphthalen-2-ol (lasofoxifene)96, 1-(2-(4-((3S,4S)-7-methoxy-2,2-dimethyl-3-phenylchroman-4-yl)phenoxy)ethyl)pyrrolidine (ormeloxifene)96, (6-hydroxy-2-(4-hydroxyphenyl)benzo[b]thiophen-3-yl)(4-(2-(piperidin-1-yl)ethoxy)phenyl)methanone (raloxifene)96, (Z)-4-(1-(4-(2-(dimethylamino)ethoxy)phenyl)-2-phenylbut-1-en-1-yl)phenol (4-hydroxytamoxifen)96, and tamoxifen (21)96. Compound 21 is a commercial drug targeting estrogen receptor α (ERα) to treat breast cancer. It has known side effects like cardiac abnormalities, thromboembolic disorders, and ocular toxicity. The SERM binding site in ERα (pdb 1xpc) was used to search for similar ligand binding sites in a representative set of 825 structures using SOIPPA94. The most significant similarity was found for a sarcoplasmic reticulum Ca2+ ion channel ATPase (SERCA, pdb 2zbd) from Oryctolagus cuniculus sharing 96% sequence similarity with its human homolog. SERCA regulates cytosolic calcium levels by accumulating calcium in the lumen. A complex structure of the enzyme bound to two known inhibitors (pdb 2agv) gives information about its binding site. The authors conclude that SERMs bind to a site similar to the predicted site as 30% of the residues of the known and predicted binding site overlap. Known binding sites of SERCA were scanned against a database of the druggable proteome to validate this assumption. Indeed, ERα structures were found within the top ranked hits. Additionally, the comparison of electrostatic potentials of both binding sites revealed that they share similar negative potentials. For the reasons outlined above, it was suggested that SERMs bind in a similar fashion as known inhibitors, preventing the binding of two calcium ions to the ATPase. Subsequent docking studies led to the prediction of reasonable binding poses.
Finally, experimental evidence supporting the in silico findings is provided: pretreatment with 21 inhibits the increase in intracellular calcium ion concentrations caused by the known SERCA inhibitor (3S,3aR,4S,6S,6aR,7S,8S,9bS)-6-acetoxy-4-(butyryloxy)-3,3a-dihydroxy-3,6,9-trimethyl-8-(((Z)-2-methylbut-2-enoyl)oxy)-2-oxo-2,3,3a,4,5,6,6a,7,8,9b-decahydroazuleno[4,5-b]furan-7-yl octanoate (thapsigargin)96. Additionally, compound 21 significantly reduces intracellular calcium ion concentrations and the release of platelets which is correlated with platelet adhesion and aggregation. The authors suggest that an inhibition of SERCA could lead to cardiac abnormalities due to impaired muscle contraction. Altogether, the predicted binding of 21 to SERCA might represent one reason for the severe drug side effects. This prediction was supported by Liu et al.159. Additionally, Beca et al.160 could observe an effect of 21 on SERCA2a ATPase activity; they provide additional evidence that the compound inhibits a cardiac sarcoplasmic reticulum chloride channel rather than SERCA.
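
How large a residue overlap such as the 30% quoted for the known and predicted SERCA sites appears depends on the normalization, which the text does not specify. As a hedged illustration, the sketch below reports the shared residue count together with two common normalizations. The function name `residue_overlap` and the residue labels are hypothetical and are not taken from the SERCA or ERα structures.

```python
from typing import Dict, Iterable


def residue_overlap(site_a: Iterable[str], site_b: Iterable[str]) -> Dict[str, float]:
    """Shared residues of two binding sites under two normalizations."""
    a, b = set(site_a), set(site_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Jaccard index: shared residues relative to all residues in either site.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Shared residues relative to the smaller of the two sites.
        "fraction_of_smaller": len(shared) / min(len(a), len(b)) if a and b else 0.0,
    }


if __name__ == "__main__":
    # Invented residue labels, for illustration only.
    known_site = [f"R{i}" for i in range(1, 11)]      # R1 .. R10
    predicted_site = [f"R{i}" for i in range(8, 18)]  # R8 .. R17
    print(residue_overlap(known_site, predicted_site))
```

With these toy sets, the same three shared residues correspond to 30% of either site but to a lower Jaccard index, which is why the normalization should be stated alongside any overlap percentage.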

Figure 9. Structures of compounds analyzed in off-target prediction studies.

Prediction of Compound Selectivity Profiles and Affinities

The ranking of ligands according to their affinity towards a certain target, or even affinity prediction using different computational methods, plays a significant role in different phases of rational drug design. One important application of such methods is the scoring of generated docking poses by means of different methodologies discussed in a review by Wang and Lin161. In the field of proteochemometrics162, this task is extended towards the prediction of selectivity profiles for certain small molecule binders. The performance of those approaches benefits significantly from calculated descriptors derived from structural binding site comparisons. The following two examples demonstrate the applicability of binding site comparison methods for proteochemometric approaches.

The method FLAP57 was used to cluster kinase families.58 Furthermore, it was tested whether the method is able to identify binding site features that correlate with the reported inhibition profile of the highly promiscuous kinase inhibitor 2. Fourteen protein kinases of potential interest as pharmaceutical targets with available X-ray structures and pocket information were selected. The authors used the generated binding site similarity matrix to compute a partial least squares model with the pIC50 value of 2 for the different kinases as dependent variable. This analysis resulted in the choice of the best combination of chemical probes to generate GRID molecular interaction fields for further analyses. All common four-point pharmacophores among the kinases were computed. A variable selection routine (fractional factorial design) was used to remove irrelevant variables from the obtained similarity matrix. Finally, a partial least squares model was created that was able to cluster binding sites according to the level of inhibition by compound 2. This example underlines the applicability of binding site comparison methods to create selectivity profiles for promiscuous inhibitors.

Das et al.72 used the PESD71 method for a proteochemometric approach, trying to predict pKd/pKi values on the basis of PESD signatures from protein and ligand interaction surfaces. Support vector machine (SVM) models were trained with the PESD signature of 278 non-redundant protein-ligand complexes out of a refined PDBbind163 set of complexes with known pKd/pKi values.
From this set 977 different protein-ligand complexes were used as test set for
model validation. Altogether, they created five models with Pearson’s correlation coefficients ranging from 0.517 to 0.638, a leave-one-out cross-validated r between 0.482 and 0.633, and a maximum standard deviation of 1.86. Trained SVM models were used to predict the binding affinities for eight different targets. Furthermore, the model’s performance in scoring docking poses was compared with that of another scoring function. Both methods achieved comparable accuracy. Nevertheless, by examining data of enthalpy and entropy analyses obtained from the SCORPIO database164, the authors conclude that the method works best for complexes with a dominant enthalpic contribution, as most scoring and ranking methods do.

APPLICATIONS COVERING THE WHOLE DRUG DESIGN WORKFLOW

Identification of Novel Promising Targets

To avoid off-target effects, it can be useful to identify interesting targets that do not have anything in common with any human protein or are at least quite dissimilar to their human counterpart. Often, the life of pathogenic microorganisms depends on unique metabolic pathways. The next example shows how binding site comparison methods can help to identify promising targets on the basis of their uniqueness to the organisms of interest.

NAD and NADP are essential cofactors of various enzymes. A comparison of distinct locally conserved pyrophosphate-binding structures (3D motifs) of NAD(P)-bound protein structures with the help of one-dimensional motifs was used to annotate NAD(P)-binding proteins.70 The basic idea is to identify structural motifs by using an automated method for the discovery of motifs of various sizes across protein families with the help of a 16-letter structural alphabet69. Five out of 12 previously defined165 classes of pyrophosphate binding βα structures (β-strand
followed by a turn and a phosphate binding α-helix) with distinct architectures were used to derive sequence motifs. With the help of the derived one-dimensional motifs, it was possible not only to differentiate between proteins binding NAD(P) and those not binding these cofactors, but even to differentiate between proteins binding only NAD or only NADP. Intriguingly, one enzyme was identified that possesses a unique pyrophosphate-binding 3D motif. This enoyl-acyl carrier protein reductase (EC 1.3.1.9, 1.3.1.10) is essential for bacterial fatty acid biosynthesis. The authors conclude that its uniqueness as compared to human proteins and the fact that the enzyme is conserved across many bacterial species point towards its suitability as a novel drug target.

Binding Site Identification

The PDB contains about 114,000 biological macromolecular structures, of which approximately 28,000 apo structures have unknown binding sites. Therefore, it is of great interest to identify putative druggable binding sites by different methods. Most of them are based on geometric or energetic properties of the protein’s surface or on sequence similarities. Two articles29,166 give an overview of different approaches to find binding sites in proteins. Given the redundancy of PDB entries, it is also possible to identify binding sites derived from ligand-defined binding sites of similar proteins using appropriate comparison methods. This approach led to reliable results in two examples which are presented below.

The first example98 deals with the detection of a putative binding site in Cas1 of Escherichia coli (YgbT). The protein takes part in a system of adaptive immunity against viruses and plasmids in prokaryotes. It is a CRISPR (clustered regularly interspaced short palindromic repeats) associated protein whose function could not be elucidated. It was known that the
enzyme exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks, and 5’-flaps. By means of a Dali search167 three other Cas1 proteins belonging to different CRISPR subtypes with low overall sequence similarity were found. A surface charge analysis was used to detect several large patches of positively charged residues as potential DNA binding sites. The DNA binding site of the enzyme could not be identified by either method because YgbT contains two clusters of positively charged amino acids. The application of SurfaceScreen97 analysis revealed similarities between the enzyme of interest (pdb 3nkd) and E. coli topoisomerase III (pdb 1i7d). After superposition of a YgbT main basic patch and the single-stranded DNA binding site of E. coli topoisomerase III, a possible binding mode of single-stranded DNA to YgbT was modeled. Site-directed mutagenesis studies substantiated the results. The authors could show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs. They suggest that some components of the CRISPR-Cas system additionally have a function in DNA repair. The identified binding site was later validated by crystal structures of the DNA-bound enzyme (pdb 5dlj).

Binding site comparison approaches are also valuable tools for the de-orphanization and functional prediction of “orphan” proteins. One example80 shows the applicability of purely computational approaches to investigate nuclear receptors (NR) without obvious ligand-binding pockets in their X-ray structures. NR4A1 is part of the steroid receptor superfamily. Its expression is induced by stress stimuli and cellular growth-factor signaling. Its impact on tumor-cell apoptosis in multiple tissue types and on apoptotic signaling of thymocytes as well as within the hypothalamic-pituitary axis was shown before, motivating the search for small molecular modulators of NR4A1.
The apo NR4A1 (pdb 2qw4) was subjected to a 4.1 µs MD simulation. It
was shown that an irreversible loop movement leads to the presence of an “open” and “closed” receptor state. A pocket analysis was performed for 15 clusters obtained from a clustering of 4,100 MD snapshots. For each structure large pockets, which were absent in the original X-ray structure, were extracted. This analysis revealed the presence of a previously unidentified pocket. In comparison to the general NR ligand-binding site, it is differently located. Its presence can be observed for about 500 to 800 ns before a flexible loop region closes the binding site in an irreversible manner. The stability of the identified pocket was assessed by performing an independent MD simulation starting with the empty pocket. It remained stable for 1 µs. Encouraged by those results, a PoLiMorph79 pocket graph description was used to search for structurally related protein binding sites in the sc-PDB8,9. The ligands of the most similar binding sites were extracted. Apart from nucleotides (adenosine-5’-β,γ-methylene triphosphate, CoA, FAD, guanosine-5’-monophosphate), a bis-aza-indole compound was retrieved. The fact that one of the known NR4A1 activators is bis-indole substantiates the impact of the discovered binding site. The already known ligands of NR4A1 were docked into the identified binding site. MD simulations of the complexes showed stable protein conformations for the first ligand for at least 1 µs. A simulation with the second ligand shows a higher flexibility in terms of rmsd due to the ligand’s flexible alkyl chains. The presence of the second ligand led to a closure of the binding pocket forming loop and a trapping of the bound ligand. As this remote loop has the capacity to communicate with a Nurr1-binding motif NBRE within an RXR-α/NR4A1 heterodimer, the finding suggests a possible mode of action for the investigated small molecule modulators. 
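The rmsd-based flexibility assessment mentioned above can be illustrated with a minimal sketch. It assumes that all snapshots are already superposed onto a reference and share the same atom order. The function names `rmsd` and `rmsd_trace` and the toy coordinates are hypothetical and are not taken from the cited study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation of two equally ordered coordinate sets (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def rmsd_trace(reference, frames):
    """rmsd of every trajectory frame relative to the reference structure."""
    return [rmsd(reference, frame) for frame in frames]


# Hypothetical two-atom ligand: frame 0 equals the reference,
# frame 1 is shifted by 2 Å along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
frames = [reference, [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0)]]
print(rmsd_trace(reference, frames))
```

A ligand with flexible alkyl chains would be expected to produce a trace with larger and more variable values than a rigid one, which is the kind of difference the comparison of the two ligands refers to.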
The presented results show how previously identified pockets can be additionally verified using suitable binding site comparison tools.

Comparison of Proteins from Different Organisms

In silico drug design methods rely heavily on structures of similar proteins and homologs of the protein of interest (e.g. for homology modeling or binding pose prediction). Additionally, many experimental analyses depend on the findings for enzymes from other organisms rather than from the organism of interest due to expression or crystallization problems. Two applications of binding site comparison methods presented here show that pockets of related proteins are not always similar. The assumption that the results obtained for a protein of a different organism can be transferred to the protein of interest has to be treated with caution.

MAOs (isoforms MAO-A, MAO-B) are catabolic enzymes of monoamine neurotransmitters. Therefore, they represent interesting targets of antidepressant and antiparkinsonian drugs. The study presented here78 aimed to assess the biochemical validity of zebrafish MAO as a disease model for neurobehavioral studies and virtual screening. PocketMatch75 was used to analyze similarities and dissimilarities between the MAO isoforms from human, rat, and zebrafish. Intriguingly, the comparison between human and rat MAO-A, which share a high sequence identity with respect to the active sites, revealed that both enzymes differ in the conformations of several active site residues. Nevertheless, the use of various distance thresholds from 3 to 10 Å showed that the binding sites are more similar when considering a larger ligand environment. Homology models of zebrafish MAO based on human MAO-A (pdb 2bxs) and MAO-B (pdb 2byb) were compared to the human enzymes. While they share a high global structural similarity, their binding sites are dissimilar in terms of shape and physicochemical properties. Kinetic measurements with seven known mammalian MAO-A and MAO-B inhibitors clearly show that inhibition constants largely differ among different species.
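The effect of the distance threshold on the binding site definition, mentioned above for the 3 to 10 Å range, can be sketched as follows. This is a minimal illustration under stated assumptions: a site is taken to be all residues with at least one atom within a cutoff of any ligand atom, and the Jaccard overlap is only a crude stand-in for a real similarity score. It is not the PocketMatch algorithm, and the residue labels and coordinates are hypothetical.

```python
import math


def residues_within(ligand_atoms, protein_atoms, cutoff):
    """Labels of residues with at least one atom within `cutoff` Å of any ligand atom."""
    selected = set()
    for label, position in protein_atoms:
        if any(math.dist(position, atom) <= cutoff for atom in ligand_atoms):
            selected.add(label)
    return sorted(selected)


def jaccard(labels_a, labels_b):
    """Crude overlap score between two residue sets (not the PocketMatch score)."""
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical toy data: one ligand atom at the origin,
# four residues at increasing distance from it.
ligand = [(0.0, 0.0, 0.0)]
protein = [("A1", (2.0, 0.0, 0.0)), ("B2", (0.0, 5.0, 0.0)),
           ("C3", (0.0, 0.0, 9.0)), ("D4", (12.0, 0.0, 0.0))]

for cutoff in (3.0, 6.0, 10.0):
    print(cutoff, residues_within(ligand, protein, cutoff))
```

With a small cutoff only the first shell of residues enters the comparison, so differences in individual side-chain conformations weigh heavily. A larger cutoff adds more of the conserved environment.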
No effect on zebrafish MAO could be observed for the amphetamine derivative 1-(4-(methylthio)phenyl)propan-2-amine (MTA)78. The same holds true for the highly selective MAO-B inhibitors 6-(4-butoxyphenyl)thiomorpholin-3-one (BTO)78, 6-(4-(benzyloxy)phenyl)thiomorpholin-3-one (ZTO)78, and 2-(4-(benzyloxy)phenyl)thiomorpholine (ZTI)78. The inhibitory mechanism of the naphthylisopropylamine derivative 1-(6-methoxynaphthalen-2-yl)propan-2-amine (MeONIPA)78 was analyzed using molecular docking and MD simulations. The in silico studies suggest that the inhibitors show different modes of action for distant MAOs despite similar inhibition constants. In conclusion, the authors state that, although zebrafish and human MAOs share some overlapping functional and structural properties, they might not be comparable with respect to their effector molecule responses. Therefore, studies using zebrafish as a model system should be analyzed with care.

A similar study50 was done for structures targeted by cardiovascular drugs. CMASA49 was applied to compare the binding sites of different drug binding pockets which were previously defined using docking and MD simulations. All binding sites were used as queries to search for structural similarities in a non-redundant SCOP5 database. The binding patterns of withdrawn cardiovascular drugs between human and mouse proteins were shown to be highly different due to different active site compositions. In contrast, the binding patterns and affinities seem to be similar for the FDA-approved cardiovascular drugs (3R,5R)-7-(2-(4-fluorophenyl)-5-isopropyl-3-phenyl-4-(phenylcarbamoyl)-1H-pyrrol-1-yl)-3,5-dihydroxyheptanoic acid (atorvastatin)50 and methyl (S)-2-(2-chlorophenyl)-2-(6,7-dihydrothieno[3,2-c]pyridin-5(4H)-yl)acetate (clopidogrel)50. The withdrawn drugs 5-(4-((6-hydroxy-2,5,7,8-tetramethylchroman-2-yl)methoxy)benzyl)thiazolidine-2,4-dione (troglitazone)50, (3R,5S,E)-7-(4-(4-fluorophenyl)-2,6-diisopropyl-5-(methoxymethyl)pyridin-3-yl)-3,5-dihydroxyhept-6-enoic acid (cerivastatin)50, 4-methoxy-N-(2-(2-(1-methylpiperidin-2-yl)ethyl)phenyl)benzamide (encainide)50, (1S,2S)-2-(2-((3-(1H-benzo[d]imidazol-2-yl)propyl)(methyl)amino)ethyl)-6-fluoro-1-isopropyl-1,2,3,4-tetrahydronaphthalen-2-yl 2-methoxyacetate (mibefradil)50, and ethyl ((R)-1-cyclohexyl-2-((S)-2-((4-((Z)-N'-hydroxycarbamimidoyl)phenyl)(methyl)carbamoyl)azetidin-1-yl)-2-oxoethyl)glycinate (ximelagatran)50 were analyzed. Altogether, the differences between humans and mouse models for the testing of compounds to treat cardiovascular diseases might arise due to differences in the binding sites of their respective target proteins. Additionally, off-target effects observed in humans might not necessarily show up in an animal model as a result of sequence divergence and, vice versa, wanted effects observed in the mouse model might not occur in humans.

The last example43 is dedicated to a rarely used application of binding site comparison methods. Given that protein structures are solved for only 16 out of 179 potential drug targets for the treatment of tuberculosis, a so-called “homolog-rescue strategy” was applied. Based on 1,675 homologs from nine mycobacterial species different from Mtb, it could be shown that structurally solved homologs of 52 otherwise intractable Mtb targets exist. Altogether, 614 proteins from mycobacterial species were extracted from the PDB to assess the applicability of those targets in antimycobacterial drug design. Pairs of non-Mtb-mycobacterial proteins and Mtb proteins with a sequence similarity above 25% and sequence coverage above 70% over the Mtb sequences were retained for further analyses. The active sites of the resulting 106 pairs of Mtb and non-Mtb-mycobacterial enzyme homologs were compared with experimentally determined structures to elucidate whether those 52 targets represent useful surrogates for tuberculosis drug design. For some of those pairs, binding site comparison using a method developed by Totrov41 revealed conserved active site shape and geometry in terms of active site Cα rmsd, side chain identity, and similarity with respect to pharmacophoric properties.
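The pair selection step described above (sequence similarity above 25% and coverage above 70% of the Mtb sequence) amounts to a simple threshold filter. The following sketch only illustrates that filter. The class `HomologPair`, the function `retain_pairs`, and all example values are hypothetical and are not data from the cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomologPair:
    mtb_target: str      # identifier of the Mtb protein (hypothetical labels below)
    homolog: str         # identifier of the non-Mtb mycobacterial homolog
    similarity: float    # percent sequence similarity of the alignment
    coverage: float      # percent of the Mtb sequence covered by the alignment


def retain_pairs(pairs, min_similarity=25.0, min_coverage=70.0):
    """Keep pairs strictly above both thresholds, as stated in the text."""
    return [p for p in pairs
            if p.similarity > min_similarity and p.coverage > min_coverage]


# Hypothetical example values, chosen only to exercise the filter.
candidates = [
    HomologPair("target_A", "homolog_1", similarity=68.0, coverage=95.0),
    HomologPair("target_B", "homolog_2", similarity=22.0, coverage=90.0),
    HomologPair("target_C", "homolog_3", similarity=40.0, coverage=55.0),
]
for pair in retain_pairs(candidates):
    print(pair.mtb_target, pair.homolog)
```

Only pairs that pass this filter would then be worth a binding site comparison, because a low-coverage alignment may not include the active site at all.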
The results for 52 non-Mtb mycobacterial enzyme homologs indicate that it is possible to increase the effective structural
coverage of all potential Mtb targets more than threefold. This strategy was applied to cytidylate kinase. The enzyme is crucial for the synthesis and salvage pathways of DNA and RNA precursors. Protein structures were solved for homologs from M. smegmatis (pdb 3r20) and Mycobacterium abscessus (pdb 4die) that share 68 and 74% sequence identity with the enzyme from Mtb, respectively. Although experimental verification of this “homolog rescue” approach is still missing, various examples in medicinal chemistry show that the presence of a homolog enzyme structure led to successful drug design strategies. This work should encourage researchers to solve crystal structures of homolog proteins or exploit the structures of known homologs if the target structure is intractable. Binding site comparison proved to be an indispensable tool to assess the suitability of homology models for rational drug design workflows.

Function Prediction

Although the number of available protein structures increases exponentially and function annotation by means of sequence or fold comparison is possible within minutes or even seconds, a study168 suggested that more than one thousand entries in the PDB are annotated as proteins of unknown function. Binding site comparison methods facilitate the elucidation of their function and biological importance. This is also discussed in an article by Petrey et al.169.

A clique detection method was used to compare the binding site surface of a pyrophosphatase encoded by the gene MJ0226 of Methanocaldococcus jannaschii (pdb 1b78) to other functional sites stored in the eF-site database55. The enzyme hydrolyzes non-canonical purine nucleotides to their respective monophosphate derivatives.170 The method of Kinoshita et al.55 successfully revealed a high similarity of this pyrophosphatase to two other structurally unrelated nucleotide-hydrolyzing enzymes: folylpolyglutamate synthetase (pdb 1jbv) and pyruvate kinase (pdb 1a49).56 On the basis of those similarities, a possible mononucleotide binding site was identified, underpinning the applicability of surface comparison methods for function prediction. Interestingly, the applied method helped to detect the protein’s functional binding site due to the method’s ability to find an optimal substructure match for a query surface of the complete protein. Figure 10 shows a nucleotide binding site alignment of pyrophosphatase and pyruvate kinase and provides a comparison of both surfaces. Intriguingly, the enzymes show dissimilar folds. They share only similar binding site surface properties.
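The general idea behind clique-based site comparison can be sketched as follows. This is a generic textbook formulation and not the implementation of Kinoshita et al. Each site is assumed to be a list of typed points, a correspondence graph links point pairs of equal type whose pairwise distances agree within a tolerance, and the largest clique (found here with a plain Bron-Kerbosch recursion) is the largest geometrically consistent match. All names, the tolerance, and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def correspondence_graph(site_a, site_b, tol=1.0):
    """Nodes pair equally typed points; edges join distance-compatible pairs."""
    nodes = [(i, j) for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b) if type_a == type_b]
    adjacency = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a point may be matched only once
        dist_a = math.dist(site_a[i][1], site_a[k][1])
        dist_b = math.dist(site_b[j][1], site_b[l][1])
        if abs(dist_a - dist_b) <= tol:
            adjacency[(i, j)].add((k, l))
            adjacency[(k, l)].add((i, j))
    return adjacency


def largest_clique(adjacency):
    """Bron-Kerbosch without pivoting; adequate for small illustrative graphs."""
    best = []

    def expand(current, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(current) > len(best):
                best = list(current)
            return
        for node in list(candidates):
            expand(current + [node],
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand([], set(adjacency), set())
    return sorted(best)


# Hypothetical sites: site_b holds the same triangle as site_a,
# translated and reordered, plus one unrelated donor.
site_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
          ("hydrophobic", (0.0, 4.0, 0.0))]
site_b = [("hydrophobic", (10.0, 4.0, 0.0)), ("donor", (10.0, 0.0, 0.0)),
          ("acceptor", (13.0, 0.0, 0.0)), ("donor", (20.0, 20.0, 20.0))]
print(largest_clique(correspondence_graph(site_a, site_b)))
```

Because the match is built from local distance agreement alone, it needs no sequence or fold information, which is why such methods can relate sites in proteins with dissimilar folds. Note that a purely distance-based criterion cannot distinguish a site from its mirror image.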

Figure 10. Comparison of the nucleotide binding surfaces of the pyrophosphatase and pyruvate kinase after an alignment according to bound nucleotides. (a) Overall structure of the Methanocaldococcus jannaschii pyrophosphatase (pdb 2mjp). (b) Overall structure of pyruvate kinase from Oryctolagus cuniculus (pdb 1a49). (c) Binding site surface of pyrophosphatase (pdb 2mjp) colored according to the electrostatic potential with bound ATP analog (sticks). (d) Binding site surface of pyruvate kinase (pdb 1a49) colored according to the electrostatic potential with bound ATP (sticks). Figures were created using UCSF Chimera 1.10.1131.

Konc et al. were able to partly unravel the function of a protein encoded by the TM1631 gene from Thermotoga maritima.82 The predicted binding site of a known crystal structure (pdb 1vpq) deposited by the Joint Center for Structural Genomics was compared to binding sites in the PDB with the help of ProBiS81. The most similar binding sites are involved in DNA replication, phosphate transfer, or DNA repair. They are part of proteins with quite different folding patterns. A relationship between TM1631 and the DNA-binding protein endonuclease IV (pdb 2nqj) with regard to a similar electrostatic potential of their binding sites was proposed. A model of the uncharacterized DNA-bound protein was built. The complex was validated using MD simulations. Finally, the authors suggested a possible DNA repair function of the protein. The experimental validation of this hypothesis is missing. Nevertheless, this example underlines the impact of binding site comparison if function cannot be inferred from sequence or fold similarity to well-characterized proteins.

Binding site comparison methods were also successfully applied to annotate proteomes as shown for the Mtb proteome.76 The authors developed a workflow using the Mtb genome, extracted all available protein structures from the databases PDB and ModBase171, and modeled 54 additional protein structures. Putative binding sites were identified using two binding site detection algorithms and compared to known sites in the PDB using PocketMatch75.
A PLP binding site of a methionine-γ-lyase of Trichomonas vaginalis (pdb 1e5f) was within the best
scored hits. Consequently, a possible function of the PLP dependent enzyme Rv3340 as sulfhydrylase was proposed. In the same year, a crystallographic study172 supported this hypothesis. Unfortunately, the crystal structure of the Mtb enzyme was never published. A crystal structure of the Mycobacterium marinum enzyme (pdb 4kam, 91% sequence identity to the Mtb enzyme) allows a structural comparison to methionine-γ-lyase from Mtb. Both enzymes share a common overall fold and a highly similar active site.

A comparable analysis52 was performed to analyze 616 human protein-coding genes that have no or insufficient evidence of protein existence. Protein sequences with the lowest level of existence evidence (PE5 proteins) were used as the most dubious set of missing proteins. Structures were modeled by folding simulations using I-TASSER173, which depends less on structures of homologous proteins than other structure prediction algorithms. Functional annotation and GO assignment were achieved using COFACTOR51. It could be shown that PE5 proteins are overrepresented in transporter and receptor activity GO terms as compared to PE1 proteins (highly credible evidence of protein existence). This observation is consistent with the fold family annotation finding that PE5 contains many membrane proteins and could explain why they escape detection in mass spectrometry. Six high-scoring PE5 proteins out of 66 proteins could be found in the PeptideAtlas 2014-08174 (based on mass spectrometry reanalyses) dataset. Therefore, they might be worth further examination. Altogether, the authors present a useful pipeline to annotate protein-coding genes for proteome analysis.

Analysis of Evolutionary Relationships

The analysis of the evolution of the protein world is commonly performed with respect to protein folds.175 Nevertheless, cases of convergent evolution and similarities of proteins with
distant functionalities or different folds are usually not accessible from that perspective. Binding site comparison methods, especially those that work independently of sequence and fold, can reveal interesting aspects of evolution not analyzed so far. The following two examples will underpin this statement.

Cluster analysis for the functional classification of binding pockets was implemented in the Cavbase approach44. Its applicability for protein family classification was investigated.47 The method was optimized and validated for a dataset of 105 cavities from functionally diverse enzyme families. Results are shown for two families with pharmaceutical relevance: α-CAs and protein kinases. The CA family contains 14 members, but crystal structures were only available for six members showing quite different levels of sequence similarity. Cavbase44 succeeded in classifying the different structures at subfamily level. Furthermore, the approach enabled the discrimination between two conformational states of the enzyme as part of the enzymes’ catalytic mechanism. Some unique features could be identified for CA IV demarcating this family from other CA isozymes. Altogether, binding site comparisons can help to elucidate links between proteins which cannot be found using sequence comparisons.

The clustering of protein kinase structures according to the physicochemical properties of their binding sites led to similarly promising results. The overall classification was in good agreement with CATH4, SCOP5, and the landscape analysis of Naumann and Matter176 using a GRID/cPCA approach. Distinct activation states of protein kinases, which play a crucial role in different disease models, could also be captured by the Cavbase44 approach. As an example, the authors show how the method could distinguish between CDKs bound to cognate cyclin and phosphorylated at a threonine residue of the activation loop.
The comparison also resulted in the establishment of relationships between different protein kinase family members. A superposition of Erk2 and
p38α structures revealed a high level of similarity between those MAP kinases in distinct binding site regions. Such investigations could contribute to the location of selectivity-discriminating regions within kinase families. The method’s ability to reveal relationships between remote homologs (low sequence similarity, highly similar in structure) was shown for cAMP-dependent kinases from different organisms.

The term convergent evolution is used whenever different organisms evolve in a convergent manner owing to similar selection pressures. Such evolutionary scenarios can be studied on the basis of amino acid changes in protein evolution. An exhaustive search for local structural similarities between non-redundant protein functional sites was performed to find new examples for convergent evolution.87 A non-redundant set of 1,924 protein chains was generated using only X-ray structures and a sequence similarity cut-off corresponding to a minimum BLAST177 p-value of 10⁻⁷. A total of 10,175 surface clefts from this dataset were identified by the SURFNET algorithm109. The identified cavities were processed using PROSITE178 patterns to find functionally important residues and ligand binding sites. Afterwards, they were subjected to an all-against-all comparison using Query3d86. Non-collinear and significant matches in terms of the corresponding Z-score were analyzed. Altogether, 28 out of 32 non-collinear structural matches were identified as common incidents of non-collinearity deriving from circular permutation events in protein sequences. Of the remainder, three sequence inversion events were already published. Additionally, members of the ABC transporter (pdb 1b0u) and Hpr kinase/phosphorylase families (pdb 1kkl) represent a case of local analogy in the ATP binding sites. The co-crystallized ligands superpose quite well in the corresponding alignment while the three sequence regions involved in the structural match are located 1-2-3 versus 3-1-2 (Figure 11).
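The notion of a non-collinear match, such as the 1-2-3 versus 3-1-2 arrangement mentioned above, can be made concrete with a small sketch. It only compares the order in which matched segments occur along the two sequences. The function name and the category labels are hypothetical, and a rotated order on its own does not establish a circular permutation event, since the authors interpret the ABC transporter and Hpr kinase/phosphorylase case as a local analogy.

```python
def segment_order_relation(order_a, order_b):
    """Relate the order of matched segments along two protein sequences.

    Both arguments list the same segment labels in the order in which the
    segments occur along the respective sequence, e.g. [1, 2, 3] and [3, 1, 2].
    """
    order_a, order_b = list(order_a), list(order_b)
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must contain the same segment labels")
    if order_a == order_b:
        return "collinear"
    n = len(order_a)
    # Rotation is tested before reversal; for two segments both coincide.
    if any(order_b == order_a[k:] + order_a[:k] for k in range(1, n)):
        return "rotated"
    if order_b == order_a[::-1]:
        return "reversed"
    return "other"


print(segment_order_relation([1, 2, 3], [3, 1, 2]))
```

In a screening setting, such a check could serve as a first filter: collinear matches are the expected outcome for homologs, whereas rotated or otherwise rearranged orders flag candidates that deserve a closer structural inspection.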
The authors conclude that the identification of such analogies could be relevant for
suggesting experimental strategies to devise new classes of inhibitors for members of the ABC transporter family.

Figure 11. Illustrative example of convergent evolution as observed for a member of the ABC transporter family and a protein of the Hpr kinase/phosphorylase family. (a) Alignment between the unique phosphate binding motif of the ABC transporter family member (pdb 1b0u, green) with bound ATP in green sticks representation and Hpr kinase/phosphorylase (pdb 1kkl, orange). (b) Detailed view of the common phosphate binding motif of both proteins. Two differently oriented β-strands contain essential binding site residues which are represented as sticks. Figures were created using UCSF Chimera 1.10.1131.

In the last example60 a library of about 2,000 binding surface types derived from bound forms of protein structures was generated to reflect functional and evolutionary relationships among highly divergent proteins. Pairwise similarities between the binding surfaces of 28,986 ligand-bound PDB structures and the corresponding rmsd values were calculated using fPOP59 to build a database. The proteins’ functional surfaces as extracted on the basis of a purely geometric approach were obtained from the SplitPocket database121. Afterwards, a finer classification
scheme was applied by using surface characteristics of the functional pockets (geometrical, physicochemical, evolutionary features). As a test case, 41 EC annotated oxidoreductase structures with the CATH ID 3.20.20.70 (Aldolase class I) were divided into three distinct subtypes that exactly match the EC annotations. The method was also validated in terms of function inference on an oxidoreductase test set containing 50 ligand-bound structures with Aldolase class I fold. Nine of these complexes are not EC annotated. The PSC (Protein Surface Classification) approach was used to reconstruct a tree using the three previously determined surface subtypes as references. It could be shown that the PDB entry 3gka belongs to the surface subtype with the EC number 1.3.1.42, whereas the proteins with the pdb entries 1gwj, 2r14, 3kru, and 3krz show a similar surface to NADPH dehydrogenase (EC 1.6.99.1). Unfortunately, none of those proteins has been assigned an EC classification to date. Nevertheless, the authors conclude that a comparable approach is also suitable to identify the surface type of an unbound structure, using the classification of bound structures. PSC could thereby reveal the common ancestor of two identified subtypes by inferring from ancestral binding surfaces. This could facilitate the experimental validation of a hypothetical pathway and the investigation of functional interchangeability. The method was tested for the classification of highly divergent proteins to substantiate this assumption. As an example, 143 members of the surface type glycosidase with low sequence identity, but highly conserved surfaces were clustered using a fine classification scheme. A dendrogram was obtained via hierarchical clustering analysis. It shows a division into four subtypes. The first three belong to distinct EC classes, while the last group includes enzymes with EC numbers differing in the last digit.
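How a dendrogram-style grouping arises from pairwise surface dissimilarities can be illustrated with a minimal agglomerative clustering sketch. Average linkage and a fixed number of clusters are assumptions made here for illustration. The linkage criterion, the distance measure, and the cut used in the cited work are not specified in the text, and the toy matrix below is hypothetical.

```python
def agglomerative_clusters(dist, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    Returns the clusters (lists of item indices) left after merging the two
    closest clusters repeatedly until `n_clusters` remain.
    """
    if not 1 <= n_clusters <= len(dist):
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical dissimilarities between four binding surfaces.
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(agglomerative_clusters(toy, 2))
```

Recording the merge order and merge distances instead of stopping at a fixed cluster count would yield the full dendrogram, from which subtypes can be read off at any chosen height.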
All surfaces show similar physicochemical properties, suggesting that functional dissimilarity is related to a variation in residue composition. One class associated with EC 2.4.1.19 (other classes belonging to EC 3.2.1.-) is
distinct from the others with respect to surface area, global skewness, and kurtosis. Altogether, this application not only shows the impact of binding site comparison on the analysis of evolutionary pathways, but also the influence of different protein surface features on classification schemes. SUMMARY AND OUTLINE Taken together, nearly all binding site comparison approaches enable protein classification and functional inference for uncharacterized proteins. Intriguingly, matches are not only found within one family, but also between proteins which share no obvious relationship when comparing one binding site against a large database of small molecule binding sites. This indicates that binding site comparison should be an essential part of the rational drug design workflow to identify potential off-targets effects, desirable polypharmacology, or novel potential effector molecules. It could support the first steps of new projects by identifying binding sites, predicting protein function, and analyzing novel and promising targets. However, the exploitation of protein binding pocket space is by no means complete. Especially, the linkage between protein similarities and ligand similarities can be explored exhaustively given the huge amount of data available in the public domain and the possible amount of missing links between several proteins. Although this perspective gives an overview over successful applications of different binding site comparison algorithms, the number of examples underlining the predictivity and general applicability of those methods in rational drug design is still small. We have to keep in mind that those methods were not designed to predict already experimentally validated results, but to promote experimental studies and to bring forward new ideas. Unfortunately, the number of such

ACS Paragon Plus Environment

78

Page 79 of 109

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

applications is limited and anything but proportional to the vast amounts of excellent binding site comparison methods. This was also the limiting factor for the number of approaches presented in this article, although we do not claim completeness. With regard to the level of binding site comparison, it has to be stated that various tools exist implementing very different binding site representations and comparison methods leading to different similarity measures. A quite recent trend tends towards the transformation of complex approaches for the comparison of ligands to the comparison of binding pockets for small molecules (e.g. APF41, PSIM84). Some approaches also include the information of bound ligands into the binding site comparison, for example by comparing protein-ligand interactions as realized in KRIPO65, SILIRID (Simple Ligand-Receptor Interaction Descriptor)179, or a recently developed method180 using graph comparisons - a methodology that is also favored in various other computational applications. Based on the assessment of available literature providing information on successfully applied binding site comparison methods, we conclude that there is no state-of-the-art best-performing algorithm to compare the binding sites of more or less unrelated proteins. Furthermore, it becomes clear that benchmarking of the different methods against each other in terms of quality can never be satisfying. Their performance always depends on the user’s demands, such as finding similarities between otherwise unrelated proteins, which usually equals the search for a needle in the haystack. Additionally, the variability of the approaches and parameters that can be defined and even optimized according to the desired applicability domain hampers a comparison of available methods. 
Some examples also show that binding site comparison methods do not lead to promising results when used on their own but are encouragingly successful when applied in a workflow that includes other in silico or experimental methods. Nevertheless, it is quite useful to compare all methods with respect to runtime, consideration of protein flexibility, and their applicability for protein classification. Looking at the methods presented here, it also becomes clear that appropriate datasets to benchmark pocket comparison algorithms are missing. All benchmarks are performed using more or less different datasets. These diverse ways of evaluation make it difficult to choose a suitable method for a given scientific problem. A method comparison with respect to the test set generated by Barelier et al.16 could settle the question: What do we have to improve: the characterization of the binding site, the comparison algorithm, or the similarity measure? Another unanswered question is the influence of secondary structure elements on ligand binding. Some first examples show that conserved motifs and spatial arrangements of secondary structure elements can recognize specific functional groups or privileged scaffolds36. One of the presented studies40 hints at this possibility, but a detailed analysis of all available binding sites is still missing. GIRAF (Geometric Indexing with Refined Alignment Finder)181 is one approach that has been used to analyze ligand binding site similarities by clustering structural motifs of ligand binding sites. The structural diversity was assessed by an all-against-all comparison.182 It could be demonstrated that ligand binding sites are shared across different folds. Additionally, some links concerning bound ligands result from highly regular secondary structure elements. This observation is supported by an earlier article183 stating that specific combinations of fragments encode specific functions. Considering the examples for kinase binding site comparison, it becomes clear that the protein’s conformational space has a huge impact on the results.
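One way to make such a method comparison concrete is to score every method on the same set of labeled binding site pairs and to summarize the ranking quality with a single number, such as the area under the ROC curve. The sketch below assumes a hypothetical benchmark in which each pair is labeled as similar (1) or dissimilar (0). The method names and scores are invented for illustration and do not come from any published benchmark.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen similar pair receives a
    higher score than a randomly chosen dissimilar pair (ties count as 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one similar and one dissimilar pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth for six binding site pairs (1 = similar).
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical similarity scores from two invented methods on the same pairs.
method_scores = {
    "method_X": [0.91, 0.75, 0.40, 0.55, 0.20, 0.10],
    "method_Y": [0.60, 0.58, 0.57, 0.59, 0.30, 0.61],
}

for name, scores in method_scores.items():
    print(name, round(roc_auc(scores, labels), 3))
```

Because every method is evaluated on identical pairs, differences in the resulting value can be attributed to the method itself rather than to the dataset. A single summary number still hides which kinds of pairs a method fails on, so a per-pair inspection remains useful.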
Therefore, further development of binding site comparison methods should focus on the inclusion of protein flexibility, which depends strongly on bound ligands and on the functional states of the proteins. Protein flexibility is taken into account at least to some extent in most methods. Structures retrieved from MD simulation studies are often used for this purpose. NMR structures are also a good starting point to introduce flexibility, but they are restricted to small proteins. Additionally, new methods184 are being developed to overcome restrictions in structure comparison that are attributable to conformational flexibility. Finally, we have to ask ourselves the question: Do we need another binding site comparison algorithm? Of course, many improved, faster, and more efficient binding site comparison algorithms have been developed recently (Supporting Information). The majority of these methods could not be introduced here because successful applications are lacking. Therefore, this question cannot be fully answered as long as binding site comparison is not applied to promote drug design strategies. A deeper look into the literature reveals that the number of methods published to compare small molecule binding sites clearly exceeds the number of publications underlining their basic, independent applicability for structure-based rational drug design (aside from retrospective analyses). Consequently, it is crucial to examine the different application domains and to test binding site comparison methods in predictive studies, including experimental validation of the results. A single basic question remains: Why are there so few successful published applications of binding site comparison algorithms? It may be explained by the fact that only a few of the methods are publicly available. Nevertheless, there are many software tools that can be downloaded as well as servers that allow users to quickly compare their protein of interest to known binding sites.
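The idea of using several conformations, for example MD snapshots, can be sketched as an ensemble comparison: every conformation of one site is compared with every conformation of the other, and the best score is kept. In the sketch below, each pocket conformation is reduced to a set of hypothetical feature labels and compared with the Tanimoto coefficient. Both this representation and the choice of the maximum as the aggregation rule are assumptions made for illustration, not a description of any particular tool.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets (0.0 if both are empty)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ensemble_similarity(query_conformers, target_conformers, similarity=tanimoto):
    """Best similarity over all pairs of conformations from two ensembles."""
    if not query_conformers or not target_conformers:
        raise ValueError("each ensemble needs at least one conformation")
    return max(
        similarity(q, t) for q in query_conformers for t in target_conformers
    )


# Hypothetical pocket features per snapshot (labels are invented):
# "don" = donor, "acc" = acceptor, "aro" = aromatic, "hyd" = hydrophobic.
query_snapshots = [
    {"don:1", "acc:2", "hyd:3"},           # closed conformation
    {"don:1", "acc:2", "aro:4", "hyd:3"},  # open conformation exposing a subpocket
]
target_snapshots = [
    {"don:1", "acc:2", "aro:4", "hyd:5"},
]

single = tanimoto(query_snapshots[0], target_snapshots[0])
ensemble = ensemble_similarity(query_snapshots, target_snapshots)
print(f"single structure: {single:.2f}, ensemble: {ensemble:.2f}")
```

Taking the maximum rewards any pair of conformations that match well, which suits the search for rare similarities but also raises the chance of false positives as the ensembles grow. An average or a percentile over all pairs would be a more conservative alternative.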
Furthermore, the presented applications show that the success of binding site comparison methods strongly depends on their combination with other in silico tools, for example MD simulation to account for the protein’s conformational flexibility or binding site detection methods to derive binding sites for apo structures of proteins. Finally, the various applications of binding site comparison methods presented here underline their importance for rational drug design. This collection of useful applications should encourage readers to use the presented methods, or other methods provided as ready-to-use web servers and software packages (Supporting Information), to cope with the challenges of their own projects. Depending on the comparison method used, the time required to compare binding sites ranges from milliseconds to minutes185. Therefore, we highly recommend the application of binding site comparison tools to support or even guide experimental investigations. Figure 12 gives an overview of possible applications of binding site comparison algorithms, including tools that were successfully employed in various hit identification steps, to facilitate the choice of a suitable method. Given this large number of application domains, we have to acknowledge that these algorithms clearly go beyond the limits of sequence and structure comparisons. Therefore, they should become an essential part of rational drug design approaches. Of course, as an emerging technology, binding site comparison should be seen as an interesting new method to support design decisions or to speed up the identification of interesting ligands in an orthogonal way.


Figure 12. The impact of binding site comparison methods on rational drug design and medicinal chemistry, with a focus on the importance of their combination with other structure-based drug design methods. Helpful computational tools for different scenarios are presented in green boxes, while orange boxes highlight steps in a screening workflow that have to be done in the lab. BSC marks steps in those workflows where binding site comparison can help to accelerate the search for promising hits. The superscripts refer to methods that have already been used successfully at this stage of hit identification, as discussed in the text. The reader has to bear in mind that the application domains partially overlap and cannot be clearly distinguished.


ASSOCIATED CONTENT

Supporting Information. Summary of different binding site comparison methods and their availability. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author
* For O.K.: phone, +49 231 755 6104; E-mail, [email protected]

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources
C. Ehrt is funded by the Kekulé Mobility Fellowship of the Chemical Industry Fund (FCI). T. Brinkjost is partially supported by the German Research Foundation (DFG, Priority Program 1736, Algorithms for Big Data). O. Koch is funded by the German Federal Ministry for Education and Research (BMBF, Medizinische Chemie in Dortmund, TU Dortmund University, Grant No. BMBF 1316053).

ABBREVIATIONS

5-HT2B/2CR, serotonin type 2B/2C metabotropic receptors; AD, Alzheimer’s disease; ANT1, adenine nucleotide translocase 1; APF, Atomic Property Fields; ARK, β adrenergic receptor kinase; BACE-1, β-secretase; CA, carbonic anhydrase; CB1R, cannabinoid receptor 1; CKII, casein kinase II; COMT, catechol-O-methyltransferase; COX, cyclooxygenase; CRISPR, clustered regularly interspaced short palindromic repeats; EC, enzyme commission; ERα, estrogen receptor α; FAK, focal adhesion kinase; FGFR, fibroblast growth factor 1 receptor; GO, gene ontology; IFP, interaction fingerprint; IGF-1R, insulin like growth factor 1 receptor; LSD1, lysine-specific demethylase 1; MAO, monoamine oxidase; MMAA, methylmalonyl aciduria associated protein A; Mtb, Mycobacterium tuberculosis; NP, nucleoprotein; NR, nuclear receptor; NSAID, nonsteroidal anti-inflammatory drug; PDK1, phosphoinositide-dependent kinase 1; PPARα, peroxisome proliferator-activated receptor α; PPI, protein-protein interaction; PSC, Protein Surface Classification; SAM, S-adenosylmethionine; SARS-CoV Mpro, SARS coronavirus main protease; SERCA, sarcoplasmic reticulum Ca2+ ion channel ATPase; SERM, selective estrogen receptor modulators; SVM, support vector machine; UBD, ubiquitin-binding domain; VDAC, voltage-dependent anion channel.

Biographies

Christiane Ehrt received her M.Sc. in Biochemistry at the Martin-Luther-University Halle-Wittenberg. Currently, she works as a Ph.D. student at the Faculty of Chemistry and Chemical Biology at the TU Dortmund University in the research group of Dr. Oliver Koch. Her interests and current focus include the identification and comparison of protein-ligand binding sites, virtual screening studies for novel targets and, in this context, comparative modeling, MD simulations, and biochemical assay development.

Tobias Brinkjost received his Diploma in Computer Science from the TU Dortmund University in 2012. Afterwards, he joined the Medicinal Chemistry workgroup of Dr. Oliver Koch in collaboration with the chair for Algorithm Engineering of Prof. Dr. Petra Mutzel at the TU Dortmund for his Ph.D. studies. His current research mainly focuses on the development of models and algorithms in the field of rational drug design.


Oliver Koch studied pharmacy and computer science at the Philipps-University Marburg, Germany, where he also obtained his Ph.D. in pharmaceutical chemistry with Prof. G. Klebe. After postdoctoral research at the Cambridge Crystallographic Data Centre in 2008 and working in drug discovery at MSD Animal Health Innovation, he started his independent academic career in 2012 as a junior group leader for medicinal chemistry at the TU Dortmund University, Germany. His research interests involve the development and application of computational methods in computational molecular design and medicinal chemistry.

REFERENCES

(1) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucl. Acids Res. 2000, 28 (1), 235–242.
(2) Johnson, D. K.; Karanicolas, J. Druggable protein interaction sites are more predisposed to surface pocket formation than the rest of the protein surface. PLoS Comput. Biol. [Online] 2013, 9 (3), e1002951.
(3) Loving, K. A.; Lin, A.; Cheng, A. C. Structure-based druggability assessment of the mammalian structural proteome with inclusion of light protein flexibility. PLoS Comput. Biol. [Online] 2014, 10 (7), e1003741.
(4) Orengo, C. A.; Michie, A. D.; Jones, S.; Jones, D. T.; Swindells, M. B.; Thornton, J. M. CATH - a hierarchic classification of protein domain structures. Structure 1997, 5 (8), 1093–1108.
(5) Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247 (4), 536–540.


(6) RCSB PDB. http://www.rcsb.org/pdb/static.do?p=general_information/pdb_statistics/index.html (accessed March 21, 2016).
(7) Skolnick, J.; Gao, M.; Roy, A.; Srinivasan, B.; Zhou, H. Implications of the small number of distinct ligand binding pockets in proteins for drug discovery, evolution and biochemical function. Bioorg. Med. Chem. Lett. 2015, 25 (6), 1163–1170.
(8) Paul, N.; Kellenberger, E.; Bret, G.; Müller, P.; Rognan, D. Recovering the true targets of specific ligands by virtual screening of the protein data bank. Proteins: Struct., Funct., Bioinf. 2004, 54 (4), 671–680.
(9) Kellenberger, E.; Muller, P.; Schalon, C.; Bret, G.; Foata, N.; Rognan, D. sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank. J. Chem. Inf. Model. 2006, 46 (2), 717–727.
(10) Sturm, N.; Desaphy, J.; Quinn, R. J.; Rognan, D.; Kellenberger, E. Structural insights into the molecular basis of the ligand promiscuity. J. Chem. Inf. Model. 2012, 52 (9), 2410–2421.
(11) Schalon, C.; Surgand, J.-S.; Kellenberger, E.; Rognan, D. A simple and fuzzy method to align and compare druggable ligand-binding sites. Proteins: Struct., Funct., Bioinf. 2008, 71 (4), 1755–1778.
(12) Desaphy, J.; Azdimousa, K.; Kellenberger, E.; Rognan, D. Comparison and druggability prediction of protein-ligand binding sites from pharmacophore-annotated cavity shapes. J. Chem. Inf. Model. 2012, 52 (8), 2287–2299.
(13) Weill, N.; Rognan, D. Alignment-free ultra-high-throughput comparison of druggable protein-ligand binding sites. J. Chem. Inf. Model. 2010, 50 (1), 123–135.
(14) scPDB - An Annotated Database of Druggable Binding Sites from the Protein DataBank. http://cheminfo.u-strasbg.fr/scPDB/ABOUT (accessed March 21, 2016).


(15) Martin, A. C.; Orengo, C. A.; Hutchinson, E. G.; Jones, S.; Karmirantzou, M.; Laskowski, R. A.; Mitchell, J. B.; Taroni, C.; Thornton, J. M. Protein folds and functions. Structure 1998, 6 (7), 875–884.
(16) Barelier, S.; Sterling, T.; O’Meara, M. J.; Shoichet, B. K. The recognition of identical ligands by unrelated proteins. ACS Chem. Biol. 2015, 10 (12), 2772–2784.
(17) Keiser, M. J.; Roth, B. L.; Armbruster, B. N.; Ernsberger, P.; Irwin, J. J.; Shoichet, B. K. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 2007, 25 (2), 197–206.
(18) Li, G.-Y.; Zheng, Y.-X.; Sun, F.-Z.; Huang, J.; Lou, M.-M.; Gu, J.-K.; Wang, J.-H. In silico analysis and experimental validation of active compounds from Cichorium intybus L. ameliorating liver injury. Int. J. Mol. Sci. 2015, 16 (9), 22190–22204.
(19) Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Krüger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL bioactivity database: an update. Nucl. Acids Res. 2014, 42 (Database issue), D1083-D1090.
(20) Wishart, D. S.; Knox, C.; Guo, A. C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucl. Acids Res. 2006, 34 (Database issue), D668-D672.
(21) Wang, Y.; Suzek, T.; Zhang, J.; Wang, J.; He, S.; Cheng, T.; Shoemaker, B. A.; Gindulyte, A.; Bryant, S. H. PubChem BioAssay: 2014 update. Nucl. Acids Res. 2014, 42 (Database issue), D1075-D1082.
(22) Günther, S.; Kuhn, M.; Dunkel, M.; Campillos, M.; Senger, C.; Petsalaki, E.; Ahmed, J.; Urdiales, E. G.; Gewiess, A.; Jensen, L. J.; Schneider, R.; Skoblo, R.; Russell, R. B.; Bourne, P. E.; Bork, P.; Preissner, R. SuperTarget and Matador: resources for exploring drug-target relationships. Nucl. Acids Res. 2008, 36 (Database issue), D919-D922.
(23) Kufareva, I.; Ilatovskiy, A. V.; Abagyan, R. Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucl. Acids Res. 2012, 40 (Database issue), D535-D540.
(24) Hendlich, M.; Rippmann, F.; Barnickel, G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graphics Modell. 1997, 15 (6), 359–363, 389.
(25) Laurie, A. T. R.; Jackson, R. M. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21 (9), 1908–1916.
(26) Fukunishi, Y.; Nakamura, H. Prediction of ligand-binding sites of proteins by molecular docking calculation for a random ligand library. Protein Sci. 2011, 20 (1), 95–106.
(27) Pupko, T.; Bell, R. E.; Mayrose, I.; Glaser, F.; Ben-Tal, N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 2002, 18 Suppl 1, S71-S77.
(28) Huang, B. MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS 2009, 13 (4), 325–330.
(29) Xie, Z.-R.; Hwang, M.-J. Methods for predicting protein-ligand binding sites. Methods Mol. Biol. 2015, 1215, 383–398.
(30) Andersson, C. D.; Chen, B. Y.; Linusson, A. Mapping of ligand-binding cavities in proteins. Proteins: Struct., Funct., Bioinf. 2010, 78 (6), 1408–1422.
(31) Boareto, M.; Yamagishi, M. E. B.; Caticha, N.; Leite, V. B. P. Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction. Comput. Biol. Chem. 2012, 40, 15–19.


(32) Rognan, D. Chemogenomic approaches to rational drug design. Br. J. Pharmacol. 2007, 152 (1), 38–52.
(33) Haupt, V. J.; Daminelli, S.; Schroeder, M. Drug promiscuity in PDB: protein binding site similarity is key. PLoS One [Online] 2013, 8 (6), e65894.
(34) Bajusz, D.; Rácz, A.; Héberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminf. [Online] 2015, 7, 20.
(35) Dekker, F. J.; Koch, M. A.; Waldmann, H. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics. Curr. Opin. Chem. Biol. 2005, 9 (3), 232–239.
(36) Koch, O. Use of secondary structure element information in drug design: polypharmacology and conserved motifs in protein-ligand binding and protein-protein interfaces. Future Med. Chem. 2011, 3 (6), 699–708.
(37) Henrich, S.; Salo-Ahen, O. M. H.; Huang, B.; Rippmann, F. F.; Cruciani, G.; Wade, R. C. Computational approaches to identifying and characterizing protein binding sites for ligand design. J. Mol. Recognit. 2010, 23 (2), 209–219.
(38) Kellenberger, E.; Schalon, C.; Rognan, D. How to measure the similarity between protein ligand-binding sites? Curr. Comput.-Aided Drug Des. 2008, 4 (3), 209–220.
(39) MOE (Molecular Operating Environment); Chemical Computing Group Inc.: 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2013.
(40) Koch, M. A.; Wittenberg, L.-O.; Basu, S.; Jeyaraj, D. A.; Gourzoulidou, E.; Reinecke, K.; Odermatt, A.; Waldmann, H. Compound library development guided by protein structure similarity clustering and natural product structure. Proc. Natl. Acad. Sci. U. S. A. 2004, 101 (48), 16721–16726.


(41) Totrov, M. Ligand binding site superposition and comparison based on Atomic Property Fields: identification of distant homologues, convergent evolution and PDB-wide clustering of binding sites. BMC Bioinf. [Online] 2011, 12 Suppl 1, S35.
(42) Edwards, T. E.; Baugh, L.; Bullen, J.; Baydo, R. O.; Witte, P.; Thompkins, K.; Phan, I. Q. H.; Abendroth, J.; Clifton, M. C.; Sankaran, B.; van Voorhis, W. C.; Myler, P. J.; Staker, B. L.; Grundner, C.; Lorimer, D. D. Crystal structures of Mycobacterial MeaB and MMAA-like GTPases. J. Struct. Funct. Genomics 2015, 16 (2), 91–99.
(43) Baugh, L.; Phan, I.; Begley, D. W.; Clifton, M. C.; Armour, B.; Dranow, D. M.; Taylor, B. M.; Muruthi, M. M.; Abendroth, J.; Fairman, J. W.; Fox, D.; Dieterich, S. H.; Staker, B. L.; Gardberg, A. S.; Choi, R.; Hewitt, S. N.; Napuli, A. J.; Myers, J.; Barrett, L. K.; Zhang, Y.; Ferrell, M.; Mundt, E.; Thompkins, K.; Tran, N.; Lyons-Abbott, S.; Abramov, A.; Sekar, A.; Serbzhinskiy, D.; Lorimer, D.; Buchko, G. W.; Stacy, R.; Stewart, L. J.; Edwards, T. E.; van Voorhis, W. C.; Myler, P. J. Increasing the structural coverage of tuberculosis drug targets. Tuberculosis 2015, 95 (2), 142–148.
(44) Schmitt, S.; Kuhn, D.; Klebe, G. A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 2002, 323 (2), 387–406.
(45) Yang, L.; Chen, J.; Shi, L.; Hudock, M. P.; Wang, K.; He, L. Identifying unexpected therapeutic targets via chemical-protein interactome. PLoS One [Online] 2010, 5 (3), e9568.
(46) Al-Gharabli, S. I.; Shah, S. T. A.; Weik, S.; Schmidt, M. F.; Mesters, J. R.; Kuhn, D.; Klebe, G.; Hilgenfeld, R.; Rademann, J. An efficient method for the synthesis of peptide aldehyde libraries employed in the discovery of reversible SARS coronavirus main protease (SARS-CoV Mpro) inhibitors. ChemBioChem 2006, 7 (7), 1048–1055.


(47) Kuhn, D.; Weskamp, N.; Schmitt, S.; Hüllermeier, E.; Klebe, G. From the similarity analysis of protein cavities to the functional classification of protein families using Cavbase. J. Mol. Biol. 2006, 359 (4), 1023–1044.
(48) Weber, A.; Casini, A.; Heine, A.; Kuhn, D.; Supuran, C. T.; Scozzafava, A.; Klebe, G. Unexpected nanomolar inhibition of carbonic anhydrase by COX-2-selective celecoxib: new pharmacological opportunities due to related binding site recognition. J. Med. Chem. 2004, 47 (3), 550–557.
(49) Li, G.-H.; Huang, J.-F. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation. BMC Bioinf. [Online] 2010, 11, 439.
(50) Zhao, Y.; Wang, J.; Wang, Y.; Huang, J. A comparative analysis of protein targets of withdrawn cardiovascular drugs in human and mouse. J. Clin. Bioinf. [Online] 2012, 2 (1), 10.
(51) Roy, A.; Yang, J.; Zhang, Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucl. Acids Res. 2012, 40 (Web Server issue), W471-W477.
(52) Dong, Q.; Menon, R.; Omenn, G. S.; Zhang, Y. Structural bioinformatics inspection of neXtProt PE5 proteins in the human proteome. J. Proteome Res. 2015, 14 (9), 3750–3761.
(53) Powers, R.; Copeland, J. C.; Germer, K.; Mercier, K. A.; Ramanathan, V.; Revesz, P. Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins: Struct., Funct., Bioinf. 2006, 65 (1), 124–135.
(54) Shortridge, M. D.; Powers, R. Structural and functional similarity between the bacterial type III secretion system needle protein PrgI and the eukaryotic apoptosis Bcl-2 proteins. PLoS One [Online] 2009, 4 (10), e7442.


(55) Kinoshita, K.; Furui, J.; Nakamura, H. Identification of protein functions from a molecular surface database, eF-site. J. Struct. Funct. Genomics 2002, 2 (1), 9–22.
(56) Kinoshita, K.; Nakamura, H. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Sci. 2003, 12 (8), 1589–1595.
(57) Baroni, M.; Cruciani, G.; Sciabola, S.; Perruccio, F.; Mason, J. S. A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for Ligands and Proteins (FLAP): theory and application. J. Chem. Inf. Model. 2007, 47 (2), 279–294.
(58) Sciabola, S.; Stanton, R. V.; Mills, J. E.; Flocco, M. M.; Baroni, M.; Cruciani, G.; Perruccio, F.; Mason, J. S. High-throughput virtual screening of proteins using GRID molecular interaction fields. J. Chem. Inf. Model. 2010, 50 (1), 155–169.
(59) Tseng, Y. Y.; Chen, Z. J.; Li, W.-H. fPOP: footprinting functional pockets of proteins by comparative spatial patterns. Nucl. Acids Res. 2010, 38 (Database issue), D288-D295.
(60) Tseng, Y. Y.; Li, W.-H. Classification of protein functional surfaces using structural characteristics. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (4), 1170–1175.
(61) Marcou, G.; Rognan, D. Optimizing fragment and scaffold docking by use of molecular interaction fingerprints. J. Chem. Inf. Model. 2007, 47 (1), 195–207.
(62) Kooistra, A. J.; Leurs, R.; de Esch, I. J. P.; de Graaf, C. Structure-based prediction of G-protein-coupled receptor ligand function: a β-adrenoceptor case study. J. Chem. Inf. Model. 2015, 55 (5), 1045–1061.
(63) Najmanovich, R.; Kurbatova, N.; Thornton, J. Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24 (16), i105-i111.


(64) Larocque, M.; Chénard, T.; Najmanovich, R. A curated C. difficile strain 630 metabolic network: prediction of essential targets and inhibitors. BMC Syst. Biol. [Online] 2014, 8, 117.
(65) Wood, D. J.; de Vlieg, J.; Wagener, M.; Ritschel, T. Pharmacophore fingerprint-based approach to binding site subpocket similarity and its application to bioisostere replacement. J. Chem. Inf. Model. 2012, 52 (8), 2031–2043.
(66) Schirris, T. J. J.; Ritschel, T.; Herma Renkema, G.; Willems, P. H. G. M.; Smeitink, J. A. M.; Russel, F. G. M. Mitochondrial ADP/ATP exchange inhibition: a novel off-target mechanism underlying ibipinabant-induced myotoxicity. Sci. Rep. [Online] 2015, 5, 14533.
(67) Brakoulias, A.; Jackson, R. M. Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching. Proteins: Struct., Funct., Bioinf. 2004, 56 (2), 250–260.
(68) Kinnings, S. L.; Jackson, R. M. Binding site similarity analysis for the functional classification of the protein kinase family. J. Chem. Inf. Model. 2009, 49 (2), 318–329.
(69) Wu, C. Y.; Chen, Y. C.; Lim, C. A structural-alphabet-based strategy for finding structural motifs across protein families. Nucl. Acids Res. [Online] 2010, 38 (14), e150.
(70) Hua, Y. H.; Wu, C. Y.; Sargsyan, K.; Lim, C. Sequence-motif detection of NAD(P)-binding proteins: discovery of a unique antibacterial drug target. Sci. Rep. [Online] 2014, 4, 6471.
(71) Das, S.; Kokardekar, A.; Breneman, C. M. Rapid comparison of protein binding site surfaces with property encoded shape distributions. J. Chem. Inf. Model. 2009, 49 (12), 2863–2872.
(72) Das, S.; Krein, M. P.; Breneman, C. M. Binding affinity prediction with property-encoded shape distribution signatures. J. Chem. Inf. Model. 2010, 50 (2), 298–308.

(73) Liu, T.; Altman, R. B. Using multiple microenvironments to find similar ligand-binding sites: application to kinase inhibitor binding. PLoS Comput. Biol. [Online] 2011, 7 (12), e1002326.
(74) Miguel, A.; Hsin, J.; Liu, T.; Tang, G.; Altman, R. B.; Huang, K. C. Variations in the binding pocket of an inhibitor of the bacterial division protein FtsZ across genotypes and species. PLoS Comput. Biol. [Online] 2015, 11 (3), e1004117.
(75) Yeturu, K.; Chandra, N. PocketMatch: a new algorithm to compare binding sites in protein structures. BMC Bioinf. [Online] 2008, 9, 543.
(76) Anand, P.; Sankaran, S.; Mukherjee, S.; Yeturu, K.; Laskowski, R.; Bhardwaj, A.; Bhagavat, R.; Brahmachari, S. K.; Chandra, N. Structural annotation of Mycobacterium tuberculosis proteome. PLoS One [Online] 2011, 6 (10), e27044.
(77) Möller-Acuña, P.; Contreras-Riquelme, J. S.; Rojas-Fuentes, C.; Nuñez-Vivanco, G.; Alzate-Morales, J.; Iturriaga-Vásquez, P.; Arias, H. R.; Reyes-Parada, M. Similarities between the binding sites of SB-206553 at serotonin type 2 and alpha7 acetylcholine nicotinic receptors: rationale for its polypharmacological profile. PLoS One [Online] 2015, 10 (8), e0134444.
(78) Fierro, A.; Montecinos, A.; Gómez-Molina, C.; Núñez, G.; Aldeco, M.; Edmondson, D. E.; Vilches-Herrera, M.; Lühr, S.; Iturriaga-Vásquez, P.; Reyes-Parada, M. Similarities between the binding sites of monoamine oxidase (MAO) from different species – Is zebrafish a useful model for the discovery of novel MAO inhibitors? In An Integrated View of the Molecular Recognition and Toxinology - From Analytical Procedures to Biomedical Applications; Radis-Baptista, G., Ed.; InTech, 2013.
(79) Reisen, F.; Weisel, M.; Kriegl, J. M.; Schneider, G. Self-organizing fuzzy graphs for structure-based comparison of protein pockets. J. Proteome Res. 2010, 9 (12), 6498–6510.

(80) Lanig, H.; Reisen, F.; Whitley, D.; Schneider, G.; Banting, L.; Clark, T. In silico adoption of an orphan nuclear receptor NR4A1. PLoS One [Online] 2015, 10 (8), e0135246.
(81) Konc, J.; Janežič, D. ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics 2010, 26 (9), 1160–1168.
(82) Konc, J.; Hodošček, M.; Ogrizek, M.; Trykowska Konc, J.; Janežič, D. Structure-based function prediction of uncharacterized protein using binding sites comparison. PLoS Comput. Biol. [Online] 2013, 9 (11), e1003341.
(83) Kakisaka, M.; Sasaki, Y.; Yamada, K.; Kondoh, Y.; Hikono, H.; Osada, H.; Tomii, K.; Saito, T.; Aida, Y. A Novel Antiviral Target Structure Involved in the RNA Binding, Dimerization, and Nuclear Export Functions of the Influenza A Virus Nucleoprotein. PLoS Pathog. [Online] 2015, 11 (7), e1005062.
(84) Spitzer, R.; Cleves, A. E.; Jain, A. N. Surface-based protein binding pocket similarity. Proteins: Struct., Funct., Bioinf. 2011, 79 (9), 2746–2763.
(85) Cleves, A. E.; Jain, A. N. Chemical and protein structural basis for biological crosstalk between PPARα and COX enzymes. J. Comput.-Aided Mol. Des. 2015, 29 (2), 101–112.
(86) Ausiello, G.; Via, A.; Helmer-Citterich, M. Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinf. [Online] 2005, 6 Suppl 4, S5.
(87) Ausiello, G.; Peluso, D.; Via, A.; Helmer-Citterich, M. Local comparison of protein structures highlights cases of convergent evolution in analogous functional sites. BMC Bioinf. [Online] 2007, 8 Suppl 1, S24.
(88) De Franchi, E.; Schalon, C.; Messa, M.; Onofri, F.; Benfenati, F.; Rognan, D. Binding of protein kinase inhibitors to synapsin I inferred from pair-wise binding site similarity measurements. PLoS One [Online] 2010, 5 (8), e12214.

(89) Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. Recognition of functional sites in protein structures. J. Mol. Biol. 2004, 339 (3), 607–633.
(90) Keren-Kaplan, T.; Attali, I.; Estrin, M.; Kuo, L. S.; Farkash, E.; Jerabek-Willemsen, M.; Blutraich, N.; Artzi, S.; Peri, A.; Freed, E. O.; Wolfson, H. J.; Prag, G. Structure-based in silico identification of ubiquitin-binding domains provides insights into the ALIX-V:ubiquitin complex and retrovirus budding. EMBO J. 2013, 32 (4), 538–551.
(91) Xie, L.; Xie, L.; Bourne, P. E. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009, 25 (12), i305–i312.
(92) Xie, L.; Evangelidis, T.; Xie, L.; Bourne, P. E. Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Comput. Biol. [Online] 2011, 7 (4), e1002037.
(93) Niu, M.; Hu, J.; Wu, S.; Xiaoe, Z.; Xu, H.; Zhang, Y.; Zhang, J.; Yang, Y. Structural bioinformatics-based identification of EGFR inhibitor gefitinib as a putative lead compound for BACE. Chem. Biol. Drug Des. 2014, 83 (1), 81–88.
(94) Xie, L.; Bourne, P. E. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (14), 5441–5446.
(95) Kinnings, S. L.; Liu, N.; Buchmeier, N.; Tonge, P. J.; Xie, L.; Bourne, P. E. Drug discovery using chemical systems biology: repositioning the safe medicine Comtan to treat multi-drug and extensively drug resistant tuberculosis. PLoS Comput. Biol. [Online] 2009, 5 (7), e1000423.

(96) Xie, L.; Wang, J.; Bourne, P. E. In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput. Biol. [Online] 2007, 3 (11), e217.
(97) Binkowski, T. A.; Joachimiak, A. Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites. BMC Struct. Biol. [Online] 2008, 8, 45.
(98) Babu, M.; Beloglazova, N.; Flick, R.; Graham, C.; Skarina, T.; Nocek, B.; Gagarinova, A.; Pogoutse, O.; Brown, G.; Binkowski, A.; Phanse, S.; Joachimiak, A.; Koonin, E. V.; Savchenko, A.; Emili, A.; Greenblatt, J.; Edwards, A. M.; Yakunin, A. F. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol. Microbiol. 2011, 79 (2), 484–502.
(99) Zhang, Y.; Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucl. Acids Res. 2005, 33 (7), 2302–2309.
(100) Yang, Y.; Li, G.; Zhao, D.; Yu, H.; Zheng, X.; Peng, X.; Zhang, X.; Fu, T.; Hu, X.; Niu, M.; Ji, X.; Zou, L.; Wang, J. Computational discovery and experimental verification of tyrosine kinase inhibitor pazopanib for the reversal of memory and cognitive deficits in rat model neurodegeneration. Chem. Sci. 2015, 6 (5), 2812–2821.
(101) Goodford, P. J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 1985, 28 (7), 849–857.
(102) Feldman, H. J.; Labute, P. Pocket similarity: are alpha carbons enough? J. Chem. Inf. Model. 2010, 50 (8), 1466–1475.
(103) Brevern, A. G. de; Etchebest, C.; Hazout, S. Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins: Struct., Funct., Bioinf. 2000, 41 (3), 271–287.

(104) Totrov, M. Atomic property fields: generalized 3D pharmacophoric potential for automated ligand superposition, pharmacophore elucidation and 3D QSAR. Chem. Biol. Drug Des. 2008, 71 (1), 15–27.
(105) Jain, A. N. Morphological similarity: a 3D molecular similarity method correlated with protein-ligand recognition. J. Comput.-Aided Mol. Des. 2000, 14 (2), 199–213.
(106) Spitzer, R.; Cleves, A. E.; Varela, R.; Jain, A. N. Protein function annotation by local binding site surface similarity. Proteins: Struct., Funct., Bioinf. 2014, 82 (4), 679–694.
(107) Hendlich, M.; Bergner, A.; Günther, J.; Klebe, G. Relibase: design and development of a database for comprehensive analysis of protein-ligand interactions. J. Mol. Biol. 2003, 326 (2), 607–620.
(108) Günther, J.; Bergner, A.; Hendlich, M.; Klebe, G. Utilising structural knowledge in drug design strategies: applications using Relibase. J. Mol. Biol. 2003, 326 (2), 621–636.
(109) Laskowski, R. A. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graphics 1995, 13 (5), 323–330, 307–308.
(110) Glaser, F.; Rosenberg, Y.; Kessel, A.; Pupko, T.; Ben-Tal, N. The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures. Proteins: Struct., Funct., Bioinf. 2005, 58 (3), 610–617.
(111) McLachlan, A. D. Repeating sequences and gene duplication in proteins. J. Mol. Biol. 1972, 64 (2), 417–437.
(112) Henikoff, S.; Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U. S. A. 1992, 89 (22), 10915–10919.
(113) Konc, J.; Janežič, D. An improved branch and bound algorithm for the maximum clique problem. MATCH 2007, No. 58, 569–590.

(114) Binkowski, T. A.; Adamian, L.; Liang, J. Inferring functional relationships of proteins from local sequence and spatial surface patterns. J. Mol. Biol. 2003, 332 (2), 505–526.
(115) Binkowski, T. A.; Naghibzadeh, S.; Liang, J. CASTp: Computed Atlas of Surface Topography of proteins. Nucl. Acids Res. 2003, 31 (13), 3352–3355.
(116) Pearson, W. R. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 1991, 11 (3), 635–650.
(117) Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucl. Acids Res. 2005, 33 (Web Server issue), W337–W341.
(118) Kabsch, W.; Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22 (12), 2577–2637.
(119) Kihara, D.; Skolnick, J. The PDB is a covering set of small protein structures. J. Mol. Biol. 2003, 334 (4), 793–802.
(120) Dayhoff, M. O.; Schwartz, R. M.; Orcutt, B. C. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure; Dayhoff, M. O., Ed.; National Biomedical Research Foundation: Washington, DC, 1978; pp 345–352.
(121) Tseng, Y. Y.; Dupree, C.; Chen, Z. J.; Li, W.-H. SplitPocket: identification of protein functional surfaces and characterization of their spatial patterns. Nucl. Acids Res. 2009, 37 (Web Server issue), W384–W389.
(122) Wei, L.; Altman, R. B.; Chang, J. T. Using the radial distributions of physical features to compare amino acid environments and align amino acid sequences. Pac. Symp. Biocomput. 1997, 465–476.

(123) Yoon, S.; Ebert, J. C.; Chung, E.-Y.; Micheli, G. de; Altman, R. B. Clustering protein environments for function prediction: finding PROSITE motifs in 3D. BMC Bioinf. [Online] 2007, 8 Suppl 4, S10.
(124) Shindyalov, I. N.; Bourne, P. E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998, 11 (9), 739–747.
(125) Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39 (15), 2887–2893.
(126) Das, S.; Krein, M. P.; Breneman, C. M. PESDserv: a server for high-throughput comparison of protein binding site surfaces. Bioinformatics 2010, 26 (15), 1913–1914.
(127) Zhu, L.; George, S.; Schmidt, M. F.; Al-Gharabli, S. I.; Rademann, J.; Hilgenfeld, R. Peptide aldehyde inhibitors challenge the substrate specificity of the SARS-coronavirus main protease. Antiviral Res. 2011, 92 (2), 204–212.
(128) Willmann, D.; Lim, S.; Wetzel, S.; Metzger, E.; Jandausch, A.; Wilk, W.; Jung, M.; Forne, I.; Imhof, A.; Janzer, A.; Kirfel, J.; Waldmann, H.; Schüle, R.; Buettner, R. Impairment of prostate cancer cell growth by a selective and reversible lysine-specific demethylase 1 inhibitor. Int. J. Cancer 2012, 131 (11), 2704–2709.
(129) Holm, L.; Rosenström, P. Dali server: conservation mapping in 3D. Nucl. Acids Res. 2010, 38 (Web Server issue), W545–W549.
(130) Wetzel, S.; Wilk, W.; Chammaa, S.; Sperl, B.; Roth, A. G.; Yektaoglu, A.; Renner, S.; Berg, T.; Arenz, C.; Giannis, A.; Oprea, T. I.; Rauh, D.; Kaiser, M.; Waldmann, H. A scaffold-tree-merging strategy for prospective bioactivity annotation of gamma-pyrones. Angew. Chem., Int. Ed. Engl. 2010, 49 (21), 3666–3670.

(131) UCSF Chimera - a visualization system for exploratory research and analysis; Resource for Biocomputing, Visualization, and Informatics: Computer Graphics Laboratory, Department of Pharmaceutical Chemistry, University of California, 600 16th Street, San Francisco, California 94143-2240, USA, 2015.
(132) Rajagopal, S.; Rajagopal, K.; Lefkowitz, R. J. Teaching old receptors new tricks: biasing seven-transmembrane receptors. Nat. Rev. Drug Discovery 2010, 9 (5), 373–386.
(133) Issa, N. T.; Byers, S. W.; Dakshanamurthy, S. Drug repurposing: translational pharmacology, chemistry, computers and the clinic. Curr. Top. Med. Chem. 2013, 13 (18), 2328–2336.
(134) Haupt, V. J.; Schroeder, M. Old friends in new guise: repositioning of known drugs with structural bioinformatics. Briefings Bioinf. 2011, 12 (4), 312–326.
(135) Baek, M.-C.; Jung, B.; Kang, H.; Lee, H.-S.; Bae, J.-S. Novel insight into drug repositioning: methylthiouracil as a case in point. Pharmacol. Res. 2015, 99, 185–193.
(136) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An open chemical toolbox. J. Cheminf. [Online] 2011, 3 (1), 33.
(137) Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. Small Molecule Subgraph Detector (SMSD) toolkit. J. Cheminf. [Online] 2009, 1 (1), 12.
(138) Chen, X.; Liu, M.; Gilson, M. K. BindingDB: a web-accessible molecular recognition database. Comb. Chem. High Throughput Screening 2001, 4 (8), 719–725.
(139) Lang, H.; Huang, X.; Yang, Y. Identification of putative molecular imaging probes for BACE-1 by accounting for protein flexibility in virtual screening. J. Alzheimer’s Dis. 2012, 29 (2), 351–359.

(140) Rastelli, G.; Pinzi, L. Computational polypharmacology comes of age. Front. Pharmacol. [Online] 2015, 6, 7874.
(141) Salentin, S.; Haupt, V. J.; Daminelli, S.; Schroeder, M. Polypharmacology rescored: protein-ligand interaction profiles for remote binding site similarity assessment. Prog. Biophys. Mol. Biol. 2014, 116 (2-3), 174–186.
(142) Gills, J. J.; Lopiccolo, J.; Tsurutani, J.; Shoemaker, R. H.; Best, C. J. M.; Abu-Asab, M. S.; Borojerdi, J.; Warfel, N. A.; Gardner, E. R.; Danish, M.; Hollander, M. C.; Kawabata, S.; Tsokos, M.; Figg, W. D.; Steeg, P. S.; Dennis, P. A. Nelfinavir, a lead HIV protease inhibitor, is a broad-spectrum, anticancer agent that induces endoplasmic reticulum stress, autophagy, and apoptosis in vitro and in vivo. Clin. Cancer Res. 2007, 13 (17), 5183–5194.
(143) Dunlop, J.; Lock, T.; Jow, B.; Sitzia, F.; Grauer, S.; Jow, F.; Kramer, A.; Bowlby, M. R.; Randall, A.; Kowal, D.; Gilbert, A.; Comery, T. A.; Larocque, J.; Soloveva, V.; Brown, J.; Roncarati, R. Old and new pharmacology: positive allosteric modulation of the alpha7 nicotinic acetylcholine receptor by the 5-hydroxytryptamine(2B/C) receptor antagonist SB-206553 (3,5-dihydro-5-methyl-N-3-pyridinylbenzo[1,2-b:4,5-b’]dipyrrole-1(2H)-carboxamide). J. Pharmacol. Exp. Ther. 2009, 328 (3), 766–776.
(144) Audia, J. E.; Evrard, D. A.; Murdoch, G. R.; Droste, J. J.; Nissen, J. S.; Schenck, K. W.; Fludzinski, P.; Lucaites, V. L.; Nelson, D. L.; Cohen, M. L. Potent, selective tetrahydro-beta-carboline antagonists of the serotonin 2B (5HT2B) contractile receptor in the rat stomach fundus. J. Med. Chem. 1996, 39 (14), 2773–2780.
(145) Petrey, D.; Honig, B. Structural bioinformatics of the interactome. Annu. Rev. Biophys. 2014, 43, 193–210.

(146) Haydon, D. J.; Stokes, N. R.; Ure, R.; Galbraith, G.; Bennett, J. M.; Brown, D. R.; Baker, P. J.; Barynin, V. V.; Rice, D. W.; Sedelnikova, S. E.; Heal, J. R.; Sheridan, J. M.; Aiwale, S. T.; Chauhan, P. K.; Srivastava, A.; Taneja, A.; Collins, I.; Errington, J.; Czaplewski, L. G. An inhibitor of FtsZ with potent and selective anti-staphylococcal activity. Science 2008, 321 (5896), 1673–1675.
(147) Anderson, D. E.; Kim, M. B.; Moore, J. T.; O’Brien, T. E.; Sorto, N. A.; Grove, C. I.; Lackner, L. L.; Ames, J. B.; Shaw, J. T. Comparison of small molecule inhibitors of the bacterial cell division protein FtsZ and identification of a reliable cross-species inhibitor. ACS Chem. Biol. 2012, 7 (11), 1918–1928.
(148) Andreu, J. M.; Schaffner-Barbero, C.; Huecas, S.; Alonso, D.; Lopez-Rodriguez, M. L.; Ruiz-Avila, L. B.; Núñez-Ramírez, R.; Llorca, O.; Martín-Galiano, A. J. The antibacterial cell division inhibitor PC190723 is an FtsZ polymer-stabilizing agent that induces filament assembly and condensation. J. Biol. Chem. 2010, 285 (19), 14239–14246.
(149) Sudha, G.; Nussinov, R.; Srinivasan, N. An overview of recent advances in structural bioinformatics of protein–protein interactions and a guide to their principles. Prog. Biophys. Mol. Biol. 2014, 116 (2-3), 141–150.
(150) Xue, L. C.; Dobbs, D.; Bonvin, A. M.; Honavar, V. Computational prediction of protein interfaces: A review of data driven methods. FEBS Lett. 2015, 589 (23), 3516–3526.
(151) Dowlatshahi, D. P.; Sandrin, V.; Vivona, S.; Shaler, T. A.; Kaiser, S. E.; Melandri, F.; Sundquist, W. I.; Kopito, R. R. ALIX is a Lys63-specific polyubiquitin binding protein that functions in retrovirus budding. Dev. Cell 2012, 23 (6), 1247–1254.

(152) Pashkova, N.; Gakhar, L.; Winistorfer, S. C.; Sunshine, A. B.; Rich, M.; Dunham, M. J.; Yu, L.; Piper, R. C. The yeast Alix homolog Bro1 functions as a ubiquitin receptor for protein sorting into multivesicular endosomes. Dev. Cell 2013, 25 (5), 520–533.
(153) Schumann, M.; Armen, R. S. Identification of distant drug off-targets by direct superposition of binding pocket surfaces. PLoS One [Online] 2013, 8 (12), e83533.
(154) Paolini, G. V.; Shapland, R. H. B.; van Hoorn, W. P.; Mason, J. S.; Hopkins, A. L. Global mapping of pharmacological space. Nat. Biotechnol. 2006, 24 (7), 805–815.
(155) Han, L.; Wang, Y.; Bryant, S. H. A survey of across-target bioactivity results of small molecules in PubChem. Bioinformatics 2009, 25 (17), 2251–2255.
(156) LaBute, M. X.; Zhang, X.; Lenderman, J.; Bennion, B. J.; Wong, S. E.; Lightstone, F. C. Adverse drug reaction prediction using scores produced by large-scale drug-protein target docking on high-performance computing machines. PLoS One [Online] 2014, 9 (9), e106298.
(157) Sonnhammer, E. L.; Eddy, S. R.; Durbin, R. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins: Struct., Funct., Bioinf. 1997, 28 (3), 405–420.
(158) Kuhn, D.; Weskamp, N.; Hüllermeier, E.; Klebe, G. Functional classification of protein kinase binding sites using Cavbase. ChemMedChem 2007, 2 (10), 1432–1447.
(159) Liu, C.-G.; Xu, K.-Q.; Xu, X.; Huang, J.-J.; Xiao, J.-C.; Zhang, J.-P.; Song, H.-P. 17Beta-oestradiol regulates the expression of Na+/K+-ATPase beta1-subunit, sarcoplasmic reticulum Ca2+-ATPase and carbonic anhydrase IV in H9C2 cells. Clin. Exp. Pharmacol. Physiol. 2007, 34 (10), 998–1004.

(160) Beca, S.; Pavlov, E.; Kargacin, M. E.; Aschar-Sobbi, R.; French, R. J.; Kargacin, G. J. Inhibition of a cardiac sarcoplasmic reticulum chloride channel by tamoxifen. Pfluegers Arch. 2008, 457 (1), 121–135.
(161) Wang, J.-C.; Lin, J.-H. Scoring functions for prediction of protein-ligand interactions. Curr. Pharm. Des. 2013, 19 (12), 2174–2182.
(162) Wikberg, J. E. S.; Lapinsh, M.; Prusis, P. Proteochemometrics: a tool for modeling the molecular interaction space. In Chemogenomics in Drug Discovery; Kubinyi, H., Müller, G., Eds.; Methods and Principles in Medicinal Chemistry; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, FRG, 2004; pp 289–309.
(163) Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem. 2004, 47 (12), 2977–2980.
(164) Olsson, T. S. G.; Williams, M. A.; Pitt, W. R.; Ladbury, J. E. The thermodynamics of protein-ligand interaction and solvation: insights for ligand design. J. Mol. Biol. 2008, 384 (4), 1002–1017.
(165) Wu, C. Y.; Hwa, Y. H.; Chen, Y. C.; Lim, C. Hidden relationship between conserved residues and locally conserved phosphate-binding structures in NAD(P)-binding proteins. J. Phys. Chem. B 2012, 116 (19), 5644–5652.
(166) Huang, B. Identification of pockets on protein surface to predict protein–ligand binding sites. In Identification of Ligand Binding Site and Protein-Protein Interaction Area; Roterman-Konieczna, I., Ed.; Focus on Structural Biology; Springer Netherlands: Dordrecht, 2013; pp 25–39.

(167) Holm, L.; Sander, C. Dali: a network tool for protein structure comparison. Trends Biochem. Sci. 1995, 20 (11), 478–480.
(168) Nadzirin, N.; Firdaus-Raih, M. Proteins of unknown function in the Protein Data Bank (PDB): an inventory of true uncharacterized proteins and computational tools for their analysis. Int. J. Mol. Sci. 2012, 13 (10), 12761–12772.
(169) Petrey, D.; Chen, T. S.; Deng, L.; Garzon, J. I.; Hwang, H.; Lasso, G.; Lee, H.; Silkov, A.; Honig, B. Template-based prediction of protein function. Curr. Opin. Struct. Biol. 2015, 32, 33–38.
(170) Hwang, K. Y.; Chung, J. H.; Kim, S. H.; Han, Y. S.; Cho, Y. Structure-based identification of a novel NTPase from Methanococcus jannaschii. Nat. Struct. Biol. 1999, 6 (7), 691–696.
(171) Pieper, U.; Webb, B. M.; Dong, G. Q.; Schneidman-Duhovny, D.; Fan, H.; Kim, S. J.; Khuri, N.; Spill, Y. G.; Weinkam, P.; Hammel, M.; Tainer, J. A.; Nilges, M.; Sali, A. ModBase, a database of annotated comparative protein structure models and associated resources. Nucl. Acids Res. 2014, 42 (Database issue), D336–D346.
(172) Yin, J.; Garen, C. R.; Bateman, K.; Yu, M.; Lyon, E. Z. A.; Habel, J.; Kim, H.; Hung, L.-W.; Kim, C.-Y.; James, M. N. G. Expression, purification and preliminary crystallographic analysis of O-acetylhomoserine sulfhydrylase from Mycobacterium tuberculosis. Acta Crystallogr., Sect. F: Struct. Biol. Cryst. Commun. 2011, 67 (Pt 8), 959–963.
(173) Zhang, Y. I-TASSER server for protein 3D structure prediction. BMC Bioinf. [Online] 2008, 9, 40.

(174) Farrah, T.; Deutsch, E. W.; Hoopmann, M. R.; Hallows, J. L.; Sun, Z.; Huang, C.-Y.; Moritz, R. L. The state of the human proteome in 2012 as viewed through PeptideAtlas. J. Proteome Res. 2013, 12 (1), 162–171.
(175) Caetano-Anollés, G.; Wang, M.; Caetano-Anollés, D.; Mittenthal, J. E. The origin, evolution and structure of the protein world. Biochem. J. 2009, 417 (3), 621–637.
(176) Naumann, T.; Matter, H. Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: target family landscapes. J. Med. Chem. 2002, 45 (12), 2366–2378.
(177) Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 1997, 25 (17), 3389–3402.
(178) Bairoch, A. PROSITE: a dictionary of sites and patterns in proteins. Nucl. Acids Res. 1991, 19 Suppl, 2241–2245.
(179) Chupakhin, V.; Marcou, G.; Gaspar, H.; Varnek, A. Simple Ligand-Receptor Interaction Descriptor (SILIRID) for alignment-free binding site comparison. Comput. Struct. Biotechnol. J. 2014, 10 (16), 33–37.
(180) Chartier, M.; Najmanovich, R. Detection of binding site molecular interaction field similarities. J. Chem. Inf. Model. 2015, 55 (8), 1600–1615.
(181) Kinjo, A. R.; Nakamura, H. Similarity search for local protein structures at atomic resolution by exploiting a database management system. Biophysics 2007, 3, 75–84.
(182) Kinjo, A. R.; Nakamura, H. Comprehensive structural classification of ligand-binding motifs in proteins. Structure 2009, 17 (2), 234–246.

(183) Friedberg, I.; Godzik, A. Connecting the protein structure universe by using sparse recurring fragments. Structure 2005, 13 (8), 1213–1224.
(184) Godshall, B. G.; Tang, Y.; Yang, W.; Chen, B. Y. An aggregate analysis of many predicted structures to reduce errors in protein structure comparison caused by conformational flexibility. BMC Struct. Biol. [Online] 2013, 13 Suppl 1, S10.
(185) Volkamer, A.; Rarey, M. Exploiting structural information for drug-target assessment. Future Med. Chem. 2014, 6 (3), 319–331.
