
Article

Identification of Conserved Water Sites in Protein Structures for Drug Design Marko Jukic, Janez Konc, Stanislav Gobec, and Dusanka Janezic J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00443 • Publication Date (Web): 20 Nov 2017 Downloaded from http://pubs.acs.org on November 25, 2017


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Identification of Conserved Water Sites in Protein Structures for Drug Design Marko Jukič a, Janez Konc b,c, Stanislav Gobec a and Dušanka Janežič c,*

a Faculty of Pharmacy, University of Ljubljana, Aškerčeva 7, SI–1000, Ljubljana, Slovenia.

b National Institute of Chemistry, Hajdrihova 19, SI–1000, Ljubljana, Slovenia.

c Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI–6000 Koper, Slovenia.

* Corresponding author: Dušanka Janežič, University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška 8, SI–6000 Koper, Slovenia, Tel: + 386 5 611 76 59, E–mail: [email protected].

ACS Paragon Plus Environment


ABSTRACT

Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental co-crystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach, based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site or an individual water molecule as query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site–specific superimpositions of the query structure with the similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein–small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.

Availability and implementation: ProBiS H2O is implemented in Python 2.7 as a PyMOL plugin for the Linux system environment. The plugin is freely available from http://insilab.org/probis-h2o.

Contacts: [email protected], [email protected], [email protected], and [email protected].


1. INTRODUCTION

Conserved water molecules are a well–known phenomenon in medicinal chemistry and key contributors to hydrogen bonding networks, as reviewed by Kubinyi et al. 1-2 Water is a key constituent of biological systems in which it is involved structurally. 3 Water permeates proteins and forms complex hydrogen bond networks that stabilize their structure and intermacromolecular interfaces. 4 It is a vital constituent in catalysis, an amphoteric proton mediator and involved in charge interactions. 5 It also plays an essential role in molecular recognition 6 and is intimately involved in protein–ligand interactions. 7, 8

Conserved water molecules are characterized by high residence times and low isotropic displacements; they are usually buried or groove bound, they interact with the protein, and they can be observed at the same place in multiple structurally homologous proteins. 2 Displacement of such water molecules by ligands can result in a dramatic decrease in protein–ligand affinity, an increase in affinity if the displaced water is substituted by a suitable polar functional group, or no change in affinity due to the compensation between enthalpy and entropy in ligand binding thermodynamics. 1, 9 Displacement of tightly bound conserved water is usually accompanied by a change in the balance between the enthalpy and entropy of ligand binding even if the Gibbs binding free energy remains unchanged. 9 The study of conserved water molecules is therefore of utmost importance in understanding macromolecular structures and pharmacodynamics, and in the design or optimization of novel drugs. 9-13

Current methods used in computer programs for detection of conserved waters usually consist of two steps: sampling of water sites around the protein, followed by the clustering of water molecules (Figure 1). For prediction of water conservation, three general approaches have been reported. 14


[Figure 1 panel labels: explicit water models (MD, RISM, Monte Carlo); implicit water models (probe–based, grid–based); experimental crystal water (PDB, CCDC SuperStar)]

Figure 1: Conserved water prediction: General approach.

Sampling can be done either by molecular dynamics (MD) simulations or by grid–based approaches. 15 Examples of representative MD–based methods, which use explicit water models, are WaterMap, 16 SPAM, 17 3D-RISM, 18 GCMC, 19 and JAWS. 20 Examples of grid–based methods, where a water probe is translated through space on a grid and its interactions with protein are evaluated at each point, include SZMAP, 21 WaterFLAP, 22 and WaterDock. 23

Descriptor–based approaches have also been developed to energetically and geometrically evaluate specific water molecules. Instead of identifying conservation trends, these approaches evaluate whether a specific water molecule is feasible or not at a specific site. WaterScore thus establishes a statistical correlation between structural properties of water molecules, i.e., B–factor, solvent accessible surface area, total hydrogen bond energy and the number of protein atomic contacts in the apo– and holo–protein complexes, and tries to discriminate between bound and displaceable waters. 24 Similarly, Consolv employs a hybrid k–nearest neighbours genetic algorithm classifier to examine water environments, described with B–factors, the number of hydrogen bonds between the water molecule and the protein, the density and the hydrophilicity of neighbouring protein atoms. 25


As an alternative to molecular simulations, approaches based on experimental data have been developed. In these, co–crystallized water molecules from the Protein Data Bank (PDB) 14, 26 – 29 (DRoP algorithm) 30, 31 or from the Cambridge Structural Database (CSD) 32, 33 of small–molecule crystal structures (AcquaAlta algorithm) 34 are used to predict conserved water sites on proteins. PyWATER predicts discrete conserved water sites on a query protein; implemented as a PyMOL plugin, it finds PDB structures with sequences similar to the query, followed by a sequence independent structure–based superimposition of the protein backbones. 36, 37

Despite the variety of the described algorithms, there is still significant room for improvement. Approaches based on molecular dynamics currently use graphics processing units to speed up the computations, but they remain computationally intensive. On the other hand, the structural grid–based approaches often require extensive data curation and depend heavily on the input structure preparation, so that their use approaches the expert level. 38 – 42 Due to protein flexibility, structural alignment of whole protein backbones often cannot accurately place water molecules in binding sites, which are usually localized at one part of a protein. Descriptor–based approaches are often the simplest to use and enable fast calculations and water data collection from experimental results. However, they depend on accurate representation of hydration sites and have shown limited success, mainly due to a limited adoption rate and few reports on their usage.

We developed ProBiS H2O, an extension of our earlier ProBiS plugin, ProBiS–ligands and GenProBiS web servers, 43 – 45 to facilitate rapid identification of discrete conserved water sites from experimental protein structures in the PDB. The approach is implemented as the ProBiS H2O plugin for the PyMOL molecular visualization program, which enables interactive visualization of the results and serves as a key tool in the in silico drug discovery process. The approach has been validated in several proteins in which conserved waters play important roles in ligand binding with applications in medicinal chemistry. The ProBiS H2O plugin is freely available at http://insilab.org/probis-h2o.

2. PROBIS H2O

The ProBiS H2O approach uses available PDB structures to predict conserved water sites by local structural superimposition using the ProBiS algorithm coupled with water cluster detection (Figure 2). ProBiS H2O is the first tool to perform local superimpositions for the detection of conserved waters and can therefore alleviate the global structure bias introduced by comparison of similar protein structures or protein structures that are in a different conformation than the query protein. Using the ProBiS algorithm, 46 the query protein structure is locally aligned with similar protein structures from the PDB on the selected ligand or water binding site or on the entire chain. The ProBiS algorithm detects and aligns similar binding sites based on our maximum clique algorithm 47 and, irrespective of the proteins' folding patterns, produces superimpositions of local protein regions that are superior to whole backbone superimpositions. Water molecules are transposed from the similar proteins to the query protein based on these local superimpositions, and water clusters are identified. Protein similarity is addressed through the use of precalculated sequence similarity clusters ranging from 30 to 95% in similarity. The latter clustering scheme can be used to extend structural studies and observe the influence of sequence similarity on discrete conserved water clustering patterns. A conservation score is assigned to each cluster depending on the number of water molecules in the cluster. A user can also calculate conserved water clusters using a custom PDB structure file and a custom similarity cluster.


Figure 2: The ProBiS H2O approach.

COLLECTION OF SIMILAR PROTEIN STRUCTURES

The ProBiS H2O approach begins with the entry of a PDB/Chain ID representing the query protein. Similar protein structures are obtained from the PDB, sharing a varying percentage of sequence identity with the query protein. This is achieved by using the precalculated sequence clusters available at the PDB website, where all the deposited protein chains longer than 20 amino acid residues have been clustered by blastclust and cd-hit at 100%, 95%, 90%, 70%, 50%, 40%, and 30% sequence identity. 48, 49 With the selected query protein and sequence identity cutoff as user input, ProBiS H2O identifies the corresponding sequence cluster and retrieves all the similar protein structures belonging to this cluster. These constitute a set of similar protein structures for comparison with the query protein. A sequence identity cutoff of 95% is used as default in this study. A custom set of protein structures can be used instead, thereby avoiding the automated similar protein structure collection step.
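The cluster lookup described above can be sketched as follows. This is only an illustration, not the plugin's code: it assumes a cluster file in which each line holds one sequence cluster as whitespace-separated `PDBID_CHAIN` tokens, and the function name `find_cluster`, the sample data, and the PDB IDs in it are hypothetical placeholders.

```python
def find_cluster(cluster_lines, query):
    """Return the members of the sequence cluster that contains `query`.

    `cluster_lines` is an iterable of text lines; each line is assumed to
    hold one cluster as whitespace-separated "PDBID_CHAIN" tokens.
    """
    query = query.upper()
    for line in cluster_lines:
        members = [token.upper() for token in line.split()]
        if query in members:
            return members
    return []  # the query chain is not present in any cluster


# Made-up excerpt of a sequence-identity cluster file (placeholder IDs).
SAMPLE_CLUSTERS = [
    "1AAA_A 1AAA_B 2BBB_A 3CCC_A",
    "4DDD_A 5EEE_A",
    "6FFF_A",
]

# Every chain in the cluster except the query forms the comparison set.
query_chain = "2bbb_a"
comparison_set = [
    member
    for member in find_cluster(SAMPLE_CLUSTERS, query_chain)
    if member != query_chain.upper()
]
print(comparison_set)
```

Changing the sequence identity cutoff would then amount to reading a different cluster file, which changes the size of the comparison set and therefore the number of chains contributing waters.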


SUPERIMPOSITION AND SAMPLING OF WATER SITES

All of the retrieved similar protein structures are locally superimposed on the query protein using the parallel ProBiS algorithm. 46, 47, 50 This algorithm finds the most conserved local surfaces in the compared proteins using the maximum clique algorithm 47, 50 and is suitable for comparisons of structurally similar as well as dissimilar proteins. 51 Local superimposition is essential for accurate location of the water molecules in the query protein structure, as it ensures that the residues near the water molecules will be superimposed. This also enables collection of water molecules from protein structures that cannot be aligned well by global superimposition of their backbone atoms. The superimposition on the query protein can be focused on one of its ligand binding sites or existing water sites, or on one of its entire chains. ProBiS H2O currently uses only the best local superimposition found between a pair of proteins. 52

If a binding site is chosen as the query, a box 4 Å larger than its extreme borders is considered in water collection. Alternatively, all experimentally reported waters can be examined by choosing the protein chain as query. For each reported water, the user can inspect the isotropic displacements calculated from the experimental Debye–Waller factors (B–factors) and decide on the validity of the predicted water site. B–factors measure the flexibility of an atom's position in the protein crystal structure, and the isotropic displacement is related to the probability of that position. The mean displacement (u) of an individual atom from its equilibrium position is calculated as $u = \sqrt{B / (8\pi^2)}$, where B is the experimentally determined B–factor.
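The two numerical steps in this section, the displacement calculated from a B–factor and the box used for water collection, can be illustrated with a short sketch. It uses the standard crystallographic relation B = 8π²⟨u²⟩ and assumes that "4 Å larger than its extreme borders" means 4 Å of padding on every side of an axis-aligned box; the function names are hypothetical and this is not the plugin's own code.

```python
import math


def isotropic_displacement(b_factor):
    """Root-mean-square displacement u (Å) from an isotropic B-factor (Å^2).

    Uses the Debye-Waller relation B = 8 * pi^2 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def waters_in_box(site_atoms, waters, padding=4.0):
    """Keep the waters that lie inside the axis-aligned bounding box of
    `site_atoms`, enlarged by `padding` Å on every side (an assumption)."""
    lo = [min(atom[i] for atom in site_atoms) - padding for i in range(3)]
    hi = [max(atom[i] for atom in site_atoms) + padding for i in range(3)]
    return [w for w in waters if all(lo[i] <= w[i] <= hi[i] for i in range(3))]


# A water with B = 20 Å^2 has a displacement of roughly half an ångström.
u = isotropic_displacement(20.0)

# Toy binding site spanning (0, 0, 0) to (2, 2, 2); the box runs from -4 to 6.
site = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
candidates = [(5.0, 5.0, 5.0), (7.0, 0.0, 0.0), (-4.0, -4.0, -4.0)]
kept = waters_in_box(site, candidates)
print(round(u, 2), kept)
```

A user inspecting a predicted site could compare such displacements across the waters of a cluster; large values suggest poorly ordered waters.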


WATER CLUSTERING

The collected waters are transposed to the query protein structure using rotational–translational matrices calculated from the superimposition, and for each water molecule, the isotropic displacement is calculated based on the experimental B–factor. The density of water molecules in the common coordinate system of the query protein is then examined. A single water molecule with no other waters nearby is most likely a random or bulk water molecule; in contrast, a cluster of water molecules in the immediate vicinity is an indication of a conserved water site, because it means that a water molecule is found at this same site in multiple protein structures. For clustering we used the Three Dimensional Density Based Spatial Clustering of Applications with Noise (3D–DBSCAN) algorithm implemented in the scikit–learn machine learning Python library. 53, 54 The selected clustering algorithm is suitable for uneven cluster sizes and is fast because it examines distances between nearest points only. 3D–DBSCAN therefore gives ProBiS H2O short calculation times and lets the user iterate quickly over experiments. Clusters are dense regions in the data space, separated by regions of lower object density, and each cluster is a maximal set of density–connected points. 55

Two parameters are required: ɛ, the radius of a sphere defining the cluster space, and n, the minimum number of data points in this space. A dense region is defined as the ɛ–neighbourhood of a data point (p) that contains n or more data points (q):

dense region: the ɛ–neighbourhood of an object contains at least n objects

ɛ–neighbourhood: $N_\varepsilon(p) = \{ q \mid d(p, q) \le \varepsilon \}$ (all objects q within a radius ɛ of an object p)
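As a small illustration of these definitions, the sketch below transposes a water with a rotation matrix and translation vector and then tests whether a point lies in a dense region. The convention x' = Rx + t, the function names, and the toy coordinates are assumptions made for the example; a point is counted as a member of its own neighbourhood, as in common DBSCAN implementations.

```python
import math


def transpose(point, rotation, translation):
    """Apply x' = R x + t to one 3D point (R is a 3x3 row-major matrix)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def eps_neighbourhood(p, points, eps):
    """N_eps(p): all points q with d(p, q) <= eps (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]


def is_dense(p, points, eps, n):
    """True when the eps-neighbourhood of p holds at least n points."""
    return len(eps_neighbourhood(p, points, eps)) >= n


# Rotation by 90 degrees about the z axis followed by a shift along x.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
moved = transpose((1.0, 0.0, 0.0), R, t)

# Three waters close together and one far away, in the query frame (Å).
waters = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.8, 0.0), (3.0, 3.0, 3.0)]
print(moved, is_dense(waters[0], waters, eps=0.9, n=3))
```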


In the ProBiS H2O approach, we set ɛ to 0.9, which corresponds to a sphere with a radius of 0.9 Å. The n parameter is increased iteratively from n = 1, indicating random water molecules, to n = N, where N is the number of superimposed chains of similar proteins. The clustering calculation is iterated with increasing density until no more clusters can be identified and all possible densities have been examined.

IMPLEMENTATION AND VISUALIZATION

The ProBiS H2O approach is implemented as a plugin in the PyMOL molecular viewer (v1.8), which enables visualization of water clusters and water binding sites in a protein structure. 37 The user should first choose the query protein's PDB ID, the sequence identity cutoff for the similar proteins, and optionally, a binding site or chain. The ProBiS algorithm then superimposes the similar proteins onto the query protein, and in this superimposition process water molecules are transposed from the similar proteins to the query protein. This is followed by clustering, after which the user is presented with a color–coded list of identified water clusters sorted by their conservation, which is calculated as:

conservation = (number of water molecules in the identified cluster) / (number of superimposed protein chains)

Water clusters are represented as ball–models in the three–dimensional viewer, and the user can select one or multiple models for visual inspection using PyMOL (Figure 3). The water clusters can be examined in the context of the query protein, and the residues on the query protein that interact with conserved waters can be displayed.
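The clustering and scoring steps can be sketched end to end as below. The paper uses the DBSCAN implementation from scikit–learn; to keep the example free of dependencies, a compact textbook DBSCAN is written out here as a stand-in. The text does not describe how results obtained for different n are combined, so the hypothetical `scan_densities` helper simply reports the conservation scores found at each n. The coordinates are invented.

```python
import math


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN. Returns one label per point: 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_samples:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


def scan_densities(waters, n_chains, eps=0.9):
    """Cluster with n = 1 .. n_chains, stopping once no cluster is found.

    Returns {n: [conservation of each cluster, highest first]}, where
    conservation = waters in cluster / number of superimposed chains.
    """
    results = {}
    for n in range(1, n_chains + 1):
        sizes = {}
        for label in dbscan(waters, eps, n):
            if label != -1:
                sizes[label] = sizes.get(label, 0) + 1
        if not sizes:
            break
        results[n] = sorted((s / n_chains for s in sizes.values()), reverse=True)
    return results


# Transposed waters from 4 superimposed chains: one site seen in all four
# chains, one site seen in two chains, and one isolated water.
waters = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0), (0.2, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.4, 5.0, 5.0),
    (10.0, 0.0, 0.0),
]
by_density = scan_densities(waters, n_chains=4)
print(by_density)
```

In this toy case the fully conserved site survives at every density, while the half-conserved site disappears once n exceeds 2, which mirrors the idea that isolated waters are most likely random or bulk waters.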


Figure 3: ProBiS H2O plugin. (A) Input panel with the PDB ID input field. Checkboxes restrict the search to a specific binding site, a specific water site, or the whole protein chain; further checkboxes specify the Debye–Waller (B–factor) correction or limit the analysis to one protein chain per PDB ID entry if there are more chains with 100% sequence identity. Below, a dropdown list allows selection of the sequence identity of the comparison protein structure set. On first usage, the user clicks the “Setup DB” button (bottom–left), which downloads the sequence identity clusters and the ProBiS algorithm from the Insilab website. Typical usage is then annotated with red numbers on panel A. The PDB ID or custom PDB structure input field is marked with a red number 1; a PDB ID is obtained from the RCSB PDB database, whereas a custom PDB structure together with a custom similarity cluster is read from the local disk. Next, the user clicks on the “Find” button (red number 2), which displays the PDB IDs that belong to the comparison protein set. The next step (red number 3) is in parentheses because it is optional if the user has already analysed a particular protein cluster and has the structures downloaded. If it is a first–time experiment, the “Download” button downloads the set from the PDB website, and the “Identify” button (red number 4) displays all binding sites on the query protein. The user can then select a binding site from the list, and finally, the calculation is started by clicking the “GO” button (red number 5). (B) Output panel with a list of identified water clusters ranked according to their conservation scores and color coded accordingly. Clicking on a row of this list selects the water clusters with the same conservation score, and the selected results can be displayed by clicking the “display” button. Additionally, the “fetch/reset” button displays the examined PDB ID, the “chain box” button emphasizes the studied chain, the “b-site” button displays conserved water binding sites, and the “contacts” button shows the interaction network of the displayed conserved water clusters in the three–dimensional viewer.

3. VALIDATION AND EXAMPLES OF USE

The ProBiS H2O approach was validated with several proteins and compared with other approaches. Examples of its use in various drug design experiments are described here.

ANTICANCER DRUG BOSUTINIB SELECTIVITY TOWARDS SRC KINASES

Levinson et al. postulated that bosutinib, an anticancer drug, selectively identifies its Src kinase target through interaction with two conserved water molecules. 56 We predicted the conserved waters on the Src kinase protein structure PDB ID: 4MXO bound to bosutinib (Ligand ID: DB8) at the conservation threshold of 0.5 using the “Compare whole chain” option of the ProBiS H2O plugin and, in agreement with the previous findings, successfully identified the two conserved water molecules (waters W1 and W2 in Figure 4) with conservation scores of 0.75 and 0.63, respectively. As there are many protein kinase chains, we used the Src kinase set reported by Levinson et al. 56 with PDB IDs: 4MXO, 1H1S, 1Q5K, 1XBB, 1YI3, 2OBJ, 3MA3, 2RKU, 4F9C, 3SRV and 3FHR as a set of similar protein structures. The two conserved waters occupy a cavity in the vicinity of Met314, which has been called the gatekeeper residue facing the bosutinib nitrile functional group. The importance of this particular residue was previously validated by clinical resistance to kinase inhibitors that frequently results from mutations at this position. 57 It was postulated that this residue exerts steric control over inhibitor access to the binding pocket.

Using ProBiS H2O, we identified three additional new conserved waters. The water W3 binds to the indole group of Trp446 and to the hydroxyl group of Tyr463. This water plays an important role in conformational change upon the transfer of the γ-phosphate group of ATP to the substrate's tyrosine residue. 58 Further, the conserved water clusters W4 and W5 are in the vicinity of catalytic Asp386 in the catalytic segment of Src kinase. This residue is highly conserved in kinases and functions as a base that abstracts a proton from Tyr416 in the activation segment, thereby facilitating its nucleophilic attack on the γ-phosphorus atom of MgATP. 59, 60 The conserved waters W4 and W5 are bound to two β turns composed of residues 384-387 and 405-408: W4 forms hydrogen bonds to the main chain amide nitrogen of Leu407 and to the carbonyl oxygens of His384 and Asp404, whereas W5 is hydrogen bonded to the main chain amide nitrogen of Leu387 and to the carboxylate and side chain hydroxyl of Asp444 and Ser447, respectively. Solvent-exposed β turns frequently interact with water molecules, and it is also known that conserved waters play an important structural role in stabilizing the conformation of twisted β turns. 61, 62 Therefore, the existence of conserved W4 and W5 at these sites is fully expected. In addition, W4 and W5 could participate in the switch between the active and the inactive conformations of Src kinase involving the electrostatic network of the polar residues in the respective β turns; 59 they could also interact directly or indirectly with the activation segment's Tyr416, which makes two water-mediated hydrogen bonds to the main chain carbonyl oxygen of Arg385 and the side chain carboxylate of Asp386. 63 Our findings extend the current knowledge of water–mediated selectivity of Src kinase towards bosutinib, a mechanism that is probably shared by other selective kinase inhibitors, and could be valuable for the further development of selective kinase inhibitors.

Figure 4: Bosutinib selectivity towards Src kinase (PDB ID: 4MXO) mediated by conserved waters. Water clusters W1 and W2 (red ball models) as identified in the vicinity of the nitrile group of bosutinib (CPK colored sticks); conserved waters W3–W5 are newly found and play structural and functional roles near the catalytic segment.


CONSERVED WATERS IDENTIFIED AT THE HUMAN PROGRAMMED DEATH RECEPTOR–LIGAND INTERFACE

Human programmed death 1 receptor (hPD–1) and its ligand, human PD–1 inhibitory receptor ligand (hPD–L1) (PDB ID: 4ZQK) are of interest in cancer chemotherapy. 64 Recently, Zak et al. reported a detailed map of the interaction surface between these two proteins. Antibodies that mimic the hPD–1 on immune effector cells or hPD–L1 on tumor cells were developed for modulation of cancer–induced immunosuppression, and small molecules that could interfere with this pathway are also being developed. In the reported crystal structure, hPD–1 and the N–terminal domain of hPD–L1 both assume immunoglobulin variable (IgV) topology. The protein interface is also well defined, with the central hydrophobic residues responsible for hPD–1 to hPD–L1 affinity being Val64, Ile126, Leu128, Ala132, Ile134 of hPD–1 and Ile54, Tyr56, Met115, Ala121, Tyr123 of hPD–L1. An additional buried π-π stacking interaction between Tyr68 and Tyr123, as well as a hydrogen bond between the side chain of Asn66 and the carbonyl oxygen of Ala121 and a salt bridge between Glu136 and Arg113, were also reported to be essential to the observed protein binding. Amongst other polar contacts, two water-mediated interactions were reported to stabilize the interaction of the complex: one between the amide nitrogen of Ile134 and the carboxylate of Glu58 from hPD–1 and the hydroxyl group of Tyr56 from hPD–L1 (Figure 5; water W1), and one between the side chain of Asn66 from hPD–1 and the main chain carbonyl oxygen of Ala121 from hPD–L1 (Figure 5; water W3). 64

Using the ProBiS H2O approach, we analyzed the ligand hPD–L1 as the query protein structure (PDB ID: 4ZQK, Chain A) using the 95% sequence identity comparison set. A total of 17 protein chains were superimposed onto the query, and eight conserved water clusters were identified, two of which were positioned in the protein interface between hPD–1 and hPD–L1 (Figure 5). We correctly identified one of the conserved waters (Figure 5; water W1) reported previously. 64 The second conserved water cluster (Figure 5; water W2), not previously observed, is located at the interface between Asp122, Tyr123 and Lys124 from hPD–L1 and Tyr68, Thr76 from hPD–1 (PDB ID: 4ZQK; chain B; Figure 5). The new conserved water W2 is not present in the original crystal structure, which is likely why it was not described previously. 64

To investigate the interface further, a new experiment was performed with the ProBiS H2O approach employing 90% sequence identity clusters. This resulted in 22 superimposed protein chains and recovered a previously reported conserved water cluster (Figure 5; water W3), which forms a water bridge between Ala121 from hPD–L1 (PDB ID: 4ZQK; chain A) and Asn66 from hPD–1 (PDB ID: 4ZQK; chain B). A new conserved water cluster (Figure 5; water W4) was observed in the vicinity of Lys124, Arg125 from hPD–L1 and Gln75, Thr76 from hPD–1. All four identified clusters are presented in Figure 5; two of them (W2 and W4) have not been reported previously. Novel conserved waters could aid in the placement of waters in future crystal structures or could be used in the analysis of existing similar protein–protein complexes. The strength of ProBiS H2O is that it collects waters from similar structures and can thus find important waters that are missing in the query crystal structure.


Figure 5: Conserved waters at the interface of human programmed death 1 protein (hPD–1), colored light–blue, and its ligand hPD–L1, colored light–green (PDB ID: 4ZQK). Known conserved water clusters (W1, W3; green ball models) identified at the interface (PDB ID: 4ZQK) are shown together with the newly reported conserved water clusters (W2, W4; green ball models). Residues neighbouring the conserved waters are shown as yellow sticks, with conserved water contacts displayed in magenta.

CONSERVED WATER MOLECULE AT THE ATP BINDING SITE OF DNA GYRASE B

The ProBiS H2O approach was used successfully in the discovery of DNA Gyrase B inhibitors as novel antibacterials (Figure 6). 65 Using this approach, a highly conserved water molecule that forms a hydrogen bond with known inhibitors was identified in the ATP–binding pocket of DNA Gyrase B (PDB ID: 4DUH). This conserved water molecule (conservation score of 0.96; sequence identity of 95%) is in close proximity to an existing inhibitor, 4–{[4'–methyl–2'–(propanoylamino)–4,5'–bi–1,3–thiazol–2–yl]amino}benzoic acid (Ligand ID: RLI). The water molecules in this cluster form hydrogen bonds with Asp73 and the terminal amide and aromatic nitrogen moieties of the inhibitor. To substantiate this result, a similar calculation was performed using BLASTClust pre-clustered protein structures at 30% sequence similarity to the query protein, which resulted in 84 superimposed chains and a system of 20,047 water molecules. The most conserved water cluster identified (conservation of 0.95) contained 80 water molecules and corresponded to the same key conserved water molecule found with the 95% sequence identity threshold. We therefore used this water molecule (water 451, 4DUH) in a molecular docking experiment, which resulted in the identification of a high–nanomolar (IC50 = 480 nM) inhibitor of E. coli DNA Gyrase B with an aminopiperidine central linker moiety. 65
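The figures quoted for the 30% calculation are consistent with reading the conservation score as the number of waters in a cluster divided by the number of superimposed chains. This reading is an assumption inferred from the reported numbers, not a definition given in this section; the symbols below are introduced only for illustration.

```latex
\[
\text{conservation}
  = \frac{n_{\text{waters in cluster}}}{N_{\text{superimposed chains}}}
  = \frac{80}{84} \approx 0.95
\]
```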

Figure 6: Identification of conserved water with ProBiS H2O facilitates the discovery of antibacterials. The conserved water (red ball model) forms a hydrogen bond with the bithiazole inhibitor (blue sticks) in the DNA Gyrase B enzyme (PDB ID: 4DUH). The distance from the conserved water cluster to the inhibitor thiazole nitrogen, measured at 3 Å, is colored magenta.


HUMAN MATRIX METALLOPROTEINASE METAL ION-WATER INTERACTIONS

Human matrix metalloproteinase (hMMP–1) plays a key role in embryonic development and tissue remodeling, and a recent MD study suggested that the catalytic zinc ions in the active site of hMMP–1 are coordinated by conserved water molecules. Two zinc ions are stabilized by the surrounding histidine residues and can form additional water–bridge interactions through neighboring conserved water molecules. Using ProBiS H2O, with hMMP–1 (PDB ID: 1HFC, Chain ID: A) as the query and a set of 15 protein structures sharing >95% sequence identity with the query, we identified 12 water clusters with conservation above 0.6, three of which (waters 295, 307 and 348) are in close proximity to the catalytic zinc ions (Figure 7). These three conserved waters coordinate the zinc ions and support the previous theoretical observations, thus elaborating the hMMP–1 catalytic mechanism. 66

Figure 7: Conserved waters coordinate zinc ions. Conserved waters (red ball models) and the two catalytic Zn2+ ions (magenta ball models) in human matrix metalloproteinase (PDB ID: 1HFC; cartoon model).


COMPARISON WITH OTHER APPROACHES

ProBiS H2O was used for identification of discrete conserved water clusters in multiple macromolecular systems (Table 1). In comparison with PyWATER, 36 an approach based on whole protein backbone superimpositions that detects conserved water clusters above a user-defined conservation threshold, our approach generates all possible water clusters in a single calculation. In ProBiS H2O, all conservation levels are calculated, from the most conserved water clusters in the examined protein system down to bulk water clusters containing only two water molecules. PyWATER, on the other hand, requires the user to preselect the minimum water conservation level and then to run a separate calculation for each selected level. At low conservation of hydration sites, both tools effectively identify the bulk hydration blankets of proteins and produce a comparable number of water clusters. At high conservation scores (≥0.86, Table 1), ProBiS H2O generally detects fewer highly conserved water clusters than PyWATER. This difference can be attributed to the different approaches and parameters employed. PyWATER uses crystal structures with a resolution of ≤2.0 Å and filters water molecules according to their normalized B-factors. It performs global protein structure superimposition on backbone atoms, whereas ProBiS H2O performs local superimposition of protein surfaces. PyWATER also calculates water clusters by general linkage hierarchical clustering with a 2.4 Å cutoff, whereas ProBiS H2O uses a density-based clustering algorithm with a radius of 0.9 Å. Thus, a stricter clustering is performed in ProBiS H2O, giving fewer clusters in which waters are more tightly packed.

With DNA Gyrase B (PDB ID: 4DUH, Chain B) as the query protein, PyWATER reports 27 water clusters with conservation ≥0.9, with residue numbers 404, 411, 413, 415, 438, 439, 441, 442, 443, 446, 451, 452, 453, 460, 462, 463, 464, 465, 471, 479, 480, 484, 487, 534, 551, 552 and 554, while ProBiS H2O identifies only one cluster (conservation ≥0.9), with residue number 451. Water 451 is within hydrogen-bonding distance of the carboxylate group of Asp73 and of the main chain amide groups of residues 75-55, and it also forms a hydrogen bond to the inhibitor, which makes it relevant to drug design. ProBiS H2O thus makes it possible to distinguish between the nuances in crystal water densities. In this case, it narrows the inspection to a few conserved water sites of high interest to drug discovery. It pinpoints highly conserved waters while retaining the option to inspect and co-visualise all hydration sites in a specific binding site or in whole protein chains, and it still supplies the researcher with all clustering data.

It is also noteworthy that, on a typical workstation with an Intel i7–4710HQ CPU, ProBiS H2O is faster than PyWATER, typically producing results from a complete water conservation calculation in about 33% of the time. For example, the calculation for protein 1HET:A from Table 1 took 1’17’’ with ProBiS H2O and 3’34’’ with PyWATER. ProBiS H2O therefore offers several advantages: it can compare protein structures locally, by focusing the search on a specific binding site, or globally, by taking into account the entire protein structure, and it is a comparatively fast workflow that generates results for multiple water conservation levels in a single calculation.
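The idea of clustering once and then inspecting any conservation level can be sketched as follows. This is a minimal pure-Python DBSCAN written for illustration, not the plugin's implementation. The 0.9 Å radius comes from the text; `min_samples=2` is an assumption based on the statement that the smallest clusters contain two waters, conservation is assumed to be cluster size divided by the number of superimposed chains, and the coordinates and function names are hypothetical.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)
    # Neighbor lists include the point itself, as in the usual definition.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


def conservation_by_cluster(labels, n_chains):
    """Assumed score: waters in the cluster / number of superimposed chains."""
    counts = {}
    for label in labels:
        if label != -1:
            counts[label] = counts.get(label, 0) + 1
    return {label: n / n_chains for label, n in counts.items()}


# Hypothetical water oxygen coordinates (in Å) pooled from 4 superimposed chains.
waters = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.4, 0.0), (0.2, 0.2, 0.1),
    (5.0, 5.0, 5.0), (5.5, 5.0, 5.0),
    (10.0, 0.0, 0.0),
]

labels = dbscan(waters, eps=0.9, min_samples=2)
scores = conservation_by_cluster(labels, n_chains=4)

# Any threshold can be applied afterwards without re-clustering.
highly_conserved = [c for c, s in scores.items() if s >= 0.9]
all_sites = [c for c, s in scores.items() if s >= 0.5]
print(labels, scores, highly_conserved, all_sites)
```

Because the scores are attached to clusters after a single pass, filtering at 0.9 or at 0.5 is a lookup rather than a new calculation, which is the contrast with a tool that takes the threshold as an input.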

Table 1: Comparison of ProBiS H2O with other approaches. 1

                    ProBiS H2O                                       PyWATER
PDB ID:Chain ID     No. of highly   No. of low     No. of            No. of highly   No. of low     No. of
(ref.)              conserved       conserved      superimposed      conserved       conserved      superimposed
                    waters 2        waters 2       chains at         waters 2        waters 2       chains at
                                                   95% seq. id.                                     95% seq. id.
1BUG:A (67)         25 (1.0)        71 (0.5)       6                 - (0.9)         - (0.5)
1HET:A (68)         2 (0.9)         511 (0.5)      79                72 (0.9)        296 (0.5)      95
1F41:A (69)         1 (0.87)        241 (0.51)     470               7 (0.9)         33 (0.5)
4BRC:A (70)         2 (0.88)        136 (0.57)     41                24 (0.9)        155 (0.5)      36
4P6R:A (71)         1 (0.86)        61 (0.51)      35                25 (0.9)        63 (0.5)       9
4DUH:B (65)         1 (0.92)        4 (0.5)        24                27 (0.9)        85 (0.5)       6

1 – Default settings were used in both plugins; 2 – Conservation scores are given in parentheses.

FURTHER VALIDATION

The ProBiS H2O approach was also validated with the crystal structure of alcohol dehydrogenase (PDB ID: 1HET; Table 1), for which Bottoms et al. reported the role of conserved water molecules in a nucleotide–binding Rossmann motif. 68 In this case, the phosphates of co-crystallized nicotinamide adenine dinucleotide (NAD) were found to interact with neighboring residues Leu199, Gly200, Gly201 and Lys227 through water bridges formed by conserved waters (PDB ID: 1HET, waters 2536 and 2745) with a conservation of 0.84, which were found in 79 superimposed protein chains.


ProBiS H2O also confirmed the MD study by Banerjee et al.: a discrete water cluster with a conservation of 0.87 was identified between the catalytic residues His88 and Trp79 of human transthyretin chain A (PDB ID: 1F41), using a dataset of 470 superimposed protein chains from a 95% sequence similarity cluster and a clustering calculation on 72,430 water molecules. 69 In this case, five clusters with a conservation of 0.85 placed discrete hydration sites near Thr75 and Ser112. Conserved water molecules were also identified in the nucleoside triphosphate diphosphohydrolase (PDB ID: 4BRC) system reported by Zebisch et al. 70 Several water clusters with a conservation of one could be identified between the non–hydrolysable ATP analogue (AMPNP) and Gln193, in close proximity to an Mg2+ ion. The authors postulated that this conserved water might provide a transient binding site for the cleaved phosphate prior to expulsion of the products from the active site. This demonstrates the critical role conserved waters can play in mechanistic studies. Another study with ProBiS H2O was conducted on a type–3 copper protein (PDB ID: 4P6R), in which a conserved water cluster was identified in close proximity to two Zn2+ ions, forming bridges to Glu195 and Asn205. 71 The authors suggested that this conserved water molecule serves as the base for deprotonation of monophenols in the active site of this tyrosinase, which further supports the importance of identifying discrete conserved water sites.

4. CONCLUSION

Medicinal chemists generally hold that water molecules in polar environments are less likely to be displaced by ligands than those in hydrophobic regions, which can be displaced easily and contribute favorably to ligand binding. This rule has been extended with the notion of conserved waters that form key interactions with drug molecules. Reports recognizing the importance of conserved waters in drug design are appearing with increasing frequency, and in silico water analysis tools are becoming an integral part of drug design efforts. We developed ProBiS H2O, a robust and fast approach based on experimental data, for the identification of discrete conserved water sites in protein structures for use in drug discovery. The ProBiS H2O plugin is useful in the design of drug–protein interactions, the prediction of protein structural stability, and the identification of water networks in proteins.

Additional information

The ProBiS H2O plugin is accompanied by User Guide and Tutorial documents, which can be found at http://insilab.org/probis-h2o.

Acknowledgments

This work is supported by the Slovenian Research Agency project grants J1–6743: Development of computational tools for modelling of pharmaceutically interesting compounds, L7–8269: New approaches for better biopharmaceuticals and Z1-8158: Discovery of new inhibitors of bacterial peptidoglycan biosynthesis enzymes MurA and MurB. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

Conflict of Interest

The authors declare they have no conflict of interest.


REFERENCES

1. Kubinyi, H. Hydrogen Bonding: The Last Mystery in Drug Design. Pharmacokinetic Optimization in Drug Research 2001, 513–524.
2. Poornima, C. S.; Dean, P. M. Hydration in Drug Design. 3. Conserved Water Molecules at the Ligand–Binding Sites of Homologous Proteins. J. Comput. Aided Mol. Des. 1995, 9, 521–531.
3. Bös, F.; Pleiss, J. Conserved Water Molecules Stabilize the Ω–Loop in Class A β–Lactamases. Antimicrob. Agents Chemother. 2008, 52, 1072–1079.
4. Jeszenoi, N.; Bálint, M.; Horváth, I.; Van Der Spoel, D.; Hetényi, C. Exploration of Interfacial Hydration Networks of Target–Ligand Complexes. J. Chem. Inf. Model. 2016, 56, 148–158.
5. Harms, M. J.; Castañeda, C. A.; Schlessman, J. L.; Sue, G. R.; Isom, D. G.; Cannon, B. R. The pKa Values of Acidic and Basic Residues Buried at the Same Internal Location in a Protein Are Governed by Different Factors. J. Mol. Biol. 2009, 389, 34–47.
6. Clarke, C.; Woods, R. J.; Gluska, J.; Cooper, A.; Nutley, M. A.; Boons, G. J. Involvement of Water in Carbohydrate−Protein Binding. J. Am. Chem. Soc. 2001, 123, 12238–12247.
7. McRobb, F. M.; Negri, A.; Beuming, T.; Sherman, W. Molecular Dynamics Techniques for Modeling G Protein–Coupled Receptors. Curr. Opin. Pharmacol. 2016, 30, 69–75.


8. Kaur, M.; Bahia, M. S.; Silakari, O. Exploring the Role of Water Molecules for Docking and Receptor Guided 3D–QSAR Analysis of Naphthyridine Derivatives as Spleen Tyrosine Kinase (Syk) Inhibitors. J. Chem. Inf. Model. 2012, 52, 2619–2630.
9. Klebe, G. Applying Thermodynamic Profiling in Lead Finding and Optimization. Nat. Rev. Drug Discov. 2015, 14, 95–110.
10. Spyrakis, F.; Ahmed, M. H.; Bayden, A. S.; Cozzini, P.; Mozzarelli, A.; Kellogg, G. E. The Roles of Water in the Protein Matrix: A Largely Untapped Resource for Drug Discovery. J. Med. Chem. 2017, 60, 6781–6827.
11. Barillari, C.; Taylor, J.; Viner, R.; Essex, J. W. Classification of Water Molecules in Protein Binding Sites. J. Am. Chem. Soc. 2007, 129, 2577–2587.
12. Kadirvelraj, R.; Foley, B. L.; Dyekjær, J. D.; Woods, R. J. Involvement of Water in Carbohydrate−Protein Binding: Concanavalin A Revisited. J. Am. Chem. Soc. 2008, 130, 16933–16942.
13. Ladbury, J. E. Just Add Water! The Effect of Water on the Specificity of Protein–Ligand Binding Sites and Its Potential Application to Drug Design. Chem. Biol. 1996, 3, 973–980.
14. Bodnarchuk, M. S.; Viner, R.; Michel, J.; Essex, J. W. Strategies to Calculate Water Binding Free Energies in Protein–Ligand Complexes. J. Chem. Inf. Model. 2014, 54, 1623–1633.
15. Nguyen, T. T.; Viet, M. H.; Li, M. S. Effects of Water Models on Binding Affinity: Evidence From All–Atom Simulation of Binding of Tamiflu to A/H5N1 Neuraminidase. Scientific World J. 2014, 2014, 1–14.


16. Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. Role of the Active–Site Solvent in the Thermodynamics of Factor Xa Ligand Binding. J. Am. Chem. Soc. 2008, 130, 2817–2831.
17. Cui, G.; Swails, J. M.; Manas, E. S. SPAM: A Simple Approach for Profiling Bound Water Molecules. J. Chem. Theory Comput. 2013, 9, 5539–5549.
18. Kovalenko, A.; Hirata, F. Three–Dimensional Density Profiles of Water in Contact With a Solute of Arbitrary Shape: A RISM Approach. Chem. Phys. Lett. 1998, 290, 237–244.
19. Ross, G. A.; Bodnarchuk, M. S.; Essex, J. W. Water Sites, Networks, and Free Energies With Grand Canonical Monte Carlo. J. Am. Chem. Soc. 2015, 137, 14930–14943.
20. Bodnarchuk, M. S. Water, Water, Everywhere… It's Time to Stop and Think. Drug Discov. Today 2016, 21, 1139–1146.
21. Bayden, A. S.; Moustakas, D. T.; McCarthy, D. J.; Lamb, M. L. Evaluating Free Energies of Binding and Conservation of Crystallographic Waters Using SZMAP. J. Chem. Inf. Model. 2015, 55, 1552–1565.
22. Baroni, M.; Cruciani, G.; Sciabola, S.; Perruccio, F.; Mason, J. S. A Common Reference Framework for Analyzing/Comparing Proteins and Ligands. Fingerprints for Ligands and Proteins (FLAP): Theory and Application. J. Chem. Inf. Model. 2007, 47, 279–294.
23. Sridhar, A.; Ross, G. A.; Biggin, P. C. Waterdock 2.0: Water Placement Prediction for Holo–Structures With a Pymol Plugin. PloS One 2017, 12, 1–17.


24. García–Sosa, A. T.; Mancera, R. L.; Dean, P. M. Waterscore: A Novel Method for Distinguishing Between Bound and Displaceable Water Molecules in the Crystal Structure of the Binding Site of Protein–Ligand Complexes. J. Mol. Model. 2003, 9, 172–182.
25. Raymer, M. L.; Sanschagrin, P. C.; Punch, W. F.; Venkataraman, S.; Goodman, E. D.; Kuhn, L. A. Predicting Conserved Water–Mediated and Polar Ligand Interactions in Proteins Using a K–Nearest–Neighbors Genetic Algorithm. J. Mol. Biol. 1997, 265, 445–464.
26. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
27. Rose, P. W.; Prlić, A.; Altunkaya, A.; Bi, C.; Bradley, A. R.; Christie, C. H.; Di Costanzo, L.; Duarte, J. M.; Dutta, S.; Feng, Z.; Kramer Green, R.; Goodsell, D. S.; Hudson, B.; Kalro, T.; Lowe, R.; Peisach, E.; Randle, C.; Rose, A. S.; Shao, C.; Tao, Y.; Valasatava, Y.; Voigt, M.; Westbrook, J. D.; Woo, J.; Yang, H.; Young, J. Y.; Zardecki, C.; Berman, H. M.; Burley, S. K. The RCSB Protein Data Bank: Integrative View of Protein, Gene and 3D Structural Information. Nucleic Acids Res. 2017, 45, D271–D281.
28. Berman, H. M.; Henrick, K.; Nakamura, H. Announcing the Worldwide Protein Data Bank. Nat. Struct. Biol. 2003, 10, 980–980.
29. Kinjo, A. R.; Bekker, G. J.; Suzuki, H.; Tsuchiya, Y.; Kawabata, T.; Ikegawa, Y.; Nakamura, H. Protein Data Bank Japan (Pdbj): Updated User Interfaces, Resource Description Framework, Analysis Tools for Large Structures. Nucleic Acids Res. 2017, 45, D282–D288.
30. Kearney, B. M.; Johnson, C. W.; Roberts, D. M.; Swartz, P.; Mattos, C. Drop: A Water Analysis Program Identifies Ras–GTP–Specific Pathway of Communication Between Membrane–Interacting Regions and the Active Site. J. Mol. Biol. 2014, 426, 611–629.
31. Shindyalov, I. N.; Bourne, P. E. Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path. Protein Eng. 1998, 11, 739–47.
32. Cole, J. C.; Giangreco, I.; Groom, C. R. Using More Than 801 296 Small–Molecule Crystal Structures to Aid in Protein Structure Refinement and Analysis. Acta Crystallogr. D 2017, 73, 234–239.
33. Radoux, C. J.; Olsson, T. S.; Pitt, W. R.; Groom, C. R.; Blundell, T. L. Identifying Interactions That Determine Fragment Binding at Protein Hotspots. J. Med. Chem. 2016, 59, 4314–4325.
34. Rossato, G.; Ernst, B.; Vedani, A.; Smiesko, M. Acquaalta: A Directional Approach to the Solvation of Ligand–Protein Complexes. J. Chem. Inf. Model. 2011, 51, 1867–1881.
35. Heyer, L. J.; Kruglyak, S.; Yooseph, S. Exploring Expression Data: Identification and Analysis of Coexpressed Genes. Genome Res. 1999, 9, 1106–15.
36. Patel, H.; Grüning, B. A.; Günther, S.; Merfort, I. PYWATER: A Pymol Plugin to Find Conserved Water Molecules in Proteins by Clustering. Bioinformatics 2014, 30, 2978–2980.
37. DeLano, W. L. Pymol: An Open–Source Molecular Graphics Tool. CCP4 Newsletter On Protein Crystallography 2002, 40, 82–92.


38. Yang, Y.; Hu, B.; Lill, M. A. Analysis of Factors Influencing Hydration Site Prediction Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2014, 54, 2987–2995.
39. Sanschagrin, P. C.; Kuhn, L. A. Cluster Analysis of Consensus Water Sites in Thrombin and Trypsin Shows Conservation Between Serine Proteases and Contributions to Ligand Specificity. Protein Sci. 1998, 7, 2054–2064.
40. Loris, R.; Langhorst, U.; De Vos, S.; Decanniere, K.; Bouckaert, J.; Maes, D.; Transue, T. R.; Steyaert, J. Conserved Water Molecules in a Large Family of Microbial Ribonucleases. Proteins 1999, 36, 117–134.
41. Knight, J. D.; Hamelberg, D.; McCammon, J. A.; Kothary, R. The Role of Conserved Water Molecules in the Catalytic Domain of Protein Kinases. Proteins 2009, 76, 527–535.
42. Teze, D.; Hendrickx, J.; Dion, M.; Tellier, C.; Woods Jr, V. L.; Tran, V.; Sanejouand, Y. H. Conserved Water Molecules in Family 1 Glycosidases: A DXMS and Molecular Dynamics Study. Biochemistry 2013, 52, 5900–5910.
43. Štular, T.; Lesnik, S.; Rozman, K.; Schink, J.; Zdouc, M.; Ghysels, A.; Liu, F.; Aldrich, C. C.; Haupt, V. J.; Salentin, S.; Daminelli, S.; Schroeder, M.; Langer, T.; Gobec, S.; Janežič, D.; Konc, J. Discovery of Mycobacterium Tuberculosis InhA Inhibitors by Binding Sites Comparison and Ligands Prediction. J. Med. Chem. 2016, 59, 11069–11078.
44. Konc, J.; Janežič, D. Probis-Ligands: A Web Server for Prediction of Ligands by Examination of Protein Binding Sites. Nucleic Acids Res. 2014, 42, W215–W220.


45. Konc, J.; Skrlj, B.; Erzen, N.; Kunej, T.; Janezic, D. GenProBiS: Web Server for Mapping of Sequence Variants to Protein Binding Sites. Nucleic Acids Res. 2017, 45, W253–W259.
46. Konc, J.; Janežič, D. ProBiS Algorithm for Detection of Structurally Similar Protein Binding Sites by Local Structural Alignment. Bioinformatics 2010, 26, 1160–1168.
47. Konc, J.; Janezic, D. An Improved Branch and Bound Algorithm for the Maximum Clique Problem. MATCH Commun. Math. Comput. Chem. 2007, 58, 569–590.
48. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403–410.
49. Li, W.; Godzik, A. Cd-hit: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences. Bioinformatics 2006, 22, 1658–1659.
50. Depolli, M.; Konc, J.; Rozman, K.; Trobec, R.; Janezic, D. Exact Parallel Maximum Clique Algorithm for General and Protein Graphs. J. Chem. Inf. Model. 2013, 53, 2217–2228.
51. Konc, J.; Janežič, D. Protein−Protein Binding–Sites Prediction by Protein Surface Structure Conservation. J. Chem. Inf. Model. 2007, 47, 940–944.
52. Konc, J.; Česnik, T.; Konc, J. T.; Penca, M.; Janežič, D. ProBiS–Database: Precalculated Binding Site Similarities and Local Pairwise Alignments of PDB Structures. J. Chem. Inf. Model. 2012, 52, 604–612.
53. Ester, M.; Kriegel, H–P.; Sander, J.; Xu, X. A Density–Based Algorithm for Discovering Clusters in Large Spatial Databases With Noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining 1996, KDD–96, 226–231.
54. Sander, J.; Ester, M.; Kriegel, H. P.; Xu, X. Density–Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Application. Data Min. Knowl. Disc. 1998, 2, 169–194.
55. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, É. Scikit–Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
56. Levinson, N. M.; Boxer, S. G. A Conserved Water–Mediated Hydrogen Bond Network Defines Bosutinib's Kinase Selectivity. Nat. Chem. Biol. 2014, 10, 127–132.
57. Dar, A. C.; Shokat, K. M. The Evolution of Protein Kinase Inhibitors From Antagonists to Agonists of Cellular Signaling. Annu. Rev. Biochem. 2011, 80, 769–795.
58. Natarajan, K.; Neuwald, A. F. Did Protein Kinase Regulatory Mechanisms Evolve Through Elaboration of a Simple Structural Component? J. Mol. Biol. 2005, 351, 956–972.
59. Ozkirimli, E.; Post, C. B. Src Kinase Activation: A Switched Electrostatic Network. Protein Sci. 2006, 15, 1051–1062.
60. Roskoski, R. Jr. Src Protein–Tyrosine Kinase Structure and Regulation. Biochem. Biophys. Res. Commun. 2004, 324, 1155–1164.


61. Rose, G. D.; Young, W. B.; Gierasch, L. M. Interior Turns in Globular Proteins. Nature 1983, 304, 654–657.
62. Ogata, K.; Wodak, S. J. Conserved Water Molecules in MHC Class-I Molecules and Their Putative Structural and Functional Roles. Protein Eng. 2002, 15, 697–705.
63. Schindler, T.; Sicheri, F.; Pico, A.; Gazit, A.; Levitzki, A.; Kuriyan, J. Crystal Structure of Hck in Complex With a Src Family–Selective Tyrosine Kinase Inhibitor. Mol. Cell 1999, 3, 639–648.
64. Zak, K. M.; Kitel, R.; Przetocka, S.; Golik, P.; Guzik, K.; Musielak, B.; Dömling, A.; Dubin, G.; Holak, T. A. Structure of the Complex of Human Programmed Death 1, PD–1, and Its Ligand PD–L1. Structure 2015, 23, 2341–2348.
65. Jukič, M.; Ilaš, J.; Brvar, M.; Kikelj, D.; Cesar, J.; Anderluh, M. Linker–Switch Approach Towards New ATP Binding Site Inhibitors of DNA Gyrase B. Eur. J. Med. Chem. 2017, 125, 500–514.
66. Chakrabarti, B.; Bairagya, H. R.; Mukhopadhyay, B. P.; Sekar, K. New Biochemical Insight of Conserved Water Molecules at Catalytic and Structural Zn2+ Ions in Human Matrix Metalloproteinase–I: A Study by MD–Simulation. J. Mol. Model. 2017, 23, 5770.
67. Favre, E.; Daina, A.; Carrupt, P. A.; Nurisso, A. Modeling the Met Form of Human Tyrosinase: A Refined and Hydrated Pocket for Antagonist Design. Chem. Biol. Drug Des. 2014, 84, 206–215.
68. Bottoms, C. A.; Smith, P. E.; Tanner, J. J. A Structurally Conserved Water Molecule in Rossmann Dinucleotide-Binding Domains. Protein Sci. 2002, 11, 2125–2137.


69. Banerjee, A.; Mukhopadhyay, B. P. An Insight to the Conserved Water Mediated Dynamics of Catalytic His88 and its Recognition to Thyroxin and RBP Binding Residues in Human Transthyretin. J. Biomol. Struct. Dyn. 2015, 33, 1973–1988.
70. Zebisch, M.; Krauss, M.; Schäfer, P.; Lauble, P.; Sträter, N. Crystallographic Snapshots Along the Reaction Pathway of Nucleoside Triphosphate Diphosphohydrolases. Structure 2013, 21, 1460–1475.
71. Goldfeder, M.; Kanteev, M.; Isaschar–Ovdat, S.; Adir, N.; Fishman, A. Determination of Tyrosinase Substrate–Binding Modes Reveals Mechanistic Differences Between Type–3 Copper Proteins. Nat. Commun. 2014, 5405, 1–5.


For Table of Contents Use Only (TOC graphic)

Identification of Conserved Water Sites in Protein Structures for Drug Design Marko Jukič, Janez Konc, Stanislav Gobec and Dušanka Janežič*
