
Article pubs.acs.org/jcim

Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Placement of Water Molecules in Protein Structures: From Large-Scale Evaluations to Single-Case Examples

Eva Nittinger,† Florian Flachsenberg,† Stefan Bietz,† Gudrun Lange,‡ Robert Klein,‡ and Matthias Rarey*,†

† Universität Hamburg, ZBH − Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
‡ Bayer CropScience AG, Industriepark Hoechst G836, 65926 Frankfurt am Main, Germany




Supporting Information

ABSTRACT: Water molecules are of great importance for the correct representation of ligand binding interactions. In recent years, water molecules and their integration into drug design strategies have received increasing attention. A variety of tools are now available to place and score water molecules. However, the most frequently applied software solutions require substantial computational resources. In addition, none of the existing methods has been rigorously evaluated on the basis of a large number of diverse protein complexes. Therefore, we present a novel method for placing water molecules, called WarPP, based on interaction geometries previously derived from protein crystal structures. We validated our placement strategy using a large, previously compiled, high-quality validation set of almost 1500 protein−ligand complexes containing almost 20 000 crystallographically observed water molecules in their active sites. We correctly placed 80% of the water molecules within 1.0 Å of a crystallographically observed one.
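The distance-based success criterion quoted in the abstract can be illustrated with a short sketch. This is not the WarPP evaluation code. It assumes one plausible reading of the criterion, namely that an observed water counts as recovered when at least one placed water lies within the cutoff, and the function name `recovery_rate` and all coordinates are hypothetical.

```python
import math

def recovery_rate(observed, placed, cutoff=1.0):
    """Fraction of observed waters with at least one placed water within
    `cutoff` (in Å). Assumed reading of the criterion, not the paper's code."""
    if not observed:
        raise ValueError("observed must contain at least one water position")
    hits = sum(
        1
        for obs in observed
        if any(math.dist(obs, p) <= cutoff for p in placed)
    )
    return hits / len(observed)

# Hypothetical toy coordinates in Å (not taken from the paper).
observed_waters = [
    (0.0, 0.0, 0.0),
    (5.0, 0.0, 0.0),
    (0.0, 5.0, 0.0),
    (0.0, 0.0, 5.0),
    (9.0, 9.0, 9.0),
]
placed_waters = [
    (0.3, 0.4, 0.0),  # 0.5 Å from the first observed water
    (5.0, 0.6, 0.0),  # 0.6 Å from the second
    (0.0, 5.0, 0.8),  # 0.8 Å from the third
    (0.0, 0.0, 6.5),  # 1.5 Å from the fourth, outside the cutoff
]

rate = recovery_rate(observed_waters, placed_waters)
print(f"{rate:.0%} of observed waters recovered within 1.0 Å")
```

With these toy coordinates three of the five observed waters have a placed water within 1.0 Å. The brute-force pairwise search is adequate for a single active site; a spatial index would be a natural replacement for evaluations on the scale described in the abstract.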



INTRODUCTION

A good understanding of water molecules and their interactions with proteins and small molecules is essential for the prediction of protein−ligand binding geometries and affinities. Not surprisingly, the interest in individual water molecules and their contribution to molecular interactions, and thereby to binding affinity, has increased dramatically in recent years. Water molecules not only mediate interactions between proteins and ligands; their displacement can also be a major contributor to protein−ligand binding affinity.1,2

This increased interest is also reflected by the number of tools and methods now available for the prediction, placement, and scoring of water molecules, ranging from rather simple geometric scoring criteria to extensive molecular dynamics (MD) simulations. Available methods can be separated into four classes: (1) empirical and knowledge-based methods (Consolv,3 WatCH,4 WaterScore,5 PyWATER,6 Proasis WaterRank,7 the relevance metric,8 AQUARIUS2,9,10 WATGEN,11 AcquaAlta,12 WaterDock,13 Tetrahedron-water-cluster model,14 Fold-X,15 HINT (Hydropathic Interactions) toolkit,16 Dowser++17), (2) statistical and molecular mechanics methods (GRID,18−20 3D-RISM,21,22 MCSS,23−25 WaterFLAP,26 wPMF,27 SZMAP28), (3) MD simulation methods (WaterMap,29,30 GIST,31 STOW,32 WATCLUST,33 SPAM,34 WATsite,35,36 GCT,37−39 BiKi Hydra40), and (4) Monte Carlo simulation methods (RETI,41,42 the double decoupling method,43,44 double decoupling with RETI,45 MCRS,46 JAWS,47 GCMC48−50).

The first category can be further classified according to the aim of the method. Some of those methods identify conserved crystallographically determined water molecules, others try to assign a relevance score to them, while others place and/or score water positions.

The number of protein structures used for evaluation of the methods declines throughout the four classes. Empirical and knowledge-based methods have been evaluated on seven to 193 structures, statistical and molecular mechanics methods on zero to 100 structures, MD simulation methods on fewer than 10 structures, and Monte Carlo simulation methods on fewer than 15 structures. Table 1 provides a comprehensive list of all of these methods, including short descriptions and information about their evaluation. We refer to a recent review51 and perspective52 on water for more detailed information about the various methods.

A consistent, reliable, and fast water placement procedure is important for different application scenarios. Crystal structures with low resolution (>2.7 Å) do not allow modeling of water molecules.69 Usually, protein−ligand docking poses are generated without water molecules. However, water molecules are important for the correct estimation of binding affinity. Most of the frequently used software solutions for water placement are time-consuming, preventing their dynamic application to a large number of protein−ligand complexes.

Especially for the development of water placement and prediction methods, the data used for training and evaluation are an important and at the same time difficult aspect. The individual energy contribution of a water molecule cannot be measured experimentally. The difference in energy

Received: May 4, 2018

© XXXX American Chemical Society


DOI: 10.1021/acs.jcim.8b00271 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX


method

Consolv3
WatCH4
WaterScore5
PyWATER6
Proasis WaterRank7
relevance metric8
AQUARIUS2,9,10
WATGEN11
AcquaAlta12
WaterDock13
tetrahedron-water-cluster model14
Fold-X15
HINT (Hydropathic Interactions) toolkit16
Dowser++17
GRID18−20
3D reference interaction site model (3D-RISM)21,22
multiple-copy simultaneous search (MCSS)23−25
WaterFLAP26
water potential of mean force (wPMF)27

GRID-based water prediction with different probes for entropic contribution (CRY, ENTR)
aim: predict hydration sites in proteins; radial distribution functions of water in the proximity of protein atom types combined with equivalent potentials of mean force to predict hydration sites and assign wPMF scores to waters; trained on 3946 protein structures: extraction of water structure pattern and hydrophilicity; grid-based clustering scheme with wPMF to predict water sites

evaluation

seven targets, ∼90% within 1.5 Å, ∼60% within 1.0 Å
100 crystal structures; 80% of predicted clusters occupied by an X-ray water within 1.4 Å

Empirical/Knowledge-Based Methods: Identification of Conserved X-ray Water Molecules

Consolv3
description: evaluation of four environmental factors: B factor, number of H-bonds to the protein, density of neighboring protein atoms, hydrophilicity
evaluation: training: 13 free vs ligand bound structures; test: seven structures (75%); 1.2 Å between waters in the ligand-free and bound structures as the conservation criterion

WatCH4
description: hierarchical clustering to identify conserved waters in related structures
evaluation: 10 thrombin, three trypsin, four BPTI, two trypsin/BPTI structures

WaterScore5
description: score consisting of a combination of B factor, solvent contact surface area, total H-bond energy, and number of protein contacts
evaluation: training: 25 protein pairs; 0.5 Å between waters in the ligand-free and bound structures as