Binding Space Concept: A New Approach To Enhance the

Mean scores are the parameters affording the largest statistical improvement, and all the here proposed docking-based descriptors show enhancing effec...
7 downloads 13 Views 3MB Size
Article pubs.acs.org/jcim

Binding Space Concept: A New Approach To Enhance the Reliability of Docking Scores and Its Application to Predicting Butyrylcholinesterase Hydrolytic Activity Giulio Vistoli,*,† Angelica Mazzolari,† Bernard Testa,‡ and Alessandro Pedretti† †

Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Mangiagalli, 25, I-20133 Milano, Italy Department of Pharmacy, University Hospital Centre (CHUV), Rue du Bugnon, CH-1011 Lausanne, Switzerland



S Supporting Information *

ABSTRACT: Docking simulations are very popular approaches able to assess the capacity of a given ligand to interact with a target. Docking simulations are usually focused on a single best complex even though many studies showed that ligands retain a significant mobility within a binding pocket by assuming different binding modes all of which may contribute to the monitored ligand affinity. The present study describes an innovative concept, the binding space, which allows an exploration of the ligand mobility within the binding pocket by simultaneously considering several ligand poses as generated by docking simulations. The multiple poses and the relative docking scores can then be analyzed by taking advantage of the same concepts already used in the property space analysis; hence the binding space can be parametrized by (a) mean scores, (b) score ranges, and (c) score sensitivity values. The first parameter represents a very simple procedure to account for the contribution of the often neglected alternative binding modes, while the last two descriptors encode the degree of mobility which a given ligand retains within the binding cavity (score range) as well as the ease with which a ligand explores such a mobility (score sensitivity). Here, the binding space concept is applied to the prediction of the hydrolytic activity of BChE by synergistically considering multiple poses and multiple protein structures. The obtained results shed light on the remarkable potential of the binding space concept, whose parameters allow a significant increase of the predictive power of the docking results as revealed by extended correlative analyses. Mean scores are the parameters affording the largest statistical improvement, and all the here proposed docking-based descriptors show enhancing effects in developing predictive models. Finally, the study describes a new score function (Contacts score) simply based on the number of surrounding residues which appears to be particularly productive in the framework of the binding space. when flexibility is introduced only locally in a few side chains surrounding the docked ligands.4 The difficulty in accounting for protein flexibility can be overcome by two different approaches.5 After docking calculations, the generated complexes can be refined by postdocking simulations which are expected to favor a mutual and flexible adaptability of the two interaction partners. Alternatively, one may perform parallel docking campaigns utilizing different protein structures in the so-called ensemble docking which utilizes more resolved structures, if available, or

1. INTRODUCTION Docking simulations are gaining increasing relevance in all critical phases of the drug discovery pipeline, starting with hit identification and lead optimization up to metabolism and/or toxicity prediction.1−3 Basically, docking calculations provide a reliable estimate about how strongly a ligand is expected to interact with a given protein target and generate the best putative complexes able to maximize stabilizing interactions. Currently, docking simulations consider the ligand as flexible, while the protein target is often simulated as rigid, thus raising the problem of protein flexibility. In fact, few docking approaches directly implement protein flexibility, especially as this renders the calculations hugely more demanding even © 2017 American Chemical Society

Received: February 27, 2017 Published: June 20, 2017 1691

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

Figure 1. Scheme describing the three main applications of the proposed binding space concept, together with the corresponding parameters.

toward esters.14 BChE is covalently inhibited by carbamates and organophosphorous poisons (OPs), which react with the catalytic serine residue (Ser198) acting as hemisubstrates due to a very slow deacylation step.15 This choice is justified by the relevant role played by BChE in human metabolism,16 the many reported resolved structures in complex with different ligands, the large amount of published enzymatic data, and the observation that, despite the general interest in the prediction of ADME profiles, most computational studies focused on the rational design of BChE inhibitors,17 while the hydrolytic activity of BChE on medicinal esters was investigated in silico based only on a few studies analyzing the activity of BChE, and some of its mutated forms, toward heroin or cocaine.18 Such a lack may be explained by considering that most available enzymatic data are indeed kinetic parameters which depend on both molecular recognition and transition state activation. Hence, canonical docking simulations are unable to take the free energy of the transition state into account. Even though the binding space concept is still focused on the initial noncovalent complexes, one may argue that simulating the substrate mobility can prove successful in accounting for the dynamic processes which prelude to enzyme activation. Not to mention that the influence of enzyme or receptor activation is a general occurrence in pharmacological assays which does not have negligible effects even in affinity values. Hence, testing the binding space concept in such a challenging situation can reveal whether such an approach can also play a beneficial role to solve this well-known issue which often limits the predictive power of standard docking simulations. Finally, the present study continues our ongoing effort to investigate structure−function relationships of human esterases,19−21 the metabolizing activity of which was scantly analyzed in silico although hydrolytic reactions represent more than 10% of the human metabolism, as recently reviewed.22

different representative protein conformations as generated by Molecular Dynamics or Monte Carlo simulations.6 Regardless of how protein flexibility is taken into account, docking simulations generate a set of possible solutions (the socalled ligand poses) among which attention is usually focused only on the best one as selected by docking scores, visual analysis, or comparison with experimental data.7 Nonetheless, countless studies emphasized that bound ligands experience a marked mobility within the binding site, which reflects in slightly different binding modes all of which might contribute to the observed affinity. A reliable investigation of such a mobility could require very demanding simulations,8 although one may figure out that simultaneously considering more ligand poses (as generated by canonical docking simulations) would be a simple way to indirectly tackle the issue of ligand mobility. The multiple poses and the relative docking scores can be analyzed by utilizing the same concepts already used in the property space analysis, thus leading to the definition of the “binding space”. In detail, the docking scores can be analyzed by computing the score averages which would be more informative than the usually utilized best score values as well as the score ranges which, as already seen in property space analyses, should be a fruitful way to convey the dynamic information encoded by such a binding space.9,10 In analogy with property space analyses in which property sensitivity encoded the capacity of a given molecule to vary its conformer-dependent properties in relation to conformational changes,11 here binding sensitivity can parametrize the capacity of a given ligand to vary its docking scores (namely to assume alternative binding modes) by changing the arrangement within the binding cavity. As schematized in Figure 1, one may figure out three major applications of the binding space since it can be applied to (1) a set of best poses as generated within multiple protein structures by ensemble docking simulations or to (2) a set of multiple poses as generated for a given ligand within a single protein structure. Third, one may synergistically use the binding space concept by collecting all computed poses within different protein structures so developing a sort of ensemble binding space. In the present initial study, the potential of the binding space concept combined with ensemble docking simulations was assessed by applying it to the prediction of the hydrolytic metabolism by butyrylcholinesterase (BChE, EC 3.1.1.8) a serine hydrolase involved in the hydrolysis of neutral and positively charged esters12,13 as well as of aryl acyl amides and thioesters even though with an efficiency much lower than that

2. METHODS 2.1. BChE Structure Selection and Optimization. Since several human BChE (hBChE) structures have been resolved by X-ray crystallography (40 hBChE structures deposited with the PDB by Jan 1, 2017), the first step in the study involved the selection of a set of suitable structures for docking simulations. The resolved BChE structures can be roughly subdivided into aged and nonaged enzymes, since the formation of covalent 1692

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

performed by PLANTS with default settings and without geometric constraints, focusing the searches within a 10 Å radius spheres around the catalytic Ser198 residue, thus including the entire catalytic cavity. For each ligand and each BChE structure, PLANTS generated 10 poses which were ranked using the ChemPLP scoring function with the speed equal to 1.30 In order to avoid redundant docking results, the generated poses are clustered, and two poses are considered as nonredundant if their rmsd is greater than 2.0 Å. When considering the stochastic nature of the PLANTS search algorithm, docking calculations on the best performing 4XII structure (see below) were repeated five times with the same parameters to assess the reproducibility of the obtained results. With a view to investigating the role of some key parameters, the simulations on the 4XII structure were also repeated by changing the clustering criterion (1 and 3 Å) and increasing the number of generated poses (up to 50). The results of such a calibration study were reported in Table S5 and Figure S1 in the Supporting Information. The resulting complexes were then analyzed by using a new release of ReScore+ which, when compared to the previously published release, shows three major new features.31 First, the script now allows on-the-fly energy minimization of the complexes by interfacing the script with NAMD.32 In detail, the minimization is focused on a user-defined radius sphere (here by default equal to 10 Å) around the bound ligand, while the atoms outside this sphere are kept fixed. Second, the generated (and minimized) complexes can be saved into single files or into suitable databases. Third, ReScore+ calculates a new scoring function (the Contacts score) which corresponds to the number of residues surrounding the bound ligand within an user-defined cutoff (set by default at 3.0 Å). ReScore+ also calculates this score as normalized by the number of heavy atoms and by the molecular weight of the ligand. In detail, all computed poses were submitted to rescore procedures with and without energy minimization. 2.4. Binding Space Analysis and Correlative Studies. As mentioned in the Introduction and detailed below, the primary objective of the study was to assess the beneficial effect of simultaneously considering more ligand poses when analyzing docking results. In all considered applications and similarly to what was already described for property spaces, the resulting binding space can be parametrized considering (a) the mean scores, (b) the score ranges, and (c) the score sensitivity values. While the calculation of the first two parameters is simple and intuitive, the calculation of the score sensitivity is more complex and deserves a detailed description. For the calculation of property sensitivity, two formulas have been proposed, namely the ratio between property range and the root-mean-square deviation (rmsd) as computed by comparing the monitored conformers as well as the ratio between property range and number of flexible torsions. The latter is easily computable also here even though it provides sensitivity values which are unrelated to the simulated ligand mobility within the binding site and hence are not influenced by the explored binding space. The parametrization of ligand mobility can be done by averaging the rmsd values as computed by comparing each computed pose with the best score pose. Such a calculation is performed without symmetry correction and can be done with or without the ligand superimposition (see eqs A−C below). Accordingly, the two sets of rmsd values encode different information since the values calculated after superimposition

adducts with organophosphates (OPs) induces a well-known spontaneous time-dependent process, called aging, during which the OP-BChE conjugate is dealkylated and the active gorge undergoes conformational changes thus resulting in an irreversible enzyme inhibition.23 Consequently, attention was focused here on nonaged structures, also avoiding mutants and highly glycosylated forms. Among the remaining nonaged structures, the selection was further restricted to those having resolution lesser than 3.0 Å. In this way, the analysis involved five nonaged structures of the human BChE enzyme in complex with different ligands including both the covalent complex (as in the case of 2WIJ covalently inhibited by the Tabun analogue TA5 ethyl hydrogen ethylamido phosphate)24 and noncovalent complexes (i.e., 3O9M, 4BDS, 4TPK, 4XII, in complex with cocaine, tacrine, N-((1-(2,3-dihydro-1H-inden-2-yl)piperidin-3yl)methyl)-N-(2-methoxyethyl)-2-naphthamide, and N-((1(2,3-dihydro-1H-inden-2-yl)piperidin-3-yl)methyl)-8-hydroxyN-(2-methoxyethyl)-5-nitroquinoline-7-carboxamide, respectively).25−28 The selected protein structures were prepared by deleting ions, water molecules, and crystallization additives and by adding hydrogen atoms. When more monomers were included in the PDB file, attention was focused on that with the highest percentage of residues falling in the allowed regions of the Ramachandran plot. To be compatible with physiological pH, Asp, Glu, Lys, and Arg residues were considered ionized, while His and Cys residues are neutral by default. The so completed protein structures were minimized by fixing the backbone atoms to preserve the resolved folding. Before performing docking simulations, the bound ligands were removed, and the 2WIJ structure was updated by deleting the covalently bound TA5 analogue, so restoring the free catalytic Ser198 residue. 2.2. Ligand Data Set. As stated above, an impressive amount of enzymatic data (mostly expressed as substrate’s t1/2 values) is available in literature and was here converted into the corresponding elimination rate constants by using the wellknown equation k = ln 2/t1/2. All these literature data concerned hBChE but differed greatly in the experimental conditions used, including temperature, pH, and matrix. To overcome this problem as well as possible, a ligand data set of 75 known BChE ester substrates (see Table S1, Supporting Information) was collected from the literature by selecting molecules whose stability was measured under physiological conditions (37 °C and pH = 7.4) and excluding substrates having more than one hydrolyzable group. Since the reported substrates differ in the nature of the matrix used in the in vitro experiments, attention was focused on substrates tested in 80% human plasma, which represents the most common matrix found in literature data. All used metabolic data were obtained by following the same HPLC-based experimental procedure, and most data were produced by Bundgaard and co-workers as well as by Larsen and co-workers (see Table S1, Supporting Information) thus minimizing the interlaboratory variation. The collected substrates were simulated considering their most probable ionization form at physiological pH. After a preliminary PM6 semiempirical optimization, their conformational behavior was investigated by a clustered Monte Carlo procedure (as implemented in the VEGA suite of programs29) which generated 1000 minimized conformers by randomly rotating the flexible rotors. The lowest energy structures so obtained were then used in the following docking calculations. 2.3. Docking Simulations and Rescoring by ReScore+. Docking simulations on the selected hBChE structures were 1693

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling encodes only conformational differences between the monitored poses, while the values obtained without superimposition encode conformational plus roto-translational differences between poses, thus suggesting that the difference between them encodes the sole roto-translational differences, namely the ligand mobility within the binding pocket. SensitivityR =

Scoremax − Scoremin NRotors

SensitivityS/U =

(A)

Scoremax − Scoremin n

∑i = 2 (xiS/U − x1S/U)2

(B)

n−1

SensitivityM =

Scoremax − Scoremin n ∑i = 2 (xiU

− x1U)2

n−1

n



∑i = 2 (xiS − x1S)2 n−1

Figure 2. Two-dimensional scheme illustrating the main residues lining the catalytic pocket. The following color legend was used: apolar residues = green; H-bonding residues = blue (the residues involved in the catalytic machinery are filled in azure); charged residues = red.

(C)

The above-reported equations define the four types of score sensitivity which can be thus computed. In detail, the score range (namely, Scoremax − Scoremin) can be subdivided by (1) the number of rotors (NRotors, SensitivityR, see eq A); (2) superimposed rmsd (SensitivityS) and (3) (eq B) unsuperimposed rmsd (SensitivityU see eq B), where x1S/U and xiS/U correspond to the coordinates of the best pose and of the i pose, respectively, as obtained with (S) or without (U) ligand superimposition, and n is the number of generated poses; and (4) ligand mobility, namely the difference between the two so computed rmsd values (SensitivityM, see eq C). All these sensitivity types will be evaluated in the following correlative studies. Correlative analyses involved a set of ligand-based and docking-based descriptors as listed in Table S2. In detail, the correlative equations were generated by including five unrelated (as tested by a variance inflation factor, VIF, < 5) independent variables as selected by a script of the VEGA suite of programs which exhaustively combines all available descriptors in order to produce fully reproducible models. The generated equations were evaluated by computing a set of well-known statistics such as the determination coefficient (r2), the leave-one-out crossvalidated r2 (q2), the standard error (SE), and the Fisher ratio (F). Since the q2 parameter tends to overvalue the predictive power,33 the best performing model (eq 56 (Table 4)), was also assessed by subdividing the data set (n = 75) in training (n = 50) and test (n = 25) sets, by recalculating the equation using only the training set and finally by using this to predict the activity of the test set. This task was repeated 100 times by using an ad hoc VEGA script to minimize the randomness effect in the collection of training and test sets.

oxyanion hole whose NH backbone groups interact with the substrate carbonyl oxygen polarizing the carbonyl function, further facilitating the nucleophilic attack and then stabilizing the resulting oxyanion.34 The vicinal Gly117, which may participate in reinforcing the oxyanion hole, can also have a direct role in catalysis as suggested by some mutants (e.g., G117H) which show remarkable efficiency in the hydrolysis of organophosphates.35 The acyl binding site is directed toward the gorge entrance and is lined with rather hydrophobic loops such as the residues between Phe225 and Ala232 or between Pro281 and Val1288. The abundance of apolar residues in the acyl loops may explain why hBChE can recognize quite large and hydrophobic acyl groups bearing both aliphatic and aromatic moieties. At the bottom of the catalytic gorge, the choline binding site appears to be a larger, more flexible and markedly less apolar cavity. The greater polarity is ascribable to H-bonding residues (e.g., Ser79, Tyr128, and Tyr332) as well as to negatively charged side chains (e.g., Asp70 and Glu197). These residues, together with the numerous aromatic residues lining the choline binding cavity (e.g., the highly conserved Trp82), play a role in determining the known preference of the enzyme for cationic esters. Figure 3 compares the complexes for a poor (Figure 3A, ketoprofen methyl ester) and a good (Figure 3B, metronidazole benzoate ester) substrate as generated for a well performing crystal structure (PDB Id: 4XXI, see below) and reveals that both compounds can be suitably accommodated within the catalytic cavity. Nevertheless, the analysis of all generated poses reveals clear differences in line with their different stability. Indeed, ketoprofen methyl ester orders its labile function in a pose conducive to catalysis, namely with its carbonyl carbon atom reasonably close to Ser198, only in one of the analyzed complexes (as shown in Figure 3A), while in the remaining complexes it assumes very similar poses which are stabilized by an extended set of hydrophobic contacts but in which the labile function remains rather distant from Ser198. In contrast, metronidazole benzoate ester shows poses conducive to catalysis in all analyzed complexes. While constantly approaching the ester group to Ser198, the computed poses of metronidazole benzoate ester reveal a degree of diversity even greater than that observed for the ketoprofen derivative, thus

3. RESULTS 3.1. Analysis of Some Modeled Complexes. Even though the present study is focused on investigating the role of binding space parameters in correlative analyses (see below), some key features of the enzyme cavity as well as of the generated complexes are worthy of mention here. As shown in Figure 2, the hBChE catalytic cavity can be subdivided into three regions. The central acylation site comprises the key residues involved in the enzymatic mechanism. On the one hand, one finds the highly conserved catalytic triad in which Glu325 and His438 polarize the Ser198 hydroxyl group, thus favoring its nucleophilic attack on the substrate carbonyl carbon. On the other hand, Gly115 and Gly116 make up the 1694

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

involves the relative abundance of poses in which the labile function conveniently contacts Ser198. This finding lays the ground for the binding space concept by indicating that the analysis of more generated poses can convey a richer information than that derived by considering only the best pose. 3.2. Correlative Studies. 3.2.1. Ligand-Based Correlations. Since the reported correlative studies also involved ligand-based descriptors, a preliminary analysis was performed by considering only the properties of the ligands. This analysis involved both the optimized property values as computed by focusing on the lowest energy conformation (namely the conformations used in the following docking simulations) and the property space descriptors as obtained by exploring all the minimized geometries generated by Monte Carlo procedures. This initial analysis had two primary objectives. On the one hand, it represented a novel application to investigate the contributing role of property space descriptors when compared to single property values. On the other hand, it allowed a precise evaluation of the statistical reliability reachable without docking simulations; hence, their results can be seen as a benchmark to assess the concrete contribution of docking results in the equations reported below. The models reported here were developed by using a set of ligand-based descriptors including both conformer-independent values, which remain constant in all performed analyses, and conformer-dependent descriptors for which the following parameters were considered: (a) the property values as computed for the lowest energy structure; (b) the property means as derived by averaging all the geometries generated by Monte Carlo procedures; (c) the property ranges as computed by considering all the geometries generated by Monte Carlo procedures; (d) the property sensitivity values by utilizing both proposed formulas (see under Methods). Equation 1 (Table 1) reveals that the simple property values of the lowest energy structures do not afford truly satisfactory results. This fair equation emphasizes the key role of substrate polarity and suggests that a substrate has to meet precise size criteria as indicated by the concomitant presence of two unrelated size descriptors (radius of gyration and number of bonds) with opposite sign. Notably, replacing the single property values with the property averages does not afford clear improvements, and the best equation obtained is almost identical to eq 1 (Table 1) (r2 = 0.56, equation not reported). This result underlines that property averages, while obviously more informative than a single property value, do not convey information concerning the dynamic behavior of a given molecule, and thus their concrete contribution to the correlative analysis is not significantly greater than a single optimized property value. In contrast, the inclusion of property ranges leads to a significant improvement, as seen in eq 2 (Table 1) which confirms the key roles played by lipophilicity and molecular size of a given substrate even though the presence of range values related to geometric descriptors highlights the major role played by substrate flexibility as an indirect measure of entropic factors. The inclusion of the property sensitivity values affords only marginal improvement as seen in eq 3 (Table 1) which is very similar to eq 2 (Table 1) and further underlines the critical role of ligand flexibility. Even though the development of ligand-based models is not the primary objective of this study, its results emphasize the noteworthy fruitfulness of property space descriptors, which

Figure 3. Major interactions stabilizing the putative complexes for a poor (A, ketoprofen methyl ester) and a good substrate (B, metronidazole benzoate ester) with 4XXI as well as for the same good substrate (C, metronidazole benzoate ester) with 3O9M. The reported poses were chosen considering both the primary ChemPLP score function and the closeness between the labile group and Ser198.

suggesting that the substrate can assume different and catalytically conducive binding modes. In detail, Figure 3A shows that the best pose for the ketoprofen ester is stabilized by a rich set of π−π stacking contacts between the substrate and Trp82, Phe329, Tyr332, and Tyr440, while metronidazole benzoate ester (Figure 3B) is seen to stabilize a richer set of contacts including both hydrophobic (such as between the ligand’s imidazole ring and Trp82) and H-bonds (such as between the ligand’s nitro group and Thr120). Finally, Figure 3C reports the best complex as computed for the same good substrate (metronidazole benzoate ester) but within a poorly performing BChE structure (PDB Id: 3O9M, see below). The BChE structure is clearly able to accommodate the substrate even though a comparison with Figure 3B reveals that its labile function assumes a less optimal catalytically effective pose, due to a greater distance with Ser198; this negative feature is seen in all computed poses, and in three cases out of 10 the substrate is even unable to assume a pose conducive to hydrolysis. Collectively, these examples suggest that the major difference between poor and good substrates as well as between reliable and less reliable BChE structures 1695

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

Table 1. Correlative Equations As Generated by Using Ligand-Based Descriptors Only (Eqs 1−3) and Docking Parameters Based on the Best Poses (Eqs 4−14)a no.

PDB

descriptors

1

ligand

single property values

2

ligand

3

ligand

mean properties + ranges eq 2 + sensR

4

2WIJ

best scores

Y

5

2WIJ

best scores

N

6

3O9M

best scores

Y

7

3O9M

best scores

N

8

4BDS

best scores

Y

9

4BDS

best scores

N

10 11 12

4TPK 4TPK 4XII

best scores best scores best scores

Y N Y

13

4XII

best scores

N

14

all

mean scores

Y

15

all

mean scores

N

16

all

eq 14 + score range

Y

17

all

eq 15 + score range

N

18

all

eq 16 + SensR

Y

19

all

eq 17 + SensR

N

a

equation, pK =

r2

q2

SE

F

−3.90 + 0.39 HBA − 0.18 LogPMLP + 0.054 Impropers + 0.50 Rgyration − 0.051 Bonds −2.23 + 2.10 Rgyration_range + 0.28 HBTot − 0.36 Lipole −0.019 SAS_range − 0.092 Torsions −2.20 + 0.27 HBT + 4.55 Rgyration_sensR − 0.41 Lipole −0.070 Torsions − 0.048 SAS_sens 1.65 + 0.28 HBA − 0.24 LogPMLP + 10.26 ChemPLPW − 0.043 Bonds + 0.013 ElectDD 0.48 + 0.33 HBA − 0.068 MLPInS + 7.94 ChemPLPW + 0.029 APBS − 0.047 Bonds 4.96 + 0.21 HBA − 0.22 LogPMLP − 2.37 ContactsH + 9.69 ChemPLPW − 0.0067 SAS −0.40 + 0.29 HBA − 0.22 LogPMLP − 37.67 ContactsW + 0.022 APBScore − 0.027 Bonds −5.31 + 0.42 HBA − 0.24 LogPMLP + 0.94 ContactsH + 0.75 Rgyration − 0.051 Bonds −4.53 + 0.36 HBA − 0.18 LogPMLP + 0.092 Unsaturation −0.030 CHEMPLP − 0.051 Bonds 4.69 + 0.25 HBA − 0.23 LogPMLP + 12.58 PLP95W − 0.0035 Sav + 0.0029 Elect 1.58 + 0.32 HBA − 0.59 MLPInS + 9.22 PLP95W + 0.039 APBS − 0.049 Atoms −0.46 + 0.31 HBA − 0.26 LogPMLP + 6.40 PLP95W + 0.54 Rgyration − 0.060 Bonds −1.35 + 0.36 HBA − 0.17 LogPMLP − 29.89 ContactsW + 0.025 APBScore − 0.025 Bonds 2.98 + 0.32 HBA − 47.29 ContactW − 1.41 MLPInS + 0.0034 Elect − 0.75 XscoreHS −0.30 − 45.79 ContactsW + 0.30 HBA − 0.50 MLPInS + 0.035 APBScore − 0.032 Bonds 3.45 + 0.34 HBA − 4.73 ContactsH + 0.039 PLP95_range + 0.12 MLPInS_range − 1.18 XscoreHS −2.57 + 0.45 HBA − 0.28 MLPInS − 0.18 Contacts + 0.029 APBScore + 0.55 MLPInS_range 2.37 + 0.39 HBA − 4.34 ContactsH + 0.034 PLP95_range + 0.64 MLPInS_sens − 1.04 XscoreHS −1.24 + 0.43 HBA − 0.90 XScoreHP_sens − 0.17 Contacts +0.052 APBS + 0.83 MLPInS_sens

0.56

0.49

0.79

17.58

0.68

0.63

0.66

30.26

0.69

0.64

0.65

31.43

0.61

0.51

0.74

21.70

0.63

0.55

0.72

23.14

0.60

0.52

0.75

20.75

0.61

0.53

0.73

21.88

0.57

0.48

0.78

17.80

0.57

0.50

0.77

18.39

0.61 0.65 0.58

0.52 0.58 0.49

0.74 0.70 0.77

21.68 25.98 18.97

0.59

0.49

0.76

19.58

0.61

0.52

0.75

20.51

0.64

0.56

0.71

24.43

0.63

0.56

0.73

23.19

0.67

0.61

0.68

27.63

0.64

0.58

0.71

24.12

0.69

0.63

0.66

29.66

Min

In all equations n = 75 and p < 0.0001 (Min: Y means minimized complexes, and N means nonminimized complexes).

The statistics obtained call for two relevant considerations. First and in all cases, the inclusion of docking scores leads to higher r2 values compared to eq 1 (Table 1), even though only five models show markedly enhanced statistics (i.e., r2 > 0.6), suggesting that the contribution of docking simulations is, on average, rather modest and depends on the BChE structure used with a r2 range of about 0.1. Second, the equations obtained by nonminimized complexes always show better statistics than those generated after minimization. This unexpected result may be explained by considering that some included knowledge-based score functions (such as the XScore values) are almost independent of complex minimization, while other functions (such as those computed by PLANTS) are optimized to evaluate the stability of complexes generated directly by the software and suffer when these complexes are modified by applying different force fields and/or computational algorithms. As a general rule, this result suggests that postdocking optimizations always enhance the structural quality of the complexes but do not necessarily increase the reliability of the resulting docking scores. As mentioned above, the use of multiple target structures allows the first application of the binding space concept as obtained by simultaneously considering the single best poses generated within different BChE structures. In detail, the binding space parameters utilized in the following analyses were calculated by excluding docking scores from the 4BDS structure

add relevant information related to the dynamic behavior of a given molecule. As stated above, these results also provide a robust touchstone to evaluate the contribution of docking simulations and, in particular, of the binding space descriptors proposed here. 3.2.2. Correlations Using Docking Scores from the Best Selected Poses. The first analysis of docking scores involved “canonical” predictive equations, as developed using docking scores computed only for the best generated poses, plus ligandbased descriptors derived from lowest energy conformations (eqs 4−13, Table 1). These equations will be compared to those obtained by including binding space descriptors (see below) to reveal their statistical contribution. They were also compared to eq 1 (Table 1) to assess the contribution of docking simulations. The reported equations (eqs 4−13, Table 1) confirm the beneficial role of both lipophilicity and an optimal molecular size. With regard to docking scores, eqs 4−13 (Table 1) frequently include the MLPInS score,19 which encodes for hydrophobic contacts and replaces the log P descriptor, as well as the score functions calculated by PLANTS (e.g., ChemPLP). The recurring presence of docking scores related to ionic interactions is in line with the well-known preference of the enzyme for positively charged substrates, while the inclusion in more equations of the Contacts scores demonstrates the efficacy of the proposed score function. 1696

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

Table 2. Correlative Equations As Generated by Averaging the Scores Computed for More Poses within Single Protein Structures (Eqs 20−29) Plus the Corresponding Score Ranges (Eqs 30−39)a Min

equation, pK =

r2

q2

SE

F

av 10 poses av 10 poses av 10 poses

Y N Y

0.66 0.67 0.60

0.60 0.61 0.53

0.69 0.68 0.74

26.80 27.95 21.36

3O9M 4BDS 4BDS

av 10 poses av 10 poses av 10 poses

N Y N

0.65 0.63 0.65

0.57 0.56 0.58

0.70 0.72 0.70

25.16 23.25 25.22

26 27

4TPK 4TPK

av 10 poses av 10 poses

Y N

0.66 0.67

0.60 0.60

0.69 0.69

26.81 26.81

28 29

4XII 4XII

av 10 poses av 10 poses

Y N

0.68 0.63

0.63 0.56

0.66 0.72

29.93 23.26

30

2WIJ

Y

0.68

0.61

0.67

28.67

31

2WIJ

0.70

0.65

0.64

32.71

32

3O9M

0.62

0.55

0.73

22.15

33

3O9M

0.67

0.62

0.67

28.59

34

4BDS

0.69

0.63

0.66

30.38

35

4BDS

0.65

0.57

0.70

25.43

36

4TPK

0.66

0.60

0.71

27.05

37

4TPK

0.68

0.62

0.67

29.70

38

4XII

0.71

0.66

0.63

33.97

39

4XII

eq 20 + ranges eq 21 + ranges eq 22 + ranges eq 23 + ranges eq 24 + ranges eq 25 + ranges eq 26 + ranges eq 27 + ranges eq 28 + ranges eq 29 + ranges

3.26 + 0.33 HBA − 7.43 ContactsH − 0.064 MLPInS − 0.12 Torsions + 0.0057 CVFF 2.55 − 6.80 ContactsH + 0.29 HBA − 0.64 MLPInS − 0.083 Torsions + 0.0051 Elect 0.92 + 0.23 HBA − 0.056 MLPInS − 61.17 ContactsW + 0.069 Unsaturation − 0.046 Bonds 0.66 + 0.44 HBA − 74.68 ContactsW − 4.07 MLPInS − 0.096 Torsions + 0.0088 Elect 2.32 + 0.35 HBA − 90.51 ContactsW − 0.57 MLPInS − 0.11 Torsions + 0.0042 Elect 1.43 − 86.34 ContactsW + 0.37 HBA − 4.27 MLPInS − 0.11 Torsions − 0.0033 APBScore 6.52 − 93.55 ContactsW − 0.19 Lipole −3.09 MLPInS − 0.91 XscoreHSS + 0.0038 Elect 6.20 − 99.36 ContactsW − 0.20 Lipole − 2.23 MLPInS − 1.05 XscoreHSS − 0.021 CHEMPLP 2.58 − 9.96 ContactsH + 0.35 HBA − 0.38 MLPInS − 0.10 Torsions − 4.90 PLP95W 0.42 − 60.22 ContactsW + 0.38 HBA − 3.32 MLPInS + 0.0079 Elect − 0.088 HeavyAtoms −0.54 − 63.85 ContactsW − 0.14 LogPMLP + 2.04 PLP95H_range − 2.59 XscoreHS_range + 0.24 HBA 2.20 − 6.55 − 0.21 LogPMLP − 25.71 ChemPLPW_range − 1.97 XscoreHS_range + 2.28 PLP95H_range 2.82 + 0.23 HBA − 0.23 LogPMLP − 62.85 ContactsW − 20.84 ChemPLPW_range − 0.0026 Sav −1.63 + 0.48 HBA − 20.80 PLPW_range −0.20 MLPInS_range −0.23 Contacts + 0.0052 Elect 1.02 + 0.47 HBA − 70.51 ContactsW − 0.10 Torsions − 4.04 MLPInS − 0.016 CVFF_range 2.74 + 0.30 HBA − 7.88 ContactsH − 0.57 MLPInS − 0.12 Torsions − 0.023 ElectDD_range 5.64 − 4.87 ContactsH + 0.21 HBA − 2.61 MLPInS − 29.62 ContactsW_range − 0.94 XscoreHS 7.32−106.02 ContactsW − 0.21 Lipole − 1.69 MLPInS − 0.069 MLPInS _range − 0.98 XscoreHS 7.05 − 112.60 ContactsW − 0.35 Lipole − 0.11 MLPInS_range − 28.92 ContactsW_range − 0.82 XscoreHS 0.45 + 0.45 HBA − 23.55 ChemPLPW_range + 0.039 APBS_range + 0.0083 Elect − 0.77 XscoreHS

0.64

0.57

0.71

24.12

no.

PDB

20 21 22

2WIJ 2WIJ 3O9M

23 24 25

a

scores

N Y N Y N Y N Y N

In all equations n = 75 and p < 0.0001 (Min: Y means minimized complexes, and N means nonminimized complexes).

Figure 4. Effect of the progressive introduction of the binding space parameters (i.e., mean scores, score ranges, and sensitivities) on the statistical goodness of the so developed models as parametrized by the r2 values and derived by using the five simulated BChE structures plus the overall means and by considering minimized or nonminimized complexes.

(hereinafter the BChE structures are named by using the corresponding PDB Id codes) since they afforded really unsatisfactory models (see eqs 8 and 9, Table 1). The two

equations obtained by using the average scores (eqs 14 and 15, Table 1) show statistics comparable with those of the best performing previous models. This finding suggests that the 1697

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

Table 3. Correlative Equations As Generated by Introducing the Score Sensitivity Values (sensR) in the Previous Modelsa no.

PDB

40

2WIJ

41

2WIJ

42

3O9M

43

3O9M

44

4BDS

45

4BDS

46

4TPK

47

4TPK

48

4XII

49

4XII

a

scores eq 30 + sensR eq 31 + sensR eq 32 + sensR eq 33 + sensR eq 34 + sensR eq 35 + sensR eq 36 + sensR eq 37 + sensR eq 38 + sensR eq 39 + sensR

Min

equation, pK =

r2

q2

SE

F

Y

−0.44 − 0.19 LogPMLP + 0.25 HBA − 89.19 ChemPLPW_sens + 0.083 PLP95_range − 0.22 Contacts 1.95 − 0.059 MLPInS + 2.34 XscoreHS_range − 0.32 ChemPLP_sens + 0.091 PLP95_range − 6.98 ContactsH 8.95 − 0.25 LogPMLP − 71.75 ContactsW + 0.19 HBA − 77.01 ChemPLPW_sens − 0.99 Vdiam −2.38 + 0.54 HBA − 0.48 PLP_sens + 1.02 MLPInS_sens − 0.17 Contacts + 0.0072 Elect

0.70

0.65

0.65

33.22

0.72

0.67

0.62

36.01

0.64

0.58

0.71

24.43

0.71

0.66

0.64

33.11

−1.77 + 0.032 PSA − 0.075 Torsions −171.02 ContactsW_sens − 3.54 MLPInS − 0.027 CVFF_range 0.90 + 0.24 HBA − 0.49 MLPInS − 73.21 PLPW_sens − 0.16 Mobility − 0.26 Contacts

0.68

0.62

0.67

28.70

0.66

0.60

0.69

26.37

0.68

0.62

0.67

29.34

0.71

0.65

0.64

33.01

0.74

0.69

0.61

38.64

0.69

0.64

0.67

30.86

N Y N Y N Y N Y N

16.31 − 0.33 Lipole − 239.95 ContactsW_sens + 17.82 PLP95W + 7.30 XscoreHS_sens − 2.05 XscoreHS −3.84 − 180.37 ContactsW − 0.20 LogPMLP − 6.49 ChemPLPH_sens −1.80 PLP95H + 0.089 PLP_range −0.80 + 0.52 HBA − 73.17 PLP95W_sens + 0.13 MLPInS_range − 0.32 Contacts + 0.40 Contacts_sens 0.11 + 0.47 HBA − 93.04 ChemPLPW_sens −0.74 XscoreHP + 0.059 APBS + 0.25 APBS_sens

In all equations n = 75 and p < 0.0001 (Min: Y means minimized complexes, and N means nonminimized complexes).

greater than those seen in eqs 4−13 (Table 1), suggesting that the use of average scores might increase the influence of postdocking minimizations. The included scores are in line with those included in previous models (Table 1), confirming the key role of ligand lipophilicity as assessed by the frequent inclusion of the MLPInS score. The utilized scores further emphasize the beneficial role of the proposed Contacts scores which are indeed included in all 10 developed models, suggesting that such scores are particularly meaningful when used in the framework of the binding space concept. Along with the mean scores discussed above, the next analysis also includes the score ranges as computed for each ligand within each protein structure. Equations 30−39 (Table 2) highlight that the inclusion of the range values has an overall beneficial effect since it improves statistics in 8 cases out of 10 with 9 equations showing a r2 ≥ 0.65 (see Figure 4). The minimized values perform better in two cases, and the difference between the equations generated using minimized and nonminimized complexes are on average greater than those derived using best scores or mean values (see Table S4); this suggests that the reliability of the score ranges is also influenced by postdocking minimization. Regardless of the score ranges included, there is a major difference between these last equations (eqs 30−39) (Table 2) and those obtained by including the ranges as computed considering the best scores only (eqs 16−19, Table 1). In eqs 30−39 (Table 2), the score ranges show almost always a positive effect on BChE enzymatic activity, probably as they encode the substrate capacity to assume alternative (yet catalytically conducive) binding modes, thus minimizing the entropic penalty that a ligand has to pay during the recognition event. In contrast, the score ranges always play a negative role in eqs 16−19 (Table 1) probably as they express the geometrical constraints experienced by a ligand when confronted with protein conformational changes as simulated by the different BChE structures. In other words and considering that an extremity of these ranges always corresponds to the score of the best complex, one may infer that the range in eqs 16−19 (Table 1) describes how much a given ligand has to worsen its binding mode to match protein flexibility.

reliability of docking scores, as obtained by a reliable target structure (e.g., those computed from nonminimized 4TPK complexes, eq 11 (Table 1)), is not affected when combined with other less effective docking scores. This also suggests that the inclusion of these average values can be seen as a straightforward procedure to reach the highest achievable predictive power while avoiding systematic analyses of the docking results obtained for each protein structure. More importantly, Table 1 reveals that the inclusion of score ranges and sensitivity values (eqs 16−19) markedly enhances the resulting statistics especially concerning the nonminimized complexes. One may suppose that ranges and sensitivity values here encode for the often disregarded entropic factors, allowing a suitable parametrization of the ligand adaptability within the simulated binding cavities. Taken together, these initial results emphasize the potentialities of the binding space parameters and invite assessment in more complex situations as done in the following sections. 3.2.3. Correlations Using Docking Scores from Multiple Ligand Poses. As mentioned in the Introduction and schematized in Figure 1, the second and probably most innovative application of the binding space concept does not focus on the best poses but considers all the poses computed for a given ligand within a single target structure. In this way, eqs 20−29 (Table 2) are computed by considering the score averages in lieu of the best score values (as seen with eqs 4−13, Table 1) and afford evidence that the inclusion of such average values largely improves the statistics, with seven models showing an r2 value ≥0.65 (as also displayed in Figure 4). Notably, these results suggest that the reliability of the scores computed for the best pose are not only unaffected when combined with those of the other poses (as seen in Table 1), but, more importantly, the resulting score averages perform markedly better, thus suggesting that most of the “non-best” poses represent in fact alternative binding modes experienced by the ligand due to its mobility within the binding site and the parametrization of which beneficially contributes to the reliability of the resulting score averages. Again, one finds here that the nonminimized complexes provide better models than the minimized ones; this is the case in 4 cases out of 5, and the differences between them are 1698

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling

Figure 4 and Table S4 compare the coefficients of determination of all the hitherto reported docking-based equations and allows for some key considerations. On average, the inclusion of the mean scores instead of the single best scores affords the largest increase in r2. The beneficial effect of this very simple procedure highlights the potential of the here proposed binding space concept and emphasizes the crucial role of the often neglected alternative binding modes which contribute beneficially to the monitored enzymatic activity. Such a result, which clearly contrasts with the popular axiom “one ligand, one (best) pose, one (best) complex”, suggests that the inclusion of a larger number of poses can represent a convenient approach to account for the adaptability process which a given ligand experiences within a binding cavity. Clearly, such an approach cannot simulate protein flexibility, but it proved fruitful in simulating the ligand mobility, thus unveiling the pivotal role it can play in correlation analyses. The binding space descriptors computed here are obviously affected by the user-defined number of considered/generated poses. However, since this number is equal for all docked ligands, such an arbitrary choice equally influences all monitored parameters. Although to a lesser extent, range and sensitivity (in particular sensR) values also showed beneficial effects on the r2 values, the sum of which corresponds to that exerted by average values. These encouraging results suggest that the degree of mobility which a given ligand retains within the binding cavity (score ranges) as well as the ease with which a ligand explores such a mobility (score sensitivity) plays a key role in ligand recognition and contributes largely to predictive analyses. As already anticipated above, Figure 4 and Table S4 confirm that the inclusion of binding space descriptors progressively enlarges the difference between minimized and nonminimized complexes, with the latter increasing their statistical reliability. Nonetheless, the best equations (namely with r2 ≥ 0.7) were obtained equally from both minimized and nonminimized complexes thus suggesting that the difference between them is in fact not so significant, as also confirmed by the average values computed for each target proteins. The reported target-specific r2 averages confirm the general positive effect elicited by the binding space parameters and emphasize the marked differences between the simulated BChE structures. Notably, there is no relation between the r2 values obtained by using the single best scores and those afforded by including the binding space descriptors, thus suggesting that the protein features which influence the score reliability are different from those which affect the relevance of the binding space descriptors. 3.2.4. Correlations Using Docking Scores from Multiple Ligand Poses and Multiple Target Structures. The last application of the binding space concept synergistically combines the scores afforded from multiple poses as computed within multiple protein structures. This analysis involved the same binding space descriptors as previously utilized. In detail, the overall score averages are computed by averaging the previously derived target-specific mean scores, while the ranges are derived by considering the minimum and maximum score values as obtained in all simulated BChE structures. Accordingly, the four sensitivity ratios are calculated by using the derived overall ranges and are simultaneously considered when developing the new models. Equations 50 and 51 (Table 4) reveal that the inclusion of overall score averages allows the generation of models showing

The last analysis involved the inclusion of the sensitivity values which here are computed by using all four formulas as described under Methods. With a view to revealing the specific effect of the four types of sensitivity values, the equations were generated by including these sensitivity values separately. In this way, such an analysis was meant both to assess the contribution of the sensitivity values and to reveal which sensitivity ratio performs best. As a rule, Table 3 (i.e., eqs 40− 49 with sensR) and Table S3 (Supporting Information, eqs S1− S10 with sensS, eqs S11−S20 with sensu, and eqs S21−S30 with sensM) reveal that the inclusion of the sensitivity values induces marginal improvements in the resulting statistics as confirmed by the fact that in 11 out of 40 equations the included sensitivity values do not affect the model statistics, and in 7 models they even worsen the resulting models. Figure 4 and Table S4 summarize the statistical effects (as encoded by r2 values) exerted by the monitored binding space descriptors and reveal that, among the four sensitivity ratios, the values obtained by dividing the ranges by the number of rotors (Table 3, sensR) always yield the best equations and induce a statistical improvement comparable on average to that already gained by the inclusion of range values. The other three sensitivity ratios afforded very modest and similar statistical improvement as exemplified by the sensM values, the inclusion of which does not increase at all the resulting statistics. Even though the sensR values are independent of the actual ligand mobility as monitored by the generated poses, the remarkable results obtained by their inclusion are in agreement with those already reported for the property space analyses.10 This confirms that such kind of sensitivity ratio, albeit seemingly less accurate, represents an optimal balance between information richness, intuitiveness, and ease of calculation. The included sensitivity descriptors are in line with the score and range values found in previous models (Table 2) and underline the key role of both the PLANTS score functions and the scores encoding hydrophobic interactions (e.g., MLPINS and XscoreHS values). Notably, these equations further confirm the beneficial role of the here proposed Contacts scores which indeed are included in 33 out of the 40 reported models. Similarly to what was already seen for range values, the sensitivity descriptors almost always show a positive contribution to the BChE enzymatic activity; this fact confirms that the ligand’s capacity to adapt its binding modes to an enzyme cavity with limited conformational and/or rototranslational freedom contributes positively to ligand recognition. Interestingly, the computed rmsd averages, although less well adapted to calculate sensitivity ratios (as seen above), are included in more developed equations thus suggesting that they can play a role as such. In more details, the rmsd mean as computed after ligand superimposition appears in eqs S17 and S18 (Table S3) and reveals a deceasing effect on the predicted enzymatic activity; this influence seem reasonable as it codes for the conformational entropy paid by a ligand during the recognition process. Again, the mean mobility is included in eqs S13, S14, and S24 and shows a positive effect on the enzymatic activity thus confirming that the ligand capacity to assume more (catalytically conducive) binding modes contribute positively to the predicted enzymatic activity. Finally, the combination of different sensitivity ratios in the same equation does not result in better statistics apart from the minimized 3O9M complexes for which the simultaneous inclusion of sensR and sensU values leads to a modest increase in r2 (eq S31, Table S3). 1699

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling Table 4. Correlative Equations As Generated by Considering Multiple Poses in Multiple Protein Structuresa no.

PDB

50

all

51

a

scores

r2

q2

F

Min

equation, pK =

mean

Y

0.70

0.64

0.65

31.67

all

mean

N

0.71

0.64

0.65

32.30

52

all

Y

0.70

0.63

0.66

31.50

53

all

0.71

0.65

0.64

32.95

54

all

0.72

0.67

0.63

35.07

55

all

0.74

0.70

0.60

39.13

56

all

0.76

0.71

0.59

34.94

57

all

0.59 (±0.03)

24.53 (±3.87)

all

−2.35 (±0.27) + 0.36 HBA (±0.03)+ 0.55 (±0.14) MLPInS_sensU + 0.026 (±0.01)APBS_range + 1.57 (±0.16) RMSD − 0.44 (±0.03) Contacts + 0.010 (±0.02) Elect pKexp = 0.96 pKcalc − 0.029

0.77 (±0.03)

58

eq 50 + range eq 51 + range eq 52 + sens all eq 53 + sens all eq 55 with 6 variables eq 56 training set eq 56 test set

4.12 + 0.39 HBA − 9.09 ContactsH − 0.14 Torsions −3.88 MLPInS + 0.0067 Elect 2.80 − 108.26 ContactsW + 0.31 HBA - 5.01 MLPInS − 0.11 Torsions + 0.0087 Elect 5.23 + 0.30 HBA − 11.51 ContactsH − 0.58 MLPInS − 0.14 Torsions + 27.22 ContactsW_range 1.54 + 0.39 HBA − 0.42 MLPInS_range −0.41 Contacts + 0.082 ChemPLP_range − 3.25 ChemPLPH_range 2.67 − 8.98 ContactsH − 0.069 MLPInS − 0.23 Lipole − 0.37 PLP95_sensM + 12.28 PLP95H_sensM −1.64 + 0.28 HBA + 0.57 MLPInS_sensU + 0.37 PLP_sensU + 1.50 RMSD - 0.53 Contacts −2.55 + 0.33 HBA + 0.55 MLPInS_sensU + 0.026 APBS_range + 1.61 RMSD - 0.43 Contacts + 0.010 Elect

0.70 (±0.06)

0.60 (±0.07)

55.63 (±13.7)

N Y N N N N

SE

In all equations n = 75 and p < 0.0001 (Min: Y means minimized complexes, and N nonminimized complexes).

variables did not induce further improvements. The predictive power of eq 56 (Table 4) was then investigated by using the training and test sets as explained under Methods. Equations 57 and 58 (Table 4) summarize the obtained results for training (eq 57) and test (eq 58) sets. In detail, eq 57 (Table 4) is almost identical to eq 56 (Table 4) thus emphasizing the stability of the model which appears to be almost independent of the included ligands as also confirmed by very low standard deviations in both coefficients and statistics. More importantly, eq 58 (Table 4) reveals an encouraging predictive power as assessed by both statistics which are comparable to those of eq 57 (Table 4) and coefficients which indicate a good agreement between experimental and predicted pK values as shown by the slope being close to 45° (43.8°) and the very low intercept.

statistics comparable with those of the best equations as obtained when focusing on the target-specific score averages (i.e., eqs 20−29, Table 2). This finding, which emphasizes that the best performing score averages, are not affected when combined with less informative descriptors, is in line with the models obtained by averaging the best scores (eqs 14 and 15, Table 1). Generally, these results confirm that these overall averages do not allow marked enhancements in the models probably because they do not account for multiple binding modes as seen in eqs 20−29 (Table 2), but they allow the best achievable models to be generated even avoiding systematic analyses of the docking scores computed for each considered target structure. Equations 52−55 (Table 4) reveal that the inclusion of overall ranges and sensitivity values has a very limited effect, thus emphasizing that overall ranges and sensitivity values do not have in eqs 52−55 (Table 4) the beneficial effects they had in the previous analyses of the binding space parameters. A possible explanation of such an unsatisfactory result may be based on the different sign with which the range and sensitivity variables appear in eqs 16−19 (Table 1) compared to eqs 30− 49 (Tables 2 and 3) and which suggests a substantially different role of such parameters in the two applications discussed above. In other words, range and sensitivity parameters encode for protein flexibility in eqs 16−19 (Table 1) and for ligand mobility in eqs 30−49 (Tables 2 and 3). These two phenomena play notably negative (protein flexibility) and positive (ligand mobility) roles in correlations with BChE activity, and they are suitably parametrized here by the proposed binding space descriptors, as proven by their beneficial effect on the resulting models. Nevertheless, these two phenomena are not additive (as confirmed by the different coefficient’s sign), meaning that the computed overall descriptors are unable to take both effects into account, which explains their limited role in eqs 52−55 (Table 4). The best model (eq 55, Table 4) was further investigated to assess its expandability and the resulting predictive power. Since the large number of considered substrates would permit the inclusion of a greater number of descriptors, the corresponding model with six variables was generated revealing slightly better statistics (eq 56) (Table 4), while the inclusion of other

4. CONCLUSIONS The study describes the binding space, a novel concept which accounts for ligand mobility within the binding site and allows a significant enhancement of the reliability of the so computed docking-based descriptors. The most relevant considerations which may be derived from the reported docking analyses can be summarized as follows. First, averaging docking score proves a simple but genuinely informative method to account for multiple binding modes as computed by considering more protein structures or more poses within a single structure. The truly encouraging results obtained by introducing the mean scores emphasize the key role of ligand mobility as well as the capacity of docking simulations to take this into account despite the static manner in which this was done. Second, score ranges and sensitivity values proved successful in parametrizing the effects of mutual adaptability in molecular recognition. Notably, these parameters show a contrasting behavior when derived from the best poses computed from a set of protein structures or from a set of poses obtained for a single protein structure. In the first case, they play a negative role on recognition, presumably as they encode the entropic penalty in terms of the conformational pressure that different protein conformations exert on ligand flexibility. In the second case they exert a positive effect since they describe the capacity of a ligand to retain part of its flexibility within the binding pocket. Third, the proposed rmsd functions appear as useful descriptors to define 1700

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Journal of Chemical Information and Modeling



ligand mobility within the binding site from a purely geometric standpoint. In conclusion, the binding space concept proposed here allowed a remarkable enhancement in the reliability of the computed docking scores in all the tested cases. Even though a robust validation of this novel approach will require more extended and more various applications, the reported results offer an encouraging initial validation of its notable potentialities in enhancing the general reliability of docking simulations. Even though much effort in the development of novel docking strategies has been focused on algorithms and scoring functions able to optimize their capacity to reproduce the experimentally resolved poses, this study highlights the non-negligible role of accounting for the alternative binding modes which can contribute to the reliability of docking scores. Even though the two objectives (to reproduce the experimental pose and parametrize the effect of other binding modes) are clearly not antagonistic, it will be interesting to understand if the current scoring functions which were developed to maximize the first objective are also able to successfully parametrize the relevance of the alternative binding modes. In this regard, the Contacts scores proposed here appear to be particularly informative when used in the binding space applications. As an aside, the reported ligand-based equations afford a further validation of the relevance of the property space concept which allows the development of robust equations and becomes a very interesting resource especially when structurebased studies are not feasible.



Article

AUTHOR INFORMATION

Corresponding Author

*Phone: +390250319349. Fax: +390250319359. E-mail: Giulio. [email protected]. ORCID

Giulio Vistoli: 0000-0002-3939-5172 Notes

The authors declare no competing financial interest.



ABBREVIATIONS BChE = butyrylcholinesterase; OPs = organophosphorous poisons; MLP = molecular lipophilicity potential; MLPInS = molecular lipophilicity potential interaction score; TA5 = ethyl hydrogen ethylamido phosphate; VIF = variance inflation factor



REFERENCES

(1) de Ruyck, J.; Brysbaert, G.; Blossey, R.; Lensink, M. F. Molecular Docking as a Popular Tool in Drug Design, an in Silico Travel. Adv. Appl. Bioinform Chem. 2016, 9, 1−11. (2) Ferreira, L. G.; Dos Santos, R. N.; Oliva, G.; Andricopulo, A. D. Molecular Docking and Structure-Based Drug Design Strategies. Molecules 2015, 20, 13384−421. (3) Yuriev, E.; Holien, J.; Ramsland, P. A. Improvements, Trends, and New Ideas in Molecular Docking: 2012−2013 in Review. J. Mol. Recognit. 2015, 28, 581−604. (4) Antunes, D. A.; Devaurs, D.; Kavraki, L. E. Understanding the Challenges of Protein Flexibility in Drug Design. Expert Opin. Drug Discovery 2015, 10, 1301−1313. (5) Wong, C. F. Flexible Receptor Docking for Drug Discovery. Expert Opin. Drug Discovery 2015, 10, 1189−200. (6) Fenwick, R. B.; Esteban-Martín, S.; Salvatella, X. Understanding Biomolecular Motion, Recognition, and Allostery by Use of Conformational Ensembles. Eur. Biophys. J. 2011, 40, 1339−55. (7) Ballante, F.; Marshall, G. R. An Automated Strategy for BindingPose Selection and Docking Assessment in Structure-Based Drug Design. J. Chem. Inf. Model. 2016, 56, 54−72. (8) Anselmi, M.; Pisabarro, M. T. Exploring Multiple Binding Modes Using Confined Replica Exchange Molecular Dynamics. J. Chem. Theory Comput. 2015, 11, 3906−18. (9) Vistoli, G.; Pedretti, A.; Testa, B. Chemodiversity and Molecular Plasticity: Recognition Processes as Explored by Property Spaces. Future Med. Chem. 2011, 3, 995−1010. (10) Vistoli, G.; Pedretti, A.; Testa, B. Assessing Drug-likeness–What Are We Missing? Drug Discovery Today 2008, 13, 285−94. (11) Vistoli, G.; Pedretti, A.; Testa, B. Partition Coefficient and Molecular Flexibility: the Concept of Lipophilicity Space. Chem. Biodiversity 2009, 6, 1152−69. (12) Lockridge, O. Review of Human Butyrylcholinesterase Structure, Function, Genetic Variants, History of Use in the Clinic, and Potential Therapeutic Uses. Pharmacol. Ther. 2015, 148, 34−46. (13) Testa, B.; Krämer, S. D. The Biochemistry of Drug metabolism– an Introduction: Part 3. Reactions of Hydrolysis and their Enzymes. Chem. Biodiversity 2007, 4, 2031−122. (14) Masson, P.; Froment, M. T.; Gillon, E.; Nachon, F.; Lockridge, O.; Schopfer, L. M. Hydrolysis of Oxo- and Thio-Esters by Human Butyrylcholinesterase. Biochim. Biophys. Acta, Proteins Proteomics 2007, 1774, 16−34. (15) Masson, P.; Lockridge, O. Butyrylcholinesterase for Protection from Organophosphorus Poisons: Catalytic Complexities and Hysteretic Behavior. Arch. Biochem. Biophys. 2010, 494, 107−20. (16) Fukami, T.; Yokoi, T. The Emerging Role of Human Esterases. Drug Metab. Pharmacokinet. 2012, 27, 466−77. (17) Khan, M. T. Molecular Interactions of Cholinesterases Inhibitors Using in Silico Methods: Current Status and Future Prospects. New Biotechnol. 2009, 25, 331−46.

ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00121. Table S1, all considered substrates; Table S2, list of the utilized ligand-based and docking based descriptors; Table S3, less performing sensitivity-based models (eqs S1−S41); Table S4, here obtained r2 values, Table S5, models generated for the calibration study; Figure S1 (A−C), effect of the number of considered poses on the statistics of the obtained models, on the trends of range and rmsd values (PDF) Table S6, data based on the best poses utilized to develop eqs 1−19 (XLSX) Table S7, data based on the multiple poses utilized to develop eqs 19−59 plus eqs S1−S31 (XLSX) Table S8, docking scores as computed by ReScore+ for the minimized complexes of the first ten substrates as generated for 4XII (as an example) and describing how binding space parameters can be easily calculated using the address functions in Excel (XLSX) PDB coordinates for the computational model of the putative complex between ketoprofen and BChE (PDB ID 4XII) as reported in Figure 3A (PDBPDB) PDB coordinates for the computational model of the putative complex between metronidazole benzoate ester and BChE (PDB ID 4XII) as reported in Figure 3B (PDB) PDB coordinates for computational model of the putative complex between ketoprofen and BChE (PDB ID 3O9M) as reported in Figure 3C (PDB) 1701

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702

Article

Journal of Chemical Information and Modeling (18) Zheng, F.; Zhan, C. G. Rational Design of an Enzyme Mutant for Anti-cocaine Therapeutics. J. Comput.-Aided Mol. Des. 2008, 22, 661−71. (19) Vistoli, G.; Pedretti, A.; Mazzolari, A.; Testa, B. In Silico Prediction of Human Carboxylesterase-1 (hCES1) Metabolism Combining Docking Analyses and MD simulations. Bioorg. Med. Chem. 2010, 18, 320−9. (20) Vistoli, G.; Pedretti, A.; Mazzolari, A.; Bolchi, C.; Testa, B. Influence of Ionization State on the Activation of Temocapril by hCES1: a Molecular-Dynamics Study. Chem. Biodiversity 2009, 6, 2092−100. (21) Vistoli, G.; Pedretti, A.; Mazzolari, A.; Testa, B. Homology Modeling and Metabolism Prediction of Human Carboxylesterase-2 Using Docking Analyses by GriDock: a Parallelized Tool Based on AutoDock 4.0. J. Comput.-Aided Mol. Des. 2010, 24, 771−87. (22) Testa, B.; Pedretti, A.; Vistoli, G. Reactions and Enzymes in the Metabolism of Drugs and Other Xenobiotics. Drug Discovery Today 2012, 17, 549−60. (23) Masson, P.; Nachon, F.; Lockridge, O. Structural Approach to the Aging of Phosphylated Cholinesterases. Chem.-Biol. Interact. 2010, 187, 157−62. (24) Carletti, E.; Aurbek, N.; Gillon, E.; Loiodice, M.; Nicolet, Y.; Fontecilla-Camps, J. C.; Masson, P.; Thiermann, H.; Nachon, F.; Worek, F. Structure-Activity Analysis of Aging and Reactivation of Human Butyrylcholinesterase Inhibited by Analogues of Tabun. Biochem. J. 2009, 421, 97−106. (25) Asojo, O. A.; Asojo, O. A.; Ngamelue, M. N.; Homma, K.; Lockridge, O. Cocrystallization Studies of Full-length Recombinant Butyrylcholinesterase (BChE) with Cocaine. Acta Crystallogr., Sect. F: Struct. Biol. Cryst. Commun. 2011, 67, 434−7. (26) Nachon, F.; Carletti, E.; Ronco, C.; Trovaslet, M.; Nicolet, Y.; Jean, L.; Renard, P. Y. Crystal Structures of Human Cholinesterases in Complex with Huprine W and Tacrine: Elements of Specificity for Anti-Alzheimer’s Drugs Targeting Acetyl- and Butyryl-cholinesterase. Biochem. J. 2013, 453, 393−9. (27) Brus, B.; Košak, U.; Turk, S.; Pišlar, A.; Coquelle, N.; Kos, J.; Stojan, J.; Colletier, J. P.; Gobec, S. Discovery, Biological Evaluation, and Crystal Structure of a Novel Nanomolar Selective Butyrylcholinesterase Inhibitor. J. Med. Chem. 2014, 57, 8167−79. (28) Knez, D.; Brus, B.; Coquelle, N.; Sosič, I.; Šink, R.; Brazzolotto, X.; Mravljak, J.; Colletier, J. P.; Gobec, S. Structure-based Development of Nitroxoline Derivatives as Potential Multifunctional AntiAlzheimer Agents. Bioorg. Med. Chem. 2015, 23, 4442−52. (29) Pedretti, A.; Villa, L.; Vistoli, G. VEGA: a Versatile Program to Convert, Handle and Visualize Molecular Structure on WindowsBased PCs. J. Mol. Graphics Modell. 2002, 21, 47−9. (30) Korb, O.; Stützle, T.; Exner, T. E. Empirical Scoring Functions for Advanced Protein-Ligand Docking with PLANTS. J. Chem. Inf. Model. 2009, 49, 84−96. (31) Pedretti, A.; Granito, C.; Mazzolari, A.; Vistoli, G. Structural Effects of Some Relevant Missense Mutations on the MECP2-DNA Binding: A MD Study Analyzed by Rescore+, a Versatile Rescoring Tool of the VEGA ZZ Program. Mol. Inf. 2016, 35, 424−33. (32) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781− 802. (33) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphics Modell. 2002, 20, 269−76. (34) Qiao, Y.; Han, K.; Zhan, C. G. Fundamental Reaction Pathway and Free Energy Profile for Butyrylcholinesterase-Catalyzed Hydrolysis of Heroin. Biochemistry 2013, 52, 6467−79. (35) Yao, Y.; Liu, J.; Zhan, C. G. Why Does the G117H Mutation Considerably Improve the Activity of Human Butyrylcholinesterase Against Sarin? Insights from Quantum Mechanical/Molecular Mechanical Free Energy Calculations. Biochemistry 2012, 51, 8980−92.

1702

DOI: 10.1021/acs.jcim.7b00121 J. Chem. Inf. Model. 2017, 57, 1691−1702