ADMET Evaluation in Drug Discovery. 11. PharmacoKinetics

May 6, 2012 - ... Knowledge Base (PKKB) to compile comprehensive information about ADMET properties into a single electronic repository. We incorporat...
0 downloads 0 Views
Subscriber access provided by Penn State | University Libraries

Article

ADMET Evaluation in Drug Discovery. 11. PharmacoKinetics Knowledge Base (PKKB) - a comprehensive database of pharmacokinetic and toxic properties for drugs Dongyue Cao, Junmei Wang, Rui Zhou, Youyong Li, Huidong Yu, and Tingjun Hou J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/ci300112j • Publication Date (Web): 06 May 2012 Downloaded from http://pubs.acs.org on May 7, 2012

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

1

ADMET Evaluation in Drug Discovery. 11.

2

PharmacoKinetics Knowledge Base (PKKB) - a

3

comprehensive database of pharmacokinetic and toxic

4

properties for drugs

5 6

Dongyue Caoa†, Junmei Wangc†, Rui Zhoua, Youyong Lia, Huidong Yua, Tingjun Houa,b*

7 8 9 10 11

a

12

Institute of Functional Nano & Soft Materials (FUNSOM) and Jiangsu Key

13

Laboratory for Carbon-Based Functional Materials & Devices, Soochow University,

14

Suzhou, Jiangsu 215123, China b

15

c

16

College of Pharmaceutical Science, Soochow University, Suzhou, Jiangsu 215123, China

Department of Biochemistry, The University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390

17 18 19 20



21

Corresponding author:

Co-first authors

22

Tingjun Hou

23

E-mail: [email protected]

24

Post address: Institute of Functional Nano & Soft Materials (FUNSOM), Soochow

25

University, Suzhou 215123, P. R. China.

or

[email protected]

26 27 28 29 30

Keywords: PKKB; ADMET; ADME; Toxicity; Database; CADD

1

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 19

Abstract

31 32 33

Good and extensive experimental ADMET (absorption, distribution, metabolism,

34

excretion and toxicity) data is critical for developing reliable in silico ADMET models.

35

Here we develop PharmacoKinetics Knowledge Base (PKKB) to compile

36

comprehensive information about ADMET properties into a single electronic

37

repository. We incorporate more than 10000 experimental ADMET measurements of

38

1685 drugs into PKKB. The ADMET properties in PKKB include octanol/water

39

partition coefficient, solubility, dissociation constant, intestinal absorption, Caco-2

40

permeability,

41

partitioning ratio, volume of distribution, metabolism, half-life, excretion, urinary

42

excretion, clearance, toxicity, half lethal dose in rat or mouse, etc. PKKB provides the

43

most extensive collection of freely available data for ADMET properties up to date.

44

All these ADMET properties, as well as the pharmacological information and the

45

calculated physiochemical properties are integrated into a web-based information

46

system. Ten separated datasets for octanol/water partition coefficient, solubility,

47

blood-brain partitioning, intestinal absorption, Caco-2 permeability, human oral

48

bioavailability, and P-glycoprotein inhibitors, have been provided for free download

49

and can be used directly for ADMET modeling. PKKB is available online at

50

http://cadd.suda.edu.cn/admet.

human

bioavailability,

plasma

protein

51 52 53 54 55 56 57 58 59 2

ACS Paragon Plus Environment

binding,

blood-plasma

Page 3 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

60

Introduction

61

Drug discovery and development is a time-consuming and expensive process. It was

62

estimated that 40~60% of new chemical entity (NCE) failures can be attributed to

63

poor ADMET profiles.1, 2 ADMET properties can be predicted from the chemical

64

structures, so that huge number of compounds can be evaluated prior to be

65

synthesized and assayed.3-5 Theoretical predictions of ADMET properties have been

66

proved to be efficient in recent years.3-7 The lack of enough high quality experimental

67

data for training reliable models has been the major hurdle to model ADMET

68

properties.6, 8, 9 When the sample size used in training is limited, the in silico models

69

cannot give robust and accurate predictions, especially for the ADMET properties

70

involving complex processes, such as bioavailability, metabolism, toxicity, etc.

71

Traditionally, the available experimental datasets for ADMET modeling in the public

72

domain are often limited in quantity and quality. This is particularly true for in vivo

73

properties obtained directly from human, where data is typically only available for

74

compounds in clinic development.8 Encouragingly, the available large datasets are

75

expanding in the recent years. For example, three extensive data sets for intestinal

76

absorption, oral bioavailability in human and P-gp inhibitors were reported by our

77

group.10-13Nevertheless, further developments on the availability of ADMET data for

78

the public use are still necessary.

6

79

With more available ADMET data, it will be helpful to integrate all these data of a

80

variety of ADMET properties from different sources into a single information system.

81

PK/DB reported by Moda and the coworker is one information system providing the

82

service.14 PK/DB incorporates 1389 compounds and 4141 pharmacokinetic

83

measurements for eight ADME properties. And the data in PK/DB were directly taken

84

from the reported publicly available ADME datasets without careful curation. For

85

instance, in PK/DB, two core datasets, the intestinal absorption dataset with 687

86

molecules and the oral bioavailability dataset with 660 molecules, were directly taken

87

from the datasets reported by us.11,

88

information.

12

Thus PK/DB only provides a limited data

3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

89

Here, we develop PKKB (PharmacoKinetics Knowledge Base) to house 2-D and

90

3-D chemical structures, pharmacological information, experimental or calculated

91

physiochemical properties, and particularly high quality ADMET data of drugs.

92

PKKB incorporates the most extensive collections of ADMET data in the public

93

domain, which is much larger than PK/DB. The total number of experimental data in

94

PKKB is more than 10000, in comparison with 4000 in PK/DB. Most data in PKKB

95

were collected by us to develop the ADMET prediction models. During the ADMET

96

modeling process, the experimental data were carefully curated by us.10-13, 15-20 The

97

datasets in PKKB developed by us were widely used by a lot of well-known

98

information systems, such as Drugbank,21 KnowItAll22 and PK/DB14. The ADMET

99

data developed by us were used by a lot of scientists from pharmaceuticals and

100

academics. The extensive feedbacks from the users are very helpful for the

101

improvement of the data in PKKB. We frequently update the data in PKKB, which is

102

important to improve the quality of data. And we obtain the reliable data in PKKB. As

103

a publicly available online database, PKKB will be continuously maintained and

104

updated.

105 106

Methodology

107

The content of database

108

PKKB is hosted at http://cadd.suda.edu.cn/admet. The ADMET data currently in

109

PKKB are from 1685 drugs. All the FDA-approved small-molecule drugs found in

110

DrugBank have been collected into PKKB.21 We categorize the data fields for each

111

molecule into four groups: the general information, the pharmacological information,

112

the physicochemical properties and the ADMET properties (Table 1). The general

113

information for each molecule includes molecular name, synonyms, ACS number and

114

DrugBank ID. The pharmacological information for each molecule includes the status,

115

the administration route and the pharmacological effect. The physiochemical

116

properties for each molecule include the experimental octanol/water partition

117

coefficient (logP), the experimental aqueous solubility (logS), the experimental 4

ACS Paragon Plus Environment

Page 4 of 19

Page 5 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

118

dissociation constant (pKa), the calculated octanol/water partition coefficient (logP),

119

the calculated octanol/water distribution coefficient at pH=7 (logD), the calculated

120

aqueous solubility, the number of hydrogen bond donors, the number of hydrogen

121

bond acceptors, the number of rotatable bonds and the topological polar surface area

122

(TPSA). We performed all the calculations with ACD/Labs (version 12.0). The

123

ADMET properties for each molecule in PKKB are categorized into five parts:

124

absorption, distribution, metabolism, excretion and toxicity. The properties associated

125

with absorption include the absolute value of intestinal absorption, the description of

126

intestinal absorption, Caco-2 permeability and bioavailability in human. The

127

properties associated with distribution include protein binding, volume of distribution

128

(VD) and blood/plasma partitioning ratio (D-blood). The properties associated with

129

metabolism include general metabolism information and half-time; the properties

130

associated with distribution include excretion route, urinary excretion and clearance.

131

The properties associated with toxicity include general toxicity information, LD50 in

132

rat and LD50 in mouse. The distributions for ten ADMET properties are shown in

133

Figure 1.

134

We also release eleven ADMET datasets reported by us in PKKB. These ADME

135

datasets include three solubility datasets of 1290, 1708 and 1210 molecules,

136

respectively,16, 23, 24 a Caco-2 permeability dataset of 100 molecules,20 a blood-brain

137

partitioning dataset of 109 molecules,18 a P-gp inhibitor dataset of 1302 molecules,10 a

138

intestinal absorption dataset of 647 molecules,11, 15 a bioavailability dataset of 1013

139

molecules,12, 13 , a hERG blocker dataset 806 molecules,25 a combined dataset of 470

140

compounds with both intestinal absorption data and oral bioavailability data, and a

141

combined dataset of 69 compounds with both intestinal absorption data and Caco-2

142

permeability data.11-13,

143

directly used for ADMET modeling. In the near future, more datasets will be

144

continuously added to the dataset collections in PKKB. These datasets are important

145

for those who are interested in benchmarking the results of experiments, validating

146

the accuracy of existing ADMET predictive models, and building new predictive

147

models.

20

All these datasets have been post-processed and can be

5

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

148

It must be noted that the purpose of PKKB is quite different from those of

149

CheMBL,26 ChemSpider27 and DrugBank21. ChemSpide is a free chemical structure

150

database with more than 25 million molecules, and CheMBL is a free Chemical

151

database of bioactive drug-like small molecules. Drugbank is a popular information

152

system for drugs, and was primarily developed to provide chemical structures,

153

pharmacological data, and drug targets for drugs. ChemSpider does not afford

154

valuable information about ADMET. Some ADMET data are included in DrugBank

155

and ChEMBL, but they are quite limited. In comparison, PKKB provides the

156

comprehensive data for ADMET modeling.

157 158

Implementation

159

We developed an integrated data structure and a variety of querying functions to

160

allow easy and efficient retrieval of ADMET data (Figure 2). PKKB is installed on

161

Windows server’s workstations. MySQL5.1.46 is used as the relational database

162

management system (RDBMS). Apache Tomcat 6.0.26 server is used as the web

163

server platform. Meanwhile, we use J2EE (Spring3.0.5+Hibernate3.6.3), jQuery and

164

DHTML as the web interface. In order to construct user-friendly searching and

165

retrieval systems, PKKB provides interactive web interfaces based on the graphic

166

structure editor MarvinSketch and the substructure matching algorithm accomplished

167

in OpenBabel2.2.3.

168 169

Database access and database query

170

All querying functions in PKKB are available for everyone with internet access.

171

However, registration is highly encouraged for all users because the registered users

172

can download the search results using the download hyperlinks that the non-registered

173

users cannot use. Downloadable results are exported into a SDF database for

174

maximum flexibility of use in other applications. Everyone can download the ten

175

separated ADMET datasets.

176

PKKB provides registered users with an integrated searching interface for both 6

ACS Paragon Plus Environment

Page 6 of 19

Page 7 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

177

structure and text search (Figure 3a). The results from such searches lead the users to

178

a “Molecule list” view (Figure 3b), which shows the basic information about each

179

molecule, including 2-D structure, name, molecular weight, DrugBank ID and ACS

180

number. Users can click the hyperlink of a particular molecule to get more detailed

181

information: 2D and 3-D structures, molecular properties and ADMET properties,

182

which are displayed on its respective information webpage (Figures 3c and 3d).

183

PKKB is incorporated with a web-based query tool supported by MarvinSketch and

184

Openbabel2.2.3. The molecular drawing interface, MarvinSketch, allows users to

185

quickly draw molecules through some basic functions by the GUI and advanced

186

functionalities, including sprout drawing, customizable shortcuts, abbreviated groups,

187

default and user defined templates and context sensitive popup menus. SMARTS rules

188

allow users to define any specific or generic queries. Structural searches contain exact,

189

substructure and similarity search supported by Openbabel2.2.3. The similarity

190

between the query and each molecule was measured by Tanimoto coefficient based on

191

the FP2 fingerprint. User can also take a structural search by inputting a molecule

192

with different molecular formats supported by MarvinSketch.

193

PKKB is developed with rapid text searching functions for molecular name,

194

DrugBank ID and ACS number. Moreover, for several molecular properties, including

195

molecular weight, predicted logD, experimental logP, predicted logP, predicted

196

solubility, number of hydrogen-bond acceptor, number of hydrogen-bond donors,

197

number of rotatable bonds, intestinal absorption and bioavailability, numerical

198

interval of these attribute values can be searched to find the target molecules.

199 200

Database management

201

The administration user of PKKB can activate the database management system.

202

The database management includes three management interfaces: User Management,

203

Role Management and Compound Management. User Management interface is used

204

to manage registered user accounts, such as organizing the user's name, email, phone

205

number, etc. Role Management interface is used to setup user’s permissions.

206

Compound Management interface is used to add or remove compound in PKKB 7

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 19

207

(Figure 4a). In Compound Management interface, administrator can edit the available

208

properties of each molecule when it is necessary (Figure 4b); furthermore, the

209

structure of a compound can be edited by a molecular drawing interface supported by

210

MarvinSketch (Figure 4b). In Compound Management interface, administrators can

211

take a text search to find a specific compound to be edited (Figure 4a).

212

functions are helpful for the continuous maintenance of PKKB.

These

213 214

Conclusions and future development

215

In summary, we develop PKKB database, which is a unique knowledge environment

216

for ADMET properties. PKKB provides structures, pharmacological information,

217

important experimental or predicted physiochemical properties, and experimental

218

ADMET data for 1685 drugs.,PKKB integrates both predicted and experimental

219

information into a single and public resource. We expect that the rich content in

220

PKKB will facilitate the researchers to develop more reliable ADMET prediction

221

models in the near future. With the extensive data in PKKB, it is plausible to develop

222

more complicated models for more “complex” ADMET properties, such as

223

bioavailability, clearance, metabolism, etc. For example, from the combined dataset of

224

470 compounds with both intestinal absorption data and oral bioavailability data, we

225

may develop rules or models for the first-pass metabolism effect.

226

We are continuing to improve PKKB in the following directions. First of all, the

227

quality of the collected data needs to be improved further. The reliable data can

228

usually be generated under a single experimental protocol or even a single

229

experimental assay. Unfortunately, the data in PKKB were put together from various

230

sources, and they are subject to variability due to experimental conditions and

231

inter-laboratory errors. We will check the reliability of the data from different sources

232

carefully and guarantee the reliability of the data in PKKB as best as we can. Second,

233

to better characterize each entry, a complete record of that entry is needed, i.e., all the

234

fields for each molecule need to be filled in. Although PKKB already affords

235

extensive data for many ADMET properties, some data fields are far from 8

ACS Paragon Plus Environment

Page 9 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

236

“completeness”. The empty data field usually indicates that the data has not been

237

measured or reported. However, in many cases, the data probably exists in somewhere

238

else, but our ADMET team has not found it and validated it. Moreover, the coverage

239

of the ADMET properties in PKKB is still limited. In the new version of PKKB, some

240

important ADMET properties will be added, such as genotoxicity, aquatic toxicity,

241

eye irritation, skin irritation, P450 inhibitors and substrates, etc. In addition to the

242

existence of missing data, PKKB is also missing some drug-like molecules. Currently

243

more than 1000 drug candidates in clinical trials are still on the PKKB ‘to do’ list and

244

will be added in a short period. Third, we will replace Openbabel2.2.3 by MORT

245

(Molecular Objects and Relevant Templates) developed in our group soon. MORT, as

246

the foundation library of gleap in AmberTools,28 has already been released. The Java

247

MORT under development will be used by PKKB. Finally, we plan to afford on-line

248

prediction models for several important ADMET properties in the next version of

249

PKKB.

250 251

Acknowledgement

252

The project is supported by the National Science Foundation of China (Grant No.

253

20973121), the National Basic Research Program of China (973 program,

254

2012CB932600 to T. Hou), the NIH (R21GM097617 to J. Wang) and the Priority

255

Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

256 257

References

258 259 260 261 262 263 264 265 266 267

1.

Kennedy, T. Managing the drug discovery/development interface. Drug Discovery Today 1997, 2, 436-444.

2. Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nature Reviews Drug Discovery 2004, 3, 711-715. 3. Hou, T. J. In Silico Predictions of ADME/T Properties: Progress and Challenges. Combinatorial Chemistry & High Throughput Screening 2011, 14, 306-306. 4. Hou, T. J.; Wang, J. M.; Zhang, W.; Wang, W.; Xu, X. Recent advances in computational prediction of drug absorption and permeability in drug discovery. Current Medicinal Chemistry 2006, 13, 2653-2667. 5. Zhu, J. Y.; Wang, J. M.; Yu, H. D.; Li, Y. Y.; Hou, T. J. Recent Developments of In Silico Predictions of 9

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311

Oral Bioavailability. Combinatorial Chemistry & High Throughput Screening 2011, 14, 362-374. 6. Hou, T.; Wang, J. Structure - ADME relationship: still a long way to go? Expert Opinion on Drug Metabolism & Toxicology 2008, 4, 759-770. 7. van de Waterbeemd, H.; Gifford, E. ADMET in silico modelling: Towards prediction paradise? Nature Reviews Drug Discovery 2003, 2, 192-204. 8. Gola, J.; Obrezanova, O.; Champness, E.; Segall, M. ADMET property prediction: The state of the art and current challenges. Qsar & Combinatorial Science 2006, 25, 1172-1180. 9. Dearden, J. C. In silico prediction of ADMET properties How far have we come? Expert Opinion on Drug Metabolism & Toxicology 2007, 3, 635-639. 10. Chen, L.; Li, Y. Y.; Zhao, Q.; Peng, H.; Hou, T. J. ADME Evaluation in Drug Discovery. 10. Predictions of P-Glycoprotein Inhibitors Using Recursive Partitioning and Naive Bayesian Classification Techniques. Molecular Pharmaceutics 2011, 8, 889-900. 11. Hou, T. J.; Wang, J. M.; Zhang, W.; Xu, X. J. ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification. Journal of Chemical Information and Modeling 2007b, 47, 208-218. 12. Hou, T. J.; Wang, J. M.; Zhang, W.; Xu, X. J. ADME evaluation in drug discovery. 6. Can oral bioavailability in humans be effectively predicted by simple molecular property-based rules? Journal of Chemical Information and Modeling 2007c, 47, 460-463. 13. Tian, S.; Li, Y. Y.; Wang, J. M.; Zhang, J.; Hou, T. J. ADME Evaluation in Drug Discovery. 9. Prediction of Oral Bioavailability in Humans Based on Molecular Properties and Structural Fingerprints. Molecular Pharmaceutics 2011, 8, 841-851. 14. Moda, T. L.; Torres, L. G.; Carrara, A. E.; Andricopulo, A. D. PK/DB: database for pharmacokinetic properties and predictive in silico ADME models. Bioinformatics 2008, 24, 2270-2271. 15. Hou, T. J.; Wang, J. M.; Li, Y. Y. ADME evaluation in drug discovery. 8. The prediction of human intestinal absorption by a support vector machine. Journal of Chemical Information and Modeling 2007a, 47, 2408-2415. 16. Hou, T. J.; Xia, K.; Zhang, W.; Xu, X. J. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. Journal of Chemical Information and Computer Sciences 2004a, 44, 266-275. 17. Hou, T. J.; Xu, X. J. ADME evaluation in drug discovery - 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs. Journal of Molecular Modeling 2002, 8, 337-349. 18. Hou, T. J.; Xu, X. J. ADME evaluation in drug discovery. 3. Modeling blood-brain barrier partitioning using simple molecular descriptors (vol 43, 2137, 2003). Journal of Chemical Information and Computer Sciences 2004b, 44, 766-770. 19. Hou, T. J.; Xu, X. J. ADME evaluation in drug discovery. 2. Prediction of partition coefficient by atom-additive approach based on atom-weighted solvent accessible surface areas (vol 43, pg 1058, 2003). Journal of Chemical Information and Computer Sciences 2004c, 44, 1516-1516. 20. Hou, T. J.; Zhang, W.; Xia, K.; Qiao, X. B.; Xu, X. J. ADME evaluation in drug discovery. 5. Correlation of Caco-2 permeation with simple molecular properties. Journal of Chemical Information and Computer Sciences 2004d, 44, 1585-1600. 21. Wishart, D. S.; Knox, C.; Guo, A. C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research 2006, 34, D668-D672. 10

ACS Paragon Plus Environment

Page 10 of 19

Page 11 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355

22. KnowItAll information system; Bio-Rad Laboratories, Inc. http://www.knowitall.com, 2012. 23. Wang, J. M.; Hou, T. J.; Xu, X. J. Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas. Journal of Chemical Information and Modeling 2009, 49, 571-581. 24. Wang, J. M.; Krudy, G.; Hou, T. J.; Zhang, W.; Holland, G.; Xu, X. J. Development of reliable aqueous solubility models and their application in druglike analysis. Journal of Chemical Information and Modeling 2007, 47, 1395-1404. 25. Wang, S.; Li, Y.; Wang, J.; Chen, L.; Zhang, L.; Yu, H.; Hou, T. ADMET Evaluation in Drug Discovery. 12. Development of Binary Classification Models for Prediction of hERG Potassium Channel Blockage. Molecular Pharmaceutics 2012, 9, 996-1010. 26. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B. ChEMBL: a large-scale bioactivity database for drug discovery. 2012, 40, D1100-D1107. 27. Pence, H. E.; Williams, A. ChemSpider: an online chemical information resource. Journal of Chemical Education 2010, 87, 1123-1124. 28. Zhang, W.; Hou, T. J.; Qiao, X. B.; Xu, X. J. Some basic data structures and algorithms for chemical generic programming. Journal of Chemical Information and Computer Sciences 2004, 44, 1571-1575.

11

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

356

Figure legends

357 358

Figure 1. The distributions of ten experimental physiochemical and ADMET

359

properties in PKKB.

360 361

Figure 2. The basic schema of PKKB. We mine, clean and organize the ADMET data

362

from the publications by manual curation.

363 364

Figure 3. (a). The integrated text and structure searching interface in PKKB, (b) the

365

searching results for a list of target molecules; (c) the detailed information of a

366

molecule with 2-D structure; (d) the detailed information of a molecule with 3-D

367

structure.

368 369

Figure 4. (a) The Compound Management interface and (b) the interface for editing

370

the available properties of a specific molecule.

371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 12

ACS Paragon Plus Environment

Page 12 of 19

Page 13 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

387

Table 1. The important data fields in PKKD and the corresponding number of

388

measures No.

Property

Measures

Physiochemical Properties 1

Molecular weight

1684

2

logP (experiment)

1019

3

logP (predicted, AB/logP v2.0)

1625

4

pka (experiment)

638

5

logD (pH=7, predicted)

1625

6

Solubility (experiment)

800

7

logS (predicted, ACD/Labs)(pH=7)

1614

8

logSw (predicted, AB/LogSw 2.0)

1625

9

Sw (mg/ml) (predicted, ACD/Labs)

1613

10

Sw (predicted)

1625

11

Number of hydrogen bond donors

1625

12

Number of hydrogen bond acceptors

1625

13

Number of rotatable bonds

1625

14

TPSA

1625

Pharmacology 15

Status

1372

16

Administration

501

17

Pharmacology

1543

Absorption 18

Intestinal absorption

679

19

Absorption (description)

699

20

Caco-2 permeability

64

21

Human bioavailability

992

Distribution 22

Plasma protein binding

1058

23

Volume of distribution (Vd)

646

24

Blood/plasma partitioning ratio (D-blood)

66

Metabolism 25

Metabolism

1111

26

Half-time

1116

Excretion 27

Excretion

855 13

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

28

Urinary excretion

281

29

Clearance

410

Toxicity 30

Description of toxicity

873

31

LD50 (rat)

219

32

LD50 (mouse)

243

389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423

14

ACS Paragon Plus Environment

Page 14 of 19

Page 15 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

424 425

Figure 1

15

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

426

427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454

Figure 2

16

ACS Paragon Plus Environment

Page 16 of 19

Page 17 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479

Figure 3

17

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

480

481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505

Figure 4

18

ACS Paragon Plus Environment

Page 18 of 19

Page 19 of 19

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

506

For Table of Contents Use Only

507 508

ADMET Evaluation in Drug Discovery. 11. PharmacoKinetics Knowledge Base

509

(PKKB) - a comprehensive database of pharmacokinetic and toxic properties for

510

drugs

511 512

Dongyue Caoa, Junmei Wang, Rui Zhou, Youyong Li, Huidong Yu, Tingjun Hou*

513

19

ACS Paragon Plus Environment