Chapter 14

Downloaded by CORNELL UNIV on October 27, 2016 | http://pubs.acs.org Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch014

Steps Toward a Virtual Rat: Predictive Absorption, Distribution, Metabolism, and Toxicity Models

Yufeng J. Tseng,*,1,2 Bo-Han Su,1 Ming-Tsung Hsu,3 and Olivia A. Lin2

1Department of Computer Science and Information Engineering, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, Taiwan 106
2Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Sec. 4, Roosevelt Road, Taipei, Taiwan 106
3Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, No. 1 Sec. 4, Roosevelt Road, Taipei, Taiwan 106
*E-mail: [email protected]

Predictive absorption, distribution, metabolism and toxicity models are promising tools to reduce the cost of preclinical safety screening in drug development processes. Traditionally, quantitative structure–activity relationship (QSAR)-based prediction models have a long-standing history of application for lead optimization on the drug development pipeline. With the advances in high-throughput screening techniques and public release of screening data, QSAR-based studies are no longer limited to a few analogs and lead optimization. This chapter focuses on the applications of predictive QSAR models in preclinical drug development. The key features of current QSAR practices, including molecular descriptors, machine learning methods, available databases, and the applications of various QSAR models of absorption, distribution, metabolism and toxicity studies, are reviewed and discussed.

Overview of Predictive Methods

Key and common descriptors are listed and defined below (Table 1).

© 2016 American Chemical Society Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.

Table 1. Descriptor sets used in the general in silico modeling analyses.

Descriptor set    Description
0D/1D             1D, 2D and pseudo-3D physicochemical properties and molecular features
2D                Molecular interaction field properties, 3D, but each represented as a single non-integer value
3D/4D             Conformational ensemble averaged distances between pairs of all atom-types composing a decorated nanotube complex in their reduced eigenvalue representation

Molecular Descriptors

Molecular descriptors are numerical values that characterize properties of molecules; they vary in the complexity of the encoded information and in computation time.

Physical Property Descriptors

The partition coefficient (P) is the ratio of the concentrations of a compound in two immiscible phases (octanol and water) at equilibrium. The logarithm of this ratio, LogP(o/w), is a measure of lipophilicity, which indicates a drug compound’s ability to move from an aqueous environment through the hydrophobic membrane bilayer.
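The definition of LogP(o/w) above can be illustrated with a minimal sketch. The function name and the example concentrations are hypothetical and serve only to show the arithmetic; the two concentrations must be given in the same unit.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """Return log10 of the octanol/water partition coefficient P.

    Both equilibrium concentrations must be in the same unit.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical compound: 2.0 mM in octanol and 0.02 mM in water at equilibrium,
# i.e. P = 100, so the compound is clearly lipophilic.
example_log_p = log_p(2.0, 0.02)
```

A positive value means the compound prefers the octanol phase; a negative value means it prefers water.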

Semi-Empirical Molecular Descriptors

Semi-empirical descriptors describe the electronic physicochemical properties of drug compounds; these properties include dipole moments, total SCF energy, electronic energy, heat of formation, highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies, and ionization potential.

2D and 2½D Molecular Descriptors

2D molecular descriptors characterize a molecule’s physical properties, including surface area, atom counts, bond counts, the Kier-Hall indices (1), which describe molecular connectivity, and the kappa indices (1). Other 2D descriptors encode molecular indices derived from the adjacency and distance matrices (2, 3), pharmacophore features, and partial charge information. A 2½D molecular descriptor is a 3D molecular property represented as a single numerical value. These descriptors are based on the conformations of a molecule and describe properties such as the conformational potential energy, molecular surfaces, volumes, shapes, and other related components.

3D Descriptors

3D descriptors are based on 3D representations of a molecule. They allow molecular interactions to be visualized intuitively in a 3D molecular field alongside the chemical structures, so the interactions between a molecule and a protein (e.g., a receptor) can be easily mapped. Commonly used 3D descriptors are CoMFA, GRIND, and VolSurf. CoMFA (Comparative Molecular Field Analysis) (4) determines the electrostatic and steric interaction energies between a probe and the molecule separately, and it is very sensitive to the alignment of molecules with respect to the protein/receptor. VolSurf (5) and GRIND (GRid-INdependent Descriptors) are alignment-independent descriptors. GRIND (6) descriptors use auto-correlograms and cross-correlograms to describe the distances between regions of an interaction field across the spatial extent of the molecule studied, as well as the distances between these regions and regions of the other relevant interaction fields of the compound. The VolSurf (7, 8) descriptors use 3D molecular interaction fields to evaluate 76 features. The compound is first placed in a grid. Two probes, one hydrophobic and one hydrophilic, are moved to each grid point, and the interaction energy between the probe and the compound is calculated at each point. Grid points within the same range of interaction energies are grouped into iso-contours, and the summed volume of these grid points is calculated. The interaction energies and volumes are combined as the VolSurf descriptors.

Another category of 3D descriptors differs from the lattice- or surface-based descriptors in that it treats ligand properties not at specific locations in space but as intrinsic 3D properties of the ligands themselves. Widely used descriptors in this category include the CoMMA (Comparative Molecular Moment Analysis) (9) and WHIM (Weighted Holistic Invariant Molecular descriptors) (10) descriptors. CoMMA is based on the moments of the shape and charge distribution of a molecule. A molecular moment is a set of vector values, usually involving the molecular mass or charge of a molecule, with components along the X, Y, and Z axes. In CoMMA, second-order moments are calculated from the molecular weight, center of mass, and principal inertial components and axes, and quadrupole moments and principal-axis moments are calculated after re-orientation to the principal inertial axes. WHIM analysis performs principal component analysis on the Cartesian coordinates of a molecule and evaluates space-invariant statistical indices derived from the scores of the projected atoms.
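The mass-based part of such moment descriptors can be sketched as follows. This is only an illustration under stated assumptions: it computes the center of mass and the inertia tensor of a set of point masses, whereas CoMMA as described above also uses charge-based (e.g., quadrupole) moments that are not shown. The function names and the two-atom example are hypothetical. The principal moments of inertia would be the eigenvalues of the returned tensor.

```python
def center_of_mass(masses, coords):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )


def inertia_tensor(masses, coords):
    """3x3 inertia tensor about the center of mass.

    I[i][j] = sum over atoms of m * (|r|^2 * delta_ij - r_i * r_j),
    where r is the atom position relative to the center of mass.
    """
    com = center_of_mass(masses, coords)
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        r = [c[k] - com[k] for k in range(3)]
        r2 = sum(component * component for component in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                tensor[i][j] += m * (r2 * delta - r[i] * r[j])
    return tensor


# Hypothetical diatomic: two unit masses on the x-axis, 2 length units apart.
masses = [1.0, 1.0]
coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tensor = inertia_tensor(masses, coords)
```

For this linear example the tensor is already diagonal: rotation about the bond axis (x) has zero moment, and the two perpendicular axes share the same moment.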

4D and Higher Dimensional Molecular Descriptors

Computational models built using higher-dimensional molecular descriptors classify compounds using conformational information obtained from molecular dynamics (MD) simulation. 4D-Fingerprints (4D-FP) were developed because traditionally too few molecular conformations are analyzed using 3D fingerprints. The size of the 4D descriptors varies depending on the number of atoms encoded for a molecule.

5D/6D Descriptors (11)

These descriptors add adaptation as the fifth dimension, which encodes the induced fit between the protein binding pocket and each individual ligand conformation. The sixth dimension allows different solvation models to be considered simultaneously (12).

Classification and Correlation Computational Methods

For predictive ADMET studies, it is crucial to have a stable and reliable model. Traditionally, predictive models rely on regression or correlation. The benefits of this type of modeling are that its concepts are widely accepted and that tools for building models are easily accessible. Regression or correlation methods are traditionally based on multivariate analysis and often work with dependent variables of continuous values; that is, the ADMET endpoints are non-categorical values, such as aqueous solubility, cell permeability coefficient, and hERG blockage (13, 14). Common methods include partial least squares regression (15), step-wise linear regression (16), and simple multiple regression (17). The fundamental idea of multivariate analysis is that one or more linear regression models relate a large number of independent variables (usually the molecular descriptors in ADMET prediction) to a few dependent variables (the Y values in the regression equation, i.e., the endpoints of ADMET measurements). The most commonly used method is partial least squares (PLS) regression, which projects the predictor variables and the observed variables into a new space. PLS has gained popularity because it greatly reduces the dimensionality of the data (the large number of molecular descriptors) and helps identify the variable sets most important for explaining the ADMET endpoint measurement.

Because a large quantity of experimental data has been obtained by high-throughput screening (HTS) and made publicly available, machine learning methods, such as recursive partitioning (RP), genetic algorithms (GA), genetic function approximation (GFA), support vector machines (SVM), artificial neural networks (ANNs), and the k-nearest neighbor algorithm (kNN), are now more widely used. Machine learning methods are useful for building classification models on categorical data, which suits HTS experimental data well: a threshold is commonly applied in primary screening to separate active from inactive assay results (in ADMET, active can mean good absorption, highly toxic, or metabolized). Machine learning methods have gained popularity in ADMET predictions in the last 8 to 10 years. A short summary of the machine learning methods is given below (Table 2).
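The workflow described above, thresholding an assay readout into active/inactive labels and then classifying a new compound from its descriptors, can be sketched with a k-nearest neighbor classifier. This is a minimal illustration, not a method taken from the cited studies: the descriptor vectors, assay values, and the threshold of 50 are all hypothetical.

```python
import math
from collections import Counter


def label_by_threshold(values, threshold):
    """Turn continuous assay readouts into categorical labels."""
    return ["active" if v >= threshold else "inactive" for v in values]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training compounds closest to `query`.

    Distance is Euclidean in descriptor space (math.dist, Python 3.8+).
    """
    ranked = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: two scaled descriptors per compound and a
# percent-inhibition readout from a primary screen.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
assay = [5, 8, 3, 72, 90, 65]
train_y = label_by_threshold(assay, threshold=50)

prediction = knn_predict(train_x, train_y, query=(0.85, 0.95), k=3)
```

In practice descriptors are usually scaled before distances are computed, since an unscaled descriptor with a large numeric range would dominate the vote.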

Table 2. A summary of machine learning methods used in ADMET predictions.

GA
Basic concept: The possible answers of the queried question are defined as a set of “Chromosomes”, and the variables of the question are regarded as “genes” in a chromosome. Crossover and mutation operators can be used to produce new sets of chromosomes. Chromosomes with high fitness evolve to the next generation, whereas low fitness score chromosomes are ignored (selection). Continuous mutation, crossover, and selection are iteratively performed until the termination criterion is met.
Key function: Mutation, Crossover, Selection
How the best model was determined: Model with the best fitness score. Fitness is evaluated by a scoring function. The higher the value, the better the fitness.
Example references: (18–28)

GFA
Basic concept: GFA is a multidimensional optimization algorithm using the process of GA to evolve a population of models. The generated models are evaluated by the lack-of-fit (LOF) score function to fit the training dataset. The LOF function can be used to penalize models with too many overfitting features.
Key function: LOF score
How the best model was determined: Model with the best fitness score. Fitness is controlled by the smoothing factor, a component of the LOF scoring function. Increased smoothing factor results in a decreased size of the model.
Example references: (19, 22–24, 29–32)

SVM
Basic concept: SVM performs separation of two classes of compounds by finding a set of hyperplanes with a maximum margin based on either the linear distance or linear distance on a projection in a high-dimension feature space (molecular descriptors) between the two groups.
Key function: Kernel function used in SVM
How the best model was determined: Model that best explains the dataset with known classification (active or inactive in this study).
Example references: (26, 27, 33–36)

ANN
Basic concept: The neural network is constructed from three layers: input layer, hidden layer, and output layer. Each layer contains one or more nodes, and pairs of nodes are interconnected between layers by weights. Simultaneously tuning these weights can minimize the prediction error on the training endpoints.
Key function: Network layer, weights for each layer
How the best model was determined: The network model with least network errors in training data.
Example references: (37–41)
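The GA loop summarized in Table 2 (selection, crossover, mutation, repeated until a termination criterion is met) can be sketched for descriptor selection. This is a toy illustration under stated assumptions: each chromosome is a bit list marking which descriptors are used, and the fitness function is hypothetical, rewarding a known set of "useful" descriptors and penalizing model size, loosely in the spirit of a size penalty such as the LOF score. A real application would score each chromosome by fitting and validating a model.

```python
import random


def fitness(chromosome, useful, penalty=0.1):
    """Toy score: +1 per selected useful descriptor, minus a size penalty."""
    hits = sum(1 for i, gene in enumerate(chromosome) if gene and i in useful)
    return hits - penalty * sum(chromosome)


def evolve(n_genes, useful, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit-list chromosomes; terminate after a fixed generation count."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        population.sort(key=lambda c: fitness(c, useful), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randint(1, n_genes - 1)       # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [                               # bit-flip mutation
                1 - g if rng.random() < mutation_rate else g for g in child
            ]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda c: fitness(c, useful))


# Hypothetical problem: 10 candidate descriptors, of which 1, 4 and 7 matter.
useful = {1, 4, 7}
best = evolve(10, useful)
```

Because the survivors are carried over unchanged, the best score in the population can never decrease from one generation to the next.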

Databases

In the last eight to ten years, large quantities of data have gradually been released and have become available for predictive ADMET. The UK-based ChEMBL (19, 42) and US-based PubChem (26, 27, 31, 36) databases are large repositories containing at least the basic data for ADMET, such as chemical structures and ADMET predictions. ChEMBL is a curated database containing binding, functional and ADMET data abstracted from the primary published literature. The current version (DB: ChEMBL_14) contains 9,003 targets, 1,213,239 distinct compounds, and 10,129,256 activities from a total of 46,133 publications. Unlike the literature-based ChEMBL, the compounds in PubChem are derived from high-throughput screening assays and are maintained under the NIH’s Molecular Libraries Roadmap Initiative. PubChem currently contains nearly 33 million unique structures, more than 621,000 bioassays (from nearly 4800 NIH Molecular Libraries assays), 45,000 scientific articles, and several hundred other resources, such as pharmaceutical companies and individual research groups.

PubChem actually comprises three databases: PubChem Compound, PubChem Substance, and PubChem BioAssay. The PubChem Compound database contains unique non-redundant chemical structures, whereas PubChem Substance contains specific chemicals from different vendors or specific chemicals used in specific bioassays. PubChem Compound and Substance contain many chemical structures that are not tested in PubChem BioAssay.

BindingDB and DrugBank (42–44) are two more large databases that specifically collect detailed drug/chemical compound data with comprehensive drug targets that are potentially related to ADMET properties. BindingDB (45) even contains curated quantitative data, such as Ki, Kd, and IC50 measurements collected from the literature, together with detailed assay conditions, such as pH, temperature, and buffer composition.

All of the databases described above offer their own user interfaces to browse, query, download and analyze data, tailored to different scientific focuses. One of the benefits of predictive ADMET is the ability to predict ADMET properties before a compound is tested experimentally. In addition to in-house compounds, ZINC (46), the MMsINC database (47), and ChemSpider (48) are free databases offering commercially available compounds for virtual screening and chemoinformatics applications. Compounds in these databases contain not just 2D structures but also 3D chemical structure information. Most of the databases are cross-referenced to each other and are listed in the NIH PubChem database.
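Programmatic access to such repositories can be sketched as follows. The example assumes PubChem's PUG REST URL pattern for looking up computed properties by compound name; the base URL, path layout, and property names reflect that service as commonly documented and should be checked against the current PubChem documentation before use. The helper function name is hypothetical, and the sketch only builds the request URL; it does not contact the server.

```python
from urllib.parse import quote

# Assumed base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed property names: descriptors relevant to simple absorption filters.
DEFAULT_PROPERTIES = (
    "MolecularWeight",
    "XLogP",
    "HBondDonorCount",
    "HBondAcceptorCount",
)


def property_url(name, properties=DEFAULT_PROPERTIES):
    """Build a PUG REST URL requesting properties for a compound name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


url = property_url("aspirin")
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)` and the response parsed with `json.load`, subject to the service's usage policies.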

Absorption

The vast majority of drug molecules are administered orally; from the marketing and patient compliance perspectives, oral administration is the easiest route to manage. However, oral administration is the least direct route of drug administration: drug molecules face degradation by various enzymes and stomach acid before being absorbed to act on their intended targets. For this reason, the absorption of oral drugs is slow, and the final bioavailability is unpredictable. The main factors that govern the extent of drug absorption are solubility and membrane permeability. Drugs that are more soluble and have higher membrane permeability are likely to have more desirable overall bioavailability. In the following sections, the currently preferred in silico solubility and permeability prediction models are highlighted and discussed.

Solubility - The Challenges, the Needs, and Current Status

Aqueous solubility and membrane permeability are two factors that significantly affect the oral bioavailability of drugs (49–51). Aqueous solubility determines the compound dissolution rate and also the maximum concentration reached in the gastrointestinal fluid. In the literature, a reported solubility measurement may refer to the intrinsic solubility, thermodynamic solubility, apparent solubility, or kinetic solubility. The most widely used simple structural feature to “filter” or “predict” oral bioavailability is Lipinski’s “Rule of five”. Lipinski’s “Rule of five” (52) concludes that a drug candidate having a molecular weight smaller than 500, a calculated logP (ClogP) smaller than 5.0, and fewer than 5 hydrogen bond donors and 10 hydrogen bond acceptors (53) is more likely to be an orally active drug in humans. Thus, basic predictive absorption generally applies the “Rule of five” as a filter to screen for potentially orally active compounds, especially in large HTS or virtual screening datasets.

There are reasons for the popularity of this simple set of filters for oral availability predictions over directly applying predictive aqueous solubility models. Most in silico models use the intrinsic solubility S (or the logarithm of solubility, logS, for convenience) to develop the predictive models. Despite the fast development of different HTS bioassays, the measurement of intrinsic solubility is low-throughput, which creates a need for predictive models for the solubility of compounds. However, the biggest challenge in creating a reliable predictive model is the poor reproducibility of aqueous solubility measurements. Jorgensen and Duffy showed that the average uncertainty in measured aqueous solubility values is as high as 0.6 log units (54). To overcome this issue, the CheqSol approach developed by Llinas et al. (55) offers a highly reproducible aqueous solubility measurement by a rapid thermodynamic equilibrium potentiometric technique. In 2008, a “Solubility Challenge” (56) was held using the CheqSol approach, with which the intrinsic solubility of a diverse set of 100 drug-like molecules was accurately measured at 25 °C and an ionic strength of 0.15 M. Researchers were challenged to predict the intrinsic solubility of thirty-two other drug-like compounds whose measured values had not been published.

In the concluding publication, “Findings of the Challenge To Predict Aqueous Solubility” (57), the major findings were: 1) a low percentage of correct predictions (0.0% to 21.9%) within ±10% of the measured value of S for the full set of 32 compounds; 2) a range of approximately 0.000 to 0.642 in the predicted versus measured R2 for S for the 28 compounds (32 compounds excluding the four “too soluble to measure” compounds); 3) a 15.6% to 62.5% range in the percentage of correct predictions within ±0.5 logS of the measured value of logS for the full set of 32 compounds; 4) a range of 0.018 to 0.650 in the predicted versus measured R2 for logS for the 28 compounds; 5) no contestant was able to make a prediction of solubility as a function of polymorphic state; 6) the accuracy of predictions is higher if the solubility is in the range of logS of 0.5 to 3; and 7) the prediction accuracy varies and depends on the chemical structure.

A recent summary of in silico solubility models using different molecular descriptor sets is given in Table 3. Most works were performed using 2D and 3D descriptor pools (58–70), and the major modeling methods include regression-based (47, 49, 52) and machine learning methods (46, 48, 50, 53). The performance varies with the training sets and methods used.
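The “Rule of five” filter described above can be sketched as a simple screening function. The cutoffs follow the wording of this chapter (strictly smaller than each limit); Lipinski’s rule is often stated with inclusive limits (“no more than”), so the boundary handling here is an assumption. The `Compound` class and the two example compounds, including their property values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # g/mol
    clogp: float        # calculated logP
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def passes_rule_of_five(c: Compound) -> bool:
    """True if the compound meets all four criteria as worded in the text."""
    return (
        c.mol_weight < 500
        and c.clogp < 5.0
        and c.h_donors < 5
        and c.h_acceptors < 10
    )


# Hypothetical screening library with made-up property values.
library = [
    Compound("candidate_A", 350.0, 2.1, 2, 5),
    Compound("candidate_B", 720.0, 6.3, 6, 12),
]
orally_promising = [c.name for c in library if passes_rule_of_five(c)]
```

Such a filter is cheap enough to run over an entire HTS or virtual screening dataset before any solubility model is applied.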

Table 3. Summary of recent in silico solubility models using 0D, 1D, 2D and 3D molecular descriptors.

Molecular Descriptors    Classification method    Performance                       Reference No.
(0D or 1D, 2D, 3D)
                         MLR                      R2=0.74                           (58)
                         ANN                      R2=0.96                           (61)
                         MLR, ANN                 R2=0.83, 0.91                     (66)
                         Regression Analysis      AUE=0.63, RMSE=0.84, Q2=0.762     (68)
                         ANN                      R2=0.92                           (69)
                         MLR, ANN                 R2=0.82, 0.92                     (70)
                         LR                       R2=0.69                           (59)
                         ANN                      R2=0.85, RMSE=0.97                (62)
                         ANN, KNN, DF             accuracy=0.97, 0.96, 0.88         (67)
                         PLS                      R2=0.84, RMSE=0.51                (60)
                         SVM                      R2=0.79, RMSE=0.90                (63)
                         Regression Analysis/     RMSE=0.61                         (64)
                         Gaussian Processes       R2=0.82, RMSE=0.96                (65)
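The performance measures reported for solubility models (R2, RMSE, and the fraction of predictions within ±0.5 logS of the measured value) can be computed as in the following sketch. Here R2 is taken as the squared Pearson correlation between measured and predicted values, which is one common reading of “predicted versus measured R2”; individual studies may define it differently. The logS values are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


def fraction_within(measured, predicted, tolerance=0.5):
    """Fraction of predictions within +/- tolerance of the measured value."""
    hits = sum(1 for m, p in zip(measured, predicted) if abs(m - p) <= tolerance)
    return hits / len(measured)


# Hypothetical measured and predicted logS values for four compounds.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.2, -1.9, -3.8, -4.1]

model_rmse = rmse(measured, predicted)
model_r2 = r_squared(measured, predicted)
model_hit_rate = fraction_within(measured, predicted)
```

A model can have a high R2 and still miss the ±0.5 logS window for many compounds, which is why the Solubility Challenge reported both kinds of measure.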

Data and Databases for Solubility Prediction for Future in Silico Modeling

Aqueous solubility is usually expressed as logS, where S is the solubility at 20–25 °C in mol/L. The dataset for the Solubility Challenge can be downloaded from http://www-jmg.ch.cam.ac.uk/data/solubility/; it contains the original one hundred molecules with measured solubility used as the training set in the challenge and also the thirty-two molecules for prediction. One commonly used dataset was developed by Tetko (66). This database includes 1290 organic compounds based on the dataset from Huuskonen et al. (71). The Huuskonen dataset was collected from the AQUASOL database (72) and the PHYSPROP database and contains 1297 diverse molecules (73). Wang et al. (53) built on Tetko’s dataset and added new molecules from the literature for a total of 1708 molecules. This dataset is available at http://modem.ucsd.edu/adme/databases/databases_logS.htm.
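Because datasets report solubility in different units, a conversion to logS (S in mol/L) is often the first preprocessing step. The sketch below assumes the input is given in mg/mL, which is numerically equal to g/L; the function name and the example compound are hypothetical.

```python
import math


def log_s(solubility_mg_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a solubility in mg/mL to logS, with S in mol/L."""
    if solubility_mg_per_ml <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("solubility and molecular weight must be positive")
    grams_per_liter = solubility_mg_per_ml          # 1 mg/mL == 1 g/L
    mol_per_liter = grams_per_liter / mol_weight_g_per_mol
    return math.log10(mol_per_liter)


# Hypothetical compound: 0.18 mg/mL, molecular weight 180 g/mol -> 0.001 mol/L.
example_log_s = log_s(0.18, 180.0)
```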


Passive Diffusion – Permeability

Another important determinant of oral bioavailability is the permeability of a drug across the intestinal barrier. The Caco-2 cell line is one of the representative in vitro models that mimic drug transport across the intestinal epithelial cell barrier, and Caco-2 cells are used to evaluate the intestinal permeability of drugs. As the number of Caco-2 experiments for screening the cellular permeability of drugs has grown, several quantitative structure–activity relationship (QSAR) models have been constructed as virtual screening tools for the evaluation of Caco-2 permeability (41, 74–84) (Table 4). Kulkarni et al. developed a membrane-interaction QSAR (MI-QSAR) model to predict Caco-2 cell permeability using a training dataset of thirty drug molecules and a testing set of eight drugs (85). Three properties, namely the solvation free energy, the extent of drug interaction with a DMPC monolayer, and the conformational flexibility of a drug within the simulated cell membrane, were strongly correlated with the degree of cell permeation of drugs. However, the limited structural diversity of the small dataset might reduce the predictive power of the resultant model. Sherer et al. used a Merck permeability dataset of over 15,000 compounds as a training set, larger than in any previous publication, to build a random forest predictive model. They found that logD is also an important feature in predicting cell permeability (77).

Predicting the human blood-brain barrier (BBB) penetration of a drug candidate is also necessary to evaluate whether a molecule reaches targets in the central nervous system (CNS). The function of the BBB is to protect the CNS from xenobiotics that may injure the brain by restricting the permeability of foreign substances. In the drug development process, it must be examined whether a drug-like compound penetrates the brain and thus exhibits its pharmacological activity. However, the evaluation of BBB penetration for a large number of test compounds via traditional experiments is very time-consuming and expensive (86). Although high-throughput screening for the evaluation of BBB penetration has become available (87), the current in vitro BBB models still cannot fully reproduce in vivo BBB characteristics (88).

Table 4. Summary of recent in silico Caco-2 prediction models using 0D, 1D, 2D and 3D molecular descriptors.

Molecular Descriptors    Classification method    Performance                               Reference No.
(0D or 1D, 2D, 3D)
                         ANN                      correlation coefficient=0.84, RMSE=0.55   (41)
                         LDA-QSAR                 ROC=0.89                                  (74)
                         LDA                      accuracy=0.91                             (75)
                         PLS                      R2=0.79, Q2=0.65                          (76)
                         Random forest            R2=0.47, RMSE=0.21                        (77)
                         GFA                      R2=0.75                                   (78)
                         GA-PLS                   R2=0.79, s=0.39                           (79)
                         Decision-Tree            accuracy=0.79                             (80)
                         MLR                      R2=0.82, Q2=0.79                          (81)
                         MI-QSAR                  R2=0.95                                   (82)
                         GA-NN                    R2=0.86                                   (83)
                         SVM                      correlation coefficient=0.88              (84)
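Several entries in Table 4 report both R2 (goodness of fit on the training data) and Q2 (cross-validated predictivity). The difference can be sketched with a one-descriptor linear model and leave-one-out cross-validation. The definition Q2 = 1 - PRESS/SS_tot used here is a common convention and an assumption about how the cited studies computed it; the descriptor and permeability values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def r2_fit(xs, ys):
    """R2 of the model fitted to, and evaluated on, all compounds."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q2_loo(xs, ys):
    """Leave-one-out Q2: each compound is predicted by a model built without it."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical data: one descriptor (e.g., logD) versus log permeability.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
log_perm = [-5.1, -4.9, -4.2, -4.0, -3.4]

r2 = r2_fit(descriptor, log_perm)
q2 = q2_loo(descriptor, log_perm)
```

For a least-squares model Q2 cannot exceed R2, and a large gap between the two is a common warning sign of overfitting.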

Numerous in silico models for the prediction of BBB penetration have been developed to reduce the time required for drug candidates to reach the market (19, 89–99). Different classification studies using different molecular descriptors for BBB penetration prediction are presented in Table 5. However, the ratio of BBB-positive to BBB-negative compounds in the training sets of most previous studies does not reflect the ratio found among real-world compounds (statistically, only two percent of organic compounds can cross the BBB); thus, the success of these models in determining BBB penetration was limited. To overcome this limitation, Martins et al. (90) used support vector machine and random forest approaches incorporating Bayesian theory to yield a reliable model applicable to real-world scenarios. A total of 1970 curated compounds derived from the literature were used, and a rational selection process for training compounds was applied. The best model yielded an average accuracy, sensitivity, and specificity of 95%, 83%, and 96%, respectively. Furthermore, a web-based system is also available (http://b3pp.lasige.di.fc.ul.pt).
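The reason accuracy, sensitivity, and specificity are reported together becomes clear when classes are imbalanced. The sketch below computes the three measures from a confusion matrix and applies them to a hypothetical set in which only two percent of compounds cross the BBB, as stated above, and a trivial model predicts that none of them do. The function names and the data are illustrative only.

```python
def confusion_counts(actual, predicted):
    """Return (true pos, true neg, false pos, false neg) for boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical set of 100 compounds, 2 of which penetrate the BBB (True).
actual = [True] * 2 + [False] * 98
always_negative = [False] * 100   # trivial model: "nothing crosses the BBB"

trivial_model = classification_metrics(actual, always_negative)
```

The trivial model reaches 98% accuracy while detecting no BBB-penetrating compound at all, which is why sensitivity must be inspected alongside accuracy for realistic class ratios.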


Table 5. Summary of recent in silico BBB prediction models using 0D, 1D, 2D and 3D molecular descriptors.

Molecular Descriptors    Classification method    Performance            Reference No.
(0D or 1D, 2D, 3D)
                         DT                       CCR=0.91, MCC=0.82     (89)
                         RF                       accuracy=0.95          (90)
                         RF                       accuracy=0.88          (91)
                         LDA                      accuracy=0.80          (92)
                         MLR                      R2=0.86, Q2=0.85       (93)
                         SVM                      accuracy=0.8           (94)
                         GFA                      R2=0.72                (19)
                         ANN                      R2=0.81                (95)
                         PCA                      R2=0.81, Q2=0.66       (96)
                         kNN-MLR                  Q2=0.77                (97)
                         LR                       Q2=0.68                (98)
                         MLR                      accuracy=0.73          (99)

Distribution

After a drug molecule is absorbed, it moves away from the site of absorption into other body tissues in a process known as distribution. Skin penetration is one of many types of studies related to drug distribution because it closely examines the movement of chemicals from the outer layer to the inner layer via diffusion across lipid bilayers. The distribution of drug molecules inside an animal’s body is not a process that can be easily monitored or studied without proper biomarkers. Studies of skin penetrance and sensitivity are easier to conduct; therefore, it is not surprising that more datasets are available for training and validation of in silico models.

Skin Penetration

Enhancing the delivery of a drug or therapeutic agent into the skin is an attractive means of systemic drug administration. In both the pharmaceutical and cosmetic industries, the development of penetration enhancers, which improve the percutaneous absorption of compounds by reducing the barrier property of the skin, has attracted considerable scientific interest for drug delivery systems. The stratum corneum (SC), the outermost layer of skin, has been identified as the primary determinant of the barrier function for the percutaneous absorption of drugs and other organics in skin. The SC is formed of multilamellar lipid bilayer membranes surrounded by flattened dead cells. Small hydrophobic or nonpolar molecules can penetrate into the SC via the intercellular route and then diffuse across the lipid bilayer membranes, whereas hydrophilic or polar molecules can only partition into the SC through the transcellular route or be transported via pre-existing aqueous pathways in the form of sweat ducts and hair follicles (100).

Several experimental studies (101) have explored the mode of action of penetration enhancers, and the suggested enhancement mechanisms include: (1) disorganizing the highly ordered structures of the SC through interaction with intercellular lipids, which enhances paracellular diffusivity through the SC; (2) increasing transcellular permeation through interaction with intracellular proteins of the corneocytes; and (3) directly increasing the partitioning of the drug into the SC. Although different enhancement mechanisms have been measured and illustrated, relationships between lipophilicity and penetration potency, drawn from experimental studies and from molecular dynamics modeling of the chemical structures of the enhancers, are needed to elucidate the mechanisms of enhancement in more detail and to predict enhancement potency (102). However, only a few molecular modeling and QSAR studies of skin penetration enhancers have been performed. The compounds whose penetration is to be enhanced can be structurally very diverse or can produce distinct structure-activity relationships for a given penetration enhancer dataset. In other words, a QSAR model developed for one drug with a given set of skin penetration enhancers cannot be applied directly to another drug. Thus, a separate QSAR model may have to be developed for each drug for a given penetration enhancer dataset, and molecular modeling may be similarly limited. However, the penetration enhancement of non-polar drugs is governed by a common set of physicochemical properties (103).
Iyer and co-workers (104) constructed QSAR models for four distinct skin penetration enhancer datasets composed of 61, 44, 42, and 17 compounds. The first three, relatively large datasets involved the action of non-polar skin penetration enhancers. The fourth, relatively small dataset addressed skin penetration enhancement for polar drugs. Significant QSAR models were built using classic QSAR descriptors and 4D-fingerprints, applying multidimensional linear regression with genetic algorithms for optimization. The resulting QSAR models were built using only 4D-fingerprint descriptors, and for two of the four datasets no reasonable QSAR model could be built when only classic descriptors were applied. A comparison of the descriptor terms and regression coefficients across each pair of best QSAR models for the four skin datasets revealed no significantly similar terms. Therefore, the mechanisms of enhanced skin transport were distinct and depended on the chemical diversity of both the skin enhancer and the penetrant. The largest difference in transport mechanism is between polar and nonpolar penetrants. To refine the models of Iyer et al., Zheng et al. (105) expanded the trial descriptor sets and performed membrane-interaction QSAR (MI-QSAR) (106) analysis to construct skin penetration enhancer QSAR models and to better elucidate the mechanisms of enhanced skin transport. MI-QSAR analysis simulates the transport of a chemical through a phospholipid bilayer using molecular dynamics simulation (MDS) (85). The majority of descriptors used in this study were intermolecular descriptors calculated from the MDS trajectories, which capture interactions between the skin penetration enhancer and the phospholipid membrane. In the optimized MI-QSAR models, a newly developed descriptor dominates; it indicates how large the “holes” formed in the phospholipid monolayer by the presence of the skin penetration enhancer are. The resulting MI-QSAR models therefore revealed that good penetration enhancers, compared with poor ones, can enter the phospholipid monolayer, change the structure of the DMPC monolayer, and increase the size of holes in the monolayer.

Chemical penetration enhancers (CPEs) can also enhance the transdermal delivery of insulin. Recently, a quantitative structure-property relationship (QSPR) model (107) was developed for the prediction of insulin permeation using CPEs. Forty-eight potential CPEs were identified; 35 were used as the training dataset and 13 as the testing dataset. Twelve additional CPEs collected from the literature were also included in the testing dataset. A six-descriptor non-linear QSPR model using artificial neural networks coupled with differential evolution (DE) was constructed. The QSPR models suggested that greater hydrophobicity and reactivity of compounds could increase the potency of insulin-specific CPEs, whereas higher dipole moments decrease the potency. The reported R2 and Q2 values of the above skin penetration QSAR models are listed in Table 6.
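The R2 and Q2 statistics quoted for these models can be illustrated with a minimal sketch. It assumes a single-descriptor linear model and leave-one-out cross-validation, with Q2 computed as 1 - PRESS/SS_tot; the cited studies may use other model forms or Q2 conventions, and the function names and toy data below are illustrative only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Fitted R2: 1 - SS_res / SS_tot on the data used for the fit."""
    a, b = fit_line(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out Q2: each compound is predicted by a model
    refitted without it; PRESS is the sum of squared LOO errors."""
    my = mean(y)
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Toy data (hypothetical descriptor values and activities).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
print(r_squared(descriptor, activity), q_squared_loo(descriptor, activity))
```

For a least-squares fit, Q2 computed this way never exceeds R2, so a large gap between the two is a common warning sign of over-fitting.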

Skin Sensitization

Allergic contact dermatitis (ACD) is driven by a T-lymphocyte-mediated immune response against haptens that come into contact with the skin (108). The haptens (small allergenic molecules) enter the skin and react with carrier proteins to form antigenic hapten-protein complexes. The complex, processed by antigen-presenting cells, then migrates to the skin-draining lymph nodes. The potential of a compound to be a contact allergen depends on its ability to penetrate the stratum corneum and on its ability to react with skin proteins, either directly or after metabolic activation. Thus, the reactivity profile of a molecule plays a major role in its potential as a chemical allergen. The mechanisms of the excited-state interactions of skin-sensitizing carcinogenic coumarins are shown in Moore’s studies (109), providing a reasonable basis for studying the structure-activity relationships of skin-sensitizing compounds. Studies reported by Mantulin et al. also show that skin-sensitizing coumarin derivatives have partially localized triplet states (110). Earlier studies (111) found 5-fluorouracil to be much more reactive than thymine based on an analysis of the excited states of skin-sensitizing carcinogenic molecules. Wondrak et al. (112) concluded that the photoexcited states of endogenous skin chromophores, such as porphyrins, melanin precursors and cross-link fluorophores of skin collagen, cause sensitized skin photodamage by interacting with substrate molecules to form reactive oxygen species. Further, the cycloaddition of the excited state of some skin-sensitizing carcinogenic compounds has been identified as the most favorable pathway (113), which matches the experimental results. Overall, these studies suggest that the properties of both the molecular excited state and the ground state could be important factors in constructing accurate computational models that incorporate the overall mechanism of skin sensitization. QSAR models have been developed using descriptors derived from the ground state of molecules (114–116). These descriptors were either derived explicitly from the electronic structure, such as the HOMO and LUMO, or were empirical features, such as two-dimensional electrotopological descriptors (117, 118).

Table 6. Summary of recent skin penetration and sensitization prediction models.

| Endpoint | Classification method | Performance | Reference No. |
| --- | --- | --- | --- |
| skin penetration | GFA | R2=0.83, Q2=0.75 | (104) |
| | MI-QSAR | R2=0.79, Q2=0.71 | (105) |
| | QSPR | R2=0.86 | (107) |
| skin sensitization | Two-state PLS-CLR | accuracy=73.3-80.0% | (119, 123) |
| | Three-state PLS-CLR | accuracy=63.6% | (121) |
| | Two-2-state PLS-CLR | accuracy=54.6% | (121) |
| | Two-state PLS-CLR (EMAX) | accuracy=96.4% | (122) |
| | Two-state PLS-CLR (GEMAX) | accuracy=92.8% | (122) |
| | Three-state PLS-CLR (EMAX) | accuracy=87.9% | (122) |
| | Three-state PLS-CLR (GEMAX) | accuracy=72.7% | (122) |


Li et al. (119) built a two-state categorical QSAR model to characterize skin sensitization using ground-state descriptors for a set of compounds collected from the validated in vivo murine local lymph node assay (LLNA) (120). A set of ground-state 4D-fingerprints (4D-FPs) coupled with logistic regression (LR) and partial least squares coupled logistic regression (PLS-CLR) was used to build the two-state (sensitizer and non-sensitizer) categorical QSAR models. The cross-validated prediction accuracy of the PLS-CLR models ranges from 87.1 to 89.4% for the training set and from 73.3 to 80.0% for the testing set. The effectiveness of these models in separating non-sensitizers from sensitizers shows that certain ground-state descriptors can capture the reactivity behavior of molecules. Using the same LLNA dataset, Li et al. applied both the LR and PLS-CLR methods to construct 3-state and two-2-state (four categories in total) categorical QSAR models for the evaluation of skin sensitization (121). The 3-state QSAR classification model yielded an accuracy of 73.4% for the training set and 63.6% for the testing set. The two-2-state QSAR model produced an accuracy of 83.2% for the training set and 54.6% for the testing set. The results suggest that combining more than two categorical states in constructing skin-sensitization models results in a loss of accuracy and applicability, which may be a consequence of the lack of explicit descriptors derived from the excited states of the molecules. In a more recent study (122), ground-state 4D-FPs (GMAX), excited-state 4D-FPs (EMAX) and combined 4D-FP descriptors containing both ground and excited states (GEMAX) were used to construct categorical QSAR models. The PLS-CLR methodology was again applied. The 3-state and 2-state models derived from the EMAX and GEMAX datasets have higher predictivity than those constructed using the GMAX dataset and the corresponding models from previous studies. There are no significant differences between the EMAX and GEMAX 4D-FP based skin-sensitization models. The prediction accuracies of the above skin sensitization classification models for the testing set are listed in Table 6.

Metabolism

Drug metabolism often entails the chemical conversion of drug substances to detoxify xenobiotics prior to excretion. However, drug metabolism is a complicated process that relies on a variety of different enzymes and sites of metabolism. Because cytochrome P450 (CYP) is the most important enzyme family responsible for Phase I metabolism, in silico prediction models of CYP inhibition are highlighted and discussed in this section. The sites at which enzymes metabolize a drug often dictate whether the molecule is worthy of further investment; therefore, in silico models built to predict these sites are also highlighted and discussed below.


Cytochrome P450 Inhibition

Cytochrome P450 (CYP) is a family of isozymes responsible for drug metabolism, primarily in the liver. More than fifty CYP isozymes have been recognized, and the following subtypes are responsible for metabolizing approximately 90 percent of drugs: CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 (124, 125). These enzymes facilitate a variety of reactions, including N-, O-, and S-dealkylation, aromatic, aliphatic, and N-hydroxylation, N-oxidation, sulfoxidation, deamination, and dehalogenation (126). CYP enzyme inhibition is one of the main causes of adverse drug-drug interactions. More than 900 drugs and natural chemicals have been reported to cause liver damage; for some, this can lead to liver failure and necessitate liver transplantation; for others, the damage can be fatal (124, 127, 128). Similarly, hepatotoxicity and drug-induced liver injury are the main reasons why many drug candidates fail in clinical trials and why many drugs have been withdrawn or recalled from the market. Detecting potential hepatotoxicity in the early stages of drug development could reduce these undesirable outcomes. Computational models have increasingly been studied to elucidate CYP interactions with drug-like compounds in the last decade. QSAR-based models for the prediction of P450 metabolism have been widely studied over the last two decades and extensively reviewed in the literature (57, 129–135). By correlating biological CYP inhibitory activities with structural features and properties, QSAR analyses are advantageous in two ways: (1) the quantitative values of CYP inhibitory activity can be directly predicted; (2) the key structural features of molecules contributing to CYP inhibition can also be evaluated. However, most QSAR-based models can only be built successfully when the training compounds are analogs.
Several machine learning algorithms have been used to construct in silico CYP inhibition classification models, including decision tree induction (136), backpropagation artificial neural networks (137), recursive partitioning (138), Gaussian kernel weighted k-nearest neighbor (139), associative neural networks (140), and support vector machine algorithms (141–144). However, the applicability of most CYP classification models and available classification web servers is not optimal because they were constructed from a small number of datasets. Most importantly, these CYP classification models only provide yes/no results. Although several rule-based QSAR CYP prediction models have been studied over the years, no accurate rule-based QSAR CYP models are currently publicly available to users in the form of a web server. Rule-based models provide a utility that is unmatched by the models described above. Most notably, rule-based classification models are generally fast and can identify rulesets for structural features that are related to the inhibition of specific CYP isozymes. These interpretable rulesets can assist medicinal chemists in the design or synthesis of novel compounds by avoiding structural features that may inhibit specific CYP enzymes.


Table 7. Summary of CYP inhibition models for five P450 endpoints.

| P450 enzyme | Classification method | Performance | Reference No. |
| --- | --- | --- | --- |
| CYP1A2 | C5.0 | accuracy=93.0% (testing) | (147) |
| | Recursive Partition | accuracy=81% (testing) | (138) |
| | ASNN | accuracy=68% (testing) | (140) |
| | SVM | accuracy=93% (testing) | (144) |
| | WhichCyp(SVM) | accuracy=87% (testing) | (143) |
| | BP-ANN | accuracy=59.7-73.1% (testing) | (137) |
| CYP2C19 | C5.0 | accuracy=84.6% (testing) | (147) |
| | SVM | accuracy=89% (testing) | (144) |
| | WhichCyp(SVM) | accuracy=84% (testing) | (143) |
| | BP-ANN | accuracy=70.5-81.0% (testing) | (137) |
| CYP2C9 | C5.0 | accuracy=81.4% (testing) | (147) |
| | SVM | accuracy=89% (testing) | (144) |
| | WhichCyp(SVM) | accuracy=86% (testing) | (143) |
| | BP-ANN | accuracy=75.4-86.7% (testing) | (137) |
| CYP2D6 | C5.0 | accuracy=90.6% (testing) | (147) |
| | Recursive Partition | accuracy=89% (testing) | (138) |
| | KNN | accuracy= (testing) | (139) |
| | SVM | accuracy=85.0% (testing) | (144) |
| | WhichCyp(SVM) | accuracy=84% (testing) | (143) |
| | BP-ANN | accuracy=78.5-87.8% (testing) | (137) |
| CYP3A4 | C5.0 | accuracy=87.9% (testing) | (147) |
| | KNN | accuracy=82% (testing) | (139) |
| | SVM | accuracy=87% (testing) | (144) |
| | WhichCyp(SVM) | accuracy=84% (testing) | (143) |
| | BP-ANN | accuracy=66.3-76.0% (testing) | (137) |

However, an important issue to be addressed when building rule-based classification models is the highly skewed P450 datasets derived from high-throughput screening experiments, especially the CYP2D6 datasets, because the ratio of CYP2D6 inhibitors to non-inhibitors is imbalanced (only 19% of the CYP2D6 compounds in the datasets are inhibitors). This imbalance explains why previous studies produced no accurate and confident CYP2D6 yes/no classification models, in contrast to models for other CYP enzymes. For imbalanced datasets, a good strategy for sampling representative molecules for the training compounds promotes the effectiveness of the classification models (145). Recently, in silico CYP classification models that provide rules derived from the structural features of compounds contributing to inhibition of the five major CYP enzymes were published (146). A rational sampling algorithm was developed by combining an oversampling strategy with an appropriate strategy for selecting representative molecules to build new balanced training and testing datasets, and the performance of the CYP prediction models was significantly enhanced. The training and testing accuracies of the best models in CypRules (version 2.0) are significantly higher than those of all models in the previous studies. The optimized C5.0 model for CYP2D6 also provided excellent predictability. The P450 classification models employing different methodologies are summarized in Table 7 for comparison of their accuracies. A freely accessible CYP prediction web server, CypRules (version 2.0) (147), which can evaluate structural rulesets of CYP inhibition for any test compound submitted to the server, was also provided. Five key rules of CYP inhibition provided by CypRules can be used for further inspection of chemical structures. Because they are ruleset-based, the optimized models can also be applied for rapid virtual high-throughput screening.
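The effect of balancing a skewed dataset can be sketched with plain random oversampling. This is not the rational sampling algorithm of reference (146), which also selects representative molecules; it only duplicates minority-class records at random until both classes are the same size. The record layout, the `is_inhibitor` key and the function name are assumptions made for illustration.

```python
import random


def oversample_minority(records, label_key="is_inhibitor", seed=0):
    """Return a shuffled, class-balanced copy of `records` by randomly
    duplicating members of the smaller class (random oversampling)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        raise ValueError("one class is empty; nothing to oversample")
    shortfall = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(shortfall)]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy dataset mirroring the 19% inhibitor share quoted for CYP2D6:
# 19 inhibitors and 81 non-inhibitors (compound ids are hypothetical).
dataset = [{"id": i, "is_inhibitor": i < 19} for i in range(100)]
balanced = oversample_minority(dataset)
print(len(balanced), sum(r["is_inhibitor"] for r in balanced))
```

When duplicates are created this way, it is usually safer to oversample only after the training/testing split, so that copies of the same compound do not appear on both sides.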

Prediction of the Sites of Metabolism

The accurate prediction of the sites of metabolism (SoMs) and of the binding modes of small molecules in metabolic enzymes has several advantages and multiple applications: for example, to assist in the identification of potential in vitro or in silico hits, to help prioritize experiments, to enable the design of better drugs, to predict metabolite-related toxicity (e.g., CYP1A2-mediated oxidation of aniline leads to carcinogenic metabolites (148)), and to assist in the investigation of CYP enzyme polymorphism (149). These possibilities have accelerated the advancement of computational approaches to predict the metabolism of small molecules by CYP enzymes (150–152), to which the Rydberg group (153–155) made significant contributions. The approaches fall into three classes: ligand-based, reactivity-based, and structure-based methods. The ligand-based approach encompasses several methods, including quantitative structure–activity relationships (156), pharmacophores, quantum mechanical-derived rules (154, 157), and descriptors (158). Reactivity-based methods calculate, for example, the activation energy of each potential reactive center by DFT or semi-empirical calculations, as in CypScore, or use fragment recognition, as in SMARTCyp (154); structure-based methods rely on, for example, docking (151, 159–164). A number of SoM prediction systems have been devised, but most of them consider only a single aspect of the reaction (165), as illustrated by ligand-based methods (157). These systems do not account for substrate recognition by the CYP enzymes. Similarly, because structure-based approaches are often validated on a single CYP enzyme, their transferability to other CYP enzymes is unknown (166, 167). Ideally, a prediction system that considers both CYP protein structures and ligand chemical reactivity will yield more realistic and accurate estimates. Cruciani et al. (MetaSite (168)) and Oh et al. (MLite (169)) pioneered the combination of ligand-based, reactivity-based, and structure-based approaches. Few efforts have been made to study the significance of the predictions. A fully automated system, IMPACTS (170), combines ligand reactivity estimation with structure-based design, including docking and transition state modeling, to predict the SoMs of drugs. IMPACTS has been applied to the CYP1A2, CYP2C9, CYP2D6, and CYP3A4 enzymes, and the accuracy and significance of the system have been demonstrated. Different in silico models for the prediction of the sites of metabolism and their predictive performance are summarized in Table 8.
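The idea of combining ligand reactivity with substrate recognition can be sketched as a simple atom-ranking scheme. The scoring function below (activation energy minus a weighted accessibility term) and the weight of 8.0 are assumptions chosen for illustration, not the scoring used by IMPACTS, MetaSite or any other cited program. The evaluation counts a molecule as correctly predicted when an experimentally observed site appears among the top k ranked atoms; whether the accuracies in Table 8 follow this definition is not stated here.

```python
ACCESS_WEIGHT = 8.0  # hypothetical weight balancing the two terms


def rank_sites(atoms):
    """Rank candidate atoms, most likely site of metabolism first.

    Each atom is a dict with 'atom' (label), 'activation_energy'
    (lower means more reactive) and 'accessibility' (0..1, higher
    means easier for the enzyme to reach)."""
    def score(a):
        return a["activation_energy"] - ACCESS_WEIGHT * a["accessibility"]
    return sorted(atoms, key=score)


def top_k_hit(ranked, observed_sites, k=2):
    """True if any of the k best-ranked atoms is an observed SoM."""
    return any(a["atom"] in observed_sites for a in ranked[:k])


def top_k_accuracy(molecules, k=2):
    """Fraction of molecules with an observed SoM in the top k."""
    hits = sum(top_k_hit(rank_sites(m["atoms"]), m["observed"], k)
               for m in molecules)
    return hits / len(molecules)


# Toy molecules with made-up energies and accessibilities.
molecules = [
    {"observed": {"C1"},
     "atoms": [
         {"atom": "C1", "activation_energy": 80.0, "accessibility": 0.5},
         {"atom": "C2", "activation_energy": 60.0, "accessibility": 1.0},
         {"atom": "N3", "activation_energy": 55.0, "accessibility": 0.2},
     ]},
    {"observed": {"C1"},
     "atoms": [
         {"atom": "C1", "activation_energy": 40.0, "accessibility": 0.9},
         {"atom": "C2", "activation_energy": 90.0, "accessibility": 1.0},
     ]},
]
print(top_k_accuracy(molecules, k=2))
```

In the first toy molecule the most reactive atom (N3) is overtaken by C2 once accessibility is included, which is the kind of re-ranking a combined approach is meant to capture.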

Toxicity

Toxicity is one of the more direct measures of drug effect in vitro as well as in vivo. There are multiple methods to monitor different types of toxicity; as a result, more and larger databases are available for the construction, training, and validation of in silico prediction models. In the following sections, in silico models built specifically for the prediction of hERG toxicity, cytotoxicity, mutagenicity, carcinogenicity, teratogenicity, developmental toxicity and acute toxicity are highlighted and discussed.

hERG

The human Ether-a-go-go Related Gene (hERG) encodes a potassium channel that plays a crucial role in coordinating the heartbeat. When this ion channel is inhibited, its ability to conduct electrical current across the cell membrane is compromised, leading to prolongation of the QT interval or to the development of a cardiac arrhythmia known as Torsades de Pointes (TdP). In severe cases, hERG inhibition can lead to long QT syndrome (173–176) and result in sudden death. Several clinically successful drugs are known to inhibit hERG; physicians and patients should be advised of the possible risks prior to administration. Ideally, potentially hERG-inhibiting agents are avoided during the drug development phases. For this reason, the generation of robust and expandable in silico models for hERG prediction is a top priority. Many in silico hERG prediction models have been published to assist in the identification and elimination of drug candidates with the ability to block hERG channels (177–180). Several of these classification or prediction models were built using quantitative structure-activity relationship (QSAR)-based methodologies (181, 182), including Bayesian (183), decision tree (184), neural network (182), support vector machine (SVM) (185–188) and partial least squares (PLS) (189) methods. A survey of these QSAR-based models suggests which methodology is best suited for the construction of hERG prediction models.


Table 8. Summary of the prediction models for the prediction of the sites of metabolism.

| P450 enzyme | Model Name | Performance | Reference No. |
| --- | --- | --- | --- |
| CYP1A2 | Structure-based+reactivity (IMPACTS) | accuracy=80.5% | (170) |
| | SVM (RS-Predictor) | accuracy=83% | (158) |
| CYP2C9 | Structure-based+reactivity (IMPACTS) | accuracy=76.4%-84.4% | (170) |
| | SVM (RS-Predictor) | accuracy=79.7%-81.6% | (158) |
| | DFT (SMARTCyp) | accuracy=66.9%-67.7% | (154) |
| | Semi-empirical (StarDrop) | accuracy=77.4%-78.4% | (171) |
| | MIF+reactivity (MetaSite) | accuracy=68.8-91% | (168) |
| | Mechanism-based (QMBO) | accuracy=84% | (172) |
| CYP2D6 | Structure-based+reactivity (IMPACTS) | accuracy=70.7%-71.2% | (170) |
| | SVM (RS-Predictor) | accuracy=78.7%-86.6% | (158) |
| | DFT (SMARTCyp) | accuracy=48.5%-68.1% | (154) |
| | Semi-empirical (StarDrop) | accuracy=69.2%-81.5% | (171) |
| | MIF+reactivity (MetaSite) | accuracy=61.8%-65.4% | (168) |
| CYP3A4 | Structure-based+reactivity (IMPACTS) | accuracy=70.1%-82.5% | (170) |
| | SVM (RS-Predictor) | accuracy=72.7%-85.7% | (158) |
| | DFT (SMARTCyp) | accuracy=73.1%-77.2% | (154) |
| | Semi-empirical (StarDrop) | accuracy=66.9%-77.5% | (171) |
| | MIF+reactivity (MetaSite) | accuracy=61.8-87% | (168) |
| | Mechanism-based (QMBO) | accuracy=84% | (172) |

A PLS classification model for hERG, published by Keseru et al. (189), achieved 85% accuracy for a training set of 55 compounds and 83% accuracy for a testing set of 95 compounds. In another study, a Bayesian classification model published by Sun et al. (183) was built on a training set of 1979 in-house compounds and evaluated on a testing set of 66 compounds. This model achieved a receiver operating characteristic (ROC) accuracy of 87% for the training set and 88% for the testing set. Gepp and Hutter (184) described a decision tree hERG classification model with a reported 92% accuracy for a training set of 264 compounds and 76-80% accuracy for a testing set of 75 compounds. Roche et al. (182) implemented an hERG classification model, constructed using supervised neural networks, with an accuracy of 93% for a training set of 244 compounds and 82% for a testing set of 72 compounds. Li et al. (190) published an hERG classification model constructed using the SVM method, which achieved an overall classification accuracy of 74% for a training set of 495 compounds and 73% for a testing set of 1877 compounds obtained from a PubChem dataset (AID 376) (191). Overall, a sampling of successful hERG models from the literature revealed that models constructed using the SVM methods achieved higher accuracy for the training set compounds. At first glance, the model proposed by Li et al. has lower accuracy for the training and testing sets than the other QSAR-based models presented above. A closer look reveals that this result reflects the considerably larger training set of 495 compounds and testing set of 1877 compounds used in Li’s study, whereas the other models used testing sets of approximately 72 to 95 compounds. Moreover, because most of the QSAR models for hERG prediction were applied only to small testing sets of 72 to 95 compounds, these models lack sufficient validation, with the exception of the protocol by Li et al. In another study, Huang and Fan (192) used the hERG training set of 495 compounds from Li et al. (190) to construct SVM classification models with descriptors selected by a genetic algorithm (GA) (193–195). An external testing set of 1948 compounds was obtained from the PubChem bioassay database (AID 376), and the best SVM classification model from this study achieved an accuracy of approximately 87% for the training set and 82% for the testing set (192). In 2010, Su et al. described an hERG binary classification QSAR model (25) constructed using the genetic function approximation (GFA) methodology (24). Su’s model is better than the previously published classification models (182, 189, 196–200) at predicting the hERG potency of compounds. The training set for this model comprised 250 structurally diverse compounds collected from the literature with known IC50 values for hERG blockade, and the testing set comprised another 876 compounds derived from a condensed version of the PubChem bioassay (AID 376). This hERG classification model achieved 91% accuracy for the training set and 83% accuracy for the testing set. To further the work on hERG classification modeling, Shen et al. addressed, in another study (35), the active-versus-inactive imbalance typically seen in high-throughput screening results. The PubChem hERG bioassay dataset (AID 376; containing 163 active and 1505 inactive compounds) was first pruned of compounds violating Lipinski’s Rule-of-Five and then of compounds that did not fall within the specified logP range, before being used as the training set (35). To avoid over-fitting the SVM model, they applied linear SVM modeling and a deletion strategy to reduce the size of the descriptor pool and then judiciously selected molecular features from the reduced descriptor pool. This approach maximizes the correct classification of compounds for hERG toxicity. An external dataset of 356 compounds collected from the literature, comprising 287 active and 69 inactive compounds, was used as the testing set to validate the models. The optimized model yielded an accuracy, sensitivity and specificity of 95%, 90% and 96%, respectively, for the training set and an overall accuracy of 87% for the additional validation dataset. To compare the overall quality of each hERG classification model, eleven published in silico studies of hERG classification employing different methodologies are listed in Table 9.
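The dataset pruning step described above can be sketched as a two-stage filter over precomputed descriptors. The Rule-of-Five thresholds are the standard ones (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). How many violations were tolerated and which logP range was used are not given in this section, so `max_violations`, `logp_range`, the dictionary keys and the example compounds are all placeholders.

```python
def ro5_violations(compound):
    """Count Lipinski Rule-of-Five violations for one compound."""
    return sum([
        compound["mol_weight"] > 500,
        compound["logp"] > 5,
        compound["h_donors"] > 5,
        compound["h_acceptors"] > 10,
    ])


def prune(compounds, max_violations=0, logp_range=(-2.0, 5.0)):
    """Drop Rule-of-Five violators first, then compounds whose logP
    lies outside logp_range (both settings are assumed values)."""
    low, high = logp_range
    kept = [c for c in compounds if ro5_violations(c) <= max_violations]
    return [c for c in kept if low <= c["logp"] <= high]


# Hypothetical compounds with precomputed descriptor values.
compounds = [
    {"name": "cpd_a", "mol_weight": 320.4, "logp": 2.1,
     "h_donors": 2, "h_acceptors": 5},
    {"name": "cpd_b", "mol_weight": 612.7, "logp": 6.3,
     "h_donors": 1, "h_acceptors": 8},
    {"name": "cpd_c", "mol_weight": 450.0, "logp": -3.5,
     "h_donors": 6, "h_acceptors": 9},
    {"name": "cpd_d", "mol_weight": 480.2, "logp": 4.8,
     "h_donors": 0, "h_acceptors": 11},
]
print([c["name"] for c in prune(compounds)])
```

Keeping the two filters as separate passes mirrors the order given in the text and makes it easy to report how many compounds each stage removes.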


Cytotoxicity

Drugs, or exogenous chemical compounds, used to treat various human diseases are prone to cause toxicity and other adverse effects. For this reason, toxicity testing is a necessary precaution during drug development to ensure the success of drug development research projects and, ultimately, the safety of patients when these drugs become available. Cytotoxicity testing is one of the more fundamental yet important methods of biological evaluation. As a result, there are many assays and an abundance of data readily available. A number of successful in silico cytotoxicity prediction models have also been reported in the literature. These models were constructed using QSAR-based approaches and have been successfully applied to predict toxicity in different cell lines, such as the radical-based toxicity of phenols in a murine leukemia cell line (203), the toxicity of imidazolium-derived ionic liquids in Caco-2 cells (204), and cellular toxicity in HTS data for various cell lines (205). It is important to note that the effectiveness and applicability of in silico models depend on the training compounds, the physicochemical descriptors, and the machine learning algorithms selected (206). Many machine learning algorithms have been used to construct classification models for cytotoxicity prediction, including neural networks, random forest (RF), and decision trees (207). The use of appropriate machine learning algorithms is crucial in building a reliable predictive model. For example, Guha and Schurer (205) curated screened compounds from the NIH Chemical Genomics Center (NCGC) and constructed RF-based cytotoxicity classification models for 13 different cell lines. The NCGC Jurkat model was used to predict the toxicity of the Scripps Jurkat dataset derived from the Molecular Libraries Screening Centers Network (MLSCN). The Scripps/MLSCN dataset was used to validate the Guha and Schurer CATS2D-based random forest model, and the cytotoxicity classification accuracy was 67.5%. This reported accuracy reflects positively on the applicability of this classification model to an external testing dataset; however, a closer look at the sensitivity and specificity of the model indicates that the result was skewed toward the model’s ability to predict known actives. Specifically, the sensitivity (the model’s ability to predict known active compounds) was 76.3%, and the specificity (the model’s ability to predict known inactive compounds) was 26.0%. As will be shown, the performance of toxicity classification models can be improved by using different machine learning algorithms, descriptor classes, and sampling strategies.
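The relationship between accuracy, sensitivity and specificity explains why a single accuracy figure can hide a skewed model. The sketch below computes the three metrics from confusion-matrix counts and, because accuracy is the class-weighted average of sensitivity and specificity, solves for the share of actives that the three reported values would imply. It assumes all three values refer to the same testing set, which is not stated explicitly above; the function names are illustrative.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from counts of
    true positives, false negatives, true negatives, false positives."""
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def implied_active_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p*sensitivity + (1 - p)*specificity for p,
    the fraction of active compounds in the evaluated set."""
    return (accuracy - specificity) / (sensitivity - specificity)


# Values quoted above for the external cytotoxicity validation.
p = implied_active_fraction(0.675, 0.763, 0.260)
print(round(p, 3))
```

Under that assumption roughly four out of five test compounds would be actives, so the overall accuracy is dominated by the sensitivity and says little about how well inactive compounds are recognized.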

Table 9. A summary of the different in silico hERG prediction models.

| Modeling Methodology | Accuracy of the Training Set Predictions (number of compounds) | Accuracy of the Testing Set Predictions (number of compounds) | Reference No. |
| --- | --- | --- | --- |
| Support vector machine with 4D-FPs | 96% (876) | 87% (356) | (35) |
| PLS (traditional & hologram QSAR) | 83–87% (55) | 83% (95) | (189) |
| Shape signatures | 69–73% (83) | 85–95% (21) | (200) |
| Fragment-based – evolutionary algorithm | 87–89% (70–100) | 85–90% (22–24) | (197) |
| Recursive partition | 96% (100) | 93–96% (55) | (198) |
| Binary QSAR model | 83–87% (150–223) | 78–86% (58) | (201) |
| Supervised neural network | 93% (244) | 82% (72) | (182) |
| Similarity-based method | 76% (275) | 80% (500) | (199) |
| GFA Binary QSAR Model (40 µM cutoff) | 86% (356) | 83% (876) | (202) |
| SVM with GRIND descriptors | 70–86% (a) (495) | 73% (1877) | (190) |
| SVM with atom descriptors | 92%: ROC (b) (977) | 94% (66) | (185) |

(a) The reported method includes linear and nonlinear models at different threshold values. 86% accuracy is for the linear SVM model at a 1 µM threshold and 72% is an approximate overall accuracy for the nonlinear SVM model at a 30 µM threshold. (The precise values are not stated in the reference.)
(b) ROC: receiver operating characteristic.

In a recent publication, Chang et al. explored the influence of different combinations of descriptor sets (1D, 2D MOE, and 4D-FP), dataset compositions (biological end points from the Jurkat cell line or another collection of cytotoxic molecules), oversampling strategies (various ratios were tested), and model construction methods (e.g., SVM, RF) on the prediction of cytotoxicity using an imbalanced qHTS assay dataset (208). Compared to previously published studies, oversampled datasets resulted in SVM models with improved predictions for both the training and external testing sets. The prediction accuracies of the above two cytotoxicity models for the testing dataset are compared in Table 10.


Table 10. A summary of the cytotoxicity models.

| Molecular Descriptors | Classification method | Performance | Reference No. |
| --- | --- | --- | --- |
| CATS2D | RF | accuracy = 67% (testing) | (205) |
| 4D-FP | SVM | accuracy = 71% (testing) | (208) |
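The oversampling strategy mentioned for imbalanced qHTS data can be sketched as simple random duplication of minority-class compounds. This is a minimal illustration under stated assumptions, not the procedure of reference 208: the function name `oversample_minority`, the `ratio` parameter, and the toy data are all hypothetical, and the reference may have used a different oversampling scheme.

```python
import random


def oversample_minority(samples, labels, ratio=1.0, seed=0):
    """Randomly duplicate minority-class samples until the minority count
    reaches ratio * majority count. Returns new (samples, labels) lists."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    # Sort class names by size: smaller class first.
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    target = int(round(ratio * len(by_class[majority])))
    extra_needed = max(0, target - len(by_class[minority]))
    extra = [rng.choice(by_class[minority]) for _ in range(extra_needed)]
    return list(samples) + extra, list(labels) + [minority] * len(extra)


# Hypothetical toy data: 10 "toxic" and 90 "nontoxic" descriptor vectors.
toy_x = [[float(i)] for i in range(100)]
toy_y = ["toxic"] * 10 + ["nontoxic"] * 90
balanced_x, balanced_y = oversample_minority(toy_x, toy_y, ratio=1.0)
print(balanced_y.count("toxic"), balanced_y.count("nontoxic"))
```

Oversampling would normally be applied to the training set only, so that the external testing set keeps its original class distribution and the reported accuracy remains comparable between models.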

Mutagenicity

Mutagenicity is an important factor to consider in any drug development effort; early detection of mutagenicity at preclinical drug discovery stages can aid in the development of safe therapeutic agents by halting the development of potentially harmful drugs. Mutagenicity is a term used broadly to describe the property of chemical agents or drug substances to induce genetic mutation. It is sometimes used interchangeably with the term genotoxicity, especially when discussing the ability of chemical agents to deleteriously change the genetic material in a cell. However, while all mutagens are genotoxic, not all genotoxic substances are mutagenic (209). The Ames test is the most common in vitro approach for determining mutagenicity and is used in drug candidate screening to avoid selecting mutagens for development. The Ames test was first introduced in the early 1970s by Bruce Ames (210–212) and is a well-established and widely accepted method to assess the potential of compounds to cause genetic damage in bacterial cells (210). Deleterious genetic changes are central to the overall development of cancer, and evidence of mutagenic activity may indicate a chemical substance's potential to encourage carcinogenic effects. In therapeutic agents, carcinogenicity is strongly correlated with mutagenicity (213). A positive Ames test could suggest that a chemical agent is mutagenic and highly likely to be carcinogenic; however, false-positive and false-negative test results have been reported. The Ames test is still the preferred standard in vitro assay because it is a quick, convenient, and cost-effective method for estimating compound mutagenicity (and hence potential carcinogenicity).


Table 11. Structural alerts of mutagenicity.

| Class | Structural alerts |
| --- | --- |
| Acylating, Direct Acting Agents | acyl halides; isocyanate and isothiocyanate groups; β-lactones (and γ-sultones) |
| Alkylating, Direct Acting Agents | alkyl (C < 5) or benzyl esters of sulfuric, sulfonic, phosphoric, or phosphonic acid; N-methylol derivatives; S or N mustard; β-lactones and γ-sultones; epoxides and aziridines; aliphatic halogens; alkyl nitrite; α,β-unsaturated carbonyls; simple aldehyde; quinones |
| Alkylating, Indirect Acting Agents | monohaloalkene; hydrazine; aliphatic azo and azoxy; alkyl carbamate and thiocarbamate; alkyl and aryl N-nitroso groups; azide and triazene groups; aliphatic N-nitro group; α,β-unsaturated aliphatic alkoxy group |
| Intercalating and DNA Adduct Forming, Indirect Acting Agents | polycyclic aromatic hydrocarbons; heterocyclic polycyclic aromatic hydrocarbons; coumarins and furocoumarins |
| Aminoaryl DNA Adducts Forming, Indirect Acting Agents | aromatic nitroso group; aromatic ring N-oxide; nitro-aromatic; primary aromatic amine, hydroxyl amine, and its derived esters; aromatic mono- and dialkylamine; aromatic N-acyl amine; aromatic diazo |
| Nongenotoxic Carcinogens | (poly)halogenated cycloalkanes; thiocarbonyl; halogenated benzene; halogenated PAH; halogenated dibenzodioxins |
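The way an alert list such as Table 11 is used in a rule-based check can be sketched as a simple lookup. This is a toy illustration only: real expert systems match substructures in the molecular graph, whereas the sketch below assumes each compound has already been annotated with functional-group labels. The dictionary contents are a small subset of Table 11, and the names `STRUCTURAL_ALERTS`, `fired_alerts`, `is_flagged`, and both example compounds are hypothetical.

```python
# A few alert names taken from Table 11, mapped to their class.
# Assumption: compounds arrive pre-annotated with functional-group labels.
STRUCTURAL_ALERTS = {
    "acyl halide": "Acylating, Direct Acting Agents",
    "epoxide": "Alkylating, Direct Acting Agents",
    "aziridine": "Alkylating, Direct Acting Agents",
    "hydrazine": "Alkylating, Indirect Acting Agents",
    "polycyclic aromatic hydrocarbon":
        "Intercalating and DNA Adduct Forming, Indirect Acting Agents",
    "nitro-aromatic": "Aminoaryl DNA Adducts Forming, Indirect Acting Agents",
    "primary aromatic amine":
        "Aminoaryl DNA Adducts Forming, Indirect Acting Agents",
}


def fired_alerts(functional_groups):
    """Return {alert: class} for every annotated group that is a known alert."""
    return {g: STRUCTURAL_ALERTS[g]
            for g in sorted(functional_groups) if g in STRUCTURAL_ALERTS}


def is_flagged(functional_groups):
    """A compound is flagged as a potential mutagen if any alert fires."""
    return bool(fired_alerts(functional_groups))


# Hypothetical annotated compounds.
compound_a = {"nitro-aromatic", "carboxylic acid"}
compound_b = {"ester", "secondary alcohol"}

print(fired_alerts(compound_a))
print(is_flagged(compound_b))
```

Because the check looks only at individual functional groups, a compound with no listed group is never flagged, whatever its core structure.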

An important advantage of the Ames test is that the available databases are more complete and larger in volume, and its results usually correlate with those of life-time rodent carcinogenicity studies, which require 2 years to complete (214). We built our models specifically for the scaffold analysis of DNA-reactive (mutagenic) chemical agents; therefore, the carcinogenic risks associated with these agents will not be discussed. We use the word "scaffold" primarily to describe the core structure of compounds. In accordance with the International Conference on Harmonisation (ICH) M7 guideline updated in June of 2014, expert rule-based and statistics-based quantitative structure-activity relationship (QSAR) models can be used to estimate the potential mutagenicity of impurities in pharmaceuticals (215). Similarly, these computational models can also be used to identify potential mutagens in drug safety evaluation. In the early stages of drug discovery and development, in silico prediction of mutagenicity has gained popularity and is sometimes applied even before prospective drug compounds are synthesized (216). By avoiding the synthesis of compounds with potential mutagenicity, the time and cost of drug design and development can be considerably reduced. Consequently, several commercially and publicly available in silico prediction models have been developed in recent years using the endpoints of the Ames test to predict the mutagenicity of various compounds. Currently, structural alert-based (217, 218) and QSAR-based (219, 220) models are the two main strategies for Ames mutagenicity prediction. Structural alert (SA)-based expert prediction systems include DEREK for Windows (DfW) (217) and Toxtree (221). The toxicological alerts are derived from the literature, academic and industry experts, available experimental data (222–224), and the Benigni-Bossa rules (225).
The QSAR-based approaches (e.g., Leadscope Model Applier (LSMA) (219) and MultiCASE (MC4PC) (220)) use regression models to describe the relationship between molecular properties (e.g., lipophilicity, polarizability, electron density, and topology) and the mutagenicity of the compounds being studied (226). The structural alert-based and QSAR-based models have many advantages, but one limitation is that they cannot directly indicate a scaffold's potential to cause mutagenicity (227). We believe it would be beneficial to be able to relate the core structures of a compound to the associated Ames mutagenicity. The structural alerts approach only evaluates functional groups (Table 11), and the correlative QSAR-based approach mostly emphasizes side chain or functional group analysis of an analog series. Core structures or scaffolds are mostly neglected in the two aforementioned approaches. If a particular scaffold (core structure) is associated with mutagenicity, both the structural alert-based and QSAR-based models fail to identify compounds with this scaffold as potential mutagens. This presents a serious problem because the compounds in a drug series usually share one or several similar core structures with different combinations of side chains. If drugs containing a mutagenic scaffold are not identified and eliminated in the early drug development stages, all of the drugs from this series may be mutagenic.

A benchmark dataset for in silico prediction of Ames mutagenicity, containing 6,512 compounds (228), was used to analyze the relationship between mutagenicity and the scaffolds of diverse compounds, and the Scaffold Hunter (229) strategy was used to generate hierarchical relationships by correlating the scaffolds and predicted mutagenicity. By analyzing the scaffold relationships, a list of scaffolds with correlated potential mutagenicity was established (Table 12). This model can be used as a basis for drug design to prevent the development of potentially mutagenic therapeutic agents, and the listed scaffolds can be used to suggest non-mutagenic scaffolds to replace mutagenic core structures.
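The per-scaffold statistic behind such a list can be sketched as the fraction of Ames-positive compounds among all compounds that contain a given scaffold. The sketch assumes this is how the "Rate of Mutagen" column of Table 12 is defined, and it assumes the scaffold assignment (for example, from a Scaffold Hunter hierarchy) has already been done. The function name, the `min_compounds` threshold, and the records are hypothetical.

```python
from collections import defaultdict


def scaffold_mutagen_rates(records, min_compounds=10):
    """records: iterable of (scaffold_name, is_mutagen) pairs, one per
    compound-scaffold assignment. Returns {scaffold: (rate, n_compounds)}
    for scaffolds supported by at least min_compounds compounds."""
    counts = defaultdict(lambda: [0, 0])  # scaffold -> [n_mutagens, n_compounds]
    for scaffold, is_mutagen in records:
        counts[scaffold][0] += int(is_mutagen)
        counts[scaffold][1] += 1
    return {s: (m / n, n) for s, (m, n) in counts.items() if n >= min_compounds}


# Hypothetical records: scaffold_A has 9 mutagens out of 10 compounds,
# scaffold_B has too few compounds to report a meaningful rate.
records = ([("scaffold_A", True)] * 9 + [("scaffold_A", False)]
           + [("scaffold_B", True), ("scaffold_B", False)])
rates = scaffold_mutagen_rates(records, min_compounds=10)
print(rates)
```

A minimum compound count is included because a high rate computed from only a handful of compounds says little about the scaffold itself.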

Carcinogenicity

Cancer is one of the most common causes of death around the world. Any chemical that can induce tumors, increase tumor incidence, or shorten the time to tumor occurrence is defined as a carcinogen (230). Typically, tests to predict the cancer risks of chemicals include gene mutation in bacteria and chromosomal damage in mammalian and rodent hematopoietic cells (231). Because the safety evaluation of carcinogenicity in animal models is highly time-consuming, computational tools for the prediction of the carcinogenicity of chemicals have become a focus in the field of ADMET. Current knowledge of carcinogenicity mainly depends on the data generated from rodent carcinogenicity assays. Online resources for rodent carcinogenicity data include the US National Toxicology Program (NTP) database (http://ntp-apps.niehs.nih.gov/ntp_tox/index.cfm) (232), the Carcinogenic Potency Database (http://potency.berkeley.edu/cpdb.html) (233), Istituto Superiore di Sanita, Chemical Carcinogens: “Structures and Experimental Data” (ISSCAN) (http://www.epa.gov/ncct/dsstox/sdf_isscan_external.html) (234), and the Pesticides Action Network (PAN) database (http://www.pesticideinfo.org) (235). The commonly available programs that can be used to predict carcinogenicity include Derek (DfW), CAESAR (236), Lazar (237), HazardExpert (238), and Toxtree.


Table 12. Identified major mutagenic scaffold groups (Acridine, Phenanthrene, Pyrene, Quinoxaline), and minor mutagenic scaffold group (Naphthalene).

| Group | Scaffold Name | Rate of Mutagen | Compound Number |
| --- | --- | --- | --- |
| Acridine Group | Acridine | 94% | 53 |
| Acridine Group | Benzoacridine | 86% | 21 |
| Acridine Group | N-Phenylacridin-9-amine | 94% | 18 |
| Phenanthrene Group | Phenanthrene | 93% | 40 |
| Phenanthrene Group | 15,16-Dihydrocyclopenta[a]phenanthren-17-one | 77% | 13 |
| Phenanthrene Group | Chrysene | 96% | 23 |
| Pyrene Group | Pyrene | 100% | 39 |
| Pyrene Group | Benzo[e]pyrene | 90% | 10 |
| Pyrene Group | Benzo[a]pyrene | 84% | 50 |
| Pyrene Group | 9,10-Dihydrobenzo[a]pyrene | 90% | 10 |
| Quinoxaline Group | Quinoxaline | 78% | 18 |
| Quinoxaline Group | 1H-imidazo[4,5-g]quinoxaline | 86% | 22 |
| Quinoxaline Group | Phenazine | 92% | 25 |
| Naphthalene Group | Naphthalene | 62% | 81 |
| Naphthalene Group | Anthracene | 87% | 31 |
| Naphthalene Group | Phenanthrene | 93% | 40 |

Teratogenicity and Developmental Toxicity

Studies of teratogenicity and developmental toxicity assess a chemical's capacity to cause congenital malformations (teratogenicity) or to harm sexual function, fertility, and development in adult males, adult females, and their offspring. Reproductive toxicity refers to damage to reproductive capacity, and developmental toxicity usually indicates non-heritable abnormal effects on the progeny. Because the maternal-embryonic interaction is very complex, the majority of the mechanisms of teratogenesis and developmental toxic action are unknown or only partially understood at the cellular level. Furthermore, under the REACH regulation enacted by the European Union for chemicals, the assessment of reproductive and developmental toxicity requires the highest number of experimental animals and results in the most costly and time-consuming experiments (239). Alternative computational tools for the prediction of teratogenicity and developmental toxicity are therefore desirable, but their development is still a challenging issue. The available tools for the prediction of teratogenicity and developmental toxicity include Derek, CAESAR, ToxBoxes (240), TOPKAT (241), and HazardExpert (238).


Acute Toxicity

The acute toxicity of a chemical refers to its ability to cause severely harmful effects after a single exposure or multiple exposures to the substance within 24 hours. The dose of a substance required to kill 50% of test animals (the LD50 value) is the most frequently used measure of acute toxicity. REACH has accepted the use of in vitro or in silico models as alternatives to in vivo animal studies. However, acute toxicity may result from different phases of biochemical events, and representing such a complex phenomenon directly by LD50 could lead to loss of information. Building a single prediction model with high prediction accuracy is therefore a challenge (242), and no in silico model or in vitro assay developed so far predicts acute toxicity with sufficient accuracy and applicability (243). Currently, the available tools for the prediction of acute toxicity include ToxBoxes and TOPKAT. A summary of different toxicity prediction tools is given in Table 13.
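The LD50 definition above can be made concrete with a small worked example. The sketch estimates LD50 by linear interpolation on the log10 dose scale between the two tested doses that bracket 50% mortality. This is a simplification chosen for illustration and is not a method described in the chapter; regulatory studies typically fit a dose-response model instead. The dose-mortality values and the function name `estimate_ld50` are hypothetical.

```python
import math


def estimate_ld50(doses, mortality):
    """Estimate LD50 by interpolating linearly on log10(dose) between the
    two tested doses whose mortality fractions bracket 0.5."""
    pairs = sorted(zip(doses, mortality))
    for (d_lo, m_lo), (d_hi, m_hi) in zip(pairs, pairs[1:]):
        if m_lo <= 0.5 <= m_hi and m_hi > m_lo:
            frac = (0.5 - m_lo) / (m_hi - m_lo)
            log_ld50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ld50
    raise ValueError("mortality data do not bracket 50%")


# Hypothetical study: doses in mg/kg and fraction of animals that died.
doses = [10.0, 30.0, 100.0, 300.0]
mortality = [0.0, 0.2, 0.7, 1.0]
ld50 = estimate_ld50(doses, mortality)
print(f"estimated LD50 = {ld50:.1f} mg/kg")
```

The single number returned here discards the shape of the dose-response curve and the mechanism of toxicity, which is the loss of information the text refers to.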

Table 13. Toxicity prediction software.

| Software name | Prediction method | Endpoints |
| --- | --- | --- |
| Derek (DfW) | Knowledge-based | Genotoxicity; Carcinogenicity; Chromosome damage; Skin sensitization; Developmental toxicity; Teratogenicity |
| CAESAR | Statistics-based | Mutagenicity; Carcinogenicity; Skin sensitization; Bioconcentration factor; Developmental toxicity |
| ToxBoxes (ACD/Tox Suite) | | hERG; Genotoxicity; Estrogen receptor binding affinity (reproductive toxicity); Eye irritation; Rodent acute lethal toxicity; Aquatic toxicity; Organ-specific health effects |
| Lazar | KNN | Mutagenicity; Liver toxicity; Carcinogenicity; Maximum recommended daily dose |
| TOPKAT | QSAR-based | Mutagenicity; Developmental toxicity; Rodent carcinogenicity; Rat chronic Lowest Observed Adverse Effect Level (LOAEL); Rat Maximum Tolerated Dose (MTD); Rat oral LD50 |
| HazardExpert | Rule-based | Mutagenicity; Carcinogenicity; Teratogenicity; Membrane irritation; Immunotoxicity; Neurotoxicity |
| Toxtree | Decision Tree | Skin irritation; Skin sensitization; Eye irritation; Genotoxicity; Carcinogenicity; P450 drug metabolism |

References

1. Kier, L.; Hall, L. The Kappa Indices for Modeling Molecular Shape and Flexibility. In Topological Indices and Related Descriptors in QSAR and QSPR; Devillers, J., Balaban, A. T., Eds.; Gordon and Breach Science Publishers: Amsterdam, The Netherlands, 1999; pp 455−489.
2. Burden, F. R. Molecular Identification Number for Substructure Searches. J. Chem. Inf. Comput. Sci. 1989, 29, 225–227.
3. Burden, F. R. A Chemically Intuitive Molecular Index Based on the Eigenvalues of a Modified Adjacency Matrix. Quant. Struct.-Act. Relat. 1997, 16, 309–314.
4. Cramer, R. D.; Patterson, D. E.; Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959–5967.
5. Cruciani, G.; Pastor, M.; Guba, W. VolSurf: a New Tool for the Pharmacokinetic Optimization of Lead Compounds. Eur. J. Pharm. Sci. 2000, 11 (Suppl. 2), S29–S39.
6. Pastor, M.; Cruciani, G.; McLay, I.; Pickett, S.; Clementi, S. GRid-INdependent Descriptors (GRIND): a Novel Class of Alignment-Independent Three-Dimensional Molecular Descriptors. J. Med. Chem. 2000, 43, 3233–3243.

7. Cruciani, G.; Crivori, P.; Carrupt, P.; Testa, B. Molecular Fields in Quantitative Structure-Permeation Relationships: the VolSurf Approach. J. Mol. Struct. (Theochem) 2000, 503, 17–30.
8. Cruciani, G.; Pastor, M.; Guba, W. VolSurf: a New Tool for the Pharmacokinetic Optimization of Lead Compounds. Eur. J. Pharm. Sci. 2000, 11, S29–S39.
9. Silverman, B. D.; Platt, D. E. Comparative Molecular Moment Analysis (CoMMA): 3D-QSAR without Molecular Superposition. J. Med. Chem. 1996, 39, 2129–2140.
10. Todeschini, R.; Gramatica, P. New 3D Molecular Descriptors: The WHIM Theory and QSAR Applications. In 3D QSAR in Drug Design; Kubinyi, H., Folkers, G., Martin, Y., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 2002; Vol. 2, pp 355−380.
11. Vedani, A.; Dobler, M. 5D-QSAR: the Key for Simulating Induced Fit? J. Med. Chem. 2002, 45, 2139–2149.
12. Vedani, A.; Dobler, M.; Lill, M. A. Combining Protein Modeling and 6D-QSAR. Simulating the Binding of Structurally Diverse Ligands to the Estrogen Receptor. J. Med. Chem. 2005, 48, 3700–3703.
13. Deconinck, E.; Xu, Q.; Put, R.; Coomans, D.; Massart, D.; Vander Heyden, Y. Prediction of Gastro-Intestinal Absorption Using Multivariate Adaptive Regression Splines. J. Pharm. Biomed. Anal. 2005, 39, 1021–1030.
14. Norinder, U.; Osterberg, T.; Artursson, P. Theoretical Calculation and Prediction of Intestinal Absorption of Drugs in Humans Using Molsurf Parametrization and PLS Statistics. Eur. J. Pharm. Sci. 1999, 8, 49–56.
15. Osterberg, T.; Norinder, U. Prediction of Polar Surface Area and Drug Transport Processes Using Simple Parameters and PLS Statistics. J. Chem. Inf. Comput. Sci. 2000, 40, 1408–1411.
16. Zhao, Y.; Le, J.; Abraham, M.; Hersey, A.; Eddershaw, P.; Luscombe, C.; Butina, D.; Beck, G.; Sherborne, B.; Cooper, I.; Platts, J.; Boutina, D. Evaluation of Human Intestinal Absorption Data and Subsequent Derivation of a Quantitative Structure-Activity Relationship (QSAR) with the Abraham Descriptors. J. Pharm. Sci. 2001, 90, 749–784.
17. Huuskonen, J. QSAR Modeling with the Electrotopological State Indices: Predicting the Toxicity of Organic Chemicals. Chemosphere 2003, 50, 949–953.
18. Bowie, J.; Eisenberg, D. An Evolutionary Approach to Folding Small Alpha-Helical Proteins That Uses Sequence Information and an Empirical Guiding Fitness Function. Proc. Natl. Acad. Sci. 1994, 91, 4436–4440.
19. Fan, Y.; Unwalla, R.; Denny, R. A.; Di, L.; Kerns, E. H.; Diller, D. J.; Humblet, C. Insights for Predicting Blood-Brain Barrier Penetration of CNS Targeted Molecules Using QSPR Approaches. J. Chem. Inf. Model. 2010, 50, 1123–1133.
20. Jones, G.; Willett, P.; Glen, R.; Leach, A.; Taylor, R. Development and Validation of a Genetic Algorithm for Flexible Docking. J. Mol. Biol. 1997, 267, 727–748.

21. Junmei, W.; Peter, A. K. Automatic Parameterization of Force Field by Systematic Search and Genetic Algorithms. J. Comput. Chem. 2001, 22, 1219–1228.
22. Kalhapure, R. S.; Salunke, C. L.; Akamanchi, K. G. QSAR Model for Chemical Penetration Enhancers Containing Long Hydrocarbon Chain. Chemometr. Intell. Lab. Syst. 2012, 118, 267–270.
23. Mungalpara, J.; Pandey, A.; Jain, V.; Mohan, C. G. Molecular Modelling and QSAR Analysis of Some Structurally Diverse N-type Calcium Channel Blockers. J. Mol. Biol. 2010, 16, 629–644.
24. Rogers, D.; Hopfinger, A. J. Application of Genetic Function Approximation to Quantitative Structure-Activity Relationships and Quantitative Structure-Property Relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854–866.
25. Su, B.-H.; Shen, M.-y.; Esposito, E. X.; Hopfinger, A. J.; Tseng, Y. J. In Silico Binary Classification QSAR Models Based on 4D-Fingerprints and MOE Descriptors for Prediction of hERG Blockage. J. Chem. Inf. Model. 2010, 50, 1304–1318.
26. Vapnik, V. Statistical Learning Theory; Wiley: New York, 1998.
27. Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin, 2000.
28. Yong, L. X.; Donald, E. W. Genetic Algorithms for Docking of Actinomycin D and Deoxyguanosine Molecules with Comparison to the Crystal Structure of Actinomycin D-Deoxyguanosine Complex. J. Phys. Chem. 1994, 98, 7191–7200.
29. Friedman, J. H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–67.
30. Holland, J. H. Adaptation in Natural and Artificial Systems, 2nd ed.; University of Michigan Press: Ann Arbor, MI, 1975.
31. Su, B.-H.; Shen, M.-y.; Esposito, E.; Hopfinger, A.; Tseng, Y. In Silico Binary Classification QSAR Models Based on 4D-Fingerprints and MOE Descriptors for Prediction of hERG Blockage. J. Chem. Inf. Model. 2010, 50, 1304–1318.
32. Tian, S.; Li, Y.; Wang, J.; Zhang, J.; Hou, T. ADME Evaluation in Drug Discovery. 9. Prediction of Oral Bioavailability in Humans Based on Molecular Properties and Structural Fingerprints. Mol. Pharmaceutics 2011, 8, 841–851.
33. Bikadi, Z.; Hazai, I.; Malik, D.; Jemnitz, K.; Veres, Z.; Hari, P.; Ni, Z.; Loo, T. W.; Clarke, D. M.; Hazai, E.; Mao, Q. Predicting P-Glycoprotein-Mediated Drug Transport Based on Support Vector Machine and Three-Dimensional Crystal Structure of P-glycoprotein. PLoS ONE 2011, 6, e25815.
34. Lind, P.; Maltseva, T. Support Vector Machines for the Estimation of Aqueous Solubility. J. Chem. Inf. Comput. Sci. 2003, 43, 1855–1859.
35. Shen, M.-y.; Su, B.-H.; Esposito, E. X.; Hopfinger, A. J.; Tseng, Y. J. A Comprehensive Support Vector Machine Binary hERG Classification Model Based on Extensive but Biased End Point hERG Data Sets. Chem. Res. Toxicol. 2011, 24, 934–949.
36. Wenqi, Y.; Widmer, N.; De Micheli, G. Personalized Modeling for Drug Concentration Prediction Using Support Vector Machine. Proceedings of the 4th International Conference on Biomedical Engineering and Informatics (BMEI); 15−17 Oct. 2011; IEEE: 2011; pp 1505−1509.
37. Farhad, G.; Ali, E.; Amir, H. M.; Dominique, R. Representation/Prediction of Solubilities of Pure Compounds in Water Using Artificial Neural Network−Group Contribution Method. J. Chem. Eng. Data 2011, 56, 720–726.
38. Fatemi, M. H.; Heidari, A.; Ghorbanzade, M. Prediction of Aqueous Solubility of Drug-Like Compounds by Using an Artificial Neural Network and Least-Squares Support Vector Machine. Bull. Chem. Soc. Jpn. 2010, 83, 1338–1345.
39. Karelson, M.; Dobchev, D. Using Artificial Neural Networks to Predict Cell-Penetrating Compounds. Expert Opin. Drug Discovery 2011, 6, 783–796.
40. Myint, K.-Z.; Wang, L.; Tong, Q.; Xie, X.-Q. Molecular Fingerprint-Based Artificial Neural Networks QSAR for Ligand Biological Activity Predictions. Mol. Pharmaceutics 2012, 9, 2912–2923.
41. Paixão, P.; Gouveia, L. F.; Morais, J. A. G. Prediction of the in Vitro Permeability Determined in Caco-2 Cells by Using Artificial Neural Networks. Eur. J. Pharm. Sci. 2010, 41, 107–117.
42. Hou, T. J.; Wang, J. M.; Xu, X. J. Applications of Genetic Algorithms on the Structure-Activity Correlation Study of a Group of Non-nucleoside HIV-1 Inhibitors. Chemometr. Intell. Lab. Syst. 1999, 45, 303–310.
43. Xiang, T.; Anderson, B. Influence of Chain Ordering on the Selectivity of Dipalmitoylphosphatidylcholine Bilayer Membranes for Permeant Size and Shape. Biophys. J. 1998, 75, 2658–2671.
44. Xiang, T.-x.; Anderson, B. A Computer Simulation of Functional Group Contributions to Free Energy in Water and a DPPC Lipid Bilayer. Biophys. J. 2002, 82, 2052–2066.
45. Wassermann, A. M.; Bajorath, J. BindingDB and ChEMBL: Online Compound Databases for Drug Discovery. Expert Opin. Drug Discovery 2011, 6, 683–687.
46. Irwin, J. J.; Shoichet, B. K. ZINC - a Free Database of Commercially Available Compounds for Virtual Screening. J. Chem. Inf. Model. 2005, 45, 177–182.
47. Masciocchi, J.; Frau, G.; Fanton, M.; Sturlese, M.; Floris, M.; Pireddu, L.; Palla, P.; Cedrati, F.; Rodriguez-Tomé, P.; Moro, S. MMsINC: a Large-Scale Chemoinformatics Database. Nucleic Acids Res. 2009, 37, 90.
48. Pence, H. E.; Williams, A. ChemSpider: an Online Chemical Information Resource. J. Chem. Educ. 2010, 87, 1123–1124.
49. Hou, T.; Wang, J. Structure-ADME Relationship: Still a Long Way to Go? Expert Opin. Drug Metab. Toxicol. 2008, 4, 759–770.
50. Tetko, I.; Bruneau, P.; Mewes, H.-W.; Rohrer, D.; Poda, G. Can We Estimate the Accuracy of ADME-Tox Predictions? Drug Discovery Today 2006, 11, 700–707.
51. Wang, J.; Hou, T. Recent Advances on in silico ADME Modeling. Annu. Rep. Comput. Chem. 2009, 5, 101–127.
52. Lipinski, C. Drug-Like Properties and the Causes of Poor Solubility and Poor Permeability. J. Pharmacol. Toxicol. Methods 2000, 44, 235–249.

53. Wang, J.; Krudy, G.; Hou, T.; Zhang, W.; Holland, G.; Xu, X. Development of Reliable Aqueous Solubility Models and Their Application in Druglike Analysis. J. Chem. Inf. Model. 2007, 47, 1395–1404.
54. Jorgensen, W.; Duffy, E. Prediction of Drug Solubility from Structure. Adv. Drug Delivery Rev. 2002, 54, 355–366.
55. Lipinski, C.; Lombardo, F.; Dominy, B.; Feeney, P. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2001, 46, 3–26.
56. Llinàs, A.; Glen, R. C.; Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008, 48, 1289–1303.
57. Hopfinger, A. J.; Esposito, E. X.; Llinàs, A.; Glen, R. C.; Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. J. Chem. Inf. Model. 2008, 49, 1–5.
58. Butina, D.; Gola, J. Modeling Aqueous Solubility. J. Chem. Inf. Comput. Sci. 2003, 43, 837–841.
59. Delaney, J. S. Predicting Aqueous Solubility from Structure. Drug Discovery Today 2005, 10, 289–295.
60. Du-Cuny, L.; Huwyler, J.; Wiese, M.; Kansy, M. Computational Aqueous Solubility Prediction for Drug-Like Compounds in Congeneric Series. Eur. J. Med. Chem. 2008, 43, 501–512.
61. Engkvist, O.; Wrede, P. High-Throughput, in Silico Prediction of Aqueous Solubility Based on One- and Two-Dimensional Descriptors. J. Chem. Inf. Comput. Sci. 2002, 42, 1247–1249.
62. Hansen, N. T.; Kouskoumvekaki, I.; Jorgensen, F. S.; Brunak, S.; Jonsdottir, S. O. Prediction of pH-Dependent Aqueous Solubility of Druglike Molecules. J. Chem. Inf. Model. 2006, 46, 2601–2609.
63. Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. Why Are Some Properties More Difficult to Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008, 48, 220–232.
64. Klamt, A.; Eckert, F.; Hornig, M.; Beck, M.; Bürger, T. Prediction of Aqueous Solubility of Drugs and Pesticides with COSMO-RS. J. Comput. Chem. 2002, 23, 275–281.
65. Obrezanova, O.; Gola, J. M.; Champness, E. J.; Segall, M. D. Automatic QSAR Modeling of ADME Properties: Blood-Brain Barrier Penetration and Aqueous Solubility. J. Comput. Aided Mol. Des. 2008, 22, 431–440.
66. Tetko, I.; Tanchuk, V.; Kasheva, T.; Villa, A. Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices. J. Chem. Inf. Comput. Sci. 2001, 41, 1488–1493.
67. Votano, J.; Parham, M.; Hall, L.; Kier, L.; Oloff, S.; Tropsha, A.; Xie, Q.; Tong, W. Three New Consensus QSAR Models for the Prediction of Ames Genotoxicity. Mutagenesis 2004, 19, 365–377.
68. Wang, J.; Hou, T.; Xu, X. Aqueous Solubility Prediction Based on Weighted Atom Type Counts and Solvent Accessible Surface Areas. J. Chem. Inf. Model. 2009, 49, 571–581.

Downloaded by CORNELL UNIV on October 27, 2016 | http://pubs.acs.org Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch014

69. Wegner, J.; Zell, A. Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method. J. Chem. Inf. Comput. Sci. 2003, 43, 1077–1084.
70. Yan, A.; Gasteiger, J. Prediction of Aqueous Solubility of Organic Compounds Based on a 3D Structure Representation. J. Chem. Inf. Comput. Sci. 2003, 43, 429–434.
71. Huuskonen, J. Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. J. Chem. Inf. Comput. Sci. 2000, 40, 773–777.
72. Yalkowsky, S. H.; Dannenfelser, R. M. The ARIZONA dATAbASE of Aqueous Solubility; College of Pharmacy, University of Arizona: Tucson, AZ, 1990.
73. Syracuse Research Corporation. Physical/Chemical Property Database (PHYSPROP); SRC Environmental Science Center: Syracuse, NY, 1994.
74. Pham The, H.; González-Álvarez, I.; Bermejo, M.; Mangas Sanjuan, V.; Centelles, I.; Garrigues, T. M.; Cabrera-Pérez, M. Á. In Silico Prediction of Caco-2 Cell Permeability by a Classification QSAR Approach. Mol. Inf. 2011, 30, 376–385.
75. Castillo-Garit, J. A.; Marrero-Ponce, Y.; Torrens, F.; García-Domenech, R. Estimation of ADME Properties in Drug Discovery: Predicting Caco-2 Cell Permeability Using Atom-Based Stochastic and Non-Stochastic Linear Indices. J. Pharm. Sci. 2008, 97, 1946–1976.
76. Nordqvist, A.; Nilsson, J.; Lindmark, T.; Eriksson, A.; Garberg, P.; Kihlén, M. A General Model for Prediction of Caco-2 Cell Permeability. QSAR Comb. Sci. 2004, 23, 303–310.
77. Sherer, E. C.; Verras, A.; Madeira, M.; Hagmann, W. K.; Sheridan, R. P.; Roberts, D.; Bleasby, K.; Cornell, W. D. QSAR Prediction of Passive Permeability in the LLC-PK1 Cell Line: Trends in Molecular Properties and Cross-Prediction of Caco-2 Permeabilities. Mol. Inf. 2012, 31, 231–245.
78. Han, C.; Zhang, J.; Zheng, M.; Xiao, Y.; Li, Y.; Liu, G. An Integrated Drug-Likeness Study for Bicyclic Privileged Structures: From Physicochemical Properties to in Vitro ADME Properties. Mol. Diversity 2011, 15, 857–876.
79. Yamashita, F.; Fujiwara, S.-I.; Wanchana, S.; Hashida, M. Quantitative Structure/Activity Relationship Modelling of Pharmacokinetic Properties Using Genetic Algorithm-Combined Partial Least Squares Method. J. Drug Targeting 2006, 14, 496–504.
80. Zhang, L.; Balimane, P. V.; Johnson, S. R.; Chong, S. Development of an In Silico Model for Predicting Efflux Substrates in Caco-2 Cells. Int. J. Pharm. 2007, 343, 98–105.
81. Hou, T. J.; Zhang, W.; Xia, K.; Qiao, X. B.; Xu, X. J. ADME Evaluation in Drug Discovery. 5. Correlation of Caco-2 Permeation with Simple Molecular Properties. J. Chem. Inf. Comput. Sci. 2004, 44, 1585–1600.
82. Santos-Filho, O. A.; Hopfinger, A. J. Combined 4D-Fingerprint and Clustering Based Membrane-Interaction QSAR Analyses for Constructing Consensus Caco-2 Cell Permeation Virtual Screens. J. Pharm. Sci. 2008, 97, 566–583.


83. Fenza, A.; Alagona, G.; Ghio, C.; Leonardi, R.; Giolitti, A.; Madami, A. Caco-2 cell permeability modelling: a neural network coupled genetic algorithm approach. J. Comput. Aided Mol. Des. 2007, 21, 207–221.
84. Ma, G.; Cheng, Y. Predicting Caco-2 Permeability Using Support Vector Machine and Chemistry Development Kit. J. Pharm. Pharmaceut. Sci. 2006, 9, 210–221.
85. Kulkarni, A.; Han, Y.; Hopfinger, A. J. Predicting Caco-2 Cell Permeation Coefficients of Organic Molecules Using Membrane-Interaction QSAR Analysis. J. Chem. Inf. Comput. Sci. 2002, 42, 331–342.
86. Zhao, Y. H.; Abraham, M. H.; Ibrahim, A.; Fish, P. V.; Cole, S.; Lewis, M. L.; de Groot, M. J.; Reynolds, D. P. Predicting Penetration Across the Blood-Brain Barrier from Simple Descriptors and Fragmentation Schemes. J. Chem. Inf. Model. 2007, 47, 170–175.
87. Lu, J. A Novel Hypothesis of Blood-Brain Barrier (BBB) Development and in Vitro BBB Model: Neural Stem Cell Is the Driver of BBB Formation and Maintenance. J. Exp. Integr. Med. 2012, 2, 39–43.
88. Cucullo, L.; Aumayr, B.; Rapp, E.; Janigro, D. Drug Delivery and in Vitro Models of the Blood-Brain Barrier. Curr. Opin. Drug Discovery Dev. 2005, 8, 89–99.
89. Suenderhauf, C.; Hammann, F.; Huwyler, J. Computational Prediction of Blood-Brain Barrier Permeability Using Decision Tree Induction. Molecules 2012, 17, 10429–10445.
90. Martins, I.; Teixeira, A.; Pinheiro, L.; Falcao, A. A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling. J. Chem. Inf. Model. 2012, 52, 1686–1697.
91. Muehlbacher, M.; Spitzer, G.; Liedl, K.; Kornhuber, J. Qualitative Prediction of Blood-Brain Barrier Permeability on a Large and Refined Dataset. J. Comput. Aided Mol. Des. 2011, 25, 1095–1106.
92. Vilar, S.; Chakrabarti, M.; Costanzi, S. Prediction of Passive Blood-Brain Partitioning: Straightforward and Effective Classification Models Based on in Silico Derived Physicochemical Descriptors. J. Mol. Graphics Modell. 2010, 28, 899–903.
93. Narayanan, R.; Gunturi, S. B. In Silico ADME Modelling: Prediction Models for Blood-Brain Barrier Permeation Using a Systematic Variable Selection Method. Bioorg. Med. Chem. 2005, 13, 3017–3028.
94. Kortagere, S.; Chekmarev, D.; Welsh, W. J.; Ekins, S. New Predictive Models for Blood-Brain Barrier Permeability of Drug-like Molecules. Pharm. Res. 2008, 25, 1836–1845.
95. Garg, P.; Verma, J. In Silico Prediction of Blood Brain Barrier Permeability: An Artificial Neural Network Model. J. Chem. Inf. Model. 2005, 46, 289–297.
96. Hakkarainen, J. J.; Pajander, J.; Laitinen, R.; Suhonen, M.; Forsberg, M. M. Similar Molecular Descriptors Determine the in Vitro Drug Permeability in Endothelial and Epithelial Cells. Int. J. Pharm. 2012, 436, 426–443.
97. Konovalov, D.; Coomans, D.; Deconinck, E.; Heyden, Y. Benchmarking of QSAR Models for Blood-Brain Barrier Permeation. J. Chem. Inf. Model. 2007, 47, 1648–1656.


98. Wichmann, K.; Diedenhofen, M.; Klamt, A. Prediction of Blood-Brain Partitioning and Human Serum Albumin Binding Based on COSMO-RS σ-Moments. J. Chem. Inf. Model. 2006, 47, 228–233.
99. Bolboaca, S. D.; Jantschi, L. Predictivity Approach for Quantitative Structure-Property Models. Application for Blood-Brain Barrier Permeation of Diverse Drug-Like Compounds. Int. J. Mol. Sci. 2011, 12, 4348–4364.
100. Scott, E. R.; Phipps, J. B.; White, H. S. Direct Imaging of Molecular Transport through Skin. J. Invest. Dermatol. 1995, 104, 142–145.
101. Barry, B. W. Mode of Action of Penetration Enhancers in Human Skin. J. Controlled Release 1987, 6, 85–97.
102. Kim, N.; El-Kattan, A. F.; Asbill, C. S.; Kennette, R. J.; Sowell, J. W.; Latour, R.; Michniak, B. B. Evaluation of Derivatives of 3-(2-Oxo-1-Pyrrolidine) Hexahydro-1H-Azepine-2-One as Dermal Penetration Enhancers: Side Chain Length Variation and Molecular Modeling. J. Controlled Release 2001, 73, 183–196.
103. Kanikkannan, N.; Kandimalla, K.; Lamba, S. S.; Singh, M. Structure-Activity Relationship of Chemical Penetration Enhancers in Transdermal Drug Delivery. Curr. Med. Chem. 2000, 7, 593–608.
104. Iyer, M.; Zheng, T.; Hopfinger, A. J.; Tseng, Y. J. QSAR Analyses of Skin Penetration Enhancers. J. Chem. Inf. Model. 2007, 47, 1130–1149.
105. Zheng, T.; Hopfinger, A. J.; Esposito, E. X.; Liu, J.; Tseng, Y. J. Membrane-Interaction Quantitative Structure-Activity Relationship (MI-QSAR) Analyses of Skin Penetration Enhancers. J. Chem. Inf. Model. 2008, 48, 1238–1256.
106. Iyer, M.; Mishra, R.; Han, Y.; Hopfinger, A. J. Predicting Blood-Brain Barrier Partitioning of Organic Molecules Using Membrane-Interaction QSAR Analysis. Pharm. Res. 2002, 19, 1611–1621.
107. Yerramsetty, K. M.; Neely, B. J.; Madihally, S. V.; Gasem, K. A. M. A Skin Permeability Model of Insulin in the Presence of Chemical Penetration Enhancer. Int. J. Pharm. 2010, 388, 13–23.
108. Engelhard, V. H. How Cells Process Antigens. Sci. Am. 1994, 271, 54–61.
109. Moore, T. A.; Mantulin, W. W.; Song, P. S. Excited-States and Reactivity of Carcinogenic Benzpyrene - Comparison with Skin-Sensitizing Coumarins. Photochem. Photobiol. 1973, 18, 185–194.
110. Mantulin, W. W.; Song, P. S. Excited-States of Skin-Sensitizing Coumarins and Psoralens - Spectroscopic Studies. J. Am. Chem. Soc. 1973, 95, 5122–5129.
111. Ou, C.-N.; Tsai, C.-H.; Song, P.-S. In Research in Photobiology; Castellani, A., Ed.; Plenum Press: New York, 1977.
112. Wondrak, G. T.; Jacobson, M. K.; Jacobson, E. L. Identification of Quenchers of Photoexcited States as Novel Agents for Skin Photoprotection. J. Pharmacol. Exp. Ther. 2005, 312, 482–491.
113. Li, X. Y.; Eriksson, L. A. Photoreaction of Skin-Sensitizing Trimethyl Psoralen with Lipid Membrane Models. Photochem. Photobiol. 2005, 81, 1153–1160.
114. Pan, D. H.; Iyer, M.; Liu, J. Z.; Li, Y.; Hopfinger, A. J. Constructing Optimum Blood Brain Barrier QSAR Models Using a Combination of 4D-Molecular Similarity Measures and Cluster Analysis. J. Chem. Inf. Comput. Sci. 2004, 44, 2083–2098.
115. Hopfinger, A. J.; Wang, S.; Tokarski, J. S.; Jin, B. Q.; Albuquerque, M.; Madhav, P. J.; Duraiswami, C. Construction of 3D-QSAR Models Using the 4D-QSAR Analysis Formalism. J. Am. Chem. Soc. 1997, 119, 10509–10524.
116. Senese, C. L.; Duca, J.; Pan, D.; Hopfinger, A. J.; Tseng, Y. J. 4D-fingerprints, Universal QSAR and QSPR Descriptors. J. Chem. Inf. Comput. Sci. 2004, 44, 1526–1539.
117. Kulkarni, A.; Hopfinger, A. J.; Osborne, R.; Bruner, L. H.; Thompson, E. D. Prediction of Eye Irritation from Organic Chemicals Using Membrane-Interaction QSAR Analysis. Toxicol. Sci. 2001, 59, 335–345.
118. Kodithala, K.; Hopfinger, A. J.; Thompson, E. D.; Robinson, M. K. Prediction of Skin Irritation from Organic Chemicals Using Membrane-Interaction QSAR Analysis. Toxicol. Sci. 2002, 66, 336–346.
119. Li, Y.; Tseng, Y. J.; Pan, D. H.; Liu, J. Z.; Kern, P. S.; Gerberick, G. F.; Hopfinger, A. J. 4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on the Classification of Local Lymph Node Assay Measures. Chem. Res. Toxicol. 2007, 20, 114–128.
120. Kimber, I.; Basketter, D. A. The Murine Local Lymph-Node Assay - a Commentary on Collaborative Studies and New Directions. Food Chem. Toxicol. 1992, 30, 165–169.
121. Li, Y.; Pan, D.; Liu, J.; Kern, P. S.; Gerberick, G. F.; Hopfinger, A. J.; Tseng, Y. J. Categorical QSAR Models for Skin Sensitization Based upon Local Lymph Node Assay Classification Measures Part 2: 4D-Fingerprint Three-State and Two-2-State Logistic Regression Models. Toxicol. Sci. 2007, 99, 532–544.
122. Liu, J.; Kern, P. S.; Gerberick, G. F.; Santos-Filho, O. A.; Esposito, E. X.; Hopfinger, A. J.; Tseng, Y. J. Categorical QSAR Models for Skin Sensitization Based on Local Lymph Node Assay Measures and Both Ground and Excited State 4D-Fingerprint Descriptors. J. Comput. Aided Mol. Des. 2008, 22, 345–366.
123. Li, Y.; Tseng, Y. J.; Pan, D.; Liu, J.; Kern, P. S.; Gerberick, G. F.; Hopfinger, A. J. 4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on the Classification of Local Lymph Node Assay Measures. Chem. Res. Toxicol. 2007, 20, 114–128.
124. Lynch, T.; Price, A. The Effect of Cytochrome P450 Metabolism on Drug Response, Interactions, and Adverse Effects. Am. Fam. Physician 2007, 76, 391–396.
125. Wilkinson, G. R. Drug Therapy - Drug Metabolism and Variability Among Patients in Drug Response. N. Engl. J. Med. 2005, 352, 2211–2221.
126. Smith, H. S. Opioid Metabolism. Mayo Clin. Proc. 2009, 84, 613–624.
127. Friedman, S. E.; Grendell, J. H.; McQuaid, K. R. Current Diagnosis & Treatment in Gastroenterology; Lange Medical Books/McGraw-Hill: New York, 2003.
128. Pandit, A.; Sachdeva, T.; Bafna, P. Drug-Induced Hepatotoxicity: A Review. J. Appl. Pharm. Sci. 2012, 2, 233–243.


129. Sridhar, J.; Liu, J. W.; Foroozesh, M.; Stevens, C. L. K. Insights on Cytochrome P450 Enzymes and Inhibitors Obtained Through QSAR Studies. Molecules 2012, 17, 9283–9305.
130. Roy, K.; Roy, P. P. QSAR of Cytochrome Inhibitors. Expert Opin. Drug Metab. Toxicol. 2009, 5, 1245–1266.
131. Lewis, D. F. V.; Lake, B. G.; Dickins, M. Quantitative Structure-Activity Relationships (QSARs) in Inhibitors of Various Cytochromes P450: The Importance of Compound Lipophilicity. J. Enzyme Inhib. Med. Chem. 2007, 22, 1–6.
132. Lewis, D. F. V.; Modi, S.; Dickins, M. Structure-Activity Relationship for Human Cytochrome P450 Substrates and Inhibitors. Drug Metab. Rev. 2002, 34, 69–82.
133. Ekins, S.; De Groot, M. J.; Jones, J. P. Pharmacophore and Three-Dimensional Quantitative Structure Activity Relationship Methods for Modeling Cytochrome P450 Active Sites. Drug Metab. Dispos. 2001, 29, 936–944.
134. Gleeson, M. P.; Davis, A. M.; Chohan, K. K.; Paine, S. W.; Boyer, S.; Gavaghan, C. L.; Arnby, C. H.; Kankkonen, C.; Albertson, N. Generation of In-Silico Cytochrome P450 1A2, 2C9, 2C19, 2D6, and 3A4 Inhibition QSAR Models. J. Comput. Aided Mol. Des. 2007, 21, 559–573.
135. Miller, G. P. Advances in the Interpretation and Prediction of CYP2E1 Metabolism from a Biochemical Perspective. Expert Opin. Drug Metab. Toxicol. 2008, 4, 1053–1064.
136. Hammann, F.; Gutmann, H.; Baumann, U.; Helma, C.; Drewe, J. Classification of Cytochrome P450 Activities Using Machine Learning Methods. Mol. Pharmaceutics 2009, 6, 1920–1926.
137. Cheng, F.; Yu, Y.; Shen, J.; Yang, L.; Li, W.; Liu, G.; Lee, P. W.; Tang, Y. Classification of Cytochrome P450 Inhibitors and Noninhibitors Using Combined Classifiers. J. Chem. Inf. Model. 2011, 51, 996–1011.
138. Burton, J.; Ijjaali, I.; Barberan, O.; Petitet, F.; Vercauteren, D. P.; Michel, A. Recursive Partitioning for the Prediction of Cytochromes P450 2D6 and 1A2 Inhibition: Importance of the Quality of the Dataset. J. Med. Chem. 2006, 49, 6231–6240.
139. Jensen, B. F.; Vind, C.; Padkjær, S. B.; Brockhoff, P. B.; Refsgaard, H. H. F. In Silico Prediction of Cytochrome P450 2D6 and 3A4 Inhibition Using Gaussian Kernel Weighted k-Nearest Neighbor and Extended Connectivity Fingerprints, Including Structural Fragment Analysis of Inhibitors versus Noninhibitors. J. Med. Chem. 2007, 50, 501–511.
140. Novotarskyi, S.; Sushko, I.; Körner, R.; Pandey, A. K.; Tetko, I. V. A Comparison of Different QSAR Approaches to Modeling CYP450 1A2 Inhibition. J. Chem. Inf. Model. 2011, 51, 1271–1280.
141. Michielan, L.; Terfloth, L.; Gasteiger, J.; Moro, S. Comparison of Multilabel and Single-Label Classification Applied to the Prediction of the Isoform Specificity of Cytochrome P450 Substrates. J. Chem. Inf. Model. 2009, 49, 2588–2605.


142. Mishra, N. K.; Agarwal, S.; Raghava, G. P. Prediction of Cytochrome P450 Isoform Responsible for Metabolizing a Drug Molecule. BMC Pharmacol. 2010, 10, 1–9.
143. Rostkowski, M.; Spjuth, O.; Rydberg, P. WhichCyp: Prediction of Cytochromes P450 Inhibition. Bioinformatics 2013, 29, 2051–2052.
144. Sun, H.; Veith, H.; Xia, M.; Austin, C. P.; Huang, R. Predictive Models for Cytochrome P450 Isozymes Based on Quantitative High Throughput Screening Data. J. Chem. Inf. Model. 2011, 51, 2474–2481.
145. Chang, C. Y.; Hsu, M. T.; Esposito, E. X.; Tseng, Y. J. Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods. J. Chem. Inf. Model. 2013, 53, 958–971.
146. Shao, C. Y.; Su, B. H.; Tu, Y. S.; Lin, C.; Lin, O. A.; Tseng, Y. F. J. CypRules: a Rule-based P450 Inhibition Prediction Server. Bioinformatics 2015, 31, 1869–1871.
147. Su, B. H.; Tu, Y. S.; Lin, C.; Shao, C. Y.; Lin, O. A.; Tseng, Y. J. Rule-Based Prediction Models of Cytochrome P450 Inhibition. J. Chem. Inf. Model. 2015, 55, 1426–1434.
148. Shamovsky, I.; Ripa, L.; Borjesson, L.; Mee, C.; Norden, B.; Hansen, P.; Hasselgren, C.; O’Donovan, M.; Sjo, P. Explanation for Main Features of Structure-Genotoxicity Relationships of Aromatic Amines by Theoretical Studies of Their Activation Pathways in CYP1A2. J. Am. Chem. Soc. 2011, 133, 16168–16185.
149. He, S. M.; Zhou, Z. W.; Li, X. T.; Zhou, S. F. Clinical Drugs Undergoing Polymorphic Metabolism by Human Cytochrome P450 2C9 and the Implication in Drug Development. Curr. Med. Chem. 2011, 18, 667–713.
150. Zhang, T.; Chen, Q.; Li, L.; Liu, L. A.; Wei, D. Q. In Silico Prediction of Cytochrome P450-Mediated Drug Metabolism. Comb. Chem. High Throughput Screening 2011, 14, 388–395.
151. Tarcsay, A.; Keseru, G. M. In Silico Site of Metabolism Prediction of Cytochrome P450-Mediated Biotransformations. Expert Opin. Drug Metab. Toxicol. 2011, 7, 299–312.
152. Kirchmair, J.; Williamson, M. J.; Tyzack, J. D.; Tan, L.; Bond, P. J.; Bender, A.; Glen, R. C. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J. Chem. Inf. Model. 2012, 52, 617–648.
153. Rydberg, P.; Vasanthanathan, P.; Oostenbrink, C.; Olsen, L. Fast Prediction of Cytochrome P450 Mediated Drug Metabolism. ChemMedChem 2009, 4, 2070–2079.
154. Rydberg, P.; Gloriam, D. E.; Olsen, L. The SMARTCyp Cytochrome P450 Metabolism Prediction Server. Bioinformatics 2010, 26, 2988–2989.
155. Rydberg, P.; Hansen, S. M.; Kongsted, J.; Norrby, P. O.; Olsen, L.; Ryde, U. Transition-State Docking of Flunitrazepam and Progesterone in Cytochrome P450. J. Chem. Theory Comput. 2008, 4, 673–681.
156. Saraceno, M.; Massarelli, I.; Imbriani, M.; James, T. L.; Bianucci, A. M. Optimizing QSAR Models for Predicting Ligand Binding to the Drug-Metabolizing Cytochrome P450 Isoenzyme CYP2D6. Chem. Biol. Drug Des. 2011, 78, 236–251.
157. Rydberg, P.; Gloriam, D. E.; Zaretzki, J.; Breneman, C.; Olsen, L. SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism. ACS Med. Chem. Lett. 2010, 1, 96–100.
158. Zaretzki, J.; Bergeron, C.; Rydberg, P.; Huang, T. W.; Bennett, K. P.; Breneman, C. M. RS-Predictor: A New Tool for Predicting Sites of Cytochrome P450-Mediated Metabolism Applied to CYP 3A4. J. Chem. Inf. Model. 2011, 51, 1667–1689.
159. Pelkonen, O.; Turpeinen, M.; Raunio, H. In Vivo-In Vitro-In Silico Pharmacokinetic Modelling in Drug Development: Current Status and Future Directions. Clin. Pharmacokinet. 2011, 50, 483–491.
160. Czodrowski, P.; Kriegl, J. M.; Scheuerer, S.; Fox, T. Computational Approaches to Predict Drug Metabolism. Expert Opin. Drug Metab. Toxicol. 2009, 5, 15–27.
161. de Graaf, C.; Pospisil, P.; Pos, W.; Folkers, G.; Vermeulen, N. P. E. Binding Mode Prediction of Cytochrome P450 and Thymidine Kinase Protein-Ligand Complexes by Consideration of Water and Rescoring in Automated Docking. J. Med. Chem. 2005, 48, 2308–2318.
162. Stjernschantz, E.; Vermeulen, N. P. E.; Oostenbrink, C. Computational Prediction of Drug Binding and Rationalisation of Selectivity Towards Cytochromes P450. Expert Opin. Drug Metab. Toxicol. 2008, 4, 513–527.
163. Vaz, R. J.; Zamora, I.; Li, Y.; Reiling, S.; Shen, J. A.; Cruciani, G. The Challenges of in Silico Contributions to Drug Metabolism in Lead Optimization. Expert Opin. Drug Metab. Toxicol. 2010, 6, 851–861.
164. Sun, H.; Scott, D. O. Structure-Based Drug Metabolism Predictions for Drug Design. Chem. Biol. Drug Des. 2009, 75, 3–17.
165. Kirchmair, J.; Williamson, M. J.; Tyzack, J. D.; Tan, L.; Bond, P. J.; Bender, A.; Glen, R. C. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J. Chem. Inf. Model. 2012, 52, 617–648.
166. Vasanthanathan, P.; Hritz, J.; Taboureau, O.; Olsen, L.; Jorgensen, F. S.; Vermeulen, N. P. E.; Oostenbrink, C. Virtual Screening and Prediction of Site of Metabolism for Cytochrome P450 1A2 Ligands. J. Chem. Inf. Model. 2009, 49, 43–52.
167. Moors, S. L.; Vos, A. M.; Cummings, M. D.; Van Vlijmen, H.; Ceulemans, A. Structure-Based Site of Metabolism Prediction for Cytochrome P450 2D6. J. Med. Chem. 2011, 54, 6098–6105.
168. Cruciani, G.; Carosati, E.; De Boeck, B.; Ethirajulu, K.; Mackie, C.; Howe, T.; Vianello, R. MetaSite: Understanding Metabolism in Human Cytochromes from the Perspective of the Chemist. J. Med. Chem. 2005, 48, 6970–6979.
169. Oh, W. S.; Kim, D. N.; Jung, J.; Cho, K. H.; No, K. T. New Combined Model for the Prediction of Regioselectivity in Cytochrome P450/3A4 Mediated Metabolism. J. Chem. Inf. Model. 2008, 48, 591–601.
170. Campagna-Slater, V.; Pottel, J.; Therrien, E.; Cantin, L. D.; Moitessier, N. Development of a Computational Tool to Rival Experts in the Prediction of Sites of Metabolism of Xenobiotics by P450s. J. Chem. Inf. Model. 2012, 52, 2471–2483.
171. StarDrop; Optibrium Ltd.: Cambridge, U.K., 2014.
172. Afzelius, L.; Arnby, C. H.; Broo, A.; Carlsson, L.; Isaksson, C.; Jurva, U.; Kjellander, B.; Kolmodin, K.; Nilsson, K.; Raubacher, F.; Weidolf, L. State-of-the-art Tools for Computational Site of Metabolism Predictions: Comparative Analysis, Mechanistical Insights, and Future Applications. Drug Metab. Rev. 2007, 39, 61–86.
173. Brown, A. M. Drugs, hERG and Sudden Death. Cell. Physiol. Biochem. 2004, 35, 543–547.
174. Pearlstein, R. A.; Vaz, R. J.; Kang, J.; Chen, X. L.; Preobrazhenskaya, M.; Shchekotikhin, A. E.; Korolev, A. M.; Lysenkova, L. N.; Miroshnikova, O. V.; Hendrix, J.; Rampe, D. Characterization of hERG Potassium Channel Inhibition Using CoMSiA 3D QSAR and Homology Modeling Approaches. Bioorg. Med. Chem. Lett. 2003, 13, 1829–1835.
175. Recanatini, M.; Poluzzi, E.; Masetti, M.; Cavalli, A.; De Ponti, F. QT Prolongation Through hERG K(+) Channel Blockade: Current Knowledge and Strategies for the Early Prediction During Drug Development. Med. Res. Rev. 2005, 25, 133–166.
176. Sanguinetti, M. C.; Jiang, C.; Curran, M. E.; Keating, M. T. A Mechanistic Link Between an Inherited and an Acquired Cardiac Arrhythmia: hERG Encodes the IKr Potassium Channel. Cell 1995, 81, 299–307.
177. Aptula, A.; Cronin, M. Prediction of hERG K+ Blocking Potency: Application of Structural Knowledge. SAR QSAR Environ. Res. 2004, 15, 399–411.
178. Cianchetta, G.; Li, Y.; Kang, J.; Rampe, D.; Fravolini, A.; Cruciani, G.; Vaz, R. Predictive Models for hERG Potassium Channel Blockers. Bioorg. Med. Chem. Lett. 2005, 15, 3637–3642.
179. Coi, A.; Massarelli, I.; Murgia, L.; Saraceno, M.; Calderone, V.; Bianucci, A. Prediction of hERG Potassium Channel Affinity by the CODESSA Approach. Bioorg. Med. Chem. 2006, 14, 3153–3159.
180. Obrezanova, O.; Csanyi, G.; Gola, J. M. R.; Segall, M. D. Gaussian Processes: A Method for Automatic QSAR Modeling of ADME Properties. J. Chem. Inf. Model. 2007, 47, 1847–1857.
181. Chen, X.; Li, H.; Yap, C.; Ung, C.; Jiang, L.; Cao, Z.; Li, Y.; Chen, Y. Computer Prediction of Cardiovascular and Hematological Agents by Statistical Learning Methods. Cardiovasc. Hematol. Agents Med. Chem. 2007, 5, 11–19.
182. Roche, O.; Trube, G.; Zuegge, J.; Pflimlin, P.; Alanine, A.; Schneider, G. A Virtual Screening Method for Prediction of the hERG Potassium Channel Liability of Compound Libraries. ChemBioChem 2002, 3, 455–459.
183. Sun, H. An Accurate and Interpretable Bayesian Classification Model for Prediction of hERG Liability. ChemMedChem 2006, 1, 315–322.
184. Gepp, M.; Hutter, M. Determination of hERG Channel Blockers Using a Decision Tree. Bioorg. Med. Chem. 2006, 14, 5325–5332.
185. Jia, L.; Sun, H. Support Vector Machines Classification of hERG Liabilities Based on Atom Types. Bioorg. Med. Chem. 2008, 16, 6252–6260.


186. Leong, M. A Novel Approach Using Pharmacophore Ensemble/Support Vector Machine (PhE/SVM) for Prediction of hERG Liability. Chem. Res. Toxicol. 2007, 20, 217–226.
187. Song, M.; Clark, M. Development and Evaluation of an in Silico Model for hERG Binding. J. Chem. Inf. Model. 2006, 46, 392–400.
188. Tobita, M.; Nishikawa, T.; Nagashima, R. A Discriminant Model Constructed by the Support Vector Machine Method for hERG Potassium Channel Inhibitors. Bioorg. Med. Chem. Lett. 2005, 15, 2886–2890.
189. Keseru, G. M. Prediction of hERG Potassium Channel Affinity by Traditional and Hologram QSAR Methods. Bioorg. Med. Chem. Lett. 2003, 13, 2773–2775.
190. Li, Q.; Jørgensen, F. S.; Oprea, T.; Brunak, S.; Taboureau, O. hERG Classification Model Based on a Combination of Support Vector Machine Method and GRIND Descriptors. Mol. Pharmaceutics 2008, 5, 117–127.
191. hERG Channel Activity (AID: 376, Source: PDSP). In The PubChem BioAssay Database, National Center for Biotechnology Information: Bethesda, MD, U.S.A., 2009.
192. Huang, J.; Fan, X. Why QSAR Fails: An Empirical Evaluation Using Conventional Computational Approach. Mol. Pharmaceutics 2011, 8, 600–608.
193. Holland, J. H. Adaptation in Natural and Artificial Systems: an Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; University of Michigan: Ann Arbor, MI, 1975.
194. Hopfinger, A. J.; Patel, H. C. Application of Genetic Algorithms to the General QSAR Problem and to Guiding Molecular Diversity Experiments. In Genetic Algorithms in Molecular Modeling; Devillers, J., Ed.; Academic Press: London, 1996; pp 131–157.
195. Meffert, K.; Meseguer, J.; Martí, E. D.; Meskauskas, A.; Vos, J.; Rotstan, N.; Knowles, C.; Sangiorgi, U. B. JGAP - Java Genetic Algorithms and Genetic Programming Package.
196. Li, Q.; Jørgensen, F. S.; Oprea, T.; Brunak, S.; Taboureau, O. hERG Classification Model Based on a Combination of Support Vector Machine Method and GRIND Descriptors. Mol. Pharmaceutics 2008, 5, 117–127.
197. Bains, W.; Basman, A.; White, C. hERG Binding Specificity and Binding Site Structure: Evidence from a Fragment-Based Evolutionary Computing SAR Study. Prog. Biophys. Mol. Biol. 2004, 86, 205–233.
198. Dubus, E.; Ijjaali, I.; Petitet, F.; Michel, A. In Silico Classification of hERG Channel Blockers: a Knowledge-Based Strategy. ChemMedChem 2006, 1, 622–630.
199. Nisius, B.; Goller, A. H. Similarity-Based Classifier Using Topomers to Provide a Knowledge Base for hERG Channel Inhibition. J. Chem. Inf. Model. 2008, 49, 247–256.
200. Chekmarev, D. S.; Kholodovych, V.; Balakin, K. V.; Ivanenkov, Y.; Ekins, S.; Welsh, W. J. Shape Signatures: New Descriptors for Predicting Cardiotoxicity In Silico. Chem. Res. Toxicol. 2008, 21, 1304–1314.
201. Thai, K. M.; Ecker, G. F. A Binary QSAR Model for Classification of hERG Potassium Channel Blockers. Bioorg. Med. Chem. 2008, 16, 4107–4119.


202. Su, B.-H.; Shen, M.-y.; Esposito, E. X.; Hopfinger, A. J.; Tseng, Y. J. In Silico Binary Classification QSAR Models Based on 4D-Fingerprints and MOE Descriptors for Prediction of hERG Blockage. J. Chem. Inf. Model. 2010, 50, 1304–1318.
203. Selassie, C. D.; Shusterman, A. J.; Kapur, S.; Verma, R. P.; Zhang, L. T.; Hansch, C. On the Toxicity of Phenols to Fast Growing Cells. A QSAR Model for a Radical-Based Toxicity. J. Chem. Soc., Perkin Trans. 2 1999, 2729–2733.
204. Garcia-Lorenzo, A.; Tojo, E.; Tojo, J.; Teijeira, M.; Rodriguez-Berrocal, F. J.; Gonzalez, M. P.; Martinez-Zorzano, V. S. Cytotoxicity of Selected Imidazolium-Derived Ionic Liquids in the Human Caco-2 Cell Line. Sub-Structural Toxicological Interpretation Through a QSAR Study. Green Chem. 2008, 10, 508–516.
205. Guha, R.; Schurer, S. C. Utilizing High Throughput Screening Data for Predictive Toxicology Models: Protocols and Application to MLSCN Assays. J. Comput. Aided Mol. Des. 2008, 22, 367–384.
206. Cronin, M. T. D.; Schultz, T. W. Pitfalls in QSAR. J. Mol. Struct.: THEOCHEM 2003, 622, 39–51.
207. Judson, R.; Elloumi, F.; Setzer, R. W.; Li, Z.; Shah, I. A Comparison of Machine Learning Algorithms for Chemical Toxicity Classification Using a Simulated Multi-Scale Data Model. BMC Bioinformatics 2008, 9, 241–256.
208. Chang, C. Y.; Hsu, M. T.; Esposito, E. X.; Tseng, Y. J. Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods. J. Chem. Inf. Model. 2013, 53, 958–971.
209. Nagarathna, P. K. M.; Wesley, M. J.; Reddy, P. S.; Reena, K. Review on Genotoxicity, its Molecular Mechanisms and Prevention. Int. J. Pharm. Sci. Rev. Res. 2013, 22, 236–243.
210. Ames, B. N.; Lee, F. D.; Durston, W. E. An Improved Bacterial Test System for the Detection and Classification of Mutagens and Carcinogens. Proc. Natl. Acad. Sci. U.S.A. 1973, 70, 782–786.
211. Ames, B. N.; McCann, J.; Yamasaki, E. Methods for Detecting Carcinogens and Mutagens with the Salmonella/Mammalian-Microsome Mutagenicity Test. Mutat. Res. 1975, 31, 347–363.
212. Maron, D. M.; Ames, B. N. Revised Methods for the Salmonella Mutagenicity Test. Mutat. Res. 1983, 113, 173–215.
213. Griffiths, A. J.; Miller, J. H.; Suzuki, D. T.; Lewontin, R. C.; Gelbart, W. M. An Introduction to Genetic Analysis, 7th ed.; W. H. Freeman: New York, 2000.
214. Benigni, R.; Bossa, C.; Tcheremenskaia, O.; Giuliani, A. Alternatives to the Carcinogenicity Bioassay: In Silico Methods, and the in Vitro and in Vivo Mutagenicity Assays. Expert Opin. Drug Metab. Toxicol. 2010, 6, 809–819.
215. International Conference on Harmonisation, Multidisciplinary Guidelines, M7 Genotoxic Impurities: Assessment and Control of DNA Reactive (Mutagenic) Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk. http://www.ich.org/products/guidelines/multidisciplinary/article/multidisciplinary-guidelines.html (March 27, 2015).


216. Committee for Medicinal Products for Human Use (CHMP). Guideline on the Limits of Genotoxic Impurities; European Medicines Agency: 2006.
217. Ridings, J. E.; Barratt, M. D.; Cary, R.; Earnshaw, C. G.; Eggington, C. E.; Ellis, M. K.; Judson, P. N.; Langowski, J. J.; Marchant, C. A.; Payne, M. P.; Watson, W. P.; Yih, T. D. Computer Prediction of Possible Toxic Action from Chemical Structure: an Update on the DEREK System. Toxicology 1996, 106, 267–279.
218. Mostrag-Szlichtyng, A.; Zaldívar Comenges, J.-M.; Worth, A. P. Computational toxicology at the European Commission’s Joint Research Centre. Expert Opin. Drug Metab. Toxicol. 2010, 6, 785–792.
219. Leadscope Inc. Leadscope Model Applier. http://www.leadscope.com/ (October 29, 2013).
220. Klopman, G. MULTICASE 1. A Hierarchical Computer Automated Structure Evaluation Program. Quant. Struct.-Act. Relat. 1992, 11, 176–184.
221. Mostrag-Szlichtyng, A.; Zaldívar Comenges, J.-M.; Worth, A. P. Computational toxicology at the European Commission’s Joint Research Centre. Expert Opin. Drug Metab. Toxicol. 2010, 6, 785–792.
222. Ashby, J. Fundamental Structural Alerts to Potential Carcinogenicity or Noncarcinogenicity. Environ. Mol. Mutagen. 1985, 7, 919–921.
223. Benigni, R.; Bossa, C.; Tcheremenskaia, O. Nongenotoxic Carcinogenicity of Chemicals: Mechanisms of Action and Early Recognition through a New Set of Structural Alerts. Chem. Rev. 2013, 113, 2940–2957.
224. von der Ohe, P. C.; Kühne, R.; Ebert, R.-U.; Altenburger, R.; Liess, M.; Schüürmann, G. Structural Alerts - A New Classification Model to Discriminate Excess Toxicity from Narcotic Effect Levels of Organic Compounds in the Acute Daphnid Assay. Chem. Res. Toxicol. 2005, 18, 536–555.
225. Benigni, R.; Bossa, C.; Jeliazkova, N.; Netzeva, T.; Worth, A. The Benigni / Bossa Rulebase for Mutagenicity and Carcinogenicity - a Module of Toxtree; EUR 23241 EN; Office for Official Publications of the European Communities: 2008.
226. Nantasenamat, C.; Isarankura-Na-Ayudhya, C.; Naenna, T.; Prachayasittikul, V. A Practical Overview of Quantitative Structure-Activity Relationship. EXCLI J. 2009, 8, 74–88.
227. Kho, R.; Hodges, J. A.; Hansen, M. R.; Villar, H. O. Ring Systems in Mutagenicity Databases. J. Med. Chem. 2005, 48, 6671–6678.
228. Hansen, K.; Mika, S.; Schroeter, T.; Sutter, A.; ter Laak, A.; Steger-Hartmann, T.; Heinrich, N.; Müller, K.-R. Benchmark Data Set for in Silico Prediction of Ames Mutagenicity. J. Chem. Inf. Model. 2009, 49, 2077–2081.
229. Wetzel, S.; Klein, K.; Renner, S.; Rauh, D.; Oprea, T. I.; Mutzel, P.; Waldmann, H. Interactive Exploration of Chemical Space with Scaffold Hunter. Nat. Chem. Biol. 2009, 5, 581–583.
230. Fjodorova, N.; Vracko, M.; Tusar, M.; Jezierska, A.; Novic, M.; Kuhne, R.; Schuurmann, G. Quantitative and Qualitative Models for Carcinogenicity Prediction for Non-Congeneric Chemicals Using CP ANN Method for Regulatory Uses. Mol. Diversity 2010, 14, 581–594.
231. Guyton, K. Z.; Kyle, A. D.; Aubrecht, J.; Cogliano, V. J.; Eastmond, D. A.; Jackson, M.; Keshava, N.; Sandy, M. S.; Sonawane, B.; Zhang, L.; Waters, M. D.; Smith, M. T. Improving Prediction of Chemical Carcinogenicity by Considering Multiple Mechanisms and Applying Toxicogenomic Approaches. Mutat. Res. 2009, 681, 230–240.
232. National Toxicology Program. http://ntp.niehs.nih.gov/results/index.html (Feb. 19, 2016).
233. The Carcinogenic Potency Database (CPDB). http://toxnet.nlm.nih.gov/cpdb/cpdb.html (Feb. 19, 2016).
234. Distributed Structure-Searchable Toxicity (DSSTox) Database. http://www.epa.gov/chemical-research/distributed-structure-searchable-toxicity-dsstox-database (Feb. 19, 2016).
235. Singh, K. P.; Gupta, S.; Rai, P. Predicting Carcinogenicity of Diverse Chemicals Using Probabilistic Neural Network Modeling Approaches. Toxicol. Appl. Pharmacol. 2013, 272, 465–475.
236. CAESAR Project. http://www.caesar-project.eu. (Feb. 19, 2016).
237. Lazar Toxicity Predictions. http://lazar.in-silico.de/predict. (Feb. 19, 2016).
238. HazardExpert Pro. http://www.compudrug.com/hazardexpertpro. (Feb. 19, 2016).
239. Scialli, A. R. The Challenge of Reproductive and Developmental Toxicology under REACH. Regul. Toxicol. Pharmacol. 2008, 51, 244–250.
240. The ACD/Tox Suite (Toxboxes), ACD/Labs and Pharma Algorithms. http://www.acdlabs.com/products/percepta/physchem_adme_tox/. (Feb. 19, 2016).
241. TOPKAT. http://www.accelrys.com/. (Feb. 19, 2016).
242. Lu, J.; Peng, J. L.; Wang, J. N.; Shen, Q. C.; Bi, Y.; Gong, L. K.; Zheng, M. Y.; Luo, X. M.; Zhu, W. L.; Jiang, H. L.; Chen, K. X. Estimation of Acute Oral Toxicity in Rat Using Local Lazy Learning. J. Cheminf. 2014, 6, 26–37.
243. Bhhatarai, B.; Wilson, D. M.; Bartels, M. J.; Chaudhuri, S.; Price, P. S.; Carney, E. W. Acute Toxicity Prediction in Multiple Species by Leveraging Mechanistic ToxCast Mitochondrial Inhibition Data and Simulation of Oral Bioavailability. Toxicol. Sci. 2015, 147, 386–396.

Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.