Perspective pubs.acs.org/jcim

Understanding the Roles of the “Two QSARs”

Toshio Fujita† and David A. Winkler*,‡,§,∥,⊥

† Professor Emeritus at Kyoto University, 38-1 Iwakura-Miyakecho, Kyoto, Japan 606-0022
‡ CSIRO Manufacturing, Bag 10, Clayton South MDC 3169, Australia
§ Monash Institute of Pharmaceutical Sciences, 392 Royal Parade, Parkville 3052, Australia
∥ Latrobe Institute for Molecular Science, Latrobe University, Bundoora 3086, Australia
⊥ School of Chemical and Physical Sciences, Flinders University, Bedford Park 5042, Australia

ABSTRACT: Quantitative structure−activity relationship (QSAR) modeling has matured over the past 50 years and has been very useful in discovering and optimizing drug leads. Although its roots were in extra-thermodynamic relationships within small sets of chemically similar molecules focused on mechanistic interpretation, a second class of QSAR models has emerged that relies on machine learning methods to generate models from large, chemically diverse data sets for predictive purposes. There has been a tension between the two groups of QSAR practitioners that is unnecessary and possibly counterproductive. This paper explains the difference in philosophy and application of these two distinct, but equally important, classes of QSAR models and how they can work together synergistically to accelerate the discovery of new drugs or materials.

Quantitative structure−activity relationship (QSAR) modeling is now more than 50 years old, and its utility has been shown in numerous research publications and by the number of new drug and agrochemical entities that have been developed with its aid. The method has evolved very substantially since the seminal linear regression QSAR models published by Hansch and Fujita.1,2 Many new types of molecular descriptors have been developed; new mathematical methods such as neural networks,3−8 support vector machines,9−11 kernel regression,12 and random forest13,14 have been applied to mapping structure to activity; and QSAR has now incorporated 3D structures using field-based methods like CoMFA and CoMSIA,15,16 conformation, and chirality.17−19 The method has therefore evolved steadily since the 1960s, as has been well-summarized in numerous recent reviews of the history of QSAR methods.20−24

Two main branches of QSAR have evolved. The first of these remains true to the origins of QSAR, where the model is often relatively simple and linear and interpretable in terms of molecular interactions or biological mechanisms, and may be considered “pure” or classical QSAR. The second type focuses much more on modeling structure−activity relationships in large data sets with high chemical diversity using a variety of regression or classification methods. Its primary purpose is to make reliable predictions of properties of new molecules, and often the interpretation of the model is obscure or impossible. However, for most of its history, there has been a tension in the QSAR community about these two main purposes to which QSAR has been applied.25

Much robust debate has occurred at scientific meetings and is largely unpublished, but there have been a number of publications in the past decade or two that have also carried this debate. For example, Zefirov and Palyulin26 discussed the general problem of descriptive versus predictive QSAR, arguing that high quality correlations are not necessarily predictive. Tropsha et al. subsequently summarized work by several QSAR practitioners who emphasized that “one of the most important aspects of QSAR modelling is the ability to interpret the models in physico-chemical and/or mechanistic sense” (pure or classical QSAR modellers).27 However, some of these studies did not rigorously validate these mechanistically focused models, an essential step in good QSAR modeling. Tropsha et al. made the important point that QSPR models must be validated for predictive power before they are applied to predict, let alone explain, the structure−property relationships of biological, pharmaceutical, environmental, or any other property of chemicals. Cronin described the ideal QSAR model as being both predictive and mechanistically interpretable but noted that models that could not achieve the latter aim were still very useful.28 He further argued that the requirements for model simplicity and interpretability were dependent on the context and application of the model. However, Johnson reminded us that mechanistically interpretable models are more likely to describe causative relationships and to be less liable to be the result of chance correlations, a situation that QSAR modellers must always keep front of mind.29

© 2016 American Chemical Society

Received: April 22, 2015
Published: January 11, 2016

DOI: 10.1021/acs.jcim.5b00229 J. Chem. Inf. Model. 2016, 56, 269−274

Many researchers using QSAR methods do not understand their evolution and the need for the two types of modeling methodologies to coexist. They are equally important, each has distinct advantages, and the choice of which to use is driven by the scientific questions that are to be investigated. At an operational level, they also differ in the molecular diversity of their training sets and, consequently, in their domains of applicability. This means that the types of descriptors used to represent molecular properties in the two types of model also differ. Of course, in the continuum that lies between the extremes of simple and highly interpretable models giving important insight into the mechanism of action, and large, data-driven, high-diversity models that are opaque to interpretation, there is a wealth of interesting and useful science.

CLASSICAL OR HANSCH QSAR MODELS RELATING TO SINGLE MECHANISMS AND FREE ENERGY RELATIONSHIPS

The initial postulates of classical QSAR are that three main molecular properties or factors are required to rationalize variations in a set of congeneric compounds producing a standard response in a test system: the electronic, hydrophobic, and steric properties. Hammett−Taft electronic parameters and log P or π hydrophobic parameters can account for two of these, and Taft’s $E_s$, molar refractivity, molar volume, or dimensional parameters such as STERIMOL capture the third.30 When the classical QSAR of plant growth regulators was first published in the early 1960s, the initial intent was not to predict more potent bioactive analogues2,31−34 but to increase the understanding of the chemical and/or biochemical mechanisms of interaction between ligands and receptors.

Early work by Hansch at Pomona College focused on structure−activity relationships in the plant growth regulating activity of substituted phenoxyacetic35 and benzoic acids,36 including potent herbicides such as 2,4-D (1), MCPA (2), TBA (3), and Dicamba (4). The potency of plant growth activity was highly dependent on substituent variations at various positions of the aromatic skeleton. Although the reverse was not always true, the most highly active compounds contained electron-withdrawing substituents such as Cl, Br, and NO2. Conversely, electron-donating substituents such as OH, OMe, and NMe2 resulted in weakly active or inactive compounds. Unless there is multiple substitution of electron-withdrawing substituents, Hansch considered that the receptor interaction could be represented by the Hammett σ constant for substituent X, defined by $\log(K_X/K_H)$, where K is the dissociation constant of m- and p-substituted benzoic acids in water at 25 °C. These are free-energy related electronic parameters. Conspicuously, a large number of nonbiological examples, in which the electronic effect of aromatic substituents on various rate or equilibrium processes of aromatic compounds was linearly correlated with the Hammett σ, had been reported between 1930 and 1960.37,38

At a similar time one of us (Fujita) had tried to elucidate structure−activity relationships in substituted 1-naphthoic acid analogues exhibiting plant growth regulatory properties.39 For this chemotype, potent activity was highly dependent on the steric environment of the COOH group and on the presence of an electron-withdrawing substituent at certain positions, with the electronic effect represented by parameters calculated by the Hückel LCAO-MO method. In these structure−activity studies, the difference between experimentally measured biological activity and that predicted from the Hammett σ or MO correlations was attributed to other factors not yet accounted for, such as lipophilic and steric properties of substituents.34 Subsequent work in collaboration with Hansch allowed the contributions of the hydrophobic/lipophilic nature of substituents to be accounted for by the log P value (log of the 1-octanol/water partition coefficient) or the π parameter (defined as the component of log P allocated solely to the substituent, $\log(P_X/P_H)$, another free-energy related substituent parameter). Hansch further showed that the steric effect of substituents can be represented by the Taft $E_s$ value, thus defining for the first time the famous QSAR correlation equation.
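Before turning to the correlation equation itself, it may help to see how the two free-energy related substituent constants introduced above follow from their definitions as logarithms of ratios. The sketch below is illustrative only: the function names are hypothetical, and the input values (pKa of benzoic and 4-nitrobenzoic acid, log P of benzene and chlorobenzene) are approximate, commonly quoted numbers that are not taken from this paper.

```python
import math


def hammett_sigma(pka_parent: float, pka_substituted: float) -> float:
    """Hammett sigma = log10(K_X / K_H), with K the acid dissociation constant."""
    k_h = 10.0 ** (-pka_parent)        # unsubstituted benzoic acid
    k_x = 10.0 ** (-pka_substituted)   # substituted benzoic acid
    return math.log10(k_x / k_h)


def hansch_pi(log_p_parent: float, log_p_substituted: float) -> float:
    """Hansch pi = log10(P_X / P_H) = log P(X) - log P(H)."""
    return log_p_substituted - log_p_parent


if __name__ == "__main__":
    # Approximate, illustrative inputs (assumptions, not data from the paper).
    sigma_p_nitro = hammett_sigma(pka_parent=4.20, pka_substituted=3.44)
    pi_chloro = hansch_pi(log_p_parent=2.13, log_p_substituted=2.84)
    print(f"sigma (p-NO2) ~ {sigma_p_nitro:.2f}")
    print(f"pi (Cl)       ~ {pi_chloro:.2f}")
```

A positive σ indicates an electron-withdrawing substituent (the substituted acid is stronger), and a positive π indicates a substituent that makes the molecule more lipophilic than the parent.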

$$\log(1/C) = a\pi + \rho\sigma + \delta E_s + \text{constant} \tag{1}$$

where a, ρ, and δ are the coefficients or contributions of the hydrophobic, electronic, and steric terms to the model, respectively. This model implies a linear relationship between the substituent parameters (descriptors) and the logarithm of the activity. It soon became clear that not all QSAR models were linear, especially those relating to lipophilicity. A parabolic term $a\pi^2$ ($a < 0$) or the Kubinyi bilinear term $a\log(\beta \cdot 10^{\log P} + 1)$ (a and β are constants calculated with nonlinear regression analysis) was found to generate better models than linear terms.40 Other parameters related to free energy differences for certain physicochemical effects between the free and interacting states could also be added to eq 1. In fact, eq 1 is often referred to as a linear free-energy relationship or extra-thermodynamic relationship model.41,42

Recent examples of classical QSAR studies are not as numerous as they were a few decades ago, perhaps reflecting less training in the thermodynamic basis of classical QSAR and the maturity of the classical QSAR field applied to drug discovery. Examples include modeling of a series of 62 anti-HIV agents by Leonard and Roy (2007),43 118 protein-peptide binding affinities by Zhou et al. (2013),44 25 anti-HIV agents by Tripathi and Pandey (2011),45 and 56 ecdysone agonists by Fujita and Nakagawa (2007).46 In the last of these studies, dibenzoylhydrazines substituted at the A- and B-rings were subjected to classical QSAR analyses. The following QSAR equation was found for substituents on the B-ring:

$$\mathrm{pLD}_{50} = 0.72(\pm 0.21)\log P - 0.88(\pm 0.22)\,\Delta L_{\mathrm{ortho}} - 0.98(\pm 0.24)\,\Delta V_{\mathrm{meta}} - 0.59(\pm 0.19)\,\Delta L_{\mathrm{para}} + 4.92(\pm 0.26)$$

$$n = 30, \quad s = 0.254, \quad r^2 = 0.83 \quad \text{(no test set)}$$

where LD50 is the dose for 50% killing of rice stem borer larvae.

Summarizing, when the set of compounds under investigation contains a common structural core, it is possible, by careful choice of the molecular descriptors, to obtain a robust and predictive model that also provides useful information as to which molecular features modulate biological activity, in what way, and why. For example, if the descriptors can be chemically interpreted in terms of lipophilicity, electron-donating or electron-withdrawing properties, hydrogen bonding effects, molecular size, etc., then any models developed can provide information useful to organic chemists on how they can improve biological activity. There is a link between the types of parameters used in the models and free energy related properties of the interactions between ligands and target proteins. There is also an assumption that the compounds under consideration operate via a single mode of action. It has proven possible to stretch these requirements to account for in vivo as well as in vitro structure−activity relationships. In vivo models are on less secure ground regarding the relationship of the model to free energy dependent processes in an organism, but they have nonetheless been useful. In most cases QSAR models of this type necessarily have smaller domains of applicability than QSAR models derived from large and more heterogeneous data sets, and are thus largely local models. However, as interpretation rather than prediction is their main aim, the domain size is less important. Relatively recent developments in 3D molecular field methods15,16 like CoMFA and CoMSIA have assisted with interpretation of models, and the molecular field descriptors generated by probe molecules are also likely to have acceptable thermodynamic bases.47,48 Classical QSAR models are also the most intellectually satisfying because it is possible to understand what the models are telling you and to obtain new molecular insights from them.

QSAR MODELS DERIVED FROM LARGE, OFTEN HETEROGENEOUS DATA SETS THAT ARE NOT IDENTIFIABLY BASED ON THE THERMODYNAMICS OF MOLECULAR INTERACTIONS

The rapid march of technology and computing power since Hansch and Fujita’s time has provided increasingly large sets of data to be analyzed by cheminformatics methods such as QSAR and increasingly powerful mathematical algorithms for modeling complex data. Katritzky among others has criticized the classical Hansch QSAR method because contemporary QSAR and QSPR models often require large and noncongeneric (no common core structure) data sets. Experimental determination of the physicochemical parameters required for the derivation of a classical QSAR is also often infeasible or expensive.49 Furthermore, the use of nonempirical molecular descriptors often provides advantages over the classical approach, and for noncongeneric data sets, the classical approach is generally not applicable at all. Models based on nonempirical molecular descriptors that can be computed are the only feasible way to establish structure−property (activity) relationships in these situations. This criticism, however, neglects the different roles that the two types of QSAR modeling methods play, implying that one is more “correct” than the other.

Reviews have been published on the application of machine learning and other statistical methods to drug discovery,50 physicochemical property prediction,51 and materials.52 There is a recent trend to use these types of nonclassical QSAR models even more widely than was originally intended, increasingly in the toxicology, materials,52 biomaterials,53,54 and nanotechnology55 fields. However, it is clear that in many of these studies there is not a single mode of action or even a single discrete molecular entity, but rather a distribution of molecular weights, sizes, functionality, etc. Accordingly, one could argue that the definition of the relationship between the descriptors and the property being modeled is different. In many of these models, for instance those derived from large toxicity data sets, there may be multiple modes of action or biological mechanisms in play. While classical linear QSAR models generally cannot properly deal with more than one mechanism of action, some nonlinear algorithms (e.g., neural networks) can do so. In this case the multiple modes of action can be captured in a single model that can be thought of as a “model of models”. The larger the number and chemical diversity of compounds in a data set, the higher the probability that multiple modes of action will be involved.

Thus, the close relationship between the descriptors in the models and the thermodynamic aspects of the ligand−receptor interactions implicit in classical QSAR is not obviously present, or may be entirely absent, in these newer types of QSAR models. These QSAR methods using computed descriptors and covering high molecular diversity generate structure−property relationships that may be considered more like pattern recognition models. They are not clearly or directly grounded in medicinal or physical organic chemistry, or biology. Nonetheless they are capable of generating very good predictions of properties for unknown molecules that lie in, or close to, the model domain of applicability. These types of model have become more popular due to the increasing size and diversity of data sets, development of new or improved mathematical methods, the availability of large numbers of descriptors generated by programs such as DRAGON56 and ADRIANA,57 and the increased emphasis on prediction rather than interpretation.

These nonclassical QSAR models are capable of dealing with much higher molecular diversity than classical QSAR models, with compounds in the training set not required to have a common structural core. These models are often derived from much larger training sets and can span large ranges of biological (or physical) property space, as high as 13−14 orders of magnitude in some cases.58 Because of their higher molecular diversity they have larger domains of applicability and can therefore make predictions of the properties of a wider range of new molecules with higher reliability than classical QSAR models (unless the new molecule is closely related to a classical QSAR training set). They are therefore closer to global models than local models. These models are most useful when a robust prediction of a specific property is the primary aim, e.g., aqueous solubility, anticancer activity, acute toxicity, etc. Because of the lack of a common structural core, and the fact that they use computational descriptors that are often arcane, they are very difficult to understand and interpret mechanistically. Nonetheless, their predictive power can be very good, and they are undoubtedly useful for optimizing leads or for predicting the properties of new molecules that have not yet been synthesized or tested. Care must be taken to ensure that they are as sparse or parsimonious as possible to ensure they have the optimum level of predictivity. Sparse feature selection methods and regularization of regression59,60 have been very useful in imposing appropriate levels of parsimony on models.

A recent example of a QSAR model of this type is a global model for aqueous solubility of small organic compounds,58 a very important property for drug candidates. This model was constructed from almost 5000 measured solubility values using machine-learning methods. The chemical diversity of the
training set was very high, and the range of solubilities was from $10^{-12}$ to 30 M. The model accounted for 90% of the variance in the data and could predict the aqueous solubilities of new compounds with a standard error of prediction of 0.66 log units (roughly a factor of 4). However, the model required the use of 49 descriptors, some of which were arcane and very difficult to interpret. The performance of the model in predicting the training and test sets is shown in Figure 1.

Figure 1. Performance of a machine-learning model of aqueous solubility (log S) for the training set (left) and test set (right).

STEPS NEEDED TO CORRECTLY APPRECIATE HOW QSAR METHODS HAVE EVOLVED AND CAN BEST BE EMPLOYED

Both types of QSAR modeling are equally valuable, and the choice of which to use is almost entirely driven by the types of scientific questions to be answered. Where interpretation of the molecular interactions that are occurring is more important than predicting properties of new molecules, classical QSAR should be used. If modeling of large and chemically diverse data sets to make accurate predictions of properties of new molecules is the main aim, nonclassical QSAR methods are more appropriate.

As QSAR emerged at a time when computational resources were scarce and expensive, many researchers have subsequently worked on breaking the method down into a series of “unit operations” and improving each of them to take advantage of now plentiful and cheap computational power and new developments in mathematics and computer algorithms. Methods for generating large numbers of molecular descriptors (e.g., DRAGON,61 ADRIANA, and CoMFA15), tools for selecting suitable descriptors without generating chance correlations (e.g., PCA, genetic algorithms, MLREM60), effective and parsimonious linear and nonlinear algorithms for mapping the relationships between descriptors and biological or other properties (e.g., SVM, random forest, neural networks, Gaussian processes,62 etc.), and useful ways to estimate model predictivity and domain of applicability (cross validation, test set, bootstrapping, etc.) have emerged as a result.63

One of the major drivers for the emergence of two main “camps” of QSAR researchers has been the increasingly arcane nature of the descriptors used in QSAR models generated by the nonclassical (e.g., machine learning-based) methods that have become popular. One may argue that the dearth of chemically interpretable descriptors that capture all of the molecular subtlety that the more arcane descriptors do has fostered a greater divergence of the two QSAR approaches. If methods can be developed for encoding molecular structure and physicochemical properties in a way that can be readily mapped back onto molecules and that captures the essence of interactions between ligands and proteins, the “two QSARs” may finally become closer to convergence, and the field of drug design and discovery will benefit enormously. It may be argued that the development of more effective and interpretable molecular descriptors is one of the last major challenges of the QSAR field and is a very legitimate topic for future research. Clearly, complete convergence will not be possible given the high chemical diversity of large QSAR models, but interpretability of these models should very significantly improve. Improved teaching of classical QSAR theory to researchers entering the field will also do much to increase the quality of research in this field (which sometimes is problematic) and expand the appreciation that both types of QSAR modeling approaches are important, essentially allowing practitioners to make better choices.

The take-home messages from this are that the interpretability of QSAR models is largely driven by the choice of descriptors and that both types of QSAR model can be equally valuable. The reality is that both can happily and synergistically coexist and that each has an important role in molecular property modeling and prediction. Understanding the origins of the two methods, the theoretical underpinnings, and the types of applications for which each is best suited will greatly resolve the “tension” between the two types of QSAR practitioners.
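Two of the “unit operations” named in this section, mapping descriptors to activity and estimating predictivity by cross validation, can be illustrated with a small sketch. The code below fits a linear Hansch-type model by ordinary least squares and computes a leave-one-out cross-validated q². Everything in it is an assumption made for illustration: the function names are hypothetical, the descriptor values and activities are invented rather than taken from any study cited here, and q² is computed against the mean of the full activity vector, which is one of several conventions in use.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations; last coefficient is the constant."""
    design = [list(r) + [1.0] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate a*pi + rho*sigma + delta*Es + constant for one compound."""
    return sum(c * v for c, v in zip(coeffs, row)) + coeffs[-1]


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(rows, y):
    """Leave-one-out q2: refit without each compound, then predict it."""
    held_out = []
    for i in range(len(y)):
        coeffs = fit_least_squares(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        held_out.append(predict(coeffs, rows[i]))
    return r_squared(y, held_out)


# Invented (pi, sigma, Es) descriptor rows for eight hypothetical congeners.
DESCRIPTORS = [
    (0.00, 0.00, 0.00), (0.56, -0.17, -1.24), (0.71, 0.23, -0.97),
    (0.86, 0.23, -1.16), (-0.28, 0.78, -2.52), (-0.02, -0.27, -0.55),
    (0.14, 0.06, -0.46), (-0.67, -0.37, -0.55),
]
TRUE_COEFFS = (1.0, 0.8, 0.3, 3.0)  # a, rho, delta, constant (assumed)
RESIDUALS = [0.05, -0.04, 0.03, -0.06, 0.02, 0.04, -0.03, -0.01]
ACTIVITY = [predict(TRUE_COEFFS, row) + e for row, e in zip(DESCRIPTORS, RESIDUALS)]

coeffs = fit_least_squares(DESCRIPTORS, ACTIVITY)
r2 = r_squared(ACTIVITY, [predict(coeffs, row) for row in DESCRIPTORS])
q2 = loo_q_squared(DESCRIPTORS, ACTIVITY)
print("coefficients (a, rho, delta, constant):", [round(c, 3) for c in coeffs])
print(f"r2 = {r2:.3f}, leave-one-out q2 = {q2:.3f}")
```

For ordinary least squares the leave-one-out q² computed this way can never exceed the fitted r², so the gap between the two gives a rough indication of how much of the apparent fit depends on individual compounds. With only eight compounds and four fitted parameters the example is deliberately small; a real study would use more data and, ideally, an external test set.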



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: +61 3 9545 2477.

Author Contributions

The authors contributed equally.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

D.A.W. acknowledges the financial support from the CSIRO Advanced Materials Transformational Capability Platform and a Newton Turner Award for Exceptional Senior Scientists.

REFERENCES

(1) Hansch, C.; Fujita, T. ρ, σ, π Analysis. Method for Correlation of Biological Activity and Chemical Structure. J. Am. Chem. Soc. 1964, 86, 1616−1624.
(2) Hansch, C.; Streich, M.; Geiger, F.; Muir, R. M.; Maloney, P. P.; Fujita, T. Correlation of Biological Activity of Plant Growth Regulators and Chloromycetin Derivatives with Hammett Constants and Partition Coefficients. J. Am. Chem. Soc. 1963, 85, 2817−2824.
(3) Livingstone, D. J.; Salt, D. W. Regression-Analysis for QSAR Using Neural Networks. Bioorg. Med. Chem. Lett. 1992, 2, 213−218.
(4) Manallack, D. T.; Ellis, D. D.; Livingstone, D. J. Analysis of Linear and Nonlinear QSAR Data Using Neural Networks. J. Med. Chem. 1994, 37, 3758−3767.
(5) Salt, D. W.; Yildiz, N.; Livingstone, D. J.; Tinsley, C. J. The Use of Artificial Neural Networks in QSAR. Pestic. Sci. 1992, 36, 161−170.
(6) Burden, F. R.; Winkler, D. A. Robust QSAR models using Bayesian regularized neural networks. J. Med. Chem. 1999, 42, 3183−3187.
(7) Winkler, D. A. Neural networks as robust tools in drug lead discovery and development. Mol. Biotechnol. 2004, 27, 139−167.
(8) Winkler, D. A.; Burden, F. R. Robust QSAR models from novel descriptors and Bayesian Regularised Neural Networks. Mol. Simul. 2000, 24, 243−258.
(9) Czerminski, R.; Yasri, A.; Hartsough, D. Use of Support Vector Machine in pattern classification: Application to QSAR studies. Quant. Struct.-Act. Relat. 2001, 20, 227−240.
(10) Norinder, U. Support vector machine models in drug design: applications to drug transport processes and QSAR using simplex optimizations and variable selection. Neurocomput. 2003, 55, 337−346.
(11) Burden, F. R.; Winkler, D. A. Relevance Vector Machines: Sparse Classification Methods for QSAR. J. Chem. Inf. Model. 2015, 55, 1529−1534.
(12) Cedeno, W.; Agrafiotis, D. K. Using particle swarms for the development of QSAR models based on K-nearest neighbor and kernel regression. J. Comput.-Aided Mol. Des. 2003, 17, 255−263.
(13) Polishchuk, P. G.; Muratov, E. N.; Artemenko, A. G.; Kolumbin, O. G.; Muratov, N. N.; Kuz’min, V. E. Application of Random Forest Approach to QSAR Prediction of Aquatic Toxicity. J. Chem. Inf. Model. 2009, 49, 2481−2488.
(14) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Model. 2003, 43, 1947−1958.
(15) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. Comparative Molecular-Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959−5967.
(16) Klebe, G.; Abraham, U. Comparative Molecular Similarity Index Analysis (CoMSIA) to study hydrogen-bonding properties and to score combinatorial libraries. J. Comput.-Aided Mol. Des. 1999, 13, 1−10.
(17) Crippen, G. M. Chirality Descriptors in QSAR. Curr. Comput.-Aided Drug Des. 2008, 4, 259−264.
(18) Golbraikh, A.; Tropsha, A. QSAR Modeling using chirality descriptors derived from molecular topology. J. Chem. Inf. Model. 2003, 43, 144−154.
(19) Seri-Levy, A.; West, S.; Richards, W. G. Molecular Similarity, Quantitative Chirality, and QSAR for Chiral Drugs. J. Med. Chem. 1994, 37, 1727−1732.
(20) Fujita, T. The Birth of QSAR - In Memory of Professor Corwin Hansch. J. Pestic. Sci. 2012, 37, 206−214.
(21) Selassie, C.; Verma, R. P. History of Quantitative Structure-Activity Relationships; Wiley: NY, 2010.
(22) Warr, W. A. Some Trends in Chem(o)informatics. In Cheminformatics and Computational Chemical Biology; Bajorath, J., Ed.; Humana Press: Heidelberg, 2011; Chapter 1, pp 1−38.
(23) Fujita, T. In memoriam Professor Corwin Hansch: birth pangs of QSAR before 1961. J. Comput.-Aided Mol. Des. 2011, 25, 509−517.
(24) Doweyko, A. M. QSAR: dead or alive? J. Comput.-Aided Mol. Des. 2008, 22, 81−89.
(25) Schneider, G.; Downs, G. Machine learning methods in QSAR modelling. QSAR Comb. Sci. 2003, 22, 485−486.
(26) Zefirov, N. S.; Palyulin, V. A. QSAR for boiling points of ″small″ sulfides. Are the ″high-quality structure-property-activity regressions″ the real high quality QSAR models? J. Chem. Inf. Model. 2001, 41, 1022−1027.
(27) Tropsha, A.; Gramatica, P.; Gombar, V. K. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 2003, 22, 69−77.
(28) Cronin, M. T. D. Quantitative Structure-Activity Relationships (QSARs)−Applications and Methodology. In Recent Advances in QSAR Studies; Springer, 2010; Vol. 8, Chapter 1, pp 3−11.
(29) Johnson, S. R. The trouble with QSAR (or how I learned to stop worrying and embrace fallacy). J. Chem. Inf. Model. 2008, 48, 25−26.
(30) Fujita, T.; Iwamura, H. Applications of various steric constants to quantitative analysis of structure-activity relationships. Topics Curr. Chem. 1983, 114, 119−157.
(31) Fujita, T.; Hansch, C. Analysis of Structure-Activity Relationship of Sulfonamide Drugs Using Substituent Constants. J. Med. Chem. 1967, 10, 991−1000.
(32) Hansch, C.; Fujita, T. Status of QSAR at the end of the twentieth century. ACS Symp. Ser. 1995, 606, 1−12.
(33) Muir, R. M.; Fujita, T.; Hansch, C. Structure-Activity Relationship in Auxin Activity of Mono-Substituted Phenylacetic Acids. Plant Physiol. 1967, 42, 1519−1526.
(34) Fujita, T.; Koshimizu, K.; Kawazu, K.; Imai, S.; Mitsui, T. The Plant Growth Activity of 1-Naphthoic Acid Derivatives and Their Related Compounds. Bull. Inst. Chem. Res., Kyoto Univ. 1960, 38, 76−93.
(35) Hansch, C.; Muir, R. M. Electronic effect of substituents on the activity of phenoxyacetic acids. In Plant Growth Regulation; Iowa State University Press: Ames, 1961; pp 431−443.
(36) Muir, R. M.; Hansch, C. Chemical Structure and Growth-Activity of Substituted Benzoic Acids. In Plant Growth Regulation; Iowa State University Press: Ames, 1961; pp 249−258.
(37) Hammett, L. P. The effect of structure upon the reactions of organic compounds benzene derivatives. J. Am. Chem. Soc. 1937, 59, 96−103.
(38) Hammett, L. P. Linear free energy relationships in rate and equilibrium phenomena. Trans. Faraday Soc. 1938, 34, 156−164.
(39) Fujita, T.; Kawazu, K.; Mitsui, T.; Katsumi, M. Studies on Plant Growth Regulators 20. Structure/Activity Relationship of Ac-Alkyl-Hydro-1-Naphthoic Acids and Related Compounds. Phytochemistry 1967, 6, 889−897.
(40) Kubinyi, H. Quantitative Structure-Activity-Relationships 7. Bilinear Model, a New Model for Nonlinear Dependence of Biological-Activity on Hydrophobic Character. J. Med. Chem. 1977, 20, 625−629.
(41) Brown, H. C.; Stock, L. M. Critical Examination of Applicability of a Linear Free Energy Relationship to Aromatic Substitution Reactions. J. Am. Chem. Soc. 1962, 84, 3298−3305.
(42) Portoghese, P. Linear Free Energy Relationship among Analgesic N-Substituted Phenylpiperidine Derivatives. Method of Detecting Similar Modes of Molecular Binding to Common Receptors. J. Pharm. Sci. 1965, 54, 1077−1079.
(43) Leonard, J. T.; Roy, K. Comparative classical QSAR modeling of Anti-HIV Thiocarbamates. QSAR Comb. Sci. 2007, 26, 980−990.
(44) Zhou, Y.; Ni, Z.; Chen, K. P.; Liu, H. J.; Chen, L.; Lian, C. Q.; Yan, L. R. Modeling Protein-Peptide Recognition Based on Classical Quantitative Structure-Affinity Relationship Approach: Implication for Proteome-Wide Inference of Peptide-Mediated Interactions. Protein J. 2013, 32, 568−578.
(45) Tripathi, U. K.; Pandey, I. P. Anti-HIV Activity Study by Classical QSAR Method for 1-Alkoxymethyl-5-alkyl-6-naphthylmethyl Uracils as HEPT Analogues. Asian J. Chem. 2011, 23, 1857−1860.
(46) Fujita, T.; Nakagawa, Y. QSAR and mode of action studies of insecticidal ecdysone agonists. SAR QSAR Environ. Res. 2007, 18, 77−88.

(47) Suzuki, T.; Ishida, M.; Fabian, W. M. F. Classical QSAR and comparative molecular field analyses of the host-guest interaction of organic molecules with cyclodextrins. J. Comput.-Aided Mol. Des. 2000, 14, 669−678.
(48) Carroll, F. I.; Mascarella, S. W.; Kuzemko, M. A.; Gao, Y. G.; Abraham, P.; Lewin, A. H.; Boja, J. W.; Kuhar, M. J. Synthesis, Ligand-Binding, and QSAR (CoMFA and Classical) Study of 3-Beta-(3′-Substituted Phenyl), 3-Beta-(4′-Substituted Phenyl), and 3-Beta-(3′,4′-Disubstituted Phenyl)Tropane-2-Beta-Carboxylic Acid Methyl-Esters. J. Med. Chem. 1994, 37, 2865−2873.
(49) Katritzky, A. R.; Lobanov, V. S.; Karelson, M. QSPR - the Correlation and Quantitative Prediction of Chemical and Physical-Properties from Structure. Chem. Soc. Rev. 1995, 24, 279−287.
(50) Sprous, D. G.; Palmer, R. K.; Swanson, J. T.; Lawless, M. QSAR in the Pharmaceutical Research Setting: QSAR Models for Broad, Large Problems. Curr. Top. Med. Chem. 2010, 10, 619−637.
(51) Katritzky, A. R.; Kuanar, M.; Slavov, S.; Hall, C. D.; Karelson, M.; Kahn, I.; Dobchev, D. A. Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chem. Rev. 2010, 110, 5714−5789.
(52) Le, T.; Epa, V. C.; Burden, F. R.; Winkler, D. A. Quantitative Structure-Property Relationship Modeling of Diverse Materials Properties. Chem. Rev. 2012, 112, 2889−2919.
(53) Epa, V. C.; Hook, A. L.; Chang, C.; Yang, J.; Langer, R.; Anderson, D. G.; Williams, P.; Davies, M. C.; Alexander, M. R.; Winkler, D. A. Modelling and Prediction of Bacterial Attachment to Polymers. Adv. Funct. Mater. 2014, 24, 2085−2093.
(54) Epa, V. C.; Yang, J.; Mei, Y.; Hook, A. L.; Langer, R.; Anderson, D. G.; Davies, M. C.; Alexander, M. R.; Winkler, D. A. Modelling human embryoid body cell adhesion to a combinatorial library of polymer surfaces. J. Mater. Chem. 2012, 22, 20902−20906.
(55) Epa, V. C.; Burden, F. R.; Tassa, C.; Weissleder, R.; Shaw, S.; Winkler, D. A. Modeling Biological Activities of Nanoparticles. Nano Lett. 2012, 12, 5808−5812.
(56) Todeschini, R.; Consonni, V.; Mannhold, R.; Kubinyi, H.; Timmerman, H. Handbook of Molecular Descriptors; Wiley VCH: Weinheim, 2000; Vol. 11.
(57) Sadowski, J.; Wagener, M.; Gasteiger, J. Assessing similarity and diversity of combinatorial libraries by spatial autocorrelation functions and neural networks. Angew. Chem., Int. Ed. Engl. 1996, 34, 2674−2677.
(58) Salahinejad, M.; Le, T. C.; Winkler, D. A. Aqueous Solubility Prediction: Do Crystal Lattice Interactions Help? Mol. Pharmaceutics 2013, 10, 2757−2766.
(59) Burden, F. R.; Winkler, D. A. An Optimal Self-Pruning Neural Network and Nonlinear Descriptor Selection in QSAR. QSAR Comb. Sci. 2009, 28, 1092−1097.
(60) Burden, F. R.; Winkler, D. A. Optimal Sparse Descriptor Selection for QSAR Using Bayesian Methods. QSAR Comb. Sci. 2009, 28, 645−653.
(61) Todeschini, R.; Consonni, V. Handbook of molecular descriptors; Wiley-VCH: Weinheim and Chichester, 2000; p xxi.
(62) Burden, F. R. Quantitative structure - Activity relationship studies using gaussian processes. J. Chem. Inf. Model. 2001, 41, 830−835.
(63) Alexander, D. L. J.; Tropsha, A.; Winkler, D. A. Beware of R2: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models. J. Chem. Inf. Model. 2015, 55, 1316−1322.