
Perspective Cite This: J. Med. Chem. XXXX, XXX, XXX−XXX

pubs.acs.org/jmc

Are We Opening the Door to a New Era of Medicinal Chemistry or Being Collapsed to a Chemical Singularity?

Yan A. Ivanenkov,*,†,‡,§,∥ Bogdan A. Zagribelnyy,†,∥ and Vladimir A. Aladinskiy†,§




† Insilico Medicine Hong Kong Limited (previously Insilico Medicine, Inc.), Unit 307A, Core Building 1, 1 Science Park East Avenue, Hong Kong Science Park, Pak Shek Kok, Hong Kong
‡ Institute of Biochemistry and Genetics Russian Academy of Science (IBG RAS) Ufa Scientific Centre, Oktyabrya Prospekt 71, Ufa 450054, Russian Federation
§ Moscow Institute of Physics and Technology (State University), 9 Institutskiy Lane, Dolgoprudny, Moscow 141700, Russian Federation
∥ Chemistry Department, Lomonosov Moscow State University, Leninskie Gory, Building 1/3, GSP-1, Moscow 119991, Russian Federation

Supporting Information

ABSTRACT: The paradigm of “drug-like-ness” dramatically altered the behavior of the medicinal chemistry community for a long time. In recent years, scientists have empirically found a significant increase in key properties of drugs that have moved structures closer to the periphery or the outside of the rule-of-five “cage”. Herein, we show that for the past decade, the number of molecules claimed in patent records by major pharmaceutical companies has dramatically decreased, which may lead to a “chemical singularity”. New compounds containing fragments with increased 3D complexity are generally larger, slightly more lipophilic, and more polar. A core difference between this study and recently published papers is that we consider the nature and quality of sp3-rich frameworks rather than sp3 count. We introduce the original descriptor MCE-18, which stands for medicinal chemistry evolution, 2018, and this measure can effectively score molecules by novelty in terms of their cumulative sp3 complexity.



Received: January 10, 2019
Published: June 12, 2019

INTRODUCTION

More than one-and-a-half centuries ago, Charles Darwin and Alfred Wallace enunciated the guiding principles of the groundbreaking theory of evolution as a self-organizing, generally entropy-driven, but not chaotic, process of natural selection and adaptation of living organisms to their environment. Since that time, enormous progress has been made in almost every realm of life, science, and technology. The spectacular growth in many biological disciplines has allowed scientists to formulate unambiguous mechanistic and philosophical explanations for “how and why” evolution has transpired and how it is fueled by a range of vital impulses. In contrast to technological and industrial progress, an opposing hypothesis on biological nature has been presented that postulates the basic principles of a devolution scenario in parallel with or instead of evolution. Ontologically, we can apply the fundamental outlook of evolution and devolution to medicinal chemistry and drug design via the prism of the crucial drivers that dominate these areas. Chemically, the “big bang” was a valuable and versatile point source of a practically inexhaustible universe abundantly populated with an astronomical number of hypothetical drug-like structures (∼10⁶⁰).1,2 Although more than 135 million molecules have been reported,3 in-house collections held by various corporations cover only ∼8−10 million unique molecules (excluding building blocks), and most of them are assemblies of “vecchio” clusters and are generally not attractive. These compounds are frequently beyond the current criteria of novelty and are not aligned with the major trends observed in modern drug discovery. The HTS boom1 significantly exhausted available stocks of chemical libraries, resulting in an immense number of molecules languishing in vaults that have excellent chances of becoming tombs or “trashure boxes”. Within this niche, the vast majority of scaffolds and chemotypes are now considered unattractive and have limited potential, and like dinosaurs, these scaffolds are slipping away one by one. The diversity and phylogeny of biological targets are closely related to medicinal chemistry. These disciplines have merged into one

DOI: 10.1021/acs.jmedchem.9b00004 J. Med. Chem. XXXX, XXX, XXX−XXX

Journal of Medicinal Chemistry

Perspective

carbon atoms.16 Moreover, this count does not reflect the true 3D complexity of molecules or their synthetic feasibility. For instance, testosterone, nonadecenoic acid, and a relatively novel cyclobutyl-[(3aR,6aR)-3a-(3-methyl[1,2,4]oxadiazol-5-yl)-5-(tetrahydropyran-4-yl)hexahydropyrrolo[3,4-c]pyrrol-2-yl]methanone (available in the ChemDiv collection since 2018) have the same Fsp3 value of 84.21%, but they are undoubtedly different from the medicinal chemistry point of view. Moreover, in any statistical analysis where these features must be taken into account properly, e.g., an investigation of how the nonplanarity of molecules influences clinical success or lipophilicity, or a study of medicinal chemistry evolution, a simple sp3 count can lead to faulty results. The reason is that there are a large number of compounds of natural origin, including their derivatives (see discussion section), and the majority of these molecules were launched into the market many years ago, for instance, cephalotaxines, nucleosides, oligopeptides, penicillins, or macrolide antibiotics. In this case, thorough preprocessing of a reference database is of high importance: it reduces the number of very similar nature-based compounds and thereby prevents invalid statistical inference and incorrect interpretation. Lovering and colleagues used the GVK BIO database as a reference data set. This database was not preprocessed, and all compounds reported prior to 1980 were simply removed from the study (more than 1000 molecules, 46% of the whole database). The authors speculated that highly saturated molecules were more likely to succeed in the drug discovery pipeline. However, if duplicates, prodrugs, and molecules with high structural similarity, especially those of natural origin, e.g., “me-too drugs”, were not excluded, the obtained results seem rather cursory.
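The Fsp3 comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `fsp3` and the carbon tallies are assumptions of this example (the counts were derived by hand from the three named structures, each with 19 carbons of which 3 are sp2), not values taken from the authors' software.

```python
def fsp3(n_sp3_carbons: int, n_carbons: int) -> float:
    """Return the fraction of carbon atoms that are sp3 hybridized."""
    if n_carbons <= 0:
        raise ValueError("molecule must contain at least one carbon atom")
    return n_sp3_carbons / n_carbons


# Hand-tallied (sp3 carbons, total carbons); an assumption of this sketch.
CARBON_COUNTS = {
    "testosterone": (16, 19),
    "nonadecenoic acid": (16, 19),
    "ChemDiv bicyclic methanone": (16, 19),
}

for name, (n_sp3, n_c) in CARBON_COUNTS.items():
    # All three very different molecules collapse onto the same score.
    print(f"{name}: Fsp3 = {100 * fsp3(n_sp3, n_c):.2f}%")
```

A steroid, a linear fatty acid, and a spatially complex bicyclic amide are indistinguishable by this single ratio, which is the limitation the text describes.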
Several papers addressing the complexity of organic structures, including nature-like molecules, have been published earlier.17−24 However, all these approaches suffer from multiple limitations and have not been applied to estimate medicinal chemistry evolution. For example, Allu and Oprea proposed the synthetic and molecular complexity index (SMCM), calculated from atomic electronegativities and bond parameters, for the assessment of molecular complexity and synthetic feasibility.17 A medium-sized subset (261 thousand small-molecule compounds with MW < 700) selected from the ChemDiv in-house collection was used as a reference database.25 The authors also provided a comparative analysis between SMCM, TWC (total walk count) by Rücker, and Barone’s complexity (BC) metrics and demonstrated that, in general, SMCM outperformed TWC and BC. However, after a thorough inspection, we have concluded that the claims by the authors are rather hasty and superficial. For example, the authors emphasized that high SMCM values negatively correlate with synthetic feasibility and positively correlate with molecular complexity, but this conclusion is highly questionable. They particularly noted that 1,2-bis(2-dimethylaminoethoxy)ethane (SMCM = 17.03) can be easily synthesized due to its symmetry,26 in contrast to thiophen-3-ylmethylamine (SMCM = 11.41)27 or the also symmetric tris-pyridyl-1,2,4-triazole (SMCM = 47.45)28 or N-{2-[4-(6-acetylamino-4-phenylquinazolin-2-yl)phenyl]-4-phenylquinazolin-6-yl}acetamide (SMCM = 84.722)29 as well as {2-[3,5-bis-(2-dimethylaminoethyl)[1,3,5]triazinan-1-yl]ethyl}dimethylamine (SMCM = 26.75).30 However, all these compounds can be readily synthesized via relatively simple reactions under mild conditions, although they have very different SMCM scores. With regard to molecular complexity, SMCM is also unable to translate this feature properly. For example, 2-oxo-1,2-dihydrobenzo[cd]indole-6-sulfonic acid bi-

symbiotic continuum and have been evolving together for a long time. As a rule, spectacular explosions in medicinal chemistry echo biological breakthroughs. However, if an elegant and novel molecule bearing a harmonious ensemble of potential binding points that meet key criteria is synthesized, an appropriate target can likely be identified. The evolution of biological targets shifts the focus of medicinal chemistry primarily toward complex natural product-like, three-dimensional structures mainly due to the considerable progress that has been made in the field of protein−protein interactions (PPIs), which are involved in disease signaling routes. Currently, scientists estimate that they have identified over 400 000 PPIs4 involving a wide variety of binding grooves with volumes of 800−2000 Å³ and binding sites spread along 250−1000 Å³.3,5 It is not surprising that companies have deployed extensive programs for designing PPI-focused libraries containing a large number of novel aliphatic and heteroaliphatic systems, spiro and chiral centers, and bridged and fused rings to achieve better coverage of the 3D chemical space.6 However, annually, only approximately 15 new molecular entities (NMEs) are brought to the market, which highlights the stagnation of the pharmaceutical industry, mainly due to the high project attrition rate and growing IP barriers.7−13 Thus, a tremendous burden looms over medicinal chemists who have been implicated in this nontrivial process. A substantial time investment is now needed to design novel molecules that match the prevailing criteria, which are gradually moving farther from the optimal values set by traditional Lipinski’s rules14 and related indices.15 Therefore, we consider this as a manifestation of “chemical singularity”. We draw an analogy between a black hole’s gravitational singularity and HTS.
At the beginning of the century, HTS technology was maturing, becoming more powerful and commercially accessible for many research organizations. During that period, a vast number of molecules were available in vendors’ collections, like matter condensed around a black hole. Over the following two decades, many of these molecules were analyzed in different HTS campaigns. In addition, medicinal chemists and specific filters moved a considerable number of compounds beyond drug-like-ness space a priori. As a result, the amount of this real “chemical matter” has decreased significantly and is collapsing toward a “chemical singularity”, a single point that contains a huge chemical mass. At present, these molecules are effectively beyond the event horizon, and relatively few compounds can escape it. Novel compounds obviously have more complex structures and are strikingly different from old chemical entities. Many current drug design programs start with de novo synthesis, and it is not surprising that the “chemical matter” cannot be refilled rapidly. Considering this, medicinal chemistry evolution is becoming more apparent. We illustrate it using the number of unique patent records on novel leading compounds ordered by priority date (vide infra) and have made a conscious effort to explore this hypothesis via a broad statistical analysis using machine-learning techniques. We found that the sp3 index, routinely applied for different goals in drug design, can in many cases lead to incorrect results and cannot be used for describing medicinal chemistry evolution for the reasons listed below. Moreover, several other metrics for assessing molecular complexity have obvious limitations.
The main drawback of the sp3 index (Fsp3) introduced by Lovering is that the spatial geometry, composition, and complexity of structures, especially for state-of-the-art molecules, are not considered at all because these nontrivial features cannot be reflected a priori using only the portion of sp3-hybridized


Figure 1. Simplified illustration of the evolution of medicinal chemistry.

ring (0 or 1), CHIRAL is the presence of a chiral center (0 or 1), SPIRO is the presence of a spiro point (0 or 1), sp3 is the portion of sp3-hybridized carbon atoms (from 0 to 1), Cyc is the portion of cyclic carbons that are sp3 hybridized (from 0 to 1), Acyc is a portion of acyclic carbon atoms that are sp3 hybridized (from 0 to 1), and Q1 is the normalized quadratic index.31 MCE-18 can be readily applied for the assessment of the novelty of pharmacologically relevant molecules and may help researchers in designing new chemical entities that have great potential in modern drug design. This index can also be effectively used for profiling HTS focused libraries and for the prioritization of molecules. Moreover, compounds with optimal MCE-18 values presumably have more reliable IP positions.
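As a hedged illustration of how the descriptor terms defined above combine, the sketch below reads the MCE-18 equation as (AR + NAR + CHIRAL + SPIRO + (sp3 + Cyc − Acyc)/(1 + sp3)) × Q1. The function name `mce18`, its argument names, and the input values are assumptions of this example; in particular, Q1 (the normalized quadratic index) is taken as a precomputed input rather than calculated here.

```python
def mce18(ar: int, nar: int, chiral: int, spiro: int,
          sp3: float, cyc: float, acyc: float, q1: float) -> float:
    """Combine the binary flags and sp3 fractions into an MCE-18 score."""
    for flag in (ar, nar, chiral, spiro):
        if flag not in (0, 1):
            raise ValueError("AR, NAR, CHIRAL and SPIRO must be 0 or 1")
    for fraction in (sp3, cyc, acyc):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("sp3, Cyc and Acyc must lie between 0 and 1")
    # Cyclic sp3 carbons raise the score; acyclic sp3 carbons lower it.
    sp3_term = (sp3 + cyc - acyc) / (1.0 + sp3)
    return (ar + nar + chiral + spiro + sp3_term) * q1


# Hypothetical inputs: a flat aromatic molecule vs. a chiral, sp3-rich one.
flat = mce18(ar=1, nar=0, chiral=0, spiro=0, sp3=0.0, cyc=0.0, acyc=0.0, q1=10.0)
rich = mce18(ar=1, nar=1, chiral=1, spiro=0, sp3=0.5, cyc=0.4, acyc=0.1, q1=10.0)
print(flat, round(rich, 2))
```

Under this reading, acyclic sp3 carbons are penalized while cyclic ones are rewarded, which is what separates the score from a plain sp3 fraction.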

phenyl-4-ylamide and (S)-squalene-2,3-epoxide both have similar SMCM values and molecular weight, 65.6 (MW = 400.4) and 57.8 (MW = 426.7), respectively, but these compounds are radically different in their chemical nature, atomic composition, and complexity. Therefore, one of the main goals of this paper is to define a robust function that can discriminate between old and novel drug-like compounds from major pharmaceutical enterprises, thereby reflecting the evolution of medicinal chemistry. Prior to the study, we closely inspected the structures of compounds claimed by large pharmaceutical companies in the patent literature, focusing mainly on novel fragments that appeared in structures through the years. On the basis of the complexity of these moieties, we selected and prioritized simple descriptors that play a key role in the evolution-driven transformation of molecular architecture and topology. After statistical analysis, we validated several equations to achieve the best coverage and prediction ability. Considering a range of issues and evident limitations of different metrics proposed previously for assessing the complexity of a structure, we paid special attention to the possibility of translating two-dimensional features, e.g., the number of spiro and chiral atoms, cyclic and linear carbon atoms that are sp3 hybridized, as well as branching score and aromatic moieties, into 3D complexity and molecular framework. After multiple attempts, a superior function was found, and its prediction power was critically estimated using different and well-preprocessed data sets. On the basis of the issues raised above, we have investigated the major trends in medicinal chemistry and drug design (Figure 1) with a focus on the evolution of the chemical structures of drugs that have been investigated in clinics, launched in the pharmaceutical market, or claimed in patent records.
For the first time, we suggest “MCE-18” as a new molecular descriptor that can effectively score structures by their novelty and current lead potential, in contrast to the simple and in many cases false-positive sp3 index. It is given by the following equation:



MATERIALS AND METHODS

Database Preprocessing and Normalization. Patent Database. To avoid reagents and intermediates, we extracted only unique structures from pharmaceutical patent records by major pharmaceutical companies that are available in the Clarivate Analytics Integrity database under “Patents”.32 All these molecules were claimed to have the best activity (lead compounds) and were not found among the reactants or reagents. Since we mostly focus on analyzing only new entities, each structure was assigned to the earliest year (priority date) in which it appeared. The top 23 pharmaceutical companies were selected for this analysis33 (see the Supporting Information Table S1). Patent records on new drug substances by these players with priority dates from 1950 to 2018 were downloaded. The filtration procedure to isolate only the unique reports resulted in 30 153 records. Then, the reports were redownloaded as structure-data files (sdf) from the Clarivate Analytics Integrity database “Drugs & Biologics” and were preprocessed using ChemoSoft software. Thus, the core “Patent year” field was assigned for each structure, and then all the items were merged into one sdf, which was then subjected to the primary preparation process: charged structures were redrawn and presented in their neutral form, counterions were deleted, and errors in structures were manually corrected. Soft MCFs were applied to the database, resulting in the elimination of metal-, silicon-, and phospho-organic compounds; isotopes; etc. The first preprocessing step excluded 957 structures. Then, 281 samples were deleted because of disconnections between the “Patent year” and the launched date. In other words, the drug was launched prior to the patent record by a company not in the group evaluated herein. Structures that met the following criteria were also excluded: MW > 1200, more than 20 oxygen atoms, or more than 10 fluorine atoms, as well as those with no carbon atoms. 
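The final exclusion criteria listed above (MW > 1200, more than 20 oxygen atoms, more than 10 fluorine atoms, or no carbon atoms) amount to a simple predicate over each record. The sketch below is a minimal illustration; the `Record` class and its field names are hypothetical and do not reflect the ChemoSoft data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for one structure in the patent database."""
    name: str
    mw: float
    atom_counts: Dict[str, int] = field(default_factory=dict)


def is_excluded(rec: Record) -> bool:
    """Apply the four structural exclusion criteria stated in the text."""
    counts = rec.atom_counts
    return (
        rec.mw > 1200
        or counts.get("O", 0) > 20
        or counts.get("F", 0) > 10
        or counts.get("C", 0) == 0
    )


def filter_records(records: List[Record]) -> List[Record]:
    """Keep only the records that pass every criterion."""
    return [rec for rec in records if not is_excluded(rec)]


# Illustrative records with made-up values.
demo = [
    Record("small amide", 350.4, {"C": 19, "N": 4, "O": 3}),
    Record("large glycoside", 1450.0, {"C": 60, "O": 30}),
    Record("perfluoro chain", 600.0, {"C": 12, "F": 25}),
    Record("inorganic salt", 120.0, {"Na": 1, "Cl": 1}),
]
print([rec.name for rec in filter_records(demo)])
```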
Duplicates were removed, and the database was thus reduced to 28 161 samples. To increase the compound diversity and decrease the number of overrepresented chemotypes, a clustering procedure was carried out

\[
\text{MCE-18} = \left( \text{AR} + \text{NAR} + \text{CHIRAL} + \text{SPIRO} + \frac{\text{sp}^3 + \text{Cyc} - \text{Acyc}}{1 + \text{sp}^3} \right) \times \text{Q1}
\]

where AR is the presence of an aromatic or heteroaromatic ring (0 or 1), NAR is the presence of an aliphatic or a heteroaliphatic


Molecular Descriptors. For all the compounds in the database, key molecular descriptors were calculated using ChemoSoft25 and SmartMining software.34,35 These descriptors include MCE-18, MW (molecular weight), LogP (lipophilicity, the calculated partition coefficient in the 1-octanol/water system), LogSw (predicted solubility in water), PSA (polar surface area, Å²), HBD (number of potential H-bond donors), HBA (number of potential H-bond acceptors), and SS (common electrotopological index) as well as AR, NAR, CHIRAL, SPIRO, NCSPTR, Q1, and sp3, which are defined below. For the selected descriptors, bivariate Student’s t-values were calculated (see the Supporting Information). We used the MCE-18, MW, PSA, HBD, HBA, LogSw, and SS values for in silico modeling, and this set of parameters was selected in accordance with their theoretical impact on the studied phenomenon.

Kohonen Self-Organizing Mapping. The following settings were used: map size, 20 × 20 or 15 × 15 (2D representation), depending on the training set; learning epochs, 2000; initial learning rate, 0.3 (linear decay); initial learning radius, 15 or 10 (linear decay); initial weight coefficients, random distribution (not normalized); activation function, Gaussian; winning neuron, determined using the Euclidean metric. After the training process was completed, the areas populated by different classes of compounds were highlighted. We generally observed that samples from different categories were mainly located in distinct regions within the same map. Neurons were then prioritized based on the following privileged factor (PF): NiC1(%)/NiC2(%), where NiC1 is the percent of compounds from class 1 located in the i-th neuron, while NiC2 is the percent of molecules from class 2 located in the same neuron, and vice versa. A PF value greater than 1 was used as a threshold to assign neurons to one of the two classes. There were no or few “dead” neurons within the maps.

Nonlinear Sammon Mapping. To investigate how new molecules cover RO5 space, we performed Sammon-based mapping.36 The main goal of this algorithm is to approximate the local geometric and topological relationships hidden in the input chemical space and to present these relationships visually in a 2D or 3D plot. The fundamental idea of this approach is to reduce the dimensionality of the initial data set, and in this respect, the technique resembles self-organizing maps (SOMs) and multidimensional scaling. However, unlike other algorithms, classical Sammon nonlinear mapping allows scientists to construct a projection that reflects global topographic relationships as pairwise distances between all objects within the whole range of input vector samples. For modeling, we used the same set of molecular descriptors that were used for the Kohonen SOM. Structure similarity was not considered in the study. Euclidean distance was selected as the similarity metric, the stress threshold was 0.01, the iteration number was 1000, and the optimization step was 0.1.
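Two quantities from the mapping procedures above lend themselves to a short sketch: the privileged factor used to label Kohonen neurons, and the stress that Sammon mapping drives below its threshold. Both functions are illustrative assumptions of this example. The stress is written in the classical Sammon form, which the text does not spell out, and the optimizer itself (1000 iterations, step 0.1) is not reproduced.

```python
import math
from itertools import combinations
from typing import Sequence


def privileged_factor(n1: int, total1: int, n2: int, total2: int) -> float:
    """PF = (% of class 1 in a neuron) / (% of class 2 in the same neuron)."""
    if total1 <= 0 or total2 <= 0:
        raise ValueError("class totals must be positive")
    if n2 == 0:
        return math.inf if n1 > 0 else math.nan
    # Algebraically equal to (n1/total1) / (n2/total2), with one division.
    return (n1 * total2) / (n2 * total1)


def sammon_stress(high: Sequence[Sequence[float]],
                  low: Sequence[Sequence[float]]) -> float:
    """Classical Sammon stress between original and projected coordinates."""
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_star = math.dist(high[i], high[j])  # distance in descriptor space
        d_proj = math.dist(low[i], low[j])    # distance on the 2D/3D map
        if d_star == 0.0:
            continue  # coincident inputs carry no weight
        numerator += (d_star - d_proj) ** 2 / d_star
        denominator += d_star
    if denominator == 0.0:
        raise ValueError("need at least two distinct input points")
    return numerator / denominator


# A neuron holding 10 of 100 class-1 and 5 of 200 class-2 compounds.
print(privileged_factor(10, 100, 5, 200))
# One pair whose distance shrinks from 5 to 4 in the projection.
print(sammon_stress([(0.0, 0.0), (3.0, 4.0)], [(0.0,), (4.0,)]))
```

A PF above 1 assigns the neuron to class 1. A stress of 0 means every pairwise distance was preserved exactly, so the stated threshold of 0.01 corresponds to a nearly distance-preserving map.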

in ChemoSoft. A routine Tanimoto metric was used as a measure of 2D similarity. Compounds with similarity scores over 0.5 were assigned to the same cluster (at least 10 records per cluster). Then, the structures in each cluster were ranked by their diversity coefficients, and the samples with the highest 10% (for clusters containing fewer than 200 records) or 5% (for clusters containing more than 200 records) of diversity scores as well as all the compounds not assigned to a cluster were retained. The resulting database contained 24 232 structures. As a result, preprocessing and subsequent normalization led to the elimination of a total of ∼4000 structures (14%). The excluded structures might be meaningful; however, it was very important to execute the listed operations to achieve an adequate statistical outcome and to allow reliable in silico modeling. For instance, a great deal of “chemical turbulence” was introduced by the abundance of old chemotypes, e.g., morphines, penicillins, cephalosporins, steroids, macrolides, tetracyclines, and fluoroquinolones.

Launched Drugs by Year. The raw collection of 4385 launched drugs was obtained from the Clarivate Analytics Integrity “Drugs & Biologics” database. We found that there was no information about the issue date associated with 1312 drugs; therefore, these records were excluded. The remaining 3073 records were redownloaded from “Drugs & Biologics” as an sdf. Of the remaining records, 1383 did not contain any information on the chemical structure (biologics); therefore, they were excluded as well. The earliest date of approval/registration/launch was assigned to each sample. This information was accessible within the milestone table. As a result, 1690 records with their primary launch date were included in the final data set with no duplicates. Then, the database was preprocessed as described above, resulting in the elimination of another 14 records.
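The similarity-based grouping described above can be sketched as follows. This is a minimal leader-style clustering over fingerprint bit sets, chosen only for illustration: the actual algorithm and fingerprints used in ChemoSoft are not described in the text, and the function names, the bit sets, and the omission of the 10-record minimum are all assumptions of this example.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps: List[Set[int]], threshold: float = 0.5) -> List[List[int]]:
    """Assign each fingerprint to the first cluster whose leader it resembles.

    A compound joins a cluster when its similarity to that cluster's first
    member exceeds the threshold; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for idx, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fp, fps[members[0]]) > threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy fingerprints (hypothetical bit positions).
fingerprints = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}, {2, 3, 4}]
print(leader_cluster(fingerprints))
```

With the strict "over 0.5" cutoff, the last fingerprint (similarity exactly 0.5 to the first) is not merged, which shows how sensitive cluster membership is to the threshold convention.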
Tanimoto-based clusterization was then performed by analogy to the procedure described above. Molecules with high structural similarity were deleted from each cluster (the minimum number of samples per cluster was 10). All compounds not assigned to a cluster were retained. Preprocessing and normalization yielded 1370 unique structures. Preclinical Evaluation, Clinical Trials, and Launched Drugs. The same data source was used to compare this data set. All the molecules that served as lead compounds (preclinical trials), that were clinically evaluated as drug candidates (phase I−III), and that were launched as drugs were downloaded as an sdf. Only the highest milestone point achieved by a sample was considered. For each category, preprocessing and normalization were performed following the procedure described above. As a result, the final database contained ∼30 000 preclinical, 1678 phase I, 1837 phase II, 464 phase III, and 1370 launched compounds. Specific Subsets. The classification ability of MCE-18 function toward different target families/classes was also evaluated using the following subsets. We have selected 13 265 records from the entire preprocessed patent database with the activity claimed against GPCRs (8249 molecules), protein kinases (3397 compounds), and proteases (1619 molecules), a total of 13 265 molecules (∼55% of all the compounds included in the parent data set). The key mechanism of action (MoA) was then assigned for each record using Integrity Database. Then, we have isolated molecules with MCE-18 values of ≥50 to investigate their populations among different target classes. As a result, the data sets of 1273 protease targeting molecules, 4616 GPCRs ligands, and 2103 protein kinase inhibitors were prepared (a total of 7992 compounds). For the remaining molecules in the database other targets were reported, including nuclear factors, hydrolases, and transferases. 
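The subset sizes quoted above can be cross-checked with simple arithmetic. The sketch below only recombines numbers stated in the text; the variable names are illustrative, and the per-class shares it prints are derived values, not figures reported by the authors.

```python
# (all records, records with MCE-18 >= 50) per target class, from the text.
SUBSETS = {
    "GPCRs": (8249, 4616),
    "protein kinases": (3397, 2103),
    "proteases": (1619, 1273),
}
PATENT_DATABASE_SIZE = 24232  # preprocessed patent data set

total_selected = sum(n_all for n_all, _ in SUBSETS.values())
total_high = sum(n_high for _, n_high in SUBSETS.values())
coverage_pct = 100 * total_selected / PATENT_DATABASE_SIZE

print(f"selected: {total_selected} ({coverage_pct:.0f}% of the database)")
print(f"MCE-18 >= 50: {total_high}")
for target, (n_all, n_high) in SUBSETS.items():
    # Share of each class that passes the MCE-18 >= 50 cutoff.
    print(f"{target}: {100 * n_high / n_all:.1f}% above the cutoff")
```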
To estimate the classification power of MCE-18 toward drugs launched into the market or entered in clinical trials, we have collected the database containing molecules with the highest phase tag (phase I−III, preregistered, recommended approval, registered or launched). The related milestone date was also assigned for each sample. The basic cellular mechanism of action was added to each molecule: drugs targeting proteases, drugs acting on GPCRs, or drugs targeting protein kinases. All the records saved were then sorted by milestone date, and duplicates were eliminated using drug name field. After the follow-up preprocessing procedure, 810 GPCR ligands, 459 kinase inhibitors, and 171 protease targeting molecules were selected and used for the analysis.



RESULTS AND DISCUSSION

A Brief Insight into the History and Conception. Prior to the main discussion, we should provide a compact overview of the seminal research papers that address molecular properties and complexity based on descriptors and structural patterns. In a pioneering study, Bemis and Murcko37 showed that more than half of drug substances (a total of 5K structures) contained slightly more than 30 unique 2D molecular frameworks. These moieties included mostly planar aromatic or fused aromatic systems, natural-product-derived fragments (e.g., steroids and penicillins), and “naive” cycloaliphatic fragments. Since then, many papers have focused on drug features and the proximal chemical space. In 2010, Wang and Hou revealed only minor changes in the structures of approved and clinically evaluated drugs in terms of their molecular frameworks.38 They found that 50 fragments covered, on average, 50% of the accumulated drug database (more than 8K entities). Recently, Taylor and colleagues analyzed all linear combinations of ring systems among FDA-approved drugs.39 Their results showed that only


might be a result of pronounced “evolution slope” or of the nonpreprocessed data set used by the Schneider team. The polar surface area of the products increased by more than 15% over the studied period despite a simultaneous increase in lipophilicity, whereas the sp3 carbon index, heteroatomic index, HBA, and HBD did not show consistent trends. The authors attributed the drastic decrease in the fraction of free-rotatable bonds and stagnation (or slight decrease) in the sp3 term to the increase in rigidity. However, the growing number of more rigid molecular frameworks can be associated with the increasing number of sp3-rich rings and chiral centers. Moreover, the above-described results were obtained using a nonpreprocessed data set; in particular, the authors did not reduce the number of compounds with high structural similarity within overloaded clusters. Notably, the preprocessing procedure is of high importance for valid statistical analysis and computational modeling because there are many clusters containing a large number of natural products, such as triterpenoid and steroidal saponins, alkaloids, glycosides, glucans, oligosaccharides, fatty acids, benzophenanthridine alkaloids, macrolides, catechins, ibogamine, podophyllotoxins, anthracyclines, aporphine polypeptides, detergents, and penicillin-based molecules. All these compounds and those with similar structures contribute substantially to the sp3 index, leading to misleading conclusions and false-positive results. Therefore, we speculate that, unlike the hypothesis formulated by the Schneider group, the increase in rigidity is primarily associated with the evolution of synthetic approaches that have allowed scientists to access novel spirocompounds, annulated derivatives, and novel building blocks with higher degrees of 3D complexity with an increasing number of cyclic sp3 carbons vs acyclic carbons. 
The same authors indicated that although the “average reaction product” generally met the drug-like criteria introduced by Lipinski for oral drugs, the number of products violating the RO5 has grown over the years. For example, between 1985 and 2005, the number of unique products violating the RO5 lipophilicity rule (LogP > 5) increased significantly (+15%, slope value of 0.78) and then decreased until 2015 (−5%, slope value of −0.56). During 1982−2010, the number of unique products with molecular weight beyond the RO5 (MW > 500) grew as well (+16.5%, slope value of 0.6), after which a slight decay was observed. However, this decrease in MW can be attributed to the insufficient statistical data available at that time (2015), because the paper by Schneider was published at the beginning of 2016. This issue has recently been described in more detail by Doak and co-workers.47 Using different sources, the authors collected a database of 485 drugs (phase I−III or launched) that did not follow the RO5 criteria. Notably, they found that de novo designed molecules accounted for a substantial portion (43%) of the entities included in the data set, followed by natural products (26%) and peptides and peptide mimetics (26%). Almost 65% of de novo designed compounds were targeted for oral administration. This clearly indicates that successful oral drugs can be discovered within the chemical space outside of the RO5 criteria or beyond the Lipinski landscape completely. Through the years, the number of drugs brought to the market with MW > 500 Da steadily increased up to 2010. A high number of oral de novo designed drugs were approved during 1995−2010, whereas natural products were the most common type of approved drugs prior to 1995. Recently, Michael Shultz at Novartis coined the term “Lipinski’s anchor” and critically argued that the standard drug-like-ness rules do not stand the test of time, especially the parameters of MW and HBA, which

1% of the theoretically possible combinations appeared in commercially available libraries. Other authors reached similar conclusions3,40,41 and highlighted that the field of medicinal chemistry has primarily focused on a limited area within the available drug-like chemical space and uses a relatively small number of organic reactions. Brown and Boström42 have recently investigated this issue and have demonstrated that only a small number of organic reactions dominate contemporary practice and account for more than 80% of the synthetic routes used in drug discovery. Clemons and co-workers showed that the structural complexity of small molecules of different origins was well correlated with their protein-binding profile.43 In contrast to the sp3-oriented approach, Kombo et al. revealed that shape-based 3D descriptors, e.g., shadow indices and radius of gyration, strongly influence the off-target activity profile.44 The common theme of the above-mentioned papers is the increasing role of spatial complexity in modern drug discovery. This complexity can be achieved by introducing new nontrivial cores during development. However, these moieties are not common, and it can be quite difficult to conjugate them with existing fragments. This issue was recently directly addressed in a comprehensive review by Schneider and colleagues.45 The authors critically assessed the toolbox of medicinal chemists based on a large number of reactions (>3 million) taken from U.S. patents published from 1976 to 2015. All the mined reactions were categorized into widely used types, e.g., Wittig olefination, Buchwald−Hartwig amination, Mitsunobu reaction, Suzuki−Miyaura coupling, or Sonogashira coupling. As a result, they concluded that the medicinal chemistry toolkit generally operates with a tight set of standard reactions. They estimated that little more than 120 different types of reactions accounted for 95% of the total reactions, and this percentage has increased over the past 3 decades. 
In addition, the authors performed a thorough analysis of more than 385K molecules claimed in pharmaceutical patent records as final reaction products (sorted by priority date) in terms of their key molecular properties. They revealed that the number of unique products over time followed a trend similar to that seen in the reactions: rapid growth was observed from 2005 to 2010 followed by a plateau up to 2015. Substructure searching showed that the overall number of cyclic moieties, e.g., heteroaromatic rings, aliphatic rings, and aliphatic heterocycles, gradually increased from a mean of 2.6 in 1976 to 3.5 in 2015 in contrast to aromatic carbocycles, such as phenyl, biphenyl, or naphthalene. The authors directly attributed this finding to Suzuki-type coupling reactions (1979) and their popularity. The more important finding was that most of the increase was due to a growth in the number of aliphatic heterocycles, e.g., morpholine, piperazine, and pyrrolidine rings, which can substantially improve water solubility. The authors also noticed an increase in the total number of saturated sp3-rich rings; however, they did not provide a detailed analysis of the structural and spatial complexity of these fragments, probably because of a relatively simple array of fingerprints implemented in RDKit. The authors also investigated the evolution of key molecular properties (MW, LogP, TPSA, sp3 count, HBA, HBD, and FRB) over time. They showed that the reaction products disclosed in pharmaceutical records were becoming larger, more lipophilic, more rigid, and more soluble over the studied period. For instance, the average molecular weight and lipophilicity of the target molecules claimed in 1976 were 331 Da and 3.1, respectively, while in 2015, these values increased overall by 24% (MW = 409 Da) and 16% (LogP = 3.6). In 2011, Walters and co-workers46 uncovered similar but less striking trends, which

DOI: 10.1021/acs.jmedchem.9b00004 J. Med. Chem. XXXX, XXX, XXX−XXX

Journal of Medicinal Chemistry

Perspective

Figure 2. Representative examples of PPI inhibitors that reached the market or are being evaluated in clinics.

Table 1. Mean Values of the Molecular Descriptors Calculated for the Structures Disclosed in Patent Records by Major Pharmaceutical Companies over the Years

| no. | MD | 1950−1983 | 1984−1990 | 1991−1997 | 1998−2004 | 2005−2011 | 2012−2018 | increase factor |
|---|---|---|---|---|---|---|---|---|
| 1 | MCE-18 | 43.2 | 50.1 | 56.3 | 57.1 | 64.9 | 75.9 | 1.12 |
| 2 | MW | 345.4 | 382.1 | 424.7 | 430.6 | 453.0 | 471.8 | 1.07 |
| 3 | AR | 0.86 | 0.90 | 0.95 | 0.98 | 0.99 | 0.99 | 1.03 |
| 4 | NAR | 0.72 | 0.71 | 0.75 | 0.72 | 0.77 | 0.83 | 1.03 |
| 5 | CHIRAL | 0.51 | 0.55 | 0.53 | 0.48 | 0.49 | 0.61 | 1.04 |
| 6 | SPIRO | 0.017 | 0.031 | 0.027 | 0.025 | 0.048 | 0.061 | 1.36 |
| 7 | NCSPTR | 0.22 | 0.23 | 0.21 | 0.22 | 0.24 | 0.26 | 1.04 |
| 8 | Q1 | 17.5 | 19.6 | 21.8 | 22.7 | 24.9 | 26.8 | 1.09 |
| 9 | sp3 | 38.3 | 39.4 | 35.7 | 33.0 | 33.1 | 34.5 | 0.98 |
| 10 | LogP | 2.6 | 3.1 | 3.5 | 3.7 | 3.6 | 3.3 | 1.05 |
| 11 | LogSw | −4.1 | −4.4 | −4.7 | −4.8 | −4.8 | −4.8 | 1.03 |
| 12 | PSA | 79.0 | 84.2 | 92.5 | 88.0 | 94.0 | 101.9 | 1.05 |
| 13 | HBD | 1.7 | 1.7 | 2.1 | 1.8 | 1.8 | 2.0 | 1.04 |
| 14 | HBA | 3.0 | 3.3 | 3.4 | 2.8 | 2.8 | 3.0 | 1.01 |
| 15 | SS | 56.7 | 62.2 | 71.7 | 73.5 | 77.6 | 82.6 | 1.08 |
|  | entities | 765 | 2179 | 2741 | 6521 | 8932 | 3094 | 1.64 |
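The "increase factor" column of Table 1 is not defined explicitly in this section. One reading that reproduces the tabulated values for MCE-18 and SPIRO is the arithmetic mean of the ratios between successive periods. The sketch below assumes that definition; the function names are illustrative and are not taken from the original work.

```python
from statistics import mean

# Period means copied from Table 1 (1950-1983 through 2012-2018).
MCE18 = [43.2, 50.1, 56.3, 57.1, 64.9, 75.9]
SPIRO = [0.017, 0.031, 0.027, 0.025, 0.048, 0.061]


def increase_factor(values):
    """Mean of successive period-to-period ratios (assumed definition)."""
    ratios = [later / earlier for earlier, later in zip(values, values[1:])]
    return mean(ratios)


def overall_ratio(values):
    """Ratio of the last period mean to the first period mean."""
    return values[-1] / values[0]


if __name__ == "__main__":
    print(f"MCE-18 increase factor: {increase_factor(MCE18):.2f}")
    print(f"SPIRO increase factor:  {increase_factor(SPIRO):.2f}")
    print(f"MCE-18 last/first:      {overall_ratio(MCE18):.2f}")
```

Under this assumption the per-period factor and the first-to-last ratio are different quantities: the former averages five stepwise changes, while the latter compares only the end points.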

have risen considerably over the past 20 years.48 He concludes that the observed trend and reported data call into question the hypothesis that a “drug-like El Dorado” even exists. We have shown that this tendency has become more evident and exciting. Michael Shultz also discusses the likelihood of reaching a new golden age of drug discovery. In particular, he notes that the sharp increase in drug approvals in 1996 was associated with the Prescription Drug User Fee Act (PDUFA) of 1992 in response to the global AIDS epidemic. Hypothetically, many of these drugs could have been approved prior to this time point, and as the author states, “this ‘surge’ actually distorts what was a steady increase in productivity since the 1980s.”48 The past decade has been regarded as the most prolific era in drug discovery considering the large number of registered NCEs. Indeed, during this period, there were more than 200 oral NCEs, which amounts to nearly a third of all approved oral drugs. In contrast to the results by Schneider (vide supra), Michael Shultz suggests that the parameters associated with MW, e.g., FRBs and HBA, have also significantly increased, whereas no considerable changes have been observed in HBD. With respect to TPSA, both authors highlight meaningful growth. Notably, none of the authors adequately addressed the sp3 characteristics of new molecules, since they relied on a simple sp3 index or on poorly preprocessed data sets. Depending on the method used, Michael Shultz also found a slight positive correlation with the calculated LogP values.

Main Discussion. MCE-18. For the past decade, PPI research has progressed substantially and has resulted in many

druggable interactions, including p53/MDM2 and Bcl-2, GLP1R, Hsp70, NS5A, FAK, β-catenin, and BRDs (Figure 2).49,50 Javier Luque recently published a comprehensive review describing the evolution of CADD techniques within the field of PPI inhibitors.51 Many authors note that PPIs lie far beyond the prevailing drug design paradigm because molecules that may disturb such interactions are significantly different from classic drug templates and chemotypes. A large number of innovative synthetic scaffolds with notable 3D complexity have been developed in recent years, thereby allowing scientists to design nontrivial molecules with a high degree of spatial diversity not only as PPI inhibitors but also for other applications, for example, as kinase inhibitors. During the past decade, PPI inhibitors have attracted a great deal of attention in the field of drug design and development52 since a large number of PPIs have already been discovered and many of them are now considered attractive biological targets. From the classical medicinal chemistry point of view, novel compounds should meet several basic criteria, e.g., Lipinski’s RO5 or RO3, to reach success in clinical trials as peroral drug candidates. However, PPI inhibitors are frequently much larger and bulkier than traditional drugs. As a rule, they are equipped with a balanced number of HBA, HBD, and hydrophobic anchors well-adjusted to crucial hot-spots observed along a PPI interface. On the other hand, instructive examples of PPI inhibitors that have been launched or are in development (Figure 2) and that are not consistent with RO5 parameters have


Table 2. Mean Values of the Molecular Descriptors Calculated for the Launched Drugs

| no. | MD | 1910−1959 | 1960−1969 | 1970−1979 | 1980−1989 | 1990−1999 | 2000−2005 | 2006−2011 | 2012−2018 | increase factor |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MCE-18 | 37.2 | 41.2 | 40.2 | 44.1 | 46.4 | 50.5 | 52.8 | 55.4 | 1.06 |
| 2 | MW | 302.3 | 332.2 | 338.4 | 363.6 | 376.5 | 387.3 | 402.8 | 430.5 | 1.05 |
| 3 | AR | 0.74 | 0.82 | 0.83 | 0.78 | 0.79 | 0.76 | 0.80 | 0.90 | 1.03 |
| 4 | NAR | 0.62 | 0.67 | 0.63 | 0.65 | 0.70 | 0.67 | 0.72 | 0.69 | 1.02 |
| 5 | CHIRAL | 0.57 | 0.45 | 0.52 | 0.60 | 0.62 | 0.64 | 0.59 | 0.48 | 0.99 |
| 6 | SPIRO | 0.008 | 0.010 | 0 | 0.014 | 0.015 | 0.015 | 0.015 | 0.010 | 0.00 |
| 7 | NCSPTR | 0.21 | 0.23 | 0.22 | 0.24 | 0.24 | 0.24 | 0.27 | 0.22 | 1.01 |
| 8 | Q1 | 14.6 | 16.3 | 16.6 | 17.1 | 17.9 | 18.9 | 20.2 | 22.2 | 1.06 |
| 9 | sp3 | 46.1 | 42.4 | 43.2 | 47.6 | 45.6 | 47.8 | 46.9 | 37.6 | 0.98 |
| 10 | LogP | 1.5 | 2.3 | 2.4 | 1.8 | 1.9 | 1.5 | 2.9 | 2.8 | 1.15 |
| 11 | LogSw | −3.6 | −3.9 | −4.0 | −3.8 | −4.0 | −3.9 | −4.4 | −4.5 | 1.03 |
| 12 | PSA | 75.3 | 75.8 | 78.4 | 91.5 | 94.8 | 98.7 | 92.7 | 97.4 | 1.04 |
| 13 | HBD | 1.7 | 2.1 | 2.1 | 2.2 | 2.2 | 2.2 | 2.2 | 2.1 | 1.03 |
| 14 | HBA | 3.0 | 2.9 | 3.1 | 3.8 | 3.7 | 3.9 | 3.5 | 3.6 | 1.03 |
| 15 | SS | 49.6 | 51.6 | 56.0 | 59.5 | 61.5 | 63.8 | 67.0 | 73.4 | 1.06 |
|  | entities | 129 | 102 | 120 | 284 | 271 | 135 | 133 | 196 | 1.18 |

Figure 3. Devolution of medicinal chemistry expressed in the number of patent records by major pharmaceutical companies (bar chart, prior to normalization) sorted by the priority date vs continual growth of MCE-18 scores (black solid line, after normalization). ∗ indicates the year that the Lipinski seminal paper14 was published.

shown high MCE-18 scores. In these cases, medicinal chemists are faced with awkward challenges and need to improve instruments and techniques for discovering new drugs beyond Lipinski’s rules. The overarching aim of these scaffolds is to arrange the attached substituents to achieve the best site interactions and hot-spot matching. Frequently, these building blocks or skeletons contain an intricate arrangement of sp3 atoms that forms a spatially predefined and rigid core equipped with appropriate diversification points. Considering that a simple sp3 index that corresponds to the number of sp3-hybridized carbon atoms provides a high rate of false-positive results, we addressed this issue with the MCE-18 descriptor. This new equation reflects the quality and complexity of the whole molecular framework and content of sp3 carbons. As shown below, the quadratic index increases with the priority date assigned for a patent record (Table 1) as well as with the launch date (Table 2). This Zagreb-based index encodes the degree of branching of a structural framework in terms of vertex degree and implicitly accounts for 2D complexity. The AR term was included in the equation because Shinji Soga and co-workers53 demonstrated that Phe, Trp, and Tyr are more common in the amino acid composition at ligand-binding sites of a protein, and they have been implicated in many PPIs but not in surface interactions. Moreover, this term allows us to isolate branched aliphatic and cycloaliphatic substances, e.g., fatty acids, steroid-based structures, macrolides, and polycarbohydrates, from the pool of new molecules as their structures are qualitatively different from those of “old style” compounds. Cumulatively, all these terms contribute significantly to the evolution of medicinal


Figure 4. Top four pharmaceutical companies ranked by MCE-18 (after normalization).

Figure 5. Number of patent records by major pharmaceutical companies (bar chart) sorted by the priority date vs continual growth of MCE-18 scores (solid lines): GPCR ligands (dark gray), protein kinase inhibitors (black), and molecules acting on proteases (gray).

chemistry and to the sp3 quality from both theoretical and statistical perspectives.

Industry Productivity in Terms of Patent Records: “Chemical Singularity” or a New Age of Drug Discovery? Considering the above-mentioned trends, we suspect that the number of unique chemical entities being claimed in patent literature is dramatically decreasing. To evaluate this hypothesis, we analyzed patent records from the top 23 pharmaceutical companies32 involving new small-molecule compounds disclosed from 1950 to 2018 (Figure 3). As clearly shown in Figure 3, a spectacular growth in the number of new structures claimed from 1998 to 2004 was observed, but this was followed by a dramatic decline. Schneider’s team demonstrated that the number of U.S. pharmacological applications peaked in 2008, while in the subsequent 7 years, the number of applications decreased to the level seen in 2003. In contrast, the authors revealed a continuous growth in the number of U.S. pharmacological patent grants between 1975 and 2015. Interestingly, at the final time point, these lines intersected each other. Although the authors noted that a vast majority of the patents and unique reactions in the database were issued after the year 2000, they did not include any discussion of the observed behavior. All the patent records published during the selected time period were used in the study without regard to patent holder and were mined in an automatic fashion without proper expert inspection. Furthermore, the final product of the reactions listed in the patent record may show no relevant activity in the biological trial described therein. Therefore,


Figure 6. Distribution (%) of molecules with MCE-18 ≥ 50 among three key target families: (A) protein kinases; (B) proteases; (C) GPCRs.

“pharmacological” character cannot be assigned to such products. In this case, a rigorous database preprocessing and filtration procedure is of great importance to avoid misclassification. We hypothesize that the intellectual property assigned to the major pharmaceutical companies is much more exact and meaningful for reflecting the evolution of the field and industry’s overall health. Considering that the number of patent records issued in 2015−2016 is comparable to the number granted at the end of the previous century, “chemical singularity” is not a mythical point, and this collapse is creeping closer and closer. Under these circumstances, drug-repurposing54,55 is unable to fill this vacuum. Therefore, next-generation molecules are vital for targeting newly identified druggable proteins. Although “old” molecules still significantly influence the statistical outcome, novel chemistry gradually forces out this class of compounds, and we are beginning to see the changes in this field more clearly. Notably, this hypothetical outcome and its aftermath correlate well with previously published ideas. Although the obtained statistical results directly show signs of decreasing activity and continuous regression, MCE-18 scores have been steadily increasing over the years. These tendencies are easily visible and arguably associated with qualitative changes in medicinal chemistry. Therefore, instead of a “chemical singularity”, this era can be reasonably considered an intriguing turning point in drug design and development, and the traditional concept of drug development needs to be radically rethought. With respect to patent holders, Figure 4 illustrates the dynamic changes in MCE-18 among the leaders. As shown in Figure 4, during the past 10 years, Merck and AstraZeneca have primarily focused on molecules with MCE-18 values over 60, while Pfizer and Sanofi have demonstrated less productivity but have reached similar scores. The data set of small-molecule compounds derived from the selected patent records with claimed activity against one of the most important target families (GPCRs, protein kinases, and proteases) was used to investigate the classification ability of MCE-18. Figure 5 shows the distribution of molecules from the unique patent records through the years and the corresponding MCE-18 values. As shown in Figure 5, in contrast to the number of unique patent records, MCE-18 has been gradually increasing over the whole estimated period, reaching the highest mean values in the past 10 years: MCE-18 = 90, 75, and 74 for protease-targeting molecules, GPCR ligands, and kinase inhibitors, respectively. The revealed behavior correlates well with the common tendency described above for all the compounds claimed in patent records (see Figure 3). Subsequently, within these core families we performed a detailed statistical analysis and uncovered the most populated areas (Figure 6). As shown in Figure 6A, molecules claimed by major pharmaceutical companies with inhibitory potency against CDKs (7.4%), JAKs (6%), IRAKs (5%), Aurora kinases (4.2%), and BTK (4.1%) account for more than 25% of all the kinase inhibitors claimed in the patent literature. It should be especially noted that


Figure 7. Representative examples of structures with high MCE-18 values claimed in patent records by major pharmaceutical companies during the past 5 years: (A) protein kinase inhibitors, (B) GPCR ligands, and (C) molecules acting on proteases.

Figure 8. Changes in MCE-18 components over the years.

during the past 5 years, JAKs and IRAKs have been among the most attractive biological targets for small-molecule inhibitors, particularly for the treatment of rheumatoid arthritis, alopecia areata, and psoriasis. For instance, PF-06700841 (MCE-18 = 102.0), developed by Pfizer, is currently being evaluated in phase II clinical trials against alopecia areata, psoriasis, vitiligo, and Crohn’s disease; ABBV-712 (AbbVie, structure is uncovered) is being investigated in a phase I study for the treatment of psoriasis; TD-1473, a pan-Janus kinase inhibitor by Janssen Biotech (structure is closed), against inflammatory bowel disease (phase II); itacitinib (MCE-18 = 87.9), a selective JAK1 inhibitor by Incyte, for the treatment of graft-versus-host disease (phase III)


Figure 9. 20 × 20 2D Kohonen SOM reflecting the trends in key molecular properties of structures disclosed during the following periods: (a) 2015− 2018; (b) 2001−2003; (c) 1980−1985. The color gradient indicates the number of molecules. Areas encircled by the white dotted lines correspond to high MCE-18 values. Basic contours of the map were smoothed for convenient visual inspection.

and ulcerative colitis (phase II); as well as PF-06651600 (MCE-18 = 55.6), a suicide JAK3 inhibitor (Pfizer), developed against alopecia areata (phase II/III) and rheumatoid arthritis (phase II). Within the protease family (Figure 6B), compounds targeting secretases, DPP-IV, thrombin, and MMP-13 were found to feature in the top positions (55% of all the ligands disclosed). The leaders in the GPCR group (Figure 6C) include molecules acting on serotonin (8.4%), tachykinin (5.9%), chemokine (5.7%), cannabinoid (4.9%), and histamine (4.4%) receptors, in total about 30% of all the GPCR ligands described in the patent records. Because only major therapeutically significant and abundantly populated target classes were analyzed, minor targets were included in the “other” category. The structures of several molecules with MCE-18 ≥ 50 from different target classes are shown in Figure 7.

During the next step, we performed a statistical analysis of molecular properties using a Kohonen-based machine-learning technique to investigate evolutionary tendencies. As described above, prior to the statistical analysis and in silico modeling, the patent database was preprocessed and normalized to exclude molecules with high structural similarity in each cluster to overcome a statistical bias. Molecular descriptors were selected considering their theoretical impact on the studied phenomenon; the calculated t-values are presented in the Supporting Information (Table S2). Among the evaluated parameters, there were no descriptors with Pearson’s correlation coefficient greater than 0.83 (Table S3). The obtained results are presented in Table 1, Figure 8, and Figure 9. As shown in Table 1, MCE-18 scores gradually increased over the estimated period. The average growth coefficient is 1.12 (per period), while the ratio between the scores in the final (2012−2018) and the first (1950−1983) time intervals is close to 1.76. The SPIRO, Q1, SS, and MW values increased by factors of 1.36, 1.09, 1.08, and 1.07, respectively. Although SPIRO showed the greatest slope, it showed no clear trend over the studied period. Among “old” (1950−1990) structures bearing spiro centers, a reasonable portion of these molecules were of natural origin, and this feature was present a priori. Therefore, the synthetic efforts were primarily targeted toward other diversification points instead of this ring juncture. During the past decade, the proportion of compounds with spiro centers has been dramatically increasing due to new scaffolds, thereby providing high growth coefficients. Considering the trend, unlike the SPIRO term, MCE-18 is a much more informative metric and adequately reflects the progress of synthetic medicinal chemistry. Molecular weight and PSA have constantly increased from the beginning of the studied period, and this tendency correlates well with the results reported previously (vide supra). Unlike PSA, the calculated LogSw values, which relate to aqueous solubility, decreased from the beginning of the studied period to the end of the 1990s. Since then, the LogSw values have plateaued at an average of approximately −4.8. This average value is approximately equal to the minimum aqueous solubility that is statistically allowed for drugs. However, as in the case of the predicted lipophilicity, LogSw varies dramatically with the method used for calculation. We also observed a progressive increase in the degree of 2D branching (Q1) over the years (increased by a factor of 1.09). This trend can be directly attributed to the simultaneous growths in MW and molecular “gravity”. Notably, molecular weight is a less valuable descriptor for the assessment of structural complexity because it strongly depends on the types of atoms present in the molecule. The lipophilicity of compounds constantly increased from 1950 to 2004 followed by a slight decline observed from 2005 (mean LogPcalc value is 3.3 within the last period). However, it would be wrong to speculate around this behavior because there are many factors that significantly influence this parameter. A similar tendency was noted by Michael Shultz. The AR term also demonstrates the expected growth over the studied time interval for the reasons described above.

The resulting Kohonen map is depicted in Figure 9. As shown in Figure 9a, new compounds are located predominantly within two small regions of the same map and are distinct from the nodes abundantly populated by old molecules (Figure 9b,c). In the final cycle, the learning vector quantization error (LVQE) was relatively low, reaching a maximum value of 0.012. More than 90% of the samples showed LVQE values less than 0.002. Therefore, we can conclude that the constructed model has very good generalization ability and learning outcomes. The stability of the model was verified using three independent randomizations. Moreover, the addition of a fuzzy input with stochastic variables did not substantially impact the quality of classification. Upon examination, there were few “dead” neurons within the constructed map. The average classification accuracies for the polar categories were close to 86% and 77% without and with a random threshold, respectively.

Launched Drugs: Difference between Old and New Molecules. The main conclusion we can draw from the previous


Figure 10. Kohonen SOM constructed for launched drugs: (a) white arrows indicate the route by which drugs have evolved over the years; (b) distribution of MCE-18 values within the same map; (c) areas occupied by the drugs brought to market during 1910−1979; (d) locations of new drugs (2011−2018).

section is that new molecules are larger, more polar, and show greater 3D complexity relative to “vecchio” chemistry. However, the synthesis of such compounds requires more time and innovative building blocks. Currently, we are seeing a dramatic decline in the industry’s productivity in terms of new patent records on pharmacological substances of synthetic origin. MCE-18 scores allow us to trace the evolution of medicinal chemistry and to assess the “distance” between old and new chemistry. In this section, we explore the dominant trends emerging within the common pool of drugs launched to the pharmaceutical market using MCE-18. As shown in Table 2, MCE-18, MW, Q1, PSA, and SS all gradually increased over time. MCE-18 and Q1 showed high growth coefficients and continuous progress. Similar to the results described above, we did not find any significant correlations between lipophilicity and launch date. The average LogP is 2.13, while for patented compounds, this value is up to 1.5 times higher. We speculate that the drugs that will be brought to market within the foreseeable future will have somewhat higher lipophilicities with the same or even greater PSA. Therefore, we are convinced that the sp3 index is much less informative and rather ambiguous compared with MCE-18, which is quite well balanced. Using these data, we constructed a Kohonen-based SOM (Figure 10) to elucidate the inherent trends hidden in the marketed drugs while keeping the focus on evolutionary changes. The mapping quality was very similar to that achieved above and provided good LVQE values and stability. As shown in Figure 10a, since the beginning of the studied period, drugs have gradually evolved, and we are now observing considerable changes in their aggregate properties. Positive dynamics were revealed for MCE-18 (Figure 10b). Thus, recent drugs clearly tend to be more complex in terms of MCE-18. Unlike drugs brought to market in the past 7 years (Figure 10d), molecules launched from 1910 to 1979 are predominantly located at the top of the map (Figure 10c). This model clearly shows that the properties of the drugs have dramatically evolved over the years, and many key players in the pharmaceutical industry have followed this trend. Minor companies and academia, if they do not want to miss out, need to consider these trends in their research programs.


Figure 11. Number of drugs (bar chart) sorted by the related milestone date vs continual growth of MCE-18 values (solid lines): GPCR ligands (dark gray), protein kinase inhibitors (black), and molecules acting on proteases (gray).

Table 3. Mean Values of the Molecular Descriptors Calculated for Lead Molecules and Drugs

| no. | MD | preclinical studies | phase I | phase II | phase III | launched | increase factor |
|---|---|---|---|---|---|---|---|
| 1 | MCE-18 | 57.3 | 56.2 | 55.9 | 54.0 | 45.98 | 0.95 |
| 2 | MW | 418.0 | 412.9 | 402.4 | 397.0 | 366.70 | 0.97 |
| 3 | AR | 0.91 | 0.87 | 0.87 | 0.85 | 0.80 | 0.97 |
| 4 | NAR | 0.72 | 0.72 | 0.71 | 0.71 | 0.67 | 0.98 |
| 5 | CHIRAL | 0.50 | 0.55 | 0.59 | 0.61 | 0.56 | 1.03 |
| 6 | SPIRO | 0.027 | 0.020 | 0.020 | 0.030 | 0.01 | 0.89 |
| 7 | NCSPTR | 0.22 | 0.24 | 0.24 | 0.24 | 0.23 | 1.01 |
| 8 | Q1 | 22.5 | 21.8 | 21.3 | 21.0 | 17.98 | 0.95 |
| 9 | sp3 | 36.4 | 40.5 | 41.5 | 41.8 | 44.65 | 1.05 |
| 10 | LogP | 3.4 | 2.8 | 2.9 | 2.6 | 2.14 | 0.89 |
| 11 | LogSw | −4.6 | −4.4 | −4.4 | −4.3 | −4.01 | 0.97 |
| 12 | PSA | 91.6 | 95.2 | 94.0 | 94.7 | 88.08 | 0.99 |
| 13 | HBD | 2.0 | 2.2 | 2.2 | 2.2 | 2.10 | 1.01 |
| 14 | HBA | 3.2 | 3.4 | 3.4 | 3.5 | 3.44 | 1.02 |
| 15 | SS | 71.3 | 71.4 | 69.4 | 69.0 | 60.30 | 0.96 |
|  | entities | 38338 | 1678 | 1837 | 464 | 1370 |  |
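The fold changes quoted in the discussion of Table 3 can be re-derived from the tabulated means. The short check below is only an illustration: the constant and function names are ours, and the two comparison values are the 2012−2018 MCE-18 means taken from Tables 2 and 1.

```python
# Mean values copied from Tables 1-3; all names are illustrative.
MCE18_PRECLINICAL = 57.3
MCE18_LAUNCHED = 45.98
MCE18_NEW_DRUGS = 55.4     # drugs launched 2012-2018 (Table 2)
MCE18_NEW_PATENTS = 75.9   # structures patented 2012-2018 (Table 1)
SP3_PRECLINICAL = 36.4
SP3_LAUNCHED = 44.65


def fold(a, b):
    """Return a / b, i.e. how many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    # Drop in MCE-18 from preclinical leads to launched drugs.
    print(f"preclinical vs launched: {fold(MCE18_PRECLINICAL, MCE18_LAUNCHED):.2f}")
    # New drugs and new patented structures vs all launched drugs.
    print(f"new drugs vs launched:   {fold(MCE18_NEW_DRUGS, MCE18_LAUNCHED):.2f}")
    print(f"new patents vs launched: {fold(MCE18_NEW_PATENTS, MCE18_LAUNCHED):.2f}")
    # Gain in the sp3 index along the same route.
    print(f"sp3 launched vs preclin: {fold(SP3_LAUNCHED, SP3_PRECLINICAL):.2f}")
```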

Similar results were obtained using the database of drugs acting on GPCRs, protein kinases, or proteases already launched into the market or evaluated in clinical studies (CS). As shown in Figure 11, the number of kinase inhibitors has been constantly increasing over the past 30 years, in contrast to protease-targeting compounds, for which a slight decrease has been observed during the past 13 years, and GPCR ligands, for which a more than 2-fold decrease was observed in the last period. However, we revealed a steady increase in MCE-18 values over the whole time scale. The obtained results clearly indicate that the MCE-18 concept reflects not only the common tendency in medicinal chemistry and drug discovery but also the evolution that has occurred within key therapeutic groups.

Voyage to Market: MCE-18 Probe and Nonlinear Mapping. Although we have demonstrated that MCE-18 is in excellent agreement with the issue date assigned in a patent claim and with the launch date, we have also investigated the relationship between this descriptor and the routine drug development pipeline, from preclinical evaluation to the market via clinical trials. In this case, we anticipated that we would find somewhat different results. As shown in Table 3, MCE-18 falls gradually over the whole route from 57.3 (preclinical studies) to 45.98 (launched drugs), showing an overall decrease by a factor of 1.25. A similar tendency was observed for MW, AR, NAR, Q1, LogSw, and SS. The average MCE-18 values calculated for all the molecules ever launched are 1.2 and 1.65 times lower than those of new drugs (2012−2018) and structures that have been disclosed in patent literature for the past 6 years, respectively. Likewise, comparable values were obtained for MW, LogP, Q1, SS, and SPIRO. Interestingly, the sp3 index constantly increases from 36.4 to 44.65 (1.22-fold gain); however, this is not surprising because there is a large population of old structures among the launched drugs that skew the results and cause molecular “turbulence”. In contrast, new molecules that have recently


Figure 12. Distribution of lead compounds and drugs within the Kohonen map: (a) molecules in preclinical investigation, (b) compounds examined (or are currently being examined) in clinical trials, (c) the distribution of MCE-18 values, (d) areas occupied by drugs launched from 1910 to 1987, and (e) new drugs brought to market in recent years (2011−2018).

entered preclinical studies or early clinical trials demonstrate higher MCE-18 values. Considering these trends, we anticipate that if approximately 10% of these molecules reach the market, the statistical results will change. The Kohonen map showing the distribution of compounds by the development stage is depicted in Figure 12. As clearly demonstrated in Figure 12a, lead compounds are homogeneously distributed across almost the whole map with several overpopulated clusters, while drug candidates are mainly located inside these densely populated regions (Figure 12b). Three distinct areas correspond to moderate-to-high MCE-18 values (Figure 12c). Interestingly, the location of old drugs that entered the market in the 1910−1987 period (Figure 12d) is radically different from that of drugs launched in the past decade (Figure 12e). Almost all the older drugs are characterized by relatively low MCE-18 values and are mainly grouped within two areas, while new drugs tend to be incorporated in the regions associated with moderate MCE-18 values. This result is in full accordance with that described above for launch periods (see Figure 10a). The average classification accuracy between these two categories is approximately 70%, which is close to that achieved for clinically evaluated vs launched drugs.

Sammon Nonlinear Mapping. As discussed above, statistical analysis and Kohonen-based nonlinear mapping show that new drugs that have been brought to market during the past decade differ remarkably from those launched in the previous century. However, we did not assess their aggregate properties in terms of RO5+. To address this issue, we carried out Sammon nonlinear mapping using the same set of molecular descriptors. The resulting plot is presented in Figure 13 and shows that molecules that successfully meet the criteria are tightly grouped within the bottom left area of the map (triangles), whereas new drugs are mainly spread outside this region or are tightly distributed within the periphery of the RO5 region. The obtained result

Figure 13. Sammon map constructed for the launched drugs. “+” means that PSA < 140 Å² was added to the classic Lipinski’s rules.
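For readers who want to reproduce an RO5+ style label such as the one used in Figure 13, a minimal sketch follows. It assumes the commonly cited Lipinski cut-offs (MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10) plus the PSA limit from the caption. The exact thresholds and the number of violations tolerated by the authors are not stated here, and the class, function, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    mw: float    # molecular weight, Da
    logp: float  # calculated LogP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors
    psa: float   # polar surface area, square angstroms


def ro5_plus_violations(d: Descriptors) -> List[str]:
    """Names of the RO5+ criteria that the descriptor set fails.

    Cut-offs are the usual Lipinski values plus PSA < 140 (assumed).
    """
    checks = [
        ("MW", d.mw <= 500),
        ("LogP", d.logp <= 5),
        ("HBD", d.hbd <= 5),
        ("HBA", d.hba <= 10),
        ("PSA", d.psa < 140),
    ]
    return [name for name, passed in checks if not passed]


def meets_ro5_plus(d: Descriptors) -> bool:
    """True when no criterion is violated (strict policy, assumed)."""
    return not ro5_plus_violations(d)


if __name__ == "__main__":
    # Hypothetical descriptor sets, not taken from the paper.
    small = Descriptors(mw=350.0, logp=2.5, hbd=2, hba=5, psa=80.0)
    large = Descriptors(mw=720.0, logp=4.0, hbd=3, hba=9, psa=160.0)
    print(meets_ro5_plus(small), ro5_plus_violations(small))
    print(meets_ro5_plus(large), ro5_plus_violations(large))
```

Returning the list of failed criteria rather than a bare boolean makes it easy to see which property pushes a molecule outside the RO5 region.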


Figure 14. Vendor collections scored by MCE-18.

lost their main source of income. These losses are mitigated by structural reforms, especially within the R&D sector. As a result, the medicinal chemistry community has shrunk considerably over the past decade as a consequence of mergers and downsizing.60,61 Cumulatively, this has led to a dramatic decrease in productivity in terms of the number of patent records. (2) A drop in profits has also been associated with U.S. healthcare reform and European government “austerity” measures. GSK reported that these events had a profound impact on the drug industry.58 Tsukamoto has noted that increasing regulatory burdens significantly influences the costs and time spent on launching a drug to market.62 The improvements in quality control and the tightening of rules for the assessment of clinical efficacy have had a meaningful impact on productivity, and in turn, governments and payers will continue to bear the burdens of increased prices and changes in access, utilization, and prescribing patterns. These obstacles in the legal field will likely indirectly negatively affect patent activity through a reduction in the reinvestment of profits in the R&D sector. (3) The reasons above have forced major pharmaceutical companies to critically rethink the role of medical chemistry. Tsukamoto noted that “medicinal chemists had to make some adjustments to satisfy the appetite of the fast-paced screening paradigm”62 and harshly criticized the classic approach in which medicinal chemistry was solely focused on HTS and did not sufficiently value creativity in routine drug discovery. Mainly, industry is guided by pragmatic and economic principles that in many cases have resulted in the most effective and simple decision. 
As a consequence, “drug discovery research operations are turning into a game of numbers”.62 The author noted that “complex reactions that require considerable knowledge and technical skills were avoided as much as possible”,62 and the main focus was placed on more trivial routes. Ultimately, the simplicity of the final molecules led to an overall degradation of the organic chemistry toolbox, and this was also highlighted by Schneider. Tsukamoto said the following: “When entry-level chemists fresh from academic laboratories propose complex target molecules that can only be made by a series of unconventional reactions, how often do we dismiss them by saying, ‘That’s very interesting, but let’s be realistic. Why don’t you make this series of compounds instead? Three easy steps and you can make several analogs?’”62 The author agreed with Kessel, who referred to the analysis by Bain & Company under which a “‘broken innovation culture’ lies at the core of big pharma’s problems”,58 and “big pharma executives need to

visually presents the expected tendency for next-generation molecules to deviate from Lipinski’s rules and particularly reflects one of many aspects of the evolution of medicinal chemistry. In addition, we analyzed the small-molecule compounds in terms of their parent company and their MCE-18 score. The in-house collections of compounds were obtained from ChemDiv (CD regular collection),25 Enamine (EA hits and advanced collections),56 and InterBioscreen (IBS synthetic collection)57 and preprocessed using soft MCFs to exclude reagents, building blocks, and “marginal” compounds. Molecules containing reactive groups and undesired fragments were eliminated as well. Entities with MWs in the range of 200−700 were used for the study. In this case, we did not perform any normalization procedures and used all the molecules available in the stocks that meet the criteria above. The resulting statistical outcome is depicted in Figure 14. Thus, molecules with MCE-18 less than 45 are generally uninteresting and trivial, contain old scaffolds, and have low degrees of 3D complexity and novelty. These compounds should be discarded and excluded from further investigations. Molecules with MCE-18 values in the range of 45−63 offer sufficient novelty and basically follow the trends currently observed in medicinal chemistry, while compounds with MCE-18 values of 63−78 generally show high structural similarity to compounds disclosed in patent records by major pharmaceutical enterprises during the past decade. The structures of entities with MCE-18 > 78 need to be inspected visually to assess their target profile and drug-like-ness. Overall, the collections can be ranked by the share of compounds with 45 < MCE-18 < 78 as follows: CD (47%), EA (AC, 44%), IBS (36%), and EA (HC, 34%). Notably, within the past decade, the synthetic teams of Enamine and ChemDiv have focused on molecules having MCE-18 > 45 and on building blocks with a high degree of 3D complexity.
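As a rough illustration of the score ranges described above, the following Python sketch bins MCE-18 values and computes the share of a collection that falls in the 45 < MCE-18 < 78 window. The function names, the category labels, and the assignment of the boundary values 45, 63, and 78 to particular bins are assumptions made for this sketch; the text does not state how values exactly at a boundary are treated, and the demo scores are invented.

```python
from collections import Counter


def mce18_category(score: float) -> str:
    """Map an MCE-18 score to one of the four ranges quoted in the text.

    Boundary handling (45 and 63 go to the upper bin, 78 to the lower bin)
    is an assumption of this sketch, not something the paper specifies.
    """
    if score < 45:
        return "discard"             # trivial, old scaffolds, low 3D complexity
    if score < 63:
        return "sufficient novelty"  # follows current medicinal chemistry trends
    if score <= 78:
        return "patent-like"         # similar to recently patented compounds
    return "inspect visually"        # needs manual review of drug-likeness


def fraction_in_window(scores, low=45.0, high=78.0):
    """Share of scores with low < score < high (strict, as written in the text)."""
    scores = list(scores)
    if not scores:
        return 0.0
    hits = sum(1 for s in scores if low < s < high)
    return hits / len(scores)


def summarize(scores):
    """Count how many scores fall into each category."""
    return Counter(mce18_category(s) for s in scores)


if __name__ == "__main__":
    demo_scores = [30.0, 50.0, 70.0, 90.0]  # invented values, not real data
    print(summarize(demo_scores))
    print(fraction_in_window(demo_scores))
```

Applied to the precomputed MCE-18 values of a vendor collection, `fraction_in_window` would give a percentage of the kind used to rank the collections; computing MCE-18 itself is outside the scope of this sketch.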
Below, we list the key factors responsible for the dramatic decrease in major pharmaceutical companies’ productivity observed over the past 15 years.

(1) The “golden era” of major pharmaceutical companies has long passed, and during the rapid advance of the 20th century, the most-workable and accessible compounds (blockbuster drugs) were fully exploited. Many of the patents on blockbuster drugs have expired or will expire soon; therefore, major pharmaceutical enterprises are continuing to lose their market position, as was predicted by Kessel.58 National Pharmaceutical Services (NPS)59 has recently published a list of drugs, including best-selling drugs, coming off patent by 2022; their generic versions have not been considered. In total, approximately $100 billion in sales has been lost in recent years. Indeed, key pharmaceutical players are facing daunting challenges and have


in the field of drug discovery understand these flaws, but they prefer to follow the rules rather than to seek success beyond them. However, during the past decade, a dramatic decrease in the number of patent records involving new chemical entities by major pharmaceutical enterprises has been observed. Considering that the activities of a vast majority of small-molecule compounds have been fully elucidated in different biological trials, the escalation of HTS has resulted in a significant depletion of available collections. Many molecules available in stock are now considered to have relatively low attractiveness and little promise. On the other hand, great advances in the field of PPIs have forced scientists to explore new areas of purportedly synthetically accessible chemical space. This impulse is very important; methods and principles of organic chemistry have evolved over the years and have resulted in original building blocks and templates that are primarily characterized by increased 3D complexity. Formally, these valuable insights offer the opportunity to empirically formulate a basis for the “chemical singularity” that is probably our impending eschaton compared with the dogmatic position. In contrast, we can reasonably regard this as a novel turning point in chemical evolution and state that medicinal chemistry has ushered in a new era of drug design and development. To investigate the current trends observed in drug design and development, we used well-preprocessed pharmacological and patent databases as well as the collections of various companies. Equipped with the newly developed MCE-18 descriptor and in silico tools, we have clearly shown that scaffolds are becoming increasingly sophisticated with higher degrees of 3D complexity not only in the PPI realm but also in the field of kinase and protease inhibitors, GPCR-targeted compounds, and others.
On the basis of these challenges, we have striven to provide readers with statistically adequate and relevant information regarding the dominant trends.

recognize that truly creative steps in product generation are not scalable like other manufacturing processes”.58

(4) Considering the industry’s needs, Rafferty recommends paying more attention to the significant role of high-quality education in drug discovery.63 Medicinal chemists who graduate from universities are less attractive employees for large pharmaceutical companies. Therefore, academic researchers should shift their focus from grants and publications to actual business needs. The industry itself should collaborate with the academic community more effectively and encourage the development of innovative solutions that may become alternative sources of research funding. Hoffmann notes that the role of medicinal chemistry in industry has changed substantially over the past few decades; however, it remains too narrow.64 In the future, there will likely be increased demand for truly expert medicinal chemists with a deep understanding of drug discovery disciplines and related subjects to manage innovative projects in this area.65 In contrast, the number of ordinary scientists in major pharmaceutical companies will continually decrease, and emphasis will be placed on the quality of their work.

In designing the MCE-18 metric, the main goal was to reflect the opinions summarized above as well as our own experience regarding the evolution of medicinal chemistry in a unified equation. MCE-18 contains objective terms that can be readily calculated from the structure of a molecule in a convenient and insightful way. Despite the decline in the total number of patents for new drugs, the quality and complexity of novel molecules from major pharmaceutical companies have drastically increased over the past decade. Painstaking synthetic efforts have yielded a series of new building blocks bearing nontrivial 3D architectures, providing a robust IP position.
As mentioned above, a considerable number of novel molecules are PPI inhibitors, which requires a shift in the attention of medicinal chemists toward structures with a relatively higher degree of 3D complexity. As we demonstrate herein, MCE-18 can be used to monitor key features of “next-generation” molecules and trace the evolution of medicinal chemistry through the years. Skilled medicinal chemists continue to make good and original molecules, which gives us confidence that the “chemical singularity” will swallow up vecchio compounds, opening the door to a new era of medicinal chemistry.



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jmedchem.9b00004. Tables of statistical data for the calculated descriptors and a list of top 23 pharmaceutical companies (PDF)



CONCLUSION

Evolutionary theory is one of the most important paradigms in science. The fundamental principles of evolution run through almost all fields where incremental progress is observed. One of the crucial drivers of evolution is adaptation to permanently changing environments. From this point of view, progress in the pharmacological industry significantly depends on achievements in disease-associated biology as well as advances in the toolbox of medicinal chemists. Major pharmaceutical companies set the tone of this industry and have many resources to dominate the minor players. They can be regarded as one of the core drivers of medicinal chemistry evolution. In this context, understanding the strategic trends in drug design and development and in the industry breathing space is of great importance. Herein, we shed light on this issue using MCE-18 as a novel metric as well as machine-learning techniques. Historically, the routine and epidemic-like application of the drug-likeness paradigm has resulted in a range of common principles and efficiency indices that play a pivotal role in decision making by drug-hunting teams in industry and scientists searching for novel drugs. Indeed, many core players



AUTHOR INFORMATION

Corresponding Author

*Phone: +852 96685251. E-mail: [email protected].

ORCID

Yan A. Ivanenkov: 0000-0002-8968-0879

Notes

The authors declare no competing financial interest.

Biographies

Yan A. Ivanenkov graduated from the Lomonosov Moscow State University, Faculty of Chemistry (organic and medicinal chemistry) in 2002. Then, in 2010, he was awarded a Ph.D. degree in Biochemistry, and then he became an Associate Professor of Medicinal Chemistry. For more than 20 years he has been working in the field of drug design, medicinal chemistry and biochemistry, QSAR analysis, organic and combinatorial synthesis, and in silico modeling. More than 100 scientific papers have been published in various authoritative and high IF journals. Several small molecule drugs have been developed and are currently evaluated in clinics, including neuraminidase and NS5A


(5) Ran, X.; Gestwicki, J. E. Inhibitors of protein-protein interactions (PPIs): an analysis of scaffold choices and buried surface area. Curr. Opin. Chem. Biol. 2018, 44, 75−86.
(6) Zheng, Y.; Tice, C. M.; Singh, S. B. The use of spirocyclic scaffolds in drug discovery. Bioorg. Med. Chem. Lett. 2014, 24, 3673−3682.
(7) Pammolli, F.; Magazzini, L.; Riccaboni, M. The productivity crisis in pharmaceutical R&D. Nat. Rev. Drug Discovery 2011, 10, 428−438.
(8) Scannell, J. W.; Blanckley, A.; Boldon, H.; Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discovery 2012, 11, 191−200.
(9) Shih, H. P.; Zhang, X.; Aronov, A. M. Drug discovery effectiveness from the standpoint of therapeutic mechanisms and indications. Nat. Rev. Drug Discovery 2018, 17, 19−33.
(10) Pammolli, F.; Magazzini, L.; Riccaboni, M. The productivity crisis in pharmaceutical R&D. Nat. Rev. Drug Discovery 2011, 10, 428−438.
(11) Waring, M. J.; Arrowsmith, J.; Leach, A. R.; Leeson, P. D.; Mandrell, S.; Owen, R. M.; Pairaudeau, G.; Pennie, W. D.; Pickett, S. D.; Wang, J.; Wallace, O.; Weir, A. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat. Rev. Drug Discovery 2015, 14, 475−486.
(12) Mirasol, F. 2016 NME Drug Approvals Drop After Recent Highs. https://www.dcatvci.org/286-2016-nme-drug-approvals-drop-after-recent-highs (accessed Feb 27, 2019).
(13) Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discovery 2004, 3, 711−715.
(14) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Delivery Rev. 1997, 23, 3−25.
(15) Degoey, D. A.; Chen, H.; Cox, P. B.; Wendt, M. D. Beyond the Rule of 5: lessons learned from AbbVie’s drugs and compound collection. J. Med. Chem. 2018, 61, 2636−2651.
(16) Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752−6756.
(17) Allu, T. K.; Oprea, T. I. Rapid evaluation of synthetic and molecular complexity for in silico chemistry. J. Chem. Inf. Model. 2005, 45, 1237−1243.
(18) Bertz, S. H. The first general index of molecular complexity. J. Am. Chem. Soc. 1981, 103, 3599−3603.
(19) Barone, R.; Chanon, M. A new and simple approach to chemical complexity. Application to the synthesis of natural products. J. Chem. Inf. Comput. Sci. 2001, 41, 269−272.
(20) Schuffenhauer, A.; Brown, N.; Selzer, P.; Ertl, P.; Jacoby, E. Relationships between molecular complexity, biological activity, and structural diversity. J. Chem. Inf. Model. 2006, 46, 525−535.
(21) Feher, M.; Schmidt, J. M. Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry. J. Chem. Inf. Comput. Sci. 2003, 43, 218−227.
(22) Selzer, P.; Roth, H. J.; Ertl, P.; Schuffenhauer, A. Complex molecules: do they add value? Curr. Opin. Chem. Biol. 2005, 9, 310−316.
(23) Whitlock, H. W. On the structure of total synthesis of complex natural products. J. Org. Chem. 1998, 63, 7982−7989.
(24) Rucker, C.; Rucker, G.; Bertz, S. H. Organic synthesis - art or science? J. Chem. Inf. Comput. Sci. 2004, 44, 378−386.
(25) ChemDiv Inc. Home Page. https://chemdiv.com (accessed Feb 27, 2019).
(26) Delfort, B.; Le Pennec, D.; Grandjean, J. A Method of Synthesizing a Mixture of N,N,N′,N′-Tetramethyl-1,6-hexanediamine and N,N,N′,N′-Tetramethyldiaminoethers. France Patent FR3014100A1, 2013. 1,2-Bis(2-dimethylaminoethoxy)ethane can be obtained in two reaction steps: the reaction of commercially available triethylene glycol with ammonia to yield 1,2-bis(2-aminoethoxy)ethane, a well-known reaction, is generally carried out in the presence of hydrogen and a suitable catalyst, followed by an aminoalkylation reaction, for example, by reductive amination using formaldehyde in the presence of hydrogen and a suitable catalyst.

inhibitors and androgen receptor and 5HT6R antagonists. Since 2017, he has been the chief leader and the Head of the Medicinal Chemistry Department at Insilico Medicine, Inc.

Bogdan A. Zagribelnyy is pursuing a specialist’s degree (to date, 6th year) in organic, bioorganic, and medicinal chemistry at Lomonosov Moscow State University (Moscow, Russia), currently under the supervision of Dr. Yan Ivanenkov and Prof. Alexander Majouga. He started consulting at Insilico Medicine Inc. in 2017. He is involved with the development of new androgen receptor ligands as therapeutic tools in prostate cancer, medicinal chemistry data collection, and training set preparation for AI-driven drug discovery platforms.

Vladimir A. Aladinskiy received his B.S. (2012) and M.S. (2014) in Applied Mathematics and Physics from Moscow Institute of Physics and Technology. Then he joined Dr. Yan Ivanenkov’s research group at the MIPT and obtained his Ph.D. in Computational Chemistry with research focusing on computer-aided drug design of new NS5A HCV inhibitors. Since 2017, he has been a Computational Chemistry Team Lead at Insilico Medicine, Inc., where he supervises projects on the application of AI-driven methods to the generation of new drug-like molecules. His main interests include CADD as well as machine learning and big data in drug design.



ACKNOWLEDGMENTS

The authors gratefully acknowledge the financial support of the Ministry of Education and Science of the Russian Federation [Grant 20.9907.2017/VU] (expert opinion, discussion, and manuscript preparation) and Russian Science Foundation [Grant 17-74-30012], IBG RAS Ufa (in silico study and statistical analysis). The authors gratefully acknowledge the valuable assistance of Prof. Nicholas Meanwell in discussions.



ABBREVIATIONS USED

Bcl-2, B-cell lymphoma 2 protein; BRD, bromodomain; ChemoSoft, ChemDiv Inc. chemical database software; FAK, focal adhesion kinase; FRB, freely rotatable bonds; GLP1R, glucagon-like peptide-1 receptor; GSK, GlaxoSmithKline Inc.; Hsp70, 70 kDa heat shock proteins; IP, intellectual property; LogSw, predicted solubility in water; RDKit, open-source cheminformatics software; LVQE, learning vector quantization error; MCF, medicinal chemistry filter; NS5A, nonstructural protein 5A; p53/MDM2, p53 protein and mouse double minute 2 homolog protein interaction; RO3, rule of 3; RO5+, beyond “rule of 5” (Lipinski); TPSA, topological polar surface area; SOM, self-organizing map; MoA, mechanism of action; SMCM, synthetic and molecular complexity index; JAK, Janus protein kinase; BTK, Bruton’s tyrosine kinase; TWC, total walk count; BC, Barone’s complexity; GPCR, G-protein-coupled receptor; CDK, cyclin-dependent kinase; IRAK, interleukin-1 receptor (IL-1R) associated kinase; CS, clinical studies



REFERENCES

(1) Mullard, A. The drug maker’s guide to the galaxy. Nature 2017, 549, 445−447.
(2) Bohacek, R. S.; McMartin, C.; Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 1996, 16, 3−50.
(3) Boström, J. G.; Brown, D. G.; Young, R. J.; Keserü, G. M. Expanding the medicinal chemistry synthetic toolbox. Nat. Rev. Drug Discovery 2018, 17, 709−727.
(4) BioGRID Database Statistics. https://wiki.thebiogrid.org/doku.php/statistics (accessed Feb 27, 2019).


(27) Chen, F.; Deng, Z.; Li, S. Preparation Method of N-tert-Butyloxycarbonyl-3-methylaminothiophene. China Patent 104193722A, 2014. The compound can be synthesized in three steps using 3-thienylformaldehyde as an initial raw material: (1) NaBH4, MeOH, 25 °C, 5 h; (2) HBr, H2O/Δ, 12 h; (3) NH3, DMF, 150 °C, 18 h.
(28) This compound was prepared in ChemDiv by analogy with the synthetic procedure described in Klingsberg, E. Preparation of triaryl-s-triazoles from diaroylhydrazines. J. Org. Chem. 1958, 23, 1086−1087, starting from commercially available N,N′-di(pyridine-2-carbonyl)hydrazide in two steps.
(29) This compound was synthesized following a proprietary ChemDiv protocol: Commercially available 2-bromo-6-nitro-4-phenylquinazoline was subjected to Suzuki−Miyaura coupling reaction with benzene-1,4-diboronic acid followed by nitro group reduction and acylation using acetic anhydride.
(30) Kauffman, W. J. Observations on the synthesis and characterization of N,N′,N″-tris-(dimethylaminopropyl)hexahydro-s-triazine and isolable intermediates. J. Heterocycl. Chem. 1975, 12, 409−411. HHT can be prepared by the reaction of N,N-dimethylaminopropylamine with aqueous formaldehyde via aldimine intermediate formation in situ followed by its rapid trimerization (one-pot synthesis, up to 90% yield).
(31) Balaban, A. T. Chemical graphs. Theoret. Chim. Acta 1979, 53, 355−375.
(32) Clarivate Analytics Integrity Access Page. https://integrity.thomson-pharma.com (accessed Feb 27, 2019).
(33) Christel, M. Pharm Exec’s Top 50 Companies 2017. Pharm. Exec. 2017, 37, 4. http://www.pharmexec.com/pharm-execs-top-50-companies-2017 (accessed Feb 27, 2019).
(34) Ivanenkov, Y. A.; Khandarova, L. M. Advanced Artificial Intelligence Methods Used in the Design of Pharmaceutical Agents. In Pharmaceutical Data Mining: Approaches and Applications for Drug Discovery; Balakin, K. V., Ed.; Wiley: Hoboken, NJ, 2010; pp 457−489.
(35) Pletnev, I. V.; Ivanenkov, Y. A.; Tarasov, A. V. Dimensionality Reduction Techniques for Pharmaceutical Data Mining. In Pharmaceutical Data Mining: Approaches and Applications for Drug Discovery; Balakin, K. V., Ed.; Wiley: Hoboken, NJ, 2010; pp 423−455.
(36) Sammon, J. W. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 1969, C-18, 401−409.
(37) Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887−2893.
(38) Wang, J.; Hou, T. Drug and drug candidate building block analysis. J. Chem. Inf. Model. 2010, 50, 55−67.
(39) Taylor, R. D.; MacCoss, M.; Lawson, A. D. G. Combining molecular scaffolds from FDA approved drugs: application to drug discovery. J. Med. Chem. 2017, 60, 1638−1647.
(40) Pitt, W. R.; Parry, D. M.; Perry, B. G.; Groom, C. R. Heteroaromatic rings of the future. J. Med. Chem. 2009, 52, 2952−2963.
(41) Visini, R.; Arús-Pous, J.; Awale, M.; Reymond, J. L. Virtual exploration of the ring systems chemical universe. J. Chem. Inf. Model. 2017, 57, 2707−2718.
(42) Brown, D. G.; Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 2016, 59, 4443−4458.
(43) Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 18787−18792.
(44) Kombo, D. C.; Tallapragada, K.; Jain, R.; Chewning, J.; Mazurov, A. A.; Speake, J. D.; Hauser, T. A.; Toler, S. 3D Molecular descriptors important for clinical success. J. Chem. Inf. Model. 2013, 53, 327−342.
(45) Schneider, N.; Lowe, D. M.; Sayle, R. A.; Tarselli, M. A.; Landrum, G. A. Big Data from pharmaceutical patents: a computational analysis of medicinal chemists’ bread and butter. J. Med. Chem. 2016, 59, 4385−4402.

(46) Walters, W. P.; Green, J.; Weiss, J. R.; Murcko, M. A. What do medicinal chemists actually make? A 50-Year Retrospective. J. Med. Chem. 2011, 54, 6405−6416.
(47) Doak, B. C.; Over, B.; Giordanetto, F.; Kihlberg, J. Oral druggable space beyond the Rule of 5: insights from drugs and clinical candidates. Chem. Biol. 2014, 21, 1115−1142.
(48) Shultz, M. D. Two decades under the influence of the Rule of five and the changing properties of approved oral drugs. J. Med. Chem. 2019, 62 (4), 1701−1714.
(49) Ran, X.; Gestwicki, J. E. Inhibitors of protein-protein interactions (PPIs): an analysis of scaffold choices and buried surface area. Curr. Opin. Chem. Biol. 2018, 44, 75−86.
(50) Cossar, P. J.; Lewis, P. J.; McCluskey, A. Protein-protein interactions as antibiotic targets: a medicinal chemistry perspective. Med. Res. Rev. 2018, 1−26.
(51) Macalino, S. J. Y.; Basith, S.; Clavio, N. A. B.; Chang, H.; Kang, S.; Choi, S. Evolution of in silico strategies for protein-protein interaction drug discovery. Molecules 2018, 23, 1963.
(52) Scott, D. E.; Bayly, A. R.; Abell, C.; Skidmore, J. Small molecules, big targets: drug discovery faces the protein-protein interaction challenge. Nat. Rev. Drug Discovery 2016, 15, 533−550.
(53) Soga, S.; Shirai, H.; Kobori, M.; Hirayama, N. Use of amino acid composition to predict ligand-binding sites. J. Chem. Inf. Model. 2007, 47, 400−406.
(54) Pushpakom, S.; Iorio, F.; Eyers, P. A.; Escott, K. J.; Hopper, S.; Wells, A.; Doig, A.; Guilliams, T.; Latimer, J.; McNamee, C.; Norris, A.; Sanseau, P.; Cavalla, D.; Pirmohamed, M. Drug repurposing: progress, challenges and recommendations. Nat. Rev. Drug Discovery 2019, 18, 41−58.
(55) Ashburn, T. T.; Thor, K. B. Drug repositioning: identifying and developing new uses for existing drugs. Nat. Rev. Drug Discovery 2004, 3, 673−683.
(56) Enamine Ltd. Home Page. https://enamine.net (accessed Feb 27, 2019).
(57) InterBioscreen Ltd. Home Page. https://www.ibscreen.com (accessed Feb 27, 2019).
(58) Kessel, M. The problems with today’s pharmaceutical business - an outsider’s view. Nat. Biotechnol. 2011, 29, 27−33.
(59) NPS, National Pharmaceutical Services Home Page. https://www.pti-nps.com/nps/ (accessed Feb 27, 2019).
(60) Mullin, R. More job cuts at Merck. Chem. Eng. News 2011, 89, 10.
(61) Jarvis, L. M. Pfizer reveals more R&D cuts. Chem. Eng. News 2011, 89, 5.
(62) Tsukamoto, T. Tough times for medicinal chemists: are we to blame? ACS Med. Chem. Lett. 2013, 4, 369−370.
(63) Rafferty, M. F. No denying it: medicinal chemistry training is in big trouble. J. Med. Chem. 2016, 59, 10859−10864.
(64) Hoffmann, T.; Bishop, C. The future of discovery chemistry: quo vadis? Academic to industrial - the maturation of medicinal chemistry to chemical biology. Drug Discovery Today 2010, 15, 260−264.
(65) Campbell, I. B.; Macdonald, S. J. F.; Procopiou, P. A. Medicinal chemistry in drug discovery in big pharma: past, present and future. Drug Discovery Today 2018, 23, 219−234.
