Article pubs.acs.org/jcim

Cite This: J. Chem. Inf. Model. 2018, 58, 1544−1552

A Machine Learning Approach for Predicting HIV Reverse Transcriptase Mutation Susceptibility of Biologically Active Compounds

Thomas M. Kaiser,*,†,∇ Pieter B. Burger,†,‡ Christopher J. Butch,†,§ Stephen C. Pelly,† and Dennis C. Liotta*,†

† Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322, United States
‡ Department of Drug Discovery and Biomedical Sciences, College of Pharmacy, Medical University of South Carolina, 280 Calhoun St., MSC 141, Charleston, South Carolina 29425-1410, United States
§ Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-IE-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan




Supporting Information

ABSTRACT: HIV resistance emerging against antiretroviral drugs represents a great threat to the continued prolongation of the lifespans of HIV-infected patients. Therefore, methods capable of predicting resistance susceptibility in the development of compounds are in great need. By targeting the major reverse transcription residues Y181, K103, and L100, we used the biological activities of compounds against these enzymes and the wild-type reverse transcriptase to create Naïve Bayes Networks. Through this machine learning approach, we could predict, with high accuracy, whether a compound would be susceptible to a loss of potency due to resistance. Also, we could perfectly predict retrospectively whether compounds would be susceptible to both a K103 mutant RT and a Y181 mutant RT. In the study presented here, our method outperformed a traditional molecular mechanics approach. This method should be of broad interest beyond drug discovery efforts, and serves to expand the utility of machine learning for the prediction of physical, chemical, or biological properties using the vast information available in the literature.



Received: August 8, 2017
Published: June 28, 2018
DOI: 10.1021/acs.jcim.7b00475
Journal of Chemical Information and Modeling 2018, 58, 1544−1552
© 2018 American Chemical Society

INTRODUCTION

In the past three decades, more than 25 antiretroviral drugs and drug combinations have been developed for the treatment of HIV-1. However, HIV still has no known cure, and HIV is a major public health threat with an estimated 30 million infected individuals worldwide.1 Also, strict patient adherence (>95% of dose) to the prescribed combination therapy is needed to ensure suppression of HIV viral load, and interruptions in dosing lead to a loss of viral response due to mutation.2 The percentage of patients with an individual adherence rate of >95% was found to be 53% for older subjects and only 26% for younger subjects in the United States.3 Resistance emerging due to a high rate of poor patient adherence represents a great threat to the continued success of antiretrovirals, especially when coupled with the high error rate of reverse transcriptase (RT) producing mutant virions.4 As a result, HIV resistance susceptibility is a drug developmental concern that currently has no method of predicting the susceptibility. Therefore, methods capable of predicting resistance susceptibility in the preclinical development of compounds acting against HIV are in great need. Given our success and that of others with predicting the activity of compounds using Naïve Bayes Networks (NBNs), we decided to extend our machine learning methodology to predict the resistance susceptibility of nonnucleoside reverse transcriptase inhibitor (NNRTI) compounds.5 We additionally demonstrate that, trained solely with knowledge of the small molecule scaffold and activity, NBNs more accurately identify nonsusceptible compounds compared to a more traditional docking-based approach, which requires both knowledge of the protein structure and considerable computational time.

Of the 24 single agents currently approved for treating HIV, 13 target RT, which makes it the most frequently targeted component of HIV.6 RT is the enzyme responsible for converting the single-stranded RNA genome of HIV into the double-stranded DNA needed for integration into the genome of the host.7 Inhibitors of RT approved for the clinic are in either the nucleoside reverse transcriptase inhibitor (NRTI) family or the NNRTI family. While NRTIs directly act at the active site of RT, NNRTIs bind to a hydrophobic pocket within the palm subdomain of p66 and exert an inhibitory effect through allosterism.8 Key interactions between the allosteric pocket and NNRTIs involve the residues Tyr 181, Lys 101, Tyr 188, Trp 229, Tyr 318, Leu 100, and Val 106. Single amino acid substitution is often sufficient to confer resistance to inhibitors, and key mutations observed in the clinic that confer resistance to approved NNRTIs are listed in Table 1.8−11

Table 1. Prevalent Mutations Associated with NNRTI Resistance and Percentage of Patients with Mutations in Reverse Transcriptase

drug          prevalent mutation
Nevirapine    L100I, K101E, K103N, Y181C, Y188C
Efavirenz     K103N, K103N/Y181C, K103N/E478Q, Y181C
Etravirine    K101H, V106I/V179D
Rilpivirine   L100I, Y181I/V, Y188L

mutation      percentage in ART patients
L100I         7.7
K101E         16
K103N         61
K103S         0.7
V106M         15
Y181C         33
Y181I         3.6
Y188L         8.8
G190A         36
G190S         16
P225H         10

As can be seen from Table 1, a majority of patients on ART experience the K103N mutation, and multiple mutations are not uncommon in patients receiving therapy. However, most of the work regarding drug resistance prediction revolves around sequencing viral genomes in patients and making predictions about which drugs would be unusable in a patient due to viral resistance.12−14 A general method capable of predicting resistance susceptibility against clinically observed mutants for hit-to-lead development has not been published at the time of this study.15 Our machine learning approach would have to delineate between two sets of compounds: those compounds that retained wild-type activity against a single-amino-acid mutant, and those compounds that lost activity when a residue was mutated. Furthermore, our machine learning approach would be ignorant of any three-dimensional (3D) structural information regarding active site-ligand interactions.

RESULTS AND DISCUSSION

Our ultimate goal for this study was to create a workflow that would allow the construction of a machine learning algorithm that would predict resistance susceptibility for compounds when a residue was mutated to any other amino acid. If this were to fail, we would then separate out all of the individual mutant types (e.g., Y181C) and perform a more-limited analysis. We used the ChEMBL database as our source for data regarding compounds known to be active against RT or any mutant form of RT.16 The ChEMBL database was selected as our data source, because of the rigorous curation process that activity data undergo before being incorporated.17 We found 3899 entries concerning RT activity, and we decided to first focus on the Y181 mutation, one of the major mutants responsible for a 50-fold loss of efficacy of nevirapine.11 We only focused on single mutant data because there were only a handful of compounds with double mutant data (both the Y181 and K103 residues being altered) in the ChEMBL dataset.

We then took the ChEMBL dataset and processed it as shown in the schematic workflow in Figure 1. Selecting for compounds with Y181 mutant data gave 340 compounds out of the 3899. Removing compounds that were tested against the Y181−K103 double mutant gave 311 compounds that had only data against any Y181 mutant. We then selected those compounds from the 3899 that had data against wild-type RT and filtered those for compounds that were also in the set of 311 Y181 mutant compounds to give 308 compounds that had WT data. Finally, we filtered the Y181 compounds, using the WT compounds to give a mutant Y181 set of 308 compounds, which were common to both sets. The logic behind this was to find a set of compounds that had both Y181 mutant and WT activity, which allowed us to quantify the degree of activity loss by calculating the fold change for each compound. To ensure we have a molecularly

Figure 1. Graphical representation of workflow.
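
The filtering steps described for the Y181 set can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the records, compound IDs, and variant labels are invented toy data, and the fold change is assumed here to be the ratio of mutant potency to wild-type potency (for example, mutant IC50 divided by WT IC50), since the text does not spell out the formula.

```python
# Toy records standing in for ChEMBL activity rows. All IDs and values are
# hypothetical. Each record is (compound_id, RT variant, potency in nM).
records = [
    ("CPD-A", "WT", 10.0),
    ("CPD-A", "Y181C", 500.0),
    ("CPD-B", "WT", 20.0),
    ("CPD-B", "Y181I", 40.0),
    ("CPD-C", "Y181C", 5.0),          # no wild-type value: dropped
    ("CPD-D", "WT", 8.0),
    ("CPD-D", "Y181C", 16.0),
    ("CPD-D", "K103N/Y181C", 900.0),  # tested on the double mutant: dropped
    ("CPD-E", "WT", 3.0),             # no Y181 mutant value: dropped
]


def is_y181_single(variant):
    """True for a single-residue Y181 mutant such as Y181C or Y181I."""
    return variant.startswith("Y181") and "/" not in variant


def is_y181_k103_double(variant):
    """True for a double mutant that alters both Y181 and K103."""
    parts = variant.split("/")
    return (
        len(parts) == 2
        and any(p.startswith("Y181") for p in parts)
        and any(p.startswith("K103") for p in parts)
    )


def fold_changes(rows):
    """Return {compound_id: mutant potency / WT potency} for the shared set."""
    y181, wild_type, double_tested = {}, {}, set()
    for compound_id, variant, potency in rows:
        if variant == "WT":
            wild_type[compound_id] = potency
        elif is_y181_k103_double(variant):
            double_tested.add(compound_id)
        elif is_y181_single(variant):
            # A later value overwrites an earlier one in this simplified sketch.
            y181[compound_id] = potency
    # Keep Y181 compounds without double-mutant data that also have WT data.
    shared = (set(y181) - double_tested) & set(wild_type)
    return {cid: y181[cid] / wild_type[cid] for cid in sorted(shared)}


result = fold_changes(records)
print(result)
```

Under this assumed definition, a fold change well above 1 indicates a loss of potency against the mutant, while a value near 1 indicates that wild-type activity is retained.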


this study is concerned with the loss of activity for highly active compounds resulting from a 181 RT mutant (75% of compounds show