Article pubs.acs.org/jnp

NANPDB: A Resource for Natural Products from Northern African Sources

Fidele Ntie-Kang,*,†,‡,# Kiran K. Telukunta,§,# Kersten Döring,§ Conrad V. Simoben,† Aurélien F. A. Moumbock,‡ Yvette I. Malange,‡ Leonel E. Njume,∥ Joseph N. Yong,‡ Wolfgang Sippl,† and Stefan Günther*,§,⊥

† Department of Pharmaceutical Chemistry, Martin-Luther University of Halle-Wittenberg, Wolfgang-Langenbeck Straße 4, 06120 Halle (Saale), Germany
‡ Department of Chemistry and ∥ Chemical and Bioactivity Information Centre, Department of Chemistry, Faculty of Science, University of Buea, P.O. Box 63, Buea, Cameroon
§ Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-University Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany
⊥ Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Albertstraße 19, 79104 Freiburg i.B, Germany

ABSTRACT: Natural products (NPs) are often regarded as sources of drugs or drug leads or simply as a “source of inspiration” for the discovery of novel drugs. We have built the Northern African Natural Products Database (NANPDB) by collecting information on ∼4500 NPs, covering literature data for the period from 1962 to 2016. The data cover compounds isolated mainly from plants, with contributions from some endophyte, animal (e.g., coral), fungal, and bacterial sources. The compounds were identified from 617 source species, belonging to 146 families. Computed physicochemical properties, often used to predict drug metabolism and pharmacokinetics, as well as predicted toxicity information, have been included for each compound in the data set. This is the largest collection of annotated natural compounds produced by native organisms from Northern Africa. While the database includes well-known drugs and drug leads, the medical potential of a majority of the molecules is yet to be investigated. The database could be useful for drug discovery efforts, analysis of the bioactivity of selected compounds, or the discovery of synthesis routes toward secondary metabolites. The current version of NANPDB is available at http://african-compounds.org/nanpdb/.

Received: March 31, 2017. Published: June 22, 2017
DOI: 10.1021/acs.jnatprod.7b00283 J. Nat. Prod. 2017, 80, 2067−2076
© 2017 American Chemical Society and American Society of Pharmacognosy

Natural products (NPs) are known to play an important role in drug discovery, as they often provide scaffolds as starting points for hit/lead discovery.1,2 Recent surveys indicate that NPs from Northern African sources could constitute an important reservoir for the discovery of drugs,3−5 owing to the long history of the use of their source organisms in traditional medicine, which dates back to prehistoric and pharaonic times.6,7 However, data on the uses of the compound sources, their collection points, the biological activities of tested isolates, and access to compound samples for screening purposes, among others, are often unavailable and/or scattered in the literature. Some of these data are inaccessible to a majority of scientists. A smaller proportion of these literature sources consists of M.Sc. and Ph.D. theses, which are often stored as hard copies in university libraries and are inaccessible to the wider community of scientists working on natural products drug discovery.

On the other hand, many NPs that have been identified from Northern African sources are known drugs or have been shown to have clinical potential. As representative examples, the microtubule stabilizers should be mentioned; for example, taccalonolides A, B, E, and N, derived from Tacca species, are known to have a unique mode of action that does not involve direct binding to tubulin.8

It is noteworthy that the geographical region of Northern Africa differs significantly from the rest of the continent, covering a land surface area of about 9 million km²,9 most of which is occupied by the Sahara Desert (which currently covers >50% of the total area of the region and is constantly expanding). The Northern African region includes Algeria, Egypt, Libya, Morocco, Sudan, South Sudan, Tunisia, Western Sahara, and parts of Northern Mali. It is therefore expected that the natural products from this part of the world will show some uniqueness with respect to structural diversity and biological activities when compared with the rest of the continent. The reason is that plants, animals, and fungi have adapted to this unique geographical setting for numerous generations.

Previous efforts toward the development of electronic NP libraries from African sources include CamMedNP,10 ConMedNP,11 AfroDb,12 p-ANAPL,13 AfroCancer,14,15 AfroMalariaDB,16 and Afrotryp,17 along with the most recently published South African natural compound database (SANCDB).18 The latter consists of an online downloadable library of about 600 compounds from plant, animal, bacterial, and fungal species from South Africa. The SANCDB contains annotations on biological activities, along with 2D and 3D chemical structural data. The CamMedNP and ConMedNP libraries contain generated 3D models of natural products mainly from the Central African region. AfroDb12 is a diverse, well-curated data set of about 1000 NPs derived from African flora, collected from different regions of the continent. Literature information shows that AfroDb compounds exhibit a broad range of biological activities, and they have been included as a subset of the ZINC database.19−21 On the other hand, the Pan-African Natural Products Library (p-ANAPL) is a data set of about 600 compounds from diverse floral sources in Africa, for which stored samples are directly available for experimental screening.13 Meanwhile, AfroCancer,14,15 AfroMalariaDB,16 and Afrotryp17 are focused NP virtual libraries containing compounds with promising anticancer, antimalarial, and anti-Trypanosoma properties, respectively. However, no similar endeavor has been carried out for the Northern Africa region, thus prompting the development of an online library for natural products from species found in this region. The Northern African Natural Products Database (NANPDB) can be browsed through lists and searches. Natural products from Northern African species can also be downloaded in the form of compound files. The data currently cover compounds derived mainly from plants, with contributions from some endophytes, animals (e.g., corals), fungi, and bacteria.

Where available, each compound entry has been linked to PubChem.22 For the description of the source species, NANPDB has been linked to available databases; for example, some plant sources have been linked to the Prota23 and Tropicos24 databases, while marine sources have been linked to the World Register of Marine Species (WoRMS).25 Bacterial sources were linked to GenBank26 and fungal sources to the Mycobank database,27−29 while the taxonomic data for a majority of the compounds have been linked to the National Center for Biotechnology Information (NCBI) Taxonomy database.30 Additionally, the availability of source organism storage data, e.g., in national and university herbaria and repositories, has been included, where possible. Literature information (e.g., authors, journal names, titles of publications, literature reference, DOI, among others) has also been included, via links to PubMed31 and journal reference links.

NANPDB represents the largest collection of annotated NPs derived from natural species found in Northern Africa. Although the database currently includes many known drugs and drug leads, the biological activity for the large majority of compounds has not been tested. Hence, the medicinal potential of most of the molecules is yet to be assessed. This opens a broad window of opportunity for further investigations in drug discovery studies, analysis of the bioactivity of selected compounds, and probing of the routes toward the biosynthesis of some of the identified metabolites. An additional asset of NANPDB is the inclusion of related information such as the use of the source organisms in traditional medicine and the known biological activities of the compounds. Data can be accessed via query searches based on compound names, chemical substructure, compound similarity, source organism name and family, references, and compound classes and subclasses, or by keywords. All compound structures can be downloaded entirely from the Web site and could be used for in silico screening, with the view of identifying potential new activities of the molecules. NANPDB, with its generated 3D models, would, therefore, be a useful asset for research groups focusing on the modeling of biomolecular interactions. A systematic plan to constantly update the database has also been established. The database includes links to all literature references (including those not available in PubMed), information about the uses of the source species, experimentally verified activities, and modes of action (where available). Additionally, considerable time has been invested to manually curate data, particularly from Ph.D. theses from university libraries, knowing that such data are often not included in classical databases. This was done in order to make the otherwise inaccessible data available online.



RESULTS AND DISCUSSION

Table 1. Current Contents of NANPDB

unique SMILES | 4469
unique PCIDs | 2059
families | 146
source organisms | 617
biological activities | 98
modes of action | 37
kingdoms | 5
cited references | 787
PubMed references | 324
compound classes | 95

Table 1 provides a summary of the current content of NANPDB, which describes 4469 molecular entities (mainly terpenoids, flavonoids, and alkaloids, Figure 1A). About 20% of NANPDB compounds have shown at least one biological activity, the reported activities being broad and diverse. A smaller fraction has already shown quite promising activities via known modes of action. A total of about 100 known biological activities have been recorded, along with 36 distinct modes of action. They have been grouped into 63 biological activity classes (Table S1, Supporting Information). The majority of the known bioactive compounds are anti-infective (e.g., exhibiting anti-HIV, other antiviral, antifungal, antitubercular, other antimycobacterial, and antibacterial activities), cytotoxic, and potential anticancer drugs (Figure 1B). The latter class of bioactive molecules includes compounds exhibiting inhibitory activities against a broad range of cancer cell lines or having recorded antileukemic, kinase inhibitory, tumor anti-initiating, tumor-specific cytotoxic, antimetastatic, antiproliferative, and antitumor activities, while other compounds have shown clinical potential for cancer treatment. Other bioactive molecules include antifeedants or insect repellents, among others. Less than half of the data set (about 2000 compounds) was present in PubChem at the time of data curation. Less than half of the 781 literature references are currently listed in PubMed, making the database of particular interest. The compounds in NANPDB were further compared with known drugs and other published natural products data sets, some of which contain compounds from other specific geographical regions, while others are collections of NPs with


Figure 1. Current content of NANPDB. (A) Pie chart showing main compound classes. (B) Bar chart showing most recorded biological activities.

Table 2. Description of Datasets Used for Comparative Analysis of ADMET-Related Descriptors

Study data set | No. of unique SMILES | Nature of compounds | Web site | Ref
NANPDB | 4469 | NPs from Northern African sources | http://african-compounds.org/nanpdb/ | this paper
IDA2PM | 1617 | FDA-approved drugs | http://idaapm.helsinki.fi/ | 32
DrugBank | 7133 | known drugs | http://www.drugbank.ca/ | 33, 34
NuBBE | 1749 | NPs from Brazilian sources | http://nubbe.iq.unesp.br/nubbeDB.html/ | 35
StreptomeDB 2.0 | 4040 | NPs from Streptomyces species | http://www.pharmaceutical-bioinformatics.de/streptomedb2/ | 36
ConMedNP | 3177 | NPs from Central African sources | under construction | 11
BioPhytMol | 633 | antimycobacterial NPs from around the world | http://ab-openlab.csir.res.in/biophytmol/ | 37
NPACT | 1574 | anticancer NPs from around the world | http://crdd.osdd.net/raghava/npact/ | 38

activities against specific diseases (Table 2). The Venn diagrams in Figure 2 show the overlap of NANPDB with the selected data sets. The computed physicochemical parameters for NANPDB compounds were compared with those of other data sets, including data sets for FDA-approved drugs and diverse NPs from plant, animal, and microbial sources, described in Table 2. The results for the physicochemical parameters used to predict drug-likeness, e.g., MW, c log P, HBA, HBD, the number of rotatable bonds (NRB), and the number of Lipinski violations (NLV), are shown in Table S3 (Supporting Information), while the distributions of selected physicochemical properties are shown in Figure 3. In addition, the pkCSM model predictions for 10 toxicity end points have been summarized in Tables 3 and S4 (Supporting Information).

NANPDB is freely available at http://african-compounds.org/nanpdb/, without the requirement of a login. Furthermore, the SMILES descriptor and low-energy 3D structures of the molecules are downloadable for noncommercial use, e.g., for virtual screening projects. The database enables delivery of information related to research in the pharmaceutical sciences or chemistry of natural products. Some examples of possible applications are explained below.

Natural products are a valuable resource for the identification of new drugs.1,2,39 To start a general search for bioactive compounds, it is useful to take a closer look at scaffolds that are known to be often related to some specific bioactivity (i.e., “privileged scaffolds”). For instance, quinoline appears in many drugs as a substructure, e.g., the disinfectant broxyquinoline. A substructure search with quinoline in NANPDB yields 32 different molecules (Figure 4). While for many of them the bioactivity and mode of action could be at least partly clarified, e.g., the anti-inflammatory properties of skimmianine (B) by downregulating TNF-α,40 for most other molecules, such as the alkaloid 1,2,3,11b-tetrahydroquinolactacide (C),41 the activity has not yet been tested. This molecule, as well as some of the other 31 annotated molecules from NANPDB, might be good starting points for the search for novel drugs.

Artemisinin and its derivatives are important drugs against malaria. However, it is observed that the causative agents of the genus Plasmodium have become resistant to artemisinin derivatives in many parts of the world.42 A ligand similarity search in NANPDB starting with artemisinin yields seven molecules with a Tanimoto coefficient higher than 0.8, indicating a high structural similarity. These molecules might be worth investigating as molecules with similar effects. Some of the source organisms are already used in traditional medicine against diseases caused by worms, e.g., Ambrosia maritima against bilharziasis.43 The natural product austricin (Figure 5), contained in A. maritima, may also be active against Plasmodium sp. or other parasites.

Virtual screening is a well-established and powerful technique extending traditional high-throughput screening technologies. Especially for natural products that are normally available only in low yields, computational models can complement resource-expensive wet laboratory experiments. Some of the molecules annotated in NANPDB have already been subjected to virtual screening studies as part of other molecular libraries. For instance, ellagic acid was identified in a high-throughput docking approach as a potent inhibitor of casein kinase 2, a putative oncogene in animal models.44 The molecular library


provided for download was processed using Open Babel,45 in order to get a ready-to-use virtual screening library.

Data contribution is highly welcome and appreciated. Individuals or research groups willing to make a contribution for updating NANPDB are invited to fill out the downloadable spreadsheet from the Web site (http://african-compounds.org/nanpdb/downloads/ or Supplementary Data set File and Notes S1, Supporting Information) and provide the SMILES string and/or the structure file(s) of the compound(s) to be submitted in any of the following acceptable formats (SMILES, mol, sdf, mol2, CHEMDRAW). The completed file can be sent via e-mail to the NANPDB team ([email protected]). The data will be checked and uploaded if they meet the requirements. The contributor(s) will be subsequently acknowledged accordingly under curators of the compound.

Figure 2. Venn diagrams showing the intersection of NANPDB with known drugs and other NP databases. (A) NANPDB versus NPs from other geographical regions. (B) NANPDB versus NPs of bacterial origin, along with NPs that target the specific diseases tuberculosis and cancer.

The majority of the identified compounds in NANPDB were terpenoids, flavonoids, and alkaloids. Only about one-fifth of the compounds have been tested to be active in at least one bioassay. Among the bioactive compounds, a majority showed anti-infective, cytotoxic, anticancer, antioxidant, and anti-inflammatory properties. A comparison of NANPDB with NPs from different geographical regions (Figure 2A) indicates that only 40 of its compounds have also been found in the Central African and Brazilian flora (ConMedNP and NuBBE, respectively). More than 4200 (>95%) NANPDB compounds have not yet been annotated in these data sets. With respect to bacterial metabolites from Streptomyces sp. and NPs with activities against cancer and mycobacteria, approximately the same proportion of compounds are only found in NANPDB (Figure 2B). Most of the molecules in NANPDB (>98%) are not listed in the data sets of known drugs (Figure S1, Supporting Information). No compound was found in the intersection of the selected data sets compared (Figure S2, Supporting Information).

An evaluation of the physicochemical properties employed in Lipinski’s “rule of five”46 (Figure 3) indicates that ∼57% of NANPDB compounds showed no Lipinski violations, while ∼75% of the NPs showed fewer than two violations (Figure 3D). When compared with the other data sets, only the data sets of known drugs (DrugBank and IDA2PM), along with NuBBE and BioPhytMol, showed a better drug-likeness. This holds true when comparing the mean values for MW (Table S3, Supporting Information). It was additionally observed that the mean c log P values of both NANPDB and FDA-approved drugs were equal to ∼2.15 units (Table S3, Supporting Information).

Regarding the distribution of individual properties, we herein focus on two parameters (MW, c log P), showing that ∼75% of NANPDB compounds had MW ≤ 500 Da, with respective percentages of ∼85 (IDA2PM), ∼89 (DrugBank), ∼84 (NuBBE), ∼56 (StreptomeDB 2.0), ∼80 (ConMedNP), ∼92 (BioPhytMol), and ∼70% (NPACT) (Figure 3A). All c log P distributions showed approximate Gaussian curves (Figure 3B), with ∼87% of NANPDB compounds respecting Lipinski’s criterion (c log P ≤ 5 units), when compared with ∼89 (IDA2PM), ∼83 (DrugBank), ∼79 (NuBBE), ∼89 (StreptomeDB 2.0), ∼69 (ConMedNP), ∼69 (BioPhytMol), and ∼78% (NPACT).

The toxicity prediction indicated that about three-quarters of NANPDB compounds showed compliance (tested negative) with the AMES mutagenicity test in bacteria, while none of the compounds were predicted to inhibit human ether-a-go-go-related gene (hERG) I and only about 8% and 12% were predicted positive for hepatotoxicity and skin sensitization, respectively (Table 3). A positive prediction for the AMES test would indicate that a compound is mutagenic and may, therefore, act as a carcinogen. Meanwhile, the inhibition of the potassium ion (K+) channels, encoded by hERG, is among the main causes for the development of torsade de pointes, or long QT syndrome (which leads to fatal ventricular arrhythmia).47−49 Toxicity due to the inhibition of hERG channels has often resulted in the removal of many drugs from the market. The pkCSM predictors were built using hERG I and II inhibition information for 368 and 806 compounds, respectively. The pkCSM predictor determines whether or not a compound is likely to be an hERG I/II inhibitor. The hepatotoxicity predictor (based on the measured liver-associated side effects


Figure 3. Property distribution of selected Lipinski parameters for NANPDB, together with known drugs and selected NP data sets in terms of percentages. (A) Histogram for MW distribution. (B) Plot of c log P, (C) plot of HBD, (D) histogram of NLV. Data on y-axes are percentage representation of the various data sets.
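The number of Lipinski violations (NLV) shown in panel D can be reproduced from the four rule-of-five descriptors in a few lines. The following is a minimal Python sketch and not the authors' pipeline (the NANPDB descriptors were computed with QikProp). The `Descriptors` class, the function names, and the example values are hypothetical; the thresholds are the standard rule-of-five cutoffs (MW ≤ 500 Da, log P ≤ 5, HBD ≤ 5, HBA ≤ 10).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Rule-of-five descriptors for one compound (hypothetical container)."""
    mw: float     # molecular weight, Da
    clogp: float  # computed log P
    hbd: int      # number of hydrogen bond donors
    hba: int      # number of hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum((d.mw > 500, d.clogp > 5, d.hbd > 5, d.hba > 10))


def percent_with_fewer_violations(compounds, limit):
    """Percentage of compounds with fewer than `limit` violations."""
    if not compounds:
        return 0.0
    hits = sum(1 for d in compounds if lipinski_violations(d) < limit)
    return 100.0 * hits / len(compounds)


# Made-up descriptor values, for illustration only (not NANPDB data).
library = [
    Descriptors(mw=300.0, clogp=2.1, hbd=2, hba=5),   # no violation
    Descriptors(mw=520.0, clogp=3.0, hbd=1, hba=4),   # MW only
    Descriptors(mw=650.0, clogp=6.2, hbd=6, hba=12),  # all four criteria
    Descriptors(mw=480.0, clogp=5.5, hbd=7, hba=9),   # log P and HBD
]
print(percent_with_fewer_violations(library, 1))  # share with no violations
print(percent_with_fewer_violations(library, 2))  # share with fewer than two
```

Applied to a full descriptor table, the two calls at the end correspond to the "no violations" and "fewer than two violations" percentages that the text reports for each data set.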

Table 3. Selected Predicted Toxicity Parameters for NANPDB (a)

Toxicity end point | Predicted yes (%) | Predicted no (%)
AMES test (b) | 26.36 | 73.64
hERG I inhibition (c) | 0 | 100
hERG II inhibition (c) | 68.91 | 31.09
hepatotoxicity (d) | 7.89 | 92.11
skin sensitization (e) | 12.10 | 87.90

(a) Data are expressed as a percentage compliance of the dataset.
(b) A compound that tests positive for AMES mutagenicity is mutagenic and therefore may act as a carcinogen.
(c) An hERG I/II inhibitor could cause the development of acquired long QT syndrome, which leads to fatal ventricular arrhythmia.
(d) A compound that tests positive could be associated with disrupted normal function of the liver.
(e) A compound that tests positive could have a high potential adverse effect for products applied to the skin, e.g., cosmetics and antifungals.

of 531 compounds observed in humans) helps to classify a compound as hepatotoxic if it is predicted to have at least one pathological or physiological event strongly associated with disruption of normal liver functions. Meanwhile, skin sensitization is the potential of a compound to cause adverse effects (e.g., it can induce allergic contact dermatitis) when applied on the skin. The human maximum tolerated dose (Table S4, Supporting Information) or the maximum recommended tolerated dose (in log mg/kg/day), which provides an estimate of the toxic dose threshold of chemicals in humans, is the maximum recommended starting dose in phase I clinical trials (based on extrapolations from animal data).

Figure 4. Identification of potential drugs in NANPDB by substructure search using quinoline as the “privileged scaffold”, leading to 32 hits. (A) Quinoline scaffold, (B) skimmianine, (C) 1,2,3,11b-tetrahydroquinolactacide, and (D) names of the first 12 hits, with compounds B and C marked in green and red, respectively.

The current data collection included in NANPDB is the most comprehensive database of NPs from the Northern Africa region. For each compound, the details of its chemical structure, biological and physicochemical properties, source species, and literature information are provided. The importance of the database is also demonstrated by the low number of compounds already annotated in the PubChem database and the relatively few references common with PubMed. The bioactive compounds can be further investigated for modes of action and alternative biological activities, while the untested molecules are a valuable resource for future drug discovery efforts. We acknowledge that the predictive models for DMPK/ADMET modeling were originally designed and tested for small-molecule benchmark data sets and may not provide reasonably reliable predictions for some of the bulky NPs present in NANPDB. The predictions are, therefore, intended to serve as a mere guide to narrow efforts to just a few compounds that could be of interest for future drug discovery efforts. We intend to expand data coverage by continuous data curation so that further releases are more comprehensive, including more computed molecular descriptors, experimental data leading to the characterization of the NPs (e.g., NMR and MS data, melting and boiling points), and possible biosynthesis pathways toward the included metabolites. Regarding sample availability for compounds from African sources, the most important continent-wide endeavor has been the p-ANAPL project.13 Although for NANPDB compounds already included in PubChem related sample vendor information could be directly retrieved from PubChem, we are currently working on including data for compound sample availability and/or vendor information for samples dispersed in academic laboratories in the region. NANPDB is planned to be upgraded annually.
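The ligand similarity search discussed above (artemisinin as the query, hits with a Tanimoto coefficient higher than 0.8) rests on a simple set formula: the number of fingerprint bits shared by two molecules divided by the number of bits set in either. The Python sketch below illustrates that ranking step only. It assumes that each fingerprint is given as a set of on-bit indices; the fingerprint type used by NANPDB is not stated here, and the function names and the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define the similarity as 0
    return len(a & b) / union


def similarity_search(query_fp, library, threshold=0.8):
    """Return (name, score) pairs scoring above `threshold`, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints, for illustration only (not real molecular fingerprints).
query = set(range(1, 11))                  # bits 1..10
library = {
    "candidate_a": set(range(1, 10)) | {11},  # shares 9 bits, union of 11
    "candidate_b": {1, 2, 3, 20, 21},         # shares 3 bits, union of 12
}
for name, score in similarity_search(query, library):
    print(name, round(score, 3))
```

In a production setting the fingerprints would come from a cheminformatics toolkit and the comparison would run inside the database; the formula and the thresholding are the same.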

EXPERIMENTAL SECTION

Data Sources. Data on source organisms, geographical collection sites, and chemical structures of derived compounds have been retrieved from literature sources from the major international journals on natural products and medicinal chemistry, alongside available Ph.D. theses, spanning the period 1962 to 2016. For journal article data sources, each journal search engine was queried by using the country names as search terms. A full list of journals consulted is available in Table S2 (Supporting Information). The resulting articles were checked to verify that the source species were harvested from the Northern African region. The retained articles were downloaded and are referred to henceforth as data sources. The data sources were arranged by taxonomic families of source organisms, in order to avoid duplicate curation. Each data source (abstract and full text) was carefully read, and the related information was tabulated on spreadsheets. The accuracy of the manually curated data was double-checked to minimize errors and redundancy. In double checking, the emphasis was laid on the accuracy of the following:
• chemical information
• biological activity data
• source species information


Figure 5. Chemical structure of the NANPDB compound austricin, which showed high similarity (Tanimoto coefficient >0.8) to the antimalarial drug artemisinin.

The collected information for source organisms includes
• their scientific and local names, where available
• their known uses (e.g., in traditional medicine)
• the part of species where the compound was identified (e.g., plant roots, leaves, whole animal organism, etc.)
• kingdom and families
• country and region of collection (including GPS coordinates, where available)
• source availability and reference (e.g., herbarium name and voucher reference number)
• date of collection

Furthermore, species information, e.g., alternative taxonomic names and the source species links to taxonomic data in related databases,23−30 was included, where additional information and pictures of the source species can be consulted. For the compounds, retrieved data include
• compound name (trivial and IUPAC name)
• compound class (e.g., terpenoid, flavonoid, steroid, alkaloid)
• compound subclass (e.g., monoterpene, flavanone glycoside, steroidal saponin, pyrrolizidine alkaloid)
• known biological activity (e.g., antioxidant activity, antiacetylcholinesterase activity, antidiabetic activity, anticancer effect)
• mode of action (when available)
• link to PubChem (PubChem IDs are included, where available)

Additional comments on the reported biological activities of the compounds include, as the case may be, the IC50, Ki, and/or EC50 values, percentage inhibitory concentrations, and brief descriptions of the biological assays. Literature source (reference) information includes
• author names
• literature reference type (e.g., journal article, conference abstract, thesis)
• reference (e.g., journal full name, year, volume, issue, and page numbers)
• reference title
• link to PubMed (PubMed IDs are included, where available)
• reference link to journal citation (doi or directly accessible Web link)

For each data table, additional comments were included. The final tables were further cross checked. A workflow for the entire process is shown in Figure 6.

Database Design. NANPDB is organized as a relational database. The main goal of the design was to readily access the information with minimum redundancy in the data. PostgreSQL 9.5.4 was applied as a database server,50 combined with the RDKit PostgreSQL cartridge for processing small molecules51,52 and Ubuntu 16.04 LTS as host operating system.

Database Scheme. Physicochemical properties of the molecules were generated using QikProp (version 2015)53 and were used, together with the reported biological activity information, as molecular properties. Compound-related information stored in a single table also includes the chemical structure of the compounds in SMILES format as well as names and synonyms extracted from the PubChem database. The source species information such as kingdom, family, habitat, and availability was stored in a second table. Two other tables include literature reference information and information about the curators. Toxicity predictions were stored in a fifth table (Figure 7).

Data Preparation and Upload. Figure S4 (Supporting Information) provides a summary of the main contents of each SQL table and how the tables are connected to each other. Compound names and synonyms were individually queried in PubChem,22 and related data were retrieved. A Python script was used to download individual 2D molecule (sdf) files from PubChem using the manually retrieved and verified PubChem ID (PCID) compound codes. Where compounds were not available in PubChem,22 ChemSpider,54 or MarvinSketch,55 the compounds were sketched (based on reported 2D structures) and compared with the structure (if available) in SciFinder.56 The 2D sketches were carried out using the ChemDraw 8.0 Ultra tool57 and saved as 2D mol files. Chemical structures were double checked for potential problems such as incorrect formulas, the presence of salts, and counterions. Unique (canonical) SMILES strings were generated using Open Babel,45 following the previously described methodology.36,58 In order to double check the SMILES strings obtained, both methods were applied for several hundred cases; that is, 2D models of sketched chemical structures (as mol files) of several compounds available in PubChem were manually generated based on the available structural data in the literature sources. The second set of

Figure 6. Simplified workflow that captures the entire process of construction of NANPDB.


Figure 7. Simplified database scheme of NANPDB.

SMILES strings was generated from downloaded sdf files directly from PubChem, without any further structural changes in the files. In both cases, the generated SMILES codes (hereafter referred to respectively as mol SMILES and pubchem SMILES) were obtained using Open Babel.45 The obtained pubchem SMILES and mol SMILES were compared using in-house Python scripts. Where the generated pubchem SMILES and mol SMILES disagreed, the molecular structures were double checked and corrected, and the SMILES were generated again from the corrected mol and sdf files. From the accepted canonical SMILES, 3D models were generated by using Open Babel.45 Lipinski physicochemical property filters such as molecular weight (MW), computed logarithms of the n-octanol/water partition coefficients (c log P), number of hydrogen bond donors (HBDs) and acceptors (HBAs), and other properties related to drug metabolism and pharmacokinetics (DMPK) were computed using QikProp (Schrodinger, LLC),53 after a preliminary ligand preparation treatment on the Maestro interface59 using LigPrep (Schrodinger, LLC).60 Data were uploaded to a PostgreSQL 9.5.4 database. All compound data were made redundancy-free using canonical SMILES, such that each unique SMILES is linked to only one molecular entry. The data for each species were also normalized, by using the species name and taxon ID, to ensure that each species link provides all data for identified compounds and literature references (Table 1).

Calculation of Toxicity Parameters. The putative toxicity of the compounds in the entire database was assessed by using the freely accessible Cambridge University pkCSM Web server.61,62 The pkCSM model is based on 10 predicted end points for toxicity: AMES test for mutagenicity assessment, maximum tolerated dose in humans (expressed in log mg/kg/day), hERG I inhibition, hERG II inhibition, oral acute toxicity (LD50) in rats (expressed in mol/kg), oral chronic toxicity (LOAEL) in rats (log mg/kg body weight/day), hepatotoxicity, skin sensitization, Tetrahymena pyriformis toxicity (expressed in log μg/L), and fathead minnow (fish) toxicity (expressed in log mM).

Web Service. The NANPDB Web site is based on the Django 1.8.2 framework.63 The Web interface (Figure S3, Supporting Information) allows for searches by molecule name, chemical substructure, chemical similarity, source species, biological activity, SMILES, reference, and molecular descriptors. Assistance to users (e.g., a basic tutorial on searching NANPDB) is included. The complete SMILES and molecule files are available for download. The chemical substructure search is implemented using the JSDraw chemical structure editor.64
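The chemical similarity search offered by the Web interface, like the comparison with artemisinin in Figure 5, depends on a similarity score between two structures. A minimal sketch of the Tanimoto coefficient follows, assuming each molecule has already been reduced to a set of "on" fingerprint bit positions. The fingerprint type used by NANPDB is not stated here, the function names and bit sets are hypothetical, and the sketch is not the NANPDB implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similar_compounds(query_bits, library, threshold=0.8):
    """Return (name, score) pairs scoring above the threshold, best first."""
    hits = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in hits if score > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical bit sets, for illustration only (not real fingerprints).
query = {1, 4, 7, 9, 12, 15}
library = {
    "compound_A": {1, 4, 7, 9, 12, 15, 20},  # shares 6 bits, union of 7
    "compound_B": {2, 4, 8, 16},             # shares 1 bit, union of 9
}
print(similar_compounds(query, library))
```

The default threshold of 0.8 mirrors the cutoff quoted in the Figure 5 caption; a production search would normally delegate both fingerprinting and scoring to a cheminformatics toolkit.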

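The consistency check between mol SMILES and pubchem SMILES, and the subsequent removal of redundant entries, can be sketched as follows. This is an illustration only: the in-house scripts are not reproduced in this article, the function names and sample records are hypothetical, and the sketch assumes that both SMILES sets were canonicalized by the same tool (e.g., Open Babel), so that plain string comparison is meaningful.

```python
def find_discrepancies(mol_smiles, pubchem_smiles):
    """Return IDs whose two canonical SMILES strings differ.

    Both arguments map a compound ID to a canonical SMILES string.
    Only IDs present in both mappings are compared.
    """
    shared = mol_smiles.keys() & pubchem_smiles.keys()
    return sorted(cid for cid in shared if mol_smiles[cid] != pubchem_smiles[cid])


def deduplicate(records):
    """Link each unique canonical SMILES to a single entry.

    `records` is an iterable of (compound_name, canonical_smiles) pairs.
    The first name seen becomes the entry name; later names for the same
    SMILES are kept as synonyms (a hypothetical design choice).
    """
    entries = {}
    for name, smiles in records:
        entry = entries.setdefault(smiles, {"name": name, "synonyms": []})
        if name != entry["name"] and name not in entry["synonyms"]:
            entry["synonyms"].append(name)
    return entries


# Hypothetical sample data, for illustration only.
mol = {"c1": "CCO", "c2": "C[C@H](N)C(=O)O", "c3": "c1ccccc1"}
pubchem = {"c1": "CCO", "c2": "C[C@@H](N)C(=O)O", "c3": "c1ccccc1"}
print(find_discrepancies(mol, pubchem))  # IDs to re-check by hand

records = [("ethanol", "CCO"), ("ethyl alcohol", "CCO"), ("benzene", "c1ccccc1")]
print(deduplicate(records))
```

In this toy data the flagged compound differs only in its stereocenter, the kind of mismatch that would send a structure back for manual checking before the SMILES is regenerated.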

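To make the Lipinski property filters concrete, the sketch below counts rule-of-five violations from the four descriptors listed earlier (MW, c log P, HBDs, HBAs). The thresholds are the commonly cited ones (MW ≤ 500, log P ≤ 5, HBD ≤ 5, HBA ≤ 10). The class name, function name, and descriptor values are hypothetical; in the NANPDB workflow the descriptors themselves come from QikProp, not from this code.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float     # molecular weight, Da
    clogp: float  # computed log P (n-octanol/water)
    hbd: int      # number of hydrogen bond donors
    hba: int      # number of hydrogen bond acceptors


def lipinski_violations(d):
    """Count how many rule-of-five thresholds the descriptors exceed."""
    checks = [d.mw > 500, d.clogp > 5, d.hbd > 5, d.hba > 10]
    return sum(checks)


# Hypothetical descriptor values, for illustration only.
small = Descriptors(mw=262.3, clogp=1.2, hbd=1, hba=5)
large = Descriptors(mw=812.9, clogp=6.4, hbd=7, hba=14)
print(lipinski_violations(small), lipinski_violations(large))
```

Returning a violation count rather than a pass/fail flag keeps the choice of cutoff (for example, tolerating one violation) with the caller.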

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jnatprod.7b00283.

Supporting Information (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: ntiekfi[email protected] (F. Ntie-Kang).
*E-mail: [email protected] (S. Günther).

ORCID

Fidele Ntie-Kang: 0000-0003-0795-394X

Author Contributions

#F. Ntie-Kang and Kiran K. Telukunta contributed equally and are joint first authors.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

F.N.K. acknowledges a Georg Forster fellowship from the Alexander von Humboldt Foundation, Germany. C.V.S. is currently a doctoral candidate financed by the German Academic Exchange Services (DAAD), Germany (Award No. 91611788). The authors also acknowledge the free academic license to use the JSDraw chemical structure editor. The authors also acknowledge the assistance of Dr. D. E. V. Pires for providing the toxicity predictions calculated with the pkCSM server.



REFERENCES

(1) Harvey, A. L.; Edrada-Ebel, R.; Quinn, R. J. Nat. Rev. Drug Discovery 2015, 14, 111−129.
(2) Rodrigues, T.; Reker, D.; Schneider, P.; Schneider, G. Nat. Chem. 2016, 8, 531−541.
(3) Ntie-Kang, F.; Yong, J. N. RSC Adv. 2014, 4, 61975−61991.
(4) Yong, J. N.; Ntie-Kang, F. RSC Adv. 2015, 5, 26580−26595.
(5) Ntie-Kang, F.; Njume, L. E.; Malange, Y. I.; Günther, S.; Sippl, W.; Yong, J. N. Nat. Prod. Bioprospect. 2016, 6, 63−96.
(6) Abdel-Azim, N. S.; Shams, K. A.; Shahat, A. A. A.; El Missiry, M. M.; Ismail, S. I.; Hammouda, F. M. Res. J. Med. Plant 2011, 5, 136−141.
(7) Shahat, A. A.; Pieters, L.; Apers, S.; Nazeif, N. M.; Abdel-Azim, N. S.; Berghe, D. V.; Vlietinck, A. J. Phytother. Res. 2001, 15, 593−597.
(8) Risinger, A. L.; Mooberry, S. L. Cancer Lett. 2010, 291, 14−19.
(9) United Nations Organization: Composition of Macro Geographical (Continental) Regions, Geographical Sub-regions, and Selected Economic and Other Groupings. http://millenniumindicators.un.org/unsd/methods/m49/m49regin.htm. Accessed August 28, 2016.
(10) Ntie-Kang, F.; Mbah, J. A.; Mbaze, L. M.; Lifongo, L. L.; Scharfe, M.; Ngo Hanna, J.; Cho-Ngwa, F.; Onguéné, P. A.; Owono, L. C. O.; Megnassan, E.; Sippl, W.; Efange, S. M. N. BMC Complementary Altern. Med. 2013, 13, 88.
(11) Ntie-Kang, F.; Onguéné, P. A.; Scharfe, M.; Owono, L. C. O.; Megnassan, E.; Mbaze, L. M.; Sippl, W.; Efange, S. M. N. RSC Adv. 2014, 4, 409−419.
(12) Ntie-Kang, F.; Zofou, D.; Babiaka, S. B.; Meudom, R.; Scharfe, M.; Lifongo, L. L.; Mbah, J. A.; Mbaze, L. M.; Sippl, W.; Efange, S. M. N. PLoS One 2013, 8, e78085.
(13) Ntie-Kang, F.; Onguéné, P. A.; Fotso, G. W.; Andrae-Marobela, K.; Bezabih, M.; Ndom, J. C.; Ngadjui, B. T.; Ogundaini, A. O.; Abegaz, B. M.; Meva’a, L. M. PLoS One 2014, 9, e90655.
(14) Ntie-Kang, F.; Nwodo, J. N.; Ibezim, A.; Simoben, C. V.; Karaman, B.; Ngwa, V. F.; Sippl, W.; Adikwu, M. U.; Mbaze, L. M. J. Chem. Inf. Model. 2014, 54, 2433−2450.
(15) Ntie-Kang, F.; Simoben, C. V.; Karaman, B.; Ngwa, V. F.; Judson, P. N.; Sippl, W.; Mbaze, L. M. Drug Des., Dev. Ther. 2016, 10, 2137−2154.
(16) Onguéné, P. A.; Ntie-Kang, F.; Mbah, J. A.; Lifongo, L. L.; Ndom, J. C.; Sippl, W.; Mbaze, L. M. Org. Med. Chem. Lett. 2014, 4, 6.
(17) Ibezim, A.; Debnath, B.; Ntie-Kang, F.; Mbah, C. J.; Nwodo, N. J. Med. Chem. Res. 2017, 26, 562−579.
(18) Hatherley, R.; Brown, D. K.; Musyoka, T. M.; Penkler, D. L.; Faya, N.; Lobb, K. A.; Bishop, Ö. T. J. Cheminf. 2015, 7, 29.
(19) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. J. Chem. Inf. Model. 2012, 52, 1757−1768.
(20) Sterling, T.; Irwin, J. J. J. Chem. Inf. Model. 2015, 55, 2324−2337.
(21) ZINC12 subsets: AfroDb Natural Products (http://zinc.docking.org/catalogs/afronp). Published by permission from the authors of ref 12. Accessed June 27, 2016.
(22) Kim, S.; Thiessen, P. A.; Bolton, E. E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B. A.; Wang, J.; Yu, B.; Zhang, J.; Bryant, S. H. Nucleic Acids Res. 2016, 44, D1202−D1213, DOI: 10.1093/nar/gkv951 (https://pubchem.ncbi.nlm.nih.gov/). Accessed May 17, 2016.
(23) Prota Africa: Plant resources of Tropical Africa (http://www.prota4u.org/). Accessed May 20, 2016.
(24) Tropicos.org. Missouri Botanical Garden (http://www.tropicos.org). Accessed May 21, 2017.
(25) World Register of Marine Species (http://www.marinespecies.org). Accessed May 11, 2016.
(26) Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. Nucleic Acids Res. 2010, 38, D46−D51, DOI: 10.1093/nar/gkp1024 (http://www.ncbi.nlm.nih.gov). Accessed July 1, 2016.
(27) Robert, V.; Vu, D.; Amor, A. B. H.; Van de Wiele, N.; Brouwer, C.; Jabas, B.; Szoke, S.; Dridi, A.; Triki, M.; Ben Daoud, S.; Chouchen, O.; Vaas, L.; De Cock, A.; Stalpers, J. A.; Stalpers, D.; Verkley, G. J.; Groenewald, M.; Dos Santos, F. B.; Stegehuis, G.; Li, W.; Wu, L.; Zhang, R.; Ma, J.; Zhou, M.; Gorjón, S. P.; Eurwilaichitr, L.; Ingsriswang, S.; Hansen, K.; Schoch, C.; Robbertse, B.; Irinyi, L.; Meyer, W.; Cardinali, G.; Hawksworth, D. L.; Taylor, J. W.; Crous, P. W. IMA Fungus 2013, 4, 371−379, DOI: 10.5598/imafungus.2013.04.02.16 (http://www.mycobank.org). Accessed June 7, 2016.
(28) Crous, P. W.; Gams, W.; Stalpers, J. A.; Robert, V.; Stegehuis, G. Stud. Mycol. 2004, 50, 19−22.
(29) Robert, V.; Stegehuis, G.; Stalpers, J. The MycoBank Engine and Related Databases. Centraalbureau voor Schimmelcultures (CBS), 2005 (http://www.mycobank.org). Accessed May 15, 2016.
(30) Sayers, E. W.; Barrett, T.; Benson, D. A.; Bryant, S. H.; Canese, K.; Chetvernin, V.; Church, D. M.; DiCuccio, M.; Edgar, R.; Federhen, S.; Feolo, M.; Geer, L. Y.; Helmberg, W.; Kapustin, Y.; Landsman, D.; Lipman, D. J.; Madden, T. L.; Maglott, D. R.; Miller, V.; Mizrachi, I.; Ostell, J.; Pruitt, K. D.; Schuler, G. D.; Sequeira, E.; Sherry, S. T.; Shumway, M.; Sirotkin, K.; Souvorov, A.; Starchenko, G.; Tatusova, T. A.; Wagner, L.; Yaschenko, E.; Ye, J. Nucleic Acids Res. 2009, 37, D5−D15, DOI: 10.1093/nar/gkn741 (http://www.ncbi.nlm.nih.gov/taxonomy). Accessed July 29, 2016.
(31) PubMed Database (http://www.ncbi.nlm.nih.gov/pubmed). Accessed July 29, 2016.
(32) Legehar, A.; Xhaard, H.; Ghemtio, L. J. Cheminf. 2016, 8, 33, DOI: 10.1186/s13321-016-0141-7 (http://idaapm.helsinki.fi/). Accessed September 19, 2016.
(33) Wishart, D. S.; Knox, C.; Guo, A. C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. Nucleic Acids Res. 2006, 34, D668−D672.
(34) DrugBank version 5.0 (http://www.drugbank.ca/). Accessed September 19, 2016.
(35) Valli, M.; Dos Santos, R. N.; Figueira, L. D.; Nakajima, C. H.; Castro-Gamboa, I.; Andricopulo, A. D.; Bolzani, V. S. J. Nat. Prod. 2013, 76, 439−444, DOI: 10.1021/np3006875 (http://nubbe.iq.unesp.br/nubbeDB.html/). Accessed September 19, 2016.
(36) Klementz, D.; Döring, K.; Lucas, X.; Telukunta, K. K.; Erxleben, A.; Deubel, D.; Erber, A.; Santillana, I.; Thomas, O. S.; Bechthold, A.; Günther, S. Nucleic Acids Res. 2016, 44, D509−D514, DOI: 10.1093/nar/gkv1319 (http://www.pharmaceutical-bioinformatics.org/streptomedb/). Database accessed September 19, 2016.
(37) Sharma, A.; Dutta, P.; Sharma, M.; Rajput, N. K.; Dodiya, B.; Georrge, J. J.; Kholia, T.; Bhardwaj, A. OSDD Consortium. J. Cheminf. 2014, 6, 46, DOI: 10.1186/s13321-014-0046-2 (http://ab-openlab.csir.res.in/biophytmol/). Database accessed September 19, 2016.
(38) Mangal, M.; Sagar, P.; Singh, H.; Raghava, G. P. S.; Agarwal, S. M. Nucleic Acids Res. 2013, 41, D1124−D1129, DOI: 10.1093/nar/gks1047 (http://crdd.osdd.net/raghava/npact/). Database accessed September 19, 2016.
(39) Mishra, B. B.; Tiwari, V. K. Eur. J. Med. Chem. 2011, 46, 4769−4807.
(40) Ratheesh, M.; Sindhu, G.; Helen, A. Inflammation Res. 2013, 62, 367−376.
(41) El-Neketi, M.; Ebrahim, W.; Lin, W.; Gedara, S.; Badria, F.; Saad, H. E.; Lai, D.; Proksch, P. J. Nat. Prod. 2013, 76, 1099−1104.
(42) WHO. Updates on Artemisinin Resistance (http://www.who.int/malaria/areas/drug_resistance/updates/en/). Accessed Jan 6, 2017.
(43) Geerts, S.; van Blerk, K.; Triest, L. J. Ethnopharmacol. 1994, 42, 7−11.
(44) Cozza, G.; Bonvini, P.; Zorzi, E.; Poletto, G.; Pagano, M. A.; Sarno, S.; Donella-Deana, A.; Zagotto, G.; Rosolen, A.; Pinna, L. A.; Meggio, F.; Moro, S. J. Med. Chem. 2006, 49, 2363−2366.
(45) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. J. Cheminf. 2011, 3, 33.
(46) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug Delivery Rev. 1997, 23, 3−25.
(47) Chiesa, N.; Rosati, B.; Arcangeli, A.; Olivotto, M.; Wanke, E. J. Physiol. 1997, 501, 313−318.
(48) De Ponti, F.; Poluzzi, E.; Montanaro, N. Eur. J. Clin. Pharmacol. 2001, 57, 185−209.
(49) Vandenberg, J. I.; Walker, B. D.; Campbell, T. J. Trends Pharmacol. Sci. 2001, 22, 240−246.
(50) PostgreSQL 9.5.4, Documentation. Development Versions: devel/9.6; The PostgreSQL Global Development Group, 1996−2016 (https://www.postgresql.org/docs/9.5/static/).
(51) The RDKit Database Cartridge: Open-source Cheminformatics and Machine Learning (http://www.rdkit.org/docs/Cartridge.html). Accessed August 29, 2016.
(52) Landrum, G.; Palmer, A. The RDKit and PostgreSQL: an Open-source Database System for Chemistry. 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, Basel, Switzerland, August 2011.
(53) QikProp Version 2015, Rapid ADME Predictions of Drug Candidates (https://www.schrodinger.com/QikProp/); Schrodinger: New York, 2015.
(54) Pence, H. E.; Williams, A. J. Chem. Educ. 2010, 87, 1123−1124, DOI: 10.1021/ed100697w (http://www.chemspider.com/). Database accessed August 13, 2016.
(55) MarvinSketch 15.4.6.0; ChemAxon: Cambridge, MA, 2015.
(56) Scifinder, 2015; Chemical Abstracts Service: Columbus, OH, 2015; RN 58-08-2. Accessed August 13, 2016.
(57) Mendelsohn, L. D. J. Chem. Inf. Comput. Sci. 2004, 44, 2225−2226.
(58) Lucas, X.; Senger, C.; Erxleben, A.; Grüning, B. A.; Döring, K.; Mosch, J.; Flemming, S.; Günther, S. Nucleic Acids Res. 2013, 41, D1130−D1136.
(59) Maestro, Version 9.2; Schrodinger, LLC: New York, 2011.
(60) LigPrep Software, Version 2.5; Schrodinger, LLC: New York, 2011.
(61) University of Cambridge. pkCSM-pharmacokinetics Prediction Web Server (http://bleoberis.bioc.cam.ac.uk/pkcsm/). Server accessed September 19, 2016.
(62) Pires, D. E. V.; Blundell, T. L.; Ascher, D. B. J. Med. Chem. 2015, 58, 4066−4072.
(63) Django 1.8.2: a High-level Python Web Framework that Encourages Rapid Development and Clean, Pragmatic Design (https://pypi.python.org/pypi/Django/1.8.2). Accessed September 2, 2016.
(64) JSDraw2 - A Javascript Chemical Structure Editor/Viewer, Scilligence, 2015 (http://www.scilligence.com/).
