Article pubs.acs.org/jcim

First Multitarget Chemo-Bioinformatic Model To Enable the Discovery of Antibacterial Peptides against Multiple Gram-Positive Pathogens

Alejandro Speck-Planche,*,†,‡ Valeria V. Kleandrova,§ Juan M. Ruso,† and M. N. D. S. Cordeiro‡

†Department of Applied Physics, University of Santiago de Compostela (USC), 15782 Santiago de Compostela, Spain
‡REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
§Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, 125080 Moscow, Russia

Supporting Information

ABSTRACT: Antimicrobial peptides (AMPs) have emerged as promising therapeutic alternatives to fight against the diverse infections caused by different pathogenic microorganisms. In this context, theoretical approaches in bioinformatics have paved the way toward the creation of several in silico models capable of predicting antimicrobial activities of peptides. All current models have several significant handicaps, which prevent the efficient search for highly active AMPs. Here, we introduce the first multitarget (mt) chemo-bioinformatic model devoted to performing alignment-free prediction of antibacterial activity of peptides against multiple Gram-positive bacterial strains. The model was constructed from a data set containing 2488 cases of AMP sequences assayed against at least 1 out of 50 Gram-positive bacterial strains. This mt-chemo-bioinformatic model displayed percentages of correct classification higher than 90.00% in both training and prediction (test) sets. For the first time, two computational approaches derived from basic concepts in genetics and molecular biology were applied, allowing the calculation of the relative contribution of any amino acid (in a defined position) to the antibacterial activity of an AMP, depending on the bacterial strain used in the biological assay. The present mt-chemo-bioinformatic model constitutes a powerful tool to enable the discovery of potent and versatile AMPs.

Received: October 16, 2015
Published: March 9, 2016
© 2016 American Chemical Society
DOI: 10.1021/acs.jcim.5b00630 J. Chem. Inf. Model. 2016, 56, 588−598

1. INTRODUCTION

Antimicrobial peptides (AMPs) have become promising therapeutic alternatives in the fight against the diverse community-acquired diseases and nosocomial infections that afflict mankind.1,2 The most updated knowledge regarding the broad-spectrum activity of AMPs suggests that they can be used to inhibit and/or kill many pathogenic microorganisms (bacteria, fungi, parasites, etc.), including those that exhibit high degrees of resistance against multiple antimicrobial drugs.3 Indeed, several AMPs are currently undergoing preclinical and/or clinical trials at different stages of development.4 Nowadays, AMPs with antibacterial activity are the most studied, and most of them are cationic in nature, with hydrophilic and hydrophobic domains that allow them to target bacterial cell membranes and disintegrate the lipid bilayer structure.5−7 AMPs can also kill bacteria by inhibiting important intracellular pathways such as DNA replication and protein synthesis.8 In this context, net charge, helicity, and amphipathicity have been considered some of the most important physicochemical/structural properties of highly active AMPs.9 The fast growth in the potential applications of AMPs as emerging pharmacological agents has led to the creation of many databases, where important aspects such as the structures, functions, and activity profiles of the peptides have been compiled.10 At the same time, recent advances in the use of theoretical approaches in disciplines such as bioinformatics have paved the way to the extraction of data from those public sources, with the subsequent development of computational models focused on searching for or predicting AMPs.

In this context, and with the aim of obtaining as much information as possible, current models have combined different approaches and physicochemical and structural descriptors with diverse methods for data analysis, such as principal component analysis (PCA) and partial least-squares (PLS),11 artificial neural networks (ANN),12−15 support vector machines (SVM),16−18 decision trees (DT),19 k-means clustering (kMC),20 fuzzy k-nearest neighbor (fKNN),21 and many others.22−29 Undoubtedly, the computational tools created to date represent significant progress toward a better understanding of the biological behavior of AMPs. These models have helped to reduce the high cost and time associated with the synthesis and evaluation of large quantities of peptides. Decision making constitutes one of the prime objectives of any computational model built for prospecting purposes, guiding experimental scientists to optimize, in a cost-effective manner, the discovery of new molecular entities with desirable biological activities.30 Despite the efforts of the scientific community, several aspects of the search for AMPs remain unresolved, and they constitute key handicaps of the computational models developed to date. Specifically, the models mentioned above suffer from at least one of the following four major drawbacks. First, the antimicrobial activity of the peptides is predicted unspecifically, which means that the models can only discriminate between AMPs and peptides with other biological activities. In these cases, there is no way of knowing the microbial strain against which a peptide can be active. Consequently, from a phenomenological and experimental point of view, if a model identifies potential AMPs, many assays will have to be conducted to determine the pathogen(s) inhibited by the peptides. Second, the antimicrobial activity of the peptides is sometimes predicted against only one microorganism, which limits the immediate applications of the peptides, given that the wide-spectrum activity of AMPs is now well established.3,4,10 Third, the usual absence of a cutoff value of antimicrobial activity limits the information that can be gathered from a model, because highly active AMPs cannot be discriminated from those with moderate or low activity. This prevents a deeper understanding of the physicochemical/structural elements responsible for the enhancement of antimicrobial potency. Fourth, the descriptors used for generating the computational models have a global nature; descriptors without local features prevent the analysis and evaluation of the influence of any amino acid or fragment (dipeptide, tripeptide, etc.) on the antimicrobial activity of a defined peptide.

Thus, if the purpose is to design new peptides from different amino acids, these previous models cannot provide information regarding the positions and distances at which the diverse amino acids should be placed in the sequence in order to achieve a high antimicrobial activity. All the aforementioned disadvantages can be overcome with the use of multitarget (mt) approaches. In this sense, mt-computational models have been successfully used in many areas, including the discovery and simultaneous virtual screening of molecules with desired pharmacological activities against diverse biological entities (proteins, microorganisms, cell lines, etc.)31−37 and the concurrent prediction of several functions of many biomacromolecules.38−41 The mt-computational models combine some of the data analysis methods cited above with descriptors derived from graph invariants (topological indices). These descriptors encode both global and local information, and they have relatively simple interpretations, allowing the assessment of potential relationships between certain fragments and the biological activities under study.42−44 To the best of our knowledge, there is no model capable of predicting AMPs by considering several pathogenic microorganisms at the same time. Bearing in mind all the aspects explained so far, and in order to apply one of these mt-approaches to antibacterial research, we introduce here the first mt-chemo-bioinformatic model focused on the simultaneous prediction of the antibacterial activity of peptides against multiple Gram-positive bacterial strains.

2. MATERIALS AND METHODS

2.1. Data Set and Calculation of the Descriptors. In principle, any public source can be used for the construction of the data set.10 We are aware that in many cases the databases of AMPs contain information related to the different classes of pathogenic microorganisms, but they do not specify the diverse bacterial strains that a pathogen may have. This aspect is of crucial importance, because different strains belonging to the same microorganism can exhibit different degrees of resistance to AMPs. Therefore, based on our review of the scientific literature, we selected the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) because of its high degree of curation of the experimental information.45 In this database, it is very easy to compile the data associated with the AMPs, and, most importantly, the notation of the bacterial strains is more accurate than in any other public source. Nevertheless, we performed an additional curation by employing the StrainInfo Web site, which contains a rigorous annotation of many microbial strains.46 In some cases, two different strain notations in DBAASP were found to correspond to the same bacterial strain. We finally extracted 1581 different peptide sequences, ranging from 1 to 119 amino acids. Only AMPs containing natural amino acids were considered for this study. Therefore, all the AMPs with unusual amino acids were excluded, and the only chemical modifications (when present) in the sequences of the AMPs were acetylation of the N-terminus and amidation of the C-terminus. These AMPs were experimentally tested against at least 1 out of 50 different Gram-positive bacterial strains. Because several AMPs were assayed against more than one strain, our data set contained 2488 cases (AMP sequence-bacterial strain combinations). The antibacterial activities of all the AMPs were reported in μg/mL; for this reason, they were converted to micromolar (μM) according to the following steps.

First, the *.txt file containing all the peptide sequences was transformed to *.fasta using the online tool Format Converter,47 which belongs to the HIV sequence database.48−50 Next, the *.fasta file was transformed to *.sdf using Standardizer v6.1.4 (ChemAxon).51 Finally, the approximate molecular weight of each AMP was calculated from the *.sdf file with the software PaDEL-Descriptor v2.21.52,53 The transformation of the antibacterial activity values from μg/mL to μM is very important, because it allows a direct and accurate comparison of the inhibitory activities among AMPs of different molecular weights. Taking into account the different values of antibacterial activity in our data set, each case was assigned to 1 of 2 possible classes, called active [ABi(bs) = 1] and inactive [ABi(bs) = −1]. Here, ABi(bs) is a categorical variable describing the antibacterial activity of an AMPi against a defined Gram-positive bacterial strain (bs). An AMP was considered active when MIC ≤ 11.965 μM, where MIC is the minimum inhibitory concentration, i.e., the lowest concentration that prevented the visible growth of the isolates of a specific strain. Otherwise, the AMP was annotated as inactive. Despite the arbitrary nature of this cutoff, it is very close to the cutoff values used in several works for the prediction of antimicrobial agents based on small molecules.54−56 We also chose this cutoff to avoid a significant imbalance between the number of active peptides and the number of those annotated as inactive, which could markedly affect the future utility and performance of the model. For the characterization of the structural information present in the AMPs, we calculated connectivity-like indices derived from the Kier-Hall (KH) formalism.57 It should be emphasized that the KH connectivity indices constitute one of the most recognized families of descriptors, with wide applications in many fields of research,58 and they can be employed to characterize molecular structure at different levels of complexity. From a physicochemical/structural point of view, KH connectivity indices are relatively simple to interpret, and the computational cost of calculating them is very low. The program PROTDCAL created by Marrero-Ponce et al. was used to calculate the KH-like indices [KHm(PP)].59 This program is able to generate diverse families of descriptors for peptides and/or proteins.60−63 To calculate the KHm(PP), we used the option named "Indices", selecting the suboption "Chemical-physical and structural composition indices". Next, in the option "Groups", we chose the abbreviation PRT, which means that the calculation was performed for the whole sequence. Then, we selected the aggregation operator (Manhattan distance), and, finally, we chose the weighting procedure (in the menu called "Options") according to the KH formalism, with cutoff (order) m ranging from 1 to 5. The following equation summarizes the KHm(PP) for an entire AMP sequence:

$$\mathrm{KH}_m(PP) = \sum_{\alpha=1}^{A} \prod_{j=1}^{n_\alpha} PP_{j\alpha} \tag{1}$$

In eq 1, A represents the number of segments with a maximum length of m amino acids, nα is the number of amino acids in segment α, and PPjα is the tabulated physicochemical/structural property of the jth amino acid in segment α. In our study, 15 PPs were used, associated with hydrophobicity, steric and electronic features, and the probabilities of the amino acids being present in certain conformations. The KHm(PP) indices were then modified according to the following formalism:

$$\mathrm{MKH}_m(PP) = \frac{\mathrm{KH}_m(PP) \cdot \theta^{nt} \cdot K^{ct}}{L} \tag{2}$$

In eq 2, MKHm(PP) is the modified KH-like index, while nt and ct are categorical variables that characterize the presence/absence of chemical modifications in the N-terminus (acetylation) and the C-terminus (amidation), respectively. The symbol L refers to the length (number of amino acids) of the peptide, and dividing by it normalizes the KHm(PP) indices. Additionally, θ is Mills' constant, and K is Viswanath's constant. θ and K are arbitrary mathematical constants whose purpose is to increase/decrease the values of the MKHm(PP) indices so that the chemical modifications in the N-terminus and C-terminus are taken into account. Notice that θ > K, reflecting the fact that the functional group modifying the N-terminus (acetyl) is larger than the functional group modifying the C-terminus (amino). These two constants were selected because their introduction causes no drastic changes in the MKHm(PP) values; in principle, however, many other constants could be used.

Figure 1. Pictorial description of the different steps devoted to the creation of the mt-chemo-bioinformatic model.

2.2. Generating Multitarget Descriptors through the Moving Average Approach. The descriptors of the form MKHm(PP) depend only on the peptide sequence. For this reason, they cannot discriminate the antibacterial activity of an AMP tested against different bacterial strains. The solution to this problem is the central foundation of many of the mt-computational models published in the literature.38−44 All these models are based on the Box-Jenkins moving average approach, which was initially proposed for time series analysis.64 The mathematical formalism adapted to this work is as follows:

$$\mathrm{avgMKH}_m(PP) = \frac{1}{n(bs)} \sum_{i=1}^{n(bs)} \mathrm{MKH}_{m,i}(PP) \tag{3}$$

In eq 3, avgMKHm(PP) is the mean of the descriptors MKHmi(PP) over all the AMPs i assayed against the same Gram-positive bacterial strain, and n(bs) is the number of AMPs tested against that strain. The deviation terms can then be calculated:

$$\mathrm{DMKH}_{m,i}(PP) = \mathrm{MKH}_{m,i}(PP) - \mathrm{avgMKH}_m(PP) \tag{4}$$
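The data-preparation and descriptor steps of Section 2 can be condensed into a few small functions. This is a hypothetical sketch, not the authors' workflow (which relied on PaDEL-Descriptor, PROTDCAL, and STATISTICA): it assumes molecular weights are already known, treats the segments of eq 1 as an input list of tabulated PP values, and writes Mills' and Viswanath's constants as decimal literals. All function names are illustrative.

```python
"""Hypothetical sketch of the Section 2 descriptor pipeline (not the authors' code)."""
from collections import defaultdict
from math import prod
from statistics import mean

MILLS = 1.3063778838630806   # Mills' constant (theta), decimal approximation
VISWANATH = 1.1319882487943  # Viswanath's constant (K), decimal approximation
MIC_CUTOFF_UM = 11.965       # activity cutoff from Section 2.1, in uM


def mic_ugml_to_um(mic_ugml, mw_gmol):
    """ug/mL equals mg/L; dividing by g/mol gives mmol/L, times 1000 gives umol/L."""
    return mic_ugml * 1000.0 / mw_gmol


def activity_label(mic_um):
    """AB_i(bs): +1 (active) if MIC <= 11.965 uM, otherwise -1 (inactive)."""
    return 1 if mic_um <= MIC_CUTOFF_UM else -1


def kh_like_index(segments):
    """Eq 1: sum over segments of the product of the PP values of their residues.

    How the segments of length <= m are enumerated is left to the caller;
    in the paper this is handled internally by PROTDCAL.
    """
    return sum(prod(seg) for seg in segments)


def modified_kh(kh, length, acetylated_n, amidated_c):
    """Eq 2: MKH = KH * theta**nt * K**ct / L, with nt and ct equal to 0 or 1."""
    nt, ct = int(acetylated_n), int(amidated_c)
    return kh * MILLS**nt * VISWANATH**ct / length


def moving_average_deviations(cases):
    """Eqs 3 and 4: subtract the mean MKH of all cases sharing the same strain.

    `cases` is a list of (strain, mkh) pairs, one per peptide-strain combination.
    """
    by_strain = defaultdict(list)
    for strain, mkh in cases:
        by_strain[strain].append(mkh)
    strain_mean = {s: mean(vals) for s, vals in by_strain.items()}
    return [mkh - strain_mean[strain] for strain, mkh in cases]


if __name__ == "__main__":
    # Toy values only; they are not taken from the data set.
    print(mic_ugml_to_um(10.0, 2500.0), activity_label(4.0))
    toy = [("S. aureus", 2.0), ("S. aureus", 4.0), ("E. faecalis", 5.0)]
    print(moving_average_deviations(toy))
```

In this sketch the per-strain means of eq 3 are taken over whatever cases are passed in; the text does not say whether the means were computed over the whole data set or over the training set only, so that choice is left to the caller.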

Notice that in eq 4, the deviation terms DMKHmi(PP) depend on both the sequence of an AMP and the Gram-positive bacterial strain against which the AMP was tested. For this reason, the deviation terms DMKHmi(PP) [Box-Jenkins moving averages] can be viewed as multitarget descriptors. In total, 75 of these descriptors were used during the generation of the mt-chemo-bioinformatic model (Figure 1). To create the model, we used the validation method known as independent test set, in which the whole data set was partitioned into two series: training and prediction (test) sets.65 In this work, the partition was performed randomly. Thus, our data set containing 2488 cases of AMPs was split into training (1991 cases) and prediction (497 cases) sets. The training set was used to search for the best mt-chemo-bioinformatic model, while the prediction (test) set was used to validate the model and demonstrate its predictive capability. Cases in the prediction set were never used to construct the model. We employed linear discriminant analysis (LDA) as the statistical method for generating the model, using a forward stepwise procedure as the variable (descriptor) selection strategy. The program STATISTICA v6.0 was used to perform this task:66

$$\mathrm{AB}_i(bs) = a_0 + \sum b \cdot \mathrm{DMKH}_{m,i}(PP) \tag{5}$$

In eq 5, a0 is the constant term, and b refers to the coefficients of the descriptors. During the creation of the model, STATISTICA v6.0 takes the initial categorical values of ABi(bs) and transforms them into scores (continuous values) of antibacterial activity. Then, after the calculation of statistics such as the Mahalanobis distance,64 the predicted categorical values of ABi(bs) for all AMPs are generated, together with their probabilities of belonging to each class. The quality of the model was determined through the analysis of several statistical indices, namely Wilks' lambda (λ), chi-square (χ2), and p-value, which were obtained for the training set. Here, λ is a measure of the total variance not explained by group differences, using the means of the independent variables/descriptors as criteria. Thus, λ ranges from 0 (perfect discrimination between classes) to 1 (no discrimination). On the other hand, χ2 is used to assess the independence between two criteria of classification of qualitative data (in our case, the classes named active and inactive). The p-value, which is closely tied to λ and χ2, gives the statistical significance of the discrimination, i.e., the probability of obtaining such a separation by chance. In addition, we calculated other statistical indices, namely sensitivity, specificity, accuracy, the Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) curves,36,67 for both training and prediction sets. These additional indices served to demonstrate the quality and predictive power of the mt-chemo-bioinformatic model.

3. RESULTS AND DISCUSSION

3.1. Mt-Chemo-Bioinformatic Model. In addition to the strategy used for choosing the most adequate descriptors, the principle of parsimony was applied: the best model was selected by considering the highest statistical quality and the lowest number of descriptors. Only 6 descriptors entered our mt-chemo-bioinformatic model:

$$\begin{aligned} \mathrm{AB}_i(bs) ={}& -27.065\,\mathrm{DMKH}_1(\mathrm{Pb}) - 23.647\,\mathrm{DMKH}_1(\mathrm{Pt}) + 0.737\,\mathrm{DMKH}_3(\mathrm{Z1}) \\ &+ 1.432\,\mathrm{DMKH}_3(\mathrm{Z2}) + 0.141\,\mathrm{DMKH}_4(\mathrm{ISA}) + 0.144\,\mathrm{DMKH}_5(\mathrm{Mw}) - 0.317 \end{aligned} \tag{6}$$

$$N = 1991 \qquad \lambda = 0.457 \qquad p < 10^{-16} \qquad \chi^2 = 1555.30$$

Table 1. List of Descriptors Present in the Final mt-Computational Model

symbol        definition
DMKH1(Pb)     Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by one amino acid, weighted by the probability of an amino acid to be in a β-sheet conformation.
DMKH1(Pt)     Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by one amino acid, weighted by the probability of an amino acid to be in a β-turn conformation.
DMKH3(Z1)     Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by a maximum of three amino acids, weighted by the combined measure of hydrophobicity-related properties.
DMKH3(Z2)     Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by a maximum of three amino acids, weighted by the combined measure of bulkiness-related properties.
DMKH4(ISA)    Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by a maximum of four amino acids, weighted by the isotropic surface area.
DMKH5(Mw)     Deviation of the connectivity index derived from the Kier-Hall formalism, considering subgraphs formed by a maximum of five amino acids, weighted by the molecular weight.

The symbols, as well as the corresponding definitions of the different descriptors, appear in Table 1. Relating the relatively low λ to the small p-value in eq 6, we can conclude, on the one hand, that the means of the independent variables/descriptors differ across the groups of the categorical variable ABi(bs), which allows active peptides to be discriminated from those assigned as inactive. On the other hand, the large χ2 (also associated with the p-value) indicates a clear separation between the classes of active and inactive peptides. Therefore, the present mt-chemo-bioinformatic model has very good statistical quality. In the training set, the model exhibited a sensitivity of 94.36%, correctly classifying 870 out of 922 active cases, while 1009 out of 1069 inactive cases were properly classified (specificity = 94.39%). The accuracy was 94.37%. In the prediction set, 222 out of 239 active cases were correctly classified (sensitivity = 92.89%), as were 249 out of 258 inactive cases (specificity = 96.51%). The percentage of correct classification (accuracy) in the prediction set was 94.77%. Specific details regarding the classification of each AMP, together with all the relevant chemical and biological data, can be found in Tables S1−S5 of the Supporting Information (SI). In addition, the percentages of correctly classified peptides (%CCP) for each Gram-positive bacterial strain were in the range 71.43−100% for active AMPs and 75−100% for inactive AMPs. The individual %CCP values for each strain appear in the SI (Table S6). Another statistical index of significant importance is MCC, which measures the strength of the correlation between the observed and predicted values of a categorical variable in a classification model.68 The mt-chemo-bioinformatic model exhibited MCC values of 0.887 and 0.896 for training and prediction sets, respectively. These values indicate a very high correlation between the observed and predicted antibacterial activities [ABi(bs)] of the AMPs. The areas under the ROC curves were used as final confirmations of the quality and predictive power of our model (Figure 2). In this sense, the area under the ROC curve had a value of 0.990 for both training and prediction sets. Such a value close to one indicates a very good performance and demonstrates that the present model does not behave as a random classifier, for which the area is equal to 0.5. Taken together, the statistical indices indicate the high efficiency of the mt-chemo-bioinformatic model for classifying/predicting AMPs. These results are comparable with many of the models reported in the scientific literature that focus on modeling the activities of AMPs.11−17,19,20,22−27

Figure 2. Illustrative representation of the ROC curves.

A final detail is that the model is based on descriptors calculated from the sequences of the peptides, which means that alignment rules are not needed. Consequently, the mt-chemo-bioinformatic model performs alignment-free predictions of antibacterial activities of peptides, avoiding the high computational cost associated with methodologies focused on the optimization of the 3D structures of the peptides. One of the advantages of the present model is that the predictions depend on the bacterial strain(s) against which the peptides may or may not be active. With the Box-Jenkins moving average approach, the same peptide sequence can be represented as many times as the number of bacterial strains against which it was tested, because the value of each descriptor (which depends on both the sequence and the bacterial strain) differs from strain to strain. For instance, the sequence AAKHAAHRA appears 6 times in our data set (Table S1), and the descriptor DMKH3(Z1) (which entered the final model) takes the following values depending on the bacterial strain: −3.762 [Enterococcus faecalis], −6.603 [Streptococcus gordonii (ATCC 35105/Challis)], −6.432 [Streptococcus parasanguinis (ATCC 903)], −6.109 [Streptococcus mutans (ATCC 700610/UA159)], −6.430 [Streptococcus sanguinis (NY101)], and −4.572 [Staphylococcus aureus]. Differences can also be found in the corresponding descriptors of any other AMP sequence when the bacterial strains are changed. Therefore, it is important to emphasize that our mt-descriptors (Box-Jenkins moving averages) have a deeper phenomenological meaning, because they describe the fact that when a peptide is assayed against multiple bacterial strains, each peptide-bacterial strain combination is unique from an experimental point of view.

3.2. Physicochemical/Structural Interpretations of the Descriptors. The results reported in the previous section suggest that the mt-chemo-bioinformatic model developed in this work can be applied to the search for new and potent AMPs against many Gram-positive bacterial strains. A very important aspect is that the descriptors of the present model have simple physicochemical/structural interpretations. For instance, all the descriptors are based on the KH formalism, which means that they are strongly associated with molecular accessibility,69 i.e., the presence of regions that can directly participate in effective interactions, leading to changes in the biological targets. In what follows, we interpret the descriptors, explaining the factors responsible for the appearance and/or enhancement of the antibacterial activity of any peptide and how the descriptors should vary in order to increase this biological effect. As commented above, all the descriptors depend on both the peptide sequence and the bacterial strain against which each peptide was assayed. To provide a more accurate explanation, we refer to the relative importance of the different descriptors, which has been assessed through the absolute values of the standardized coefficients (Figure 3).

Figure 3. Standardized coefficients as measures of the relative importance of the descriptors.
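Because eq 6 is linear in the six descriptors of Table 1, it can be evaluated directly, and one common way to obtain standardized coefficients is to multiply each raw coefficient by the pooled within-class standard deviation of its descriptor. The sketch below assumes that convention (the text does not state which one STATISTICA used) and assumes a positive score is read as active. It also includes a helper that recomputes the Section 3.1 statistics from confusion-matrix counts. Function names and any descriptor values passed in are hypothetical.

```python
"""Hypothetical helpers around eq 6 (not the authors' code)."""
from math import sqrt
from statistics import mean

# Coefficients and intercept of eq 6, keyed by the Table 1 symbols.
COEFFS = {
    "DMKH1(Pb)": -27.065,
    "DMKH1(Pt)": -23.647,
    "DMKH3(Z1)": 0.737,
    "DMKH3(Z2)": 1.432,
    "DMKH4(ISA)": 0.141,
    "DMKH5(Mw)": 0.144,
}
INTERCEPT = -0.317


def lda_score(desc):
    """Evaluate eq 6 for one peptide-strain case (dict of the six descriptors)."""
    return INTERCEPT + sum(b * desc[name] for name, b in COEFFS.items())


def predict_class(desc):
    """Assumption: a positive score is read as active (+1), otherwise inactive (-1)."""
    return 1 if lda_score(desc) > 0 else -1


def pooled_within_sd(values, labels):
    """Pooled within-class standard deviation of one descriptor."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    ss = 0.0
    for g in groups.values():
        m = mean(g)
        ss += sum((v - m) ** 2 for v in g)
    return sqrt(ss / (len(values) - len(groups)))


def standardized_ranking(columns, labels):
    """Rank descriptors by |b_k * s_k|, where s_k is the pooled within-class SD."""
    imp = {k: abs(b * pooled_within_sd(columns[k], labels)) for k, b in COEFFS.items()}
    return sorted(imp.items(), key=lambda kv: kv[1], reverse=True)


def classification_stats(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and MCC from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, acc, mcc


if __name__ == "__main__":
    # Training-set counts from Section 3.1: 870/922 actives, 1009/1069 inactives.
    print([round(x, 4) for x in classification_stats(870, 52, 1009, 60)])
```

With the training-set counts, the helper gives a sensitivity of about 0.9436, a specificity of about 0.9439, an accuracy of about 0.9437, and an MCC of about 0.887, in line with the reported values. The ranking returned by standardized_ranking depends on the spread of each descriptor in the training data, so the order shown in Figure 3 cannot be recovered from the raw coefficients of eq 6 alone.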

The most important descriptor in the mt-chemo-bioinformatic model is DMKH1(Pb); its negative coefficient in eq 6 means that the tendency of a peptide to adopt a β-sheet conformation is detrimental to the antibacterial activity. Consistent with DMKH1(Pb), the descriptor DMKH1(Pt) (second most important, also with a negative coefficient) indicates that decreasing the tendency of the AMPs to adopt β-turn conformations increases their inhibitory activity against bacteria. Thus, taking these two descriptors together, the preferable geometric requirement for a remarkable enhancement of the antibacterial activity of any AMP against a defined strain is the presence of conformations other than β-sheet or β-turn. In this context, conformations such as α-helices seem to guarantee a more adequate molecular accessibility, increasing the interactions with the bacteria, with possible effects such as the disruption of the cell membranes and the subsequent death of the bacteria.

DOI: 10.1021/acs.jcim.5b00630 J. Chem. Inf. Model. 2016, 56, 588−598

Article

Journal of Chemical Information and Modeling Table 2. Relative Contributions of the Amino Acids to the Antibacterial Activities of the C-Terminal Amidated Peptide GLLSVLGSVAKHVLPHVVPVIAEHL through the Deletion Process amino acid

position

Staphylococcus aureus (ATCC 25923)

Staphylococcus epidermidis

Staphylococcus aureus (ATCC 29213)

Staphylococcus aureus

Enterococcus faecalis

Listeria innocua

G L L S V L G S V A K H V L P H V V P V I A E H L

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

−1.928 0.864 0.864 −1.798 0.158 0.885 −2.239 −1.706 0.150 −1.040 0.927 0.820 0.190 0.999 −0.531 0.820 0.202 0.202 −0.558 0.185 0.581 −1.033 −0.478 0.610 0.734

−1.898 0.894 0.894 −1.768 0.188 0.915 −2.209 −1.676 0.180 −1.010 0.957 0.850 0.220 1.029 −0.501 0.850 0.232 0.232 −0.528 0.215 0.611 −1.003 −0.448 0.640 0.764

−1.868 0.924 0.924 −1.738 0.218 0.944 −2.179 −1.647 0.209 −0.981 0.987 0.880 0.250 1.059 −0.472 0.880 0.261 0.261 −0.498 0.245 0.641 −0.974 −0.418 0.670 0.794

−1.889 0.903 0.903 −1.759 0.197 0.923 −2.200 −1.668 0.188 −1.002 0.966 0.859 0.229 1.038 −0.493 0.859 0.240 0.240 −0.519 0.224 0.620 −0.995 −0.439 0.649 0.772

−1.629 1.163 1.163 −1.499 0.456 1.183 −1.940 −1.408 0.448 −0.742 1.225 1.118 0.488 1.298 −0.233 1.118 0.500 0.500 −0.259 0.483 0.879 −0.735 −0.180 0.908 1.032

−1.846 0.946 0.946 −1.716 0.240 0.966 −2.157 −1.624 0.232 −0.958 1.009 0.902 0.272 1.081 −0.449 0.902 0.284 0.284 −0.476 0.267 0.663 −0.951 −0.396 0.692 0.816

This spatial characteristic has been reported as one of the main factors associated with highly active AMPs.70 A descriptor with a positive influence on the antibacterial activity is DMKH3(Z1), which expresses the increment in the hydrophobicity of fragments/regions formed by a maximum of three amino acids (tripeptides). In the mt-chemo-bioinformatic model, DMKH3(Z1) has the lowest significance. However, this descriptor must not be misinterpreted. The presence of hydrophobic regions in the AMPs is clearly very important for the development of antibacterial activity, but the mt-chemo-bioinformatic model created here suggests that the enhancement of the antibacterial activity is more strongly influenced by descriptors other than DMKH3(Z1).

Steric factors are crucial for potentiating the antibacterial activity of the AMPs. Indeed, our model contains three descriptors related to steric features in different ways. First, DMKH3(Z2) (the fifth most important descriptor) characterizes the increment of the bulkiness of the AMPs in fragments/regions formed by a maximum of three amino acids (tripeptides). Second, DMKH4(ISA) (the fourth most significant descriptor) indicates the increase in the isotropic surface area of the AMPs in fragments/regions containing a maximum of four amino acids. Notice that, for an AMP, DMKH4(ISA) is the resultant of polar and nonpolar regions. Consequently, DMKH4(ISA) in some way embodies amphipathicity,70 a property strongly associated with the enhancement of the antibacterial activity of the AMPs. Finally, in agreement with DMKH3(Z2), DMKH5(Mw) (the third most meaningful descriptor) involves the increment of the molecular weight in fragments/regions containing a maximum of five amino acids. The presence of voluminous amino acids therefore seems to be important for the augmentation of the antibacterial activity.

3.3. Computing the Contributions of the Amino Acids to the Antibacterial Activity of an AMP.

The computational models reported in the first section can be applied as promising tools for the virtual screening of AMPs, but much more information can be gathered from them. In this sense, such models may also be used as knowledge generators. For instance, many AMPs are known to exert very high inhibitory activity against several pathogens, and the common factor among them is that they are rich in certain types of amino acids.70−73 This points to two important facts. First, some amino acids have an intrinsic influence on the activity of the AMPs. Second, this influence may be modulated by the position of the amino acid in the peptide sequence. Consequently, it is essential to study and analyze the contributions of the amino acids to the appearance and/or enhancement of the antibacterial activity of an AMP. In this work, we applied two different approaches to calculate such contributions.

The C-terminal amidated peptide GLLSVLGSVAKHVLPHVVPVIAEHL was used as the first case study. This AMP is present in our data set. It was experimentally assayed against different isolates/strains of Staphylococcus aureus, and it also displayed high antibacterial activity against Staphylococcus epidermidis, Enterococcus faecalis, and Listeria innocua. The AMP was annotated as active according to the cutoff MIC ≤ 11.965 μM, and our mt-chemo-bioinformatic model correctly classified it as active against all these bacteria. To determine the contribution of each amino acid to the antibacterial activity of this AMP, we used a concept from genetics known as deletion.74 This concept refers to the occurrence of a mutation due to the absence of genetic material in a defined

DOI: 10.1021/acs.jcim.5b00630 J. Chem. Inf. Model. 2016, 56, 588−598

Article

Journal of Chemical Information and Modeling

location (chromosome, DNA, etc.). Bearing this concept in mind, we removed from the AMP sequence the amino acid located at position k, where k ranges from 1 to L = 25 (the length of the AMP). This produced a mutated sequence in which the amino acids at positions k − 1 and k + 1 become adjacent. Next, the descriptors of both the original and the mutated sequences were calculated according to all the equations explained above. Substituting the DMKHmi(PP) descriptors into eq 6 generated two scores of antibacterial activity, one for each sequence.

Subtracting the score of the mutated sequence from the score of the original sequence gave a score named Sa1. Sa1 expresses the influence of the amino acid at position k on the antibacterial activity of the AMP under analysis. This procedure was applied to all the bacteria mentioned above. A second score, denoted Sa2, was calculated from the descriptors of the amino acid at position k that was removed from the original sequence. Sa2 therefore expresses the intrinsic influence of the amino acid at position k as a simple fragment, i.e., regardless of its molecular environment. Sa2 was also calculated for all the aforementioned bacteria. Then, we used the following mathematical formalism:

$$\mathrm{WS}_a = \frac{n_1}{L}\,S_{a1} + \frac{n_2}{L}\,S_{a2} \qquad (7)$$

It should be emphasized that eq 7 is general: it does not depend on the type of mutation introduced in the peptide. Here, n2 = 1 because it refers to the amino acid at position k that was removed/mutated. In turn, n1 is the number of amino acids in the original sequence that were not removed/mutated. Since the condition n1 + n2 = L always holds, the two weights sum to 1, and WSa is the classical definition of a weighted arithmetic mean.

Finally, a standardization procedure was carried out. The mean of all WSa values was subtracted from each WSa value, and the result was divided by the standard deviation of the WSa values. The resulting standardized WSa values represent the relative contributions of the amino acids to the antibacterial activity of the AMP under study. These values depend on the bacterial strain against which the AMP was assayed. The contributions are summarized in Table 2, and the details of the calculations can be found in the SI (Tables S7 and S8).

From Table 2, one can deduce that amino acids such as leucine (L), lysine (K), histidine (H), isoleucine (I), and valine (V) make positive contributions to the antibacterial activity of the C-terminal amidated peptide GLLSVLGSVAKHVLPHVVPVIAEHL. The AMP has a relatively high leucine content (20%, i.e., 5 of 25 residues). This amino acid contributes more at the beginning and in the middle of the sequence than at the end (last five or six amino acids). Similar behavior can be observed for valine (24% of the composition, i.e., 6 of 25 residues). These results indicate that the antibacterial activity of the peptide depends both on the intrinsic properties of certain amino acids and on their positions in the sequence. In contrast, the analysis of amino acids such as glycine (G), serine (S), alanine (A), proline (P), and glutamic acid (E) suggests that they are detrimental to the antibacterial activity, regardless of their positions in the peptide sequence.

Note that deletion is the central concept of this first approach. Therefore, if two (or more) amino acids of the same type are adjacent, they will have exactly the same contribution, because deleting any of them yields the same mutated sequence. For this reason, we used a second approach based on alanine scanning. Alanine scanning is an experimental technique widely employed in molecular biology to determine the contribution of any amino acid to the stability and/or function of a protein.75 For alanine scanning, all the operations were performed exactly as in the case of deletion. The only difference is that, instead of deleting the amino acid at position k, the amino acid at that position was replaced by alanine. The contributions calculated with alanine scanning are given in Table 3 and in the SI (Tables S7 and S8). All the conclusions drawn from the deletion procedure agree with those that can be extracted from alanine scanning. However, the latter method has the advantage of discriminating between two or more adjacent amino acids of the same type.

Table 3. Relative Contributions of the Amino Acids to the Antibacterial Activities of the C-Terminal Amidated Peptide GLLSVLGSVAKHVLPHVVPVIAEHL through Alanine Scanning
Amino acid (position): G1 L2 L3 S4 V5 L6 G7 S8 V9 A10 K11 H12 V13 L14 P15 H16 V17 V18 P19 V20 I21 A22 E23 H24 L25
Staphylococcus aureus (ATCC 25923): −2.337 0.732 0.924 −1.661 0.233 0.919 −2.167 −1.667 0.062 −0.957 0.855 0.867 0.285 1.033 −0.402 0.871 0.255 0.243 −0.437 0.215 0.495 −0.957 −0.534 0.571 0.422
Staphylococcus epidermidis: −2.307 0.762 0.955 −1.630 0.263 0.949 −2.137 −1.637 0.092 −0.927 0.886 0.898 0.315 1.064 −0.372 0.901 0.285 0.273 −0.407 0.246 0.526 −0.927 −0.503 0.602 0.452
Staphylococcus aureus (ATCC 29213): −2.277 0.792 0.985 −1.601 0.293 0.979 −2.107 −1.607 0.122 −0.897 0.916 0.928 0.345 1.093 −0.342 0.931 0.315 0.303 −0.377 0.276 0.555 −0.897 −0.474 0.632 0.482
Staphylococcus aureus: −2.298 0.771 0.963 −1.622 0.272 0.958 −2.128 −1.628 0.101 −0.918 0.894 0.906 0.324 1.072 −0.363 0.910 0.294 0.282 −0.398 0.254 0.534 −0.918 −0.495 0.610 0.461
Enterococcus faecalis: −2.037 1.033 1.225 −1.360 0.533 1.219 −1.867 −1.366 0.363 −0.656 1.156 1.168 0.585 1.334 −0.102 1.171 0.555 0.544 −0.137 0.516 0.796 −0.656 −0.233 0.872 0.722
Listeria innocua: −2.254 0.815 1.007 −1.578 0.315 1.001 −2.085 −1.584 0.145 −0.874 0.938 0.950 0.367 1.116 −0.320 0.954 0.337 0.326 −0.355 0.298 0.578 −0.874 −0.451 0.654 0.504

So far, we have analyzed the effects of the amino acids on the antibacterial activity of an AMP annotated as active in our data set. To complete this study, we applied all the procedures described above to calculate the contributions of the amino acids in the peptide GIGKFLHSAKKFGKAFVGEIMNS, which was annotated as inactive in the data set. This AMP was experimentally tested against isolates/strains belonging to Streptococcus pneumoniae, Staphylococcus aureus, and Streptococcus mutans. The mt-chemo-bioinformatic model correctly classified it as inactive against all these bacteria. The results of the calculations of the contributions are given in Tables 4 and 5, as well as in the SI (Tables S7 and S9).

When the same bacterial species is considered, the contributions in the inactive AMP show a certain similarity to those in the previous AMP annotated as active. Specifically, all the amino acids common to both AMPs have contributions with the same signs. In addition, amino acids that occur only in the inactive AMP, such as phenylalanine (F) and methionine (M), have positive contributions, while others, such as asparagine (N), have a negative influence on the antibacterial activity.

We emphasize that many AMPs considered active can contain several amino acids with negative contributions. At the same time, various amino acids with positive contributions can be present in AMPs annotated as inactive. For this reason, the presence or absence of a particular amino acid is not a sufficient condition for the appearance/enhancement of the antibacterial activity. Only the combination of the diverse amino acids, together with their positions in the peptide sequence, ultimately accounts for the improvement of the antibacterial profile. In any case, calculating the contributions of the amino acids to the antibacterial activity of the AMPs makes it possible to delete the amino acids with negative contributions or to replace them with amino acids with positive contributions. This may lead to the optimization of the activity of a peptide against a specific bacterium, or to an increase in the versatility of that peptide as an antibacterial agent against multiple bacterial strains.

Table 4. Relative Contributions of the Amino Acids to the Antibacterial Potencies of the Peptide GIGKFLHSAKKFGKAFVGEIMNS through the Deletion Process
Amino acid (position): G1 I2 G3 K4 F5 L6 H7 S8 A9 K10 K11 F12 G13 K14 A15 F16 V17 G18 E19 I20 M21 N22 S23
Streptococcus pneumoniae: −1.350 0.353 −1.542 0.813 1.282 0.824 0.724 −1.191 −0.662 0.791 0.791 1.243 −1.683 0.770 −0.702 1.208 0.222 −1.641 −0.228 0.476 0.897 −0.327 −0.996
Staphylococcus aureus: −1.320 0.383 −1.513 0.843 1.311 0.853 0.754 −1.162 −0.632 0.820 0.820 1.273 −1.654 0.800 −0.673 1.238 0.251 −1.611 −0.199 0.506 0.927 −0.298 −0.967
Streptococcus mutans: −1.390 0.313 −1.583 0.773 1.241 0.783 0.684 −1.232 −0.702 0.751 0.751 1.203 −1.724 0.730 −0.742 1.168 0.182 −1.681 −0.268 0.436 0.857 −0.368 −1.037
Staphylococcus aureus (ATCC 25923): −1.351 0.353 −1.543 0.812 1.281 0.823 0.723 −1.192 −0.663 0.790 0.790 1.242 −1.684 0.769 −0.703 1.207 0.221 −1.641 −0.229 0.475 0.896 −0.328 −0.997
Streptococcus mutans (ATCC 25175/JCM 5705/KCTC 3065): −1.354 0.349 −1.546 0.809 1.278 0.820 0.720 −1.195 −0.666 0.787 0.787 1.239 −1.687 0.766 −0.706 1.204 0.218 −1.645 −0.232 0.472 0.893 −0.331 −1.000

Table 5. Relative Contributions of the Amino Acids to the Antibacterial Potencies of the Peptide GIGKFLHSAKKFGKAFVGEIMNS through Alanine Scanning
Amino acid (position): G1 I2 G3 K4 F5 L6 H7 S8 A9 K10 K11 F12 G13 K14 A15 F16 V17 G18 E19 I20 M21 N22 S23
Streptococcus pneumoniae: −1.600 0.303 −1.479 0.828 1.324 0.877 0.725 −1.183 −0.594 0.735 0.822 1.254 −1.574 0.714 −0.594 1.146 0.227 −1.544 −0.169 0.513 0.901 −0.415 −1.142
Staphylococcus aureus: −1.571 0.332 −1.450 0.857 1.353 0.907 0.754 −1.154 −0.565 0.764 0.851 1.283 −1.545 0.743 −0.565 1.175 0.257 −1.515 −0.140 0.543 0.930 −0.386 −1.113
Streptococcus mutans: −1.641 0.263 −1.520 0.788 1.284 0.837 0.684 −1.224 −0.635 0.695 0.782 1.214 −1.614 0.673 −0.635 1.106 0.187 −1.584 −0.210 0.473 0.861 −0.455 −1.183
Staphylococcus aureus (ATCC 25923): −1.601 0.302 −1.480 0.827 1.323 0.877 0.724 −1.184 −0.595 0.734 0.821 1.253 −1.575 0.713 −0.595 1.145 0.227 −1.545 −0.170 0.513 0.901 −0.416 −1.143
Streptococcus mutans (ATCC 25175/JCM 5705/KCTC 3065): −1.605 0.299 −1.483 0.824 1.320 0.873 0.720 −1.187 −0.599 0.731 0.818 1.250 −1.578 0.710 −0.599 1.142 0.223 −1.548 −0.174 0.509 0.897 −0.419 −1.146
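The deletion and alanine-scanning procedures of section 3.3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The DMKH descriptors and the eq 6 score are not reproduced in this section, so the hypothetical function toy_score, with made-up TOY_WEIGHTS, stands in for the model score. The helper names deletion_mutant, alanine_mutant, and relative_contributions are likewise illustrative.

For every position k, the sketch computes Sa1 (original score minus mutant score) and Sa2 (the residue scored as an isolated fragment). It then combines the two with the weights n1/L and n2/L of eq 7 and standardizes the resulting WSa values. The text does not state whether a sample or population standard deviation was used, so the population version is assumed here.

```python
import statistics
from typing import Callable, List

# Hypothetical per-residue weights for a TOY score; these are NOT the paper's model.
# Residues missing from the dict get a default weight of -0.5.
TOY_WEIGHTS = {"L": 1.0, "I": 0.9, "F": 0.9, "K": 0.8, "M": 0.7, "V": 0.6, "H": 0.5}


def toy_score(seq: str) -> float:
    """Hypothetical, order-dependent stand-in for the eq 6 activity score.

    Residues closer to the N-terminus weigh more, so the score depends on
    residue positions and not only on composition.
    """
    n = len(seq)
    return sum(TOY_WEIGHTS.get(aa, -0.5) * (n - i) / n for i, aa in enumerate(seq)) / n


def deletion_mutant(seq: str, k: int) -> str:
    """Delete the residue at 1-based position k; residues k-1 and k+1 become adjacent."""
    return seq[: k - 1] + seq[k:]


def alanine_mutant(seq: str, k: int) -> str:
    """Replace the residue at 1-based position k with alanine."""
    return seq[: k - 1] + "A" + seq[k:]


def relative_contributions(
    seq: str,
    score: Callable[[str], float],
    mutate: Callable[[str, int], str],
) -> List[float]:
    """Standardized WSa value for every position k = 1..L (eq 7 plus z-scoring)."""
    L = len(seq)
    n2 = 1       # the single mutated residue
    n1 = L - n2  # untouched residues, so n1 + n2 = L
    ws = []
    for k in range(1, L + 1):
        sa1 = score(seq) - score(mutate(seq, k))  # original score minus mutant score
        sa2 = score(seq[k - 1])                   # residue k scored as an isolated fragment
        ws.append((n1 / L) * sa1 + (n2 / L) * sa2)  # eq 7: weighted arithmetic mean
    mean = statistics.mean(ws)
    sd = statistics.pstdev(ws)  # assumption: population standard deviation
    return [(w - mean) / sd for w in ws]


peptide = "GLLSVLGSVAKHVLPHVVPVIAEHL"
by_deletion = relative_contributions(peptide, toy_score, deletion_mutant)
by_alanine = relative_contributions(peptide, toy_score, alanine_mutant)
# Deleting L2 or L3 yields the same mutant, so the two contributions coincide;
# alanine substitution produces two different mutants.
print(by_deletion[1] == by_deletion[2], by_alanine[1] == by_alanine[2])  # prints: True False
```

The sketch reproduces the motivation for the second approach. Deleting L2 or L3 gives the identical mutant GLSVLGSVAKHVLPHVVPVIAEHL, so both positions receive the same contribution. Alanine substitution instead gives two distinct mutants. Whether the real model separates them depends on its position-dependent descriptors; the toy score is made order-dependent only so that the difference shows up.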
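The statement that amino acids common to the active and inactive peptides keep the same sign can be checked against the tabulated values. The Python sketch below copies the alanine-scanning rows for Staphylococcus aureus (ATCC 25923) from Tables 3 and 5, a strain listed under the same label in both tables. It then groups the signs of the contributions by residue type. The helper signs_by_residue is illustrative, and the check covers only this one strain and this one method.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Alanine-scanning contributions for Staphylococcus aureus (ATCC 25923),
# copied from Table 3 (active peptide) and Table 5 (inactive peptide).
ACTIVE = "GLLSVLGSVAKHVLPHVVPVIAEHL"
ACTIVE_SA = [-2.337, 0.732, 0.924, -1.661, 0.233, 0.919, -2.167, -1.667, 0.062,
             -0.957, 0.855, 0.867, 0.285, 1.033, -0.402, 0.871, 0.255, 0.243,
             -0.437, 0.215, 0.495, -0.957, -0.534, 0.571, 0.422]
INACTIVE = "GIGKFLHSAKKFGKAFVGEIMNS"
INACTIVE_SA = [-1.601, 0.302, -1.480, 0.827, 1.323, 0.877, 0.724, -1.184, -0.595,
               0.734, 0.821, 1.253, -1.575, 0.713, -0.595, 1.145, 0.227, -1.545,
               -0.170, 0.513, 0.901, -0.416, -1.143]


def signs_by_residue(seq: str, values: List[float]) -> Dict[str, Set[int]]:
    """Map each residue type to the set of signs (+1 / -1) of its contributions."""
    if len(seq) != len(values):
        raise ValueError("exactly one contribution per residue is required")
    signs: Dict[str, Set[int]] = defaultdict(set)
    for aa, value in zip(seq, values):
        signs[aa].add(1 if value > 0 else -1)
    return dict(signs)


active = signs_by_residue(ACTIVE, ACTIVE_SA)
inactive = signs_by_residue(INACTIVE, INACTIVE_SA)
common = sorted(set(active) & set(inactive))
same_sign = all(active[aa] == inactive[aa] for aa in common)
print(common, same_sign)  # prints: ['A', 'E', 'G', 'H', 'I', 'K', 'L', 'S', 'V'] True
only_inactive = sorted(set(inactive) - set(active))
print({aa: inactive[aa] for aa in only_inactive})  # prints: {'F': {1}, 'M': {1}, 'N': {-1}}
```

For this strain, each residue type has a single sign within each peptide, and the nine shared types agree. The residues found only in the inactive peptide show the signs stated in the text: F and M are positive, and N is negative.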

4. CONCLUSIONS

With the fast development of theoretical approaches in bioinformatics and related disciplines, it is now possible to enable, and even speed up, the search for more efficient antibacterial peptides. Current advanced computational methods may provide an easier and more accurate way to accomplish this task. Our mt-chemo-bioinformatic model was generated from a large and heterogeneous data set of peptides with dissimilar values of inhibitory potency. It constitutes a promising tool for the simultaneous virtual discovery of AMPs with enhanced antibacterial activity. The concepts, procedures, and results associated with calculating the contributions of the amino acids to the antibacterial activities of the AMPs can pave the way to large libraries of new, potent, and versatile AMPs. In such libraries, the modifications/mutations would be made according to the principles of the present mt-chemo-bioinformatic model used as a knowledge generator. This work opens the way to the cost-effective and rational design of AMPs with wide-spectrum antibacterial activity.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.5b00630. It comprises a collection of tables/spreadsheets (XLS) that store all the chemical and biological data, the results of the classifications, and the results of the calculations of amino acid contributions.

AUTHOR INFORMATION

Corresponding Author

*Fax: 351 220402659. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work is supported by Grant No. Pest-C/EQB/LA0006/2013, financed by the Portuguese FCT (Fundação para a Ciência e a Tecnologia). The authors are grateful for the joint financial support given by the Portuguese FCT/MEC and FEDER (Project No. UID/QUI/50006/2013). Prof. Juan M. Ruso also acknowledges the financial support given by MICINN-Spain (Project No. MAT2011-25501).

REFERENCES

(1) Ryan, K. J.; Ray, C. G. Sherris Medical Microbiology. An Introduction to Infectious Diseases, 4th ed.; McGraw-Hill Companies, Inc: AZ, 2004.
(2) Brachman, P. S.; Abrutyn, E. Bacterial Infections of Humans: Epidemiology and Control, 4th ed.; Springer Science+Business Media, LLC: New York, NY, 2009.
(3) Engler, A. C.; Wiradharma, N.; Ong, Z. Y.; Coady, D. J.; Hedrick, J. L.; Yang, Y. Y. Emerging Trends in Macromolecular Antimicrobials to Fight Multi-Drug-Resistant Infections. Nano Today 2012, 7, 201−222.
(4) Fjell, C. D.; Hiss, J. A.; Hancock, R. E.; Schneider, G. Designing Antimicrobial Peptides: Form Follows Function. Nat. Rev. Drug Discovery 2012, 11, 37−51.
(5) Shai, Y. Mode of Action of Membrane Active Antimicrobial Peptides. Biopolymers 2002, 66, 236−248.
(6) Zhang, L.; Rozek, A.; Hancock, R. E. Interaction of Cationic Antimicrobial Peptides with Model Membranes. J. Biol. Chem. 2001, 276, 35714−35722.
(7) Jenssen, H.; Hamill, P.; Hancock, R. E. Peptide Antimicrobial Agents. Clin. Microbiol. Rev. 2006, 19, 491−511.
(8) Brogden, K. A. Antimicrobial Peptides: Pore Formers or Metabolic Inhibitors in Bacteria? Nat. Rev. Microbiol. 2005, 3, 238−250.
(9) Bahar, A. A.; Ren, D. Antimicrobial Peptides. Pharmaceuticals 2013, 6, 1543−1575.
(10) Aguilera-Mendoza, L.; Marrero-Ponce, Y.; Tellez-Ibarra, R.; Llorente-Quesada, M. T.; Salgado, J.; Barigye, S. J.; Liu, J. Overlap and Diversity in Antimicrobial Peptide Databases: Compiling a Non-Redundant Set of Sequences. Bioinformatics 2015, 31, 2553−2559.
(11) Jenssen, H.; Fjell, C. D.; Cherkasov, A.; Hancock, R. E. QSAR Modeling and Computer-Aided Design of Antimicrobial Peptides. J. Pept. Sci. 2008, 14, 110−114.
(12) Fjell, C. D.; Jenssen, H.; Hilpert, K.; Cheung, W. A.; Pante, N.; Hancock, R. E.; Cherkasov, A. Identification of Novel Antibacterial Peptides by Chemoinformatics and Machine Learning. J. Med. Chem. 2009, 52, 2006−2015.
(13) Cherkasov, A.; Hilpert, K.; Jenssen, H.; Fjell, C. D.; Waldbrook, M.; Mullaly, S. C.; Volkmer, R.; Hancock, R. E. Use of Artificial Intelligence in the Design of Small Peptide Antibiotics Effective against a Broad Spectrum of Highly Antibiotic-Resistant Superbugs. ACS Chem. Biol. 2009, 4, 65−74.


(14) Torrent, M.; Andreu, D.; Nogues, V. M.; Boix, E. Connecting Peptide Physicochemical and Antimicrobial Properties by a Rational Prediction Model. PLoS One 2011, 6, e16968.
(15) Mooney, C.; Haslam, N. J.; Holton, T. A.; Pollastri, G.; Shields, D. C. Peptidelocator: Prediction of Bioactive Peptides in Protein Sequences. Bioinformatics 2013, 29, 1120−1126.
(16) Porto, W. F.; Pires, A. S.; Franco, O. L. Cs-Amppred: An Updated SVM Model for Antimicrobial Activity Prediction in Cysteine-Stabilized Peptides. PLoS One 2012, 7, e51444.
(17) Ng, X. Y.; Rosdi, B. A.; Shahrudin, S. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity. BioMed Res. Int. 2015, 2015, 212715.
(18) Khosravian, M.; Faramarzi, F. K.; Beigi, M. M.; Behbahani, M.; Mohabatkar, H. Predicting Antibacterial Peptides by the Concept of Chou’s Pseudo-Amino Acid Composition and Machine Learning Methods. Protein Pept. Lett. 2013, 20, 180−186.
(19) Lira, F.; Perez, P. S.; Baranauskas, J. A.; Nozawa, S. R. Prediction of Antimicrobial Activity of Synthetic Peptides by a Decision Tree Model. Appl. Environ. Microbiol. 2013, 79, 3156−3159.
(20) Khamis, A. M.; Essack, M.; Gao, X.; Bajic, V. B. Distinct Profiling of Antimicrobial Peptide Families. Bioinformatics 2015, 31, 849−856.
(21) Xiao, X.; Wang, P.; Lin, W. Z.; Jia, J. H.; Chou, K. C. Iamp-2l: A Two-Level Multi-Label Classifier for Identifying Antimicrobial Peptides and Their Functional Types. Anal. Biochem. 2013, 436, 168−177.
(22) Juretic, D.; Vukicevic, D.; Ilic, N.; Antcheva, N.; Tossi, A. Computational Design of Highly Selective Antimicrobial Peptides. J. Chem. Inf. Model. 2009, 49, 2873−2882.
(23) Wang, P.; Hu, L.; Liu, G.; Jiang, N.; Chen, X.; Xu, J.; Zheng, W.; Li, L.; Tan, M.; Chen, Z.; Song, H.; Cai, Y. D.; Chou, K. C. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods. PLoS One 2011, 6, e18476.
(24) Melo, M. N.; Ferre, R.; Feliu, L.; Bardaji, E.; Planas, M.; Castanho, M. A. Prediction of Antibacterial Activity from Physicochemical Properties of Antimicrobial Peptides. PLoS One 2011, 6, e28549.
(25) Vishnepolsky, B.; Pirtskhalava, M. Prediction of Linear Cationic Antimicrobial Peptides Based on Characteristics Responsible for Their Interaction with the Membranes. J. Chem. Inf. Model. 2014, 54, 1512−1523.
(26) Freire, J. M.; Almeida Dias, S.; Flores, L.; Veiga, A. S.; Castanho, M. A. Mining Viral Proteins for Antimicrobial and Cell-Penetrating Drug Delivery Peptides. Bioinformatics 2015, 31, 2252−2256.
(27) Chang, K. Y.; Lin, T. P.; Shih, L. Y.; Wang, C. K. Analysis and Prediction of the Critical Regions of Antimicrobial Peptides Based on Conditional Random Fields. PLoS One 2015, 10, e0119490.
(28) Toropova, M. A.; Veselinovic, A. M.; Veselinovic, J. B.; Stojanovic, D. B.; Toropov, A. A. QSAR Modeling of the Antimicrobial Activity of Peptides as a Mathematical Function of a Sequence of Amino Acids. Comput. Biol. Chem. 2015, 59 (Pt A), 126−130.
(29) Toropov, A. A.; Toropova, A. P.; Raska, I., Jr.; Benfenati, E.; Gini, G. QSAR Modeling of Endpoints for Peptides Which Is Based on Representation of the Molecular Structure by a Sequence of Amino Acids. Struct. Chem. 2012, 23, 1891−1904.
(30) Cronin, M. T.; Jaworska, J. S.; Walker, J. D.; Comber, M. H.; Watts, C. D.; Worth, A. P. Use of QSARs in International Decision-Making Frameworks to Predict Health Effects of Chemical Substances. Environ. Health Perspect. 2003, 111, 1391−1401.
(31) Garcia, I.; Fall, Y.; Gomez, G.; Gonzalez-Diaz, H. First Computational Chemistry Multi-Target Model for Anti-Alzheimer, Anti-Parasitic, Anti-Fungi, and Anti-Bacterial Activity of GSK-3 Inhibitors in Vitro, in Vivo, and in Different Cellular Lines. Mol. Diversity 2011, 15, 561−567.
(32) Marzaro, G.; Chilin, A.; Guiotto, A.; Uriarte, E.; Brun, P.; Castagliuolo, I.; Tonus, F.; Gonzalez-Diaz, H. Using the TOPS-MODE Approach to Fit Multi-Target QSAR Models for Tyrosine Kinases Inhibitors. Eur. J. Med. Chem. 2011, 46, 2185−2192.
(33) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. A Ligand-Based Approach for the In Silico Discovery of Multi-Target Inhibitors for Proteins Associated with HIV Infection. Mol. BioSyst. 2012, 8, 2188−2196.
(34) Speck-Planche, A.; Luan, F.; Cordeiro, M. N. D. S. Abelson Tyrosine-Protein Kinase 1 as Principal Target for Drug Discovery against Leukemias. Role of the Current Computer-Aided Drug Design Methodologies. Curr. Top. Med. Chem. 2012, 12, 2745−2762.
(35) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. Multi-Target Inhibitors for Proteins Associated with Alzheimer: In Silico Discovery Using Fragment-Based Descriptors. Curr. Alzheimer Res. 2013, 10, 117−124.
(36) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. Rational Drug Design for Anti-Cancer Chemotherapy: Multi-Target QSAR Models for the In Silico Discovery of Anti-Colorectal Cancer Agents. Bioorg. Med. Chem. 2012, 20, 4848−4855.
(37) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. Unified Multi-Target Approach for the Rational In Silico Design of Anti-Bladder Cancer Agents. Anti-Cancer Agents Med. Chem. 2013, 13, 791−800.
(38) Concu, R.; Dea-Ayuela, M. A.; Perez-Montoto, L. G.; Bolas-Fernandez, F.; Prado-Prado, F. J.; Podda, G.; Uriarte, E.; Ubeira, F. M.; Gonzalez-Diaz, H. Prediction of Enzyme Classes from 3D Structure: A General Model and Examples of Experimental-Theoretic Scoring of Peptide Mass Fingerprints of Leishmania Proteins. J. Proteome Res. 2009, 8, 4372−4382.
(39) Concu, R.; Dea-Ayuela, M. A.; Perez-Montoto, L. G.; Prado-Prado, F. J.; Uriarte, E.; Bolas-Fernandez, F.; Podda, G.; Pazos, A.; Munteanu, C. R.; Ubeira, F. M.; Gonzalez-Diaz, H. 3D Entropy and Moments Prediction of Enzyme Classes and Experimental-Theoretic Study of Peptide Fingerprints in Leishmania Parasites. Biochim. Biophys. Acta, Proteins Proteomics 2009, 1794, 1784−1794.
(40) Concu, R.; Podda, G.; Uriarte, E.; Gonzalez-Diaz, H. Computational Chemistry Study of 3D-Structure-Function Relationships for Enzymes Based on Markov Models for Protein Electrostatic, Hint, and van der Waals Potentials. J. Comput. Chem. 2009, 30, 1510−1520.
(41) Munteanu, C. R.; Magalhaes, A. L.; Uriarte, E.; Gonzalez-Diaz, H. Multi-Target QPDR Classification Model for Human Breast and Colon Cancer-Related Proteins Using Star Graph Topological Indices. J. Theor. Biol. 2009, 257, 303−311.
(42) Prado-Prado, F. J.; Garcia-Mera, X.; Gonzalez-Diaz, H. Multi-Target Spectral Moment QSAR Versus ANN for Antiparasitic Drugs against Different Parasite Species. Bioorg. Med. Chem. 2010, 18, 2225−2231.
(43) Speck-Planche, A.; Cordeiro, M. N. D. S. Simultaneous Modeling of Antimycobacterial Activities and ADMET Profiles: A Chemoinformatic Approach to Medicinal Chemistry. Curr. Top. Med. Chem. 2013, 13, 1656−1665.
(44) Speck-Planche, A.; Cordeiro, M. N. D. S. Chemoinformatics for Medicinal Chemistry: In Silico Model to Enable the Discovery of Potent and Safer Anti-Cocci Agents. Future Med. Chem. 2014, 6, 2013−2028.
(45) Gogoladze, G.; Grigolava, M.; Vishnepolsky, B.; Chubinidze, M.; Duroux, P.; Lefranc, M. P.; Pirtskhalava, M. DBAASP: Database of Antimicrobial Activity and Structure of Peptides. FEMS Microbiol. Lett. 2014, 357, 63−68.
(46) Verslyppe, B.; De Smet, W.; De Baets, B.; De Vos, P.; Dawyndt, P. Straininfo Introduces Electronic Passports for Microorganisms. Syst. Appl. Microbiol. 2014, 37, 42−50.
(47) Leitner, T. Format Converter [Online]. http://www.HIV.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html (accessed September 2015).
(48) Kuiken, C.; Korber, B.; Shafer, R. W. HIV Sequence Databases. AIDS Rev. 2003, 5, 52−61.
(49) Gaschen, B.; Kuiken, C.; Korber, B.; Foley, B. Retrieval and on-the-Fly Alignment of Sequence Fragments from the HIV Database. Bioinformatics 2001, 17, 415−418.
(50) Leitner, T.; Foley, B.; Korber, B.; Apetrei, C.; Hahn, B.; Mizrachi, I.; Mullins, J.; Rambaut, A.; Wolinsky, S. HIV Sequence Database [Online]. http://www.HIV.lanl.gov/content/sequence/HIV/mainpage.html (accessed September 2015).


(51) ChemAxon-Team Chemaxon. Standardizer (Tool for Structure Canonicalization and Transformation), Jchem, v6.1.4; Budapest, Hungary, 1998−2014.
(52) Yap, C. W. Padel-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32, 1466−1474.
(53) Yap, C. W. Padel-Descriptor [Online], v2.21. http://padel.nus.edu.sg/software/padeldescriptor/ (accessed September 2015).
(54) Gonzalez-Diaz, H.; Prado-Prado, F. J.; Santana, L.; Uriarte, E. Unify QSAR Approach to Antimicrobials. Part 1: Predicting Antifungal Activity against Different Species. Bioorg. Med. Chem. 2006, 14, 5973−5980.
(55) Prado-Prado, F. J.; Gonzalez-Diaz, H.; Santana, L.; Uriarte, E. Unified QSAR Approach to Antimicrobials. Part 2: Predicting Activity against More Than 90 Different Species in Order to Halt Antibacterial Resistance. Bioorg. Med. Chem. 2007, 15, 897−902.
(56) Prado-Prado, F. J.; Gonzalez-Diaz, H.; de la Vega, O. M.; Ubeira, F. M.; Chou, K. C. Unified QSAR Approach to Antimicrobials. Part 3: First Multi-Tasking QSAR Model for Input-Coded Prediction, Structural Back-Projection, and Complex Networks Clustering of Antiprotozoal Compounds. Bioorg. Med. Chem. 2008, 16, 5871−5880.
(57) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; WILEY-VCH Verlag GmbH: Weinheim, New York, Chichester, Brisbane, Singapore, Toronto, 2000.
(58) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; WILEY-VCH Verlag GmbH & Co. KGaA: Weinheim, 2009.
(59) Ruiz-Blanco, Y. B.; Paz, W.; Green, J.; Marrero-Ponce, Y. Protdcal: A Program to Compute General-Purpose-Numerical Descriptors for Sequences and 3D-Structures of Proteins. BMC Bioinf. 2015, 16, 162.
(60) Ruiz-Blanco, Y.; Marrero-Ponce, Y. ProtDCal [Online], v1.0. http://bioinf.sce.carleton.ca/ProtDCal/ (accessed September 2015).
(61) Ruiz-Blanco, Y. B.; Marrero-Ponce, Y.; Garcia, Y.; Puris, A.; Bello, R.; Green, J.; Sotomayor-Torres, C. M. A Physics-Based Scoring Function for Protein Structural Decoys: Dynamic Testing on Targets of CASP-ROLL. Chem. Phys. Lett. 2014, 610−611, 135−140.
(62) Ruiz-Blanco, Y. B.; Marrero-Ponce, Y.; Paz, W.; Garcia, Y.; Salgado, J. Global Stability of Protein Folding from an Empirical Free Energy Function. J. Theor. Biol. 2013, 321, 44−53.
(63) Ruiz-Blanco, Y. B.; Garcia, Y.; Sotomayor-Torres, C. M.; Marrero-Ponce, Y. New Set of 2D/3D Thermodynamic Indices for Proteins. A Formalism Based on “Molten Globule” Theory. Phys. Procedia 2010, 8, 63−72.
(64) Hill, T.; Lewicki, P. Statistics Methods and Applications. A Comprehensive Reference for Science, Industry and Data Mining; StatSoft: Tulsa, 2006.
(65) Martin, T. M.; Harten, P.; Young, D. M.; Muratov, E. N.; Golbraikh, A.; Zhu, H.; Tropsha, A. Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling? J. Chem. Inf. Model. 2012, 52, 2570−2578.
(66) Statsoft-Team Statistica. Data Analysis Software System, v6.0; Tulsa, 2001.
(67) Hanczar, B.; Hua, J.; Sima, C.; Weinstein, J.; Bittner, M.; Dougherty, E. R. Small-Sample Precision of ROC-Related Estimates. Bioinformatics 2010, 26, 822−830.
(68) Jurman, G.; Riccadonna, S.; Furlanello, C. A Comparison of MCC and CEN Error Measures in Multi-Class Prediction. PLoS One 2012, 7, e41882.
(69) Estrada, E. Physicochemical Interpretation of Molecular Connectivity Indices. J. Phys. Chem. A 2002, 106, 9085−9091.
(70) Ma, Q. Q.; Lv, Y. F.; Gu, Y.; Dong, N.; Li, D. S.; Shan, A. S. Rational Design of Cationic Antimicrobial Peptides by the Tandem of Leucine-Rich Repeat. Amino Acids 2013, 44, 1215−1224.
(71) Gopal, R.; Seo, C. H.; Song, P. I.; Park, Y. Effect of Repetitive Lysine-Tryptophan Motifs on the Bactericidal Activity of Antimicrobial Peptides. Amino Acids 2013, 44, 645−660.
(72) McDonald, M.; Mannion, M.; Pike, D.; Lewis, K.; Flynn, A.; Brannan, A. M.; Browne, M. J.; Jackman, D.; Madera, L.; Power Coombs, M. R.; Hoskin, D. W.; Rise, M. L.; Booth, V. Structure-Function Relationships in Histidine-Rich Antimicrobial Peptides from Atlantic Cod. Biochim. Biophys. Acta, Biomembr. 2015, 1848, 1451−1461.
(73) Kacprzyk, L.; Rydengard, V.; Morgelin, M.; Davoudi, M.; Pasupuleti, M.; Malmsten, M.; Schmidtchen, A. Antimicrobial Activity of Histidine-Rich Peptides Is Dependent on Acidic Conditions. Biochim. Biophys. Acta, Biomembr. 2007, 1768, 2667−2680.
(74) Lewis, R. Human Genetics: Concepts and Applications, 6th ed.; McGraw-Hill: Boston, MA, 2005.
(75) Morrison, K. L.; Weiss, G. A. Combinatorial Alanine-Scanning. Curr. Opin. Chem. Biol. 2001, 5, 302−307.
