Article pubs.acs.org/IECR

Predictive in silico Modeling of Ionic Liquids toward Inhibition of the Acetyl Cholinesterase Enzyme of Electrophorus electricus: A Predictive Toxicology Approach

Rudra Narayan Das and Kunal Roy*

Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India

Supporting Information

ABSTRACT: Chemicals are the essential components of the industry for maneuvering the required need of the living ecosystem. Ionic liquids are a group of promising novel chemicals with potential usefulness toward various industrial applications, although they are not entirely devoid of hazardous outcomes. The present study is an attempt to investigate the chemical attributes of a wide variety of 292 ionic liquids toward their inhibitory potential of acetyl cholinesterase enzyme of electric eel through the development of predictive regression and classification-based quantitative mathematical models in the light of the OECD guidelines. Molecular docking studies have additionally corroborated the results. Hydrophilicity, hydrophobicity, branching, and positively charged N-species were observed to be the major chemical contributors to such toxicity. The docking studies chiefly portrayed the π-cationic type interaction of the cationic N+ atom with the Phe288, Phe290, and Trp23 residues of the acyl binding pocket to be responsible for enzyme inhibition.

1. INTRODUCTION Chemicals constitute an essential part of industry, meeting the vast needs of human life by direct as well as indirect means. Development of new chemicals, through both new discoveries and modification of existing ones, is a continuous process pursued by the chemical, pharmaceutical, and related industries. The two major facets that govern the development and release of novel chemicals may be identified as their purpose specificity and their unwanted hazardous effects. The latter refers to the toxic manifestation of industrial as well as household chemicals toward the entire living ecosystem, including mankind. One of the most worrying situations encountered in chemical industrial processes is the use of considerable amounts of volatile organic compounds (VOCs), which are not only toxic to the environment but also pose a potential explosion risk to operators because of their high volatility and flammability.1 Hence, there is an ongoing interest in the chemical industries as well as in academia toward the development of new and sustainable chemicals as suitable alternatives that meet the demands of the desired chemical operations and the needs of consumers while preserving a healthy environmental balance. There is a focused effort toward the deployment of a “green engineering” system involving neoteric and environmentally benevolent chemicals. Ionic liquids (ILs) present themselves as useful alternative chemicals with various promising properties, viz. low vapor pressure, high electrochemical and thermal stability, wide liquidus range, etc., due to their new and tunable chemistry.2,3 However, recent toxicological studies have shown them to be potentially hazardous, thus necessitating proper assessment of their toxicological attributes.4,5 Hence, necessary modifications may be incorporated in ionic liquids to enhance their environmental friendliness so that they can replace prohibited industrial chemicals. The chemistry of compounds provides a wide opportunity to scientists for the design and development of purpose-specific and harmless novel analogues. Rational strategies are always welcome in this regard, since they minimize the number of biological (including toxicological) assessments and thereby aid the development of potent analogues with fewer resources. Quantitative structure−toxicity relationship (QSTR) studies present such an opportunity by exploring the encoded chemical information of molecules through the development of predictive mathematical models using selected experimental data.6 The QSTR paradigm complies with the famous “3R principle” of humane experimental techniques7 and is advocated by different recognized international organizations, namely the European Union’s REACH (Registration, Evaluation, and Authorization of Chemicals) regulations,8 the European Commission’s European Centre for the Validation of Alternative Methods (ECVAM),9 the Council for International Organizations of Medical Sciences,10 the Office of Toxic Substances of the US Environmental Protection Agency,11 the Agency for Toxic Substances and Disease Registry,12 the Organization for Economic Cooperation and Development (OECD),13 etc. Following the development of predictive models, QSTR analysis attempts to investigate the mechanistic reason behind the toxic manifestation of the modeled chemicals to guide the design and development of desired novel analogues.

Received: October 28, 2013. Revised: December 22, 2013. Accepted: December 22, 2013. Published: December 23, 2013. © 2013 American Chemical Society. dx.doi.org/10.1021/ie403636q | Ind. Eng. Chem. Res. 2014, 53, 1020−1032

Industrial & Engineering Chemistry Research


proposed OECD guidelines for QSAR model development,13 thus following strict rules for a definite data set selection, an easily explainable model building strategy followed by defining a suitable chemical domain. The models have been subjected to scrupulous validation strategies corresponding to goodness-of-fit, robustness, and predictivity, and finally possible explanations have been provided in order to explore the mechanism of action of the analyzed chemicals.

2.1. Data Set and Descriptors. A large set of 292 diverse ionic liquids with both quantitative and qualitative toxicity data toward the inhibition of the Electrophorus electricus AChE enzyme, as determined by Ellman’s method, has been adopted from the literature23 and employed for the development of predictive QSTR models and docking analysis. The enzyme inhibitory concentration values were converted to the molar negative logarithmic scale (pEC50, mol/L) and were observed to span a range of 2.81 log units. The observed (experimental) values (pEC50, mol/L) for the whole data set can be found in Supporting Information Table S1. In order to maintain a precise and easily reproducible formalism, we have used simple two-dimensional predictor variables for developing correlational models. The ETA indices, of both the first and second generations,17,18 have been used in combination with various other topological non-ETA variables and atom-type fragment parameters to decode the chemical information of ionic liquids responsible for their AChE inhibitory potency. All the descriptors were computed for both the cationic and anionic species. The basic formalism of the ETA indices and a categorical list of all the employed descriptors have been provided in Supporting Information Tables S2 and S3, respectively.

2.2. Model Development. 2.2.1. Thinning of Descriptor Matrix, Data Set Division, and Threshold Selection. The non-ETA and atom-centered fragment descriptor pool was pruned to a thinned matrix containing only relevant variables by removing those with a variance of less than 0.0001, while in the case of the ETA descriptors, some of the basic level indices that are needed only to compute the final descriptors were omitted from the final descriptor pool. As a result, we obtained a total pool of 216 pruned predictor variables. The k-means cluster approach24 was employed for the division of the whole data set (n = 292), taking approximately 62% of the compounds into the training set (ntraining = 182) and the remaining 38% into the test set (ntest = 110) in such a manner that each of the test set compounds lies in the neighborhood of the training set molecules (Figure 1). The list of molecules belonging to separate clusters can be found in Supporting Information Table S4. The whole set division was used for the classification analysis, while the chemicals showing only qualitative values were excluded from both sets during the regression analysis. In order to perform a binary (two-class) classification analysis, we have considered Rivastigmine, a potent cholinesterase inhibitor, as the reference threshold toxic chemical, possessing an E. electricus AChE inhibitory value (pEC50, mol/L) of 3.3,25 such that ionic liquids with pEC50 values equal to or higher than 3.3 are placed in the toxic group and those with values below 3.3 in the nontoxic group. For the development of the classification models, the descriptors were further reduced using the molecular spectrum approach26 and the thinned variable pool was subjected to a standardization operation that is described in the Supporting Information section.

2.2.2. Performed Statistical Analyses and Employed Chemometric Tools. We have provided a schematic

The present study aims to develop predictive QSTR models correlating the chemical features of ionic liquids with the inhibitory potential of the acetyl cholinesterase (AChE) enzyme of electric eel and also explores binding interactions at the receptor level by performing docking studies. AChE is very important in controlling the nervous functions of a living system by carrying out catalytic degradation of acetylcholine, inhibition of which leads to neuronal damage involving severe neuromuscular disorder (myasthenia) along with a number of biomedical problems.14 Hence, design of ILs with reduced propensity toward inhibition of AChE can improve their “greenness” in the maintenance of a sustainable chemical system. There is a paucity of predictive modeling studies of ionic liquids on this end point, i.e., AChE inhibition. To the best of our knowledge, only three groups of authors, namely Yan et al.,5 Arning et al.,15 and Torrecilla et al.,16 reported linear as well as nonlinear regression-based quantitative models on ILs toward AChE inhibition data. We have attempted to build predictive QSTR models on ILs following explicit regression and classification algorithms involving easily interpretable descriptors. During screening of a huge number of ionic liquids, the proposed classification model may act as a preliminary filter to classify them into the toxic and nontoxic agents, while the regression based equation can then be used to predict the exact AChE inhibitory values of the short listed chemicals only. Since, this study attempts not only to develop just mathematical equations but also to provide a mechanistic basis of the results, we have employed different chemometric algorithms to capture the chemical attributes of ionic liquids from different aspects, namely discriminant behavior and regression based correlation. 
The extended topochemical atom (ETA) indices, developed by the present authors’ group,17,18 have been used in combination with various other topological non-ETA and atom-type fragment descriptors as predictor variables, and the quantitative models obtained have been subjected to meticulous statistical tests involving multiple validation strategies. The OECD guidelines for QSAR model development13 have been scrupulously followed during the development, validation, and interpretation of the QSTR models. The ETA indices represent a group of two-dimensional descriptors that are simple to define, calculate, and interpret. They capture not only the topological features but also the electronic parameters of chemicals responsible for their response. The ETA descriptors have been extensively used in toxicological modeling of a wide variety of chemicals belonging to the food and chemical industries, including the pharmaceutical sector.19 Recently, we have shown their potential role in correlating the ecotoxicity of ionic liquids toward different indicator organisms, representing bacteria (Vibrio fischeri),20 water flea (Daphnia magna),21 and green algae (Scenedesmus vacuolatus).22 Hence, we have attempted here to explore the ability of ETA indices to unfold the chemistry behind the eel AChE enzyme inhibitory potential of ionic liquids, and the results were observed to be interesting from the perspective of both the discriminant and the regression analyses. We have additionally performed, for the first time, molecular docking studies to explore the features responsible for binding of the ILs into the AChE enzyme cavity, and the results support the findings of the QSTR analyses.

2. MATERIALS AND METHOD The classification-based as well as the regression-based predictive models were developed in consonance with the


Apart from the predictive QSAR models, we have additionally performed a docking analysis32 of the ionic liquids. In order to maintain simplicity, and considering their more dominant contribution,33 only the cations were considered for the docking study. Owing to the nonavailability of a suitable X-ray crystallographic structure of the Electrophorus electricus AChE receptor, we have employed the AChE enzyme structure (resolution 2.2 Å) of the organism Torpedo californica (the Pacific electric ray),34 which possesses a good degree of genetic homology with E. electricus. The X-ray crystal structure of the O-ethylmethylphosphonylated T. californica AChE enzyme 1VXR (EC 3.1.1.7, resolution 2.2 Å) (http://www.rcsb.org/pdb/explore/explore.do?structureId=1vxr) was used as the target receptor. The employed enzyme 1VXR contained the cocrystallized nerve gas agent O-ethylmethylphosphonic acid ester (VX) as the ligand bound to its oxyanionic binding site. We have attempted to explore the binding behavior of various ionic liquid cations at this site with the allowance of the native binding sites as well. The specifications of the employed chemometric operations and docking study can be found in the Supporting Information section.

2.2.3. Computed Statistical Validation Measures. In accordance with OECD principle 3,13 we have determined the domain of applicability of the QSTR models by applying two different algorithms, viz. leverage analysis35,36 and diversity validation analysis based on the Euclidean distance measure.37 For evaluating the robustness and reliability of prediction, we have performed extensive validation of the developed models using multiple validation strategies in light of OECD principle 4 (OECD, 2007).13 The validation metrics characterize the fitness, robustness, and internal as well as external predictivity of the models. Apart from the classical parameters, we have

Figure 1. Plot of the first three principal components showing the occurrence of the test set compounds in close vicinity of the training set chemicals.

representation of the methodology of this work in Figure 2. Linear discriminant analysis (LDA),27 multiple linear regression (MLR),28 and partial least-squares (PLS)29 were the statistical techniques used for developing predictive equations, while the stepwise selection technique28 and the genetic function approximation (GFA)30,31 were used as the tools for the selection of predictor variables.

Figure 2. Schematic overview of the employed methodology in consonance with the OECD guidelines.
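The descriptor thinning and threshold selection of section 2.2.1 can be illustrated with a short sketch. It assumes a plain NumPy descriptor matrix; the function names, the toy data, and the use of the population variance are illustrative assumptions, not the authors' implementation. Only the variance cutoff (0.0001) and the pEC50 threshold (3.3) come from the text.

```python
import numpy as np

def prune_low_variance(X, names, min_variance=1e-4):
    """Drop descriptor columns whose variance is below min_variance.

    The cutoff of 0.0001 follows the text; the population variance
    (ddof=0) is an assumption.
    """
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) >= min_variance
    kept_names = [name for name, flag in zip(names, keep) if flag]
    return X[:, keep], kept_names

def label_toxic(pec50, threshold=3.3):
    """Return 1 (toxic) when pEC50 >= threshold, otherwise 0 (nontoxic)."""
    return (np.asarray(pec50, dtype=float) >= threshold).astype(int)

# Hypothetical toy data: three compounds, three made-up descriptors.
X = [[1.0, 0.5, 2.0],
     [1.0, 0.7, 3.0],
     [1.0, 0.9, 4.0]]
X_pruned, kept = prune_low_variance(X, ["const", "d1", "d2"])
labels = label_toxic([2.9, 3.3, 4.1])
```

In this toy case the constant column is removed, and a compound sitting exactly at the Rivastigmine threshold is assigned to the toxic group, matching the "equal to or higher than 3.3" rule.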


additionally reported the scaled version of the rm2 metrics for the regression-based models (developed by our group)38 to judge the quality of internal leave-one-out (LOO) cross-validation (rm2(LOO)), external predictivity (rm2(test)), as well as the overall predictive capability of the models considering both training and test sets (rm2(overall)). The internal stability of the regression models was further tested by employing a Y-randomization operation and using the metric cRp2 (developed by our group). A detailed list of the determined metrics with their threshold limit values has been provided in Supporting Information Table S5.39−49 Validation of the docking study has been performed by extracting the cocrystallized ligand from the active site, redocking it to the same enzyme system, and finally comparing the dock-pose (i.e., conformational similarity) of the docked ligand with the extracted ligand as the reference. The root-mean-square deviation (rmsd),32 a metric characterizing the proximity of conformational alignment between the reference and the docked ligands, has been reported.

2.2.4. Software Tools. MarvinSketch version 5.11.5 software (ChemAxon Ltd., http://www.chemaxon.com) has been used for drawing the chemical structures of the cations and anions. The computation of various ETA indices has been performed by using PaDEL-Descriptor version 2.11 (a product of NUS, http://padel.nus.edu.sg/software/padeldescriptor/), while Cerius2 version 4.10 software (Accelrys Inc., http://www.accelrys.com/) has been employed for calculating various other non-ETA and atom-type fragment parameters. The LDA operation was carried out using the Discriminant Analysis module present in STATISTICA version 7.1.515.0 (STATSOFT Inc., http://www.statsoft.com/) software, and the pertinent receiver operating characteristic (ROC) analysis has been executed using the Graph module of the SPSS version 9.0.0 software (SPSS Inc., http://www.spss.co.in/). The regression module of the MINITAB version 14.13 software (Minitab Inc., http://www.minitab.com/en-US/default.aspx) has been used for stepwise MLR and PLS analysis, while the GFA equations have been derived by employing the QSAR+ module of the Cerius2 version 4.10 software. The Y-randomization of the regression-based models was performed by using the randomization module of the Cerius2 version 4.10 software, whereas SIMCA P version 10.0 software (Umetrics, http://www.umetrics.com/) was employed to do the same for the PLS models. We have used the significance levels of 99% and 90%, respectively, for model and process randomization of the variables. The values of the scaled rm2 metrics were derived using the online web-based application RmSquare Calculator (http://aptsoftware.co.in/rmsquare/). We have additionally used EUCLIDEAN, a Java-based software tool developed by our group (http://dtclab.webs.com/software-tools), to carry out the diversity validation analysis. Molecular docking analysis was performed using the LigandFit module of the Discovery Studio version 2.5 software (Accelrys Inc., http://accelrys.com/products/discovery-studio/).

Table 1. Statistical Parameters Derived from the Mann−Whitney U Test and Wald−Wolfowitz Runs Test Analyses (Training Set, ntraining = 182, ntoxic = 128, nnontoxic = 54)
Response: pEC50 (mol/L)

Nonparametric test method | Sum of ranks (toxic) | Sum of ranks (nontoxic) | U statistic | Z-value (p-level) | Z-adjusted (p-level)
Mann−Whitney U test | 15168 | 1485 | 0.000 | 10.645 (1.863 × 10^−26) | 10.656 (1.647 × 10^−26)

Nonparametric test method | No. of runs | No. of ties | Z-value (p-level) | Z-adjusted (p-level)
Wald−Wolfowitz runs test | 2 | 0 | −13.365 (0.000) | 13.276 (0.000)
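The Mann−Whitney entries of Table 1 can be cross-checked from the rank sums alone. The sketch below uses the textbook relation U = R − n(n + 1)/2 and the normal approximation without tie or continuity correction; that this is how the reported Z-value was obtained is an assumption, and the function name is hypothetical.

```python
import math

def mann_whitney_from_rank_sums(rank_sum_a, n_a, rank_sum_b, n_b):
    """Return (U, Z) from the two rank sums and group sizes.

    Z uses the plain normal approximation (no tie or continuity
    correction), which is an assumption about the reported value.
    """
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = rank_sum_b - n_b * (n_b + 1) / 2
    u = min(u_a, u_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u, (u - mean_u) / sd_u

# Rank sums and group sizes as given in Table 1 (toxic: 128, nontoxic: 54).
u, z = mann_whitney_from_rank_sums(15168, 128, 1485, 54)
```

With these inputs U is exactly 0, meaning every nontoxic compound ranks below every toxic one, and the magnitude of Z agrees with the tabulated 10.645 to three decimals (the sign depends on which group is taken as the reference).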

3. RESULTS AND DISCUSSION The categorical division achieved with respect to Rivastigmine was validated by performing a Mann−Whitney U test50 and a Wald−Wolfowitz runs test51 on the training set data (classification division), and it showed appreciable diversity between the groups at the significance level of 95% (Table 1). Three categories of descriptor combinations, namely (i) ETA and atom-type fragment; (ii) non-ETA and atom-type fragment; and (iii) ETA, non-ETA, and atom-type fragment, were separately used for the development of predictive classification as well as regression models. Supporting Information Tables S6 and S7 list all the models developed using LDA and regression analyses, respectively. However, only the best model from each analysis, selected on the basis of superior validation metrics and mechanistic interpretability, is discussed below (eqs 1 and 2).

3.1. Classification-Based Discriminant Model. Among the developed discriminant models employing ETA indices and various other topological non-ETA and atom-centered fragment parameters, model C1 (Supporting Information Table S6) was found to be the most promising in terms of statistical measures as well as mechanistic interpretability and, hence, is discussed below. Equation 1 shows a nine-descriptor discriminant equation, with the parameters arranged in descending order of their standardized coefficient values and thereby their relative priority.

$$
\begin{aligned}
\mathrm{DF} = {} & -9.277 + 17.203(\pm 1.773) \times \mathrm{Atype\_N\_79(cation)} \\
& - 16.219(\pm 3.207) \times \mathrm{Atype\_H\_51(cation)} + 11.181(\pm 2.464) \times \left(\frac{[\sum\alpha]_Y}{\sum\alpha}\right)_{\mathrm{cation}} \\
& - 9.093(\pm 2.831) \times \mathrm{Atype\_H\_50(cation)} + 5.898(\pm 2.209) \times \Delta\psi_{B}\mathrm{(anion)} \\
& - 5.497(\pm 0.968) \times \mathrm{Atype\_O\_59(cation)} - 3.854(\pm 1.208) \times \mathrm{Atype\_O\_56(cation)} \\
& - 3.504(\pm 1.404) \times \mathrm{Atype\_N\_74(cation)} + 1.715(\pm 0.559) \times \left(\frac{[\sum\alpha]_Y}{\sum\alpha}\right)_{\mathrm{anion}}
\end{aligned}
\tag{1}
$$

Ntraining = 182, Ntest = 110, Wilks’ λ = 0.374, df = 9, F(df) = 31.985 (9, 172), p-level = 0.0000, χ2 = 172.59, Rc = 0.791, dM2 = 7.932, ρ = 20.22, MCC = 0.802 (training) & 0.690 (test), G-means = 0.897 (training) & 0.864 (test)
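As a minimal sketch, eq 1 can be evaluated as a plain linear function of the nine descriptors. The intercept and coefficients below are taken from eq 1; the dictionary keys and the function name are hypothetical labels chosen for illustration. Section 2.2.1 states that the classification descriptors were standardized, so the inputs are assumed to be on that standardized scale; how the score maps to a class is not shown here because the cutoff is not given in this section.

```python
# Intercept and coefficients as printed in eq 1; key names are illustrative labels.
EQ1_INTERCEPT = -9.277
EQ1_COEFFICIENTS = {
    "Atype_N_79(cation)": 17.203,
    "Atype_H_51(cation)": -16.219,
    "sum_alpha_Y/sum_alpha(cation)": 11.181,
    "Atype_H_50(cation)": -9.093,
    "delta_psi_B(anion)": 5.898,
    "Atype_O_59(cation)": -5.497,
    "Atype_O_56(cation)": -3.854,
    "Atype_N_74(cation)": -3.504,
    "sum_alpha_Y/sum_alpha(anion)": 1.715,
}

def discriminant_score(descriptors):
    """Evaluate DF of eq 1 for one compound given a name -> value mapping."""
    missing = set(EQ1_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return EQ1_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in EQ1_COEFFICIENTS.items()
    )

# Hypothetical input: all descriptors zero except the charged-nitrogen term.
example = dict.fromkeys(EQ1_COEFFICIENTS, 0.0)
example["Atype_N_79(cation)"] = 1.0
score = discriminant_score(example)
```

Keeping the coefficients in a single mapping makes the sign of each contribution easy to compare with the descriptor-by-descriptor interpretation given in the text.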


Figure 3. Receiver operating characteristic (ROC) analysis diagram for the (a) training set and (b) test set chemicals.

Figure 4. Pharmacological distribution diagram analysis plot for the (a) training set and (b) test set chemicals.
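The PDD in Figure 4 rests on a three-way assignment by posterior probability, with cutoffs of 45% and 55%. A minimal sketch of that rule follows; the function name is hypothetical, and treating the two boundary values themselves as "undetermined" is an assumption based on the stated 45−55% range.

```python
def pdd_class(p_toxic, lower=0.45, upper=0.55):
    """Assign a compound to a PDD group from its posterior probability of toxicity."""
    if not 0.0 <= p_toxic <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    if p_toxic < lower:
        return "nontoxic"
    if p_toxic > upper:
        return "toxic"
    return "undetermined"  # 45-55% band, boundaries included by assumption

# Hypothetical posterior probabilities for four compounds.
groups = [pdd_class(p) for p in (0.10, 0.45, 0.50, 0.90)]
```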

plots of the training as well as the test sets with their encouraging area under the curve values. We have additionally reported two more comprehensive metrics on ROC analysis, namely ROCED and ROCFIT, based on the Euclidean distance measure, which also reflected good model stability with appreciable predictive potential. A pharmacological distribution diagram (PDD) analysis was performed by plotting the expectancy of the compounds, that is, the tendency to belong to the toxic or nontoxic group, against their posterior probability values. We have used a probability cutoff value of 45% to render the compounds nontoxic below it, while the chemicals possessing more than a 55% probability were considered to be toxic. The agents falling within the range of 45−55% were classified as undetermined chemicals. The PDD in Figure 4 shows a minimum overlap of the classified toxic and nontoxic ionic liquids and thus corresponds to appreciable model quality. The rationale used for the ROC and PDD analyses, along with the validation parameters for the PDD analysis (Supporting Information Table S5), has been shown in the Supporting Information. The domain of applicability of eq 1 was determined using the diversity validation approach. A plot of the normalized mean Euclidean distance values (Figure 5) showed the presence of all the compounds within the defined chemical domain, thus obeying OECD principle 3. Equation 1 contains six atom-type parameters and three ETA indices defining the essential structural attributes responsible

Cohen’s κ = 0.801 (training) & 0.682 (test), AUC-ROC = 0.959 (training) & 0.914 (test), ROCED = 0.432, ROCFIT = 0.395
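Several of the classification metrics quoted for eq 1 (MCC, G-means, Cohen's κ) follow directly from a 2 × 2 confusion matrix. The sketch below uses the standard textbook definitions; taking G-means as the geometric mean of sensitivity and specificity is an assumption, since the paper defers the definitions to its Supporting Information. The counts are hypothetical and are not the paper's confusion matrix.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Return MCC, G-means, and Cohen's kappa from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    g_means = math.sqrt(sensitivity * specificity)
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and actual classes.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {"MCC": mcc, "G-means": g_means, "kappa": kappa}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

All three measures are less sensitive than plain accuracy to the class imbalance present here (128 toxic versus 54 nontoxic training compounds), which is presumably why they are reported alongside it.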

The values within parentheses in eq 1 represent the standard errors of the corresponding coefficients. For a training set of 182 compounds, the nine descriptors give a satisfactory ρ value of 20.22 (>5). The classification model (eq 1) shows encouraging fitness measures, namely a low value of the Wilks’ λ statistic39 (0.374) and high values of the Fisher validation criterion37 (31.985) and the Chi-squared statistic40 (172.59) at a highly significant probability level. The canonical correlation analysis also supports the model, as indicated by the value of the correlation coefficient (Rc = 0.791). Additionally, the high value of the squared Mahalanobis distance measure (dM2 = 7.932) supports the good quality of the model. The internal stability as well as the external predictivity of the model is further highlighted by the appreciable values of various validation measures (Supporting Information Table S6), namely sensitivity, specificity, accuracy, precision, F-measure, Matthews correlation coefficient (MCC), G-means, and Cohen’s κ, along with different area under the curve parameters for the ROC analysis. Figure 3 shows the ROC


a negative coefficient describing the inverse impact of H-atoms attached to a heteroatom toward the toxic manifestation of ionic liquids. In other words, this parameter indicates the occurrence of H-bond donor atoms and signifies that toxicity is reduced due to the reduced lipophilicity of the molecule. The next parameter in the order of priority is a second generation ETA index for anions, ΔψB(anion), with a positive contribution. The descriptor is a spline term (negative values taken as zero)18 and indicates the contribution of heteroatoms possessing ψ values greater than that of the carbon atom (0.714). In the present study, it was observed that anions possessing a lipophilic heteroatom such as iodide and bromide hold nonzero values for this parameter and thereby indicate the positive effect of the hydrophobicity of anions on the enzyme inhibitory activity of ionic liquids. The next three variables are atom-type parameters with negative contributions defining cationic features: the first one, Atype_O_59(cation), shows the effect of an ether type structural fragment (Al−O−Al, where Al is an aliphatic group) on the modeled end point; the second one, Atype_O_56(cation), describes the presence of alcohols; and the third one, Atype_N_74(cation), indicates the occurrence of a multiple bonded N-atom, i.e., R≡N or R=N, considering R as any group linked through carbon. Hence, it might be noted that these parameters emphasize the hydrophilic nature of the molecules through the occurrence of H-bond donor and acceptor atoms present in ethers, alcohols, and nitrile or imine type substituents or structural fragments in cations. Hydrophilicity hinders a molecule from passing through the biological membrane and thereby reduces toxicity due to the lower availability of the compound at the receptor site.
With the aim of exploring the relative discriminatory influence of the variables, we have additionally developed a contribution diagram by plotting the product of the average values of the descriptors belonging to the toxic and nontoxic groups with their coefficient values (eq 1). Supporting Information Figure S1 shows the parameters Atype_N_79(cation), [(∑α)Y/∑α](cation), and Atype_H_51(cation) to be more discriminating than the others, hence emphasizing the importance of the positively charged tertiary N-atom and of H-bond acceptor groups such as cyano, carbonyl, etc.

3.2. Partial Least-Squares-Based Regression Model. Among all the developed regression models, model R12 (Supporting Information Table S7) was observed to be the best one in view of the validation parameters and the interpretability of the descriptors. A 14-descriptor linear model was initially obtained after performing stepwise MLR selection, which was then optimized by PLS analysis using the “5% Q2” rule,52 i.e., retaining only those variables (defined by latent variables) which caused an increase in Q2 by 5%. Equation 2 is a parabolic model consisting of eleven predictor variables defined by ten latent variables (LVs) for a training set of 182 compounds and gives a satisfactory ratio of training set compounds to the number of LVs (14.80). The model shows an appreciable predicted variance of 80.80% with an explained variance of 82.6%. The value of the coefficient of determination also proves to be encouraging (0.838). The root-mean-square error measures for both sets were also very small (RMSEc = 0.243 and RMSEp = 0.248), and their small difference (ΔRMSE = |RMSEc − RMSEp| = 0.005) additionally showed good model quality. Moreover, the model is also promising in internal robustness and external predictivity, as shown by the different rm2 validation parameters developed by the present authors’ group. We have additionally determined the Golbraikh and Tropsha’s criteria49 for

Figure 5. Euclidean distance based diversity validation analysis plot depicting the chemical domain of eq 1.
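The kind of quantity plotted in Figure 5 can be sketched as follows. The exact procedure of the authors' EUCLIDEAN tool is not described in this section, so the choices below are assumptions: each compound's mean Euclidean distance to all training compounds is computed in descriptor space, then scaled with the training set minimum and maximum so that training values span 0 to 1. Function names and data are hypothetical.

```python
import numpy as np

def mean_distances(X, X_ref):
    """Mean Euclidean distance from each row of X to all rows of X_ref."""
    X = np.asarray(X, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)
    diffs = X[:, None, :] - X_ref[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).mean(axis=1)

def normalized_mean_distance(X_train, X_test):
    """Scale mean distances so that training compounds span [0, 1] (assumed scheme)."""
    d_train = mean_distances(X_train, X_train)
    d_test = mean_distances(X_test, X_train)
    low, high = d_train.min(), d_train.max()
    if high == low:
        raise ValueError("training distances are all identical")
    return (d_train - low) / (high - low), (d_test - low) / (high - low)

# Hypothetical two-descriptor data: the second test compound is far from the training set.
X_train = [[0, 0], [1, 0], [0, 1], [1, 1], [3, 3]]
X_test = [[0.5, 0.5], [10, 10]]
train_nd, test_nd = normalized_mean_distance(X_train, X_test)
inside_domain = (test_nd >= 0) & (test_nd <= 1)
```

Under this scheme a test compound whose normalized value falls outside the 0 to 1 range would be flagged as lying outside the chemical domain of the model.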

for the AChE enzyme inhibitory action of the analyzed ionic liquids. The first and hence the most significant descriptor presented in eq 1 is Atype_N_79(cation), with a positive coefficient, which describes the direct relationship of the positively charged N-atoms of cations with toxicity.

The positively charged N-atom-containing region in ionic liquid cations is likely to interact with the electron rich nucleophilic domain of the AChE enzyme of eel, thereby causing inhibition of the enzyme and leading to toxicity. The next variable is another atom-type parameter, Atype_H_51(cation), with a negative contribution, defining the structural fragment containing an H-atom attached to an α-carbon, where α-carbon corresponds to a C-atom attached through a single bond to −C=X, −C≡X, or −C---X (X is any heteroatom, such as O, N, S, P, Se, halogens, etc.). An inspection of the cationic structures revealed these groups to be the cyano (−C≡N) and carbonyl (−C=O) groups present as cationic side chain substituents. Hence, the impact may be considered to be due to the H-bond accepting effect of the O or N atoms, which tends to enhance the aqueous solubility of the molecules and thereby reduce penetration, leading to a reduced toxicity profile. Next we have a first generation ETA shape parameter, [(∑α)Y/∑α](cation), with a positive coefficient depicting the impact of branching in the cationic structure on toxicity. This parameter specifically refers to branching where one central atom is attached to three more non-hydrogen atoms, forming a Y-shaped structural fragment, as found in tertiary groups.

This parameter mainly reflects the substituents at the N-atom of the cationic head groups in imidazolium and pyridinium systems, which give a tertiary substituent structure, and thereby explains why such substituents produce toxicity. The same parameter for the anion is also present in eq 1 as the ninth descriptor and possibly indicates the presence of a similar branching pattern in anions, as in bis[oxalato(2−)]borate, hydroxyacetate, etc. The next variable is an atom-type parameter, Atype_H_50(cation), with


a positively charged N-atom. The toxicity of cationic structures containing attributes of S_ssssN was found to be lower than that of the corresponding pyridinium or imidazolium systems. The next is another E-state variable, S_dsssP(cation), which carries a positive coefficient. However, it was observed that the presence of phosphonium cations produced a negative value for the parameter S_dsssP, and thus, the presence of phosphonium cations in ionic liquids is found to reduce toxicity. The next predictor in eq 2 is a non-ETA valence connectivity index, 2χv(cation), possessing a parabolic relationship with the AChE enzyme inhibitory activity of ionic liquids. The linear term of 2χv(cation) has a positive coefficient, signifying the positive impact of the branching and molecular size of cations on the modeled toxicity. Because of the valence contribution, the parameter additionally accounts for the effect of unsaturation and heteroatomic contribution. The tenth parameter is the squared term of the same variable, {(2χv(cation))2}, and bears a negative impact on pEC50 values, thus revealing a parabolic relationship; that is, toxicity will increase at first with the increase in the value of 2χv(cation) and will decrease after reaching a certain value. Differentiation of eq 2 with respect to 2χv(cation) showed the optimal value of 2χv(cation) to be 21.667. Now, 2χv increases with the increase in chain length, branching, and the presence of lipophilic heteroatoms (Cl, Br, I, etc.), while it decreases with the occurrence of multiple bonds (π bonds) and the presence of lower quantum number heteroatoms such as O, N, etc. The connectivity parameter 2χv also shows an increase in its value when ring structures are present rather than chains, but that effect is not considered here for ease of explanation. The toxicity of ionic liquids is thus observed to become enhanced with the presence of lipophilic heteroatoms (Cl, Br, I, etc.), while it follows the reverse order when unsaturation and H-bonding heteroatoms (O, N, etc.) are present. The seventh descriptor is an ETA index, ΔεD(cation), which has a negative impact on toxicity. This index gives a measure of the contribution of hydrogen-bond donor atoms in cationic side chains in terms of the presence of −OH, −NH2, etc. groups and thereby depicts a reduction in toxicity due to the enhanced hydrophilicity of the molecules. Next we have two atom-type fragment parameters with negative coefficients. The first one, Atype_H_53(cation), refers to the number of H-atoms attached to a C0sp3 carbon having 2 heteroatoms attached to its next carbon. Here, C0sp3 refers to an sp3 hybridized carbon atom having no heteroatomic substitution on it. In other words, it was observed that toxicity tends to decrease with the presence of heteroatoms such as O and N (e.g., the 1-(3-methoxypropyl)-1-methylpiperidinium cation) in the vicinity of an sp3 hybridized carbon atom containing H, because of the hydrogen bonding effect exhibited by the heteroatoms. The other parameter, Atype_N_74(cation), defines the negative impact of R≡N and R=N types of structural fragments. An inspection of the data set indicated that the presence of cyano (C≡N) type substituents reduces the modeled toxicity end point due to the occurrence of an H-bond acceptor N-atom in them. The final descriptor in eq 2 is an E-state parameter, S_dssC(cation), depicting the negative impact of the carbon atom fragment containing one double and two single bonds.
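The parabolic dependence on 2χv(cation) discussed above can be made explicit with a short derivation. The symbols a and b are placeholders for the magnitudes of the linear and squared coefficients of eq 2, whose fitted values are not reproduced in this section; only the optimum of 21.667 comes from the text.

$$
\mathrm{pEC}_{50} = \dots + a\,x - b\,x^{2}, \qquad x = {}^{2}\chi^{v}(\mathrm{cation}), \quad a > 0,\; b > 0
$$

$$
\frac{\partial\,\mathrm{pEC}_{50}}{\partial x} = a - 2bx = 0 \;\Longrightarrow\; x_{\mathrm{opt}} = \frac{a}{2b}, \qquad \frac{\partial^{2}\,\mathrm{pEC}_{50}}{\partial x^{2}} = -2b < 0
$$

Because the second derivative is negative, the stationary point is a maximum: predicted toxicity rises with 2χv(cation) up to x_opt and falls beyond it. The reported optimum therefore corresponds to a/(2b) = 21.667.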

ascertaining the external predictivity, and it also showed acceptable results: (i) Q2 = 0.808 (>0.5); (ii) R2test = 0.827 (>0.6); (iii) (r2 − r20)/r2 = 0.004 (