Prediction of ETN Polarity Scale of Ionic Liquids Using a QSPR

Dec 10, 2015 - Citation data is made available by participants in Crossref's Cited-by Linking service. For a more comprehensive list of citations to t...
Article pubs.acs.org/IECR

Prediction of ETN Polarity Scale of Ionic Liquids Using a QSPR Approach

Mohsen Nekoeinia,*,† Saeed Yousefinejad,*,‡ and Azam Abdollahi-Dezaki†

† Department of Chemistry, Payame Noor University, P.O. BOX 19395-3697, Tehran, Iran
‡ School of Health, Shiraz University of Medical Sciences, Shiraz, Iran



S Supporting Information *

ABSTRACT: A multiparameter quantitative model was developed to establish a relationship between structural descriptors of a set of 52 ionic liquids (ILs) and their ETN polarity scale. Theoretical descriptors were extracted by Dragon software, and the ETN model was obtained using the multiple linear regression approach. After molecular modeling, four significant descriptors related to the ETN values of the ionic liquids were identified; the resulting model shows good fit statistics and accurate predictions. The stability and prediction ability of the ETN model were evaluated using various common statistical methods such as cross-validation, external validation, and the Y-randomization test. As another indicator of the model's validity, the leverage and standardized residuals confirmed that almost all 52 ILs lie within the applicability domain of the proposed model.



© 2015 American Chemical Society

INTRODUCTION

"Ionic liquid" is the term widely used to describe a large category of low-melting ionic salts that usually exist in the liquid state below 100 °C.1 The typical ionic liquid (IL) is based on generic cations (such as pyrrolidinium, piperidinium, pyridinium, or imidazolium) paired with a variety of symmetric anions (e.g., Cl−, BF4−, PF6−, (CF3SO2)2N−, etc.).2 ILs have attracted considerable interest in the last two decades because of their unique properties such as low vapor pressure, high thermal stability, and nonflammability.1 Also, these compounds can dissolve various organic and inorganic species and thus have been known as green solvents for liquid−liquid extractions, recycling in homogeneous catalysis, and different kinds of chemical reactions.3

Among the physicochemical properties of solvents, polarity is one of the most important. Information about polarity can be used to explain the interaction between the solute(s) and the solvent.4 Quantities such as dielectric constant, refractive index, and dipole moment are widely used as macroscopic solvent polarity parameters.5 Since these quantities did not give satisfactory results, solvent polarity is most often expressed by microscopic scales, which are able to capture all possible specific and nonspecific interactions between the solute and the solvent.6,7 On the basis of these interactions, several polarity scales such as the Y-scale,8 Z-scale,9 Kamlet and Taft parameters (α, β, π*),9−11 and ET scale12 have been proposed, which are based on the interaction of solvatochromic dyes with solvents. One of the most popular polarity scales is ET(30), which is obtained from UV−visible spectrophotometric measurements of the negatively solvatochromic Reichardt's betaine-30 dye.12 The ET(30) values are defined as the molar transition energies of the betaine dye at standard conditions (25 °C and 1 bar) according to eq 1:

$$E_T(30)\ (\mathrm{kcal\ mol^{-1}}) = \frac{28591}{\lambda_{\max}\ (\mathrm{nm})} \tag{1}$$

where λmax is the wavelength at which the maximum of the long-wavelength intramolecular charge-transfer absorption band of the betaine dye is observed. On the other hand, the dimensionless normalized ETN scale was introduced later by Reichardt,12 using water (ETN = 1.00) and TMS (ETN = 0.00) as polar and nonpolar reference solvents, respectively. The ETN value is calculated according to eq 2:

$$E_T^N = \frac{E_T(\mathrm{solvent}) - E_T(\mathrm{TMS})}{E_T(\mathrm{water}) - E_T(\mathrm{TMS})} = \frac{E_T(\mathrm{solvent}) - 30.7}{32.4} \tag{2}$$
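As a minimal sketch of eqs 1 and 2 (an illustration, not part of the original work), the conversion from a measured absorption maximum to ETN can be written in Python. The function names are assumptions introduced here; the constants 28591, 30.7, and 32.4 are those appearing in the two equations.

```python
def et30_from_lambda_max(lambda_max_nm: float) -> float:
    """E_T(30) in kcal/mol from the absorption maximum in nm (eq 1)."""
    if lambda_max_nm <= 0:
        raise ValueError("lambda_max_nm must be positive")
    return 28591.0 / lambda_max_nm


def etn_from_et30(et30_kcal_mol: float) -> float:
    """Normalized ETN (eq 2): E_T(TMS) = 30.7 maps to 0, E_T(water) = 63.1 maps to 1."""
    return (et30_kcal_mol - 30.7) / 32.4


def etn_from_lambda_max(lambda_max_nm: float) -> float:
    """Chain eq 1 and eq 2: wavelength (nm) -> E_T(30) -> ETN."""
    return etn_from_et30(et30_from_lambda_max(lambda_max_nm))
```

For instance, an input of 453 nm gives ET(30) of about 63.1 kcal mol−1 and therefore ETN of about 1.00, which is the water reference point of the scale.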

Received: August 13, 2015; Revised: October 22, 2015; Accepted: November 19, 2015; Published: December 10, 2015
DOI: 10.1021/acs.iecr.5b02982, Ind. Eng. Chem. Res. 2015, 54, 12682−12689

When a new ionic liquid with a certain polarity is to be designed, the first step is to estimate this parameter. Without a predictive approach, this requires synthesizing the desired IL, evaluating its polarity experimentally, and repeating the procedure in a trial-and-error process. Quantitative structure−property relationship (QSPR) studies relate descriptors of the molecular structure to the properties of chemical molecules13−15 or macromolecules.16,17 The QSPR approach has also been used successfully to predict different chemical properties and the toxicity of ILs18 and was found to be appropriate for estimating the polarity of conventional solvents and ILs as well.19−21 In this approach, the descriptors can be computed from the structure alone and do not rely on any experimental properties, which can reduce trial-and-error synthesis of ILs and time-consuming experimental measurements. This can be considered a clear advantage of QSPR over other predictive methods. It is noteworthy that molecular modeling and QSPR methods have been used previously to predict different properties of ILs such as melting point,22,23 surface tension,24

conductivity,25 viscosity,25−27 solubility in water,28 parachor value,29 density,30 and refractive indices.31 Various studies on the determination of polarity indices of ILs and of solvent−IL mixtures can be found in the literature.5,32−36 For example, Brennecke et al. systematically investigated the effects of anions, cations, and cosolvents on the solvent strength of IL−organic mixtures after measuring their ETN scale using different spectroscopic probes.32 Despite the various works on the determination of polarity scales in IL systems, reports on the prediction of polarity in ILs using QSPR are rare. Palomar et al. used QSPR to predict the empirical solvent polarity of some ILs and their mixtures with organic solvents;21 however, the complexity of the model (because of the high number of parameters in their QSPR) can be noted as a limitation. The current work focuses on proposing a new predictive model based on the QSPR approach to estimate the ETN (as an applicable and well-known polarity scale) of a set of ILs with similar anions and different cations.

2. MATERIALS AND METHODS

2.1. Data Set and Descriptors. The ETN values of 52 ILs with similar anions ((CF3SO2)2N−) and different cations were taken from the literature.37 The structures and names of the ILs used in the current study are shown in Table S1 (Supporting Information), and the experimental ETN values of these ILs are presented in Table 1. Because of differences in experimental conditions, the reported ETN values of some ILs vary between sources. Thus, the average of the ETN values from the different reports was used in the current study. The structural descriptors of the ILs were calculated by Dragon software38 (http://michem.disat.unimib.it/chm/; Milano Chemometrics and QSAR research group). To do so, the structure files of the cationic parts of the ILs were drawn using Hyperchem software (Version 7, Hypercube Inc., http://www.hyper.com) and were optimized by the MM+ molecular mechanics force field. The obtained structures were used as the input of Dragon. After extraction of the theoretical descriptors from the chemical structures, some data treatments were done to prepare a descriptor set for the variable selection step: (i) descriptors that were not available for every IL were removed; (ii) descriptors with a constant or near-constant value for all ILs were removed; (iii) to decrease redundancy in the X-variables matrix, the correlations among the generated descriptors and with the ETN vector of the ILs were evaluated, and among the detected collinear descriptors (i.e., R2 > 0.95) the one with the highest correlation with ETN was retained and the others were removed. After these data treatments, 426 descriptors remained for each IL.

Table 1. Experimental (Observed) and Predicted ETN Values of ILs in the Data Set of This Study

no.      ETN (obs)  ETN (pred)  residual    no.      ETN (obs)  ETN (pred)  residual
IL#1     0.635      0.628        0.007      IL#27    0.552      0.598       −0.046
IL#2a    0.772      0.837       −0.065      IL#28    0.562      0.627       −0.065
IL#3a    0.873      0.795        0.078      IL#29    0.540      0.564       −0.024
IL#4     0.469      0.530       −0.061      IL#30    0.602      0.611       −0.009
IL#5     0.867      0.805        0.062      IL#31    0.991      0.947        0.044
IL#6a    0.685      0.704       −0.019      IL#32    0.577      0.540        0.037
IL#7a    0.596      0.621       −0.025      IL#33    0.603      0.632       −0.029
IL#8     0.657      0.688       −0.031      IL#34a   0.592      0.572        0.020
IL#9a    0.650      0.631        0.019      IL#35    0.604      0.587        0.016
IL#10    0.546      0.588       −0.042      IL#36a   0.552      0.533        0.019
IL#11    0.647      0.643        0.003      IL#37    0.571      0.605       −0.034
IL#12a   0.574      0.563        0.011      IL#38    0.577      0.616       −0.039
IL#13    0.629      0.627        0.002      IL#39    0.596      0.626       −0.030
IL#14    0.525      0.566       −0.041      IL#40    0.608      0.614       −0.006
IL#15    0.660      0.613        0.047      IL#41    0.670      0.619        0.051
IL#16    0.673      0.596        0.077      IL#42    0.591      0.613       −0.022
IL#17    0.728      0.761       −0.033      IL#43    0.565      0.620       −0.055
IL#18    0.939      0.863        0.076      IL#44    0.657      0.601        0.056
IL#19    0.806      0.839       −0.033      IL#45    0.732      0.569        0.163
IL#20    0.985      1.011       −0.026      IL#46a   1.031      1.024        0.007
IL#21    0.941      0.970       −0.029      IL#47    0.648      0.615        0.033
IL#22    0.599      0.662       −0.063      IL#48    0.546      0.541        0.005
IL#23    0.565      0.664       −0.099      IL#49    0.775      0.598        0.177
IL#24    0.562      0.656       −0.094      IL#50    0.596      0.497        0.099
IL#25    0.994      0.936        0.058      IL#51    0.380      0.391       −0.011
IL#26    0.565      0.603       −0.038      IL#52    0.515      0.619       −0.104

a ILs used as the test set of the ETN model.

2.2. Data Processing and Modeling. Multiple linear regression (MLR) analysis with stepwise selection of variables was used to select the best subset of descriptors and to correlate the selected descriptors with the ETN of the ILs. All calculations were done on a laptop computer under the Windows XP operating system. SPSS software (version 19.0, SPSS Inc., http://www.spss.com) was used for the MLR analysis, and the other statistical calculations were done in the MATLAB environment (version 7.6, MathWorks, Inc., http://www.mathworks.com).
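The three descriptor pre-treatment steps of Section 2.1 can be illustrated with a short NumPy sketch. This is not the authors' procedure as implemented (the study used Dragon, SPSS, and MATLAB); the function name, the near-constant tolerance `std_tol`, and the greedy visiting order in step (iii) are assumptions made here for illustration. Only the R2 > 0.95 collinearity cutoff comes from the text.

```python
import numpy as np


def pretreat_descriptors(X, y, r2_cut=0.95, std_tol=1e-6):
    """Return sorted column indices of descriptors that survive steps (i)-(iii)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # (i) keep only descriptors available (finite) for every IL
    cols = [j for j in range(X.shape[1]) if np.all(np.isfinite(X[:, j]))]

    # (ii) drop constant or near-constant descriptors (std_tol is a hypothetical cutoff)
    cols = [j for j in cols if X[:, j].std() > std_tol]

    # squared correlation of each remaining descriptor with the response
    r2_y = {j: np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in cols}

    # (iii) visit descriptors from most to least correlated with y and keep one
    # only if it is not collinear (R2 > r2_cut) with a descriptor already kept
    kept = []
    for j in sorted(cols, key=lambda c: r2_y[c], reverse=True):
        if all(np.corrcoef(X[:, j], X[:, k])[0, 1] ** 2 <= r2_cut for k in kept):
            kept.append(j)
    return sorted(kept)
```

Visiting descriptors in order of decreasing correlation with ETN guarantees that, within each group of collinear descriptors, the retained one is the most strongly correlated with the response, which matches the rule stated in step (iii).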

3. RESULTS AND DISCUSSION

3.1. Model Construction. The 426 structural descriptors of the 52 ionic liquids with known ETN were arranged in a matrix, giving a data array of size 52 × 426. To select the training and test ILs, principal component analysis (PCA) was performed on this initial descriptor matrix, and the first two principal components (PCs), which contain the maximum variance of the structural information of the ILs, were extracted. The plot of PC1 versus PC2, which shows the distribution of the ILs in the two-dimensional space of the descriptors, is given in Figure 1. Then, according to this factor space, about 20% of the compounds (10 ILs out of 52) were randomly selected as the external test compounds, and the remaining 42 were used as the training set for model construction.

Figure 1. Random selection of training and test ILs from different points of the PCA space (plot of PC1 versus PC2). This plot was prepared based on the total descriptors obtained from the structures of the ILs.

After variable selection based on stepwise regression, six different models were generated step by step, but some of them were likely to be overfitted. To choose the best model, cross-validation was employed for each of these six models. Figure 2 shows the variation of the cross-validated squared correlation coefficient (Q2) and the squared correlation coefficient of the training set (R2train) as a function of the number of descriptors entered. As is clear from Figure 2, the performance of the QSPR model increased with each new descriptor up to four variables and did not increase much with a larger number of descriptors. The final four-parameter equation is

$$E_T^N = 0.953(\pm 0.082) + 0.217(\pm 0.023)\,\mathrm{MAXDN} - 0.009(\pm 0.002)\,\mathrm{MPC04} - 1.025(\pm 0.320)\,\mathrm{R1m+} - 0.051(\pm 0.025)\,\mathrm{GATS3e} \tag{3}$$

N = 52, Ntrain = 42, Ntest = 10, R2train = 0.82, Q2LOO = 0.77, F = 41.01, Fcrit = 2.63

In this equation, MAXDN is the maximal electrotopological negative variation, MPC04 is the molecular path count of order 4, R1m+ is the R maximal autocorrelation of lag 1 weighted by mass, and GATS3e is the Geary autocorrelation of lag 3 weighted by Sanderson electronegativity. Each of these parameters describes structural features of the ILs that affect ETN and is discussed in later parts of the manuscript. In the above equation, N, Ntrain, and Ntest are the total number of ILs in the data set, the number of training ILs, and the number of test ILs, respectively. R2train is the squared correlation coefficient of calibration (training set), and Q2LOO is the squared correlation coefficient of leave-one-out cross-validation. F is the Fisher F-statistic and Fcrit is the critical F value. The obtained F is larger than Fcrit, which shows the significance of the suggested QSPR model. The model has good statistical quality and explains about 82% of the variance in the ETN data of the training ILs. To obtain better predictions, the descriptor matrix and the ETN vector were autoscaled using their mean values and standard deviations. The standardized coefficients of the model after autoscaling and their corresponding t-values are given in Table S2 (Supporting Information). As is clear from Table S2, the t test also confirms the significance of the coefficients. According to the literature, leave-many-out cross-validation is a better statistical procedure than the leave-one-out method for judging internal validity and model stability.39 For this reason, leave-three-out and leave-six-out cross-validation were also performed, and both Q2L3O and Q2L6O were equal to 0.77. The closeness of these metrics to R2train shows the good stability of the proposed model and gives a first indication of its predictive ability.

It is noteworthy that cross-validation alone is not a rigorous way to evaluate predictive ability. One accepted protocol for checking predictability is to use the model to predict the property (here ETN) of an external test set and to compare the predicted and experimental values for these ILs, which were not used in the model construction.40 The squared correlation coefficient of the test set (R2test) and the root-mean-square error of the test set (RMSEtest) were 0.94 and 0.25, respectively, which shows the good predictive ability of the proposed model for the ETN of ionic liquids. To show that the training quality of the model and its external prediction power do not depend on the selection of training and test ILs, two other random splits of the compounds were made, and the quality of the models was retained with different sets of ILs as training and test compounds (see Table S3 in the Supporting Information). It should be emphasized that using too many descriptors (i.e., more than the optimum number) can give an overfitted model, which leads to weak prediction for the external test set. To illustrate this point, the variation of the root-mean-square error of prediction for the test ILs (RMSEtest) is also shown in Figure 2. It is clear from this figure that increasing the number of descriptors beyond four decreased the predictive ability for the external test ILs despite the increasing correlation in training and internal validation (R2train and Q2). This confirms that the correct number of descriptors is included in the proposed QSPR model for the ETN of ILs.

Figure 2. Squared correlation coefficient of the training set of ILs (R2train) and of cross-validation (Q2) versus the number of descriptors used, for selecting the optimum number of parameters for model construction. The variation of the root-mean-square error of prediction for the test set (RMSEtest) is also shown in this figure.

Another point that has recently been emphasized in the literature is that the squared regression coefficient between the observed and predicted values of the training or test compounds is insufficient for showing the real validity of a QSPR model.41,42 Therefore, an average modified term (the average rm2) has been strongly suggested by Roy and co-workers for evaluating the validity of a model.42 For this purpose, the experimental and predicted property values (here ETN) were scaled between their minimum and maximum values, and the correlation metric (known as r2m) was calculated using the experimental and predicted ETN as the x-axis and y-axis, respectively, with and without the intercept. A similar metric was calculated after interchanging the axes (known as r′2m). For a valid QSPR model, the average rm2, which is the mean of r2m and r′2m, should not be lower than 0.5, and Δr2m, which is the difference between r2m and r′2m, should not be larger than 0.2.43 For the training set, the average rm2 and Δr2m were 0.74 and 0.15, respectively, which confirmed the validity of the training results. For the external validation based on the test set, they were 0.92 and 0.03, respectively, which confirmed the predictive ability of the model as well. One of the risks to the significance of a QSPR model is chance correlation, which cross-validation alone cannot detect.

Table 2. Statistical Parameters of the Proposed QSPR Model

statistic      value   description
nv             4       number of descriptors applied for the model development
Ntrain         42      number of molecules in the training set
Ntest          10      number of molecules in the test set
R2train        0.82    training correlation coefficient
RMSEC          0.43    root-mean-square error of calibration (training)
rm2(train)     0.74    average r2m metric of the training set
R2LOO-cv       0.77    correlation coefficient of leave-one-out cross-validation
R2L6O-cv       0.77    correlation coefficient of leave-six-out cross-validation
RMSEcv         0.48    root-mean-square error of leave-six-out cross-validation
R2test         0.94    correlation coefficient of the test set
RMSEtest       0.25    root-mean-square error of prediction (test set)
rm2(test)      0.92    average r2m metric of the external test set
Q2MP           0.16    maximum cross-validation correlation coefficient for 30 Y-randomization tests
cRp2(train)    0.76    cRp2 of the training set as a metric for checking chance correlation
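To make the training and leave-one-out statistics of Table 2 concrete, the following NumPy sketch fits an MLR model and computes R2train and Q2LOO. It is an illustration under stated assumptions rather than the authors' SPSS/MATLAB workflow: the function names are hypothetical, and Q2 is taken here as 1 − PRESS/SStot with SStot computed around the mean of all training responses, which is one common definition.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [intercept, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predicted response for the rows of X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def r2_train(X, y):
    """Squared correlation coefficient of calibration (1 - RSS/SStot)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    resid = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out Q2: refit without sample i, then predict sample i."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef = fit_mlr(X[mask], y[mask])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

With this definition Q2LOO can never exceed R2train for a least-squares model, so the gap between the two (0.82 versus 0.77 in Table 2) is a simple indicator of how much the fit depends on individual samples.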
This risk is especially important when the number of descriptors is comparable to or higher than the number of molecules in a data set.40,44 The common procedure for addressing it is Y-randomization (Y-permutation),45 in which the vector of response values, that is, ETN, is randomized many times and a QSPR model is built each time using the descriptors of the original model. Here, Y-randomization was done 30 times and cross-validation was performed on each of the randomized models. The maximum Q2 of these randomized models (Q2MP) can be compared with the Q2 of the original model. In the current work, Q2MP was 0.16, which is far below the Q2 of the original model (0.77) and indicates that the proposed model for the ETN of ionic liquids is not a chance correlation.39 For further evaluation of chance correlation, cRp2 was also calculated as a model randomization metric according to the following equation:46

$${}^{c}R_p^2 = R\sqrt{R^2 - R_r^2} \tag{4}$$

where R and R2 are the correlation coefficient and squared correlation coefficient (training or cross-validation) and R2r is the average of the squared correlation coefficients (training or cross-validation) of the randomized models. The threshold value of cRp2 is 0.5, and a model exceeding this value can be considered a robust model with no significant chance correlation.46 The cRp2 values for training and cross-validation of the proposed ETN model were 0.76 and 0.68, respectively. These values confirm the validity of the model. The predicted ETN values of the ILs in the training and test sets and the differences between the experimental and predicted ETN (i.e., the residuals) are presented in Table 1. The statistics of the performance of the QSPR model and its validation are summarized in Table 2. The agreement between the experimental and predicted ETN of the training and test sets is illustrated graphically in Figure 3a. Also, the plot of the residuals of the predicted ETN values for both the training and test molecules against the experimental (observed) ETN values is shown in Figure 3b. As is clear from this figure, the residuals are distributed randomly on both sides of the zero line, which indicates no proportional or systematic error in the proposed QSPR model. The numerical values of the four parameters in eq 3 before autoscaling for both training and test compounds are summarized in Table S4 (Supporting Information).

Figure 3. (a) Plot of predicted ETN values versus experimental values for the 52 ILs in the training and test sets using the proposed four-parameter model. (b) Residuals of prediction for the same 52 ILs.

3.2. Model's Applicability Domain. In continuation of the discussion of the validation of the QSPR model, we now consider the applicability domain (AD) of the model, which is a common concept for indicating the limitation of the model

in accurately predicting the property of similar molecules.47 Different strategies have been proposed to define the AD, but one of the best known is the use of standardized residuals and leverage.48 "Leverage" deals with the multivariate normality of predictions49 and provides a measure of the distance of the descriptor values of each IL from the centroid of the model's calibration space. The leverage of the ith IL in the descriptor space was computed according to the following equation:47

$$h_i = x_i (X^T X)^{-1} x_i^T \tag{5}$$

where X is the descriptor matrix of the training set of ILs and xi is the descriptor row vector of the IL of interest (in the training or test set). A warning leverage is calculated as h* = 3m/n, where n is the number of ILs in the training set and m is the number of descriptors plus one49,50 (4 + 1 for the current ETN model), which gives h* = 3 × 5/42 = 0.357. The leverage values of all ILs in both the training and test sets were calculated and were lower than the warning h* for all ILs except IL#51. Thus, with this one exception, no IL exerts an outstanding influence on the regression line or forces that line close to the result of a particular molecule. Not only high-leverage ILs but also those with high standardized residuals49 can fall outside the AD. The Williams plot therefore uses the standardized residual and leverage concepts simultaneously to show the AD visually.48 The Williams plot of the ETN model is shown in Figure 4. In this figure, the limits of acceptable leverage on the x-axis and standardized residual on the y-axis are shown by dashed lines. According to Figure 4, the ILs, with the exception of IL#51 noted above, are in the admissible range of the AD, which supports the reliability of the predictions. It is also clear from Figure 4 that our model based on eq 3 predicts the ETN of all ILs with a standardized residual in the admissible range of ±3.0σ.51 These results confirm the ability of the QSPR model to predict the ETN of ILs. It is noteworthy that the standardized residuals of the ILs were calculated by dividing the residual of prediction for each IL (i.e., the difference between the actual and predicted ETN) by the standard deviation of the residuals of the whole data set. Detailed values of the standardized residuals and leverages of all ILs used in Figure 4 are presented in Table S5 (Supporting Information).

Figure 4. Williams plot (standardized residual versus leverage) of the entire set of ILs in the proposed ETN model. The standardized residual limits (±3.0 times the standard deviation) and the cutoff value of leverage (h*) are shown by horizontal and vertical dashed lines, respectively.

In addition to the Williams plot, another methodology, proposed by Roy et al. and based on the theory of standardization,52 was used to check the AD of the model. The results of this approach were similar to those of the Williams plot and detected IL#51 as the outlier. Without this outlier, variable selection was performed again on the initial pool of descriptors, and the four descriptors appearing in eq 3 were selected again as the best set of parameters. As shown in the following equation, the coefficients of the proposed equation did not change significantly after omitting IL#51.

$$E_T^N = 0.946(\pm 0.089) + 0.217(\pm 0.023)\,\mathrm{MAXDN} - 0.009(\pm 0.003)\,\mathrm{MPC04} - 1.028(\pm 0.325)\,\mathrm{R1m+} - 0.050(\pm 0.026)\,\mathrm{GATS3e} \tag{6}$$

N = 51, Ntrain = 41, Ntest = 10, R2train = 0.80, R2test = 0.93, Q2LOO = 0.74, F = 35.59

Collinearity and Multicollinearity of Parameters. In linear models based on MLR, one can obtain a simple and interpretable model, which is one of the remarkable advantages of MLR. However, linear dependency between the variables of the QSPR model can limit its accuracy significantly. If the descriptors of a QSPR model are not sufficiently independent mathematically, MLR can generate a model with some coefficients larger than expected or with the wrong signs.53 In such conditions, the sign and magnitude of the coefficients cannot be used for interpretation. To check this issue, the correlation coefficient matrix of the model descriptors (MAXDN, MPC04, R1m+, and GATS3e) was calculated and is presented in Table 3. According to the results, there is no significant pair correlation in the QSPR model.

Table 3. Pair Correlation Coefficients between Parameters of the ETN Model of ILs and Their VIF Values

          MAXDN   MPC04   R1m+    GATS3e   VIF
MAXDN     1.00                             1.437
MPC04     0.00    1.00                     1.424
R1m+      0.14    0.01    1.00             1.396
GATS3e    0.22    0.15    0.21    1.00     1.973

Furthermore, the multicollinearity between the parameters in the constructed model was evaluated by the variance inflation factor (VIF),54 which can be obtained using the following equation:

$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \tag{7}$$

In eq 7, R2j is the squared multiple correlation coefficient of the jth molecular descriptor regressed on the remaining descriptors. It has been shown in the original literature that VIFs larger than a cutoff value of 5.0 indicate remarkable multicollinearity.54 According to Table 3, the VIF values of the four descriptors in the ETN model are lower than this cutoff value, and thus there is no multicollinearity in the model.

Interpretation of Descriptors. The relative importance of the variables included in eq 3 is MAXDN > MPC04 > R1m+ > GATS3e (see Table S2 in the Supporting Information), which is in accordance with the order of entrance of the descriptors in the stepwise procedure. In this model the most important descriptor is MAXDN, which is a topological index and denotes the "maximal electrotopological negative variation".55,56 The
positive sign of this coefficient shows that increasing MAXDN increases the ETN of ILs. MAXDN, which is derived from the hydrogen-depleted molecular graph based on the Kier−Hall intrinsic states of the atoms of an IL containing m atoms, can be calculated according to the following equation:55

$$\mathrm{MAXDN} = \max_i |\Delta I_i|, \quad i = 1, 2, \ldots, m \quad (\text{for } \Delta I_i < 0) \tag{8}$$

where ΔIi is the field effect on the ith atom of each IL due to the perturbation of all other atoms:57

$$\Delta I_i = \sum_j \frac{I_i - I_j}{(d_{ij} + 1)^2} \tag{9}$$

where I is the atomic intrinsic state and dij is the topological distance between the two considered atoms of each IL.55,57 It could be said that MAXDN (which is the maximum negative intrinsic state difference in the compounds) can be related to the nucleophilicity of the ILs.55

The second descriptor in the model is MPC04, which belongs to the "walk and path counts" descriptors56 and denotes the molecular path count of order 4. This descriptor is derived from the molecular graph of the ILs,58 and the negative sign of its coefficient implies that MPC04 has an inverse relationship with the ETN value of the ILs.

The next descriptor is R1m+, the R maximal autocorrelation of lag 1 weighted by mass.38 In the calculation of this descriptor, the geometry, topology, and atom weights of a chemical compound are assembled together, and so this parameter is categorized among the GETAWAY descriptors.56 Lag 1 in R1m+ means that the masses of adjacent atoms of the ILs were used in the calculation of this descriptor. From the negative sign of its coefficient, it can be concluded that the mass of the ILs has an inverse effect on their polarity (ETN). Thus, ILs with lower mass show higher ETN.

The last descriptor of the model is GATS3e, which is a 2D autocorrelation index and denotes the Geary autocorrelation of lag 3 weighted by Sanderson electronegativity,56 with a negative regression coefficient. Two points are noteworthy regarding this descriptor. The first is the role of polarity and polar interactions, which was expected from the nature of the dependent variable of the model, that is, ETN as a polarity concept. The second is the negative sign of the coefficient of GATS3e. Because of the nature of ETN, one may expect a positive coefficient for GATS3e. In other words, at first glance a positive relationship between GATS3e (which has an electronegativity nature) and ETN seems more logical. However, it should be emphasized that GATS3e is an autocorrelation index that also reflects topological characteristics of the ILs. Moreover, the QSPR is a multiparameter model, and a single parameter cannot be interpreted in isolation. Brief definitions and descriptions of the four parameters of the ETN model are summarized in Table 4.

Table 4. Brief Definitions and Descriptions of the Variables of the ETN Model

descriptor   definition                                                               property described
MAXDN        maximal electrotopological negative variation                            (i) topology of structure and (ii) negative charge
MPC04        molecular path count of order 4                                          (i) topology of structure
R1m+         R maximal autocorrelation of lag 1 weighted by mass                      (i) GETAWAY (geometry, topology, and atom-weights assembly), (ii) molecular mass
GATS3e       Geary autocorrelation of lag 3 weighted by Sanderson electronegativity   (i) topology of structure, (ii) electronegativity and polar interactions
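As a hedged illustration of eqs 8 and 9 for the MAXDN descriptor of Table 4, the sketch below computes the field effects ΔIi and MAXDN from a vector of atomic intrinsic states and a matrix of topological distances. It is not the Dragon implementation; the function names, the convention of returning 0.0 when no ΔIi is negative, and the numerical inputs in the usage note are assumptions made for illustration only.

```python
import numpy as np


def field_effects(intrinsic_states, topo_dist):
    """Delta I_i = sum_j (I_i - I_j) / (d_ij + 1)^2 for every atom i (eq 9)."""
    I = np.asarray(intrinsic_states, dtype=float)
    d = np.asarray(topo_dist, dtype=float)
    diff = I[:, None] - I[None, :]          # matrix of I_i - I_j
    # the j = i term contributes zero because I_i - I_i = 0
    return (diff / (d + 1.0) ** 2).sum(axis=1)


def maxdn(intrinsic_states, topo_dist):
    """Largest |Delta I_i| among atoms with Delta I_i < 0 (eq 8)."""
    delta = field_effects(intrinsic_states, topo_dist)
    negative = delta[delta < 0]
    # returning 0.0 when no atom has a negative field effect is an assumption
    return float(np.abs(negative).max()) if negative.size else 0.0
```

For a hypothetical three-atom chain with intrinsic states [2.0, 1.5, 7.0] and topological distances [[0, 1, 2], [1, 0, 1], [2, 1, 0]], the middle atom has ΔI = −0.125 − 1.375 = −1.5, which is the most negative field effect, so the sketch returns MAXDN = 1.5.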

CONCLUSION

QSPR was used to construct a predictive model for the ETN of ILs as a well-known polarity scale. The results showed that the proposed MLR model has good modeling ability and that this relationship covers 82% of the variance in the ETN of the ILs in the training set. The stability and prediction ability of the ETN model were confirmed using various approaches such as cross-validation, external validation, and the Y-randomization test. When the aim is to synthesize an IL with a specific polarity, estimating the ETN of new molecules before synthesis with the aid of QSPR could be a suitable option for reducing trial-and-error experiments.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.5b02982.
Five tables: names and structures of the ILs in the data set used in this study (Table S1); MLR coefficients of the proposed QSPR model and their t- and p-values (Table S2); statistical parameters of the proposed QSPR model using two different randomly selected training and test sets (Table S3); numerical values of the four parameters in the proposed QSPR model (Table S4); detailed values of residuals, standardized residuals, and leverages of the ILs (Table S5) (PDF)

AUTHOR INFORMATION

Corresponding Authors

*(M.N.) Phone: +98 03832223328; e-mail: m_nekoeinia@pnu.ac.ir.
*(S.Y.) Phone: +98 917 7042635; e-mail: yousefinejad.s@gmail.com.

Notes

The authors declare no competing financial interest.

ABBREVIATIONS

IL, ionic liquid
QSPR, quantitative structure−property relationship
MLR, multiple linear regression
VIF, variance inflation factor
AD, applicability domain
PCA, principal component analysis
TMS, tetramethylsilane

REFERENCES

(1) Bhowmik, P.; Nedeltchev, A.; Han, H. Synthesis, Thermal and Lyotropic Liquid Crystalline Properties of Protic Ionic Salts. Liq. Cryst. 2008, 35, 757.
(2) Wang, C.; Wei, Z.; Wang, L.; Sun, P.; Wang, Z. Assessment of Bromide-Based Ionic Liquid Toxicity toward Aquatic Organisms and QSAR Analysis. Ecotoxicol. Environ. Saf. 2015, 115, 112.
(3) Mallakpour, S.; Dinari, M. Ionic Liquids as Green Solvents: Progress and Prospects. In Green Solvents II; Springer Netherlands: Dordrecht, 2012; pp 1−32.
(4) Chiappe, C.; Pieraccini, D. Ionic Liquids: Solvent Properties and Organic Reactivity. J. Phys. Org. Chem. 2005, 18, 275.


DOI: 10.1021/acs.iecr.5b02982 Ind. Eng. Chem. Res. 2015, 54, 12682−12689


(5) Reichardt, C. Polarity of Ionic Liquids Determined Empirically by Means of Solvatochromic Pyridinium N-Phenolate Betaine Dyes. Green Chem. 2005, 7, 339.
(6) Advanced Fluorescence Reporters in Chemistry and Biology II; Demchenko, A. P., Ed.; Springer Series on Fluorescence; Springer Berlin Heidelberg: Berlin, Heidelberg, 2010; Vol. 9.
(7) Organic Reactions in Water; Lindström, U. M., Ed.; Blackwell Publishing Ltd: Oxford, UK, 2007.
(8) Nigam, S.; Rutan, S. Principles and Applications of Solvatochromism. Appl. Spectrosc. 2001, 55, 362.
(9) Kosower, E. M. The Effect of Solvent on Spectra. I. A New Empirical Measure of Solvent Polarity: Z-Values. J. Am. Chem. Soc. 1958, 80, 3253.
(10) Kamlet, M. J.; Taft, R. W. The Solvatochromic Comparison Method. I. The β-Scale of Solvent Hydrogen-Bond Acceptor (HBA) Basicities. J. Am. Chem. Soc. 1976, 98, 377.
(11) Taft, R. W.; Kamlet, M. J. The Solvatochromic Comparison Method. 2. The α-Scale of Solvent Hydrogen-Bond Donor (HBD) Acidities. J. Am. Chem. Soc. 1976, 98, 2886.
(12) Reichardt, C. Solvatochromic Dyes as Solvent Polarity Indicators. Chem. Rev. 1994, 94, 2319.
(13) Yousefinejad, S.; Hemmateenejad, B. A Chemometrics Approach to Predict the Dispersibility of Graphene in Various Liquid Phases Using Theoretical Descriptors and Solvent Empirical Parameters. Colloids Surf., A 2014, 441, 766.
(14) Yousefinejad, S.; Honarasa, F.; Saeed, N. Quantitative Structure-Retardation Factor Relationship of Protein Amino Acids in Different Solvent Mixtures for Normal-Phase Thin-Layer Chromatography. J. Sep. Sci. 2015, 38, 1771.
(15) Quintero, F. A.; Patel, S. J.; Muñoz, F.; Sam Mannan, M. Review of Existing QSAR/QSPR Models Developed for Properties Used in Hazardous Chemicals Classification System. Ind. Eng. Chem. Res. 2012, 51, 16101.
(16) Hemmateenejad, B.; Yousefinejad, S.; Mehdipour, A. R. Novel Amino Acids Indices Based on Quantum Topological Molecular Similarity and Their Application to QSAR Study of Peptides. Amino Acids 2011, 40, 1169.
(17) Yousefinejad, S.; Bagheri, M.; Moosavi-Movahedi, A. A. Quantitative Sequence-Activity Modeling of Antimicrobial Hexapeptides Using a Segmented Principal Component Strategy: An Approach to Describe and Predict Activities of Peptide Drugs Containing L/D and Unnatural Residues. Amino Acids 2015, 47, 125.
(18) Das, R. N.; Roy, K. Advances in QSPR/QSTR Models of Ionic Liquids for the Design of Greener Solvents of the Future. Mol. Diversity 2013, 17, 151.
(19) Katritzky, A. R.; Tamm, T.; Wang, Y.; Sild, S.; Karelson, M. QSPR Treatment of Solvent Scales. J. Chem. Inf. Model. 1999, 39, 684.
(20) Habibi-Yangjeh, A. Artificial Neural Network Prediction of Normalized Polarity Parameter. Bull. Korean Chem. Soc. 2007, 28, 1472.
(21) Palomar, J.; Torrecilla, J. S.; Lemus, J.; Ferro, V. R.; Rodríguez, F. A COSMO-RS Based Guide to Analyze/quantify the Polarity of Ionic Liquids and Their Mixtures with Organic Cosolvents. Phys. Chem. Chem. Phys. 2010, 12, 1991.
(22) Katritzky, A. R.; Lomaka, A.; Petrukhin, R.; Jain, R.; Karelson, M.; Visser, A. E.; Rogers, R. D. QSPR Correlation of the Melting Point for Pyridinium Bromides, Potential Ionic Liquids. J. Chem. Inf. Model. 2002, 42, 71.
(23) Varnek, A.; Kireeva, N.; Tetko, I. V.; Baskin, I. I.; Solov’ev, V. P. Exhaustive QSPR Studies of a Large Diverse Set of Ionic Liquids: How Accurately Can We Predict Melting Points? J. Chem. Inf. Model. 2007, 47, 1111.
(24) Gardas, R. L.; Coutinho, J. A. P. Applying a QSPR Correlation to the Prediction of Surface Tensions of Ionic Liquids. Fluid Phase Equilib. 2008, 265, 57.
(25) Tochigi, K.; Yamamoto, H. Estimation of Ionic Conductivity and Viscosity of Ionic Liquids Using a QSPR Model. J. Phys. Chem. C 2007, 111, 15989.

(26) Chen, B.-K.; Liang, M.-J.; Wu, T.-Y.; Wang, H. P. A High Correlate and Simplified QSPR for Viscosity of Imidazolium-Based Ionic Liquids. Fluid Phase Equilib. 2013, 350, 37.
(27) Mirkhani, S. A.; Gharagheizi, F. Predictive Quantitative Structure−Property Relationship Model for the Estimation of Ionic Liquid Viscosity. Ind. Eng. Chem. Res. 2012, 51, 2470.
(28) Freire, M. G.; Neves, C. M. S. S.; Ventura, S. P. M.; Pratas, M. J.; Marrucho, I. M.; Oliveira, J.; Coutinho, J. A. P.; Fernandes, A. M. Solubility of Non-Aromatic Ionic Liquids in Water and Correlation Using a QSPR Approach. Fluid Phase Equilib. 2010, 294, 234.
(29) Gardas, R. L.; Rooney, D. W.; Hardacre, C. Development of a QSPR Correlation for the Parachor of 1,3-Dialkyl Imidazolium Based Ionic Liquids. Fluid Phase Equilib. 2009, 283, 31.
(30) Qiao, Y.; Ma, Y.; Huo, Y.; Ma, P.; Xia, S. A Group Contribution Method to Estimate the Densities of Ionic Liquids. J. Chem. Thermodyn. 2010, 42, 852.
(31) Sattari, M.; Kamari, A.; Mohammadi, A. H.; Ramjugernath, D. A Group Contribution Method for Estimating the Refractive Indices of Ionic Liquids. J. Mol. Liq. 2014, 200, 410.
(32) Mellein, B. R.; Aki, S. N. V. K.; Ladewski, R. L.; Brennecke, J. F. Solvatochromic Studies of Ionic Liquid/Organic Mixtures. J. Phys. Chem. B 2007, 111, 131.
(33) Harifi-Mood, A. R.; Habibi-Yangjeh, A.; Gholami, M. R. Solvatochromic Parameters for Binary Mixtures of 1-(1-Butyl)-3-Methylimidazolium Tetrafluoroborate with Some Protic Molecular Solvents. J. Phys. Chem. B 2006, 110, 7073.
(34) Khodadadi-Moghaddam, M.; Habibi-Yangjeh, A.; Gholami, M. R. Solvent Effects on the Reaction Rate and Selectivity of Synchronous Heterogeneous Hydrogenation of Cyclohexene and Acetone in Ionic Liquid/alcohols Mixtures. J. Mol. Catal. A: Chem. 2009, 306, 11.
(35) Baker, S. N.; Baker, G. A.; Bright, F. V. Temperature-Dependent Microscopic Solvent Properties of “dry” and “wet” 1-Butyl-3-Methylimidazolium Hexafluorophosphate: Correlation with ET(30) and Kamlet−Taft Polarity Scales. Green Chem. 2002, 4, 165.
(36) Crowhurst, L.; Mawdsley, P. R.; Perez-Arlandis, J. M.; Salter, P. A.; Welton, T. Solvent-Solute Interactions in Ionic Liquids. Phys. Chem. Chem. Phys. 2003, 5, 2790.
(37) Machado, V. G.; Stock, R. I.; Reichardt, C. Pyridinium N-Phenolate Betaine Dyes. Chem. Rev. 2014, 114, 10429.
(38) Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. Dragon Software: An Easy Approach to Molecular Descriptor Calculations. MATCH Commun. Math. Comput. Chem. 2006, 56, 237.
(39) Yousefinejad, S.; Hemmateenejad, B. Chemometrics Tools in QSAR/QSPR Studies: A Historical Perspective. Chemom. Intell. Lab. Syst. 2015, 149, 177.
(40) Golbraikh, A.; Tropsha, A. Predictive QSAR Modeling Based on Diversity Sampling of Experimental Datasets for the Training and Test Set Selection. Mol. Diversity 2000, 5, 231.
(41) Ojha, P. K.; Mitra, I.; Das, R. N.; Roy, K. Further Exploring rm2 Metrics for Validation of QSPR Models. Chemom. Intell. Lab. Syst. 2011, 107, 194.
(42) Roy, K.; Mitra, I.; Kar, S.; Ojha, P. K.; Das, R. N.; Kabir, H. Comparative Studies on Some Metrics for External Validation of QSPR Models. J. Chem. Inf. Model. 2012, 52, 396.
(43) Roy, K.; Chakraborty, P.; Mitra, I.; Ojha, P. K.; Kar, S.; Das, R. N. Some Case Studies on Application of “rm2” Metrics for Judging Quality of Quantitative Structure-Activity Relationship Predictions: Emphasis on Scaling of Response Data. J. Comput. Chem. 2013, 34, 1071.
(44) Livingstone, D. J.; Salt, D. W. Judging the Significance of Multiple Linear Regression Models. J. Med. Chem. 2005, 48, 661.
(45) Rücker, C.; Rücker, G.; Meringer, M. Y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345.
(46) Mitra, I.; Saha, A.; Roy, K. Exploring Quantitative Structure−Activity Relationship Studies of Antioxidant Phenolic Compounds Obtained from Traditional Chinese Medicinal Plants. Mol. Simul. 2010, 36, 1067.
(47) Practical Guide to Chemometrics, 2nd ed.; Gemperline, P., Ed.; Taylor & Francis Group: Boca Raton, 2006.


(48) Todeschini, R.; Consonni, V.; Gramatica, P. Chemometrics in QSAR. In Comprehensive Chemometrics: Chemical and Biochemical Data Analysis; Tauler, R., Walczak, B., Brown, S. D., Ed.; Elsevier B.V.: Amsterdam, 2009.
(49) Netzeva, T. I.; Worth, A. P.; Aldenberg, T.; Benigni, R.; Cronin, M. T. D.; Gramatica, P.; Jaworska, J. S.; Kahn, S.; Klopman, G.; A, C.; et al. Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure−Activity Relationships. ATLA 2005, 2, 1.
(50) Yousefinejad, S.; Honarasa, F.; Montaseri, H. Linear Solvent Structure-Polymer Solubility and Solvation Energy Relationships to Study Conductive Polymer/carbon Nanotube Composite Solutions. RSC Adv. 2015, 5, 42266.
(51) Honarasa, F.; Yousefinejad, S.; Nasr, S.; Nekoeinia, M. Structure−electrochemistry Relationship in Non-Aqueous Solutions: Predicting the Reduction Potential of Anthraquinones Derivatives in Some Organic Solvents. J. Mol. Liq. 2015, 212, 52.
(52) Roy, K.; Kar, S.; Ambure, P. On a Simple Approach for Determining Applicability Domain of QSAR Models. Chemom. Intell. Lab. Syst. 2015, 145, 22.
(53) Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs. Environ. Health Perspect. 2003, 111, 1361.
(54) Craney, T. A.; Surles, J. G. Model-Dependent Variance Inflation Factor Cutoff Values. Qual. Eng. 2002, 14, 391.
(55) Gramatica, P.; Corradi, M.; Consonni, V. Modelling and Prediction of Soil Sorption Coefficients of Non-Ionic Organic Pesticides by Molecular Descriptors. Chemosphere 2000, 41, 763.
(56) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics, 2nd ed.; WILEY-VCH: Weinheim, 2009.
(57) Kier, L. B.; Hall, L. H.; Frazer, J. W. An Index of Electrotopological State for Atoms in Molecules. J. Math. Chem. 1991, 7, 229.
(58) Ruecker, C.; Ruecker, G. Mathematical Relation between Extended Connectivity and Eigenvector Coefficients. J. Chem. Inf. Model. 1994, 34, 534.
