On the Solubility of Ferrocene in Nonaqueous Solvents - Journal of

Dec 4, 2015 - School of Health, Shiraz University of Medical Sciences, Shiraz, Iran ... The solubility of ferrocene in various organic solvents is imp...
Article pubs.acs.org/jced

On the Solubility of Ferrocene in Nonaqueous Solvents

Saeed Yousefinejad,*,† Fatemeh Honarasa,§ and Aida Solhjoo∥

† School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
§ Department of Chemistry, Shiraz Branch, Islamic Azad University, Shiraz, Iran
∥ Department of Chemistry, Yasouj University, Yasouj, Iran

*S Supporting Information

ABSTRACT: The solubility of ferrocene in various organic solvents is important because of its application in chemical processes and its role as a standard electrochemical probe in nonaqueous systems. A multilinear quantitative structure property relationship (QSPR) model based on theoretical descriptors and a new linear solvation energy relationship (LSER) model based on empirical solvent scales are proposed for the Ostwald solubility coefficient of ferrocene in 35 organic solvents. The models were validated using different statistical approaches, such as internal validation and an external test set. In addition to excellent external prediction ability, the QSPR and LSER models covered 94 and 92% of the cross-validated variance, respectively. The proposed models confirmed the role of polar interactions and of solvent basicity in the solvation of ferrocene in the organic phase.

1. INTRODUCTION
Solubility is of lasting interest to scientists because of its importance in numerous areas such as scientific research, industry, and environmental problems. Solubility in water is mostly studied in connection with environmental questions, erosion, pollution, and mass transfer.1 Solubility in organic solvents, on the other hand, is the concern of many chemical and industrial problems. Solubility is correlated with the bioavailability of pharmaceuticals, biodegradation, the toxicity of hazardous materials, the suitability of gaseous anesthetics, and so forth.1 To obtain predictive models, different quantitative structure property relationship (QSPR) studies have been performed on the solubility of various solutes in different solvents.1−4

Ferrocene is a well-known chemical and electrochemical probe, especially in nonaqueous systems. Ferrocene and its numerous derivatives have been applied in different disciplines, for example in materials chemistry, as organometallic ligand scaffolds, in pharmaceuticals, and as fuel additives.5−8 Because of this importance, many physicochemical properties of ferrocene have been investigated, such as vapor pressure,9,10 the enthalpy of sublimation,10 solubility in water,11,12 and some electrochemical properties.13−15 Different approaches have been used to investigate ferrocene solubility.16 One important study on this subject was done by Abraham et al.,17 who determined the solvation parameters of ferrocene, including the overall hydrogen bond basicity, in a set of organic solvents. They also illustrated how these solvation parameters may be used to deduce different physicochemical properties.17 Dabrowski et al. also determined the solubility of ferrocene and some substituted ferrocenes in different organic solvents.18

In the current study, we used two approaches to model the solubility of ferrocene in organic solvents: one is based on the theoretical properties of the solvent structures using a QSPR method, and the other is based on solvent empirical scales to give a new linear solvation energy relationship (LSER) methodology. Most of the empirical parameters used in our LSER approach are defined by choosing an experimental probe and following its response in different solvents.19,20 Using solvent scales to model solubility makes it possible to clarify some of the solute−solvent interactions involved in the solvation process.4,21,22 The term “solvent empirical parameters” has been introduced to cover most of the intermolecular interactions of solvents.23

2. MATERIALS AND METHODS
2.1. Data Preparation and Processing. To investigate the solubility of ferrocene in the organic phase, two different approaches were tested to construct predictive models. In the first one, theoretical and structural descriptors of the solvents were used to build a QSPR model. To do so, the Ostwald solubility of ferrocene in 34 organic solvents was taken from the work of Katritzky et al.1 and the corresponding solubility in acetonitrile from the work of Abraham et al.17 The Ostwald solubility is the gas/organic solvent partition coefficient, used here in logarithmic form (log10(L)).17 Structural parameters of the solvents were computed using Dragon software24 (Milano Chemometrics

Received: September 9, 2015
Accepted: November 18, 2015


DOI: 10.1021/acs.jced.5b00768 J. Chem. Eng. Data XXXX, XXX, XXX−XXX

Journal of Chemical & Engineering Data


Table 1. Experimental and Predicted Ostwald Solubility Coefficients of Ferrocene in Various Solvents

Columns: no.; solvent name; log L (exp); model 1: log L (pred), residual; model 2: log L (pred), residual

S1 S2a S3 S4 S5b S6a S7 S8 S9 S10b S11a S12b S13 S14 S15 S16 S17 S18b S19a S20 S21 S22a,b S23 S24 S25 S26 S27a S28 S29 S30 S31b S32a S33 S34 S35

hexane; octane; nonane; decane; cyclohexane; methylcyclohexane; 2,2,4-trimethylpentane; 1,2-dichloroethane; benzene; toluene; m-xylene; p-xylene; ethylbenzene; methanol; ethanol; 1-propanol; 2-propanol; 1-butanol; 2-butanol; 2-methyl-1-propanol; 2-methyl-2-butanol; 3-methyl-1-butanol; 1-pentanol; 4-methyl-2-pentanol; 1-hexanol; 2-ethyl-1-hexanol; 1-heptanol; 1-octanol; dibutyl ether; methyl t-butyl ether; methyl acetate; ethyl acetate; butyl acetate; dimethyl sulfoxide; acetonitrile

5.62 5.87 5.61 5.60 5.59 6.27 5.81 5.51 6.36 6.36 5.35 6.21 6.19 6.18 5.30 5.39 5.46 5.50 5.54 5.43 5.40 5.44 5.48 5.41 5.42 5.54 5.53 5.87 5.93 5.99 6.02 5.55 6.01 5.68 5.60

5.58 5.86 5.60 5.67 5.67 6.32 5.75 5.46 6.29 6.41 5.34 6.18 6.25 6.15 5.16 5.34 5.44 5.55 5.54 5.55 5.45 5.46 5.58 5.51 5.44 5.56 5.52 5.91 5.87 6.02 5.99 5.51 5.84 5.68

0.04 0.01 0.01 −0.07 −0.08 −0.05 0.06 0.05 0.07 −0.05 0.01 0.03 −0.06 0.03 0.14 0.05 0.02 −0.05 0.00 −0.12 −0.05 −0.02 −0.10 −0.10 −0.02 −0.02 0.01 −0.04 0.06 −0.03 0.03 0.04 0.17 0.00

5.71 5.65 5.75 5.62 5.83 5.71 5.56 6.28 6.24 6.21 6.23 6.24 6.23 5.28 5.38 5.42 5.29 5.52 5.41 5.36 5.50 5.44 5.50

−0.09 −0.04 −0.15 −0.03 0.04 0.10 −0.05 0.08 0.12 0.06 −0.02 −0.05 −0.05 0.02 0.01 0.04 0.06 −0.02 0.02 0.04 0.04 0.00 −0.02

5.50 5.51 5.51 5.50 5.93 5.96 5.84 5.99 6.03 5.71 5.70

0.03 −0.09 0.04 0.04 −0.06 0.03 0.09 0.03 −0.02 −0.03 −0.10

a Compounds in the test set of model #1. b Compounds in the test set of model #2.

and QSAR research group; http://michem.disat.unimib.it/chm/). The input to Dragon was the set of solvent structure files, which were prepared with Hyperchem software (Version 7, Hypercube Inc., http://www.hyper.com, U.S.A.) and optimized by the semiempirical AM1 method. The names of the solvents and the solubility of ferrocene (log10(L)) in them are presented in Table 1. After calculation of the theoretical descriptors, some pretreatments were done to prepare the descriptor set for the variable selection step: (i) The number of structural descriptors was reduced by removing descriptors that could not be calculated for every solvent in the data set, as well as those with a constant or near-constant value for all solvents. (ii) To decrease redundancy in the X-variables matrix, the correlations between all generated descriptors and the solubility vector of ferrocene were examined, and among the detected collinear descriptors (i.e., R2 > 0.95) the one with the highest correlation with log10(L) was retained and the others were removed. After these pretreatments, 403 descriptors remained for each solvent.

In the second approach, 127 empirical scales of solvents in three principal categories ((i) equilibrium-kinetic, (ii) spectroscopic, and (iii) empirical parameters obtained from other measurements)20 were used as the X-variables matrix to develop an LSER model for the solubility of ferrocene in organic phases. Of the 35 solvents used in the previous model, the experimental parameters of one solvent were unavailable, and thus the information on 34 solvents was used.

2.2. Data Modeling. An essential step in model construction is selecting the most informative descriptors and discarding redundant ones using a variable selection strategy. In the present work, multiple linear regression (MLR) analysis with stepwise selection of variables was used, which is both a common variable selection strategy and a simple linear regression method.25,26 All selected theoretical descriptors of model 1 and empirical parameters of model 2 were autoscaled using their mean values and standard deviations to obtain better performance and to make them dimensionless.27 The solvents were randomly divided into two parts: a training set for model development and a test set for external validation. In addition to external validation, cross validation and y-scrambling were also carried out to evaluate the stability and significance of both proposed models.


All calculations and statistical treatments were done in the MATLAB environment (version 7, MathWorks, Inc., http://mathworks.com, U.S.A.) on a laptop computer running the Windows XP operating system.
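The descriptor pretreatment of section 2.1 and the autoscaling of section 2.2 can be sketched in a few lines. This is only an illustration in Python, not the authors' MATLAB code: the function names, the `min_sd` threshold for "near-constant", the greedy order of the collinearity filter, and the toy data are all assumptions made for the example.

```python
from statistics import mean, stdev, pstdev


def r_squared(u, v):
    """Squared Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)


def pretreat(descriptors, log_l, r2_cut=0.95, min_sd=1e-8):
    """Return the descriptor columns that survive steps (i) and (ii)."""
    # Step (i): drop columns with missing values or (near-)constant values.
    usable = {
        name: col for name, col in descriptors.items()
        if all(v is not None for v in col) and pstdev(col) > min_sd
    }
    # Step (ii): visit columns from most to least correlated with log L and
    # keep a column only if it is not collinear with one already kept.
    ranked = sorted(usable, key=lambda n: r_squared(usable[n], log_l),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(r_squared(usable[name], usable[k]) <= r2_cut for k in kept):
            kept.append(name)
    return {name: usable[name] for name in kept}


def autoscale(col):
    """Center a column on its mean and scale it to unit standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


# Hypothetical toy data: five solvents, five candidate descriptors.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d1_scaled": [2.0, 4.0, 6.0, 8.0, 10.1],  # nearly collinear with d1
    "constant": [3.0, 3.0, 3.0, 3.0, 3.0],
    "incomplete": [1.0, None, 3.0, 4.0, 5.0],
    "d2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_log_l = [1.1, 1.9, 3.2, 3.9, 5.1]
selected = pretreat(toy, toy_log_l)
scaled = {name: autoscale(col) for name, col in selected.items()}
print(sorted(selected))
```

Ranking the columns by their correlation with log10(L) before filtering is one simple way to make sure that, within each collinear group, the retained descriptor is the one most correlated with the response.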

3. RESULTS AND DISCUSSION
3.1. Solvent Structure-Ferrocene Solubility (Model 1). This part of the work was conducted using 403 solvent structural descriptors for 35 solvents with known Ostwald solubility coefficients for ferrocene.1 Hence the descriptors were collected in a data matrix of size 35 × 403. This data set was then randomly divided into training and test series: 28 solvents out of 35 (80% of the total) were chosen as the training set to build the QSPR model, and the remaining 7 solvents (20%) were reserved as the test set. It should be noted that the log10(L) values of ferrocene for the solvents of the training set covered the range of those in the test set. In other words, log10(L) ranged from 5.30 to 6.36 in the training set and from 5.43 to 6.21 in the test set. To detect outliers, an iterative process was used during variable selection: each solvent was deleted from the model in turn and the resulting improvement was followed. It was found that acetonitrile (solvent #35 in Table 1) could not be modeled well, and thus the procedure was continued with only 34 solvents (27 solvents in the training set and 7 solvents in the test set). Eleven different linear models were generated step by step through the stepwise regression run, but some of them may be overfitted. To detect overfitting, cross validation was run on each model. The variations of the squared correlation coefficients of calibration (R2train) and cross validation (Q2) are shown in Figure 1a. Judging by R2train, the goodness of fit improved with each new descriptor up to the four-parameter model, and no drastic change occurred after that. The Q2 metric in cross validation confirmed the significance of the four-parameter model in comparison with simpler or more complex models. The equation of this model is given as eq 1.

Figure 1. Training and cross validation performance of model #1 versus number of included parameters (a) and plot of predicted log L of ferrocene using model #1 versus experimental values in organic solvents (b).

$$\log_{10}(L) = 5.812(\pm 0.407) + 5.897(\pm 0.378)\,\mathrm{Mv} - 0.214(\pm 0.059)\,\mathrm{MATS4v} + 2.502(\pm 0.303)\,\mathrm{MATS1e} - 0.875(\pm 0.136)\,\mathrm{BEHm1} \qquad (1)$$

$$N = 34,\quad N_{\mathrm{train}} = 27,\quad N_{\mathrm{test}} = 7,\quad R^2_{\mathrm{train}} = 0.95,\quad Q^2_{\mathrm{LOO}} = 0.94,\quad F = 163.86,\quad F_{\mathrm{crit}}(95\%) = 2.82$$

where N is the number of solvents, R2train is the squared correlation coefficient of calibration, and Q2LOO is the squared correlation coefficient obtained using the leave-one-out cross-validation strategy; their high values (0.95 and 0.94) show the goodness of fit and stability of the proposed QSPR model. Moreover, F is the Fisher F-statistic and Fcrit denotes the critical F value. A calculated F larger than Fcrit indicates the statistical significance of the developed model. The values in parentheses in eq 1 are the standard deviations of the model’s coefficients, and their low values support the significance of the selected parameters. Other statistical tests of the significance of the coefficients, such as the t-test and the corresponding p-values, are presented in Table S1 (Supporting Information). As is clear from Table S1, the t-test confirms the significance of the above QSPR model, with p-values almost equal to zero. In eq 1, Mv denotes the mean atomic van der Waals volume (scaled on Carbon atom), MATS4v is the Moran autocorrelation of lag 4 weighted by van der Waals volume, MATS1e is the Moran autocorrelation of lag 1 weighted by Sanderson electronegativity, and BEHm1 is the highest eigenvalue (number 1) of the Burden matrix weighted by atomic mass (see Table 2).

Table 2. Brief Definitions and Description of Variables of Two Proposed Models

solvent scale; definition; property of scale
Mv; mean atomic van der Waals volume (scaled on Carbon atom); molecular size and volume
MATS4v; Moran autocorrelation of lag 4 weighted by atomic van der Waals volume; topology and size/volume
MATS1e; Moran autocorrelation of lag 1 weighted by Sanderson electronegativity; topology and electronegativity/polarity
BEHm1; highest eigenvalue (number 1) of Burden matrix weighted by atomic mass; topology and structural mass
E(SVB); average equilibrium and chromatographic distribution constants on Amberlite XAD-2, SM-2 and XAD-4; branching in solvent structure
ΔH0BF3; enthalpy of complexation of solvents with BF3 in dichloromethane; basicity
log γKc; fluorescence quenching rate constants for naphthonitrile-olefin and furan pairs; solvent’s polarity and polar interactions
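As a quick plausibility check on the statement that the small standard deviations in eq 1 support the significance of the coefficients, one can form the ratio of each coefficient to its standard deviation. The sketch below uses only the numbers printed in eq 1; the variable and function names are illustrative, and whether these ratios coincide exactly with the t-values of Table S1 is not something the text confirms. For roughly 22 residual degrees of freedom (27 training solvents minus 5 fitted terms), a ratio above about 2 in absolute value corresponds to significance at the 95% level.

```python
# Coefficients and standard deviations copied from eq 1.
EQ1_TERMS = {
    "intercept": (5.812, 0.407),
    "Mv": (5.897, 0.378),
    "MATS4v": (-0.214, 0.059),
    "MATS1e": (2.502, 0.303),
    "BEHm1": (-0.875, 0.136),
}


def t_ratios(terms):
    """Return coefficient / standard deviation for every term."""
    return {name: coef / sd for name, (coef, sd) in terms.items()}


ratios = t_ratios(EQ1_TERMS)
for name, t in ratios.items():
    print(f"{name:10s} t = {t:7.2f}")
```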


Table 3. Various Statistical Parameters of Two Developed Models

                  nv(a)  Ntrain(b)  Ntest(c)  R2train(d)  RMSEtrain(e)  R2LOOcv(f)  R2LMOcv(g)  RMSEcv(h)  R2test(i)  RMSEtest(j)  Q2MP(k)
model 1 (QSPR)    4      27         7         0.95        0.22          0.94        0.93        0.26       0.98       0.10         0.08
model 2 (LSER)    3      28         6         0.95        0.23          0.92        0.92        0.29       0.98       0.12         0.15

a Number of descriptors applied for the model development.
b Number of molecules in training set.
c Number of molecules in test set.
d Training correlation coefficient.
e Training root-mean-square error.
f Leave-one-out correlation coefficient.
g Leave-many-out cross-validation correlation coefficient (leave-three-out in model 1 and leave-four-out in model 2).
h Root-mean-square error of leave-one-out cross-validation.
i Correlation coefficient of the test set.
j Test root-mean-square error.
k Maximum cross-validation correlation coefficient for the Y-randomization test.
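The leave-one-out Q2 and the y-randomization maximum (Q2MP) reported in Table 3 can be illustrated with a small standard-library sketch. It is not the authors' MATLAB implementation: the helper names, the normal-equation solver, the random seed, and the synthetic data are all assumptions chosen for the example, and Q2 is computed here as 1 − PRESS/SStot.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def q2_loo(rows, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((t - mean_y) ** 2 for t in y)


def q2_max_scrambled(rows, y, n_runs=30, seed=0):
    """Largest LOO Q2 obtained after shuffling the response n_runs times."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_runs):
        shuffled = y[:]
        rng.shuffle(shuffled)
        best = max(best, q2_loo(rows, shuffled))
    return best


# Hypothetical data: 20 "solvents", two descriptors, small noise.
data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(20)]
y = [5.7 + 0.5 * a - 0.3 * b + data_rng.gauss(0, 0.02) for a, b in rows]
q2 = q2_loo(rows, y)
q2_mp = q2_max_scrambled(rows, y)
print(f"Q2(LOO) = {q2:.3f}, Q2MP = {q2_mp:.3f}")
```

A Q2MP far below the Q2 of the unscrambled data is the pattern the text interprets as the absence of chance correlation.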

Estimation of the relative effect of each of these parameters on the prediction of the solubility of ferrocene in the organic phase is possible by looking at the magnitude of the coefficient of each parameter, which is discussed in more detail in the following sections. After autoscaling of the selected parameters and of the target vector (ferrocene solubility), the linear model was rebuilt and the standardized MLR coefficients were obtained; they are presented in Table S1 (Supporting Information). The standardized coefficients were used to predict the solubility of ferrocene in the organic solvents of the training set. The predicted values of ferrocene solubility in the different solvents are shown in Table 1. To evaluate the stability of the proposed QSPR model, leave-one-out and leave-three-out cross validations were also conducted. Q2LOO and Q2L3O were 0.94 and 0.93, respectively. In contrast to R2train, which can be increased simply by adding more descriptors, the Q2 metric commonly declines when a model is overparametrized.28 The closeness of Q2 (leave-one-out and leave-many-out cross validation) to R2train implies the stability and prediction ability of model 1. Although cross validation gave excellent results, it has been shown that cross validation alone is not enough and that the true predictive power of a QSPR model should be checked with an external test set of compounds that were not used during model building.29 Hence, the model was used to predict the ferrocene solubility in the 7 solvents of the test set (see Table 1). The squared correlation coefficient (R2test) and root-mean-square error of the test set (RMSEtest) were 0.98 and 0.10, respectively, which is another indicator of the excellent predictive ability of our QSPR model. As illustrated in Figure 1b, there is outstanding agreement between experimental and predicted values of ferrocene solubility in both the training and test sets.

To show that the performance of the proposed QSPR model does not vary significantly with the composition of the training and test sets, three other random splits of the data (into training and test sets) were performed. Model construction and external validation were carried out using these different training and test sets (train-test 2, train-test 3, and train-test 4), and the statistical quality of these models is summarized in Table S2 (Supporting Information). According to these results, the goodness of fit and predictive power of models using the parameters of eq 1 did not change significantly when the composition of the training and test sets was changed.

In addition to cross validation and external validation, the significance of Q2 was investigated by y-randomization (y-scrambling).26,30 In the current work, the ferrocene solubility vector of the training set was scrambled 30 times and cross validation was run on each scrambled data set. The maximum Q2 in these scrambled data (Q2MP) was 0.08. This very low value of Q2MP in comparison with the Q2 of the original data set confirms that model 1 was not obtained by chance. All the statistical parameters of the QSPR model (model 1) are summarized in Table 3. The numerical values of the descriptors used in eq 1 are presented in Table S3 (Supporting Information).

3.2. Solvent Property-Ferrocene Solubility Relationship (Model 2). The goal of this part was to use solvent empirical scales as the independent variables, so the developed model can be classified among the linear solvation energy relationships (LSERs). In the LSER approach, the focus is on the effects of solvent−solute interactions on physicochemical properties and on the investigation of reactivity parameters.31 From this point of view, the model discussed below is similar to the LSERs suggested by Kamlet and Taft.32,33 The empirical properties of 34 of the 35 solvents were available in the literature; the data for 4-methyl-2-pentanol (solvent #24 in Table 1) were not found. After data pretreatment, 122 empirical scales of our 34 solvents were collected in a matrix of size 34 × 122. Here, 28 solvents were randomly chosen as the training set and the remaining 6 solvents were used as the test compounds. As in the construction of model 1 in the previous section, stepwise MLR was used to select the best subset of empirical parameters, and cross validation was used to choose the final model. According to Figure 2a, among the four models suggested by stepwise MLR (with one to four parameters), the performance (training and internal validation) did not improve sharply beyond three parameters. The three-parameter model was therefore chosen because of its lower risk of overfitting; it is presented in eq 2.

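$$\log_{10}(L) = 8.609(\pm 0.574) + 2.945(\pm 0.251)\,E(\mathrm{SVB}) - 0.005(\pm 0.001)\,\Delta H^{0}_{\mathrm{BF_3}} + 0.280(\pm 0.058)\,\log \gamma K_{\mathrm{c}} \qquad (2)$$

$$N = 34,\quad N_{\mathrm{train}} = 28,\quad N_{\mathrm{test}} = 6,\quad R^2_{\mathrm{train}} = 0.95,\quad Q^2_{\mathrm{LOO}} = 0.92,\quad F = 131.604,\quad F_{\mathrm{crit}}(95\%) = 3.01$$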
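To make the use of eq 2 concrete, the sketch below evaluates it for one solvent and converts the result back to an Ostwald coefficient. It assumes that the printed coefficients apply to the unscaled solvent scale values (as tabulated in Table S6), which the intercept of 8.609 suggests but the text does not state outright. The function names and the input values are hypothetical and do not correspond to any real solvent.

```python
def predict_log_l_lser(e_svb, dh_bf3, log_gamma_kc):
    """Evaluate eq 2 for one solvent; returns log10(L)."""
    return 8.609 + 2.945 * e_svb - 0.005 * dh_bf3 + 0.280 * log_gamma_kc


def ostwald_coefficient(log_l):
    """Convert log10(L) back to the Ostwald coefficient L."""
    return 10.0 ** log_l


# Hypothetical scale values, chosen only to exercise the function.
example_log_l = predict_log_l_lser(e_svb=-1.0, dh_bf3=50.0, log_gamma_kc=0.5)
print(round(example_log_l, 3), ostwald_coefficient(example_log_l))
```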

As can be seen, the selected solvent scales are well able to establish a correlation between solvent properties and the solubility of ferrocene. Equation 2 accounts for more than 95% of the data variance in the training set, and its significance is confirmed by the F-test. The t-value of each coefficient in the model and the standardized coefficients after autoscaling of the data are given in Table S4 (Supporting Information). In addition to leave-one-out cross validation, a leave-four-out procedure was carried out and gave Q2L4O = 0.92. The high values of the Q2 metrics and their closeness to R2train indicate the stability and prediction ability of the proposed LSER model. To further check the predictability, the LSER model was used to predict the solubility of ferrocene in the solvents of the external test set; R2test and RMSEtest were 0.98 and 0.14, respectively, which shows its high external predictive ability. Here again, to show that the quality of the model is stable across different training and test sets, two other random compositions of solvents were selected as train-test 2 and train-test 3. The results of training, cross validation, and external validation for these sets are shown in Table S5 (Supporting Information). As Table S5 shows, the quality of the LSER model was maintained when it was reconstructed and evaluated using different training and test sets. The maximum Q2 of the y-randomization test (Q2MP) among 30 randomized models was 0.15, which indicates that the LSER model is free of chance correlation. All the statistical parameters of model 2 are summarized in Table 3, and the excellent agreement between experimental and predicted values of ferrocene solubility in organic solvents is illustrated in Figure 2b. The numerical values of the empirical parameters of eq 2 are given in Table S6 (Supporting Information).

Figure 2. Training and cross validation performance of model #2 versus number of included parameters (a) and plot of predicted log L of ferrocene using model #2 (LSER) versus experimental values in organic solvents (b).

3.3. Validation Based on Applicability Domain. One of the common methods in the validation of a QSPR model is determining the applicability domain (AD), which indicates the limits of the model and the range within which the property of similar compounds can be predicted accurately.34 Different approaches have been proposed to define the AD, but one well-known way uses the concepts of leverage and standardized residual.31 Leverage is a concept used to check the multivariate normality of observations35 and provides a measure of the distance of an observation from the centroid of the model’s calibration space. On the basis of the recommendation of the literature,35 a warning leverage was defined as h* = 3m/n, where n is the number of solvents in the training set and m is the number of parameters plus one (here 4 + 1 for model 1 and 3 + 1 for model 2). The leverage values of all solvents in the training and test sets were calculated for both the QSPR and LSER models. A solvent with a leverage larger than h* has an outsized influence on the regression line and can pull the line toward its own result, so leverage is a criterion for recognizing solvents outside the AD. Not only high-leverage compounds but also those with high standardized residuals35 can lie outside the AD. The Williams plot therefore combines the leverage and standardized residual concepts;31 the plots are shown in Figure 3a,b for models 1 and 2, respectively. In this plot, the boundaries of acceptable leverage on the x-axis and standardized residual on the y-axis are shown using dashed lines. It is clear from Figure 3 that all solvents are in the admissible AD except dimethyl sulfoxide (DMSO) in model 1 and 1,2-dichloroethane in model 2. On the other hand, all solvents show a standardized residual in the admissible range of ±3σ. According to these results, it can be concluded that both the QSPR and LSER models proposed for the solubility of ferrocene in the organic phase show an acceptable AD.

Figure 3. Williams plot of the entire set of solvents in model 1 (a) and model 2 (b). Cutoff values of leverage (h*) and standardized residual (±3 times the standard deviation) are shown by a vertical dashed line on the x-axis and horizontal dashed lines on the y-axis, respectively. DMSO and butyl acetate are out of the applicability domain in model 1 and model 2, respectively.

One of the reasons for getting some results out of the AD is the limited number of solvents used for model construction. For example, as it is represented in Supporting Information


Table 4. Correlation Coefficient between Parameters of QSPR and LSER Models and Their VIF Values

           Mv     MATS4v  MATS1e  BEHm1   E(SVB)  ΔH0BF3  log γKc  VIF(a)
Mv         1.00                                                    1.758
MATS4v     0.04   1.00                                             1.176
MATS1e     0.27   0.00    1.00                                     3.225
BEHm1      0.36   0.01    0.68    1.00                             3.995
E(SVB)                                    1.00                     1.407
ΔH0BF3                                    0.23    1.00             1.533
log γKc                                   0.01    0.09    1.00     1.191

a Variance inflation factor.
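Both diagnostics used in this part of the paper, the leverages behind the Williams plot and the VIF values of Table 4, reduce to small linear-algebra computations. The sketch below shows one way to obtain them with the Python standard library; it is not the authors' MATLAB code. The helper names are illustrative, the leverage is taken as the diagonal of the hat matrix for a design with an intercept column, the VIF follows eq 3, and the descriptor values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normal_matrix(design):
    """Return X'X for a design matrix given as a list of rows."""
    p = len(design[0])
    return [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]


def leverages(rows):
    """Diagonal of the hat matrix, h_i = x_i (X'X)^-1 x_i', with intercept."""
    design = [[1.0] + list(r) for r in rows]
    xtx = normal_matrix(design)
    return [sum(a * b for a, b in zip(d, solve(xtx, d))) for d in design]


def warning_leverage(n_train, n_params):
    """h* = 3m/n with m = number of parameters plus one."""
    return 3.0 * (n_params + 1) / n_train


def vif(rows):
    """Regress each column on the others and apply VIF = 1 / (1 - R_j^2)."""
    result = []
    for j in range(len(rows[0])):
        target = [r[j] for r in rows]
        design = [[1.0] + [v for k, v in enumerate(r) if k != j] for r in rows]
        xty = [sum(d[i] * t for d, t in zip(design, target))
               for i in range(len(design[0]))]
        coef = solve(normal_matrix(design), xty)
        fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
        mean_t = sum(target) / len(target)
        ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
        ss_tot = sum((t - mean_t) ** 2 for t in target)
        result.append(1.0 / (1.0 - (1.0 - ss_res / ss_tot)))
    return result


# Hypothetical descriptor values for six solvents and two descriptors.
toy_rows = [[0.1, 1.0], [0.4, 0.8], [0.5, 1.9], [0.9, 1.1], [1.2, 2.5], [1.5, 1.7]]
print([round(h, 3) for h in leverages(toy_rows)])
print([round(v, 3) for v in vif(toy_rows)])
print(round(warning_leverage(27, 4), 3), round(warning_leverage(28, 3), 3))
```

With the training-set sizes and parameter counts given in the text, the warning leverage evaluates to 15/27 for model 1 and 12/28 for model 2.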

Table S3, the numerical value of some descriptors of DMSO is different from other solvents. We can overcome this limitation if we use more solvents for model building.

Collinearity and Multicollinearity. Although simplicity and interpretability are remarkable advantages of MLR models, linear independence between the variables of an MLR model is a necessary condition for its accuracy. If the molecular descriptors are mathematically independent of, or orthogonal to, each other, then the magnitudes of the descriptors’ coefficients denote the relative significance of the descriptors, and their signs show their positive or negative contribution to the target property (ferrocene solubility). Collinear descriptors, by contrast, can lead to coefficients that are larger than expected or that have the wrong sign.36 To check this issue, the pair correlations between the independent variables of model 1 and model 2 were examined using the correlation coefficient matrix presented in Table 4. According to this table, neither the QSPR nor the LSER model suffers from pair correlation. On the other hand, multicollinearity between the parameters of each model can also limit the accuracy of the coefficients. The variance inflation factor (VIF), which has been proposed to indicate such multicollinearity,37 was calculated according to eq 3 for the variables of both models; the values are included in the last column of Table 4.

$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \qquad (3)$$

where R2j is the squared multiple correlation coefficient of variable j regressed on the remaining variables. According to the original literature, a VIF larger than a cutoff value of 5.0 is an indicator of serious multicollinearity and shows that the information in variable j may be hidden by other descriptors of the model.37 As is clear from Table 4, the four parameters of model 1 and the three scales of model 2 all have VIFs lower than the cutoff value, and no significant multicollinearity can be detected. According to the results of this part, the signs and magnitudes of the models’ coefficients can be used to interpret the two proposed models, as discussed in the following.

3.4.1. Interpretation of Model 1. On the basis of the absolute values of the standardized regression coefficients of model 1, the relative importance of the variables included in eq 1 is Mv > MATS1e > BEHm1 > MATS4v (see Table S1 in Supporting Information). In this ordering, the most significant descriptor is Mv, while MATS4v is the least important one. Mv belongs to the constitutional indices and is the mean atomic van der Waals volume (scaled on Carbon atom); the positive sign of its coefficient in model 1 shows that solvents with a larger mean atomic van der Waals volume are better able to dissolve ferrocene. The next descriptor in the model is MATS1e, which belongs to the 2D autocorrelation descriptors and denotes the Moran autocorrelation of lag 1 weighted by Sanderson electronegativity.38 On the basis of the positive sign of the coefficient of MATS1e, it can be concluded that the electronegativity of organic solvents, and consequently polar interactions, have a remarkable role in the solvation of ferrocene in organic phases. Thus, increasing the electronegativity of solvents and the polar interactions can increase the solubility of ferrocene. The third descriptor in the model is BEHm1, the “highest eigenvalue (number 1) of Burden matrix weighted by atomic mass”. The negative sign of BEHm1 implies that the mass of the organic solvent has an inverse relationship with the solubility of ferrocene. The last parameter of model 1 is MATS4v, which is again a 2D autocorrelation index and denotes the Moran autocorrelation of lag 4 weighted by atomic van der Waals volume;38 it has a negative coefficient. Because the van der Waals volume shows a positive effect on the solvation of ferrocene through Mv and a negative one through MATS4v, the direction of the effect of atomic volume is complex; however, because Mv ranks higher in importance than MATS4v, a positive net effect of the van der Waals volume of the solvents’ atoms on the solubility of ferrocene is more probable.

3.4.2. Interpretation of Model 2. According to the standardized regression coefficients of model 2 (Supporting Information Table S4), the order of importance of the variables in the three-parameter model of eq 2 is SVB > ΔH0BF3 > log γKc, which is the same as the order in which they entered the model. As noted previously, brief definitions and descriptions of the variables of the two proposed models are presented in Table 2.

SVB or ε is the first-rank parameter in the proposed LSER model for ferrocene solubility and deals with the average equilibrium and chromatographic distribution constants on some SVB (styrene-divinylbenzene) adsorbents.39 The value of this parameter has an almost direct relationship with the branching of the solvent structure.39,21 According to the positive sign of this scale in eq 2, branched and cyclic solvents are better options for dissolving ferrocene than n-hydrocarbons with less structural branching. However, it should be emphasized that eq 2 is a multiparameter model and its final prediction depends on the effect of all the parameters involved. The next parameter in model 2 is ΔH0BF3, which denotes the “enthalpy of complexation of solvents with BF3 in dichloromethane” and is a solvent Lewis basicity scale.40 The negative sign of the coefficient of this parameter shows that increasing the basicity of the solvent decreases the solubility of ferrocene.




The last solvent scale in the proposed LSER model is log γKc and is known as fluorescence quenching rate constant for naphthonitrile-olefin and furan pairs and is related to polarity of solvents.41 In other words, the presence of this parameter in the LSER model shows that polar interactions have a significant role in solubility of ferrocene in organic phases.
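The importance orderings quoted in sections 3.4.1 and 3.4.2 rest on standardized regression coefficients. For an ordinary least-squares model, refitting on autoscaled variables is equivalent to rescaling each raw coefficient by the ratio of the standard deviation of its variable to that of the response, which the following sketch illustrates. The function names and all numerical values are hypothetical; the actual standardized coefficients are those of Tables S1 and S4.

```python
from statistics import stdev


def standardized_coefficients(raw_coefs, columns, y):
    """Rescale raw MLR coefficients: b_std = b * s_x / s_y."""
    s_y = stdev(y)
    return {name: raw_coefs[name] * stdev(columns[name]) / s_y
            for name in raw_coefs}


def rank_by_importance(std_coefs):
    """Order variable names by decreasing absolute standardized coefficient."""
    return sorted(std_coefs, key=lambda n: abs(std_coefs[n]), reverse=True)


# Hypothetical example: x1 has the larger raw coefficient, but x2 varies far
# more across the solvents and therefore dominates after standardization.
columns = {"x1": [0.0, 0.3, 0.1, 0.2], "x2": [0.0, 10.0, 20.0, 30.0]}
raw = {"x1": 2.0, "x2": -0.5}
y = [5.0 + raw["x1"] * a + raw["x2"] * b
     for a, b in zip(columns["x1"], columns["x2"])]
std = standardized_coefficients(raw, columns, y)
print(rank_by_importance(std))
```

The example also shows why raw coefficients alone, such as the small −0.005 on ΔH0BF3 in eq 2, do not indicate importance until the spread of each variable is taken into account.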

AUTHOR INFORMATION

Corresponding Author

*E-mail: yousefi[email protected]. Tel.: +98 917 704 2635.

Notes

The authors declare no competing financial interest.





CONCLUSION

Because ferrocene is a well-known chemical and electrochemical probe, new strategies based on QSPR and on a form of LSER were proposed to predict its solubility in the organic phase. For comparison, the performance statistics of both models are summarized in Table 2. The Q2MP values of both models are nonsignificant,26 which confirms that neither model is a product of chance correlation. Although both proposed models gave very good statistics, the LSER model uses only three parameters and is therefore simpler than the four-parameter QSPR model. Hence, empirical solvent scales were shown to have an outstanding ability to model the solubility of ferrocene. Another advantage of the LSER approach over the proposed QSPR is its ability to describe some aspects of solvent−solute interactions. A further major advantage of empirical solvent parameters over theoretical structural descriptors is their smaller initial pool and, consequently, the lower risk of chance correlation in the variable selection process.21,22

However, an important limitation of building models on empirical scales (LSER) rather than on theoretical descriptors (QSPR) is the difficulty of obtaining the required information for new solvents. For example, in our study we could not obtain the empirical scales of one of the solvents. By contrast, the descriptors in the QSPR approach are calculated theoretically with available software, so this model can easily be applied to predict the solubility of ferrocene in new solvents (even virtual ones that have not yet been examined experimentally).

The combined application of theoretical descriptors and empirical solvent scales provided a successful approach to investigating the solvation of ferrocene in different organic solvents. Model 1 revealed the roles of mean atomic van der Waals volume and atomic mass. On the other hand, the LSER model (model 2) indicated that polar interactions play a significant role in the solubility of ferrocene in the organic phase, whereas the basicity of the solvent has an inverse effect on it.
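As an illustration of the validation ideas summarized above (a small multilinear model, cross-validated variance, and a check against chance correlation), the following minimal Python sketch fits a three-descriptor linear model, computes a leave-one-out Q2, and repeats the calculation on randomly permuted responses. It is not the authors' code. The data are synthetic stand-ins (35 hypothetical solvents with three made-up descriptors), and the function names `fit_mlr`, `q2_loo`, and `y_randomization` are introduced here only for illustration.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted intercept and slopes to a descriptor matrix."""
    return coef[0] + X @ coef[1:]

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop sample i from the fit
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_perm=200, seed=0):
    """Q2 values obtained after shuffling y, i.e. for chance-only models."""
    rng = np.random.default_rng(seed)
    return np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_perm)])

# Synthetic data (assumption): 35 "solvents", 3 hypothetical descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(35, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=35)

q2_real = q2_loo(X, y)
q2_perm = y_randomization(X, y)
print(f"Q2 (real y): {q2_real:.3f}")
print(f"Q2 (permuted y): mean {q2_perm.mean():.3f}, max {q2_perm.max():.3f}")
```

If the Q2 of the real model lies far above every Q2 obtained with permuted responses, the fitted relationship is unlikely to be a chance correlation. The same reasoning suggests why a smaller initial descriptor pool helps: with fewer candidate variables there are fewer opportunities for a permuted response to be fitted by accident.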




REFERENCES

(1) Katritzky, A. R.; Oliferenko, A. A.; Oliferenko, P. V.; Petrukhin, R.; Tatham, D. B.; Maran, U.; Lomaka, A.; Acree, W. E. A General Treatment of Solubility. 1. The QSPR Correlation of Solvation Free Energies of Single Solutes in Series of Solvents. J. Chem. Inf. Model. 2003, 43, 1794−1805.
(2) Katritzky, A. R.; Oliferenko, A. A.; Oliferenko, P. V.; Petrukhin, R.; Tatham, D. B.; Maran, U.; Lomaka, A.; Acree, W. E. A General Treatment of Solubility. 2. QSPR Prediction of Free Energies of Solvation of Specified Solutes in Ranges of Solvents. J. Chem. Inf. Model. 2003, 43, 1806−1814.
(3) Katritzky, A. R.; Tulp, I.; Fara, D. C.; Lauria, A.; Maran, U.; Acree, W. E. A General Treatment of Solubility. 3. Principal Component Analysis (PCA) of the Solubilities of Diverse Solutes in Diverse Solvents. J. Chem. Inf. Model. 2005, 45, 913−923.
(4) Yousefinejad, S.; Honarasa, F.; Montaseri, H. Linear Solvent Structure-Polymer Solubility and Solvation Energy Relationships to Study Conductive Polymer/carbon Nanotube Composite Solutions. RSC Adv. 2015, 5, 42266−42275.
(5) Conroy, D.; Moisala, A.; Cardoso, S.; Windle, A.; Davidson, J. Carbon Nanotube Reactor: Ferrocene Decomposition, Iron Particle Growth, Nanotube Aggregation and Scale-Up. Chem. Eng. Sci. 2010, 65, 2965−2977.
(6) Liu, W.; Xu, Q.; Ma, Y.; Liang, Y.; Dong, N.; Guan, D. Solvent-Free Synthesis of Ferrocenylethene Derivatives. J. Organomet. Chem. 2001, 625, 128−131.
(7) Top, S.; Vessières, A.; Leclercq, G.; Quivy, J.; Tang, J.; Vaissermann, J.; Huché, M.; Jaouen, G. Synthesis, Biochemical Properties and Molecular Modelling Studies of Organometallic Specific Estrogen Receptor Modulators (SERMs), the Ferrocifens and Hydroxyferrocifens: Evidence for an Antiproliferative Effect of Hydroxyferrocifens on Both Hormone-Depen. Chem. - Eur. J. 2003, 9, 5223−5236.
(8) Chao, T. S.; Owston, E. H. Iron-Containing Motor Fuel Compositions and Method for Using Same. US4104036A, 1978.
(9) Torres-Gómez, L. A.; Barreiro-Rodríguez, G.; Méndez-Ruíz, F. Vapour Pressures and Enthalpies of Sublimation of Ferrocene, Cobaltocene and Nickelocene. Thermochim. Acta 1988, 124, 179−183.
(10) Da Silva, M. A. V. R.; Monte, M. J. S. The Construction, Testing and Use of a New Knudsen Effusion Apparatus. Thermochim. Acta 1990, 171, 169−183.
(11) Brisset, J. L. Solubilities of Some Electrolytes in Water-Pyridine and Water-Acetonitrile Solvent Mixtures. J. Chem. Eng. Data 1982, 27, 153−155.
(12) Wu, J.-S.; Toda, K.; Tanaka, A.; Sanemasa, I. Association Constants of Ferrocene with Cyclodextrins in Aqueous Medium Determined by Solubility Measurements of Ferrocene. Bull. Chem. Soc. Jpn. 1998, 71, 1615−1618.
(13) Gagne, R. R.; Koval, C. A.; Lisensky, G. C. Ferrocene as an Internal Standard for Electrochemical Measurements. Inorg. Chem. 1980, 19, 2854−2855.
(14) Zara, A. J.; Machado, S. S.; Bulhões, L. O. S.; Benedetti, A. V.; Rabockai, T. The Electrochemistry of Ferrocene in Non-Aqueous Solvents. J. Electroanal. Chem. Interfacial Electrochem. 1987, 221, 165−174.
(15) Tsierkezos, N. G. Cyclic Voltammetric Studies of Ferrocene in Nonaqueous Solvents in the Temperature Range from 248.15 to 298.15 K. J. Solution Chem. 2007, 36, 289−302.
(16) Cowey, C. M.; Bartle, K. D.; Burford, M. D.; Clifford, A. A.; Zhu, S.; Smart, N. G.; Tinker, N. D. Solubility of Ferrocene and a Nickel Complex in Supercritical Fluids. J. Chem. Eng. Data 1995, 40, 1217−1221.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jced.5b00768. The Supporting Information of this article contains six tables: the MLR coefficients of model 1 and their t- and p-values (Table S1), the statistical parameters of the proposed QSPR model using three different randomly selected training and test sets (Table S2), the numerical values of the four parameters in model 1 (Table S3), the MLR coefficients of model 2 and their t- and p-values (Table S4), the statistical parameters of the LSER model using two different randomly selected training and test sets (Table S5), and the numerical values of the three empirical parameters in model 2 (Table S6) (PDF).

DOI: 10.1021/acs.jced.5b00768 J. Chem. Eng. Data XXXX, XXX, XXX−XXX

Journal of Chemical & Engineering Data


(17) Abraham, M.; Benjelloun-Dakhama, N.; Gola, J.; Acree, W.; Cain, W.; Cometto-Muniz, J. Solvation Descriptors for Ferrocene, and the Estimation of Some Physicochemical and Biochemical Properties. New J. Chem. 2000, 24, 825−829.
(18) Da̧browski, M.; Misterkiewicz, B.; Sporzyński, A. Solubilities of Substituted Ferrocenes in Organic Solvents. J. Chem. Eng. Data 2001, 46, 1627−1631.
(19) Katritzky, A. R.; Fara, D. C.; Yang, H.; Tämm, K.; Tamm, T.; Karelson, M. Quantitative Measures of Solvent Polarity. Chem. Rev. 2004, 104, 175−198.
(20) Katritzky, A. R.; Fara, D. C.; Kuanar, M.; Hur, E.; Karelson, M. The Classification of Solvents by Combining Classical QSPR Methodology with Principal Component Analysis. J. Phys. Chem. A 2005, 109, 10323−10341.
(21) Yousefinejad, S.; Honarasa, F.; Abbasitabar, F.; Arianezhad, Z. New LSER Model Based on Solvent Empirical Parameters for the Prediction and Description of the Solubility of Buckminsterfullerene in Various Solvents. J. Solution Chem. 2013, 42, 1620−1632.
(22) Yousefinejad, S.; Hemmateenejad, B. A Chemometrics Approach to Predict the Dispersibility of Graphene in Various Liquid Phases Using Theoretical Descriptors and Solvent Empirical Parameters. Colloids Surf., A 2014, 441, 766−775.
(23) Katritzky, A. R.; Tamm, T.; Wang, Y.; Sild, S.; Karelson, M. QSPR Treatment of Solvent Scales. J. Chem. Inf. Model. 1999, 39, 684−691.
(24) Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. Dragon Software: An Easy Approach to Molecular Descriptor Calculations. MATCH Commun. Math. Comput. Chem. 2006, 56, 237−248.
(25) Yousefinejad, S.; Honarasa, F.; Saeed, N. Quantitative Structure-Retardation Factor Relationship of Protein Amino Acids in Different Solvent Mixtures for Normal-Phase Thin-Layer Chromatography. J. Sep. Sci. 2015, 38, 1771−1776.
(26) Yousefinejad, S.; Hemmateenejad, B. Chemometrics Tools in QSAR/QSPR Studies: A Historical Perspective. Chemom. Intell. Lab. Syst. 2015, DOI: 10.1016/j.chemolab.2015.06.016.
(27) Honarasa, F.; Yousefinejad, S.; Nasr, S.; Nekoeinia, M. Structure−electrochemistry Relationship in Non-Aqueous Solutions: Predicting the Reduction Potential of Anthraquinones Derivatives in Some Organic Solvents. J. Mol. Liq. 2015, 212, 52−57.
(28) Hawkins, D. M.; Basak, S. C.; Mills, D. Assessing Model Fit by Cross-Validation. J. Chem. Inf. Model. 2003, 43, 579−586.
(29) Tetko, I. V.; Livingstone, D. J.; Luik, A. I. Neural Network Studies. 1. Comparison of Overfitting and Overtraining. J. Chem. Inf. Model. 1995, 35, 826−833.
(30) Rücker, C.; Rücker, G.; Meringer, M. Y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345−2357.
(31) Todeschini, R.; Consonni, V.; Gramatica, P. Chemometrics in QSAR. In Comprehensive Chemometrics: Chemical and Biochemical Data Analysis; Tauler, R., Walczak, B., Brown, S. D., Eds.; Elsevier B.V.: Amsterdam, 2009.
(32) Kamlet, M. J.; Taft, R. W. Linear Solvation Energy Relationships. Part 1. Solvent Polarity-Polarizability Effects on Infrared Spectra. J. Chem. Soc., Perkin Trans. 2 1979, No. 3, 337.
(33) Taft, R. W.; Abboud, J.-L. M.; Kamlet, M. J.; Abraham, M. H. Linear Solvation Energy Relations. J. Solution Chem. 1985, 14, 153−186.
(34) Practical Guide to Chemometrics, 2nd ed.; Gemperline, P., Ed.; Taylor & Francis Group: Boca Raton, 2006.
(35) Netzeva, T. I.; Worth, A. P.; Aldenberg, T.; Benigni, R.; Cronin, M. T. D.; Gramatica, P.; Jaworska, J. S.; Kahn, S.; Klopman, G.; A, C.; Myatt, G.; Nikolova-Jeliazkova, N.; Patlewicz, G. Y.; Perkins, R. Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure−Activity Relationships. ATLA 2005, 2, 1−19.
(36) Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs. Environ. Health Perspect. 2003, 111, 1361−1375.

(37) Craney, T. A.; Surles, J. G. Model-Dependent Variance Inflation Factor Cutoff Values. Qual. Eng. 2002, 14, 391−403.
(38) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics, 2nd ed.; Wiley-VCH: Weinheim, 2009.
(39) Robinson, J. L.; Robinson, W. J.; Marshall, M. A.; Barnes, A. D.; Johnson, K. J.; Salas, D. S. Liquid-Solid Chromatography on Amberlite XAD-2 and Other Styrene-Divinylbenzene Adsorbents. J. Chromatogr. A 1980, 189, 145−167.
(40) Maria, P. C.; Gal, J. F. A Lewis Basicity Scale for Nonprotogenic Solvents: Enthalpies of Complex Formation with Boron Trifluoride in Dichloromethane. J. Phys. Chem. 1985, 89, 1296−1304.
(41) Pac, C.; Yasuda, M.; Shima, K.; Sakurai, H. Photochemical Reactions of Aromatic Compounds. XXXVII. Solvent Effects on Fluorescence Quenching and Photoreactions of Naphthonitrile-Olefin and Furan Systems. Qualitative Consideration on Electronic Structures and Reactivities of Nonemissive Exciplexes. Bull. Chem. Soc. Jpn. 1982, 55, 1605−1616.
