
Article pubs.acs.org/IECR

Quantitative Structure−Property Relationship Prediction of Liquid Heat Capacity at 298.15 K for Organic Compounds

Aboozar Khajeh and Hamid Modarress*

Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Hafez Avenue, 15914 Tehran, Iran

Supporting Information

ABSTRACT: Novel QSPR models with only three descriptors were developed and evaluated for the prediction of the heat capacity of liquids at 298.15 K. A linear model and a nonlinear model were produced using the genetic function approximation (GFA) and adaptive neurofuzzy inference system (ANFIS) methods, respectively, based on a training set of 706 compounds with a wide variety of functional groups. The results showed that both the GFA and ANFIS methods could model the relationship between the liquid heat capacity of organic compounds and their structures with high accuracy. The predictive quality of the QSPR models was tested on an external test set, where the squared correlation coefficients of prediction for the GFA and ANFIS methods were 0.970 and 0.973, respectively.

1. INTRODUCTION

The liquid heat capacity is an important fundamental physicochemical property of organic compounds. It is of great importance in chemical engineering, mainly for establishing energy balances and for evaluating phase equilibria, reaction yields, and separation ratios. The heat capacities of liquids can be measured by various methods such as differential scanning calorimetry (DSC),1 scanning transitiometry,2 the hot-wire method,3 and temperature oscillation calorimetry.4 The methods for determining the heat capacities of liquids have been reviewed elsewhere.5,6 Considering the importance of the liquid heat capacity and the complexity of its measurement, the development of convenient and reliable theoretical methods to predict the heat capacity of new compounds without experimentation is both urgent and practical. Several investigators have proposed empirical equations for calculating the liquid heat capacity as a function of temperature and pressure.7,8 Group contribution (GC) is another method commonly used for estimating the heat capacity of pure liquids at various temperatures.9−11 Heat capacity has also been estimated on the basis of equation-of-state theory and the residual function method.12 Moreover, quantitative structure−property relationship (QSPR) models have been developed for predicting the liquid heat capacity of pure compounds.13−17 The basic idea of QSPR is to find a relationship between the structure of molecules and their physicochemical properties. One of the most important factors influencing the quality of a QSPR model is the computational method used to build it. Many different computational methods, such as multiple linear regression (MLR),18,19 partial least-squares analysis (PLS),20,21 multilayer perceptron (MLP) neural networks,21,22 radial basis function (RBF) neural networks,23,24 and support vector machines (SVM),25,26 have been used in QSPR models.
Genetic function approximation (GFA) and adaptive neurofuzzy inference system (ANFIS) are two alternative computational methods that have been used successfully to develop QSPR models.27−31 GFA is a powerful technique that, by combining a genetic algorithm with statistical modeling tools, produces a population of statistically valid regression equations to get the best fit of the response data. ANFIS is a flexible approach based on fuzzy logic and an artificial neural network that combines the advantages of both neural and fuzzy inference. It has been successfully applied to modeling complex nonlinear systems.32,33

In this work, we present new QSPR models, containing only three descriptors, for predicting liquid heat capacity values for a diverse set of organic compounds. A genetic function approximation (GFA) procedure was used for selecting the descriptors and developing a linear model. With the use of a hybrid subtractive clustering ANFIS, the nonlinear behavior of these selected molecular descriptors in predicting the heat capacity of liquids was studied. Moreover, the resulting models were validated using external validation, and their applicability domain was inspected.

© 2012 American Chemical Society

Received: September 20, 2011
Revised: March 7, 2012
Accepted: April 5, 2012
Published: April 5, 2012
Industrial & Engineering Chemistry Research | dx.doi.org/10.1021/ie202153e | Ind. Eng. Chem. Res. 2012, 51, 6251−6255

2. MATERIALS AND METHODS

2.1. Data Set. To propose a comprehensive and predictive model, 883 organic compounds from various chemical families, together with their liquid heat capacities, were collected from Yaws’ handbook.34 The heat capacity values of these compounds range from 79.93 to 707.88 J/(mol K) at 298.15 K. The names of the compounds used in this study, together with their heat capacity values, are listed in the Supporting Information.

2.2. Molecular Descriptors. More than 1000 molecular descriptors were calculated for each molecule using the program Dragon Web developed by the Milano Chemometrics and QSAR research group.35 These descriptors can be categorized as constitutional descriptors, topological descriptors, connectivity indices, information indices, 2D autocorrelations, Burden eigenvalues descriptors, eigenvalue-based indices, geometrical descriptors, WHIM descriptors, GETAWAY descriptors, functional group counts, atom-centered fragments, and molecular properties.

2.3. Genetic Function Approximation. GFA is a genetic-based algorithm for variable selection, developed by Rogers and Hopfinger,36 that combines Friedman’s multivariate adaptive regression splines (MARS) algorithm37 and Holland’s genetic algorithm.38 GFA evolves a population of equations that correlate best with the responses. The major advantages of this approach are that it produces a population of models rather than a single model, estimates the most appropriate number of features, resists overfitting, and allows control over the smoothness of fit. GFA works in the following way: (a) a particular number of equations (e.g., 100) are generated by a random choice of descriptors; (b) pairs of parents are selected from the present population of equations, with probabilities proportional to their fitness; (c) crossovers are performed at randomly chosen points within the equations to generate progeny equations combining the characteristics of both parents; (d) the goodness of each progeny equation is assessed by various scores such as $R^2$, adjusted $R^2$, and Friedman’s lack of fit (LOF); (e) the new progeny equation with better fitness is preserved. In this work the following equation is used for the LOF score:

$$\mathrm{LOF} = \frac{\mathrm{SSE}}{\left(1 - \dfrac{c + dp}{n}\right)^{2}} \tag{1}$$

where SSE is the sum of squares of errors, c is the number of basis functions (other than the constant term), d is a user-defined smoothness factor, p is the number of features in the model, and n is the number of data points from which the model is built.

2.4. Adaptive Neurofuzzy Inference System. ANFIS, first introduced by Jang in 1993,39 is a class of adaptive networks that is functionally equivalent to a fuzzy inference system. A fuzzy inference system includes four steps: (1) fuzzification of the input variables, (2) evaluation of the output for each rule, (3) aggregation of the rules’ outputs, and (4) defuzzification, which can be done by different approaches. By using a hybrid learning procedure, ANFIS can determine the fuzzy inference parameters and construct an input−output mapping based on a collection of input−output data. ANFIS is based on the Takagi−Sugeno−Kang (TSK)40 inference model. In a TSK model with M fuzzy if−then rules, each having p antecedents, the ith rule can be expressed as

Rule i: If $x_1$ is $F_1^i$ and ... and $x_p$ is $F_p^i$, then

$$y^{i}(X) = c_0^i + c_1^i x_1 + c_2^i x_2 + \dots + c_p^i x_p = C^{i}X \tag{2}$$

where i = 1, 2, ..., M; $c_k^i$ (k = 0, 1, ..., p) are the consequent parameters; $y^i(X)$ is the output of the ith rule; and $F_k^i$ (k = 1, 2, ..., p) are fuzzy sets. The overall output, y(X), of the model is obtained by combining the outputs from the M rules in the following prescribed way:

$$y(X) = \frac{\sum_{i=1}^{M} f^{i}(X)\, y^{i}(X)}{\sum_{i=1}^{M} f^{i}(X)} = \frac{\sum_{i=1}^{M} f^{i}(X)\left(c_0^i + c_1^i x_1 + \dots + c_p^i x_p\right)}{\sum_{i=1}^{M} f^{i}(X)} \tag{3}$$

where the $f^i(X)$ are the rule firing levels (strengths), defined as

$$f^{i}(X) = T_{k=1}^{p}\, \mu_{F_k^i}(x_k) \tag{4}$$

in which T denotes a t-norm, usually a minimum or product.

The ANFIS structure contains five layers, described below for two inputs, x and y, and two rules. In the first layer, all the nodes are adjustable nodes. They generate the fuzzy membership grades of the inputs, and the outputs of this layer are given by

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \tag{5}$$

$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{6}$$

where $\mu_{A_i}(x)$ and $\mu_{B_{i-2}}(y)$ can adopt any fuzzy membership function. For example, if the bell-shaped membership function is employed, $\mu_{A_i}(x)$ is given by

$$\mu_{A_i}(x) = \frac{1}{1 + \left[\left(\dfrac{x - c_i}{a_i}\right)^{2}\right]^{b_i}} \tag{7}$$

where $a_i$, $b_i$, and $c_i$ are the parameters of the membership function, which govern the shape of the bell function accordingly. The second layer, consisting of fixed nodes, represents the t-norm operators that combine the possible input membership grades in order to compute the firing strength of each rule. The outputs of this layer are given by

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{8}$$

which are the so-called firing strengths of the rules. The third layer implements a normalization function, and the outputs of this layer can be represented as

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{9}$$

In the fourth layer, the nodes are adjustable nodes and every node i has the following function:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x + q_i y + r_i), \quad i = 1, 2 \tag{10}$$

where $\bar{w}_i$ is the output of the third layer and {$p_i$, $q_i$, $r_i$} is the parameter set. The fifth layer represents the aggregation of the outputs, performed by weighted summation. The output is computed as

$$O_{5} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{w_1 + w_2} \tag{11}$$

ANFIS uses a hybrid learning algorithm to train the network according to input−output data pairs. The hybrid algorithm is divided into a forward pass and a backward pass. The forward pass of the learning algorithm stops at the nodes in layer 4, and the consequent parameters are identified by the least-squares method. In the backward pass, the error signals propagate backward and the premise parameters are updated by gradient descent. This hybrid algorithm has been shown to be highly efficient in training the ANFIS.39

3. RESULTS AND DISCUSSION

In this work the analysis begins by partitioning the data set randomly into two parts: a representative training set of 706 compounds to build and validate a QSPR model and a test set to validate the external predictive ability of the derived model. The test set, including 177 compounds, consists of a wide variety

of functional groups. After that, we applied the GFA method, a powerful feature selection method, to the training set to select the optimal set of descriptors. The optimum descriptor subset size was chosen as 3: adding more descriptors increased the $R^2$ value by less than 0.01 and so did not improve the statistics of the derived model to any significant degree. On the basis of the training data set, the following linear equation was derived by using the GFA method (10 000 iterations, LOF score, population size of 50, 50% mutation probability):

$$C_p = 17.093873889 \times \mathrm{S1K} + 7.905409310 \times \mathrm{Sv} + 38.997421023 \times \mathrm{nROH} + 14.131049814 \tag{12}$$

S1K belongs to the Kier α-modified shape descriptors; it represents paths of order 1 and encodes information about the count of atoms and the relative cyclicity of molecules.41 This descriptor, and therefore the heat capacity, increases with increasing molecular size and decreases with increasing cyclicity. Sv is the sum of atomic van der Waals volumes scaled on the carbon atom. In eq 12 the sign of the coefficient of the Sv descriptor is positive; as a result, the heat capacity of liquids increases with molecular volume. Finally, the nROH descriptor is a functional group count that measures the number of hydroxyl groups in a molecule. The positive coefficient of this descriptor indicates a large contribution of hydroxyl groups to increasing the heat capacity of liquids. The molecular descriptors and their physical meanings are presented in Table 1.

Table 1. The Three Molecular Descriptors Used in Eq 12

| ID | molecular descriptor | type | definition |
|---|---|---|---|
| 1 | S1K | topological descriptors | one-path Kier α-modified shape index |
| 2 | Sv | constitutional descriptors | sum of atomic van der Waals volumes (scaled on carbon atom) |
| 3 | nROH | functional group counts | number of hydroxyl groups |

The hybrid subtractive clustering ANFIS method42 was also employed to describe the nonlinear relation between the heat capacity of liquids and the selected descriptors. By using the same three descriptors and the same training samples used for GFA modeling, the optimal ANFIS model was obtained; the corresponding results are given in the Supporting Information. All ANFIS calculations were carried out using Matlab mathematical software with the fuzzy logic toolbox for Windows running on a personal computer, while the GFA model was derived by using Materials Studio software by Accelrys Inc. The performance of the proposed QSPR models was evaluated in terms of the squared correlation coefficient ($R^2$), the root mean squares error (RMSE), the average relative deviation (ARD%), and the slopes of the regression lines of the predicted vs experimental values (k and k′)43 according to the equations below:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right)^{2}}{\sum_{i=1}^{n}\left(y_i^{\mathrm{exp}} - \bar{y}\right)^{2}} \tag{13}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right)^{2}}{n}} \tag{14}$$

$$\mathrm{ARD\%} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i^{\mathrm{exp}} - y_i^{\mathrm{calc}}\right|}{y_i^{\mathrm{exp}}} \times 100 \tag{15}$$

$$k = \frac{\sum y_i^{\mathrm{exp}}\, y_i^{\mathrm{calc}}}{\sum \left(y_i^{\mathrm{calc}}\right)^{2}} \tag{16}$$

$$k' = \frac{\sum y_i^{\mathrm{exp}}\, y_i^{\mathrm{calc}}}{\sum \left(y_i^{\mathrm{exp}}\right)^{2}} \tag{17}$$

where $y_i^{\mathrm{exp}}$, $y_i^{\mathrm{calc}}$, and $\bar{y}$ are the experimental values, the calculated values, and the average of the calculated values, respectively, and n is the number of compounds in the data set. The external predictability of the QSPR models was determined on the test set using the above statistical parameters, and the results are presented in Table 2. The closeness of $R^2$, k, and k′ to unity and the low RMSE and ARD% values obtained from the two methods suggest that the presented QSPR models can be used to predict the heat capacity of liquids at 298.15 K with high accuracy. As can be seen from the table, the results of the ANFIS method are slightly better than those of the GFA method for both the training and test sets.

Table 2. Results for Prediction of Liquid Heat Capacity at 298.15 K Obtained by GFA and ANFIS Methods

| data set | GFA RMSE | GFA R2 | GFA ARD% | GFA k | GFA k′ | ANFIS RMSE | ANFIS R2 | ANFIS ARD% | ANFIS k | ANFIS k′ |
|---|---|---|---|---|---|---|---|---|---|---|
| training set | 16.718 | 0.96261 | 5.07 | 1.0000 | 0.9960 | 16.137 | 0.96516 | 4.77 | 1.0000 | 0.9963 |
| test set | 17.574 | 0.96945 | 5.66 | 1.0092 | 0.9868 | 16.632 | 0.97263 | 5.22 | 1.0084 | 0.9880 |
| total | 16.893 | 0.96434 | 5.19 | 1.0019 | 0.9941 | 16.238 | 0.96705 | 4.86 | 1.0017 | 0.9946 |

A comparison between this work and other works based on molecular structures for predicting the liquid heat capacity of organic compounds is made in Table 3. Although the group contribution (GC) based models are more accurate, the number of parameters in our proposed QSPR models is lower than in the group contribution models. Moreover, the results of the QSPR models are more interpretable than those of the GC models because the descriptors selected for the QSPR models have definite physical meanings. A comparison between our QSPR models and the previously presented QSPR model13 also shows that the results of this work are more accurate, while the number of descriptors is 3 in our models instead of 6 in that model.

The applicability domains (ADs) of the QSPR models were analyzed in the plot of the standardized residuals versus the leverage values (the Williams plot). The standardized residual of a compound is calculated as follows:

$$sr_i = \frac{r_i - m_p}{sd} \tag{18}$$

where $r_i$ is the residual of each compound, $m_p$ is the mean value of the residuals, and sd is the standard deviation of the residuals. The leverage ($h_i$) value of a compound is defined as

$$h_i = x_i^{T}\left(X^{T}X\right)^{-1}x_i \quad (i = 1, \dots, n) \tag{19}$$

where $x_i$ is the descriptor vector of the considered compound and X is the n × k matrix containing the k descriptor values for each one of the n training compounds. From Figure 1 it can be seen that the majority of compounds are located within the applicability domain (i.e., have a hat value smaller than h*) and are predicted accurately. A leverage (h) greater than the critical hat value (h* = 3k/n, where k is the number of model parameters and n is the number of compounds) suggests that the compound is influential on the model. The high leverage values are related to large numbers of hydroxyl groups or large sums of atomic van der Waals volumes; for example, glycerol has the highest leverage value and is the only compound with three hydroxyl groups, and trioctylamine has the second-highest leverage value, with the highest value of Sv. The third-highest leverage value belongs to hexafluoroacetone, for which the Sv value is greater than the S1K value. As can be seen from Figure 1, most of the compounds with high leverage values have small residuals; such compounds are so-called good influence points, which stabilize the model and make it more precise. It can also be seen that most of the samples that are response outliers (i.e., compounds with standardized residuals greater than three standard deviation units, >3 sd) have low leverage values. We believe that the inaccurate predictions can probably be attributed to erroneous experimental data rather than to the molecular structures or to the kind of method (GFA or ANFIS) used for model development.

Figure 1. Williams plot describing the applicability domain of the GFA model (h* = 0.017).

Table 3. Comparison between the Presented Models and Previous Models

| no. | model | ARD% | RMSE | Ncomponent | temp (K) | method |
|---|---|---|---|---|---|---|
| 1 | ref 9 | 1.5 | NA(a) | 404 | 298.15 | GC(b) |
| 2 | ref 10 | 2.6 | NA | 1395 | 298.15−423.15 | GC |
| 3 | ref 11 | 2.8 | NA | 595 | 250−426 | GC |
| 4 | ref 13 | NA | 17.141 | 871 | 298.15 | QSPR |
| 5 | this work (GFA) | 5.19 | 16.893 | 883 | 298.15 | QSPR |
| 6 | this work (ANFIS) | 4.86 | 16.238 | 883 | 298.15 | QSPR |

(a) Not available. (b) Group contribution.

4. CONCLUSION

In the present work, three descriptors were selected by the GFA method from a set of more than 1000 molecular descriptors, on the basis of a data set of 706 common organic compounds, and a linear model for the prediction of liquid heat capacity at 298.15 K was constructed. The three descriptors selected by the GFA were then used to develop a nonlinear model by the ANFIS method. Besides giving high accuracy and a significant correlation between molecular structure and the liquid heat capacity, the selected descriptors have explicit physical meanings and show that the liquid heat capacity depends on the size, the cyclicity, and the number of hydroxyl groups of a molecule. Since the highly accurate proposed models cover a diverse structural space of compounds, they can be used for the prediction of liquid heat capacity for any desired organic chemical structure.

ASSOCIATED CONTENT

Supporting Information

Experimental and predicted values for training and test sets as well as corresponding QSPR descriptors. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Tel: +98 21 64543176. E-mail: [email protected].

Notes

The authors declare no competing financial interest.
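The validation quantities used in this work can be illustrated with a short NumPy sketch. It is not the authors' implementation (the models were built with Materials Studio and Matlab); it only evaluates eq 12, the statistics of eqs 13−17, the standardized residuals of eq 18, and the leverages of eq 19. The function names, the descriptor values in the demo, the use of the sample standard deviation in eq 18, and the choice k = 4 (three descriptors plus the intercept) are assumptions made for illustration.

```python
import numpy as np

# Coefficients of eq 12 as printed in the text, in the order S1K, Sv, nROH.
EQ12_COEFFS = np.array([17.093873889, 7.905409310, 38.997421023])
EQ12_INTERCEPT = 14.131049814


def predict_cp(descriptors):
    """Eq 12: heat capacity in J/(mol K) from rows of [S1K, Sv, nROH]."""
    return np.asarray(descriptors, dtype=float) @ EQ12_COEFFS + EQ12_INTERCEPT


def validation_stats(y_exp, y_calc):
    """Eqs 13-17 for experimental and calculated values."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    resid = y_exp - y_calc
    # The text defines y-bar as the average of the calculated values;
    # many references use the experimental mean here instead.
    y_bar = y_calc.mean()
    return {
        "R2": 1.0 - np.sum(resid**2) / np.sum((y_exp - y_bar) ** 2),
        "RMSE": np.sqrt(np.mean(resid**2)),
        "ARD%": 100.0 * np.mean(np.abs(resid) / y_exp),
        "k": np.sum(y_exp * y_calc) / np.sum(y_calc**2),
        "k_prime": np.sum(y_exp * y_calc) / np.sum(y_exp**2),
    }


def standardized_residuals(y_exp, y_calc):
    """Eq 18; the sample standard deviation (ddof=1) is an assumption."""
    r = np.asarray(y_exp, dtype=float) - np.asarray(y_calc, dtype=float)
    return (r - r.mean()) / r.std(ddof=1)


def leverages(X_train, X_query=None):
    """Eq 19, with one row per compound and one column per model parameter."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = X_train if X_query is None else np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    # Row-wise quadratic form x_i^T (X^T X)^-1 x_i.
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def critical_leverage(k, n):
    """Warning leverage h* = 3k/n."""
    return 3.0 * k / n


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    demo = [[5.0, 6.0, 1.0], [8.0, 9.5, 0.0]]
    print(predict_cp(demo))
    print(critical_leverage(4, 706))
```

With k = 4 and n = 706, `critical_leverage` returns 12/706 ≈ 0.017, which matches the h* value quoted in the caption of Figure 1.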

REFERENCES

(1) Straka, M.; Ruzicka, K.; Ruzicka, V. Heat capacities of chloroanilines and chloronitrobenzenes. J. Chem. Eng. Data 2007, 52, 1375−1380.
(2) Randzio, S. L. Transitiometry: Towards a global virtual instrument control and a virtual link between experiment and modeling. Thermochim. Acta 2000, 355, 107−113.
(3) Giaretto, V.; Torchio, M. F. Two-wire solution for measurement of the thermal conductivity and specific heat capacity of liquids: Experimental design. Int. J. Thermophys. 2004, 25, 679−699.
(4) Richner, G.; Neuhold, Y. M.; Papadokonstantakis, S.; Hungerbuhler, K. Temperature oscillation calorimetry for the determination of the heat capacity in a small-scale reactor. Chem. Eng. Sci. 2008, 63, 3755−3765.
(5) Wilhelm, E.; Letcher, T. Heat Capacities: Liquids, Solutions and Vapours; Royal Society of Chemistry: Cambridge, 2010.
(6) Zabransky, M.; Ruzicka, V., Jr.; Majer, V.; Domalski, E. S. Heat Capacity of Liquids. Vols. 1 and 2. Critical Review and Recommended Values; American Institute of Physics: Woodbury, NY, 1996.
(7) Garg, S. K.; Banipal, T. S.; Ahluwalia, J. C. Heat capacities and densities of liquid o-xylene, m-xylene, p-xylene, and ethylbenzene, at temperatures from 318.15 to 373.15 K and at pressures up to 10 MPa. J. Chem. Thermodyn. 1993, 25, 57−62.
(8) Jovanovic, J. D.; Knezevic-Stevanovic, A. B.; Grozdanic, D. K. An empirical equation for temperature and pressure dependence of liquid heat capacity. J. Taiwan Inst. Chem. E. 2009, 40, 105−109.
(9) Kolska, Z.; Kukal, J.; Zabransky, M.; Ruzicka, V. Estimation of the heat capacity of organic liquids as a function of temperature by a three-level group contribution method. Ind. Eng. Chem. Res. 2008, 47, 2075−2085.
(10) Ceriani, R.; Gani, R.; Meirelles, A. J. A. Prediction of heat capacities and heats of vaporization of organic liquids by group contribution methods. Fluid Phase Equilib. 2009, 283, 49−55.
(11) Valderrama, J. O.; Toro, A.; Rojas, R. E. Prediction of the heat capacity of ionic liquids using the mass connectivity index and a group contribution method. J. Chem. Thermodyn. 2011, 43, 1068−1073.
(12) He, M. G.; Yang, Y. J.; Zhang, Y.; Zhang, X. X. Theoretical estimation of the isobaric heat capacity Cp of refrigerant. Appl. Therm. Eng. 2008, 28, 1813−1825.
(13) Yao, X. J.; Fan, B.; Doucet, J. P.; Panaye, A.; Liu, M.; Zhang, R.; Zhang, X.; Hu, Z. Quantitative structure property relationship models for the prediction of liquid heat capacity. QSAR Comb. Sci. 2003, 22, 29.
(14) Ashrafi, F.; Saadati, R.; Behboodi Amlashi, A. Modeling and theoretical calculation of liquid heat capacity of alcohols and aldehydes using QSPR. Afr. J. Pure Appl. Chem. 2008, 2, 116−120.
(15) Roy, K.; Saha, A. QSPR with TAU indices: Part 5. Liquid heat capacity of diverse functional organic compounds. J. Indian Chem. Soc. 2006, 83, 351−355.
(16) Gardas, R. L.; Ge, R.; Goodrich, P.; Hardacre, C.; Hussain, A.; Rooney, D. W. Thermophysical properties of amino acid-based ionic liquids. J. Chem. Eng. Data 2010, 55, 1505−1515.
(17) Golovanov, I. B.; Zhenodarova, S. M.; Smolyaninova, O. A. Quantitative structure−property relationship: XIII.1 Properties of aliphatic alcohols. Russ. J. Gen. Chem. 2003, 73, 519−524.
(18) Katritzky, A. R.; Kuanar, M.; Stoyanova-Slavova, I. B.; Slavov, S. H.; Dobchev, D. A.; Karelson, M.; Acree, W. E. Quantitative structure−property relationship studies on Ostwald solubility and partition coefficients of organic solutes in ionic liquids. J. Chem. Eng. Data 2008, 53, 1085−1092.
(19) Liu, J. P.; Wilding, W. V.; Giles, N. F.; Rowley, R. L. A quantitative structure property relation correlation of the dielectric constant for organic chemicals. J. Chem. Eng. Data 2010, 55, 41−45.
(20) Li, L.; Xie, S.; Cai, H.; Bai, X.; Xue, Z. Quantitative structure−property relationships for octanol−water partition coefficients of polybrominated diphenyl ethers. Chemosphere 2008, 72, 1602−1606.
(21) Golmohammadi, H.; Dashtbozorgi, Z. Quantitative structure−property relationship studies of gas-to-wet butyl acetate partition coefficient of some organic compounds using genetic algorithm and artificial neural network. Struct. Chem. 2010, 21, 1241−1252.
(22) Fatemi, M. H.; Karimian, F. Prediction of micelle−water partition coefficient from the theoretical derived molecular descriptors. J. Colloid Interface Sci. 2007, 314, 665−672.
(23) Modarresi, H.; Modarress, H.; Dearden, J. C. QSPR model of Henry’s law constant for a diverse set of organic chemicals based on genetic algorithm-radial basis function network approach. Chemosphere 2007, 66, 2067−2076.
(24) Tetteh, J.; Suzuki, T.; Metcalfe, E.; Howells, S. Quantitative structure−property relationships for the estimation of boiling point and flash point using a radial basis function neural network. Ind. Eng. Chem. Res. 2009, 48, 7378−7387.
(25) Nantasenamat, C.; Isarankura-Na-Ayudhya, C.; Naenna, T.; Prachayasittikul, V. Prediction of bond dissociation enthalpy of antioxidant phenols by support vector machine. J. Mol. Graph. Modell. 2008, 27, 188−196.
(26) Niazi, A.; Jameh-Bozorghi, S.; Nori-Shargh, D. Prediction of toxicity of nitrobenzenes using ab initio and least squares support vector machines. J. Hazard. Mater. 2008, 151, 603−609.
(27) Khajeh, A.; Modarress, H. QSPR prediction of flash point of esters by means of GFA and ANFIS. J. Hazard. Mater. 2010, 179, 715−720.
(28) Khajeh, A.; Modarress, H. Quantitative structure−property relationship for surface tension of some common alcohols. J. Chemometr. 2011, 25, 333−339.
(29) Khajeh, A.; Modarress, H. Quantitative structure−property relationship prediction of liquid thermal conductivity for some alcohols. Struct. Chem. 2011, 22, 1315−1323.
(30) Khajeh, A.; Modarress, H. QSPR prediction of surface tension of refrigerants from their molecular structures. Int. J. Refrig. 2012, 35, 150−159.
(31) Khajeh, A.; Modarress, H. Quantitative structure−property relationship for flash point of alcohols. Ind. Eng. Chem. Res. 2011, 50, 11337−11342.
(32) Khajeh, A.; Modarress, H.; Rezaee, B. Application of adaptive neuro-fuzzy inference system for solubility prediction of carbon dioxide in polymers. Expert Syst. Appl. 2009, 36, 5728−5732.
(33) Khajeh, A.; Modarress, H. Prediction of solubility of gases in polystyrene by adaptive neuro-fuzzy inference system and radial basis function neural network. Expert Syst. Appl. 2010, 37, 3070−3074.
(34) Yaws, C. L. Yaws’ Handbook of Thermodynamic and Physical Properties of Chemical Compounds; Knovel: Norwich, NY, 2003.
(35) http://www.michem.disat.unimib.it/chm/.
(36) Rogers, D.; Hopfinger, A. J. Application of genetic function approximation to quantitative structure−activity relationships and quantitative structure−property relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854−866.
(37) Friedman, J. Multivariate Adaptive Regression Splines, Technical Report No. 102, Laboratory for Computational Statistics, Department of Statistics; Stanford University: Stanford, CA, Nov 1988 (revised Aug 1990).
(38) Holland, J. Adaptation in Artificial and Natural Systems; University of Michigan Press: Ann Arbor, MI, 1975.
(39) Jang, J. ANFIS: Adaptive network-based fuzzy inference systems. IEEE Trans. Syst. Man Cybern. 1993, 23, 665−685.
(40) Sugeno, M. Industrial Applications of Fuzzy Control; Elsevier: Amsterdam, 1985.
(41) Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; Wiley-VCH: Weinheim, 2009.
(42) Yager, R.; Filev, D. Approximate clustering via the mountain method. IEEE Trans. Systems Man Cybernet. 1994, 24, 1279−1284.
(43) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphics Modell. 2002, 20, 269−276.