Article pubs.acs.org/jced

Group Contribution Model for the Prediction of Refractive Indices of Organic Compounds

Farhad Gharagheizi,†,‡ Poorandokht Ilani-Kashkouli,†,‡ Arash Kamari,† Amir H. Mohammadi,*,†,§ and Deresh Ramjugernath*,†

† Thermodynamics Research Unit, School of Engineering, University of KwaZulu-Natal, Howard College Campus, King George V Avenue, Durban 4041, South Africa
‡ Department of Chemical Engineering, Buinzahra Branch, Islamic Azad University, Buinzahra, Iran
§ Institut de Recherche en Génie Chimique et Pétrolier (IRGCP), Paris Cedex, France

Supporting Information

ABSTRACT: Chemical thermodynamics and process engineering require a wide range of optical parameters for the evaluation of thermodynamic properties and process variables. In this study, an accurate group contribution (GC) method is presented for the estimation of the refractive indices of pure compounds. The model was developed using a very large data set of 11918 pure components, most of which are organic compounds. Approximately 80 % of the data set (9536 data points) was used to develop the model, and the remaining 20 % (2382 data points) was used to evaluate its predictive capability. The method uses a total of 80 substructures, or structural functional groups, to estimate the refractive index. The model has an average absolute relative deviation of 0.83 % with respect to the literature data and a squared correlation coefficient of 0.888. The model therefore performs satisfactorily in predicting the refractive indices of pure compounds.

© 2014 American Chemical Society

1. INTRODUCTION
The refractive index (RI), also called the index of refraction, is a measure of the change in velocity of a light wave as it travels from one medium to another.1 It is equal to the ratio of the speed of light in vacuum, c, to that in a given medium, v; that is, n = c/v.2 The refractive index (n) is frequently employed to characterize organic compounds,3 and it is also one of the most important properties in light scattering measurements of dilute polymer solutions, which are applied for the estimation of molecular weight, size, and shape.4 Values of the refractive index can be measured experimentally and are normally used to correlate density and/or other physical properties of chemicals.5 Information obtained from RI measurements is therefore valuable in various chemical engineering calculations and in the design of new optical materials. Moreover, RI measurements in combination with density, melting point, boiling point, and other analytical data are very useful industrially for the specification and characterization of substances such as oils, waxes, and sugar syrups.6 As previously mentioned, the refractive index is a measure of the change in velocity of a light wave, and it is directly related to the molecular structure of the material the wave traverses. RI can thus be related to the density of the material, and it is also directly related to the dielectric constant of the material.1 Most theoretical treatments for the estimation of RI have been proposed in terms of molar refraction, which quantifies the intrinsic refractive power of the basic structural units of a material.7,8 Alternative definitions of molar refraction (R) have been developed by Lorentz and Lorenz9 (eq 1), Gladstone and Dale10 (eq 2), and Vogel11 (eq 3) as follows:7,8

$$R_{LL} = \frac{n^{2} - 1}{n^{2} + 2}\,V \qquad (1)$$

$$R_{GD} = (n - 1)\,V \qquad (2)$$

$$R_{V} = nM \qquad (3)$$
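As an illustration of how eqs 1 to 3 are used in practice, the following minimal Python sketch inverts each definition to recover n from a molar refraction. The function names and the numerical values are hypothetical and chosen only for a round-trip check; V is the volume and M the molar mass, as in the equations above.

```python
import math


def n_from_lorentz_lorenz(r_ll: float, v: float) -> float:
    """Invert eq 1: R_LL = (n^2 - 1) / (n^2 + 2) * V."""
    ratio = r_ll / v
    if not 0.0 <= ratio < 1.0:
        raise ValueError("R_LL / V must lie in [0, 1) for a real n >= 1")
    # From ratio = (n^2 - 1) / (n^2 + 2):  n^2 = (1 + 2*ratio) / (1 - ratio)
    return math.sqrt((1.0 + 2.0 * ratio) / (1.0 - ratio))


def n_from_gladstone_dale(r_gd: float, v: float) -> float:
    """Invert eq 2: R_GD = (n - 1) * V."""
    return 1.0 + r_gd / v


def n_from_vogel(r_v: float, m: float) -> float:
    """Invert eq 3: R_V = n * M."""
    return r_v / m


# Round-trip check with illustrative (not measured) numbers.
n_true, v, m = 1.5, 100.0, 80.0
r_ll = (n_true**2 - 1.0) / (n_true**2 + 2.0) * v
r_gd = (n_true - 1.0) * v
r_v = n_true * m
print(n_from_lorentz_lorenz(r_ll, v),
      n_from_gladstone_dale(r_gd, v),
      n_from_vogel(r_v, m))
```

Each function is just the algebraic inverse of the corresponding definition, so any consistent set of units for R, V, and M can be used.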

The calculation of the refractive index is based on the preceding equations, where the molar refraction (R) is calculated as a sum of the corresponding atom and bond contributions and the volume (V) is estimated as the van der Waals volume of the compound divided by the average coefficient of molecular packing.7,8 Hence, it is relatively easy to calculate the refractive index using the molar refraction, V, and the molar mass (M). Two major approaches, the group contribution (GC) strategy and the quantitative structure−property relationship (QSPR) approach, have been applied for the estimation of the molar refraction.7 Agrawal and Jenekhe12 estimated the refractive index of π-conjugated polymers by preparing a series of 14 conjugated rigid-rod polyquinolines and polyanthrazolines of current interest as optical materials and applying a GC model. In other words, they studied the linear optical properties of a series of rigid-rod polyquinolines. The results obtained demonstrated that the steric hindrance in the structure, and hence the extent of π-electron delocalization, was considerably modified by simple systematic variations in the backbone structure of conjugated polyquinolines. Furthermore, Yang and Jenekhe13 applied a semiempirical GC model to estimate new Lorentz and Lorenz9 RLL values for functional groups commonly found in π-conjugated polymers. They obtained an average error of 14 % relative to the literature-reported RLL data. Kier and Hall14 proposed two QSPR models, one for a set of 55 alkanes using four topological indices and one for a set of 24 alkyl-substituted benzenes using two topological indices. For these separate models, they obtained an R² of 0.998. Brekke et al.15 used a databank of 24 different samples to predict physical properties such as the retention factor (Rf) of 12-component mixtures of n-alkanes, isoalkanes, and cycloalkanes, and obtained a standard error of 2.5 %. Xu et al.16 developed a four-descriptor QSPR model with an R² of 0.929 for the prediction of the RI of a set of 121 linear polymers. Ha et al.17 proposed QSPR models based on molecular descriptors to estimate the RI of 200 aromatic compounds, 186 saturated compounds, and a combined set of 386 compounds. Xu et al.18 developed nonlinear and linear QSPR models based on feedforward neural networks for the estimation of the RI of 120 polymers.

Received: January 20, 2014. Accepted: April 29, 2014. Published: May 13, 2014.

dx.doi.org/10.1021/je5000633 | J. Chem. Eng. Data 2014, 59, 1930−1943

Journal of Chemical & Engineering Data

A thorough comparison of the previous models proposed for the estimation of the refractive index of pure chemical compounds reveals that most of them were developed and evaluated for small groups or families of compounds. Therefore, it was decided to use a very large database in an attempt to develop a general group contribution relationship for the prediction of the refractive index. The group contribution method has previously been used successfully to determine the critical properties;19 the Henry’s law constant of pure organic compounds in water;20 the surface tension of pure compounds;21 the solubility of pure compounds in water;19 the lower and upper flammability limits of pure compounds;22,23 the autoignition temperature;24 the sublimation,25,26 vaporization,25,26 and fusion27 enthalpies of organic compounds; the parachor of pure organic compounds;28 and the crystal lattice energy of pure organic compounds.25 Furthermore, in order to check whether the newly developed model is statistically valid, the leverage approach is also applied, in which the statistical Hat matrix, the Williams plot, and the residuals of the model results are used to identify probable outliers.

2. DATABANK
The quality of a proposed model depends on the extensiveness of the data set used for both its development and its testing. In other words, the applicability, reliability, and accuracy of a model for the estimation of physical properties depend on the comprehensiveness of the data set employed in its development.21,29−35 Relatively few studies in the literature have used a very large data set to derive a group contribution model. This makes the data set of 11918 diverse, mostly organic compounds drawn from Yaws’ Handbook of Thermodynamic and Physical Properties of Chemical Compounds36 the most comprehensive and extensive data set for the development of a GC model for the refractive index of pure compounds. A careful analysis of the compounds within the data set shows that the refractive indices range between 1 and 1.872 and the molecular weights between 16.042 and 891.497 g·mol−1. All of the literature data are reported at standard conditions (T = 298 K and P = 1 atm). The compounds are composed of hydrogen (1 to 110 atoms per compound), carbon (1 to 57 atoms per compound), nitrogen (1 to 6 atoms per compound), oxygen (1 to 19 atoms per compound), phosphorus (1 to 4 atoms per compound), sulfur (1 to 4 atoms per compound), fluorine (1 to 27 atoms per compound), chlorine (1 to 10 atoms per compound), bromine (1 to 6 atoms per compound), iodine (1 to 2 atoms per compound), boron (1 to 3 atoms per compound), silicon (1 to 11 atoms per compound), arsenic (1 to 2 atoms per compound), and selenium (1 to 2 atoms per compound). There are also a few cadmium, tin, antimony, lead, tellurium, and bismuth compounds in the database. Figures 1, 2, and 3 show the distributions of refractive indices, molecular weights, and atom numbers, respectively.

There are 2039 hydrocarbons (C and H compounds) in the data set whose refractive indices range from 1 to 1.729. The data set includes 2409 nitrogen compounds whose refractive indices range from 1.259 to 1.872. The elemental composition analysis of the data set further indicates that there are 6543 oxygen compounds whose refractive indices range from 1.24 to 1.763. There are 691 sulfur compounds in the data set with refractive indices that range from 1.24 to 1.822. The data set includes 219 phosphorus compounds whose refractive indices range from 1.3 to 1.636. There are a significant number of halogen compounds within the data set: 1106 fluorine-containing compounds with refractive indices between 1.151 and 1.625; 1711 chlorine-containing compounds with refractive indices between 1.199 and 1.695; 790 bromine-containing compounds with refractive indices between 1.238 and 1.863; and 263 iodine-containing compounds with refractive indices between 1.327 and 1.871. The names, abbreviations, groups, and original reference for each data point are presented as Supporting Information.

Figure 1. Distribution of refractive indices in the databank.

Figure 2. Distribution of molecular weights in the databank.

Figure 3. Distribution of atom numbers in the databank.

3. METHODOLOGY
3.1. Data Partition. Typically, in group contribution model development, the selected literature data set is split into three subsets: the training, validation, and test sets. The training set is used to generate the model structure, and the validation and test (prediction) sets are used to investigate its validity and predictive capability. In other words, the first set is for developing the model, the second is for assessing its internal validity, and the third is for evaluating its predictive capability. In partitioning the data into these subsets, several distributions were tried in order to avoid regression problems associated with local minima and to keep the parameters within the feasible region of the problem. The most efficient distribution is one in which the data are homogeneously distributed over the domains of the three subsets.37 In this work, the K-means clustering technique is used to partition the main data set into the training, validation, and test sets. K-means clustering is a cluster analysis technique that partitions n observations into k clusters such that each observation belongs to the cluster with the nearest mean. The aim is to split the main data set so that all of the subsets are uniform and have almost the same ranges and means, which resolves the issue of inappropriate allocation of data. Another point is the quota of each subset from the main data set. Here, 80 % (9536 points), 10 % (1191 points), and 10 % (1191 points) of the main data set were assigned to the training, validation, and test sets, respectively.

3.2. Model Development. In order to develop a reliable group contribution model, the chemical structures of all of the components in the database were examined thoroughly to find the most efficient substructures for the estimation of the refractive index. In other words, the chemical structures of all of the studied compounds were analyzed to recognize their functional groups. These functional groups are generally selected from a series of approximately 500 different chemical groups as follows:19−28,30,32,38−40
(a) Functional groups are partitioned into categories, each containing a pair of groups drawn from all of the groups.
(b) A mathematical algorithm is applied to establish a linear relationship between the two groups in a pair:

$$GC_i = a \times GC_j + b \qquad (4)$$

where GC denotes the functional groups, a and b are the coefficients of the linear regression, and subscripts i and j refer to the ith and jth functional groups.
(c) When the squared correlation coefficient of eq 4 is greater than a selected value (0.9 in this study), one of the groups in the pair is omitted, because it adds no significant information to the final model and would only increase the number of model parameters (final functional groups).
The preceding procedure is continued until the most efficient contributions for the evaluation of the property of interest (the refractive index) have been determined. An accurate and reliable correlation requires model parameters that distinguish one compound from another. In other words, each compound needs a unique set of model parameters that can sufficiently describe its refractive index. In the present study, the model parameters were generated from the molecular structures. First, a collection of nearly 500 chemical substructures was assembled. Next, the frequency of appearance of each chemical substructure in each compound was counted. The pair correlation between each pair of chemical substructures was then evaluated to avoid entering redundant parameters into the final model: if the pair correlation of two chemical substructures exceeded the threshold value of 0.9, one of them was eliminated and the other was kept for the next step. This procedure reduced the collection to nearly 200 chemical substructures.
In order to develop the final model and select the optimal subset of chemical substructures affecting the refractive index, the sequential search method was applied.40 In computer science, sequential search is a technique for finding a particular value in a list by checking each of its elements, one at a time and in sequence, until the desired one is found.41 Here, the target of the sequential search is to find an optimal subset of chemical substructures for a specified model size.38 The basic idea of the method is to replace each chemical substructure, one at a time, with each of the remaining ones and see whether a better model is obtained. The percentage average absolute relative deviation was used as the objective function for this variable selection.
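The pair-correlation screening and the sequential search described in Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the model for each trial subset is refitted by ordinary least squares with an intercept (the source does not state the fitting procedure), and the search simply starts from the first candidates in the list.

```python
import numpy as np


def drop_correlated(counts, threshold=0.9):
    """Return column indices kept after pair-correlation screening.

    A column is kept only if its squared correlation with every column
    already kept does not exceed the threshold (0.9 in the text).
    """
    kept = []
    for j in range(counts.shape[1]):
        col = counts[:, j]
        if col.max() == col.min():
            continue  # assumption: constant columns are dropped (correlation undefined)
        if all(np.corrcoef(col, counts[:, k])[0, 1] ** 2 <= threshold for k in kept):
            kept.append(j)
    return kept


def fit_aard(counts, y, subset):
    """AARD% of a least-squares linear fit (with intercept) on the chosen columns."""
    X = np.column_stack([counts[:, subset], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 100.0 * np.mean(np.abs((X @ coef - y) / y))


def sequential_replacement(counts, y, candidates, size):
    """Swap one selected column at a time for each unused candidate;
    keep a swap only if it lowers AARD%. Stop when a full pass finds none."""
    subset = list(candidates[:size])  # assumption: arbitrary starting subset
    best = fit_aard(counts, y, subset)
    improved = True
    while improved:
        improved = False
        for pos in range(size):
            for cand in candidates:
                if cand in subset:
                    continue
                trial = subset.copy()
                trial[pos] = cand
                score = fit_aard(counts, y, trial)
                if score < best - 1e-12:
                    subset, best, improved = trial, score, True
    return subset, best


# Toy data: rows are compounds, columns are substructure counts (illustrative only).
counts = np.array([[1, 0, 2], [0, 1, 1], [2, 1, 0],
                   [1, 2, 1], [0, 3, 0], [3, 0, 2]], dtype=float)
ri = 1.3 + 0.02 * counts[:, 1] + 0.05 * counts[:, 2]
print(sequential_replacement(counts, ri, [0, 1, 2], size=2))
```

The search accepts only strict improvements, so it always terminates, but like any greedy replacement scheme it may stop at a local optimum for a given model size.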


Table 1. Contribution of Each Chemical Substructure to the Refractive Index of Organic Compounds (Parameters of Equation 4)c


a The superscript represents the formal oxidation number. The formal oxidation number of a carbon atom equals the sum of the conventional bond orders with electronegative atoms; the C--N bond order in pyridine may be considered as 2 when there is one such bond and 1.5 when there are two such bonds; the C···X bond order in pyrrole or furan may be considered as 1.
b An alpha-C may be defined as a C attached through a single bond to -C=X, -C#X, or -C--X (see footnote c).
c X represents any electronegative atom (O, N, S, P, Se, and halogens). Al and Ar represent aliphatic and aromatic groups, respectively. = represents a double bond, # represents a triple bond, and -- represents an aromatic bond as in benzene or a delocalized bond such as the N−O bond in a nitro group.

3.3. Leverage Approach. In fitting linear or multiple regression models, it is often useful to identify how much impact, or leverage, each observed y value has on each fitted y value.42 Because the results of a regression analysis can be quite sensitive to outliers (either in y or in the space of the predictors), it is important to be able to detect such points. To this end, numerical and graphical methods43−48 can be used together. The leverage approach,43−45 a well-known outlier detection method used in this study, is based on the residuals (i.e., the deviations of the model results from the corresponding experimental data) and on a matrix known as the Hat matrix, which involves the experimental data and the values predicted by the model.47,48 The approach requires a suitable mathematical model that is capable of calculating or estimating the data of interest with sufficient accuracy. The Hat matrix used in the leverage method is defined as follows:43−48

$$H = X(X^{T}X)^{-1}X^{T} \qquad (5)$$

where X is a two-dimensional matrix composed of n data (rows) and p model parameters (columns) and T denotes the transpose of the matrix. The Hat values of the chemicals in the feasible region of the problem are the diagonal elements of the H matrix.49 The Williams plot is then drawn to identify suspect data or outliers graphically on the basis of the H values calculated through eq 5. This plot shows the Hat indices against the standardized cross-validated residuals (R), which are defined as the differences between the represented/predicted values and the implemented data. A warning leverage (H*) is normally fixed at 3p/n, in which p is the number of model parameters plus one and n is the number of training points. A standardized residual of 3 is taken as the cutoff value, so that points within ±3 standard deviations of the mean are accepted (covering 99 % of normally distributed data). If the majority of the data points are located in the ranges 0 ≤ H ≤ H* and −3 ≤ R ≤ 3, both the model development and its predictions lie within the applicability domain, which leads to a statistically valid model. It is worthwhile to note that "good high leverage" points are located in the domain H* ≤ H and −3 ≤ R ≤ 3. Good high leverage points are points that lie outside the applicability domain of the implemented model (as can be seen in Figure 5, there is no good high leverage point in this study). The points situated in the range R ≤ −3 or 3 ≤ R (whether their H values are greater or smaller than H*) are identified as outliers of the model or "bad high leverage" points. These erroneous representations/predictions may be attributed to doubtful data.46
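The leverage calculation of Section 3.3 can be sketched in a few lines of Python. This is an illustration under assumptions rather than the procedure actually used in the study: the function names are hypothetical, the design matrix X is assumed to already include an intercept column (so that its column count equals the p used in H* = 3p/n), and plain standardized residuals are used in place of the cross-validated residuals mentioned in the text.

```python
import numpy as np


def leverages(X):
    """Diagonal of H = X (X^T X)^-1 X^T, computed without forming the n x n matrix."""
    xtx_inv = np.linalg.pinv(X.T @ X)  # pinv tolerates a rank-deficient X
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)


def williams_labels(X, y, y_pred):
    """Classify each point by its Hat value and standardized residual."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y, dtype=float)
    n, p = X.shape                     # assumption: X includes the intercept column
    h_star = 3.0 * p / n               # warning leverage H*
    h = leverages(X)
    spread = resid.std(ddof=1)
    # assumption: simple standardization, not cross-validated residuals
    r_std = resid / spread if spread > 0 else np.zeros_like(resid)
    labels = np.where(np.abs(r_std) > 3.0, "outlier",
                      np.where(h > h_star, "good high leverage", "valid"))
    return h, r_std, h_star, labels


# Toy example: 19 evenly spaced points plus one far from the rest.
x = np.append(np.arange(19.0), 100.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + 1.0
y_pred = y + 0.1 * (-1.0) ** np.arange(20)
h, r_std, h_star, labels = williams_labels(X, y, y_pred)
print(h_star, labels[0], labels[-1])
```

Plotting `r_std` against `h`, with a vertical line at `h_star` and horizontal lines at ±3, gives the Williams plot described above.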

4. RESULTS AND DISCUSSION
In order to obtain an accurate and reliable correlation, the collection of nearly 200 chemical substructures, prepared as discussed in the previous section, was introduced into the sequential search algorithm. Furthermore, in order to obtain the optimal correlation in terms of both the number of chemical substructures and accuracy, a threshold value of 0.01 was considered for the reduction in the average

Table 2. Statistical Error Analysis of the Correlation Developed in This Study

statistical parameter | training set | validation set | test set | total
R² | 0.888 | 0.878 | 0.898 | 0.888
average absolute relative deviation (%) | 0.83 | 0.83 | 0.83 | 0.83
standard deviation error | 0.02 | 0.02 | 0.02 | 0.02
root mean square error | 0.02 | 0.02 | 0.02 | 0.02
no. of used data points | 9536 | 1191 | 1191 | 11918

Figure 4. Gradual change of R² and ARD% as a function of the number of chemical substructures.

Figure 5. Comparison between the results of the developed correlation and the literature values of refractive index.

Figure 6. Relative deviations of the refractive index values obtained by the proposed correlation from the databank values.

Figure 7. Distribution of errors in the training, validation, and test subsets.
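The statistics reported in Table 2 can be computed along the following lines. This is a hedged sketch: the function name is hypothetical, R² is taken here as the squared Pearson correlation between literature and predicted values, and the "standard deviation error" is assumed to be the sample standard deviation of the errors, since the source does not define either explicitly. The three data points are the first three rows of Table 3.

```python
import numpy as np


def error_statistics(ri_lit, ri_pred):
    """Return the Table 2 style statistics for one subset of data."""
    ri_lit = np.asarray(ri_lit, dtype=float)
    ri_pred = np.asarray(ri_pred, dtype=float)
    err = ri_pred - ri_lit
    ard = 100.0 * np.abs(err) / ri_lit          # absolute relative deviation, %
    r = np.corrcoef(ri_lit, ri_pred)[0, 1]      # Pearson correlation (assumption)
    return {
        "R2": r**2,
        "AARD%": ard.mean(),
        "std_error": err.std(ddof=1),           # assumption: sample std of errors
        "RMSE": np.sqrt(np.mean(err**2)),
        "n": ri_lit.size,
    }


# First three outliers of Table 3 (literature vs predicted refractive index).
stats = error_statistics([1.57, 1.238, 1.199], [1.502, 1.328, 1.291])
print(stats)
```

Applied separately to the training, validation, and test subsets, the same function would produce one column of Table 2 per call.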


Table 3. Outlier Compounds and Their Deviations

ID | compound | RI lit | RI pred | ARD% | Mw (g·mol−1)
1 | bromochlorodinitromethane | 1.57 | 1.502 | 4.3 | 219.3791
2 | bromotrifluoromethane | 1.238 | 1.328 | 7.2 | 148.911
3 | chlorotrifluoromethane | 1.199 | 1.291 | 7.6 | 104.459
4 | phosgene | 1.3561 | 1.420 | 4.7 | 98.9154
5 | carbon tetrafluoride | 1.151 | 1.244 | 8.1 | 88.005
6 | trifluoromethanesulfenyl fluoride | 1.5484 | 1.301 | 15.9 | 120.071
7 | chlorodifluoromethane | 1.256 | 1.336 | 6.4 | 86.4684
8 | fluoroform | 1.215 | 1.298 | 6.8 | 70.0144
9 | hydrogen cyanide | 1.2594 | 1.391 | 10.4 | 27.0259
10 | bromoiodomethane | 1.641 | 1.541 | 6.1 | 220.835
11 | chloroiodomethane | 1.5822 | 1.504 | 4.9 | 176.384
12 | difluoromethane | 1.196 | 1.317 | 10.2 | 52.0239
13 | diiodomethane | 1.738 | 1.585 | 8.8 | 267.836
14 | 3H-diazirine | 1.684 | 1.436 | 14.7 | 42.04088
15 | trithiocarbonic acid | 1.8225 | 1.618 | 11.2 | 110.225
16 | methyldichloroarsine | 1.566 | 1.477 | 5.7 | 160.861
17 | N,N-dibromomethylamine | 1.571 | 1.387 | 11.7 | 188.84982
18 | tribromomethylsilane | 1.5152 | 1.369 | 9.6 | 282.832
19 | methyl fluoride | 1.174 | 1.335 | 13.7 | 34.0333
20 | nitrosourea | 1.608 | 1.494 | 7.1 | 89.05462
21 | 5-aminotetrazole | 1.699 | 1.563 | 8.0 | 85.06982
22 | 2H-tetrazol-5-amine | 1.699 | 1.563 | 8.0 | 85.06982
23 | methane | 1.0004 | 1.363 | 36.2 | 16.0428
24 | carbonyl sulfide | 1.24 | 1.478 | 19.2 | 60.0764
25 | hexabromoethane | 1.863 | 1.725 | 7.4 | 503.446
26 | chlorotrifluoroethylene | 1.38 | 1.303 | 5.6 | 116.47
27 | chloropentafluoroethane | 1.214 | 1.286 | 5.9 | 154.467
28 | 3,4-dichloro-1,2,5-thiadiazole | 1.561 | 1.482 | 5.1 | 155.007
29 | pentafluoroethane | 1.5012 | 1.265 | 15.7 | 120.022
30 | iodoacetonitrile | 1.5744 | 1.502 | 4.6 | 166.949
31 | 1,2,5-thiadiazole | 1.515 | 1.439 | 5.0 | 86.1179
32 | 1,1,1-trifluoroethane | 1.206 | 1.288 | 6.8 | 84.0413
33 | acetyl iodide | 1.5491 | 1.484 | 4.2 | 169.95
34 | sodium acetate | 1.464 | 1.385 | 5.4 | 82.0344
35 | 1,1-difluoroethane | 1.2434 | 1.310 | 5.4 | 66.0508
36 | 1,1-diiodoethane | 1.65 | 1.577 | 4.4 | 281.863
37 | 1,2-diiodoethane | 1.871 | 1.592 | 14.9 | 281.863
38 | ethyldichloroarsine | 1.555 | 1.479 | 4.9 | 174.888
39 | ethyl fluoride | 1.2621 | 1.341 | 6.2 | 48.0602
40 | 2-iodoethanol | 1.5713 | 1.507 | 4.1 | 171.966
41 | dimethyl cadmium | 1.5488 | 1.474 | 4.8 | 142.481
42 | dimethyl ether | 1.2984 | 1.376 | 6.0 | 46.069
43 | dimethyl diselenide | 1.639 | 1.466 | 10.6 | 187.99
44 | hexafluoropropylene | 1.583 | 1.266 | 20.0 | 150.024
45 | thiazole | 1.5969 | 1.524 | 4.6 | 85.1298
46 | methoxyflurane | 1.386 | 1.450 | 4.6 | 164.966
47 | 1H-pyrazole | 1.4203 | 1.486 | 4.6 | 68.0788
48 | acrylamide | 1.566 | 1.467 | 6.3 | 71.0791
49 | 1,3,5-triazine-2,4,6-triamine | 1.872 | 1.674 | 10.6 | 126.123
50 | propane | 1.2861 | 1.374 | 6.8 | 44.0965
51 | trimethyl thioborate | 1.5788 | 1.416 | 10.3 | 152.113
52 | iodotrimethylsilane | 1.471 | 1.376 | 6.5 | 200.094
53 | 1,1,2,3,4,4-hexafluoro-1,3-butadiene | 1.378 | 1.288 | 6.6 | 162.035
54 | octafluorocyclobutane | 1.217 | 1.288 | 5.9 | 200.032
55 | chloroprene | 1.4583 | 1.714 | 17.6 | 88.5362
56 | 3-methoxy-1-propyne | 1.5035 | 1.406 | 6.5 | 70.091
57 | succinic acid | 1.3373 | 1.431 | 7.0 | 118.089
58 | malic acid | 1.3516 | 1.456 | 7.7 | 134.089
59 | butanoyl bromide | 1.596 | 1.458 | 8.7 | 151.003


Table 3. continued

ID | compound | RI lit | RI pred | ARD% | Mw (g·mol−1)
60 | 1,1,3-tribromobutane | 1.651 | 1.564 | 5.2 | 294.812
61 | tetramethylammonium hydroxide | 1.35 | 1.433 | 6.1 | 91.1536
62 | octachlorocyclopentene | 1.566 | 1.638 | 4.6 | 343.675
63 | 2-chloro-3-nitropyridine | 1.455 | 1.561 | 7.3 | 158.544
64 | 1H-pyrrole-2-carboxaldehyde | 1.5939 | 1.520 | 4.6 | 95.1011
65 | trans-3-(dimethylamino)acrylonitrile | 1.533 | 1.440 | 6.1 | 96.1325
66 | 3-dimethylaminoacrolein | 1.584 | 1.453 | 8.3 | 99.1329
67 | 2,4-diiodopentane | 1.519 | 1.589 | 4.6 | 323.943
68 | 2-ethyl-3-nitrosothiazolidine | 1.612 | 1.534 | 4.8 | 146.214
69 | 3-methyltetrahydrofuran | 1.493 | 1.413 | 5.4 | 86.1338
70 | 3-hydroxy-3-methylbutanoic acid | 1.5081 | 1.437 | 4.7 | 118.133
71 | 1,3-dichloro-5-nitrobenzene | 1.4 | 1.581 | 12.9 | 192.001
72 | 1,4-dichloro-2-nitrobenzene | 1.439 | 1.581 | 9.9 | 192.001
73 | 2,4,6-trinitrophenol | 1.763 | 1.636 | 7.2 | 229.107
74 | 4-bromo-3-fluoroaniline | 1.471 | 1.591 | 8.1 | 190.015
75 | benzenesulfinyl chloride | 1.347 | 1.556 | 15.5 | 160.624
76 | N-sulfinylaniline | 1.627 | 1.497 | 8.0 | 139.178
77 | 3-pyridinecarboxamide | 1.466 | 1.556 | 6.1 | 122.127
78 | 4-mercaptophenol | 1.5101 | 1.576 | 4.4 | 126.179
79 | pyrocatechol | 1.6044 | 1.527 | 4.8 | 110.112
80 | methyl 2-thiofuroate | 1.5711 | 1.503 | 4.3 | 142.178
81 | 2-aminobenzenethiol | 1.4606 | 1.612 | 10.4 | 125.195
82 | trans,trans-2,4-hexadienal | 1.5384 | 1.472 | 4.3 | 96.1289
83 | 1-ethyl-3-methylimidazolium tetrafluoroborate | 1.413 | 1.496 | 5.9 | 197.972
84 | trans-4-(dimethylamino)-3-buten-2-one | 1.557 | 1.453 | 6.7 | 113.16
85 | 3-ethyltetrahydrofuran | 1.491 | 1.419 | 4.8 | 100.161
86 | paraldehyde | 1.405 | 1.498 | 6.6 | 132.159
87 | 1-fluoro-4-methylpentane | 1.31 | 1.377 | 5.1 | 104.168
88 | 1,2,6-hexanetriol | 1.58 | 1.477 | 6.5 | 134.175
89 | sorbitol | 1.333 | 1.539 | 15.4 | 182.174
90 | D-mannitol | 1.333 | 1.539 | 15.4 | 182.174
91 | butylchlorodimethylsilane | 1.5145 | 1.429 | 5.7 | 150.723
92 | hexamethyl phosphoramide | 1.4564 | 1.521 | 4.5 | 179.203
93 | perfluoromethylcyclohexane | 1.285 | 1.364 | 6.1 | 350.056
94 | 1-chloro-3-(trichloromethyl)benzene | 1.4461 | 1.583 | 9.5 | 229.919
95 | 3-(2-furanyl)-2-propenenitrile | 1.5824 | 1.495 | 5.5 | 119.123
96 | imidazo[1,2-a]pyridine | 1.626 | 1.554 | 4.4 | 118.139
97 | 2,4-dinitrotoluene | 1.442 | 1.571 | 9.0 | 182.136
98 | 2,6-dinitrotoluene | 1.479 | 1.571 | 6.2 | 182.136
99 | 2,4,6-cycloheptatrien-1-one | 1.6172 | 1.537 | 4.9 | 106.124
100 | 2-iodo-4-methylphenol | 1.5331 | 1.609 | 5.0 | 234.036
101 | 2,6-dimethylpyridine-1-oxide | 1.5706 | 1.502 | 4.4 | 123.155
102 | cyclohexyl isocyanate | 1.5341 | 1.468 | 4.3 | 125.171
103 | butyl 3-bromopropanoate | 1.3051 | 1.457 | 11.6 | 209.083
104 | cycloheptanol | 1.40705 | 1.472 | 4.6 | 114.188
105 | 4,4,4-trifluoro-1-(2-furyl)-1,3-butanedione | 1.528 | 1.434 | 6.1 | 206.121
106 | 1-(4-bromophenyl)ethanone | 1.647 | 1.565 | 5.0 | 199.047
107 | indole | 1.63 | 1.560 | 4.3 | 117.151
108 | 4-(2-furanyl)-3-buten-2-one | 1.5788 | 1.505 | 4.7 | 136.15
109 | 1-(2-furyl)-1,3-butanedione | 1.5745 | 1.495 | 5.0 | 152.15
110 | 3-chloro-2-methylanisole | 1.42 | 1.533 | 8.0 | 156.611
111 | 2-amino-1-phenylethanone | 1.616 | 1.539 | 4.8 | 135.166
112 | 1,2-dimethoxybenzene | 1.5827 | 1.514 | 4.3 | 138.166
113 | ethyl 4-methyl-3-oxopentanoate | 1.25 | 1.422 | 13.8 | 158.197
114 | methyl (4S)-(+)-2,2-dimethyl-1,3-dioxolane-4-acetate | 1.615 | 1.428 | 11.6 | 174.197
115 | 7-methyl-1,5,7-triazabicyclo[4.4.0]dec-5-ene | 1.538 | 1.666 | 8.3 | 153.228
116 | 2,5-dimethyl-1-hexanol | 1.5095 | 1.430 | 5.3 | 130.23
117 | diethyl [difluoro(trimethylsilyl)methyl]phosphonate | 1.414 | 1.325 | 6.3 | 260.293
118 | 3-phenyl-2-propynal | 1.6079 | 1.543 | 4.0 | 130.146


Table 3. continued

ID | compound | RI lit | RI pred | ARD% | Mw (g·mol−1)
119 | (R)-(−)-O-formylmandeloyl chloride | 1.523 | 1.604 | 5.3 | 198.605
120 | trans-3-phenyl-2-propenenitrile | 1.6013 | 1.537 | 4.0 | 129.162
121 | methyl 2-isothiocyanatobenzoate | 1.5364 | 1.623 | 5.6 | 193.226
122 | cinnamaldehyde | 1.6195 | 1.545 | 4.6 | 132.162
123 | trans-3-phenyl-2-propenal | 1.62 | 1.545 | 4.6 | 132.162
124 | 2-methyl benzothiophene | 1.699 | 1.603 | 5.6 | 148.229
125 | 4-methoxybenzyl isocyanate | 1.433 | 1.541 | 7.5 | 163.176
126 | 4-propyl-1,2-benzenediol | 1.444 | 1.539 | 6.6 | 152.193
127 | di(ethylene glycol) methyl ether methacrylate | 1.44 | 1.512 | 5.0 | 188.224
128 | 2-nitrophenyl 2-methylpropanoate | 1.4315 | 1.517 | 6.0 | 203.239
129 | 4-methyl-2-isopropyl-1-pentanol | 1.621 | 1.439 | 11.2 | 144.257
130 | tris(trimethylsilyl)silane | 1.49 | 1.418 | 4.8 | 248.661
131 | 1,2-dichloronaphthalene | 1.5338 | 1.620 | 5.6 | 197.063
132 | 4-(1-ethylpropyl)pyridine | 1.4091 | 1.512 | 7.3 | 149.236
133 | tricyclo[3.3.1.13,7]decane | 1.568 | 1.489 | 5.0 | 136.237
134 | camphor, (+) | 1.5462 | 1.471 | 4.9 | 152.236
135 | 1-cyclohexylpiperazine | 1.6011 | 1.496 | 6.6 | 168.283
136 | 1,1,3,3,3-pentakis(dimethylamino)-1l5,3l5-diphosphazene 1-oxide | 1.495 | 1.606 | 7.4 | 312.338
137 | 1-naphthalenecarboxylic acid | 1.46 | 1.618 | 10.8 | 172.183
138 | 1-methoxynaphthalene | 1.694 | 1.587 | 6.3 | 158.2
139 | butyl 3,5-dinitrobenzoate | 1.488 | 1.571 | 5.6 | 268.227
140 | ethyl 2-phenoxypropanoate | 1.36 | 1.495 | 9.9 | 194.23
141 | 1-phenyl-1-pentanol | 1.4086 | 1.522 | 8.1 | 164.247
142 | 4-chlorobenzenesulfonothioic acid, s-phenyl ester | 1.4087 | 1.691 | 20.0 | 284.787
143 | diphenyl selenide | 1.55 | 1.658 | 7.0 | 233.171
144 | diphenyl diselenide | 1.743 | 1.642 | 5.8 | 312.131
145 | 2,3-dimethylnaphthalene | 1.506 | 1.576 | 4.7 | 156.227
146 | 1-Z-piperazine | 1.546 | 1.455 | 5.9 | 220.272
147 | (1S)-(+)-neomenthyl acetate | 1.652 | 1.446 | 12.5 | 198.305
148 | sucrose | 1.5376 | 1.620 | 5.4 | 342.3
149 | 2,3-dimethyl-3,4-diethylhexane | 1.664 | 1.439 | 13.5 | 170.338
150 | tetraethylbutylene-1,4-diphosphonate | 1.4475 | 1.518 | 4.9 | 330.299
151 | tetrapropylammonium hydroxide | 1.372 | 1.457 | 6.2 | 203.369
152 | 1,1,2,2-tetraisopropyldisilane | 1.477 | 1.398 | 5.3 | 230.54
153 | o-toluenethiosulfonic acid, s-phenyl ester | 1.5341 | 1.669 | 8.8 | 264.369
154 | 1-isopropylnaphthalene | 1.693 | 1.580 | 6.7 | 170.254
155 | N-carbobenzyloxy-L-threonine methyl ester | 1.4365 | 1.538 | 7.1 | 267.282
156 | anthracene | 1.729 | 1.643 | 5.0 | 178.233
157 | phenanthrene | 1.548 | 1.649 | 6.5 | 178.233
158 | N-carbobenzyloxy-L-glutamic acid 1-methylester | 1.4365 | 1.537 | 7.0 | 295.292
159 | di-tert-butyl N,N-diisopropylphosphoramidite | 1.444 | 1.376 | 4.7 | 277.388
160 | 1,1-diphenyl-2-propanone | 1.5361 | 1.605 | 4.5 | 210.276
161 | 2,3-dimercapto-1-propanol tributyrate | 1.495 | 1.564 | 4.6 | 334.501
162 | 2,4,6-tris(dimethylaminomethyl)phenol | 1.516 | 1.593 | 5.1 | 265.4
163 | divinyldiphenylsilane | 1.535 | 1.607 | 4.7 | 236.388
164 | 1-benzyl-4-propylbenzene | 1.3552 | 1.574 | 16.2 | 210.319
165 | 2,5-bis(tert-butylperoxy)-2,5-dimethylhexane | 1.423 | 1.349 | 5.2 | 290.444
166 | 1,3-di-o-benzylglycerol | 1.549 | 1.619 | 4.5 | 272.344
167 | benzphetamine | 1.5515 | 1.631 | 5.1 | 239.361
168 | 1,1-bis(tert-butylperoxy)-3,3,5-trimethylcyclohexane | 1.441 | 1.351 | 6.2 | 302.455
169 | N-(4-methoxybenzylidene)-4-butylaniline | 1.55 | 1.618 | 4.4 | 267.371
170 | di-2-naphthyl disulfide | 1.4555 | 1.692 | 16.3 | 318.463
171 | di(propylene glycol) dibenzoate | 1.528 | 1.597 | 4.5 | 342.392
172 | dicyclohexyl phthalate | 1.431 | 1.527 | 6.7 | 330.424
173 | 2,4,6,8-tetrabutyl-2,4,6,8-tetramethylcyclotetrasiloxane | 1.43 | 1.337 | 6.5 | 464.939
174 | 9-hexylheptadecane | 1.4465 | 1.375 | 4.9 | 324.634
175 | trioctylamine | 1.4486 | 1.375 | 5.1 | 353.676
176 | tris(2-ethylhexyl) phosphate | 1.444 | 1.365 | 5.5 | 434.641
177 | tetra(2-ethylbutyl) silicate | 1.4307 | 1.260 | 12.0 | 432.759

dx.doi.org/10.1021/je5000633 | J. Chem. Eng. Data 2014, 59, 1930−1943


Table 3. continued

| ID | compound | RI lit | RI pred | ARD% | Mw/g·mol−1 |
| --- | --- | --- | --- | --- | --- |
| 178 | (2-methylphenoxy)triphenylsilane | 1.483 | 1.579 | 6.5 | 366.534 |
| 179 | 9-octyl-8-heptadecene | 1.4554 | 1.388 | 4.6 | 350.672 |
| 180 | 9-octylheptadecane | 1.4487 | 1.372 | 5.3 | 352.688 |
| 181 | 11-(2,2-dimethylpropyl)heneicosane | 1.4491 | 1.377 | 5.0 | 366.715 |
| 182 | 6,11-dipentylhexadecane | 1.4502 | 1.352 | 6.7 | 366.715 |
| 183 | 3-ethyl-5-(2-ethylbutyl)octadecane | 1.4524 | 1.378 | 5.1 | 366.715 |
| 184 | sucrose octaacetate | 1.466 | 1.323 | 9.7 | 678.598 |
| 185 | 2,2,4,10,12,12-hexamethyl-7-(3,5,5-trimethylhexyl)tridecane | 1.4558 | 1.306 | 10.3 | 394.769 |
| 186 | 9-octyleicosane | 1.4515 | 1.376 | 5.2 | 394.769 |
| 187 | tri-n-heptyl trimellitate | 1.403 | 1.486 | 5.9 | 504.708 |
| 188 | 2,6,10,15,19,23-hexamethyltetracosane | 1.453 | 1.386 | 4.6 | 422.822 |
| 189 | 9-octyldocosane | 1.4537 | 1.379 | 5.1 | 422.822 |
| 190 | 11-decylheneicosane | 1.454 | 1.380 | 5.1 | 436.849 |
| 191 | 2,4,6,8-tetraethyl-2,4,6,8-tetraphenylcyclotetrasiloxane | 1.543 | 1.352 | 12.4 | 601.007 |
| 192 | 11-decyltetracosane | 1.4556 | 1.385 | 4.9 | 478.93 |
| 193 | 13-undecylpentacosane | 1.4567 | 1.387 | 4.8 | 506.984 |
| 194 | methyltris(trisec-butoxysilyloxy)silane | 1.419 | 1.161 | 18.2 | 833.407 |
| 195 | 13-dodecylhexacosane | 1.4577 | 1.390 | 4.6 | 535.037 |
| 196 | glycerol dioleate | 1.4663 | 1.538 | 4.9 | 620.998 |
| 197 | glycerol trioleate | 1.4621 | 1.549 | 5.9 | 885.449 |

absolute relative deviations (AARDs) as a stopping criterion. That is, when the improvement in the model AARD% fell below 0.01, the sequential search was stopped automatically and the final model was reported. The best model obtained for predicting the refractive index uses 80 chemical substructures and has an overall R2 = 0.888:

$$\mathrm{RI} = \sum_{i=1}^{80} n_i\,\mathrm{RI}_i + \mathrm{RI}_0 \tag{6}$$

where RI0, RIi, and ni are the intercept of the equation, the contribution of the ith chemical substructure to the refractive index, and the number of occurrences of the ith chemical substructure in the structure of a given pure compound, respectively. The 80 chemical substructures and their contributions to the refractive index are shown in Table 1.

To evaluate the accuracy of the developed correlation, a statistical error analysis comprising R2, AARDs, standard deviation errors (STDs), and root-mean-square errors (RMSEs) was combined with a graphical error analysis based on a cross-plot and an error distribution plot. Definitions and equations of these parameters are presented in Appendix A. Table 2 lists the statistical error parameters of the correlation developed using the group contribution method. The R2 and AARD of the new correlation in the testing phase are 0.898 and 0.83 %, respectively.

The represented/predicted refractive index values are compared with the literature values in Figures 4 and 5. Figure 4 is a scatter diagram of the experimental refractive index versus the output of the new correlation. The tight cluster of points near the diagonal for the training, validation, and testing data sets illustrates the robustness of the proposed correlation. The results demonstrate good agreement between the predictions of eq 6 and the literature refractive index data. Figure 5 shows the error distribution of the developed correlation. This figure confirms that the developed correlation has a small scatter around zero error and a small error range in estimating the refractive index.

As previously mentioned, the optimal model was obtained using 80 chemical substructures. This point is marked with a green pentagram in Figure 6, which shows the gradual changes in R2 and AARD % as the number of chemical substructures is increased. The estimated refractive indices of the compounds studied and their absolute relative deviations from the literature values are tabulated in the Supporting Information. The model results indicate that it can successfully predict the refractive index of pure organic compounds. The distribution of the deviations was also studied and is presented in Figure 7. As shown in this figure, the distributions of deviations are identical for the training, validation, and test subsets.

As previously mentioned, when proposing a predictive model or correlation, outlier detection plays a significant role in identifying a group or groups of data that may differ from the bulk of the data in a data set.43,47,48 Therefore, to check whether the model is statistically valid, a Williams plot was constructed for the results obtained. The location of the majority of the data points in the ranges 0 ≤ H ≤ 0.02038 and −3 ≤ R ≤ 3 demonstrates that the correlation applied for prediction of the refractive index is statistically acceptable and valid. Good high-leverage points are located in the domain H > 0.02038 for the developed correlation; such points can be regarded as lying outside the applicability domain of the model. Only 197 of the data points fall in this domain.

The compound names, literature refractive indices, values predicted by the new correlation, molecular weights, and deviations for these 197 data points are tabulated in Table 3 (another version of Table 3, including the chemical structures of the compounds, is presented as Supporting Information).
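The leverage limit of 0.02038 quoted for the Williams plot is numerically consistent with the widely used warning leverage H* = 3(p + 1)/n when p = 80 substructures and n = 11918 compounds are assumed, since 243/11918 ≈ 0.02039. Whether the authors used exactly this rule is an assumption here. The sketch below reproduces that arithmetic and classifies a point by its hat value H and standardized residual R; the function names and the two sample points are hypothetical.

```python
def warning_leverage(n_points: int, n_params: int) -> float:
    """Rule-of-thumb leverage cutoff H* = 3(p + 1)/n (assumed form)."""
    return 3.0 * (n_params + 1) / n_points


def in_applicability_domain(h: float, r: float, h_star: float,
                            r_limit: float = 3.0) -> bool:
    """True when 0 <= H <= H* and -r_limit <= R <= r_limit."""
    return 0.0 <= h <= h_star and -r_limit <= r <= r_limit


# Values quoted in the text: 80 substructures, 11918 compounds.
h_star = warning_leverage(n_points=11918, n_params=80)
print(f"H* = {h_star:.5f}")

# Hypothetical (H, R) pairs, chosen only to exercise both outcomes.
print(in_applicability_domain(h=0.010, r=1.2, h_star=h_star))
print(in_applicability_domain(h=0.050, r=0.4, h_star=h_star))
```

In practice H would come from the diagonal of the hat matrix of the 80-descriptor regression, which is not computed in this sketch.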


5. CONCLUSION

A group contribution method has been developed for estimating the refractive index of pure compounds. A comprehensive data set of experimental refractive index data was used to develop a general group contribution correlation, with 80 chemical substructures as model inputs. The training set of 9536 data points was correlated with a low AARD of 0.83 %. A test set of 1191 data points, used to assess the predictive capability of the model, also gave an AARD of 0.83 %. The proposed model is therefore reliable and suitable for predicting the refractive index.
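The correlation summarized here is a linear sum over substructure counts, RI = Σ ni RIi + RI0. A minimal sketch of how such a model is evaluated follows. The contribution values, the intercept, and the group names below are placeholders invented for illustration; the fitted values are those of Table 1 and are not reproduced here.

```python
def predict_refractive_index(group_counts, contributions, intercept):
    """Evaluate RI = sum_i n_i * RI_i + RI_0 for one compound.

    group_counts  -- {substructure name: number of occurrences n_i}
    contributions -- {substructure name: contribution RI_i}
    intercept     -- RI_0
    """
    return intercept + sum(
        count * contributions[group] for group, count in group_counts.items()
    )


# Placeholder numbers for illustration only; NOT the fitted values of Table 1.
hypothetical_contributions = {"group_A": 0.010, "group_B": -0.005}
hypothetical_intercept = 1.40

# A made-up molecule with 3 occurrences of group_A and 2 of group_B.
counts = {"group_A": 3, "group_B": 2}
ri = predict_refractive_index(counts, hypothetical_contributions,
                              hypothetical_intercept)
print(round(ri, 4))
```

A substructure missing from the contribution table raises a KeyError in this sketch, which makes it obvious when a compound falls outside the set of groups the model was fitted on.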

Notes

The authors declare no competing financial interest.



(1) Samuels, R. J. Application of refractive index measurements to polymer analysis. J. Appl. Polym. Sci. 1981, 26 (4), 1383−1412.
(2) García-Domenech, R.; de Julián-Ortiz, J. Prediction of indices of refraction and glass transition temperatures of linear polymers by using graph theoretical indices. J. Phys. Chem. B 2002, 106 (6), 1501−1507.
(3) Finar, I. Organic Chemistry, 6th ed.; Longman: London, 1973.
(4) van Krevelen, D. W.; te Nijenhuis, K. Properties of polymers: Their correlation with chemical structure; their numerical estimation and prediction from additive group contributions; Elsevier: Oxford, U.K., 2009 (accessed online).
(5) Touba, H.; Mansoori, G. A.; Sarem, A. M. S. New analytic techniques for petroleum fluid characterization using molar refraction, SPE Western Regional Meeting, Long Beach, CA, USA, SPE 38312, Society of Petroleum Engineers, 1997.
(6) Mehra, R. Application of refractive index mixing rules in binary systems of hexadecane and heptadecane with n-alkanols at different temperatures. J. Chem. Sci. 2003, 115 (2), 147−154.
(7) Katritzky, A. R.; Sild, S.; Karelson, M. General quantitative structure−property relationship treatment of the refractive index of organic compounds. J. Chem. Inf. Comput. Sci. 1998, 38 (5), 840−844.
(8) Katritzky, A. R.; Sild, S.; Karelson, M. Correlation and prediction of the refractive indices of polymers by QSPR. J. Chem. Inf. Comput. Sci. 1998, 38 (6), 1171−1176.
(9) Lorentz, H. Ueber die Beziehung zwischen der Fortpflanzungsgeschwindigkeit des Lichtes und der Körperdichte. Ann. Phys. 1880, 245 (4), 641−665.
(10) Dale, T. P.; Gladstone, J. On the influence of temperature on the refraction of light. Philos. Trans. R. Soc. London 1858, 148, 887−894.
(11) Vogel, A. I. 369. Physical properties and chemical constitution. Part XXIII. Miscellaneous compounds. Investigation of the so-called co-ordinate or dative link in esters of oxy-acids and in nitro-paraffins by molecular refractivity determinations. Atomic, structural, and group parachors and refractivities. J. Chem. Soc. (Resumed) 1948, 1833−1855.
(12) Agrawal, A. K.; Jenekhe, S. A. Thin-film processing and optical properties of conjugated rigid-rod polyquinolines for nonlinear optical applications. Chem. Mater. 1992, 4 (1), 95−104.
(13) Yang, C.-J.; Jenekhe, S. A. Group contribution to molar refraction and refractive index of conjugated polymers. Chem. Mater. 1995, 7 (7), 1276−1285.
(14) Kier, L. B.; Hall, L. H. Molecular connectivity in structure-activity analysis; Research Studies Press: Letchworth, U.K., 1986; Vol. 9.
(15) Brekke, T.; Kvalheim, O. M.; Sletten, E. Prediction of physical properties of hydrocarbon mixtures by partial-least-squares calibration of carbon-13 nuclear magnetic resonance data. Anal. Chim. Acta 1989, 223, 123−134.
(16) Xu, J.; Chen, B.; Zhang, Q.; Guo, B. Prediction of refractive indices of linear polymers by a four-descriptor QSPR model. Polymer 2004, 45 (26), 8651−8659.
(17) Ha, Z.; Ring, Z.; Liu, S. Quantitative structure-property relationship (QSPR) models for boiling points, specific gravities, and refraction indices of hydrocarbons. Energy Fuels 2005, 19 (1), 152−163.
(18) Xu, J.; Liang, H.; Chen, B.; Xu, W.; Shen, X.; Liu, H. Linear and nonlinear QSPR models to predict refractive indices of polymers from cyclic dimer structures. Chemom. Intell. Lab. Syst. 2008, 92 (2), 152−156.
(19) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of Critical Properties and Acentric Factors of Pure Compounds Using the Artificial Neural Network Group Contribution Algorithm. J. Chem. Eng. Data 2011, 56 (5), 2460−2476.
(20) Gharagheizi, F.; Abbasi, R.; Tirandazi, B. Prediction of Henry’s Law Constant of Organic Compounds in Water from a New Group-Contribution-Based Model. Ind. Eng. Chem. Res. 2010, 49 (20), 10149−10152.



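APPENDIX A. STATISTICAL ERROR PARAMETERS

In this study, the accuracy of the new correlation was assessed with a number of statistical parameters, including squared correlation coefficients (R2), average absolute relative deviations (AARDs), standard deviation errors (STDs), and root-mean-square errors (RMSEs). Definitions and equations of these parameters are as follows:

(1) squared correlation coefficients:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( X(i)_{\mathrm{exp}} - X(i)_{\mathrm{rep/pred}} \right)^2}{\sum_{i=1}^{N} \left( X(i)_{\mathrm{rep/pred}} - \mathrm{av}\,X(i)_{\mathrm{rep/pred}} \right)^2} \tag{A.1}$$

(2) average absolute relative deviations:

$$\mathrm{AARD\%} = \frac{100}{N} \sum_{i=1}^{N} \frac{\left| X(i)_{\mathrm{rep/pred}} - X(i)_{\mathrm{exp}} \right|}{X(i)_{\mathrm{exp}}} \tag{A.2}$$

(3) absolute relative deviation:

$$\mathrm{ARD\%} = 100\,\frac{\left| X(i)_{\mathrm{rep/pred}} - X(i)_{\mathrm{exp}} \right|}{X(i)_{\mathrm{exp}}}$$

(4) standard deviation errors:

$$\mathrm{STD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( X(i)_{\mathrm{rep/pred}} - \mathrm{av}\left(X(i)_{\mathrm{rep/pred}}\right) \right)^2} \tag{A.3}$$

(5) root mean square errors:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( X(i)_{\mathrm{exp}} - X(i)_{\mathrm{rep/pred}} \right)^2} \tag{A.4}$$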
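The definitions above translate directly into a few lines of code. The following is a minimal sketch rather than the authors' implementation: it follows eqs A.1 to A.4 as printed (including the use of the predicted-value average in R2 and STD), and it assumes that STD and RMSE carry a square root, as their names imply. The three data pairs are the sucrose, glycerol dioleate, and glycerol trioleate entries of Table 3; because these are flagged outliers, the resulting numbers are not representative of the overall model statistics.

```python
import math


def r_squared(exp, pred):
    """Eq A.1 as printed: residual sum over the spread of predicted values."""
    mean_pred = sum(pred) / len(pred)
    ss_res = sum((e - p) ** 2 for e, p in zip(exp, pred))
    ss_spread = sum((p - mean_pred) ** 2 for p in pred)
    return 1.0 - ss_res / ss_spread


def aard_percent(exp, pred):
    """Eq A.2: mean of the per-point ARD% values."""
    return 100.0 / len(exp) * sum(abs(p - e) / e for e, p in zip(exp, pred))


def std_error(pred):
    """Eq A.3: spread of the predicted values about their average."""
    mean_pred = sum(pred) / len(pred)
    return math.sqrt(sum((p - mean_pred) ** 2 for p in pred) / len(pred))


def rmse(exp, pred):
    """Eq A.4: root-mean-square difference between literature and predicted."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))


# Literature and predicted RI for three Table 3 entries.
lit = [1.5376, 1.4663, 1.4621]
pred = [1.620, 1.538, 1.549]

print(f"AARD% = {aard_percent(lit, pred):.2f}")
print(f"RMSE  = {rmse(lit, pred):.4f}")
print(f"STD   = {std_error(pred):.4f}")
print(f"R2    = {r_squared(lit, pred):.3f}")
```

Plain Python lists are used so the sketch has no dependencies; for the full data set an array library would be the more natural choice.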

ASSOCIATED CONTENT

Supporting Information

Table listing the chemical structures of outliers, the predicted values, and the status of each data point (training, validation, and test set). This material is available free of charge via the Internet at http://pubs.acs.org.



REFERENCES

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. *E-mail: [email protected].

Funding

This work is based upon research supported by the South African Research Chairs Initiative of the Department of Science and Technology and National Research Foundation.


(21) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Use of artificial neural network-group contribution method to determine surface tension of pure compounds. J. Chem. Eng. Data 2011, 56 (5), 2587−2601.
(22) Gharagheizi, F. A new group contribution-based model for estimation of lower flammability limit of pure compounds. J. Hazard. Mater. 2009, 170 (2), 595−604.
(23) Gharagheizi, F.; Abbasi, R. A New Neural Network Group Contribution Method for Estimation of Upper Flash Point of Pure Chemicals. Ind. Eng. Chem. Res. 2010, 49 (24), 12685−12695.
(24) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Representation/prediction of solubilities of pure compounds in water using artificial neural network−group contribution method. J. Chem. Eng. Data 2011, 56 (4), 720−726.
(25) Gharagheizi, F.; Sattari, M.; Tirandazi, B. Prediction of Crystal Lattice Energy Using Enthalpy of Sublimation: A Group Contribution-Based Model. Ind. Eng. Chem. Res. 2011, 50 (4), 2482−2486.
(26) Gharagheizi, F.; Ilani-Kashkouli, P.; Acree, W. E., Jr.; Mohammadi, A. H.; Ramjugernath, D. A group contribution model for determining the sublimation enthalpy of organic compounds at the standard reference temperature of 298 K. Fluid Phase Equilib. 2013, 354, 265−285.
(27) Gharagheizi, F.; Babaie, O.; Mazdeyasna, S. Prediction of vaporization enthalpy of pure compounds using a group contribution-based method. Ind. Eng. Chem. Res. 2011, 50 (10), 6503−6507.
(28) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Determination of Parachor of Various Compounds Using an Artificial Neural Network-Group Contribution Method. Ind. Eng. Chem. Res. 2011, 50 (9), 5815−5823.
(29) Scalabrin, G.; Marchi, P.; Bettio, L.; Richon, D. Enhancement of the extended corresponding states techniques for thermodynamic modeling. II. Mixtures. Int. J. Refrig. 2006, 29 (7), 1195−1207.
(30) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. A new neural network−group contribution method for estimation of flash point temperature of pure components. Energy Fuels 2008, 22 (3), 1628−1635.
(31) Mohammadi, A. H.; Richon, D. A mathematical model based on artificial neural network technique for estimating liquid water−hydrate equilibrium of water−hydrocarbon system. Ind. Eng. Chem. Res. 2008, 47 (14), 4966−4970.
(32) Gharagheizi, F. Prediction of the standard enthalpy of formation of pure compounds using molecular structure. Aust. J. Chem. 2009, 62 (4), 376−381.
(33) Kamari, A.; Hemmati-Sarapardeh, A.; Mirabbasi, S.-M.; Nikookar, M.; Mohammadi, A. H. Prediction of sour gas compressibility factor using an intelligent approach. Fuel Process. Technol. 2013, 116, 209−216.
(34) Kamari, A.; Gharagheizi, F.; Bahadori, A.; Mohammadi, A. H.; Zendehboudi, S. Rigorous Modeling for Prediction of Barium Sulfate (Barite) Deposition in Oilfield Brines. Fluid Phase Equilib. 2014, 366, 117−126.
(35) Kamari, A.; Khaksar-Manshad, A.; Gharagheizi, F.; Mohammadi, A. H.; Ashoori, S. Robust Model for the Determination of Wax Deposition in Oil Systems. Ind. Eng. Chem. Res. 2013, 52, 15664−15672.
(36) Yaws, C. L. The Yaws handbook of physical properties for hydrocarbons and chemicals; Gulf: Houston, TX, USA, 2005.
(37) Eslamimanesh, A.; Gharagheizi, F.; Illbeigi, M.; Mohammadi, A. H.; Fazlali, A.; Richon, D. Phase equilibrium modeling of clathrate hydrates of methane, carbon dioxide, nitrogen, and hydrogen + water soluble organic promoters using Support Vector Machine algorithm. Fluid Phase Equilib. 2012, 316, 34−45.
(38) Gharagheizi, F.; Ilani-Kashkouli, P.; Mohammadi, A. H.; Ramjugernath, D.; Richon, D. Development of a Group Contribution Method for Estimating the Thermal Decomposition Temperature of Ionic Liquids. Fluid Phase Equilib. 2013, 355, 81−86.
(39) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Representation/Prediction of Solubilities of Pure Compounds in Water Using Artificial Neural Network−Group Contribution Method. J. Chem. Eng. Data 2011, 56 (4), 720−726.
(40) Gharagheizi, F.; Eslamimanesh, A.; Ilani-Kashkouli, P.; Mohammadi, A. H.; Richon, D. Determination of Vapor Pressure of Chemical Compounds: A Group Contribution Model for an Extremely Large Database. Ind. Eng. Chem. Res. 2012, 51 (20), 7119−7125.
(41) Knuth, D. E. The art of computer programming, Vol. 3: Sorting and searching; Addison-Wesley: Reading, MA, USA, 1973.
(42) Hoaglin, D. C.; Welsch, R. E. The hat matrix in regression and ANOVA. Am. Stat. 1978, 32 (1), 17−22.
(43) Rousseeuw, P. J.; Leroy, A. M. Robust regression and outlier detection; Wiley: Hoboken, NJ, USA, 2005; Vol. 589.
(44) Goodall, C. R. 13 Computation using the QR decomposition. Handb. Stat. 1993, 9, 467−508.
(45) Gramatica, P. Principles of QSAR models validation: Internal and external. QSAR Comb. Sci. 2007, 26 (5), 694−701.
(46) Gharagheizi, F.; Eslamimanesh, A.; Mohammadi, A. H.; Richon, D. Group contribution model for determination of molecular diffusivity of non-electrolyte organic compounds in air at ambient conditions. Chem. Eng. Sci. 2012, 68 (1), 290−304.
(47) Mohammadi, A. H.; Eslamimanesh, A.; Gharagheizi, F.; Richon, D. A novel method for evaluation of asphaltene precipitation titration data. Chem. Eng. Sci. 2012, 78, 181−185.
(48) Mohammadi, A. H.; Gharagheizi, F.; Eslamimanesh, A.; Richon, D. Evaluation of experimental data for wax and diamondoids solubility in gaseous systems. Chem. Eng. Sci. 2012, 81, 1−7.
(49) Mohammadi, A. H.; Eslamimanesh, A.; Gharagheizi, F.; Ilani-Kashkouli, P. Are the reservoir fluid compositional grading data reliable? Fluid Phase Equilib. 2014, 363, 27−31.
