Environ. Sci. Technol. 2000, 34, 2596-2600
Predicting Surface Tension of Liquid Organic Solvents
E. EGEMEN, N. NIRMALAKHANDAN,* AND C. TREVIZO
Civil, Agricultural, and Geological Engineering Department, New Mexico State University, Las Cruces, New Mexico 88003
A group contribution method is proposed to predict surface tension of liquid organic solvents. The proposed model is developed from a training set of 349 chemicals and validated with an external testing set of 44 chemicals. For the training set, the experimental surface tension values and the values fitted by this model agreed well with r² = 0.75 at p = 0.0001. The predictions of this model for the external testing set of 44 chemicals were within an average factor of error of 1.07, showing good agreement between experimental and predicted values with r² = 0.89 at p = 0.0001. A comparison of the model developed in this study against five other empirical models reported in the literature is also presented.
Introduction
Surface tension (ST) is an important property in the study of physics and chemistry at free surfaces. ST data are important to scientists, engineers, and practitioners in many fields, including chemical process and reactor engineering, flow and transport through porous media, materials selection and engineering, biomedical and biochemical engineering, and electronic and electrical engineering. The role of the ST of cleaning agents in the mass production of ultra “clean” surfaces and components for electronic, computer, and medical applications has been well-documented. Several chemicals traditionally used as cleaning agents, known as solvents, have now become the subject of intense regulatory control due to their hazardous and toxic nature. Thus, industries and end-users are seeking alternate “greener” solvents as replacements. In the evaluation of alternate solvents, ST has been identified as a key property (1). For example, the wetting index of a cleaning agent, defined as 1000 × ST × density/viscosity, has been proposed as a useful parameter in comparing cleaning effectiveness (2). Consequently, ST data can also be of benefit in screening existing and proposed chemicals as solvents. When experimentally measured ST data are not readily available, theoretical or empirical methods may be used to establish acceptable ST values for preliminary screening purposes. At present, the methods available have not been well-validated and are inadequate for new chemicals (3). Theoretical methods of calculating ST based on thermodynamics can be complex and require additional chemical properties and simplifying assumptions to complete the calculations.
Defay and Prigogine (4) presented an equation developed from thermodynamic considerations and reported, as an example, a calculated value of 14.9 dyn/cm for liquid argon; they concluded that it was in “fairly good agreement” with the measured value of 11.9 dyn/cm, with a factor of error (FE) of 1.25. Using another method, called the Cell Model, Defay and Prigogine (4) reported ST for argon at 85 K, calculated as 9 dyn/cm, which they rated as “only moderately good” as compared with the measured value of 13.2 dyn/cm.
* Corresponding author phone: (505)646-5378; fax: (505)646-6049; e-mail: [email protected].
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 34, NO. 12, 2000
Quantitative structure-activity and property-activity relationships (QSAR and QPAR) are well-accepted tools for developing empirical methods to relate physical/chemical/biological properties of organic molecules to their structures and other properties (5). A limited number of such methods relating to ST have been proposed in the literature. Reid and Sherwood (6) have presented a review of three of the more common empirical models from the literature. The first one (model 1) related ST to parachor, P; liquid density, ρL (mol/mL); and vapor density, ρV (mol/mL), according to
$$\mathrm{ST} = \left[P\,(\rho_L - \rho_V)\right]^{4} \tag{1}$$
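The fourth-power dependence in eq 1 can be illustrated with a short numerical sketch. The parachor and density values below are illustrative placeholders, not values from this study, and the function name `st_parachor` is hypothetical.

```python
def st_parachor(parachor, rho_liquid, rho_vapor=0.0):
    """Eq 1: ST = [P * (rho_L - rho_V)]**4, with densities in mol/mL."""
    return (parachor * (rho_liquid - rho_vapor)) ** 4


# Illustrative inputs only (assumed, not taken from the paper).
P = 205.0        # parachor
RHO_L = 0.0112   # liquid molar density, mol/mL
# The vapor density is neglected here (assumed much smaller than RHO_L).

base = st_parachor(P, RHO_L)
perturbed = st_parachor(P, RHO_L * 1.02)  # liquid density overestimated by 2%

# ST scales with the fourth power of the density difference, so a 2% density
# error grows to roughly an 8% error in ST (1.02**4 is about 1.082).
relative_change = perturbed / base - 1.0
print(f"ST = {base:.1f} dyn/cm, relative change = {relative_change:.3f}")
```

Under these assumed inputs, a small error in an estimated liquid density is amplified about fourfold in the calculated ST.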
A group contribution method was proposed to calculate P. Some criticisms have been raised about the validity of the above model (6). For example, if the liquid density has to be estimated, any error in the estimate is amplified by the fourth power and can lead to significant errors in ST. The authors of that model also proposed a second model (model 2) based on molar refraction:
$$\mathrm{ST} = \left[\frac{P\,(n^{2} - 1)}{R\,(n^{2} + 2)}\right]^{4} \tag{2}$$
where R is the molar refraction and n is the refractive index at the wavelength of the sodium D line. Again, a group contribution approach was proposed to estimate R, but values for n have to be determined experimentally. The third method (model 3) discussed by Reid and Sherwood (6) is based on the principle of corresponding states. According to this method, ST (dyn/cm) can be estimated from
$$\mathrm{ST} = P_c^{2/3}\,T_c^{1/3}\left[(0.133\,R_e - 0.281)(1 - T_r)\right]^{11/9} \tag{3}$$
where Pc is the critical pressure (atm); Tc is the critical temperature (K); Re is the Riedel factor; and Tr is the reduced temperature (dimensionless). All of these model parameters can be found in handbooks for many of the existing chemicals but have to be determined for new chemicals. When appropriate and reliable data are available, models 1 and 2 are expected to yield ST with errors of 3-4%, and model 3 with errors of 4-5%. Kavun et al. (3) evaluated over 450 molecular descriptors in developing a QSAR model for ST and reported an eight-variable model that fitted ST data of 72 organic chemicals ranging from 9.49 to 67 dyn/cm, with r² = 0.955 (model 4):
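$$\begin{aligned}
\mathrm{ST} = 37.4 &- 690.6\,\frac{\sum q(\mathrm{Hal})}{N} - 264.5\,\frac{q_{\max}}{N} + 401.95\,\frac{q_{\min}}{N} \\
&\quad 2.05\,S_s - 1.08\,\frac{S_{uns}}{N} + 1.04\,\ln(1 + S_p) + 67.8\,{}^{i}\chi^{\nu}
\end{aligned} \tag{4}$$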
where qmax and qmin are maximal and minimal charges on the atoms; ∑q(Hal) is the net charge on halogen atoms; Ss and Suns are the surface areas of the saturated and unsaturated apolar surfaces of the molecule; Sp is the area of the polar van der Waals surface; N is the number of atoms; and iχν is the valence index of connectivity. The predictive ability of this model was demonstrated on an external testing set of 22 chemicals for which the predictions were within 30% of the measured ST values.
10.1021/es991284u CCC: $19.00 © 2000 American Chemical Society. Published on Web 05/16/2000
Yaws (7) has used regression analysis to derive the following equation (model 5) for calculating ST (dyn/cm) at various temperatures, T (K):
$$\mathrm{ST} = A\left(1 - \frac{T}{T_c}\right)^{n} \tag{5}$$
where A, n, and Tc are statistically determined model constants. The fitting quality of this model was not reported. This method cannot be applied to new molecules because it requires three model constants, all of which would be unknown. As can be seen from the above summary, most currently available models are limited by their need for model parameters that have to be determined experimentally or calculated using special-purpose computer programs. The objective of this study was to develop an easy-to-use structure-activity-based model to predict ST of organic liquids, without requiring any experimental data or any other physicochemical property. The results of this study are also compared against the above five predictive approaches reported in the literature.
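As a minimal sketch of how eq 5 (model 5) would be evaluated once its three constants are known, the function below applies the temperature relation directly. The constants used here are hypothetical placeholders chosen for illustration; real values would have to come from ref 7.

```python
def st_yaws(temperature_k, a, tc, n):
    """Eq 5: ST = A * (1 - T/Tc)**n, evaluated for 0 < T < Tc."""
    if not 0.0 < temperature_k < tc:
        raise ValueError("temperature must lie between 0 K and Tc")
    return a * (1.0 - temperature_k / tc) ** n


# Hypothetical constants for illustration only (not from ref 7).
A, TC, N = 70.0, 560.0, 1.2

st_280 = st_yaws(280.0, A, TC, N)
st_300 = st_yaws(300.0, A, TC, N)

# With positive A and n, the calculated ST falls as T approaches Tc.
print(f"ST(280 K) = {st_280:.2f} dyn/cm, ST(300 K) = {st_300:.2f} dyn/cm")
```

The sketch also shows why the method is limited to tabulated chemicals: without A, n, and Tc there is nothing to evaluate.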
Methods
A training set was assembled from experimentally measured ST data evaluated and compiled from several sources by Lide (8). This training set includes halogenated aliphatics and aromatics, alcohols, ethers, esters, ketones, amines, etc. with cyclic, saturated, and unsaturated structures. The original ST data source (8) contained only eight fluorinated solvents. Preliminary evaluations indicated that these did not fit well with the rest of the data, primarily due to their limited representation in the training set. Therefore, fluorinated solvents were excluded from further modeling. A few sulfurous chemicals from ref 8 were also excluded for the same reason. The final training set of 358 chemicals ranged in ST values from 12 to 63 dyn/cm (Appendix A in Supporting Information). Experimental ST data for 350 of the 358 chemicals were measured over a temperature range of 20-30 °C. For the remaining eight chemicals, the corresponding temperatures are indicated in parentheses next to the chemical’s name in Appendix A (see Supporting Information). A multiple stepwise linear regression procedure with ST as the dependent variable was used in developing the model. Simple and valence molecular connectivities, octanol-water partition coefficient, and atom/bond contributions were evaluated singly and in various combinations as independent variables. The connectivity indices evaluated in this study ranged from zero-order to third-order path and cluster, calculated following the procedure of Kier and Hall (9) as modified by Nirmalakhandan (10). The testing set consisted of 44 chemicals that were not used in the model development process. The experimental ST data for this testing set were also obtained from Lide (8), except in the case of ethyl bromide, for which ST data were taken from Reid and Sherwood (6). As far as possible, chemicals with multiple structural features were reserved as test chemicals.
For example, bromochloromethane in this testing set contains bromine and chlorine together, whereas these two atoms are represented only individually in the training set; cyclohexanone contains a cyclic and a ketone subgroup, which were also represented individually in the training set. All the ST values in this testing set are at 20 °C and ranged from 21 to 41 dyn/cm. Reid and Sherwood (6) have previously compared the predictive ability of models 1-3 on a common set of 19 chemicals. We have chosen the same 19 chemicals to compare the model developed in this study against those three models as well as against models 4 and 5. It should be noted that none of these 19 chemicals were used by us in the training set in developing the model. They were, however, included in the testing set of 44 chemicals used to validate our model.
Results
Model Development. Preliminary evaluations indicated that molecular connectivity indices were not significant in correlating with ST but that atomic contents were. Although the octanol-water partition coefficient [log(P)] correlated poorly with ST by itself, a significant model was found when certain atom counts were included with log(P). However, for this model, intercorrelation between the model parameters exceeded 0.75. Therefore, in an effort to develop a model that is independent of any other physicochemical property and to avoid the need for additional data, evaluation of log(P) as a model parameter was abandoned. When an atom/bond contribution approach was attempted, a linear combination of the counts of atoms and bonds yielded an efficient model relating ST (in dyn/cm) to the atom/bond counts by
$$\mathrm{ST} = 16.36 + \sum_{i=1}^{m} \left(n_i \times \mathrm{contribution}_i\right) \tag{6}$$
where m represents the number of various types of atoms/bonds and n_i represents the count of each type of atom/bond in the molecule. The agreement between the experimental ST data and the values calculated by this atom/bond contribution method was highly significant for the 358 chemicals, with r² = 0.69 and RMS residual error = 3.65, at p = 0.0001. A review of the statistical thermodynamic studies on ST (as suggested by one of the reviewers) indicated that a nonlinear relationship between ST and molecular characteristics might be more appropriate. Several nonlinear combinations of the atom/bond counts were evaluated, but none resulted in any significant improvements to the above model. Although this model does not explain any mechanistic behavior, the statistical procedure captures the linear relationship existing between ST and the atom/bond contents of the molecules under study. The p value confirms the statistical significance of the relationship. The validity of the linear relationship is further demonstrated by the fact that the distribution of the residuals was completely random when plotted against the fitted values for the 358 chemicals. This plot also indicated that the residuals for 92% of the 358 chemicals were within ±2 SD. To quantify the fitting error, we propose the use of FE, defined as the ratio of the calculated value to the experimental value; if the ratio is less than 1, then the inverse ratio is used. The fitting error was unacceptably large, with FE ≥ 1.5 for 5 of the 358 chemicals (1.4% of the database) and FE ≥ 1.25 for 37 of the 358 chemicals (10% of the database); however, the overall average factor of error, AFE, was an acceptable 1.13. The errors were consistently high for the amines, which may be due to the inadequate representation of these chemicals in the training set. The maximum fitting error was -89% (for triethylamine), while the absolute average error for the entire data set was 11%.
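The error measures defined above can be sketched in a few lines. Two points are assumptions rather than statements from the text: AFE is taken as the arithmetic mean of the individual FE values, and the signed percent error uses the (experimental - calculated)/experimental convention, which matches the signs listed in Table 2. The function names are hypothetical.

```python
def factor_of_error(calculated, experimental):
    """Ratio calculated/experimental, inverted when below 1, so FE >= 1."""
    ratio = calculated / experimental
    return ratio if ratio >= 1.0 else 1.0 / ratio


def average_factor_of_error(calculated, experimental):
    """Arithmetic mean of the per-chemical FE values (assumed definition)."""
    factors = [factor_of_error(c, e) for c, e in zip(calculated, experimental)]
    return sum(factors) / len(factors)


def percent_error(calculated, experimental):
    """Signed error in percent; (exp - calc)/exp sign convention assumed."""
    return 100.0 * (experimental - calculated) / experimental


# Argon example quoted in the Introduction: 14.9 calculated vs 11.9 measured.
fe_argon = factor_of_error(14.9, 11.9)
print(f"FE for argon = {fe_argon:.2f}")
```

Because FE is symmetric in over- and underprediction, a calculated value twice the measured one and a calculated value half the measured one both give FE = 2.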
Model Refinement. The above basic model was further examined to demonstrate its statistical validity and practical applicability. First, the training set was screened for chemicals exhibiting any outlying tendencies. The distribution of the fitting errors, deleted residuals, standard normal probability plot of the residuals, serial autocorrelation, and Cook’s distance were used to identify any outliers. On the basis of these evaluations, 22 chemicals were identified as possible outliers. Before eliminating any of them as outliers, the literature was searched for alternate experimental data for these 22 chemicals. Alternate data that were significantly different from those used in the original training set were found for only two chemicals; using these data, the quality of the model improved and the respective errors decreased. On the basis of the above, the database was amended as follows: for trichloromethylbenzene (no. 121), original ST = 23.39, amended ST = 40.84; and for ethylenechlorohydrin (no. 282), original ST = 38.90, amended ST = 36.55.

TABLE 1. Atom/Bond Contribution Values^a
| atom/bond | contribution | F value |
|---|---|---|
| C | 0.470 | 29.719 |
| O | 2.470 | 100.902 |
| OH | 8.640 | 452.165 |
| Cl | 4.300 | 176.899 |
| Br | 9.020 | 232.601 |
| I | 12.490 | 73.102 |
| NH2 | 7.420 | 52.542 |
| NH | 8.680 | 63.678 |
| N | 7.200 | 73.523 |
| NO2 | 16.480 | 173.743 |
| cycle | 5.900 | 55.319 |
| double bond | 2.850 | 264.195 |
^a ST (dyn/cm) = 14.92 + ∑ n_i × contribution_i, where n_i is the number of atom/bond i in the chemical.
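The final model in Table 1 can be applied with a simple lookup, sketched below. The contribution values and the intercept of 14.92 are transcribed from Table 1. The counting conventions in the examples are assumptions inferred from the predicted values in Table 2, not rules stated in the text: a carbonyl is counted as one O plus one double bond, and an aromatic ring is counted as three double bonds with no cycle term. The names `predict_st` and `CONTRIBUTIONS` are hypothetical.

```python
# Intercept and contribution values transcribed from Table 1 (final model).
INTERCEPT = 14.92
CONTRIBUTIONS = {
    "C": 0.470, "O": 2.470, "OH": 8.640, "Cl": 4.300, "Br": 9.020,
    "I": 12.490, "NH2": 7.420, "NH": 8.680, "N": 7.200, "NO2": 16.480,
    "cycle": 5.900, "double bond": 2.850,
}


def predict_st(counts):
    """Return ST (dyn/cm) from a mapping of atom/bond type to its count."""
    unknown = set(counts) - set(CONTRIBUTIONS)
    if unknown:
        raise KeyError(f"no contribution value for: {sorted(unknown)}")
    return INTERCEPT + sum(n * CONTRIBUTIONS[k] for k, n in counts.items())


# Atom/bond counts under the assumed counting conventions described above.
examples = {
    "acetone": {"C": 3, "O": 1, "double bond": 1},
    "cyclohexane": {"C": 6, "cycle": 1},
    "benzene": {"C": 6, "double bond": 3},
    "bromochloromethane": {"C": 1, "Br": 1, "Cl": 1},
}

for name, counts in examples.items():
    print(f"{name}: {predict_st(counts):.2f} dyn/cm")
```

With these counts the sketch gives 21.65, 23.64, 26.29, and 28.71 dyn/cm, which agree with the predicted values listed for the same chemicals in Table 2. Atom types outside Table 1, such as fluorine or sulfur, raise an error rather than being silently ignored.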
The deleted residuals for the following nine chemicals were significantly higher than the average for the rest of the cases: formamide, diiodomethane, adiponitrile, glycerol, trimethylamine, diphenyl ether, ethanolamine, ethylene glycol, and dibutyl phthalate. The standard normal probability plot of the residuals also indicated that these nine chemicals did not follow the linear relationship. The average Cook’s distance for these nine chemicals was an order of magnitude greater than the average value for the rest of the cases (0.01 vs 0.0016). A similar anomaly has been observed by Papazian (11) for formamide and glycol, which was attributed to strong hydrogen-bonding effects. On the basis of these tests, the above nine chemicals were considered outliers, and a final version of eq 6 was developed for the remaining 349 chemicals. The atom/bond contributions and their statistical significances for the final model are tabulated in Table 1. The quality of fitting of this model is illustrated in Figure 1. The serial autocorrelation for the final model was 0.0732, confirming that the model did not violate any statistical assumptions. The agreement between the experimental ST values and those calculated from eq 6 was highly significant for the 349 chemicals, with r² = 0.754 and adjusted r² = 0.746, at p < 0.0001. The fact that the basic model was able to identify questionable data further adds credence to this atom/bond contribution approach. Additional evidence supporting the suitability of this approach is provided in the next section, where the predictive ability of the model is demonstrated.
Model Validation. The predictive ability of the refined atom/bond contribution model was evaluated on the testing set of 44 chemicals that were not included in the model development. These chemicals, their experimental ST values, and the predicted ST values are listed in Table 2. The predictions for these 44 chemicals agreed well with the experimental values, with r² = 0.89 at p = 0.0001. The overall AFE for these predictions was 1.07 with a maximum deviation of -22%. The quality of prediction for this testing set is illustrated in Figure 2. Since these chemicals had multiple structural features that were represented in the training set by individual chemicals, this validation indicates that the proposed atom/bond contribution approach is a valid one to predict ST.

TABLE 2. Comparison between Observed and Predicted ST Values^a
| chemical no. | name | observed ST (dyn/cm) | predicted ST (dyn/cm) | predictive error, factor | predictive error, % |
|---|---|---|---|---|---|
| 1 | acetic acid | 27.10 | 29.82 | 1.10 | -10.0 |
| 2 | acetone | 23.46 | 21.65 | 1.08 | 7.7 |
| 3 | aniline | 42.12 | 33.71 | 1.25 | 20.0 |
| 4 | benzene | 28.22 | 26.29 | 1.07 | 6.8 |
| 5 | bromochloromethane | 33.32 | 28.71 | 1.16 | 13.8 |
| 6 | 1-bromonaphthalene | 44.19 | 42.89 | 1.03 | 2.9 |
| 7 | butanol | 24.93 | 25.44 | 1.02 | -2.0 |
| 8 | o-chloroaniline | 43.66 | 38.01 | 1.15 | 12.9 |
| 9 | chlorobenzene | 32.99 | 30.59 | 1.08 | 7.3 |
| 10 | 1-chloronaphthalene | 41.04 | 38.17 | 1.08 | 7.0 |
| 11 | m-cresol | 35.69 | 35.40 | 1.01 | 0.8 |
| 12 | cyclohexane | 24.65 | 23.64 | 1.04 | 4.1 |
| 13 | cyclohexanol | 32.92 | 32.28 | 1.02 | 1.9 |
| 14 | cyclohexanone | 34.57 | 28.96 | 1.19 | 16.2 |
| 15 | cyclohexylamine | 31.22 | 31.06 | 1.01 | 0.5 |
| 16 | cyclopentane | 21.88 | 23.17 | 1.06 | -5.9 |
| 17 | trans-decahydronaphthalene | 32.15 | 31.42 | 1.02 | 2.3 |
| 18 | diethyl ether | 16.65 | 19.27 | 1.16 | -15.7 |
| 19 | diisobutyl ketone | 25.54 | 24.47 | 1.04 | 4.2 |
| 20 | N,N-dimethylacetamide | 32.43 | 29.32 | 1.11 | 9.6 |
| 21 | N,N-dimethylaniline | 35.52 | 34.43 | 1.03 | 3.1 |
| 22 | ethyl acetate | 23.39 | 24.59 | 1.05 | -5.1 |
| 23 | ethyl bromide | 24.15 | 24.88 | 1.03 | -3.0 |
| 24 | ethylcyclohexane | 25.15 | 24.58 | 1.02 | 2.3 |
| 25 | ethylene glycol monoethyl ether | 28.35 | 27.91 | 1.02 | 1.6 |
| 26 | ethylene glycol diacetate | 33.01 | 33.35 | 1.01 | -1.0 |
| 27 | ethylenimine | 32.8 | 30.45 | 1.08 | 7.2 |
| 28 | furfuryl alcohol | 38.00 | 39.98 | 1.05 | -5.2 |
| 29 | 2,4-lutidine | 33.17 | 33.96 | 1.02 | -2.4 |
| 30 | n-heptane | 19.65 | 18.21 | 1.08 | 7.3 |
| 31 | n-hexane | 17.89 | 17.74 | 1.01 | 0.8 |
| 32 | methanol | 22.07 | 24.03 | 1.09 | -8.9 |
| 33 | methylal | 21.12 | 21.27 | 1.01 | -0.7 |
| 34 | N-methylaniline | 36.90 | 35.44 | 1.04 | 4.0 |
| 35 | n-octane | 21.14 | 18.68 | 1.13 | 11.6 |
| 36 | phenol | 38.20 | 34.93 | 1.09 | 8.6 |
| 37 | tetrachloromethane | 26.43 | 32.59 | 1.23 | -23.3 |
| 38 | tetrahydrofuran | 26.40 | 25.17 | 1.05 | 4.7 |
| 39 | tetrahydrofurfuryl alcohol | 37.00 | 34.28 | 1.08 | 7.4 |
| 40 | 1,2,3,4-tetrahydronaphthalene | 33.17 | 34.07 | 1.03 | -2.7 |
| 41 | toluene | 27.93 | 26.76 | 1.04 | 4.2 |
| 42 | p-toluidine | 36.06 | 34.18 | 1.06 | 5.2 |
| 43 | 2,2,3-trimethyl butane | 18.99 | 18.21 | 1.04 | 4.1 |
| 44 | 2,4-xylidine | 36.75 | 34.65 | 1.06 | 5.7 |
| | absolute av | | | 1.07 | 6.4 |
^a All observed data are from ref 8, except for ethyl bromide (no. 23), which was from ref 6.

TABLE 3. Comparison of Observed vs Calculated ST Values and Factors of Error by Different Models^a
| no. | chemical name | obsd ST | model 1 calcd ST | model 1 FE | model 2 calcd ST | model 2 FE | model 3 calcd ST | model 3 FE | model 4 calcd ST | model 4 FE | model 5 calcd ST | model 5 FE | model 6 calcd ST | model 6 FE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | acetic acid | 27.4 | 26.0 | 1.05 | 26.3 | 1.04 | 42.2 | 1.54 | 26.9 | 1.02 | 17.9 | 1.54 | 29.8 | 1.09 |
| 2 | acetone | 23.3 | 23.7 | 1.02 | 24.3 | 1.04 | 25.2 | 1.08 | 21.1 | 1.11 | 23.7 | 1.02 | 21.7 | 1.08 |
| 3 | aniline | 43.0 | 41.5 | 1.04 | 41.7 | 1.03 | 47.2 | 1.10 | 42.3 | 1.02 | 43.0 | 1.00 | 33.7 | 1.28 |
| 4 | benzene | 28.9 | 27.9 | 1.04 | 27.9 | 1.03 | 28.0 | 1.03 | 32.2 | 1.11 | 28.9 | 1.00 | 26.3 | 1.10 |
| 5 | butanol | 24.6 | 25.3 | 1.03 | 25.1 | 1.02 | 33.6 | 1.37 | 23.1 | 1.06 | 26.3 | 1.07 | 25.4 | 1.04 |
| 6 | carbon tetrachloride | 26.8 | 27.1 | 1.01 | 27.1 | 1.01 | 25.9 | 1.03 | 26.6 | 1.01 | 26.9 | 1.01 | 32.6 | 1.22 |
| 7 | chlorobenzene | 33.3 | 33.1 | 1.01 | 32.9 | 1.01 | 33.1 | 1.01 | 37.4 | 1.12 | 33.6 | 1.01 | 30.6 | 1.09 |
| 8 | cyclohexane | 25.0 | 24.3 | 1.03 | 24.3 | 1.03 | 24.1 | 1.04 | 27.3 | 1.09 | 25.3 | 1.01 | 23.6 | 1.06 |
| 9 | cyclopentane | 22.4 | 21.7 | 1.03 | 21.7 | 1.04 | 22.0 | 1.02 | na | | 22.4 | 1.00 | 23.2 | 1.03 |
| 10 | diethyl ether | 17.1 | 17.0 | 1.01 | 17.0 | 1.00 | 17.0 | 1.00 | na | | 17.0 | 1.00 | 19.3 | 1.13 |
| 11 | ethyl acetate | 23.8 | 23.2 | 1.02 | 23.4 | 1.02 | 24.3 | 1.02 | 22.4 | 1.06 | 23.8 | 1.00 | 24.6 | 1.04 |
| 12 | ethylbromide | 24.2 | 23.0 | 1.05 | 23.0 | 1.05 | 27.3 | 1.13 | 26.9 | 1.11 | 24.2 | 1.00 | 24.9 | 1.03 |
| 13 | n-heptane | 20.3 | 20.3 | 1.00 | 20.3 | 1.00 | 19.3 | 1.05 | 19.5 | 1.04 | 20.3 | 1.00 | 18.2 | 1.12 |
| 14 | n-hexane | 18.4 | 18.4 | 1.00 | 18.5 | 1.00 | 18.4 | 1.00 | 18.1 | 1.02 | 18.6 | 1.01 | 17.7 | 1.04 |
| 15 | methanol | 22.6 | 19.7 | 1.14 | 20.1 | 1.12 | 44.2 | 1.96 | na | | 24.2 | 1.08 | 24.0 | 1.07 |
| 16 | phenol | 40.9 | 36.4 | 1.12 | 36.4 | 1.12 | 54.1 | 1.32 | na | | 41.3 | 1.01 | 34.9 | 1.17 |
| 17 | n-octane | 21.8 | 21.7 | 1.00 | 21.7 | 1.00 | 21.7 | 1.00 | 20.6 | 1.06 | 21.6 | 1.01 | 18.7 | 1.17 |
| 18 | toluene | 28.5 | 27.9 | 1.02 | 27.9 | 1.02 | 29.1 | 1.02 | 30.6 | 1.07 | 28.5 | 1.00 | 26.8 | 1.07 |
| 19 | 2,2,3-trimethyl butane | 18.8 | 18.2 | 1.03 | 18.2 | 1.03 | 19.2 | 1.02 | na | | 19.5 | 1.04 | 18.2 | 1.03 |
| | AFE | | | 1.03 | | 1.03 | | 1.14 | | 1.06 | | 1.04 | | 1.10 |
^a Models 1-3, ref 6; model 4, ref 3; model 5, ref 7; model 6, this study. obsd, observed; calcd, calculated; FE, factor of error; na, not available; AFE, average factor of error.

FIGURE 1. Observed vs fitted surface tension for final training set of 349 chemicals.
FIGURE 2. Observed vs predicted surface tension for testing set of 44 chemicals.
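The validation statistics quoted for the testing set can be recomputed from the observed and predicted columns of Table 2, as sketched below. Two definitions are assumptions, since the text does not spell them out: r² is taken as the squared Pearson correlation between observed and predicted values, and AFE as the arithmetic mean of the individual factors of error. The variable and function names are hypothetical.

```python
from math import fsum

# Observed and predicted ST (dyn/cm) for chemicals 1-44, transcribed from Table 2.
OBSERVED = [
    27.10, 23.46, 42.12, 28.22, 33.32, 44.19, 24.93, 43.66, 32.99, 41.04,
    35.69, 24.65, 32.92, 34.57, 31.22, 21.88, 32.15, 16.65, 25.54, 32.43,
    35.52, 23.39, 24.15, 25.15, 28.35, 33.01, 32.8, 38.00, 33.17, 19.65,
    17.89, 22.07, 21.12, 36.90, 21.14, 38.20, 26.43, 26.40, 37.00, 33.17,
    27.93, 36.06, 18.99, 36.75,
]
PREDICTED = [
    29.82, 21.65, 33.71, 26.29, 28.71, 42.89, 25.44, 38.01, 30.59, 38.17,
    35.40, 23.64, 32.28, 28.96, 31.06, 23.17, 31.42, 19.27, 24.47, 29.32,
    34.43, 24.59, 24.88, 24.58, 27.91, 33.35, 30.45, 39.98, 33.96, 18.21,
    17.74, 24.03, 21.27, 35.44, 18.68, 34.93, 32.59, 25.17, 34.28, 34.07,
    26.76, 34.18, 18.21, 34.65,
]


def r_squared(x, y):
    """Squared Pearson correlation coefficient (assumed definition of r^2)."""
    n = len(x)
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def average_factor_of_error(observed, predicted):
    """Mean of max(obs/pred, pred/obs) over all chemicals."""
    factors = [max(o / p, p / o) for o, p in zip(observed, predicted)]
    return fsum(factors) / len(factors)


r2 = r_squared(OBSERVED, PREDICTED)
afe = average_factor_of_error(OBSERVED, PREDICTED)
print(f"n = {len(OBSERVED)}, r^2 = {r2:.2f}, AFE = {afe:.2f}")
```

If the paper used a different definition of r², such as one based on residual sums of squares, the recomputed value may differ slightly from the one reported.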
On the basis of the results of this study, the atom/bond contribution method can be seen to be a simple and straightforward method to predict ST data for organic chemicals. The predictions made by this model for “new” chemicals were demonstrated to be within 20% and at an AFE of 1.07, which may be acceptable for most practical applications.
Model Comparisons. The above model was evaluated against the five others reported in the literature, using the 19 chemicals previously compared by Reid and Sherwood (6) as a common testing set. In the case of model 4, the authors had reported calculated values for only 14 of these 19 chemicals. (Because the computer program was not available, we were unable to use their model, eq 4, to calculate ST values for the remaining five chemicals.) The results of this comparison are summarized in Table 3. The maximum predictive FE values of the five models and the proposed model were 1.14, 1.12, 1.96, 1.11, 1.54, and 1.28, respectively. On the basis of the AFE values tabulated in Table 3, models 1, 2, and 5 fitted the experimental ST data very well (AFE = 1.03-1.04), while models 4 and 6 performed satisfactorily (AFE = 1.05-1.10). However, models 1-4 may not be readily applicable to new and existing chemicals because of the scarcity of model parameters. Model 5 cannot be applied to new chemicals. The model presented in this study (model 6) is a simple and straightforward one that could be confidently applied to molecules containing atoms similar to those in the training set of chemicals used in this study. While comparing well with other theoretical and empirical methods reported in the literature, the proposed model does not require any experimental inputs whatsoever and utilizes molecular structural features only. This enables ST values to be estimated rapidly for new molecules, albeit with a slightly higher average error of ∼10% as compared to other models (∼3-5%) that require experimental inputs or special-purpose computer packages. In addition, the models reported in the literature cover fewer chemicals, and their predictive abilities have not been well-demonstrated. In comparison, the model reported in this study covers nearly 350 chemicals with demonstrated predictive ability on 44 new chemicals.
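The comparison summarized in Table 3 can be reproduced for a few of the models with the sketch below, which uses the observed values and the calculated values for models 3, 4, and 6 as transcribed from that table. Entries listed as "na" are represented by `None` and skipped, and AFE is assumed to be the arithmetic mean of the individual FE values. The function name `summarize` is hypothetical.

```python
# Observed ST (dyn/cm) for the 19 common chemicals, transcribed from Table 3.
OBSERVED = [27.4, 23.3, 43.0, 28.9, 24.6, 26.8, 33.3, 25.0, 22.4, 17.1,
            23.8, 24.2, 20.3, 18.4, 22.6, 40.9, 21.8, 28.5, 18.8]

# Calculated ST for three of the six models; None marks "na" entries.
CALCULATED = {
    "model 3": [42.2, 25.2, 47.2, 28.0, 33.6, 25.9, 33.1, 24.1, 22.0, 17.0,
                24.3, 27.3, 19.3, 18.4, 44.2, 54.1, 21.7, 29.1, 19.2],
    "model 4": [26.9, 21.1, 42.3, 32.2, 23.1, 26.6, 37.4, 27.3, None, None,
                22.4, 26.9, 19.5, 18.1, None, None, 20.6, 30.6, None],
    "model 6": [29.8, 21.7, 33.7, 26.3, 25.4, 32.6, 30.6, 23.6, 23.2, 19.3,
                24.6, 24.9, 18.2, 17.7, 24.0, 34.9, 18.7, 26.8, 18.2],
}


def summarize(observed, calculated):
    """Return the count, AFE, and maximum FE, ignoring unavailable values."""
    factors = [max(o / c, c / o)
               for o, c in zip(observed, calculated) if c is not None]
    return {
        "n": len(factors),
        "afe": sum(factors) / len(factors),
        "max_fe": max(factors),
    }


summary = {name: summarize(OBSERVED, values)
           for name, values in CALCULATED.items()}
for name, stats in summary.items():
    print(f"{name}: n = {stats['n']}, AFE = {stats['afe']:.2f}, "
          f"max FE = {stats['max_fe']:.2f}")
```

Note that the model 4 statistics rest on 14 chemicals rather than 19, so its AFE is not computed over the same set as the others.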
Supporting Information Available
Table showing the experimental vs calculated ST data for the training set of chemicals (11 pages). This material is available free of charge via the Internet at http://pubs.acs.org.
Literature Cited
(1) Trevizo, C.; Daniel, C.; Nirmalakhandan, N. Environ. Sci. Technol. 2000, 34, 2587-2595.
(2) Brighton, P. W. M. J. Hazard. Mater. 1985, 11, 189-208.
(3) Kavun, S. M.; Chalykh, A. E.; Palyulin, V. A. Colloid J. 1995, 57 (6), 767-771.
(4) Defay, R.; Prigogine, I. Surface Tension and Adsorption; John Wiley & Sons: New York, 1966.
(5) Nirmalakhandan, N.; Speece, R. E. Environ. Sci. Technol. 1988, 22, 606-615.
(6) Reid, R. C.; Sherwood, T. K. The Properties of Gases and Liquids; McGraw-Hill: New York, 1966.
(7) Yaws, C. L. Chemical Properties Handbook; McGraw-Hill: New York, 1999.
(8) Lide, D. R. Handbook of Organic Solvents; CRC Press Inc.: Boca Raton, FL, 1995.
(9) Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure Activity Analysis; Research Studies Press: Hertfordshire, England, 1986.
(10) Nirmalakhandan, N. Prediction of Aqueous Solubility and Henry’s Constant from Molecular Structure. Ph.D. Dissertation, Drexel University, Philadelphia, PA, 1988.
(11) Papazian, H. A. J. Am. Chem. Soc. 1971, 93, 5634-5636.
Received for review November 15, 1999. Revised manuscript received February 28, 2000. Accepted March 10, 2000. ES991284U