Application of Quantitative Structure−Performance ... - ACS Publications

Application of Quantitative Structure-Performance Relationship and Neural Network Models for the Prediction of Physical. Properties from Molecular Str...
1 downloads 5 Views 101KB Size
Ind. Eng. Chem. Res. 1998, 37, 3043-3051

3043

Application of Quantitative Structure-Performance Relationship and Neural Network Models for the Prediction of Physical Properties from Molecular Structure Alexander P. Bu 1 nz,*,† Bjo1 rn Braun,‡ and Ralf Janowsky† Department of Chemical Engineering (experSCience), Hu¨ ls Infracor GmbH, Paul-Baumann-Strasse 1, D-45764 Marl, Germany, and Technische Universita¨ t Hamburg-Harburg, Eissendorfer Strasse 38, 21073 Hamburg, Germany

Quantitative structure-performance relationship (QSPR) and neural network models have been designed to correlate and predict physical properties of pure components and a mixture parameter for a simple equation of state. The key step was to generate and select those structure-related parameters (descriptors) that best described the experimental physical property data by a multilinear regression or a neural network analysis. The descriptors found show theoretical significance and allow insights in the theoretical background of the physical properties investigated. The correlations and neural network models enable us to predict physical properties of compounds related to but not present in the training set of compounds used for the development of the QSPR and neural network models. Examples are presented for the prediction of the normal boiling point of chlorosilanes, the cloud points of surfactants, and the combining rule parameter kij in a modified Peng-Robinson equation of state applied to vapor-liquid equilibria of binary systems containing carbon dioxide. Introduction John M. Prausnitz wrote (1996), “There is a widespread assumption that physical properties are “there”, somewhere; all one needs to do is to look in the right places. [...] Experimental data and correlation methods are severely limited; for many (perhaps most) mixtures, we can only make intelligent guesses and to do even that, requires time, thought and experience.” At Hu¨ls in Marl, the Physical Properties & Thermodynamics service group uses a three-path strategy similar to the one outlined by Dohrn et al. (1997) to meet our customers’ needs in terms of physical properties of pure compounds and mixtures. Our access to large commercial and user defined In-house databanks as well as on-line databanks (e.g., Detherm, DIPPR, DDBSP, Beilstein, Gmehlin) allow a fast and inexpensive search for the requested information. Experimental data from our laboratories fill gaps where accuracy is a prime issue. However, in many cases experimental data cannot be retrieved or require a prohibitively large amount of money and time if measured. It is then that the vast number of thermodynamic models and estimating methods now readily available with modern process simulators, such as ASPEN PLUS, ChemCad III, HYSIM, and PRO II, can be used with care by the experienced physical property person. Another approach to calculate physical properties not yet included in commercially available process simulators is based on quantitative structure-activity or structure-property relationship (QSAR or QSPR, respectively) methods and computational or artificial neural networks (CNN or ANN, respectively). * To whom correspondence should be addressed. Telephone: + 49 (2365) 49-4212. Fax: + 49 (2365) 49-5589. E-mail: [email protected]. † Department of Chemical Engineering (experSCience), Hu ¨ ls Infracor GmbH. ‡ Technische Universita ¨ t Hamburg-Harburg.

Driven by the fundamental principle of chemistry that all the information that determines the chemical, biological, and physical properties of any compound is coded within its structural formula (Katritzky et al., 1997), QSAR/QSPR methods or more general, quantitative structure-performance relationship methods were established. The basic concept of QSPR is to relate the structure of a compound expressed in terms of descriptors that characterize one particular molecular feature (e.g., the number of carbon atoms) to the property of interest (e.g., the boiling point). Correlations can be found by multilinear regression techniques such as principal component analysis (PCA; Glen et al., 1989), by nonlinear techniques such as genetic function approximation (GFA; Rogers and Hopfinger, 1993) or by ANN (Egolf and Jurs, 1993; Gakh et al., 1994). The beauty of QSPR methods lies in the chance to obtain explicit or implicit correlations that not only represent those compounds originally present in the training set but very often apply to sets of compounds that are only structurally related to the training set. This is due to the more generalized information encoded in the calculated descriptors and distinguishes QSPR methods from classical group contribution methods, which fail when a single group in the molecule is absent in the group table (Stanton et al., 1991). QSPR methods are nothing new and have long been used by major pharmaceutical companies to elucidate the effect of structure on particular biological properties (Hansch and Fujita, 1964). However, the application of QSPR methods to chemical and physical properties has been less developed (Katritzky et al., 1997). Among the physical properties correlated by QSPR are boiling points (e.g., Le and Weers, 1995), melting points (e.g., Tan and Rode, 1996), glass transition temperatures (e.g., Cypcar et al., 1996), water solubility and octanolwater coefficients (e.g., Patil, 1991 and 1994), and the solubility of organic compounds in carbon dioxide (En-

S0888-5885(97)00910-X CCC: $15.00 © 1998 American Chemical Society Published on Web 05/20/1998

3044 Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998

gelhardt and Jurs, 1997). Particularly active and successful in this field are Jurs and co-workers (Stanton et al., 1991 and 1992; Russell et al., 1992; Egolf and Jurs, 1993; Wessel and Jurs, 1994 and 1995, and Sutter and Jurs, 1996) and the cooperating research groups of Karelson and Katritzky (Murugan et al., 1994; Katritzky et al., 1995; Huibers et al., 1996; Katritzky et al., 1996; Karelson et al., 1996; and Katritzky et al., 1997). The goals of this study are 2-fold. First of all we intend to prove the applicability of QSPR and neural network models for correlating and predicting physical properties of pure components and mixtures relevant in process simulation. The examples selected are the normal boiling point of chlorosilanes, the cloud point of surfactants, and the interaction parameter necessary to correctly describe the vapor-liquid phase equilibrium of binary systems containing carbon dioxide and hydrocarbons with an equation of state. The second goal of this work is to establish a new methodology at our department as an integral part of a software package aiming at the computer-aided molecular design of chemicals with specified properties. This toolbox will eventually assist chemical engineers and chemists alike to develop new products rapidly and at lower costs for the benefit of our customers. Normal Boiling Point The normal boiling point of an organic compound is an important piece of information, both for chemistry and engineering. It is elementary for the identification of an unknown substance (Shriner et al., 1980) and for the estimation of physical properties such as critical temperature, molar volume, and enthalpy of vaporization using models that incorporate the normal boiling point as a parameter (Reid et al., 1987). Methods to predict the boiling point of organic compounds are numerous (e.g., Rechsteiner, 1982). Widely used in process simulation is the method of Joback (1984) and Joback and Reid (1987), which employs a group additivity scheme but assumes no interaction between groups. More recently, a new group contribution method was developed using first- and second-order group contributions that allows one to distinguish between isomers (Constantinou and Gani, 1994). Although group contribution methods have proven successful, they are limited in application due to the constraints with respect to the size and the diversity of functionalities (Egolf and Jurs, 1993). Also, it has been pointed out that there is no theoretical basis for a particular group definition (Wu and Sandler, 1991). As an alternative, QSPR models employ descriptors solely derived from the molecular structure. Katritzky et al. (1995) and Wessel and Jurs (1995) studied large sets of 298 diverse organic compounds and 296 hydrocarbons respectively to correlate the normal boiling point with molecular stucture and developed two- to sixparameter models with correlation coefficients (R) of 0.997 and 0.987, respectively. The compilation of boiling points was taken from the Design Institute for Physical Property Data (DIPPR) database. The QSPR models for more homogeneous classes of furans, tetrahydrofurans, thiophenes, pyrans, pyrroles, and pyridines were developed by Stanton et al. (1991, 1992) and Egolf and Jurs (1993). The data in these studies were taken from Beilstein Institute database. The goal was to predict the boiling points of similar compounds where experimental data are unavailable and to use the models for

quality control; that is, to detect errors in the Beilstein Institute database and to validate data before they are implemented in the database. The authors also used artificial neural networks as an alternative method to the multilinear regression technique. The neural networks proved to be capable of creating superior models and allowed for selecting the best descriptors from a set of descriptors that were important in the linear models. A set of boiling points for pyridines and piperidines were used by Murugan et al. (1994) to generate a sixparameter QSPR model that had the potential to speed up the screening process of new compounds. Cloud Points of Surfactants Nonionic surfactants are important constituents of detergent blends. They contain a polar hydrophilic group (head) and a nonpolar hydrophobic group (tail). The detergency is mainly determined by the properties of the surfactants in aqueous solution. One of the many relevant physical properties is the cloud point. Typically, when the temperature of a homogeneous nonionic surfactant-solvent mixture is increased, it becomes turbid in a narrow temperature range called the cloud point or the lower consolution temperature. Above the cloud point, the system splits into two isotropic phases (Degiorgio, 1985). The cloud point is a function of the concentration of the surfactant, the solvent, and other constituents, such as oil or electrolytes if present. It gives a rough estimate of the hydrophilic strength of the surfactants and can be related to the phase inversion temperature, which in turn was found to be close to the optimum cleaning temperature of a given multicomponent surfactant system (Raney et al., 1987). A recent study of Lindgren et al. (1995) showed that the cloud point corresponds directly to the temperature of maximum detergency. Although a few QSPR models have been reported in the literature for the critical micelle concentration of a surfactant-solvent mixture (e.g., Huibers et al., 1996) and for the washing efficiency expressed in characteristic properties of a test fabric (e.g., Richter et al., 1996), only one QSPR model for cloud points was found in the literature (Huibers et al., 1997). Cloud point data for 62 different structures were collected from literature. Most of the data were for pure surfactants of the alkyl ethoxylate class with a wide variety in the hydrophobic domain, neglecting the ethylene oxide size distribution. The QSPR model developed comprised three topological descriptors characterizing the hydrophobic region plus the number of ethylene oxide residues to correlate the cloud point data with a standard deviation(s) of (6.3 °C and a correlation coefficient (R) of 0.968. Interaction Parameter for Cubic Equation of States Cubic equations of state (EOS) are commonly used for the calculation of phase equilibrium in systems with supercritical components such as nitrogen or carbon dioxide. With the exception of fully predictive equations such as the PSRK EOS (Fischer, 1993), these EOS require at least one interaction parameter kij for each binary component pair present in a multicomponent mixture to correctly describe the phase behavior. This parameter accounts for the inadequacies of the underlying mixing and combining rules. It is regressed from experimental vapor-liquid equilibrium data and its

Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 3045

value is usually close to zero or unity depending on the definition of it. A comprehensive monograph on vaporliquid equilibrium for mixtures of low boiling substances with interaction parameter values given for four EOS was published by Knapp et al. (1982). Process simulators such as ASPEN PLUS offer a variety of cubic EOS for some of which published interaction parameters are stored in databanks. However, at present, for binary systems with carbon dioxide, only a very small number of kij is readily available in these databanks. Despite its empirical origin, the interaction parameter under certain conditions reflects differences in molecular properties and is therefore not an correlation artifact (Teja, 1978). Numerous empirical and semiempirical correlations have been published and are reviewed by Coutinho et al. (1994). More recent work has been done by the group of Tassios (Kordas et al., 1994, 1995; Avlonitis et al., 1994). However, correlations for binary systems with carbon dioxide are less plentiful. Tsonopoulos (1979) reported kij for the cross-virial coefficient Bij used in a corresponding state approach for binary carbon dioxide mixtures. A correlation is presented in terms of the number of carbon atoms for binary mixtures of carbon dioxide and polar hydrocarbons. Nishiumi et al. (1988) correlated the interaction parameter for the Peng-Robinson EOS in terms of the critical molar volumes and the acentric factors of both carbon dioxide and the hydrocarbon. Valderrama et al. (1988) applied five cubic EOS for the calculation of vapor-liquid equilibrium in carbon dioxide-hydrocarbon binary mixtures, with the regressed kij then correlated in terms of the reduced temperature and the acentric factor of only the hydrocarbon. Acentric factors, critical pressures, and critical temperatures of carbon dioxide and the hydrocarbon were used by Bartle et al. (1992) to correlate kij for the Peng-Robinson EOS. A generalized correlation for the interaction parameter of carbon dioxide-alkane binaries used with a volume-translated Peng-Robinson EOS is presented by Kordas et al. (1994). The reduced temperature of carbon dioxide and the alkane acentric factor were needed for the correlation. To our knowledge, no work has yet been published using molecular descriptors to correlate the interaction parameter kij of an EOS for binary systems with carbon dioxide. Method Calculations for the QSPR models were performed on an Alpha AXP 2100 workstation using the AMPAC 5.0 software of Semichem (Shawnee, KS) running under Open VMS 6.1 for drawing and geometrically optimizing the structures and for obtaining the semiempirical quantum-chemical descriptors. The output files of AMPAC were loaded together with the experimental data (e.g., boiling point data) into the PC version of the CODESSA 2.13 program (Semichem) running under Windows 3.11 for calculating up to 450 different constitutional, topological, geometrical, and electrostatic descriptors and for performing the correlation analysis to come up with the best QSPR models of a given number of descriptors. Among different regression strategies, those multiparameter linear regression methods termed Best Multilinear and Heuristic Method (Katritzky et al., 1996) were chosen after careful examination of its performance for our correlation problems. The output of CODESSA is a list of correlation equations, each characterized by its squared correlation

Figure 1. Schematic diagram of the architecture of the artificial neural network used in this study. Descriptor and variable numbers are identified as follows: (1) minimal nucleophilic reactivity index for a chlorine atom; (2) fractional partial negative surface area, type 2 (FNSA-2); (3) relative postive charge surface area, Zefirov’s PC (RPCGZef); (4) weighted partial positive area, type 1 (WPSA-1); (5) relative number of benzene rings (nBenz); (6) gravitation index (all bonds); (7) number of chlorine atoms (NCl); (8) number of carbon atoms (NC); (9) value of boiling point. The thickness of lines drawn between nodes indicates the relative importance of the connectivity expressed by the value of the weight associated with it (not all connections are shown in the diagram).

coefficient and its F-test that are determined automatically by CODESSA. After that, the cross-validated squared correlation coefficient for any selected correlation can be calculated. It is determined by consecutively removing a single physical property value (e.g., the boiling point of one compound) from the data set and by then building a new correlation model used to predict the property value of the compound removed before. The array of predicted data is linearly correlated with the array of experimental data resulting in the crossvalidated correlation coefficient, which is thus a measure of the stability of the regression model and should have a high value (Katritzky et al., 1995). Finally, six statistical outlier tests (Belsley et al., 1980) were employed to validate the data set. For the development of the QSPR model, a training set containing most of the experimental data is used. To test the predictive potential of the model, a much smaller set of experimental data is employed. Computations with ANN were performed on a Pentium PC 133 MHz with 32 MB RAM using the DATA ENGINE 2.0 software of MIT (Aachen, Germany) running under Windows 95. The in-house software TE (R. Dudda) was linked to the Data Engine for the purpose of data management and for scaling the values of input and output variables properly. A feed-forward, threelayer network with a modified back-propagation method as training algorithm was applied to our data sets. Figure 1 illustrates the architecture of the ANN. The input variables shown are the best eight descriptors taken from the multilinear regression analysis already outlined. The values of the input variables are fed to the nodes (neurons) of the first, input layer and are linearly scaled. The nodes of the second, hidden layer receive information from all nodes of the input layer and propagate it forward through the network (i.e., toward the output layer). Each node in this hidden layer uses a summation function for the incoming information, taking into account the different connection weights between two nodes and a transfer function of the sigmoid type to generate the output signal for the node in the output layer. In the node of the output layer, the same summation and transfer functions are applied to result in the output signal of the net that is the

3046 Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 Table 1. Experimental and Calculated Normal Boiling Points of Chlorosilanes for a Subgroup of the 50-Member Training Seta structure

Tobs [K]

Tcalc,MLR [K]

∆TMLR [K]

∆TMLR [%]

Tcalc,ANN [K]

∆TANN [K]

∆TANN [%]

1,1,1-trichloro-2,2,2-trifluorodisiloxane 1,2-bis(trichlorosilyl)ethane 1,2-dichloroethylethyltrichlorosilane 1,7-dichloro-1,1,3,3,5,5,7,7-octamethyltetrasiloxane 2-(3-cyclohexenyl)ethyltrichlorosilane 3,3,3-trifluorpropyldichloromethylsilane chloro(chloromethyl)dimethylsilane chlorodiethylsilane chloromethylsilane chloromethyltrichlorosilane chlorosilane chlorotriethylsilane dichloro(4-methylphenyl)silane dichloro(chloromethyl)methylsilane dichlorodi-2-propenylsilane dichloroethenylmethylsilane dichloroethoxymethylsilane dichloroethoxyphenylsilane dichloroethylphenylsilane dichlorofluoromethylsilane dichlorosilane diethyldichlorosilane dichlorodifluorosilane chlorodimethylphenylsilane hexachlorodisilane hexachlorodisiloxane dichloromethylsilane methylenebis(trichlorosilane) methyltrichlorosilane phenyldichlorosilane phenyltrichlorosilane tetrachlorosilane trichloro(2-chloropropyl)silane trichloro(3,3,3-trifluoropropyl)silane trichloro-2-propenylsilane trichloroethoxysilane trifluorochlorosilane trimethylchlorosilane

315.89 474.11 453.74 495.20 512.43 395.40 387.31 372.24 280.90 390.15b 242.80 419.46 469.58 391.55 438.52 366.62 373.21 495.40 503.43 302.65 281.45c 403.21 241.45 466.40 412.50 408.20 314.39 452.57 339.32 474.13 474.15 330.26 439.57 387.33 390.12 375.64 203.15 330.96

304.01 475.85 446.07 494.10 504.91 396.31 384.50 368.06 276.49 386.73 243.87 417.00 478.55 389.62 440.53 370.02 376.80 489.13 498.30 306.05 290.96 397.33 249.67 474.03 413.44 415.31 319.57 453.16 342.24 460.04 476.49 323.64 435.82 394.11 392.55 368.39 200.03 331.84

11.88 -1.74 7.67 1.10 7.52 -0.91 2.81 4.18 4.41 3.42 -1.07 2.46 -8.97 1.93 -2.01 -3.40 -3.59 6.27 5.13 -3.40 -9.51 5.88 -8.22 -7.63 -0.94 -7.11 -5.18 -0.59 -2.92 14.09 -2.34 6.62 3.75 -6.78 -2.43 7.25 3.12 -0.88

3.76 -0.37 1.69 0.22 1.47 -0.23 0.73 1.12 1.57 0.88 -0.44 0.59 -1.91 0.49 -0.46 -0.93 -0.96 1.27 1.02 -1.12 -3.38 1.46 -3.40 -1.64 -0.23 -1.74 -1.65 -0.13 -0.86 2.97 -0.49 2.00 0.85 -1.75 -0.62 1.93 1.54 -0.27

308.48 477.47 453.40 491.99 509.91 390.35 385.12 368.10 275.77 388.61 248.48 421.42 482.02 390.31 444.68 369.20 376.49 492.01 501.17 300.67 290.00 399.72 247.89 476.63 418.49 410.68 316.85 454.35 340.83 462.34 480.94 325.07 441.90 388.97 394.66 368.59 206.76 330.45

7.41 -3.36 0.35 3.21 2.52 5.05 2.19 4.14 5.13 1.54 -5.68 -1.96 -12.44 1.24 -6.16 -2.02 -3.28 3.39 2.26 1.98 -8.55 3.49 -6.44 -10.23 -5.99 -2.48 -2.46 -1.78 -1.51 11.79 -6.79 5.19 -2.33 -1.64 -4.54 7.05 -3.61 0.51

2.40 -0.70 0.08 0.65 0.49 1.29 0.57 1.13 1.86 0.40 -2.29 -0.47 -2.58 0.32 -1.39 -0.55 -0.87 0.69 0.45 0.66 -2.95 0.87 -2.60 -2.15 -1.43 -0.60 -0.78 -0.39 -0.44 2.55 -1.41 1.60 -0.53 -0.42 -1.15 1.91 -1.74 0.15

a

If not otherwise stated, all data are from the Dechema Detherm Databank, 1997. b Aldrich catalog, 1994/95. c Yaws et al., 1981.

desired physical property for the chosen set of input descriptors. To train the network (i.e., to decrease the differences between calculated and experimental physical property values for the training set of compounds), the connection weights of all nodes are optimized on the basis of the steepest descent method by propagating an error signal back through the net. A test set of compounds was used to observe the learning process of the net during the training phase. To avoid overtraining, which limits the predictive capability of the network, the optimal network parameters are taken at that point of time at which the average error for the test set is at its minimum. Results and Discussion Normal Boiling Points of Chlorosilanes. We started the selection process with normal boiling point data for 74 compounds collected from the Dechema Detherm Databank and from our in-house databank (Table 1). Obvious outliers of a preliminary correlation identified by statistical tests were eliminated after careful examination of the experimental data. Typing mistakes for the name of a compound leading to an erroneous structure, multiple data sets from different data sources with unspecified accuracy for a single compound, and boiling point data measured at reduced pressure instead of atmospheric pressure were the main reasons for the rejection of a total of 15 compounds. The

remaining normal boiling point data of proven reliability were randomly divided into a training set of 50 and a test set of nine compounds. Up to 600 different descriptors were then calculated for each compound as already described. All regression methods available with the CODESSA software were applied in a first run. The BMLR and the Heuristic Method resulted in the highest correlation and crossvalidated correlation coefficients for a specified number of descriptors. In Figure 2 the calculated boiling points denoted as solid circles are plotted versus the experimental boiling points of the 50-member training set, employing a correlation with eight descriptors. A perfect fit (i.e., a squared correlation coefficient of 1.0) would require all data to be located exactly on the diagonal. Here, the data are evenly distributed along the diagonal; that is, the representation of experimental data is equally good for low and high values of the normal boiling point. When the absolute difference between the experimental and calculated data is visualized (Figure 3) together with the 3 and 5% boundaries, it is evident that almost all calculated boiling points are within a (3% relative deviation from the experimental data. To find the highest quality correlation with the lowest possible number of descriptors, the mean square deviation (s2) and squared correlation coefficient (R2) were plotted versus the number of descriptors used in the

Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 3047

Figure 2. Calculated versus experimental normal boiling points of chlorosilanes for the 50-member training set and the ninemember test set. Solid symbols denote results from multilinear regression; open symbols denote results from artificial neural network. The descriptors used are given in Table 3.

Figure 3. Absolute deviation between experimental and calculated versus experimental normal boiling points of chlorosilanes for the 50-member training set and the nine-member test set. Solid symbols denote results from multilinear regression; open symbols denote results from artificial neural network. Lines show the 3% and 5% corridor of absolute deviation between experimental and calculated values. The descriptors used are given in Table 3.

correlation (Figure 4). Both curves show a significant change of slope between seven and eight descriptors. With more than eight descriptors, the correlation improves only little with each additional descriptor compared with the improvement below eight descriptors. Therefore, the final correlation equation consists of eight descriptors. The squared correlation coefficient is 0.9956 with a mean square deviation of 37.05 K2. All experimental and calculated normal boiling points together with absolute and relative deviation for a subgroup of the 50-member training set and 9-member test set are presented in Tables 1 and 2, respectively. The median error for the normal boiling point of all 50 chlorosilanes in the training set is (4.5 K. Table 3 gives the results of the best eight descriptors. The best correlation was that with the lowest mean square deviation, the highest squared correlation coefficient and the highest F-test value. The challenge for the new correlation arises when it is applied to the nine-member prediction set. In Figure 2 the calculated boiling points are plotted and in Figure

Figure 4. Mean square deviation (variance) s2 and squared correlation coefficient R2 versus the number of descriptors used in the correlation. Note the change of the slope of both curves between seven and eight descriptors.

3 the absolute difference in temperature (both denoted as solid triangles) is plotted versus the experimental data. All values are given in Table 2. The median error for the prediction of the normal boiling point of these nine chlorosilanes is (3.2 K. For the ANN, the descriptors used as input variables and the training and prediction set were the same as those obtained from the multilinear regression. The architecture of the net was varied with respect to the number of nodes in the hidden layer. A three-layer network with four nodes in the hidden layer proved to be sufficient to give excellent results both for training and prediction (Figure 1). The optimal point in training was reached after 5000 epoches (training cycles). The results are plotted in Figure 2 and Figure 3 (open circles and triangles). The median error for the training and prediction sets of normal boiling points are (4.1 and (4.6 K, respectively. Contrary to the results of the multiple linear regression (MLR) analysis, the descriptor expressing the minimum nucleophilic reactivity index for a chlorine atom (no. 1 in Figure 1) was the least important in this nonlinear model. On the other hand, the number of carbon atoms and the number of chlorine atoms proved to be equally important in both models. The results show that the ANN using the same descriptors as input variables will not give a better performance in terms of squared correlation coefficient and mean square deviation than the MLR analysis for this problem. However, it should be noted that this is not necessarily true for the case of a different set of descriptors as input variables. As expected for the class of chlorosilanes, the correlation comprises the number of chlorine atoms. The absence of the descriptor number of silicon atoms can be explained by the fact that most of the compounds in the training set contain only one silicon atom. Instead, we find the number of carbon atoms and the relative number of benzene rings as additional constitutional descriptor expressing the diversity of carbon and benzene in substituted groups. These two descriptors and the gravitation index calculated for all bonds describe the molecular structure in general. The correlation also includes three charge partial surface area (CPSA) descriptors introduced by Stanton and Jurs (1990), which describe molecular interaction and hydrogen bonding. Finally, the minimum nucleophilic reactivity index for a chlorine atom as a quantum-chemical

3048 Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 Table 2. Experimental and Predicted Normal Boiling Points of Chlorosilanes for the Nine-Member Test Seta structure

TObsa [K]

TCalc,MLR [K]

∆TMLR [K]

∆TMLR [%]

TCalc,ANN [K]

∆TANN [K]

∆TANN [%]

1,5-dichloro-1,1,3,3,5,5-hexamethyltrisiloxane 3-chloropropyltrimethylsilane dichloroethenylethylsilane dimethyldichlorosilane ethyltrichlorosilane trichloro(1-methylethyl)silane trichlorosilane dichloromethylphenylsilane chloromethyltrimethylsilane

457.04 424.15b 396.84 343.33 369.33 392.43 305.06 478.64 370.15b

458.70 428.66 391.68 344.75 370.82 394.07 315.60 478.30 360.95

-1.66 -4.51 5.16 -1.42 -1.49 -1.64 -10.54 0.34 9.20

-0.36 -1.06 1.30 -0.41 -0.40 -0.42 -3.46 0.07 2.48

456.32 432.58 393.30 343.03 372.04 397.21 314.61 481.20 361.54

0.72 -8.43 3.54 0.31 -2.71 -4.78 -9.55 -2.56 8.61

0.16 -1.99 0.89 0.09 -0.73 -1.22 -3.13 -0.53 2.33

a

If not otherwise stated all data are from the Dechema Detherm Databank, 1997. b Aldrich catalog 1994/95.

Table 3. Results of the Best Eight-Parameter Multilinear Regression for Normal Boiling Points of Chlorosilanes Based on the 50-Member Training Seta descriptor

no.

X

DX

t-test

R2

F

s2

intercept number of C atoms number of Cl atoms RPCG [Zefirov’s PC] gravitation index (all bonds) min. nucl. react. index for a Cl atom relative number of benzene rings WPSA-1 [Semi-MO PC] FNSA-2 [Semi-MO PC]

0 1 2 3 4 5 6 7 8

3.33E+02 2.13E+01 3.95E+01 -9.87E+01 -7.94E-02 -4.65E+02 5.63E+02 3.60E-01 -1.42E+01

1.09E+01 9.16E-01 1.61E+00 1.09E+01 9.75E-03 1.08E+02 8.50E+01 6.30E-02 4.67E+00

30.6744 23.2486 24.5612 -9.0484 -8.1443 -4.2884 6.6182 5.7157 -3.0447

0.6212 0.9702 0.9751 0.9875 0.9903 0.9920 0.9946 0.9956

78.72 765.83 599.34 887.17 897.37 889.35 1102.85 1156.17

2716.24 218.04 186.66 95.78 75.97 63.99 44.35 37.05

a X, regression coefficient; DX, error of the regression coefficient; t-test, value for the single regression term; R2, squared correlation coefficient for all regression terms just mentioned; F, value for the F-test for all regression terms just mentioned; s2, mean square deviation (variance) for all regression terms just mentioned.

descriptor completes the list of descriptors used in the correlation. Our QSPR model compares well with correlations published by other research groups (e.g., Murugan et al., 1994; Egolf and Jurs, 1993) in terms of the number of descriptors used and the accuracy obtained. It is noteworthy that descriptors of the same classes, namely constitutional and CPSA descriptors were used in most correlations. However, this is not true for the class of topological descriptors that appear in many correlations but were absent in our model. These indices encode size and shape of the molecules. In our case, these characteristics are presumably incorporated in the CPSA and constitutional descriptors. Cloud Points of Surfactants. Cloud point data of 23 surfactant-solvent systems were collected from our in-house surfactant databank (Dr. Martin Stolz, Business Unit Surfactants, R&D). All surfactants belong to the class of alkyl ethoxylates with an unspecified ethylene oxide chain length distribution. The concentration of the surfactant in an aqueous butyldiglycol solution was 16.7 wt %. The average experimental error was (3 K. The MLR and ANN analysis performed were the same as described for the normal boiling points. The results for the MLR analysis are shown in Figure 5 and in Table 4. An excellent correlation was established with four descriptors of which the squared correlation coefficient is 0.9948, with a mean square deviation of 2.75 K2. The median error for the cloud point of the 20 surfactants is (1.2 K in the training set and (0.7 K in the threemember prediction set. The correlation combines two topological [average information content (order 2), Kier shape index (order 3)] and two constitutional descriptors (relative molecular weight, relative number of rings). It is known that the cloud point is a function of the number of ethylene oxide groups (DIN 53917, ISO 1061). This dependence is reflected in the relative molecular weight, which in-

Figure 5. Calculated versus experimental cloud points of nonionic surfactants for the 20-member training set (solid symbols) and for the three-member test set (open symbols). The descriptors used are given in Table 4.

creases, and in the relative number of rings, which decreases with the number of ethylene oxide groups. A similar argument justifies the need for the topological descriptors representing shape and size of a molecule in our correlation. A comparison of our results with published QSPR models (Huibers et al., 1997) shows a similar quality for the correlation. However, it should be stated that contrary to Huibers et al. (1997), a broad distribution of ethylene oxide chain length existed for all surfactantsolvent systems in our study. Therefore, an average chain length had to be assumed prior to the calculation of descriptors. Interaction Parameter for Cubic Equation of States. Values for the optimized interaction parameter kijOpt for a modified Peng-Robinson equation of state (t-mPR EOS) of 22 binary CO2-hydrocarbon systems

Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 3049 Table 4. Results of the Best Four-Parameter Multilinear Regression for Cloud Points of Nonionic Surfactants Based on the 20-Member Training Seta descriptor

no.

X

DX

t-test

R2

F

s2

intercept relative molecular weight average information content (order 2) relative number of rings Kier shape index (order 3)

0 1 2 3 4

-7.03E+02 1.77E+02 -4.53E+01 -5.53E+03 -2.81E+00

1.91E+01 4.33E+00 2.56E+00 1.68E+02 1.34E-01

-36.7455 40.9502 -17.7164 -32.8383 -21.0049

0.2983 0.6445 0.8516 0.9948

8.08 16.32 32.51 765.99

313.55 167.67 74.14 2.75

a X, regression coefficient; DX, error of the regression coefficient; t-test, value for the single regression term; R2, squared correlation coefficient for all regression terms just mentioned; F, value for the F-test for all regression terms just mentioned; s2, mean square deviation (variance) for all regression terms just mentioned.

Table 5. Optimized and Calculated Interaction Parameters of Binary Carbon Dioxide-Hydrocarbon Mixtures for the 21-Member Training Set and Optimized and Calculated Interaction Parameter for the Single-Member Test Set structure

Figure 6. Calculated versus optimized interaction parameters of binary carbon dioxide-hydrocarbon mixtures for the 21-member training set (solid symbols) and the single-member test set (open symbol). The descriptors used are given in Table 6.

were taken from a supplement of a publication by Kordas et al. (1994) kindly provided to us by the authors. They used a minimization function for the bubble point to evaluate the interaction parameters for each isotherm separately. The percentage average relative error for the bubble point varied between 1.1 and 6.6% for the CO2-non-n-alkane systems. For our purpose of developing a QSPR model, it was necessary to fix the temperature for all 22 systems investigated at a unique value. A temperature of 311 K (above the critical temperature of CO2) was chosen, where optimized interaction parameters for all systems were available. The total number of descriptors in the final correlation was limited to five because of the relatively small number of 21 kij values in our training set. Figure 6, Table 5, and Table 6 give the results of the MLR analysis. The squared correlation coefficient is 0.9224. The average percentage relative deviation between optimized and calculated values is (8.2% (notice the high deviation for 1-hexene of 65.8%, indicating that 1-hexene is an outlier in this correlation; a much lower deviation of 5.3% can be obtained without 1-hexene). The correlation terms include a constitutional descriptor (number of rings) to account for the molecular composition, a geometrical descriptor (YZ shadow) to reflect the size of the molecule, and three quantumchemical descriptors (maximum total interaction for a C-H bond, minimum nuclear-nuclear repulsion for a C-C bond, minimum nucleophilic reactivity index for a C atom) representing the polar interactions between molecules and their chemical reactivity. To test the quality of the model, we predicted the interaction parameter for the binary system CO2-

training set 1-butene 1-heptene 1-hexene 1-methylnaphthaline benzene cyclohexane cyclopentane ethylbenzene ethylcyclohexane isopentane methylcyclohexane neopentane toluene transdecalin i-propylbenzene n-heptylbenzene n-hexylbenzene n-octylbenzene n-propylbenzene o-xylene p-xylene test set isobutane a

kijOpt a kijCalc,MLR

∆kijMLR

∆kijMLR [%]

0.0833 0.0847 0.0234 0.1182 0.0816 0.1443 0.1143 0.0981 0.1299 0.1141 0.1156 0.1017 0.1015 0.1635 0.0905 0.0897 0.0889 0.0931 0.0963 0.1001 0.0933

0.0726 0.0779 0.0388 0.1290 0.0882 0.1409 0.1172 0.0953 0.1167 0.1247 0.1152 0.1043 0.1036 0.1639 0.0891 0.0903 0.0922 0.0912 0.0850 0.0902 0.0995

0.0107 0.0068 -0.0154 -0.0108 -0.0066 0.0034 -0.0029 0.0028 0.0132 -0.0106 0.0004 -0.0026 -0.0021 -0.0004 0.0014 -0.0006 -0.0033 0.0019 0.0113 0.0099 -0.0062

12.85 8.03 65.81 9.14 8.09 2.36 2.54 2.85 10.16 9.29 0.35 2.56 2.07 0.24 1.55 0.67 3.71 2.04 11.73 9.89 6.65

0.1213

0.1223

-0.001

0.82

Taken from a supplemental table of Kordas et al., 1994.

Figure 7. Pxy-diagram for the binary system CO2-isobutane at 311 K. The solid symbols denote experimental data (Besserer and Robinson, 1973). Phase equilibrium calculations were performed using the Peng-Robinson equation of state with an interaction parameter kij ) 0.1223 (solid line) predicted with the correlation from Table 6 and kij ) 0.0 (broken line).

isobutane at 311 K and calculated the Pxy-diagram at this temperature. In Figure 7 the experimental and calculated vapor-liquid equilibrium data are shown for the predicted value kijcalc ) 0.1223 and for kij ) 0.0,

3050 Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 Table 6. Results of the Best Five-Parameter Multilinear Regression for the Interaction Parameter of Binary Carbon Dioxide-Hydrocarbon Mixtures Based on the 21-Member Training Seta descriptor

no.

X

DX

t-test

R2

F

intercept number of rings max total interaction for a CsH bond YZ shadow min n-n repulsion for a CsC bond min nucleoph. react. for a C atom

0 1 2 3 4 5

3.80E+00 2.30E-02 -1.49E-01 -6.90E-03 -1.43E-02 5.02E+00

5.11E-01 4.18E-03 1.81E-02 1.37E-03 2.95E-03 1.35E+00

7.4455 5.5119 -8.2324 -5.0219 -4.8323 3.7205

0.3225 0.7717 0.7957 0.8507 0.9224

9.04 30.42 22.07 22.79 35.64

a X, regression coefficient; DX, error of the regression coefficient; t-test, value for the single regression term; R2, squared correlation coefficient for all regression terms just mentioned; F, value for the F-test for all regression terms just mentioned.

which is the value of choice when no experimental VLE data are at hand to fit the parameter. The predicted interaction parameter allows an accurate description of the experimental data for the complete concentration range. Using kij ) 0.0 on the other hand would result in a significant error for the prediction. Conclusions Our study shows that physical properties of pure compounds and parameters for thermodynamic models can be correlated and predicted with a high degree of confidence using QSPR models. The average percentage relative deviation between measured and calculated property for the training sets is 1.2% for the normal boiling points of chlorosilanes, 2.4% for the cloud points of surfactants, and 8.2% for the interaction parameter of binary CO2-hydrocarbon systems (the error made in describing the vapor-liquid equilibrium is not necessarily in the same order of magnitude as for the kij). For the prediction, the deviations were 0.7% for the boiling points, 0.8% for the cloud points, and 0.8% for the interaction parameter. The QSPR models are based exclusively on structural information encoded in constitutional, topological, geometrical, and quantumchemical descriptors. Therefore, the property prediction for compounds not yet synthesized but similar in structure to those in the training set is now readily available. The different techniques applied for the correlation analysis, the MLR, and the ANN technique show a comparable performance for the properties investigated when the same descriptors were permitted in each of the final models. However, as pointed out by Egolf and Jurs (1993), the neural network approach can be used for pinpointing the best descriptors from a larger pool of descriptors than that used in the final linear regression model. Also, because of the higher flexibility, a better performance can then be expected for the neural network. Evidently the same types of descriptors are included in QSPR models from different research groups developed for the same property, but the selection of descriptors for the final models is far from unique. In fact, our MLR analysis resulted in several correlations of comparable accuracy but with different descriptors employed. We have successfully applied the QSPR approach to many other physical properties of interest to our customers. This novel tool enables us to do a rapid, inexpensive data analysis and screening of numerous compounds with specified properties when no experimental data are available. We believe that a large QSPR model base can at least partially substitute group

contribution estimation techniques in future software packages for the computer-aided molecular design of chemicals. Acknowledgment B. B. is grateful to the Hu¨ls AG for financial support of his Diplomarbeit at the Department of Chemical Engineering, Hu¨ls AG at Marl. The authors thank Reinhard Dudda for providing a user-friendly interface to the neural network models used in this work and Dr. Martin Stolz for the access to the cloud point data measured in his laboratory. The assistance of Dr. Andy Holder (Semichem, Shawnee, KS) to overcome initial problems with the QSPR software is gratefully acknowledged. Literature Cited Avlonitis, G.; Mourikas, G.; Stamataki, S.; Tassios, D. A generalized correlation for the interaction coefficients of nitrogenhydrocarbon binary mixtures. Fluid Phase Equilib. 1994, 101, 53. Bartle, K. D.; Clifford, A. A.; Shilstone, G. F. Estimation of solubilities in supercritical carbon dioxide: A correlation for the Peng-Robinson interaction parameters. J. Supercrit. Fluids 1992, 5, 220. Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics; John Wiley & Sons: New York, 1980. Besserer, G. J.; Robinson, D. B. Equilibrium-phase properties of isobutane-carbon dioxide systems. J. Chem. Eng. Data 1973, 18, 298. Constantinou, L.; Gani, R. New group contribution method for estimating properties of pure compounds. AIChE J. 1994, 40 (10), 1697. Coutinho, A. P.; Kontogeorgis, G. M.; Stenby, E. H. Binary interaction parameters for nonpolar systems with cubic equations of state: A theoretical approach 1. CO2/hydrocarbons using SRK equation of state. Fluid Phase Equilib. 1994, 102, 31. Cypcar, C. C.; Camelio, P.; Lazzeri, V.; Waegell, B. QSPR model for predicting the glass transition temperature of aliphatic substituted acrylate and methacrylate polymers. Polym. Preprints 1996, 37 (25), 364. Cypcar, C. C.; Camelio, P.; Lazzeri, V.; Mathias, L. J.; Waegell, B. Prediction of the glass transistion temperature of multicyclic and bulky substituted acrylate and methacrylate polymers using the energy, volume, mass (EVM) QSPR model. Macromolecules 1996, 29 (13), 8954. Degiorgio, V. Nonionic micelles. Proc. Int. Sch. Phys. “Enrico Fermi” 1985, 90, 303. Dohrn, R.; Treckmann, R.; Olf, G. A Centralized ThermophysicalProperty Service in the Chemical Industry. Conference Distillation and Absorption ‘97, Maastricht, The Netherlands, April 1997; Vol. 1, p 115. Engelhardt, H. L.; Jurs, P. C. Prediction of supercritical carbon dioxide solubility of organic compounds from molecular structure. J. Chem. Inf. Comput. Sci. 1997, 37, 478. Fischer, K. Die PSRK-Methodde: Eine Zustandsgleichung unter Verwendung des UNIFAC-Gruppenbeitragsmodells. Fortschrittsberichte VDI, Reihe 3: Verfahrenstechnik; VDI-Verlag: Du¨sseldorf, 1993.

Ind. Eng. Chem. Res., Vol. 37, No. 8, 1998 3051 Gakh, A. A.; Gakh, E. G.; Sumpter, B. G.; Noid, D. W. Neural network-graph theory approach to the prediction of the physical properties of organic compounds. J. Chem. Inf. Comput. Sci. 1994, 34, 832. Glen, W. D.; Dunn, W. J.; Scott, R. D. Principal component analysis and partial least squares regression. Tetrahedron Comput. Methodol. 1989, 2, 349. Hansch, C.; Fujita, T. F-σ-π. A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 1964, 86, 1616. Huibers, P. D. T.; Lobanov, V. S.; Katritzky, A. R.; Shah, D. O.; Karelson, M. Prediction of critical micelle concentration using a quatitative structure-property relationship approach. 1. Nonionic surfactants. Langmuir 1996, 12, 1462. Huibers, P. D. T.; Shah, D.; Katritzky, A. R. Predicting Surfactant Cloud Point from Molecular Structure. J. Colloid Interface Sci. 1997, 193, 132. Joback, K. G. M. S. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1984. Joback, K. G.; Reid, R. C. Estimation of pure-component properties from group-contributions. Chem. Eng. Commun. 1987, 57, 233. Karelson, M.; Lobanov, V. S. Quantum-chemical descriptors in QSPR/QSAR studies. Chem. Rev. 1996, 96, 1027. Katritzky, A. R.; Lobanov, V. S.; Karelson, M. QSPR: The correlation and quantitative prediction of chemical and physical properties from structure. Chem. Soc. Rev. 1995, 279. Katritzky, A. R.; Mu, L.; Lobanov, V. S.; Karelson, M. Correlation of boiling points with molecular structure. 1. A training set of 298 diverse organics and a test set of 9 simple inorganics. J. Phys. Chem. 1996, 100, 10400. Katritzky, A. R.; Mu, L.; Karelson, M. A QSPR Study of the solubility of gases and vapors in water. J. Chem. Inf. Comput. Sci. 1996, 36, 1162. Katritzky, A. R.; Rachwal, P.; Law, K. W.; Karelson, M.; Lobanov, V. S. Prediction of polymer glass transition temperatures using a general quantitative structure-property relationship treatment. J. Chem. Inf. Comput. Sci. 1996, 36, 879. Katritzky, A. R.; Karelson, M.; Lobanov, V. S. QSPR as a means of predicting and understanding chemical and physical properties in terms of structure. Pure Appl. Chem. 1997, 69 (2), 245. Knapp, H.; Do¨ring, R.; Oellrich, L.; Plo¨cker, U.; Prausnitz, J. M. Vapor-Liquid Equilibria for Mixtures of Low Boiling Substances, Dechema Chemistry Data Series, Vol. VI.; Dechema: Frankfurt, 1982. Kordas, A.; Tsoutsouras, K.; Stamataki, S.; Tassios, D. A generalized correlation for the interaction coefficients of CO2-hydrocarbon binary mixtures. Fluid Phase Equilib. 1994, 93, 141. Kordas, A.; Tsoutsouras, K.; Stamataki, S.; Tassios, D. Methanehydrocarbon interaction parameters correlation for the PengRobinson and the t-mPR equation of state. Fluid Phase Equilib. 1995, 112, 33. Le, T. D.; Weers, J. G. QSPR and GCA models for predicting the normal boiling points of fluorocarbons. J. Phys. Chem. 1995, 99 (14), 6739. Murugan, R.; Grendze, M. P.; Toomey, J. E., Jr.; Katritzky, A. R.; Karelson, M.; Lobanov, V. S.; Rachwal, P. Predicting physical properties from molecular structure. CHEMTECH 1994, June, 17. Nishiumi, H.; Arai, T. Generalization of the binary interaction parameter of the Peng-Robinson equation of state by component family. Fluid Phase Equilib. 1988, 42, 43. Patil, G. S. Correlation of aqueous solubility and octanol-water partition coefficient based of molecular structure. Chemosphere 1991, 22 (49), 723. Patil, G. S. Prediction of aqueous solubility and octanol-waster partition coefficient for pesticides based on their molecular structure. J. Hazard. Mater. 1994, 19 (49), 35. Prausnitz, J. M. University of California at Berkeley, personal communication, 1996.

Raney, K. H.; Benton, W. J.; Miller, C. A. Optimum detergency conditions with nonionic surfactants. I. Ternary water-surfactant-hydrocarbon systems. J. Colloid Interface Sci. 1987, 117 (1), 282. Rechsteiner, C. E., Jr. In Handbook of Chemical Property Estimation Methods; Lyman, W. J., Reehl, W. F., Rosenblatt, D. H., Eds.; McGraw-Hill: New York, 1982; Chapter 12. Reid, R. C.; Prausnitz, J. M.; Poling, B. E. The Properties of Gases and Liquids, 4th ed.; McGraw-Hill: New York, 1987. Richter, R.-R.; Bluhm, T.; Kru¨ssmann, H. Struktur-Aktivita¨tsbeziehungen zwischen Tensidstruktur und Waschleistung. Tenside Surf. Det. 1996, 33 (1), 42. Rogers, D.; Hopfinger, A. J. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 1994, 34, 854. Russell, C. J.; Dixon, S. L.; Jurs, P. C. Computer-assisted study of the relationship between molecular structure and Henry’s Law constant. Anal. Chem. 1992, 64 (31), 1350. Shriner, R. L.; Curtin, D. Y.; Fuson, R. C.; Morill, T. C. The Systematic Identification of Organic Compounds, 6th ed.; John Wiley & Sons: New York, 1980. Stanton, D. T.; Jurs, P. C. Development and use of charge partial surface area structural descriptors in computer-assisted quantitative structure-property relationship studies. Anal. Chem. 1990, 62, 2323. Stanton, D. T.; Jurs, P. C. Computer-assisted prediction of normal boiling points of furans, tetrahydrofurans, and thiophenes. J. Chem. Inf. Comput. Sci. 1991, 31, 301. Stanton, D. T.; Egolf, L. M.; Jurs, P. C. Computer-assisted prediction of normal boiling points of pyrans and pyrroles. J. Chem. Inf. Comput. Sci. 1992, 32, 306. Sutter, J. M.; Jurs, P. C. Prediction of aqueous solubility of a diverse set of heteroatom-containing organic compounds using a quantitative structure-property relationship. J. Chem. Inf. Comput. Sci. 1996, 36, 100. Tan, T. T. M.; Rode, B. M. The melting points of oligomethylenes. J. Polym. Sci., Part B: Polym. Phys. 1996, 34 (17), 2139. Teja, A. S., Binary interaction coefficients for mixtures containing the n-alkanes. Chem. Eng. Sci. 1978, 33, 609. Tsonopoulos, C. Second virial cross-coefficients: Correlation and prediction of kij. In ACS Advances in Chemistry Series. Equations of State in Engineering and Research; Chao, K. C., Robinson, R. L., Jr., Eds.; American Chemical Society: Washington, DC, 1979; Vol. 182, p 143. Valderrama, J. O.; Obaid-Ur-Rehman, S.; Cisteranas, L. A. Generalized interaction parameters in cubic equations of state for CO2-n-alkane mixtures. Fluid Phase Equilib. 1988, 40, 217. Wessel, M. D.; Jurs, P. C. Prediction of reduced ion mobility constants from structural information using multiple linear regression analysis and computational neural networks. Anal. Chem. 1994, 66, 2480. Wessel, M. D.; Jurs, P. C. Prediction of normal points of hydrocarbons from molecular structure. J. Chem. Inf. Comput. Sci. 1995, 35, 68. Wu, S. E.; Sandler, S. I. Use of ab initio quantum mechanics calculations in group contribution methods: 1. Theory and the basis for group identifications. Ind. Eng. Chem. Res. 1991, 30, 881. Yaws, C. L.; Li, K.-Y.; Hopper, J. R.; Fang, C. S.; Hansen, K. C. Process Feasibility Study in Support of Silicon Material Task I. Final report, Distribution Category UC-63; Lamar University: Beaumont, TX, 1981.

Received for review December 15, 1997 Revised manuscript received February 10, 1998 Accepted February 11, 1998 IE970910Y