
Ind. Eng. Chem. Res. 2005, 44, 7262-7269

Automatic Creation of Missing Groups through Connectivity Index for Pure-Component Property Prediction

Rafiqul Gani*
CAPEC, Department of Chemical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark

Peter M. Harper and Martin Hostrup
CapSolva - Integrated Process Solutions ApS, Kronprinsessegade 46E, Copenhagen DK-1306 K, Denmark

A common frustration when using property models in general, and group contribution models in particular, is that the selected model may lack some of the needed parameters, such as the groups and/or group contributions required to represent the molecular structure of the compound whose properties are to be estimated. Moreover, even when the groups are available, for some chemicals the set of groups may not provide an acceptable level of prediction accuracy. One way to address these limitations of the group contribution approach is to add new groups. Adding new groups, however, normally requires experimental data so that the new groups can be defined and their contributions estimated; this takes time and resources and is therefore not convenient for the model user. In this paper, a group contribution+ (GC+) approach for pure-component properties is presented, in which missing groups are created and their contributions predicted through a set of zero-order and first-order connectivity indices.

Introduction

In group contribution (GC) methods, molecular structures are considered as ensembles of relatively small molecular fragments or groups (for example, CH3, CH2, ..., OH, etc.). These groups serve as building blocks to describe molecular structures. A typical GC method treats a property of a given compound as an additive function of parameters related to the groups describing the molecular structure. These parameters are then regarded as the “contributions” of each molecular fragment or group to the compound property. In its simplest form, this GC-based model can be written as

$$f(Y) = \sum_i N_i C_i \qquad (1)$$

where f(Y) is a function of the property Y to be estimated, Ni is the number of times group Gi appears in the molecule, and Ci is the contribution of group Gi to the property function f(Y). Selecting a convenient set of groups G = {Gi} (i = 1, ..., n) is a key factor if a group contribution method is to provide acceptable estimates of property Y for a wide range of chemical compounds. Another key factor is the determination of the contributions Ci of the groups Gi to the property Y. The contribution values are usually obtained by regression over a data set of chemical compounds and their experimentally measured values of property Y. If at least one part of the molecular structure of a given compound is not described by any of the available groups, the GC method cannot be applied to estimate property Y for that compound. For example, with the GC methods reported by Constantinou and Gani¹ or Marrero and Gani²,³, the chemical compound shown in Figure 1 cannot be fully described and, consequently, its properties cannot be estimated.

Figure 1. Incomplete molecular structure representation: the circled group does not exist in the group tables of the Marrero and Gani method.²

To overcome this limitation, one possible solution is to include a new group Gn+1, namely P=S, in the group set G and to determine its contribution Cn+1 by regression, provided that sufficient new experimental data are available. However, this is not a practical approach when a quick response is needed, because regression can be a lengthy process, especially when enough experimental values of property Y are needed to estimate the contribution of the new group with adequate statistical significance. If no experimental value of Y is available, which may well be the usual case, another option is to predict the new contribution Cn+1. In this paper, a methodology is proposed to create new groups and predict their contributions to specific pure-component properties by using valence connectivity indices (vχ) as described by Kier and Hall.⁴ That is, the current set of experimental pure-component property data is used to define a set of connectivity indices (CI) and to regress their contributions to the corresponding pure-component properties. This leads to a group contribution+ (GC+) method with a wider application range than before, because new groups can be created from the regressed contributions of the connectivity indices. By introducing rules for joining connectivity indices to form any missing group, it is possible to create the missing group and/or estimate the missing pure-component property contributions. The paper describes the new methodology, together with an analysis of the results from the CI-based method (properties estimated only with the contributions from CI) and from the combined GC-CI approach (properties estimated with available groups and created groups, the GC+ method).

* Corresponding author. Tel.: (45) 45252882. Fax: (45) 45932906. E-mail: [email protected].

10.1021/ie0501881 CCC: $30.25 © 2005 American Chemical Society. Published on Web 08/06/2005

Theoretical Background

The connectivity indices (vχ indices) are formalisms defined through graph-theoretical concepts and are intended to describe topological characteristics of molecular structures.⁴⁻⁶ The graph-theoretical treatment of a molecular structure starts with the construction of its hydrogen-suppressed graph. For example, Figure 2 illustrates the difference between the molecular structure of acetic acid and its hydrogen-suppressed graph. The left-hand side of Figure 2 shows the representation of acetic acid with two groups, while the right-hand side shows the corresponding molecule and group representation in terms of the hydrogen-suppressed graph.

Figure 2. Representation of molecular structure and hydrogen-suppressed graph for acetic acid.

As shown on the right-hand side of Figure 2, the non-hydrogen atoms become vertices 1, 2, 3, and 4 of the graph, while the bonds become edges a, b, and c. The omission of hydrogen atoms and of double-bond information from the graph is compensated by the way in which the atomic index δv of each vertex is defined. Table 1 lists the values of the atomic index for various atoms and vertices,⁴ where nH is the number of attached hydrogen atoms. The atomic index thus carries information not only about the nature of the atom associated with the vertex but also about the way it is bonded to its surrounding atoms. The bond index βk is defined from the atomic indices δv of the pair of bonded atoms:

$$\beta_k = \delta^v_i \, \delta^v_j \qquad (2)$$

where i and j are the atoms involved in bond k.

Table 1. Values of the Atomic Index δv for Each Atom/Vertex (nH Is the Number of Connected Hydrogen Atoms)

| atom | acyclic | cyclic | special |
| C | 4 - nH | 14 - nH | |
| Si | 4 - nH | 14 - nH | |
| N | 5 - nH | 15 - nH | nitro: 6 |
| F | 7 | | |
| Br | 7/27 | | |
| Cl | 7/9 | | |
| I | 7/47 | | |
| Na | 1/10 | | |
| K | 1/18 | | |
| O | 6 - nH | 16 - nH | |
| P | PH2: 1/3; PH: 4/9; P: 5/9 | “special” + 9 | |
| S | SH: 5/9; S: 2/3 | | S: 8/3 |
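As a minimal sketch (not the authors' implementation), the Table 1 rules for C, N, and O and the bond index of eq 2 can be coded directly. The function names are illustrative, and the assignment of vertices 3 and 4 of Figure 2 to the OH and carbonyl oxygens is an assumption inferred from the δv values listed in Table 2.

```python
# Acyclic base values from Table 1; cyclic atoms add 10 (e.g. C: 4 - nH vs 14 - nH).
ACYCLIC_BASE = {"C": 4, "N": 5, "O": 6}


def atomic_index(element: str, n_h: int = 0, cyclic: bool = False) -> int:
    """Atomic index delta^v for C, N or O according to the Table 1 rules."""
    base = ACYCLIC_BASE[element] + (10 if cyclic else 0)
    return base - n_h


def bond_index(delta_i: float, delta_j: float) -> float:
    """Bond index of eq 2: product of the atomic indices of the two bonded atoms."""
    return delta_i * delta_j


# Acetic acid, hydrogen-suppressed graph of Figure 2 (vertex numbering assumed:
# 1 = CH3, 2 = carbonyl C, 3 = OH oxygen, 4 = carbonyl oxygen).
deltas = {
    1: atomic_index("C", n_h=3),
    2: atomic_index("C"),
    3: atomic_index("O", n_h=1),
    4: atomic_index("O"),
}
edges = {"a": (1, 2), "b": (2, 3), "c": (2, 4)}
betas = {name: bond_index(deltas[i], deltas[j]) for name, (i, j) in edges.items()}

print(deltas)  # {1: 1, 2: 4, 3: 5, 4: 6}
print(betas)   # {'a': 4, 'b': 20, 'c': 24}
```

The printed values match the δv and βk columns of Table 2. Halogens, P, and S have fractional entries in Table 1, so they would need an explicit lookup table rather than the base-minus-nH rule used here.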

Table 2. Calculated Atom and Bond Indices for Acetic Acid

| atom | δv | 1/(δv)^1/2 |
| 1 | 1 | 1 |
| 2 | 4 | 0.5 |
| 3 | 5 | 0.447 |
| 4 | 6 | 0.408 |

| bond | βk | 1/(βk)^1/2 |
| a (1-2) | 4.000 | 0.500 |
| b (2-3) | 20.000 | 0.223 |
| c (2-4) | 24.000 | 0.204 |
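The sums defined by eqs 3 and 4 below can be checked against Table 2 with a few lines of Python. This is a sketch that takes the δv and βk values of Table 2 as given; the helper names are ours.

```python
import math

# Atomic and bond indices of acetic acid, taken from Table 2.
atom_deltas = [1, 4, 5, 6]    # vertices 1-4
bond_betas = [4, 20, 24]      # edges a (1-2), b (2-3), c (2-4)


def chi0(deltas):
    """Zeroth-order valence connectivity index (eq 3)."""
    return sum(1.0 / math.sqrt(d) for d in deltas)


def chi1(betas):
    """First-order valence connectivity index (eq 4)."""
    return sum(1.0 / math.sqrt(b) for b in betas)


print(f"{chi0(atom_deltas):.3f}")  # 2.355
print(f"{chi1(bond_betas):.3f}")   # 0.928
```

Summing unrounded terms gives vχ1 ≈ 0.928; the value 0.927 quoted in the text is the sum of the three-decimal entries of Table 2 (0.500 + 0.223 + 0.204).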

The zeroth-order (atomic) connectivity index vχ0 is defined as a summation over the vertices of the hydrogen-suppressed graph,

$$^{v}\chi^{0} = \sum_{i=1}^{L} \frac{1}{\sqrt{\delta^v_i}} \qquad (3)$$

where L is the number of vertices (atoms) in the graph and the atomic indices δvi are obtained from Table 1 for the corresponding atoms. Similarly, the first-order (bond) connectivity index vχ1 is defined as a summation over the edges of the hydrogen-suppressed graph,

$$^{v}\chi^{1} = \sum_{k=1}^{M} \frac{1}{\sqrt{\beta_k}} \qquad (4)$$

where M is the number of edges (bonds) in the graph and the bond index βk is given by eq 2. For acetic acid, whose hydrogen-suppressed graph is shown in Figure 2, the atom and bond indices calculated with eq 2 and Table 1 are given in Table 2. From eq 3, the zeroth-order (atomic) connectivity index vχ0 is 2.355 (the sum of the four 1/(δv)^1/2 values in Table 2), and from eq 4, the first-order (bond) connectivity index vχ1 is 0.927 (the sum of the three 1/(βk)^1/2 values in Table 2). Connectivity indices of higher order can be defined similarly to represent topological features of larger fragments.⁷,⁸ However, for simplicity and given the objective of this work (to create the missing groups and estimate their contributions), they are not employed in this paper. Note also that the missing groups created through this method represent only a small part of the total molecular structure, which is a further reason why higher-order terms are not used.

Combined GC-CI Method

In the combined GC-CI method, missing groups are automatically created with a CI-based method, and the corresponding property is then estimated with a GC method. To create new groups and predict their contributions from the connectivity indices, it is necessary to relate the values of the desired property Y to vχ0 and vχ1. First, however, rules for representing groups with connectivity indices need to be established.

Representation of Groups with Connectivity Indices. To use connectivity indices to create groups, it must be possible to “split” any compound into arbitrary groups without altering the molecular structural information. That is, if acetic acid is split into two groups, adding the vχ0 and vχ1 values of the individual fragments (groups) should give the same vχ0 and vχ1 values as for the whole molecule (see Table 2):

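$$\left(^{v}\chi^{0}\right)_{\text{molecule}} = \sum_{n} \left(^{v}\chi^{0}\right)_{n} \qquad (5)$$

$$\left(^{v}\chi^{1}\right)_{\text{molecule}} = \sum_{n} \left(^{v}\chi^{1}\right)_{n} \qquad (6)$$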

where the summations run over the n fragments (or groups) into which the compound is subdivided. Subdividing acetic acid into two groups, A (COOH) and B (CH3), according to Figure 2, and applying the formulas for vχ0 and vχ1 given above, one obtains the following values (note that bond a, between atoms 1 and 2, which joins the two groups, is not included):

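$$\left(^{v}\chi^{0}\right)_{A} = 0.5 + 0.447 + 0.408 = 1.355, \qquad \left(^{v}\chi^{0}\right)_{B} = 1.0$$

$$\left(^{v}\chi^{1}\right)_{A} = 0.223 + 0.204 = 0.427, \qquad \left(^{v}\chi^{1}\right)_{B} = 0$$

Using eqs 5 and 6, the molecular vχ0 and vχ1 are then obtained as follows: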

$$\left(^{v}\chi^{0}\right)_{\text{molecule}} = \left(^{v}\chi^{0}\right)_{A} + \left(^{v}\chi^{0}\right)_{B} = 1.355 + 1 = 2.355$$

$$\left(^{v}\chi^{1}\right)_{\text{molecule}} = \left(^{v}\chi^{1}\right)_{A} + \left(^{v}\chi^{1}\right)_{B} = 0.427 + 0 = 0.427$$

Clearly, although eqs 5 and 6 hold for molecules represented by atoms and bonds, eq 6 is not consistent for vχ1 when groups/fragments are combined to calculate the molecular value: vχ1 for the molecule should be 0.927 (see Table 2), not 0.427. The difference of 0.5 arises because the bond between the two groups (bond a, connecting atoms 1 and 2) was not included. Therefore, to make the method internally consistent, eq 6 is modified as follows:

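$$\left(^{v}\chi^{1}\right)_{\text{group}} = \sum_{k \in \text{internal bonds}} \frac{1}{\sqrt{\beta_k}} + \sum_{m \in \text{bonds leaving the group}} \frac{0.5}{\sqrt{\beta_m}} \qquad (7)$$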

where k runs over the internal bonds of the group/fragment and m over the bonds leaving the group/fragment. Eqs 5 and 7 satisfy the requirement that a molecule and the groups representing it have the same zero- and first-order connectivity index values. For the groups COOH and CH3 representing acetic acid (Figure 2), the vχ1 value of the molecule calculated with eq 7 becomes

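$$\left(^{v}\chi^{1}\right)_{\text{molecule}} = \left(^{v}\chi^{1}\right)_{A} + \left(^{v}\chi^{1}\right)_{B} = \left[0.427 + 0.5(0.5)\right]_{\text{COOH}} + \left[0.5(0.5)\right]_{\text{CH}_3} = 0.927$$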
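The half-bond bookkeeping of eq 7 can be verified numerically. The sketch below assumes the δv and βk values of Table 2 (the function name is hypothetical), computes group indices for COOH and CH3, and checks that their sum equals the whole-molecule indices of eqs 3 and 4.

```python
import math


def group_indices(atom_deltas, internal_betas, external_betas):
    """Group-level connectivity indices.

    chi0 sums 1/sqrt(delta) over the group's atoms (the terms added in eq 5);
    chi1 follows eq 7: internal bonds count fully, bonds leaving the group
    count with weight 0.5.
    """
    x0 = sum(1 / math.sqrt(d) for d in atom_deltas)
    x1 = sum(1 / math.sqrt(b) for b in internal_betas)
    x1 += sum(0.5 / math.sqrt(b) for b in external_betas)
    return x0, x1


# Acetic acid split as in Figure 2; bond a (beta = 4) joins the two groups.
cooh = group_indices([4, 5, 6], internal_betas=[20, 24], external_betas=[4])
ch3 = group_indices([1], internal_betas=[], external_betas=[4])

x0_groups = cooh[0] + ch3[0]   # eq 5
x1_groups = cooh[1] + ch3[1]   # eq 7 applied to each group, then summed

# Whole-molecule values from eqs 3 and 4, for comparison.
x0_molecule = sum(1 / math.sqrt(d) for d in [1, 4, 5, 6])
x1_molecule = sum(1 / math.sqrt(b) for b in [4, 20, 24])

print(math.isclose(x0_groups, x0_molecule), math.isclose(x1_groups, x1_molecule))
# True True
```

Passing empty `external_betas` lists instead reproduces the shortfall of 0.5 obtained with eq 6 alone.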

Clearly, with eqs 5 and 7, joining the indices of the groups together yields the corresponding molecular values.

CI-Based Method. In this work, the following pure-component property model has been employed:

$$f(Y) = \sum_i a_i A_i + b\left(^{v}\chi^{0}\right) + 2c\left(^{v}\chi^{1}\right) + d \qquad (8)$$

where Y is the pure-component property to be estimated, Ai is the number of atoms of type i in the molecular structure, vχ0 is the zeroth-order (atom) connectivity index given by eq 3, vχ1 is the first-order (bond) connectivity index given by eq 4, ai is the contribution of atom i, b and c are adjustable parameters, and d is a constant. High accuracy in the prediction of Y cannot be expected from this model, since a large set of compounds is represented with only a few parameters. Greater accuracy can be obtained by adding higher-order connectivity indices.⁸,⁹ However, the objective of this work is to obtain the missing group contributions, for which the first two connectivity indices should be sufficient.

Regression of CI-Model Parameters. All the pure-component property data used earlier by Marrero and Gani²,³ have been regressed to estimate the parameters of the model in eq 8 for the following primary properties: normal melting point (Tm), normal boiling point (Tb), critical temperature (Tc), critical pressure (Pc), critical volume (Vc), standard heat of formation (Hf), standard Gibbs energy of formation (Gf), standard heat of fusion (Hfus), standard heat of vaporization at 298 K (HV), and octanol-water partition coefficient (log Kow). Table 3A summarizes the regression statistics of the CI-based method alone: the number of data points used for each property, the correlation coefficient, the average absolute error, the property function f(Y), and the units of each property. Table 3B gives the corresponding correlation statistics of the Marrero-Gani method for the same compounds. Although the CI method is not better than the Marrero-Gani method, it is not much worse either. Therefore, with the coefficients of the CI-based method (eq 8) given in Table 4, a missing Marrero-Gani group contribution can be estimated with similar accuracy.
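Because eq 8 returns f(Y) rather than Y, a property value is recovered by inverting the property function listed in Table 3. The sketch below evaluates eq 8 and inverts three of those functions. E and G are the Table 4 values after applying the column factors (10^-1 for Tm, 10^-3 for Pc); only the Tm atom parameters needed for the example are included, and the function names are illustrative.

```python
import math

# Tm parameters of Table 4 with the 10^-1 column factor applied (subset only).
TM_PARAMS = {
    "a": {"H": -0.195116, "C": 1.086415, "N": 2.888243,
          "O": 1.987942, "S": 2.665839, "P": 0.330441},
    "b": 0.263105,
    "c": -1.086899,
    "d": 0.0,
}


def ci_model(atom_counts, chi0, chi1, params):
    """Right-hand side of eq 8: sum_i a_i A_i + b chi0 + 2 c chi1 + d."""
    atoms = sum(params["a"][el] * n for el, n in atom_counts.items())
    return atoms + params["b"] * chi0 + 2 * params["c"] * chi1 + params["d"]


# Inverses of three property functions from Table 3.
def tm_from_f(f, E=147.45):
    return E * math.log(f)          # f = exp(Tm/E)


def pc_from_f(f, E=5.9827, G=0.108998):
    return E + (f + G) ** -2        # f = (Pc - E)^-0.5 - G


def vc_from_f(f, E=7.95):
    return f + E                    # f = Vc - E


# S=C=N fragment of Example 1 (1 S, 1 C, 1 N; chi0 = 1.5596, chi1 = 0.5896).
f_scn = ci_model({"S": 1, "C": 1, "N": 1}, 1.5596, 0.5896, TM_PARAMS)
print(round(f_scn, 3))
```

The example call gives about 5.769, in line with the 5.7693 obtained for the S=C=N fragment of nitroscanate in Example 1; since d = 0 for Tm, eq 8 and the fragment form of eq 9 coincide there.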
Table 3. Correlation Statistics for the CI Method (Part A) and for the Marrero-Gani Method (Part B)

A. CI Method: Correlation Statistics and Property Function f(Y) (Left-Hand Side of Eq 8)

| property | no. data points | correlation coeff., r² | avg. absolute error | property function, f(Y) | units |
| Tm | 7970 | 0.487 | 56.78 | exp(Tm/E) | K |
| Tb | 5406 | 0.824 | 25.79 | exp(Tb/E) | K |
| Tc | 788 | 0.837 | 32.97 | exp(Tc/E) | K |
| Pc | 784 | 0.843 | 3.46 | (Pc - E)^(-0.5) - G | bar |
| Vc | 787 | 0.991 | 11.76 | Vc - E | cm3/mol |
| Hf | 791 | 0.926 | 44.62 | Hf - E | kJ/mol |
| Gf | 762 | 0.954 | 33.00 | Gf - E | kJ/mol |
| Hfus | 720 | 0.864 | 3.08 | Hfus - E | kJ/mol |
| HV | 447 | 0.691 | 4.49 | HV - E | kJ/mol |
| log Kow | 12026 | 0.693 | 0.75 | log Kow - E | - |

B. Marrero-Gani Method Correlation Statistics(a)

| property | no. data points | avg. absolute error | units | avg. relative error(b) |
| Tm | 8096 | 57.87 | K | 16% |
| Tb | 5373 | 16.46 | K | 4% |
| Tc | 784 | 11.16 | K | 2% |
| Pc | 783 | 0.94 | bar | 3% |
| Vc | 786 | 8.47 | cm3/mol | 2% |
| Hf | 786 | 7.25 | kJ/mol | - |
| Gf | 758 | 6.32 | kJ/mol | - |
| Hfus | 717 | 2.15 | kJ/mol | - |
| HV (298 K) | 423 | 0.97 | kJ/mol | 3% |
| log Kow | 12022 | 0.52 | - | - |

(a) The number of compounds in Table 3B is smaller than that in Table 3A because some of the compounds in Table 3A could not be described by the Marrero-Gani method (because of too few compounds within a particular group). (b) Average relative errors are not given for the heats of fusion and log Kow because the measured values include both positive and negative numbers.

Table 4. Regressed Parameters for the CI Method (for Use in Eqs 8 and 9); tabulated values are multiplied by the factor shown in the column heading

| parameter | Tm (×10^-1) | Tb (×10^-1) | Tc (×10^-2) | Pc (×10^-3) | Vc |
| a(H) | -1.95116 | -1.19461 | -44.25284 | 2.02297 | 7.11975 |
| a(Cl) | 17.74244 | 14.00177 | 448.53172 | 5.37086 | 43.40316 |
| a(Br) | 44.96578 | 24.03195 | 739.23387 | -6.18942 | 51.37739 |
| a(F) | -8.18262 | -0.79156 | -9.09894 | 7.72504 | 20.80761 |
| a(I) | 43.47278 | 35.27455 | 1312.38303 | -10.78426 | 68.44631 |
| a(N) | 28.88243 | 16.23796 | 584.22214 | 4.09367 | 39.90316 |
| a(O) | 19.87942 | 9.28353 | 372.27166 | -1.38901 | 18.04765 |
| a(P) | 3.30441 | 2.48135 | 1795.37160 | N/A | -82.46485 |
| a(S) | 26.65839 | 17.76981 | 780.77634 | -8.43030 | 32.23263 |
| a(C) | 10.86415 | 11.31290 | 324.58268 | 5.49990 | 31.79784 |
| a(Si) | -1.34033 | 3.73142 | N/A | N/A | N/A |
| b | 2.63105 | -9.38297 | -327.74936 | 2.12980 | 3.09675 |
| c | -10.86899 | 4.60418 | 125.05914 | 6.52188 | 7.87495 |
| d | 0.00000 | 18.37191 | 388.55135 | -18.97222 | 8.67318 |
| E | 1474.5 | 2225.4 | 23123.9 | 5982.7 | 7.95 |
| G | - | - | - | 108.998 | - |

Figure 3 gives a visual picture (calculated versus experimental property values) of the regression for each property. In our view, more detailed correlation statistics would not provide additional useful information, because we do not recommend using the parameters of the CI-based method for anything other than predicting missing group contributions. What matters, therefore, is to evaluate the performance of the model against true predictions.

Calculation of Properties with the Combined GC-CI Method. The combined GC-CI method (automatic creation of groups with the CI method and property prediction with the GC method) is used only when a chemical compound cannot be fully represented by the available groups and/or when one or more of the groups representing the compound lack the corresponding property contribution in the group parameter tables. In principle, the created groups and/or missing contributions can be used within any GC-based method. Since the missing group is created and its missing contribution is predicted through the CI method, there is no need to save the created groups. Moreover, since group creation takes into account the bonds and atoms of the specific molecule, reusing a created group in another molecule would not be correct. To maintain consistency when the CI model is applied to groups, it is rewritten in the following form:

$$f(Y_m) = \sum_i a_{m,i} A_{m,i} + b\left(^{v}\chi^{0}\right)_m + 2c\left(^{v}\chi^{1}\right)_m \qquad (9)$$

$$f(Y^*) = \sum_m n_m f(Y_m) + d \qquad (10)$$

In eqs 9 and 10, m indexes the different missing groups/fragments, and nm is the number of times missing group/fragment m appears in the molecule. This ensures additivity regardless of the number of groups generated from the connectivity indices. The CI parameters (Tables 1 and 4) are used in conjunction with the GC methods through the following step-by-step algorithm:

• Step 1: Input the set of groups G for the given molecular structure S.
• Step 2: Determine the group assignment for S (for example, in the Marrero and Gani method,² the group assignment describing the molecular structure shown in Figure 1 is 4 CH2 (cyc), 1 CH (cyc), 1 CH3CO, 1 COOH, and 3 CH2; note that this assignment does not represent the entire molecule).
• Step 3: If S is fully described by the groups in G, go to Step 6. Otherwise, determine the fragments s* of S that are not described by any group of G, for example, the P=S group in Figure 1. Here there is only one missing fragment.
• Step 4: For each fragment s* of S that is not described by groups, determine the set of atoms involved in the fragment and calculate its connectivity indices vχ0 and vχ1. For the fragment P=S in Figure 1 (considering its three bonds to C atoms), the calculated values of vχ0 and vχ1 are 1.9540 and 1.9563, respectively.
• Step 5: Using the CI-based model (eq 9), predict the value Ym corresponding to the contribution of fragment s* to property Y. For example, for the missing fragment of Figure 1 and the normal melting point temperature,

$$f(Y_m) = (0.3304411 + 2.665839) + 0.2631015 \times 1.95401 - 2 \times 1.086899 \times 1.9563 = -0.742252756$$

Table 4 (continued). Regressed Parameters for the CI Method (for Use in Eqs 8 and 9); tabulated values are multiplied by the factor shown in the column heading

| parameter | Hf | Gf | Hfus (×10^-1) | HV (×10^-1) | log Kow (×10^-1) |
| a(H) | -34.77751 | -15.25665 | -0.14197 | 1.10226 | -1.17534 |
| a(Cl) | -66.44225 | -47.25286 | 3.80052 | 101.04445 | 1.54308 |
| a(Br) | -40.04162 | -30.93267 | 5.00744 | 152.64074 | -0.57323 |
| a(F) | -238.12524 | -222.59877 | 0.52116 | 9.47331 | 0.20902 |
| a(I) | 10.07169 | 11.31087 | 5.68694 | 202.48715 | -5.01069 |
| a(N) | 92.74020 | 67.39956 | 4.40931 | 130.47019 | -4.67389 |
| a(O) | -176.07006 | -168.72071 | 3.62108 | 100.13561 | -5.53605 |
| a(P) | -243.52950 | N/A | -11.31861 | N/A | -2.83110 |
| a(S) | 9.57691 | -1.17346 | 4.60086 | 125.61616 | 0.22341 |
| a(C) | 40.15590 | 35.30760 | 2.21478 | 55.23768 | 2.43330 |
| a(Si) | N/A | N/A | N/A | N/A | 7.16047 |
| b | -7.39581 | -21.51951 | -3.66388 | -52.33420 | 2.69638 |
| c | 11.71723 | 15.34811 | 3.11380 | 25.38932 | 0.42365 |
| d | 61.92611 | 97.28821 | 1.09972 | -23.39402 | 0.94070 |
| E | 5.549 | -34.967 | -28.06 | 117.33 | 5.429 |
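Steps 4 and 5 of the algorithm, together with the aggregation of eq 10, can be sketched for the Figure 1 example. The Tm parameters are those of Table 4 after the 10^-1 factor, and the vχ0 and vχ1 values are the Step 4 numbers; the function names are illustrative.

```python
# Tm parameters from Table 4 (10^-1 column factor applied).
A_TM = {"P": 0.330441, "S": 2.665839}
B_TM, C_TM, D_TM = 0.263105, -1.086899, 0.0


def fragment_contribution(atom_counts, chi0, chi1):
    """Eq 9: f(Y_m) for one missing fragment; d is added once, in eq 10."""
    atoms = sum(A_TM[el] * n for el, n in atom_counts.items())
    return atoms + B_TM * chi0 + 2 * C_TM * chi1


def aggregate(fragments, d=D_TM):
    """Eq 10: f(Y*) = sum_m n_m f(Y_m) + d over (n_m, f(Y_m)) pairs."""
    return sum(n_m * f_m for n_m, f_m in fragments) + d


# Figure 1: a single P=S fragment with chi0 = 1.9540 and chi1 = 1.9563 (Step 4).
f_ps = fragment_contribution({"P": 1, "S": 1}, chi0=1.9540, chi1=1.9563)
f_star = aggregate([(1, f_ps)])
print(f"{f_ps:.4f} {f_star:.4f}")
```

Both values come out at approximately -0.742, matching the Step 5 result to three decimals; the small remaining difference presumably reflects rounding of the indices and parameters. With several missing fragments, `aggregate` would receive one `(n_m, f_m)` pair per distinct fragment.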


Figure 3. Comparison of experimental and calculated property values for the CI method. Plots of experimental versus calculated values for (a) log Kow; (b) VC; (c) Tb; (d) TC; (e) PC; (f) Tm; (g) Hf (at 298 K); (h) Gf (at 298 K); (i) Hfus (at 298 K); (j) HV (at 298 K).

Table 5. Comparison of the CI-GC Method against Measured Data and the Marrero-Gani Method (with Missing Contribution)

| compound name | CAS no. | missing group (structure) | property | known value | CI-GC | GC-(a) |
| glycerol-beta-nitrate | 000620-12-2 | OHCH2CHO | Tm, K | 327.15 | 371.28 | 266.10 |
| 6H-purin-6-one, 2-amino-1,9-dihydro-9-[[2-hydroxy-1-(hydroxymethyl)ethoxy]methyl]- | 082410-32-0 | OHCH2CHO | Tm, K | 523.15 | 529.32 | 488.72 |
| | | OHCH2CHO | log Kow | -1.66 | -1.00 | -1.99(b) |
| mepanipyrim | 110235-47-7 | C#CC CYC | Tm, K | 405.95 | 437.27 | 358.86 |
| | | C#CC CYC | log Kow | 3.28 | 4.30 | 3.69(b) |
| 3-(2-nitrophenyl)-2-propynoic acid | 000530-85-8 | C#CC CYC | Tm, K | 430.15 | 453.89 | 383.36 |
| diethyl dimethylphosphoramidate | 002404-03-7 | O=PO(O) | Tm, K | 281.45 | 362.33 | 186.79 |
| phosphoramidic acid, 1,3-dithiolan-2-ylidene-, diethyl ester | 000947-02-4 | O=PO(O) | Tm, K | 314.15 | 369.12 | 208.04 |
| 1H-benzotriazole | 000095-14-7 | N=N CYC | Tb, K | 623.15 | 584.48 | 491.39 |
| methyl-o-isopropylphosphonofluoridate | 000107-44-8 | O=P | Tb, K | 420.15 | 441.35 | 323.91 |
| | | O=P | Tm, K | 216.15 | 144.46 | 309.64(b) |
| | | O=P | log Kow | 0.3 | 1.86 | 0.71(b) |
| soman | 000096-64-0 | O=P | Tb, K | 471.15 | 496.41 | 410.52 |
| thiophene, tetrahydro-3-methyl-, 1,1-dioxide | 000872-93-5 | O=S=O CYC | Tb, K | 549.15 | 476.8 | 296.68 |
| acetamide, N-butyl-N-phenyl- | 000091-49-6 | O=CNC CYC | Tb, K | 554.15 | 588.89 | 463.4 |
| acetamide, N-butyl-N-phenyl- | 000091-49-6 | O=CNC CYC | Tm, K | 297.65 | 344.06 | 370.40(b) |
| acetamide, N-ethyl-N-phenyl- | 000529-65-7 | O=CNC CYC | Tb, K | 533.15 | 565.13 | 419.78 |
| | | O=CNC CYC | Tm, K | 328.15 | 336.69 | 364.26(b) |
| urea, N,N′-dimethyl-N,N′-diphenyl- | 000611-92-7 | O=CNC CYC | Tb, K | 623.15 | 646.49 | 557.68 |
| | | O=CNC CYC | Tm, K | 395.15 | 393.37 | 411.26(b) |
| acetamide, 2-chloro-N-(2,6-dimethylphenyl)-N-(2-methoxyethyl)- | 050563-36-5 | O=CNC CYC | Tb, K | 593.15 | 634.63 | 539.83 |
| | | O=CNC CYC | Tm, K | 319.15 | 374.66 | 394.74(b) |
| | | O=CNC CYC | log Kow | 2.17 | 4.26 | 2.57(b) |
| acetamide, N-methyl-N-phenyl- | 000579-10-2 | O=CNC CYC | Tb, K | 529.15 | 553.34 | 394.27 |
| o-acetotoluidide, N-methyl- | 000573-26-2 | O=CNC CYC | Tb, K | 533.15 | 563.62 | 414.86 |
| carbamic chloride, methylphenyl- | 004285-42-1 | O=CNC CYC | Tb, K | 553.15 | 565.87 | 418.14 |
| phosphonothioic acid, methyl-, O,S-diethyl ester | 002511-10-6 | O=P | Tb, K | 502.55 | 513.29 | 417.8 |
| phosphoric triamide, hexamethyl- | 000680-31-9 | O=P | Tb, K | 505.65 | 498.76 | 423.29 |
| phosphonic dichloride, phenyl- | 000824-72-6 | O=P | Tb, K | 531.15 | 556.69 | 482.93 |
| phosphorodichloridic acid, phenyl ester | 000770-12-7 | O=P | Tb, K | 515.15 | 563.74 | 490.56 |
| noname 952 | 003283-12-3 | O=P | Tb, K | 650.15 | 452.9 | 322.8 |
| diazene, (4-methoxyphenyl)phenyl- | 002396-60-3 | N=N | Tb, K | 613.15 | 642.78 | 577.96 |
| benzenamine, 4-(phenylazo)- | 000060-09-3 | N=N | Tb, K | 639.15 | 656.34 | 595.92 |
| chrysoidine | 000532-82-1 | N=N | Tb, K | 535.15 | 686.33 | 634.49 |
| bismuthine, triphenyl- | 000603-33-8 | SiH | Tm, K | 350.75 | 351.71 | 360.67 |
| benzene, 1,1′-(1,2-ethynediyl)bis- | 000501-65-5 | C#CC CYC | Tm, K | 335.65 | 327.70 | 282.16 |
| cyclohexene, 1-methyl-4-(1-methylethylidene)- | 000586-62-9 | C=C CYC | Tc, K | 672 | 733.28 | 613.42 |
| | | C=C CYC | Pc, bar | 27.7 | 29.24 | 29.91 |
| | | C=C CYC | Vc, cm3/mol | 509.00 | 517.57 | 431.73 |
| | | C=C CYC | Tb, K | 459.15 | 519.48 | 447.96(b) |
| | | C=C CYC | log Kow | 4.47 | 3.83 | 3.08(b) |
| 2-propanol, 1-methoxy- | 000107-98-2 | OHCHCH2O | Tc, K | 553.0 | 544.47 | 240.09 |
| | | OHCHCH2O | Tm, K | 131.2 | 197.83 | 142.07(b) |
| | | OHCHCH2O | Tb, K | 392.2 | 406.75 | 361.75(b) |
| | | OHCHCH2O | Pc, bar | 43.4 | 41.73 | 42.60(b) |
| | | OHCHCH2O | Vc, cm3/mol | 294.0 | 307.88 | 289.27(b) |
| | | OHCHCH2O | Gf, kJ/mol | -259.0 | -266.99 | -274.3(b) |
| | | OHCHCH2O | Hf, kJ/mol | -403.9 | -414.74 | -417.2(b) |
| | | OHCHCH2O | Hfus, kJ/mol | 11.1 | 13.30 | 12.5(b) |
| dipropylene glycol | 025265-71-8 | OHCHCH2O | Tc, K | 654 | 712.72 | 515.54 |
| | | OHCHCH2O | Tm, K | 233.2 | 270.78 | 235.5(b) |
| | | OHCHCH2O | Tb, K | 503.6 | 499.97 | 472.9(b) |
| | | OHCHCH2O | Pc, bar | 35.8 | 36.13 | 36.5(b) |
| | | OHCHCH2O | Vc, cm3/mol | 415.0 | 428.29 | 410.6(b) |
| | | OHCHCH2O | Gf, kJ/mol | -406.0 | -402.18 | -407.6(b) |
| | | OHCHCH2O | Hf, kJ/mol | -628.0 | -626.87 | -627.9(b) |
| | | OHCHCH2O | Hfus, kJ/mol | 22.7 | 19.88 | 12.5(b) |

(a) GC- = property estimated with the Marrero-Gani method without the missing group contribution. (b) Property estimated with the Marrero-Gani method with all group contributions available.

• Step 6: Create new groups with contributions Ym for each fragment s*. That is, execute G = G + G*, where G* is the new set of groups needed to describe the structure S. For example, a new group P=S with bonds to 3 C atoms is created with the contribution f(Ym) = -0.742257256.
• Step 7: Using eq 10, calculate Y* as the aggregate group contribution of all fragments s*. For example, for the normal melting point and the group created in Step 6, f(Y*) = -0.742257256 + 0, where the added 0 is the parameter d for Tm.
• Step 8: Estimate the value of property Y with the group contribution model for Y, including the added contribution of Y* (for example, by extending eq 1):

$$f(Y) = \sum_i N_i C_i + f(Y^*) + \text{higher-order terms} \qquad (11)$$
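Eq 11 adds f(Y*) to the group sum before the property function is inverted, so the created group enters exactly like a tabulated one. The following is a minimal sketch restricted to first-order terms. The group names `G1` and `G2` and their contribution values are hypothetical, chosen only for illustration (they are not Marrero-Gani parameters), and the Tm inverse of Table 3 is used.

```python
import math


def estimate_property(group_counts, contributions, f_star, inverse):
    """First-order part of eq 11: f(Y) = sum_i N_i C_i + f(Y*), then Y = f^-1(f(Y))."""
    f_y = sum(n * contributions[g] for g, n in group_counts.items()) + f_star
    return inverse(f_y)


# Hypothetical groups and contributions (illustration only, not published values).
contributions = {"G1": 0.8, "G2": 0.5}
counts = {"G1": 2, "G2": 3}

tm = estimate_property(counts, contributions, f_star=-0.742,
                       inverse=lambda f: 147.45 * math.log(f))  # f(Tm) = exp(Tm/E)
print(round(tm, 1))
```

Higher-order terms, when available, would be added to f(Y) before the inversion, as eq 11 indicates.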

Results

The predictive power of the combined GC-CI method has been tested with a large set of data not used in the model development of the Marrero and Gani method.²,³ Table 5 highlights a selection of these results. In Table 5, compounds/properties that could not be handled by the Marrero-Gani method because of missing group contributions (see the parameter tables of the Marrero-Gani method²,³) are now handled by the combined GC-CI method. For each compound, the table gives the estimated property, the missing group, the known experimental value, and the values estimated with the GC-CI method and with the GC method without the missing group contribution. In addition, for selected compounds, the properties estimated with the Marrero-Gani method without any missing contributions have been compared with those of the GC-CI method in which one group is replaced by a created group. Since the correlation errors of the CI method are generally not far from those of the Marrero-Gani method, this comparison confirms the results of Table 3. In almost all cases, the combined GC-CI method moves the estimated property closer to the known experimental value. Note, however, that the combined GC-CI method cannot improve on the correlation accuracy of the original GC method. Thus, in the cases where adding the missing contribution leads to larger errors than the estimates without it, we believe that, had the group contribution value been available, the error would have been close to that of the GC-CI method. Data for many more test compounds can be obtained from the corresponding author; however, not all test compound data can be released, because the data for some compounds are subject to confidentiality agreements.

Two illustrative examples of the application of the combined GC-CI method are given below.

Example 1: Prediction of the Properties of Nitroscanate (CAS No. 19881-18-6). Steps 1-3: The chemical structure of nitroscanate cannot be handled by the method of Marrero and Gani²,³ because the fragment S=C=N- cannot be represented, as shown in Figure 4.

Figure 4. Molecular structure of nitroscanate.

Step 4: The values of vχ0 and vχ1 for the fragment S=C=N-, calculated with eqs 5 and 7, are 1.5596 and 0.5896, respectively. From the atom/vertex values of Table 1, the atomic indices are δS = 2.67, δC = 4, δN = 5, and δC(out of fragment) = 14. Eq 2 then gives the bond indices βS-C = 10.68, βC-N = 20, and βN-C(out of fragment) = 70. Using eqs 5 and 7, the values of vχ0 and vχ1 are

$$^{v}\chi^{0} = 0.612 + 0.5 + 0.447 = 1.559$$

$$^{v}\chi^{1} = \frac{1}{\sqrt{10.68}} + \frac{1}{\sqrt{20}} + \frac{0.5}{\sqrt{70}} = 0.5896$$

Steps 5-7: With the CI model (eqs 9 and 10), the contributions of the fragment S=C=N to Tm, Tb, and log Kow are predicted to be 5.7693, 5.4488, and 0.3628, respectively. The calculation of the Tm contribution of the S=C=N group is shown below. The S=C=N group contains 1 N atom, 1 C atom, and 1 S atom. Therefore,

$$f(Y^*) = (2.888243 + 1.086415 + 2.665839) + 0.2631051 \times 1.5596 - 2 \times 1.086899 \times 0.5896 + 0 = 5.7693$$

Step 8: Using the Marrero and Gani method,²,³ including the new group S=C=N, the estimated values of Tm, Tb, and log Kow are 437.52 K, 677.94 K, and 4.17. The experimental value of Tm for this compound is 478.15 K. The Tm calculation is shown below. Nitroscanate is represented by the following first-order groups: 8 aCH, 2 aC, 1 aC-O, 1 aC-NO2, and 1 S=C=N.

$$T_m = 147.45 \ln(8 \times 4.6880 + 2 \times 0.9176 + 1 \times 1.3045 + 1 \times 4.3531 + 1 \times 5.7692) = 425.78 \quad \text{(first-order contributions only)}$$

Adding the second-order contribution for 2 × AROMRINGs1s4 to the first-order value gives

$$T_m = 425.78 + 15.34 = 441.12$$

Adding the third-order contribution for 1 × aC-O-aC to the first- and second-order values gives

$$T_m = 441.12 - 3.60 = 437.52$$

Example 2: Thiotepa (CAS No. 52-24-4). Steps 1-3: The chemical compound shown in Figure 5 cannot be fully represented by the method of Marrero and Gani,²,³ because the fragment S=P is not described.

Figure 5. Molecular structure of thiotepa.
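The Step 4-7 values reported for the S=P fragment of thiotepa can be reproduced with a short script. The atomic indices used here are our reading of Table 1, not values stated in the paper: δv = 8/3 for the double-bonded S (the value used for S in Example 1), 5/9 for P without hydrogen, and the cyclic value 15 for each of the three aziridine N atoms bonded to P. With these choices, eqs 5 and 7 return 1.9540 and 1.3412. The contributions include d, which eq 10 adds once for this single fragment.

```python
import math

# Atomic indices: our reading of Table 1 (assumption, see lead-in).
D_S, D_P, D_N_RING = 8 / 3, 5 / 9, 15

# Eq 5 terms for the two fragment atoms; eq 7 with one internal bond (S=P)
# and three bonds leaving the fragment (P-N), each counted with weight 0.5.
chi0 = 1 / math.sqrt(D_S) + 1 / math.sqrt(D_P)
chi1 = 1 / math.sqrt(D_S * D_P) + 3 * 0.5 / math.sqrt(D_P * D_N_RING)

# Table 4 parameters with column factors applied: (a(P), a(S), b, c, d).
PARAMS = {
    "Tm": (0.330441, 2.665839, 0.263105, -1.086899, 0.0),
    "Tb": (0.248135, 1.776981, -0.938297, 0.460418, 1.837191),
    "log Kow": (-0.283110, 0.022341, 0.269638, 0.042365, 0.094070),
}

# Eqs 9 and 10 for a single S=P fragment: f(Y*) = a(P) + a(S) + b chi0 + 2 c chi1 + d.
f_star = {
    prop: a_p + a_s + b * chi0 + 2 * c * chi1 + d
    for prop, (a_p, a_s, b, c, d) in PARAMS.items()
}

print(f"chi0 = {chi0:.4f}, chi1 = {chi1:.4f}")  # chi0 = 1.9540, chi1 = 1.3412
for prop, value in f_star.items():
    print(f"{prop}: {value:.4f}")
```

The loop prints values that agree with the reported contributions 0.5949 (Tm), 3.2639 (Tb), and 0.4738 (log Kow). The Tb and log Kow figures are matched only when d is included, which is consistent with eq 10.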

Step 4: The values of vχ0 and vχ1 for the fragment S=P, calculated with eqs 5 and 7, are 1.9540 and 1.3412, respectively. Steps 5-7: With the CI model (eqs 9 and 10), the contributions of the fragment S=P to Tm, Tb, and log Kow are predicted to be 0.5949, 3.2639, and 0.4738, respectively. Step 8: Using the Marrero and Gani method,²,³ including the new group S=P, the estimated values of Tm, Tb, and log Kow are 259.86 K, 573.65 K, and 0.82. The experimental values of Tm and log Kow for this compound are 324.65 K and 1.03.

Conclusions

The CI-based method for the automatic creation of groups is a simple but reliable way of making safe predictions for a number of properties for which neither experimental data nor the property model parameters (in these cases, group contributions) are available. The method requires no additional experimental data or regression: it creates the missing group and predicts its contribution to the required property. Its application range is currently limited to the groups that can be created from the atoms and connectivity indices for which parameters are given in Table 4. If and when experimental data become available, the created groups can be fine-tuned for the original GC method. We do not claim that this method is the only way to predict missing group contributions. Methods based on quantitative structure-activity relationships (QSAR) and/or descriptors from molecular modeling could also have been used, and the CI-based method could have been made more accurate by adding further terms. For our purposes (as pointed out above), connectivity indices appeared simple, easy to maintain and update, and compatible with the idea of an additive method for property estimation. In addition, although it has not been demonstrated in this paper, we believe that the parameters of the CI-based method given in Table 4 can also be used for automatic group creation in other GC-based methods for pure-component property estimation. Current and future work includes adding more atoms and their connectivities, improving the property correlations of the CI method, and extending the approach to predict missing UNIFAC group interaction parameters. We also believe that further improvement of the regression of the CI parameters is possible.

Nomenclature

a(i) = property model coefficient for atom i (see eqs 8 and 9 and Table 4)
Ai = number of atoms of type i (see eqs 8 and 9)
b = property model coefficient (see eqs 8 and 9 and Table 4)
c = property model coefficient (see eqs 8 and 9 and Table 4)
Ci = contribution of group i to a specific property
d = property model coefficient (see eqs 8 and 9 and Table 4)
E = property model coefficient (see Table 4)
f(Y) = function of property Y (see Table 3)
G = property model coefficient (see Table 4)
Gf = standard Gibbs energy of formation at 298 K, kJ/mol
Gi = group i
Hf = standard heat of formation at 298 K, kJ/mol
Hfus = standard heat of fusion at 298 K, kJ/mol
HV = heat of vaporization at 298 K, kJ/mol
k = number of internal bonds in the group/fragment (see eq 7)
log Kow = octanol-water partition coefficient
m = number of bonds leaving the group/fragment (see eq 7)
M = number of edges or bonds (see eq 4)
n = number of groups/fragments (see eqs 5 and 6)
Ni = number of times group i appears in the molecule

PC = critical pressure, bar
Tb = normal boiling point temperature, K
TC = critical temperature, K
Tm = normal melting temperature, K
VC = critical volume, cm3/mol
Y = property Y

Greek Symbols
δvi = atom index for atom i
βk = bond index for bond k
vχ0 = zero-order connectivity index
vχ1 = first-order connectivity index

Literature Cited

(1) Constantinou, L.; Gani, R. New Group Contribution Method for Estimating Properties of Pure Components. AIChE J. 1994, 40, 1697.
(2) Marrero, J.; Gani, R. Group Contribution Based Estimation of Pure Component Properties. Fluid Phase Equilib. 2001, 183, 183.
(3) Marrero, J.; Gani, R. A Group Contribution Based Estimation of Octanol-Water Partition Coefficient and Aqueous Solubility. Ind. Eng. Chem. Res. 2002, 41, 6623.
(4) Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; John Wiley & Sons: New York, 1986.
(5) Trinajstic, N. Chemical Graph Theory; CRC Press: Boca Raton, FL, 1983.
(6) Randic, M. Connectivity Index 25 Years After. J. Mol. Graphics Modell. 2001, 20, 19.
(7) Wang, S.; Milne, G. W. A. Graph Theory and Group Contributions in the Estimation of Boiling Points. J. Chem. Inf. Comput. Sci. 1994, 34, 1242.
(8) He, J.; Zhong, C. A QSPR Study of Infinite Dilution Activity Coefficients of Organic Compounds in Aqueous Solutions. Fluid Phase Equilib. 2003, 205, 303.
(9) Lin, B.; Chavali, S.; Camarda, K. V.; Miller, D. C. Computer-Aided Molecular Design Using Tabu Search. Comput. Chem. Eng. 2005, 29, 337.

Received for review February 16, 2005. Revised manuscript received June 20, 2005. Accepted July 6, 2005.

IE0501881