A Nonlinear Group Contribution Method for Predicting the Free

Aug 11, 2001 - A Nonlinear Group Contribution Method for Predicting the Free Energies of Inclusion Complexation of Organic Molecules with α- and β-C...
3 downloads 6 Views 57KB Size
1266

J. Chem. Inf. Comput. Sci. 2001, 41, 1266-1273

A Nonlinear Group Contribution Method for Predicting the Free Energies of Inclusion Complexation of Organic Molecules with r- and β-Cyclodextrins Takahiro Suzuki† Chemical Resources Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan Received April 17, 2001

A group-contribution method was developed for calculating the binding constants or the free energies of complexation between native R- or β-cyclodextrin (CD) and organic guest molecules. The nonlinear models for binary (1:1 stoichiometry) complexes of R- and β-CDs were derived with squared correlation coefficients (r2) of 0.868 and 0.917 based on a database consisting of 102 and 218 diverse guest molecules, respectively. The parameters used in the models are first-order molecular connectivity index as a measure of molecular bulk and atom/group counts in the guest molecules. The models allow accurate estimations for the wide range of guests containing C, H, N, O, S, and/or all halogens by summing the contribution values of each defined group present in the chemical structure of the guest along with guest’s molecular size factors (linear and square terms) and then the summation to a constant coefficient value. The predictive performance of the models was tested by extra set of 27 compounds which were not included in the original data set. The predicted values by the models are in good agreement with the experimentally determined data. INTRODUCTION

Cyclodextrins (CDs) are cyclic carbohydrates made of six, seven, or eight glucose molecules joined in a ring, to form R-, β-, and γ-CD, respectively. The CDs all form cylindrical or doughnut-shaped molecules with their hydroxyl groups on the outside of the molecule and a relatively nonpolar hole down the middle. This hole can accommodate another molecule, called “guest”, and form noncovalent host-guest inclusion complexes with a variety of organic and inorganic molecules Via molecular recognition in aqueous solution.1 This allows them to be used in a wide range of applications in industrial, pharmaceutical, agricultural, and other related fields, including improving the solubility and stability (against the effects of light, heat, and oxidation) of drugs and selectively binding materials that fit into the central hole in affinity purification and chromatography methods. 2-4 The CDs also provide an excellent model system mimicking the substrate-specific interaction of enzymes.2 A variety of intermolecular interactions and solvation effects were proposed to explain the stability of the inclusion complexes formed with a great diversity of substrates. Hydrophobic desolvation and van der Waals interactions are among the most significant driving forces for CD complexation besides hydrogen bonding between polar groups of the substrate and the OH groups of the macrocycles. In addition, release of “high energy water” from the CD cavity, relief of conformational strain energy possessed by the uncomplexed CD, and molecular shape and size of guest have been postulated as key driving forces.1 However, there have been many uncertainties on the mechanisms for the CD inclusion complexation.1,5 †

Corresponding author phone and fax: (+81 45) 924-5255; e-mail: [email protected].

Until now, to identify the significant factors contributing the host-guest interaction and predict the thermodynamic stability of CD complexes, computational chemistry and QSAR/QSPR techniques, molecular mechanics computation,6 linear regression,7-10 partial least squares,11 and artificial neural networks12 have been applied. In a previous paper,13 traditional QSAR and the 3D-QSAR method of CoMFA14 were applied to derive quantitative relationships between the binding constants and guest molecular structures for natural β-CD, modified R-, β-, and γ-CD that bear one p-(dimethylamino)benzoyl moiety. However, the applicability range of these models to predict the complexation properties of guest molecules is rather limited. Experimental determination of the complex binding constant is often difficult and timeconsuming because of the low solubility of the guest molecules in aqueous solution. For instance, 10 days were required for the equilibrium system of digitoxin: β-CD.15 The divergence of the reported binding constants is often observed. Therefore, a reliable and convenient method for predicting the thermodynamic stability of CD complexes is desirable. The purpose of the present study is to propose a groupcontribution model (GCM) for predicting the free energies of complexation between guest molecules with R- and β-CDs based on a collection of reported binding data. GCMs or additive schemes for the estimation of thermochemical quantities have had wide usage over a long period.16 To our knowledge, development of GCMs to the study of inclusion complexes of CDs has not been reported in the literature. METHODS

Data Sets. The thermodynamic stability of a CD inclusion complex is generally expressed in terms of binding constant (or stability constant) of the complex, K. The 218 organic

10.1021/ci010295f CCC: $20.00 © 2001 American Chemical Society Published on Web 08/11/2001

PREDICTING

THE

FREE ENERGIES

OF

ORGANIC MOLECULES

J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001 1267

Table 1. Experimental and Calculated Values of Free Energies of Complexation (kJ/mol) and Estimation Error R-CD:guest system

β-CD:guest system

no.

guest

formula

∆Gobsd

∆Gcalcd



ref

∆Gobsd

∆Gcalcd



ref

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75

carbon tetrachloride chloroform methanol acetonitrile acetaldehyde ethanol 1,2-ethanediol acetone 1-propanol 2-propanol 1,3-propanediol tetrahydrofuran cyclobutanol 1-butanol 2-butanol 2-methyl-1-propanol 2-methyl-2-propanol 1,4-butanediol diethylamine cyclopentanol 1-pentanol 2-pentanol 3-pentanol 2-methyl-1-butanol 2-methyl-2-butanol 3-methyl-1-butanol 3-methyl-2-butanol 2,2-dimethyl-1-propanol 1,5-pentanediol 1,4-dibromobenzene 1,4-diiodobenzene 3,5-dibromophenol 3,5-dichlorophenol 1-chloro-4-nitrobenzene fluorobenzene bromobenzene iodobenzene 3-fluorophenol 4-fluorophenol 3-chlorophenol 4-chlorophenol 3-bromophenol 4-bromophenol 3-iodophenol 4-iodophenol nitrobenzene 4-nitrophenol benzene phenol hydroquinone 4-nitroaniline aniline sulfanilamide cyclohexanol 1-hexanol 2-hexanol 2-methyl-2-pentanol 3-methyl-3-pentanol 4-methyl-2-pentanol 3,3-dimethyl-2-butanol 1,6-hexanediol benzonitrile benzothiazole 4-nitrobenzoic acid benzaldehyde benzoic acid 4-hydroxybenzaldehyde 4-hydroxybenzoic acid benzyl chloride toluene benzyl alcohol anisole m-cresol p-cresol 4-methoxyphenol

CCl4 CHCl3 CH4O C2H3N C2H4O C2H6O C2H6O2 C3H6O C3H8O C3H8O C3H8O2 C4H8O C4H8O C4H10O C4H10O C4H10O C4H10O C4H10O2 C4H11N C5H10O C5H12O C5H12O C5H12O C5H12O C5H12O C5H12O C5H12O C5H12O C5H12O2 C6H4Br2 C6H4I2 C6H4Br2O C6H4Cl2O C6H4ClNO2 C6H5F C6H5Br C6H5I C6H5FO C6H5FO C6H5ClO C6H5ClO C6H5BrO C6H5BrO C6H5IO C6H5IO C6H5NO2 C6H5NO3 C6H6 C6H6O C6H6O2 C6H6N2O2 C6H7N C6H8N2O2S C6H12O C6H14O C6H14O C6H14O C6H14O C6H14O C6H14O C6H14O2 C7H5N C7H5NS C7H5NO4 C7H6O C7H6O2 C7H6O2 C7H6O3 C7H7Cl C7H8 C7H8O C7H8O C7H8O C7H8O C7H8O2

-9.25 -9.25 0.17 -4.28

-10.88 -11.25 -0.58 -5.95

1.63 2.00 0.75 1.67

20 20 7 20

-4.28 0.57

-3.52 -0.47

-0.76 1.04

7 7

-7.82 -3.94 -3.03

-7.07 -3.70 -3.80

-0.75 -0.24 0.77

7 7 7

-9.08 -11.13 -8.10 -8.22 -3.65 -5.20

-6.81 -10.40 -7.44 -7.44 -3.01 -6.88

-2.27 -0.73 -0.66 -0.78 -0.64 1.68

7 7 7 7 7 7

-9.48 -14.33 -12.16 -11.08 -11.64 -8.73 -10.67 -7.25 -8.39 -8.56 -17.07

-9.32 -13.49 -10.59 -10.90 -10.90 -6.80 -10.59 -7.82 -6.80 -9.74 -19.67

-0.16 -0.84 -1.57 -0.18 -0.74 -1.93 -0.08 0.57 -1.59 1.18 2.60

7 7 7 7 7 7 7 7 17 7 20

-17.81 -14.44 -10.45 -8.73 -15.47 -17.35

-17.54 -10.56 -13.40 - 7.98 -16.68 -19.64

-0.27 -3.88 2.95 -0.75 0.21 2.29

7 7 20 20 20 20

-5.20 -12.39 -13.99 -14.04 -16.44 -17.81 -19.01 -17.81 -12.55 -14.35 -6.17 -6.85 -15.47

- 5.98 -10.19 -10.19 -13.68 -13.68 -17.64 -17.64 -13.22 -10.90 -11.53 -9.68 -7.69 -15.20

0.78 -2.20 -3.80 -0.36 -2.76 -0.17 -1.37 -4.59 -1.65 -2.82 3.51 0.84 -0.27

7 7 7 7 7 7 7 7 17 17 7 20 20

-10.33 -16.84 -14.56

-11.60 -16.34 -13.52

1.27 -0.50 -1.04

17 7 7

-9.82 -7.42 -11.25

-10.67 -7.17 -12.37

0.85 -0.25 1.12

7 7 7

-12.97 -19.35 -17.15 -9.99 -17.52

-16.72 -15.76 -16.61 -13.58 -14.28

3.75 -3.59 -0.54 3.59 -3.24

17 20 17 7 20

-12.27 -7.59 -12.22 -6.68 -8.62 -9.08

-11.87 -8.18 -10.58 -9.87 -9.87 -8.39

-0.40 0.59 -1.64 3.19 1.25 -0.69

20 7 20 18 7 7

-12.56 -8.16 2.80 1.54 3.65 0.17 1.08 -2.40 -3.26 -3.60 -3.82 -8.39 -6.74 -6.99 -6.79 -9.25 -9.59 -3.65 -7.77 -11.87 -10.29 -8.51 -7.71 -11.87 -10.90 -12.84 -10.96 -15.48 -6.97 -16.98 -18.12 -14.61 -11.82 -12.27 -11.18 -14.28 -16.71 -9.70 -9.87 -13.01 -14.92 -14.33 -15.12 -16.73 -17.00 -11.64 -14.76 -12.72 -11.28 -11.72 -14.18 -9.13 -15.79 -15.27 -13.31 -11.30 -11.36 -12.27 -11.64 -15.70 -9.65 -12.74 -13.59 -13.36 -10.16 -12.13 -9.99 -12.54 -13.95 -11.93 -9.76 -13.24 -11.30 -13.68 -12.61

-11.78 -6.80 2.96 1.07 2.04 -0.16 0.67 -1.77 -3.37 -3.84 -2.37 -9.16 -5.69 -6.41 -7.02 -7.02 -9.18 -5.25 -8.15 -9.27 -9.29 -9.95 -10.01 -10.01 -12.31 -9.95 -10.62 -12.31 -7.97 -15.60 -19.34 -16.48 -13.61 -13.31 -11.27 -13.57 -15.44 -12.25 -12.25 -13.11 -13.11 -14.55 -14.55 -16.42 -16.42 -12.95 -13.69 -11.44 -12.52 -13.49 -13.02 -11.84 -14.81 -12.68 -12.01 -12.71 -15.32 -15.23 -13.40 -15.85 -10.52 -9.83 -13.02 -11.17 -8.85 -10.77 -9.69 -11.52 -14.73 -12.49 -11.05 -12.43 -13.47 -13.47 -13.27

-0.78 -1.36 -0.16 0.47 1.61 0.33 0.41 -0.63 0.11 0.24 -1.45 0.77 -1.05 -0.58 0.23 -2.23 -0.41 1.60 0.38 -2.60 -1.00 1.44 2.3.0 -1.86 1.41 -2.89 -0.34 -3.17 1.00 -1.38 1.22 1.87 1.79 1.04 0.09 -0.71 -1.27 2.55 2.38 0.10 -1.81 0.22 -0.57 -0.31 -0.58 1.31 -1.07 -1.28 1.24 1.78 -1.16 2.71 -0.98 -2.59 -1.30 1.41 3.96 2.96 1.76 0.15 0.87 -2.91 -0.57 -2.19 -1.31 -1.36 -0.30 -1.02 0.78 0.56 1.29 -0.81 2.17 -0.21 0.66

9 9 7 9 9 7 7 9 7 9 7 9 7 17 7 7 7 7 9 7 17 7 7 7 7 7 7 17 7 12 12 7 7 12 12 12 12 7 7 7 12 7 12 7 12 9 12 17 12 12 12 9 10 17 17 7 7 7 7 7 7 12 11 12 9 17 7 12 12 9 9 12 7 12 12

1268 J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001

SUZUKI

Table 1 (Continued) R-CD:guest system ∆

β-CD:guest system

no.

guest

formula

∆Gobsd

∆Gcalcd

ref

∆Gobsd

∆Gcalcd



ref

76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150

3-methoxyphenol 4-hydroxybenzyl alcohol hydrochlorothiazide N-methylaniline 1-butylimidazole 1-heptanol phenylacetylene thianaphthene 4-fluorophenyl acetate 3-fluorophenyl acetate 4-chlorophenyl acetate 3-chlorophenyl acetate 4-bromophenyl acetate 3-bromophenyl acetate 4-iodophenyl acetate 3-iodophenyl acetate 4-nitrophenyl acetate acetophenone phenyl acetate methyl benzoate 3-hydroxyacetophenone 4-hydroxyacetophenone acetoanilide p-xylene ethylbenzene phenetole 2-phenylethanol 3-ethylphenol 4-ethylphenol 4-ethoxyphenol 3-ethoxyphenol 3,5-dimethoxyphenol N-ethylaniline N,N-dimethylaniline barbital cyclooctanol 1-octanol 2-octanol quinoline 3-cyanophenyl acetate 4-hydroxycinnamic acid ethyl benzoate 4′-hydroxypropiophenone 3′-hydroxypropiophenone p-tolyl acetate 3-methylphenyl acetate 4-methoxyphenyl acetate 4-propylphenol 3-propylphenol 4-isopropylphenol 3-isopropylphenol 4-isopropoxyphenol 2-norbornaneacetate 1-benzylimidazole m-methylcinnamic acid 4-ethylphenyl acetate 3-ethylphenyl acetate 4-ethoxyphenyl acetate 3-ethoxyphenyl acetate allobarbital 4-n-butylphenol 3-n-butylphenol 3-isobutylphenol 4-sec-butylphenol 3-sec-butylphenol 4-tert-butylphenol 3-tert-butylphenol menadion sulfapyridine sulfamonomethoxine sulfisoxazole 4-n-propylphenyl acetate 3-n-propylphenyl acetate 4-isopropylphenyl acetate 3-isopropylphenyl acetate

C7H8O2 C7H8O2 C7H8ClN3O4S2 C7H9N C7H12N2 C7H16O C8H6 C8H6S C8H7FO2 C8H7FO2 C8H7ClO2 C8H7ClO2 C8H7BrO2 C8H7BrO2 C8H7IO2 C8H7IO2 C8H7NO4 C8H8O C8H8O2 C8H8O2 C8H8O2 C8H8O2 C8H9NO C8H10 C8H10 C8H10O C8H10O C8H10O C8H10O C8H10O2 C8H10O2 C8H10O3 C8H11N C8H11N C8H12N2O3 C8H16O C8H18O C8H18O C9H7N C9H7NO2 C9H8O3 C9H10O2 C9H10O2 C9H10O2 C9H10O2 C9H10O2 C9H10O3 C9H12O C9H12O C9H12O C9H12O C9H12O2 C9H14O2 C10H10N2 C10H10O2 C10H12O2 C10H12O2 C10H12O3 C10H12O3 C10H12N2O3 C10H14O C10H14O C10H14O C10H14O C10H14O C10H14O C10H14O C11H8O2 C11H11N3O2S C11H12N4O3S C11H13N3O3S C11H14O2 C11H14O2 C11H14O2 C11H14O2

-5.88

-8.39

2.51

7

-12.77 -19.18

-14.72 -18.97

0.95 -0.21

7 7

-6.56 -10.28 -12.90

-6.53 -10.74 -10.74

-0.03 0.46 -2.16

7 20 7

-15.98

-14.23

-1.75

7

-18.95 -10.46

-18.19 -10.27

-0.76 -0.19

7 17

-7.02

-10.74

3.72

7

-10.33 -11.65

-12.06 -14.78

1.73 3.13

20 20

-10.84 -12.16 -10.67 -9.65 -6.79

-12.59 -12.59 -10.60 -10.60 -6.30

1.75 0.43 -0.07 0.95 -0.49

7 7 7 7 7

-12.84 -21.69 -17.98 -8.33 -9.93 -18.84

-12.76 -21.37 -18.67 -9.42 -8.26 -20.13

-0.08 -0.32 0.69 1.09 -1.67 1.29

7 7 7 20 7 20

-11.19 -8.39

-10.42 -10.42

-0.77 2.03

20 7

-15.70 -14.33

-14.79 -14.79

-0.91 -0.46

7 7

-12.55 -10.15 -23.58

-12.96 - 8.66 -22.32

0.41 -1.49 -1.26

17 18 20

-10.90

-12.44

1.54

7

-9.87

-9.80

-0.07

7

-18.61 -16.56

-16.76 -16.76

-1.85 0.20

7 7

-10.96 -8.33

-11.03 -11.03

0.07 2.70

20 20

-13.82

-14.00

0.18

7

-12.05 -12.34 -10.04 -12.07 -12.50 -16.27 -13.48 -18.44 -12.05 -10.90 -14.27 -13.93 -15.30 -15.24 -17.13 -17.53 -12.16 -12.96 -11.99 -14.28 -11.76 -12.44 -12.54 -13.58 -14.77 -14.20 -12.27 -14.84 -15.37 -13.30 -13.41 -13.36 -13.33 -13.48 -10.19 -18.84 -18.10 -17.87 -12.10 -8.51 -16.15 -15.59 -15.01 -14.90 -14.21 -12.61 -13.99 -20.26 -18.72 -20.43 -19.64 -16.33 -20.50 -14.92 -16.75 -16.15 -15.30 -14.50 -14.21 -11.32 -22.66 -21.46 -24.03 -23.86 -23.18 -26.03 -25.18 -12.95 -15.43 -14.16 -13.25 -17.98 -18.72 -16.44 -19.18

-13.27 -11.89 -10.56 -12.79 -14.70 -14.56 -13.48 -15.72 -13.02 -13.02 -13.89 -13.89 -15.32 -15.32 -17.19 -17.19 -13.64 -12.18 -13.65 -13.67 -12.92 -12.92 -12.13 -13.45 -15.09 -14.82 -13.44 -15.93 -15.93 -15.52 -15.52 -13.47 -15.18 -13.48 -10.63 -18.69 -16.95 -17.75 -13.36 -10.95 -16.47 -15.76 -15.06 -15.06 -14.25 -14.25 -13.56 -18.19 -18.19 -19.03 -19.03 -18.50 -18.18 -17.34 -16.45 -16.22 -16.22 -15.36 -15.36 -11.32 -20.28 -20.28 -21.16 -21.17 -21.17 -23.86 -23.86 -12.50 -16.58 -11.79 -13.04 -18.02 -18.02 -18.98 -18.98

1.22 -0.45 0.52 0.72 2.20 -1.71 0.00 -2.72 0.97 2.12 -0.38 -0.04 -0.02 0.08 0.06 -0.34 1.48 -0.78 1.66 -0.61 1.16 0.48 -0.41 -0.13 0.32 0.62 1.17 1.09 0.56 2.22 2.11 0.11 1.85 0.00 0.44 -0.15 -1.15 -0.12 1.26 2.44 0.32 0.17 0.05 0.16 0.04 1.64 -0.43 -2.07 -0.53 -1.40 -0.61 2.17 -2.32 2.42 -0.30 0.07 0.92 0.86 1.15 0.00 -2.38 -1.18 -2.87 -2.69 -2.01 -2.17 -1.32 -0.45 1.15 -2.37 -0.21 0.04 -0.70 2.54 -0.20

7 12 10 12 18 7 12 11 7 7 7 7 7 7 7 7 7 12 7 12 7 12 12 12 12 12 7 7 12 7 7 7 12 12 10 7 7 7 11 7 10 12 7 7 7 7 7 7 7 7 7 7 17 18 10 7 7 7 7 10 7 7 7 7 7 7 7 10 10 10 10 7 7 7 7

PREDICTING

THE

FREE ENERGIES

OF

ORGANIC MOLECULES

J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001 1269

Table 1 (Continued) R-CD:guest system

β-CD:guest system

no.

guest

formula

∆Gobsd

∆Gcalcd



ref

∆Gobsd

∆Gcalcd



ref

151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218

4-n-amylphenol 4-tert-amylphenol carbutamide pentobarbital amobarbital thiopental dibenzofuran dibenzothiophene phenazine thianthrene carbazole phenoxazine phenothiazine furosemide phenobarbital sulfisomidine sulfamethomidine sulfadimethoxine 4-n-butylphenyl acetate 3-n-butylphenyl acetate 3-isobutylphenyl acetate 4-tert-butylphenyl acetate cyclobarbital hexobarbital 1-adamantaneacetate acridine phenanthridine xanthene N-phenylanthranilic acid mephobarbital 4-n-amylphenyl acetate flufenamic acid meclofenamic acid nitrazepam flurbiprofen sulfaphenazole bendroflumethiazide mefenamic acid acetohexamide fludiazepam nimetazepam fenbufen ketoprofen medazepam progabide griseofulvin tolnaftate prostacyclin triamcinolone cortisone prednisolone hydrocortisone corticosterone dexamethasone betamethasone paramethasone cortisone-21-acetate prednisolone-21-acetate hydrocortisone-21-acetate fluocinolone acetonide triamcinolone acetonide spironolactone dehydrocholic acid chenodeoxycholic acid ursodeoxycholic acid cholic acid hydrocortisone-17-butyrate cinnarizine

C11H16O C11H16O C11H17N3O3S C11H18N2O3 C11H18N2O3 C11H18N2O2S C12H8O C12H8S C12H8N2 C12H8S2 C12H9N C12H9NO C12H9NS C12H11ClN2O5S C12H12N2O3 C12H14N4O2S C12H14N4O3S C12H14N4O4S C12H16O2 C12H16O2 C12H16O2 C12H16O2 C12H16N2O3 C12H16N2O3 C12H18O2 C13H9N C13H9N C13H10O C13H11NO2 C13H14N2O3 C13H18O2 C14H10F3NO2 C14H11Cl2NO2 C15H11N3O3 C15H13FO2 C15H14N4O2S C15H14F3N3O4S2 C15H15NO2 C15H20N2O4S C16H12ClFN2O C16H13N3O3 C16H14O3 C16H14O3 C16H15ClN2 C17H16ClFN2O2 C17H17ClO6 C19H17NOS C20H32O5 C21H27FO6 C21H28O5 C21H28O5 C21H30O5 C21H30O4 C22H29FO5 C22H29FO5 C22H29FO5 C23H30O6 C23H30O6 C23H32O6 C24H30F2O6 C24H31FO6 C24H32O4S C24H34O5 C24H40O4 C24H40O4 C24H40O5 C25H36O6 C26H28N2

-19.18

-18.50

-0.68

7

-16.27

-15.32

-0.95

7

-12.97

-12.97

0.00

17

-23.92 -26.83 -13.08 -17.22 -17.53 -18.71 -16.95 -19.87 -13.76 -20.38 -13.93 -15.36 -15.59 -10.19 -18.37 -12.02 -13.31 -12.89 -20.66 -20.89 -21.86 -21.98 -15.47 -17.60 -24.69 -13.30 -14.67 -15.47 -16.53 -18.05 -21.69 -17.70 -15.27 -11.26 -21.07 -13.42 -10.83 -14.24 -16.77 -13.31 -9.89 -15.02 -16.31 -13.72 -14.48 -8.39 -21.90 -16.79 -19.27 -19.10 -20.30 -20.58 -22.00 -20.84 -21.31 -19.44 -20.65 -21.47 -20.05 -19.85 -20.03 -25.34 -19.29 -24.90 -25.71 -19.97 -18.43 -20.77

-22.21 -25.90 -16.57 -18.15 -18.24 -18.77 -14.65 -18.45 -12.77 -22.01 -13.47 -13.23 -17.04 -11.38 -14.74 -13.28 -11.30 -9.13 -19.65 -19.65 -20.67 -21.99 -17.25 -16.06 -24.03 -15.47 -15.45 -16.92 -15.92 -15.73 -21.13 -15.78 -14.28 -10.66 -18.88 -13.34 -12.05 -15.41 -15.60 -12.34 -11.08 -16.17 -17.25 -17.66 -14.48 -9.62 -21.84 -16.85 -18.40 -22.35 -20.59 -21.85 -21.62 -22.43 -22.43 -17.37 -19.76 -18.00 -19.26 -18.13 -20.86 -25.34 -22.28 -26.04 -26.04 -21.26 -17.01 -20.66

-1.71 -0.93 3.49 0.93 0.71 0.06 -2.30 -1.42 -0.99 1.63 -0.46 -2.13 1.45 1.19 -3.63 1.26 -2.01 -3.76 -1.01 -1.24 -1.19 0.01 1.78 -1.54 -0.66 2.17 0.78 1.45 -0.61 -2.32 -0.56 -1.92 -0.99 -0.60 -2.19 -0.08 1.22 1.17 -1.17 -0.97 1.19 1.15 0.94 3.94 0.00 1.23 -0.06 0.06 -0.87 3.25 0.29 1.27 -0.38 1.59 1.12 -2.07 -0.89 -3.47 -0.79 -1.72 0.83 0.00 2.99 1.14 0.33 1.29 -1.42 -0.11

7 7 10 10 17 10 11 11 11 11 11 11 11 10 17 10 10 10 7 7 7 7 10 10 17 11 11 11 17 10 7 17 17 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 19 10 19 10 19 10 19 19 19 19 19 19 19 19 19 19 19 19 19 10

guests used in developing the GCMs, their experimental free energies of complexation for R- and β-CDs at 25 °C, and literature sources are presented in Table 1.7,9,10-12,17-20 For the R-CD system, available data is limited since its cavity size is small, and resulting molecular recognition is effective

to bind the smaller molecules such as aliphatic alcohols and monosubstituted benzene derivatives but showed no response to large cyclic compounds. Otherwise a missing value in the database means that a particular compound was not tested or reliable experimental value could not be found.

1270 J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001

SUZUKI

The set of guests is structurally diverse and includes a large number of classes of organic compounds: aromatic hydrocarbons, alcohols, phenols, ethers, aldehydes, ketones, acids, esters, nitriles, anilines, halogenated compounds, heterocycles, nitro, and sulfur compounds. Besides structurally complex compounds, such as steroids and barbitals are included. The measured K data were converted into free energies ∆G () -RT ln K) in kJ/mol. Model Development. The complex stability, ∆G, can be expressed as a function of a set of group contributions G1, G2, ..., Gm

∆G ) -RT ln K ) f(G1, G2,..., Gm)

(1)

where m is the number of group-contributions considered in a particular group-contribution table. The group or fragment contributions are numerical quantities associated with single atoms, atom pairs, or multiatom groups or functional groups.21 The GCM function may be linear or nonlinear. In our previous study,13 a nonlinear dependency of binding constants on the zeroth and/or first-order molecular connectivity index (0χ, 1χ)22 as a measure of size was found for the natural β-CD:guest system. The free energy of complexation according to our approach can, therefore, be expressed by follows

∆G ) c0 + c1D + c2D + ∑niGi 2

(2)

where c0 is a model-specific constant, D is a molecular descriptor as a measure of bulkiness, c1 and c2 are the regression coefficients for D and its quadratic term, respectively, Gi is the contribution of the group of type i, ni is the number of times this group occurs in the molecule, and the summation is carried over all type i. Regression coefficients were estimated by the method of least squares, using Excel Multivariate Analysis v3.0 (Esumi Co., Ltd., Tokyo, Japan) on a microcomputer running Windows 95 as its operating system. The quality of this model (i.e., the precision of its predictions and their reliability) is critically dependent on the definition of a set of groups and on the values used. Any structure can be divided into groups, but their number can be considerable. Generally, as the size of the groups increases, more precise estimates would be available. However, the sheer volume of experimental data required to determine their contribution values is a prohibitive factor from a viewpoint of statistical reliability. The definition of what constitutes the best value of a group contribution is not very rigid. After considering the available number of data points, easy identification of constitutional groups in a molecule, and implicit inclusion of the constitutive factors, a similar set of groups used earlier by Lydersen for estimating critical properties23 was employed in this study. This type of group was successfully applied to estimate 11 kinds of fundamental physical properties of pure materials as a common set of structural groups by Joback and Reid.24 RESULTS AND DISCUSSION

r-CD:Guest System. The experimentally determined ∆G values at 25 °C for 1:1 complexes between natural R-CD and 102 organic compounds containing C, H, N, O, and all halogens were used to obtain the model. As readily available

Figure 1. Correlation between the calculated (eq 3) versus experimental free energies of binding for R-cyclodextrin with 102 guest molecules.

and computable descriptors for D, 0χ, 1χ and molecular weight (Mw) were attempted to introduce. Although other geometrical or topological descriptors such as molecular volume, molecular surface area, and molecular refraction could be also used as candidates for D, the former three descriptors are much easier to prepare for any compounds without calculation errors. The best GCM using 1χ was given by the following equation

∆G[kJ/mol] ) -1.13 - 10.761χ + 0.4631χ2 + ∑niGi (3) (n ) 102, r2 ) 0.868, SD ) 1.97, ME ) 1.32) where n ) number of compounds, r2 ) explained variance, SD ) standard deviation, and ME ) absolute mean error. Table 2 lists the coefficient or contribution (Gi) for 22 groups together with the coefficients for 1χ (-10.76) and 1χ2 (0.463) parameters. The statistical quality of the above model was slightly inferior (r2 ) 0.828) without the two terms for 1χ and 1χ2. Good models were also obtained with 0χ (r2 ) 0.848) or Mw (r2 ) 0.854), respectively, instead of 1χ. A plot of the experimental ∆G values against the calculated ∆G values based on eq 3 for all the compounds in the data database is shown in Figure 1. β-CD:Guest System. On the basis of measured ∆G values at 25 °C for 1:1 complexes of β-CD with 218 organic compounds containing C, H, N, O, S, and halogens, the GCM was obtained as follows:

∆G[kJ/mol] ) 8.48 - 3.531χ + 0.3261χ2 + ∑niGi

(4)

(n ) 218, r2 ) 0.917, SD ) 1.66, ME ) 1.20) The contributions (Gi) for 39 groups are also listed in Table 2. The experimental values are plotted versus the predicted ones in Figure 2. Omission of two parameters, 1χ and 1χ2, gave a substantially inferior fit with r2 ) 0.771. The employment of 0χ or Mw in place of 1χ gave slightly inferior r2 values of 0.908 and 0.907, respectively, as well as the case of R-CD. Group Contributions to CD Inclusion Complexation. The importance of individual contributions to the complexation process can be related to the value of the respective regression coefficients in Table 2. The negative sign of the regression coefficient indicates that the groups should enhance complexation ability. For the case of β-CD:guest

PREDICTING

THE

FREE ENERGIES

OF

ORGANIC MOLECULES

J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001 1271

Table 2. Structural Fragments and Their Contributions to the Free Energies of Compexiation fragment or parameter constant χ 1χ2 1

no. of compds

R-CD:guest system freq of occurrence

contribution

no. of compds

β-CD:guest system freq of occurrence

contribution

-1.13 -10.76 0.463

218 218

218 218

8.48 -3.53 0.326

144 95 34 13 1 5 3 1

261 190 36 13 2 11 3 1

-3.18 -1.98 -1.69 -3.23 -1.49 -2.41 -2.09 -3.55

102 102

102 102

-CH3 -CH2>CH>C< )CH2 )CH)C< -CtCH

62 41 13 7

96 101 15 7

Nonring Increments 2.12 1.06 1.71 2.70

-CH2>CH>C< )CH)C
CdO (nonring) >CdO (ring) -CHO (aldehyde) -COOH (acid) -COO- (ester)

2.19 3.93 10.84 Nitrogen Increments

-NH2 >NH (nonring) >NH (ring) >N- (nonring) >N- (ring) -N) (nonring) -N) (ring) -CN -NO2

3 2 6

5 2 6

5.93 7.35 7.32 Sulfur Increments

-S-(nonring) -S-(ring) -SO2>CdS

Figure 2. Correlation between the calculated (eq 4) versus experimental free energies of binding for β-cyclodextrin with 218 guest molecules.

system, the presence of carbon (except only )C < (ring)), halogen, and sulfur increments with negative coefficients results in an increase of the stability of the complex, while

the presence of most oxygen and nitrogen containing groups (except -OH (phenol), -O-(ring), >CdO(ring), -COOH, -NH2, and -NO2) decreases the complex stability according to their positive coefficients. On the contrary, a markedly different trend can be found for the R-CD:guest system. The contribution values have positive signs with the exception of -Br and -Cl. The reason can be attributable to the difference in the number of possible configurations and resulting interaction effects between the guests and the host cavity in the two complex systems. The major difference in the composition of the driving forces for R- and β-CD can be caused by the cavity dimensions. The significant contribution of van der Waals interactions can be assessed from the negative signs of the coefficients for the fragments containing carbon and halogens in the β-CD system. It seems that the contribution values for F, Cl, Br, and I in both the R-CD and the β-CD system are correlated with the effective van der Waals radii of their

1272 J. Chem. Inf. Comput. Sci., Vol. 41, No. 5, 2001

SUZUKI

Table 3. De Novo Pediction of Free Energies of Complexation (kJ/mol) Using Eqs 3 and 4 host

guest

formula

∆Gobsd

∆Gcalcd



ref

RRRRRRRRRRRRβββββββββββββββ-

pyridine 3-cyanophenol 4-cyanophenol cycloheptanol 3,5-dichlorophenyl acetate 1-phenylimidazole 3-methybenzoic acid 3,5-dimethylphenol 3-methoxyphenyl acetate 3,5-dimethylphenyl acetate 3,5-dimethoxylphenyl acetate L-R-O-benzylglycerol 2-methoxyethanol cycloheptanol 3-hydroxycinnamic acid ethyl 4-hydroxybenzoate ethyl 4-aminobenzoate 4-methylcinnamic acid sulfadiazine L-R-O-benzylglycerol sulfamerazine butyl 4-hydroxybenzoate butyl 4-aminobenzoate benzidine triflumizole diazepam prostaglandine E2 absolute mean error

C5H5N C7H5NO C7H5NO C7H14O C8H6Cl2O2 C8H8N2 C8H8O2 C8H10O C9H10O3 C10H12O2 C10H12O4 C10H14O3 C3H8O2 C7H14O C9H8O3 C9H10O3 C9H11NO2 C10H10O2 C10H10N4O2S C10H14O3 C11H12N4O2S C11H14O3 C11H15NO2 C12H12N2 C15H15ClF3N3O C16H13ClN2O C20H21O5

-12.55 -9.99 -10.62 -10.84 -14.16 -8.00 -13.81 -6.85 -7.88 -8.79 -7.31 -6.66 -1.26 -18.44 -14.63 -17.19 -15.34 -15.13 -14.40 -12.03 -11.28 -19.37 -18.24 -19.11 -15.19 -13.31 -17.64

-8.56 -8.41 -8.41 -14.50 -10.59 -7.08 -16.47 -9.92 -8.26 - 9.97 -4.96 -4.77 -3.75 -15.92 -16.46 -16.21 -15.53 -16.44 -13.52 -15.60 -13.28 -19.64 -18.96 -19.25 -19.21 -13.96 -20.37

-4.00 -1.58 -2.21 3.66 -3.57 -0.92 2.66 3.07 0.38 1.18 -2.35 - 1.89 2.50 - 2.52 1.83 -0.98 0.19 1.31 0.88 3.59 2.00 0.27 0.72 0.14 4.02 0.65 2.73 1.92

17 7 7 7 7 18 17 7 7 7 7 18 7 7 10 10 10 10 10 18 10 10 10 10 10 10 10

atoms. A recent QSAR study on inclusion complexes between 16 para-substituted phenols and natural β-CD showed that the molar refractivity as a measure of van der Waals interactions represents an approximation for host/guest complementarity.25 The study identified that hydrophobic and dipolar interactions were minor factors to binding for the phenolic series. For the binding of R- and β-CD with 24 monosubstituted benzenes, it was concluded that the inclusion by R-CD was predominantly driven by van der Waals force and with β-CD by both van der Waals force and hydrophobic interactions.26 The positive coefficients for oxygen and nitrogen containing groups in the β-CD system can be assigned to the possibility of the competitive interactions with the solvent as discussed by Park and Nah.9 It is worth to note the difference of the signs of -OH in alcoholic and phenolic types in the β-CD system. In contrast, both have positive coefficients in the R-CD system. Only phenolic OH in the β-CD system has a negative coefficient and should have an increased complexation ability, suggesting the possibility to stabilize the complex by forming hydrogen bonding with the host’s hydroxyl groups. However, the contribution by the phenolic OH moiety of the guests may not play a major role according to the magnitude of their coefficients. The model can be further developed incrementally by determining the missing contributions for the R-CD:guest system and adding new substructures as experimental data become available. Comparison with Molecular Modeling Approaches. The orientation and alignment of the guest molecule in the CD cavity is crucial in molecular modeling analysis or many 3DQSAR methods. In fact, various probable starting orientations and positions of the guest molecule within the CD cavity were studied, and the lowest energy one was used in our previous CoMFA analyses.13 For the case of disubstituted

benzenes, two and three orientational isomers for the complexes with p- and m-substituted phenols, respectively, were postulated, and the observed binding constants were divided into the corresponding dissociation constants of the isomers involved.7 In contrast, the present GCM approach does not depend on structural alignment of guest molecules to CDs, thus eliminating a substantial measure of conjecture. Predictive Performance of the GCMs. The predictive capabilities of eqs 3 and 4 were tested by a de noVo prediction, using the data set of host-guest systems not included in the deduction of the models, as shown in Table 3. The absolute mean error for the 27 compounds is 1.92. This value is somewhat larger than the values of 1.32 for eq 3 and 1.22 for eq 4. The results of the table confirm that the models have a good and reasonable predictive ability. To illustrate a typical estimation of ∆G, we will explain the calculations necessary for 4-tert-butylphenyl acetate (1χ ) 6.3929): β-CD complex using the GCM. Using Table 2 and eq 4, we have

∆G ) equation constant + (-3.53)(1χ) + (0.326)(1χ2) + (-3.18)(four -CH3) + (-3.23) (one nonring >C