Variable Molecular Connectivity Indices for ... - ACS Publications

Mar 11, 2009 - School of Chemistry & Chemical Engineering, Xuzhou Normal University, ... Xuzhou Institute of Technology, Xuzhou, Jiangsu 221008, Peopl...
0 downloads 0 Views 1MB Size
Ind. Eng. Chem. Res. 2009, 48, 4165–4175

4165

CORRELATIONS Variable Molecular Connectivity Indices for Predicting the Diamagnetic Susceptibilities of Organic Compounds Lailong Mu,* Hongmei He, Weihua Yang, and Changjun Feng School of Chemistry & Chemical Engineering, Xuzhou Normal UniVersity, Xuzhou, Jiangsu 221116, People’s Republic of China, Xuzhou College of Industrial Technology, Xuzhou, Jiangsu 221006, People’s Republic of China, and School of Chemistry & Chemical Engineering, Xuzhou Institute of Technology, Xuzhou, Jiangsu 221008, People’s Republic of China

For predicting the molar diamagnetic susceptibilities of organic compounds, a variable molecular connectivity index mχ′ and its converse index mχ′′ based on the adjacency matrix of molecular graphs and the variable atomic valence connectivity index δi′ were proposed. The optimal values of parameters a, b, and y included in definition of δi′, mχ′ and mχ′′ can be found by optimization methods. When a ) 1.10, b ) 2.8, and y ) 0.36, a good five-parameter model can be constructed from mχ′ and mχ′′ by using the best subsets regression analysis method for the molar diamagnetic susceptibilities of organic compounds. The correlation coefficient r, standard error s, and average absolute deviation (AAD) of the multivariate linear regression (MLR) model are 0.9930, 4.99, and 3.72 cgs, respectively, for the 720 organic compounds (training set). The AAD of predicted values of the molar diamagnetic susceptibility of another 361 organic compounds (test set) is 4.37 cgs for the MLR model. The results show that the current method is more effective than literature methods for estimating the molar diamagnetic susceptibility of an organic compound. 1. Introduction Topological indices (TIs) play an unquestionable role in research fields related to the prediction of physical, chemical, and biological propertied of molecules. The uses of these molecular descriptors have permitted the prediction of many properties of chemical, pharmaceutical, toxicological, and environmental relevance.1-4 In these studies, the TIs used can be considered as “classic” molecular descriptors, where the term “classic” is used here to distinguish the indices that have been defined in an ad hoc way in the literature and which are not necessarily optimal for modeling or predicting a specific property or activity. Some of these “classic” TIs were created in contexts which are very different from the quantum structure-property relationship/quantum structure-activity reletionship (QSPR/QSAR) research fields. Thus, there is no guarantee that the use of these “classical” TIs can predict effectively a property/activity for an arbitrary set of compounds, even for the same series of compounds originally used to define the TI. Basically, there are two approaches used to try to amend the lack of correlation of one particular TI with an experimental property. One of these approaches is to try with another TI with the hope that this index will give a better correction. Another is to try by combining several TIs into a multivariate linear regression (MLR) model. Instead of these no optimal approaches we have proposed the use of an optimization approach for the Kier-Hall index.3 Here, we will explain in detail this approach, and we will use it for optimizing models to predict the diamagnetic susceptibility of organic compounds. The molar susceptibility χm of a compound is an important physicochemical property. If substance has χm < 0 and is diamagnetic, its molar diamagnetic susceptibility can be cal* To whom correspondence should be addressed. Tel.: +86-51683536538. Fax: +86-516-83403164. E-mail address: mull@ xznu.edu.cn.

culated from the Langevin-Pauli formula.5,6 There has been a continued interest in chemistry, based on the theoretical and practical importance of this property, for the determination and prediction of the diamagnetic susceptibility of organic and organometallic compounds.7-12 Because the experimental determination of diamagnetic susceptibility requires strict laboratory conditions and indispensable equipment, in some simple cases, the theoretical calculation of diamagnetic susceptibility seems to be attractive. The theoretical research for magnetic susceptibility dates from the beginning of the twentieth century. Magnetic susceptibilities of stable organic species have been observed to be approximately fit by a sum of atomic contribution, the so-called Pascal’s constants,6 which have been successful in the prediction of diamagnetic susceptibilities of organic compounds and are still in use even today.6 A subsequent expansion, motivated by the simple quantum chemical treatment, was developed by Hameka13-17 to describe diamagnetic susceptibilities of several classes of organic compounds. In Hameka’s approach, not only are atomic contributions considered, but also contributions coming from bonds and bond-bond interactions are included. However, a large number of experimental values of diamagnetic susceptibilities are needed to obtain that large family of parameters, and most of the results obtained are quantitatively incomparable with the experimental value. Subsequent studies include quantum chemical approaches such as the path-integral formulation of quantum mechanics, the Weizsa¨cker energy of many-electron systems, density-functional methods,18-21 and integrated molecular transform.22 However, these theoretical methods have some disadvantage, specifically in the domain of applicability or complicated calculation methods. The diamagnetic property of a compound is determined mainly by its molecular structure, rather than by its bulk as a whole.23 Hence, the application of graph theory in this context may have some merits, although the structure here

10.1021/ie801252j CCC: $40.75  2009 American Chemical Society Published on Web 03/11/2009

4166 Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 Table 1. Atomic Attributes and δi′ Values for Organic Compounds (a ) 1.10 and b ) 2.8) No.

atom groupa

δiv

xi

δi′

No.

atom group

δiv

xi

δi′

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

sCH3 ssCH2 sssCH ssssC dCH2 dCH2(c)d dsCH dsCH(c)d dssC dssC(c)d ddC tCH stC sOH ssO dO dO(c)d sNH2 sNH2(c)d ssNH ssNH(c)d sssN sssN(c)d dNH dNH(c)d

1 2 3 4 2 2 3 3 4 4 4 3 4 5 6 6 6 3 3 4 4 5 5 4 4

2.551 2.551 2.551 2.551 2.64 2.64 2.64 2.64 2.64 2.64 2.818 2.818 2.818 2.747 2.747 3.751 3.751 2.929 3.32 2.929 3.32 2.929 3.32 3.32 3.32

1.0000 2.0000 3.0000 4.0000 2.2016 2.1575 3.3023 3.2363 4.4031 4.3151 5.2857 3.9643 5.2857 6.1516 7.3819 17.6593 17.3061 4.4172 6.1482 5.8896 8.1975 7.3620 10.2469 8.3648 8.1975

26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49

dsN dsN(c)d tN dN ddN ddsN ddsN(c)d dssN dssN(c)d sSH ssS dS dS(a)d dssS dssS(c)d ddssS sPH2 ssPH sssP dsssP sF sCl sBr sI

5 5 5 5 5 20c 20c 10b 10b 0.56 0.67 0.67 0.67 1.33b 1.33b 2.67c 0.33 0.44 0.56 2.22c 7 0.78 0.26 0.16

3.32 3.32 3.515 3.32 3.515 3.32 3.32 3.32 3.32 2.101 2.101 2.761 2.761 2.761 2.761 2.596 2.121 2.121 2.121 2.409 3.515 2.626 2.404 2.121

10.4560 10.2469 12.2679 10.4560 12.2679 41.8242 40.9877 20.9121 20.4938 0.2590 0.3108 0.6679 0.6545 1.3357 1.3090 2.2481 0.1596 0.2128 0.2660 1.5196 17.1751 0.6771 0.1579 0.0634

a Symbols: s for single bond, d for double, t for triple. the conjugated π-electron system.

b

δiv ) 2[(Ziv - hi)/(Zi - Ziv -1)]. c δiv ) 4[(Ziv - hi)/(Zi - Ziv -1)].

is approximately represented by the graph without taking the geometrical details into account. In fact, many attempts have been made in this direction, from the Wiener index,1 the Randic index2,24 for alkane, the Kier-Hall index3,25 for both alkanes and aliphatic alcohols, the cluster expansion method,26,27 to the Estrada TOSS-MODE approach.28-30 However, more or less these models have some shortage, biggish prediction error or lesser domain of application. Recently, we have developed some QSPR/QSAR models to estimate the diamagnetic susceptibilities of inorganic or organic compounds by using various model parameters.31-34 In this study, to construct the more accurate model for predicting the diamagnetic susceptibilities of organic compounds, the atomic valence connectivity index δi′ was farther modified using parameters a and b based on our formal works.34 A new variable molecular connectivity index mχ′ and its converse index mχ′′ including parameter y were proposed based on the Kier-Hall index3,25 and δi′. The optimal values of parameters a, b, and y can be found by optimization method. The optimal novel molecular connectivity index has a better correlation for the diamagnetic susceptibilities of organic compounds. 2. Variable Molecular Connectivity Indices Molecular valence connectivity indices,3,25 which contain a large amount of information about the molecule, have been widely used as structure descriptors. The details of their definition and the calculation method can be found elsewhere. The general expression for the mth-order molecular valence connectivity index is as follows: nm

m v χk

)

m+1

∑(∏δ )

v -1/2 i j

j)1

(1)

i)1

where parameter m is the order of the molecular valence connectivity index, k denotes a contiguous path type of fragment, which is divided into paths (P), clusters (C), path/ clusters (PC), and chains (cycles) (CH). Parameter nm is the number of the

d

(c): the atom group in

relevant paths, and δiv is the atomic valence connectivity index and is defined as δvi

)

Zvi - hi Zi - Zvi - 1

(2)

where parameter Zi is the number of electrons of atom i, Zvi the number of valence electrons, and hi the number of hydrogen connected with atom i. In our study, it was found that the δvi value can be used not only to distinguish different atoms, but also to reflect some atomic characteristic. However, the δvi value does not distinguish the precise chemical environment of an atom. For instance, the δvi values of the carbon atom all are in atom groups “sssCH”, “dsCH”, “dsCH(c)”, and “tCH”(symbols: “s” for single bond, “d” for double, “t” for triple, “c” for conjugated). Therefore, we defined a novel δi′ value for all atoms by a united formula according to the structural characteristic of different atoms. The δi′ value is defined as δi′ )

Zvi - hi (Zi - Zvi - 1)a

(xi /2.551)b

(3)

where parameter Zi is the number of electrons of atom i, Ziv is the number of valence electrons, hi is the number of hydrogens connected with atom i, xi is the orbital electronegativity,35 and the constant, 2.551, is the sp3 hybrid orbital electronegativity of the carbon atom. The values of parameters a and b are variable, and the optimal values can be found by an optimization method. Besides, the δi′ of the atom in the conjugated π-electron system is multiplied by a factor of 0.98 because the π-electron not only belongs to an atom but also is shared by more atoms in these conjugated systems. For a saturated hydrocarbon, xi ) 2.551, δi′ ) δvi . The xi and δi′ (for a ) 1.10 and b ) 2.8) values for some atoms in different atom groups are calculated from eq 3 and listed in Table 1. When the δi′ is used instead of δvi , the novel connectivity index mχ′ and its converse index mχ′′ can be defined as follows:

Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 4167 nm

m

χk′ )

∑ ( ∏ δ ′)

-y

i j

j)1

(4.4031 × 5.8896 × 4.4031 × 5.8896 × 4.4031)-0.36 + (17.6593 × 4.4031 × 2 × 4.4031 × 17.6593)-0.36 ) 1.1262

m+1

(4)

i)1

4 nm

m

χk′′ )

m+1

∑ ( ∏ δ ′)

y

i j

j)1

(5)

5

i)1

where, parameter m is the order of the molecular connectivity index and y is a variable, whose optimal value can be found by the optimization method. The 0χ′, 1χ′, 2χ′, 3χp′, 3χc′, 4χp′, 4χc′, 5 χp′, 4χpc′, 5χpc′, χch′, 0χ′′, 1χ′′, 2χ′′, 3χp′′, 3χc′′, 4χp′′, 4χc′′, 5χp′′, 4 χpc′′, 5χpc′′, and χch′′ of 1081 organic compounds were calculated by the MATALAB program for the various values of parameters a, b, and y. For example, when a ) 1.10, b ) 2.8, and y ) 0.36, the 0χ′, 1χ′, 2χ′, 3χp′, 3χc′, 4χp′, 4χc′, 5χp′, 4χpc′, 5χpc′, χch′, 0 χ′′, 1χ′′, 2χ′′, 3χp′′, 3χc′′, 4χp′′, 4χc′′, 5χp′′, 4χpc′′,5χpc′′ and χch′′ of thiobarbituric acid (Figure 1 gives the structure and δi′ values of thiobarbituric acid) were calculated as follows: χ′ ) 2 × 17.6593-0.36 + 3 × 4.4031-0.36 +

0

χc′ ) 0

χp′ ) 6 × (5.8896 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)-0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 2 × 4.4031 × 17.6593)-0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)-0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 5.8896 × 4.4031 × 17.6593)-0.36 ) 0.5541

4

χpc′ ) 4 × (17.6593 × 4.4031 × 5.8896 × 2 × 4.4031)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 5.8896 × 4.4031)-0.36 ) 0.4233 χpc′ ) 0

5

χch′ ) (5.8896 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)-0.36 ) 0.0438

2 × 5.8896-0.36 + 2-0.36 + 0.6679-0.36 ) 5.4627 χ′ ) 2 × (17.6593 × 4.4031)-0.36 +

1

4 × (4.4031 × 5.8896)-0.36 + 2 × (2 × 4.4031)-0.36 + (0.6679 × 4.4031)-0.36 ) 3.2483

0

20.36 + 0.66790.36 ) 16.6729 1

χ′ ) 2 × (17.6593 × 4.4031 × 5.8896)-0.36 +

2

2 × (17.6593 × 4.4031 × 2)-0.36+ 2 × (4.4031 × 5.8896 × 4.4031)-0.36 + 2 × (2 × 4.4031 × 5.8896)-0.36+ 2 × (0.6679 × 4.4031 × 5.8896)-0.36 + (4.4031 × 2 × 4.4031)-0.36+ (5.8896 × 4.4031 × 5.8896)-0.36 ) 2.5395

χ′′ ) 2 × 17.65930.36 + 3 × 4.40310.36 + 2 × 5.88960.36+ χ′′ ) 2 × (17.6593 × 4.4031)0.36 + 4 × (4.4031 × 5.8896)0.36 + 2 × (2 × 4.4031)0.36 + (0.6679 × 4.4031)0.36 ) 28.3520

2

χ′′ ) 2 × (17.6593 × 4.4031 × 5.8896)0.36 + 2 × (17.6593 × 4.4031 × 2)0.36 + 2 × (4.4031 × 5.8896 × 4.4031)0.36 + 2 × (2 × 4.4031 × 5.8896)0.36 + 2 × (0.6679 × 4.4031 × 5.8896)0.36 + (4.4031 × 2 × 4.4031)0.36 + (5.8896 × 4.4031 × 5.8896)0.36 ) 65.1802

χp′ ) 2 × (17.6593 × 4.4031 × 5.8896 × 4.4031)-0.36 +

3

2 × (17.6593 × 4.4031 × 2 × 4.4031)-0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 5.8896)-0.36 + 2 × (2 × 4.4031 × 5.8896 × 4.4031)-0.36 + 2 × (4.4031 × 2 × 4.4031 × 5.8896)-0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031)-0.36 ) 1.4981

3

2 × (17.6593 × 4.4031 × 2 × 4.4031)0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 5.8896)0.36 + 2 × (2 × 4.4031 × 5.8896 × 4.4031)0.36 + 2 × (4.4031 × 2 × 4.4031 × 5.8896)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031)0.36 ) 110.5569

χc′ ) 2 × (17.6593 × 4.4031 × 5.8896 × 2)-0.36 +

3

(0.6679 × 4.4031 × 5.8896 × 5.8896)-0.36 ) 0.3609 χp′ )

4

2 × (17.6593 × 4.4031 × 5.8896 × 4.4031 × 5.8896)-0.36 + 3 × (2 × 4.4031 × 5.8896 × 4.4031 × 5.8896)-0.36 + 2 × (17.6593 × 4.4031 × 2 × 4.4031 × 5.8896)-0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)-0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 2)-0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 17.6593)-0.36 +

χp′′ ) 2 × (17.6593 × 4.4031 × 5.8896 × 4.4031)0.36 +

3

χc′′ ) 2 × (17.6593 × 4.4031 × 5.8896 × 2)0.36 + (0.6679 × 4.4031 × 5.8896 × 5.8896)0.36 ) 28.5825

4

χp′′ ) 2 × (17.6593 × 4.4031 × 5.8896 × 4.4031 × 5.8896)0.36 + 3 × (2 × 4.4031 × 5.8896 × 4.4031 × 5.8896)0.36 + 2 × (17.6593 × 4.4031 × 2 × 4.4031 × 5.8896)0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 2)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 17.6593)0.36 + (4.4031 × 5.8896 × 4.4031 × 5.8896 × 4.4031)0.36 + (17.6593 × 4.4031 × 2 × 4.4031 × 17.6593)0.36 ) 248.7945 4

5

Figure 1. Structure and δi′ values of thiobarbituric acid.

χc′′ ) 0

χp′′ ) 6 × (5.8896 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 2 ×

4168 Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009

Figure 2. Results of optimization of parameter b and y for a ) 1.00.

4.4031 × 17.6593)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)0.36 + 2 × (4.4031 × 5.8896 × 4.4031 × 5.8896 × 4.4031 × 17.6593)0.36 ) 325.3394 χpc′′ )

4

4 × (17.6593 × 4.4031 × 5.8896 × 2 × 4.4031)0.36 + 2 × (0.6679 × 4.4031 × 5.8896 × 5.8896 × 4.4031)0.36 ) 97.4729 χpc′′ ) 0

5

χch′′ ) (5.8896 × 4.4031 × 5.8896 × 4.4031 × 2 × 4.4031)0.36 ) 22.8084 3. Data Set The quantitative structure-property relationships (QSPR) research started with collecting the data set. The experimental molar diamagnetic susceptibilities, χm, data were gathered from refs 30 and 36-39. A total of 1081 organic compounds were selected as the data set (see TS1 and TS2, which are given as Supporting Information). The quality and robustness of the predictive power of a QSPR model are heavily dependent on the diversity of the data set. To select significant descriptors for the QSPR model which captures all the underlying interaction mechanisms, it is advisable to represent as many structural features as possible in the data set. The working data set included hydrocarbons, nonhydrocarbons, and their substituted compounds. 4. Search for Optimal Values of Parameters a, b, and y To find the optimal values of the parameters a, b, and y for the five-parameter regression model of the molar diamagnetic susceptibilities of 1081 organic compounds, we initially varied a, b, and y in the intervals [1.00, 1.15], [0.4, 5.0], and [0.46 0.30], respectively. The interval is selected according to results of our pretest. The comparison of the five-parameter regression models based on the novel connectivity index mχ′ and its converse index mχ′′ from different pairs (a, b, and y) is made on the basis of their standard error of estimate. The best subsets regression analysis method is applied to select optimal values

of the parameters a, b, and y for linear QSPR models. The algorithm is described as follows: (1) Let a be equal to a fixed value (for example, a ) 1.00), search for the local optimal values of parameters y and b. (2) Vary the value of a (a ) 1.05, 1.10, 1.15,...), search for the other local optimal values of parameters y and b for other values of a. (3) Compare the different local optimal values of parameters y and b for different values of a, search for the optimal values of parameters a, b, and y. The standard error of the five-parameter regression models, which are constructed by the best subsets regression of molar diamagnetic susceptibilities (-χm) versus 0χ′, 1χ′, 2χ′, 3χp′, 3χc′, 4 χp′, 4χc′, 5χp′, 4χpc′, 5χpc′, χch′, 0χ′′, 1χ′′, 2χ′′, 3χp′′, 3χc′′, 4χp′′, 4 χc′′, 5χp′′, 4χpc′′, 5χpc′′, and χch′′ of 1081 compounds for the various values of the parameters a, b, and y, are listed in TS3 (given in the Supporting Information). The boldfaced values in TS3 are the local minimum of the standard error of estimate for different a and y. Figures 2-6 graphically illustrate the results of optimization for different a, b, and y values. It can be found that a best five-parameter model with the least standard deviation (s ) 5.2513) can be constructed from the pair (a ) 1.10, b ) 2.8, and y ) 0.36). Thus, the optimal values of parameters a, b, and y are 1.10, 2.8, and 0.36 in this study. 5. Multivariate Linear Regression Model In order to obtain effective QSPR models, the data set was divided into two data sets: a training set and test set. The training and test sets represented 66.67% (720 data points) and 33.33% (361 data points), respectively, of the data. The k-means cluster analysis (k-MCA) may be used in training and test set design.40-44 The idea consists of carrying out a partition of the set of compounds under study in several statistically representative classes of chemicals. Then, one may select from the members of all these classes the training and test sets. This procedure ensures that any chemical class (as determined by the clusters derived from k-MCA) will be represented in both compounds sets (training and test). It permits the design of both training and predicting sets which are representative of the entire

Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 4169

Figure 3. Results of optimization of parameter b and y for a ) 1.05.

Figure 4. Results of optimization of parameter b and y for a ) 1.10.

“experimental universe”. Figure 7 graphically illustrates the above-described procedure. The k-MCA uses Euclidean distance. The researcher must specify in advance the desired number of clusters, k. Initial cluster centers are chosen randomly in a first pass of the data, and then, each additional iteration group observations based on nearest Euclidean distance to the mean of the cluster. That is, the algorithm seeks to minimize within-cluster variance and maximize variability between clusters in an ANOVA-like fashion. Cluster centers change at each pass. The process continues until cluster means do not shift more than a given cutoff value or the iteration limit is reached. In this paper, the k-MCA was carried by SPSS. From the results of best subset regression analysis, it has been found that only five indexes (0χ′, 1χ′, 2χ′, 1χ′′, and 2χ′′) can be included in the regression models. So, only these five indexes will be considered in the k-MCA. k-MCA splits the 1081 organic compounds in five clusters with 17, 374, 94, 353, and 243 members, respectively.

The main results of k-MCA for the 1081 organic compounds are depicted in Table 2. The selection of the training and prediction set was carried out by taking, in a random way, compounds that belong to each cluster. MLR analysis was carried out by MINITAB and SPSS. The regression of the molar diamagnetic susceptibilities, -χm, versus 0 χ′, 1χ′, 2χ′, 1χ′′, and 2χ′′ of the 720 compounds (training set), for the parameters a ) 1.10, b ) 2.8, and y ) 0.36, resulted in the best five-parameter model. The model is shown as follows: -χm × 10-6 (cgs) ) 3.1022 + 11.59530χ′ + 6.30341χ′ 1.62362χ′ - 0.96341χ′′ + 0.22722χ′′ (6) n ) 720, r ) 0.9930, r2 ) 0.9861, s ) 4.99 cgs, F ) 10141.10, AIC ) 25.23, FIT ) 68.06 where n is the number of organic compounds included in the model, r is the correlation coefficient, r2 is the square of the

4170 Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009

Figure 5. Results of optimization of parameter b and y for a ) 1.15.

Figure 6. Results of optimization for different a and y values. Table 2. Main Results of the k-Means Cluster Analysis for the 1081 Organic Compounds Variance Analysis the molecular connectivity index 0

χ′ χ′ 2 χ′ 1 χ′′ 2 χ′′ 1

Figure 7. Training and test sets design throughout k-MCA.

correlation coefficient, s is the standard deviation of the regression, F is the Fisher ratio, AIC is the Akaike’s information criterion, and FIT is the Kubinyi function.45,46 The statistical parameters of model 6 are listed in Table 4. The calculated results from model 6 for 720 organic compounds are shown under the “Cal.1” column in TS1 (given in the Supporting Information). Finally, to test the prediction ability of model 6, the molar diamagnetic susceptibilities of another 361 compounds (test set) were calculated from model 6 and are shown under the “Cal.1” column in

between SSa

within SSb

Fisher ratio (F)

p-levelc

1249.08 622.01 432.43 80762.78 583628.44

5.67 3.43 4.55 33.03 206.57

220.34 181.36 95.03 2445.48 2825.34

0.00 0.00 0.00 0.00 0.00

a Variability between groups. significance.

b

Variability within groups.

c

Level of

TS2 (given in the Supporting Information). The plots of calculated molar diamagnetic susceptibility versus experimental data for all the organic compounds in this study are shown in Figures 8 and 9. These figures show good agreement between the experimental and calculated molar diamagnetic susceptibility values for the 1081organic compounds with

Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 4171 0

0

Table 3. Molar Diamagnetic Susceptibilities and χ′ ( χ′′) of Saturated Straight-Chain Alkanes compounds

-χm × 10-6

ethane n-propane n-butane n-pentane n-hexane n-heptane n-octane

27.30 40.50 57.40 63.05 74.60 85.24 96.63

0

χ′

2.0000 2.7792 3.5583 4.3375 5.1167 5.8958 6.6750

χ′′

compounds

-χm × 10-6

2.0000 3.2834 4.5669 5.8503 7.1337 8.4171 9.7006

n-nonane n-decane n-undecane n-tridecane n-tetradecane n-hexadecane

108.13 119.50 131.80 153.70 166.20 187.60

0

Figure 8. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 6, for the training set.

Figure 9. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 6, for the test set.

diverse structures. The histogram of predicting errors is shown in Figure 10, where a near-Gaussian error distribution curve centered at zero is seen. 6. Results and Discussion Figures 2-5 graphically illustrate the optimization results of parameter b and y for a ) 1.00, 1.05, 1.10, and 1.15. When a ) 1.00, it can be found that the local optimal values of parameter b are 5.0, 3.9, 3.2, 2.7, 2.1, 1.7, and 1.3 for y ) 0.34, 0.36, 0.38, 0.40, 0.42, 0.44, and 0.46, respectively. From Figure 6, we found that the pair (b ) 2.1, y ) 0.42) has characteristic standard deviation (s ) 5.3282) for a ) 1.00. In the same way, it can be found that the pairs (b ) 2.8, y ) 0.38), (b ) 2.8, y ) 0.36), and (b ) 3.5, y ) 0.32) have characteristic standard deviations (s ) 5.2697, 5.2513, and 5.2668) for a ) 1.05, 1.10,

0

χ′

7.4542 8.2333 9.0125 10.5708 11.3500 12.9083

0

χ′′

10.9840 12.2674 13.5508 16.1177 17.4011 19.9680

Figure 10. Distribution of predicting error, MLR model 6.

and 1.15, respectively. Thus, it can be seen that a best model, with the least standard deviation (s ) 5.2513), can be constructed from the pair (a ) 1.10, b ) 2.8, and y ) 0.36). In other words, the optimal values of parameters a, b, and y are 1.10, 2.8, and 0.36 in this study. From the definition of δi′, it is known that the δi′ values for a ) 1 and b ) 0 are reduced to the atomic valence connectivity index (δvi ) and the δi′ values of the carbon are the same as the original atomic valence connectivity index for alkanes, regardless of the values of a and b, because the parameter xi is equal to 2.551 when the carbon atom has a sp3 hybrid configuration. When a * 0 and b * 0, the δi′ values defined in this paper are various for different atoms due to various electron numbers, valence electron numbers, and orbital electronegativities. For the different atom groups of an atom in the same valences, the δi′ values are unequal due to different values of parameter hi or parameter xi. For example, all parameters hi are 1, so the δvi value is 3 for sssCH, dsCH, and tCH. However, as the parameters xi are 2.551, 2.64, and 2.818, the δi′ values are 3.0000, 3.3023, and 3.9643, respectively, for a ) 1.10 and b ) 2.8. Besides, δi′ of the atom in the conjugated π-electron system is multiplied by a factor of 0.98 because the π-electron not only belongs to an atom but also is shared by more atoms in these conjugated system. For example, the δi′ values of dsCH and dsCH(c) are different, they are 3.3023 and 3.2363, respectively, for a ) 1.10 and b ) 2.8. These facts show that the definition of the new δi′ of a skeletal atom expresses both electronic and topological information and can reflect the different chemical environment of the given atom. At the same time, from the definitions of the variable connectivity index mχ′ and its converse index mχ′′, it can be found that the basic features of original molecular connectivity indices are maintained in our variable valence molecular connectivity indices. When y ) 0.5 (-0.5), the variable molecular connectivity index mχ′ (mχ′′) is reduced to the molecular valence connectivity index mχvk . Therefore, the variable valence molecular connectivity indices preserve the

4172 Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 Table 4. Comparison of Several Regression Models for the Molar Diamagnetic Susceptibilities of Organic Compounds descriptors model model model model model ref 30 ref 30

6 7 ref 34 8 9 10

0

1

2

1

2

χ′, χ′, χ′, χ′′, χ′′ χ′, 1χ′, 2χ′, 4χp′, χch 0 χ, 1χ, 4χpc, 5χpc, χch 0 χ′, 1χ′, 3χc′, 3χp′′, 4χpc′′ 0 χ′, 1χ′, 4χpc′, 0χ′′, 3χc′′ µ0, µ1, µ2, µ3, µ4 µ0, µ1, µ2, µ3, µ4 0

n

r

r2

s

F

AIC

FIT

720 720 720 233 85 233 85

0.9930 0.9921 0.9899 0.9927 0.9913 0.980 0.980

0.9861 0.9842 0.9800 0.9854 0.9826 0.960 0.960

4.99 5.32 5.99 3.69 2.52 6.06 3.82

10141.10 8901.40 6988.29 3064.38 894.06 1089.60 379.20

25.23 28.69 36.39 14.12 7.05 38.17 16.21

68.06 59.74 46.90 59.39 40.64 21.12 17.24

features of the original molecular connectivity indices; it may be useful for QSAR and QSPR studies. The diamagnetic property of a compound is determined mainly by its molecular structure. Different compounds have different molecular structures, namely, different adjacency matrixes of molecular graphs, and their mχ′ (mχ′′) values are different. On the basis of expressions 4 and 5, 0χ′ (0χ′′) is the sum of atomic δi′-0.36 (δi′0.36) of organic compound chemical formula and 1χ′ (1χ′′) is the sum of chemical bonds of the chemical formula. Thus, 0χ′ (0χ′′) and 1χ′ (1χ′′) contain atomic and bond contributions to the diamagnetic susceptibility of the compound. The 0χ′ (0χ′′) is proportional to the number of atom of organic compound chemical formula, so the -χm is positively correlated with 0χ′ (0χ′′). The molar diamagnetic susceptibilities and 0χ′ (0χ′′) values of thirteen saturated straight-chain alkanes are listed in Table 3. It can be found that the molar diamagnetic susceptibilities are increased along with the numbers of carbon atom. The result of the regression has shown that the molar diamagnetic susceptibilities, -χm, is highly correlative with the0χ′ (0χ′′) for the 13 compounds, and the correlation coefficient r is 0.9994 (0.9994). The more chemical bonds there are in an organic compound, the smaller diamagnetic susceptibility the compound has, so the -χm is negatively correlated with 1χ′ (1χ′′). For example, the 1χ′ (1χ′′) are 2.1654 (4.2140), 2.4284 (6.5890) for n-butane and cyclobutane, respectively, their -χm values are 57.4 and 40.0. These facts are agreement with the results of regression analysis. In model 6, the coefficient of 0χ′ is positive and the coefficient of 1χ′′ is negative. However, the coefficient of 1χ′ is positive. The results show that the infection factors contained in index 1χ′ is different from 1χ′′. The result of MLR analysis has shown that linear model 6 is a good fit, and the correlation coefficient r is 0.9930. Model 6 explains more than 98.61% of the variance in the experimental values of the molar diamagnetic susceptibilities, -χm, for these organic compounds. From TS1 and TS2 (given in the Supporting Information), one can observe that there are 76 (7.03%) compounds showing deviations greater than two standard errors (9.98). The greatest error in this data set is observed with benzil, the present approach gives error of -18.78 cgs. Some of the polyhalogenated compounds and polycyclic aromatic hydrocarbons (PAHs) have greater errors. For instance, the present approach gives errors of -17.00 and 15.92 cgs for hexabromoethane and tetracene. This result is agrees with the literature methods.30 From Figures 8 and 9, it can be found that two of the data points (nos. 689 and 1057) are quite separated from the remaining data. From the result of best subset regression analysis without them, it can be found that a best five-parameter model with the least standard deviation (s ) 5.2307) can be constructed form the five same indexes (0χ′, 1χ′, 2χ′, 1χ′′, and 2χ′′) for the pair (a ) 1.10, b ) 2.8, and y ) 0.36). At the same time, we calculated the coefficients without them in order to see the real correlations. The correlation coefficient r and the standard deviation s are 0.9927 and 4.99 for 719 organic compounds (training set). The results show that such isolated points can

not influence the variables selected for the final model, and only slight influence the regression statistics. As shown in column Cal.1 in TS1 (given in the Supporting Information), the calculated values agree well with the available experimental ones. The AAD of 720 organic compounds (the training set) is 3.72 cgs. Model 6 has been verified by the crossvalidation, using the leave-one-out method, and the correlation coefficient rcv and standard deviation scv, together with the normal r and s, are 0.9928 (0.9930), 5.07 (4.99) cgs, respectively. These data reveal that the result of the cross-validations for model 6 is very close to the normal result of model 6, which means that the model constructed in this work is stable. In this paper, the working data set contains nine non-hydrogen elements (C, N, O, S, P, F, Cl, Br, and I) and includes hydrocarbons, nonhydrocarbons, and their substituted compounds (alkanes, cycloalkanes, alkenes, cycloalkenes, alkynes, carboxylic acids, alcohols, aldehydes, anhydrides, ketones, amines, halides, ethers, esters, acid halides, amides, nitro compounds, nitriles, nitrates, sulfides, thiocyannates, hydroxybenzenes, hydrazines, pyridines, pyrroles, saccharides, amino acids, ureas, oxime, thioureas, sulfates, phosphates, phosphite, thiazoles, and so on). So, our method should have a broad application domain. In other words, model 6, constructed from the variable molecular connectivity index mχ′ and its converse index mχ′′ can be used to predict the molar diamagnetic susceptibilities of organic compounds with extensive structural diversities. Finally, to test the prediction ability of model 6, the molar diamagnetic susceptibilities of another 361 organic compounds were calculated from model 6. For example, the molar diamagnetic susceptibility of thiobarbituric acid is calculated as follows: -χm×10-6 (cgs) ) 3.1022 + 11.59530χ′ + 6.30341χ′ 1.62362χ′ - 0.96341χ′′ + 0.22722χ′′ ) 3.1022 + 11.5953 × 5.4627 + 6.3034 × 3.2483 - 0.6236 × 2.5395 0.9634 × 28.3520 + 0.2272 × 65.1802 ) 70.29 The experimental value (-χm) of the molar diamagnetic susceptibility of thiobarbituric acid is 72.9 cgs, the predicting error is -2.61 cgs. The calculated values of the molar diamagnetic susceptibilities of the 361 organic compounds are listed in column Cal.1 in TS2 (given in the Supporting Information). This external prediction set includes amino acids, phosphates, sulfates, barbohydrates, heterocycles, polycyclic aromatic compounds, and so on. Despite this degree of difficulty, the predicted values agree well with the experimental values, and the AAD is 4.37 cgs. In our former study,34 a novel molecular connectivity index, denoted as mχ′ and based on the adjacency matrix of molecular graphs and novel atomic valence connectivity (δi′) was proposed. A good three parameter model for molar diamagnetic susceptibilities of organic compounds was obtained from 0χ′, 1χ′, and 2 χ′, using MLR analysis. The correlation coefficient (r) and the standard deviation (s) are 0.9917 and 5.42, respectively. According our former study, the best five-parameter model of 720 organic compounds (the training set) can be obtained by

Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 4173

Figure 11. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 7, for training set.

Figure 13. Distribution of predicting error, MLR model 7.

model 6). From Table 4, it can be found that, comparing with model 7, the statistical parameters in model 6 improved with increases in the value of the r, r2, F, and FIT and decreases in the value of s and AIC. From Figures 10 and 13, it can be found that, comparing with the model7, the standard deviation of predicted values from model 6 is decreased by 5.79% for 1081 organic compounds. The results show that the novel valence molecular connectivity indexes in this study are more effective than our formal molecular connectivity indexes for the prediction of the molar diamagnetic susceptibilities of organic compounds. To compare the novel valence molecular connectivity indexes in this study with original molecular connectivity indexes, the 0 χ, 1χ, 2χ, 3χp,3χc,4χp,4χc,5χp,4χpc,5χpc, and χch were calculated and regressed with the molar diamagnetic susceptibilities (-χm) of 720 organic compounds (the training set). The result of best five-parameter regression is shown as follows: Figure 12. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 7, for test set.

the best subsets regression analysis method. The model is shown as follows: -χm×10-6 (cgs) ) 3.7601 + 9.77660χ′ + 8.78971χ′ 1.45272χ′ - 2.29624χp′ - 24.0789χch′ (7) n ) 720, r ) 0.9921, r2 ) 0.9842, s ) 5.32 cgs, F ) 8901.40, AIC ) 28.69, FIT ) 59.74 The statistical parameters of model 7 are listed in Table 4. The calculated results from model 7 for 720 organic compounds are shown in the Cal.2 column in TS1 (given in the Supporting Information), and the AAD of 720 organic compounds (the training set) is 4.07 cgs. To test the prediction ability of model 7, the molar diamagnetic susceptibilities of another 361 compounds were calculated from model 7 and are listed in the Cal.2 column of TS2 (given in the Supporting Information), and the AAD is 4.63 cgs. The plots of the calculated molar diamagnetic susceptibility from the model 7 versus the experimental data for all the organic compounds in this study are shown in Figures 11 and 12. The histogram of predicting errors from model 7 for the 1081 organic compounds is shown in Figure 13. From TS1 and TS2 (given in the Supporting Information), one can find there are 92 (8.51%) compounds showing deviations greater than 9.98 (two standard errors of

-χm×10-6 (cgs) ) 6.3110 + 5.77090χ + 14.05131χ + 1.73234χpc - 3.38835χpc - 35.839χch (8) n ) 720, r ) 0.9899, r2 ) 0.9800, s ) 5.99 cgs, F ) 6988.29, AIC ) 36.39, FIT ) 46.90 The statistical parameters of model 8 are listed in Table 4. The calculated results from model 8 for 720 organic compounds are shown in the Cal.3 column in TS1 (given in the Supporting Information), and the AAD of 720 organic compounds (the training set) is 4.45 cgs. To test the prediction ability of model 8, the molar diamagnetic susceptibilities of another 361 compounds were calculated from model 8 and are listed in the Cal.3 column of TS2 (given in the Supporting Information), and the AAD is 5.10 cgs. The plots of the calculated molar diamagnetic susceptibility from model 8 versus the experimental data for all the organic compounds in this study are shown in Figures 14 and 15. The histogram of predicting errors from model 8 for the 1081 organic compounds is shown in Figure 16. From TS1 and TS2 (given in the Supporting Information), one can find there are 124 (11.47%) compounds showing deviations greater than 9.98 (two standard errors of model 6). The greatest error in this data set is observed with tetranitromethane, model 8 gives error of -31.28. From Table 4, it can be found that, comparing with model 8, the statistical parameters in model 6 improved with increases in the value of the r, r2, F, and FIT and decreases with the value of s and AIC. From Figures 10 and 16, it can be found that, comparing with

4174 Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009

nectivity indexes for the prediction of the molar diamagnetic susceptibilities of organic compounds. In ref 30, Estrada et al., using the topological substructural molecular design (TOSS-MODE) approach, had developed two five-parameter models to estimate the diamagnetic susceptibilities of aliphatic and aromatic compounds. The correlation coefficient (r), the standard deviation (s), the Akaike’s information criterion (AIC), and the Kubinyi function (FIT) for the two models are 0.980, 0.980, 6.06 cgs, 3.82 cgs, 38.17, 16.21, 21.12, and 17.24, respectively. According to our method, if the diamagnetic susceptibilities of only the 233 aliphatic and 85 aromatic compounds from ref 30 are related to the novel molecular connectivity indexes respectively, the two fiveparameter models can be constructed by best subsets regression analysis. The two models are shown as follows:

Figure 14. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 8, for training set.

-χm×10-6 (cgs) ) -2.7278 + 13.23550χ′ + 3.36181χ′ 2.08243χc′ - 0.28893χp′′ + 0.12234χpc′′ (9) n ) 233, r ) 0.9927, r2 ) 0.9854, s ) 3.69 cgs, F ) 3064.38, AIC ) 14.12, FIT ) 59.39 -χm×10-6 (cgs) ) 11.4159 + 6.40620χ′ + 11.39661χ′ + 1.22104χpc′ - 1.05170χ′′ - 0.11033χc′′ (10) n ) 85, r ) 0.9913, r2 ) 0.9826, s ) 2.52 cgs, F ) 894.06, AIC ) 7.05, FIT ) 40.64 The statistical parameters of models 9 and 10 are listed in Table 4. From Table 4, it can be found that the standard deviations (s) values are decreased by 39.11% and 34.03%, respectively. The Akaike’s information criterions (AIC) are decreased, and the Kubinyi functions (FIT) are increased. The results show that the current method is more effective than the topological sub structural molecular design (TOSS-MODE) approach for the prediction of the molar diamagnetic of organic compounds. 7. Conclusion

Figure 15. Plot of calculated vs experimental values of molar diamagnetic susceptibilities, MLR model 8, for test set.

Figure 16. Distribution of predicting error, MLR model 8.

model 8, the standard deviation of predicted values from model 6 is decreased by 16.11% for 1081 organic compounds. The results show that the novel valence molecular connectivity indexes are more effective than the original molecular con-

For predicting the molar diamagnetic susceptibility of an organic compound, a variable molecular connectivity index mχ′ and its converse index mχ′′ have been proposed based on adjacency matrix of molecular graphs and the variable atomic valence connectivity index δi′. The optimal values of parameters a, b, and y included in the definition of δi′, and mχ′ and mχ′′ can be found by an optimization method. When a ) 1.10, b ) 2.8, and y ) 0.36, a good five-parameter QSPR model for the molar diamagnetic susceptibilities can be constructed from mχ′ and m χ′′ by using the best subsets regression analysis method. The correlation coefficient r, standard error s, and AAD of the MLR model are 0.9930, 4.99 cgs, and 3.72 cgs, respectively, for the 720 organic compounds (the training set). The cross-validation by using the leave-one-out method demonstrates that the MLR model is highly reliable from the point of view of statistics. The AAD of predicted values of the molar diamagnetic susceptibility of another 361 organic compounds (the test set) is 4.37 cgs for the MLR model. The results show that the current method is more effective than the literature methods for estimating the molar diamagnetic susceptibilities of organic compounds. The MLR method can provide an acceptable model for the prediction of the molar diamagnetic susceptibilities of organic compounds. Acknowledgment This work is supported by the university natural science foundation of Jiangsu Province in china (contract grant number:

Ind. Eng. Chem. Res., Vol. 48, No. 8, 2009 4175

04KJD150195). The authors express our gratitude to the referees for their valueable comments. Supporting Information Available: Experimental and calculated molar diamagnetic susceptibilities of 720 organic compounds (the training set) and another 361 organic compounds (the test set); values of the standard error of estimates for various values of the parameters a, b, and y. (PDF files.) This material is available free of charge via the Internet at http:// pubs.acs.org. Literature Cited (1) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17–20. (2) Randic´, M. Characterization of Molecular Branching. J. Am. Chem. Soc. 1975, 97, 6609–6615. (3) Kier, L. B.; Hall, L. H. Molecular ConnectiVity in Structure-ActiVity Analysis; Research Studies: Letchworth, England, 1986. (4) Devillers, J., Balaban, A. T. Topological Indices and Related descriptors in QSPR and QSAR; Gordon & Breach: Amsterdam, 1999. (5) Atkins, A. T. Physical Chemistry, 3rd ed.; Oxford University Press: Oxford, 1986; p 699. (6) Vulfsan, S. G. Molecular Magnetochemistry; Gordon & Breach: Amsterdam, 1998. (7) Simion, D. V.; Sorensen, T. S. A Theoretical Computation of the Aromaticity of (Benzene) Cr(CO)3 Compared to Benzene using the Exaltation of Magnetic Susceptibility Criterion and a Comparison of Calculated and Experimental NMR Chemical Shifts in these Compounds. J. Am. Chem. Soc. 1996, 118, 7345–7352. (8) Ania, F.; Balta-Calleja, F. J. Principles and Applications of Diamagnetism in Polymers. Stud. Phys. Theor. Chem. 1992, 77, 527–554. (9) Katsuki, A.; Tokunaga, R.; Watanabe, S.; Tanimoto, Y. The Effect of High Magnetic Field on the Growth of Benzophenone. Chem. Lett. 1996, 607–608. (10) Mccarthy, M. J.; Zion, B.; Chen, P.; Ablett, S.; Darke, A. H.; Lillford, P. J. Diamagnetic Susceptibility Changes in Apple Tissue after Bruising. J. Sci. Food Agric. 1995, 67, 13–20. (11) Ligabue, A.; Soncini, A.; Lazzeretti, P. The Leap-Frog Effect of Ring Currents in Benzene. J. Am. Chem. Soc. 2002, 124 (9), 2008–2014. (12) Putz, M. V.; Russo, N.; Sicilia, E. Atomic Radii Scale and Related Size Properties from Density Functional Electronegativity Formulation. J. Phys. Chem. A. 2003, 107 (28), 5461–5465. (13) Hameka, H. F. Theory of Diamagnetic Susceptibility of Alkanes. J. Chem. Phys. 1961, 34, 1996–2000. (14) O’Sullivan, P. S.; Hameka, H. F. A Semiempirical Theory of Diamagnetic Susceptibilities with Particular Emphasis on Oxygen-containing Organic Molecules. J. Am. Chem. Soc. 1970, 92, 25–32. (15) O’Sullivan, P. S.; Hameka, H. F. A Semiempirical Description of the Diamagnetic Susceptibilities of Aromatic Molecules. J. Am. Chem. Soc. 1970, 92, 1821–1824. (16) Stockham, M. E.; Hameka, H. F. Semiempirical Magnetic Susceptibilities of Benzene Derivatives. J. Am. Chem. Soc. 1972, 94, 4076–4078. (17) Haley, L. V.; Hameka, H. F. Semiempirical Magnetic Susceptibilities of Nitrogen-containing heterocycles. J. Am. Chem. Soc. 1974, 96, 2020– 2024. (18) Pollock, E. L.; Runge, K. J. Path Integral Study of Magnetic Response: Excitonic and Biexcitonic Diamagnetism in Semiconductor Quantum Dots. J. Chem. Phys. 1992, 96, 674–680. (19) Romera, E.; Dehesa, J. S. Weizsaecker Energy of Many-electron Systems. Phys. ReV. A 1994, 50, 256–266. (20) Sulzbach, H. M.; Schleyer, P. v.; Jiao, H. J.; Xie, Y. M.; Schaefer, H. F. A. [10] annulene isomer may be aromatic, after all. J. Am. Chem. Soc. 1995, 117, 1369–1373. (21) De Luca, G.; Russo, N.; Sicilia, E.; Toscano, M. Molecular Quadrupole Moments, Second Moments, and Diamagnetic Susceptibility Evaluation Using the Generalized Gradient Approximation in the Framework of Gaussian Density Functional Method. J. Chem. Phys. 1996, 105, 3206– 3210. (22) King, J. W.; Molnar, S. P. Correlation of Organic Diamagnetic Susceptibility with Structure via the Integrated Molecular Transform. Int. J. Quantum Chem. 1997, 64, 635–645. (23) You, X.-Z. Structure and Properties of Coordination Compounds; Science Press: Beijing, China, 1992.

(24) Randic´, M. Graph theoretical analysis of diamagnetic susceptibilities of alkanes and alkenes. Chem. Phys. Lett. 1978, 53, 602–605. (25) Kier, L. B.; Hall, L. H. Molecular ConnectiVity in Chemistry and Drug Research; Academic Press: New York, 1976. (26) Schmalz, T. G.; Klein, D. J.; Sandleback, B. L. Chemical Graph Theoretical Cluster Expansion and Diamagnetic Susceptibility. J. Chem. Inf. Comput. Sci. 1992, 32, 54–57. (27) Klein, D. J. Chemical Graph-Theoretic Cluster Expansions. Int. J. Quantum Chem. 1986, S20, 153–171. (28) Estrada, E. Modelling the Diamagnetic Susceptibility of Organic Compounds by a Sub-structural graph-theoretical Approach. J. Chem. Soc., Faraday Trans. 1998, 94, 1407–1410. (29) Estrada, E.; Gutierrez, Y. Modeling Chromatographic Parameters by a Novel Graph Theoretical Sub-structural Approach. J. Chromatogr., A 1999, 858, 187–199. (30) Estrada, E.; Gutierrez, Y.; Gonza´lez, H. Modeling Diamagnetic and Magnetooptic Properties of Organic Compounds with the TOSS-MODE Approach. J. Chem. Inf. Comput. Sci. 2000, 40, 1386–1399. (31) Mu, L. L.; Feng, C. J.; He, H. M. Topological Research on Molar Diamagnetic Susceptibilities for Inorganic Compounds. MATCH Commun. Math. Comput. Chem. 2007, 58 (3), 591–607. (32) Mu, L. L.; He, H. M.; Feng, C. J. Quantitative Structure Property Relations (QSPR) for Predicting Molar Diamagnetic Susceptibilities, χm, of Inorganic Compounds. Chin. J. Chem. 2007, 25, 743–750. (33) Mu, L. L.; Feng, C. J.; He, H. M. Topological research on diamagnetic susceptibilities of organic compounds. J. Mol. Model. 2008, 14, 109–134. (34) Mu, L. L.; Feng, C. J.; He, H. M. Modeling diamagnetic susceptibilities of organic compounds with a novel connectivity index. Ind. Eng. Chem. Res. 2008, 47, 2428–2433. (35) Li, G. Y. A new graduation of orbital electronegativity. Chin. UniV. Chem. 2006, 16 (5), 57–61. (36) David, R. L.; Grace, B.; Lev, I. B.; Robert, N. G.; Henry, V. K.; Kozo, K.; Gerd, R.; Dana, L. R.; Daniel, Z. CRC Handbook of Chemistry and Physics, 85th ed.; CRC Press: Boca Raton, FL, 2002; pp 4-143–4-148. (37) Yao, Y. B.; Xie, T.; Gao, Y. M. Handbook of Physical Chemistry; Shanghai technology publishing company: China, 1985; pp 247-276. (38) Lawrence, V. H.; Hendrik, F. H. Semiempirical magnetic susceptibilities of nitrogen-containing heterocycles. J. Am. Chem. Soc. 1974, 96 (7), 2020–2024. (39) Hyp, J. D. J.; James, D. W.; John, L. L. Diamagnetic susceptibility exaltation in hydrocarbons. J. Am. Chem. Soc. 1969, 91 (8), 1991–1998. (40) Gonza´lez, M. P.; Helguera, A. M.; Cabrera, M. A. Quantitative structure-activity relationship to predict toxicological properties of benzene derivative compounds. Bioorg. Med. Chem. 2005, 13, 1775–1781. (41) Gonza´lez, M. P.; Dias, L. C.; Helguera, A. M.; Rodriguez, Y. M.; de Oliveira, L. G.; Gomez, L. T.; Diaz, H. G. TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new anti-inflammatory compounds. Bioorg. Med. Chem. 2004, 12, 4467– 4475. (42) Molina, E.; Diaz, H. G.; Gonza´lez, M. P.; Rodriguez, E.; Uriarte, E. Designing Antibacterial Compounds through a Topological Substructural Approach. J. Chem. Inf. Comput. Sci. 2004, 44, 515–521. (43) Gonza´lez, M. P.; Gonzalez, Diaz, H.; Molina Ruiz, R.; Cabrera, M. A.; Ramos de Armas, R. TOPS-MODE Based QSARs Derived from Heterogeneous Series of Compounds. Applications to the Design of New Herbicides. J. Chem. Inf. Comput. Sci. 2003, 43, 1192–1199. (44) Gonza´lez, M. P.; Tera´n, C.; Fall, Y.; Diaz, L. C.; Morales, A. H. A topological sub-structural approach to the mutagenic activity in dental monomers. 3. Heterogeneous set of compounds. Polymer 2005, 46, 2783– 2790. (45) Saiz-Urra, L.; Gonza´lez, M. P.; Teijeira, M. QSAR studies about cytotoxicity of benzophenazines with dual inhibition toward both topoisomerases I and II: 3D-MoRSE descriptors and statistical considerations about variable selection. Bioorg. Med. Chem. 2006, 14, 7347–7358. (46) Saiz-Urra, L.; Gonza´lez, M. P.; Teijeira, M. 2D-autocorrelation descriptors for predicting cytotoxicity of naphthoquinone ester derivatives against oral human epidermoid carcinoma. Bioorg. Med. Chem. 2007, 15, 3565–3571.

ReceiVed for reView August 17, 2008 ReVised manuscript receiVed January 31, 2009 Accepted February 4, 2009 IE801252J