J. Chem. Inf. Comput. Sci. 1997, 37, 1146-1151

Neural Network-Topological Indices Approach to the Prediction of Properties of Alkenes

Shuhui Liu, Ruisheng Zhang,* Mancang Liu, and Zhide Hu

Department of Chemistry, Lanzhou University, Lanzhou, Gansu 730000, People's Republic of China

Received June 30, 1997

A topological indices vector of five parameters (χ, P, w, l, s) carrying three grades of structural information was set up as a molecular descriptor to predict the normal boiling point, the density, and the refractive index of alkenes with a neural network. The five parameters are the connection index χ, the polarity number P, the parameters w and l representing the effect of the double bond on the properties, and the parameter s distinguishing the cis/trans isomers of alkenes. The estimation results show average accuracies of 1.3% with maximum deviations of 16%.

INTRODUCTION

Since their renaissance in the late 1980s, artificial neural networks (ANNs) have been used in many fields of chemistry, e.g., various problems in spectroscopy including calibration, QSAR/QSPR studies, prediction of the secondary structure of proteins, prediction of faults and diagnosis of their causes during the control of chemical processes, and classification of atomic energy levels. In particular, the newly developed ANNs have become powerful tools in QSPR research.1-3

The application of ANNs in QSPR research has both advantages and disadvantages. The primary advantage is that an ANN can simulate the nonlinear relationship between the structural information and the properties of compounds during the training process and can then generalize this knowledge among homologous series without the need for theoretical formulas. The main disadvantage is that the structural information of the compounds must be converted into numerical inputs acceptable to the ANN. The formation of these numerical inputs, called a molecular descriptor, is a critical and difficult problem, because a molecular descriptor must represent the molecular structural features related to the properties of interest as distinctly as possible. The prediction accuracy of an ANN depends heavily on how well the molecular descriptor correlates with those structural features.

According to chemical graph theory, many molecular descriptors have been developed and applied with great success.4 Nevertheless, few papers have dealt with the prediction of the properties of unsaturated hydrocarbons to date, because setting up a suitable molecular descriptor of an unsaturated hydrocarbon is difficult owing to the presence of multiple bonds. In this article, a topological indices (TI) set of five parameters (χ, P, w, l, s) was set up as a molecular descriptor of alkenes, and an ANN was then used to predict the normal boiling points, densities, and refractive indices of the compounds. The prediction results were satisfactory.

THEORY

A neural network model is composed of a large number of simple processing nodes (neurons or units) organized into a sequence of layers.


Figure 1. Fully connected multilayer feed-forward network.

Figure 1 shows a general model of a multilayer feed-forward network (only a three-layer network is given as an example). As can be seen from Figure 1, the first layer is the input layer, the last layer is the output layer, and the one in between is the hidden layer. The connections between the input and hidden units and between the hidden and output units carry the weights w_ij and w_jk, respectively. In ANN applications for QSPR, the input units generally represent structural information and the output units represent physical properties; signals from the input units are transferred to the output units through the hidden units.

In this paper, a multilayer feed-forward network with the back-propagation learning rule is used as the simulator. Its training process can be summarized as follows:
(1) The connection weights w_ij, w_jk are initialized to small random values.
(2) The net input and output values of each unit are propagated:

Net^h_j = Σ_i x_i w_ij + θ_j    (1)

Out^h_j = F(Net^h_j) = 1/(1 + e^(-α Net^h_j))    (2)

where F is the sigmoid transfer function, α is a steepness parameter that can be used to vary the sigmoid between a nearly linear function (small values) and a step function (large values), and θ_j is a bias term or threshold value of unit j responsible for accommodating nonzero offsets in the data.


Net^o_k = Σ_j Out^h_j w_jk + θ_k    (3)

Out^o_k = F(Net^o_k)    (4)

where θ_k is a bias term of unit k.
(3) The error between the target value T_pk and the actual output value Out^o_pk is calculated for each training-set member p:

Ep = 1/2 Σ_k (T_pk - Out^o_pk)^2    (5)

E = Σ_p Ep    (6)

where E is the total error over the training set.
(4) E is compared with the error criterion dmax, and then w_ij, w_jk are adjusted to reduce the error. In back-propagation the reduction of E is carried out by gradient descent, an iterative least-squares procedure in which the algorithm adjusts the connection weights in the way that reduces the error most rapidly, by moving the state of the system downward along the direction of steepest gradient:

w_jk(t + 1) = w_jk(t) + η δ_pk Out^h_pj    (7)

w_ij(t + 1) = w_ij(t) + η δ_pj x_pi    (8)

where η is a gain term known as the learning rate and the error signals are

δ_pk = -∂Ep/∂Net^o_k = α Out^o_pk (1 - Out^o_pk)(T_pk - Out^o_pk)    (9)

δ_pj = -∂Ep/∂Net^h_j = α Out^h_pj (1 - Out^h_pj) Σ_k δ_pk w_jk    (10)

where α is the same parameter as in formula 2. These expressions show that the adjustment of the connection weights leading to a hidden layer depends on the errors in the subsequent layers, so modifications are made first to the output-layer weights, and the error is then propagated successively back through the hidden layers. Each unit receives an amount of the error signal proportional to its contribution to the output signal, and its connection weights are adjusted by an amount proportional to this error. In standard back-propagation the error signals are collected for all output units and all training targets, and the connection weights are adjusted at the end of every epoch, i.e., after all members of the training set have been presented once.
(5) Returning to step 2, the process is continued until E satisfies the requested error criterion. At that point the training of the ANN is complete.
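For illustration, the following NumPy sketch works through one training cycle of eqs 1-10 for a network of the size used later in this work (five inputs, five hidden units, one output). It is a minimal reconstruction of the procedure described above, not the authors' SNNS setup; the values of α and η, the initialization range, and the pattern-by-pattern (rather than end-of-epoch) weight update are simplifying assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 5, 1               # descriptor size, hidden units, one property
alpha, eta = 1.0, 0.2                      # sigmoid steepness and learning rate (assumed)

w_ij = rng.uniform(-0.1, 0.1, (n_in, n_hid))    # input -> hidden weights
w_jk = rng.uniform(-0.1, 0.1, (n_hid, n_out))   # hidden -> output weights
theta_j = np.zeros(n_hid)                       # hidden-layer bias terms
theta_k = np.zeros(n_out)                       # output-layer bias terms

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-alpha * net))   # transfer function F of eq 2

def train_pattern(x, t):
    """Forward pass (eqs 1-4) and weight adjustment (eqs 7-10) for one pattern p."""
    global w_ij, w_jk, theta_j, theta_k
    net_h = x @ w_ij + theta_j                  # eq 1
    out_h = sigmoid(net_h)                      # eq 2
    net_o = out_h @ w_jk + theta_k              # eq 3
    out_o = sigmoid(net_o)                      # eq 4
    delta_k = alpha * out_o * (1.0 - out_o) * (t - out_o)       # eq 9
    delta_j = alpha * out_h * (1.0 - out_h) * (w_jk @ delta_k)  # eq 10
    w_jk = w_jk + eta * np.outer(out_h, delta_k)                # eq 7
    w_ij = w_ij + eta * np.outer(x, delta_j)                    # eq 8
    theta_j = theta_j + eta * delta_j           # biases adjusted like weights
    theta_k = theta_k + eta * delta_k
    return 0.5 * np.sum((t - out_o) ** 2)       # pattern error Ep of eq 5

# Total error of eq 6 for one epoch over a training set X, T (values scaled to 0-1):
# E = sum(train_pattern(x, t) for x, t in zip(X, T))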

MOLECULAR DESCRIPTOR

It is time-consuming work to set up a suitable molecular descriptor of an alkene, because the structural features that affect the properties of an alkene include not only the size and shape of the compound but also the position of the double bond in the structure (only alkenes with a single double bond were investigated). An efficient descriptor must reflect all of this structural information as accurately as possible. In this paper the problem was tackled through the following steps.

(1) The double bond of an alkene was temporarily ignored, and the alkene was treated as a saturated hydrocarbon. The connection index χ(G) and the polarity number P were selected as the first-grade structural parameters, representing the information about the alkene skeleton. They are introduced as follows.

(a) The connection index χ(G), introduced by Randic,5 is a single number characterizing the graph of a molecule:

χ = χ(G) = Σ_{i,j} [D(i) D(j)]^(-1/2)    (11)

where [D(i) D(j)]^(-1/2) is the bond weight of two directly connected atoms i and j of the molecule and D(i) characterizes the type of carbon atom i. For a hydrocarbon it can be defined as

D(i) = 4 - δH    (12)

where δH is the number of hydrogen atoms connected directly to the carbon atom (except for methane). Table 1 shows the D(i) values of the various types of carbon atoms. As can be seen, there are only four possible values, which give rise to 10 edge (connection) types whose connection weights are given in Table 2.

Table 1. Number of the Types of Carbon Atoms

atom     D(i)
-CH3     1
=CH2     2
-CH2-    2
=CH-     3
>CH-     3
=C=      4
>C<      4
=C<      4

Table 2. Connection Weights for Edge Types in the Graphs Corresponding to Carbon Compounds

edge type D(i), D(j)    connection weight [D(i) D(j)]^(-1/2)
1, 1                    1
1, 2                    0.7071
1, 3                    0.5773
1, 4                    0.5
2, 2                    0.5
2, 3                    0.4082
2, 4                    0.3536
3, 3                    0.3333
3, 4                    0.2887
4, 4                    0.25

If the number of bonds of each edge type is denoted by b_ij (i = 1, ..., 4; j = 1, ..., 4) and the corresponding connection weights are used, then eq 11 becomes

χ = b11 + 0.7071b12 + 0.5773b13 + 0.5(b14 + b22) + 0.4082b23 + 0.3536b24 + 0.3333b33 + 0.2887b34 + 0.25b44    (13)

(b) The polarity number P was defined by Wiener in 1947;6 it is equal to half the number of pairs of atoms separated by three bonds:

P = P(G) = 1/2 Σ_i (p3)_i    (14)

(2) In the second step, the structural information about the double bond of the alkene was considered. After careful investigation of the relationship between the three properties of alkenes and their structures (see Table 3), it was concluded that the influence of the double bond becomes weaker with increasing molecular size.
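As an illustration of the first-grade parameters, the following Python sketch (not the authors' code) computes χ from eqs 11 and 12 and the polarity number from an adjacency list of the carbon skeleton, with the double bond treated as a single edge and each atom carrying its hydrogen count so that D(i) = 4 - δH. Following the verbal definition above and the worked values in Table 6, P is taken as half the number of atom pairs separated by three bonds.

from collections import deque

def distances_from(adj, source):
    """Bond-count (graph) distances from one skeleton atom to all others."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def first_grade(adj, n_h):
    """adj: {atom: set of bonded atoms}; n_h: {atom: number of attached hydrogens}."""
    d = {i: 4 - n_h[i] for i in adj}                      # eq 12
    chi = sum((d[i] * d[j]) ** -0.5
              for i in adj for j in adj[i] if i < j)      # eq 11, each bond counted once
    ordered = sum(sum(1 for x in distances_from(adj, i).values() if x == 3)
                  for i in adj)
    p = ordered / 4            # ordered pairs -> unordered pairs -> half of them
    return chi, p

# 1-octene, C1=C2-C3-...-C8; the result agrees with the 1-octene row of Table 6
# (chi = 3.5235, P = 2.5) to within rounding of the bond weights.
adj = {i: set() for i in range(1, 9)}
for a, b in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]:
    adj[a].add(b)
    adj[b].add(a)
n_h = {1: 2, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3}
print(first_grade(adj, n_h))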


Table 3. Several Examples of the Relationship between the Properties and Alkene Structure

Cn    alkene                 bp, °C    dens, g/cm3    refract index
C5    1-pentene              29.96     0.6405         1.3715
C5    2-pentene (trans)      36.35     0.6482         1.3793
C8    1-octene               121.30    0.7149         1.4087
C8    2-octene (trans)       125.00    0.7199         1.4132
C11   1-hendecene            192.70    0.7503         1.4260
C11   2-hendecene (trans)    192.50    0.7528         1.4292

Table 4. Results of Linear Regressions

param         bp        dens      refract index
w             -0.657    -0.559    -0.480
l             0.202     0.259     0.229
crit coeff    0.183 (0.1a)

a 0.1 is the value of the confidence coefficient.

Figure 2. Structure and graph of an alkene.

Table 5. Optimization of the Descriptora

                 MSE × 103
descriptor       trial 1    trial 2    trial 3
χ, P, w, l, s    0.98       1.04       1.00
χ, P, w, l       1.05       1.13       1.04
χ, P, w          1.16       1.16       2.11
χ, P, l          2.21       1.94       1.93
χ, P, s          2.53       2.49       2.57
χ, P             2.97       2.90       2.88

a The first-column entries are the inputs of the ANN; MSE = Σ_{p∈patterns} Σ_{j∈outputs} (t_pj - o_pj)^2/N (N = 82).

According to this, the parameters w and l are defined as the second-grade structural parameters, indicating the effects of the double bond on the alkene structure:

w = x(G)/χ(G)    (15)

where x(G) is the sum of the connection-index (bond-weight) terms corresponding to the carbon atoms of the double bond, and

l = (p3)_m + (p3)_n    (16)

where m and n are the labels of the two double-bond carbon atoms in the molecular graph and (p3)_i is the number of atoms separated from atom i by three bonds. In order to further confirm the correlation between the parameters w, l and the three properties, linear regressions were performed. The results are listed in Table 4. As can be seen, the absolute values of all the correlation coefficients are greater than the standard critical value (0.183) from refs 7 and 8, which shows that w and l are correlated with the three properties.

(3) Finally, because of the presence of the double bond, alkenes can occur as cis/trans isomers, which affect the properties slightly. To distinguish these isomers, the third-grade structural parameter s is defined as follows:

s = +1 for a trans isomer, 0 when there is no cis/trans isomerism, and -1 for a cis isomer.

In order to confirm whether one, two, or all three of the double-bond descriptors are really necessary, a series of trials was carried out; the results are listed in Table 5. From these results it can be seen that the five-term and four-term descriptors perform best. Because the parameter s can distinguish the cis/trans isomers of alkenes, the five-term descriptor was chosen for the following work.
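The second- and third-grade parameters can be sketched in the same style. The reading adopted below — x(G) taken as the sum of the [D(i) D(j)]^(-1/2) bond terms incident to the two double-bond carbons m and n, and l as the number of atoms three bonds from m plus the number three bonds from n — is an assumption checked only against a few rows of Table 6; it is not a definitive statement of the authors' bookkeeping.

from collections import deque

def second_third_grade(adj, n_h, m, n, geometry=None):
    """m, n: labels of the double-bond carbons; geometry: 'cis', 'trans', or None."""
    d = {i: 4 - n_h[i] for i in adj}
    bond = lambda i, j: (d[i] * d[j]) ** -0.5
    chi = sum(bond(i, j) for i in adj for j in adj[i] if i < j)
    x_g = sum(bond(i, j) for i in adj for j in adj[i]
              if i < j and (i in (m, n) or j in (m, n)))   # bonds touching the C=C carbons
    w = x_g / chi                                          # eq 15

    def p3(source):                                        # atoms exactly three bonds away
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return sum(1 for x in dist.values() if x == 3)

    l = p3(m) + p3(n)                                      # eq 16
    s = {"trans": 1, "cis": -1}.get(geometry, 0)           # third-grade parameter
    return w, l, s

# For the 1-octene skeleton of the previous sketch (double bond between atoms 1 and 2),
# second_third_grade(adj, n_h, 1, 2) gives w = 0.2317, l = 2, s = 0, in line with the
# corresponding Table 6 entry.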

Table 6. Molecular Descriptor of Various Alkenes

Cn    alkene                        χ         w         P      l    s
C4    2-butene (trans)              1.4879    1.0000    0.5    0    1
C5    2-pentene (trans)             2.0259    0.6510    1      1    1
C6    1-butene, 3,3-dimethyl        2.1969    0.3172    1.5    3    0
C6    1-butene, 2,3-dimethyl        2.2969    0.4973    2      2    0
C7    1-pentene, 4,4-dimethyl       2.6700    0.3058    2      4    0
C7    2-pentene, 2,4-dimethyl       2.7766    0.5841    2      2    0
C7    1-pentene, 2-ethyl            2.9750    0.3570    2.5    3    0
C7    1-pentene, 3-ethyl            2.9721    0.2495    3      4    0
C8    1-hexene, 2-ethyl             3.4750    0.3053    2.5    3    0
C8    1-octene                      3.5235    0.2317    2.5    2    0
C8    2-heptene, 2-methyl           3.1926    0.5320    2.5    2    0
C9    1-nonene                      4.0235    0.2030    3      1    0
C9    1-octene, 2-methyl            3.9143    0.3084    3      2    0
C10   3-nonene, 2-methyl            4.4365    0.2430    3.5    4    0
C10   2-octene, 2,6-dimethyl        4.2977    0.3950    4      2    0
C11   2-hendecene (trans)           5.0259    0.2620    4      2    0
C11   1-hendecene                   5.0235    0.1625    4      2    0
C13   4-nonene, 5-butyl             4.5254    0.3103    6.5    6    0
C14   1-tetradecene                 6.5235    0.1250    5.5    2    0
C16   1-hexadecene                  7.5235    0.1090    6.5    2    0
C18   9-octadecene                  8.5639    0.1342    7.5    4    0
C20   1-eicosene                    9.5235    0.0860    8.5    2    0

Using the method explained above, a TI set of five parameters (χ, P, w, l, s), carrying three grades of structural information, was set up as the molecular descriptor for the QSPR study. The set is unique for each alkene molecule; for example, the molecular descriptor of 1-pentene, 2,4,4-trimethyl (Figure 2) is (3.0608, 2.5, 0.2576, 3, 0). Table 6 lists the five parameters for various alkenes.

RESULTS AND DISCUSSION

A Stuttgart Neural Network Simulator, release 4.0 (SNNS 4.0), was used throughout this research.9 It was installed on an Intel 80486/DX-2 computer with 16 MB RAM running Linux Slackware release 2.0.10 A set of 82 alkenes, with carbon numbers varying from 5 to 20, and their three properties were taken from ref 11. The compounds were divided randomly into three sets: a training set (49 members), a validation set (17 members), and a test set (16 members). They are listed in Tables 7-9.

Before the prediction ability is demonstrated, the ANN must be not only trained but also optimized. Optimization is very important in ensuring highly accurate estimates. Generally, optimization of an ANN involves three aspects: the number of hidden units (the topology of the ANN), the learning parameters, and the number of learning epochs. Several network architectures were investigated to determine the optimal number of hidden units. The results are listed in Table 10. As can be seen, a lower training-set MSE (vide infra) does not always mean a lower cross-validation-set MSE.
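A minimal sketch of the random partition described above is given below; the descriptor matrix, the property values, and the random seed are placeholders (the actual data are those of Tables 6-9), not values taken from the paper.

import numpy as np

rng = np.random.default_rng(7)          # arbitrary seed; the paper reports none
descriptors = np.zeros((82, 5))         # placeholder for the (chi, P, w, l, s) rows
properties = np.zeros((82, 3))          # placeholder for bp, density, refractive index

order = rng.permutation(82)             # random division of the 82 alkenes
train_idx, val_idx, test_idx = order[:49], order[49:66], order[66:]
# descriptors[train_idx], properties[train_idx], etc. are then fed to the network.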


Table 7. List of 49 Alkenes and Their Properties Used as a Training Set

Cn    alkene                             bp, °C    dens, g/cm3    refract index
C5    1-butene, 3-methyl                 20.00     0.6272         1.3643
C5    1-pentene                          29.96     0.6405         1.3715
C5    2-pentene (cis)                    36.90     0.6556         1.3830
C5    2-pentene (trans)                  36.35     0.6482         1.3793
C6    1-butene, 3,3-dimethyl             41.20     0.6529         1.3763
C6    1-butene, 2-ethyl                  64.70     0.6894         1.3969
C6    1-hexene                           63.35     0.6731         1.3837
C6    3-hexene (cis)                     66.44     0.6796         1.3947
C6    3-hexene (trans)                   67.08     0.6772         1.3943
C6    1-pentene, 2-methyl                60.70     0.6799         1.3920
C6    2-pentene, 2-methyl                67.29     0.6863         1.4004
C6    2-pentene, 3-methyl (cis)          67.63     0.6986         1.4045
C6    2-pentene, 3-methyl (trans)        70.45     0.6942         1.4016
C6    2-butene, 2,3-dimethyl             73.20     0.7080         1.4122
C6    1-butene, 2,3-dimethyl             55.67     0.6803         1.3995
C7    1-butene, 2,3,3-trimethyl          77.87     0.7050         1.4025
C7    2-heptene (cis)                    98.50     0.7080         1.4060
C7    2-heptene (trans)                  98.00     0.7012         1.4045
C7    2-pentene, 3,4-dimethyl            87.00     0.7126         1.4070
C7    2-pentene, 4,4-dimethyl (trans)    76.75     0.6889         1.3982
C7    2-pentene, 3-ethyl                 96.01     0.7204         1.4148
C7    1-hexene, 2-methyl                 91.10     0.7000         1.4040
C7    2-hexene, 2-methyl                 96.50     0.7100         1.4040
C7    1-butene, 2-ethyl-3-methyl         89.00     0.7150         1.4100
C7    1-pentene, 2,3-dimethyl            84.26     0.7051         1.4030
C7    1-pentene, 3,3-dimethyl            77.54     0.6974         1.3984
C7    1-pentene, 4,4-dimethyl            72.49     0.6827         1.3918
C7    2-pentene, 2,4-dimethyl            83.44     0.6954         1.4040
C7    1-pentene, 2-ethyl                 94.00     0.7079         1.4052
C7    1-pentene, 3-ethyl                 85.00     0.6948         1.3966
C8    1-hexene, 2-ethyl                  120.00    0.7270         1.4157
C8    1-octene                           121.30    0.7149         1.4087
C8    2-heptene, 2-methyl                122.60    0.7241         1.4170
C8    2-pentene, 2,4,4-trimethyl         104.90    0.7218         1.4160
C8    4-octene (cis)                     122.50    0.7212         1.4148
C8    4-octene (trans)                   122.30    0.7141         1.4114
C9    1-nonene                           146.00    0.7300         1.4140
C9    1-octene, 2-methyl                 114.80    0.7343         1.4184
C10   3-nonene, 2-methyl                 161.00    0.7340         1.4202
C10   2-octene, 2,6-dimethyl             163.00    0.7460         1.4250
C10   1-octene, 3,7-dimethyl             154.00    0.7396         1.4212
C10   1-decene                           170.56    0.7408         1.4215
C11   2-hendecene (trans)                192.50    0.7528         1.4292
C11   1-hendecene                        192.70    0.7503         1.4260
C13   4-nonene, 5-butyl                  215.50    0.7745         1.4375
C14   1-tetradecene                      233.00    0.7745         1.4351
C16   1-hexadecene                       284.40    0.7811         1.4412
C18   9-octadecene                       310.00    0.7916         1.4470
C20   1-eicosene                         341.00    0.7882         1.4400

From Table 10, five hidden units were considered to be the best. Using the same method, the best learning rate was determined to be 0.2. In order to avoid overtraining of the ANN, the early-stopping method was used to find the optimal number of learning epochs.12 The value can be derived directly from the error-graph interface of SNNS (Figure 3). As can be seen, the training curve decreases steadily as the number of epochs increases, while the validation curve passes through a minimum region; when that region is reached, training is stopped, and the best epoch is obtained. From Figure 3, the optimal epoch in this work is about 10 000.

A momentum term was added to eqs 7 and 8 to speed the convergence of the ANN. However, this led to higher MSE values for the training and cross-validation sets. Hence, in this paper the standard back-propagation algorithm was used to obtain a better prediction result.
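The early-stopping criterion described above can be sketched as follows, continuing the NumPy back-propagation sketch from the theory section (it reuses rng, sigmoid(), train_pattern(), and the weight arrays defined there). The random arrays stand in for the scaled training and validation data, and the epoch limit and patience value are assumptions; in the original work the stopping point was read off the SNNS error graph rather than computed in code.

X_train, T_train = rng.random((49, 5)), rng.random((49, 1))   # placeholder training data
X_val, T_val = rng.random((17, 5)), rng.random((17, 1))       # placeholder validation data

def predict(x):
    out_h = sigmoid(x @ w_ij + theta_j)
    return sigmoid(out_h @ w_jk + theta_k)

best_val, best_epoch, best_state = np.inf, 0, None
for epoch in range(20000):
    for x, t in zip(X_train, T_train):          # one epoch over the training patterns
        train_pattern(x, t)
    val_mse = np.mean([(t - predict(x)) ** 2 for x, t in zip(X_val, T_val)])
    if val_mse < best_val:                      # validation error still improving
        best_val, best_epoch = val_mse, epoch
        best_state = (w_ij.copy(), w_jk.copy(), theta_j.copy(), theta_k.copy())
    elif epoch - best_epoch > 500:              # patience exhausted: stop training
        break
w_ij, w_jk, theta_j, theta_k = best_state       # keep the weights of the best epoch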

Table 8. List of 17 Alkenes and Their Properties Used as a Validation Set

Cn    alkene                      bp, °C    dens, g/cm3    refract index
C5    2-butene, 2-methyl          38.57     0.6623         1.3874
C6    2-hexene (cis)              68.84     0.6869         1.3977
C6    2-hexene (trans)            68.00     0.6784         1.3935
C6    1-pentene, 3-methyl         51.14     0.6675         1.3841
C7    1-pentene, 2,4-dimethyl     81.64     0.6943         1.3986
C7    1-hexene, 3-methyl          84.00     0.6945         1.3970
C7    1-heptene                   93.64     0.6970         1.3998
C8    3-octene (cis)              122.90    0.7189         1.4135
C8    3-octene (trans)            123.30    0.7152         1.4126
C8    1-hexene, 3-ethyl           118.20    0.7206         1.4120
C8    2-hexene, 2,3-dimethyl      122.10    0.7405         1.4269
C9    4-nonene (trans)            140.00    0.7318         1.4205
C10   5-decene (cis)              170.00    0.7445         1.4258
C10   5-decene (trans)            170.20    0.7401         1.4243
C11   5-hendecene (trans)         192.00    0.7497         1.4285
C12   1-dodecene                  213.00    0.7584         1.4300
C15   1-pentadecene               268.17    0.7764         1.4389

Table 9. List of the Experimental and Predicted Properties of 16 Alkenes as a Test Seta

Cn    alkene                         bp, °C (exptl/pred)    dens, g/cm3 (exptl/pred)    refract index (exptl/pred)
C5    1-butene, 2-methyl             31.16 / 34.83          0.6504 / 0.6437             1.3778 / 1.3774
C6    2-pentene, 4-methyl (cis)      56.30 / 57.22          0.6690 / 0.6791             1.3880 / 1.3940
C6    2-pentene, 4-methyl (trans)    58.55 / 56.13          0.6686 / 0.6745             1.3889 / 1.3921
C6    1-pentene, 3-methyl            51.14 / 59.39          0.6675 / 0.6750             1.3841 / 1.3886
C7    3-heptene (cis)                95.75 / 96.61          0.7030 / 0.7077             1.4059 / 1.4056
C7    3-heptene (trans)              95.67 / 95.73          0.6981 / 0.7029             1.4043 / 1.4040
C7    2-pentene, 2,3-dimethyl        97.46 / 99.94          0.7277 / 0.7201             1.4208 / 1.4146
C7    1-hexene, 4-methyl             87.50 / 87.89          0.6969 / 0.6995             1.3985 / 1.4004
C8    2-octene (cis)                 125.60 / 123.81        0.7243 / 0.7251             1.4150 / 1.4149
C8    2-octene (trans)               125.00 / 123.40        0.7199 / 0.7197             1.4132 / 1.4126
C8    2-hexene, 2,5-dimethyl         112.60 / 112.14        0.7182 / 0.7214             1.4135 / 1.4143
C8    1-pentene, 2,4,4-trimethyl     101.40 / 97.83         0.7150 / 0.7097             1.4086 / 1.4076
C9    3-nonene (trans)               147.50 / 146.26        0.7320 / 0.7335             1.4181 / 1.4183
C13   1-tridecene                    232.00 / 224.11        0.7658 / 0.7635             1.4340 / 1.4325
C18   1-octadecene                   314.90 / 312.73        0.7891 / 0.7861             1.4448 / 1.4432
C17   1-heptadecene                  300.00 / 300.34        0.7852 / 0.7835             1.4432 / 1.4418
RSD                                  3.08%                  0.61%                       0.13%

a Values are given as experimental/predicted.

In order to further confirm the quality of the ANN model, plots of the experimental properties versus the ANN-calculated properties for both the training and cross-validation sets are shown in Figures 4-6. The average RSDs (relative standard deviations) for the training set are 4.88, 0.43, and 0.13% for the boiling point, density, and refractive index, respectively, while the corresponding values for the cross-validation set are 3.5, 0.4, and 0.14%. The connection weights of the model between the input and hidden units are also presented in Table 11 to demonstrate the efficiency of the descriptor used in this paper.

The trained network was then used to predict the properties of the external test set.


Table 10. Relationship of the Number of Hidden Units (HU) and the MSE of the Training (TR) and Cross-Validation (CV) Sets

                         MSE × 103
          trial 1            trial 2            trial 3
HU no.    TR       CV        TR       CV        TR       CV
10        0.9286   0.9588    1.0122   0.7588    0.9204   1.2588
8         0.9612   1.4000    0.8816   1.1235    0.9184   0.8824
6         1.0510   0.7471    0.9918   0.9588    0.8755   0.9471
5         0.9898   0.7471    0.9816   0.6647    0.9367   0.6235
4         1.0755   0.7706    1.0020   0.7176    1.0204   0.8706

Figure 6. Experimental versus predicted values of the refractive index for the training set (open circle) and the validation set (filled circles).

Figure 3. Training error of SNNS.

Table 11. Connection Weights between Input and Hidden Units

Hn    χ           P          w          l           s           bias
H1    -0.79984    0.03820    0.44553    0.03325     -0.04824    1.13148
H2    0.65263     1.18322    0.12679    -0.21214    0.17130     1.36518
H3    1.35432     0.10425    0.54381    0.04623     -0.18964    -0.31208
H4    0.34250     0.78212    1.27069    0.07614     0.02264     -3.16893
H5    2.75873     0.13287    0.91451    -0.04446    -0.02800    3.71849

Figure 4. Experimental versus predicted values of the boiling points for the training set (open circle) and the validation set (filled circles).

The results are presented in Table 9. As Table 9 shows, the average RSDs are 3.08, 0.61, and 0.13% for the boiling point, density, and refractive index, respectively. These values are comparable to the corresponding values for the training and cross-validation sets.

Because no research papers dealing with unsaturated hydrocarbons in this research direction are available, a cross-comparison is hard to make. However, compared with prediction results from other computational methods, the present method is among the most accurate.13 Analysis of the prediction results shows that the different physical properties of alkenes are predicted with different degrees of accuracy. Among the three properties, the refractive index was predicted most accurately, followed by the density and then by the boiling point, which was predicted with the lowest accuracy. There are several reasons for this. The main reason is that the boiling points of the compounds depend strongly on nonvalence interactions that are not captured by general topological indices; as a result, the thermodynamic properties of sterically hindered and strained molecules were predicted with less accuracy than those of nonhindered ones. In addition, in contrast to the density and the refractive index, there is no systematic nonlinear relationship between the boiling point and cis/trans isomerism in the training set, so the ANN cannot deduce accurate information about this relationship and thus cannot make an accurate prediction.

Figure 5. Experimental versus predicted values of the densities for the training set (open circle) and the validation set (filled circles).

CONCLUSION

In this paper, a topological indices vector of five parameters carrying three grades of structural information was set up as a molecular descriptor of alkenes, and the SNNS software was then used to model the relationship between the descriptors and the normal boiling point, density, and refractive index of the compounds. The promising prediction results indicate both the efficiency of the descriptor and the strong modeling capability of ANNs.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China and the Special Foundation of the National Education Committee of China.


REFERENCES AND NOTES

(1) Sutter, J. M.; Dixon, S. L.; Jurs, P. C. Automated Descriptor Selection for Quantitative Structure-Activity Relationships Using Generalized Simulated Annealing. J. Chem. Inf. Comput. Sci. 1995, 35, 77-84.
(2) Cherqaoui, D.; Villemin, D. Use of a Neural Network to Determine the Boiling Point of Alkanes. J. Chem. Soc., Faraday Trans. 1994, 90, 97-102.
(3) Manallack, D. T.; Ellis, D. D.; Livingstone, D. J. Analysis of Linear and Nonlinear QSAR Data Using Neural Networks. J. Med. Chem. 1994, 37, 3758-3767.
(4) Kaliszan, R. Quantitative Structure-Chromatographic Retention Relationships; Wiley: New York, 1987.
(5) Trinajstić, N. Chemical Graph Theory, 2nd ed.; CRC Press: Boca Raton, FL, 1992; p 227.
(6) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17.
(7) Massart, D. L.; et al. Chemometrics: A Textbook; Elsevier: Amsterdam, 1990; Chapter 5.
(8) Nalimov, V. V. The Application of Mathematical Statistics to Chemical Analysis; Pergamon Press: Oxford, U.K., 1963; pp 164-209.
(9) Zell, A.; et al. SNNS User Manual, Version 4.0; Report No. 6/95. (Available by anonymous ftp from ftp.informative.uni-stuttgart.de.)
(10) Linux is a freely distributable implementation of UNIX for 80386 and 80486 machines. The version used in this study is the Slackware distribution maintained by Patrick J. Volkerding (volkerdi@mhd1.moorhead.msus.edu), obtained by anonymous ftp from ftp.cdrom.com:/pub/linux/slackware.
(11) Lide, D. R., Ed. CRC Handbook of Chemistry and Physics, 73rd ed.; CRC Press: Boca Raton, FL, 1992-1993.
(12) Geman, S.; Bienenstock, E.; Doursat, R. Neural Networks and the Bias/Variance Dilemma. Neural Comput. 1994, 6, 181-214.
(13) Razinger, M.; et al. Structural Selectivity of Topological Indexes in Alkane Series. J. Chem. Inf. Comput. Sci. 1985, 25, 23-27.
