
Ind. Eng. Chem. Res. 2009, 48, 8760–8766

CORRELATIONS

Hybrid Method to Predict Melting Points of Organic Compounds Using Group Contribution + Neural Network + Particle Swarm Algorithm

Juan A. Lazzús*
Departamento de Física, Universidad de La Serena, Casilla 554, La Serena, Chile

The melting points of organic compounds were estimated using a hybrid method in which a simple group contribution method (GCM) is implemented in an artificial neural network (ANN) trained with particle swarm optimization (PSO) instead of standard backpropagation. A total of 439 compounds were used to train the network, which was developed in MatLab. The melting points of 100 other compounds were then predicted, and the results were compared with experimental data and with other models available in the literature. The study shows that the proposed GCM + ANN + PSO model is an excellent alternative for estimating the melting points of organic compounds from their molecular structure (average absolute relative deviation (AARD) = 7%).

Introduction

The melting point (mp) is the temperature at which a solid fuses to become a liquid; the freezing point is the temperature at which a liquid solidifies. For all practical purposes, the two can be considered identical.1 From a thermodynamic point of view, the change in Gibbs free energy at the mp is zero (ΔG = ΔH - TΔS = 0), so the transition temperature is related to the enthalpy (ΔHtr) and entropy (ΔStr) of transition by

$$T_{tr} = \frac{\Delta H_{tr}}{\Delta S_{tr}} \;\Rightarrow\; T_m = \frac{\Delta H_m}{\Delta S_m} \tag{1}$$
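Equation 1 follows from setting ΔG = ΔH - TΔS to zero at the transition. The short Python sketch below shows that arithmetic; the enthalpy and entropy values are hypothetical round numbers chosen only for illustration, not data from this study.

```python
def gibbs_change(delta_h: float, delta_s: float, temperature: float) -> float:
    """Delta G = Delta H - T * Delta S (J/mol) for the solid -> liquid change."""
    return delta_h - temperature * delta_s


def melting_temperature(delta_h_m: float, delta_s_m: float) -> float:
    """Eq 1: T_m = Delta H_m / Delta S_m, the temperature at which Delta G = 0."""
    if delta_s_m <= 0:
        raise ValueError("entropy of melting must be positive")
    return delta_h_m / delta_s_m


# Hypothetical values: Delta H_m = 10 kJ/mol, Delta S_m = 40 J/(mol K)
dh, ds = 10_000.0, 40.0
tm = melting_temperature(dh, ds)
print(tm)                               # 250.0
print(gibbs_change(dh, ds, tm))         # 0.0 at the melting point
print(gibbs_change(dh, ds, 260.0) < 0)  # True: above T_m melting is spontaneous
```

Above Tm the TΔS term outweighs ΔH, so ΔG for melting turns negative, which is the condition stated at the start of the next paragraph.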

Melting occurs when the Gibbs free energy of the liquid becomes lower than that of the solid. From both a scientific and an industrial point of view, the chemical, physical, and thermodynamic properties of substances should be well understood before the substances are used in any process. For instance, knowledge of some basic properties is useful in fluid property estimation, thermodynamic property calculations, and phase equilibrium, among other areas.2 The mp is one of the most widely used fundamental physical properties.2 It finds applications in chemical identification, purification, and the calculation of a number of other physicochemical properties,3,4 and several correlations of physicochemical properties make use of the melting temperature.5

Melting point prediction has a long history.1 Prediction methods for this property can be broadly categorized as group contribution methods (GCMs) and quantitative structure-property relationships (QSPR).2 Joback and Reid6 reevaluated Lydersen's GCM,7 added several new functional groups, and determined new contribution values; for the mp, the Joback and Reid method is one of the simplest available.2 Later, Constantinou and Gani8 developed an advanced GCM based on the UNIFAC groups, enhanced by allowing more sophisticated functions of the desired properties and by providing contributions at a second-order level. Yalkowsky and co-workers3,4,9,10 have explored connections between the melting point and the normal boiling point and have proposed correlations for the mp; their method combines additive group contributions with nonadditive molecular descriptors.5 In all these methods, the property of a compound is calculated by summing the contributions of certain defined groups of atoms, taking into account the number of times each group occurs in the molecule.5 Table 1 lists selected works on GCM applications to estimate the mp published in recent years.

The aforementioned GCMs use linear and nonlinear regression techniques to represent the relations among the variables of a given system. The relationship between physical and thermodynamic properties is highly nonlinear, so an artificial neural network (ANN) can be a suitable alternative for modeling these properties. An ANN can approximate any function with a finite number of discontinuities by learning the relationships between input and output vectors, which makes it an appropriate technique for modeling the nonlinear behavior of thermophysical properties.16

* To whom correspondence should be addressed. E-mail: [email protected]. Tel.: +56 51-204128. Fax: +56 51-206658.

Table 1. Reported GCMs for Predicting Melting Points of Organic Compounds

authors | compounds | no. data | deviation
--- | --- | --- | ---
Joback and Reid6 (1987) | diverse | 388 | 11.20% (a)
Constantinou and Gani8 (1994) | diverse | 312 | 7.23% (a)
Simamora and Yalkowsky9 (1994) | aromatics | 1690 | 37.50 (b)
Krzyzaniak et al.10 (1995) | aliphatics, non-hydrogen-bonded | 596 | 34.30 (b)
Tu and Wu11 (1996) | diverse | 1310 | 8.20% (a)
Zhao and Yalkowsky3 (1999) | aliphatics | 1040 | 20.00% (a)
Marrero and Gani12 (2001) | diverse | 1103 | 7.50% (a)
Skander and Chitour13 (2002) | hydrocarbons | 577 | 8.44% (a)
Jain et al.4 (2004) | diverse | 1215 | 11.99% (a)
Wen and Qiang14 (2004) | diverse | -- | 7.75% (a)
Li et al.15 (2006) | hydrocarbons | 622 | 8.56% (a)

(a) Average absolute relative deviation. (b) Average absolute deviation.
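The additive scheme shared by these GCMs (a property estimated as a sum of group contributions, each weighted by how many times the group occurs in the molecule) can be sketched in a few lines of Python. The contribution values and the constant term below are hypothetical placeholders, not parameters from Joback and Reid or from any other method in Table 1.

```python
from typing import Dict

# Hypothetical contributions (K) per occurrence of each group; placeholders only.
CONTRIBUTIONS: Dict[str, float] = {"-CH3": -5.0, "-CH2-": 11.0, "-OH": 45.0}
BASE_VALUE = 120.0  # hypothetical constant term (K)


def gcm_estimate(group_counts: Dict[str, int]) -> float:
    """Linear GCM: property = BASE_VALUE + sum_k n_k * c_k."""
    unknown = set(group_counts) - set(CONTRIBUTIONS)
    if unknown:
        raise KeyError(f"no contribution defined for {sorted(unknown)}")
    return BASE_VALUE + sum(n * CONTRIBUTIONS[g] for g, n in group_counts.items())


# A molecule described by one -CH3, one -CH2- and one -OH group
print(gcm_estimate({"-CH3": 1, "-CH2-": 1, "-OH": 1}))  # 171.0
```

This sketch is strictly linear in the group counts; the nonlinear relationships discussed above are what the ANN approach is meant to capture.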

10.1021/ie900431f CCC: $40.75 © 2009 American Chemical Society. Published on Web 08/07/2009

Ind. Eng. Chem. Res., Vol. 48, No. 18, 2009


Figure 2. Average absolute relative deviations found in correlating the melting points of all substances as a function of the number of neurons in the hidden layer: (◆) during the training step and (■) during the prediction step.

Figure 1. Flow diagram for the ANN + PSO program developed for this work.

Taskinen and Yliruusi17 presented a comprehensive list of properties that have been analyzed in the literature using different ANN approaches. Properties such as boiling point, critical temperature, critical pressure, vapor pressure, heat capacity, enthalpy of sublimation, heat of vaporization, density, surface tension, viscosity, thermal conductivity, and acentric factor, among others, were thoroughly reviewed. One notable exception, according to these authors, is the melting point.17 To the best of the author's knowledge, no GCM + ANN application for mp prediction has been reported for a heterogeneous set of compounds such as the one presented here. In this work, the melting points of organic compounds have been estimated using a simple GCM implemented in an ANN in which standard backpropagation is replaced by particle swarm optimization (PSO), one of the most recently developed evolutionary algorithms.18

Neural Network Used

In this study, a feed-forward neural network was used to represent nonlinear relationships among variables.16 The network, programmed with the software MatLab,19 is a multilayer network in which information flows forward through the layers while the error is propagated backward. In this process, the network uses factors called "weights" (wi) to quantify the influence of each input variable. An ANN operates in two main stages: learning and validation. Learning, or training, is the process by which an ANN modifies its weights in response to the information presented to it.20 The ANN program reads the necessary data from an Excel file. So that the network can distinguish between the physical and chemical properties of the different substances and learn in an optimal way, properties derived from the molecular structure were considered. The input layer contains one neuron (node) for each variable, and the output layer has one node that generates the scaled estimate of the mp.

The ANN was trained with particle swarm optimization.21 Some researchers have used PSO to train neural networks and found that a PSO-based ANN has better training performance, a faster convergence rate, and better predictive ability than the standard backpropagation algorithm.22 PSO is a population-based optimization tool: the system is initialized with a population of random particles, and the algorithm searches for optima by updating generations.23 In a PSO system, each particle is "flown" through the multidimensional search space, adjusting its position according to its own experience and that of neighboring particles. Each particle therefore uses the best position it has encountered and the best position found by its neighbors to move toward an optimal solution. The performance of each particle is evaluated with a predefined fitness function that encapsulates the characteristics of the optimization problem.22 In each iteration, the velocity of each particle is calculated as

$$V_{pi}(t+1) = \omega V_{pi}(t) + c_1 r_1 \left(\psi_{pi}(t) - x_{pi}(t)\right) + c_2 r_2 \left(\psi_g(t) - x_{pi}(t)\right) \tag{2}$$

where $t$ is the current step number, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, and $r_1$ and $r_2$ are elements of two random sequences in the range [0, 1]. $x_{pi}(t)$ is the current position of the particle, $\psi_{pi}$ is the best solution this particle has reached, and $\psi_g$ is the best solution reached by all the particles. In general, each component of $V$ can be clamped to the range $[-V_{max}, V_{max}]$ to prevent particles from roaming excessively outside the search space.23 After the velocity is calculated, the new position of every particle is

$$x_p(t+1) = x_p(t) + V_p(t+1) \tag{3}$$
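A minimal pure-Python sketch of eqs 2 and 3 follows, minimizing a simple test function. The inertia weight, acceleration constants, swarm size, Vmax, search bounds, and iteration count are illustrative assumptions, not the settings used in this work. Fitness is expressed here as a value to minimize, and the swarm best ψg is refreshed once per iteration after all particles have moved.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    iters: int = 200,
    w: float = 0.7,      # inertia weight (omega), assumed value
    c1: float = 1.5,     # acceleration toward the particle's own best
    c2: float = 1.5,     # acceleration toward the swarm best
    bound: float = 5.0,  # initial positions drawn from [-bound, bound]
    v_max: float = 1.0,  # velocity components clamped to [-v_max, v_max]
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    x = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]             # psi_p: best position of each particle
    p_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]   # psi_g: best position of the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (p_best[i][d] - x[i][d])
                       + c2 * r2 * (g_best[d] - x[i][d]))   # eq 2
                v[i][d] = max(-v_max, min(v_max, vel))      # clamp to Vmax
                x[i][d] += v[i][d]                          # eq 3
            val = f(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
        g = min(range(n_particles), key=p_val.__getitem__)
        if p_val[g] < g_val:
            g_best, g_val = p_best[g][:], p_val[g]
    return g_best, g_val


def sphere(z: Sequence[float]) -> float:
    """Test function with its minimum (0) at the origin."""
    return sum(t * t for t in z)


best, best_val = pso_minimize(sphere, dim=3)
print(best_val)  # a value close to 0
```

Drawing fresh r1 and r2 for every component is one common reading of "elements of two random sequences"; a per-particle draw is an equally valid variant.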


Figure 3. Comparison between experimental and calculated values of melting point: (a) during the training and (b) during the prediction.

Here x and V denote a particle's position and its corresponding velocity in the search space, respectively. The PSO algorithm applies the update equations above repeatedly until a specified number of iterations has been exceeded or until the velocity updates are close to zero.

The output parameter (mp) was calculated from the input parameters in the following steps. First, the net inputs (N) of the hidden neurons are calculated from the input neurons. For hidden neuron j,

$$N_j^h = \sum_{i=1}^{n} w_{ij}^h p_i + b_j^h \tag{4}$$

where $p$ is the vector of training inputs, $j$ is the hidden neuron, $w_{ij}$ is the weight of the connection between input neuron $i$ and hidden neuron $j$, and $b_j$ is the bias of hidden neuron $j$. From these net inputs, the outputs ($y$) of the hidden neurons are calculated using the transfer function $f^h$ associated with the neurons of this layer:

$$y_j^h = f_j^h\left(\sum_{i=1}^{n} w_{ij}^h p_i + b_j^h\right) \tag{5}$$
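The forward calculation of eqs 4 and 5, with the tansig hidden layer and linear (purelin) output layer used in this network, can be sketched in pure Python as follows. The tiny 2-2-1 weights are hypothetical; the actual model is 44-4-1, and its output is a scaled mp whose scaling scheme is not described in this section. The `aard` helper follows the |%Δmp| definition used for Table 5; treating it as the PSO fitness would be an assumption, not something stated here.

```python
import math
from typing import Sequence


def forward(p: Sequence[float], w_h: Sequence[Sequence[float]],
            b_h: Sequence[float], w_o: Sequence[float], b_o: float) -> float:
    """Single-hidden-layer feed-forward pass.

    w_h[j][i]: weight from input i to hidden neuron j; b_h[j]: hidden bias;
    w_o[j]: weight from hidden neuron j to the output; b_o: output bias.
    """
    if any(len(row) != len(p) for row in w_h):
        raise ValueError("each row of w_h must have one weight per input")
    # eq 4: net input of each hidden neuron
    net = [sum(wji * pi for wji, pi in zip(row, p)) + bj
           for row, bj in zip(w_h, b_h)]
    # eq 5 with the tansig transfer function (the hyperbolic tangent)
    y = [math.tanh(n) for n in net]
    # linear (purelin) output neuron
    return sum(wj * yj for wj, yj in zip(w_o, y)) + b_o


def aard(calc: Sequence[float], exp: Sequence[float]) -> float:
    """Mean of 100 * |calc - exp| / exp over all compounds (in %)."""
    devs = [100.0 * abs(c - e) / e for c, e in zip(calc, exp)]
    return sum(devs) / len(devs)


# Hypothetical 2-input, 2-hidden-neuron example
out = forward([1.0, 0.0], w_h=[[0.5, 0.0], [0.0, 1.0]], b_h=[0.0, 0.0],
              w_o=[2.0, 1.0], b_o=0.1)
print(round(out, 4))                          # 1.0242
print(aard([275.0, 285.0], [250.0, 300.0]))  # 7.5
```

Using the weights of Table 4 in this function would additionally require the input and output scaling applied by the author, which is not reproduced here.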

Similar calculations are carried out to obtain the outputs of each neuron in the following layer until the output layer is reached. For the error to be minimized, the transfer function f should be differentiable. Two types of transfer function were used in the ANN: the hyperbolic tangent function (tansig) in the hidden layer, defined as

$$f(N_{jk}) = \frac{e^{N_{jk}} - e^{-N_{jk}}}{e^{N_{jk}} + e^{-N_{jk}}} \tag{6}$$

and the linear function (purelin) in the output layer, defined as

$$f(N_{jk}) = N_{jk} \tag{7}$$

Once all the neurons of the ANN have an activation value for a given input pattern, the algorithm computes the error at each neuron, except those of the input layer. After the output values are found, the weights of all layers of the network are updated by PSO using eqs 2 and 3.21

The PSO algorithm is very different from traditional training methods. Each neuron has a position and a velocity: the position corresponds to the weight of the neuron, and the velocity controls how much that weight is updated. If a neuron is farther away (its position is farther from the global best position), it adjusts its weight more than a neuron that is closer to the global best. PSO initializes all weights to random values and starts training each one. On each pass through the data set, PSO compares the fitness of each set of weights; the network with the highest fitness is considered the global best. The other weights are then updated based on the global best network rather than on their own error or fitness. Figure 1 presents a block diagram of the program, written as a MatLab M-file.

Database Used and Training

In this study, 439 organic compounds were used to train the network, and the mp values of 100 substances not used in the training process were then predicted. (The 100 organic compounds used in the prediction step were taken from the work of Dearden1 to allow a comparison of the results.) Molecular mass M (size), dipole moment µ in debye (polarity), and the structure of the molecules, represented by the number of well-defined groups forming each molecule, were provided as variables. Molecular mass and dipole moment were chosen to characterize

Table 2. Structural Groups Used in the Proposed GCM + ANN + PSO Model

no.: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 26 25 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44

parameter (group): M, µ, -CH3, -CH2-, >CH-, >C<, =CH2, =CH-, =C<, =C=, ≡CH, ≡C-, -OH, -O-, >C=O, -CHO, -COOH, -COO-, HCOO-, =O, -NH2, -NH-, >N-, =N-, -CN, -NO2, -S-, -F, -Cl, -Br, -I, -CH2-(ring), >CH-(ring), =CH-(ring), >CN-(ring), =N-(ring), -SH, -S-(ring)

max value: 394.8 6.3 7 26 4 2 2 2 2 1 1 2 5 2 2 2 4 2 1 1 2 2 2 2 2 3 2 6 6 4 2 10 6 20 2 6 3 6 2 2 4 2 2 1

no. occurrence, training set: 439 439 249 200 45 35 26 30 12 4 2 3 35 14 15 33 44 19 4 3 21 17 5 2 11 21 5 13 34 19 7 32 11 176 8 174 12 26 9 8 5 11 23 5

no. occurrence, predicting set: 100 100 63 36 14 8 11 7 4 0 0 0 7 2 5 1 3 4 1 1 3 2 1 0 4 0 3 9 12 3 2 8 2 22 0 17 3 3 1 1 0 4 3 1

no. occurrence, total set: 539 539 312 236 59 43 37 37 16 4 2 3 42 16 20 34 47 23 5 4 24 19 6 2 15 21 8 22 46 22 9 40 13 198 8 191 15 29 10 9 5 15 26 6

Table 3. Overall Minimum, Maximum, and Average Deviations for the Calculated Melting Points for All Compounds Using the GCM + ANN + PSO Model

ANN model | training set | prediction set | total set
--- | --- | --- | ---
no. substances | 439 | 100 | 539
ARDmin | 0.0 | 0.0 | 0.0
ARDmax | 23.2 | 20.9 | 23.2
ARD | 0.5 | 0.9 | 0.6
AARD | 6.1 | 7.1 | 6.4
no. AARDs < 10 | 334 | 71 | 415
no. AARDs > 20 | 10 | 2 | 12

the different molecules.2,16,20,21 All 539 substances and the properties used are listed in Table S1 (see the Supporting Information). These values are particularly important for verifying that an acceptable range of melting temperatures is covered in this study. In other works on estimating the mp,1 the authors compare their models with a large number of experimental data points for different substances. The problem is that the reliability of these experimental data is never established: no uncertainties are given for any of the experimental data, which makes it impossible to interpret the deviations between the models and the data. This is especially troubling given the history of unreliable experimental data that have been collected


Table 4. Optimum Weights and Biases for the GCM + ANN + PSO Model (44-4-1)

Hidden neuron j = 1: wji (i = 1, ..., 44) = 1.0119 -0.3068 -0.6494 -0.9188 0.1476 -0.1507 -0.1821 -0.0971 0.6494 1.6389 1.7596 0.1732 0.0237 -0.1618 -0.2790 -0.2815 -0.8688 -0.4188 1.3901 -0.1001 -0.0514 0.4512 0.1011 0.3264 -0.2362 -0.0646 -0.1617 2.0647 -1.1231 -1.2018 -1.0369 -0.2571 0.0182 -1.3033 -0.2504 0.2035 0.1233 -1.0043 -0.3002 -0.2680 -0.5453 0.6047 -0.3049 -0.3769; bj = -0.0272; wjk = 7.3071

Hidden neuron j = 2: wji (i = 1, ..., 44) = -23.4570 -8.6386 -8.0246 -7.0443 -29.9660 -3.5915 -13.4940 11.9270 -1.6739 -2.7347 5.6313 -0.2434 -1.3245 10.3440 36.4380 -3.7999 19.4270 -6.3221 3.2960 -10.4370 12.8600 20.1000 22.6920 -7.3443 -16.8620 -14.1270 6.9791 9.6475 -15.4380 0.9313 3.9707 -7.7004 37.3980 -29.9200 4.2330 13.8740 -8.9897 -17.0540 12.9230 11.8150 -0.5712 -13.2720 -2.0856 0.1527; bj = 5.7824; wjk = -0.1418

Hidden neuron j = 3: wji (i = 1, ..., 44) = 1.3507 -0.3514 -0.8194 -0.5711 0.0177 -0.0945 -0.0198 -0.2956 0.1220 -0.0691 4.1529 4.7261 -0.6668 0.3169 -0.3021 -0.1376 -2.6882 -0.6860 -0.1583 0.1473 -0.3786 0.3227 -0.3125 0.7470 -0.6097 -0.2524 0.7775 -0.4443 -1.1970 -1.1529 -1.3371 -0.4767 -0.6008 -1.4478 0.1290 -0.6229 -0.4812 -1.0167 -0.7010 -0.8143 -0.8286 0.1648 -0.2093 -0.2137; bj = -5.8045; wjk = -1.3692

Hidden neuron j = 4: wji (i = 1, ..., 44) = 5.4991 -0.5884 -2.6330 -3.9508 -0.5402 0.7508 -0.5464 -0.4579 -0.6296 -6.8844 -7.3872 14.7350 -1.3353 0.7799 0.5038 -0.0739 -4.0115 -1.2890 -0.7039 0.2732 -0.3637 0.2891 -0.2936 13.0390 -2.0190 -1.0235 0.6700 -14.3880 -2.9426 -3.9744 -3.5822 -1.9044 -1.3729 -3.4589 0.4653 -2.1021 -2.9747 0.2751 -0.8600 -1.2536 -0.1817 -0.2709 -0.6879 0.0110; bj = -36.6540; wjk = 0.6727

Output neuron: bk = -6.5082

for the melting temperature.1 It would be much more useful and productive to select only the best experimental data sets (with well-established uncertainties) and to use these data sets to develop and test models. In this work, all the properties of interest (M, µ, mp) were taken from the DIPPR database,24 which includes estimated uncertainties for the experimental data. So that the network can discriminate between the substances considered and learn in an optimal way, the properties used cover wide ranges: 85-561 K for the mp of the organic compounds. In addition, the substances included in the study have very different physical and chemical characteristics, ranging from low-molecular-weight substances such as formaldehyde (M = 30) to high-molecular-weight substances such as n-octacosane (M = 395), and from nonpolar substances (µ = 0) such as benzene and anthracene to highly polar substances such as o-dinitrobenzene (µ = 6.3). Thus, the problem is not straightforward, and this is most probably one of the reasons why the mp has not been treated using an ANN as proposed in this paper. The structure of each molecule was represented by the number of well-defined groups forming it. The value associated with each structural group was defined as follows: 0 when the group does not appear in the substance, and n when


Table 5. Comparison of the Proposed GCM + ANN + PSO Model with Other GCMs and QSPR Methods Found in the Literature to Determine the Melting Points of Organic Compounds (a)

MPBPVP1

no.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83

formula

CBrCl3 CCl4 CClF3 CH2I2 CH2O CH3F CH3I CH4O CH4S CH5N CHCl3 CHClF2 CHF3 C2HCl5 C2H2F2 C2H3Br C2H3F C2H3F3 C2H3N C2H4Cl2 C2H4Cl2 C2H4O2 C2H5F C2H6O C2H6S C2H6S2 C2H7N C2N2 C3H5Cl C3H6 C3H6O C3H6O2 C3H6O2 C3H7Br C3H8 C3H8O C3H8S C4H4O C4H4S C4H5N C4H7N C4H8 C4H8Cl2 C4H8O C4H8O C4H8O C4H8O C4H8O2 C4H10 C4H10O C4H10O2 C4H10S C5H5N C5H9N C5H10 C5H10O C5H10O2 C5H11N C5H12S C6H4Cl2 C6H5F C6H6 C6H7N C6H7N C6H8N2 C6H10O C6H12 C6H12O C6H12O C6H12O2 C6H12O2 C6H12O2 C6H12O2 C6H14 C6H14O C7H7Cl C7H8O C7H16 C7H16 C7H16 C8H8 C8H8O C8H10

CASN

name

mp^exp

75-62-7 56-23-5 75-72-9 75-11-6 50-00-0 593-53-3 74-88-4 67-56-1 74-93-1 74-89-5 67-66-3 75-45-6 75-46-7 76-01-7 75-38-7 593-60-2 75-02-5 420-46-2 75-05-8 75-34-3 107-06-2 64-19-7 353-36-6 64-17-5 75-18-3 624-92-0 124-40-3 460-19-5 557-98-2 115-07-1 67-64-1 79-20-9 79-09-4 106-94-5 74-98-6 71-23-8 75-33-2 110-00-9 110-02-1 109-97-7 109-74-0 115-11-7 616-21-7 78-93-3 109-99-9 123-72-8 109-92-2 123-91-1 106-97-8 60-29-7 6982-25-8 75-66-1 110-86-1 110-59-8 287-92-3 107-87-9 542-55-2 110-89-4 628-29-5 95-50-1 462-06-6 71-43-2 62-53-3 108-89-4 100-63-0 108-94-1 563-78-0 108-93-0 589-38-8 110-19-0 142-62-1 123-86-4 105-54-4 110-54-3 628-81-9 100-44-7 108-39-4 142-82-5 464-06-2 565-59-3 100-42-5 98-86-2 95-47-6

bromotrichloromethane tetrachloromethane chlorotrifluoromethane diiodomethane formaldehyde methyl fluoride methyl iodide methanol methyl mercaptan methylamine chloroform chlorodifluoromethane trifluoromethane pentachloroethane 1,1-difluoroethene vinyl bromide vinyl fluoride 1,1,1-trifluoroethane acetonitrile 1,1-dichloroethane 1,2-dichloroethane acetic acid ethyl fluoride ethanol dimethyl sulfide dimethyl disulfide dimethylamine cyanogen 2-chloropropene propene acetone methyl acetate propionic acid 1-bromopropane propane 1-propanol isopropyl mercaptan furan thiophene pyrrole n-butyronitrile isobutene 1,2-dichlorobutane methyl ethyl ketone tetrahydrofuran 1-butanal ethyl vinyl ether 1,4-dioxane n-butane diethyl ether 2,3-butanediol tert-butyl mercaptan pyridine valeronitrile cyclopentane 2-pentanone isobutyl formate piperidine methyl n-butyl sulfide 1,2-dichlorobenzene fluorobenzene benzene aniline 4-methylpyridine phenylhydrazine cyclohexanone 2,3-dimethyl-1-butene cyclohexanol 3-hexanone isobutyl acetate n-hexanoic acid n-butyl acetate ethyl n-butyrate n-hexane n-butyl ethyl ether benzyl chloride 3-cresol n-heptane 2,2,3-trimethylbutane 2,3-dimethylpentane styrene acetophenone 2-xylene

252.2 250.3 92.2 279.3 181.2 131.4 206.7 175.5 150.2 179.7 209.6 115.7 118.0 244.2 129.2 135.4 112.7 161.8 229.3 176.2 237.5 289.8 130.0 159.1 174.9 188.4 181.0 245.3 135.8 87.9 178.5 175.2 252.5 163.2 85.5 147.0 142.6 187.6 234.9 249.7 161.3 132.8 210.0 186.5 164.7 176.8 157.4 285.0 134.9 156.9 280.8 274.3 231.5 177.0 179.3 196.3 177.4 262.7 175.3 256.2 230.9 278.7 267.1 276.8 292.4 242.0 115.9 296.6 217.5 174.3 269.3 199.7 175.2 177.8 170.2 234.2 285.4 182.6 248.6 203.2 242.5 292.8 248.0

mcalc p 241.3 214.8 135.9 217.3 162.3 112.4 178.0 172.2 157.9 174.9 194.3 133.8 102.0 249.4 116.9 179.7 125.0 121.5 189.4 176.8 196.4 251.9 126.5 185.4 165.5 181.2 167.0 249.1 155.6 137.8 179.6 178.1 264.2 194.1 139.3 198.2 172.2 186.1 227.5 228.4 214.8 142.3 206.0 192.7 188.3 193.0 170.6 209.6 152.9 172.0 238.8 190.9 228.7 227.1 178.7 205.5 201.0 248.5 204.1 258.9 200.2 195.3 267.0 247.3 298.4 243.6 156.9 239.8 218.1 204.8 299.4 216.4 216.4 177.9 197.9 246.0 288.9 192.2 174.2 168.5 224.9 263.3 232.5

ProPred1

|%∆mp|

mcalc p

4.3 14.2 47.4 22.2 10.4 14.5 13.9 1.9 5.1 2.7 7.3 15.6 13.6 2.1 9.5 32.7 10.9 24.9 17.4 0.3 17.3 13.1 2.7 16.5 5.4 3.9 7.7 1.5 14.6 56.7 0.6 1.7 4.6 18.9 62.9 34.8 20.7 0.8 3.2 8.6 33.2 7.1 1.9 3.3 14.3 9.2 8.4 26.5 13.3 9.6 15.0 30.4 1.2 28.3 0.4 4.7 13.3 5.4 16.4 1.1 13.3 29.9 0.1 10.7 2.1 0.6 35.3 19.2 0.3 17.5 11.2 8.4 23.5 0.0 16.3 5.0 1.2 5.2 29.9 17.1 7.3 10.1 6.3

269.4 264.2 161.2 209.6 --96.7 143.2 184.1 74.4 206.1 206.9 --180.5 286.4 147.3 135.4 112.7 114.9 203.4 172.3 198.9 308.5 120.0 194.4 129.8 181.8 166.9 277.2 167.1 96.3 191.2 155.1 313.1 160.2 73.2 204.0 133.9 195.8 204.1 261.0 185.8 120.4 200.6 197.3 191.5 212.3 144.4 238.7 94.2 136.1 274.3 269.1 236.6 195.9 154.5 206.7 180.9 272.2 170.0 241.2 201.1 185.5 285.3 287.5 302.3 265.8 129.2 287.9 219.1 187.0 325.8 189.6 185.9 136.1 163.0 248.5 304.4 143.7 151.8 143.4 212.2 277.0 193.8

ChemOffice1

Joback-Reid6

|%∆mp|

mcalc p

|%∆mp|

mcalc p

|%∆mp|

mcalc p

BPNN

6.8 5.5 74.9 25.0 --26.4 30.7 4.9 50.5 14.7 1.3 --53.0 17.3 14.0 0.0 0.0 29.0 11.3 2.2 16.3 6.4 7.7 22.2 25.8 3.6 7.8 13.0 23.1 9.5 7.1 11.5 24.0 1.8 14.4 38.8 6.1 4.4 13.1 4.5 15.2 9.4 4.5 5.8 16.3 20.1 8.3 16.2 30.2 13.3 2.3 1.9 2.2 10.7 13.9 5.3 2.0 3.6 3.1 5.9 12.9 33.5 6.8 3.8 3.4 9.8 11.4 3.0 0.7 7.3 21.0 5.1 6.1 23.5 4.2 6.1 6.6 21.3 39.0 29.5 12.5 5.4 21.9

252.6 222.7 134.7 216.7 159.0 101.2 158.7 161.5 137.1 183.9 175.4 116.7 87.4 248.9 97.4 169.9 110.7 116.1 176.9 156.7 171.7 272.5 112.5 172.7 146.3 180.7 164.6 241.9 137.4 121.4 173.1 165.5 283.8 183.0 123.2 184.0 144.6 177.7 234.6 --199.4 118.7 179.3 184.4 176.2 176.4 154.9 199.2 134.4 156.7 226.1 173.3 231.2 210.7 160.8 195.6 194.9 262.4 180.1 255.8 184.0 170.9 266.7 254.9 319.3 236.8 126.3 225.2 206.9 184.3 317.6 199.3 199.3 155.2 179.2 224.6 306.4 168.2 155.7 138.2 204.2 255.9 218.5

0.2 11.1 46.1 22.4 12.3 23.0 23.2 8.0 8.7 2.3 16.4 0.8 26.0 1.9 24.6 25.5 1.8 28.3 22.9 11.1 27.7 6.0 13.5 8.6 16.4 4.1 9.1 1.4 1.2 38.1 3.0 5.5 12.4 12.1 44.1 25.2 1.4 5.3 0.2 --23.6 10.7 14.6 1.1 7.0 0.2 1.6 30.1 0.4 0.1 19.5 36.8 0.2 19.0 10.4 0.4 9.9 0.1 2.7 0.2 20.3 38.7 0.2 7.9 9.2 2.2 8.9 24.1 4.9 5.7 17.9 0.2 13.8 12.8 5.3 4.1 7.3 7.9 37.4 32.0 15.8 12.6 11.9

253.0 223.1 135.1 217.2 120.3 101.6 159.1 161.9 117.4 184.3 175.8 129.0 87.8 249.3 97.8 170.3 111.1 116.5 177.3 157.1 172.1 272.9 112.9 173.1 146.7 181.1 165.0 242.3 137.8 121.8 173.5 165.9 284.2 183.4 123.6 184.4 124.9 178.1 --223.4 199.8 119.1 179.7 184.8 176.6 176.8 155.3 199.6 134.8 157.1 226.5 --231.6 211.1 161.3 196.0 --262.8 180.5 256.2 184.4 171.3 267.1 255.3 319.7 237.2 126.7 251.8 207.3 184.7 318.0 188.4 199.7 157.4 179.6 225.0 306.8 168.7 156.1 138.7 204.6 256.3 218.9

0.3 10.9 46.7 22.2 33.6 22.6 23.0 7.8 21.8 2.6 16.1 11.5 25.6 2.1 24.3 25.9 1.3 28.0 22.7 10.8 27.5 5.8 13.1 8.8 16.1 3.9 8.8 1.2 1.5 38.6 2.8 5.3 12.6 12.4 44.6 25.5 12.4 5.1 --10.5 23.9 10.3 14.4 0.9 7.2 0.1 1.3 30.0 0.0 0.1 19.3 --0.0 19.3 10.1 0.1 --0.0 3.0 0.0 20.2 38.5 0.0 7.8 9.4 2.0 9.3 15.1 4.7 6.0 18.1 5.6 14.0 11.5 5.5 3.9 7.5 7.6 37.2 31.8 15.7 12.5 11.8

214.0 200.7 147.4 219.5 127.4 135.0 178.3 178.1 189.2 169.0 185.3 152.3 132.0 226.8 141.6 173.9 142.9 140.9 193.2 178.8 177.5 213.3 144.1 187.2 199.6 252.9 168.7 240.2 170.7 154.2 189.4 182.4 222.5 184.3 155.4 196.4 208.8 185.4 239.1 197.3 211.5 164.2 197.1 198.6 186.3 186.8 172.2 200.0 164.6 173.5 249.0 217.7 208.8 220.7 184.4 207.7 183.5 208.1 227.0 232.6 196.0 194.8 229.9 221.2 252.4 227.6 183.7 230.8 216.9 211.1 249.9 200.7 209.8 182.8 170.3 222.9 243.8 192.0 194.3 194.6 215.1 250.4 244.9

ANN + PSO

|%∆mp| mcalc |%∆mp| p 15.1 19.8 60.0 21.4 29.7 2.8 13.7 1.5 26.0 6.0 11.6 31.6 11.9 7.1 9.7 28.5 26.8 12.9 15.7 1.5 25.2 26.4 10.9 17.7 14.1 34.2 6.8 2.0 25.7 75.4 6.2 4.1 11.9 13.0 81.8 33.6 46.4 1.1 1.8 21.0 31.2 23.6 6.1 6.5 13.2 5.7 9.5 29.8 22.0 10.6 11.3 20.6 9.8 24.7 2.9 5.8 3.5 20.8 29.5 9.2 15.1 30.1 13.9 20.1 13.7 5.9 58.5 22.2 0.3 21.1 7.2 0.5 19.8 2.8 0.1 4.8 14.6 5.2 21.8 4.3 11.3 14.5 1.2

248.7 283.7 104.7 285.7 181.2 118.4 198.1 138.9 142.7 175.9 237.4 126.8 118.5 255.9 125.0 136.4 107.0 156.2 213.0 170.8 206.1 306.3 107.3 136.7 151.3 191.2 155.6 263.0 159.8 87.3 198.7 197.8 295.2 165.2 89.5 137.4 151.7 175.5 196.9 267.5 180.6 140.9 201.3 206.3 171.5 204.4 168.6 281.5 123.5 166.3 252.2 264.6 270.1 149.2 193.0 209.8 187.3 222.9 181.9 258.8 237.5 256.9 259.1 241.2 315.5 237.0 136.9 323.8 210.0 196.0 265.3 221.3 177.1 173.8 170.2 224.6 304.5 180.1 203.7 203.2 225.9 278.2 248.0

1.4 13.3 13.6 2.3 0.0 9.8 4.2 20.9 5.0 2.1 13.3 9.6 0.5 4.8 3.2 0.8 5.0 3.5 7.1 3.1 13.2 5.7 17.4 14.1 13.5 1.4 14.0 7.2 17.7 0.7 11.3 12.9 16.9 1.3 4.7 6.5 6.4 6.4 16.2 7.1 12.0 6.1 4.2 10.7 4.2 15.7 7.2 1.2 8.4 6.1 9.9 3.5 16.6 15.7 7.7 6.9 5.6 15.1 3.8 1.0 2.8 7.8 3.0 12.9 7.9 2.1 18.1 9.2 3.4 12.4 1.5 10.8 1.1 2.3 0.0 4.1 6.7 1.4 18.1 0.0 6.9 5.0 0.0


Table 5. Continued MPBPVP1

ProPred1

ChemOffice1

Joback-Reid6

BPNN

ANN + PSO

no.

formula

CASN

name

mp^exp

mp^calc |%∆mp| mp^calc |%∆mp| mp^calc |%∆mp| mp^calc |%∆mp| mp^calc |%∆mp| mp^calc |%∆mp|

84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 AARD AARDmax

C8H10O C8H11N C8H18 C8H18O C8H18O C9H7N C9H10 C9H12 C9H12 C9H12 C9H12 C9H18 C10H14 C10H22O C12H22 C16H32 C16H34

105-67-9 121-69-7 540-84-1 111-87-5 104-76-7 91-22-5 98-83-9 98-82-8 108-67-8 622-96-8 95-63-6 1678-92-8 141-93-5 112-30-1 92-51-3 629-73-2 544-76-3

2,4-xylenol n,n-dimethylaniline 2,2,4-trimethylpentane 1-octanol 2-ethyl-1-hexanol quinoline methylstyrene cumene 1,3,5-trimethylbenzene 4-ethyltoluene 1,2,4-trimethylbenzene n-propylcyclohexane 1,3-diethylbenzene 1-decanol bicyclohexyl 1-hexadecene n-hexadecane

297.7 275.7 165.8 257.7 203.2 258.4 249.2 177.1 228.4 210.8 229.3 178.3 189.3 280.1 276.8 277.5 291.3

298.6 248.6 187.2 258.7 247.7 310.8 228.2 226.9 250.7 244.5 250.7 223.6 256.2 281.1 264.7 294.5 295.7

a

0.3 9.8 12.9 0.4 21.9 20.3 8.4 28.1 9.7 15.9 9.3 25.4 35.3 0.4 4.4 6.1 1.5 12.9 62.9

316.5 266.3 179.5 244.4 239.3 257.9 235.6 199.4 218.4 241.4 218.9 189.6 177.3 258.0 271.8 238.3 234.8

6.3 3.4 8.2 5.2 17.8 0.2 5.5 12.5 4.4 14.5 4.6 6.3 6.3 7.9 1.8 14.1 19.4 12.8 74.9

330.2 238.4 166.9 240.3 225.3 321.5 201.5 202.2 242.3 229.7 242.3 198.2 241.0 262.9 239.4 267.9 269.7

10.9 13.5 0.6 6.8 10.9 24.4 19.1 14.1 6.1 8.9 5.6 11.2 27.3 6.1 13.5 3.5 7.4 12.4 46.1

330.6 238.8 167.3 240.7 225.7 321.9 201.9 202.6 242.7 230.1 242.7 198.6 241.4 263.3 239.8 268.3 270.1

11.1 13.4 0.9 6.6 11.1 24.6 19.0 14.4 6.2 9.2 5.8 11.4 27.6 6.0 13.4 3.3 7.3 12.8 46.7

256.2 222.3 203.5 242.1 243.4 259.1 241.1 226.8 232.0 228.8 232.0 222.4 237.9 260.4 262.0 273.1 274.3

13.9 19.4 22.7 6.0 19.8 0.3 3.2 28.0 1.6 8.5 1.2 24.8 25.7 7.0 5.3 1.6 5.8 16.1 81.8

341.4 251.3 176.6 241.2 202.7 287.8 249.2 213.6 230.2 228.1 244.3 187.4 220.7 264.3 251.0 280.3 290.5

14.7 8.8 6.5 6.4 0.2 11.4 0.0 20.6 0.8 8.2 6.5 5.1 16.6 5.6 9.3 1.0 0.3 7.1 20.9

(a) $|\%\Delta m_p| = \left| \dfrac{m_p^{calc} - m_p^{exp}}{m_p^{exp}} \right| \times 100$.

the group appears n times in the substance. For instance, for ethyl vanillin, the property data are the following: M = 166.2, µ = 4.2, and the structure of the molecule [-CH3] = 1, [-CH2-] = 1, [-O-] = 1, [-CHO] = 1, [=CH-(ring)] = 3, [=C