
Ind. Eng. Chem. Res. 2010, 49, 12702–12708

Prediction of Flash Point of Organosilicon Compounds Using Quantitative Structure Property Relationship Approach

Chan-Cheng Chen,*,† Horng-Jang Liaw,‡ and Yi-Jen Tsai‡

Department of Safety, Health and Environmental Engineering, National Kaohsiung First University of Science and Technology, 2 Jhuoyue Rd., Nanzih, Kaohsiung City 811, Taiwan, Republic of China, and Department of Occupational Safety and Health, China Medical University, 91 Hsueh-Shih Rd., Taichung 40402, Taiwan, Republic of China

Flash point (FP) is the primary property used to classify flammable liquids when assessing their fire and explosion hazards. Because new compounds are continually being discovered or synthesized, there is often a significant gap between the demand for FP data and their availability. Reliable methods for estimating the FP of a compound are therefore indispensable. In this work, a quantitative structure property relationship study is presented for predicting the FP of organosilicon compounds. To build and validate the proposed models, a data set of 230 organosilicon compounds is collected and divided into a training set of 184 compounds and a testing set of 46 compounds. The stepwise regression method is used to select, from 1538 molecular descriptors, those required for predicting the FP of organosilicon compounds. Depending on the p-value for accepting a descriptor into the model, models with different numbers of descriptors are obtained: a 13-descriptor model with a p-value of 5 × 10⁻⁴ and a 6-descriptor model with a p-value of 1 × 10⁻⁵. The 6-descriptor model fits the training data with R² = 0.8981 and predicts the test data with Q² = 0.8533; the 13-descriptor model fits the training data with R² = 0.9293 and predicts the test data with Q² = 0.9245. The average predictive errors are less than 5% for both models. Because the proposed models use only descriptors calculated from the molecular structure, they are useful for many practical applications.

1. Introduction

Flash point (FP) is the lowest temperature at which a liquid chemical emits enough vapor to form an ignitable mixture with air under the test conditions. Thus, FP is the primary property used to classify flammable liquids when assessing their fire and explosion hazards.
In fact, in most countries, regulations for safely handling and transporting liquid chemicals depend on this classification, so FP is very important safety information for liquid chemicals in the process industries. Although experimental FP data are desirable, there is often a significant gap between the demand for such data and their availability because new compounds are continually being discovered or synthesized. Moreover, for some toxic or radioactive compounds, conducting experiments to determine the FP is extremely difficult. In this regard, a reliable theoretical method to estimate the FP is indispensable.

In the literature, many correlations have been proposed to predict the FP of organic compounds, and a review of these correlations can be found in the work by Vidal et al.¹ Roughly speaking, these correlations can be divided into three categories. The first category contains correlations that need other physical properties such as boiling point, density, enthalpy of vaporization, and so on.²⁻⁶ Correlations in this category have some important disadvantages. Because other physical properties are required, their predictive ability depends on the accuracy of those properties or of the methods used to predict them. Furthermore, if any of the required properties is not available, these correlations will not work. Thus, they cannot be applied to predict the FP of a novel substance, which often lacks experimental data for most physical properties.

The second category contains the well-known structure group contribution (SGC) correlations.⁷⁻¹¹ In this category, a group contribution table of the functional groups deemed to influence the FP is first constructed, and then a correlation between the FP and the functional group counts of a compound is developed. This correlation is then used to predict the FP of a compound according to its functional group counts. Although correlations in this category do not need any physical property to predict the FP, they are limited to cases in which every functional group has been defined and considered in the group contribution table. Thus, this approach does not apply to a novel substance that has functional groups not defined in the table.

The third category is known as the quantitative structure property relationship (QSPR) method. In this category, many molecular-based parameters, often called “molecular descriptors”, are used, and these molecular descriptors are calculated directly from the chemical structure of a compound. Because this approach does not need any physical property and can easily be applied to predict the FP of a novel substance, it has recently been adopted in many research works.¹²⁻¹⁶ Katritzky et al. used the CODESSA software to explore a predictive model of FP for a data set of 271 pure organic compounds; in a recent work they expanded the previous work to a data set of 758 organic compounds, and an artificial neural network (ANN) was adopted as the model. Gharagheizi and Alamdari used a genetic algorithm-based multivariate linear regression (GA-MLR) technique to explore a data set of 1030 organic compounds. Pan et al. also adopted the GA-MLR approach to explore a predictive model of FP for a data set of 314 pure hydrocarbons. Patel et al.

* To whom correspondence should be addressed. Tel: 886-7-6011000 ext. 2311. Fax: 886-7-6011061. E-mail: chch_chen@ccms.nkfust.edu.tw.
† Department of Safety, Health and Environmental Engineering, National Kaohsiung First University of Science and Technology.
‡ Department of Occupational Safety and Health, China Medical University.

10.1021/ie101381b © 2010 American Chemical Society. Published on Web 11/04/2010

Ind. Eng. Chem. Res., Vol. 49, No. 24, 2010

categorized the 236 organic compounds into five classes to enhance the predictive performance of the multivariate linear regression model, and they also considered an ANN model as a replacement for the multivariate linear regression model. Although QSPR studies for predicting the FP of organic compounds have been explored extensively in recent years, QSPR studies for predicting the FP of organosilicon compounds have, to the best of the authors’ knowledge, not been reported in the literature. However, as semiconductor-related industries have made great progress in recent years and more and more organosilicon compounds are synthesized and employed in these industries, it seems important to develop a predictive model of FP for such compounds. In this work, a QSPR study is presented for predicting the FP of organosilicon compounds.

2. Methodology

In the present study, experimental FPs of 230 organosilicon compounds were collected from the work by Hshieh and The Chemical Database.²,¹⁷ The organosilicon compounds collected in this study show wide variability in both the FP and the molecular weight. Their FPs range from 204 K to 466 K and their molecular weights range from 60.2 to 607.4. The collected data are randomly distributed into a training set of 184 compounds and a testing set of 46 compounds. The ratio of the number of compounds in the testing set to the total number of collected compounds is chosen to be 1/5, a value commonly recommended for validating empirical models. Figure 1 compares the distribution of molecular weight for the training set with that for the testing set, and Figure 2 compares the distribution of FP for the training set with that for the testing set. These two figures show that the molecular weight distributions of the training set and the testing set are quite similar; however, the compounds with FP higher than 400 K all fall into the training set. As the number of such compounds is very small, we think this will not distort the conclusion.

In this work, the molecular structures of the organosilicon compounds are drawn in the Hyperchem software and preoptimized using the MM+ molecular mechanics force field.¹⁸ Since the values of some types of molecular descriptors depend on bond lengths and bond angles, optimized chemical structures are necessary to avoid errors in calculating these descriptors. In the next step, the Dragon software is used to calculate all the molecular descriptors for all organosilicon compounds from their optimized chemical structures.¹⁹ Dragon can calculate up to 3224 descriptors for every molecule. However, some of these molecular descriptors give the same numerical value for all compounds in our data set. After such descriptors are dropped, 1538 molecular descriptors remain. These 1538 molecular descriptors are then considered as candidate regressor variables in a multiple linear regression (MLR) model. When the MLR model, which is depicted in eq 1, is built from a large number of regressors (i.e., molecular descriptors in this study), there might be interactions between these regressors, and the correlations between the regressors should be properly assessed. Otherwise, it is possible to include redundant regressors that confuse the identification of significant effects in a model. Thus, a key problem in developing a QSPR model is to find a model that can predict the desired property with the fewest molecular descriptors as well as with the highest accuracy.

Figure 1. Distributions of molecular weight: (a) training set; (b) testing set.

Figure 2. Distributions of flash point: (a) training set; (b) testing set.

The MLR model:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{1} $$

Selecting a subset of regressors to create a model with a smaller number of regressors is the problem of feature selection. Selection criteria usually involve minimizing a specific measure of predictive error for models fit to different subsets of the regressors. Algorithms are then applied to search for a specific subset of regressors that optimally models the measured responses, subject to constraints such as required or excluded features and the size of the subset. In the literature, many algorithms have been proposed to accomplish this task. Stepwise regression, which is adopted in this work, is a systematic method for adding and removing regressors from an MLR model based on their statistical significance in a regression.²⁰ The stepwise regression method begins with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential regressor. If a regressor is not currently in the model, the null hypothesis is that the regressor would have a zero coefficient if it were added to the model. If there is sufficient evidence to reject the null hypothesis, the regressor is added to the model. Conversely, if a regressor is currently in the model, the null hypothesis is that the regressor has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the regressor is removed from the model. However, depending on the regressors included in the initial model and the order in which regressors are moved in and out, the method may build different models from the same set of potential regressors. In this sense, models obtained by stepwise regression are locally optimal but may not be globally optimal. To overcome this drawback, a random search technique is introduced in the present work to set up the initial model in the algorithm automatically.
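The add/remove loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it starts from an empty model rather than a randomly chosen one, and it compares the partial F-statistic against fixed thresholds (`f_enter`, `f_remove`, both hypothetical parameters) instead of converting it to a p-value, which avoids needing an F-distribution routine. It assumes NumPy is available.

```python
import numpy as np

def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def stepwise_select(X, y, f_enter=20.0, f_remove=10.0, max_steps=100):
    """Bidirectional stepwise selection driven by partial F-statistics.

    f_enter and f_remove are assumed thresholds standing in for the
    p-to-enter and p-to-remove settings used in the text.
    """
    n, p = X.shape
    selected = []
    for _ in range(max_steps):
        changed = False

        # Forward step: add the candidate with the largest partial F above f_enter.
        rss_cur = _rss(X, y, selected)
        best_j, best_f = None, f_enter
        for j in range(p):
            if j in selected:
                continue
            rss_new = _rss(X, y, selected + [j])
            dof = n - len(selected) - 2  # intercept + current terms + candidate
            f_stat = (rss_cur - rss_new) / (rss_new / dof)
            if f_stat > best_f:
                best_j, best_f = j, f_stat
        if best_j is not None:
            selected.append(best_j)
            changed = True

        # Backward step: drop the term with the smallest partial F below f_remove.
        if selected:
            rss_full = _rss(X, y, selected)
            dof = n - len(selected) - 1
            worst_j, worst_f = None, f_remove
            for j in selected:
                reduced = [k for k in selected if k != j]
                f_stat = (_rss(X, y, reduced) - rss_full) / (rss_full / dof)
                if f_stat < worst_f:
                    worst_j, worst_f = j, f_stat
            if worst_j is not None:
                selected.remove(worst_j)
                changed = True

        if not changed:
            break
    return selected
```

Keeping `f_enter` larger than `f_remove` mirrors the choice in the text of a strict entry criterion and a lenient removal criterion, and it prevents a term from being removed immediately after it is added.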


Table 1. Performance in Predicting the FP of Organosilicon Compounds by Existing QSPR Models

models                        reported R²   reported Q²   Q²_organosilicon   R²_refitting
Gharagheizi and Alamdari³     0.9669        0.9708           0.3966          0.7569
Pan et al.¹¹                  0.9890        0.9850           0.4208          0.7367
Patel and Mannan¹³
  monohydric alcohols         0.855         0.870           -0.5128          0.5709
  polyhydric alcohols         0.370         0.251           -3.1076          0.5150
  hydrocarbons                0.842         0.327          -11.1591          0.5489
  amines                      0.477         0.157          -33.6606          0.6502
  ethers                      0.680         0.770            0.2625          0.6497

3. Results and Discussions

We first examine the models and molecular descriptors proposed in the literature to see whether they can be applied directly to organosilicon compounds. Table 1 summarizes the results. The first column lists QSPR models reported in the literature for predicting the FP of organic compounds. Note that Patel et al. divided the organic compounds into five classes and built an individual QSPR model for each class. The second and third columns are the fitting ability (R²) and the predictive ability (Q²) reported in the original works. We calculate the predicted FP of all collected organosilicon compounds using the equation suggested in each work and then calculate the coefficient of determination for every case. Because organosilicon compounds were not used in building those models, the results are labeled Q²_organosilicon in the fourth column. In every case the predictive ability is far inferior to the reported value.

To evaluate whether this situation can be overcome by refitting the data rather than changing the molecular descriptors, the MLR models are refitted with the collected FP data of organosilicon compounds to obtain new parameters. The predicted FPs for all explored compounds and the coefficient of determination for each model are then calculated with these newly obtained parameters. Because these results describe the fitting performance of the original molecular descriptors, the new R² values are labeled R²_refitting in the fifth column of Table 1.
None of these refitted models gives satisfactory fitting performance, which strongly suggests that other molecular descriptors should be considered to predict the FP of organosilicon compounds. Thus, the stepwise regression algorithm is applied to the whole data set of 230 organosilicon compounds to search the above-mentioned 1538 molecular descriptors for those that are influential in predicting the FP of organosilicon compounds. Clearly, the p-values assigned to the hypotheses of adding a regressor to the model and

removing a regressor from the model are the crucial factors that determine the number of molecular descriptors in the final model. Because we want the number of molecular descriptors in the resulting model to be as small as possible, a higher probability of removing molecular descriptors and a lower probability of adding them are preferable. In this work, the p-value for a regressor to leave the model is set to 0.1, and we consider two p-values for a regressor to enter the model: 5 × 10⁻⁴ and 1 × 10⁻⁵. With the p-value for leaving the model held constant, a lower p-value for entering the model decreases the number of regressors in the final model. With these two settings, the final model contains 13 molecular descriptors for a p-value of 5 × 10⁻⁴ and 6 descriptors for a p-value of 1 × 10⁻⁵. Table 2 lists all 13 descriptors with their definitions for the case of 5 × 10⁻⁴. For the case of 1 × 10⁻⁵, the molecular descriptors are the first 6 descriptors in Table 2. The corresponding parameters in these two MLR models are then evaluated from the training set. Equations 2 and 3 show the MLR models with 13 descriptors and with 6 descriptors, respectively. In these two equations, the numbers in parentheses are the standard errors of the corresponding parameters.

Thirteen descriptors:

$$
\begin{aligned}
T_f ={}& 239.9566\,(\pm 8.5814)\ \text{[sign illegible in source]}\ 3.6747\,(\pm 2.6946)\,\mathrm{nCL} - 17.6729\,(\pm 2.9156)\,\mathrm{nHM} \\
&+ 14.2319\,(\pm 2.7570)\,\mathrm{Xu} + 0.8718\,(\pm 2.0895)\,\mathrm{SPI} + 2.0345\,(\pm 0.7322)\,\mathrm{MPC02} \\
&- 50.8682\,(\pm 7.7430)\,\mathrm{X0} + 10.3444\,(\pm 7.6524)\,\mathrm{X1} + 8.1776\,(\pm 2.3713)\,\mathrm{X3} \\
&+ 1.7189\,(\pm 0.3781)\,\mathrm{X2v} + 28.6903\,(\pm 5.1753)\,\mathrm{X0sol} + 43.2846\,(\pm 8.5578)\,\mathrm{R5m{+}} \\
&+ 3.8879\,(\pm 1.3793)\,\mathrm{H\text{-}051} + 9.6851\,(\pm 2.7932)\,\mathrm{F02[C\text{-}F]}
\end{aligned}
\tag{2}
$$

Six descriptors:

$$
\begin{aligned}
T_f ={}& 241.5076\,(\pm 3.6042) + 12.8651\,(\pm 0.9890)\,\mathrm{nCL} + 20.3763\,(\pm 1.5738)\,\mathrm{Xu} \\
&- 23.5925\,(\pm 2.3278)\,\mathrm{X0} + 14.1528\,(\pm 1.6434)\,\mathrm{X3} + 1.6728\,(\pm 0.3179)\,\mathrm{X2v} \\
&+ 5.9339\,(\pm 1.5868)\,\mathrm{H\text{-}051}
\end{aligned}
\tag{3}
$$

For these two proposed models, the predicted FPs are then calculated for both the training set and the testing set. The names of the collected organosilicon compounds,

Table 2. Molecular Descriptors for the Proposed Models

no.  type                         molecular descriptor   definition
1    constitutional descriptors   nCL                    number of chlorine atoms
2    topological descriptors      Xu                     Xu index
3    connectivity indices         X0                     connectivity index chi-0
4    connectivity indices         X3                     connectivity index chi-3
5    connectivity indices         X2v                    valence connectivity index chi-2
6    atom-centered fragments      H-051                  H attached to alpha-C
7    constitutional descriptors   nHM                    number of heavy atoms
8    topological descriptors      SPI                    superpendentic index
9    walk and path counts         MPC02                  molecular path count of order 02 (Gordon-Scantlebury index)
10   connectivity indices         X1                     connectivity index chi-1 (Randic connectivity index)
11   connectivity indices         X0sol                  solvation connectivity index chi-0
12   GETAWAY descriptors          R5m+                   R maximal autocorrelation of lag 5/weighted by atomic masses
13   2D frequency fingerprints    F02[C-F]               frequency of C-F at topological distance 2
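As a small illustration, the 6-descriptor model of eq 3 can be evaluated directly once the six descriptor values have been computed (for example with Dragon). The sketch below hard-codes the coefficients of eq 3; the function name and the dictionary layout are assumptions made for this example, and any descriptor values passed in are the caller's responsibility. The result is in kelvin, consistent with the FP range of the data set.

```python
# Coefficients of the 6-descriptor MLR model (eq 3); flash point in kelvin.
INTERCEPT_6 = 241.5076
COEFFS_6 = {
    "nCL": 12.8651,
    "Xu": 20.3763,
    "X0": -23.5925,
    "X3": 14.1528,
    "X2v": 1.6728,
    "H-051": 5.9339,
}

def predict_fp_6(descriptors):
    """Return the flash point (K) predicted by eq 3.

    `descriptors` is a hypothetical mapping from descriptor name to its
    numerical value for one compound.
    """
    missing = set(COEFFS_6) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT_6 + sum(
        coeff * descriptors[name] for name, coeff in COEFFS_6.items()
    )
```

A dictionary keyed by descriptor name keeps the mapping to Table 2 explicit and makes a missing descriptor fail loudly rather than silently defaulting to zero.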

Figure 3. Plot of experimental FP and predicted FP of 13-descriptor MLR model: (a) training set; (b) testing set.

Figure 4. Plot of experimental FP and predicted FP of 6-descriptor MLR model: (a) training set; (b) testing set.

calculated values of the 13 molecular descriptors, the experimental FPs, and the predicted FPs for the two proposed models are supplied in the Supporting Information. Figure 3 compares the experimental FPs with the FPs predicted by the 13-descriptor MLR model for both the training set and the testing set; the experimental FPs and the FPs predicted by the 6-descriptor MLR model are compared in Figure 4. These two figures show that both MLR models give satisfactory performance in fitting and in prediction, and that the 13-descriptor model performs better than the 6-descriptor one. To show their performance quantitatively, Table 3 summarizes important performance indices for the two models. For the average error in percentage, the 13-descriptor model gives 3.008% and 3.403% for fitting performance and predictive performance, respectively; the 6-descriptor model gives 3.580% and 4.477%, respectively. For the maximum error in percentage, the 13-descriptor model gives 13.786% and 22.906% for fitting performance and predictive performance, respectively; the 6-descriptor model gives 14.722% and 20.668%, respectively. Table 3 also shows that the fitting performance (R²) and the predictive performance (Q²) of the 13-descriptor model are 0.9293 and 0.8268, respectively; those of the 6-descriptor model are 0.8981 and 0.7601, respectively. A rule of thumb for developing a practical model is that the difference between R² and Q² must not be too large, preferably not exceeding 0.2-0.3. Moreover, a Q² > 0.5 is regarded as good and a Q² > 0.9 as excellent.²¹ In this study, the Q² values of both the 13-descriptor model and the 6-descriptor model are much higher than 0.5, so both are reasonable models for estimating the FP of organosilicon compounds.

Hshieh correlated the FPs of organosilicon compounds with their normal boiling points, and the following correlation was suggested in his work:²

$$ T_f = -51.2385 + 0.4994\,T_b + 0.00047\,T_b^2 \tag{4} $$

where T_f and T_b are the FP and the normal boiling point, respectively, both in °C. He obtained eq 4 from a data set of 207 organosilicon compounds, and the reported R² value of his model was 0.967. Although these 207 organosilicon compounds are all included in our data set of 230 compounds, the R² value decreases to 0.8851 when eq 4 is applied to the whole data set. However, because the newly added organosilicon compounds were not used in building eq 4, such a comparison might be somewhat unfair to his work. To make a fairer comparison between this correlation and the proposed QSPR models, we refit the relation between FP and normal boiling point with our training set and obtain the following correlation:

$$ T_f = -45.6432 + 0.484869\,T_b + 0.000414\,T_b^2 \tag{5} $$

The units of the corresponding terms in eq 5 are the same as those in eq 4. This equation is then applied to the testing set to assess its predictive ability. The results for both the fitting ability and the predictive ability are summarized in Table 4. Although the R² value increases from 0.8851 to 0.8953 after refitting, there is still a gap between 0.8953 and the reported R² value of 0.967. The predictive ability of this correlation is found to be 0.8543. Both R² and Q² are larger than 0.85, so it is concluded that the FP is highly correlated with the normal boiling point. In

Table 3. Performance of the Proposed 13-Descriptor Model and 6-Descriptor Model

model             MSE      R²/Q²    maximum error (K)   average error (K)   maximum error (%)   average error (%)
13 descriptors
  training set    154.27   0.9293   43.861               9.444              13.786              3.008
  testing set     269.76   0.8268   79.975              10.540              22.906              3.403
6 descriptors
  training set    222.50   0.8981   46.837              11.210              14.722              3.580
  testing set     373.50   0.7601   72.164              13.635              20.668              4.477
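The indices reported in Table 3 can be computed from paired experimental and predicted flash points as sketched below. The paper does not spell out its formulas, so the definitions here are assumptions: MSE is taken as the mean of squared errors, R² (or Q² when applied to the testing set) as one minus the ratio of the residual to the total sum of squares about the mean of the supplied experimental values, and percentage errors as absolute errors relative to the experimental FP in kelvin.

```python
def performance(y_true, y_pred):
    """Return fit/prediction indices for experimental vs. predicted FPs (K).

    All definitions are assumed, common conventions; see the note above.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    pct_err = [100.0 * a / t for a, t in zip(abs_err, y_true)]

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mse": ss_res / n,
        "r2": 1.0 - ss_res / ss_tot,   # Q2 when y_true is the testing set
        "max_error": max(abs_err),
        "avg_error": sum(abs_err) / n,
        "max_pct_error": max(pct_err),
        "avg_pct_error": sum(pct_err) / n,
    }
```

Note that with this definition the value can become negative when a model predicts worse than the mean of the data, which is how negative entries such as those in Table 1 can arise.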

Table 4. Performance of Hshieh’s Model for the Data Set of 230 Organosilicon Compounds

data set       MSE        R²(Q²)   maximum error (K)   average error (K)   maximum error (%)   average error (%)   remark
whole          237.0923   0.8851   92.1420             10.3453             25.2340             3.2233              eq 5
training set   232.5804   0.8935   88.4057             10.5020             24.2108             3.2502              eq 5
testing set    227.0077   0.8542   62.6605             10.5606             19.6336             3.5146              eq 5
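For reference, the two boiling-point correlations (eq 4 and eq 5) are simple quadratics and can be evaluated as follows. Both take and return temperatures in °C, as stated in the text; the function names and the kelvin conversion helper are illustrative additions for comparing against the QSPR predictions, which are in kelvin.

```python
def flash_point_eq4(tb_celsius):
    """Hshieh's original correlation (eq 4); input and output in deg C."""
    return -51.2385 + 0.4994 * tb_celsius + 0.00047 * tb_celsius ** 2

def flash_point_eq5(tb_celsius):
    """Correlation refitted on the 184-compound training set (eq 5); deg C."""
    return -45.6432 + 0.484869 * tb_celsius + 0.000414 * tb_celsius ** 2

def celsius_to_kelvin(t_celsius):
    """Convert deg C to K for comparison with the QSPR model outputs."""
    return t_celsius + 273.15
```

Both functions require a measured or estimated normal boiling point, which is exactly the input the descriptor-based models do not need.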

Table 5. Chemical Information for Experimental Chemicals

order to further compare the performance of the two proposed models with that of Hshieh’s model, we plot histograms of the fitting error and the predictive error for these three models in Figures 5 and 6, respectively. These two figures show that the performance of the 6-descriptor model is slightly inferior to that of Hshieh’s model, whereas the 13-descriptor model performs as well as Hshieh’s model.

Figure 5. Distribution of percentage error for the training set: (a) Hshieh’s model; (b) 13-descriptor model; (c) 6-descriptor model.

Figure 6. Distribution of percentage error for the testing set: (a) Hshieh’s model; (b) 13-descriptor model; (c) 6-descriptor model.

Table 6. Experimental FPs for n-Hexadecane and Triphenylsilane

run no.              n-hexadecane (°C)   triphenylsilane (°C)
1                    134                 143
2                    134                 143
3                    134                 139
4                    130                 145
5                    134                 139
6                    134                 143
7                    136                 143
8                    130                 145
9                    132                 145
10                   134                 147
average              133.2               143.2
standard deviation   1.8                 2.6
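The averages in Table 6 and the quality-control check against the certified reference material can be reproduced with a few lines of Python. The readings are copied from Table 6, and the acceptance window of 133.9 ± 5.9 °C is the ASTM D93 value for n-hexadecane quoted in the text; the function name is an illustrative choice.

```python
import statistics

# Ten replicate flash point readings from Table 6, in deg C.
N_HEXADECANE = [134, 134, 134, 130, 134, 134, 136, 130, 132, 134]
TRIPHENYLSILANE = [143, 143, 139, 145, 139, 143, 143, 145, 145, 147]

# Certified reference value and tolerance for n-hexadecane, as quoted in the text.
CRM_VALUE_C = 133.9
CRM_TOLERANCE_C = 5.9

def within_crm_limits(readings):
    """True if the mean of the readings lies inside the CRM acceptance window."""
    mean = statistics.mean(readings)
    return abs(mean - CRM_VALUE_C) <= CRM_TOLERANCE_C

hexadecane_mean = statistics.mean(N_HEXADECANE)
triphenylsilane_mean = statistics.mean(TRIPHENYLSILANE)
```

Running the reference material through the same procedure as the sample ties the triphenylsilane result to a verified apparatus rather than to a single unverified measurement.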

A careful look at Figures 3(b) and 4(b) reveals an outlier in the testing set for the proposed methods. This outlier is triphenylsilane. Its experimental FP was reported to be 349.15 K (76 °C) in The Chemical Database, but its predicted FP is 421.31 K (148 °C) for the 6-descriptor model and 429.13 K (156 °C) for the 13-descriptor model. The molecular structure of triphenylsilane is shown in Table 5. Because there is no obvious reason why this compound should be an outlier for the proposed methods, we decided to conduct experiments to check its FP. The experimental method adopted here is ASTM D93, Standard Test Methods for Flash Point by Pensky-Martens Closed Cup Tester. The experimental details can be found in our previous work.²²,²³ In the ASTM D93 method, n-hexadecane is assigned as a certified reference material (CRM) to verify apparatus performance, and the experimental result for its FP should be 133.9 ± 5.9 °C. Thus, we include n-hexadecane in our experiment as a quality control procedure. The chemical information for the investigated chemicals is summarized in Table 5. Note that the FP listed in Table 5 is obtained from the material safety data sheet (MSDS) supplied by the manufacturer. The FP of triphenylsilane listed in the manufacturer’s MSDS is the same as the one reported in The Chemical Database. The testing procedure for measuring FP is replicated ten times for each investigated chemical, and the average is reported as the FP of that chemical. Experimental results and the corresponding statistics are summarized in Table 6. The average for n-hexadecane is 133.2 °C and the corresponding standard deviation is 1.8 °C. This result meets the method’s requirement for verifying apparatus performance, so we conclude that the quality control requirement is met.
The average for triphenylsilane is 143.2 °C and the corresponding standard deviation is 2.6 °C. The ASTM D93 method states that the repeatability of this method is 5 °C, so the aforementioned result


Table 7. Predictive Performance after Correcting the Outlier

model             MSE      Q²       maximum error (K)   average error (K)   maximum error (%)   average error (%)
13-descriptor     134.26   0.9245    27.912              9.079              11.272              2.972
6-descriptor      260.83   0.8533    45.259             12.174              15.707              4.053
Hshieh’s model    445.47   0.7550   105.58              12.021              25.358              3.827

meets the repeatability required by the testing method. Thus, we are confident that the FP of triphenylsilane reported in The Chemical Database (76 °C) is wrong, although the manufacturer’s MSDS reports the same numerical value. The FP of triphenylsilane predicted by the proposed 13-descriptor and 6-descriptor models is 156 and 148 °C, respectively. These two predicted FPs are clearly much closer to the newly measured FP of 143.2 °C than to the original FP of 76 °C. The reproducibility stated for the ASTM D93 method is 10 °C, so both predicted FPs conform to the newly measured FP when this reproducibility is taken into account. Note also that the FP of triphenylsilane predicted by Hshieh’s model is about 38 °C, which deviates very much from the newly measured FP. Because triphenylsilane is included in the testing set, the predictive performance is recalculated for all explored models with the FP of triphenylsilane replaced by the newly measured value, and the results are summarized in Table 7. Tables 3 and 7 show that the Q² value increases from 0.8268 to 0.9245 for the 13-descriptor model and from 0.7601 to 0.8533 for the 6-descriptor model. After this outlier is corrected, the performance of the 6-descriptor model is comparable to that of Hshieh’s model, and the 13-descriptor model performs much better than Hshieh’s model. Moreover, the drawback of Hshieh’s correlation is that it needs an existing physical property (i.e., the normal boiling point) to predict the FP. Thus, when the normal boiling point is not available for a specific compound, his correlation will not work; in contrast, because the proposed models need only the molecular structure of a compound to calculate all the required molecular descriptors, they are of much wider applicability.
Finally, let us give a brief discussion on the selected molecular descriptors. As we have elucidated, the molecular descriptors suggested for predicting the FP of organosilicon compounds are different from the suggested descriptors for predicting the FP of organic compounds in other works. To illustrate why such a difference is observed, let us briefly interpret the FP in the molecular level. As FP is the temperature at which a liquid chemical emits sufficient flammable vapor to form ignitable fuelair mixture, it is obvious that the FP of a liquid chemicals depends on its ability to emit vapor, which is characterized by the normal boiling point of the chemicals. On the molecular level, the normal boiling point of a compound depends on the intermolecular forces. For ionic molecules, ion-ion force is the main intermolecular force. While considering the nonpolar molecules, van der Waals force is the main attribution to the intermolecular force. The factors determine the van der Waals force of a molecule includes: molecular weight, molecular size, and the relative polarizability of electrons of the atoms involved. However, most organic molecules are neither ionic nor nonpolar but have permanent dipoles resulting from a nonuniform distribution of the bonding electrons. In the liquid state, these molecules orient themselves so that the positive end of one molecule is directed toward, and thus attracted by, the negative end of another. Very strong dipole-dipole attractions between hydrogen atoms bonded to small, strongly negative elements

and nonbonding electron pairs on other such electronegative elements. This type of intermolecular force is called a hydrogen bond.24 The Pauling electronegativities of carbon, silicon, and hydrogen are found to be 2.5, 1.8, and 2.2, respectively.24 This fact means that electrons distribute more close to the carbon atom in a carbon-hydrogen bond, but electrons distribute more close to the hydrogen atom in a silicon-hydrogen bond. Thus, the effective molecular descriptors for these two classes of compounds might be different. Furthermore, the molecular weight distribution of the data set of organosilicon compounds is also different from that of the data set of organic compounds. As shown in Patel et al.’s work, the center of molecular weight distribution for the data set of organic compounds is about 100. However, it is observed from Figure 1 that the center of molecular weight distribution for the data set of organosilicon compounds is higher than 200. The difference in molecular weight will also result in different strength of van der Waals force. 4. Conclusions In the present work, one 13-descriptor QSPR model and one 6-descriptor QSPR model are proposed for predicting the FP of organosilicon compounds. To build up and validate these two models, a data set of 230 organosilicon compounds are collected and randomly distributed into a training set of 184 compounds and a test set of 46 compounds. The molecular descriptors to build up the models are selected from 1538 molecular descriptors via the stepwise regression technique. The p-values for accepting a descriptor to enter the model are set to be 5 × 10-4 and 1 × 10-5 for the13-descriptor model and 6-descriptor model, respectively. An outlier of triphenylsilane is indicated by both the proposed models. 
Its FP is reported to be 76 °C in The Chemical Database, but new experimental results show that its FP should be about 143 °C, which agrees with the FP predicted by the proposed methods once experimental reproducibility is taken into account. After correcting this outlier, it is found that the 6-descriptor model fits the training data with R² = 0.8981 and predicts the test data with Q² = 0.8533, and the 13-descriptor model fits the training data with R² = 0.9293 and predicts the test data with Q² = 0.9245. The average predictive errors are less than 5% for both proposed models. A general comparison is also made between the present work and Hshieh's work, which predicts the FP of an organosilicon compound from its normal boiling point. It is found that the performance of Hshieh's model is comparable to that of the proposed 6-descriptor model but inferior to that of the proposed 13-descriptor model. However, since the proposed models use only descriptors calculated from the molecular structure, they do not require any measured physical property of the target compound to predict the FP. In this regard, they have much wider applicability than Hshieh's model.
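The validation scheme summarized above (a random 184/46 split, R² on the training set, Q² on the test set, and an average percentage error) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, Q² is assumed to be 1 − SS_res/SS_tot evaluated on the test set with the total sum of squares taken about the training-set mean, and the percentage error is assumed to be a mean absolute relative error. The exact definitions used in the paper may differ.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into training and test index lists."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def r_squared(y_obs, y_pred, ref_mean=None):
    """Return 1 - SS_res / SS_tot.

    SS_tot is taken about ref_mean; if ref_mean is None, the mean of y_obs
    is used (ordinary R^2). Passing the training-set mean while scoring
    test-set data gives one common definition of Q^2 (an assumption here).
    """
    if ref_mean is None:
        ref_mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - ref_mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def mean_abs_percent_error(y_obs, y_pred):
    """Average of |observed - predicted| / |observed|, in percent."""
    total = sum(abs(o - p) / abs(o) for o, p in zip(y_obs, y_pred))
    return 100.0 * total / len(y_obs)


# 230 compounds split into 184 training and 46 test compounds, as in the text.
train_idx, test_idx = split_indices(230, 184)
```

Given observed and predicted flash points for the two subsets, R² would be `r_squared(fp_train, fp_train_pred)` and Q² would be `r_squared(fp_test, fp_test_pred, ref_mean=sum(fp_train) / len(fp_train))`, where the `fp_*` lists are placeholders for data that are not reproduced here. If the percentage error is computed on temperatures, absolute temperatures (K) avoid division by values near 0 °C; whether the paper does so is not stated in this section.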


Acknowledgment

The authors would like to thank the National Science Council of the ROC for supporting this study financially under Grant No. NSC 98-2221-E-039-002-MY3.

Supporting Information Available: Tables listing the names of the 230 organosilicon compounds used in this study, their experimental FPs, the FPs predicted by the proposed methods, and the numerical values of the 13 molecular descriptors used to build the models. This information is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Vidal, M.; Rogers, W. J.; Holste, J. C.; Mannan, M. S. A Review of Estimation Methods for Flash Points and Flammability Limits. Process Saf. Prog. 2004, 23 (1), 47–55.
(2) Hshieh, F. Y. Correlation of Closed-Cup Flash Points with Normal Boiling Points for Silicone and General Organic Compounds. Fire Mater. 1997, 21, 277–282.
(3) Metcalfe, E.; Metcalfe, A. E. M. On the Correlation of Flash Points. Fire Mater. 1992, 16, 153–154.
(4) Patil, G. S. Estimation of Flash Point. Fire Mater. 1988, 12, 127–131.
(5) Satyanarayana, K.; Kakati, M. C. Correlation of Flash Points. Fire Mater. 1991, 15, 97–100.
(6) Satyanarayana, K.; Rao, P. G. Improved Equation to Estimate Flash Points of Organic Compounds. J. Hazard. Mater. 1992, 32, 81–85.
(7) Albahri, T. A. Flammability Characteristics of Pure Hydrocarbons. Chem. Eng. Sci. 2003, 58, 3629–3641.
(8) Chen, C. C.; Liaw, H. J.; Kuo, Y. Y. Prediction of Autoignition Temperatures of Organic Compounds by the Structural Group Contribution Approach. J. Hazard. Mater. 2009, 162, 746–762.
(9) Gharagheizi, F.; Alamdari, R. F.; Angaji, M. T. A New Network-Group Contribution Method for Estimation of Flash Point Temperature of Pure Components. Energy Fuels 2008, 22, 1628–1635.
(10) Pan, Y.; Jiang, J.; Wang, Z. Quantitative Structure-Property Relationship Studies for Predicting Flash Points of Alkanes Using Group Bond Contribution Method with Back-Propagation Neural Network. J. Hazard. Mater. 2007, 147, 424–430.

(11) Suzuki, T.; Ohtaguchi, K.; Koide, K. A Method for Estimating Flash Point of Organic Compounds from Molecular Structure. J. Chem. Eng. Jpn. 1991, 24 (2), 258–261.
(12) Gharagheizi, F.; Alamdari, R. F. Prediction of Flash Point Temperature of Pure Components Using a Quantitative Structure-Property Relationship. QSAR Comb. Sci. 2008, 27 (6), 679–683.
(13) Katritzky, A. R.; Stoyanova-Slavova, I. B.; Dobchev, D. A.; Karelson, M. QSPR Modeling of Flash Points: An Update. J. Mol. Graphics Modell. 2007, 26, 529–536.
(14) Katritzky, A. R.; Jain, R. R.; Karelson, M. QSPR Analysis of Flash Points. J. Chem. Inf. Comput. Sci. 2001, 41, 1521–1530.
(15) Pan, Y.; Jiang, J.; Ding, X.; Wang, R.; Jiang, J. Prediction of Flammability Characteristics of Pure Hydrocarbons from Molecular Structures. AIChE J. 2010, 56, 690–701.
(16) Patel, S. J.; Ng, D.; Mannan, M. S. QSPR Flash Point Prediction of Solvents Using Topological Indices for Application in Computer Aided Molecular Design. Ind. Eng. Chem. Res. 2009, 48, 7378–7387.
(17) The Chemical Database. The Department of Chemistry at the University of Akron. World Wide Web: http://ull.chemistry.uakron.edu/erd/index.html.
(18) HyperChem Release 8.0 for Windows, Molecular Modeling System; Hypercube Inc., 2008.
(19) Talete srl. Dragon for Windows (Software for Molecular Descriptor Calculations), Version 5.5; 2007 (http://www.talete.mi.it/).
(20) The MathWorks. MATLAB Statistical Toolbox v7.0 - User Guide; The MathWorks, Inc., 2009.
(21) Umetrics, A. B. Multi- and MegaVariate Data Analysis; Umetrics AB, 2001.
(22) Liaw, H. J.; Lu, W. H.; Gerbaud, V.; Chen, C. C. Flash-Point Prediction for Binary Partially Miscible Mixtures of Flammable Solvents. J. Hazard. Mater. 2008, 153, 1165–1175.
(23) Liaw, H. J.; Gerbaud, V.; Chen, C. C.; Shu, C. M. Effect of Stirring on the Safety of Flammable Mixtures. J. Hazard. Mater. 2010, 177, 1093–1101.
(24) Graham Solomons, T. W. Fundamentals of Organic Chemistry; John Wiley and Sons: New York, 1982.

Received for review June 29, 2010. Revised manuscript received September 28, 2010. Accepted October 15, 2010. IE101381B