Article pubs.acs.org/IECR

Quantitative Structure Relative Volatility Relationship Model for Extractive Distillation of Ethylbenzene/p‑Xylene Mixtures

Young-Mook Kang,† Yukwon Jeon,‡ Gicheon Lee,‡ Hyoungjun Son,† Sung Wook Row,‡,¶ Seonghwan Choi,∥ Young-Jong Seo,∥ Young Hwan Chu,⊥ Jae-Min Shin,§ Yong-Gun Shul,*,‡ and Kyoung Tai No*,†,§

†Department of Biotechnology, ‡Department of Chemical and Biomolecular Engineering, and §Bioinformatics & Molecular Design Research Center, B138A, Yonsei Engineering Research Complex, Yonsei University, 50 Yonsei-Ro, Seodaemun-gu, Seoul, 120-749 Republic of Korea
∥The Fourth Research Team, Daedeok Research Institute, Lotte Chemical Corporation, Jang-Dong #24-1, Yuseong-Gu, Daejeon, 305-726 Republic of Korea
⊥Department of New Energy·Resource Engineering, College of Science & Engineering, Sangji University, 124, Sangjidae-gil, Wonju-si, Gangwon-Do, 220-702 Republic of Korea

ABSTRACT: Extractive distillation is a highly effective process for the separation of compound pairs having low relative volatility values, such as ethylbenzene (EB) and p-xylene (PX) mixtures. Many solvents or solvent mixtures have been screened experimentally to identify a suitable extraction agent for EB/PX mixtures. Because the number of possible solvent and solvent mixture candidates is high, it is necessary to introduce a computer-aided extraction performance prediction technique. In this study, a knowledge-based quantitative structure relative volatility relationship (QSRVR) model was developed using multiple linear regression (MLR) and artificial neural network (ANN) models, with each model having five descriptors. The root-mean-square errors (RMSE) of the training and test sets for the MLR model were calculated as 0.01486 and 0.00905, while their squared correlation coefficients (R2) were 0.867 and 0.941, respectively. The R2 and RMSE values of the total data set for the MLR model were 0.878 and 0.01408, and for the ANN model the values were 0.949 and 0.00929, respectively. The predictive ability of both models is sufficient for identifying suitable extractive distillation solvents for the separation of EB/PX mixtures.

Industrial & Engineering Chemistry Research
Received: September 30, 2013
Revised: June 17, 2014
Accepted: June 18, 2014
Published: June 18, 2014
© 2014 American Chemical Society
dx.doi.org/10.1021/ie403235r | Ind. Eng. Chem. Res. 2014, 53, 11159−11166

1. INTRODUCTION

Both ethylbenzene (EB) and isomers of xylene are important basic materials used in various chemical industries. Generally, “mixed xylene” refers to a mixture of EB and three isomers of xylene, o-xylene, m-xylene, and p-xylene (PX). These aromatic feedstock chemicals are starting materials for the synthesis of many classes of organic compounds; for example, PX is primarily used for the production of terephthalic acid, the monomer employed in the synthesis of poly(ethylene terephthalate) (PET), or simply, “polyester”, and EB is the main precursor for styrene manufacturing. Thus, the production of high-purity EB and PX is crucial; for instance, a purity of greater than 99.5% EB is required for dehydrogenation of EB to styrene.1−3 Industrially, these two feedstocks are produced by the separation of mixed xylene, followed by purification of the obtained fractions; however, effective separation of these individual materials is a difficult task, because mixed xylene contains four structural isomers having identical molecular formulas (C8H10) and similar physical properties, particularly boiling point and vapor pressure, as shown in Table S1 (Supporting Information).

The separation of EB from PX by distillation is a laborious process because the differences in boiling points and vapor pressures between the two molecules are very small.4,5 Although distillation is one of the more effective methods for gaseous separation as compared to chemical methods, a number of obstacles still remain in the efficient separation of EB and PX, with their low relative volatility (RV) being the most important problem. Because the RV of EB to PX is reported to be as low as 1.06, the distillation system for separation requires fractional columns containing more than 200 theoretical plates and a high reflux ratio for conventional separation of EB and PX to be effective.6

Extractive distillation is used to increase the RV of a compound pair in a mixture by adding a nonvolatile solvent, which acts as an extractive agent.7,8 The extractive agent can increase the RV between EB and PX, thus reducing the number of theoretical plates necessary to separate EB from PX. According to the literature,9,10 the number of plates can be reduced to 87 for an RV of 1.11, and can be further reduced to less than 50 for an RV of 1.20.

Although extractive distillation has been used for the separation of some hydrocarbon mixtures, for example, separation of toluene from nonaromatic hydrocarbons, little has been reported on the separation of EB and PX by extractive distillation. In 1980, Berg et al.9,10 investigated single-component extractive agents for EB and PX separation, examining the role of aromatics, chlorinated aromatics, carbonyls, straight chains, alcohols, cyclics, sulfoxides, and phosphates in extractive distillation; the RV was enhanced from 1.06 to 1.11 in the separation of EB and PX using 1,2,4-trichlorobenzene.

The authors have been working to identify single-component extractive agents for EB and PX separation through experimental methods. As the number of compounds to be tested experimentally is vast, it is necessary to introduce a knowledge-based RV prediction method to reduce the experimental time and cost. Quantitative structure property relationship (QSPR) methods have already been used in various research fields in biology, chemistry, and physics for the effective analysis and prediction of the acquired experimental data.11,12 Herein, we introduce a QSPR method to develop RV predictor models with several machine-learning and pattern recognition tools. The purpose of this work is to find an optimum extractive agent that gives a relative volatility of approximately 1.20, which would reduce the number of required theoretical plates from 200 to less than 50, thus making the method practical and economical.

2. MATERIALS AND METHODS

2.1. Materials. The feed materials consisting of EB (Sigma-Aldrich, anhydrous, 99.8%) and PX (Yakuri Pure Chemicals, ≥99%) were used without any further purification. As listed in Table S2 (Supporting Information), 49 single compounds of reagent grade purity were used as extractive agents; these extractive agents comprised aromatics, chlorinated aromatics, carbonyls, straight chains, alcohols, cyclics, sulfoxides, and phosphates. The RV of EB to PX (αEB/PX) is expressed as

$$\alpha_{\mathrm{EB/PX}} = \frac{f_{\mathrm{EB,vapor}}/f_{\mathrm{EB,liquid}}}{f_{\mathrm{PX,vapor}}/f_{\mathrm{PX,liquid}}} \qquad (1)$$

where f refers to mole fraction and the subscripts “liquid” and “vapor” represent the liquid and vapor phases, respectively. All experiments were performed using atmospheric distillation. Distillations were performed in an all-glass dynamic recirculating equipment that included a boiling pot, reflux drum, condenser, and heating mantle. The vapor condensed at the top of the water-cooled condenser and was collected in the reflux drum. Extractive distillations were carried out under isothermal conditions with a 1:1 solvent-to-feed ratio after the system reached the equilibrium conditions with the corresponding extractive agent. The αEB/PX value was calculated from the gas chromatograph (GC) data using eq 1.13 Our experiments were calibrated with an RV value of 1.067 for the EB/PX mixture without any extractive agent, which is comparable to the previously reported value of 1.0612.9 Determining the αEB/PX values allowed us to directly compare the effectiveness of different agents used for extractive distillation. The performance of an extractive agent in the distillation is linearly dependent on the RV value (αEB/PX).

2.2. Data Set for QSPR Model Development. The RV values of the EB/PX mixtures for 49 extractive agents were used as target properties in the QSPR model. The data set was divided into two subsets, a training set of 41 compounds and a test set of eight compounds, by a simple random sampling method. The training set was used to develop the QSPR models. The test set was used to evaluate the predictability of the models.

2.3. Molecular Descriptors. Molecular descriptors are defined as numerical characteristics associated with chemical structures, which are very important elements in the QSPR application.14 In this QSPR study, we used PreADME/T15 software, developed by BMDRC, for the calculation of molecular descriptors. The PreADME/T software was used to calculate more than 1000 molecular descriptors, which included constitutional (153), electrostatic (79), geometrical (26), physicochemical (131), and topological (693) types. The structures of the molecules in the data set were obtained from the PubChem database (http://pubchem.ncbi.nlm.nih.gov/), and their three-dimensional (3D) structures were optimized using Discovery Studio16,17 (Accelrys) with the Merck Molecular Force Field (MMFF).18 To take into account the effects of differences in the purities of the extractive agents used in this study, the descriptor values were corrected by multiplying the purity value by the calculated descriptor value:

$$\mathrm{descriptor}^{*} = \mathrm{descriptor} \times \mathrm{purity} \qquad (2)$$

2.4. Selection of Parameters for RV Models Using Genetic Algorithm. The selection of descriptors that have a significant influence on RV is crucial for QSPR model development. For selecting appropriate descriptors, a combined genetic algorithm and multiple linear regression (GA-MLR) method was used.16 GA is frequently used for selecting the best input variables and to reduce their total number.19 The most useful feature of the GA method is that it generates a population of best-fitted models using a reduced number of molecular descriptors as input variables. Some of the important GA simulation parameters are the number of models, model form, model equation length, population size, maximum generations, and scoring function. The parameters used in this study are summarized in Table S3 (Supporting Information). To reduce the risk of overfitting, we used the Friedman lack-of-fit (LOF) function, which scales the mean-squared error by a penalty factor based on the complexity of the model. The Discovery Studio software uses a modified version of the Friedman LOF measurement, as shown in eq 3:

$$\mathrm{LOF} = \frac{\sum_{i=1}^{N}\left(\alpha^{\mathrm{exp}}_{i,\mathrm{EB/PX}} - \alpha^{\mathrm{pred}}_{i,\mathrm{EB/PX}}\right)^{2}}{\left[N - 0.99\left\{\dfrac{P + dC(N - P_{\max})}{P_{\max}}\right\}\right]^{2}} \qquad (3)$$

where $\alpha^{\mathrm{exp}}_{i,\mathrm{EB/PX}}$ and $\alpha^{\mathrm{pred}}_{i,\mathrm{EB/PX}}$ are the experimental and predicted αEB/PX values of sample i, respectively; N is the number of samples in the training set; P is the number of descriptors in each equation; d is the LOF smoothness parameter; C is a measure of equation complexity that is equal to the total number of features in the equation; and Pmax is the maximum equation length.

2.5. Artificial Neural Network (Back-Propagation Method). To develop a nonlinear model with higher accuracy, the five descriptors in the best equations generated by GA-MLR were used as the input layer in an artificial neural network (ANN) model. ANN is an advanced and complicated model-building technique widely used in many applications; models using ANNs provide better results with nonlinear functions as compared to linear models.20−24 In this study, we used a back-propagation (BP) neural network module in Discovery Studio.17 Some important parameters in BP-ANN are the number of nodes in a hidden layer, maximum number of cycles, maximum cycles without improvement, data standardization, fraction of test sets, cross-validation methods, and the number of cross-validation groups. These parameters are summarized in Table S4 (Supporting Information).
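As a small numerical illustration of eqs 1 and 2 from the methods above, the following sketch computes αEB/PX from vapor and liquid mole fractions and applies the purity correction to a descriptor value. The function names and all input numbers are hypothetical and chosen only for illustration; they are not measured data from this study.

```python
def relative_volatility(f_eb_vapor, f_eb_liquid, f_px_vapor, f_px_liquid):
    """Eq 1: ratio of the vapor/liquid mole-fraction ratios of EB and PX."""
    return (f_eb_vapor / f_eb_liquid) / (f_px_vapor / f_px_liquid)


def purity_corrected(descriptor, purity):
    """Eq 2: scale a calculated descriptor value by the agent purity (0 to 1)."""
    return descriptor * purity


# Hypothetical mole fractions (illustrative only, not GC data from the paper).
alpha = relative_volatility(f_eb_vapor=0.52, f_eb_liquid=0.50,
                            f_px_vapor=0.48, f_px_liquid=0.50)
print(f"alpha_EB/PX = {alpha:.4f}")

# Hypothetical descriptor value for an extractive agent of 98% purity.
print(purity_corrected(descriptor=1.25, purity=0.98))
```

A value of α above 1 means that EB is enriched in the vapor relative to PX; the larger the value, the fewer theoretical plates the separation needs.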


Table 1. Squared Correlation Coefficient and Root Mean Square Error for the Top Five Models of GA-MLR

           training set (A)                 test set (B)         total (A+B)
model      R2 (a)   Q2 (b)   RMSE (c)      R2       RMSE        R2       RMSE
RVMLR_1    0.867    0.823    0.01486       0.941    0.00905     0.878    0.01408
RVMLR_2    0.871    0.839    0.01461       0.892    0.01328     0.873    0.01440
RVMLR_3    0.866    0.827    0.01489       0.907    0.01412     0.867    0.01476
RVMLR_4    0.861    0.819    0.01519       0.900    0.01324     0.864    0.01489
RVMLR_5    0.862    0.815    0.01512       0.874    0.01438     0.862    0.01500

(a) Determination coefficient. (b) Leave-one-out cross-validated coefficient of determination. (c) Root mean square error.
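The three statistics named in the footnotes of Table 1 can be sketched as follows for an ordinary least-squares MLR model. This is a minimal NumPy illustration on synthetic data, not the Discovery Studio implementation; the data, the coefficients, and the helper names (`fit_mlr`, `q2_loo`, and so on) are assumptions made for the example.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def q2_loo(X, y):
    """Leave-one-out Q2: refit without sample i, then predict sample i."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in for a 41-compound training set with five descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(41, 5))
y = 1.05 + X @ np.array([0.02, 0.03, 0.01, 0.005, -0.02]) + 0.01 * rng.normal(size=41)

coef = fit_mlr(X, y)
r2_train = r2(y, predict_mlr(coef, X))
rmse_train = rmse(y, predict_mlr(coef, X))
q2_train = q2_loo(X, y)
print(f"R2 = {r2_train:.3f}, Q2 = {q2_train:.3f}, RMSE = {rmse_train:.5f}")
```

For a least-squares fit, Q2 can never exceed the training R2, which is the pattern seen in every row of Table 1.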

Table 2. Count of Selected Descriptors in the Two Top Five Groups Generated by GA-MLR and BP-ANN

MLR  ANN  name (category)                                        description
5    5    GC_04_Pol (topological, polarity)                      Geary coefficient 04 weighted by polarizability
5    4    BCUT_02_AlogP98 (topological, hydrophobicity)          highest eigenvalue no. 2 of Burden matrix weighted by AlogP98
3    1    MC_07_MPEOE (topological, polarity)                    Moran coefficient 07 weighted by MPEOE (a) charge
3    1    ATS_MB_00_MPEOE_Avg (topological, polarity)            Moreau Bruto Autocorrelation38 00 weighted by MPEOE charge average
3    1    Abs_Charge_Mean (electrostatic, polarity)              mean of absolute atomic charges in a molecule
1    0    No_R_CX_R (physicochemical, hydrophobicity)            number of AlogP98 atomic type 26, C in R−CX−R
1    0    MC_07_Estate (topological, polarity)                   Moran coefficient 07 weighted by E-state
1    2    MC_07_Pol (topological, polarity)                      Moran coefficient 07 weighted by polarizability
1    0    dChi_02 (topological, hydrophobicity)                  difference between Chi (b) and VChi (c) of order 2
1    0    FraVSAhydsat (geometrical, hydrophobicity)             fraction of 2D van der Waals hydrophobic saturated surface area
1    3    No_CH (constitutional, hydrophobicity)                 number of single bonds between C atoms and H atoms
0    1    dChi_00 (topological, hydrophobicity)                  difference between Chi and VChi of order zero
0    1    Flagmonoheterocyclic (constitutional, hydrophobicity)  flag of monocyclic and heterocyclic compounds
0    1    BCUT_01_AlogP98 (topological, hydrophobicity)          highest eigenvalue no. 1 of Burden matrix weighted by AlogP98
0    1    Chi_00_Sol (topological, hydrophobicity)               solvation molecular connectivity index of order zero
0    2    EstateSAll (topological, polarity)                     sum of E-state for all atoms including H
0    1    SPI (topological, polarity)                            superpendentic index39
0    1    WPSA1 (topological, hydrophobicity)                    WPSA1 (d)

(a) Modified partial equalization of orbital electronegativity. (b) Kier and Hall molecular connectivity index. (c) Kier and Hall valence connectivity index. (d) The partial positive van der Waals surface area multiplied by the total van der Waals surface area and divided by 1000.

The composition of BP-ANN includes an input layer, a hidden layer, and an output layer. The six neurons present in the input layer corresponded to the five descriptors plus one bias. The number of neurons in the hidden layer was set from 1 to 3. The output layer had only one neuron for the prediction of αEB/PX. During BP-ANN training, the same input data used in the training set in our GA-MLR study were automatically divided into 33 compounds for training and eight for validation to test predictability and robustness of the model by cross-validation. The error value of the validation data was monitored, and training stopped when the validation error had increased over a given number of iterations. The error value of the test set was finally used for the evaluation of the predictability and robustness of our ANN models.
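The training scheme described above (a 5−3−1 network, a 33/8 split of the training compounds, and stopping once the validation error no longer improves) can be sketched in plain NumPy. This is an illustrative assumption-laden sketch, not the Discovery Studio module: the tanh activation, learning rate, patience, and the synthetic standardized data are all choices made here for the example.

```python
import numpy as np


def init_params(rng, n_in, n_hid):
    """Small random weights for an n_in-n_hid-1 network (biases start at zero)."""
    return {
        "W1": rng.normal(scale=0.5, size=(n_in, n_hid)),
        "b1": np.zeros(n_hid),
        "W2": rng.normal(scale=0.5, size=n_hid),
        "b2": np.zeros(()),
    }


def forward(p, X):
    """Return hidden activations (tanh, an assumed choice) and the linear output."""
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]


def rmse(p, X, y):
    return float(np.sqrt(np.mean((forward(p, X)[1] - y) ** 2)))


def train(X_tr, y_tr, X_va, y_va, n_hid=3, lr=0.05,
          max_cycles=5000, patience=200, seed=0):
    """Full-batch back-propagation with early stopping on the validation RMSE."""
    p = init_params(np.random.default_rng(seed), X_tr.shape[1], n_hid)
    best = {k: np.copy(v) for k, v in p.items()}
    best_err, stale = rmse(p, X_va, y_va), 0
    n = len(y_tr)
    for _ in range(max_cycles):
        h, out = forward(p, X_tr)
        err = out - y_tr                                  # output-layer error
        d_hid = np.outer(err, p["W2"]) * (1.0 - h ** 2)   # error propagated through tanh
        grads = {
            "W1": X_tr.T @ d_hid / n,
            "b1": d_hid.mean(axis=0),
            "W2": h.T @ err / n,
            "b2": err.mean(),
        }
        for k in p:
            p[k] = p[k] - lr * grads[k]
        val = rmse(p, X_va, y_va)
        if val < best_err:
            best, best_err, stale = {k: np.copy(v) for k, v in p.items()}, val, 0
        else:
            stale += 1
            if stale >= patience:   # validation error has stopped improving
                break
    return best, best_err


# Synthetic standardized data standing in for 41 compounds x 5 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(41, 5))
y = np.tanh(X[:, 0] - 0.5 * X[:, 1]) + 0.1 * rng.normal(size=41)

order = rng.permutation(41)
tr, va = order[:33], order[33:]          # 33 for training, 8 for validation
model, val_rmse = train(X[tr], y[tr], X[va], y[va], n_hid=3)
print(f"validation RMSE (standardized units): {val_rmse:.3f}")
```

The function returns the weights with the lowest validation error seen during training, which is one common way to realize the stopping rule; the source does not specify which weights the software retains.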

3. RESULTS AND DISCUSSION

During each GA-MLR descriptor selection procedure, we obtained a set of descriptors for a QSRVR model. Table 1 shows the top five MLR models scored by R2 values calculated using data from all 49 RV experiments. The 11 descriptors used in these five models are listed in Table 2. Interestingly, the two descriptors GC_04_Pol and BCUT_02_AlogP98 were selected in all five models. The best MLR model is RVMLR_1 (eq 4); the other four follow in eqs 5−8:

RVMLR_1 = 1.0288 + 0.025919 × GC_04_Pol + 0.10358 × BCUT_02_AlogP98 + 0.048845 × MC_07_MPEOE + 0.010045 × No_R_CX_R − 0.065834 × FraVSAhydsat    (4)

RVMLR_2 = 0.94967 + 0.031429 × GC_04_Pol + 0.14241 × BCUT_02_AlogP98 + 0.036912 × MC_07_MPEOE − 0.32871 × ATS_MB_00_MPEOE_avg + 0.52227 × Abs_Charge_Mean    (5)

RVMLR_3 = 0.95746 + 0.033356 × GC_04_Pol + 0.028372 × MC_07_Pol − 0.34149 × ATS_MB_00_MPEOE_avg + 0.13798 × BCUT_02_AlogP98 + 0.54518 × Abs_Charge_Mean    (6)

RVMLR_4 = 0.99675 + 0.02708 × GC_04_Pol + 0.11801 × BCUT_02_AlogP98 + 0.04365 × MC_07_MPEOE + 0.017094 × dChi_02 − 0.0026559 × No_CH    (7)
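Equation 4 is a plain linear combination, so applying it to a new candidate solvent only requires its five descriptor values. The sketch below stores the published coefficients of RVMLR_1 in a dictionary; the function name `predict_rv` and the example descriptor values are hypothetical, and real inputs would be the purity-corrected descriptors of eq 2.

```python
# Coefficients of RVMLR_1 exactly as given in eq 4.
RVMLR_1 = {
    "intercept": 1.0288,
    "GC_04_Pol": 0.025919,
    "BCUT_02_AlogP98": 0.10358,
    "MC_07_MPEOE": 0.048845,
    "No_R_CX_R": 0.010045,
    "FraVSAhydsat": -0.065834,
}


def predict_rv(descriptors, model=RVMLR_1):
    """Evaluate a linear QSRVR model for one compound (dict of descriptor values)."""
    return model["intercept"] + sum(
        coef * descriptors[name]
        for name, coef in model.items()
        if name != "intercept"
    )


# Hypothetical descriptor values for a candidate agent (illustration only).
candidate = {
    "GC_04_Pol": 1.0,
    "BCUT_02_AlogP98": 0.5,
    "MC_07_MPEOE": 0.2,
    "No_R_CX_R": 3.0,
    "FraVSAhydsat": 0.1,
}
print(f"predicted alpha_EB/PX = {predict_rv(candidate):.4f}")
```

Swapping in the coefficients of eqs 5−7 only requires another dictionary with the corresponding descriptor names.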


Table 3. Architecture and RMSE of the Top Five Models of BP-ANN Results

no.  network configuration (a)  RMSE (train)  RMSE (test)  RMSE (total)  input descriptors
1    5−3−1                      0.00717       0.01627      0.00929       GC_04_Pol, BCUT_02_AlogP98, MC_07_MPEOE, No_CH and EstateSAll
2    5−3−1                      0.00743       0.01672      0.00959       GC_04_Pol, BCUT_02_AlogP98, MC_07_Pol, No_CH and EstateSAll
3    5−3−1                      0.00877       0.01703      0.01057       GC_04_Pol, BCUT_02_AlogP98, MC_07_Pol, dChi_00 and Chi_00_Sol
4    5−3−1                      0.00933       0.02000      0.01175       GC_04_Pol, BCUT_02_AlogP98, ATS_MB_00_MPEOE_Avg, Abs_Charge_Mean and SPI
5    5−3−1                      0.00951       0.02011      0.01191       GC_04_Pol, No_CH, Flagmonoheterocyclic, BCUT_01_AlogP98 and WPSA1

(a) A, B, and C of “A−B−C” are the number of nodes in the input, hidden, and output layers except the bias.

RVMLR_5 = 0.96312 + 0.029611 × GC_04_Pol + 0.026238 × MC_07_Estate − 0.32209 × ATS_MB_00_MPEOE_avg + 0.13867 × BCUT_02_AlogP98 + 0.5062 × Abs_Charge_Mean    (8)

Among the top five GA-MLR models shown in Table 1, we chose eq 4 as the best MLR model for the following three reasons: first, eq 4 showed the highest R2 value of 0.941 for the test set; second, it showed the lowest RMSE value of 0.01408 for the total data; finally, it showed a reasonably good Q2 value of 0.823 for the training set. Two autocorrelation descriptors are present in this model, GC_04_Pol and MC_07_MPEOE, which are also denoted Geary’s C and Moran’s I, respectively.25,26 GC_04_Pol is Geary’s C of order 4 using atomic polarizability; the charge dependence of the effective atomic polarizability (CDEAP) method was used as atomic polarizability.27 MC_07_MPEOE is Moran’s I of order 7 using an atomic partial charge model, which is the modified partial equalization of orbital electronegativity (MPEOE).28−31 The autocorrelation descriptors were often considered for raw data at various locations. The autocorrelation among residuals could also imply the presence of nonlinear relations between the dependent and independent variables,32 which means that MPEOE (atomic partial charge) and CDEAP (atomic polarizability) can be related to αEB/PX in a nonlinear fashion. BCUT_02_AlogP98 is a BCUT descriptor, the highest eigenvalue no. 2 of the Burden matrix weighted by atomic AlogP98 values. BCUT descriptors were obtained from the positive and negative eigenvalues of the adjacency matrix, weighting the diagonal elements with atomic properties.33,34 AlogP98 is a partition coefficient value calculated by Ghose’s atom-additive method.35

These results suggest that αEB/PX is considerably related to atomic polarizabilities, atomic partial charges, and atomic AlogP98 values of the extractive agent. Because all of the coefficients of GC_04_Pol, MC_07_MPEOE, and BCUT_02_AlogP98 were positive, we can presume that αEB/PX will increase when these values become larger. Furthermore, the coefficient of No_R_CX_R was also positive; No_R_CX_R is the number of AlogP98 atomic type 26, C in R−CX−R, where X represents a halogen atom. This means that increasing the number of carbon atoms bonded with a halogen atom such as fluorine or chlorine can increase αEB/PX. FraVSAhydsat refers to a fraction of the 2D hydrophobic surface area saturated with van der Waals interactions; the negative coefficient of FraVSAhydsat suggested that the hydrophobicity of an extractive agent has a negative effect on the separation of EB and PX. These results show that an extractive agent attracts PX.

To measure the influence of each descriptor in the best MLR model for αEB/PX, the lower and upper bounds for a (non-multiplicity-corrected) confidence level of 0.95 were evaluated and are shown in Table S5 (Supporting Information). BCUT_02_AlogP98 showed the minimum p-value and the maximum difference between the lower and upper bounds, meaning that BCUT_02_AlogP98 is a very important descriptor related to the αEB/PX value. After it, the order of influence on αEB/PX, in descending order, is FraVSAhydsat, MC_07_MPEOE, GC_04_Pol, and No_R_CX_R. The number of carbon atoms bonded to a halogen atom (No_R_CX_R) did not appear to be very important for αEB/PX; rather, the descriptors closely related to hydrophobicity, for example, BCUT_02_AlogP98 and FraVSAhydsat, showed a high correlation with αEB/PX.

The Pearson correlation coefficients (r) between each pair of descriptors used in this study were evaluated. The value of r represents the strength and direction of a linear relationship between two variables. Correlations greater than 0.8 and less than 0.5 are usually described as strong and weak, respectively. Table 4 lists the Pearson correlation coefficients between descriptors in the best MLR and ANN models; from the table, we can see the highest value of 0.547 in the correlation matrices. These descriptors appear to be considerably independent of each other and may significantly contribute to the best linear regression model. In contrast to the descriptors in RVMLR_1, the correlation coefficient between ATS_MB_00_MPEOE_avg36 and Abs_Charge_Mean, which both appear in the RVMLR_2, RVMLR_3, and RVMLR_5 models, is 0.889. Therefore, these models are considered as poor QSRVR models. Supporting Information Table S6 shows the correlation coefficients between the descriptors used in this study.

Table 4. Correlation Matrices of the Best GA-MLR and BP-ANN Models

GA-MLR            GC_04_Pol   BCUT_02_AlogP98   MC_07_MPEOE   No_R_CX_R   FraVSAhydsat
GC_04_Pol         1
BCUT_02_AlogP98   −0.1711     1
MC_07_MPEOE       0.0319      −0.2375           1
No_R_CX_R         0.0773      0.5470            −0.1284       1
FraVSAhydsat      0.0533      −0.1690           0.3441        −0.1460     1

BP-ANN            GC_04_Pol   BCUT_02_AlogP98   MC_07_MPEOE   No_CH   EstateSAll
GC_04_Pol         1
BCUT_02_AlogP98   −0.1711     1
MC_07_MPEOE       0.0319      −0.2375           1
No_CH             −0.2315     −0.0210           0.0096        1
EstateSAll        −0.2850     0.2961            −0.2698       0.1303  1

In the case of BP-ANN, we generated a total of 300 nonlinear models by taking the descriptor sets of the top 100 GA-MLR models and varying the number of neurons in the hidden layer from one to three. Table 3 shows the ANN configurations and the root-mean-square error (RMSE) values of the training and test sets in the top five models. All models (Table 3) had the same configuration, consisting of five neurons in the input layer, three neurons in the hidden layer, and one neuron in the output layer; this configuration did not include the bias neurons. These results show that as the number of neurons in the hidden layer increased, the training process tended to provide a better BP-ANN model; however, too many weights in the neurons as compared to the number of training compounds resulted in overfitted or otherwise poor models. When the number of hidden neurons was greater than four, the number of weights in our BP-ANN model was greater than 29, which is too high as compared to the number of training data. Therefore, the maximum number of neurons in the hidden layer was set to three.

The descriptors used in the best BP-ANN model were GC_04_Pol, BCUT_02_AlogP98, MC_07_MPEOE, No_CH, and EstateSAll. Like the best GA-MLR model, the best BP-ANN model uses GC_04_Pol, BCUT_02_AlogP98, and MC_07_MPEOE. These descriptors may be interpreted as essential descriptors in the QSPR model building for αEB/PX.

Figure 1. Predicted relative volatilities of EB to PX by GA-MLR method versus experimental relative volatilities of EB to PX.
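The weight-count argument for capping the hidden layer at three neurons can be made explicit. The sketch below assumes a fully connected network with one bias feeding each hidden neuron and one bias feeding the output neuron, which is consistent with the "five descriptors plus one bias" input layer described for this model; the function name `n_weights` is introduced here for illustration.

```python
def n_weights(n_in, n_hidden, n_out=1):
    """Adjustable weights in a fully connected n_in-n_hidden-n_out network with biases."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


N_TRAIN = 33  # compounds left for weight fitting after the validation split

for h in range(1, 6):
    w = n_weights(5, h)
    print(f"5-{h}-1: {w} weights, {N_TRAIN / w:.2f} training compounds per weight")
```

Under this counting, three hidden neurons give 22 weights, four give 29, and five give 36, so beyond three neurons the number of weights approaches or exceeds the 33 compounds available for fitting.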


Figure 2. Predicted relative volatilities of EB to PX by BP-ANN method versus experimental relative volatilities of EB to PX.

Figure 3. Structures of the top five compounds ranked by experimental relative volatility of EB to PX.

Although GC_04_Pol and MC_07_MPEOE were not considered the most important descriptors in the best GA-MLR model, this result shows that the atomic charge and atomic polarizability were also essential properties for building the αEB/PX prediction model. The other two descriptors are the number of single bonds between C and H atoms (No_CH) and the sum of E-state for all atoms, including hydrogen atoms (EstateSAll).37 The best GA-MLR and BP-ANN models are strongly dependent on molecular hydrophobicity and polarity. FraVSAhydsat, BCUT_02_AlogP98, No_CH, and No_R_CX_R are related to hydrophobicity. The 1-octanol−water partition coefficient (log P) is a measure of molecular hydrophobicity. The C atom in R−CX−R is hydrophobic. GC_04_Pol, MC_07_MPEOE, and EstateSAll are related to molecular polarity. Both molecular hydrophobicity and polarity contribute significantly to intermolecular interactions; therefore, most of the selected descriptors are related to molecular hydrophobicity or polarity. The stronger the attraction between PX and an extractive agent, the higher the RV of EB to PX. When selecting the best models for GA-MLR and BP-ANN, the external validation methodology was employed to evaluate the predictive ability of the obtained model. Therefore, R2 and RMSE were calculated for the training set and for an external test set formed by compounds that were not included in the training process.

The observed and predicted values for training and test sets with the names of all compounds are summarized in Table S2 (Supporting Information). In the case of GA-MLR, the R2 value of 0.878 and the RMSE value of 0.01408 indicated good accuracy for the total data set. In the case of BP-ANN, the R2 value of 0.949 and the very low RMSE value of 0.00929 indicated excellent accuracy for the total data set. The predicted and experimental αEB/PX values are shown in Figures 1 and 2 for the GA-MLR and BP-ANN methods, respectively. Figure 3 shows the structures of the five best compounds evaluated by their experimental αEB/PX values. It should be noted that these excellent extractive agents are composed of a backbone containing the benzene structure. According to this result, more effective extractive agents may be identified by functional group modification of the backbone benzene structure.
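The total-set RMSE values reported for the two best models can be cross-checked against their training and test values. The source does not state how the total RMSE was obtained; the sketch below assumes it pools the squared errors of the 41 training and 8 test compounds, and under that assumption the tabulated numbers of Tables 1 and 3 are reproduced to the reported precision. The function name `pooled_rmse` is introduced here for illustration.

```python
import math


def pooled_rmse(rmse_train, n_train, rmse_test, n_test):
    """Combine two subset RMSE values into the RMSE of the merged set."""
    sse = n_train * rmse_train ** 2 + n_test * rmse_test ** 2
    return math.sqrt(sse / (n_train + n_test))


# Training and test RMSE of the best models, taken from Tables 1 and 3.
mlr_total = pooled_rmse(0.01486, 41, 0.00905, 8)   # RVMLR_1
ann_total = pooled_rmse(0.00717, 41, 0.01627, 8)   # BP-ANN model no. 1

print(f"GA-MLR total RMSE: {mlr_total:.5f}")
print(f"BP-ANN total RMSE: {ann_total:.5f}")
```

The same check shows why the BP-ANN total error stays low even though its test RMSE is higher than that of the MLR model: the 41 training compounds dominate the pooled sum.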

4. CONCLUSIONS An RV QSPR model was developed to predict the RV of an EB/PX pair using the GA-MLR and BP-ANN methods. The GA method was used to select the five most relevant molecular descriptors (among more than 1000 molecular descriptors) to build the MLR and ANN models. On the basis of our QSPR study, we determined that αEB/PX is highly affected by atomic polarizability, atomic partial charge, and log P. In addition, 11164

dx.doi.org/10.1021/ie403235r | Ind. Eng. Chem. Res. 2014, 53, 11159−11166

Industrial & Engineering Chemistry Research

Article

(2) Monton, J. B.; Llopis, F. Isobaric vapor-liquid equilibria of ethylbenzene + m-xylene and ethylbenzene + o-xylene systems at 6.66 and 26.66 kPa. J. Chem. Eng. Data 1994, 39, 50−52. (3) Seko, M.; Miyake, T.; Inada, K. Economical p-xylene and ethylbenzene separated from mixed xylene. Ind. Eng. Chem. Prod. Res. Dev. 1979, 18, 263−268. (4) Yan, T. Y. Separation of p-xylene and ethylbenzene from C8 aromatics using medium-pore zeolites. Ind. Eng. Chem. Res. 1989, 28, 572−576. (5) Seko, M.; Takeuchi, H.; Inada, T. Scale-up for chromatographic separation of p-xylene and ethylbenzene. Ind. Eng. Chem. Prod. Res. Dev. 1982, 21, 656−661. (6) Berg, L. Separation of ethylbenzene from para-and meta-xylenes by extractive distillation. U.S. Patent 4,299,668, 1981. (7) Lek-utaiwan, P.; Suphanit, B.; Douglas, P. L.; Mongkolsiri, N. Design of extractive distillation for the separation of close-boiling mixtures: Solvent selection and column optimization. Comput. Chem. Eng. 2011, 35, 1088−1100. (8) Yin, W.; Ding, S.; Xia, S.; Ma, P.; Huang, X.; Zhu, Z. Cosolvent selection for benzene−cyclohexane separation in extractive distillation. J. Chem. Eng. Data 2010, 55, 3274−3277. (9) Berg, L.; Kober, P. J. The separation of ethylbenzene from p- and m-xylene by extractive distillation using mixtures of polychloro compounds. AIChE J. 1980, 26, 862−865. (10) Berg, L. Separation of ethylbenzene from para-and meta-xylenes by extractive distillation. U.S. Patent 4,292,142, 1981. (11) Kubinyi, H. QSAR: Hansch Analysis and Related Approaches; Wiley-VCH: New York, 1993. (12) Kubinyi, H.; Folkers, G.; Martin, Y. C. 3D QSAR in Drug Design Vol. 2: Ligand-Protein Interactions and Molecular Similarity; Springer: New York, 2003. (13) Jeon, Y.; Row, S. W.; Lee, G.; Park, S. S.; Chu, Y. H.; Shul, Y. G. Solvents screening for the separation of ethylbenzene and p-xylene by extractive distillation. Korean J. Chem. Eng., in press. (14) Todeschini, R.; Consonni, V. 
Molecular Descriptors for Chemoinformatics; Wiley-VCH: New York, 2009. (15) BMDRC PreADMET, http://preadmet.bmdrc.org. (16) Accelrys Inc. Discovery Studio (Version 3.1.0), 2011. (17) Accelrys Inc. Discovery Studio (Version 2.5.5), 2009. (18) Halgren, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490−519. (19) Rogers, D.; Hopfinger, A. J. Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Model. 1994, 34, 854−866. (20) Kalogirou, S. A. Artificial intelligence for the modeling and control of combustion processes: a review. Prog. Energy Combust. Sci. 2003, 29, 515−566. (21) Rumelhart, D. E.; McClelland, J. L.; Group, P. R. Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations; A Bradford Book: Cambridge, 1986; Vol. 1, pp 318−362. (22) Zupan, J.; Gasteiger, J. Neural Networks for Chemists: An Introduction; Wiley-VCH: New York, 1993. (23) Zupan, J.; Gasteiger, J. Neural Networks in Chemistry and Drug Design, 2nd ed.; Wiley-VCH: New York, 1999. (24) Kumar, S. Neural Networks a Classroom Approach; Mcgraw Hill Higher Education: New York, 2004. (25) Geary, R. C. The contiguity ratio and statistical mapping. Incorporated Statistician 1954, 5, 115−127 and 129−146. (26) Moran, P. A. P. Notes on continuous stochastic phenomena. Biometrika 1950, 37, 17−23. (27) No, K. T.; Cho, K. H.; Jhon, M. S.; Scheraga, H. A. An empirical method to calculate average molecular polarizabilities from the dependence of effective atomic polarizabilities on net atomic charge. J. Am. Chem. Soc. 1993, 115, 2005−2014. (28) No, K. T.; Grant, J. A.; Scheraga, H. A. Determination of net atomic charges using a modified partial equalization of orbital

many good extractive agents share the common benzene backbone, which may be a very important direction for further study. The present QSPR study can also be used to build improved QSPR models of αEB/PX, not only for single extractive agents but also for mixed systems containing additives. Further studies of suitable extractive agents for mixed systems are currently under investigation in our laboratories.



ASSOCIATED CONTENT

Supporting Information

Tables of chemical structures and boiling points of ethylbenzene and p-xylene, relative volatilities of ethylbenzene to p-xylene, parameters of the genetic algorithm, parameters of the back-propagation neural networks, coefficients and statistical analysis of the best MLR model, and correlation matrix among the molecular descriptors used in this study. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Authors

*(K.T.N.) Tel.: 82-2-393-9551. Fax: 82-2-393-9554. E-mail: [email protected].
*(Y.-G.S.) Tel.: 82-2-2123-2758. Fax: 82-2-312-6401. E-mail: [email protected].

Present Address

¶(S.W.R.) GS Engineering & Construction, 33 Jong-Ro, Jongro-Gu, Seoul, 110-130, Republic of Korea.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was financially supported by a grant from the Industrial Source Technology Development Programs (201310033277) of the Ministry of Trade, Industry and Energy (MOTIE) of Korea.



ABBREVIATIONS

ANN = artificial neural network
BP = back-propagation
CDEAP = charge dependence of the effective atomic polarizability
EB = ethylbenzene
GA-MLR = genetic algorithm and multiple linear regression
GC = gas chromatograph
LOF = lack-of-fit
MLR = multiple linear regression
MMFF = Merck Molecular Force Field
MPEOE = modified partial equalization of orbital electronegativity
PET = poly(ethylene terephthalate)
PX = p-xylene
QSRVR = quantitative structure relative volatility relationship
R2 = squared correlation coefficients
RMSE = root-mean-square errors
RV = relative volatility




dx.doi.org/10.1021/ie403235r | Ind. Eng. Chem. Res. 2014, 53, 11159−11166
