Thermal Hazard of Ionic Liquids: Modeling ... - ACS Publications

Mar 24, 2017 - ABSTRACT: Thermal hazard, which is closely related to the potential fire risk under high temperature, has become one of the most import...
0 downloads 0 Views 601KB Size
Subscriber access provided by UNIV OF ARIZONA

Article

Thermal Hazard of Ionic Liquids: Modeling Thermal Decomposition Temperatures of Imidazolium Ionic Liquids via QSPR Method Xinyue Zhao, Yong Pan, Juncheng Jiang, Shuangyan Xu, Jiajia Jiang, and Li Ding Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.6b04762 • Publication Date (Web): 24 Mar 2017 Downloaded from http://pubs.acs.org on March 26, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Thermal Hazard of Ionic Liquids: Modeling Thermal Decomposition Temperatures of Imidazolium Ionic Liquids via QSPR Method Xinyue Zhao, Yong Pan*, Juncheng Jiang, Shuangyan Xu, Jiajia Jiang, Li Ding

Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 210009, China.

Abstract: The thermal hazard, which is closely related to the potential fire risk under high temperature, has become one of the most important characteristics of various Ionic Liquids (ILs). This study a quantitative structure-property relationship (QSPR) model to predict the thermal decomposition temperature (Td) of imidazolium ILs from their molecular structures. Not only the descriptors for single cation and anion, but also those for describing their interactions were considered to numerically the structure characteristics of ILs. Genetic algorithm based multiple linear regression was used to select the most statistically effective descriptors on the Td of imidazolium ILs. The resulted model is a multilinear equation with seven variables, including two descriptors for cations, four descriptors for anions, and one interaction descriptor. The developed model was rigorously validated using multiple strategies, and further extensively compared to other previously published models. The results demonstrated the robustness, validity and satisfactory predictivity of the proposed model. The predominant structure characteristics responsible for Td were also identified through model The proposed model could be reasonably expected to reliably predict the thermal hazard of novel 1

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 35

imidazolium ILs, and provide guidance for prioritizing design and manufacture of safer ILs with desired properties.

1. Introduction

Recently, Ionic Liquids (ILs), which are composed of ions completely at room temperature, have gained widespread applications in different areas due to their unique and beneficial physicochemical properties, such as low vapor pressures1, the stability to water and air2, excellent solubility3, low toxicity4 (for most ILs) and easy designability5. For instance, ILs have been widely used in the field of electrochemistry6, catalyst7, medicine8, bioengineering9 and so on. It is thus not surprising that both the variety and market value of ILs products are rapidly increasing10. However, sufficient and satisfactory thermal stability is always required for the ILs when they are used as electrolyte, catalyst, solvent, etc. As well known, the thermal decomposition temperature (Td) demonstrates the maximum temperature at which ILs can exist before undergoing chemical decomposing. In other words, it illustrates the maximum processing temperature of ILs. Therefore, Td is considered to be one of the most important properties for ILs’ safety, and has been widely used to characterize the thermal hazard of ILs. However, because of the availability and limitations of both the experiment equipments and the studied ILs materials, it is unrealistic either to measure the thermal decomposition data for all the existed ILs or to meet the aim of the information sharing of ILs by using only experiment method. Consequently, the development of theoretical prediction methods which are convenient and reliable for estimating the thermal hazard of ILs is desirably required and considered to be absolutely necessary even for the formulation into a R&D perspective. The well-known Quantitative Structure-Property Relationship (QSPR) method is based on the assumption that the variance in activity of the chemicals is determined by the variance in their molecular 2

ACS Paragon Plus Environment

Page 3 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

structures11, and it can be expected to capture the complex relationships between the molecular structures and properties of chemicals without requiring detailed knowledge of the mechanisms of interaction. This method is considered to be time- and money- saving, could reduce the number of testing and reveal the activity mechanism. Consequently, it has been widely applied in the areas of various physicochemical properties modeling.12-20 More recently, a few efforts have also been made to develop QSPR models to correlate ionic structures with various properties (toxicity21, n-octanol/water coefficient22, etc.) for ILs, including the thermal decomposition temperature.23-25 Yan et al.23 developed a QSPR model for predicting the decomposition temperature (Td) of 158 ionic liquids (ILs). A new topological index (TI) was proposed based on atom characteristics and atom positions in the hydrogen-suppressed molecule structures to depict the topological characteristics of cations, anions and their interactions. A total of 25 TI descriptors were finally selected as the input parameters to develop the QSPR model. The results showed that the model has a good predictive ability. However, the excessive input descriptors selected in the model would have caused a high risk of overfitting and may seriously affect the prediction performance of the developed model. What’s more, only one particular category of employed descriptors is always considered to be with a strong collinearity and not sufficient to completely characterize the structure characteristics responsible for the thermal hazard of ILs. Gharagheizi et al.24 developed a QSPR model to describe the Td of 586 ILs based on both anion-based and cation-based molecular descriptors. A total of 12 descriptors (6 for anions, and 6 for cations) were selected to quantitatively describe the variability of the ILs’ structures by employing the genetic function approximation method. The results indicated satisfactory goodness of fit and predictive ability of the model. However, this work only took into account the descriptors calculated from the anions and cations separately without considering the 3

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 35

interactions between them, which lead to the incomplete structure characterization and insufficient mechanism explanation of the QSPR model. More recently, Venkatraman and Alsberg25 developed two QSPR models to predict the Td of 995 ILs by using partial least squares and random forests (RF), respectively. Various 3D descriptors and topographical descriptors were employed to describe the structural characteristics of ILs. The RF model preforms better which contains 23 descriptors as model input parameters. However, the model is not so easy to use as a large number of 3D descriptors have been employed in the model, which need complex optimization computations. Also, the excessive input descriptors in the model would have caused a high risk of overfitting as well. Besides, all the above-mentioned models were developed from comprehensive Td datasets for ILs, and all the employed datasets contain at least seven different categories of ILs. So it is rather difficult to indicate and interpret the various thermal decomposition mechanisms of different categories of ILs from the comprehensive QSPR models, since it is known that different categories of ILs may indicate quite different thermal decomposition mechanisms.26,27 Unfortunately, in all the above-mentioned models, only the definitions of each employed descriptor were provided without discussing the physical meanings embodied in those descriptors, and the predominant structure characteristics responsible for Td reflected in the developed models have not been revealed. Thus in this study, the thermal hazard of imidazolium ionic liquids, which are one of the most important and commonly used classes of engineered ILs, are solely studied via QSPR method for the first time, in order to develop a more rational QSPR model for the Td of this typical category of ILs, since it is well-known that only the ILs whose structures are "similar" to the ones in the training set can be theoretically predicted reliably from the perspective of QSPR principle. Both the single descriptors for describing the single ions and the interaction descriptors for describing the interaction between the 4

ACS Paragon Plus Environment

Page 5 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

anions and cations were combined together to extensively represent the structural characteristics of the imidazolium ILs. The QSPR model for predicting the Td of imidazolium ILs was presented, and the predominant structure factors responsible for thermal hazard of imidazolium ILs were identified. After implementing extensive validation using multiple strategies, the obtained QSPR model is expected to provide critical support for predicting the thermal hazard of novel imidazolium ILs solely from their ionic structures, as well as designing and manufacturing safer and green ILs with desired properties. 2. Materials and Methods 2.1 Dataset Imidazolium ionic liquids are one of the most important classes of engineered ILs, which have been widely used for absorbing CO228 and dissolving of organic matter29, due to their remarkable characteristics, such as being easy to prepare, having structures easy to control and having outstanding solubility. In this study, the employed dataset contains 168 different imidazolium ILs with their thermal decomposition temperature (Td) data, all of which were taken from the database of physicochemical properties of ionic liquids provided by Chinese Academy of Sciences30. All the Td data provided in this database were obtained from extensive literatures, and the corresponding experimental protocols including the apparatus and procedures can be available from the original literatures. Most of the Td data of the selected ILs were measured by TGA with a heating rate of 10℃/min, and the resulted temperatures from TGA data were the onset, as determined from the step tangent. The Td data of the rest of the ILs were gained by DSC with the same heating rate of 10℃/min, and onset of decomposition was taken as when the abnormal section of the plot began. This database always assesses the reliability of its reported Td values and it was thus often considered as a major source of reliable data by organizations and researchers. All of the cations of the selected ILs are imidazolium cations, which are combined with 5

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 35

38 different anions. The observed Td values of imidazolium ILs are presented in Table S1 in Supporting Information 1. 2.2 Determination of Molecular Descriptors 2.2.1 Calculation of single descriptors Since the imidazolium ILs are composed of different anions and imidazolium cations, the descriptors of the anions and cations should be calculated separately to describe the ionic structure characteristics of single ions. In order to develop simple and convenient models, we employed a series of conventional and computationally inexpensive descriptors (topological descriptors, constitutional descriptors, atom-centered fragments et al.) to construct QSPR model for predicting the thermal hazard of imidazolium ILs, without considering the more sophisticated and complex molecular descriptors. All the selected descriptors were calculated by employing the Dragon program (version 6)31, which is an efficient program for the calculation of various molecular descriptors. The detailed calculation procedure is as follows: First, the 3D chemical structure of each ion was drawn and preliminary optimized by Marvin Sketch (version 15.9.21.0)32 based on the integrated Dreiding force field algorithm, which is considered to be moderately accurate for both geometries and conformational energies optimization (Accelrys Materials Studio Version 5.5.0.0 User’s Guide, 2010). Next, the optimized 3D structures were imported into Materials Studio (version 7.0, Materials Studio is copyrighted by Accelrys, Inc.) for the further minimum energy molecular geometries optimization based on Dreiding force field. Then the optimized ionic structures were used as inputs into Dragon program to calculate the molecular descriptors employed to search for the best model of the Td prediction for imidazolium ILs in the dataset. In all, a total of 4885 descriptors were calculated for each IL in the dataset. The detailed description on the types and the calculation procedure of the descriptors that Dragon can calculate can refer to 6

ACS Paragon Plus Environment

Page 7 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

Handbook of Molecular Descriptors33 and Molecular Descriptors for Chemoinformatics34. After the calculation of molecular descriptors, the constant and near constant ones for all ions were removed from the descriptor pool, since they were not encoding the structural differences between the ions accounting for their different Td values. Further reduction of the descriptor pool was attained by examining pairwise correlations between descriptors so that only one of the pair descriptors contributing similar information was retained (correlation coefficient >0.95 in this study). These reductions lead to a reduced pool of 1026 descriptors for further study. 2.2.2 Calculation of interaction descriptors Many influential and sophisticated topological descriptors (Topological distance matrix descriptors, Barysz matrix descriptors, Burden matrix descriptors et al.) have been integrated in the Dragon software, but all of them are calculated for describing the structures of a single molecule or ion. In this study, a new kind of interaction topological descriptor (named topological index) proposed by Yan et al.23 was employed to describe the interactions between the anions and cations. Topological indexes (TIs) are numerical quantities calculated from a hydrogen-depleted graph of the chemicals. Each atom in the graph is numbered randomly with different numbers from 1 to n, where n is the total number of non-hydrogen atom in the ion. There are two steps to calculate the TIs. Firstly, the distance matrix D, which is used for determining the positions of atoms in ions, is defined as eq.1.

D = ( d ij )  n if the path length between atoms i and j is n dij =  otherwise 0

(1)

The character vector (CV), which is used for determining the characters of atoms in the hydrogen-depleted molecule by employing eight elements to depict an ion, is defined as eq.2.

CV = ( ai )

(2) 7

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 35

CV1, ai: π× van der Waals radii; CV2, ai: atom weight; CV3, ai: atom electronegativity; CV4, ai: π× atom radius; CV5, ai: exp (vertex degree, defined as the number of adjacent atoms); CV6, ai: exp (fraction of hydrogen to atom i and hydrogens adjacent to it); CV7, ai: exp (1/atom electronic shell number); CV8, ai: exp (1/atom outermost electron number). The control group is defined as: CV9, ai: 0, which means no element. where ai is the element that characterize the atom i. The values of the van der Waals radii, atom radius and

atom

electronegativity

are

available

on

the

internet

at

http://periodictable.com/Elements/031/data.html. Set distance matrix D and CV together into a total matrix (TM). For each TM, only one CV is used, and thus nine TMs are obtained in this way. TM is defined as eq.3.

TM = [ D CV ] × [ D CV ]

T

(3)

Secondly, calculate the TIs. There are five sets of TIs can be calculated from one TM in Yan’s work23, but four sets of which are defined to depict the structure for a single ion. Thus here only the set of interaction descriptors are chosen as the descriptor employed in this work. Yan hold the view that the Td of IL is not simply related to the sum of the contributions gained from anion and cation, and then the interaction descriptors are thus defined as eq.4.

TI =

∑λ

ca , i

+ ∑ λan ,i

(4)

where λca ,i and λan ,i are the eigenvalues of TMs from cations and anions, respectively. According to eq.4, the final 9 TIs were obtained from 18 TMs generated from cations and anions for 8

ACS Paragon Plus Environment

Page 9 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

each IL, respectively. 2.3 Descriptor Selection and Model Development After the calculations of various descriptors, the following important problem involved in QSPR studies is to select the optimal subset of descriptors that have significant contribution to the desired property. The well-known Genetic algorithm (GA) is just a powerful optimization method for solving this problem35. This algorithm is developed to search for the global optima of solutions, and it has been successfully applied to feature selection in QSPR studies. In this study, the GA-MLR method, which is a sophisticated hybrid approach that combines GA as a powerful optimization method applied to multiple linear regression (MLR) as a popular statistical method for variable selection36, was employed to find the optimal subset of descriptors that accurately represented the relationships between ionic structures and the Td for imidazolium ILs. The program required to perform GA-MLR in this study was written in MATLAB M-file in our laboratory. The fitness function of this method corresponds to the root mean square error of cross-validation (RMSECV). The GA-MLR analysis was then implemented to thoroughly explore the descriptor space to attain the best QSPR model. The program is started with one descriptor, and models with varying numbers of descriptors are examined. When increasing in the number of descriptors did not improve significantly the RMSECV of the obtained model, it was determined that the optimum subset of descriptors which yield to the best MLR model had been achieved. 2.4 Model Validation Model validation is an essential process for any developed QSPR model. Only validated models can be considered to be reliable in applications according to the OECD recommendations37. In this study, various validation strategies were employed to validate the performance of the developed QSPR model 9

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 35

for its fitness, robustness and predictivity. Firstly, the widely used statistical parameters, the squared correlation coefficient (R2), the root mean square error (RMSE) and the average absolute error (AAE) were employed for the validation of the fitness of the model. Secondly, the robustness of the model was indicated by the leave-many-out cross-validation (Q2LMO), which was known as the most common method in internal validation. In this method, the initial training set is divided into G subsets. Then, m data are removed from each subset, the rest dataset containing n-m data is used as a training set for modeling and the removed m data is used as a test set. After G times calculation, a cross-validation determination coefficient Q2LMO is obtained. By using this method, not only the robustness but also the internal predictivity of the model can be validated. However, the validation of the predictivity of a QSPR model always consists of two parts, namely the internal validation and the external one. The external validation is the most important and widely used strategy for evaluating both the generalizability and the external predictivity (“true” predictive capability) of the QSPR model for new chemicals. In this study, the external validation was carried out by splitting the available dataset into a training set and an external test set. The training set is used for descriptor selection and model development, while the test set is used for model validation. Consequently, the whole dataset is randomly divided into a training set with 135 ILs (80% of the dataset) and a test set with 33 ILs (20% of the dataset), since it was well known that the model must be tested on a sufficiently large number of chemicals not used in the model development (at least 20% of the dataset was recommended), in order to avoid obtaining the QSPR results by chance or obtaining non-general conclusions38. Moreover, the principal components analysis (PCA), which has been widely applied in visually expressing the similarities and differences of the studied data39, was also employed in this work 10

ACS Paragon Plus Environment

Page 11 of 35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

to estimate the relevance of the random partition in terms of chemical space and properties. In addition to the statistical parameters RMSE and AAE, five modified forms of predicted R2 were further employed to validate the predictivity of the model, namely the external predictive parameter (Q2F1, Q2F2, Q2F3), the Concordance Correlation Coefficient (CCC), and the rm2 metrics ( rm2( test ) ).40 Finally, the Y-randomization test was also employed to further ensure the robustness of the developed model. In this test, several new QSPR models are developed by randomly shuffling the dependent-variable vector (Y vector) using new “best” combinations of m out of the M original descriptors, where m is the number of the descriptors used in the original model, and M is the total number of the descriptors in the descriptors pool. In each model, the highest R2 value obtained by descriptor selection was recorded as the highest random R2 of randomization, and the mean highest random (mhr) R2 and its standard deviation (SD) were calculated by averaging over the repetitions. The distribution of highest random R2 values is approximated by a normal distribution. If all the randomized models have much lower R2 values than that of the original model, and the difference between R2 of the original model and mhr R2 is higher than 2.3 SD for significance on the 1% level, higher than 3 SD for the 0.1% level, etc, one may reasonably conclude that there is no chance correlation in the model development, and the developed model can be reasonably considered as an acceptable one by the current modeling method.41 2.5 Applicability Domain (AD) According to the OECD principle 337, an applicability domain (AD) should be defined when a QSPR model is developed. It is a valid range to determine whether the new chemicals can be reliably predicted by the developed model. This theoretical area essentially depends on the characteristics of the training set, including both molecular descriptors and target endpoint. In other words, only the chemicals whose 11

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 35

structures are "similar" to the ones in the training set can be predicted reliably. The most common method used for defining and analyzing the AD is the Williams plot.42 It is a scatter plot presenting the investigated chemicals in two dimensions. For x-axis, the leverage (Hat) refers to the distance between the row vector of descriptors for the i-th chemical x and the descriptors X for all the training chemicals. It is expressed by the leverage value hi, which is defined as follows: hi = xiT ( X T X ) −1 xi

(5)

The warning leverage value h* was employed to determine whether the chemical was abnormal or not. It is calculated as eq.6.

h* =

3( p + 1) n

(6)

where p is the number of molecular descriptors used in the model, and n is the data size of the training set. Thus, a chemical whose descriptor values are in the range of those of the training set will be reasonably considered to be inside the AD and the model goes applicable for it.43 For y-axis, the standardized residuals reflect the quality of the endpoint prediction for particular chemicals. The outliers are defined as the ones with standardized residuals higher than 3 standard deviation units, which always indicate an error in predicting the target property of chemicals. Moreover, in order to further evaluate the prediction performance of the model for the predictions obtained for chemicals lacking experimental data and to visualize both the interpolated and extrapolated predictions, the Insubria graph method was also employed in this study. The ILs with h>h* are outside the structural domain of the training set, therefore, their predicted data are extrapolated by the model and could thus be less reliable. In the Insubria graph, a zone of higher reliability for both structures (h* and the standardized residual >3s (both response outlier and high leverage chemical). However, there are five ILs (see Fig.5) in the dataset are close to wrongly predicted (>2s, but