Extensive Databases and Group Contribution QSPRs of Ionic Liquids


General Research

Extensive Databases and Revised Group Contribution QSPRs of Ionic Liquids Properties. 1. Density
Kamil Paduszyński
Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.9b00130 • Publication Date (Web): 11 Mar 2019


Industrial & Engineering Chemistry Research

Extensive Databases and Revised Group Contribution QSPRs of Ionic Liquids Properties. 1. Density

Kamil Paduszyński∗

Department of Physical Chemistry, Faculty of Chemistry, Warsaw University of Technology, Noakowskiego 3, 00-664 Warsaw, Poland

E-mail: [email protected]

Phone: +48 (22) 234 56 40


Abstract

A new group contribution (GC) method for estimating the density (ρ) of pure ionic liquids (ILs) as a function of temperature (T) and pressure (p) is developed on the basis of the most comprehensive collection of volumetric data reported so far (41250 data points in total, deposited for 2267 ILs from diverse chemical families). The model was established on a carefully revised, evaluated and reduced data set, whereas the adopted GC methodology follows the approach proposed previously [Ind. Eng. Chem. Res., 2012, 51, 591–604]. However, a novel approach is proposed to model both the temperature and pressure dependence. The idea consists of an independent representation of the reference density ρ0 at T0 = 298.15 K and p0 = 0.1 MPa and of a dimensionless correction f(T, p) ≡ ρ(T, p)/ρ0 for other conditions of temperature and pressure. Three common machine learning algorithms are employed to represent the quantitative structure-property relationship (QSPR) between the studied property endpoints, GCs, T and p, namely, multiple linear regression (MLR), feed-forward artificial neural network (FFANN) and least-squares support vector machine (LSSVM). Based on a detailed statistical analysis of the resulting models, including both internal and external stability checks by means of common statistical procedures like cross-validation, y-scrambling and “hold-out” testing, the final model is selected and recommended. The impact of the type of cation and anion on the accuracy of calculations is highlighted and discussed. The performance of the new model is finally demonstrated by comparing it with similar methods published recently in the literature.


Introduction

Ionic liquids (ILs) are salts exhibiting a low solid-liquid phase transition temperature, often lower than “room” temperature. 1,2 It has been confirmed by diverse theoretical and experimental studies, and is thus commonly accepted by the chemical community, that such peculiar behavior is due to the significant difference in size and shape of the ionic species composing ILs. 3 In fact, the cations of ILs are usually derived from relatively large and branched organic molecules, whereas the anions are formed (in the great majority of cases) by much smaller and symmetric moieties. Naturally, having a molten salt at hand at ambient or mild temperatures opens paths to diverse potential applications in both pure and applied sciences. In particular, ILs have been recognized for many years as novel and “green” solvents for the chemical engineering of the future, 4 mostly as separating agents for extractive distillation of azeotropic mixtures 5 and liquid-liquid extraction of bioactive compounds. 6 Detailed knowledge of the relationship between cation/anion chemical structure and the physical and thermodynamic properties of ILs is therefore crucial for their successful and effective utilization. Nevertheless, this knowledge is very difficult to obtain, first of all because of the extremely large structural diversity of ILs. Nowadays, when organic synthesis imposes basically no limitations, a vast number of ILs can be produced, differing in the cation/anion core as well as in side chains of different length, branching, cyclization, attached task-specific functional groups, and so on. Obviously, a complete experimental investigation of the impact of the chemical structure of ILs on their relevant properties will never be a feasible task.
Hence, computer-aided molecular design (CAMD) of ILs, involving contemporary achievements in both computational tools (hardware and software) and methods for establishing and evaluating quantitative structure-property relationships (QSPRs), seems to be a natural way of exploring the regions of “chemical space” undiscovered so far; in fact, a significant number of papers published in recent years review and directly address this point comprehensively. 7–9 As ILs usually form a liquid phase at operating conditions, their density (denoted henceforth by ρ) is a fundamental physical quantity for process/product design. From a purely utilitarian point of view, density is always a key process parameter for equipment sizing and material/energy


balance formulation. On the other hand, other properties, like heat capacity or surface tension, are often correlated with ρ, so that reliable density values allow one to estimate them with reasonable accuracy. Finally, the theoretical importance of the temperature-pressure dependence of ρ (i.e. the mechanical equation of state, EoS) emerges especially in chemical thermodynamics, as p-ρ-T data enable the prediction of phase diagrams and enthalpy of pure fluids as well as their mixtures. In turn, comparing the molar volume (v = M/ρ, where M stands for molecular weight) of different ILs provides an important piece of information when studying molecular interactions, in particular packing effects. For all these reasons, a significant amount of effort has been devoted to studying QSPRs for ρ of ILs since the pioneering papers of Slattery et al., 10 Ye and Shreeve, 11 Gardas and Coutinho 12 and others. 13–45 In particular, in 2012 I co-authored a paper presenting a state-of-the-art method for predicting p-ρ-T data of ILs in terms of group contributions (GCs). 24 At that time, this method was marked as the best one in the review paper of Coutinho et al. 7 Recently, the model has been used as a reference method for the evaluation of new experimental data, 46–51 as well as utilized as a supporting tool in CAMD of ILs for separations. 52 The GC approach is especially attractive due to its simplicity and versatility, i.e. the fundamental features of any computational tool that are strongly demanded by engineers. In fact, knowledge of thermodynamics is usually not needed (unless the method involves an EoS), as all calculations based on the GC approach only require the user to draw the chemical structure of the IL, identify the groups present in it and finally do some elementary math.
The number of building blocks defined previously 24 was 177 (including 45 cationic, 69 anionic and 63 substituted functional groups), which is quite high and allows one to estimate ρ values for a virtually infinite number of structures. This approach emerged as the most promising one in terms of accuracy and predictive capacity compared to other models reported at that time. 10–23 Since then, a number of new contributions to the field of GC methods for density have been reported. 25–45 In general, they can be divided into two classes: (1) purely empirical models based on either linear or non-linear regression of experimental data; 25–33 (2) models adopting thermodynamics (EoSs). 34–45 In what follows, I provide a brief, but critical, review of these contributions (the earlier ones 10–23 have been summarized previously 24 ).
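To make the “elementary math” of a GC calculation concrete, the sketch below sums group contributions to the molar volume and converts the result to density through v = M/ρ. It assumes a plain additive scheme at reference conditions, which is a simplification and not necessarily the functional form of reference 24. Only the CH2 increment of about 16 cm3 · mol−1 is quoted in this paper; the group names and all other contribution values are hypothetical placeholders.

```python
# Illustrative group contributions to molar volume at 298.15 K and 0.1 MPa (cm3/mol).
# Only the CH2 increment (about 16 cm3/mol) is quoted in this paper; every other
# value below is a hypothetical placeholder, not a fitted parameter.
GROUP_VOLUME = {
    "imidazolium_core": 60.0,  # hypothetical
    "CH3": 25.0,               # hypothetical
    "CH2": 16.0,               # increment mentioned in the text
    "BF4": 30.0,               # hypothetical
}


def molar_volume(groups):
    """Sum group contributions; `groups` maps group name -> number of occurrences."""
    return sum(count * GROUP_VOLUME[name] for name, count in groups.items())


def density_from_volume(molar_mass, volume):
    """rho = M / v, with M in g/mol and v in cm3/mol, returned in kg/m3."""
    return 1000.0 * molar_mass / volume


# A 1-butyl-3-methylimidazolium tetrafluoroborate-like structure (M = 226.02 g/mol).
c4 = {"imidazolium_core": 1, "CH3": 2, "CH2": 3, "BF4": 1}
v_c4 = molar_volume(c4)
rho_c4 = density_from_volume(226.02, v_c4)

# Next member of the homologous series: one extra CH2 group (14.027 g/mol).
c5 = dict(c4, CH2=4)
v_c5 = molar_volume(c5)
rho_c5 = density_from_volume(226.02 + 14.027, v_c5)

print(f"v = {v_c4:.1f} cm3/mol, rho = {rho_c4:.1f} kg/m3")
print(f"v = {v_c5:.1f} cm3/mol, rho = {rho_c5:.1f} kg/m3")
```

With these placeholder values, lengthening the alkyl chain raises the molar volume by the CH2 increment and lowers the density, which is the qualitative trend expected within a homologous series.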


A number of empirical equations based on modified Rackett and Bhirud equations and their Taylor series expansions have been proposed by Roshan and Ghader, 25,26 resulting in an average absolute relative error (AARE) between correlated and experimental density of the order of 1%. Nevertheless, the proposed approaches suffer from relying on virtual values of critical properties of ILs as well as from limited generalization capabilities. A predictive QSPR model involving a polynomial expression relating density and the chemical structure of ILs expressed in terms of GCs was proposed by El-Harbawi et al. 27 The resulting equation reproduces a relatively large database of ambient temperature and pressure density (918 data points for 747 ILs) with a moderate AARE of 1.73%. Among the predictive empirical models, one should highlight GCVOL-IL, published by Evangelista et al. 28 This model is basically an extension of the original GCVOL-OL-60 with over 100 new groups introduced to represent ILs. In contrast to our previous work, 24 each group was described by an extra parameter accounting for the temperature dependence of molar volume. A methodology similar to ours, 24 however, was applied to capture “proximity effects” affecting the molar volume contribution of some functional groups attached directly to the cation core. The model was capable of capturing more than 20000 p-ρ-T data points collected for 869 ILs with a global AARE below 1%. 28 Nevertheless, neither an internal nor an external validation procedure was performed to check the stability and predictive capacity of the final model. Yan et al. 29 proposed an empirical QSPR model based on a topological index derived from distance matrices and character vectors of the ions composing an IL. The authors claimed that they were able to obtain predictions accurate within 0.42% compared to almost 6000 p-ρ-T data points collected for 188 distinct ILs. 29 However, the model is very complex and the results are quite difficult to reproduce, since no examples of how to properly use the model were provided. Another empirical correlation for ρ of ILs comes from Keshavarz et al. 30 The authors considered approximately 500 distinct ILs and attempted to capture density in terms of atomic rather than group contributions. In my opinion, such an approach is oversimplified, because ILs are too complex as fluids to express their properties in terms of elemental composition. Admittedly, the authors tried to capture the deviation of the real system from the oversimplified picture they proposed by incorporating density “increasing/decreasing correcting functions”. Unfortunately, I


did not find any clear explanation of how these functions were calculated. 30 A similar approach for predicting ρ data of ILs was proposed in 2016 by Taherifard and Raeissi. 31 However, parameters accounting for the presence of a limited number of small functional groups were introduced. The model was generalized to predict the p-ρ-T surface by using three universal constants fitted to an extensive collection of experimental data (≈ 12500 data points), with AARE below 1%. 31 Generalization of the model was tested by using an external data set and, surprisingly, a lower AARD was obtained for the test set than for the correlation set. One may speculate (no details on the data split are reported in the paper) that the tested results were actually interpolated rather than extrapolated, i.e. that data for the same ILs (differing only in T and/or p conditions) were used in both subsets. In my opinion, the results of testing would be less optimistic (in terms of AARD) if whole ILs, rather than only part of their experimental data, had been excluded from the correlation process. Barati-Harooni et al. 32 proposed an alternative approach for predicting the density of ILs as a function of temperature. Instead of either atomic or group contributions, they considered the molecular weight, normal boiling temperature and critical temperature of the IL as the input data. The linear dependence between explanatory and output variables was replaced by a more sophisticated machine learning algorithm, namely the adaptive neuro-fuzzy inference system (ANFIS). The last two properties were estimated using an independent GC model. In my opinion, their utilization should be perceived as very controversial, since there is basically no possibility of evaluating their uncertainty, and thus their reliability. Nevertheless, the authors obtained a very reasonable correlation with an overall AARE of 0.66%, with 0.55% and 1.1% for training and testing, respectively.
Finally, the authors compared their results with some universal correlations, mostly based on corresponding states principles, and concluded that their model outperforms all of them and provides more accurate and reliable predictions. 32 However, no comparison with models elaborated specifically for ILs was carried out. Recently, Zhao et al. 33 presented an extension of the well-known GC model by Jacquemin et al. 13,14 The authors pointed out the scatter of the available p-ρ-T data, which obviously affects the quality and predictive power of any GC model. Thus, mathematical gnostics was proposed as a novel approach to preselect the data for fitting the parameters. 33 In total, 5399 data points for correlation and 2522 data points for


pure prediction were taken into consideration, resulting in an AARE of the order of 0.3%. 33 A number of thermodynamic models for predicting p-ρ-T data of ILs have also been published since my previous work. 24 Mousazadeh et al. 34 proposed the perturbed Yukawa chain EoS as a correlative tool for the density of ILs. The model was capable of reproducing p-ρ-T data of 25 ILs in wide ranges of both T and p within 0.22%. Similar work was reported by Hosseini et al., 35 who adopted the perturbed hard-dimer-chain EoS, and by Maghari and ZiaMajidi, who used the SAFT-BACK approach. 36 Abildskov and O’Connell 37 extended their two-parameter corresponding states model for liquid density and compressibility; 21 their updated model, based on the direct correlation function integral, comprises 26 cationic and 14 anionic functional units and reproduces almost 4000 p-ρ-T data points within 0.2%. A more advanced approach based on perturbed hard-sphere equations of state was proposed and developed by Alavianmehr et al. 38–40 Despite the fact that this method is physically sound, as it allows one to predict not only density but also a broad spectrum of derived properties, the authors limited testing of the model to the density of pure ILs and their binary mixtures and to the vaporization curve. Neither vapor-liquid nor liquid-liquid phase equilibrium calculations for mixtures were presented. Furthermore, the approach of Alavianmehr et al. 38–40 is based on either IL- or ion-specific parameters, which significantly limits its applicability range. Electrolyte perturbed-chain statistical associating fluid theory (ePC-SAFT) was applied by Ji and Held 41 to represent p-ρ-T data of a number of pyridinium- and pyrrolidinium-based ILs with ion-specific parameters.
Although the results of the density correlation are quite promising, one should take care when trying to transfer the proposed parameters to other thermodynamic properties, particularly for mixtures; in fact, energetic SAFT parameters are not as sensitive to volumetric properties as they are to phase equilibrium or enthalpy data. A more common thermodynamic approach, i.e. the cubic EoS, was applied by Farzi and Esmaeilzadeh 42 to represent 2295 p-ρ-T data points for 347 distinct ILs, by fitting increments for critical properties and normal boiling points of pure ILs (estimated by using the Joback-Reid equation) for 35 different functional groups. This strongly oversimplified approach resulted in a very poor AARE of the order of 11%. 42 Nevertheless, the authors showed that the Esmaeilzadeh-Roshanfekr EoS applied in their work performs better


compared to analogous models derived using the more common Patel-Teja, Peng-Robinson or van der Waals equations of state. A more advanced cubic EoS, namely the volume-translated Peng-Robinson (VTPR) EoS, was recently proposed by Bagheri and Ghader 43 as a correlative tool for accurate representation of p-ρ-T data of ILs; for 69 ILs, the mean AARD was at the level of 2% based on more than 5000 data points reported up to 200 MPa. As is usually done in cubic EoSs, a generalized correlation between the volume shift parameter of VTPR and the critical volume and temperature/pressure was found, so that the final model can be perceived as predictive. However, in my opinion, special care should be taken when relying on the model’s outcome, as all the results are based on fictitious critical point data, i.e. data whose quality cannot be verified experimentally. Furthermore, there are some significant errors in the comparative analysis presented by Bagheri and Ghader 43 (see Table 3 in that paper). For example, I checked that for 1-butyl-3-methylimidazolium tetrafluoroborate, an AARE of 5.82% was reported as obtained from the model given in reference 24, whereas the true value is below 0.3%. This means that the model published by my group previously 24 was not applied properly, and thus the conclusions drawn by Bagheri and Ghader 43 are partially wrong. Very recently, Penna et al. 44 proposed a Carnahan-Starling-van der Waals model to reproduce p-ρ-T data of a number of ILs composed of common cations and anions. Interestingly, extrapolations of the developed model to the GPa pressure range were validated by using Raman spectroscopy. 44 A few years earlier, it was demonstrated that molecular dynamics simulations can play a similar role in EoS validation up to extremely high pressures. 45 This paper initiates a series of reports on new comprehensive GC-based QSPR models for diverse physical and thermodynamic properties of ILs.
The goal of this contribution is to propose new models for predicting density, and thus other volumetric properties as well, e.g. those obtainable by means of differentiation. Based on the success of the GC approach published previously, 24 it was decided to follow the same methodology. However, the following issues related to the original model 24 will be addressed: (i) an extended p-ρ-T database covering a much broader range of IL structures is presented; the number of ILs found is twice as large as in reference 24;


(ii) more care is taken during data revision prior to fitting, including more precise detection of outliers as well as an analysis of the statistical significance of the functional groups defined within the model; in fact, this was not taken into consideration in reference 24, i.e. the groups were defined arbitrarily without checking their impact on the final model; (iii) utilization and comparative analysis of three different machine learning algorithms for regressing ρ as a function of GCs is demonstrated, including a two-stage calculation scheme based on a reference ρ value and a correction for the deviation of T and p from the standard values T = T0 = 298.15 K and p = p0 = 0.1 MPa; (iv) analysis of both the internal stability and the generalization capabilities of the developed models is carried out in more detail than in reference 24, where the robustness of the model was checked using an external data set only. Finally, the developed models are confronted with the models reported in the literature. In particular, a detailed and global comparison with the GC method proposed by Taherifard and Raeissi 31 is provided. This model was selected as the main reference, as it is the newest and hence state-of-the-art model, established on the basis of the most comprehensive database among those published so far (25850 data points). Furthermore, it is quite easily applicable to virtually any IL, as the only input it requires is the atomic composition of the constituent ions.
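The AARE statistic quoted throughout this review, and the distinction drawn above between holding out whole ILs and holding out only part of their data points, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (`il`, `T`, `rho` keys), the function names and the 20% test fraction are hypothetical, and the code is not the validation procedure used in this work.

```python
import random


def aare(calculated, experimental):
    """Average absolute relative error, in percent."""
    terms = [abs(c - e) / e for c, e in zip(calculated, experimental)]
    return 100.0 * sum(terms) / len(terms)


def split_by_il(records, test_fraction=0.2, seed=0):
    """Hold out whole ILs, so that no IL contributes to both subsets."""
    names = sorted({r["il"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = max(1, round(test_fraction * len(names)))
    test_names = set(names[:n_test])
    train = [r for r in records if r["il"] not in test_names]
    test = [r for r in records if r["il"] in test_names]
    return train, test


# Hypothetical records: 5 ILs with 3 temperature points each (made-up densities).
records = [
    {"il": f"IL{i}", "T": 298.15 + 10.0 * j, "rho": 1200.0 - 0.8 * j}
    for i in range(5)
    for j in range(3)
]
train, test = split_by_il(records, test_fraction=0.2, seed=0)
print(len(train), "training points,", len(test), "test points")
print("AARE example:", aare([101.0, 198.0], [100.0, 200.0]), "%")
```

Splitting on the IL name rather than on individual records means the test error measures extrapolation to unseen structures instead of interpolation in T and p for structures already used in the fit.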


Modeling

Database Overview

For the purpose of the research summarized in this paper, experimental p-ρ-T data for 2267 ILs were extracted from the literature published from 1984 until the present. The data were organized in the form of 4318 data sets with the following information assigned to each record: the name of the IL coded using predefined abbreviations of ions, sample information (including water content, supplier and purification protocol), the experimental technique adopted to measure ρ, the reference to the experimental data source and the experimental data points in the format “temperature (K) – pressure (MPa) – density (kg · m−3)”. If the pressure is not specified precisely in the source, the value of p = 0.1 MPa is assigned. For 475 ILs, at least two literature sources were found. In turn, for 1037 ILs only a single data point measured at ambient conditions was reported. In total, 41250 data points were collected in the database, covering temperatures from (220 to 473) K, pressures from (0.1 to 300) MPa and densities from (800 to 2800) kg · m−3. According to the data collected, 7 general densitometric principles have been applied to measure ρ so far: oscillating tube (2582 data sets), pycnometry (799 data sets), gravimetry (246 data sets), buoyancy method (157 data sets), dilatometry (37 data sets), isochoric piezometer (20 data sets). For the remaining 475 data sets, the method applied to measure ρ was not stated. All the information was stored for each IL in a separate ASCII file. All the files are available for research purposes upon e-mail request. The diversity of the ILs listed in the database (and of the ions forming them) is extraordinarily large. In the case of cations, 1035 distinct moieties can be found, including a great variety of both branched and cyclic structures. For anions, the number of unique structures is 252.
Therefore, using only these cations and anions, one can consider the existence of ≈ 260000 distinct binary ILs. This means that less than 1% of the “chemical space” of ILs has been covered experimentally so far; such a simple estimate additionally supports the case for developing new and robust QSPRs for ρ of ILs. In order to make further analyses of the modeling results easier, the cationic and anionic structures were formally divided into 24 and 15 chemical families, respectively. A full


list of families, including the counts of moieties belonging to them as well as of the data sets and data points, is given in Table 1. In turn, the scatter plot depicted in Figure 1 presents the distribution of different cation-anion combinations, to show which ILs have been studied the most exhaustively thus far. It is worth mentioning that for more than half of all possible binary “cationic family + anionic family” combinations (222 out of 360, to be specific), there are no experimental ρ data published in the literature so far. Furthermore, it seems that nothing has significantly changed since 2012, 24 as imidazolium- and ammonium-based ILs remain the most abundant classes. In the case of anions, the significant number of aliphatic carboxylates, mostly ammonium ones, should be emphasized. Besides, common anions like bistriflimide [NTf2], tetrafluoroborate [BF4] and hexafluorophosphate [PF6] still remain the most extensively studied ions. All the details regarding the database, including the full list of ions (their names and abbreviations) as well as all the data points with all relevant information assigned (naturally, including references to source papers), are presented in clear tabular form in the Microsoft Excel spreadsheet provided in the Supporting Information.
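The coverage estimates quoted in this subsection follow from simple counting, as the short check below shows. It uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken from the Database Overview.
n_cations = 1035      # distinct cations
n_anions = 252        # distinct anions
n_ils_measured = 2267 # ILs with experimental density data

# Number of binary cation-anion combinations and the share covered experimentally.
n_possible = n_cations * n_anions
coverage_percent = 100.0 * n_ils_measured / n_possible

# Family-level picture: 24 cationic x 15 anionic families, 222 pairs without data.
n_family_pairs = 24 * 15
n_empty_pairs = 222
empty_percent = 100.0 * n_empty_pairs / n_family_pairs

print(f"possible binary ILs: {n_possible}")
print(f"experimental coverage: {coverage_percent:.2f} %")
print(f"family pairs without data: {n_empty_pairs}/{n_family_pairs} ({empty_percent:.1f} %)")
```

The product of the ion counts is 260820, consistent with the rounded figure of ≈ 260000, and the measured ILs amount to roughly 0.87% of it.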

Data Preparation

Prior to modeling, the data for each IL were revised and then reduced, as described in the following subsections. The revision process followed a two-stage protocol comprising sample information studies (stage 1) and statistical outlier detection (stage 2). Then, the pool of revised/accepted data was reduced to a more compact size by using regression. This step was taken in order to eliminate the impact of differences in the amount of data available for different classes of ILs (see Table 1) on the final model. The final collections of raw, revised and reduced data, as well as all the details on the final regressions obtained for each IL in the reduction stage, can be found in the Microsoft Excel spreadsheet provided in the Supporting Information.
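The statistical outlier detection of stage 2, detailed in the Data Revision subsection, relies on studentized residuals s and leverages h (a Williams plot) and removes the data set holding the worst point until |s| < 3 holds everywhere. The sketch below illustrates that loop under simplifying assumptions that are mine, not the paper's: a straight-line fit of 1/ρ against τ = T − T0 (the temperature correction with the quadratic coefficient dropped), internally studentized residuals, and a rule that the last remaining data set is never removed. All data in the example are synthetic.

```python
import math


def fit_line(x, y):
    """Least-squares line y = c0 + c1*x; returns leverages and studentized residuals."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    c1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    c0 = ym - c1 * xm
    resid = [yi - (c0 + c1 * xi) for xi, yi in zip(x, y)]
    lev = [1.0 / n + (xi - xm) ** 2 / sxx for xi in x]  # diagonal of the hat matrix
    sigma = math.sqrt(sum(e * e for e in resid) / (n - 2))
    stud = [e / (sigma * math.sqrt(1.0 - h)) if sigma > 0 else 0.0
            for e, h in zip(resid, lev)]
    return lev, stud


def screen(points, limit=3.0):
    """points: (data_set_id, tau, rho). Drop whole data sets until all |s| < limit."""
    kept, removed = list(points), []
    while len({p[0] for p in kept}) > 1:   # always keep at least one data set
        _, stud = fit_line([p[1] for p in kept], [1.0 / p[2] for p in kept])
        worst = max(range(len(kept)), key=lambda i: abs(stud[i]))
        if abs(stud[worst]) < limit:
            break
        bad = kept[worst][0]
        removed.append(bad)
        kept = [p for p in kept if p[0] != bad]
    return kept, removed


def rho_model(tau, rho0=1200.0, a0=6.0e-4):
    """Synthetic 'true' density in kg/m3 (hypothetical coefficients)."""
    return rho0 / (1.0 + a0 * tau)


# Synthetic data sets A and B with +/-0.05 kg/m3 scatter; set C has one gross error.
points = []
for i, tau in enumerate(range(-20, 60, 10)):
    points.append(("A", float(tau), rho_model(tau) + (0.05 if i % 2 == 0 else -0.05)))
for i, tau in enumerate(range(-15, 65, 10)):
    points.append(("B", float(tau), rho_model(tau) + (-0.05 if i % 2 == 0 else 0.05)))
for tau in (0, 10, 20, 30):
    points.append(("C", float(tau), rho_model(tau) + (20.0 if tau == 20 else 0.0)))

kept, removed = screen(points)
print("removed data sets:", removed, "| points kept:", len(kept))
```

Removing a whole data set rather than a single point mirrors the procedure described in the text; the leverages returned by `fit_line` are what one would inspect for highly influential points.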


Data Revision

The goal of the revision process was to select, as far as possible, “the best” data, i.e. those which can be regarded as data for “pure” ILs. This is a fundamental issue for applications of the models developed from those data, since the measured values usually refer to samples containing impurities. Thus, the hard question of the relation between impurity content and the models’ estimates arises. The answer can be given by reasoning as follows. In the future, the proposed models will possibly be utilized in two ways: (1) in comparing their predictions with measured data, treating the predictions as a sort of reference (“benchmark tool” mode); (2) in computing the density of ILs not synthesized yet, just to check whether an IL of the assumed structure will possess the demanded value of this property (“design tool” mode). In the “benchmark” mode, one can attempt to estimate the impurity content based on the difference between the calculated and the real data. On the other hand, in the “design” mode one can simulate the effect of impurities by assuming their chemical identity and taking them into account in estimating the molar volume (e.g. assuming additivity). Finally, studying the effect of impurities in both modes should always be preceded by a check of how far the chemical structure of the probed IL lies from the model’s applicability domain; obviously, a smaller “distance” results in a more reliable estimate. In the very first step of the revision, sample information and the experimental method for ρ determination were chosen as criteria for excluding a given data set from further consideration. The data sets for which either was not specified in the original source were removed, unless no other data sets were available. Next, the data sets consisting of single data points were also excluded if temperature-dependent data were available.
The data set with the lowest water content declared by the authors was selected as the reference set, which was compared with the remaining data sets in order to check internal consistency. For instance, it was verified whether the lowest water content does in fact correspond to the highest density. Unfortunately, in many cases this was not true, so that a manual inspection of the data was necessary. For such “problematic” ILs, some auxiliary analyses and comparisons of their density with the data available for similar ILs (e.g. within the same homologous series) were carried out. In particular, the data deviating from the


commonly known trends (e.g. an increase in molar volume of about 16 cm3 · mol−1 upon addition of a CH2 group to an alkyl chain 24 ) were excluded. In the second step of the database revision process, the remaining data sets were regressed using common equations for correlating p-ρ-T data, along with well-established statistical methods for outlier detection. For each IL, the data were fitted using an equation of the general form:

$$\rho(T, p) = \rho(T_0, p_0)\, f(\tau, \pi) = \rho_0\, \varphi(\tau)\, \psi(\tau, \pi) \tag{1}$$

where ρ0 is a constant corresponding to the density at the reference conditions T0 = 298.15 K and p0 = 0.1 MPa, whereas f stands for the temperature-pressure correction, factorized into the following two expressions:

$$\frac{1}{\varphi(\tau)} = 1 + a_0 \tau + a_1 \tau^2 \tag{2a}$$

$$\frac{1}{\psi(\tau, \pi)} = 1 - C \ln\left[1 + \pi B(\tau)\right] \tag{2b}$$

with B(τ) = (1 + b1τ)/b0, τ ≡ T − T0 and π ≡ p − p0; obviously, eq (2b) was used only in the case of data sets with high-pressure data available. The coefficients ai, bi (i = 0, 1) and C were iteratively included in or excluded from eq (1) so as to provide the statistically most significant fit based on the Akaike information criterion for model selection. 53 At each iteration of the fitting process, the Williams plot showing the studentized residual s versus the leverage h was analyzed and the data set containing the data point with the maximum residual was excluded. The procedure was repeated until the condition |s| < 3 was satisfied for all the data points. In turn, leverages were used to inspect highly influential data sets and data points. Exemplary results of the revision process are shown in Figure 2, where the final fit, the Williams plot and the relative deviations between regressed and experimental densities are presented for

[C4C1Im][BF4] IL. The proposed data revision process resulted in a substantial decrease of the size of the data pool for further modeling, namely by 1511 data sets and 10080 data points; nevertheless, it should be emphasized that no ILs were rejected at this stage.

Data Reduction

The fits resulting from the revision process reproduce the accepted data sets very accurately, with deviations in many cases much lower than the uncertainty of the measured ρ data. To be more specific, the root-mean-square error (RMSE) between fitted and experimental ρ was below 0.1 kg · m−3 for more than half of the ILs, whereas for 90% of all fits RMSE was of the order of 0.7 kg · m−3 or lower. For some ILs, the RMSE was higher due to a significant scatter of the experimental data; see the Microsoft Excel spreadsheet provided in the Supporting Information for more details. Nevertheless, one can safely assume that the reduced data (i.e. those generated by using the fits) are accurate enough to be used in further study instead of the “raw” data. Based on these findings, reduced data were finally generated for each IL using the corresponding fit, by taking 5 values of T and p equally distributed in the [Tmin, Tmax] and [pmin, pmax] intervals. In the case of ILs for which only single data points were available (i.e. regression was not possible), all the accepted raw experimental data were transferred directly from the database. As a result of this procedure, 6920 data points were finally present in the data pool used in the following modeling.

Prior to further analyses and calculations, each record of the reduced data pool was transformed into the form given in eq (1), i.e. into a pair composed of the reference term ρ0 and the respective correction f. The former contribution was simply computed using the fit obtained for a given IL, whereas the latter was calculated as f = ρ/ρ0, i.e. the ratio of the actual (reduced) ρ value and the computed ρ0.
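
As an illustration only, the correlation of eqs (1), (2a) and (2b) and the generation of reduced data can be sketched in Python. This is not the code used in this work (the calculations were performed in MATLAB); the function name `density`, the temperature and pressure ranges and all coefficient values are hypothetical, and the use of a full 5 × 5 grid of T and p is an assumption, since the text only states that 5 equally distributed values of T and p were taken.

```python
import numpy as np

T0 = 298.15  # K, reference temperature stated in the text
P0 = 0.1     # MPa, reference pressure stated in the text


def density(T, p, rho0, a0=0.0, a1=0.0, C=0.0, b0=1.0, b1=0.0):
    """Evaluate eq (1): rho = rho0 * phi(tau) * psi(tau, pi)."""
    tau = np.asarray(T, dtype=float) - T0            # tau = T - T0
    pi = np.asarray(p, dtype=float) - P0             # pi = p - p0
    phi = 1.0 / (1.0 + a0 * tau + a1 * tau**2)       # eq (2a)
    B = (1.0 + b1 * tau) / b0                        # B(tau) as defined in the text
    psi = 1.0 / (1.0 - C * np.log(1.0 + pi * B))     # eq (2b)
    return rho0 * phi * psi


# Hypothetical coefficients for one IL (illustrative only, not fitted values).
coeffs = dict(rho0=1200.0, a0=6.0e-4, a1=0.0, C=0.08, b0=200.0, b1=-2.0e-3)

# Reduced data: 5 equally spaced T and p values (the full grid is an assumption).
T_grid, p_grid = np.meshgrid(np.linspace(293.15, 393.15, 5),
                             np.linspace(0.1, 50.0, 5), indexing="ij")
rho_reduced = density(T_grid, p_grid, **coeffs)
f_reduced = rho_reduced / coeffs["rho0"]   # correction term f = rho / rho0
```

With the default arguments (C = 0) the pressure factor ψ equals unity, which mirrors the statement that eq (2b) is used only when high-pressure data are available.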
In the case of ILs for which only single data points were available (i.e. regression was not possible), raw experimental data were taken directly and assigned as ρ0. Moreover, the values reported at a temperature different from T0 = 298.15 K were corrected assuming a constant and universal thermal expansion coefficient αp ≡ −(∂ρ/∂T)p /ρ = 6 × 10−4 K−1 and the resulting formula ρ0 = ρ(T) exp[−αp (T0 − T)]. The values of ρ0 and f corresponding to each
individual data point are listed in the Microsoft Excel spreadsheet provided in the Supporting Information.
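
The temperature correction applied to single data points, ρ0 = ρ(T) exp[−αp (T0 − T)], can be written as a short Python sketch. The function name `reference_density` and the numerical example are hypothetical; only the value of αp and the formula itself come from the text.

```python
import math

T0 = 298.15        # K, reference temperature
ALPHA_P = 6.0e-4   # 1/K, universal thermal expansion coefficient assumed in the text


def reference_density(rho_T, T, alpha_p=ALPHA_P, T_ref=T0):
    """Shift a single density value measured at T to the reference temperature."""
    return rho_T * math.exp(-alpha_p * (T_ref - T))


# Hypothetical single data point: 1000 kg/m^3 measured at 308.15 K.
rho0 = reference_density(1000.0, 308.15)
```

Since density decreases with temperature, a value measured above T0 is shifted upward, here by about 0.6% for a 10 K difference.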

GC Scheme

Occurrences of a set of predefined functional groups were used as explanatory variables of the proposed QSPRs. The GC scheme applied in this work follows exactly the same idea as presented in my previous papers. 24,54,55 However, a much broader range of unique functional groups is applied to develop the models presented herein, due to a substantial increase in the number of ILs published in the open literature since 2012. In general, the chemical structure of an IL is assumed to be built of three types of functional groups (in further text also called fragments or building blocks), namely cation cores, anion cores and substituted groups. Cation cores are the groups representing the charged fragment of the cation; they constitute the main (and usually the biggest) part of the cation. The same applies to anion cores, which represent the charged fragment of the anion or, more frequently, the anion as a whole. Finally, substituted groups make it possible to model different types of side chains attached either to the cation or to the anion. In total, 320 distinct functional groups were defined for the purpose of this study, including 73 cation cores, 131 anion cores and 116 side chain groups. The group assignments proposed in this work were designed in such a manner that each atom was present in one and only one fragment of the structure. Hence, there were no unassigned groups and no overlap between the groups.

The number of fragments adopted in this work to represent ILs may be seen as large. However, it can be justified by some rules adopted when defining the fragments. First of all, distinct groups were defined for cyclic cation cores in order to capture positional isomerism of the rings. For example, 1,3- and 1,2,3-substituted imidazolium cations, or pyridinium cations with alkyl chains in various positions around the ring, were defined. The same applies to branched ammonium and phosphonium cations differing in the number of protons present at the central atom (i.e.
whether they were derived from a primary, secondary, or tertiary amine/phosphine). This is particularly useful, as it allows protic ILs to be distinguished explicitly. In the case of anions like carboxylates,
sulfates, sulfonates, or phosphates, groups with CH3, CH2 and CH directly attached to the charged fragment of the moiety were treated separately as well. For example, methyl sulfate is represented by a single CH3SO4 unit, whereas ethyl sulfate is represented by the assignment CH2SO4 + CH3. Finally, in the case of substituted groups, small as well as relatively big fragments were taken into account. In particular, some “standard” functional groups like methyl CH3, methylene CH2, hydroxyl OH, nitrile C≡N, amine NH2 and carboxyl COOH are available. Special groups like “phenyl” (substituted at various positions) or ether groups (CH3O and CH2O) are also introduced to capture their effect on density more precisely. Furthermore, distinct building blocks were assumed for the fragments attached directly to the aromatic ring or cation core. In particular, a number of methylene and methyl groups are defined within the proposed GC scheme, each corresponding to CH2/CH3 bonded to a different heteroatom, with an additional distinction between heteroatoms present in an aromatic ring (like imidazolium, pyridinium), an aliphatic ring (like pyrrolidinium, piperidinium) or a branched quaternary structure (like ammonium, phosphonium). For example, the group denoted by “aNCH3” defines the methyl group attached to a nitrogen atom in an aromatic ring, e.g. imidazolium, pyridinium, or quinolinium. The main goal of introducing all these rules was to enable the GC models to discriminate between structural isomers, as well as to capture different properties of functional groups due to so-called “proximity” effects. It should be emphasized that one has to be aware of the mentioned rules in order to apply the proposed models correctly. Otherwise, the computed results may be misinterpreted. A full list of groups as well as the group assignments for all the ions considered in this work can be found in the Microsoft Excel spreadsheet provided in the Supporting Information.
I encourage the reader to study this material prior to making any predictions with the proposed models, in order to become more familiar with the applied GC scheme.

Computational Details

The key novelty of this work compared to my previous contributions 24,54,55 is that the reference and correction contributions given in eq (1) are modeled independently. First, the expression given
in eq (1) is rewritten into the form:

$$\hat{\rho}(T, p, \mathbf{n}) = \hat{\rho}_0(\mathbf{n})\, \hat{f}(\tau, \pi, \mathbf{n}) \tag{3}$$

where the symbols with the “hat” represent the estimators of the properties returned by the modeling, whereas the group assignment is defined by the vector n = [n1, . . ., nG] with ni denoting the number of occurrences of group i in the IL’s structure and G denoting the number of distinct groups; obviously, ni = 0 if group i does not appear in the structure. Furthermore, the reference molar volume v0 is modeled in the reference term instead of density itself, so that

$$\hat{\rho}_0(\mathbf{n}) = \frac{M(\mathbf{n})}{\hat{v}_0(\mathbf{n})} \tag{4}$$

where $M(\mathbf{n}) = \sum_{i=1}^{G} n_i M_i$, with Mi denoting the molecular weight of functional group i. This is due to
the fact that molar volume has been shown many times to be more suitable than density for modeling, as it is an approximately additive property with respect to fragment-based terms. 24,28,31

Selection of Relevant GCs

Before the final calculations, it was first investigated which functional groups defined within the GC scheme are statistically insignificant and hence should be removed from further consideration. For that purpose, the reference term v0 data versus n were analyzed using stepwise multiple linear regression (SWMLR), more specifically with the algorithm implemented in the MATLAB stepwisefit function. In short, the method starts with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. The maximum p-value for a term to be added was 0.05, whereas the minimum p-value for a term to be removed was 0.10. Both v0 and n were scaled to z-scores prior to each iteration of stepwise regression.

Such stepwise regression was applied in an iterative fashion, coupled with outlier detection based on studentized residuals: at each step, the ILs exhibiting either |s| or leverage h higher than preset critical values were rejected. These criteria were imposed to remove mismatching and strongly influential data, respectively. If no IL meets them, the algorithm stops.

The applied feature selection process resulted in a significant reduction of both the dimensionality of the input data and the number of ILs for further modeling. Finally, only 162 of all 320 groups were kept in the model. The list of 158 rejected groups comprises 31 cation cores, 70 anion cores and 57 substituted groups. A full list of included/excluded groups can be found in the Microsoft Excel spreadsheet provided in the Supporting Information. It should be pointed out that such a noticeable fraction of groups was removed due to their statistical rather than physical irrelevance. Rarity or scarcity of data for ILs consisting of excluded groups can be seen as an additional argument explaining their exclusion. In fact, for almost 90% of the rejected groups, the number of ILs having them in either the cation’s or the anion’s chemical structure was less than 10. Finally, the groups selected to represent v0 versus n data were directly transferred to the model for the f̂(τ, π, n) term of eq (3).

Unfortunately, the dimensionality reduction performed with stepwise regression makes the application of the proposed GC scheme somewhat confusing. In fact, one has to keep in mind that some fragments are not explicitly represented by the final model. Nevertheless, the MATLAB subroutines provided in the Supporting Information handle the problem automatically, so the end user does not actually have to be aware of this issue. The number of v0 data points, and thus ILs, eliminated from further studies on the basis of their studentized residuals is also significant: it equals 268, which corresponds to a reduction of the f data samples by 663.
Nevertheless, it should be noted that one can still use the methods proposed in this work to predict the density of these rejected ILs. However, the analyses carried out suggest that the models’ outcomes might be very inaccurate, as the experimental v0 data of these ILs do not appear to match the data of the much broader pool of accepted ILs.
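
The two diagnostics used for rejecting data, studentized residual s and leverage h, can be illustrated with a short Python sketch for an ordinary least-squares fit. Several points are assumptions rather than details taken from this work: the residuals are internally studentized, an intercept is included, the function names are hypothetical, and the default leverage threshold of 3 × (number of parameters)/N is a common rule of thumb, since the preset critical values are not given in the text.

```python
import numpy as np


def williams_statistics(X, y):
    """Return leverages h and internally studentized residuals s of an OLS fit."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # intercept
    H = A @ np.linalg.pinv(A)                 # hat matrix
    h = np.diag(H)                            # leverages
    resid = y - H @ y                         # ordinary residuals
    dof = len(y) - np.linalg.matrix_rank(A)   # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    s = resid / np.sqrt(sigma2 * (1.0 - h))   # studentized residuals
    return h, s


def flag_outliers(X, y, s_crit=3.0, h_crit=None):
    """Flag points with |s| > s_crit or h > h_crit (both thresholds adjustable)."""
    h, s = williams_statistics(X, y)
    if h_crit is None:
        n_par = np.asarray(X).reshape(len(y), -1).shape[1] + 1
        h_crit = 3.0 * n_par / len(y)         # rule of thumb, an assumption here
    return (np.abs(s) > s_crit) | (h > h_crit)
```

In an iterative scheme of the kind described above, the flagged points would be removed and the fit repeated until no point is flagged.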

Methods

In order to express the relationship of v0 or f versus n, T and p, three common machine learning algorithms were employed in this study, namely multiple linear regression (MLR), feed-forward artificial neural network (FFANN) and least-squares support vector machine (LSSVM). In my opinion, the thermodynamic and chemical engineering community is quite familiar with these methods, and that is why I decided to use them in this study. Nevertheless, other methods like ANFIS could also be applied. 32

All the applied methods start with the assumption that for a given data set of a scalar property y depending on some n-dimensional predictor x, i.e.

$$D = \{(\mathbf{x}_i, y_i) : \mathbf{x}_i \in \mathbb{R}^n;\ y_i \in \mathbb{R};\ i = 1, \ldots, N\} \tag{5}$$

there exists a relationship given as a mapping F, namely

$$y_i = F(\mathbf{x}_i) + \varepsilon_i \tag{6}$$

where εi (i = 1, . . ., N) stands for a random variable corresponding to the noise of observation i. In eqs (5) and (6), the symbol y refers to v0 or f. In particular, if y ≡ v0, then x ≡ n. In turn, if y ≡ f, then the predictor vector is complemented by the temperature and pressure change with respect to reference conditions, i.e. x ≡ [τ, π, n]. It is important to stress that the set D from eq (5) contains the reduced data, i.e. the N = 6920 data points obtained from the regression of the raw data, as described in the previous subsection.

Obviously, the real form of the mapping F given in eq (6) is not known. However, methods like MLR, FFANN or LSSVM attempt to approximate it by using some estimator ŷ ≡ F̂(x; q) parametrized by method-specific adjustable coefficients q; the number of parameters depends, in general, on the method’s principles and structure. Finally, the goal of modeling using these methods is to find the best coefficients q, so that the sum of squares of residuals εi (i = 1, . . ., N)
(or a similar expression) is minimized. In short, MLR assumes that F̂(x; q) is a linear function of x, FFANN assumes that it is expressed as a superposition of hyperbolic tangent sigmoids of x, whereas LSSVM assumes that it is a linear combination of so-called kernel functions. For a more detailed description of these functions, as well as their parameters q and the algorithms used to find them, the reader is referred to my previous work 55 and references cited therein.

In the case of FFANN, the methodology of employing this approach as a regression tool was quite standard. In particular, single hidden and output layers with sigmoid and identity activation functions, respectively, were adopted in this study. Furthermore, the optimum number of nodes in the hidden layer of the neural network (S) was determined following the same approach as described in my previous papers. 54,55 Modeling performance was tested for different values of S and the one displaying the lowest generalization error was selected. Training of each network was combined with an “early stop” validation algorithm. Furthermore, the sensitivity of the obtained nets to the training data was monitored using cross-validation. Based on this analysis, the optimized S was found to be 2 and 6 in the case of modeling v0 and f, respectively. In fact, the testing mean square error (MSE) remains basically at the same level (or slightly increases) starting from these values. It is also worth mentioning that the ratios of the number of data points of v0 and f to the number of fitted parameters were both approximately equal to 6. These are rather typical values compared to my previous studies, 54,55 so one can presume (based on my experience as well as on some recommendations found in the literature 56 ) that the final FFANNs are not overfitted.
For LSSVM-based models, the radial basis function (RBF) kernel was employed to represent the relations of v0 and f with GCs; in fact, it has been shown previously to be robust when used in regression of large data sets. 55 The process of model development is divided into two steps called tuning and training. In the tuning step, the hyperparameters of the LSSVM (namely, the regularization constant γ and the kernel function parameter σ2) are determined via minimization of a selected performance measure; in this work, the 5-fold cross-validation MSE was minimized using the Nelder-Mead simplex algorithm 57 with starting points for both γ and σ2 resulting from coupled simulated annealing (CSA). 58 As a result of LSSVM training, the final model is uniquely obtained
by solving a linear system of equations described in detail elsewhere. 55 The optimized values of the models’ parameters are tabulated in the Microsoft Excel spreadsheet provided in the Supporting Information, along with all the working equations needed to reproduce the results discussed in further text.

All the calculations described in this paper were performed using MATLAB (version 2018a, The MathWorks, Inc.) in-house subroutines adopting some built-in functions, mostly from the Statistics and Machine Learning Toolbox (for the MLR method) and the Neural Network Toolbox (for the FFANN method). LSSVM calculations were carried out using the LS-SVMlab Toolbox (version 1.8). 59 MATLAB files for performing predictions with the models developed in this work are also provided in the Supporting Information, along with brief instructions and exemplary input files.

Model Validation

Most of the validation schemes applied in this work to check the stability and generalization capabilities of the proposed MLR-, FFANN- and LSSVM-based models were the same as in my previous paper, 55 so only a general overview is presented herein. First of all, the data pools of both v0 and f samples were split into two disjoint sets, a training set and a test set, with respective shares of 90% (1800 ILs for v0 and 5632 data points of f) and 10% (199 ILs for v0 and 625 data points of f); detailed information on the data split can be found in the Microsoft Excel spreadsheet provided in the Supporting Information. The data from the training set were employed to fit the parameters as well as to validate the internal stability of the obtained models. For all the computational methods applied, the stability of the models was checked using the K-fold cross-validation scheme (K-CV) with the number of folds set to K = 9 (for this value of K, the folds are of the same size as the test set).
In the case of MLR, so-called y-scrambling 60 was additionally applied to demonstrate that the final model does not result from a chance correlation between v0 and the GCs. Training and validation results were finally expressed in terms of commonly used statistical measures like
R2, Q2K-CV, RMSE and AARE:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{7}$$

$$Q^2_{K\text{-CV}} = 1 - \frac{\sum_{k=1}^{K} \sum_{i \in (k)} \left(y_i^{(k)} - \hat{y}_i^{(k)}\right)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{9}$$

$$\mathrm{AARE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i}{y_i} - 1 \right| \times 100\% \tag{10}$$

The symbol y in eqs (7) to (10) corresponds to reference molar volume v0, reference density ρ0, density ρ, or temperature-pressure correction f, whereas “hat” and “bar” denote the computed and the average value, respectively. In eq (8), (k) stands for the data sample belonging to the k-th fold in cross-validation. In turn, the data from the test set were used to check the predictive capacity of the models when applied to external data. The quality of the model was assessed by using Q2 defined by Consonni et al. 61

$$Q^2 = 1 - \frac{\sum_{i=1}^{N_{\mathrm{test}}} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N_{\mathrm{train}}} (y_i - \bar{y})^2} \cdot \frac{N_{\mathrm{train}}}{N_{\mathrm{test}}} \tag{11}$$

Apart from the global values of statistics, the results are presented for each cationic/anionic family of ILs in order to provide an insight into accuracy one can expect from the models when applying them to particular ionic structures.
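
For readers who wish to reproduce the statistics, the measures of eqs (7) to (11) can be sketched in Python as follows. The function names are hypothetical, and taking the mean in the denominator of eq (11) over the training set is an assumption consistent with the summation limits of that equation.

```python
import numpy as np


def r2(y, y_hat):
    """Eq (7); with out-of-fold predictions as y_hat it corresponds to eq (8)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def rmse(y, y_hat):
    """Root-mean-square error, eq (9)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))


def aare(y, y_hat):
    """Average absolute relative error in percent, eq (10)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y_hat / y - 1.0))


def q2_external(y_test, y_hat_test, y_train):
    """External Q2 of eq (11); y-bar is the training-set mean (assumption)."""
    y_test = np.asarray(y_test, float)
    y_hat_test = np.asarray(y_hat_test, float)
    y_train = np.asarray(y_train, float)
    press = np.mean((y_test - y_hat_test) ** 2)       # test sum of squares / N_test
    tss = np.mean((y_train - y_train.mean()) ** 2)    # train sum of squares / N_train
    return 1.0 - press / tss
```

Because the cross-validated coefficient of eq (8) pools the squared errors of all folds, it can be obtained from `r2` by passing the predictions made for each sample while its fold was held out.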

Results and Discussion

Reference Term (ρ0)

All the parameters and other details of the developed MLR, FFANN and LSSVM models for the reference term of eqs (3) and (4), along with experimental (reduced) and calculated values of both v0 and ρ0, can be found in the Microsoft Excel spreadsheet provided in the Supporting Information. A brief summary of the statistics of the proposed models is given in Table 2. As seen, all three approaches display equally good performance as far as their correlative capabilities are concerned; in fact, R2 for v0 data is higher than 0.99 regardless of the method. All the models are internally stable and statistically sound, which is confirmed by the very good results of cross-validation, expressed by Q2K-CV values comparable with R2 and very close to unity. The results of testing are also very promising, as the values of the determination coefficient for the external data set, i.e. Q2test, are even higher than those resulting from training and cross-validation.

Parity plots presenting calculated versus experimental data are shown in Figure 3. In particular, the scatter of reference molar volume (v0) and reference density (ρ0) is depicted in Figure 3a and Figure 3b, respectively. As seen, the computed values follow the experimental ones over a wide range of both v0 and ρ0, with slightly more noticeable scatter observed for the test set. Furthermore, Figure 3 shows that the ILs excluded from the calculations do indeed exhibit very high deviations. One may speculate that these deviations have two contributions, namely an empirical and a statistical one. The first one is certainly related to the uncertainty of the experimental data and their mismatch with other ILs.
This especially applies to di- and tricationic ILs (42 out of 125 ILs of this kind are present in the list of excluded compounds), which are very complex but still weakly characterized compounds, so it has been quite difficult to capture their density with the proposed GC scheme (the same problem occurred previously 24 ). The “statistical” contribution is due to the fact that the excluded ILs may consist of functional groups eliminated from the model during the GC selection process, mostly due to their low statistical significance. In other words, there are some “gaps” in the molecular structure of these ILs which are not captured by the proposed GC approach;
this is the case for 200 out of 268 excluded ILs. Unfortunately, more data are needed to represent the excluded ILs within the established methods and then to derive the respective GCs. On the other hand, one can be sure that the results for the excluded ILs would be much better if the GC selection step were not included in the modeling. However, one should also keep in mind that the coefficients/parameters corresponding to some groups would then have uncertainties (standard errors) so high that their use would be meaningless from a purely statistical point of view, as they would provide predictions with extremely broad confidence intervals.

Because the MLR, FFANN and LSSVM methods resulted in equally robust models, the simplest one, i.e. MLR, was selected as the final model for reference molar volume. The results of validation of this model are depicted in Figure 4. As seen from Figure 4a, the model is internally stable, as the accuracy of the predicted results does not depend on the validation data fold. Furthermore, a chance correlation between v0 and the selected GCs can be ruled out, as evidenced by the y-scrambling (1000 randomizations) results shown in Figure 4b and Figure 4c. In fact, the “background” R2 and Q2K-CV of a model resulting from a completely randomized dependent variable vector (i.e. such a vector that the Pearson correlation coefficient between original and randomized data ryy = 0) are equal to 0.0899 and −0.1306. These values are substantially lower than the critical values proposed in the literature 62 (0.3 and 0.05, respectively). Furthermore, for all 1000 random permutations of the v0 data, R2 < 0.2 and Q2K-CV < 0 (see Figure 4c), so chance correlation is indeed not observed. It should be noted that feature selection was not considered in establishing the models based on randomized v0 data, so the GCs were taken from the “real” model. Such a methodology is not recommended by some authors.
60 Nevertheless, I checked that repeating the GC selection in the y-randomization test produces models which are even worse in terms of both R2 and Q2K-CV compared to those shown in Figure 4b and Figure 4c.

As can be noticed from Table 2, all the models are capable of reproducing the v0 and ρ0 data (including both training and test sets) approximately within 5 cm3 · mol−1 and 1.5 kg · m−3, respectively (in terms of RMSE). Such values correspond to AARE at the level of 1.5% in the case of both v0 and ρ0. It should be stressed that the values obtained from testing are only slightly higher
compared to those resulting from training, and this finding is independent of the data pool “holdout” split. It is always interesting to report not only the global AARE, but also the distribution of relative errors, RE ≡ (ρ̂0/ρ0 − 1) × 100%, computed for each IL separately. Such distributions of ρ0 errors obtained from the MLR model are presented for both training and test sets in Figure 5. In particular, Figure 5a depicts empirical probability density functions (PDFs, mimicked by histograms) and cumulative distribution functions (CDFs). As seen, the obtained PDFs and CDFs are relatively narrow and symmetric but, more importantly, they look basically the same regardless of whether the training or the test set is considered. This finding is additionally supported by the Q-Q plots shown in Figure 5b, which also suggest that in the region of the lowest errors (approximately for |RE| < 1%) the distributions are very close to normal.
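
The y-scrambling check discussed in this subsection can be illustrated by the following Python sketch, which refits an ordinary least-squares MLR model to randomly permuted responses while keeping the descriptors fixed. It is an illustration under stated assumptions: the data are synthetic (not taken from this work), the function names are hypothetical, only R2 is evaluated (not Q2K-CV), and no feature selection is repeated within the permutations.

```python
import numpy as np


def mlr_r2(X, y):
    """Fit MLR with intercept by least squares and return the training R2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_perm=1000, seed=0):
    """Return R2 of the real model, R2 of scrambled models and their r_yy."""
    rng = np.random.default_rng(seed)
    r2_perm = np.empty(n_perm)
    r_yy = np.empty(n_perm)
    for j in range(n_perm):
        y_perm = rng.permutation(y)                 # break the x-y relationship
        r2_perm[j] = mlr_r2(X, y_perm)
        r_yy[j] = np.corrcoef(y, y_perm)[0, 1]      # correlation with original y
    return mlr_r2(X, y), r2_perm, r_yy


# Synthetic data (not from the paper): 200 samples, 3 group counts, additive y.
rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(200, 3)).astype(float)
y = X @ np.array([16.0, 25.0, 40.0]) + 100.0 + rng.normal(0.0, 1.0, 200)
r2_real, r2_perm, r_yy = y_scrambling(X, y, n_perm=200)
```

A model free of chance correlation should give `r2_real` close to unity, whereas the values in `r2_perm` should remain close to zero.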

Correction Term (f)

The correction term of eq (1) was modeled using only the FFANN and LSSVM approaches. The idea of adopting MLR was abandoned, because such an approach yields a temperature and pressure dependence of density which is the same for all ILs. On the other hand, FFANN and LSSVM provide mechanical coefficients (like αp) which depend on both the IL (i.e. the group assignment) and the temperature/pressure conditions.

Parity plots showing the values of f calculated from the FFANN or LSSVM models versus those calculated from reduced experimental data are shown in Figure 6. As seen, both methods appear to display similar performance. Nevertheless, the LSSVM method exhibits slightly lower data scatter, particularly noticeable in the case of the training set. This is confirmed by the statistical measures collected in Table 3; in fact, all determination coefficients are higher for the LSSVM method, while RMSE values are lower. Furthermore, it is noteworthy that this method is much more accurate if one considers the influence of pressure on the deviation of density from reference conditions, whereas the robustness of FFANN and LSSVM is rather similar in the case of the effect of temperature at ambient pressure. Based on the results just summarized, it was decided to use the LSSVM approach to represent the
correction of density for temperature and pressure. Thus, the final model recommended for further analyses is composed of the MLR equation for the ρ0 term and the LSSVM equation for the f term of eq (1). Nevertheless, the FFANN-based model will also be mentioned in further text, just for comparative purposes.
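
To give an idea of how an LSSVM regression of the kind selected for the correction term works, a minimal Python sketch of the standard LSSVM function-estimation formulation with an RBF kernel is shown here. It is not the LS-SVMlab code used in this work: the function names are hypothetical, the kernel convention exp(−‖x − z‖²/σ²) is an assumption, the data are synthetic, and the values of γ and σ² are arbitrary rather than tuned by CSA and the simplex algorithm.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix; the convention exp(-d^2 / sigma2) is an assumption."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)


def lssvm_train(X, y, gamma, sigma2):
    """Solve the LSSVM linear system for the bias b and the coefficients alpha."""
    N = len(y)
    M = np.zeros((N + 1, N + 1))
    M[0, 1:] = 1.0                                   # constraint: sum(alpha) = 0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(N) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_new, X_train, b, alpha, sigma2):
    """Prediction as a linear combination of kernel functions plus bias."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


# Synthetic one-dimensional example (not data from the paper).
X = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
b, alpha = lssvm_train(X, y, gamma=1.0e4, sigma2=0.1)
y_fit = lssvm_predict(X, X, b, alpha, sigma2=0.1)
```

Once γ and σ² are fixed, training reduces to a single linear solve, which is why the final model is obtained uniquely; in the present context the rows of X would hold the predictor vectors [τ, π, n] and y the values of f.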

Calculated versus Raw Data

The final models were applied to recalculate the entire collection of accepted p-ρ-T data. Global values of RMSE and AARE (i.e. computed based on 30222 accepted data points for the ILs that were not excluded) are shown in Table 3. A very promising result is that both FFANN and LSSVM still provide predictions accurate within an AARE of less than 1%. On the other hand, this may be seen as disappointing, because the previous version of the GC model 24 resulted in a global AARE of the order of 0.5%. Nevertheless, it should be borne in mind that the diversity of chemical structures deposited in the current database is by far higher compared to the previous one; in fact, the number of ILs is twice as high as in reference 24. Since the LSSVM-based GC tool has turned out to be slightly more accurate than FFANN, the modeling outcomes presented and visualized in further text and plots refer to the former approach only.

The results are graphically summarized in Figure 7, where relative deviations between calculated and experimental density are plotted as a function of density (Figure 7a), temperature (Figure 7b) and pressure (Figure 7c). As can be noticed from Figure 7a, the distribution of deviations is quite narrow and symmetric. The 99% and 95% quantiles equal 5.7% and 2.9%, respectively, taking into account all accepted data for the ILs that were not excluded. Furthermore, the deviations for data points rejected during data revision and evaluation are in some cases very high, so their elimination from regression is indeed justified. It is noteworthy that no systematic change in the deviations with variation of temperature or pressure was observed. From Figure 7b and Figure 7c, one can notice that for T = (300 to 350) K and p = (0.1 to 30) MPa the deviations seem to be higher. However, this is a result of the substantially larger number of ρ data measured in these ranges of T and p.
As expected, the results obtained for the ILs excluded at the stage of development of the MLR model for the reference term are very poor. The impact of the type of either cation or anion of the IL on the accuracy of the proposed GC model is presented in Figure 8. As can be seen from Figure 8a, the overall AARE is higher than 1% only for a number of cationic families, as far as only the accepted data are taken into account. Only for ILs based on morpholinium, amidium and sulfonium cations were the AARE values higher than 2%. In my opinion, this is due to the scarcity of the experimental density data available for these families of ILs (see Table 1) and thus their possibly high uncertainty. As seen from Figure 8b, AAREs are lower than 2% regardless of the chemical family of the anion, except for alcoholates. As follows from Table 1, these ILs are among the least represented families, so this finding should not be regarded as disappointing. A much more promising result is that the densities of ILs consisting of the most abundant anions, like [NTf2], [BF4], [PF6], sulfates and halides, are captured by the model very precisely. Finally, as one can easily notice from Figure 8, AAREs calculated on the basis of all the data reported for a given cationic/anionic family are significantly higher than the values obtained from accepted data only. This is another argument demonstrating the importance of the data revision and evaluation carried out prior to the model development. The results shown in Figure 8 do not take into account the counterions. In order to provide more detailed insight into the performance of the proposed GC model, AARE values were calculated for each possible combination of cationic and anionic families (accounting for accepted data only). The results of this evaluation are depicted in Figure 9 in the form of a pseudocolor (checkerboard) plot. I strongly encourage the reader to analyse this figure carefully before making any predictions, as it may assist in estimating their expected uncertainty.
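The family-by-family evaluation described above amounts to grouping the data by (cation family, anion family) and averaging the absolute relative deviations within each group. The sketch below shows one way this could be done; the record layout and function name are assumptions made for illustration, the family codes are those of Table 1, and the densities are made up.

```python
from collections import defaultdict


def aare_by_family_pair(records):
    """Return {(cation_family, anion_family): AARE in percent}.

    `records` is an iterable of tuples
    (cation_family, anion_family, rho_calc, rho_expt); this layout is a
    hypothetical choice for the example, not the format of the database.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cation, anion, calc, expt in records:
        key = (cation, anion)
        sums[key] += 100.0 * abs(calc - expt) / expt
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Illustrative records only; the densities are not taken from the database.
records = [
    ("[im]", "[NTf2]", 1010.0, 1000.0),
    ("[im]", "[NTf2]", 990.0, 1000.0),
    ("[mo]", "[BF4]", 1100.0, 1000.0),
]
grid = aare_by_family_pair(records)
for (cation, anion), value in sorted(grid.items()):
    print(cation, anion, round(value, 2))
```

Each entry of the resulting dictionary corresponds to one cell of a checkerboard plot such as Figure 9; combinations without data simply do not appear.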
It is clearly seen that the accuracy of the prediction for a given family of cation/anion may depend significantly on the type of the other moiety forming the IL. In fact, for each chemical family of cation, the AARE varies moderately or strongly. In particular, the high overall AARE observed for morpholinium ILs (see Figure 8a) is evidently related to the high AARE detected for tetrafluoroborates. The highest AARE of 11.9% was obtained for cyclic sulfonium cations combined with common inorganics. To be more specific, this value was obtained from a single data point reported for 1-ethylthiolan-1-ium nitrate. The density value ρ = 1472.2 kg · m−3 reported by Zhang et al. 63 for this IL is unusually high compared to other [NO3]-based ILs. A more reasonable density of ρ = 1297.2 kg · m−3 is predicted by the proposed model. Thus, the model can potentially be used to identify erroneous data. In turn, the only anions for which the AARE was lower than 1% irrespective of cation were [PF6] and methanides (with the tricyanomethanide [C(CN)3] anion dominating in this group). Unfortunately, the number of anions still unstudied is significant, so that estimating the accuracy of the modeling using the GC scheme described in this work is not feasible for many ILs.
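The 11.9% figure can be checked directly from the two densities quoted above, taking the deviation relative to the reported (experimental) value. The sketch below does this and shows how such a deviation could be used to flag a suspect data point; the function names are illustrative, and using the 99% quantile of 5.7% as a flagging threshold is an assumption made here, not a criterion stated by the author.

```python
def relative_deviation_percent(rho_reported, rho_predicted):
    """Absolute deviation of the prediction relative to the reported value, in percent."""
    return 100.0 * abs(rho_predicted - rho_reported) / rho_reported


def is_suspect(rho_reported, rho_predicted, threshold_percent):
    """True when the deviation exceeds a user-chosen threshold (hypothetical rule)."""
    return relative_deviation_percent(rho_reported, rho_predicted) > threshold_percent


# Values quoted in the text for 1-ethylthiolan-1-ium nitrate (densities in kg/m^3).
rho_reported = 1472.2   # experimental value reported in ref 63
rho_predicted = 1297.2  # value predicted by the proposed model

deviation = relative_deviation_percent(rho_reported, rho_predicted)
print(round(deviation, 1))
print(is_suspect(rho_reported, rho_predicted, threshold_percent=5.7))
```

The difference of 175.0 kg · m−3 divided by 1472.2 kg · m−3 gives about 11.9%, consistent with the AARE reported for this single-point family combination.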

Comparison with Literature Models

In order to assess the robustness and applicability range of the obtained model (composed of the MLR equation for the reference term and the LSSVM equation for the correction), its global statistics need to be compared with the respective values for the methods published in the literature so far. 12–31,33–43 The results of this comparison are summarized in Table 4, and they can be treated as a suggestion for selecting the best methodologies for future prediction of the density of ILs, especially as a function of pressure. As can be seen, the method proposed in this work is the most comprehensive one in terms of the diversity of chemical structures used in the model development process; in my opinion, this is a beneficial feature relevant for the IL research community. In fact, the number of ILs considered in this work is almost twice as high as in the previous version 24 and much higher than in some other comprehensive models like those from Evangelista et al. 28 or Taherifard and Raeissi. 31 The same applies to the size of the database, given as the number of data points. Taking these findings into account, one can justify a value of overall AARE that is not the lowest but is nevertheless still very low. The range of pressure covered by the models reported herein is not as broad as in the case of some other methods, because very high-pressure (p > 250 MPa) data available for some ILs were excluded in the data evaluation step. In general, it is not recommended to use the new models outside the T and p domain given in Table 4.

Finally, the proposed GC-based QSPR is compared with the model proposed recently by Taherifard and Raeissi. 31 This model was chosen for detailed comparative analysis for three main reasons: (i) it has been published quite recently, so one may assume that it represents a state-of-the-art empirical model for estimating p-ρ-T data of ILs; (ii) it was established using an extensive database; (iii) it is very easy to code and implement, as it relies only on elemental composition and simple molecular fragments, which can be easily extracted from chemical structure files. A consequence of point (iii) is that the model by Taherifard and Raeissi can be used to predict the density of all ILs considered in this work, so that a global comparison can be made. The results of the comparison are summarized in terms of RMSE and AARE in Table 5 and depicted in Figure 10. First of all, it should be stressed that the quality of the predictions obtained from the model of Taherifard and Raeissi 31 is surprisingly good, especially taking into account the simplicity of this approach. Indeed, the entire database used in this work is reproduced within 2.1%, whereas the predictive power for the reference term ρ0 (i.e., the effect of chemical structure only) is very impressive as well. Nevertheless, these AAREs are still twice as high as the values obtained for the LSSVM or FFANN GC models reported herein. Furthermore, the new model reproduces the temperature and pressure dependence of density more accurately, by an order of magnitude in terms of RMSE of the correction f.

The results of a more detailed comparative analysis are shown in Figure 10. In particular, Figure 10a presents PDFs of AARE values computed for each IL separately. The LSSVM method proposed in this work displays a narrower histogram and a steeper CDF. As a representative example, the number of ILs with AARE lower than 1% is 1089 for the GC model developed in this work and 586 for the model published by Taherifard and Raeissi. 31 In Figure 10b and Figure 10c, a more in-depth comparison is made in terms of the fraction of ILs belonging to different cationic and anionic families, respectively, with lower AARE. As seen, in the case of cations the method proposed in this work usually yields better results, with a few minor exceptions (sulfonium and pyrazolium ILs). In the case of anions, however, the method by Taherifard and Raeissi 31 results in comparable or even better predictions for intensively studied anions like [BF4] and [PF6]. On the other hand, the reference model 31 gives worse results for the most abundant anion [NTf2], and completely fails when applied to ILs based on anions derived from carboxylic acids, heterocyclic amines, alcoholates and large organoboron compounds.
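The comparison of per-IL AARE distributions (a steeper CDF, a larger number of ILs below 1%) can be sketched as follows. The per-IL AARE lists below are invented for illustration and the function names are hypothetical; only the idea of counting ILs below a limit and building an empirical CDF is taken from the discussion above.

```python
def count_below(values, limit):
    """Number of entries strictly below `limit`."""
    return sum(1 for v in values if v < limit)


def ecdf(values):
    """Empirical CDF as a list of (value, cumulative fraction) pairs."""
    s = sorted(values)
    n = len(s)
    return [(v, (i + 1) / n) for i, v in enumerate(s)]


# Invented per-IL AARE values (percent) for two hypothetical models.
aare_model_a = [0.2, 0.5, 0.8, 1.5]
aare_model_b = [0.6, 1.2, 2.5, 3.0]

print(count_below(aare_model_a, 1.0), count_below(aare_model_b, 1.0))
print(ecdf(aare_model_a))
```

A model whose empirical CDF rises faster at low AARE reproduces more ILs within a given tolerance, which is the sense in which the counts of 1089 and 586 ILs below 1% are compared in the text.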


Conclusions

New physically and statistically sound QSPRs for predicting the density of ILs as a function of the chemical structure of the constituent ions, temperature and pressure were proposed and discussed. For the very first time, modeling of the reference term and modeling of the correction reflecting the effect of temperature and pressure were carried out separately. Considerable attention was paid to statistics, in particular the statistical significance of the GCs used as well as both the internal and external stability of the final models. The recommended model is composed of an MLR correlation for molar volume at T = 298.15 K and p = 0.1 MPa and an LSSVM model for computing the correction accounting for the deviation of the actual temperature and pressure from the reference conditions. The model is based on a great variety of molecular fragments and thus may be applied in the future, e.g., in computer-aided molecular design of new task-specific ILs, or simply in obtaining missing density data for known structures if they are needed. This idea is additionally supported by the remarkable accuracy of the calculated data against the most extensive and comprehensive p-ρ-T database published so far. I am quite sure that this paper will be beneficial for the IL research community because of this database. With this data collection at hand, researchers studying ILs (and reviewers evaluating their papers) are able to compare density data measured by other authors, thus identifying wrong results and enhancing the quality of their research. Although the model is essentially not limited with respect to chemical structures, it should be emphasized that it should not be applied outside the domain of temperature and pressure determined by the experimental data used in the parameter regression and model validation. Nevertheless, this domain is very broad, so that it is difficult to imagine applications of ILs under more extreme conditions.

The detailed analyses of the new GC model's performance presented in this paper allow the model to be used as a tool for discriminating between data of good and bad quality, as well as for estimating the accuracy of the predictions. Finally, the proposed model has been shown to perform significantly better than the recently reported reference model, 31 and thus might be treated as one of the most reliable and versatile predictive QSPRs of this type published in the open literature.
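The warning about the temperature and pressure domain can be turned into a simple guard placed in front of any density calculation. The sketch below assumes that the relevant limits are those listed for this work in Table 4 (217–473 K and 0.1–250 MPa); the constant and function names are hypothetical and are not part of the ildensgc function distributed with the paper.

```python
# Domain limits as listed for this work in Table 4; treating them as hard
# bounds is an assumption made for this illustration.
T_RANGE_K = (217.0, 473.0)
P_RANGE_MPA = (0.1, 250.0)


def in_domain(temperature_k, pressure_mpa):
    """True when (T, p) lies inside the range covered by the regressed data."""
    t_ok = T_RANGE_K[0] <= temperature_k <= T_RANGE_K[1]
    p_ok = P_RANGE_MPA[0] <= pressure_mpa <= P_RANGE_MPA[1]
    return t_ok and p_ok


def require_domain(temperature_k, pressure_mpa):
    """Raise ValueError instead of silently extrapolating."""
    if not in_domain(temperature_k, pressure_mpa):
        raise ValueError(
            f"T = {temperature_k} K, p = {pressure_mpa} MPa is outside "
            f"{T_RANGE_K} K and {P_RANGE_MPA} MPa"
        )


print(in_domain(298.15, 0.1))   # reference conditions of the MLR term
print(in_domain(298.15, 300.0)) # pressure above the covered range
```

Calling `require_domain` before evaluating a model keeps extrapolation an explicit decision rather than an accident.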


One can argue that a thermodynamic EoS is the only reasonable way of representing p-ρ-T data of a fluid. However, the current state of knowledge does not allow an EoS that is both accurate and universal to be elaborated, especially for fluids as complex as ILs. The models of this type published so far are usually based on IL-specific coefficients, which makes them correlative rather than predictive tools. Besides, the focus in the development of EoS-based models is not on density, since volumetric data are usually adopted to adjust their parameters. Rather, modern equations of state are used to predict more sophisticated properties like phase diagrams or caloric properties of both pure fluids and mixtures, like enthalpy of mixing or heat capacity. I believe that the presented study will help in the future search for promising paths in the prediction of IL density, as well as assist in revealing gaps in the existing volumetric data. Based on the results shown, one may expect that machine learning combined with huge data compilations should be seriously considered as one of the approaches for further development in the field.


Supporting Information Available

(1) Detailed information on the functional groups used to represent ILs; a full list of ions along with the group assignment; a complete density database with bibliography; detailed results of calculations using the approaches applied, along with all the relevant model parameters (Microsoft Excel XLS file). (2) MATLAB function ildensgc for calculating p-ρ-T data by using the models developed and presented in this work; a template for the input file is included, along with short instructions inside (ZIP archive).


Acknowledgment

Funding for this research was provided by the National Science Centre, Poland, UMO-2016/23/D/ST4/02467.


References

(1) Freemantle, M. An Introduction to Ionic Liquids; Royal Society of Chemistry, 2010.
(2) Hallett, J. P.; Welton, T. Room-Temperature Ionic Liquids: Solvents for Synthesis and Catalysis. 2. Chem. Rev. 2011, 111, 3508–3576.
(3) Krossing, I.; Slattery, J. M.; Daguenet, C.; Dyson, P. J.; Oleinikova, A.; Weingärtner, H. Why Are Ionic Liquids Liquid? A Simple Explanation Based on Lattice and Solvation Energies. J. Am. Chem. Soc. 2006, 128, 13427–13434.
(4) Plechkova, N. V.; Seddon, K. R. Applications of Ionic Liquids in the Chemical Industry. Chem. Soc. Rev. 2010, 37, 123–150.
(5) Pereiro, A. B.; Araújo, J. M. M.; Esperança, J. M. S. S.; Rebelo, L. P. N. Ionic Liquids in Separations of Azeotropic Systems – A review. J. Chem. Thermodyn. 2012, 46, 2–28.
(6) Ventura, S. P. M.; e Silva, F. A.; Quental, M. V.; Mondal, D.; Freire, M. G.; Coutinho, J. A. P. Ionic-Liquid-Mediated Extraction and Separation Processes for Bioactive Compounds: Past, Present, and Future Trends. Chem. Rev. 2017, 117, 6984–7052.
(7) Coutinho, J. A. P.; Carvalho, P. J.; Oliveira, N. M. C. Predictive Methods for the Estimation of Thermophysical Properties of Ionic Liquids. RSC Adv. 2012, 2, 7322–7346.
(8) Das, R. N.; Roy, K. Advances in QSPR/QSTR Models of Ionic Liquids for the Design of Greener Solvents of the Future. Mol. Div. 2013, 17, 151–196.
(9) Hosseini, S. M.; Mulero, A.; Alavianmehr, M. M. Predictive Methods and Semi-classical Equations of State for Pure Ionic Liquids: A Review. J. Chem. Thermodyn. 2019, 130, 47–94.
(10) Slattery, J. M.; Daguenet, C.; Dyson, P. J.; Schubert, T. J. S.; Krossing, I. How to Predict the Physical Properties of Ionic Liquids: A Volume-Based Approach. Angew. Chem. Int. Ed. 2007, 46, 5384–5388.
(11) Ye, C.; Shreeve, J. M. Rapid and Accurate Estimation of Densities of Room-Temperature Ionic Liquids and Salts. J. Phys. Chem. A 2007, 111, 1456–1461.
(12) Gardas, R. L.; Coutinho, J. A. P. Extension of the Ye and Shreeve Group Contribution Method for Density Estimation of Ionic liquids in a Wide Range of Temperatures and Pressures. Fluid Phase Equilib. 2008, 263, 26–32.
(13) Jacquemin, J.; Ge, R.; Nancarrow, P.; Rooney, D. W.; Costa Gomes, M. F.; Pádua, A. A. H.; Hardacre, C. Prediction of Ionic Liquid Properties. I. Volumetric Properties as a Function of Temperature at 0.1 MPa. J. Chem. Eng. Data 2008, 53, 716–726.
(14) Jacquemin, J.; Nancarrow, P.; Rooney, D. W.; Costa Gomes, M. F.; Husson, P.; Majer, V.; Pádua, A. A. H.; Hardacre, C. Prediction of Ionic Liquid Properties. II. Volumetric Properties as a Function of Temperature and Pressure. J. Chem. Eng. Data 2008, 53, 2133–2143.
(15) Valderrama, J. O.; Zarricueta, K. A Simple and Generalized Model for Predicting the Density of Ionic Liquids. Fluid Phase Equilib. 2009, 275, 145–151.
(16) Lazzús, J. A. ρ(T, p) Model for Ionic Liquids Based on Quantitative Structure-Property Relationship Calculations. J. Phys. Org. Chem. 2009, 22, 1193–1197.
(17) Lazzús, J. A. ρ-T-P Prediction for Ionic Liquids Using Neural Networks. J. Taiwan. Inst. Chem. Eng. 2009, 40, 213–232.
(18) Lazzús, J. A. A Group Contribution Method to Predict ρ-T-P of Ionic Liquids. Chem. Eng. Comm. 2010, 197, 974–1015.
(19) Qiao, Y.; Ma, Y.; Huo, Y.; Ma, P.; Xia, S. A Group Contribution Method to Estimate the Densities of Ionic Liquids. J. Chem. Thermodyn. 2010, 42, 852–855.
(20) Ji, X.; Adidharma, H. Thermodynamic Modeling of Ionic Liquid Density with Heterosegmented Statistical Associating Fluid Theory. Chem. Eng. Sci. 2009, 64, 1985–1992.
(21) Abildskov, J.; Ellegaard, M. D.; O’Connell, J. P. Densities and Isothermal Compressibilities of Ionic Liquids – Modeling and Application. Fluid Phase Equilib. 2010, 295, 215–229.
(22) Wang, J.; Li, Z.; Li, C.; Wang, Z. Density Prediction of Ionic Liquids at Different Temperatures and Pressures Using a Group Contribution Equation of State Based on Electrolyte Perturbation Theory. Ind. Eng. Chem. Res. 2010, 49, 4420–4425.
(23) Shen, C.; Li, C.-x.; Li, X.-m.; Lu, Y.-z.; Muhammad, Y. Estimation of Densities of Ionic Liquids Using Patel-Teja Equation of State and Critical Properties Determined from Group Contribution Method. Chem. Eng. Sci. 2011, 66, 2690–2698.
(24) Paduszyński, K.; Domańska, U. A New Group Contribution Method For Prediction of Density of Pure Ionic Liquids over a Wide Range of Temperature and Pressure. Ind. Eng. Chem. Res. 2012, 51, 591–604.
(25) Roshan, N.; Ghader, S. Developing Models for Correlating Ionic Liquids Density: Part 1 – Density at 0.1 MPa. Fluid Phase Equilib. 2012, 331, 33–47.
(26) Roshan, N.; Ghader, S. Developing Models for Correlating Ionic Liquids Density: Part 2 – Density at High Pressures. Fluid Phase Equilib. 2013, 358, 172–188.
(27) El-Harbawi, M.; Samir, B. B.; Babaa, M.-R.; Mutalib, M. I. A. A New QSPR Model for Predicting the Densities of Ionic Liquids. Arab. J. Sci. Eng. 2014, 39, 6767–6775.
(28) Evangelista, N. S.; do Carmo, F. R.; de Santiago-Aguiar, R. S.; de Sant’Ana, H. B. Development of a New Group Contribution Method Based on GCVOL Model for the Estimation of Pure Ionic Liquid Density over a Wide Range of Temperature and Pressure. Ind. Eng. Chem. Res. 2014, 53, 9506–9512.
(29) Yan, F.; Shang, Q.; Xia, S.; Wang, Q.; Ma, P. Application of Topological Index in Predicting Ionic Liquids Densities by the Quantitative Structure Property Relationship Method. J. Chem. Eng. Data 2015, 60, 734–739.
(30) Keshavarz, M. H.; Pouretedal, H. R.; Saberi, E. A Simple Method for Prediction of Density of Ionic Liquids through Their Molecular Structure. J. Mol. Liq. 2016, 216, 732–737.
(31) Taherifard, H.; Raeissi, S. Estimation of the Densities of Ionic Liquids Using a Group Contribution Method. J. Chem. Eng. Data 2016, 61, 4031–4038.
(32) Barati-Harooni, A.; Najafi-Marghmaleki, A.; Mohammadi, A. H. ANFIS Modeling of Ionic Liquids Densities. J. Mol. Liq. 2016, 224, 965–975.
(33) Zhao, N.; Menegolla, H. B.; Degirmenci, V.; Wagner, Z.; Bendová, M.; Jacquemin, J. Group Contribution Method for Evaluation of Volumetric Properties of Ionic Liquids Using Experimental Data Recommended by Mathematical Gnostics. Ind. Eng. Chem. Res. 2017, 56, 6827–6840.
(34) Mousazadeh, M. H.; Diarmand, H.; Hakimelahi, R. Correlation Densities of Ionic Liquids Based on Perturbed Yukawa Chain Equation of State. Phys. Chem. Liq. 2013, 51, 33–43.
(35) Hosseini, S.; Alavianmehr, M.; Moghadasi, J. Density and Isothermal Compressibility of Ionic Liquids from Perturbed Hard-Dimer-Chain Equation of State. Fluid Phase Equilib. 2013, 356, 185–192.
(36) Maghari, A.; ZiaMajidi, F. Prediction of Thermodynamic Properties of Pure Ionic Liquids through Extended SAFT-BACK Equation of State. Fluid Phase Equilib. 2013, 356, 109–116.
(37) Abildskov, J.; O’Connell, J. P. Densities of Pure Ionic Liquids and Mixtures: Modeling and Data Analysis. J. Solution Chem. 2015, 44, 558–592.
(38) Alavianmehr, M. M.; Hosseini, S. M.; Moghadasi, J. Densities of Ionic Liquids from Ion Contribution-Based Equation of State: Electrolyte Perturbation Approach. J. Mol. Liq. 2014, 197, 287–294.
(39) Alavianmehr, M. M.; Taghizadehfard, M.; Hosseini, S. M. Development of a Perturbed Hard-Sphere Equation of State for Pure and Mixture of Ionic Liquids. Ionics 2016, 22, 649–660.
(40) Alavianmehr, M. M.; Akbari, F.; Behjatmanesh-Ardakani, R. Modelling Phase Equilibria of Pure Ionic Liquids from a New Equation of State. Ionics 2016, 22, 2447–2459.
(41) Ji, X.; Held, C. Modeling the Density of Ionic Liquids with ePC-SAFT. Fluid Phase Equilib. 2016, 410, 9–22.
(42) Farzi, R.; Esmaeilzadeh, F. Prediction of Densities of Pure Ionic Liquids Using Esmaeilzadeh–Roshanfekr Equation of State and Critical Properties from Group Contribution Method. Fluid Phase Equilib. 2016, 423, 101–108.
(43) Bagheri, H.; Ghader, S. Correlating Ionic Liquids Density over Wide Range of Temperature and Pressure by Volume Shift Concept. J. Mol. Liq. 2017, 236, 172–183.
(44) Penna, T. C.; Paschoal, V. H.; Faria, L. F.; Ribeiro, M. C. Estimating Density of Ionic Liquids under Very High Pressure. J. Mol. Liq. 2018, 265, 510–516.
(45) Ribeiro, M. C.; Pádua, A. A.; Gomes, M. F. Equations of States for an Ionic Liquid under High Pressure: A Molecular Dynamics Simulation Study. J. Chem. Thermodyn. 2014, 74, 39–42.
(46) Zawadzki, M.; Królikowska, M.; Lipiński, P. Physicochemical and Thermodynamic Characterization of N-Alkyl-N-Methylpyrrolidinium Bromides and Its Aqueous Solutions. Thermochim. Acta 2014, 589, 148–157.
(47) Součková, M.; Klomfar, J.; Pátek, J. Surface Tension and 0.1 MPa Density for Members of Homologous Series of Ionic Liquids Composed of Imidazolium-, Pyridinium-, and Pyrrolidinium-based Cations and of Cyano-groups Containing Anions. Fluid Phase Equilib. 2015, 406, 181–193.
(48) Zawadzki, M.; Królikowska, M.; Antonowicz, J.; Lipiński, P.; Karpińska, M. Physicochemical and Thermodynamic Properties of the {1-Alkyl-1-Methylmorpholinium Bromide, [C1 Cn=3,4,5 MOR]Br, or 1-Methyl-1-pentylpiperidinium Bromide, [C1 C5 PIP]Br + water} Binary Systems. J. Chem. Thermodyn. 2016, 98, 324–337.
(49) Sarabando, J. A.; Magano, P. J. M.; Ferreira, A. G. M.; Santos, J. B.; Carvalho, P. J.; Mattedi, S.; Fonseca, I. M. A. Influence of Temperature and Pressure on the Density and Speed of Sound of 2-Hydroxyethylammonium Propionate Ionic Liquid. J. Chem. Thermodyn. 2018, 122, 183–193.
(50) Dakkach, M.; Gaciño, F. M.; Mylona, S. K.; Comuñas, M. J. P.; Assael, M. J.; Fernández, J. High Pressure Densities of Two Nanostructured Liquids Based on the Bis(trifluoromethylsulfonyl)imide Anion from (278 to 398) K and up to 120 MPa. J. Chem. Thermodyn. 2018, 118, 67–76.
(51) Mirarabazi, M.; Stolarska, O.; Smiglak, M.; Robelin, C. Solid–liquid Equilibria for a Pyrrolidinium-based Common-cation Ternary Ionic Liquid System, and for a Pyridinium-based Ternary Reciprocal Ionic Liquid System: An Experimental Study and a Thermodynamic Model. Phys. Chem. Chem. Phys. 2018, 20, 637–657.
(52) Peng, D.; Zhang, J.; Cheng, H.; Chen, L.; Qi, Z. Computer-Aided Ionic Liquid Design for Separation Processes Based on Group Contribution Method and COSMO-SAC Model. Chem. Eng. Sci. 2017, 159, 58–68.
(53) Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
(54) Paduszyński, K.; Domańska, U. Viscosity of Ionic Liquids: An Extensive Database and a New Group Contribution Model Based on a Feed-Forward Artificial Neural Network. J. Chem. Inf. Model. 2014, 54, 1311–1324.
(55) Paduszyński, K. In Silico Calculation of Infinite Dilution Activity Coefficients of Molecular Solutes in Ionic Liquids: Critical Review of Current Methods and New Models Based on Three Machine Learning Algorithms. J. Chem. Inf. Model. 2016, 56, 1420–1437.
(56) Klimasauskas, C. C. In Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance; Trippi, R. R., Turban, E., Eds.; Probus, Chicago, 1993; Chapter: Applying Neural Networks, pp 64–65.
(57) Nelder, J. A.; Mead, R. A Simplex Method for Function Minimization. Comput. J. 1965, 7, 308–313.
(58) Xavier-de Souza, S.; Suykens, J.; Vandewalle, J.; Bolle, D. Coupled Simulated Annealing. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2010, 40, 320–335.
(59) LSSVMlab. Published online. Accessed on October 1, 2018, 2014; URL: http://www.esat.kuleuven.be/sista/lssvmlab/.
(60) Rücker, C.; Rücker, G.; Meringer, M. y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345–2357.
(61) Consonni, V.; Ballabio, D.; Todeschini, R. Comments on the Definition of the Q2 Parameter for QSAR Validation. J. Chem. Inf. Model. 2009, 49, 1669–1678.
(62) Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs. Environ. Health Perspect. 2003, 111, 1361–1375.
(63) Zhang, Q.; Liu, S.; Li, Z.; Li, J.; Chen, Z.; Wang, R.; Lu, L.; Deng, Y. Novel Cyclic Sulfonium-Based Ionic Liquids: Synthesis, Characterization, and Physicochemical Properties. Chem. Eur. J. 2009, 15, 765–778.


Table 1. Summary of Cationic and Anionic Families of Ionic Liquids Contained in the p-ρ-T Database Used in This Work.

Number of distinct molecules, counterions, ionic liquids, data sets and data points per family:

| Family name | Code | Molecules | Counterions | Ionic liquids | Data sets | Data points |
|---|---|---|---|---|---|---|
| Cations | | | | | | |
| imidazolium | [im] | 356 | 162 | 894 | 2292 | 24041 |
| ammonium | [n] | 232 | 91 | 466 | 633 | 4708 |
| pyridinium | [py] | 105 | 32 | 191 | 341 | 4180 |
| phosphonium | [p] | 65 | 115 | 230 | 320 | 3383 |
| pyrrolidinium | [pyr] | 51 | 46 | 126 | 287 | 2634 |
| piperidinium | [pipz] | 35 | 24 | 66 | 106 | 740 |
| morpholinium | [mo] | 21 | 22 | 51 | 58 | 360 |
| cyclopropanium | [cprop] | 19 | 2 | 33 | 37 | 297 |
| bicyclic | [bic] | 11 | 9 | 19 | 20 | 195 |
| azepanium | [azp] | 7 | 7 | 15 | 23 | 149 |
| thiouronium | [thur] | 15 | 2 | 17 | 18 | 125 |
| pyrazolium | [pz] | 17 | 5 | 21 | 24 | 111 |
| guanidinium | [guan] | 20 | 14 | 34 | 41 | 71 |
| triazolium | [trz] | 33 | 4 | 34 | 39 | 65 |
| sulfonium | [s] | 15 | 3 | 17 | 24 | 64 |
| quinolinium | [quin] | 8 | 4 | 10 | 10 | 64 |
| cyclic sulfonium | [cs] | 8 | 4 | 13 | 15 | 30 |
| oxazolidinium | [ox] | 4 | 8 | 14 | 14 | 14 |
| amidium | [amd] | 4 | 2 | 6 | 6 | 6 |
| cyclic amidium | [camd] | 4 | 2 | 5 | 5 | 5 |
| piperazinium | [pipz] | 1 | 1 | 1 | 1 | 4 |
| pyridazolium | [pdz] | 2 | 1 | 2 | 2 | 2 |
| tetrazolium | [tetraz] | 1 | 1 | 1 | 1 | 1 |
| thiazolium | [thz] | 1 | 1 | 1 | 1 | 1 |
| Anions | | | | | | |
| [NTf2] derivatives | [NTf2] | 10 | 727 | 787 | 1551 | 12233 |
| [BF4] derivatives | [BF4] | 35 | 100 | 193 | 472 | 6133 |
| sulfates | [RSO4] | 13 | 87 | 107 | 323 | 4181 |
| carboxylates | [RCO2] | 38 | 92 | 238 | 347 | 3870 |
| [PF6] derivatives | [PF6] | 5 | 54 | 69 | 254 | 3180 |
| inorganics | [X] | 10 | 121 | 167 | 304 | 2843 |
| sulfonates | [RSO3] | 27 | 128 | 199 | 311 | 2326 |
| dicyanamides | [dca] | 2 | 122 | 126 | 236 | 2228 |
| aminoacids | [AA] | 22 | 46 | 119 | 153 | 1217 |
| phosphates | [RPO4] | 16 | 52 | 70 | 121 | 1017 |
| metal complexes | [MLn] | 25 | 27 | 75 | 94 | 735 |
| heterocyclic amines | [hca] | 19 | 27 | 53 | 61 | 600 |
| methanides | [CR3] | 2 | 8 | 9 | 35 | 376 |
| organic borates | [B] | 6 | 18 | 26 | 27 | 190 |
| alcoholates | [O] | 22 | 9 | 29 | 29 | 121 |


Table 2. Statistics of the Proposed Models for Reference Term of Eq (3) Given in Terms of Training, K-Fold Cross-Validation and Testing Determination Coefficients (R2, Q2K-CV, Q2), Root-Mean-Square Error (RMSE) and Average Absolute Relative Error (AARE) Between Computed and Reduced Experimental Data (a)

Entries in the last four columns are RMSE / AARE (%).

| Method | R2 | Q2K-CV | Q2 | v0 / cm3 · mol−1, training | v0 / cm3 · mol−1, test | ρ0 / kg · m−3, training | ρ0 / kg · m−3, test |
|---|---|---|---|---|---|---|---|
| SWMLR | 0.9985 | 0.9981 | 0.9983 | 5.5 / 1.42 | 5.8 / 1.56 | 27.1 / 1.41 | 31.4 / 1.57 |
| FFANN | 0.9989 | 0.9980 | 0.9983 | 4.8 / 1.24 | 6.0 / 1.50 | 24.2 / 1.23 | 27.3 / 1.49 |
| LSSVM | 0.9989 | 0.9977 | 0.9998 | 4.7 / 1.20 | 5.4 / 1.42 | 23.7 / 1.20 | 28.0 / 1.43 |

(a) K = 9. Statistical measures defined in eqs (7) to (11).


Table 3. Summary of the Statistics of the Proposed Models for the Correction Term f(T, p) of Eq (3) Given in Terms of Training, 10-Fold Cross-Validation and Testing Determination Coefficients (R², Q²K-CV, Q²) and Root-Mean-Square Error (RMSE) Between Computed and Reduced Experimental Data a

                                     RMSE (overall)       RMSE (p = p0)        RMSE (p > p0)
Method   R²       Q²K-CV   Q²       training   test      training   test      training   test
FFANN    0.9917   0.9845   0.9848   0.0014     0.0019    0.0014     0.0018    0.0017     0.0030
LSSVM    0.9984   0.9913   0.9995   0.0006     0.0010    0.0006     0.0011    0.0002     0.0008

a K = 9. Statistical measures defined in eqs (7) to (11).
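As an illustration of how a K-fold cross-validated determination coefficient can be obtained, the sketch below pools the held-out predictions of K = 9 folds and evaluates 1 − PRESS/SS_tot. This is an assumed, conventional form of Q²K-CV; the exact definition is in eqs (7) to (11), which are not shown here. An ordinary least-squares fit stands in for the FFANN and LSSVM models, and the descriptor matrix and contributions are synthetic.

```python
import numpy as np

def q2_kfold(X, y, k=9, seed=0):
    """Cross-validated Q2 = 1 - PRESS / SS_tot from pooled held-out predictions."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_cv = np.empty_like(y)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        # Least-squares stand-in for the learning algorithm (assumption).
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        y_cv[held_out] = X[held_out] @ coef
    press = np.sum((y - y_cv) ** 2)
    return float(1.0 - press / np.sum((y - y.mean()) ** 2))

# Synthetic example: 90 "compounds" described by counts of 4 hypothetical groups.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 6, size=(90, 4)).astype(float)
contrib = np.array([20.0, 15.0, 30.0, 10.0])      # made-up group contributions
y_exact = X_demo @ contrib
y_noisy = y_exact + rng.normal(0.0, 1.0, size=90)

print(q2_kfold(X_demo, y_noisy))
```

Because every data point is predicted by a model that never saw it, Q²K-CV is lower than the training R² whenever the model overfits, which is the pattern the table shows for both methods.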


Table 4. Comparison of the Models Reported in the Literature for Estimating p-ρ-T Data with the Method Proposed in This Work Given in Terms of Global Average Absolute Relative Error (AARE) Between Calculated and Raw Experimental Density Data

                                          Number of:
Authors                   Ref.    Year    ILs    data points   T / K      p / MPa    AARE (%) a

Empirical QSPR models
MLR + LSSVM (this work)           2019    1999   30222         217–473    0.1–250    0.90
Zhao et al.               33      2017    54     5399          217–473    0.1–207    0.31
Keshavarz et al.          30      2016    484    484           298        0.1        2.20
Taherifard et al.         31      2016    –b     25850         217–473    0.1–300    0.95
Barati-Harooni et al.     32      2016    146    602           278–358    0.1        0.66
Yan et al.                29      2015    188    5948          253–473    0.1–250    0.42
Evangelista et al.        28      2014    864    21845         251–473    0.1–300    0.83
El-Harbawi et al.         27      2014    747    918           298        0.1        1.73
Roshan-Ghader             26      2013    32     3127          293–472    0.1–200    1.09
Paduszyński-Domańska      24      2012    1028   16830         251–473    0.1–300    0.51
Roshan-Ghader             25      2012    40     322           293–472    0.1        0.83
Lazzús                    18      2010    310    3530          258–393    0.1–207    0.73
Qiao et al.               19      2010    123    6750          258–473    0.1–300    0.88
Lazzús                    16      2009    163    3020          258–473    0.1–207    2.00
Lazzús                    17      2009    322    3183          273–393    0.1–207    0.55
Gardas-Coutinho           12      2008    23     1086          293–393    0.1–100    0.45
Jacquemin et al.          13,14   2008    28     5080          298–324    0.1–207    0.36

Thermodynamic models (EoSs)
Bagheri-Ghader            43      2017    67     5020          273–473    0.1–205    2.19
Ji-Held                   41      2016    55     –b            283–373    0.1–60     < 0.50
Farzi-Esmaeilzadeh        42      2016    347    2295          273–473    0.1        11.5
Alavianmehr et al.        39      2016    39     6331          272–472    0.1–200    0.18
Alavianmehr et al.        40      2016    24     2189          273–432    0.1–150    0.62
Alavianmehr et al.        38      2014    29     4353          272–472    0.1–204    0.64
Mousazadeh et al.         34      2013    25     2650          273–393    0.1–100    0.22
Hosseini et al.           35      2013    32     5110          273–472    0.1–250    0.24
Maghari et al.            36      2013    8      932           298–472    0.1–200    0.12
Shen et al.               23      2011    747    918           298        0.1        4.40
Abildskov et al.          21      2010    46     3827          273–473    0.1–250    0.20
Wang et al.               22      2010    12     1163          293–415    0.1–70     0.41
Valderrama-Zarricueta     15      2009    146    602           270–360    0.1        2.80
Ji–Adidharma              20      2009    9      –b            293–415    0.1–65     0.20

a Defined in eq (10).
b Not stated in the original paper.


Table 5. Comparison of the Performance of the Models Presented in This Work with the GC Method Published by Taherifard & Raeissi (Reference 31) Given in Terms of Root-Mean-Square Error (RMSE) and Average Absolute Relative Error (AARE) Between Computed and Experimental Reduced or Raw Density Data a

                           RMSE b                                                   AARE (%)
Method                     ρ(T, p) c   ρ0     f(T, p)   f(p = p0)   f(p > p0)      ρ(T, p) c   ρ0
LSSVM (this work)          17.3        27.6   0.0007    0.0007      0.0003         0.90        1.43
FFANN (this work)          17.3        27.6   0.0015    0.0014      0.0019         0.90        1.43
Taherifard & Raeissi 31    50.7        70.1   0.0213    0.0209      0.0274         2.10        3.52

a Statistical measures defined in eqs (9) and (10).
b RMSE of density given in kg·m⁻³.
c Calculated based on raw p-ρ-T data.


Figure 1. Overview of the p-ρ-T database used in this work, presented as the distribution of the number of ILs (NILs) composed of cations and anions belonging to different chemical families. Cationic and anionic families are placed on the ordinate and abscissa, respectively, and are abbreviated using the codes explained in Table 1.


Figure 2. An example of p-ρ-T data revision and evaluation for the IL 1-butyl-3-methylimidazolium tetrafluoroborate (in total, 89 data sets and 1598 data points): (a) experimental data versus the fitted p-ρ-T surface as a function of temperature and pressure, with the parameters of eq (1) and eq (2) given in the Supporting Information; (b) relative errors (RE) between fitted and experimental data as a function of density; (c) Williams plot of the studentized residuals s resulting from the fit versus the leverage h of each data point scaled by its mean value (dashed lines mark the critical values of h and s for data acceptance).
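The two quantities plotted in a Williams plot can be sketched as follows for any fit that is linear in its parameters. Eqs (1) and (2) are not shown in this part of the manuscript, so a straight line in temperature stands in for the actual p-ρ-T correlation. The cut-offs of 3 for both the scaled leverage and |s| are a common convention assumed here, not values quoted from the paper, and all data are synthetic.

```python
import numpy as np

def williams_quantities(X, y):
    """Return (leverage / mean leverage, internally studentized residuals)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, p = X.shape
    # Thin QR: the hat matrix is H = Q Q^T, so its diagonal is the row-wise sum of Q**2.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = np.sqrt(np.sum(resid ** 2) / (n - p))
    s = resid / (sigma * np.sqrt(1.0 - h))
    return h / h.mean(), s

# Synthetic isobaric densities (g/cm3) with one planted outlier at index 5.
rng = np.random.default_rng(1)
T = np.linspace(283.15, 363.15, 50)
X_lin = np.column_stack([np.ones_like(T), T])
rho = 1.25 - 7.0e-4 * T + rng.normal(0.0, 1.0e-4, size=T.size)
rho[5] += 0.01

h_scaled, s = williams_quantities(X_lin, rho)
accepted = (h_scaled <= 3.0) & (np.abs(s) <= 3.0)   # assumed critical values
print(np.flatnonzero(~accepted))
```

Points with a large |s| disagree with the fitted surface, while points with a large scaled leverage lie far from the bulk of the temperature and pressure conditions and therefore influence the fit disproportionately.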


Figure 3. Parity plots of computed (with the MLR method, including training and testing) versus reduced experimental data for the reference (T = 298.15 K, p = 0.1 MPa) property of the model in eq (3): (a) reference molar volume v0 (the property used in the regression); (b) reference density (the property derived from v0).
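The step from panel (a) to panel (b) is a unit conversion, ρ0 = M / v0, where M is the molar mass of the IL. A minimal sketch follows, with a hypothetical function name and illustrative numbers that are not taken from the database.

```python
def density_from_molar_volume(molar_mass_g_mol, v0_cm3_mol):
    """Reference density in kg/m3 from M in g/mol and v0 in cm3/mol.

    1 g/cm3 equals 1000 kg/m3, hence the factor of 1000.
    """
    return 1000.0 * molar_mass_g_mol / v0_cm3_mol

# Illustrative values only: M = 226.0 g/mol, experimental and computed v0.
M = 226.0
v0_exp, v0_calc = 188.0, 190.0

rho0_exp = density_from_molar_volume(M, v0_exp)
rho0_calc = density_from_molar_volume(M, v0_calc)

# Relative errors of the two properties have opposite sign and similar size.
re_v0 = (v0_calc - v0_exp) / v0_exp
re_rho0 = (rho0_calc - rho0_exp) / rho0_exp
print(rho0_exp, re_v0, re_rho0)
```

Since ρ0,calc / ρ0,exp = v0,exp / v0,calc, a small relative error in v0 maps to a relative error of nearly the same magnitude and opposite sign in ρ0, which is consistent with the almost equal AARE values reported for the two properties in Table 2.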


Figure 4. Results of internal validation of the final MLR model for reference molar volume v0: (a) parity plot of the cross-validation versus reduced experimental data; (b) training and cross-validation determination coefficients (R² and Q²K-CV) of 1000 y-randomized models and of the final model as a function of the Pearson correlation coefficient ryy between the original and randomized v0 data, with solid lines representing linear fits; (c) Q²K-CV as a function of R² for the y-randomized models and the final model.
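A y-randomization test of the kind summarized in panels (b) and (c) can be sketched as below: the response vector is shuffled, the model is refitted, and the resulting R² is recorded together with the Pearson correlation ryy between the original and shuffled responses. The sketch tracks only the training R²; the same loop could wrap a cross-validation routine to obtain Q²K-CV. The descriptors, contributions and noise level are synthetic assumptions.

```python
import numpy as np

def r2_of_fit(X, y):
    """Training R2 of an ordinary least-squares fit (X must contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_models=1000, seed=0):
    """Refit the model on shuffled responses; return (r_yy, R2) for each shuffle."""
    rng = np.random.default_rng(seed)
    r_yy = np.empty(n_models)
    r2 = np.empty(n_models)
    for i in range(n_models):
        y_perm = rng.permutation(y)
        r_yy[i] = np.corrcoef(y, y_perm)[0, 1]
        r2[i] = r2_of_fit(X, y_perm)
    return r_yy, r2

# Synthetic data: intercept plus counts of 4 hypothetical groups for 200 "ILs".
rng = np.random.default_rng(7)
counts = rng.integers(0, 6, size=(200, 4)).astype(float)
X_mlr = np.column_stack([np.ones(200), counts])
y_v0 = X_mlr @ np.array([50.0, 20.0, 15.0, 30.0, 10.0]) + rng.normal(0.0, 2.0, size=200)

r2_final = r2_of_fit(X_mlr, y_v0)
r_yy, r2_random = y_randomization(X_mlr, y_v0)
print(r2_final, r2_random.max())
```

If the final model captured only chance correlations, the shuffled models would reach comparable R² values; a wide gap between the final model and the randomized ones indicates that the descriptors carry real information about v0.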


Figure 5. Distribution of relative errors (RE) between computed (with the MLR model) and reduced experimental reference density data in the training and testing sets: (a) empirical probability density functions (PDFs) and cumulative distribution functions (CDFs) of RE; (b) quantile-quantile plots for the values of RE versus the standard normal distribution N(0, 1).


Figure 6. Parity plots of computed (including training and testing) versus reduced experimental data of temperature-pressure correction of density f , see eq (3): (a) FFANN results; (b) LSSVM results.


Figure 7. Scatter plots of relative errors (RE) between computed (with LSSVM method applied for correction term) and raw experimental density data as a function of: (a) density; (b) temperature; (c) pressure. Quantiles of absolute RE marked in the plots equal 2.9% (0.95 quantile) and 5.6% (0.99 quantile).
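The quantile markers in these plots summarize the tail of the error distribution. A minimal sketch of how such quantiles can be computed follows, assuming RE is expressed in percent as 100·(ρcalc − ρexp)/ρexp and using NumPy's default linear interpolation between order statistics. The densities are synthetic and the function name is hypothetical.

```python
import numpy as np

def abs_re_quantiles(rho_exp, rho_calc, probs=(0.95, 0.99)):
    """Quantiles of the absolute relative error, in percent."""
    rho_exp = np.asarray(rho_exp, float)
    rho_calc = np.asarray(rho_calc, float)
    abs_re = 100.0 * np.abs(rho_calc - rho_exp) / rho_exp
    return np.quantile(abs_re, probs)

# Synthetic data: 100 points whose absolute RE runs from 0.1 % to 10 % in steps of 0.1 %.
rho_exp = np.full(100, 1000.0)
rho_calc = rho_exp + np.arange(1, 101)

q95, q99 = abs_re_quantiles(rho_exp, rho_calc)
print(q95, q99)
```

Read this way, the 0.95 quantile of 2.9% quoted in the caption means that 95% of the raw data points are reproduced with an absolute relative error not exceeding 2.9%.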


Figure 8. Average absolute relative errors (AARE) between computed (with LSSVM method) and raw experimental density for different families of ILs, see eq (10): (a) cationic families; (b) anionic families. Abbreviations used to denote families of ions are explained in Table 1.


Figure 9. Average absolute relative errors (AARE) between computed (with LSSVM method) and raw experimental density for different combinations of cationic and anionic families of ILs, see eq (10). “n/a” means that experimental data are not yet available. Abbreviations used to denote families of ions are explained in Table 1.


Figure 10. Comparison of the performance of the final model proposed in this work, i.e. MLR for the reference term and LSSVM for the correction term of eqs (3) and (4), with the model published by Taherifard and Raeissi 31 (T & R): (a) empirical probability density functions (PDFs) and cumulative distribution functions (CDFs) of AAREs obtained for each IL separately; (b) fraction of lower relative errors (REs) obtained from MLR-LSSVM modeling for different cationic families of ILs; (c) fraction of lower relative errors (REs) obtained from MLR-LSSVM modeling for different anionic families of ILs. Abbreviations used to denote families of ions are explained in Table 1.
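One plausible reading of the "fraction of lower relative errors" in panels (b) and (c) is the share of data points, within each ion family, for which the MLR-LSSVM model gives a smaller absolute RE than the T & R model. The sketch below implements that reading, which is an assumption about the caption rather than a quoted definition. The family codes are taken from Table 1, while the function name and RE values are invented for illustration.

```python
from collections import defaultdict

def fraction_lower_by_family(families, re_this, re_other):
    """Per family, the fraction of points where |re_this| < |re_other|."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for fam, a, b in zip(families, re_this, re_other):
        totals[fam] += 1
        wins[fam] += abs(a) < abs(b)   # True counts as 1
    return {fam: wins[fam] / totals[fam] for fam in totals}

# Invented relative errors (%) for four data points in two anionic families.
families = ["[dca]", "[dca]", "[dca]", "[AA]"]
re_mlr_lssvm = [0.1, -0.5, 0.2, 1.0]
re_t_and_r = [0.3, 0.4, -0.9, 0.5]

fractions = fraction_lower_by_family(families, re_mlr_lssvm, re_t_and_r)
print(fractions)
```

A fraction above 0.5 for a family then indicates that the MLR-LSSVM model is the more accurate of the two for the majority of that family's data points.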


TOC Graphic. Labels in the graphic: IL?; GC database; machine learning (SWMLR, FFANN, LSSVM); GC-QSPR (ρ, V); screening, CAMD, ...