A General Guidebook for the Theoretical Prediction of

Review pubs.acs.org/CR

A General Guidebook for the Theoretical Prediction of Physicochemical Properties of Chemicals for Regulatory Purposes Carlos Nieto-Draghi,*,† Guillaume Fayet,‡ Benoit Creton,† Xavier Rozanska,§ Patricia Rotureau,‡ Jean-Charles de Hemptinne,† Philippe Ungerer,§ Bernard Rousseau,∥ and Carlo Adamo⊥,# †

IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France § Materials Design S.A.R.L., 18, rue de Saisset, 92120 Montrouge, France ∥ Laboratoire de Chimie-Physique, Université Paris Sud, UMR 8000 CNRS, Bât. 349, 91405 Orsay Cedex, France ⊥ Institut de Recherche Chimie Paris, PSL Research University, CNRS, Chimie Paristech, 11 rue P. et M. Curie, F-75005 Paris, France # Institut Universitaire de France, 103 Boulevard Saint Michel, F-75005 Paris, France ‡

S Supporting Information *

3.2.4. COSMO-RS/SAC 3.2.5. Molecular Simulation 3.2.6. Overview: Tb 3.3. Relative Density (ρr) 3.3.1. GC Method 3.3.2. QSPR 3.3.3. EoS 3.3.4. Molecular Simulation 3.3.5. Overview: ρr 3.4. Vapor Pressure (Psat) 3.4.1. GC Method 3.4.2. QSPR 3.4.3. EoS 3.4.4. COSMO-RS/SAC 3.4.5. Molecular Simulation 3.4.6. Overview: Psat 3.5. Surface Tension (γ) 3.5.1. GC Method 3.5.2. QSPR 3.5.3. EoS 3.5.4. Molecular Simulation 3.5.5. Overview: γ 3.6. Water Solubility (Sw) 3.6.1. GC Method 3.6.2. QSPR 3.6.3. EoS 3.6.4. COSMO-RS/SAC 3.6.5. Molecular Simulation 3.6.6. Overview: (Sw) 3.7. Partition Coefficient n-Octanol/Water (Kow) 3.7.1. GC Method 3.7.2. QSPR 3.7.3. EoS 3.7.4. COSMO-RS/SAC 3.7.5. Molecular Simulation 3.7.6. Other Methods 3.7.7. Overview: log Kow

CONTENTS 1. Introduction 2. Description of Methods 2.1. Group-Contribution Methods (GC) and Correlative Models 2.2. Quantitative Structure−Property Relationships (QSPR) 2.3. Equations of State and Molecular-Based Approaches 2.3.1. Background 2.3.2. Cubic EoSs 2.3.3. Statistical Mechanics-Based EoSs 2.3.4. GC-PPC-SAFT as an Example 2.4. Activity Coefficient Models and COSMO/ COSMO-RS 2.5. Molecular Simulation 2.5.1. Energy of Molecular Systems 2.5.2. Monte Carlo Methods 2.5.3. Molecular Dynamics 2.5.4. Final Remarks on the Methods 3. Prediction of Physicochemical Properties 3.1. Melting Point (Tm)/Freezing Point 3.1.1. GC Methods 3.1.2. QSPR 3.1.3. Molecular Simulation 3.1.4. Overview: Tm 3.2. Boiling Point (Tb) 3.2.1. GC Method 3.2.2. QSPR 3.2.3. EoS © XXXX American Chemical Society

B C C C E E E F G G I I J K K K L L L M N N N N N

O O R R R R S T T T W W W X Y Z Z Z AA AA AB AE AE AE AF AF AF AG AG AG AG AH AH AH AI AJ AJ

Received: May 7, 2015

A

DOI: 10.1021/acs.chemrev.5b00215 Chem. Rev. XXXX, XXX, XXX−XXX

Chemical Reviews 3.8. Flash Point (FP) 3.8.1. GC Method 3.8.2. QSPR 3.8.3. Overview: FP 3.9. Flammability (F) 3.9.1. QSPR 3.9.2. Overview: (F) 3.10. Explosive Properties 3.10.1. QSPR 3.10.2. Molecular Simulation 3.10.3. Overview of Explosive Properties 3.11. Self-Ignition Temperature (AIT) 3.11.1. GC Method 3.11.2. QSPR 3.11.3. Overview: AIT 3.12. Oxidizing Properties 3.12.1. QSPR 3.12.2. Overview of Oxidizing Properties 3.13. Dissociation Constant (pKa) 3.13.1. GC Method 3.13.2. QSPR 3.13.3. COSMO-RS/SAC 3.13.4. Other Methods 3.13.5. Overview: pKa 3.14. Viscosity (η) 3.14.1. GC Method 3.14.2. QSPR 3.14.3. EoS 3.14.4. Molecular Simulation 3.14.5. Overview: η 4. Final Remarks and Outlook Associated Content Supporting Information Author Information Corresponding Author Notes Biographies Acknowledgments Acronyms and Abbreviations References

Review

notably aimed at reducing, refining, or replacing the use of animals in toxicological testing where feasible.2 Considering the huge number of chemical compounds currently used in industry, and in particular those requiring a new registration in the framework of REACH, alternative methods for the gathering of physicochemical properties are also of great interest.3 Indeed, the experimental measurement of all of the data required is not realistic in terms of time (by 2018), cost, feasibility at the R&D level, and safety (considering the risks to manipulators, in particular for the characterization of hazardous physicochemical properties). Moreover, all new substances will require early examination to identify possible hazards, so it is necessary to anticipate the risks and recognize the hazardous substances as early as possible in their development. In this context, it is urgent to develop fast and reliable methods for property screening. During the past decade, quantitative structure−activity (or property) relationships (QSAR/QSPR) have been considered to be pertinent solutions. These methods are explicitely mentioned in Annex XI of the REACH regulation, and validation principles have been established to allow application in a regulatory context4 and are listed among the available methods to gather physicochemical properties under the guidance of the European Chemical Agency.5 If QSAR/ QSPR methods were particularly targeted by REACH, molecular scale modeling, in general, can play a critical role. Indeed, for several decades, molecular and modeling simulation in all of their forms (particularly quantum chemistry, molecular dynamics and Monte Carlo methods, and empirical correlations) have progressed enormously. This development was further boosted by the impressive development in computer science, concerning both hardware and software developments.6 It is imperative to seize this opportunity in the context of the systematic, effective, and reliable prediction of the physicochemical properties of chemicals. These methods can be considered not only for the gathering of data for the REACH registration files but also for the early identification of hazards for new chemicals at the R&D level, to anticipate potential regulatory constraints in various contexts, for example, REACH, classification and labeling,7 transportation,8 and industrial constraints aimed at the safety of workers, consumers, or the environment. This document aims at reviewing the molecular modeling approaches available as a fast and reliable alternative approach to experiments for the physicochemical properties required by the REACH regulation. Among all properties listed in Table 1, three properties were not considered: the state of the substance in standard conditions (1), the granulometry (14), and the stability in organic solvents and identity of degradation products (15), which are not directly dependent on the molecular scale properties of the substance. Several molecular modeling methods were particularly considered: group-contributions (GC), quantitative structure−property relationships (QSPR), equations of states (EoS), COSMO-RS/SAC, and molecular simulations (MS). In some particular cases, other predictive methods were also mentioned for completeness. To have a consistent review, a brief presentation of these methods is first given below, with the aim of providing the reader of their main crucial points (pros and cons). The Review is organized according to the list of properties to be determined, and for each property were given a definition according to the EU-REACH regulation followed by the list of existing methods.

AJ AJ AJ AK AK AL AL AL AL AP AP AP AQ AQ AQ AQ AQ AQ AQ AR AR AR AR AS AS AS AS AT AU AW AW AX AX AX AX AX AX AZ AZ BB

1. INTRODUCTION The REACH regulation (recent EU regulation for “Registration, Evaluation, Authorization, and Restriction of Chemicals”) entered its product-recording phase after December 2008.1 It aims to reduce the environmental, health, and safety impact of chemical products used in industry and is based on a systematic and reliable knowledge of their intrinsic properties. It is applicable to all substances produced or marketed in Europe at more than 1 ton a year. In particular, it requires the evaluation of the physicochemical, toxicological, and ecotoxicological properties of more than 143 000 existing substances that have been preregistered by 65 000 companies to allow their use before 2018. The implementation of REACH regulation1 was also used as an opportunity to encourage alternative methods to experimental testing and in particular to animal testing, primarily for ethical purposes. In fact, the use of alternative methods was explicitly included in the regulation (see article 13 and Annex XI of the regulation). Initiatives have also been launched in the United States by the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), which are B


Chemical Reviews

Review

correlation models, and they are thus classified under different orders, first,15 second,16 or third.17 The first order describes the simple functional groups in organic molecules, whereas the second and third orders allow a much more detailed description of the structure of molecules including other groups and fragments that are not described in the first order. The second and third orders act as a correction of deviation from the parameters provided in the first order so as to improve the accuracy of the method. GC methods are generally easy to use, because one property Y can be estimated by a sum of terms or contributions representing the different groups present in a molecule (the Ai to Fk values), as shown in eq 1.

Table 1. List of Physicochemical Properties Required in the REACH Regulation property (1) state of the substance at 20 °C and 101.3 kPa (2) melting/freezing point (3) boiling point (4) relative density (5) vapor pressure (6) surface tension (7) water solubility (8) partition coefficient n-octanol/water (9) flash point (10) flammability (11) explosive properties (12) self-ignition temperature (13) oxidizing properties (14) granulometry (15) stability in organic solvents and identity of degradation products (16) dissociation constant (17) viscosity

abbreviation Tm Tb ρr Psat γ Sw Ko/w FP

Y = (∑ Ai Bi )a + (∑ CjDj)b + (∑ Ek Fk)c i

j

k

(1)

AIT

where the constants a, b, and c determine the linear or nonlinear character of the equation. The relative simplicity of these methods makes them popular in process engineering design. A list of available methods along with the property they can compute and recommendations depending on the type of chemical compound is provided in the well-known handbook by Poling et al.18 The main limitation of these techniques is the lack of parameters for some molecules, their accuracy in predicting properties of some complex compounds, or the impossibility of isomer discrimination (except when using a higher order). Some authors have recently proposed extending GC methods by using connectivity index (CI) to extract new terms that could be missing in the original method.19 This new approach, known as GC+, provides more flexible capabilities for describing the thermodynamic properties of complex systems such as polymers.20 One recent publication expands the GC+ method for a vast number of compounds to predict 18 pure component properties, including normal boiling point, normal melting point, flash point, self-ignition temperature, n-octanol/water partition coefficient, among other properties at 298 K.21

Sos pKa η

2. DESCRIPTION OF METHODS 2.1. Group-Contribution Methods (GC) and Correlative Models

Group-contribution methods belong to the class of empirical methods and are also known as additive group methods, because the property of a molecule can be described as a sum of contributions coming from different blocks representing different functional groups. The main idea behind GC methods is shown in Figure 1.

2.2. Quantitative Structure−Property Relationships (QSPR)

The first development toward quantitative structure−property relationships stems from as early as the mid-19th century. In 1868, Crum-Brown and Fraser postulated the possibility of relationships between physiological activities and chemical properties by linking changes in biological activities to simple structural modifications.22 An important step toward QSPR models was proposed with the Hammett equations, in which constants were proposed to quantitatively estimate the reaction constant in organic compounds.23−25 Nevertheless, the first projects that properly followed the QSPR methodology were proposed by Hansh26 by exhibiting direct relationships between biological activities to hydrophobic, electronic, and steric properties at molecular scale. From that point onward, methods have constantly progressed toward better models: through improvements of the molecular descriptors, the data mining methods, and the validation tests, among others. The principle of QSPR is based on the assumption that similar structures have similar properties and that variations in the molecular structure entail changes in macroscopic properties. In general, the modeling technique is named according to the target end point belonging to activity, toxicity, or physicochemical property, leading to quantitative structure activity/toxicity/ property relationships with corresponding acronyms QSAR, QSTR, and QSPR, respectively. Hereafter, QSPR will include the three acronyms. More presicely, it consists of quantitatively

Figure 1. Schematic representation of a molecule decoupled into building blocks.

GC methods are predictive by definition, because the contribution to one property is carried out by the functional group itself that, in principle, can be used to represent any possible combination of groups present in molecules. Recently, Speybroeck et al. reviewed the advantages and limitations of GC methods as compared to quantum chemical calculations.9 Various methods exist that are similar to GC methods, some of which are based on topological indices.10−12 Speybroeck et al. consider that GC methods are a special case of QSPR methods.13,14 In some cases, chemical descriptors can be computed by quantum methods and be used directly as building blocks for the GC method. GC methods can be classified depending on the number and extent of groups used in the C


Chemical Reviews

Review

Different descriptors (constitutional, topological, geometric, and quantum chemical) are calculated to characterize the molecular structures of these compounds, which can be classified on the basis of the level at which they are calculated, that is, 1D originating from chemical formulas, 2D and 3D from two- and three-dimensional structures, respectively.38,39 Constitutional descriptors are based on the empirical formula and 2D structures of the molecule by the occurrence or count of particular atoms, bonds, or chemical groups in the molecules. Topological descriptors are based on the molecular graph of molecules (originating from the 2D structure). The molecular graphs are a representation of molecules in which atoms are vertices and bonds are edges. Topological indices are then calculated on the basis of these molecular graphs. These indices can be linked to the shape, size, and degree of ramification of molecules. A third class of descriptors is geometrical descriptors, which are calculated on the basis of the 3D structure of molecules: interatomic distances, angles, dihedral angles, molecular volume, and surface area. Finally, quantum chemical descriptors40 gather information based on the quantum chemical characterization of the molecules containing electronic features (e.g., atomic charges, polarizability, and dipole moment), vibrational frequency levels, and reactivity indices, like those originating from the conceptual density functional theory.41,42 It has to be noticed that the determination of 3D descriptors (geometrical and quantum chemical) needs the identification of a relevant conformation of the target compounds that could require a full conformational study, as demonstrated, for instance, by Hechinger et al.43 Additional information about the classification of descriptors in 1D, 2D, and 3D can be obtained in Table 8.2 of the book of Gasteiger and Engel.44 The definition and units of the numerous descriptors mentioned in this Review can be found in the original papers of each model, and notations were kept unchanged for the sake of simplicity. For most of them, definitions can also be found in the Handbooks of Karelson38 and Todeschini.39 It has to be noted that some QSPR models include commonly available experimental properties as descriptors to obtain a combined structure−property and property−property relationship by adding structural descriptors to improve the predictivity of historical property−property relationships. Various data mining tools44,45 can then be used to develop a model correlating the experimental property to the molecular descriptors. These tools include artificial neural networks (ANN), partial least-squares (PLS), support vector machines (SVM), genetic algorithms (GA), and multilinear and nonlinear regressions (MLR, NL, respectively). If multilinear regressions are very simple to use, some others like neural networks could need implementation efforts. It has to be noted that most of the papers related to such complex methods do not offer the entire definition of the algorithm. Nevertheless, such works have been cited here as the reader can decide to recompile it or to contact original authors. To evaluate the reliability of the model,46,47 a series of internal and external validations are performed (see Table S3). Among internal validations, cross-validation methods48 are the most widely used. Such approaches aim to evaluate the robustness of the models, that is, the stability of the model regarding the data set. Y-randomization methods49 are used to ensure against chance correlation. The predictive power of the model is evaluated using a series of statistical coefficients applied on external molecules (validation set) out of the training set.50−52

correlating an experimental property of chemical compounds with molecular features encoded by a series of molecular descriptors. Thus, the relationship has the following general form: property = f (descriptors)

(2)

This approach has been widely used to predict biological and toxicological end points,27 notably in the pharmaceutical industry.28,29 Recently, it has also been increasingly used to predict physicochemical properties.30−32 Thus, it has been considered among the methods that can be used as an alternative to experimental tests in the context of REACH,5 with the condition to respect five validation principles proposed by the OECD33 to ensure its performance and transparency. The validation principles are (1) a defined end point, (2) an unambiguous algorithm, (3) a defined domain of applicability, (4) appropriate measures of goodness-of-fit, robustness, and predictivity, and (5) a mechanistic interpretation, if possible. Extended reviews of the models dedicated to physicochemical end points of REACH have already been published by Katritzky et al.,30 Dearden et al.,31,32 and Quintero et al.34 Figure 2 presents a diagram of the typical procedure followed for the development of QSPR models in accordance with the OECD principles of validation.

Figure 2. Principle of development of validated QSPR models.

To derive a model, an experimental data set provides property values for a series of compounds. This available data ensemble is generally subdivided into a training set and a validation set. The training set is devoted to the development of the model, whereas the validation set is used to estimate the predictivity of the model. This partition has to be done so that the validation set represents the applicability domain of the model in terms of property values and chemical diversity correctly. To that end, several methods can be used, for example, based on the property ranking, by random division, or using k-nearest neighbors (KNN) methods.35−37 These data must be as accurate as possible, obtained under homogeneous conditions, and represent an homogeneous set of chemical compounds to limit the statistical noise in the database. Indeed, this data set is critical not only for the accuracy of the models (limited notably by experimental uncertainties), but it also defines the end point of the model (in terms of experimental protocol and conditions) and its applicability domain (in terms of range of properties and chemical structures). D


Chemical Reviews

Review

The applicability domain (AD)53 is the domain in which models are applicable and their predictions can be considered as reliable, that is, according to the performance of the models as evaluated by the previously presented validation methods. Because the model is developed for a defined training set, the definition of the applicability domain is important. The latter can be defined by the domain covered by the training set, both in terms of property values and in terms of chemical diversity. Over the years, various methods have been developed to define ADs,54 for example, those using a leverage approach55,56 and those utilizing projection in descriptor space.57 In short, the robust application of existing QSPR models may follow six steps (as presented in Figure 3).

Application of the Model. The application of the model may also be more or less complex. Multilinear regressions are certainly quite easy to implement as compared to more sophisticated methods (such as neural networks). In such complex cases, a complete algorithm is rarely available. Checking of the Applicability Domain Based on the Property Value. Once the model has been applied, the predicted value must be checked to range in the applicability domain of the model. If it does, the prediction can be considered to be reliable, at least within the uncertainty estimated for the model by external validation, when available. 2.3. Equations of State and Molecular-Based Approaches

2.3.1. Background. Equations of state (EoSs) are mathematical relationships that enable calculation of the Helmholtz energy as a function of temperature, volume, and composition. Using thermodynamic relationships, other relevant properties (pressure, chemical potential, compressibility, speed of sound) can be assessed. The most important properties are pressure and chemical potentials, which are derived from P=

∂Ares(T , V , N ) RT − ∂V V

(3)

T ,N

Where T is temperature, V is volume, N is the number of moles and μi = RT ln φi =

∂Ares(T , V , N ) ∂Ni

T , V , Nj ≠ i

⎛ PV ⎞ ⎟ − RT ln⎜ ⎝ NRT ⎠ (4)

Detailed algorithmic issues are discussed by Møllerup and Michelsen.58 For the purpose of this paper, the properties of interest that are accessible through an EoS are density, boiling temperature or vapor pressure, surface tension (using an additional theory, either the so-called “gradient theory” or a molecular density functional theory), and solubility, in particular water solubility and n-octanol−water partition coefficient. The publications of Martin,59 Gubbins,60 Tsonopoulos and Heidman,61 Han et al.,62 Anderko,63 Sandler,64 Economou and Donohue,65 Wei and Sadus,66 Muller and Gubbins,67 and Valderrama68 are some examples of reviews devoted to EoSs. The recent book by Kontogeorgis and Folas69 also provides an excellent state of the art. Most of the EoSs can be classified into three groups: virial, cubic, and molecular-based. The virial-based equations of state have given rise to a large number of very precise equations that are constructed from a large number of parameters.70,71 They require an extensive database and, for that reason, are hardly considered predictive. They will not be considered further. A list of predictive EoS developments is proposed in Table S1. In what follows, we focus more specifically on two types of such predictive approaches. First, we will consider the cubic equations for which the most well-known predictive version is PSRK (predictive Soave−Redlich−Kwong). Second, we will concentrate on a more recent family of equations that are molecular based, and whose most developed version is SAFT (statistical associating fluid theory). The GC-PPC-SAFT EoS will be used as an example of its predictive capacity. 2.3.2. Cubic EoSs. Among the cubic EoSs, the van der Waals (vdW) equation of state was the first reliable model able to account for the liquid vapor phase coexistence; however, it is disappointing from a quantitative point of view.72 The term

Figure 3. Principle of the robust use of validated QSPR models.

Checking of the Target Compounds. The family of compounds for which the model is developed has to be checked. Indeed, it gives a first estimation of the domain of applicability of the model. Checking of the End Point. The experimental protocol defining the end point of the model must also be identified to ensure its compatibility with the final use of the predictive data, particularly in a regulatory context. Calculation of the Molecular Descriptors Included in the Model. The calculation of descriptors can be more or less complex. For the simplest ones, only the empirical formula is needed, whereas quantum chemical descriptors require more complex computational approaches, including quantum chemical calculations. Checking of the Applicability Domain Based on the Molecular Descriptor Values. A refined evaluation of the applicability domain of the model can be performed on the basis of descriptor values. In most cases, existing models (especially the less recent ones) do not give any details on ADs, but knowledge of the training set can help the user to define it. E


Chemical Reviews

Review

“cubic” comes from the fact that a cubic equation is found for solving volume at known pressure: Equation 3 is written as P=

a(T ) RT − 2 V−b V + uVb + wb2

remains limited. Regarding predictive EoSs, the PSRK is probably the most widely recognized to date. Nonetheless, it is only truly predictive for mixture properties, where the UNIFAC activity coefficient model is used. However, in case of mixtures with water, which are the object of the REACH regulations,91,92 the results should be used with great caution. This is why some authors have suggested improving UNIFAC adding a specific association contribution.93 More details on these predictive models, their capacities, and limitations are provided in the handbook by Gmehling et al.94 2.3.3. Statistical Mechanics-Based EoSs. In parallel, notable advancements in statistical mechanics (chain models, inclusion of hydrogen-bond interactions, i.e., associating fluids) enabled the development of realistic molecular-based theories.69,95 Although there is not a single approach to the EoS, the most widely available engineering models are generally constructed using Barker and Henderson’s perturbation theory,96 which provides a sum of contributions of the residual Helmholtz energy, where each term describes a specific intermolecular interaction:

(5)

where u and w are characteristics of the equation (u = 1 and w = 0 for Redlich−Kwong;73 u = 2 and w = −1 for Peng−Robinson (PR)74). The a and b values are parameters that are computed using the critical pressure (Pc) and temperature (Tc). This means that at the critical point, the P−V isotherm shows an inflection point, that is, (∂P(Vc)/∂V)|Tc = 0 and (∂2P(Vc)/∂V2)|Tc = 0. These equations imply that a = Ωa

R2Tc2 α (T ) Pc

(6)

b = Ωb

RTc Pc

(7)

where the temperature function α(T) is necessarily one for T = Tc, and the quantities Ωa and Ωb have specific values depending on the selected equation. It was not until the work of Redlich and Kwong that a sufficiently reliable EoS for engineering applications was finally made available.73 This work opened the way to many other studies aiming at improving and extending the field of EoS. Several empirical modifications were proposed to improve the accuracy of EoS, leading to the most popular and most widely used cubic EoS in the industry, SRK,73,75 and PR.74 The accuracy of these equations depends on the quality of the parameters. One should distinguish between pure component parameters and parameters used within mixing rules. Most often, pure component parameters are based on the critical constraints (and therefore use the critical parameters pressure and temperature), leading to an exact representation of the vapor pressure at the critical point. To improve the vapor pressure calculations, a so-called “alpha function” is used. The Soave function,75 which is based on the acentric factor (ω), aims at providing a predictive approach for this function. Other more recent developments, for example, by Twu et al.,76−78 have proposed a different function, based on the same acentric factor. Neau et al.79,80 review these functions and show the effect it may have on other derived properties (heat capacity, speed of sound). The calculation of mixture properties requires so-called “mixing rules”. The best-known mixing rules for cubic EoSs are the so-called “classical mixing rules”,81 which provide acceptable results only for hydrocarbons mixtures. The PPR78 (predictive Peng−Robinson) equation of state82−85 proposes a group contribution method for predicting the binary interaction parameters, a correction to these classical mixing rules, in view of computing full phase diagrams. Ever since Huron and Vidal86 proposed to combine the power of activity coefficient models with equations of state, followed by a number of other authors,87,88 it has been possible to calculate complex phase behaviors of mixtures (particularly mixtures with water and noctanol). However, to be considered as a predictive model, the required input parameters should be accessible without further data. The most well-known and industrially developed model is the group contribution model as developed by the UNIFAC consortium: PSRK (predictive SRK)89 is now available in many software tools. It has been improved recently with VTPR (volume translated Peng−Robinson),90 which is probably the most powerful “conventional” method, although its database

Ares = Arep + Adisp + Aass + ... rep

disp

(8)

ass

where A , A , and A account for repulsive, London (dispersive), and associating interactions, respectively. Different theories and hypotheses were used in the literature to describe these terms. The simplest statistical mechanical EoS was applied to spherical molecules,97 as well as to nonspherical molecules.98 Many attempts have been proposed, with various degrees of success, involving the combination of a repulsive and a dispersive term (chain of rotators (COR),98 Boublik Alder Chen and Kreglewski (BACK),70 perturbed anisotropic chain theory (PACT),99 among others). Yet, the developments that have encountered the greatest success in the engineering community are without a doubt those based on the statistical associating fluid theory (SAFT) EoS.100−102 The SAFT-based equations of state are constructed from a hard sphere reference using a perturbation expansion of Barker and Henderson96 for the dispersive interactions and Wertheim 103 theory for the association. It also includes a term describing chain formation. Although several earlier attempts have been made to describe the chemical association observed with hydrogen bonding, for example (associating perturbed anisotropic chain theory (APACT),104 Anderko105), the approach developed within the SAFT framework is now the most widely recognized.58 This type of molecular interaction is at the origin of the majority of strongly nonideal behavior observed in real systems. Chapman106 very nicely summarizes the reasons for the extensive use of this model: very good mapping between the most common intermolecular interactions and the equation parameters. There are many implementations of SAFT EoSs. Some of the most common versions of SAFT are listed as follows. (1) Soft SAFT:107 a virial-type EoS is used to map the Lennard-Jones spheres behavior to molecular simulation results. (2) Perturbedchain SAFT (PC-SAFT): The molecules are envisioned as chains of freely jointed spherical segments exhibiting attractive forces among each other.108 (3) Hard-sphere SAFT (HS-SAFT): Molecules are viewed as chains of hard spheres with van der Waals interactions.100,101 (4) Lennard-Jones SAFT (LJ-SAFT): Includes effects from dipole interactions and modifications to the hard sphere term.109 (5) Square-well SAFT (SW-SAFT): Extended thermodynamic perturbation to include square-well chains, which quantifies monomer−monomer potential.110 (6) SAFT with variable range (SAFT-VR): Addresses chain F


Chemical Reviews

Review

contribution with segments of variable range potentials.111,112 (7) Dimer SAFT (D-SAFT): Inclusion of a dimer equation for hard-sphere chain molecules.113 (8) SAFT-Mie: The Mie potential, with variable exponents, is used as a basis for the description of the sphere−sphere interactions.114,115 All of these differ in two respects: (1) the reference term and how it is calculated or approximated (hard sphere,100 LennardJones,109 square well,110,111,116 Hard Chains100) and (2) the dispersive term. Recently, several authors have proposed a further refinement of the theory by adding a specific term for polar interactions117−119 (dipole−dipole, dipole−quadrupole, or quadrupole−quadrupole). 2.3.4. GC-PPC-SAFT as an Example. In view of its sound physical foundation, good predictive capacity, and the many developments that still concern SAFT EoS, we will provide some additional details here on one of its representatives, which is the GC-PPC-SAFT model. This model is chosen because it contains a large parameter database and has been shown to perform well on many systems, including some that are highly polar, and in particular those containing water.120−122 A detailed description of the equations required for using SAFT EoS would go beyond the scope of this Review, but we will provide an overview of the different main terms involved in this method. For the sake of simplicity, we have adopted the definitions commonly used in SAFT EoS; this is expressed as the sum of contributions to the Helmholtz energy at fixed temperature, volume, and compositions.123 Figure 4 illustrates the types of interactions thus explicitly taken into consideration.

including water properties). The GC-PPC-SAFT method, initiated by Tamouza130 and further developed by NguyenHuynh,131 allows for a more complete description of nonpolar and polar molecules. A number of authors have proposed so-called “hybrid” approaches between cubic and molecular-type models, with various degrees of success. The best-known are cubic plus association (CPA)132 and group contribution association (GCA).133 They use the Wertheim association in combination with either a classic SRK EoS (in the case of CPA) or the group contribution-based EoS called GC-EoS.134 2.4. Activity Coefficient Models and COSMO/COSMO-RS

Klamt and Schüürmann135 developed the continuum solvation model (COSMO) in 1993. In this approach and as in other equivalent approaches,136−139 the solvent is modeled via a dielectric continuum described by its dielectric constant140 that surrounds the “solute” molecule, which is explicitly described, outside of a cavity. The polarization then induced by the electron distribution of the solute on the dielectric medium can be represented by charges on the surface of the molecular cavity (charge-image technique). This “solvent” charge, together with the charge of the solute obtained by quantum chemical calculation, makes it possible to compute the interaction energy between the solute and the solvent. This interaction energy term is calculated using a simple charge−charge potential energy equation. The COSMO model135 is an improved continuum solvation model as compared to some others.137,138 Because it simplifies the treatment of the boundary conditions between the explicit and implicit domains in the quantum chemical calculations, it is quite popular. It has been implemented in several quantum chemical softwares. This approach has been successfully employed in atomistic simulations at the quantum chemical level.139 Later, Klamt141 proposed improvements to the solvation energies obtained in the COSMO approach to correct the “ideal” solvation model to the “real solvent”-case model. Since then, the conductor-like screening model for real solvent (COSMO-RS) parameters and equations has continuously been improved and further validated.142,143 The principle and main equations of the COSMO-RS are described below. Lin et al.144,145 and Wang et al.146 developed the COSMO-segment activity coefficient (COSMO-SAC) method, which is a variation of the COSMORS method. This method and COSMO-RS possess parameters that are common, and some others that are not. The COSMOSAC code is freely available, which has paved the way to numerous reparameterizations and method modifications. Over-

Figure 4. A schematic illustration of the contributions of Helmholtz energy in polar SAFT approaches.

To have a predictive model, one must be able to use it without regressing experimental data. The most usual and convenient way to do this is to use the group contribution concept. In the literature, several groups have developed this approach. We can mention the works of Tihic124 (PC-SAFT), Lymperiadis,125 and most recently Dufal 126 (SAFT-gamma) or Peng and McCabe.127−129 However, their application range is rather limited (in particular not adapted for polar molecules or mixture

Figure 5. σ-Surface (left) and σ-profile (right) of propan-1,2-diol. The oxygen atoms induce positive charge density on the cavity surface (in red), while the hydrogen atoms induce a negative charge density (in blue). Most molecules, for example, alkanes, do not induce any charge density difference on the cavity surface (in green). The σ-profile is the histogram distribution of the σ-surface. G


Chemical Reviews

Review

where aiα is the surface of the atom α in the molecule i, and τ(α) is an element-specific term that is fitted to experimental data. It is only introduced to evaluate the potential energy when the molecule moves from the gas to the liquid phase. Mixtures and pure compounds can be similarly described with the above equations. The σ-profile of a mixture is obtained by a weighted sum of the pure compounds σ-profiles:

all, the efficiency and accuracy of the two methods are of the same order of magnitude. In the next paragraph, we describe further the principles and equations used in the historical COSMO-RS method. First, the screening charge densities and the σ-profile of the solute and molecules that constitute the solvent are computed using COSMO in an infinite continuum (i.e., the virtual conductor that surrounds the molecule is defined by ε = ∞). The σ-profile is a representation of the screening charge densities σ that are induced by the solute on its cavity surface: It is the distribution function of σ per surface area unit. For instance, the σ-profile and σ-surface of propan-1,2-diol are shown in Figure 5. For a solute characterized by the screening charge σ and a solvent characterized by a screening charge σ′, the potential energy of interaction can be evaluated by “matching” the two density surfaces so as to remove all residual charges from the σ and σ′ profiles. In the case of an ideal electrostatic contact between σ and σ′, as in the case of the contact between a molecule and an ideal conductor, for instance, σ = −σ′ holds exactly. In that case, the potential energy of interaction is zero. In general practice, there is always some mismatch between σ and σ′, and one gains or loses potential energy due to this mismatch (Emismatch). Using the elementary electrostatic theory, the misfit potential energy equation is simply expressed as Emismatch(σ, σ′) = aeff emismatch(σ, σ′) = aeff

pmixture (σ) =

pi (σ) =

∑ wpj j (σ)

(13)

j ϵi

where p (σ) is the σ-profile of the conformer j of the molecule i, and wj its Boltzmann-weighted fraction in the considered conditions (nature of the “solvent”, temperature, etc.) with respect to all other conformers of i using the Gibbs free energies obtained in the COSMO-RS theory. Now, one can define the σ-potential of a system S, which is characterized by its screening charge density σ, with reference to a solvent, which is characterized by its screening charge density σ′, according to j

α′ (σ + σ′)2 2

μS (σ) = −

⎛a ⎞⎞ RT ⎛ ln⎜ pS (σ′) exp⎜ eff (μS (σ′) − e(σ, σ′)) dσ′⎟⎟ ⎝ ⎠⎠ ⎝ aeff RT

∫

(14)

and e(σ, σ′) = E hb(σ, σ′) + Emisfit(σ, σ′)

(15)

The σ-potential μS(σ) is solved iteratively, as μS appears on both sides of the equation. The evaluation of the Gibbs free energy of solvation of a molecule uses the above potential as well as an additional combinatorial term that stems from statistical thermodynamics. The chemical potential μxSi, of the system S having a molar concentration xi in a solution is given by xi μSxi = μC,S +

∫ px (σ)μS(σ) dσ i

(16)

i μxC,S

E hb(σ, σ′) = aeff ehb(σ, σ′)

where = (∂GC,S/∂xi), and GC,S is obtained from the combinatorial free energy expression:

= aeff chb min(0, min(0, σdonor + σhb)

GC,S = RT (λ 0 ∑ xi ln ri − λ1 ln(∑ xiri) − λ 2 ln(∑ xiqi))

(10)

i

i

i

(17)

with σdonor = min(σ, σ′) and σacceptor = max(σ, σ′). In this equation, only the highest and lowest screening density charges of the solute are considered and must match the low and high screening density charges of the solvent, respectively. chb is a tuning parameter for adapting the “strength” of the hydrogen bond. It is fitted with experimental data. chb is the charge density difference that is assumed to be induced when a hydrogen bond forms. In addition to electrostatic and hydrogen-bond energy terms, a van der Waals specific potential energy term is introduced for the molecules in solution when their energies are computed with respect to their gas-phase states. This potential energy correction EiVdW is proportional to the atom types and surface of the molecule i. It is given by

which gives ⎛ r ⎞ ⎛ xi μC,S = RT ⎜⎜λ 0 ln ri + λ1⎜1 − i − ln r ̅ ⎟ ⎠ ⎝ r ⎝ ̅ ⎛ ⎞⎞ q + λ 2⎜1 − i − ln q ̅ ⎟⎟⎟ q̅ ⎝ ⎠⎠

(18)

In the above equations, ri and qi are the molecular volume and surface of compound i, respectively, and: r̅ =

∑ xiri , i

q̅ =

∑ xiqi i

(19)

The above combinatorial free energy is valid especially when the surface and volume of the molecule and system are welldefined. This is not always the case, however (e.g., polymers and

∑ aαi τ(α) αϵi

(12)

where x is the molar fraction of the molecule i in the mixture and pi(σ) the σ-profile of the molecule i. Furthermore, a molecule can be represented by different conformers. Its σ-profile is then expressed by

In this equation, the assumption is made that the entire surface of the molecular cavity is covered by the solvent. However, that is not possible in reality, and an effective contact area coefficient (aeff) is introduced. In the above equation, the general constant α′ can be computed exactly by theoretical means in principle, but is fitted to experimental data in practice. The case of the hydrogen bond must be considered apart from the “normal” misfit between σ and σ′ because of the electron delocalization that is taking place when a strong hydrogen bond forms. However, a formalism is adopted similar to that employed above, and the potential energy that is gained in the hydrogen bond is expressed as

i E VdW =

xipi (σ)

i

(9)

max(0, σacceptor − σhb))

∑ i ϵmixture

(11) H


Chemical Reviews

Review

minutes to dozens of minutes to get the σ-profile and σ-potential of most common organic or inorganic molecules (at least, those that are liquid under moderate conditions of pressures and temperatures); this is the only parameterization that is necessary to compute the thermochemical properties of that molecule later on in only a couple of seconds. Otherwise, the prerequisites concerning the definition of the molecular systems in the COSMO-RS/SAC approach are minimal because the energies, σ-profile, and σ-potential are all obtained from quantum chemical calculations when not yet included in the COSMORS molecule database. Essentially, only atom types and coordinates are needed in quantum chemical calculations. Other information, such as bond orders and topology, for instance, stem from the resolutions of the electronic wave function equations and do not need to be defined explicitly a priori. Using appropriate software such as OpenBabel,165 SMILES code definition166 of the molecules is the only information that is eventually necessary to build a COSMORS/SAC molecule database. Both COSMO-RS and COSMOSAC167 possess graphical user interfaces.

proteins). The universal parameters λ0, λ1, and λ2 are fitted to experimental data. Finally, the above chemical potential μxSi is the “standard” chemical potential minus RT ln xi and can be used as such to predict the thermochemical properties of chemical species in solution. For gas-phase compounds, the Gibbs free energy equation is defined by xi xi xi xi xi μgas = Egas/QM − ECOSMO/QM − E VdW − wcyclencycle + ηgas

(20) i i In the above equation, Exgas/QM and ExCOSMO/QM are the quantum chemical electronic energies of the molecule xi in the i gas phase and in the conductor, respectively, and ExVdW is the van der Waals potential energy correction as defined in eq 11. The i term wcyclenxcycle is a correction for molecules that contain ring(s), xi where ncycle is the number of cycle in the molecule and wcycle an adjustable parameter that is fitted to the experimental data. Finally, ηgas is an additional fitted parameter that links the free energy reference of the gas phase compound with that of the liquid phase. For solid-state compounds, there is no model equation, and the thermochemical values of the solids are given explicitly using experimental data or alternative method values (e.g., values obtained from QSPR-based models). The free energies of the compounds in their liquid state remain computed using the COSMO-RS/SAC formalism. The solid state properties cannot be fully determined by the COSMO-RS/SAC theory, and it is essentially of the same quality as the QSPR model(s) used to predict the solid free energies. Hence, chemical potential equations of any compounds can be defined in the COSMO-RS theory in the gas, liquid (pure or mixtures), and solid states. It thus becomes possible to compute melting and boiling temperatures, vapor pressure, solubility of gas, liquid, and solids in any solvents, as well as solubility, including hydrosolubility, and all partition coefficients between two liquid phases (extraction), including the n-octanol/water partition function coefficient. It is possible to compute reaction energies in solution. This also enables computation of acidity constants. We will review these properties in the next section. In addition to the above properties, the COSMO-RS/SAC σ-profile and σ-potential have been used as molecular descriptors and included in QSPR147,148 and other method149,150 parameterizations in several studies, making it possible to compute viscosity, density, and flash-point using QSPR methods with molecular descriptors from COSMO-RS/SAC. In the description of the COSMO-RS equations, we explicitly pointed out 13 parameters that are fitted to the experimental data. In the COSMO-SAC equations, there are the same number of “free” parameter,146 and the public parameters of COSMO-RS equations are identical to those of the COSMO-SAC equations. The current version of COSMO-RS is C30_1201, and it corresponds to a given set of optimized parameters. Several authors have considered the fine-tuning reoptimization of these parameters or the introduction of others to correct obvious flaws in specific applications.151−156 Klamt and Eckert142 made a comparison between COSMO-RS and GC, describing their advantages and drawbacks, given that both methods are essentially GC-type methods. This comparison between COSMO-RS and GC methods has been made by several other groups.157−160 The COSMO-RS method was also analyzed in comparison to the Monte Carlo,161 COSMO-SAC,162 and QSPR methods.163,164 The accuracy of COSMO-RS and COSMO-SAC is about the same. With COSMO-RS or COSMO-SAC, it takes

2.5. Molecular Simulation

Molecular simulation considers small size systems, at a typical scale of a few nanometers, and determines their behavior from a careful computation of the interactions between their components. Provided the potential energy and/or forces of the system can be determined from atomic coordinates through an adequate model, its thermophysical properties can be derived in the framework of statistical thermodynamics.168−176 This holds for equilibrium properties like vapor pressure as well as for transport properties like shear viscosity. Depending on the desired application, a statistical ensemble must be chosen in such a way that the constraints correspond to variables that are controlled in the experimental setup. The canonical ensemble or NVT ensemble, in which temperature, number of molecules, and volume are imposed, is used for single phase fluids when density is known. In the isothermal−isobaric or NpT ensemble, pressure is imposed instead of volume, and the average volume of the system is used to predict fluid density. The most widely used ensemble for phase equilibrium calculations (solubility, partition coefficient, etc.) is the Gibbs ensemble177 in which two phases are introduced without explicit interface. The temperature and the total number of molecules are also imposed. Its outputs are then the equilibrium properties: saturated vapor pressure, vaporization enthalpy, and the coexisting densities of the liquid and vapor. Practically, there are two main methods to sample the fluid phase space using different statistical ensembles, both resting on the computation of interparticle energies. The first is through molecular dynamics, which consists in solving the equations of motion, and the second is Monte Carlo simulation, in which a statistical method is used. Energy description is a key quantity common to both simulation tools and is presented below before MC and MD methods. 2.5.1. Energy of Molecular Systems. Potential energy is classically broken down into intermolecular (or external) energy and intramolecular energy: U = Uext + Uint

(21)

The intermolecular energy arises from the interaction between different molecules. In the applications considered here, it is broken down as Uext = Ud−r + Uel + Upol where Ud−r arises from dispersion and repulsion interactions, Uel is the electrostatic (or I


Chemical Reviews

Review

Coulombic) interaction energy, and Upol is the polarization energy.178 The dispersion−repulsion energy, which is dominant in low polarity systems like alkanes, is obtained by a summation over all of the pairs of force centers. Although the Buckingham exp-6 model has been shown to be more accurate on several important systems,179−181 the most popular model (e.g., the one with the largest parameter databases) is Lennard-Jones 6-12: Ud − r

⎛⎛ ⎞12 ⎛ ⎞6 ⎞ σij σij = ∑ 4εij⎜⎜⎜ ⎟⎟ − ⎜⎜ ⎟⎟ ⎟ ⎜⎝ r ⎠ i,j ⎝ rij ⎠ ⎟⎠ ⎝ ij i 300 K allows isomer differentiation (with first and second order)

allows isomer differentiation (with first and second and third order) bond contribution method (called group interaction contribution technique) accurate for Tc perfluorinated saturated hydrocarbons; AMD < 0.5% tested for a large range of nonelectrolyte, multifunctional organic compounds; AMD < 6.37 K

Marrero−Pardillo277 Kelly et al.278 Nannoolal et al.279

therefore been developed for this property. Moreover, it has been currently considered as a test case for the development of new descriptors280,281 or methods.282 Numerous models exist and show good results for a wide diversity of organic compounds or for a more limited specific family of compounds. A presentation of these models is available in the reviews of Dearden,283 Katritzky,30 Lyman,284 Rechsteiner,285 and Reinhard.230 Simple models were first proposed. For instance, in 1939, the one developed by Banks286 was based only on molecular weight (MW):

3.2. Boiling Point (Tb)

“The normal boiling point is the temperature at which the vapor pressure of a liquid equals 101.3 kPa. Note: If the vapor pressure equals 101.3 kPa or more at a given temperature this means the substance is completely gaseous at that temperature.”1 According to the REACH directive, this property should not be known for gaseous compounds and for solid systems for which melting points are above 300 °C, or for which decomposition/ auto-oxidation/degradation appears before their boiling temperature. In this particular last case, Tb can be estimated or measured at a reduced pressure. 3.2.1. GC Method. There are several GC methods for computing Tb.18 Noteworthy here is the group contribution approach by Marrero and Gani,228 which was used to model the boiling points of 1794 organic compounds with a standard error of 8.1 °C. The method of Labute269 uses 18 atomic contributions on a set of 298 diverse organics, yielding a standard error of 15.5 °C; Simamora and Yalkowsky227 published a method for modeling Tb containing 36 group contributions and 4 intramolecular hydrogen-bonding terms for a set of 44 aromatic compounds, with a standard error of 17.6 °C. A nonexhaustive list of methods and correlations available in the literature is given in Table 4. The most recent version of the GC method with CI (called GC +) was used to model Tb for 3510 molecules with an R2 of 0.998 and AAD of 1.44%.21 Several GC methods and correlations are available from the web.229 3.2.2. QSPR. Among the properties required by the REACH regulation, boiling point is the one for which the most experimental data are available. Several QSPR models have

log Tb = 2.98 − 4*MW −1/2

(33)

More parameterized MLR models were then developed. In 1996, Katritzky287 obtained a four-parameter model with a standard error of 12.4 K for 298 diverse organic compounds, and, in 2008, Sola’s288 model even reached an error of 9.1 K. The best model was probably obtained by Hall and Story.289 It uses a neural network based on a series of topological descriptors with a mean absolute error of 3.9 K for a set of 298 organic compounds. 3.2.3. EoS. Any of the predictive equations of state is able to calculate boiling temperatures, because it is nothing but the inverse of a vapor pressure calculation. These calculations are based on the simultaneous resolution of P=

∂Ares(T , V V , N ) RT − V ∂V V res

=

T ,N

L

∂A (T , V , N ) RT − ∂V VL

= 101.3 kPa T ,N

(34)

and N


Chemical Reviews μi =

∂Ares(T , V L , N ) ∂Ni res

=

Review

T , V , Nj ≠ i

V

∂A (T , V , N ) ∂Ni

T , V , Nj ≠ i

⎛ PV L ⎞ − RT ln⎜ ⎟ ⎝ NRT ⎠ ⎛ PV V ⎞ − RT ln⎜ ⎟ ⎝ NRT ⎠

at lower temperature require the use of single-phase NpT simulations (typically for heavy molecules having high values of Tb above 400 K). If information is available (by GC or EoS), it is interesting to have a rough estimate of the position of Tb in the phase diagram of the molecule. If this is not possible, one can alternatively try to have an estimation of the critical temperature (Tc) by using other GC methods (see section 3.2.1). Simulations in the NpT isothermal−isobaric ensemble should be performed on systems with a representative number of particles (250−500 molecules usually) to obtain the saturated liquid properties at temperatures close below the estimated Tb. It is then possible to get a good approximation of the molar enthalpy of vaporization by using the following equation:

(35)

with unknowns T, VL, and VV. Tb is not generally reported in the literature for the EoSs and related methods; more often, vapor pressures are reported. However, nothing prevents the calculation of Tb using this approach. It is worth noting that for cubic EoS, whose input parameters are Tc, Pc, and acentric factor (see Table 10 for predictive methods), the vapor pressure curve (and therefore Tb) is very correctly represented, provided that an adequate alpha function is used. In fact, given that the acentric factor is precisely defined as the vapor pressure at T/Tc = 0.7, which is generally close to the normal boiling temperature, these equations are expected to provide very good results. Because this property is related to the vapor pressure, we will discuss the literature data in greater detail in section 3.4.5. As an example, Emami et al. have extended the statistical associating fluid theory (SAFT),290 perturbed-chain statistical associating fluid theory (PC-SAFT),108 and the Elliot−Suresh−Donohue (ESD) EoS of Elliott and Natarajan276,291 to include the concept of GC for 19 families (hydrocarbons, cyclic hydrocarbons, aromatic hydrocarbons, alcohols, amines, nitriles, thiols, sulfides, aldehydes, ketones, esters, ethers, halocarbons, and silicones).292 The authors present contributions for 84 firstorder groups (FOG). The Tb values of different families of molecules were compared to the available experimental data obtaining a mean absolute deviation (MAD) of 8, 12, 8, 10, and 9 K for ESD, SAFT, PC-SAFT, Tihic293 first order group, and Tihic second order group (SOG) equations, respectively. 3.2.4. COSMO-RS/SAC. Using the equations given in section 2.4, the boiling temperature is obtained as defined at the beginning of section 3.2. The equations can be solved for any gas pressure (boiling point under vacuum) or liquid phase composition, be it mixtures of any number of compounds or pure compound. The boiling point calculation has been considered by Constantinescu et al.294 This requires a different mixing rule than that used in “standard” COSMO-RS/SAC theory. The observed accuracy of COSMO-RS/SAC in the determination of the boiling point is 15−18 K,141,146,295 although it varies depending on the functional groups of the molecule or the boiling temperature itself. The best accuracy is obtained for boiling temperatures in the range 250−550 K. 3.2.5. Molecular Simulation. There are several ways for obtaining liquid thermodynamic properties using molecular simulation under ambient conditions. We will discuss two different approaches here, the first one being based on direct NVT Gibbs ensemble Monte Carlo177 (GEMC biphasic) coupled with thermodynamic integration using the Clapeyron equation, and the second one being based on molecular simulations in the adiabatic Gibbs ensemble (GEMC). Thermodynamic integration through the Clapeyron’s equation: The calculation of Tb using this method can be divided into two cases. (a) The size of the molecule permits the use of NVT GEMC at all temperatures (typically for molecules having moderate-to-low Tb values); and (b) only the highest temperatures can be reached using NVT GEMC and some extra points

ΔH vap = −⟨E l⟩ + RT

(36)

where ⟨El⟩ is the average molar intermolecular potential energy of the liquid phase in the simulation, and R is the ideal gas constant. This relationship assumes that (i) the molar volume of the liquid is negligible as compared to that of the vapor phase and (ii) the vapor behaves closely to an ideal gas. These assumptions are correct for heavy molecules because the vapor pressure is significantly lower than atmospheric pressure. In this case, liquid properties at saturation should not be significantly different when pressure is set to an arbitrary low value such as 1 mbar, for instance. Prior to Tb, one needs to determine the saturated vapor pressures, and the NpT simulations must be done at constant intervals at a scale of T−1 to use the thermodynamic integration of the Clapeyron equation296 with a second-order numerical integration algorithm.190 This yields the following equation: 2 ln(Psat(Tn + 2k)) = ln(Psat(Tn)) − Δ(1/T ) R k

∑ ΔHvap(Tn+ 2i− 1) i=1

(37)

where Δ(1/T) = (1/Ti) − (1/Ti+1) is the constant spacing of the temperatures on the inverse temperature scale, Tn being a temperature where a Gibbs ensemble simulation has been performed. One important aspect in thermodynamic integration is the estimation of the statistical uncertainty. In evaluating the standard deviation (SD) on the summation of the right-hand side of eq 37, the leading term is the uncertainty on Psat(Tn) in most cases. Thus, the statistical uncertainty for Psat(Tn+2k) is at most twice the uncertainty for Psat(Tn). Finally, Tb is obtained as shown in Figure 6. Tb can be estimated in molecular simulation using one of the following approaches. 3.2.5.1. Direct Calculation of Vapor−Liquid Coexistence via Heterogeneous NVT Simulations. This approach enables the location of phase coexistence points using a single simulation at a given temperature by using molecular dynamics (MD) simulations to generate spontaneous phase separation. The idea is to equilibrate a single phase in a relatively large simulation cell under NVT conditions and then quench the system into a two-phase region, which will spontaneously separate into two coexisting phases. The different phases can be identified by collecting histograms of the local density and/or local composition of the pure phase regions, which can be done soon after the quench.297 The method can be used to locate vapor−liquid, liquid−liquid, or solid−fluid equilibrium. It can be applied without difficulty to very complex molecules and mixtures and can be implemented on large parallel computers. In two-phase MD, the liquid and the vapor phases are simulated O


Chemical Reviews

Review

N!Γ(3N /2) + N1 ln V1 N1!Γ(3N1/2)N2!Γ(3N2/2) ⎛ 3N ⎞ ⎛ 3N ⎞ + ⎜ 1 − 1⎟ ln K1 + N2 ln V2 + ⎜ 2 − 1⎟ ln K 2 ⎝ 2 ⎠ ⎝ 2 ⎠

ln p = ln

(38)

where N = N1 + N2. This method implies displacement moves, volume change (but not exchange) moves, particle transfer moves, and enthalpy exchange moves. All of these moves are accepted according to eq 38. Additional details about its implementation can be obtained in ref 307. NpH GEMC ensemble simulations are very similar to the normal NVT GEMC ensemble, but phase coexistence data are obtained in a onecomponent system at a specific pressure rather than at a given temperature. This makes it possible to determine the normal boiling temperature in a single simulation of two coupled subsystems. Like in constant-NpT GEMC simulations of binary mixtures, NpH-GEMC allows volumes of the two cells to vary independently. Convergence of an instantaneous measurement of pressure can also be used to confirm equilibration. In the case of mixtures, it is possible to apply a specific bubblepoint pseudo ensemble, which allows a direct calculation of the bubble pressure of liquid mixtures.310,311 More direct and efficient methods have then been proposed, such as the bubble point pseudoensemble concept310,312 and the Grand Equilibrium method.313 This method has already been successfully employed to predict phase equilibrium of mixtures involving nonpolar311 and polar compounds.314,315 In this pseudoensemble, two simulation boxes without an explicit interface are used for the representation of the liquid and vapor phases. The temperature, the numbers of molecules in the liquid phase, and the global volume are kept constant. The simulation in this pseudoensemble allows one to determine the vapor composition and the equilibrium pressure, which corresponds to the bubble pressure of the imposed liquid. The boiling point can be determined afterward by iterative procedure over the imposed temperature until the bubble pressure converges to 1 bar. Alternative approaches using the “semigrand” or “osmotic” ensemble of the Clapeyron equation are also available for the estimation of the multicomponent phase diagram required for the calculation of Tb.316 Different simulations have been used in the literature to obtain Tb for different systems; a nonexhaustive list of studies is provided in Table 5. The most-used method for computing Tb is the combined approach of NVT-GEMC ensemble with thermodynamic integration of the Clapeyron equation. This approach has been used in most of the publications concerning the development of transferable force fields. It has been used for molecules having low to moderate molecular weight (i.e., for MW < 200 g/mol). This method is accurate and has only two known limitations: (a) the availability of an accurate force field and (b) problems of sampling. The latter problem can be improved by using heterogeneous NVT-MD simulations, in which phase separation is induced by temperature quenching and the phase diagram can be obtained through appropriate clustering sampling. The use of the adiabatic GEMC ensemble seems to be a very elegant approach to determining Tb in a single simulation. However, additional work is required to evaluate its potential use in complex systems. Most of the literature data for Tb are devoted to hydrocarbons and single substituted hydrocarbons smaller than C14. Although bigger molecules have been studied (C20−C100), Tb was not

Figure 6. Schematic representation of the use of the Clapeyron equation to determine Tb by the combined use of NVT-GEMC + NpT simulations.

in the same box, usually separated by an interface. An advantage of the MD method (as compared to MC) is that no exotic MC movements for phase space sampling are required (such as collective moves or hybrid MD−MC). The main drawback is the determination of the liquid and vapor phase properties themselves because of the complications caused by the presence of the interface and the difficulty in determining which phase the molecules collectively belong to. Several techniques have been proposed for identifying the densities of coexisting phases in the same simulation box by a fit to an hyperbolic tangent function to the interface298−300 and by using spatial and inverse histograms of local densities, which always fail close to the critical point.297,301 Other methods for determining the phase composition have been developed in which complete phase segregation is not required. In that case, the system is analyzed via Voronoi tessellations, yielding a bimodal distribution when the simulation is performed in the two-phase region.302,303 More recently, Patel et al.304 proposed a new approach using an MC volume sampling method of trajectories rather than Voronoi tessellations to determine phase coexistence. Once the phase diagram is known, Tb can be determined by thermodynamic integration method of the Clapeyron diagram (Figure 6). 3.2.5.2. Adiabatic GEMC Simulation. In simulations using the adiabatic NpH (isobaric−isenthalpic) ensembles, the number of particules (N), the total energy (U), the enthalpy (H), and pressure (p) are held constant. The probability associated with a given microstate is determined by the corresponding total kinetic energy (K), which is indirectly determined by K = H − pV − U, specifying H, p, and U. The weight associated with a given configuration is then given by the phase-space volume defined by the associated K and not with the usual Boltzmann factor.305,306 This method was introduced by Kristóf and Liszi307 for both NpH and μVL variants of the Gibbs ensemble Monte Carlo and applied to the calculation of the thermodynamic properties of pure H2S and other mixtures.308 This approach was recently used to directly obtain Tb for complex metallic systems.309 We will only describe the NpH (isobaric−isenthalpic) variant, because it enables the calculation of Tb when the atmospheric pressure p = 1 bar is imposed. In this pseudoensemble, the total enthalpy is fixed but allowed to redistribute between the two simulation cells such that H1 + H2 = H, which is equivalent to the fluctuations allowed in standard GEMC ensemble where individual cell volumes fluctuate but the total volume is constrained through V1 + V2 = V. In NpH-GEMC simulations, then, the probability p associated with a given state is P


Chemical Reviews

Review

Table 5. List of Publications Available in the Literature in Which MC Simulations Have Been Used To Compute family of molecules

size

FF/type

[Cnmim][BF4] ionic liquid

(C2−C6)

Z-AA

acrylates alcohols alcohols

(C4−C13) (C2−C8) (C2−C8)

TraPPE-UA ch-AUA NERD

alcohols aldehydes amines amines amines, nitro, nitriles, amides, and pyridines

(C2−C8) (C1−C8) (C1−C5) (C1−C8) (C2−C6)

aromatics, alkyl aromatics aromatics, alkyl aromatics benzene, pyridine, pyrimidine, pyrazine, pyridazine, thiophene, furan, pyrrole, thiazole, oxazole, isoxazole, imidazole, and pyrazole carboxylic acids

(C6>) (C6−C14) C5

TraPPE-UA TraPPE-UA ch-AUA OPLS-AA TraPPE-EH, TraPPE-UA AUA/ch-AUA TraPPE-UA TraPPE-EH

carboxylic acids cyclic alkanes esters ethers ethers, glycol ethers ethylene oxide glycols i-alkanes i-alkanes i-alkanes ketones n-alkane n-alkane

(C1−C8) (C5−C8) (C3−C20) (C2−C6) (C2−C6) C2 (C2−C4) (C7−C19) (C2−C8) (C4−C8) (C1−C8) (C2−C8) (C2−C8)

n-alkanes n-alkanes n-alkanes

(C2−C30) (C2−C8) (C2−C12)

OPLS-AA OPLS-UA TraPPE-UA AUA ch-AUA TraPPE-UA ch-AUA TraPPE-UA TraPPE-UA AUA CS-AA TraPPE-UA TraPPE-UA MMFF94 OPLS-AA OPLS-UA AUA CS-AA EP-UA

n-alkanes

(C2−C16)

NERD

n-alkanes

(C10−C60)

NERD

n-alkanes n-alkanes n-alkanes

(C2−C8) (C10−C100) (C2−C12)

n-alkanes n-alkanes and perfluoroalkanes

(C10−C100) (C1−C14) (C1−C8) (C2−C8)

SKS SKS TraPPE-EH TraPPE-UA TraPPE-UA Mie-UA

n-alkenes nitriles olefins perfluoro alkenes polyaromatics and naphtenoaromatics pyridines, dipyridines sulfides and thiols sulfides, disulfides thiols thiophene thiophenes n-alkanes i-alkanes cyclic alkanes olefins aromatics, alkyl aromatics polyaromatics and naphtenoaromatics

(C2−C4)

(C2−C8) (C2−C3) (C10−C16) (C4−C5) (C2−C4) (C1−C5) (C1−C8) (C41−C8) (C4−C9) (C2−C30) (C7−C19) (C5−C8) (C2−C8) (C6>) (C10−C16) Q

TraPPE-UA OPLS-AA AUA HFO-AA AUA/ch-AUA OPLS-AA ch-AUA TraPPE-UA TraPPE-UA TraPPE-UA ch-AUA AUA AUA AUA AUA AUA/ch-AUA AUA/ch-AUA

Tba

method

MAD (%)

GEMCc/ TI GEMC/TI GEMC/TI GEMC/ TI* GEMC/TI GEMC/TI GEMC/TI GEMC/TI GEMC/TI

1019−1052 Kd

A General Guidebook for the Theoretical Prediction of

Recommend Documents