Optimization of Transferable Site–Site Potentials Using a Combination

Mar 8, 2012 - The AUA approach employs roughly three parameters per site type, .... The optimization problem with one site would be convex and have ...
0 downloads 0 Views 979KB Size
Article pubs.acs.org/IECR

Optimization of Transferable Site−Site Potentials Using a Combination of Stochastic and Gradient Search Algorithms Sinan Ucyigitler and Mehmet C. Camurdan Chemical Engineering Department, Bogazici University, Bebek, Istanbul Turkey

J. Richard Elliott* Chemical and Biomolecular Engineering Department, University of Akron, Akron, Ohio 44325-3906, United States S Supporting Information *

ABSTRACT: Discontinuous molecular dynamics (DMD) simulation and thermodynamic perturbation theory (TPT) have been used to study thermodynamic properties for organic compounds. The aim is to infer transferable intermolecular potential models based on correlating the vapor pressure and liquid density. The combination of DMD/TPT generates a straightforward global optimization problem for the attractive potential, instead of facing an iterative optimization−simulation type problem. This global optimization problem is then formulated as a black-box optimization problem and solved using a combination of random recursive search (RRS) and Levenberg−Marquardt (LM) optimization. RRS is suitable for black-box optimization problems since its algorithm is robust to the effect of random noises in the objective function and is advantageous in optimizing the objective function with negligible parameters. LM is efficient local to an optimum with a smooth response surface. The local response surface is shown to be smooth but very flat along valleys with a high degree of coupling between the potential parameters. The algorithm is demonstrated with discretized versions of the Lennard-Jones (LJ) potential and a linear step potential using a database of 231 hydrocarbons, alcohols, aldehydes, amines, amides, esters, ethers, ketones, phenols, sulfides, and thiols. A correspondence is established between the discretized LJ potential and the TraPPE model, demonstrating the manner of improving density estimates and a way of expediting improvement of continuous transferable potentials.

1. INTRODUCTION Estimating physical properties such as vapor pressure and liquid density directly from molecular interaction potentials necessitates molecular simulation for each compound under consideration. Instead of performing a traditional molecular dynamics (MD) simulation for the full potential, Cui and Elliott1 showed that discontinuous molecular dynamics (DMD) can be used to simulate the repulsive part of the potential and the attractive part can be incorporated as perturbations. In that work it was concluded that second order thermodynamic perturbation theory (TPT) is sufficient to provide quantitative accuracy for the effect of the attractive potential on the physical properties. We have since performed several related studies expanding the database and refining the methodology.2−4 Owing to the DMD/TPT formalism, leverage is obtained between simulation and optimization that vastly reduces the number of simulations required for optimization. Vapor pressure prediction using only the molecular structure has been a difficult challenge. Varying by 3 orders of magnitude over a relatively short range of temperature, it is very sensitive to details of the potential model. One alternative has been through group contributions, but results have been mixed. A typical example is the work of Fredenslund and co-workers based on UNIFAC groups.5,6 The UNIFAC effort, although claiming an accuracy of ∼3% for the training set, was restricted to pressures below 0.3 MPa, and the method was not validated beyond the training set. Recently, Asher et al. adapted the UNIFAC model as the basis for a customized optimization © 2012 American Chemical Society

targeted at 76 compounds with the temperature range restricted to 290−320 K.7 Despite this customization of data set and conditions, the validation set indicated deviations averaging near 300%. Bureau et al. performed a similar evaluation for seven esters and found deviations averaging near 80%.8 Other alternatives implement the ConstantinouGani9 or Joback10 methods. Emami et al. have evaluated these group contribution approaches and more, finding that the best results with group contribution methods provide roughly 30% accuracy in their predictions of vapor pressure and 4% in liquid density.11,12 Group contribution methods provide a basis for comparison that more sophisticated methods should surpass. Examples of predictions based on molecular simulation are the works of Siepmann and co-workers,13,14 Fuchs et al.,15 and Unlu et al.2 Siepmann and co-workers refer to their model as the TraPPE-UA (transferable potentials for phase equilibria united atom) method and report average deviations in vapor pressure near 40%. Fuchs et al. apply a potential model known as the AUA (anisotropic united atom) approach and report average deviations closer to 20%. The AUA approach employs roughly three parameters per site type, whereas the TraPPE method has only two. The method of Unlu et al. is referred to as the SPEADMD (step potentials for equilibria and Received: Revised: Accepted: Published: 6219

June 7, 2011 February 4, 2012 March 8, 2012 March 8, 2012 dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

2. PROBLEM FORMULATION There are significant advantages in recognizing the accuracy of TPT for discontinuous potentials. These stem largely from noting that all attractive interactions are negligible for the reference fluids. This means that the exact shape of the attractive potential is not important during the computationally intensive simulation phase. The step depths represent perturbations that are specified during the TPT application, so many candidate potentials can be evaluated in rapid succession after the simulation is over. Therefore, the precise characterization of the attractive potentials can be optimized to a high degree of precision in a short amount of time. Barker and Henderson18 observed that TPT for multistep potentials could be formulated such that the second order term was characterized rigorously from purely repulsive simulations. The essential equations are given as follows:

discontinuous molecular dynamics) model, also applying three parameters per site type and reporting average errors for straight chain hydrocarbons, ethers, and alkanes of less than 10%. All these methods apply united atom (UA) models. UA models subsume the hydrogens as part of their parent atoms in moieties such as CH3, CH2, and OH. Explicit atom models introduce more decision variables and can reduce average deviations. For example, the TraPPE-EH model achieves roughly 15% deviation for n-alkanes.16 On the other hand, explicit atom models are more computationally intensive and more complex for optimization. Therefore, they are not considered as part of the present work. This diversity in underlying potential models suggests a need to evaluate multiple perspectives to determine which perspective offers the greatest advantage, if any. In this spirit, we previously reported results for a wide range of potential models, without describing the optimization methodology in detail.17 The present work is primarily focused on describing that methodology. Many alternative methods have been applied in developing the commonly applied potential models. Most of these have focused on liquid density and cohesive energy density over a fairly narrow range of temperatures. To formulate a basis for broad application to the estimation of engineering physical properties, we have included data for a broad range of conditions and compounds. Vapor pressure is one of the most important properties for characterizing phase equilibrium, and liquid density is important in estimating many other properties that are not explicitly treated here, viscosity for example. Ideally, it should be feasible to construct a discretized version of any continuous potential. In this way, the benefits of DMD/ TPT could be applied when globally optimizing the potential parameters. Then the continuous potential could be applied as desired. The primary obstacle to this approach is identifying the effective diameter and step shape to characterize the softness of continuous potential models. This represents a final refinement to the normal procedure that merits exploration. In this work, we detail and demonstrate a methodology for efficiently and continuously improving transferable intermolecular potential models. For this purpose, a database is constructed of hydrocarbons and oxygenated compounds, namely n-alkanes, branched alkanes, alkenes, aromatics, naphthenics, alcohols, phenols, aldehydes, and acetates. This database is used to obtain a model to describe the attractive potential for 26 site types (united atoms) from which 231 compounds are built. The algorithm for optimization is based on both a stochastic search algorithm, namely random recursive search (RRS), and a gradient-based local search algorithm, namely Levenberg−Marquardt (LM). Using this hybrid method, the attractive potential is fitted to two models that differ in the number of parameters to describe the potential. For illustration, the parameters for Lennard-Jones (LJ) and Linear-2580 models (described in section 2) are optimized. For the LJ model, the root mean squared vapor pressure error (% RMSE) is around 20%, and for the 2580 model, the average error is roughly 13%. To demonstrate the methodology for building a correspondence between a continuous model and a step potential, a discretized version of the TraPPE model is developed using a reduced set of compounds and site types. Finally, we conclude with several general observations about limitations of the current methodology and means of making further progress.

A A A − Aig = aref + 1 + 22 + ... NkBT T T

(1)

where A is the Helmholtz energy, the superscript “ig” signifies an ideal gas, T is temperature (K), N is the number of molecules, and aref is the Helmholtz departure of the reference fluid given by a ref =

A1 =

Aref − Aig = NkBT

∫0

η

Z ref − 1 dη η

1 ∑ ∑ ∑ ⟨Nijm⟩uijm NkB i j m

A2 = −

(2)

(3)

1 ∑ ∑ ∑ ∑ ∑ ∑ (⟨NijmNkln⟩ 2NkB 2 i j m k l n

− ⟨Nijm⟩⟨Nkln⟩)uijmukln

(4)

kB is Boltzmann’s constant. In eqs 3 and 4 uijm designates the attractive energy in the mth well between the ith and jth site types. Nijm in eqs 1−3 is the number of pairs of interactions obtained from the reference fluid simulation and N is the number of molecules. The ⟨·⟩ term denotes an ensemble average of the reference fluid. In eq 2, Zref is the compressibility factor which is derived directly from simulations of the reference fluid over a range of packing fractions, η, and interpolated by regressing Zi with i = 1, 2, 3. The necessary regression equation is given by Z ref = (1 + Z1η + Z 2η2 + Z3η3)/(1 − η)3

(5)

A1 and A2 in eqs 3 and 4 are evaluated from reference fluid simulations over a range of densities and then interpolated from state to state by Pade approximants in η. For associating compounds there is an additional contribution, Aassoc, given by Wertheim’s theory. DMD simulation tabulates the values for aref and number of interactions for all sites in all distances at each density. Inserting the energy values into eqs 1−4, one obtains the Helmholtz energy. Cui and Elliott demonstrated that the results from second order TPT are quantitatively consistent with the full potential simulation. Equations 2−4 define a complete equation of state at all temperatures and densities. Well-known derivative properties result in the internal energy, compressibility factor, and fugacity. The vapor pressure and saturated liquid density are computed through the isofugacity criteria. In this way, time-consuming 6220

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

simulation of trial values for the attractive potential is replaced by a straightforward algebraic function evaluation. More typical approaches to optimizing force fields require considerably more computationally intensive functional evaluations. The method of Ungerer et al. requires 12 simulations of the full potential to evaluate one state point and one equilibrium point for each compound, even when the initial parameters were close to the optimum.19 The method of Huelsmann et al. includes methods like those applied in the present work, but for a much narrower range of conditions.20 For example, a typical optimization based on a single state point of a single compound (without equilibrium) would require roughly70 molecular simulations of the full potential. Molecular simulations of a new trial potential model can require hours or days with these two approaches, especially for a molecule like triacontane, compared to seconds with the DMD/TPT methodology. It was demonstrated that a single square well with a width of λ = 1.5 was inaccurate for correlating the vapor pressure of nalkanes in developing the transferable potential function,21 Cui and Elliott showed that a single well with λ ≈ 1.9σ was much better, yet could not fit all the data. Martin and Siepmann have shown that the Lennard-Jones potential exhibits transferability that is more favorable.13 Chapela has shown that a discrete LJ potential can provide properties that are similar to those of the continuous potential for spheres.22 We seek transferable potentials in the sense that the same potentials are used for the CH3 and CH2 groups in ethane, butane, octane, etc. At a given temperature, vapor pressure and saturated liquid density for a compound are calculated using the interaction energies between sites. These energies are defined for each site individually, and the strength of the interaction varies with distance between interacting sites. The attractive part of the potential is thus divided into 11 subintervals which are referred to as wells with r/σ ratios of 1.0−3.0, at 1.10, 1.15, 1.20, 1.30, 1.40, 1.50, 1.70, 1.80, 2.00, 2.40, and 3.00. The potential at r/σ > 3.0 is assumed to be zero. The well depths correspond to the strength of interactions between sites. The absolute value of these depths is a monotonic function of the distance between interacting sites, except when developing a soft equivalent to the LJ potential, for which a single “shoulder” step suffices. 2.1. Models. Many models for representing the attractive potentials are conceivable. We consider two models: (1) step Lennard-Jones; (2) Linear-2580. A detailed analysis of a wider range of potential models was reported previously.17 This work focuses on two models for illustrative purposes. 2.1.1. Lennard-Jones Models. Two versions of the LJ model have been examined in detail. We refer to these as the LJ11i and SLJ11 models. In the LJ11i model, the first three well depths are set equal to the attractive parameter, ε, and the other eight wells are set to the “inside” of the LJ function. ⎛⎛ σ ⎞12 ⎛ σ ⎞6 ⎞ y = 4ε⎜⎜ ⎟ − ⎜ ⎟ ⎟ ⎝r⎠ ⎠ ⎝⎝ r ⎠

Figure 1. Graphical representation of LJ11 model.

ui α = (1 − α)ui in + αui out

(7)

These alternative cases are also illustrated in Figure 1 as the LJ11o and SLJ11 cases. In the SLJ11 case, the illustration shows that α > 1 is a possibility, although it may be counterintuitive at first glance. The SLJ11 case also includes the final added feature of a “shoulder” being applied in the first well. The depth of the shoulder well provides an additional optimization parameter, ε1. For example, the value of ε1/ε is 0.5 in Figure 1. In one sense, the SLJ11 model has four optimization parameters: σ, ε, α, and ε1. On the other hand, the LJ model has only two parameters, σ and ε. Furthermore, the value of σ is only weakly coupled to the value of ε because liquid density deviations are much more sensitive to σ than to ε. The identification of a single parameter per site type (e.g., ε) increases the efficiency of the optimization and facilitates the determination of whether two site types are distinguishable in the way that the CH3 in n-alkanes is distinguishable from the CH3 in branched alkanes. We can recover these advantages by treating α and ε1/ε as global parameters. In other words, we can apply the following iterative procedure: (1) Assume {σ} and perform all molecular simulations. (2) Assume values of α and ε1/ε and then globally optimize {ε} values. (3) Repeat step 2, systematically varying the values of α and ε1/ε to minimize vapor pressure deviations. (4) Repeat steps 1−3 to minimize density deviations. The manner of conducting step 2 approximates a steepest descent method, as illustrated under Results and Discussion. 2.1.2. Linear-2580 Model. In this model the attractive potential is discretized into four wells from r/σ of 1.0 to 2.0. The potential energy is assumed to be zero for r > 2.0. The Linear-2580 model (Figure 2) has two parameters per site type, ε1 and ε4. The four wells are located from r/σ of 1.0 to 1.2, 1.2 to 1.5, 1.5 to 1.8, and 1.8 to 2.0. The first well depth is equal to the first parameter ε1; the last well depth is equal to the second parameter ε4. The second and third well depths are equal to ε2 and ε3, respectively, and ε2 and ε3 are obtained by a linear interpolation between ε1 and ε4.

(6)

A graphical representation of the LJ11i model is given in Figure 1. Upon reflection, one might observe that the choice to set the step depths to the inside of the LJ function is arbitrary. One might choose to set the step depths according to the “outside,” or to interpolate between these two cases. The formula for interpolating between these definitions uses an interpolation parameter α as given by 6221

ε2 = ε4 +

2 (ε1 − ε4) 3

(8)

ε3 = ε4 +

1 (ε1 − ε4) 3

(9)

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

min pSSR with respect to uijm ∈ [0, 400] exp where pSSR = f (PCcalc , T , PC , T )

PCcalc , T = f (uijm)

2.3. Summary Statistics for Comparing Models. It is convenient to define several quantities that summarize the results of the optimization. The percent absolute average deviation (%AAD) is commonly quoted to describe the quality of fit. %#AAD =

#icalc − #iexp #iexp

∑ i

(13)

where “#” is the property of interest (P or ρ). The %bias characterizes the skewness of the fit. Figure 2. Linear-2580 model.

%# bias =

⎛ #calc − #exp ⎞ i i ⎟⎟ exp # ⎠ ⎝ i

∑ ⎜⎜ i

Normally, the bias should be near zero, but it may deviate from zero if a few compounds with poor fits tend to dominate the pSSR. The %bias may also deviate from zero in cases where further optimization is indicated. For example, using site diameters optimized for the 2580 model may not provide optimal density characterization for the SLJ model. In that case, the %bias might approach the %AAD, indicating that the diameters were too small in every case. The maximum deviation (%max) is simply the maximum value in any given set. The maximum deviation provides a measure of the worst-case analysis. In this work, RMSE refers only to the root mean squared error in pressure, because it is the primary property of interest and dominates the response surface.

2.2. Optimization Problem. In eqs 3 and 4, the interaction strength between two sites i and j at a certain distance, i.e., at the mth well is defined as uijm =

uiimujjm

(10)

Through the derivative of the Helmholtz energy, the pressure at a specified temperature can be considered as a function of interaction energies between sites. Based on the observation that the potentials are transferable and that they can be incorporated into the system through TPT, it is necessary to determine the potentials that would minimize the deviation from the experimental data. The optimization problem with one site would be convex and have unique minima; however, when more than one site is involved, the bilinear interaction term renders the problem nonconvex and hence gives rise to a global optimization problem. The objective of the optimization problem is to minimize the sum of squared residuals in vapor pressure and liquid density (pSSR + ρSSR). The pSSR dominates the response surface, however, because it varies exponentially with ε whereas the density is most sensitive to the site diameters, which are not varied during this optimization. pSSR

⎛ P calc − P exp ⎞2 C ,T C ,T ⎟ = ∑ ∑ ⎜⎜ exp ⎟ P C ,T ⎠ C T ⎝

(11)

ρSSR

⎛ ρ calc − ρ exp ⎞2 C ,T C ,T ⎟ = ∑ ∑ ⎜⎜ exp ⎟ ρ C ,T ⎠ C T ⎝

(12)

(14)

pSSR

RMSE =

∑C ∑T 1

(15)

When parameters are optimized, a measure of their sensitivity and significance is provided by the standard error in the estimated parameter. For example, it might be suggested that the CH3 site in methylphenols is distinct from the CH3 site in methylbenzenes. To test this hypothesis, optimization could be performed assuming that they are distinct, examining the values of the parameters and their standard errors. If these ranges overlap, then the null hypothesis is that these sites are not distinct. The formula for estimating the standard error is given by the trace of the Δ matrix, described by Ralston and Jennrich as23 Δi =

calc where Pcalc C,T is the calculated vapor pressure and ρC,T is the calc calc calculated liquid density. PC,T and ρC,T are dependent on the well depth of the ith site in the jth well, Pexp C,T is the experimental data for vapor pressure, and subscripts C and T denote the number of compounds and the number of temperature points, respectively. The decision variables in the optimization problem are the well depths in the potential model. For a given set of site diameters, the objective function is most sensitive to pSSR, but small variations in the decision variables may help to reduce ρSSR slightly, motivating its inclusion. Hence most discussion focuses on pSSR. In short, the objective function and the decision variables can be summarized as

Δi , i 2

Δ2 = pSSR (J′·J)−1

(16) (17)

where Jij = ∂Fi/∂θj is the Jacobian of the deviation vector, F, with respect to the parameter vector, θ.

3. OPTIMIZATION APPROACH There are several optimization algorithms developed to address chemical engineering optimization problems that can be classified as deterministic or stochastic. Deterministic algorithms such as Levenberg−Marquardt can only guarantee local optimality. Stochastic methods such as simulated annealing can achieve global optimality, so they can be applied to many 6222

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

simple and widely used search technique. It takes random samples from a uniform distribution over the variable space. Despite its simplicity, random sampling is able to provide a strong probabilistic convergence guarantee. Furthermore, random sampling has proved to be surprisingly more efficient than deterministic exploration methods, such as grid covering, in terms of some probabilistic criteria, and it is especially so for high-dimensional problems.31 As the number of function evaluations increases, the region bracketing the true optimum diminishes within the r-percentile. However, this decrease levels off with increasing number of function evaluations. This means that random sampling is efficient initially, but loses efficiency as the number of function evaluations increases. The disadvantage of random sampling is its apparent lack of efficiency in the later stages. The basic idea of RRS is to keep the initial efficiency of random sampling by restarting it before its efficiency becomes low. The restart of the random sampling is accomplished by changing its sample space. Basically, random sampling is performed for a number of times, and then the sample space is moved or resized according to the previous samples and another random sampling is restarted in the new sample space. At the beginning of the search, RRS performs sampling from the entire variable space and thus examines the overall structure of the objective function. With the search continuing and the sample space gradually shrinking, the search algorithm obtains more detailed information about the objective function until it finally converges to an optimum. A stochastic search algorithm usually comprises two elements: exploration and exploitation. Exploration aims to identify promising areas in the parameter space, while exploitation attempts to exploit local information to quickly improve the solution. Basically, the RRS algorithm uses random sampling for exploration and recursive random sampling for exploitation. In the exploration part, RRS determines promising areas and executes the exploitation procedure only in these areas. Restart of the random sampling is performed through resizing or moving of the sample space. This resizing is performed only if a better result than that in the last sampling could not be found and this procedure is basically “shrinkage” of the space around the best set of variables by a shrinkage ratio. If a better result is found, then the variable space is realigned such that the best set of variables is in the center of the new space with the same size. With realignment and shrinkage alternately performed, the sample space gradually moves to the local optimum. This local optimum is probably the global optimum if multiple explorations and exploitations converge to the same solution. 3.3.1. Exploration Phase. Within the space assigned for variables, a search point is generated using uniformly distributed random numbers in the interval [0, 1] in accordance with the RANDOM() function of FORTRAN. No attempt was made to improve the random number generation. Then the trial point xij at which the interaction energies are obtained is determined from the equation:

optimization problems regardless of convexity. On the other hand, they require hundreds of thousands of function evaluations. In the present case, one function evaluation implies evaluating roughly 5000 saturated liquid densities and vapor pressures for an assumed set of intermolecular potential functions, making it extremely expensive. As a compromise, we consider stochastic methods that may not guarantee global optimality, but reduce function evaluations while capturing most of the benefits of stochastic optimization. Fortunately, with appropriately selected parameters, these intermediate methods have a high probability of locating the globally optimal solution.24 Among these stochastic methods are recursive random search (RRS),25 tabu search (TS), genetic algortihm (GA),26 and simulated annealing (SA).27 SA and GA are the most widely used algorithms since they require little a priori information from the concerned problem and are generally applicable. The disadvantage of these algorithms is that they lack efficiency. In general, these are combined with local search methods to increase their efficiency. TS is a heuristic approach for solving optimization problems by using a guided, local search procedure to explore the entire solution space without becoming easily trapped in a local optimum. It differs from other stochastic optimization techniques by maintaining lists of previous solutions that help guide the search process. RRS is based on the initially high efficiency of random sampling. Besides this high efficiency, the RRS algorithm is also robust to random noise and trivial parameters in the objective function. Stochastic algorithms tend to diminish in efficiency during the final stages of optimization, especially in comparison to gradient-based methods when the response surface is smooth. Local search algorithms are typically gradient based, such as the Levenberg−Marquardt (LM)28 and quasi-Newton methods.29 In this work three optimization algorithms are considered. These are RRS as a stochastic method and LM and quasiNewton as local search algorithms. 3.1. Levenberg−Marquardt Algorithm. A general class of algorithms using gradients can be expressed as xk + 1 = xk − αMk −1∇f (xk)

(18)

where Mk is an n × n matrix, α is a positive search parameter, and ∇f(xk) is the gradient at xk. When Mk is equal to identity, I, the algorithm is equivalent to steepest descent, and when Mk is the Hessian, H, it is equivalent to Newton’s algorithm. Levenberg−Marquardt is a compromise between these two and is obtained by taking Mk = (H + λkI), where λk is a positive number which makes the M k positive definite.28 The Levenberg−Marquardt algorithm is implemented in MINPACK, using internal scaling of the variables.30 A trust region is applied and refined in accordance with the Marquardt parameter and ratio of improvement in the objective function relative to the initial guess. 3.2. Quasi-Newton Method. A quasi-Newton method was briefly considered during the initial stages of development. It was the quasi-Newton method available in the IMSL package included with the Fortran77 compiler and implemented the Broyden−Fletcher−Goldfarb−Shanno (BFGS) formula. These initial evaluations did not demonstrate a significant reduction in the number of function evaluations, while added robustness was expected for the LM method. Therefore, the quasi-Newton method was not evaluated further. 3.3. Recursive Random Search. In this work, RRS is used as the randomized sampling algorithm. Random sampling is a

xij = LBij + (UBij − LBij) ·rand

(19)

where “rand” stands for a uniformly distributed random number; LBij is the lower bound and UBij is the upper bound for the jth parameter of the ith site. With the trial set of interaction energies obtained at randomly selected points, saturation points are calculated using the necessary thermodynamic relations. This procedure is 6223

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

Table 1. Optimization of Nine Site Types for 29 Compounds with Various Settings switch RMSE

# explore

# exploit

RRS RMSE

RRS FunCalls

LM RMSE

50 40 50 50 50 60 50

150 180 180 180 180 180 180

90 60 60 70 80 80 90

67.7 39.2 49.5 51.3 49 57 48.7

1505 1062 801 1461 982 787 917

no conv 33.2 31.8 no conv 11.7 56.7 11.7

LM FunCalls

total FunCalls 1505 1114 873 1461 1044 798 1061

52 72 62 11 144

Table 2. Optimization with Various Settings Using a Reduced Data Set of 27 Compounds and Seven Site Types n explore

n exploit

LM switch

RRS RMSE

RRS FunCalls

LM RMSE

LM FunCalls

total FunCalls

200 280 300 300 300 350 350 350 350 350 350 400

100 120 110 120 130 20 40 60 80 100 120 20

50 50 50 50 50 50 50 50 50 50 50 50

66.7 48.5 49.8 47.4 48.6 56.6 49.2 43.9 45.6 44.9 46.1 66.8

1863 997 1149 1200 1189 680 579 633 758 809 811 755

66.6 37.9 39.4 11.7 11.7 no conv 11.7 11.7 11.7 11.7 11.7 no conv

9 60 57 58 50

1872 1057 1206 1258 1239 680 629 674 800 859 861 755

final optimum, however, even as the search region was extremely narrowed at great computational expense. By returning to a gradient-based local search, and tightening the convergence tolerance on everything from the density and vapor pressure computations to the overall ρSSR and pSSR, we found that the local search could converge to a unique optimum. A detailed investigation of the response surface for a limited database revealed the nature of the problem. As shown in Figure 3, the response surface is very smooth but very flat in the vicinity of the optimum. Figure 3 shows the response of RMSE for vapor pressure as a function of εCH3 and εCH2 assuming the SLJ11 model. The optimal value of RMSE = 29.8 is in the vicinity of εCH3 = 125.7, εCH2 = 96.5. These vary

repeated R times and in each repetition S subiterations are carried out. The optimum function value and the decision variable vector are stored and passed to the exploitation phase. 3.3.2. Exploitation Phase. The initial space of search in the exploration phase is resized by a shrinkage factor and relocated around the optimum decision variable point obtained in the exploration phase in the following manner. LBijnew = xij − (UBij − LBij)

k 2

50 41 42 50 50

(20)

k (21) 2 where k is the shrinkage ratio between 0 and 1 and usually taken around 0.6; xij is the best point obtained from the previous iteration. After obtaining the new search space, R iterations with S subiterations in each are carried out and the optimum function value is compared to that of the previous one. If this newly obtained function value is smaller than the old one, then the space of search is relocated around this new point. If this is not the case, the space is shrunk around the old point by a factor k. These alternating shrinkage and realignment procedures are performed until a termination criterion for a single exploitation phase is satisfied. Our termination criterion for single exploitation was when two consecutive shrinkages failed to improve the estimated optimum. The maximum allowed function calls for exploitation was varied as listed in Tables 1 and 2. 3.4. Necessity of a Gradient-Based Local Optimization. Stochastic optimization is indicated in the presence of a rough response surface, but gradient-based optimization should be applied when the surface is smooth. In our early work, we found that slightly different optimal parameters were obtained when different initial guesses were applied. This suggested a rough response surface with many local optima. Further work with RRS indicated a similar variability in convergence to the UBijnew = xij + (UBij − LBij)

Figure 3. Response surface of RMSE of vapor pressure vs εCH3 and εCH2 for n-alkanes. 6224

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

molecules in which they occur. Table 5 lists the values of the well depths characterizing each site type. Table 6 summarizes the deviations for each family of compounds. Two observations are immediately apparent from Table 6, pertaining to the SLJ11 model. First, the deviations in vapor pressure and liquid density are substantially larger for the SLJ11 model. Second, the bias in liquid density deviations makes it apparent that the diameters selected for the 2580 potential are inappropriate for the SLJ11 model. These observations suggest that a careful comparison of the potential models must include optimization of the diameters and liquid density as well as vapor pressure. In section 4.1, we discuss observations about the optimization algorithm and the rationale for the recommended procedure. In section 4.2, results of the optimization are presented while applying the diameters developed for the 2580 model to both the 2580 and SLJ11 models. This leads to a systematic procedure for reducing the model to its optimal number of site types. In section 4.3, we illustrate how the present methodology permits fairly efficient optimization of alternative site diameters while retaining a close connection with a continuous potential model, developing a step equivalent of the TraPPE model as the initial basis. 4.1. Recommended Procedure for Force-Field Optimization. We recommend a two-stage optimization that combines a stochastic search when parameters are poorly anticipated and a gradient search when reasonable guesses for the parameters are known. In particular, we illustrate the combination of RRS with LM. The problem with using LM at the outset is that poor initialization results in poor convergence of the vapor pressure iteration. When vapor pressure iteration does not converge, a penalty value of 10000% is assigned as the deviation for that data point. With a poor initialization, and a nearby search to determine the gradient, the same points fail to converge at every function evaluation and the objective function may not change at all. This would mean that there is no gradient, or if there is one, it is primarily a result of happenstance. Our experience has shown that the objective function rarely improves when this approach is used at the outset. When we apply a stochastic method, however, the search over the entire bounded range quickly identifies site parameters that promote convergence of the vapor pressure iteration. This results in a set of parameters that make estimation of the gradient feasible. Then LM can be applied for the final refinement. Figure 4a illustrates an iteration history using a data set of 27 hydrocarbons with seven site types. For this illustration, α, ε1/ε, and {σ} have been previously optimized as discussed in section 4.3. In the initial stage, exploration and exploitation are combined to achieve a relatively rapid improvement in the objective function. When an acceptable RMSE is achieved, the method switches to LM. This setting results in 192 function evaluations for the initial search, illustrated by squares in Figure 4a for each function evaluation. The reduced search space is then restarted with a new center around the best result so far, but with the search space reduced to 30% for each decision variable. This means that regions outside the original search space may become feasible if the best results tended to lie near the edge of the original search space. The exploration and exploitation is conducted again, and this cycle is repeated until RMSE < 50%. The choice of 50% was somewhat arbitrary but was motivated by experience. For example, since deviations with the optimized TraPPE-UA model are reportedly near 30%, going much below

slightly from the values optimized for all hydrocarbons because the database has been restricted to n-alkanes. This variance is one small additional indicator of the variability of the precise optimum in the face of any change in the problem definition. Figure 3 also shows that the RMSE increases sharply when both parameters are increased simultaneously, but slowly when one parameter is increased and the other is decreased. Although precise values of εCH3, εCH2, and RMSE would vary for other potential models, the qualitative features illustrated in Figure 3 should be encountered generally.

4. RESULTS AND DISCUSSION It is necessary to organize coverage systematically when a substantial database is implemented. Ideally, all site types could be optimized simultaneously, but that would create a very large optimization problem. Several site types are fundamental, in that they occur frequently in many compounds. For example, CH3 and CH2 site types occur very frequently. In general, the hydrocarbon site types fall in the category of occurring fairly frequently, but not exceeding the number of site types that can be optimized simultaneously. We find that 18 site types are sufficient to characterize the 93 combined hydrocarbons in our database, omitting alkynes. A brief summary of the scope of compounds is given in Table 3. The detailed list of compounds in the database is available online as Supporting Information. Table 3. Summary of Compound Families, Site Types, and Molecular Weights family n-alkanes branched alkanes alkenes aromatics naphthenics combined hydrocarbons alkynes alcohols aldehydes amides amines ethers esters (including acetates) ketones phenols sulfides thiols

no. compds

MW

26 33 16 11 7 93

44−422 72−142 56−112 78−182 84−140 44−422

101, 102, 122, 501, 204,

7 18 11 5 16 9 32

54−82 60−158 73−212 59−129 45−143 60−102 60−228

16 8 9 5

58−142 94−122 62−146 76−132

107, 207, 1101, 1201 203, 303, 1403, 1404 217, 805, 1606 905, 1605, 1903 211, 311, 1901, 1904, 2001 106, 1501 109, 804, 901, 904, 1502, 1509, 1602 117, 206, 903, 1601 505, 1406 1701 1801

site types 201 103, 701, 601, 208,

301, 801, 605, 209,

401 802 104 219

After characterizing the hydrocarbon site types, additional compounds and site types are included while assuming the combined hydrocarbon site types as predetermined. For nonhydrocarbons, the site types listed beside each family were those optimized specifically for that family. In this way, the characterizations could proceed in a systematic fashion while keeping the number of simultaneously optimized parameters to a minimum. Alkynes were not included among the hydrocarbons because the long-term prospect for adding large numbers of alkynes to the database seemed unlikely. Table 4 lists the site types considered in this work along with their numerical identifiers and brief examples of the types of 6225

dx.doi.org/10.1021/ie201186q | Ind. Eng. Chem. Res. 2012, 51, 6219−6231

Industrial & Engineering Chemistry Research

Article

Table 4. Site Type Identifiers ID

diam 2580 (nm)

abbr

101

0.363

CH3a−

102

0.363

CH3b−

103

0.363

CH3c−

104 106

0.363 0.363

CH3d− CH3f−

107

0.363

CH3g−

109 117 122

0.363 0.363 0.363

CH3i− CH3q− CH3v−

201 203 204 206

0.357 0.357 0.357 0.362

>CH2a >CH2o >CH2b >CH2k

207

0.357

>CH2y

208

0.357

>CH2ar

209

0.357

>CH2r

211

0.357

>CH2n

217 219

0.362 0.357

>CH2al >CH2rb

301

0.39

>CH−a

303 311 401

0.39 0.39 0.3425

>CH−d >CH− >C
ACt >ACa

example

ID

diam 2580 (nm)

1° terminus, e.g., end groups of n-octane, cis-2-butene 2° branch, e.g., all CH3’s in isobutane or 2,3-dimethylbutane 3° branch, e.g., neopentane or 2,2dimethylhexane bonded to aromatic ring, e.g., toluene in methyl ether or ester, e.g., methyl butyrate in 2-alkyne, e.g., 2-butyne, 2-pentyne, 2hexyne in an acetate, e.g., hexyl acetate 2-ketone, e.g., acetone and 2-butanone 2-alkene, extra well for cis-bonded CH3, e.g., cis-2-pentene n-alkyl, ether, ester, e.g., n-octane bonded to OH, e.g., 1-propanol, 1-butanol bonded to olefin, e.g., 1-hexene, 2-hexene CH2 next to carbonyl carbon, e.g., 2pentanone CH2 in alkyne, e.g., 2-pentyne, 3-hexyne (for 180° bond angle) bonded to aromatic ring, e.g., ethylbenzene in a nonaromatic ring (see also 2−19 30 27) bonded to nitrogen or amine, e.g., 3aminopentane bonded to aldehyde, e.g., butyraldehyde bonded to a nonaromatic ring, e.g., ethylcyclohexane alkyl branch, e.g., isobutane, 2methylhexane 2° alcohol branch, e.g., 2-butanol bonded to nitrogen 3° alkyl branch, e.g., 2,2-dimethylhexane, 1,1-dimethylcyclohexane aromatic carbon, e.g., benzene

701

0.35

CH2a

α-olefin, e.g., 1-hexene

801

0.35

CH−

alkene, e.g., 3-pentene

802

0.35

CH−b

804 805

0.35 0.35

CH−su CH−F

901

0.35

C