Solvent Design Using a Quantum Mechanical Continuum Solvation

Jan 6, 2006 - Department of Chemical Engineering, Centre for Process Systems Engineering, ..... The problem is therefore solved using the outer approx...
0 downloads 0 Views 182KB Size
1128

Ind. Eng. Chem. Res. 2006, 45, 1128-1140

Solvent Design Using a Quantum Mechanical Continuum Solvation Model T. J. Sheldon, M. Folic´ , and C. S. Adjiman* Department of Chemical Engineering, Centre for Process Systems Engineering, Imperial College London, London SW7 2AZ, U.K.

The design of solvents for solutes typically found in the pharmaceutical and agrochemical industries is considered. These solutes are usually aromatic, with several heteroatoms, and they therefore exhibit complex interactions with solvents. As a result, detailed models are often needed to predict the behavior of solute/ solvent systems. The use of the SM5.42 continuum model of solvation1-4 in solvent design is investigated. This model is based on a quantum mechanical representation of the solute. An optimization-based molecular design problem is formulated with the simple objective of minimizing the free energy of solvation. The solvent properties needed to calculate the free energy of solvation are obtained using group contribution methods. The resulting problem is a nonconvex mixed-integer nonlinear program with mixed-integer algebraic constraints. The outer-approximation algorithm is implemented to solve this optimization problem, using a combination of analytical and numerical gradients. Several case studies are solved, based on HF/6-31G* quantum calculations. Monofunctional and bifunctional aromatic compounds are used as test solutes. The design of the solvent is based on combinations of up to 41 different atom groups. In all cases, the algorithm identifies the best solvent in a small number of iterations. The required CPU time is up to 9 times smaller than that needed to evaluate all possible solvent structures. 1. Introduction Solvents are frequently used in the process industry and are especially common in the pharmaceutical and agrochemical industries5 where they are used to carry out a variety of tasks such as reaction, transport, and separation. In many of these operations, the affinity between solvent and solute plays an important role, determining the feasibility and/or performance of the processing task. The infinite dilution activity coefficient for a solute (1) in a solvent (2), γ∞12, is often used as a measure of this affinity and is therefore a useful tool in solvent selection.6 Several approaches have been proposed for the prediction of γ∞12. The UNIFAC group contribution method, based on the concept of a mixture of groups, can be used in its original form7 or one of its modified forms.8,9 One issue which often arises with complex solutes is that all group interaction parameters may not be available. An approach has recently been proposed to estimate missing parameters when few experimental measurements are available.6,10 Another issue which can prevent the reliable prediction of γ∞12 for complex solutes is the presence of multiple functional groups and of aromatic rings, which can lead to significant proximity effects. The development of a second-order version of UNIFAC, KT-UNIFAC,11 extends the applicability of the approach by including connectivity information and correcting for proximity effects. Nevertheless, the full connectivity of the solute, which is always known during solvent design, is not used as an input for the predictions, and this loss of information may result in poor predictions for complex systems. Another class of methods that has recently been developed is based on computational chemistry. The COSMO-RS approach12 uses separate quantum mechanical calculations for solute and solvent. These calculations yield molecular surface descriptors for the individual molecules. These are then used in a simple procedure to predict the phase behavior of the * To whom correspondence should be addressed. E-mail: [email protected]. Tel.: +44 (0)20 7594 6638. Fax: +44 (0)20 794 6606.

solvent/solute mixture. This approach allows solvent screening by using a library of molecular descriptors for a wide range of solvents. It has also been shown13-15 that the free energies of solvation calculated using a quantum mechanical continuum model of solvation16,17 can be related to infinite dilution activity coefficients. The free energy of solvation, defined as the difference between the free energy of a solute in the gas phase and its free energy in solution, is, thus, a useful tool for solvent selection. One of the main benefits of continuum models of solvation is that the solute molecule is represented in detail and to a high degree of accuracy. The connectivity of the solute is an input to the model, and proximity effects are, therefore, well accounted for. As a result, complex organic molecules can readily be studied. The solvent, which is typically a small molecule, is represented implicitly via a parameterization based on a few bulk properties. Despite the wide applicability of such an approach, the computational cost of each free energy of solvation calculation is high and it is not yet clear how these techniques can be incorporated in the larger, process-centered, solvent design problem, where several criteria are typically taken into account.18-26 A first step in this direction has been taken with the work of Lehmann and Maranas,27 who have considered the use of quantum mechanics (QM) to design refrigerants and solvents. The use of computationally demanding property prediction techniques for molecular design has also been discussed by Harper and Gani,28 who proposed a multilevel generate-and-test approach in which only molecules which pass inexpensive property tests are then screened with more expensive tests. In recent years, optimization-based molecular design techniques have emerged as a promising framework for solvent design (see ref 29 for an overview), in which it seems possible to handle expensive computations. One characteristic of interest in this context is that convergence of the optimization algorithm to the optimal solvent(s) typically occurs without testing all possible solvents in the design space. Another characteristic is that these approaches allow several design criteria to be taken into account simultaneously, in the form of constraints on the

10.1021/ie050416r CCC: $33.50 © 2006 American Chemical Society Published on Web 01/06/2006

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1129

allowable range of values of some solvent properties, such as the boiling point, and in the form of objectives, such as economic (e.g., ref 30) and/or environmental (e.g., ref 31) performance measures. There has been some work on using computationally demanding models for performance evaluation in optimization-based approaches (e.g., ref 30). However, this has focused mainly on the use of detailed process models, and the treatment of physical properties has generally been limited to group contribution and connectivity index methods, with the notable exception of the work of Lehmann and Maranas27 in which the feasibility of using the model of Lin and Sandler13 was investigated. They based their approach on a stochastic optimization algorithm (genetic algorithm) tuned by using group contribution methods and showed that it is indeed feasible to identify optimal solvents within a search space of approximately 50 molecules. The quantum mechanical calculations were treated as a black box. In the present work, the integration of a quantum mechanical model of solvation within an optimization-based solvent design framework is investigated further. A deterministic optimization approach is followed with the aim to design solvents using a large search space of several thousands of molecules. Furthermore, a different continuum solvation model from that in ref 27 is used to evaluate solvent properties. While this model has not been extended to calculate activity coefficients, it has been developed over a long period of time and is applicable to a very large set of solutes and solvents. The formulation of the solvent design problem is first presented, including an overview of the solvation model selected for this work. The solution algorithm is then introduced and is applied to several examples. The performance of the proposed approach is discussed. 2. Formulation of the Solvent Design Problem A basic design problem is formulated in order to investigate whether it is feasible to use a continuum solvation model within a solvent design scheme. Given a solute, the goal is to find the molecular structure of the solvent that minimizes the standard free energy of solvation of the solute, ∆G°s. Upper and lower bounds are imposed on the solvent melting point and boiling point. 2.1. Overall Problem Formulation. The solvent design space is defined by a set G of atom groups from which the solvent can be built. The solvent design problem is formulated as a mixed-integer nonlinear program (MINLP):

min ∆G°s(p)

∆G°s(p) ) min ∆G°(r,p) - min ∆G°g(r) r

r

∆G°s(p) ) ∆G°(r*,p) - ∆G°g(r*)

gsp(p,n,y) e 0 hsf(n,y) ) 0 gsf(n,y) e 0 gp(p) e 0 gmc(n) e 0 p ∈ [pL, pU] ⊂ Rl n ∈ [nL, nU] ⊂ Rm (1)

where p is an l-dimensional vector of continuous variables

(3)

The assumption of fixed geometry in the solvent can be readily lifted by noting that ∆G°g is solvent-independent and by considering the following objective: p,r,n,y

s.t. hsp(p,n,y) ) 0

(2)

where r is a vector representing the positions of the nuclei in the solute, ∆G°(r,p) is the standard free energy of the solute with geometry r in a solvent with properties p, and ∆G°g(r) is the standard free energy of the solute with geometry r in the gas phase. This latter term is independent of the solvent and, therefore, only needs to be evaluated once for any given solute. It is assumed here that the geometry of the solute does not vary when transferring the solute from the gas phase to the liquid phase. The nuclei positions are thus fixed at r* ) arg minr ∆G°g(r). The objective function then becomes

min ∆G°s(p,r) ) min ∆G°(r,p) - ∆G°g(r*)

p,n,y

y ∈ {0,1}q

representing bulk solvent properties and related variables (e.g., slacks), n is an m-dimensional vector of continuous variables representing the number of each type of atom group in the solvent, and y is a q-dimensional vector of binary variables, including variables used to force n to take on integer values. The constraints are defined as follows: hsp(p,n,y) and gsp(p,n,y) are structure-property constraints; hsf(n,y) and gsf(n,y) are structural feasibility constraints which ensure that a physically meaningful molecule is designed; gp(p) are design constraints imposed on the solvent physical properties such as the melting point; gmc(n) are molecular complexity constraints that limit the numbers and combinations of atom groups that can occur in the solvent. These last constraints are formulated based on physical insight, such as knowledge of reactivity. 2.2. Objective Function. The standard free energy of solvation is calculated using the SM5.42R model from the SMx series.1-4 In this model, the free energy of solvation depends on the following seven solvent properties: (1) Abraham’s overall hydrogen bond basicity property, B; (2) Abraham’s overall hydrogen bond acidity property, A; (3) the macroscopic surface tension at 298 K, γ; (4) the refractive index at 298 K, nD; (5) the dielectric constant at 298 K, ; (6) the aromaticity, φ, defined as the fraction of non-hydrogen atoms that are aromatic; (7) the electronegative halogenicity, ψ, defined as the fraction of non-hydrogen atoms that are electronegative halogen atoms (F, Cl, or Br). The standard free energy of solvation is expressed as

p,r,n,y

(4)

In this case, the optimization variables include the solute geometry in the solvent phase (r). Whether eq 3 or 4 is used, the evaluation of the objective function requires the calculation of the electronic density function of the solute in the solvent, using a self-consistent reaction field approach (e.g., see ref 32). 2.3. Structure-Property Constraints. The structure-property constraints, hsp(p,n,y) ) 0 and gsp(p,n,y) e 0, are presented in this section. Property estimation techniques based on group contribution have recently been derived for all the properties needed in the evaluation of the objective function.33 Abraham’s overall hydrogen bond basicity and acidity properties are obtained by group contribution. The macroscopic surface tension at 298 K, the refractive index at 298 K, and the dielectric constant at 298 K are related to several intermediate properties: the liquid molar volume at 298 K, Vm, the enthalpy of vaporization at 298 K, ∆Hvap, the Hildebrand solubility param-

1130

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

eter, δ, and the dipole moment at 298 K, µ. Vm and ∆Hvap are obtained by group contribution (refs 34 and 35, respectively). The properties δ and µ are calculated through the correlations given in ref 33. The aromaticity and the electronegative halogenicity are additive properties, and they can therefore be predicted exactly by group contribution methods. The boiling point and melting point used in the design constraints are predicted using a group contribution method.35 For several properties of interest, second- and third-order group contribution methods exist,35,36 but only first-order contributions are used here. Second- and third-order contributions would require the introduction of connectivity information, leading to the introduction of new variables and structural feasibility constraints such as those shown in ref 37. This increased complexity would not result in significant gains in reliability because the solvent molecules to be designed are typically simple enough for their properties to be predicted well by first-order methods. 2.3.1. Abraham’s Overall Hydrogen Bond Basicity, B. The following group contribution method33 is used

{

if

0

B)



{

0

(5)

otherwise

if



ni ) 0

i∈G\GB

1

{

niAi + A0 e 0.05 ∑ i∈G

0

if

1

otherwise

niAi + A0 - NmaxyA e 0.05 ∑ i∈G -

[∑

]

[∑

]

niAi + A0 + Amax(yA - 1) e 0

[∑

(18)

A - AmaxyA e 0

(19)

Ag0

(20)

niAi + A0 e 0

where Amax is the maximum achievable value of A. 2.3.3. Liquid Molar Volume at 298 K, Vm. The liquid molar volume in cm3 mol-1 is estimated by group contribution34 using the first-order terms only. It is given by

(∑

)

niVm,i + d

i∈G



ni - NmaxyB e 0

(7)

yB -



ni e 0

(8)

i∈G\GB

where Nmax is the maximum number of groups permitted in the molecule and Bmax is the maximum achievable value of B for the given Nmax. The hydrogen bond basicity is then given by

-B +

niBi + B0 + Bmax(yB - 1) e 0 ∑ i∈G B-

niBi - B0 e 0 ∑ i∈G

(9) (10)

B - BmaxyB e 0

(11)

Bg0

(12)

2.3.2. Abraham’s Overall Hydrogen Bond Acidity, A. The following group contribution method33 is used

A)

{

niAi + A0 e 0.05 ∑ i∈G

0

if

niAi + A0 ∑ i∈G

otherwise

(13)

(17)

]

A-

Vm ) 1000

i∈G\GB

(16)

The hydrogen bond acidity is then given by

(6)

Equivalently,

(15)

niAi + A0 - (1 - yA) e -0.05

i∈G

i∈G

otherwise

(14)

Noting that the minimum value that ∑i∈G niAi + A0 - 0.05 can take is -0.166, this is expressed algebraically as

i∈G

where GB ) {CH3, CH2, CH, C}, ni is the number of groups of type i, B0 ) 0.204, and Bi is the contribution of group i to the hydrogen bond basicity. To express this relationship algebraically, the binary variable yB is defined as

yB )

yA )

-A + ni ) 0

i∈G\GB

niBi + B0 ∑ i∈G

where ni is the number of groups of type i, A0 ) -0.116, and Ai is the contribution of group i to the hydrogen bond acidity. As in the treatment of basicity, a new binary variable yA is introduced and defined as

(21)

where Vm,i is the contribution of group i to the liquid molar volume and d is 0.01211 m3 kmol-1. 2.3.4. Enthalpy of Vaporization at 298 K, ∆Hvap. The enthalpy of vaporization in units of J mol-1 is estimated by group contribution35 using the first-order terms only. It is given by

∆Hvap ) 1000

(∑

)

ni∆Hvap,i + hv0

i∈G

(22)

where ∆Hvap,i is the contribution of group i to the enthalpy of vaporization and hv0 is 6.829 kJ mol-1. 2.3.5. Hildebrand Solubility Parameter, δ. The Hildebrand solubility parameter is defined by a nonlinear relationship involving the liquid molar volume and the enthalpy of vaporization:

δ)

[

]

∆Hvap - RT Vm

0.5

(23)

where δ is in MPa0.5, ∆Hvap is in J mol-1, R is the gas constant in J mol-1 K-1, T is 298.15 K, and Vm is in cm3 mol-1. This relationship has been shown to predict δ with good accuracy for a large range of compounds when values of Vm and ∆Hvap predicted by eqs 21 and 22 are used.33 2.3.6. Macroscopic Surface Tension at 298 K, γ. The macroscopic surface tension at the solvent/air interface at 298 K in calories mol-1 Å-2 can be predicted based on the correlation given for this quantity in dyne cm-1 in ref 33

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1131

γ) 1.43932 × 0.01707δ2Vm1/3 1.43932 × 0.0068δ2Vm0.45

{

otherwise

if nOH + nCOOH + naCOH g 1 otherwise

1 0

(yγ - 1) -

nOH + nCOOH + naCOH - 0.5 e0 Nmax

+

sγ,1

- Myγ e 0



nUi

yµ2 - 1 (26)

nidi ∑ i∈G

(27)

∑ i∈G

(28)

+ + sγ,2 - M(1 - yγ) e 0 sγ,2

ni

(25)

+ + Four new positive slack variables, sγ,1 , sγ,1 , sγ,2 , and sγ,2 are introduced to calculate γ:

+ sγ,1

(29)

{

µ)

(∑ )

0.11

nidi

Vm-0.16 if

0.29

i∈G

∑ i∈G\G

ni g 1 and if

HC

0

nidi g 0 ∑ i∈G

otherwise

(33)

where di is the contribution of group i to the dipole moment, Vm is in cm3 mol-1, and

GHC ) {CH3, CH2, CH, C, CH2dCH, CHdCH, CH2dC, CHdC, CdC, aCH, aC, aCCH3, aCCH2, aCCH}

{ {

1

yµ ) Equivalently,

{

1 0



ni g 1

i∈G\GHC

0

yµ2 )

if

(34)

nidi g 0 ∑ i∈G

1

if

0

otherwise

if yµ1 ) 1 and yµ2 ) 1 otherwise

- yµ2 e 0

s+ µ + sµ - M(1 - yµ) e 0

µ - 0.11

(∑ ) nidi

0.29

(43)

Vm-0.16 + s+ µ - sµ ) 0

(44)

µ - Myµ e 0

(45)

s+ µ , sµ , µ g 0

(46)

2.3.8. Refractive Index at 298 K, nD. The following nonlinear structure-property relationship is used for the prediction of the refractive index33

{

1

nD ) 7.26 1

((0.48872δ)0.36 + 8.15)

14.95

if



ni g 1

i∈GnD

(0.48872δ + 13.47)

(47)

otherwise

where GnD ) {CH3O, CH2O, CH-O, CH2Cl, CHCl, CHCl2, I, Br, aCCl, aCF} and the factor of 0.48872 is used to convert δ to (cal cm-3)0.5 (1 MPa0.5 ) 0.48872(cal cm-3)0.5). A binary variable ynD is defined such that

{



1

if

0

otherwise

ni g 1

i∈GnD

(48)

Equivalently,





ni



nUi

i∈GnD

ni e 0

(49)

- ynD e 0

(50)

i∈GnD

i∈GnD

(36)

(40)

nUi di

y nD (35)

(39)

(42)

ynD )

otherwise

∑ i∈G

e0 nUi di

where nUi is the maximum number of groups of type i in the molecule. Two positive slack variables, s+ µ and sµ , are introduced, and µ is given by

Binary variables yµ1, yµ2, and yµ are defined such that

yµ1 )

nidi ∑ i∈G

yµ1 + yµ2 - 1.5 - yµ e 0

(32)

where M is a large positive scalar (M ) 5000 is used here). 2.3.7. Dipole Moment at 298 K, µ. The dipole moment in Debye is predicted using a combination of group contribution and the liquid molar volume as follows33

(38)

(41)

i∈G

+ + , sγ,1 , sγ,2 , sγ,2 g0 sγ,1

- yµ1 e 0

yµ - 0.4(yµ1 + yµ2) - 0.5 e 0

+ γ - 1.43932 × 0.01707δ2Vm1/3 + sγ,1 - sγ,1 ) 0 (30) + γ - 1.43932 × 0.0068δ2Vm0.45 + sγ,2 - sγ,2 ) 0 (31)

(37)

i∈G\GHC

This is written algebraically as

nOH + nCOOH + naCOH - yγ e 0 Nmax



i∈G\GHC

where δ is in MPa0.5 and Vm is in cm3 mol-1. A new binary variable yγ is introduced such that

{

ni e 0

i∈G\GHC

(24)

yγ )



yµ1 -

if nOH + nCOOH + naCOH ) 0

1132

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

Four positive slack variables sn+D,1, sn-D,1, sn+D,2, and sn-D,2 are defined and nD is given by

sn+D,1

+

sn-D,1

sn+D,2 nD -

+

- M(1 - ynD) e 0

sn-D,2

 - (0.91(48µ2 - 15.5µ3)Vm-0.5 + + - s,2 ) 0 (66) 1 + 2 + 3) + s,2

(51)

- MynD e 0

(52)

+ + , s,1 , s,2 , s,2 g0 s,2

Binary variables, y1, y2, and y3 are introduced for the calculation of 1, 2, and 3 as follows

1 ((0.48872δ)0.36 + 8.15) + sn+D,1 - sn-D,1 ) 0 (53) 7.26

1 nD (0.48872δ + 13.47) + sn+D,2 - sn-D,2 ) 0 14.95 sn+D,1, sn-D,1, sn+D,2, sn-D,2 g 0

y 1 ) (54) y 2 )

(55)

2.3.9. Dielectric Constant at 298 K, E. The dielectric constant is calculated as follows33

) nD2 + 0.1 0.91(48µ2 - 15.5µ3)Vm-0.5 + 1 + 2 + 3

{

if µ < 0.5D otherwise (56)

y 3 )

1 )

{(

∑ i∈G

ni + 4.5

1-9

)

-1

if

∑ ni g 1 i∈G 1

{

ni + 3

i∈G1-9

)

-1

0

if nCOOH g 1

∑ ni g 1 i∈G

if

0

otherwise

3

0 1

if

{

if µ e 0.5 otherwise

∑ ni g 1

i∈G3

1

(69)

(70)

otherwise

∑ ni e 0

(71)

- y 1 e 0

(72)

i∈G1

nUi

U nCOOH - y2nCOOH e0

(73)

y2 - nCOOH e 0

(74)

∑ ni e 0

(75)

- y 3 e 0

(76)

y3 -

(58)

i∈G3

∑ ni



(59)

nUi

i∈G3

where G3 ) {CH2Cl, CHCl, CHCl2}. Let us define a new binary variable, y such that

{

0

i∈G3

2.5

y )

if nCOOH g 1 otherwise

i∈G1

where G1-9 is as defined above.

{

0 1

{



(57)

otherwise

3 )

otherwise

(68)

∑ ni

where G1 ) {OH, CH3CO, CH2CO, CHO, CH2CN, CH2NO2, CHNO2} and G1-9 ) {CH3, CH2, CH, C, CH2dCH, CHdCH, CH2dC, CHdC, CdC}.

(∑

1

y 1 -

otherwise

-16

∑ ni g 1

i∈G1

i∈G1

0

2 )

{

if

0

The values of these binary variables are given by

The terms 1, 2, 3 are correction factors which depend on the functionality of the molecule and are defined as follows:

70

(67)

(60)

This is expressed equivalently as

µ - My e 0.5

(61)

M(y - 1) - µ e -0.5

(62)

+ + Four positive slack variables s,1 , s,1 , s,2 , and s,2 are defined, and  is given by

U where nCOOH is the maximum number of COOH groups allowed. Positive slack variables s+i,1, s-i,1, s+i,2, and s-i,1, i ) 1, 2, are introduced to define 1 and 2:

s+1,1 + s-1,1 - M(1 - y1) e 0

(77)

s+1,2 + s-1,2 - My1 e 0

(78)

1 - 70

(∑

i∈G1-9

ni + 4.5

)

-1

+ s+1,1 - s-1,1 ) 0

(79)

1 + s+1,2 - s-1,2 ) 0

(80)

+ s,1 + s,1 - My e 0

(63)

s+1,1, s-1,1, s+1,2, s-1,2 g 0

(81)

+ + s,2 - M(1 - y) e 0 s,2

(64)

s+2,1 + s-2,1 - M(1 - y2) e 0

(82)

+ - s,1 )0  - (nD2 + 0.1) + s,1

(65)

s+2,2 + s-2,2 - My2 e 0

(83)

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1133

2 + 16

(

∑ i∈G

)

ni + 3

+ s+2,1 - s-2,1 ) 0

-1

(84)

y1 )

2 + s+2,2 - s-2,2 ) 0

(85)

y2 )

s+2,1, s-2,1, s+2,2, s-2,2 g 0

(86)

y3 )

1-9

Finally, 3 is given by

3 - 2.5y3 ) 0

(87)

{ { {

1 0

if the molecule is acyclic otherwise

1 0

if the molecule is monocyclic otherwise

1 0

if the molecule is bicyclic otherwise

(93)

The following constraint then ensures that the molecule is acyclic, monocyclic, or bicyclic

y1 + y2 + y3 ) 1

(94)

2.3.10. Aromaticity, φ. The aromaticity is given by

φ

∑ i∈G

niκi -

∑ i∈G

niσi ) 0

(88)

where κi is the total number of non-hydrogen atoms in group i and σi is the number of aromatic carbon atoms in group i. 2.3.11. Electronegative Halogenicity, ψ. The electronegative halogenicity of the solvent is given by

ψ

niκi - ∑ niσi ) 0 ∑ i∈G i∈G

( ) niTbi ∑ i∈G

(90)

where tb0 is 204.359 K and Tbi is the contribution of group i to the boiling point. This is reformulated as a linear constraint by a variable transformation: Tbexp ) exp(Tb/tb0). Then,

Tbexp )

niTbi ∑ i∈G

∑ i∈G

niTmi

if the molecule is acyclic if the molecule is monocyclic if the molecule is bicyclic

(92)

where Tmi is the contribution of group i to the melting point. 2.4. Structural Feasibility Constraints. The constraints needed in problem 1 to ensure the molecule designed is structurally feasible, hsf(n,y) ) 0 and gsf(n,y) e 0, are introduced in this section. The atom groups from which the solvent molecule can be built are assigned to the following subsets of G: GAr, the set of aromatic hydrocarbon groups; GF, the set of nonaromatic functional groups; GArF, the set of aromatic functional groups; and GD, the set of groups containing a double bond. The set of basic groups, GBa, is defined as GBa ) G\(GF ∪ GAr ∪ GD ∪ GArF). Different feasibility constraints apply depending on whether the molecule is acyclic, monocyclic, or bicyclic.19 The following binary variables are introduced to denote the type of molecule designed.

(95)

The value of m is determined by

m - y 1 + y3 ) 0

(96)

In this work, only aromatic molecules are considered in the monocyclic and bicyclic classes. The octet rule19 ensures that the solvent molecule designed has no free bonds:

(2 - νi)ni - 2m ) 0 ∑ i∈G

(97)

where νi is the valency (number of free attachments) of group i. To prevent any group from being bonded to itself, the formation of a double bond, or the formation of more than one molecule, the following constraints are included19

nj(νj - 1) + 2m -

(91)

2.3.13. Melting Point, Tm. The melting point Tm is calculated by the first-order component of the group contribution method of ref 35. The transformed variable Tmexp ) exp(Tm/tm0) is used, where tm0 is 102.425 K

Tmexp )

{

1 m) 0 -1

(89)

where σi is the number of electronegative halogen atoms in group i. 2.3.12. Boiling Point, Tb. The group contribution method of ref 35 is used for the boiling point. Only the first-order terms are included here:

Tb ) tb0 ln

Furthermore, an intermediate variable, m, is introduced to represent the structural type

ni e 0 ∑ i∈G

∀j∈G

(98)

In a bicyclic molecule, the aC group allows the two rings to bind together. The number of aC groups must therefore be greater than or equal to 2.

2y3 - naC e 0

(99)

In an aromatic molecule, the number of aromatic groups must be equal to 6 if the molecule is monocyclic or 10 if it is bicyclic:



i∈GAr∪GArF

ni - 6y2 - 10y3 ) 0

(100)

The continuous variables ni, i ∈ G, that give the makeup of the molecule, are forced to take on integer values by introducing binary variables as follows Ki

2k-1yik - ni ) 0 ∑ k)1

∀i∈G

(101)

Ki where Ki is the smallest integer such that ∑k)1 2k-1 g nUi . 2.5. Property Design Constraints. Design constraints (gp(p) in problem 1) are imposed on the boiling and melting point to ensure that the solvent is liquid under the desired processing conditions. They are given by

1134

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

exp

()

()

TLb TUb e Tbexp e exp tb0 tb0

(102)

where TLb and TUb are the desired lower and upper bounds on the boiling point of the solvent, and

()

()

TLm TUm exp e Tmexp e exp tm0 tm0

(103)

where TLm and TUm are the desired lower and upper bounds on the melting point of the solvent. 2.6. Molecular Complexity Constraints. The molecular complexity constraints (gmc(n)) allow the user to formulate mathematically any physical insights or heuristics on the nature of the solvent sought. The following constraint sets the minimum number of groups, Nmin, in the molecule

Nmin -

ni e 0 ∑ i∈G

(104)

The upper limit on the number of groups, Nmax, is given by

ni - Nmaxy1 e 0 ∑ i∈G



ni - NF,max e 0

∑ l)1

L

σAl (nD,A,B,r)Al(r) + σM(B,γ,φ,ψ)

(106)

(n ) (A) (B) (σˆ ll′ nD + σˆ ll′ A + σˆ ll′ B)Tll′(r) ∑ l′∈Λ D

ni -

e 0, ∀ i ∈ G

(nD) (A) (B) D) ˆ (A) ˆ (B) ˆ ll′ , σˆ ll′ , and σˆ ll′ are scalars that where σˆ (n l , σ l , σ l , σ depend on the basis set used for the calculations and the type of atoms l and l′ (H, C, N, O, S, F, Cl, Br, I, or P) and Tll′ is a function of solute geometry which depends on atom types. For atoms l of type P, F, Cl, Br, and I, Tll′ ) 0. σM is given by 2

2

2

3.1. Solution Strategy. 3.1.1. Optimization Algorithm. Solvent design problem 1 is a nonconvex mixed-integer nonlinear programming problem in which the objective function evaluation requires a quantum mechanical calculation of the free energy. Some of the equality constraints used for property prediction, such as eqs 23 and 30, are nonlinear. The problem is therefore solved using the outer approximation algorithm with equality relaxation.38-41 While the augmented penalty version42 could be used to account for the nonconvexity of the formulation, it will be shown in the case studies that this is not necessary. To avoid generating the same solvent more than once, an integer cut is added to the master problem at every iteration. 3.1.2. Gradient Evaluation. A primary concern in the efficient solution of the problem is the calculation of the seven derivatives of the objective function. A numerical evaluation would significantly increase the computational cost. In the SM5.42R model,1-4 the Gibbs free energy of solvation depends on the solvent properties as follows

∆G°s ) ∆GENP(,r) + GCDS(r,B,A,γ,nD,φ,ψ)

(109)

where ∆GENP accounts for the gain in electric polarization energy due to the mutual polarization of the solvent and the

(112)

2

) ) ) where σˆ (γ) ˆ (B ˆ (φ ˆ (ψ CS, σ CS , σ CS , and σ CS are scalars that depend only on the basis set used. Thus, for a solute geometry fixed to the gas-phase geometry (r ) r*), the following gradients can be calculated analytically

∂∆G°s

L

)

∑ l)1

(

)

L

σˆ (B) l +

(B) σˆ ll′ Tll′(r*) Al(r*) + ∑ l′)1 L

2

2σˆ (B )B ∂∆G°s

3. Solution of the Solvent Design Problem

2

) 2 ) 2 ) 2 ˆ (B ˆ (φ ˆ (ψ σM ) σˆ (γ) CSγ + σ CS B + σ CS φ + σ CS ψ

(107)

(108)

(111)

l

i∈GD

nUi

(110)

D) σAl ) σˆ (n ˆ (A) ˆ (B) l nD + σ l A+σ l B+

∂B

The number of groups of type i is restricted to nUi

Al(r) ∑ l)1

where the index l runs over the set of atoms in the solute (1 to L), Al is the exposed surface area of atom l, and σAl and σM are contributions to the surface tension of the solute. σAl is given by

2

The number of carbon-carbon double bonds in the molecule is restricted to ND,max

∑ ni - ND,max e 0

L

GCDS )

(105)

The number of functional groups is restricted to NF,max. i∈GF∪GArF

solute and for the energy cost of polarizing the solute and GCDS accounts for the first solvation shell effects, namely cavitation (C), dispersion (D), and solvent structural changes (S). The GCDS term is given by

∂A ∂∆G°s

L

)

∑ l)1

∂nD ∂∆G°s

L

)

∑ l)1

Al(r*) ∑ l)1

∂ψ

(114)

(

(115) L

D) σˆ (n + l

)

(n ) σˆ ll′ Tll′(r*) Al(r*) ∑ l′)1 D

(116)

L

2

) 2σˆ (φ )φ

∂φ ∂∆G°s

)

(A) σˆ ll′ Tll′(r*) Al(r*) ∑ l′)1

(113)

L

) σˆ (γ)

∂γ ∂∆G°s

(

L

σˆ (A) l +

Al(r*) ∑ l)1

Al(r*) ∑ l)1

(117)

L

2

Al(r*) ∑ l)1

) 2σˆ (ψ )ψ

(118)

All the quantities on the right-hand-side of these equations, other than B, φ, and ψ, are functions of the solute only and are, thus, constant from iteration to iteration. The gradients with respect to A, γ, and nD are solvent-independent when the solute geometry is fixed. 3.1.3. Implementation. At the beginning of iteration k, a solvent structure is fixed as specified by the binary combination y(k) and the primal problem is solved. The properties of the solvent are first evaluated. The objective function value and its gradients are then computed using GAMESOL-v.3.1,43 an implementation of SM5.42 in GAMESS,44,45 and the analytical

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1135 Table 1. Atom Groups Considered in the Small Design Space and Upper Bounds on the Number of Each Group

Table 2. Atom Groups Considered in the Large Design Space and Upper Bounds on the Number of Each Group

group i

group type

nUi

group i

group type

nUi

group i

CH3 CH2 CH C CH2dCH CHdCH CdC

basic basic basic basic double double double

7 7 4 1 1 1 1

OH CH3CO CH2CO CH3O CH2NH2 CH2CN COOH CH2Cl

functional functional functional functional functional functional functional functional

1 1 1 1 1 1 1 1

CH3 CH2 CH C CH2dCH CHdCH CH2dC CHdC CdC aCH aC aCCH3 aCCH2 aCCH aCOH aCNH2 aCCl aCF OH CH3CO CH2CO

expressions of eqs 114-122. The gradients of the nonlinear constraints and the Lagrange multipliers of the nonlinear equality constraints are calculated. These calculations are performed via the SNOPT optimization software:46 this allows for a general implementation which can be used when the solute geometry is relaxed in solution. This information is then used to construct the mixed-integer linear program (MILP) master problem. A GAMS47 file containing the problem is automatically generated. The MILP is solved, and a new solvent candidate is obtained, together with a lower bound on the optimal solution. A convergence test is applied. If necessary, the algorithm goes back to the primal problem for further iterations. Communication between the primal and the master problem is fully automated. The primal problem is solved on a Compaq professional workstation XP1000 with a 667 MHz Alpha processor and 512 Mb RAM. The master problem is solved on a Sun UltraSparc workstation with a 360 MHz processor and 512 Mb RAM. Over a series of 10 benchmark tests run by the Standard Performance Evaluation Corporation,48 the Compaq XP1000 workstation was found to be on average twice as fast as the Sun UltraSparc. The total computational times reported are the sum of the CPU times on both machines. 3.2. Case Studies. The proposed formulation is applied to the design of solvents for several solutes. Where possible, the results are discussed in the context of experimental data obtained from DECHEMA’s DETHERM database, accessed through the UK Chemical Database Service.49 Two of the case studies are studied in more detail to assess whether the optimization-based formulation can successfully identify appropriate solvents for a given solute and to determine the computational performance of the proposed method. The issue of local minima is also explored. The solvents designed with the proposed approach and those obtained through a minimization of the infinite dilution activity coefficient as calculated by UNIFAC (as described in ref 50) are compared. 3.2.1. Description of Case Studies. The key motivation for resorting to a quantum mechanical solvation model is the ability to predict solvation for solutes for which UNIFAC parameters are not readily available or where proximity effects may be important. With this in mind, all the solutes chosen are aromatic compounds with one or two functional groups. Such compounds can be seen as representative precursors of the types of compounds found in the pharmaceutical and agrochemical industries.5 Two sets of atom groups, or design spaces, are used to identify optimal solvents. A small design space consisting of the 14 groups listed in Table 1 is considered. The second set is a large design space in which the 41 groups listed in Table 2 are considered. The design parameters used for both design spaces are listed in Table 3. Both design spaces contain common solvents as well as molecules which have not yet been synthesized, allowing a broad range of designs to be identified. The number of structurally feasible molecules that can be

group type basic basic basic basic double double double double double aromatic aromatic aromatic aromatic aromatic arom/func arom/func arom/func arom/func functional functional functional

nUi

group i

group type

nUi

7 7 4 1 1 1 1 1 1 6 4 1 1 1 1 1 1 1 1 1 1

CHO CH3COO CH2COO CH3O CH2O CHsO CH2NH2 CH3NH CH2NH CH3N CH2N CH2CN COOH CH2Cl CHCl CHCl2 CH2NO2 CHNO2 I Br

functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional functional

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Table 3. Design Parameters for the Small and Large Design Spaces Nmin

Nmax

NF,max

ND,max

2

10

1

1

TLm (K)

TUm (K)

TLb (K)

TUb (K)

150

270

310

600

Table 4. Number of Feasible Compounds for Each Design Space With and Without Temperature Constraints, Listed by Chemical Class small design space

large design space

chemical class

w/temp constrs

total

w/temp constrs

total

alkanes/alkenes alcohols ketones aldehydes esters ethers amines nitriles carboxylic acids chloroalkanes nitroalkanes iodoalkanes bromoalkanes aromatics total

92 114 188

117 117 208

112 88 111 8 114

117 117 117 117 117

827

1027

137 165 277 161 288 379 611 162 9 450 277 161 163 8 3248

168 168 297 168 297 391 688 168 168 465 297 168 168 14 3625

constructed based on these parameters is 1027 for the small design space and 3625 for the large design space. If the temperature constraints on the boiling and melting points are imposed to remove some molecules from the design spaces, these numbers are reduced to 827 and 3248, respectively. The number of feasible structures per chemical class is shown in Table 4. All quantum mechanical calculations were performed at the HF/6-31G* level. 3.2.2. Case Study 1: m-Cresol. The solvation of a small monofunctional solute, m-cresol, is first considered, and several issues are examined. The quality of the predictions of ∆G°s using the proposed combination of group contribution methods and the SM5.42 model is discussed. Then, the performance of the algorithm is investigated in terms of solution quality and computational cost. (A) Quality of predictions. Experimental and predicted values of ∆G°s in 13 solvents are shown in Table 5 and in Figure 1. The experimental values of ∆G°s (∆G°s,exp) are taken

1136

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

Table 5. Free Energy of Solvation (in kcal mol-1) of m-Cresol in Different Solvents, Measured Experimentally and Calculated Using Predicted (ppred) and Measured (pexpt) Solvent Properties experiment

ppred

pexpt

solvent

∆G°s

rank

∆G°s

rank

∆G°s

rank

hexanol octanol decanol diethyl ether 1,2-dichloroethane chloroform benzene iodobenzene water cyclohexane octane decalin heptane

-8.42 -8.20 -8.01 -7.95 -6.91 -6.70 -6.66 -6.04 -5.49 -5.20 -5.19 -5.11 -5.01

1 2 3 4 5 6 7 9 10 11 12 13 14

-8.52 -8.31 -8.29 -7.94 -8.24 -7.04 -6.92 -7.66 -5.14 -6.03 -5.76 -5.27 -5.82

1 2 3 5 4 7 8 6 13 9 11 12 10

-8.53 -8.37 -8.14 -7.85 -7.70 -7.04 -6.72 -7.33 -5.14 -5.63 -5.68 -5.60 -5.71

1 2 3 4 5 7 8 6 13 11 10 12 9

from ref 4. The predicted ∆G°s values are calculated using GAMESOL43 with solvent property values as measured experimentally (pexpt) or solvent property values as predicted by the structure-property constraints given in section 2.3 (ppred). In Table 5, the solvent rank based on ordering the ∆G°s values is shown with the most negative value first. With minor exceptions, the ∆G°s(ppred) values closely mirror the ∆G°s(pexpt) values. The average absolute error is 0.18 kcal mol-1, indicating that the prediction of the solvent properties is sufficiently accurate. The ∆G°s(ppred) values for chloroform and water are equal to their ∆G°s(pexpt) values. This is because these compounds are represented by a single atom group and, by design, the solvent-property relationships predict the experimental values exactly. The only solvent which has a ∆G°s(ppred) value significantly different from its ∆G°s(pexpt) is 1,2-dichloroethane. The errors in the predicted parameters for 1,2dichloroethane are relatively small, and when considering the error caused by using one predicted parameter at a time, the absolute error in the free energy of solvation lies between 0.01 kcal mol-1 and 0.34 kcal mol-1. However, when combining all predicted parameters, the errors are amplified and an absolute deviation of 0.54 kcal mol-1 is observed. There is also good agreement in Table 5 between the ∆G°s(ppred) values and the ∆G°s,exp values. The average absolute error is 0.52 kcal mol-1. Of the 13 solvents, two have absolute errors greater than 1.0 kcal mol-1: 1,2 dichloroethane and iodobenzene. These compounds also have a relatively large error in the ∆G°s(pexpt) predictions, so this cannot be solely attributed to the use of predicted solvent properties. Last, the aliphatic alkanes have values of ∆G°s(ppred) that are systematically more negative than either the ∆G°s(pexpt) or experimental ∆G°s values.

Figure 1. Experimental ∆G°s for m-cresol versus predicted ∆G°s using predicted (∆G°s(ppred)) and measured (∆G°s(pexpt)) solvent properties.

Figure 2. Primal and master problem solutions versus iteration number for the m-cresol case study (small design space).

Figure 3. Candidate solvent structures generated by the algorithm against the set of all possible feasible structures for m-cresol (small design space).

However, this underestimation does not affect the positioning of this class of solvent in the ranking. Overall, similar rankings of the molecules are obtained when using experimental or predicted free energies of solvation. It is therefore possible to use the proposed approach of combining group contribution and quantum mechanics to obtain a solvent ranking on the basis of the free energy of solvation. (B) Small Design Space. The optimal solvent for m-cresol in the small design space was obtained by using hexane as a starting point. Aliphatic alkanes are known to be poor solvents for m-cresol. The algorithm identifies 2-propanol and 2-methyl2-propanol as the optimal solvents. They have the same objective function value. The top 13 solvents found by the algorithm are alcohols. The progress of the algorithm can be seen in Figure 2, which shows the solution of the primal and master problems at every iteration. The algorithm terminates due to an infeasible master problem. The large plateaus indicate that, for a significant number of iterations, the addition of constraints to the master problem does not reduce the feasible space further. The majority of the structures generated by the master problem are alcohols with smaller numbers of carboxylic acids, amines, and ketones also tested. Most of these are good candidate solvents. This can be seen in Figure 3 in which the candidate structures tested and all 1027 possible solvent structures are shown. The

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1137 Table 6. Number of Iterations and CPU Time for the m-Cresol Case Study design space

no. of iterations

primal CPU time (s)

master CPU time (s)

total CPU time (s)

small large

113 120

25807 28561

68 192

25875 28753

algorithm tests the 50 best candidate structures (i.e., those with the most negative ∆G°s(ppred) values) and 86 out of the 100 best candidates. Many of these are alcohols, which can be expected to be good solvents for m-cresol. The full exploration of the design space shows that 2-propanol and 2-methyl-2-propanol are in fact the global solutions. Because the melting and boiling point constraints are included in the master problem, the maximum number of structures that can be generated by the master problem is 827 (see Table 4). As shown in Table 6, the algorithm terminates after 113 structures have been tested, which represents only 13.7% of the design space. The CPU times given in Table 6 clearly show that the burden of the computation is due to the solution of the primal problem, which is largely dominated by the cost of the quantum mechanical calculations. In the current implementation, the gradient of the objective function with respect to the dielectric constant is obtained by a central difference and, therefore, takes up a significant portion of the cost. On average, each ∆G°s calculation requires 75 CPU s. Thus, a complete evaluation of the design space requires approximately 62 000 CPU s. The proposed approach is more than twice as fast, despite the cost of the gradient evaluations. (C) Large Design Space. The CPU times for this case study using the large design space are presented in Table 6. The algorithm requires only seven iterations more for the large design space than for the small design space, although there are 2421 additional feasible structures in the large design space. The best solvent is again 2-propanol, and the solution of the master problem generates many alcohols, as can be seen from Figure 4, which shows the structures explored by the algorithm against the set of all feasible structures. The main changes that result from the enlargement of the design space are that a number of carboxylic acids are tested before the alcohols and that structures containing the CH2NH group are generated. The 20 structures with the most negative ∆G°s are generated by the algorithm. Of the 100 structures with the lowest value of ∆G°s, 71 are selected. The progress of the algorithm as a function of the iteration number (Figure 5) shows the same trends as for the small design space. Less than 4% of the total search space is

Figure 4. Candidate solvent structures generated by the algorithm against the set of all possible feasible structures for m-cresol (large design space).

Figure 5. Primal and master problem solutions versus iteration number for the m-cresol case study (large design space). Table 7. Free Energy of Solvation (in kcal mol-1) of p-Bromophenol in Different Solvents, Experimental Values4 and Calculated Values Using Predicted (ppred) and Measured (pexpt) Solvent Properties experiment

ppred

pexpt

solvent

∆G°s

rank

∆G°s

rank

∆G°s

rank

pentanol octanol hexanol heptanol nonanol decanol 1,2-dichloroethane dibromomethane benzene toluene chloroform ethylbenzene bromobenzene iodobenzene cyclohexane water hexane

-10.6 -10.5 -10.5 -10.4 -10.3 -10.3 -9.10 -9.01 -8.81 -8.70 -8.59 -8.54 -8.49 -8.45 -7.14 -7.13 -6.96

1 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 18

-9.61 -9.41 -9.54 -9.48 -9.34 -9.28 -9.75 -9.42 -7.95 -7.74 -8.66 -7.61 -8.75 -8.69 -7.01 -4.57 -6.86

2 6 3 4 7 8 1 5 12 13 11 14 9 10 15 17 16

-9.87 -9.36 -9.50 -9.43 -9.24 -9.14 -8.62 -8.56 -7.73 -7.67 -8.66 -7.58 -8.49 -8.38 -6.59 -4.57 -6.67

1 4 2 3 5 6 8 9 12 13 7 14 10 11 16 17 15

tested. The proposed approach is almost 8.5 times faster than an evaluation of all possible structures. Although experimental data for many binary mixtures containing m-cresol are available,49 the reported temperatures are not consistent and it is, thus, not possible to compare the computational results with measurements. (D) Comparison with UNIFAC. UNIFAC is well-suited for m-cresol, a small solute with only one functional group which can be represented by the aromatic group aCOH. Nonzero interaction parameters are reported for aCOH with 14 of the other 40 groups. The top solvents found are all alkenols, while the top solvents based on the QM approach are all alkanols. 3.2.3. Case Study 2: p-Bromophenol. In the second case study, a larger problem is considered. The solute, p-bromophenol, is an aromatic compound with two functional groups. This results in more complex interactions with the solvent and in more expensive quantum mechanical calculations (approximately 85 CPU s per free energy of solvation). (A) Quality of Predictions. Experimental values for ∆G°s in 17 solvents and calculated values using predicted and measured solvent property values are given in Table 7. The ∆G°s(ppred) values generally follow the same trend as the ∆G°s(pexpt) values, and the average absolute difference between the two sets of values is only 0.2 kcal mol-1. There is a larger difference between experimental and calculated values of 0.8 kcal mol-1.

1138

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

Table 8. Number of Iterations and CPU Time for the p-Bromophenol Case Study search space

no. of iterations

primal CPU time (s)

master CPU time (s)

total CPU time (s)

small large

66 116

16766 30459

36 261

16802 30720

Table 9. Summary of Results for Four Bifunctional Aromatic Solutes solute

no. of iterations

primal CPU time (s)

master CPU time (s)

p-hydroxybenzoic acid p-aminobenzoic acid p-chlorobenzoic acid paracetamol

118 166 18 112

49705 48724 7457 63822

209 424 8 351

Nevertheless, the experimental and calculated solvent rankings follow very similar trends. (B) Solvent Design. The algorithm is applied to both design spaces shown in Table 4. In both cases, the algorithm identifies the global solutions (2-propanol and 2-methyl-2-propanol). The number of iterations and the CPU times are shown in Table 8. The algorithm converges to the solution in a small number of iterations. In the small design space, the algorithm tests only 8% of the possible structures and is 4 times faster than an exhaustive evaluation. In the large design space, less than 4% of the possible structures are tested and the algorithm is almost 9 times faster. The iteration progress and exploration of the design space proceed in a manner similar to the case of m-cresol. In particular, the algorithm quickly narrows in on good candidate solvents, which are primarily alcohols. No experimental data were found for p-bromophenol mixtures. (C) Comparison with UNIFAC. The functional group aCOH can be used when modeling p-bromophenol with UNIFAC. However, no aCBr group is available, and the distinct groups aC and Br must be used instead. Wittig et al.8 indicate that an aCBr group is available in D-UNIFAC and that interaction parameters have been regressed between aCBr and 11 other groups. However, the parameter values are not in the public domain. Kang et al.11 have also proposed an aCBr group in KT-UNIFAC, but interaction parameters with only two other groups are available. Furthermore, whatever version of UNIFAC is used, it is not possible to distinguish between isomers of bromophenol and, therefore, to account accurately for the different degree of interaction between the OH and Br groups on the solute. In contrast, the QM approach used takes the full connectivity of the solute into account and is, thus, able to differentiate isomers. The top 10 solvents found by UNIFAC consist primarily of alkenols, with the exception of ethyl acetate (solvent 2) and vinyl acetate (solvent 7). The QM-based method identifies the top 10 solvents as alkanols. 3.2.4. Solvent Design for Bifunctional Molecules. The proposed approach has been further tested on several other solutes, all of which are bifunctional aromatics. They include p-hydroxybenzoic acid, p-aminobenzoic acid, p-chlorobenzoic acid, and paracetamol. The results for these four solutes, using the large design space, are summarized in Table 9. In all cases, the algorithm identifies top candidate solvents by considering only a fraction of the total design space. (A) Solvent Design. For p-hydroxybenzoic acid, the top 10 solvents are all alcohols (ethanol and derivatives of propanol and butanol), with predicted free energies of solvation ranging from -16.2 to -15.9 kcal mol-1. Experimental data51 show that, at 298.15 K, the best solvents tested are 2-propanol, ethanol, acetone, methanol, 1-octanol, MEK (Methyl Ethyl Ketone), and ethyl acetate in order of decreasing solubility. Toluene and water

are found to be very poor solvents. Solubility measurements in heptane at higher temperatures52 also show that heptane is a poor solvent. The top 10 solvents for p-aminobenzoic acid are also alcohols, from ethanol to pentanol derivatives, with a similarly narrow range of free energy of solvation values. Experimental data show that ethanol, ethyl acetate, and methanol are the three best solvents in which solubilities have been measured at 298.15 K.53 Much lower solubilities have been measured in trichloromethane, benzene,53 and water.54 For p-chlorobenzoic acid, the top five candidates are alcohols (∆G°s around -12.6 kcal mol-1), followed by some carboxylic acids (∆G°s around -9.2 kcal mol-1). For this particular solute, the algorithm converges in few iterations and, therefore, identifies few candidate solvents. Available experimental data at 293.15 K55 show acetone as the best solvent for which measurements have been made. It is followed by ethanol and methanol. The solubility of p-chlorobenzoic in ethylbenzene, heptane, cyclohexane, benzene, and water is much lower. In the solvent design problem, acetone is eliminated from the search space because its predicted boiling point (305.4 K) is lower than the minimum boiling point allowed (310 K). If acetone is evaluated using the QM model, its free energy of solvation is found to be -13.7 kcal mol-1. Thus, it is likely that it would be identified as a top solvent candidate if the boiling point bounds were widened. In fact, the experimentally measured boiling point of acetone is 329.3 K:56 to identify acetone as a potential candidate, the uncertainty in the property prediction techniques should be explicitly accounted for by relaxing the constraints or by adopting a probability-based problem formulation.58 Finally, for paracetamol, the top 10 solvent candidates are alcohols (mostly butanol derivatives) with a narrow range of free energy of solvation values. Solubility measurements57 show that, while alcohols are good solvents, higher solubility is observed in dimethylformamide (DMF), dimethyl sulfoxide (DMSO), and diethylamine. However, the free energies of solvation calculated using predicted or experimental properties for these solvents are higher for alcohols. This highlights the need to use solubility as an objective function, to identify other solvent candidates. (B) Comparison with UNIFAC. For p-hydroxybenzoic acid, p-aminobenzoic acid, and p-chlorobenzoic acid, the groups aC and COOH must be used in the absence of an aCCOOH group. The other functional group for each compound is available in aromatic form (i.e., aCOH, aCNH2, and aCCl). For p-hydroxybenzoic acid, both UNIFAC and the proposed approach identify alkanols as top solvents. For p-aminobenzoic acid, UNIFAC identifies primarily amines and nitrites, whereas the QM-based approach results in alcohols. Finally, for p-chlorobenzoic acid, the top UNIFAC solvents are aldehydes and carboxylic acids, while the top QM solvents are alcohols and carboxylic acids. If acetone is included in the UNIFAC search space, it is predicted to be a poor solvent. Unfortunately, paracetamol cannot be represented with the groups available in the version of UNIFAC used. It could be represented using the aCNHR group of D-UNIFAC,8 but the parameters for this group are not available in the public domain. In KT-UNIFAC,11 a suitable acetamide group (aCNHCO) is available, but no interaction parameters have yet been regressed between this group and others. 4. Concluding Remarks A solvent design problem was formulated by combining a quantum mechanical description of the solute, a continuum

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006 1139

solvation model based on several bulk solvent properties, and group contribution methods for the prediction of these properties. In contrast with previous work,27 a deterministic optimization approach was used, and it was shown that a very large search space could be explored efficiently. The outer-approximation algorithm was used successfully to solve a series of solvent design problems based on the simple objective of minimizing the free energy of solvation. Aromatic solutes with different functional groups were used, as precursors to pharmaceutical and agrochemical solutes. This study demonstrates the feasibility of incorporating quantum level calculations within molecular design. Good quality solvent candidates were obtained during the course of the optimization. The entire search space was explored for two solutes, and it was shown that the global solution was found in both cases. All problems were solved in few iterations, with very modest computational costs compared to a full evaluation of the design space. These results indicate that there are significant benefits to be gained from the combination of computational chemistry and molecular design tools, particularly as they allow the full connectivity of the solute to be taken into account. To make such approaches applicable to practical problems, several issues must be addressed. As demonstrated in the case of paracetamol, the computational chemistry calculations should be related to properties that are directly relevant at the process level, such as solubility or activity coefficients. Such an approach can then be integrated into a wider solvent design problem which can consider tradeoffs between conflicting objectives. Lehmann and Maranas27 calculated activity coefficients from quantum mechanics for solvent design, but they highlighted some limitations of the solvation model they used in terms of the solutes that can be treated and of the dependence of the results on the coordinate system used. The model used here does not suffer from these limitations but has not yet been used to compute activity coefficients. As more widely applicable models are developed, there will be further opportunities to combine computational chemistry and design techniques. As larger problems are considered, computational efficiency is likely to become a limiting factor, and a tight integration between quantum mechanical calculations and the optimization algorithms will be needed in order to avoid unnecessary calculations. Furthermore, energy minimization plays a central role in reliable property prediction. It is needed, for instance, to lift the assumption of a fixed solute geometry in solution. Thus, the wider use of computational chemistry in design is likely to require the formulation of multilevel optimization problems, for which efficient solution methods must be developed. Acknowledgment The authors gratefully acknowledge financial support from the UK Engineering and Physical Sciences Research Council and Syngenta under the CASE for New Academics scheme. They also acknowledge the use of the EPSRC’s Chemical Database Service at Daresbury. Literature Cited (1) Zhu, T. H.; Li, J.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Density Functional Solvation Model based on CM2 Atomic Charges. J. Chem. Phys. 1998, 109 (20), 9117. (2) Zhu, T. H.; Li, J.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Density Functional Solvation Model based on CM2 Atomic Charges (vol 109, pg 9117, 1998). J. Chem. Phys. 1999, 111 (12), 5624. (3) Li, J. B.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Universal Reaction Field Model based on Ab Initio Hartree-Fock Theory. Chem. Phys. Lett. 1998, 288 (2-4), 293.

(4) Li, J.; Zhu, T.; Hawkins, G. D.; Winget, P.; Liotard, D. A.; Cramer, C. J.; Truhlar, D. G. Extension of the Platform of Applicability of the SM5.42R Universal Solvation Model. Theor. Chem. Acc. 1999, 103, 9. (5) Kola´ø, P.; Shen, J.-W.; Tsuboi, A.; Ishikawa, T. Solvent Selection for Pharmaceuticals. Fluid Phase Equilib. 2002, 194-197, 771. (6) Abildsov, J.; O’Connell, J. P. Predicting the Solubilities of Complex Chemicals I. Solutes in Different Solvents. Ind. Eng. Chem. Res. 2003, 42, 5622. (7) Fredenslund, A.; Jones, R. L.; Prausnitz, J. M. Group Contribution Estimation of Activity Coefficients in Non-Ideal Liquid Mixtures. AIChE J. 1975, 21 (6), 1086. (8) Wittig R.; Lohmann, J.; Gmehling, J. Vapor-Liquid Equilibrium by UNIFAC Group Contribution. 6. Revision and Extension. Ind. Eng. Chem. Res. 2003, 42, 183. (9) Larsen B. L.; Rasmussen P.; Fredenslund A. A Modified UNIFAC Group-Contribution Model for Prediction of Phase-Equilibria and Heats of Mixing. Ind. Eng. Chem. Res. 1987, 26 (11), 2274. (10) Abildskov, J.; O’Connell, J. P. Prediction of Solubilities of Complex Medium-Sized Chemicals. II Solutes in Mixed Solvents. Mol. Simul. 2004 30 (6), 367. (11) Kang, J. W.; Abildskov. J.; Gani, R.; Cobas, J. Estimation of Mixture Properties from First- and Second-Order Group Contributions with the UNIFAC Model. Ind. Eng. Chem. Res. 2002, 41 (13), 3260. (12) Eckert, F.; Klamt, A. Fast Solvent Screening via Quantum Chemistry: COSMO-RS Approach AIChE J. 2002, 48 (2), 369. (13) Lin, S.-T.; Sandler, S. I. Infinite Dilution Activity Coefficients from Ab Initio Solvation Calculations. AIChE J. 1999, 45 (12), 2606. (14) Lin, S.-T.; Sandler, S. I. A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model Ind. Eng. Chem. Res. 2002, 41, 899. (15) Nanu, D. E.; Mennucci, B.; de Loos, T. W. Predicting Infinite Dilution Activity Coefficients with the Group Contribution Solvation Model: An Extension of its Applicability to Aqueous Systems. Fluid Phase Equilib. 2004, 221 (1-2), 127. (16) Amovilli, C.; Mennucci, B. Self-Consistent-Field Calculation of Pauli Repulsion and Dispersion Contributions to the Solvation Free Energy in the Polarizable Continuum Model. J. Phys. Chem. B 1997, 101 (6), 1051. (17) Tomasi, J.; Mennucci, B.; Cance`s, E. The IEF Version of the PCM Solvation Method: An Overview of a New Method addressed to Study Molecular Solutes at the QM Ab Initio Ievel. J. Mol. Struct. (THEOCHEM) 1999, 464 (1-3), 211. (18) Macchietto, S.; Odele, O.; Omatsone, O. Design of Optimal Solvents for Liquid-Liquid Extraction and Gas Absorption Processes. Chem. Eng. Res. Des. 1990, 68, 429. (19) Odele, O.; Macchietto, S. Computer Aided Molecular Design: A Novel Method for Optimal Solvent Selection. Fluid Phase Equilib. 1993, 82, 47. (20) Churi, N.; Achenie, L. E. K. Novel Mathematical Programming Model for Computer Aided Molecular Design. Ind. Eng. Chem. Res. 1996, 35 (10), 3788. (21) Stefanis, S. K.; Pistikopoulos, E. N. Methodology for Environmental Risk Assessment of Industrial Nonroutine Releases. Ind. Eng. Chem. Res. 1997, 36 (9), 3694. (22) Pistikopoulos, E. N.; Stefanis, S. K. Optimal Solvent Design for Environmental Impact Minimization. Comput. Chem. Eng. 1998, 22 (6), 717. (23) Buxton, A.; Livingston, A. G.; Pistikopoulos, E. N. Optimal Design of Solvent Blends for Environmental Impact Minimization. AIChE J. 1999, 45, 817 (24) Hostrup, M.; Harper, P. M.; Gani, R. Design of Environmentally Benign Processes: Integration of Solvent Design and Separation Process Synthesis. Comput. Chem. Eng. 1999, 23 (10), 1395. (25) Marcoulaki, E. C.; Kokossis, A. C. On the Development of Novel Chemicals using a Systematic Optimisation Approach. Part II. Solvent Design. Chem. Eng. Sci. 2000, 55 (13), 2547. (26) Papadopoulos, A. I.; Linke, P. On the Synthesis and Optimization of Liquid-Liquid Extraction Processes using Stochastic Search Methods. Comput. Chem. Eng. 2004, 28 (11), 2391. (27) Lehmann, A.; Maranas, C. D. Molecular Design Using Quantum Chemical Calculations for Property Estimation. Ind. Eng. Chem. Res. 2004, 43, 3419. (28) Harper, P. M.; Gani, R. A Multi-step and Multi-Level Approach for Computer Aided Molecular Design. Comput. Chem. Eng. 2000, 24 (27), 677. (29) Achenie, L. E. K., Gani, R., Venkatasubramanian, V. Eds. Computer-Aided Molecular Design: Theory and Practice; Elsevier Science: Amsterdam, 2002.

1140

Ind. Eng. Chem. Res., Vol. 45, No. 3, 2006

(30) Giovanoglou, A.; Barlatier, J.; Adjiman, C. S.; Pistikopoulos, E. N.; Cordiner, J. L. Optimal Solvent Design for Batch Separation based on Economic Performance. AIChE J. 2003, 49 (12), 3095 (31) Hugo, A.; Ciumei, C.; Buxton, A.; Pistikopoulos, E. N. Environmental Impact Minimisation through Material Substitution: A MultiObjective Optimization Approach. Green Chem. 2004, 6 (8), 407. (32) Jensen, F. Introduction to Computational Chemistry; Wiley and Sons: New York, 1998. (33) Sheldon, T. J.; Adjiman, C. S.; Cordiner, J. L. Pure Component Properties from Group Contribution: Hydrogen-Bond Basicity, HydrogenBond Acidity, Hildebrand Solubility Parameter, Macroscopic Surface Tension, Dipole Momemt, Refractive Index and Dielectric Constant. Fluid Phase Equilib. 2005, 231, 27. (34) Constantinou, L.; Gani, R.; O’Connell, J. P. Estimation of the Acentric Factor and the Liquid Molar Volume at 298 K using a New Group Contribution Method. Fluid Phase Equilib. 1995, 103, 11. (35) Constantinou, L.; Gani, R. New Group Contribution Method for Estimating Properties of Pure Components. AIChE J. 1994, 40 (10), 1697. (36) Marrero, J.; Gani, R. Group-Contribution based Estimation of Pure Component Properties. Fluid Phase Equilib. 2001, 183-184, 183. (37) Apostolakou, A.; Adjiman, C. S. In Computer-Aided Molecular Design: Theory and Practice; Achenie, L. E. K., Gani, R., Venkatasubramanian, V., Eds.; Elsevier Science: Amsterdam, 2002; pp 63-94. (38) Duran, M. A.; Grossmann, I. E. An Outer Approximation Algorithm for a Class of Mixed Integer Non-Linear Programs. Math. Programming 1986, 36, 307. (39) Duran, M. A.; Grossmann, I. E. A Mixed-Integer Nonlinear Programming Algorithm for Process Synthesis. AIChE J. 1986, 32 (4), 592. (40) Kocis, G. R.; Grossmann, I. E. Relaxation Strategy for the Structural Optimization of Process Flow Sheets. Ind. Eng. Chem. Res. 1987, 26 (9), 1869. (41) Fletcher, R.; Leyffer, S. Solving Mixed-Integer Nonlinear Programs by Outer Approximation. Math. Programming 1994, 66 (3), 327. (42) Viswanathan, J.; Grossmann, I. E. A Combined Penalty Function and Outer Approximation Method for MINLP Optimisation. Comput. Chem. Eng. 1990, 14 (7), 769. (43) Xidos, J. D.; Li, J.; Zhu, T.; Hawkins, G. D., Chuang, Y.-Y.; Fast, P. L.; Liotard, D. A.; Rinaldi, D.; Cramer, C. J.; Truhlar, D. G. GAMESOL: A Module Incorporating the SM5.42 Solvation Models, the CM2 and CM3 Charge Models and Lo¨wdin Population Analysis in GAMESS. Users Manual; http://comp.chem.umn.edu/gamesol/gamesol_ Users_Manual_3.1.pdf, University of Minnesota: Minneapolis, MN, 2002 (accessed Dec 21, 2005). (44) Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. H.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S. J.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J. Comput. Chem. 1993, 14, 1347.

(45) GAMESS homepage, http://www.msg.ameslab.gov/GAMESS/ GAMESS.html, (accessed Dec 21, 2005). (46) Gill, P. E.; Murray, W.; Saunders, M. A. User’s Guide for Snopt Version 6, A Fortran Package for Large-Scale Nonlinear Programming; http://www.sbsi-sol-optimize.com, Stanford Business Software Inc.: Palo Alto, CA, 2002. (47) Brooke, A.; Kendrick, D.; Meeraus, A.; Raman, R. GAMS A User’s Guide; http://www.gams.com/docs/gams/GAMSUsersGuide.pdf, GAMS Development Corporation: Washington, DC, 2006. (48) Standard Performance Evaluation Corporation, SPEC., Gaithersburg, MD. http://www.spec.org, 2005 (accessed Dec 21, 2005). (49) Fletcher, D. A.; McMeeking, R. F.; Parkin, D. The United Kingdom Chemical Database Service, J. Chem. Inf. Comput. Sci. 1996, 36, 746. (50) Poling, B. E.; Prausnitz, J. M.; O’Connell, J. P. The Properties of Gases and Liquids, 5th ed.; McGraw-Hill: New York, 2001. (51) Gracin, S.; Rasmuson, A. C. Solubility of Phenylacetic Acid, p-Hydroxyphenylacetic Acid, p-Aminophenylacetic Acid, p-Hydroxybenzoic Acid and Ibuprofen in Pure Solvents. J. Chem. Eng. Data 2002, 47, 1379. (52) Sidgwick, N. V.; Ewbank, E. K. CVII. - The Influence of Position on the Solubilities of Substituted Benzoic Acids. J. Chem. Soc. 1921, 119, 979. (53) Lazzell, C. L.; Johnston, J. Solubility Relations of Isomeric Organic Compounds. VIII. Solubility of the Aminobenzoic Acids in Various Liquids. J. Phys. Chem. 1928, 32, 1331. (54) Berezina, L. P.; Ennan, A. A.; Gudimovich, T. F.; Sokhranenko, G. P. Solubility of Aminobenzenecarboxylic Acids in Solutions of Fluorosilicic Acid at 25 C. IzV. Vyssh. Uchebn. ZaVed., Khim. Khim. Tekhnol. 2000, 43, 56. (55) Lashanizadegan, A.; Newsham, D. M. T.; Tavare, N. S. Separation of Chlorobenzoic Acids by Dissociation Extractive Crystallization. Chem. Eng. Sci. 2001, 56, 2335. (56) Brown, R. L.; Stein, S. E. Boiling Point Data. In NIST Chemistry WebBook, NIST Standard Reference Database; Linstrom, P. J., Mallard, W. G., Eds.; Number 65; National Institute of Standards and Technology: 2005; http://webbook.nist.gov. (57) Granberg, R. A.; Rasmuson, A. C. Solubility of Paracetamol in Pure Solvents. J. Chem. Eng. Data 1999, 44, 1391. (58) Maranas, C. D. Optimal Molecular Design under Property Prediction Uncertainty. AIChE J. 1997, 43, 1250.

ReceiVed for reView April 3, 2005 ReVised manuscript receiVed October 30, 2005 Accepted November 22, 2005 IE050416R