Ind. Eng. Chem. Res. 2004, 43, 8354-8362
Solubility Modeling with a Nonrandom Two-Liquid Segment Activity Coefficient Model
Chau-Chyun Chen* and Yuhua Song
Aspen Technology, Inc., Ten Canal Park, Cambridge, Massachusetts 02141
A segment contribution activity coefficient model, derived from the polymer nonrandom two-liquid model, is proposed for fast, qualitative estimation of the solubilities of organic nonelectrolytes in common solvents. Conceptually, the approach accounts for the liquid nonideality of mixtures of complex pharmaceutical molecules and small solvent molecules in terms of interactions between three pairwise interacting conceptual segments: hydrophobic segment, hydrophilic segment, and polar segment. In practice, these conceptual segments become the molecular descriptors used to represent the molecular surface characteristics of each solute and solvent molecule. The treatment results in component-specific molecular parameters: hydrophobicity X, polarity Y, and hydrophilicity Z. Once the molecular parameters are identified from experimental data for common solvents and solute molecules, the model offers a simple and practical thermodynamic framework to estimate solubilities and to perform other phase equilibrium calculations in support of pharmaceutical process design.

Introduction

Solvent selection is a critical task in the chemical synthesis and recipe development phase of the pharmaceutical and agricultural chemical industries.1-3 The choice of solvents directly impacts reaction rates, extraction efficiency, crystallization yield, etc. Proper solvent selection results in faster product separation and purification, reduced solvent emissions and less waste, higher yield, lower overall cost, and better production processes. Solubility is a key property of concern in solvent selection because pharmaceutical product isolation is often done through crystallization at reduced temperature and/or with the addition of antisolvent. Solubility data for new drug molecules and their precursors in these solvents rarely exist.
Although limited solubility experiments are performed for a few solvents as part of the process development practice, the experimental task can multiply rapidly when one considers the choices of solvents and solvent-antisolvent mixtures, the effect of temperature, the impacts of impurities, the possibilities of multiple polymorphs, etc. As a result, solvent selection is largely dictated by researchers’ preferences or prior experiences.

Existing solubility estimation techniques are best represented by the Hansen model,4 the UNIFAC group contribution model,5 and perhaps the Abraham solvation model.6 Of the three, Hansen and UNIFAC are activity coefficient models that can be used for the estimation of solubilities in pure solvents and in solvent mixtures. Other popular activity coefficient models, such as van Laar, Wilson, nonrandom two-liquid (NRTL), or UNIQUAC, are not practical because use of these models requires the determination of binary interaction parameters from phase equilibrium data for each of the solute-solvent and solvent-solvent binary mixtures. Solute-solvent phase equilibrium data are rarely available to support the use of these activity coefficient models in pharmaceutical process design.

* To whom correspondence should be addressed. Tel.: (617) 949-1202. Fax: (617) 949-1030. E-mail: chauchyun.chen@aspentech.com.

The Hansen model is a correlative model. It requires experimental solubility data from which component-specific solubility parameters can be determined for the solutes. The UNIFAC model is a predictive model that requires only chemical structure information for the solutes and solvents. Unfortunately, although these models have shown limited utility for solubility estimation of chemicals with molecular weights in the low 100s of g/mol, prior investigators2 have found that, because of inherent assumptions in Hansen and UNIFAC, they are inadequate for estimating solubilities of large, complex organic molecules with molecular weights in the range of 200-600 g/mol. UNIFAC fails for systems with large complex molecules for which either the UNIFAC functional groups are undefined or the functional group additivity rule becomes invalid. Additionally, neither Hansen nor UNIFAC is applicable to electrolyte solutes, a major concern for the pharmaceutical industry because organic electrolytes account for the majority of drug compounds.

Recent developments in computational chemistry yielded COSMO-RS7 and COSMO-SAC,8 predictive models that represent promising alternatives to UNIFAC. Like UNIFAC, the current COSMO-RS-type models are not applicable to electrolyte solutes.

In this paper, we present the NRTL segment activity coefficient (NRTL-SAC) model as the thermodynamic framework for solubility modeling. The NRTL-SAC model is based on the polymer NRTL model,9 a derivative of the original NRTL model of Renon and Prausnitz.10 NRTL is one of the most successful molecular thermodynamic models in the chemical industry.
The model and its derivatives have been widely used to correlate and extrapolate phase behaviors of highly nonideal systems with chemicals, electrolytes, oligomers, polymers, surfactants, etc.9,11 We show that the NRTL-SAC model provides a simple and practical thermodynamic framework for chemists and engineers to perform solubility modeling in support of their pharmaceutical process design. While this paper focuses on modeling solubilities of organic nonelectrolytes, future work will extend the model to organic electrolytes.

10.1021/ie049463u CCC: $27.50 © 2004 American Chemical Society. Published on Web 12/15/2004
Solubility Modeling

The solubility of a solid organic nonelectrolyte can be described by the expression1,2

$$\ln x_I^{SAT} = \frac{\Delta_{fus}S}{R}\left(1 - \frac{T_m}{T}\right) - \ln \gamma_I^{SAT} \quad \text{for } T \le T_m \tag{1}$$

$$\Delta_{fus}S = \frac{\Delta_{fus}H}{T_m} \tag{2}$$

where $x_I^{SAT}$ is the mole fraction of the solute I dissolved in the solvent phase at saturation, $\Delta_{fus}S$ is the entropy of fusion of the solute, $\gamma_I^{SAT}$ is the activity coefficient of the solute in the solution at saturation, R is the gas constant, T is the temperature, and $T_m$ is the melting point of the solute. Given a polymorph, $\Delta_{fus}S$ and $T_m$ are fixed. At a fixed temperature, the solubility is only a function of the activity coefficient of the solute in the solution. Clearly, the activity coefficient of the solute in the solution plays a key role in determining the solubility. Equation 1 is a simplified expression for solubility. It ignores the contributions due to the difference between solid and liquid heat capacities at the melting point and due to the pressure correction. When the values of $\Delta_{fus}S$ and $T_m$ are not available, the solubility product constant, $K_{sp}$, can be introduced into eq 1 as an adjustable parameter for data regression:

$$\ln K_{sp} = \ln x_I^{SAT} + \ln \gamma_I^{SAT} \tag{3}$$

$K_{sp}$ corresponds to the ideal solubility of the solute.

NRTL Segment Activity Coefficient Model

The proposed NRTL segment activity coefficient model builds on the segment contribution concept that was first incorporated into the polymer NRTL model9 for systems with oligomers and polymers. In NRTL-SAC, the activity coefficient expression is written in two parts such that

$$\ln \gamma_I = \ln \gamma_I^{C} + \ln \gamma_I^{R} \tag{4}$$

where $\gamma_I^{C}$ and $\gamma_I^{R}$ are the combinatorial and residual contributions to the activity coefficient of component I. The residual part, $\gamma_I^{R}$, is set equal to the local composition (lc) interaction contribution, $\gamma_I^{lc}$, of the polymer NRTL as follows:

$$\ln \gamma_I^{R} = \ln \gamma_I^{lc} = \sum_m r_{m,I}\left[\ln \Gamma_m^{lc} - \ln \Gamma_m^{lc,I}\right] \tag{5}$$

We then compute the segment activity coefficient, $\Gamma_m$, from the NRTL equation.

$$\ln \Gamma_m^{lc} = \frac{\sum_j x_j G_{jm} \tau_{jm}}{\sum_k x_k G_{km}} + \sum_{m'} \frac{x_{m'} G_{mm'}}{\sum_k x_k G_{km'}} \left( \tau_{mm'} - \frac{\sum_j x_j G_{jm'} \tau_{jm'}}{\sum_k x_k G_{km'}} \right) \tag{6}$$

$$\ln \Gamma_m^{lc,I} = \frac{\sum_j x_{j,I} G_{jm} \tau_{jm}}{\sum_k x_{k,I} G_{km}} + \sum_{m'} \frac{x_{m',I} G_{mm'}}{\sum_k x_{k,I} G_{km'}} \left( \tau_{mm'} - \frac{\sum_j x_{j,I} G_{jm'} \tau_{jm'}}{\sum_k x_{k,I} G_{km'}} \right) \tag{7}$$

$$x_j = \frac{\sum_J x_J r_{j,J}}{\sum_I \sum_i x_I r_{i,I}} \tag{8}$$

$$x_{j,I} = \frac{r_{j,I}}{\sum_i r_{i,I}} \tag{9}$$

where i, j, k, m, and m′ are the segment-based species indices, I and J are the component indices, $x_j$ is the segment-based mole fraction of segment species j, $x_J$ is the mole fraction of component J, $r_{m,I}$ is the number of segment species m contained in component I, $\Gamma_m^{lc}$ is the activity coefficient of segment species m, and $\Gamma_m^{lc,I}$ is the activity coefficient of segment species m contained only in component I. G and τ in eqs 6 and 7 are local binary quantities related to each other by the NRTL nonrandomness factor parameter α:

$$G = \exp(-\alpha \tau) \tag{10}$$

Equation 5 is a general form for the local composition interaction contribution to activity coefficients of components in the NRTL-SAC model. For monosegment solvent components (S), eq 5 can be simplified and reduced to the classical NRTL model as follows:

$$\ln \gamma_{I=S}^{lc} = \sum_m r_{m,S}\left[\ln \Gamma_m^{lc} - \ln \Gamma_m^{lc,S}\right] \tag{11}$$

with

$$r_{m,S} = 1 \tag{12}$$

$$\ln \Gamma_m^{lc,S} = 0 \tag{13}$$

Therefore

$$\ln \gamma_{I=S}^{lc} = \frac{\sum_j x_j G_{jS} \tau_{jS}}{\sum_k x_k G_{kS}} + \sum_m \frac{x_m G_{Sm}}{\sum_k x_k G_{km}} \left( \tau_{Sm} - \frac{\sum_j x_j G_{jm} \tau_{jm}}{\sum_k x_k G_{km}} \right) \tag{14}$$

$$G_{jS} = \exp(-\alpha_{jS} \tau_{jS}) \tag{15}$$

$$G_{Sj} = \exp(-\alpha_{jS} \tau_{Sj}) \tag{16}$$

Equation 14 is the same as the classical NRTL model.10 The combinatorial part, $\gamma_I^{C}$, is calculated from the Flory-Huggins term:

$$\ln \gamma_I^{C} = \ln \frac{\phi_I}{x_I} + 1 - r_I \sum_J \frac{\phi_J}{r_J} \tag{17}$$

$$r_I = \sum_i r_{i,I} \tag{18}$$

$$\phi_I = \frac{r_I x_I}{\sum_J r_J x_J} \tag{19}$$
where $r_I$ and $\phi_I$ are the total segment number and segment mole fraction of component I, respectively.

Conceptual Segment Contribution Concept. The essence of NRTL-SAC resides in its use of the conceptual segment contribution concept. While UNIFAC decomposes molecules into a large set of predefined functional groups based on the chemical structure, NRTL-SAC maps molecules into a few predefined conceptual segments, or molecular descriptors, based on expressed characteristics of molecular interactions in solutions. Specifically, for each solute and solvent molecule, NRTL-SAC describes their effective surface interactions in terms of three types of conceptual segments: hydrophobic segment, polar segment, and hydrophilic segment. Equivalent numbers of the conceptual segments for each molecule are measures of the effective surface areas of the molecule that exhibit surface interaction characteristics of hydrophobicity (X), polarity (Y), and hydrophilicity (Z). These molecular measures, i.e., X, Y, and Z, are to be determined not from the molecular structure but from the interaction characteristics of the molecules in solution as expressed in their experimental phase equilibrium data.

The pairwise segment-segment interaction characteristics of these conceptual segments are represented by their corresponding binary NRTL parameters. The determination of these binary NRTL parameters is discussed in the next section. Given the NRTL parameters for the pairwise segment-segment interactions and the molecular measures (X, Y, and Z) for the molecules, we apply eqs 4-9 to compute activity coefficients for the segments and the molecules in solution. In other words, the phase behavior of the mixtures is accounted for based on the segment compositions of the molecules and their pairwise segment-segment interactions.

The conceptual segment contribution approach represents a practical alternative to the UNIFAC functional group contribution approach.
This approach is suitable for use in the industrial practice of carrying out measurements for a few selected solvents and then using a model to quickly predict other solvents or solvent mixtures and to generate a list of suitable solvent systems. The NRTL-SAC model aims to provide such a thermodynamic framework. With NRTL-SAC, available experimental data are used to identify molecular parameters for the solutes, and the model is used to extrapolate to other solvent systems that are also described in terms of the same set of molecular descriptors.

Conceptual Segments and NRTL Binary Parameters. Three conceptual segments are initially identified for nonelectrolyte molecules: hydrophobic segment, polar segment, and hydrophilic segment. Additional conceptual segments may be introduced when we expand the scope to cover organic electrolytes, charged molecules, zwitterions, etc. To enhance the usability of NRTL-SAC, the choice of conceptual segments is meant to be a minimal set rather than a comprehensive set. These conceptual segments are chosen to simulate the interaction characteristics of representative molecular surfaces that significantly contribute to the liquid-phase nonideality of real molecules. Here the hydrophilic segment simulates polar molecular surfaces that are “hydrogen bond donor or acceptor.” As such, it represents molecular surfaces with the tendency to form a hydrogen bond. The hydrophobic segment simulates molecular surfaces with an aversion to forming a hydrogen bond. The polar segment simulates polar molecular surfaces that are “electron pair donor or acceptor.” While the hydrophobic and hydrophilic segments have strong and clear physical meanings and unique contributions to the liquid-phase nonideality, in our drive to minimize the number of conceptual segments and for practical purposes, we lumped all other surface interactions into the “polar” segment.
With the conceptual segments identified, real molecules are then selected as reference molecules for the conceptual segments, and available phase equilibrium data of these reference molecules are used to identify NRTL binary parameters for the conceptual segments. In choosing the reference molecules, we prefer those molecules with distinct molecular characteristics (i.e., hydrophobic, hydrophilic, or polar) and with abundant, publicly available phase equilibrium data. We focus our study on the 59 solvents reviewed for use in pharmaceutical process design by the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).12 We also consider water, triethylamine, and n-octanol in this study because they are used extensively in pharmaceutical processes. Additional solvents can be considered in the future.

Table 1 shows these 62 solvents and their molecular characteristics. Hydrocarbon solvents (aliphatic or aromatic), halogenated hydrocarbons, and ethers are mainly hydrophobic. Ketones, esters, and amides are both hydrophobic and polar. Alcohols, glycols, and amines may have both substantial hydrophilicity and hydrophobicity. Acids are “complex” molecules, exhibiting hydrophilicity, polarity, and hydrophobicity. Also shown in Table 1 are the available NRTL binary parameters for various solvent-water and solvent-hexane binary systems. We obtained these binary parameters by fitting the available data compiled by DECHEMA for phase equilibrium at or around room temperature. We deliberately ignore the temperature dependency of these parameters because
Table 1. NRTL Binary Parameters for Common Solvents in Pharmaceutical Process Design
solvent (component 1): acetic acid; acetone; acetonitrile; anisole; benzene; 1-butanol; 2-butanol; n-butyl acetate; methyl tert-butyl ether; carbon tetrachloride; chlorobenzene; chloroform; cumene; cyclohexane; 1,2-dichloroethane; 1,1-dichloroethylene; 1,2-dichloroethylene; dichloromethane; 1,2-dimethoxyethane; N,N-dimethylacetamide; N,N-dimethylformamide; dimethyl sulfoxide; 1,4-dioxane; ethanol; 2-ethoxyethanol; ethyl acetate; ethylene glycol; diethyl ether; ethyl formate; formamide; formic acid; n-heptane; n-hexane; isobutyl acetate; isopropyl acetate; methanol; 2-methoxyethanol; methyl acetate; 3-methyl-1-butanol; methyl butyl ketone; methylcyclohexane; methyl ethyl ketone; methyl isobutyl ketone; isobutyl alcohol; N-methyl-2-pyrrolidone; nitromethane; n-pentane; 1-pentanol; 1-propanol; isopropyl alcohol; n-propyl acetate; pyridine; sulfolane; tetrahydrofuran; 1,2,3,4-tetrahydronaphthalene; toluene; 1,1,1-trichloroethane; trichloroethylene; m-xylene; water; triethylamine; 1-octanol
τ12a
τ21a
τ12b
1.365 0.880 1.834
0.797 0.935 1.643
2.445 0.806 0.707
-1.108 1.244 1.787
1.490 -0.113 -0.165
-0.614 2.639 2.149
-0.148 1.309 0.884 1.121
0.368 -0.850 -0.194 -0.424
0.269 -0.168 1.430 1.534
2.870 3.021 2.131 4.263
-0.824 1.576
1.054 -0.138
0.589
0.325
1.245
1.636
1.246 0.533 -0.319 0.771
0.097 2.192 2.560 0.190
-0.940
1.400
-0.414
0.398
1.478
1.155
0.062
2.374
1.412 -0.036
-1.054 1.273
0.021 -0.583
2.027 3.270
0.496 -0.320 0.049 0.657
-0.523 2.567 2.558 1.099
-0.665
1.664
0.631 1.134 -0.869 0.535 1.026
1.981 -0.631 1.292 -0.197 -0.560
10.949 -0.908 -0.888
6.547 1.285 3.153
τ21b
3.207
4.284
1.983 0.450 -0.564 -1.167 -2.139 1.003 -0.024 -1.593
3.828 1.952 1.109 2.044 0.955 1.010 1.597 1.853
1.380
-1.660
τ12c
τ21c
3.692 -2.157 -1.539
5.977 5.843 5.083
5.314 4.013 3.587
7.369 7.026 4.954
6.012 2.833
9.519 4.783
0.508
3.828
1.612
3.103
-0.340
-1.202
6.547
10.949
6.547
10.949
0.103 1.389 0.715 -0.042
0.396 -0.566 2.751 3.029
-0.598
5.680
0.823 0.977 0.592 -0.235 1.968
2.128 4.868 2.702 0.437 2.556
-0.769
3.883
-1.479
5.269
-0.029 0.197 0.079 1.409 -0.990 1.045 1.773
3.583 2.541 2.032 2.571 3.146 0.396 0.563 4.241
7.224
-0.169 0.301
4.997 8.939
1.200
1.763
solvent characteristics complex polar polar hydrophobic hydrophobic hydrophobic/hydrophilic hydrophobic/hydrophilic hydrophobic/polar hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic polar polar polar polar polar polar hydrophobic/hydrophilic hydrophobic/hydrophilic hydrophobic/polar hydrophilic hydrophobic polar complex complex hydrophobic hydrophobic polar polar hydrophobic/hydrophilic hydrophobic/hydrophilic polar hydrophobic/hydrophilic hydrophobic/polar polar hydrophobic/polar hydrophobic/polar hydrophobic/hydrophilic hydrophobic polar hydrophobic hydrophobic/hydrophilic hydrophobic/hydrophilic hydrophobic/hydrophilic hydrophobic/polar polar polar polar hydrophobic hydrophobic hydrophobic hydrophobic hydrophobic hydrophilic hydrophobic/polar hydrophobic/hydrophilic
a NRTL binary τ parameters for various solvent-hexane systems. The NRTL nonrandomness factor parameter, α, is fixed at 0.2. In these binary systems, the solvent is component 1 and hexane is component 2. The τ values were determined from available VLE and LLE data.
b NRTL binary τ parameters for various solvent-water systems. The NRTL nonrandomness factor parameter, α, is fixed at 0.3. In these binary systems, the solvent is component 1 and water is component 2. The τ values were determined from available VLE data.
c NRTL binary τ parameters for various solvent-water systems. The NRTL nonrandomness factor parameter, α, is fixed at 0.2. In these binary systems, the solvent is component 1 and water is component 2. The τ values were determined from available LLE data.
these parameters are reported here only to illustrate ranges of values for these binary parameters.

Table 1 shows that all hydrophobic solvents (1) exhibit similar repulsive interactions with water (2) and both τ12 and τ21 are large positive values for the solvent-water binaries. When the hydrophobic solvents also carry significant hydrophilic or polar characteristics, we see that τ12 becomes negative while τ21 retains a large positive value.

Interestingly, we see similar repulsive, but weaker, interactions between the polar solvent (1) and hexane (2), a representative hydrophobic solvent. Both τ12 and τ21 are small but positive values for the solvent-hexane binaries. On the other hand, the interactions between hydrophobic solvents and hexane are weak, and the corresponding NRTL binary parameters are around or less than unity, characteristic of nearly ideal solutions.

The interactions between polar solvents (1) and water (2) are more subtle. While all τ21 are positive, τ12 can be positive or negative. Apparently, different polar molecules exhibit different interactions, some repulsive and others attractive, with hydrophilic molecules. For example, acetonitrile and acetone are hydrogen bond acceptors, and they form hydrogen bonds with water. Both τ12 and τ21 are positive for the acetone-water and acetonitrile-water binaries. In contrast, dimethyl sulfoxide is a compound with excellent solvation capacity and a high dielectric constant (48.75 at 25 °C); τ12 is negative and τ21 is positive for the dimethyl sulfoxide-water binary.

Hexane and water are the obvious choices as the reference molecules for the hydrophobic and hydrophilic segments, respectively. The selection of the reference molecule for the polar segment requires attention to the wide variations of interactions between polar molecules and water. Ultimately, we choose acetonitrile as a representative of polar molecules, and we introduce a mechanism to tune the way we characterize the polar segment. The chosen values for the NRTL binary interaction parameters, α and τ, for the three conceptual segments are summarized in Table 2. As mentioned earlier, we ignore the temperature dependency of the binary parameters.

Table 2. NRTL Binary Parameters for Conceptual Segments in NRTL-SAC
segment 1    segment 2    τ12       τ21       α12 = α21
X            Y-           1.643     1.834     0.2
X            Z            6.547     10.949    0.2
Y-           Z            -2.000    1.787     0.3
Y+           Z            2.000     1.787     0.3
X            Y+           1.643     1.834     0.2

The binary parameters for the hydrophobic segment X (1)-hydrophilic segment Z (2) are determined from available liquid-liquid equilibrium (LLE) data of the hexane-water binary mixture (see Table 1). We fix α at 0.2 because it is the customary value of α for systems that exhibit liquid-liquid separation. Here both τ12 and τ21 are large positive values (6.547 and 10.949).
They highlight the strong repulsive nature of the interactions between the hydrophobic and hydrophilic segments.

The binary parameters for the hydrophobic segment X (1)-polar segment Y (2) are determined from available LLE data of the hexane-acetonitrile binary mixture (see Table 1). Again, we fix α at 0.2. Both τ12 and τ21 are small positive values (1.643 and 1.834). They highlight the weak repulsive nature of the interactions between the hydrophobic and polar segments.

The binary parameters for the polar segment Y (1)-hydrophilic segment Z (2) are determined from available vapor-liquid equilibrium (VLE) data of the acetonitrile-water binary mixture (see Table 1). We fix α at 0.3 for the hydrophilic segment-polar segment pair because this binary does not exhibit liquid-liquid separation. We fix τ21 at a positive value (1.787), and we allow τ12 to vary between -2 and +2 to reflect the fact that the interaction between the polar molecule and water can be negative or positive as shown in Table 1. In practice, this is achieved by allowing for two types of polar segments, Y- and Y+. For the Y- polar
segment, the values of τ12 and τ21 are -2 and +1.787, respectively. For the Y+ polar segment, they are +2 and +1.787, respectively. Note that both the Y- and Y+ polar segments exhibit the same repulsive interactions with hydrophobic segments as those discussed in the previous paragraph. Also, an ideal solution is assumed for the Y- polar segment and Y+ polar segment binary, i.e., τ12 = τ21 = 0.

We understand that the treatment above is somewhat arbitrary and that it reflects only our own limited molecular insights at this time. However, the treatment is designed to capture the general trends of the NRTL binary parameters that we have observed for systems with hydrophobic, polar, and hydrophilic molecules. Further investigation may bring improved treatments.

Molecular Parameters for Solvents. The application of NRTL-SAC requires an extensive databank of molecular parameters for common solvents used in the industry. As mentioned earlier, we focus on the common solvents used in the pharmaceutical industry.12 For each solvent, there can be up to four molecular parameters, i.e., X, Y-, Y+, and Z. Because these molecular parameters represent certain pairwise surface interaction characteristics, often only one or two molecular parameters are needed for most solvents. For example, alkanes are hydrophobic and are well represented with hydrophobicity, X, alone. Alcohols are hybrids of hydrophobic and hydrophilic segments and are primarily represented with X and Z. Ketones, esters, and ethers are polar molecules with varying degrees of hydrophobic content. They are well represented by X and the Y’s.

Determination of solvent molecular parameters involves regression of available experimental VLE or LLE data for binary systems of the solvent and the above-mentioned reference molecules (i.e., hexane, acetonitrile, and water) or their substitutes. Solvent molecular parameters X, Y-, Y+, and Z are the adjustable parameters in the regression.
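With the segment interaction parameters of Table 2 fixed, eqs 4-9 and 17-19 can be evaluated for any set of molecular parameters. The following is a minimal Python sketch of that evaluation, not the authors' implementation. The function names, the list ordering [X, Y-, Y+, Z], the damped fixed-point solver for eq 3, and the example inputs (unit segment numbers for hexane and water, and ln Ksp = -2.0) are all assumptions made for illustration.

```python
import math

R_GAS = 8.314462618  # gas constant, J/(mol K)
SEGMENTS = ("X", "Y-", "Y+", "Z")
N = len(SEGMENTS)

# Table 2: (segment 1, segment 2) -> (tau12, tau21, alpha12 = alpha21).
TABLE_2 = {
    ("X", "Y-"): (1.643, 1.834, 0.2),
    ("X", "Z"): (6.547, 10.949, 0.2),
    ("Y-", "Z"): (-2.000, 1.787, 0.3),
    ("Y+", "Z"): (2.000, 1.787, 0.3),
    ("X", "Y+"): (1.643, 1.834, 0.2),
}

# Unlisted pairs (including Y-/Y+) keep tau = 0 and G = 1, i.e. ideal mixing.
TAU = [[0.0] * N for _ in range(N)]
G = [[1.0] * N for _ in range(N)]
for (a, b), (t12, t21, alpha) in TABLE_2.items():
    i, j = SEGMENTS.index(a), SEGMENTS.index(b)
    TAU[i][j], TAU[j][i] = t12, t21
    G[i][j], G[j][i] = math.exp(-alpha * t12), math.exp(-alpha * t21)  # eq 10


def ln_segment_gamma(xs):
    """Eq 6 for segment mole fractions xs (eq 7 if xs is a pure-component composition)."""
    s = [sum(xs[k] * G[k][m] for k in range(N)) for m in range(N)]
    a = [sum(xs[j] * G[j][m] * TAU[j][m] for j in range(N)) for m in range(N)]
    return [
        a[m] / s[m]
        + sum(xs[p] * G[m][p] / s[p] * (TAU[m][p] - a[p] / s[p]) for p in range(N))
        for m in range(N)
    ]


def ln_gamma(x, r):
    """ln(gamma) per component; x = mole fractions (sum to 1), r = [X, Y-, Y+, Z] per component."""
    r_tot = [sum(ri) for ri in r]                                    # eq 18
    denom = sum(xi * rt for xi, rt in zip(x, r_tot))
    xs = [sum(xi * ri[j] for xi, ri in zip(x, r)) / denom for j in range(N)]  # eq 8
    ln_g_mix = ln_segment_gamma(xs)
    result = []
    for ri, rt in zip(r, r_tot):
        ln_g_pure = ln_segment_gamma([rj / rt for rj in ri])         # eqs 7 and 9
        residual = sum(ri[m] * (ln_g_mix[m] - ln_g_pure[m]) for m in range(N))  # eq 5
        # Eq 17, using phi_I / x_I = r_I / denom and sum_J phi_J / r_J = 1 / denom.
        combinatorial = math.log(rt / denom) + 1.0 - rt / denom
        result.append(residual + combinatorial)                      # eq 4
    return result


def ln_ksp_from_fusion(delta_s_fus, t_melt, temperature):
    """Ideal-solubility term of eq 1: (delta_fus S / R) * (1 - Tm / T)."""
    return delta_s_fus / R_GAS * (1.0 - t_melt / temperature)


def solubility(ln_ksp, r_solute, r_solvent, iterations=200):
    """Solve eq 3, ln x + ln gamma(x) = ln Ksp, by damped fixed-point iteration."""
    ln_x = min(ln_ksp, -1e-12)
    for _ in range(iterations):
        x = math.exp(ln_x)
        ln_g = ln_gamma([x, 1.0 - x], [r_solute, r_solvent])[0]
        ln_x = 0.5 * ln_x + 0.5 * min(ln_ksp - ln_g, -1e-12)
    return math.exp(ln_x)


# Illustration only: unit segment numbers are assumed for the reference molecules.
hexane = [1.0, 0.0, 0.0, 0.0]
water = [0.0, 0.0, 0.0, 1.0]
print(ln_gamma([0.3, 0.7], [hexane, water]))
print(solubility(-2.0, hexane, water))  # ln Ksp = -2.0 is a made-up value
```

For two monosegment components the combinatorial term vanishes and the result coincides with the classical binary NRTL expression, which gives a convenient check on the bookkeeping. The simple damped iteration is adequate for sparingly soluble cases; a bracketing root finder may be a safer choice when the solute is highly soluble.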
If binary data are lacking for the solvent with the reference molecules, data for other binaries may be used as long as the molecular parameters for the substitute reference molecules are already identified.

Table 3 lists the molecular parameters identified for the 62 solvents. We used the VLE or LLE data taken at or around room temperature and available in the DECHEMA database. Among the ICH solvents, because of the lack of sufficient experimental binary phase equilibrium data, we are less comfortable with the molecular parameters identified for anisole, cumene, 1,2-dichloroethylene, 1,2-dimethoxyethane, N,N-dimethylacetamide, dimethyl sulfoxide, ethyl formate, isobutyl acetate, isopropyl acetate, methyl butyl ketone, tetralin, and trichloroethylene. In fact, we were not able to locate any public data for methyl butyl ketone (2-hexanone) and, therefore, its molecular parameters were set to be the same as those for methyl isobutyl ketone.

The NRTL-SAC model with the molecular parameters does qualitatively capture the interaction characteristics of the solvent mixtures and the resulting phase equilibrium behavior. As an example, Figures 1-3 show the binary phase diagrams for the water-1,4-dioxane-octanol system. We compare the predictions from the NRTL model with the binary parameters in Table 1 to the predictions from the NRTL-SAC model with the molecular parameters in Table 3. The predictions with the NRTL-SAC model are broadly consistent with the
Table 3. NRTL-SAC Molecular Parameters for Common Solvents
solvent name
X
Y-
Y+
Z
acetic acid acetone acetonitrile anisole benzene 1-butanol 2-butanol n-butyl acetate methyl tert-butyl ether carbon tetrachloride chlorobenzene chloroform cumene cyclohexane 1,2-dichloroethane 1,1-dichloroethylene 1,2-dichloroethylene dichloromethane 1,2-dimethoxyethane N,N-dimethylacetamide N,N-dimethylformamide dimethyl sulfoxide 1,4-dioxane ethanol 2-ethoxyethanol ethyl acetate ethylene glycol diethyl ether ethyl formate formamide formic acid n-heptane n-hexane isobutyl acetate isopropyl acetate methanol 2-methoxyethanol methyl acetate 3-methyl-1-butanol methyl butyl ketone methylcyclohexane methyl ethyl ketone methyl isobutyl ketone isobutyl alcohol N-methyl-2-pyrrolidone nitromethane n-pentane 1-pentanol 1-propanol isopropyl alcohol n-propyl acetate pyridine sulfolane tetrahydrofuran 1,2,3,4-tetrahydronaphthalene toluene 1,1,1-trichloroethane trichloroethylene m-xylene water triethylamine 1-octanol
0.045 0.131 0.018 0.722 0.607 0.414 0.335 0.317 1.040 0.718 0.710 0.278 1.208 0.892 0.394 0.529 0.188 0.321 0.081 0.067 0.073 0.532 0.154 0.256 0.071 0.322
0.164 0.109 0.131
0.157 0.513 0.883
0.217
0.448 0.257 0.707 1.340 1.000 1.660 0.552 0.088 0.052 0.236 0.419 0.673 1.162 0.247 0.673 0.566 0.197 0.025 0.898 0.474 0.375 0.351 0.514 0.205 0.210 0.235 0.443 0.604 0.548 0.426 0.758
0.190 0.007 0.082 0.030 0.219
0.194 0.030 0.564 2.890 0.086 0.081 0.318 0.049 0.141 0.041 0.089 2.470
0.154 0.149 0.043 0.224 0.036 0.224
0.485 0.355 0.330 0.172 0.141 0.424 0.039 0.541 0.691 0.208 0.832 1.262 0.858 0.157 0.372
Figure 1. Txy phase diagram for a water-1,4-dioxane mixture at atmospheric pressure.
0.401 0.507 0.237 0.421 0.338 0.165 0.280 0.341
0.108 0.498 0.027 0.251 0.337 0.538 0.469 0.251 0.480 0.469 0.067
0.322
0.252
0.562 0.560
Figure 2. Txxy phase diagram for a water-octanol mixture at atmospheric pressure.
0.314
0.485 0.305
1.216 0.223 0.030 0.070 0.134 0.135
0.426
0.040 0.555
0.320
0.003 0.587 0.174
0.248 0.511 0.353 0.457
0.304 0.287 0.285 0.021
Figure 3. Txy phase diagram for an octanol-1,4-dioxane mixture at atmospheric pressure.
0.316 1.000
0.557 0.766
0.105 0.032
0.624
0.335
calculations from the NRTL model that are generally understood to represent experimental data within engineering accuracy.

Model Applications

To test the usability of NRTL-SAC with solubility modeling of pharmaceuticals, we apply the model to aspirin with the room-temperature solubility data compiled by Frank et al.2 Of the 23 solvents in Frank et al.’s compilation, we focus on the 14 solvents for which we have molecular parameters available in Table 3. We first fit the aspirin solubility data for all 14 solvents. The regression results are shown in Table 4 and Figure 4. The regressed molecular parameters for aspirin are given in Table 5. With NRTL-SAC, the root-mean-square (rms) error in ln x, $[\sum_i^N (\ln x_i^{exp} - \ln x_i^{cal})^2/N]^{1/2}$, for the fit is 0.506 (here x is the solubility of the solute, i.e., mole fraction, and N is the number of data used in the correlations). Acetic acid, a strong proton donor, is the outlier in this case. With acetic acid removed, the rms error in ln x for the fit drops significantly to 0.362. While there is room for further optimization of NRTL-
Table 4. Solubility of Aspirin at Room Temperature(a)
(all solubility columns after wt % are mole fractions)

solvent                  wt %     literature       NRTL-SAC         NRTL-SAC         UNIFAC           Hansen
                                                                    (four solvents)b
methanol                 33       8.053 × 10^-2    7.950 × 10^-2    8.053 × 10^-2    7.722 × 10^-2    4.256 × 10^-2
acetone                  29       1.163 × 10^-1    1.084 × 10^-1    1.163 × 10^-1    8.782 × 10^-2    7.892 × 10^-2
ethanol                  20       6.007 × 10^-2    3.907 × 10^-2    3.208 × 10^-2    1.606 × 10^-2    4.643 × 10^-2
1,4-dioxane              19       1.029 × 10^-1    1.130 × 10^-1    1.204 × 10^-1    5.699 × 10^-2    1.997 × 10^-2
acetic acid              12       4.347 × 10^-2    1.709 × 10^-1    1.670 × 10^-1    9.522 × 10^-2    9.053 × 10^-2
methyl ethyl ketone      12       5.174 × 10^-2    4.838 × 10^-2    5.016 × 10^-2    6.596 × 10^-2    5.642 × 10^-2
2-propanol               10       5.924 × 10^-2    3.257 × 10^-2    2.903 × 10^-2    2.897 × 10^-2    7.174 × 10^-2
isoamyl alcohol          10       5.155 × 10^-2    4.552 × 10^-2    4.195 × 10^-2    1.490 × 10^-2    5.155 × 10^-2
chloroform               6        4.057 × 10^-2    4.547 × 10^-2    4.057 × 10^-2    9.735 × 10^-2    3.369 × 10^-2
diethyl ether            5        2.119 × 10^-2    1.127 × 10^-2    9.081 × 10^-3    1.685 × 10^-2    2.558 × 10^-2
n-octanol                3        2.186 × 10^-2    2.491 × 10^-2    2.015 × 10^-2    1.453 × 10^-2    3.664 × 10^-2
1,2-dichloroethane       3        1.670 × 10^-2    1.352 × 10^-2    1.232 × 10^-2    3.969 × 10^-2    2.809 × 10^-2
1,1,1-trichloroethane    0.5      3.706 × 10^-3    2.743 × 10^-3    2.001 × 10^-3    3.750 × 10^-2    2.238 × 10^-2
cyclohexane              0.005    2.335 × 10^-5    4.962 × 10^-5    2.335 × 10^-5    9.351 × 10^-4    4.695 × 10^-3

a Literature data, UNIFAC prediction results, and Hansen correlation results are taken from Frank et al.2
b The four representative solvents are acetone, cyclohexane, methanol, and chloroform.
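The rms error in ln x quoted for the aspirin fits can be recomputed directly from the mole-fraction columns of Table 4. The short Python sketch below does this; the function and variable names are illustrative, and the data are simply the Table 4 columns in row order.

```python
import math

# Aspirin mole-fraction solubilities copied from Table 4 (14 solvents, row order).
X_LITERATURE = [8.053e-2, 1.163e-1, 6.007e-2, 1.029e-1, 4.347e-2, 5.174e-2, 5.924e-2,
                5.155e-2, 4.057e-2, 2.119e-2, 2.186e-2, 1.670e-2, 3.706e-3, 2.335e-5]
X_NRTL_SAC = [7.950e-2, 1.084e-1, 3.907e-2, 1.130e-1, 1.709e-1, 4.838e-2, 3.257e-2,
              4.552e-2, 4.547e-2, 1.127e-2, 2.491e-2, 1.352e-2, 2.743e-3, 4.962e-5]
X_NRTL_SAC_4 = [8.053e-2, 1.163e-1, 3.208e-2, 1.204e-1, 1.670e-1, 5.016e-2, 2.903e-2,
                4.195e-2, 4.057e-2, 9.081e-3, 2.015e-2, 1.232e-2, 2.001e-3, 2.335e-5]
X_UNIFAC = [7.722e-2, 8.782e-2, 1.606e-2, 5.699e-2, 9.522e-2, 6.596e-2, 2.897e-2,
            1.490e-2, 9.735e-2, 1.685e-2, 1.453e-2, 3.969e-2, 3.750e-2, 9.351e-4]
X_HANSEN = [4.256e-2, 7.892e-2, 4.643e-2, 1.997e-2, 9.053e-2, 5.642e-2, 7.174e-2,
            5.155e-2, 3.369e-2, 2.558e-2, 3.664e-2, 2.809e-2, 2.238e-2, 4.695e-3]


def rms_error_ln_x(x_exp, x_cal):
    """Root-mean-square error in ln x over paired experimental and calculated values."""
    squares = [(math.log(e) - math.log(c)) ** 2 for e, c in zip(x_exp, x_cal)]
    return math.sqrt(sum(squares) / len(squares))


for name, column in [("NRTL-SAC", X_NRTL_SAC), ("NRTL-SAC (four solvents)", X_NRTL_SAC_4),
                     ("UNIFAC", X_UNIFAC), ("Hansen", X_HANSEN)]:
    print(f"{name}: rms error in ln x = {rms_error_ln_x(X_LITERATURE, column):.3f}")
```

Because the error is measured in ln x, a sparingly soluble case such as cyclohexane carries as much weight as a highly soluble one, which is why the UNIFAC and Hansen columns score noticeably worse on this metric.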
Table 5. NRTL-SAC Molecular Parameters for Solutes solute
MW
no. of solvents
T (K)
X
aspirin aspirin p-aminobenzoic acid benzoic acid camphor ephedrine lidocaine methylparaben testosterone theophylline estriol estrone morphine piroxicam hydrocortisone haloperidol
180.16 180.16 137.14 122.12 152.23 165.23 234.33 152.14 288.41 180.18 288.38 270.37 285.34 331.35 362.46 375.86
14 4 7 7 7 7 7 7 7 7 9a 12 6 14b 11c 13d
298.15 298.15 298.15 298.15 298.15 298.15 298.15 298.15 298.15 298.15 298.15 298.15 308.15 298.15 298.15 298.15
0.103 0.039 0.218 0.524 0.604 0.458 0.698 0.479 1.051
Y-
0.853 0.499 0.773 0.665 0.401 0.827
0.681 0.089 0.124 0.068 0.596 0.484 0.771 0.757
Y+
Z
ln Ksp
rms error in ln x
1.160 1.372 1.935 0.450 0.478
0.777 0.799 0.760 0.405
-2.630 -2.582 -2.861 -1.540 -0.593 -0.296 -0.978 -2.103 -3.797 -6.110 -7.652 -6.531 -4.658 -7.656 -6.697 -4.398
0.506 0.533e 0.284 0.160 0.092 0.067 0.027 0.120 0.334 0.661 0.608 0.519 1.007 0.665 0.334 0.311
0.679
0.293 1.218 0.233 1.208 0.291 1.521
0.970
1.803 1.248
0.193 0.172 0.683 0.669 0.341 1.928 0.196 1.811 0.169 0.611 0.131
a With THF excluded. b With 1,2-dichloroethane, chloroform, diethyl ether, and DMF excluded. c With hexane excluded. d With chloroform and DMF excluded. e rms error for all 14 solvents.

Figure 4. NRTL-SAC results for aspirin solubility at 298.15 K. Solubility data2 for all 14 solvents are fit simultaneously with NRTL-SAC.

Figure 5. NRTL-SAC results for aspirin solubility at 298.15 K. Solubility data2 for 4 solvents are fit with NRTL-SAC, while the other 10 are predicted.

SAC including both molecular descriptors and parameters, the results are considered to be very satisfactory.

To test the predictive capability of NRTL-SAC, we also fit the aspirin solubility data using only four representative solvents (acetone as the polar solvent, cyclohexane and chloroform as the hydrophobic solvents, and methanol as the hydrophilic solvent) and then use the identified molecular parameters to estimate the aspirin solubilities in the other 10 solvents. As shown in Table 5, the molecular parameters for
aspirin change only slightly. Likewise, the rms error in ln x for all 14 solvents increases only slightly, from 0.506 to 0.533. The comparison of experimental data vs computed solubilities is given in Figure 5, which shows a quality of fit similar to that in Figure 4. In other words, the molecular parameters are found to be relatively independent of the number of solvents used, as long as proper representative solvents (hydrophobic, hydrophilic, and polar) are included. This study with aspirin and other similar studies suggest that the NRTL-SAC molecular descriptors are good representations of molecular surface interaction characteristics and that the solvent molecules used to identify molecular parameters for the solute can be thought of as molecular “sensors” that elucidate the surface interaction characteristics of the solute molecule in solution. These molecular “sensors” probe and express the solute-solvent interactions in terms of binary phase equilibrium data, i.e., solubility. Note that, during the data regression, all experimental solubility data, regardless of their order of magnitude, were assigned a standard deviation of 20%.

In addition to the experimental data and the NRTL-SAC results for aspirin at room temperature, Table 4 also includes the UNIFAC prediction results and the Hansen correlation results reported by Frank et al.2 To verify the UNIFAC predictions and the Hansen correlations, we reproduced the results of Frank et al. With UNIFAC and Hansen, the rms errors in ln x for the 14 solvents are 1.352 and 1.600, respectively. Figures 6 and 7 compare the experimental data with the solubilities computed with UNIFAC and Hansen. The outliers could be attributed to either “poor” experimental data or “poor” model representations. Given that the NRTL-SAC results are clearly superior to those of UNIFAC and Hansen, the results illustrate the relative inability of UNIFAC and Hansen to capture solvent effects on the solubility of aspirin.

Figure 6. UNIFAC results for aspirin solubility at 298.15 K. Solubility data2 for all 14 solvents are predicted with UNIFAC.

Figure 7. Hansen correlation results for aspirin solubility at 298.15 K. Solubility data2 for all 14 solvents are fit simultaneously with Hansen.

The data compilation of Marrero and Abildskov13 provides a good source of public solubility data for large, complex chemicals. To further test NRTL-SAC, we first extract and test the solubility data for the eight molecules reported by Lin and Nash.14 We also test the model against six additional molecules with sizable solubility data sets. We apply the model with the solvents that are included in Table 3. The molecular parameters determined for the solutes and the rms errors in ln x for the fits are summarized in Table 5. A good representation of the solubility data is obtained. The average rms error in ln x for the 14 solutes (aspirin excluded) summarized in Table 5 is 0.37. It corresponds to ±45% accuracy in solubility predictions. Certainly, the quality of the fit reflects both the effectiveness of NRTL-SAC and the quality of the molecular parameters identified from the limited available experimental data for the solvents.

Conclusions
The NRTL-SAC model is a practical thermodynamic framework for solubility modeling in pharmaceutical process design. The model requires only component-specific molecular parameters that represent the surface interaction characteristics of the molecules. For solute molecules, these parameters are identified from solubility measurements of the solute in a few representative solvents, i.e., hydrophobic, hydrophilic, and polar solvents. The model is a useful tool for qualitative correlation and prediction of phase behavior, i.e., solubility, of systems with large, complex pharmaceutical solutes in common solvents.
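As a check on the 45% accuracy figure quoted for Table 5, the following minimal Python sketch averages the tabulated rms errors in ln x for the 14 solutes other than aspirin and converts the average into a relative error in x. The function name `spread_from_ln_error` and the reading of an error d in ln x as a multiplicative factor exp(+d) or exp(-d) on x are assumptions made here for illustration; they are not taken from the original text.

```python
import math

# rms errors in ln x for the 14 solutes other than aspirin, copied from Table 5
rms_ln_x = {
    "p-aminobenzoic acid": 0.284,
    "benzoic acid": 0.160,
    "camphor": 0.092,
    "ephedrine": 0.067,
    "lidocaine": 0.027,
    "methylparaben": 0.120,
    "testosterone": 0.334,
    "theophylline": 0.661,
    "estriol": 0.608,
    "estrone": 0.519,
    "morphine": 1.007,
    "piroxicam": 0.665,
    "hydrocortisone": 0.334,
    "haloperidol": 0.311,
}


def spread_from_ln_error(err):
    """Convert an error in ln x into fractional over- and underprediction of x.

    Assumption: an error `err` in ln x scales x by exp(+err) or exp(-err).
    """
    over = math.exp(err) - 1.0    # fractional excess when ln x is too high
    under = 1.0 - math.exp(-err)  # fractional deficit when ln x is too low
    return over, under


average_rms = sum(rms_ln_x.values()) / len(rms_ln_x)
over, under = spread_from_ln_error(average_rms)

print(f"average rms error in ln x: {average_rms:.2f}")
print(f"typical overprediction of x: {over:.0%}")
print(f"typical underprediction of x: {under:.0%}")
```

Under this reading, the average rms error of 0.37 corresponds to a typical overprediction of about 45% and a typical underprediction of about 31%, so the quoted 45% is the upper side of a band that is asymmetric in x but symmetric in ln x.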
Acknowledgment
The authors are grateful to Hsien-Hsin Tung, Daniel E. Bakken, Christopher Rentsch, and Jose E. Tabora of Merck for their critical evaluation of NRTL-SAC, UNIFAC, and Hansen models for solubility modeling of Merck compounds in solvents and solvent mixtures. We also thank Prof. John Prausnitz for his warm encouragement and insightful critiques on the manuscript.

Literature Cited
(1) Gupta, A.; Gupta, S.; Groves, F. R., Jr.; McLaughlin, E. Correlation of Solid-Liquid and Vapor-Liquid Equilibrium Data for Polynuclear Aromatic Compounds. Fluid Phase Equilib. 1991, 64, 201.
(2) Frank, T. C.; Downey, J. R.; Gupta, S. K. Quickly Screen Solvents for Organic Solids. Chem. Eng. Prog. 1999, Dec, 41.
(3) Kolar, P.; Shen, J.-W.; Tsuboi, A.; Ishikawa, T. Solvent Selection for Pharmaceuticals. Fluid Phase Equilib. 2002, 194-197, 771.
(4) Hansen, C. M. Hansen Solubility Parameters: A User’s Handbook; CRC Press: Boca Raton, FL, 2000.
(5) Fredenslund, A.; Jones, R. L.; Prausnitz, J. M. Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures. AIChE J. 1975, 21, 1086.
(6) Acree, W. E., Jr.; Abraham, M. H. Solubility Predictions for Crystalline Nonelectrolyte Solutes Dissolved in Organic Solvents Based upon the Abraham General Solvation Model. Can. J. Chem. 2001, 79, 1466.
(7) Klamt, A.; Eckert, F. COSMO-RS: a Novel and Efficient Method for the a Priori Prediction of Thermophysical Data of Liquids. Fluid Phase Equilib. 2000, 172, 43.
(8) Lin, S.-T.; Sandler, S. I. A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model. Ind. Eng. Chem. Res. 2002, 41, 899.
(9) Chen, C.-C. A Segment-Based Local Composition Model for the Gibbs Energy of Polymer Solutions. Fluid Phase Equilib. 1993, 83, 301.
(10) Renon, H.; Prausnitz, J. M. Local Compositions in Thermodynamic Excess Functions for Liquid Mixtures. AIChE J. 1968, 14, 135.
(11) Chen, C.-C.; Song, Y. Generalized Electrolyte NRTL Model for Mixed-Solvent Electrolyte Systems. AIChE J. 2004, 50, 1928.
(12) ICH Steering Committee, ICH Harmonised Tripartite Guideline, Impurities: Guideline for Residual Solvents, Q3C. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, 1997 (http://www.ich.org).
(13) Marrero, J.; Abildskov, J. Solubility and Related Properties of Large Complex Chemicals, Part 1: Organic Solutes Ranging from C4 to C40. Chemistry Data Series XV; DECHEMA: Frankfurt/Main, Germany, 2003.
(14) Lin, H.-M.; Nash, R. A. An Experimental Method for Determining the Hildebrand Solubility Parameter of Organic Electrolytes. J. Pharm. Sci. 1993, 82, 1018.
Received for review June 18, 2004
Revised manuscript received September 29, 2004
Accepted October 18, 2004
IE049463U