Predicting the Solubilities of Complex Chemicals I. Solutes in Different

Oct 16, 2003 - Predicting the Solubilities of Complex Chemicals I. Solutes in Different Solvents ... For a more comprehensive list of citations to thi...
0 downloads 13 Views 285KB Size
5622

Ind. Eng. Chem. Res. 2003, 42, 5622-5634

Predicting the Solubilities of Complex Chemicals I. Solutes in Different Solvents Jens Abildskov*,† and John P. O’Connell‡ Computer-Aided Process Engineering Center, Department of Chemical Engineering, Building 229, Technical University of Denmark, 2800 Lyngby, Denmark, and Department of Chemical Engineering, University of Virginia, Charlottesville, Virginia 22904

A systematic approach is suggested for predicting the solubility of sparingly soluble solid fine chemicals and pharmaceuticals. The procedure uses group contribution methods for computing the difference in solubility at infinite dilution in the solvent of interest from an optimal reference solvent with the aim of (1) minimizing the impact of uncertainties in pure-solute properties, (2) decreasing the number of adjustable parameters to be determined by data reduction, and (3) using appropriate experimental data to fit unknown parameters. Several examples illustrate the method. Introduction The production of pharmaceuticals and medium-sized biochemicals customarily involves liquid solvents for reaction, separation, and formulation.1-3 Solvent selection has commonly been based on experience and empirical descriptions of experimental results. Experience shows that over 30% of the efforts of industrial property modelers and experimentalists deal with solvent selection.2 Minimum time-to-market concerns require rapid and reliable property predictions, so more systematic approaches would be highly desirable. The foundation of systematic solvent selection procedures is knowledge of the properties of desirable and undesirable products in candidate solvents. Thus, solubilities are particularly important in this decision, although toxicological and environmental considerations can also be of great importance. Although molecular simulation is developing,4-7 solvent selection has traditionally been viewed as essentially a thermodynamic problem, formulated in terms of thermodynamic phase equilibrium criteria involving basic physical properties and state variables such as P, T, and phase compositions. The key element is predicting the nonideality of a solution using activity coefficients based on well-defined standard states. Systematic and efficient methods for obtaining activity coefficients in complex systems appeared during the 1960-1970s when solubility parameters were first applied to paint formulations.8 Later, group contribution methods were applied to a broad range of commodity chemicals. These techniques spurred the development of automatic generate-and-test algorithms for molecular structures with desired properties.9-11 In recent years, expert system methods12 have been developed by combining prediction methods with property databanks and information on environmental, safety, and hazard issues. * To whom correspondence should be addressed. Tel.: +4545252905. Fax: +4545882258. E-mail: [email protected]. † Technical University of Denmark. ‡ University of Virginia.

The physical properties of pharmaceuticals and medium-sized biochemicals are more complicated than those of petrochemicals and many traditional commodity species. Features to address include: (1) molecular weights greater than 100 g/mol; (2) solid pure substances at room temperature; (3) polyfunctionality with two or more functional groups; (4) multiple molecular conformations and isomers, leading to combinatorial effects; (5) complex molecular interactions, such as large dipoles, high polarizability, hydrogen bonding, and charge-transfer complexes; and (6) limited amounts of reliable measured data. For many years, the “peak solubility” method, based on the Scatchard-Hildebrand concept of regular solutions, has been used in the pharmaceutical industry. There are several variations,2,13-15 based on either the actual or an effective solubility parameter for the solute and solvents. Experimental drug solubilities are plotted as a function of solvent solubility parameter, and often, the points define a curve with a maximum in solubility. Figure 1 shows such a plot for the solute hydrocortisone16 (2) in 12 different solvents (1), where the line is the theoretical parabola using the solute properties of Table 1. Regular solution theory would conclude (see Appendix A) that a definite maximum should exist that coincides with the solute solubility parameter. In the absence of a known value, δ2 is chosen as the solvent solubility parameter at a solubility maximum. Note that there would be some uncertainty in such a value given that the data do not form a precisely defined parabola because of the inadequacy of the theory and data errors. When insufficient data are available to define the solubility peak, it has been suggested2,19,20 that solutionof-groups (SOG) methods such as ASOG21 and UNIFAC22,23 (see Appendix A) could be used to predict the solubilities to be used for the peak solubility plot. Thus, Nielsen and co-workers19,20 used a UNIFAC model to identify good solvents for steroids and antibiotics with optimal fragments found by sensitivity tests on the model parameters, in effect allowing for the determination of drug solubility parameters from group contributions. However, direct application of group contribution methods to the systems of interest is often not possible. Commonly, not all group parameters for the

10.1021/ie030210v CCC: $25.00 © 2003 American Chemical Society Published on Web 10/16/2003

Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003 5623

Thermodynamic Framework The solubility of a (solid) solute in a liquid solvent at equilibrium is found by solving the thermodynamic equations of equilibrium. Usually, these equations are written in terms of fugacities of species in solution

ˆf Li (T,P,{xL}) ) ˆf Si (T,P,{xS})

( all i )

(1)

where i denotes a component present in the solid (S) and liquid (L) phases at equilibrium. The full variable dependence of eq 1 is rarely used. In fact, the simplest case is usually chosen, where the solid phase is assumed to be pure (the size and shape of solute and solvent molecules are ordinarily sufficiently different that solid solutions do not form). If the subscript 2 denotes the solute

ln ˆf2S ) ln f2S ≈ ln f 0S 2 (T)

(2)

Then, there is only a single eq 1 for the solute component, which, in a low-pressure binary system, becomes 0S ln ˆf L2 ) ln[xL2 γL2 (T,{xL})f 0L 2 ( T )] ) ln f 2

Figure 1. Hydrocortisone ln x2 versus δ1 diagram16 at 298.15 K.

w

Table 1. Solutes solute, 2 ephedrine hydrocortisone salicylic acid

δ2 ∆Hm2 [(J/cm3)1/2] (kJ/mol) 22.77a 25.37 24.21

20.37 34.87 24.6

Tm2 (K) 310.15 485.5 432

V L2 ln xid 2 (m3/kmol) (298.15 K) 0.1534a 0.293 0.1196

-0.32 -5.43 -3.07

a Estimated with group contribution methods17,18 (∆H Lv ) 2 82.021 kJ/mol).

solute are available in existing ASOG or UNIFAC parameter tables. Although the values can be obtained from new measurements3 or from structurally similar substances, it often happens that so many group parameters need to be obtained that the available database is insufficient to estimate all missing parameters. This is a limitation on both generate-and-test algorithms and peak solubility methods. We offer here an alternative approach for using group contributions to predict how solute solubilities vary with solvent constitution. The goal is to use existing thermodynamic insights more beneficially in solvent selection, especially to show how to use a minimum of measured solubility data to establish predictions of feasible solvents. In practice, final decisions would then be based on both direct experiments on final candidates and nonthermodynamic criteria. In the next section, we provide a thermodynamic framework that leads to the simplicity of infinitedilution activity coefficients and to the use of reference solvents to maximize reliability and accuracy, including a procedure for establishing an optimal reference solvent. Then, this formulation is illustrated for cases in which group contribution parameters have already been determined; one solute is sparingly soluble, and two others dissolve to much higher extents. Subsequently, we suggest a strategy for using minimal data and parameter fitting to obtain unknown UNIFAC parameters, fitting data in two cases and then predicting results in a third. Finally, brief comments are made about using the method when composition measures differ from mole fractions, as well as about future work.

ln[xL2

γL2 (T,{xL})]

) ln

f 0S 2 f

0L 2

≡ ln xid 2

L L w ln xL2 ) ln xid 2 - ln γ2 (T,{x })

(3)

where xL2 is the solute mole fraction solubility and γ2 is the solute activity coefficient for the Lewis/Randall standard state (pure component as a liquid at the system temperature T ). For nonideal solutions, the solubility calculation for xL2 uses eq 3 iteratively to find xL2 . Values of xid 2 are obtained from the ratio of standard-state fugacities, usually approximated with the use of only melting point properties, Tm2 and ∆Hm2

ln xid 2(T) )

(

)

∆Hm2 Tm2 1RTm2 T

(4)

Corrections3,24 to eq 4 can be important, especially for phase transitions in the solid phase occurring between the system temperature T and Tm2. In addition, some chemicals can persist for appreciable periods25 in one or more metastable solid modifications (polymorphism) in addition to the thermodynamically stable state. Their melting properties and solubilities might be close to those of the stable form, in which case, the effects might not be major. Also, Schefter and others26 have found that the solid form of the solute can change with the solvent, either by altered crystal structure, solvation, or compound formation. Such changes cannot be handled by the calculational technique discussed above. Until a reliable method for predicting solid-phase effects is developed, only experiment can indicate whether any of these situations occur. Here, if we use eq 3, we assume that the properties in eq 4 are known, and we do not attempt to take into account any variations of the solid with different solvents. If one has an activity coefficient method and if one knows the solubility of the solute in one pure solvent, j, then the solubility in another solvent, i, can be directly calculated according to

ln x2i ) ln x2j + ln γ2j( T,{xs}j ) - ln γ2i( T,{xs}i )

(5)

5624 Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003

This equation is useful because, in contrast to eq 3, no pure-solute properties are needed. We identify this form as the “reference solvent” approach. (One cannot use a similar equation for finding the relative solubility of two different solutesssay, 2 and 3sin the same solvent; a difference is expected in their pure-component properid ties, i.e., ln xid 3 - ln x2 * 0, which does not get canceled out as above.) If xL2 is less than 10-2, it is common to approximate the solution as being at infinite dilution, where interactions between solute molecules are ignored and the activity coefficient is a function only of the species involved; T; and, if the solvent is a mixture, the relative amounts of solvent components, {xs}. Then, eqs 3 and 5 can be approximated as ∞ ln x2(T,{xs}) ≈ ln xid 2 ( T ) - ln γ2 (T,{xs}) ∞ ∞ ( T,{xs}j ) - ln γ2i ( T,{xs}i ) ln x2i ) ln x2j + ln γ2j

be developed. For each pair of solvents i and j, we define a residual (a circumflex denotes a calculated value) as

δ ln x2,ij ) ln x2i ∞ (10) ln xˆ 2i( j ) ) ln x2i - ln x2j - ∆ ln γˆ 2,ji

These residuals will depend on the solvent pair ∞ ∞ and γ2j will vary in accuracy. We because values of γ2i seek the solvent j that minimizes the sum of the N available residuals of eq 10 with the hope that this will also minimize errors in predictions based on eq 7. Thus, we wish to obtain the minimum sum of residuals

minj|



δ ln x2,ij| ) minj|

i)data



∞ )| (11) N(ln x2j + ln γˆ 2j

(6) (7)

This formulation is advantageous because methods exist for determining γ∞2 and no iteration is needed for calculating solubilities. From this point on, we use the following notation for the differences

∆ ln x2,ij ≡ ln x2i - ln x2j

(8)

∞ ∞ ∞ ≡ ln γ2j ( T ) - ln γ2i (T) ∆ ln γ2,ji

(9)

Despite the basis of its derivation, eq 7 can actually be accurate at higher concentrations. Deviations from the infinite-dilution result arise from solute-solute interactions; if these interactions are similar in solvents i and j, cancellation will occur. However, there is no built-in limit on x2 in eqs 6 and 7. For example, if γ∞2 in eq 6 is too small, x2 could be greater than unity. An example is shown later. Reference solubilities have been used previously. Any model that correlates infinite-dilution partition coefficients between solvents effectively uses one of the solvents as a reference. Thus, our treatment of ∆ ln x2,ij with a group contribution method resembles the approach to partition coefficients of Wienke and Gmehling.27 They concluded, as did Derawi and co-workers,28 that UNIFAC methods with VLE-based parameters could not adequately represent octanol/water partition coefficients and vice versa. It is not uncommon for VLE parameters to be in error for LLE, but it might also have been that the octanol solubilities were too high for the infinite-dilution assumption. Here, we initially focus on lower solubilities and nonaqueous solvents and later explore higher concentrations. There is also precedent for correlating differences of solubility in solvents. For example, Acree and Abraham29 correlate the equivalent of ∆ ln x2,ij relative to that of water or in the gas phase with molecular descriptors. However, no earlier work known to us has attempted to develop a method for selecting an optimal reference solvent based on a strategy of data measurements for a given solute, as we now suggest. Optimal Reference Solvent Although eq 7 cancels errors of measurement or assumption of pure-solute properties, it requires selection of a reference solvent, j. An optimal choice based on either available data or a strategy of data taking can

∞ (ln x2i + ln γˆ 2i )-

i)data

Thus, an optimal reference solvent, j, is one for which ∞ ln x2j + ln γˆ 2j )



i)data

∞ (ln x2i + ln γˆ 2i )

N

(12)

If the magnitude of the terms in eq 12 are of the same order, they will scatter about zero. It is not expected that any real data set will exactly satisfy eq 12, so the optimal reference solvent is likely to be the one that minimizes eq 11. This also suggests a data measurement strategy, perhaps similar to that of Kolar et al.,2 sampling a wide range of solvents based on molecular structure or even solubility parameters. After a few cases have been measured, the residuals, or equivalently the rhs of eq 11, should be calculated. (This assumes that parameters are available to predict γ∞2 . If not, the fitting procedure suggested below can be implemented and the solvent set extended to those systems with the same parameters.) Even if an apparently optimal solvent is found, both predictions and measurements should be made for a small additional set of solvents predicted to give the desired solubility. If the predictions are satisfactory, no further data need be taken. If the new data suggest that another solvent might be optimal or that no candidates are acceptable, or if there appear to be outliers that might indicate solid-phase modifications by some solvents, the process might need to be modified or repeated. Example Calculation for a Sparingly Soluble Complex Chemical with Known UNIFAC Parameters The (pure) solute properties used in this section are given in Table 1. Their UNIFAC group assignments are given in Table 2. First, we examine the results for hydrocortisone at T ) 298.15 K of Hagen and Flynn.16 Averages of pure-component data16,30 are tabulated in Table 1. Other references31 report similar values. The 16 ideal solubility, xid 2 , was found with eq 4, as justified by differential thermal analysis. Figure 1 shows the results, with the filled symbols being the measured values, the curve being computed from the Scatchard/ Hildebrand equation, and the open symbols being the UNIFAC predictions calculated using eq 6. The general trends are obtained, but the accuracy is not good, especially with the Scatchard/Hildebrand equation. This is seen more effectively in Figure 2, where ln x2 (measured) is plotted versus ln x2 (predicted) for all

Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003 5625 Table 2. UNIFAC Groups for All Solutes solute

groups

Mw (g/mol)

ephedrine hydrocortisone salicylic acid niflumic acid diuron monuron

1 CH3 + 1 CH + 5 ACH + 1 ACCH + 1 OH + 1 CH3NH 2 CH3 + 6 CH2 + 4 CH + 3 C + 1 CHdC + 3 OH + 2 CH2CO 4 ACH + 1 AC + 1 ACOH + 1 COOH 4 ACH + 1 AC + 1 C5H3N + 1 COOH + 1 CF3 + 1 AC-NH3 ACH + 2 ACCl + 1 -(CdO)-N(CH3)2 + 1 AC-NH4 ACH + 1 ACCl + 1 -(CdO)-N(CH3)2 + 1 AC-NH-

165.234 362.500 138.120 282.221 233.096 198.650

Table 3. Hydrocortisone Reference Solvent Selection reference solvent, j

ln(x2jγ2j∞/xid 2)

∑i(δ ln x2,ij)

AAE(ln x2)eq 7

AAPE(ln x2)eq 7 (%)

n-hexane n-butyl acetate cyclohexane CCl4 isopropyl acetate ethyl acetate toluene benzene chloroform methyl acetate 1-octanol 1,2-propylene glycol water

-2.11 0.166 0.513 -0.406 -0.652 -0.813 -1.45 -1.86 -2.05 -0.943 0.0401 0.762 3.57

1.71 -0.568 -0.915 0.00406 0.250 0.411 1.04 1.46 1.65 0.541 -0.442 -1.16 -3.97

1.71 1.24 1.42 1.09 1.07 1.08 1.30 1.52 1.66 1.11 1.19 1.60 3.97

20.0 13.1 15.3 11.6 11.4 11.5 14.8 17.7 19.3 12.0 12.5 17.2 45.8

∑i(ln x2i + ln γ2i∞ - ln xid 2 )/N ) -0.4.

Figure 2. Hydrocortisone: measured versus predicted ln x2 diagram16 at 298.15 K.

Figure 3. Hydrocortisone solubility differences16 at 298.15 K. Reference solvent ) CCl4.

solvents (except water, for which the Scatchard/Hildebrand prediction is off the scale of the figure). The Scatchard/Hildebrand equation is reliable for low solubilities in hydrocarbons, but for the higher-solubility polar species, UNIFAC is much better. To determine whether there are systematic variations associated with xid 2 , we plot ∆ ln x2,ij from eq 8 vs δ1 in Figure 3 using carbon tetrachloride as the reference solvent, j (δj ≈ ∞ 17.6). Using γ2i from both the Scatchard/Hildebrand and UNIFAC methods, the results improve somewhat. The decrease in the errors for both methods suggests that xid 2 could be incorrect, probably from erroneous pure-component properties. Recently, Gracin et al.3 addressed the issue of errors in pure-component properties in some detail, showing that such errors are not uncommon.

We have used the analysis of eqs 11 and 12 for the UNIFAC representation of the hydrocortisone data. The numerical details are summarized in Table 3, where columns 4 and 5 gives the average absolute error (AAE) (in ln x2i) and the average absolute percent error (AAPE) for each reference solvent based on eq 7. Table 3 shows that carbon tetrachloride is the optimal reference solvent according to eq 12. Water is the poorest reference solvent. Other solvents give results of intermediate quality, although usually much closer to those of carbon tetrachloride than water. Note from Table 3 that minimization of |∑i δ ln x2,ij| does not guarantee minimization of either AAE or AAPE; two solvents have slightly lower AAEs and lower AAPEs than carbon tetrachloride even though carbon tetrachloride gives the smallest value of |∑i δ ln x2,ij|. However, the impact of these differences are minor. Thus, although minimizing

5626 Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003 Table 4. Hydrocortisone Set. Reference Solvent Method, Eq 7 vs Eqs 3 + 6a AAE(ln x2) AAPE(ln x2) (%) reference, j ln(x2jγ2j∞/xid 2) a

eq 3

eq 6

eq 7

eq 7

1.3 14.1 none none

1.2 12.4 none none

1.1 11.6 CCl4 -0.406

3.97 45.8 water 3.57

∑i(ln x2i + ln γ2i∞ - ln xid 2 )/N ) -0.4.

Figure 5. Ephedrine solubility differences32 at 298.15 K. Reference solvent ) 1,3-propylene glycol. Table 5. Ephedrine Set. Reference Solvent Method, Eq 7 vs Eqs 3 + 6a AAE(ln x2) AAPE(ln x2) (%) reference, j ln(x2jγ2j∞/xid 2) a

Figure 4. Ephedrine ln x2 versus δ1 diagram32 at 298.15 K.

|∑i δ ln x2ij| does not guarantee minimal AAE or AAPE, it might be that any of several solvents is adequate. Table 4 compares results obtained using UNIFAC with eqs 3 and 6 (based on pure-solute data without reference solvents) with those obtained using eq 7 with water and carbon tetrachloride as the reference solvents. It is apparent that eq 7 with appropriate reference solvents does as well as eqs 3 and 6, if not better. However, a poor reference solvent can give large errors, so a breadth of data is desirable. Our experience with other solutes has confirmed these observations. Example Calculations for More Soluble Complex Chemicals with Known UNIFAC Parameters. Next, we consider the ephedrine data of Lin and Nash32 at T ) 298.15 K and the salicylic acid data (also T ) 298.15 K) measured by De Fina et al.33 These solubilities are significantly higher than those of hydrocortisone. The ephedrine properties in Table 1 are averages compiled by Lin and Nash; δ2 was calculated from the equation

δ22 )

∆U Lv 2 V L2



∆H Lv 2 - RT V L2

(13)

with V L2 and ∆H Lv 2 predicted via group contribution methods.17,18 Figure 4 shows the results in the same form as Figure 1. In this case, the filled symbols (data)

eq 3

eq 6

eq 7

eq 7

0.54 32.4 none none

1.14 288 none none

4.21 987 hexane 4.75

1.01 257 1,3-propylene glycol 0.614

∑i (ln x2i + ln γ2i∞ - ln xid 2 )/N ) 0.54.

do not form any obvious maximum. The open symbols are UNIFAC (eq 6) predictions. The errors are quite large, mainly because the models predict large but similar nonidealities using infinite-dilution activity coefficients. The solubility is very high (x2 ≈ 0.7), as suggested by T ) 298.15 K being very close to Tm2 ) 310 K; the result is that the solubilities, except for those of methanol and water, are essentially all at xid 2 . Thus, the appropriate value of γ2 to use is not γ∞2 , but a value much closer to unity. This is verified by obtaining the experimental ephedrine activity coefficient in a methaexp exp ) ln xid ≈ nol solution, γexp 2 , from eq 3 (ln γ2 2 - ln x2 exp -0.32 - ln(0.6371) ≈ 0.13 f γ2 ≈ 1.14) and comparing it to the UNIFAC prediction obtained by solving eq 3, which gives x2 ) 0.75, where γ1 ) 0.43 and γ2 ) 0.97. Neither is close to the (UNIFAC) predicted γ∞2 value () 0.044), yielding the large value of ln x2 in Figure 4. However, we can test whether the reference solvent approach can provide reasonable predictions even at high solubilities, which will be true if there are not great differences in the values of γ2 and γ∞2 among the solvents. Figure 5, which shows a plot of ∆ ln x2,ij versus δi for 1,3-propylene glycol as the reference solvent (δj ≈ 30.8), indicates that this is the case for a subset of the data. Goodness-of-fit measures are collected and compared in Table 5. Although the best agreement, not unexpectedly, is found when actual activity coefficients are used (eq 3), the results obtained using optimal reference solvent (1,3-propylene glycol) approach are quite good when compared to those obtained with eq 6. The final example for known UNIFAC groups is for salicylic acid, where the measured systems are more

Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003 5627

Figure 8. Salicylic acid solubility differences33 at 298.15 K. Reference solvent ) ethyl acetate. Figure 6. Salicylic acid ln x2 versus δ1 diagram33 at 298.15 K.

Table 6. Salicylic Acid Set. Reference Solvent Method, Eq 7 vs Eqs 3 + 6a AAE(ln x2) AAPE(ln x2) (%) reference, j ln(x2jγ2j∞/xid 2) c

Figure 7. Salicylic acid; measured versus predicted ln x2 diagram33 at 298.15 K.

limited in δ1 values; the solubilities are in the range of x2 ≈ 0.2, and there is no apparent maximum when ln x2 is plotted as a function of δ1. We used an average of pure-component properties combined from different sources,34-36 although there was a major discrepancy in ∆Hm2. As Figure 6 (of the same form as Figure 4) and Figure 7 show, not only are the solubilities not small, we see that, for essentially all solvents, x2 > xid 2. The Scatchard/Hildebrand equation cannot describe such a case except by empirical adjustment, so to address this situation, an empirical extension for solvation attraction can be included.37 Additionally, for better agreement, UNIFAC interaction parameters could be adjusted as done by Gracin et al.3 Here we

eq 3

eq 6

eq 7

eq 7

0.78 50.4 none none

0.8 49.9 none none

2.7 160 DBEb 2.7

0.777 48.7 EtAcc -0.061

R ∑ (ln x + ln γ ∞ - ln xid)/N ) 0.001. b DBE ) dibutyl ether. i 2i 2i 2 EtAc ) ethyl acetate.

investigate the use of the reference solvent method, eq 7. The near-zero values of ∆ ln x2,ij in Figure 8 show that there is relatively little variation in the measured values, whereas the predicted values cover several orders of magnitude. Although UNIFAC does reasonably well at higher solubilities, both models are poor at lower solubilities (Figure 8), where the measured values are much higher than predicted. This is a case where there are significant differences for the solvents between appropriate γ2 and γ∞2 values. In developing a database of solubilities, this situation should become apparent with only a few solvents. Table 6 summarizes statistics and confirms that a good reference solvent, such as ethyl acetate, makes eq 7 at least as accurate as eqs 3 and 6, if not better. A poor reference solvent can give large errors. Thus, reference-solvent-based predictions of low solubilities are feasible and of a quality at least as good as those obtained using eqs 3 and 6. Reference-solventbased predictions might also work at higher solubilities, but the success will depend on whether there are great differences in the γ2 - γ∞2 deviations for the different solvents. Thermodynamic Models and Data Fitting When Parameter Values Are Unknown Even when solubilities are low and reliable melting data are available, it might not be possible to utilize these methods because solution-of-groups parameters often do not exist for all of the solute and solvent groups. These missing parameter values can be obtained only

5628 Ind. Eng. Chem. Res., Vol. 42, No. 22, 2003

by directly fitting the model to data, which might be problematic under many circumstances. Here, we explore options for obtaining unknown parameters. The UNIFAC parameters to be determined for an unknown (new) group are those for size and shape and those for interactions, that is (1) Rk and Qk, two parameters for each group, and (2) amn interaction parameters, usually two for each pair of groups. The values of Rk and Qk are obtained from atomic structure, but the amn parameters are determined only by fitting the model to a substantial and reliable body of data. For many of the cases of interest here, it is also common for there to be missing elements of the a matrix. A simple, direct approach is to adjust all such amn values is to minimize an objective function, S, of the form

S)



(ln x2i - ln xˆ 2i )2 )

i)data



( δ ln x2i )2

(14)

i)data

where a circumflex denotes a calculated value and i denotes a data point, usually one for each solvent. Here, the minimum number of solubility data needed would be the same as the number of unknown amn values. Because any complex chemical can have several new groups, there are often more unknown parameters than the number of solvents for which measurements can feasibly be made. One way to eliminate unknowns has been to evaluate the sensitivity19,20 of S to each of the adjustable parameters and adjust only the most sensitive ones. That is, the partial derivatives (∂S/∂amn)a are computed numerically, and then only the amn parameters with the largest derivatives are fitted. Similar ideas have been used for electrolyte solutions,38 where the numbers of standard-state and nonideality parameters are too great for simple regression. For our case, a formulation of this approach is to minimize an objective function modified from eq 14 in the form

S′ )



2

(δ ln x2i ) + R

i)data

∑ ( δamn )

2

(15)

m*n

where

δamn ≡ amn - a0mn

(16)

The quantity R keeps the data and parameter residuals in proper proportion, as in the so-called regularization19,20,39 techniques, also known as Ridge regression. Typically, one sets a0mn ) 0, which then minimizes the parameter values so that the m-n interaction is ignored unless available information, as uncovered by the fitting, supports a nonzero value of amn. The last sum in eq 15 must include at least one term for which the partial derivative (∂S/∂amn)a is nonzero. This technique can improve efficiency and reduce data requirements. In principle, the last sum in eq 15 enables one to adjust more parameters than the number of residuals in the first sum. Employing an objective function of the form of eq 14 restricts the number of adjustable parameters to the number of residuals (or number of data points), making the minimization problem equivalent to solving δ ln x2i ) 0 ( all i ). This will not happen during minimization of eq 15, as there will always be more squared terms in S ′ than adjustable parameters. Thus, each individual parameter estimated by minimizing eq 15 is, in some way, based on two types of information: residual information and regularization.

Table 7. (AC-NH-)-Containing Solutes solute, 2

δ2 [(J/cm3)1/2]

∆Hm2 (kJ/mol)

Tm2 (K)

ln xid 2 (T ) 298.15 K)

diuron niflumic acid monuron

26.5 23.8 24.37

33.89 32.73 29.46

430.92 476.85 447.6

-4.21 -4.95 -3.97

It is possible that limited residual (data) information might bias the parameters too strongly toward the regularized values, perhaps compromising the reliability of the prediction. That shortcoming will show up when such estimated parameters are applied in predictive mode. This potential problem will, of course, be less likely the more data points one has for creating residuals. Note, however, that both objective functions, S and S ′, include solubility residuals, so adjustments of interaction parameter values would also attempt to compensate for errors in applied pure-solute properties, which would bias the fitting results. An analogy to this situation is to fit vapor-liquid equilibrium data to obtain activity coefficient parameters, when there might be errors in the applied pure-component vapor pressure values. Here, we explore the possibility of minimizing another objective function, S ′′, based on eq 7 with the notation of eqs 8 and 9

S ′′ )



data i