Model-Based Calculation of Solid Solubility for ... - ACS Publications

Jun 14, 2008 - Hassan Modarresi,† Elisa Conte,† Jens Abildskov,† Rafiqul Gani,*,† and Peter Crafts‡. CAPEC, Department of Chemical and Biochemical ...

Ind. Eng. Chem. Res. 2008, 47, 5234–5242

Model-Based Calculation of Solid Solubility for Solvent Selection: A Review

Hassan Modarresi,† Elisa Conte,† Jens Abildskov,† Rafiqul Gani,*,† and Peter Crafts‡

CAPEC, Department of Chemical and Biochemical Engineering, Technical University of Denmark, Building 229, Soltofts Plads, 2800 Kgs. Lyngby, Denmark, and AstraZeneca, PR&D Department, Macclesfield, Cheshire SK10 2NA, United Kingdom

Solvent selection is one of the major concerns in the early development of many chemicals-based products from the pharmaceutical, agrochemicals, food, and specialty chemicals industries. Because of the nature of the active chemicals in the product, the most important solvent property is the solubility of complex solids. Predictive models for estimation of solid solubility in different organic solvents, especially suitable for solvent-selection procedures, are reviewed. Also, schemes that can be employed for solvent selection and/or solubility calculation through limited available experimental data are reviewed. For initial solvent screening and for many solvent-based calculations, the Hansen solubility parameters are useful property values to have. For this purpose, new combined group contribution-atom connectivity models for predictions of the three Hansen solubility parameters with a very wide application range are presented. These models are able to predict the Hansen solubility parameters for organic chemicals with C, H, N, O, F, Cl, Br, I, S, and P atoms.

1. Introduction

New chemicals-based products, such as drugs, pesticides, and food additives, are being discovered, produced, and marketed almost on a daily basis, even though the development period, starting from discovery and ending with release to market, could be rather long. The important chemical responsible for the desired activity of these chemicals-based products is usually called the active ingredient (AI) or the active pharmaceutical ingredient (API), in the case of pharmaceutical products. In this paper, unless otherwise indicated, the term AI will be used to indicate both. The AIs are usually high molecular weight chemicals that are recovered (and purified) as pure solids by downstream processing from solutions coming from upstream reaction operations.
Crystallization is one of the most common operations employed in the downstream recovery/purification steps, where solvents and solid solubility curves (solubility as a function of temperature) play a very important role. The role of the solvent in the crystallization process could very well determine the success or failure of this operation, and therefore, the selection of the appropriate solvent is an important issue.1,2 Karunanithi et al.3 have also shown that the shape of solid crystals may depend on the choice of the solvent. Obviously, the most important property governing the selection of the solvent (or blends of solvent and antisolvent) is the solubility of the AI in the solvent (or blend). By manipulating the solid saturation curves (solubility as a function of temperature), the optimal crystallizer configuration together with the selection of the solvent can be made.21 Other properties that play a role in the selection of solvents, as well as design/operation of crystallizers, are the pure-component (of the AI and the solvent) normal melting points (Tm), normal boiling points (Tb), heats of fusion (Hf), solubility parameters (the Hildebrand solubility parameter (δT) as well as the three Hansen solubility parameters (δHB, δP, and δD)), and solvent properties related to environment, health, and safety (θEHS). If experimental data are not available for these properties, models are needed to predict them. The main difficulty, however, is that, more often than not, the chemical complexity of the systems and minimal existing physical properties data limit the application range of most property models.1 In this respect, the group-contribution-based models and their extensions to combined group-atom-contribution models have become interesting options.

The objective of this paper is to review the property models needed for solvent selection related to applications involving AI recovery/purification. More specifically, the paper provides a review of predictive property models suitable for solvent selection with special focus on the new developments in the area of the group-atom-contribution approach; followed by a review of the solvent-selection approaches applicable for solid solubility (or precipitation); followed by a review of available data that is suitable for solvent selection and solubility studies; and finally, a presentation of new developments in the form of group-atom-contribution models for the prediction of the three Hansen solubility parameters. The solvent-selection method considered in this review is the computer-aided molecular design (CAMD) technique developed by Harper and Gani4 and widely used for solvent design/selection.

* Corresponding author. Tel.: (+45) 4525 2882. Fax: (+45) 4593 2906. E-mail:[email protected].
† Technical University of Denmark.
‡ AstraZeneca.

2. Property Models

In recent years, many efforts have been made to develop predictive models and methods for calculation of physical properties and phase behavior of newly discovered AIs. Although satisfactory results have been achieved in pure-component property predictions,2,5 the phase-behavior prediction of the cases in question is still a challenging research area because of the complexity of the involved chemical systems.6 This complexity is further intensified considering that the design of chemical products usually involves the matching of the product needs with a set of target properties.
For example, in the case of solvent design and selection, the solvent needs are defined through properties such as solubility, normal melting point, etc., and the property models used for estimating the property values need to be predictive because every solvent candidate will have a different molecular structure. Databases are unlikely to include the necessary property data for all solvent candidates.7 Molecular interaction based models will not have a wide application range, since they are unlikely to contain all

10.1021/ie0716363 CCC: $40.75 © 2008 American Chemical Society. Published on Web 06/14/2008

Ind. Eng. Chem. Res., Vol. 47, No. 15, 2008 5235

the solvent candidate (molecule)-solute (molecule) interactions. On the other hand, predictive models, such as the GC-based models, have a wider application range, since the same group contributions (GCs) can be used to predict the necessary solvent-solute interactions even though every solvent candidate has a different molecular structure. The combined group-atom-contribution based property models further extend the application range by generating missing group parameters needed for solvent-solute interactions.

Among phase-equilibria calculations required for solvent design and selection involving AIs, the solid-liquid equilibrium (SLE) calculation is the most important. Most AIs have high molecular weight and are sensitive to high temperature. In addition, high purity of the AI is important for final product approval. Therefore, in many production processes, crystallization emerges as the only option for the final purification step, and consequently, SLE calculations involving AIs provide the "road map" for the crystallization process to be used in production. Hence, accurate predictive models are vital in design/selection of solvents and/or blends of solvents and antisolvents suitable for crystallization. While numerous property-estimation methods exist, not all of them are applicable in CAMD techniques.4 In this review, therefore, only a class of predictive models suitable for CAMD-based solvent selection/design techniques is considered.

2.1. Group-Contribution Models. Most property-estimation methods used in CAMD techniques are based on group contributions,8 where the properties of a species in solution are expressed in terms of functions of the number of occurrences of predefined fragments (groups) in the molecule.
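The group-contribution idea described above can be illustrated with a minimal sketch. It assumes the simplest possible form, a linear sum of occurrence counts times group contributions; the contribution values, the dictionary name, and the function names are hypothetical and are not taken from any published parameter table.

```python
from collections import Counter

# Hypothetical first-order group contributions (illustrative numbers only,
# NOT taken from the paper or from any published parameter table).
HYPOTHETICAL_CONTRIBUTIONS = {"CH3": 1.0, "CH2": 0.5, "OH": 3.0}


def count_groups(groups):
    """Return the number of occurrences N_i of each group label."""
    return Counter(groups)


def gc_estimate(groups, contributions):
    """Estimate a property function as sum_i N_i * C_i.

    Raises KeyError when a group has no contribution in the table,
    which mirrors the "missing group" limitation of GC methods.
    """
    counts = count_groups(groups)
    missing = [g for g in counts if g not in contributions]
    if missing:
        raise KeyError(f"no contribution available for groups: {missing}")
    return sum(n * contributions[g] for g, n in counts.items())


# 1-Propanol written as CH3-CH2-CH2-OH.
propanol = ["CH3", "CH2", "CH2", "OH"]
estimate = gc_estimate(propanol, HYPOTHETICAL_CONTRIBUTIONS)
print(estimate)  # 1*1.0 + 2*0.5 + 1*3.0 = 5.0
```

The explicit `KeyError` is a design choice for this sketch: a candidate molecule containing a group outside the table cannot be evaluated at all, which is the gap that combined group-atom-contribution models aim to close.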
The best known and most successful predictive approach for phase-equilibria calculation of API solutions is the UNIFAC model and its modified versions.9 However, the method is restricted by the availability of group definitions and group-interaction parameters, which are missing for most newly developed drug molecules. Moreover, even when the required information is available, the accuracy of the results, even for moderately complex molecules, is not adequate for solvent-design purposes in crystallization and extraction.6

2.2. CI Models. One limitation of the UNIFAC method is the missing group-interaction parameters in the UNIFAC parameter table. The unavailability of experimental data is one of the main reasons for the missing group-interaction parameters. In order to meet the increased demands with respect to the complexity of the chemical molecular structures, wider ranges of chemicals need to be handled. For this purpose, further development of current estimation methods and techniques is one option that may be considered. The objective here is to develop new features of the current methods so that they can handle increased complexity of molecular structures, demand for more reliable predictions, and need for more versatile property model parameter tables without the need for too many additional experimental data points (as most available data have already been used by the current models). At the same time, the developed estimation methods need to be computationally simple and efficient, so they can be used routinely for process-product engineering calculations. Recently, González et al.10 have proposed the use of connectivity indices (CIs) for the generation of missing group-interaction parameters for the UNIFAC parameter table. The basic idea is to derive a relationship (see eq 1) between the group-interaction parameters and the CIs and atom constitution of the groups.
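As a hedged illustration of what a connectivity index is, the sketch below computes zeroth- and first-order valence connectivity indices for a small hydrogen-suppressed skeleton. It assumes the standard Kier-Hall definitions and uses the simple valence delta (valence electrons minus attached hydrogens), which holds for second-row atoms such as C, N, O, and F only. The function names are illustrative and are not part of UNIFAC-CI.

```python
import math


def valence_delta(valence_electrons, n_hydrogens):
    """Kier-Hall valence delta for a second-row atom: Z^v - h."""
    return valence_electrons - n_hydrogens


def chi0_v(deltas):
    """Zeroth-order (atom) valence connectivity index: sum of delta^-1/2."""
    return sum(1.0 / math.sqrt(d) for d in deltas)


def chi1_v(deltas, bonds):
    """First-order (bond) valence connectivity index over bonded atom pairs."""
    return sum(1.0 / math.sqrt(deltas[i] * deltas[j]) for i, j in bonds)


# Ethanol heavy-atom skeleton: CH3 - CH2 - OH
deltas = [
    valence_delta(4, 3),  # CH3 carbon -> 1
    valence_delta(4, 2),  # CH2 carbon -> 2
    valence_delta(6, 1),  # OH oxygen  -> 5
]
bonds = [(0, 1), (1, 2)]

print(round(chi0_v(deltas), 4))         # 1 + 1/sqrt(2) + 1/sqrt(5) = 2.1543
print(round(chi1_v(deltas, bonds), 4))  # 1/sqrt(2) + 1/sqrt(10)    = 1.0233
```

Because the indices depend only on atom types and bonding, they can be evaluated for any group, including groups that have no regressed interaction parameters.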
Parameter regression is performed in order to determine the atom interaction parameters (AIPs), i.e., b, c, d, and e (see eq 1), that are then used for the generation of any group-interaction parameter (GIP) $a_{kl}$ (see eq 1). This model is called UNIFAC-CI. Once the AIPs have been regressed, it is possible to (i) generate values for missing GIPs in the UNIFAC parameter table, (ii) reestimate one or more GIPs, and (iii) create a new group and estimate its GIPs. The UNIFAC-CI method has been extended and tested for groups consisting of atoms of C, O, H, N, and Cl.11 On the basis of the original UNIFAC parameter table, the following groups are covered: CH2, C=C, ACH, ACCH2, OH, CH3OH, H2O, ACOH, CH2CO, CHO, CCOO, HCOO, CH2O, DOH (ethylene glycol), C≡C, COO, OCCOH, furfural, CNH2, CNH, (C)3N, ACNH2, pyridine, CC≡N, CNO2, ACNO2, ACRY (acrylonitrile), DMF (dimethylformamide), NMP (N-methylpyrrolidone), CON, morpholine, CCl, CCl2, CCl3, CCl4, ACCl, and Cl-C=C, involving 1332 GIPs. With the UNIFAC-CI method, only a set of 128 AIPs needs to be regressed to cover all the groups containing C, H, O, N, and Cl atoms in the original UNIFAC matrix. A generic form of the calculation is given by the following equations:

$$
\begin{aligned}
a_{kl} ={}& b_{C-C}(A_{kl}^{CC})_0 + b_{C-O}(A_{kl}^{CO})_0 + b_{C-N}(A_{kl}^{CN})_0 + \cdots && \text{(0th-order interactions)} \\
&+ c_{C-C}(A_{kl}^{CC})_1 + c_{C-O}(A_{kl}^{CO})_1 + c_{C-N}(A_{kl}^{CN})_1 + \cdots && \text{(1st-order interactions)} \\
&+ d_{C-C}(A_{kl}^{CC})_2 + d_{C-O}(A_{kl}^{CO})_2 + d_{C-N}(A_{kl}^{CN})_2 + \cdots && \text{(2nd-order interactions)} \\
&+ e_{C-C}(A_{kl}^{CC})_3 + e_{C-O}(A_{kl}^{CO})_3 + e_{C-N}(A_{kl}^{CN})_3 + \cdots && \text{(3rd-order interactions)}
\end{aligned}
\tag{1}
$$

where

$$
\begin{aligned}
(A_{kl}^{XY})_0 &= \frac{n_X^{(k)}\,{}^{v}\chi^{0}_{(l)} - n_Y^{(l)}\,{}^{v}\chi^{0}_{(k)}}{{}^{v}\chi^{0}_{(l)}\,{}^{v}\chi^{0}_{(k)}}; &
(A_{kl}^{XY})_1 &= \frac{n_X^{(k)}\,{}^{v}\chi^{0}_{(l)} - n_Y^{(l)}\,{}^{v}\chi^{1}_{(k)}}{{}^{v}\chi^{1}_{(l)}\,{}^{v}\chi^{1}_{(k)}}; \\
(A_{kl}^{XY})_2 &= \frac{n_X^{(k)}\,{}^{v}\chi^{1}_{(l)} - n_Y^{(l)}\,{}^{v}\chi^{0}_{(k)}}{{}^{v}\chi^{1}_{(l)}\,{}^{v}\chi^{0}_{(k)}}; &
(A_{kl}^{XY})_3 &= \frac{n_X^{(k)}\,{}^{v}\chi^{2}_{(l)} - n_Y^{(l)}\,{}^{v}\chi^{0}_{(k)}}{{}^{v}\chi^{2}_{(l)}\,{}^{v}\chi^{0}_{(k)}}
\end{aligned}
\tag{2}
$$

In the above equations, all binary atomic interactions are combined to obtain the group interaction, $a_{kl}$. This is an empirical relation derived in a similar fashion as the UNIFAC model. The group-interaction parameter $a_{kl}$ between groups k and l is predicted from the calculated $(A_{kl}^{XY})$ terms (see eq 2) and the regressed coefficients b, c, d, and e (the atom interaction parameters, AIPs) representing the atomic interactions (i.e., $b_{C-C}$, $b_{C-O}$, $b_{C-N}$, $b_{C-Cl}$, $c_{C-C}$, $c_{C-O}$, etc.) from eq 1. In the above equations, $n_X^{(k)}$ is the number of atoms of type X in group k, and ${}^{v}\chi^{m}_{(k)}$ is the mth-order valence connectivity index for group k.

2.3. UNIFAC/UNIFAC-CI Combinations. The UNIFAC-CI model has fewer adjustable parameters than the group-contribution models. Thus, although it has shown good capability to fit known data, it cannot, in general, be expected to give high precision in predictions of phase equilibria, since a large set of compounds is being represented with only a few parameters. To overcome the problem of "missing groups" and/or "missing group parameters", however, this model is very useful when used together with the UNIFAC GC model (that is, use the UNIFAC-CI model only to generate the missing information in the host UNIFAC GC model).

2.4. NRTL-SAC. A useful model for SLE calculation developed recently for pharmaceutical applications is the


NRTL-SAC.12–14 The idea behind NRTL-SAC is to limit the number of intermolecular interaction parameters of the ordinary nonrandom two-liquid (NRTL) model to a few general ones, namely, the hydrophobicity, hydrophilicity, and polarity characteristics of solvents. By applying such a hypothesis to the solution theory and fixing each interaction parameter value, the phase behavior of any mixture can be predicted through these interactions, independent of specific solution information. That is, the phase behavior is defined using a few molecular descriptors of the solution constituents, which have been defined as fractions of molecular structure. Therefore, the descriptors of a molecule (solvent) are evaluated with respect to a set of reference molecules. In the original NRTL-SAC method, these reference molecules were suggested to be water, n-heptane, and acetonitrile, representing hydrophilic, hydrophobic, and polar molecules, respectively. The model has been found to be suitable for solvent-selection purposes in a crystallization process, although more accurate correlative models or experimental data are necessary for final verification of the selected solvents for a specific crystallization process.

2.5. COSMO. The conductor-like screening model-realistic solvation (COSMO-RS)15 and the conductor-like screening model-segment activity coefficient (COSMO-SAC)16 are two a priori models, based on quantum chemistry calculations, that predict intermolecular interactions using molecular structure and a few adjustable parameters. Both models require input in the form of a σ-profile. These profiles are strongly dependent on the method used to solve the Schrödinger equation. While the scope of these methods is potentially very large, the lack of a comprehensive, open literature σ-profile database and/or calculation method represents an inconvenience associated with the applicability of the COSMO approach.
The capabilities of these methods in SLE modeling have not been proven, specifically for complex APIs. Recent attempts, for a data set of APIs, gave less satisfactory results,17 although applications in many other areas have been encouraging.

3. Solvent-Selection Methods

Design or selection of the optimal solvent for a given crystallization operation is quite complex. Solvent design or selection procedures have been developed for liquid-liquid extraction,18 but surprisingly, there has been comparatively little work done on solvent design or selection strategies for crystallization, where an antisolvent rather than a solvent is necessary.

3.1. Databases. Nass19 describes a strategy for choosing crystallization solvents based on equilibrium limits. Frank et al.20 review strategies for solvent selection for various types of crystallization processes such as cooling crystallization and drowning-out crystallization. In both of the mentioned publications, the task of solvent selection is carried out from a list of good solvents from a database with solubility being the only criterion for selection. Frequently, when pure-component properties are dominant in the product design, the size of the combinatorial problem is not limited by the availability of binary parameters. A sound strategy to handle this problem is to make a preliminary search of product candidates using only single- and dual-valence groups. Thereafter, it is convenient to select the main group families that lead to the most promising branched structures. Note that, in this case, a database search method may also be employed, provided that a large database is available. If mixture properties are also considered, the search is more difficult. Thus, the direct role of database search in solvent design, which has been very dominant in the past, may still continue to be so for many cases in the future. More computer-aided methods can be expected to gain acceptance in the future, where, as highlighted below, databases can play a more indirect role in solvent design for APIs. In the future, databases will more likely be used for cross-checking, confirming, and/or validating results of systematic solvent designs as well as for developing property-prediction models.

3.2. CAMD: Computer-Aided Molecular Design. The CAMD technique provides a more systematic approach for solvent design/selection for crystallization processes by efficiently allowing other factors to be considered simultaneously, in addition to measures of solubility. Many other factors, such as the solvent effect on crystal morphology, solvent ability to dissolve impurities, solvent flammability, and solvent toxicity, need to be considered during the design/selection stage of the solvent (as part of the crystallizer design). These factors are important because undesired crystal morphology can reduce the efficiency of the filtration step, alter the purity of the product, and/or create problems during drying, packaging, handling, and storage. Safety and toxicity characteristics of the solvents are also very important. Moreover, in the biopharmaceutical industry, new drug molecules are constantly being discovered and selection of solvents from a list of common solvents is not always feasible. Karunanithi et al.3,21 presented a computer-aided molecular design framework for the design/selection of solvents and/or antisolvents for design/analysis of crystallization processes. The solvent-selection problem, defined as a CAMD problem, is formulated as a mixed-integer nonlinear programming (MINLP) model. Although the CAMD problem definition allows many combinations of performance objectives and property constraints, in the case studies considered here, the issue of potential solvent recovery was considered as the performance objective.
The latter was maximized, while other solvent property requirements such as solubility, crystal morphology, flash point, toxicity, viscosity, normal boiling point, and normal melting point were posed as constraints. All the properties involved in the CAMD problem definition have been estimated using group-contribution-based methods. The MINLP model is then solved using a decomposition approach to obtain optimal solvent molecules. Solvent design and selection for two types of solution crystallization processes, namely, cooling crystallization and drowning-out crystallization, are presented. Karunanithi et al.3,21 gave two case studies. In the first case study, the design/selection of a single solvent for crystallization of ibuprofen was considered. One of the important issues, namely, the effect of the solvent molecular structure (in the form of Hansen solubility parameters) on the shape of ibuprofen crystals, was also considered in the MINLP model. The second case study from Karunanithi et al.3,21 is a (solvent) mixture design problem where an optimal solvent/antisolvent mixture is designed for crystallization of ibuprofen by the drowning-out technique. For both case studies, the performances of the solvents were verified qualitatively through SLE diagrams and, more recently, through experiments.3

3.3. Reference-Solvent Methods. The reference-solvent methods6,22,23 are methods for predicting the solubility of sparingly soluble solid fine chemicals and pharmaceutical chemicals. This solvent-selection procedure uses group-contribution-based methods for computing the difference in solubility of a solute at infinite dilution in the solvent of interest and in an optimal reference solvent with the aim of (i) minimizing the impact of uncertainties in pure-(component) solute properties, (ii) decreasing the number of adjustable parameters to be determined by data reduction, and (iii) using appropriate experimental data to fit unknown parameters. The


basic idea of the method is that the difference between the two (low) solubilities ($x_i$) at the same temperature can be approximated by

$$
\ln x_{im} - \ln x_{ir} = \ln \gamma_{ir}^{\infty} - \ln \gamma_{im}^{\infty}
\tag{3}
$$

because the ideal solubility contributions of the solute in the two solubility expressions cancel out. This approach gives a direct relation between the solubility of a solute in a given solvent (denoted by m) and that in the reference solvent (denoted by r), as well as the values of the infinite dilution activity coefficients ($\gamma^{\infty}$) in the two solvents:

$$
\ln x_{im} = \ln x_{ir} + \left[\ln \gamma_i^{\infty}(T, \{x_S\}_r) - \ln \gamma_i^{\infty}(T, \{x_S\}_m)\right]
\tag{4}
$$
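Equation 4 can be sketched as a short calculation. The function name and all numerical inputs below are hypothetical and chosen only to show the arithmetic; in practice the infinite dilution activity coefficients would come from a GE model such as UNIFAC, and the reference solubility from a measurement.

```python
import math


def solubility_from_reference(x_ref, gamma_inf_ref, gamma_inf_target):
    """Apply eq 4: ln x_im = ln x_ir + [ln gamma_inf(r) - ln gamma_inf(m)].

    x_ref            -- measured mole-fraction solubility in the reference solvent r
    gamma_inf_ref    -- infinite dilution activity coefficient of the solute in r
    gamma_inf_target -- infinite dilution activity coefficient of the solute in m
    """
    ln_x_target = (
        math.log(x_ref) + math.log(gamma_inf_ref) - math.log(gamma_inf_target)
    )
    return math.exp(ln_x_target)


# Hypothetical inputs: the solute is five times "less nonideal" in solvent m
# than in the reference solvent, so its solubility is five times higher.
x_target = solubility_from_reference(
    x_ref=1.0e-3, gamma_inf_ref=50.0, gamma_inf_target=10.0
)
print(round(x_target, 6))  # 0.005
```

Only the ratio of the two activity coefficients enters the result, which is why errors in the pure-solute properties (melting point, heat of fusion) do not affect the estimate.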

Fitting new group parameters (for the GE model used for calculating the infinite dilution activity coefficients) is accomplished by using an adaptation of regularization

{∑

min

m20.32 This uncertainty may be extended to the other terms of the Hansen solubility parameters. For overcoming this uncertainty, one approach might be to use experimental data for higher molecular weight solvents in the regression step. Such solvents may capture the API molecular structure features better than low molecular weight solvents. However, such data are scarcely available. Another approach might be to use a more accurate regular solution-based model (in one sense, there is one more adjustable parameter), since we found eqs 6 and 8 fairly unreliable in SLE-related screening problems of APIs. In this regard, the Flory-Huggins model, which was developed based on Hansen solubility parameters (FH-HSP model),33 has been found more valuable for solvent screening in crystallization problems. According to the FH-HSP model, the activity coefficient is calculated by

$$
\ln \gamma_i = \ln \frac{\varphi_i}{x_i} + 1 - \frac{\varphi_i}{x_i} + \chi_{ij}\varphi_j^{2}
\tag{9}
$$

and

$$
\chi_{ij} = \alpha \frac{V_i}{RT}\left\{(\delta_{d,i} - \delta_{d,j})^2 + 0.25\left[(\delta_{p,i} - \delta_{p,j})^2 + (\delta_{h,i} - \delta_{h,j})^2\right]\right\}
\tag{10}
$$

where α is an extra adjustable parameter. However, in the case of a small number of experimental data points, the value of α can be set to 1. If there are enough data points, e.g., more than six solubility measurements, the extra adjustable parameter can also be fitted to the data.

Fitting the same experimental data set for felodipine using eqs 8–10 gave a value of 22.1 for the Hansen dispersion solubility parameter, which is more realistic. Moreover, the FH-HSP model showed more robust fitting of solubility data. Figure 4 shows the scatter diagram of predicted versus experimental solubility data of the extended-Hansen and FH-HSP models.

Figure 4. Scatter diagram of extended-Hansen and FH-HSP model-based predicted versus experimental solubility data.

Nevertheless, it should be noted that regular solution-based models are not usually suitable for predicting the temperature dependency of the API solution activity coefficient (and, consequently, solubility), or solubility in mixed solvents, both of which are crucial for industrial crystallization modeling. Preferably, excess Gibbs free energy based models, such as the correlative NRTL and UNIQUAC or the predictive UNIFAC models, ought to be utilized for such applications. Usually, because of the lack of sufficient experimental data, the correlative activity coefficient models cannot be used. Meanwhile, a tuned-UNIFAC model approach34 has shown robust capability in solubility correlation. The method is based on the original UNIFAC activity model, where a few experimental data are used to identify and tune the most sensitive group-interaction parameters in the model. The principles, which are based on a first-derivative analysis of the activity coefficient with respect to the interaction parameters, are discussed elsewhere.34 Application of the method for felodipine solubility in ethanol/solvent-2 on a solvent-free basis at 25 °C is presented in Figure 5.

Figure 5. Tuned-UNIFAC model-based solubility prediction of felodipine at 25 °C in mixed ethanol/solvent.

According to our experience in SLE modeling of APIs, the UNIFAC model-based solubility calculation usually shows a high discrepancy with respect to experimental data for these systems. However, in most cases, the predicted trend of the solubility with UNIFAC is correct. It seems that using a few additional experimental data points might shift the predicted solubility curve toward the measured data. In fact, the tuned-UNIFAC approach follows this procedure. The prediction for the felodipine case is in good agreement with binary experimental data and also shows rational behavior through the mixed-solvent composition range, although no measured ternary mixture values were available for comparison. The new method is implemented in the ICAS-SoluCalc toolbox,35 where solubility analysis of an API in mixed solvents


Figure 6. Calculated dispersion HSP versus experimental dispersion HSP values: (a) calculation with the GC methods and (b) with the CI method.

and various temperatures gives the possibility of studying cooling and drowning-out crystallization problems. One of the advantages of this method is that it can be applied even for compounds that have some (a few) missing group information in the original UNIFAC model. However, it must be noted that such a tuned model is only applicable for the systems whose data have been used for regression. In other words, the predictive UNIFAC model turns into a correlative but more accurate model.

5.2. New HSP GC-CI Prediction Model. Hansen solubility parameters (HSPs) play an important role in the solvent-selection process, and therefore, a new, more flexible model with wider application range has been attempted. In this regard, a combined group-contribution2 and connectivity-index36 (GC-CI) method has been developed. This predictive model is valuable because of its wide application range and is, therefore, particularly suitable for AIs, solvents, and many newly synthesized organic chemicals. In this method, the molecular structure of a compound is considered to be a collection of groups at three different levels, i.e., first-, second-, and third-order groups. The first-order groups describe the basic structure of compounds, while the second- and third-order groups provide more structural information about molecular fragments whose description through the first-order groups alone may not be sufficient. This group-contribution method is written as follows:

$$
f() = \sum_i N_i C_i + w \sum_j M_j D_j + z \sum_k O_k E_k
\tag{11}
$$

where $C_i$ is the contribution of the first-order group of type i that occurs $N_i$ times, and $D_j$ and $E_k$ are the contributions of the second-order group of type j and the third-order group of type k, which occur $M_j$ and $O_k$ times, respectively. In the first level of estimation, where w = z = 0, only first-order groups are employed. In the second level, where w = 1 and z = 0, only first- and second-order groups are involved. In the third level, both w = 1 and z = 1, and contributions of groups of all levels are included in the calculation. The left-hand side of eq 11 is a simple function, f(), of the property. Here, the functions are simply the dispersion (δd), polar (δp), and H-bond (δh) HSPs. In addition, atomic representation of groups using connectivity indices makes it possible to generate the missing group parameters. Combining this approach with the GC model explained above would cover all molecular functional groups that consist of atoms represented in the database used for model development. The CI model may not have as high

accuracy as the three-level GC method described above, but it has been found to provide, with reasonable accuracy, the missing group-contribution values for property prediction.37 The general form of the CI model is written as

$$
f() = \sum_i (a_i A_i) + b({}^{v}\chi^{0}) + 2c({}^{v}\chi^{1}) + d
\tag{12}
$$

where f() is the sought property function and $A_i$ is the contribution of atom i occurring in the molecular structure $a_i$ times. ${}^{v}\chi^{0}$ and ${}^{v}\chi^{1}$ are the zeroth-order (atom) and first-order (bond) connectivity indices, respectively, as defined by Kier and Hall.38 b, c, and d are adjustable parameters. Such models have very few adjustable parameters: $A_i$ (one per atom i), b, c, and d. Again, the property functions are simply the dispersion (δd), polar (δp), and H-bond (δh) HSPs.

Equations 11 and 12 have been utilized to develop a GC-CI prediction model for HSPs using data for 1050 compounds. All data are taken from one source;32 values marked there as predicted were excluded. Figures 6, 7, and 8 show the calculated values for the properties of interest (dispersion, polar, and H-bond HSPs, respectively) versus the experimental values. In the figures, the final (third-order) root-mean-squared error (RMSE) of the developed GC model is reported. Table 1 gives the RMSE of the developed GC and CI models for dispersion, polar, and H-bond HSPs. Tables 2, 3, and 4 give the RMSE in terms of molecule types (hydrocarbons, alcohols, ketones, aldehydes, chlorinated, fluorinated, amines, amides, sulfonated, others) and how many compounds belong to each family. Each family contains unifunctional compounds; all the multifunctional compounds are put under "others". Using the new method to predict the dispersion solubility parameter of felodipine results in 21.8 MPa^(1/2), which is very close to the experimentally measured value of 22.1. Tables 5-8 (see Supporting Information) give, respectively, the first-order, the second-order, and the third-order groups and atom-CI along with their contributions for the Hansen solubility parameters.

6. Conclusions

The predictive models for estimation of solubility of solutes in organic solvent compounds have been reviewed together with their use in solvent-selection procedures.
The availability of sufficient experimentally measured data for model tuning and adaptation has also been reviewed. For applications involving AIs, where only limited or zero measured data is usually available, various approaches that can provide useful estimates have been discussed. In the area of predictive models, a new combined GC-CI based model for predicting the three Hansen solubility parameters has been developed. Since the model can take second- and third-order groups into account, the Hansen solubility parameters for a wide range of AIs can now be predicted more accurately. In addition, a connectivity indices based model has been developed and can be utilized in combination with the new GC model to cover the missing group parameters, thereby extending the application range of the GC-based model. In other words, the method is able to provide a result even if a group contribution to a particular compound is not known. This can be extremely useful in design scenarios involving solubility problems in the pharmaceutical industry.

Figure 7. Calculated polar HSP versus experimental polar HSP values: (a) calculation with the GC methods and (b) with the CI method.

Figure 8. Calculated H-Bond HSP versus experimental H-Bond HSP values: (a) calculation with the GC methods and (b) with the CI method.

Table 1. RMSE of Developed GC and CI Methods for Hansen Solubility Parameter Prediction

Hansen solubility parameter   GC first-order   GC second-order   GC third-order   CI
dispersion                    1.26             1.14              1.08             5.99
polar                         2.29             2.18              2.15             4.49
H-bond                        1.86             1.72              1.70             4.99

Table 2. RMSE of Developed GC and CI Methods for Dispersion HSP Prediction

dispersion     no.   GC first-order   GC second-order   GC third-order   CI
hydrocarbons   84    2.29             2.12              1.78             5.81
alcohols       90    0.98             0.75              0.87             4.96
ketones        35    2.10             1.90              1.91             5.44
aldehydes      17    0.50             0.42              0.42             8.03
chlorinated    76    0.77             0.68              0.66             6.22
fluorinated    18    0.56             0.60              0.60             8.51
amines         28    1.34             1.32              0.77             3.97
amides         14    0.69             0.68              0.68             3.89
sulfonated     26    1.26             1.25              1.25             6.25
others         663   1.12             1.00              0.99             6.09

Table 3. RMSE of Developed GC and CI Methods for Polar HSP Prediction

polar          no.   GC first-order   GC second-order   GC third-order   CI
hydrocarbons   84    2.16             2.11              2.13             2.44
alcohols       90    1.53             1.49              1.43             3.25
ketones        35    2.68             2.53              2.47             5.45
aldehydes      17    2.58             2.43              2.43             7.92
chlorinated    76    2.74             2.58              2.57             4.40
fluorinated    18    2.92             2.77              2.77             5.82
amines         28    2.14             1.87              1.74             3.38
amides         14    1.49             1.49              1.49             5.37
sulfonated     26    1.40             1.40              1.40             2.98
others         663   2.34             2.23              2.18             4.70

5242 Ind. Eng. Chem. Res., Vol. 47, No. 15, 2008 Table 4. RMSE of Developed GC and CI Methods for H-bond HSP Prediction GC H-bond

no.

first-order

second-order

third- order

CI

hydrocarbons alcohols ketones aldehydes chlorinated fluorinated amines amides sulfonated others

84 90 35 17 76 18 28 14 26 663

1.89 2.39 2.42 0.97 1.57 2.50 2.25 1.45 1.79 1.75

1.88 2.29 0.92 0.92 1.55 2.40 2.09 1.45 1.79 1.64

1.79 2.28 0.93 0.92 1.56 2.40 1.88 1.45 1.79 1.62

4.21 8.69 3.15 4.73 2.27 2.68 5.08 5.09 3.92 4.78
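To make the role of the CI model concrete, the following is a minimal sketch of how an atom-plus-connectivity-index estimate could be evaluated for a small molecule. It is not the authors' implementation. It assumes the linear form f = Σ a_i A_i + b·vχ0 + 2c·vχ1 + d (the exact form is that of eqs 11 and 12 given earlier in the paper), it uses the Kier and Hall valence delta, (Zv − h)/(Z − Zv − 1), and every parameter value (atom contributions, b, c, d) is a hypothetical placeholder rather than a fitted value from Tables 5-8. The function and variable names are illustrative only.

```python
import math

# (atomic number Z, number of valence electrons Zv) for the heavy atoms
# covered by the model (C, N, O, F, P, S, Cl, Br, I).
ELEMENTS = {
    "C": (6, 4), "N": (7, 5), "O": (8, 6), "F": (9, 7),
    "P": (15, 5), "S": (16, 6), "Cl": (17, 7), "Br": (35, 7), "I": (53, 7),
}


def valence_delta(element, n_hydrogens):
    """Kier-Hall valence delta of a heavy atom: (Zv - h) / (Z - Zv - 1)."""
    z, zv = ELEMENTS[element]
    return (zv - n_hydrogens) / (z - zv - 1)


def valence_connectivity_indices(atoms, bonds):
    """Return (chi0, chi1) for a hydrogen-suppressed molecular graph.

    atoms: list of (element, number of attached hydrogens)
    bonds: list of (i, j) index pairs between heavy atoms
    """
    deltas = [valence_delta(el, h) for el, h in atoms]
    if any(d <= 0 for d in deltas):
        # e.g. methane: the single carbon has delta = 0 and the index is undefined
        raise ValueError("valence delta must be positive for every heavy atom")
    chi0 = sum(1.0 / math.sqrt(d) for d in deltas)
    chi1 = sum(1.0 / math.sqrt(deltas[i] * deltas[j]) for i, j in bonds)
    return chi0, chi1


def ci_estimate(atoms, bonds, atom_contrib, b, c, d):
    """Evaluate the assumed linear atom-CI form with placeholder parameters."""
    chi0, chi1 = valence_connectivity_indices(atoms, bonds)
    heavy = sum(atom_contrib[el] for el, _ in atoms)
    hydrogens = sum(h for _, h in atoms) * atom_contrib["H"]
    return heavy + hydrogens + b * chi0 + 2.0 * c * chi1 + d


if __name__ == "__main__":
    # Ethanol, CH3-CH2-OH, as a hydrogen-suppressed graph.
    ethanol_atoms = [("C", 3), ("C", 2), ("O", 1)]
    ethanol_bonds = [(0, 1), (1, 2)]
    chi0, chi1 = valence_connectivity_indices(ethanol_atoms, ethanol_bonds)
    print(f"chi0 = {chi0:.3f}, chi1 = {chi1:.3f}")  # chi0 = 2.154, chi1 = 1.023

    # Hypothetical parameters, for illustration only.
    placeholder = {"C": 1.0, "O": 2.0, "H": 0.5}
    print(ci_estimate(ethanol_atoms, ethanol_bonds, placeholder, b=1.0, c=1.0, d=0.0))
```

In a combined scheme, a value of this kind would be used only where a group parameter is missing from the GC tables; the RMSE values in Table 1 suggest that the GC estimate is to be preferred whenever all groups are available. The rule by which the paper combines the two models is not reproduced in this sketch.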

Acknowledgment

The authors acknowledge AstraZeneca Ltd. for providing financial support to the research programs at CAPEC, DTU.

Supporting Information Available: Tables 5-8 are provided as Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Ka, M. Ng.; Gani, R.; Dam-Johansen, K. Chemical Product Design: Towards a Perspective through Studies; Elsevier B.V.: The Netherlands, 2007.
(2) Marrero, J.; Gani, R. Group contribution based estimation of pure component properties. Fluid Phase Equilib. 2001, 183-184, 183.
(3) Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K.; Sithambaram, S.; Suib, S. L.; Gani, R. An experimental verification of morphology of ibuprofen crystals from CAMD designed solvent. Chem. Eng. Sci. 2007, 12, 3276.
(4) Harper, P. M.; Gani, R. A multi-step and multi-level approach for computer aided molecular design. Comput. Chem. Eng. 2000, 24, 677.
(5) Constantinou, L.; Gani, R. New group contribution method for estimating properties of pure compounds. AIChE J. 1994, 40, 1697.
(6) Abildskov, J.; O’Connell, J. P. Prediction of solubility of complex chemicals. I. Solutes in different solvents. Ind. Eng. Chem. Res. 2003, 42, 5622.
(7) Gani, R.; Jiménez-González, C.; ten Kate, A.; Crafts, P. A.; Jones, M.; Powell, L.; Atherton, J. H.; Cordiner, J. L. A modern approach to solvent selection. Chem. Eng. 2006, 30.
(8) Wilson, G. M.; Deal, C. H. Activity Coefficients and Molecular Structure. Activity Coefficients in Changing Environments-Solutions of Groups. Ind. Eng. Chem. Fundam. 1962, 1 (1), 20.
(9) Gmehling, J.; Li, J.; Schiller, M. A modified UNIFAC model. 2. Present parameter matrix and results for different thermodynamic properties. Ind. Eng. Chem. Res. 1993, 32, 178.
(10) González, H. E.; Abildskov, J.; Gani, R.; Rosseaux, P.; Le Bert, B. A. Method for Prediction of UNIFAC Group Interaction Parameters. AIChE J. 2007, 53, 1393.
(11) González, H. E.; Abildskov, J.; Gani, R. Computer-aided framework for pure component properties and phase equilibria prediction for organic systems. Fluid Phase Equilib. 2007, 261, 199.
(12) Chen, C. C. A segment-based local composition model for the Gibbs energy of polymer solutions. Fluid Phase Equilib. 1993, 83, 301.
(13) Chen, C. C.; Song, Y. Solubility modeling with nonrandom two-liquid segment activity coefficient model. Ind. Eng. Chem. Res. 2004, 43, 8354.
(14) Chen, C. C.; Crafts, P. A. Correlation and Prediction of Drug Molecule Solubility in Mixed Solvent Systems with the Nonrandom Two-Liquid Segment Activity Coefficient (NRTL-SAC) Model. Ind. Eng. Chem. Res. 2006, 45, 4816.
(15) Klamt, A. From Quantum Chemistry to Fluid Phase Thermodynamics and Drug Design; Elsevier B.V.: The Netherlands, 2005.
(16) Lin, S. T.; Sandler, S. I. A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model. Ind. Eng. Chem. Res. 2002, 41, 899.
(17) Mullins, P. E. Application of COSMO-SAC to Solid Solubility in Pure and Mixed Solvent Mixtures for Organic Pharmacological Compounds. M.Sc. Thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA, 2007.
(18) Pretel, E. J.; Lopez, P. A.; Bottini, S. B.; Brignole, E. A. Computer-aided molecular design of solvents for separation processes. AIChE J. 1994, 40, 1349.
(19) Nass, K. K. Rational solvent selection for cooling crystallizations. Ind. Eng. Chem. Res. 1994, 33, 1580.
(20) Frank, T. C.; Downey, J. R.; Gupta, S. K. Quickly screen solvents for organic solids. Chem. Eng. Prog. 1999, 95 (12), 41.
(21) Karunanithi, A. T.; Achenie, L. E. K.; Gani, R. A computer-aided molecular design framework for crystallization solvent design. Chem. Eng. Sci. 2006, 61, 1247.
(22) Abildskov, J.; O’Connell, J. P. Prediction of solubilities of complex medium-sized chemicals. II. Solutes in mixed solvents. Mol. Simul. 2004, 30, 367.
(23) Abildskov, J.; O’Connell, J. P. Thermodynamic method for obtaining the solubilities of complex medium-sized chemicals in pure and mixed solvents. Fluid Phase Equilib. 2004, 228, 395.
(24) Seidel, A. Solubilities of Organic Compounds, 3rd ed.; Van Nostrand: Princeton, NJ, 1941; Supplement (with W. F. Linke), ibid., 1952.
(25) Landolt-Börnstein, E. Zahlenwerte und Funktionen; Zweiter Band. Springer-Verlag: Berlin, 1959.
(26) Stephen, H.; Stephen, T. The Solubilities of Inorganic and Organic Compounds; Pergamon Press: Oxford, U.K., 1963.
(27) Wisniak, J.; Herskowitz, M. Solubilities of Gases and Solids. A Literature Source Book; Physical Sciences Data 18, parts A-B; Elsevier B.V.: The Netherlands, 1984.
(28) Knapp, H.; Teller, M.; Langhorst, R. Solid-Liquid Equilibrium Data Collection. Part 1. Binary Systems; DECHEMA CDS: Germany, 1987; Vol. VIII, pt. 1.
(29) Sangster, J. Phase Diagrams and Thermodynamic Properties of Binary Systems of Drugs. J. Phys. Chem. Ref. Data 1999, 28, 889.
(30) Allan F.; Barton, M. Handbook of Solubility Parameters and Other Cohesion Parameters; CRC Press Inc.: Boca Raton, FL, 1983.
(31) Ashton, N. F.; McDermott, C.; Brench, A. Chemistry of Extraction of Nonreacting Solutes, Chapter 1. In Handbook of Solvent Extraction; Lo, T. C., Baird, M. H. I., Hanson, C., Eds.; Wiley: New York, 1983.
(32) Hansen, C. M. Hansen Solubility Parameters: A User’s Handbook, 2nd ed.; CRC Press: Boca Raton, FL, 2007.
(33) Lindvig, T.; Michelsen, M. L.; Kontogeorgis, G. M. A Flory-Huggins model based on the Hansen Solubility Parameters. Fluid Phase Equilib. 2002, 203, 247.
(34) Christensen, S. Prediction of Pesticide Solubility. M.Sc. Thesis, CAPEC, Technical University of Denmark, Lyngby, Denmark, 2003.
(35) ICAS Documentation, Internal Report, CAPEC, Department of Chemical Engineering, Technical University of Denmark; www.capec.kt.dtu.dk
(36) Camrada, K. V.; Maranas, C. D. Optimisation in polymer design using connectivity indices. Ind. Eng. Chem. Res. 1999, 38, 1884.
(37) Gani, R.; Harper, P. M.; Hostrup, M. Automatic creation of missing groups through connectivity index for pure-component property prediction. Ind. Eng. Chem. Res. 2005, 44, 7262.
(38) Hall, L. H.; Kier, L. B. Molecular connectivity in chemistry and drug research; Academic Press: New York, 1976.

Received for review December 1, 2007. Revised manuscript received March 17, 2008. Accepted March 21, 2008. IE0716363