ARTICLE pubs.acs.org/jced
A Predictive Model for the Solubility and Octanol-Water Partition Coefficient of Pharmaceuticals Chieh-Ming Hsieh,† Shu Wang,‡ Shiang-Tai Lin,*,† and Stanley I. Sandler‡ † ‡
Department of Chemical Engineering, National Taiwan University, Taipei, 10617, Taiwan Center for Molecular and Engineering Thermodynamics, Department of Chemical Engineering, University of Delaware, Newark, Delaware 19716, United States
bS Supporting Information ABSTRACT: The prediction of drug solubility in various pure and mixed solvents and the octanol-water partition coefficient (KOW) are evaluated using the recently revised conductor-like screening segment activity coefficient (COSMO-SAC) model. The solubility data of 51 drug compounds in 37 different solvents and their combinations over a temperature range of 273.15 K to 323.15 K (300 systems, 2918 data points) are calculated from the COSMO-SAC model and compared to experiments. The solubility data cover a wide range of solubility from (10-1 to 10-6) in mole fraction. When only the heat of fusion and the normal melting temperature of the drug are used, the average absolute error from the revised model is found to be 236 %, a significant reduction from that (388 %) of the original COSMO-SAC model. When the pure drug properties (heat of fusion and melting temperature) are not available, predictions can still be made with a similar accuracy using the solubility data of the drug in any other solvent or solvent mixture. The accuracy in prediction of the solubility in a mixed solvent can be greatly improved (average error of 70 %), if the measured solubility data in one pure solvent is used. Also, the standard deviation in log KOW of 89 drugs, whose values range from -3.7 to 5.25, is found to be 1.14 from the revised COSMO-SAC model. Our results show that the COSMO-SAC model can provide reliable predictions of properties of pharmaceuticals and is a useful tool for drug discovery.
1. INTRODUCTION The knowledge of solubility and the octanol-water partition coefficient (KOW) of a drug are important in drug discovery, development, and manufacturing.1-3 For example, in the area of rational drug design, a strategy in the discovery of new drugs is to use information about the structure of a drug receptor or ligands to identify candidate drugs.4 The octanol-water partition coefficient may be used for this process, as it is one index in the rule of five, in particular that log KOW should not exceed five for a candidate drug. Another example of the use of the octanol-water partition coefficient is in quantitative structure-activity relationships (QSARs), where it is assumed that there is a simple relationship between KOW and biological responses, such as the lethal dose for 50 % of test organisms, LD50.4 For the synthesis, production, and formulation of a drug, solubility is an important factor, as solubility requirements can be very different in different stages of the process. For example, high (or intermediate) solubility is desired in the reaction stage, while low solubility in the solvent is desired in the purification and crystallization stage. The solubility of a drug in water is also important to estimate the drug access to biological membranes.5 Solvent screening (i.e., finding the optimal solvent or solvent combinations for desired solubility) requires making many measurements, a costly and time-consuming task. Although the techniques of solubility and KOW measurement have improved,6-8 it is impractical to measure these data for all drug candidates at all possible operating conditions (temperature, composition of a mixed solvent, etc.). Furthermore, as a result of the wide range of values of solubility and KOW, especially in cases r 2010 American Chemical Society
of extremely large or small values, the accuracy of the measurements is sometimes questionable. It is not uncommon that reported values for the same compound may differ by a factor of 10 (one log unit) or more. Thus, a predictive thermodynamic model, that does not rely on experimental data, would be very useful for the design and development of drugs.9 In the literature several ways to estimate values of KOW10 and the solubility9,11-16 of organic compounds have been reported with varying degrees of accuracy. In particular, the diverse chemical structures of drug molecules and the complexity in drug-solvent interactions present great challenges. Most predictive methods for KOW are based on an arbitrarily defined set of fragments and a correlation using approximately 1000 compounds (training set) for which there were reliable experimental KOW data.17 While such structure-based methods can be very accurate, their accuracy deteriorates for compounds that are outside the training set. This is especially so for compounds having complex structures, such as pharmaceuticals, for which it may also be difficult to divide the molecule systematically into the fragments or groups that were defined in the training set. Furthermore, some methods have not been verified for calculating the solubility of drugs. Theoretically based approaches allow for the prediction of both KOW and solubility of drugs simultaneously. One very Special Issue: John M. Prausnitz Festschrift Received: August 27, 2010 Accepted: October 21, 2010 Published: November 11, 2010 936
dx.doi.org/10.1021/je1008872 | J. Chem. Eng. Data 2011, 56, 936–945
Journal of Chemical & Engineering Data
ARTICLE
successful example is the nonrandom two-liquid segment activity coefficient (NRTL-SAC) model of Chen and Song.13 In this model, each compound is characterized using four speciesspecific parameters that can be determined from the regression of few experimental vapor-liquid equilibrium or solubility data. Once the parameters are available for both the drug and the solvent, satisfactory predictions can be achieved. This method has been shown to be a practical tool in the design of drug purification processes.18 Another example is the conductor-like screening (COSMO)-based activity coefficient models,11,12,16 in which the drug-solvent interactions are determined from quantum mechanical solvation calculations. Therefore, predictions can be made prior to, and in the absence of, any experimental measurements. Mullins et al.11 examined the accuracy of the conductor-like screening segment activity coefficient (COSMOSAC) model of Lin and Sandler19 using 33 drugs in 37 solvents and solvent mixtures. Shu and Lin16 have shown that the accuracy of COSMO-SAC in the prediction of drug solubility in mixed solvents can be greatly improved when experimental solubility data in relevant pure solvents are used. Kaemmerer et al.20 used the COSMO-SAC model21 for solvent or antisolvent screening and developed a selective crystallization process for chiral pharmaceuticals. Recently, the COSMO-SAC activity coefficient model has been improved to provide better descriptions of vapor-liquid and liquid-liquid equilibria.22 In this study, we examine the accuracy of the latest COSMOSAC model [denoted here as COSMO-SAC(2010)] in the prediction of solubilities and octanol-water partition coefficients of drugs. A larger set of solubility data (51 drugs in 37 solvents and their mixtures) and the KOW (of 89 drugs) is used, and the performance of the model on drugs of different isomeric structures is tested. No experimental data are needed for the prediction of KOW . Heat of fusion and melting temperature are used for the prediction of drug solubilities. However, when the needed data for the drug (heat of fusion and melting temperature) are missing, we show that consistent predictions can be achieved in both pure and mixed solvents using a single solubility data point. When the correction method of Shu and Lin16 is used, the COSMOSAC(2010) model predictions of solubility data are within 70 % of the experiment.
The correction terms involving the heat capacities have been neglected in eq 3 because their contributions are much smaller compared to the contribution from the heat of fusion. Substituting eqs 2 and 3 into eq 1, we obtain an expression for calculating the solubility of drug i, mole fraction xi of drug in the solvent, as24 ! Δfus Hi 1 1 ln xi ¼ ð4Þ - ln γi ðT, P, x Þ T Tm, i R Therefore, to evaluate the solubility xi, it is necessary to have ΔfusHi and Tm,i of the drug and its activity coefficient γi in the liquid mixture. The experimental values for ΔfusHi and Tm,i are used when available. Most of these data can be found in the DECHEMA25,26 and/or National Institute of Standards and Technology (NIST)27 databases. When experimental data are not available, the group contribution method, such as that of Chickos and Acree,28,29 can be used (only for 2-(4-methylphenyl)acetic acid in this work). Water and 1-octanol are partially miscible at ambient conditions. When mixed, an octanol-rich phase (containing approximately 0.725 mole fraction 1-octanol and 0.275 mole fraction water) coexists with another water-rich phase (which is nearly pure water) at equilibrium. The octanol-water partition coefficient of a drug is the ratio of its distribution in these two phases when a trace amount of the drug is added, that is, KOW , i ¼ lim
Ci s f0
liquid
ðT, P, xÞ
O W W μO i ðT, P, x Þ ¼ μi ðT, P, x Þ
liquid
liquid
ðT, P, x Þ ¼ μi
ðT, PÞ þ RT ln xi γi ðT, P, xÞ
ð6Þ
Therefore (from eq 2), the mole fraction of the drug is related to the activity coefficient as W O
xO γW i i ðT, P, x Þ ¼ Þ W γO xi i ðT, P, x
ð7Þ
and the octanol-water partition coefficient can be determined from ! ! O xO CO γW;¥ i i C log KOW, i ¼ log lim W W ¼ log xi s f 0 xi C CW γO;¥ i ! 8:37γW;¥ i ¼ log ð8Þ 55:5γO;¥ i
ð1Þ
The chemical potential of the species in the liquid mixture is related to its activity coefficient as μi
ð5Þ
W where CO i and Ci are the molar concentrations of drug i in the octanol-rich and water-rich phases, respectively. In the limit of low solute concentrations, CO i can be estimated from the mole fraction of the drug and the total molar concentration of the O O W W W liquid, that is, CO i ≈ xi C and Ci ≈ xi C . Furthermore, the equilibrium criteria require the equivalence of chemical potential of the drug in both phases
2. THEORY The solubility limit of a solid drug i in a solvent S (whose composition is denoted by x) is determined from the equality of chemical potentials of the drug i in the solid (assuming a pure phase) and the liquid solvent at equilibrium, that is, ðT, PÞ ¼ μi μsolid i
CO i CW i
where γO,¥ and γW,¥ are the infinite dilution activity coefficients i i of the drug in the octanol-rich phase and the water-rich phase, respectively. Under ambient conditions, the octanol-rich phase contains approximately 2.3 mol 3 L-1 1-octanol and 6.07 mol 3 L-1 water.30 Therefore, the total concentration CO is 8.37 mol 3 L-1. For pure water CW = 55.5 mol 3 L-1. The activity coefficient γi needed in eqs 4 and 8 is determined from the COSMO-SAC model originally developed by Lin and Sandler [denoted as COSMO-SAC(2002)].19 The COSMO-SAC
ð2Þ
and the difference in chemical potentials of the compound in the liquid and solid states can be estimated from its heat of fusion (ΔfusHi) and normal melting temperature (Tm,i)23 ! T liquid solid μi ðT, PÞ - μi ðT, PÞ = Δfus Hi 1 ð3Þ Tm, i 937
dx.doi.org/10.1021/je1008872 |J. Chem. Eng. Data 2011, 56, 936–945
Journal of Chemical & Engineering Data
ARTICLE
model is a refinement of the COSMO-RS model of Klamt and his colleagues.31-33 In such models19,21,31-35 the interactions between solute and solvent are considered as the interactions between the screening charges on the paired surface segments of the same size. These surface charges on the molecular cavity are determined from a two-step first principles calculation. First, the optimal conformation of the molecule in vacuum is determined by geometry optimization using quantum mechanical (QM) density functional theory (DFT). Then, a COSMO solvation calculation32 is used to determine the total energy in the perfect conductor (solvent with infinite dielectric constant) and the ideal screening charges on the molecular surface. The σ-profile is the probability of finding a surface segment with screening charge density σ and defined as pi ðσÞ ¼
Ai ðσÞ Ai
where the subscript S denotes the solution of interest (S = i for pure liquid i), the superscripts s and t can be nhb, OH, or OT, and the segment exchange energy ΔW measures the interaction energy between two surface segments of the same area aeff with charge densities σm and σn ΔWðσtm , σsn Þ ¼ cES ðσtm þ σsn Þ2 - chb ðσtm , σsn Þðσ tm - σsn Þ2 ð15Þ The cES, the electrostatic interaction parameter, is a temperaturedependent function cES ¼ AES þ
ð9Þ
∑i xi Ai pi ðσÞ ∑i xi Ai
ð17Þ
ð10Þ
where the values of three hydrogen-bonding interaction parameters have been given earlier.22 With the segment activity coefficient determined above, the activity coefficient of species i in mixture S is determined from
To have a better description of hydrogen-bonding (hb) interactions, Hsieh et al.22 suggested dividing the σ-profile into three components: contributions from surfaces of non-hydrogen bonding atoms, from surfaces of hydroxyl (OH) group, and from surfaces of all other types of hydrogen bond donating and accepting atoms, that is, OH OT pðσÞ ¼ pnhb 0 ðσÞ þ p0 ðσÞ þ p0 ðσÞ
ln γi=S ¼
Ai aef f
nhb, OH, OT
∑t ∑ pti ðσ m Þ½ln ΓtS ðσ m Þ σ m
- ln
ð11Þ
Γti ðσ m Þ þ ln
γcomb i=S
ð18Þ
where γcomb i/S , the combinatorial contribution to activity coefficient, is used to take into account the molecular size and shape differences of the species and is given by
where pnhb 0 (σ) is the collection of all non-hydrogen-bonding segments; pOH 0 (σ) are the segments on the hydroxyl groups; and pOT 0 (σ) includes all of the other hb segments, that is the segments belong to oxygen, nitrogen, fluorine, or hydrogen atoms connected to N or F atoms. For interactions between atoms of hydrogen bonding donors and acceptors, Wang et al.21 suggested refining the hydrogen bonding σ profiles by removing the less polarized surfaces using a Gaussian-type function ! 2 -σ ð12Þ PHB ðσÞ ¼ 1 - exp 2σ 0 2
¼ ln ln γcomb i=S
φi z θi φ þ qi ln þ li - i xi 2 φi xi
∑j xj lj
ð19Þ
with φi = (xiri)/(∑jxjrj), θi = (xiqi)/(∑jxjqj), and li = 5(ri - qi) (ri - 1), where xi is the mole fraction of component i; ri and qi are the normalized volume and surface area parameters for component i (the standard volume and area used for normalization are 0.06669 nm3 and 0.7953 nm2); the summation is over all of the species in the mixture. This revised COSMO-SAC model is denoted as the COSMO-SAC(2010) model in this study. The values of all universal parameters in the COSMO-SAC(2010) model are given in ref 22 and have not been changed. All of the species specific quantities [Ai, ri, qi, pi(σ)] are obtained from first principle COSMO calculations. Two changes have been made in COSMO-SAC(2010) compared to the original COSMO-SAC(2002) model. First, the electrostatic interaction parameter cES is made to be temperature-dependent (eq 16). Second, the variation in the strength of hydrogen bonds formed by different types of donor-acceptor pairs is explicitly taken into account by using separate σ-profile (eq 13) and interaction parameters (eq 17). Two correction methods were used in this study depending on the availability of experimental data. First, when the data of pure drug heat of fusion and temperature of melting (ΔfusHi and Tm,i) are missing, they can be replaced with any experimental solubility
with σ0 = 0.7 e 3 nm-2.21 The σ-profile becomes pðσÞ ¼ pnhb ðσÞ þ pOH ðσÞ þ pOT ðσÞ
ð16Þ
where AES and BES are adjustable parameters and their values are taken from literature.22 The hydrogen-bonding interaction parameter chb(σtm, σsn) is independent of temperature and given by 8 > c if s ¼ t ¼ OH and σtm 3 σsn < 0 > > OH - OH > > < cOT - OT if s ¼ t ¼ OT and σ t σ s < 0 m3 n chb ðσ tm , σ sn Þ ¼ > c if s ¼ OH; t ¼ OT; and σtm 3 σsn < 0 OH - OT > > > > : 0 otherwise
where Ai and Ai(σ) are the total surface areas of molecule i and the summation of the surface areas of all segments with charge density σ; pi(σ) is a specific property of each pure component. The σprofile of a mixture is the summation of the σ-profile of each substance in the system weighted by its surface area (Ai) and mole fraction (xi), that is, pS ðσÞ ¼
BES T2
ð13Þ
with pOH(σ) = p0OH(σ) 3 PHB(σ), pOT(σ) = p0OT(σ) 3 PHB(σ), and pnhb(σ) = p0nhb(σ) þ [p0OH(σ) þ p0OT(σ)][1 - PHB(σ)]. Once the σ-profile is established, the activity coefficient (Γ) for surface segment σm can be determined from 8