Solubility Correlations of Common Organic Solvents

Received: April 15, 2018. Published: June 6, 2018.
Article Cite This: Org. Process Res. Dev. 2018, 22, 829−835

pubs.acs.org/OPRD

Jun Qiu* and Jacob Albrecht*
Chemical and Synthetic Development, Bristol-Myers Squibb Company, One Squibb Drive, New Brunswick, New Jersey 08903, United States



ABSTRACT: We describe general organic solvent solubility correlations derived from methodology that analyzed 63 240 pieces of automation-enabled solubility data of pharmaceutically relevant compounds and synthetic intermediates. A total of 1125 solubility screening panels were empirically collected on 905 distinct solutes using an Unchained Laboratories (formerly Symyx and Freeslate) automated solubility workflow over the last 15 years. Mining and analyzing these results revealed statistically significant solubility correlations between many solvent pairs and hierarchical clustering of most common organic solvents. This has enabled more efficient experimental solubility surveys by reducing the number of solvents in the experimental design, resulting in savings of both material and throughput.



INTRODUCTION

Thermodynamic solubility data are critical: they inform and help define reaction conditions, workups, and isolations in organic process development.1 Within a typical step of a synthetic sequence, every unit operation from reaction through crystallization may be built upon knowledge of the solubilities of the most relevant reaction processing components (e.g., starting materials, reagents, products, byproducts, and key impurities) in pertinent solvent systems.2,3 The choice of an empirical solubility measurement workflow is determined by technical factors such as data quality, experimental space, and material consumption as well as economic factors such as speed, FTE cost, and capital equipment requirements. For the needs of process development, solubility data quality is a primary requirement, as erroneous measurements can often lead to wasted experimental effort and poor choices of both reaction conditions and isolation strategies. For early-phase process development, our experience is that process development teams typically demand solubility data sets with no more than 10% inaccuracy; however, since it is practically impossible to determine the absolute accuracy of measurements within an initial solubility screen, experiments are usually carried out in replicates, and the imprecision calculated from these is used as an internal quality control.

Numerous methods exist to measure thermodynamic solubility in organic and aqueous solvents, and they can be broadly separated into two camps: “excess solid” and “excess solvent” methods.4 The “excess solid” approach can be further subdivided into two categories: filtration of the excess solid prior to quantitation or no filtration. In general, many arguments support the notion that the “excess solid” method with filtration (a.k.a. the shake flask method) is the method that produces the highest-quality data.5

Besides data quality, experimental space and material consumption are also important considerations. In general, the production of more high-quality solubility data with extensive coverage of experimental space is desirable in terms of informing development decisions. Unfortunately, both material availability and time are often very limited in the early stages of process

development, when the need for solubility knowledge happens to be the greatest. Miniaturization of the shake flask method allows more experimental space to be covered with the same amount of research material; however, as the scale of the experiment is reduced, the data quality may also be compromised as a result of a number of technical challenges, namely, solvent evaporation, temperature fluctuations, filtration effectiveness, and general liquid handling and sampling inaccuracies. Thus, the most appropriate solubility measurement workflow is one that balances the need for data quality and experimental space with speed and the use of minimal material. Through extensive evaluations of commercial products and application of experience from internal development efforts, we determined that the Symyx (later Freeslate and currently Unchained Laboratories) automated solubility workflow (Figure 1) is optimal

Figure 1. Unchained Laboratories automated implementation of the shake flask method.

in terms of the aforementioned criteria, primarily because of its high-performance 96-well filter assembly (Figure 2) as well as its associated automation platform for liquid handling, mixing, and precise temperature control. We therefore acquired this early-generation system in the early 2000s. Since then, we have carried out thousands of solubility screens6 using this system as well as subsequent generations of this technology. The typical screen design includes most of the common solvents as well as some

DOI: 10.1021/acs.oprd.8b00117 Org. Process Res. Dev. 2018, 22, 829−835

Organic Process Research & Development


frequently used binary solvent mixtures.7,8 Customized designs are available without much limitation on the solvents: any solvents or solvent mixtures with normal physical properties (e.g., boiling point, viscosity) can be accommodated. When the solubility had to be measured at different temperatures, the screens were carried out sequentially at each temperature, as all of the vials in the plate, liquid handlers, and filter assemblies had to be held at the same temperature on the instrument deck. In our initial forays into this automated platform, our standard design filled all of the positions on a 96-well plate, with 36 common solvents, 36 solvent mixtures, a few special solutions, and replicates to fill the remainder for quality control purposes. To every vial was added 50 mg of compound and 0.5 mL of solvent or solvent mixture, and thus, 5 g of material was consumed in each screen. The distribution of data imprecision is summarized in Table 1. The solubility of a fully dissolved vial would be reported as “>X mg/mL”, with X mg/mL being the actual filtrate concentration quantitated by HPLC analysis, which was always close to 100 mg/mL when nothing unusual had happened. The lowest solubility that could be measured consistently was around 0.1 mg/mL. In the sub-0.1 mg/mL range, while solubility data sometimes could still be obtained as numeric values, their imprecision would often be much larger than 10%; at other times, the anticipated HPLC peak would not be observed at all, and the data point would be reported as “below detection limit”. From a practical perspective in process development, concentrations less than 0.1 mg/mL and concentrations below the detection limit can be considered as equivalent.

Figure 2. Unchained Laboratories filter plate assembly.

Table 1. Distribution of Data Imprecision from Historical Data Sets

solubility range    half of measurements had imprecision less than    3/4 of measurements had imprecision less than
<0.1 mg/mL          equivalent to below the detection limit
…                   20%                                               40%
…                   10%                                               20%
…                   6%                                                10%
>100 mg/mL          equivalent to fully dissolved

During solubility screens the compound has the potential to form solvates: in these situations, the solute may initially fully dissolve but later precipitate as a solvated form. Thus, for larger-scale, medium-throughput screens, powder X-ray diffraction (PXRD) was used to determine whether the residual solids had undergone conversion to a solvate form. However, for our high-throughput screens, the additional resources required to prepare and analyze all of the samples for PXRD would have significantly diminished the overall throughput and cycle time of the workflow. As these screens were usually conducted early in development, we opted to omit PXRD analysis with the understanding that likely follow-up studies would interrogate and analyze additional solubility and form properties in a small subset of desirable process solvents.

Over the past decade, the complexity of active pharmaceutical ingredients (APIs) has increased, and the number of steps and synthetic strategies to be evaluated has increased commensurately.9 As a result, we have noted a marked rise in the overall number of isolated compounds, especially intermediates, submitted for solubility screening against a backdrop of compressed timelines as well as lower availability of materials. Hence, requiring ∼5 g of material and a few weeks to generate the requisite solubility data sets became impractical. Aiming to address this acute challenge of having less material and time at our disposal, we pursued three orthogonal approaches to reduce overall material consumption, decrease cycle time, and provide needed design flexibility, all while meeting the data quality requirements:

1. The experiments were further scaled down. We collaborated with Unchained Laboratories to conduct the same screen in as little as 0.25 mL of total solvent per vial;5 however, this miniaturization led to notable deterioration in data precision and hence required more replicates to be included to control the data quality, which largely canceled out the anticipated material savings. We eventually determined that a total volume of 0.4 mL per vial was the smallest scale at which replicates could be omitted without fundamentally compromising the data quality, thus reducing the material usage to about 3 g for each comprehensive screen.
2. The amount of material charged into each vial was decreased. For example, if 20 mg instead of 40 mg of each compound was charged with 0.4 mL of solvent, any fully dissolved vials would be reported as “>50 mg/mL” rather than “>100 mg/mL”. This proposed change was not considered an acceptable compromise by our collaborators, as the majority strongly believed that it was important to generate solubility values between 50 and 100 mg/mL.
3. The number of solvents included in each screen was decreased. After options 1 and 2 had been exhaustively explored, we concentrated our efforts on rational data-driven approaches to reducing the number of solvents included in these screens.

In the course of executing automated solubility screens and compiling summary reports for over 10 years, certain correlations among solvents were empirically observed and noted. For example, the series EtOAc, i-PrOAc, and n-BuOAc were all part of the standard screen design, and it appeared that the solubility of a given compound in EtOAc was on average marginally higher than that in i-PrOAc, which in turn was slightly higher than that in n-BuOAc. If the solubilities in these solvents could be shown to correlate with each other with statistical significance, we would have the opportunity to omit two out of these three solvents and thus use only one in the screen, followed by extrapolation to calculate the solubilities


Table 2. Numbers of Solubility Data by Solute, Temperature, and Solvent Condition (column headings: solute type, temperature)

• … (“>X mg/mL”) were excluded.
• All numerical solubility data less than 0.1 mg/mL and data points reported as “below detection limit” were converted to 0.1 mg/mL.

In order to aggregate the solubility reports and analyze the data, a custom Python script was first used to read in and clean the raw data. In total, 1125 solubility reports on 905 distinct compounds were collected, and 63 240 pieces of data were mined. Characteristics of the data set are shown in Table 2.
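The cleaning rules listed above can be sketched in a few lines of pandas. This is only an illustration, not the authors' script: the function name, the string formats of the raw entries, and the reading of the first (partly lost) rule as "drop fully dissolved entries" are all assumptions.

```python
import pandas as pd

FLOOR_MG_ML = 0.1  # lowest consistently measurable solubility, per the text


def clean_solubility(raw: pd.Series) -> pd.Series:
    """Return numeric solubilities (mg/mL) after applying the two rules.

    Assumed input: strings such as "12.3", ">98.7", or
    "below detection limit" (hypothetical formats).
    """
    text = raw.astype(str).str.strip()
    # Assumed rule 1: fully dissolved vials (">X mg/mL") are excluded.
    fully_dissolved = text.str.startswith(">")
    # Rule 2a: "below detection limit" is replaced by the floor value.
    below_limit = text.str.lower().eq("below detection limit")
    values = pd.to_numeric(text, errors="coerce")
    values = values.where(~below_limit, FLOOR_MG_ML)
    values = values[~fully_dissolved].dropna()
    # Rule 2b: numeric values under the floor are raised to the floor.
    return values.clip(lower=FLOOR_MG_ML)
```

Flooring at 0.1 mg/mL keeps every retained value positive, which matters later because the regressions are run on log-transformed solubilities.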

Figure 4. Solubility correlation between 2-MeTHF and THF at RT (log scale, R² = 0.775, medium-high correlation).

in the other two solvents. Considering the amount of material that could be saved, we were inspired by this idea and started


Table 3. Slope, Intercept, and R² Values from Regressions of Solubilities (Log Scale) of Noncharged Solutes in Solvent Pairs at RT, Rank-Ordered by R²

solvent Y           solvent X     slope   intercept  R²
i-BuOAc             n-BuOAc       1.00    −0.11      0.958
n-BuOAc             n-PrOAc       0.96    −0.06      0.946
i-PrOAc             n-PrOAc       1.02    −0.15      0.943
1-propanol          ethanol       0.93    0.03       0.941
n-PrOAc             EtOAc         1.00    −0.13      0.934
TAME(a)             MTBE          0.95    −0.09      0.934
MIBK(a)             EtOAc         0.98    −0.02      0.931
n-BuOAc             i-PrOAc       0.93    0.08       0.929
n-butanol           1-propanol    1.00    −0.04      0.926
2-propanol          1-propanol    0.99    −0.19      0.916
diethoxymethane     MTBE          0.98    0.03       0.903
CPME(a)             MTBE          0.97    0.23       0.894
DMAc(a)             DMF           1.00    0.07       0.868
chlorobenzene       toluene       0.99    0.21       0.860
EtOAc               MeOAc         0.91    0.00       0.856
trifluorotoluene    toluene       0.88    −0.18      0.845
ethanol             methanol      0.94    −0.12      0.841
heptane             cyclohexane   0.81    −0.25      0.831
tert-amyl alcohol   2-propanol    0.93    0.16       0.826
1,2-DCE(a)          DCM           0.83    −0.18      0.820
1,2-DME(a)          THF           0.89    −0.16      0.808
MEK(a)              acetone       0.92    0.04       0.798
heptane             Isopar G      0.89    −0.02      0.794
MIBK                MEK           0.96    −0.33      0.793
NMP                 DMF           0.99    0.26       0.784
2-MeTHF             THF           0.90    −0.40      0.775
chlorobenzene       DCM           1.01    −0.45      0.773
MTBE                EtOAc         0.84    −0.47      0.714
EtOAc               THF           0.80    −0.48      0.702
EtOAc               acetone       0.86    −0.22      0.690
MeCN                acetone       0.86    −0.31      0.686
toluene             DCM           0.92    −0.55      0.627
trifluorotoluene    DCM           0.91    −0.67      0.573
acetone             THF           0.72    −0.03      0.569
MTBE                toluene       0.69    0.19       0.549
DMSO                DMF           0.77    0.14       0.523
toluene             EtOAc         0.75    −0.55      0.507
MeCN                THF           0.67    −0.39      0.424
DMF                 THF           0.49    0.87       0.413
MTBE                THF           0.58    −0.78      0.412
MTBE                acetone       0.65    −0.59      0.397
MTBE                2-propanol    0.68    −0.15      0.395
MeCN                MeOH          0.63    0.01       0.318
DCM                 THF           0.64    −0.29      0.296
heptane             toluene       0.27    −0.79      0.263
toluene             THF           0.48    −0.80      0.262
heptane             MTBE          0.28    −0.81      0.243
acetone             MeOH          0.49    0.48       0.232
toluene             acetone       0.52    −0.60      0.226
EtOAc               MeOH          0.47    0.28       0.177
DMF                 MeOH          0.32    1.07       0.175
MTBE                MeOH          0.45    −0.21      0.161
heptane             i-PrOAc       0.24    −0.87      0.145
toluene             2-propanol    0.42    −0.16      0.145
heptane             2-propanol    0.32    −0.81      0.133
water               THF           −0.32   0.00       0.072
MeOH                THF           0.27    0.34       0.070
water               MeOH          0.25    −0.78      0.064
water               DMAc          −0.32   −0.15      0.053
heptane             MEK           0.14    −0.92      0.048
water               NMP           −0.30   −0.13      0.043
toluene             MeOH          0.24    −0.18      0.042
water               1-propanol    0.19    −0.64      0.035
heptane             acetone       0.11    −0.90      0.032
water               ethanol       0.16    −0.63      0.025
water               2-propanol    0.14    −0.55      0.020
DCM                 MeOH          0.16    0.35       0.019
water               acetone       −0.06   −0.37      0.003
water               MeCN          −0.01   −0.45      0.000
water               DMF           −0.03   −0.51      0.000

(a) Abbreviations: 1,2-DCE, 1,2-dichloroethane; 1,2-DME, 1,2-dimethoxyethane; CPME, cyclopentyl methyl ether; DMAc, N,N-dimethylacetamide; MEK, methyl ethyl ketone; MIBK, methyl isobutyl ketone; TAME, tert-amyl methyl ether.



EXPLORATORY ANALYSIS OF SOLUBILITY DATA

The solubilities in certain obvious solvent pairs (e.g., i-BuOAc vs n-BuOAc) were quickly found to have very high correlations, while exploratory analysis of other pairs (e.g., dichloromethane vs MeOH) showed no correlations at all. The four representative graphs shown in Figures 3−6 provide visual cues on what high (R² ≈ 0.9), medium-high (R² ≈ 0.8), medium (R² ≈ 0.5), and low/no (R² ≪ 0.5) correlations look like. In these graphs, each dot represents the solubilities of a certain solute in solvent X and solvent Y (log scale). The slope and intercept are those of the line of best fit calculated for the plot of log(solubility in solvent Y) versus log(solubility in solvent X).
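As a rough illustration of how the slope, intercept, and R² of one solvent pair might be computed, the sketch below regresses log10 solubilities with `scipy.stats.linregress`. The function name and the sample numbers are made up for this example; the original analysis code is not shown in the paper.

```python
import numpy as np
from scipy import stats


def fit_solvent_pair(s_x, s_y):
    """Regress log10(S_Y) on log10(S_X); return slope, intercept, R^2.

    s_x and s_y are paired solubilities (mg/mL) of the same solutes
    in solvent X and solvent Y; all values must be positive.
    """
    log_x = np.log10(np.asarray(s_x, dtype=float))
    log_y = np.log10(np.asarray(s_y, dtype=float))
    result = stats.linregress(log_x, log_y)
    return result.slope, result.intercept, result.rvalue ** 2


# Hypothetical solubilities (mg/mL) for five solutes, for illustration only.
s_x = [0.5, 2.0, 10.0, 40.0, 90.0]
s_y = [0.4, 1.5, 8.0, 30.0, 70.0]
m, b, r2 = fit_solvent_pair(s_x, s_y)
print(f"slope={m:.2f} intercept={b:.2f} R2={r2:.3f}")
```

Working in log space means that a slope near 1 corresponds to solubilities that differ by a roughly constant factor between the two solvents.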

Plots and statistical analyses were generated using the numpy, scipy, pandas, and matplotlib Python libraries. The vast majority (87%) of the solubility data in our collection were obtained at room temperature (RT), so all of the correlation work that we performed was on RT data only, as the absolute number of data points at other temperatures was too small to draw statistically meaningful conclusions. We first attempted a straightforward regression of the entire data set as a whole, and then we attempted to include different solute properties in the regression to improve the correlations. Among the tested properties, only one was found to improve the correlation: whether the solute is charged (i.e., a salt) or noncharged. When the solutes were separated into these two categories, the solubility correlations within each category were found to be stronger than when all of the solutes were pooled together and regressed as a single set. Of the 905 distinct solutes, 701 were noncharged and 204 were charged. Besides this property, all of the other common solute properties evaluated, such as molecular weight10,11 and topological polar surface area,11,12 failed to show any significant effect on the correlations from regression.
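The pooled versus split comparison described above could be checked along the following lines. This is a sketch under assumptions: the DataFrame layout and the column names are hypothetical, and R² is computed as the squared Pearson correlation of the log-transformed values, which equals the R² of a simple linear regression.

```python
import numpy as np
import pandas as pd


def r_squared(log_x, log_y):
    """R^2 of a simple linear regression (squared Pearson r)."""
    return float(np.corrcoef(log_x, log_y)[0, 1] ** 2)


def r2_pooled_and_by_group(df, x_col, y_col, group_col):
    """Compare R^2 for all solutes pooled with R^2 within each group.

    df holds one row per solute, with solubilities (mg/mL) in two
    solvents and a label such as "charged" / "noncharged".
    """
    logged = df.assign(lx=np.log10(df[x_col]), ly=np.log10(df[y_col]))
    out = {"pooled": r_squared(logged["lx"], logged["ly"])}
    for name, part in logged.groupby(group_col):
        out[name] = r_squared(part["lx"], part["ly"])
    return out
```

If the two solute classes follow different lines, the pooled R² will be lower than the within-group values, which is the pattern the text reports for charged and noncharged solutes.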



RESULTS AND DISCUSSION

Large-scale regression analyses of many pairs of interest were then carried out to quantitate their correlations, and the results are tabulated below. It should be noted that these correlations were established with pharmaceutically relevant compounds such as APIs, intermediates, impurities, and key reagents typical of small-molecule development programs at Bristol-Myers Squibb over the past 10 years.


Noncharged Solutes. The results of the regression analyses for solvent pairs with noncharged solutes are shown in Table 3. Since the slope (m) and intercept (b) were obtained from linear regression of log(solubility in solvent Y) versus log(solubility in solvent X), the solubility data themselves can be written as follows:

$\log(S_Y) = m \log(S_X) + b$

where $S_J$ is the solubility of the solute in solvent J (in mg/mL). This expression can be rearranged to give

$S_Y = 10^{b} \cdot S_X^{m}$

R² values indicate how strong the correlations are, i.e., the fraction of the variance in log solubility in one solvent that can be explained by the measurement in another. Two observations are most noteworthy:

1. Strongly correlated solvent pairs are most likely from the same solvent class, with the single exception of the MIBK/ester pairs.
2. Strongly correlated solvent pairs also tend to have slope values that are close to 1.

Table 4. Summary of Solubility Correlations with Noncharged Solutes

solvent group                                           correlation                                       minimum number of solvents to include in the screen
water                                                   none with other solvents                          1
DMSO                                                    none with other solvents                          1
MeCN                                                    none with other solvents                          1
DMF, DMAc, NMP                                          high within group                                 1 from group
heptane, cyclohexane, Isopar G                          high within group                                 1 from group
dichloromethane, 1,2-DCE, chlorobenzene                 high within group                                 1 from group
toluene, trifluorotoluene, chlorobenzene                high within group                                 1 from group
MeOH, EtOH, n-PrOH, IPA, n-BuOH, tert-amyl alcohol      neighbors are more correlated than other pairs    2 (not neighbors) from group
MTBE, TAME, CPME, diethoxymethane                       high within group                                 1 from group
THF, 2-MeTHF, 1,2-DME                                   high within group                                 1 from group
acetone, MEK, MIBK                                      high within group                                 1 from group
MeOAc, EtOAc, n-PrOAc, i-PrOAc, n-BuOAc, i-BuOAc, MIBK  high within group                                 1 from group
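The rearranged power-law expression above translates directly into a small helper. The sketch below is illustrative only: the function name and the 20 mg/mL input are hypothetical, while the slope and intercept are the i-BuOAc versus n-BuOAc values reported in Table 3.

```python
def extrapolate_solubility(s_x: float, slope: float, intercept: float) -> float:
    """Estimate S_Y (mg/mL) from S_X (mg/mL) via S_Y = 10**b * S_X**m."""
    if s_x <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return (10.0 ** intercept) * (s_x ** slope)


# Hypothetical measurement: 20 mg/mL in n-BuOAc (solvent X).
# Table 3 row for i-BuOAc (Y) vs n-BuOAc (X): slope 1.00, intercept -0.11.
estimate = extrapolate_solubility(20.0, slope=1.00, intercept=-0.11)
print(f"estimated i-BuOAc solubility: {estimate:.1f} mg/mL")
```

Such an estimate is only as reliable as the R² of the underlying pair, so it is best reserved for pairs near the top of the rank-ordered tables.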

Table 5. Slope, Intercept, and R² Values from Regressions of Solubilities (Log Scale) of Charged Solutes in Solvent Pairs at RT, Rank-Ordered by R²

solvent Y           solvent X     slope   intercept  R²
i-PrOAc             n-PrOAc       0.91    −0.13      0.881
n-BuOAc             n-PrOAc       0.89    −0.13      0.865
i-BuOAc             n-BuOAc       0.96    −0.08      0.862
n-BuOAc             i-PrOAc       0.92    −0.01      0.837
DMAc                DMF           0.71    0.31       0.822
MIBK                EtOAc         0.93    −0.04      0.816
n-PrOAc             EtOAc         0.87    −0.16      0.792
heptane             cyclohexane   0.82    −0.19      0.766
diethoxymethane     MTBE          0.90    −0.08      0.765
EtOAc               MeOAc         0.81    −0.04      0.738
tert-amyl alcohol   2-propanol    0.97    −0.31      0.725
CPME                MTBE          0.85    0.04       0.694
MIBK                MEK           0.73    −0.27      0.688
TAME                MTBE          0.79    −0.12      0.675
NMP                 DMF           0.59    0.57       0.672
2-MeTHF             THF           0.71    −0.34      0.615
MEK                 acetone       0.76    −0.13      0.604
ethanol             methanol      0.91    −0.46      0.588
DMSO                DMF           0.55    0.78       0.563
EtOAc               THF           0.59    −0.34      0.557
acetone             THF           0.62    0.06       0.526
MTBE                toluene       0.68    −0.09      0.522
1,2-DCE             DCM           0.69    −0.18      0.517
chlorobenzene       toluene       0.92    0.05       0.513
MeCN                acetone       0.73    −0.11      0.508
trifluorotoluene    toluene       0.77    −0.16      0.501
EtOAc               acetone       0.61    −0.33      0.473
chlorobenzene       DCM           0.51    −0.45      0.460
MTBE                EtOAc         0.53    −0.42      0.453
n-butanol           1-propanol    0.53    0.25       0.449
MeCN                THF           0.57    −0.11      0.432
toluene             DCM           0.42    −0.57      0.419
trifluorotoluene    DCM           0.44    −0.57      0.405
2-propanol          1-propanol    0.46    0.15       0.384
1-propanol          ethanol       0.77    −0.11      0.382
toluene             EtOAc         0.48    −0.49      0.381
heptane             Isopar G      0.82    −0.12      0.368
1,2-DME             THF           0.51    0.03       0.337
DMF                 THF           0.68    0.72       0.317
MTBE                THF           0.33    −0.62      0.294
DCM                 THF           0.51    −0.18      0.273
toluene             THF           0.28    −0.66      0.233
MTBE                2-propanol    0.41    −0.66      0.207
MTBE                acetone       0.32    −0.58      0.205
toluene             acetone       0.31    −0.60      0.171
toluene             2-propanol    0.33    −0.71      0.150
water               1-propanol    0.45    −0.05      0.124
heptane             toluene       0.17    −0.76      0.114
EtOAc               MeOH          0.42    −0.65      0.108
acetone             MeOH          0.45    −0.38      0.107
MeCN                MeOH          0.38    −0.50      0.080
heptane             MTBE          0.15    −0.78      0.079
water               ethanol       0.32    −0.07      0.069
toluene             MeOH          0.19    −0.83      0.059
water               MeOH          0.30    −0.25      0.053
heptane             2-propanol    0.09    −0.92      0.046
MeOH                THF           0.14    1.12       0.045
DCM                 MeOH          0.30    −0.47      0.043
heptane             i-PrOAc       0.09    −0.83      0.038
MTBE                MeOH          0.16    −0.73      0.032
DMF                 MeOH          0.27    0.47       0.031
water               2-propanol    0.24    0.17       0.029
water               THF           −0.16   0.41       0.022
water               DMF           0.18    −0.28      0.014
water               NMP           −0.19   0.45       0.012
heptane             acetone       0.04    −0.85      0.010
heptane             MEK           0.04    −0.85      0.010
water               MeCN          0.11    0.30       0.008
water               DMAc          −0.15   0.35       0.008
water               acetone       0.05    0.29       0.001


When the slope is 1 or about 1, the solubility values for the two solvents have a linear or near-linear (proportional) relationship. Thus, for strongly correlated solvents, approximately

$S_Y = 10^{b} \cdot S_X$

This leads to straightforward extrapolation from one solvent to another solvent. For example, using the intercept and R² values for i-BuOAc/n-BuOAc from Table 3 gives

$S_{i\text{-BuOAc}} = 10^{-0.11} \cdot S_{n\text{-BuOAc}}$ with $R^2 = 0.958$

On the basis of the high R² value, we should be able to remove i-BuOAc from the screening set and use n-BuOAc solubility data to extrapolate i-BuOAc values with high confidence. The same principle can be applied to many other solvent pairs in the table; however, as the R² value decreases moving downward in the table, it follows that our confidence in the extrapolated values also decreases. Where to draw the line is subject to debate, but we suggest a minimum R² value of 0.75 as the cutoff point for carrying out solubility extrapolation that does not introduce unacceptable errors into the data set (Table 4). Three solvents (water, DMSO, and MeCN) do not correlate with other solvents, and they should be included in a comprehensive solubility screen. On the other hand, two solvents (chlorobenzene and MIBK) have strong correlations with two different groups of solvents. Alcohols exhibit interesting correlations: among the six common alcohols, neighboring ones have high correlation but far-apart ones do not, and as a result, two disparate alcohols are recommended as the minimum to be included in a screen. In comparison, all six ester solvents have strong correlations with each other, and only one needs to be included in a screen when material is limited. Finally, it is notable that ethereal solvents are clearly separated into two groups, where the correlation is strong within the groups but weak between them. To summarize, for noncharged solutes, Table 3 indicates that 24 solvents can be extrapolated. Therefore, when material is limited, we can take advantage of the correlations and easily excise as many as 24 solvents from the screens (Table 4). This not only enables a material-sparing design with a faster throughput but also allows for more design flexibility per plate when material is not an issue. For example, we can screen critical solvent mixtures on newly available spots on the screening plate.

Charged Solutes. The results of the regression analyses for solvent pairs with charged solutes are shown in Table 5. It should be noted that the solubility data set with charged solutes is much smaller than the data set with noncharged solutes, and the R² values overall are also lower. When we adopt the same minimum R² value of 0.75 as the cutoff for carrying out the extrapolation, there are far fewer well-correlated solvent pairs for these charged solutes, as shown in Table 6. We note that MIBK is still highly correlated with esters, as was the case for noncharged solutes; however, it is no longer highly correlated with other ketones. In total, with charged solutes, the number of solvents that can be extrapolated and omitted from screening is eight.

Table 6. Summary of Solubility Correlations with Charged Solutes

solvent group                                     correlation          minimum number of solvents to include in screen
DMF, DMAc                                         high within group    1 from group
heptane, cyclohexane                              high within group    1 from group
MTBE, diethoxymethane                             high within group    1 from group
EtOAc, n-PrOAc, i-PrOAc, n-BuOAc, i-BuOAc, MIBK   high within group    1 from group
other solvents                                    none                 all

Hierarchical Clustering of Solvents. To further characterize similarities between solvents, hierarchical clustering was used to group solvents by similarity (Figures 7 and 8). Hierarchical agglomerative clustering scores the similarity between different solvents on the basis of a complete set of solubility measurements on individual solutes. The resulting similarity score is a measure of the distance between the solubility profiles of the pure solvents.

Figure 7. Clustering of solvents for noncharged solutes.

Figure 8. Clustering of solvents for charged solutes.
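A hierarchical agglomerative clustering of solvent profiles might be sketched with SciPy as follows. The paper does not state the distance metric or linkage rule, so the Euclidean distance and average linkage used here are assumptions, and the log10 solubility values are made up for illustration. Only solutes measured in every solvent would be usable as columns, in line with the "complete set" requirement mentioned above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Rows are solvents, columns are solutes; values are hypothetical
# log10 solubilities (mg/mL), for illustration only.
solvents = ["EtOAc", "i-PrOAc", "heptane", "cyclohexane"]
profiles = np.array([
    [1.2, 0.3, -0.5, 1.8],
    [1.1, 0.2, -0.6, 1.7],
    [-1.0, -0.9, -1.0, -0.2],
    [-0.9, -1.0, -1.0, -0.3],
])

# Pairwise distances between solvent profiles (assumed metric).
distances = pdist(profiles, metric="euclidean")
# Build the agglomerative tree (assumed linkage rule).
tree = linkage(distances, method="average")
# Cut the tree into two flat clusters for inspection.
labels = fcluster(tree, t=2, criterion="maxclust")
clusters = dict(zip(solvents, labels))
print(clusters)
```

The same `tree` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree diagram of the solvent groupings.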


CONCLUSION

A total of 63 240 pieces of solubility data from 905 pharmaceutically relevant compounds were analyzed to establish statistically significant correlations between solvent pairs. We propose that when material is limited, 24 and eight common solvents can be omitted from empirical screens for noncharged and charged solutes, respectively, with the use of extrapolation from other highly correlated solvents to provide relevant solubility information. In addition, since the time when the solubility correlations between common solvents were established, we have been taking advantage of this knowledge. For example, when empirical data have seemed to deviate too much from the expected relationship between certain solvent pairs, we have either repeated the measurements or analyzed whether a previously unknown solvate was formed. Finally, other data mining and analysis efforts are still ongoing, and their results will be published in subsequent papers.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.oprd.8b00117.
Solvent pair correlation tables (XLSX)
Solubility extrapolation guide (PDF)
Solvent pair correlation graphs (PDF)

AUTHOR INFORMATION

Corresponding Authors

*Jun Qiu: Tel.: 732-227-6230. E-mail: [email protected].
*Jacob Albrecht: Tel.: 732-227-6330. E-mail: jacob.albrecht@bms.com.

ORCID

Jun Qiu: 0000-0002-2733-7979

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors thank Jacob Janey and Srinivas Tummala for their kind support and frequent discussions.

REFERENCES

(1) Alsenz, J.; Kansy, M. High throughput solubility measurement in drug discovery and development. Adv. Drug Delivery Rev. 2007, 59, 546−567.
(2) Hsieh, D.; Marchut, A.; Wei, C.; Zheng, B.; Wang, S.; Kiang, S. Model-Based Solvent Selection during Conceptual Process Design of a New Drug Manufacturing Process. Org. Process Res. Dev. 2009, 13, 690−697.
(3) Diorazio, L.; Hose, D.; Adlington, N. Toward a More Holistic Framework for Solvent Selection. Org. Process Res. Dev. 2016, 20, 760−773.
(4) Black, S.; Dang, L.; Liu, C.; Wei, H. On the measurement of solubility. Org. Process Res. Dev. 2013, 17, 486−489.
(5) Selekman, J. A.; Qiu, J.; Tran, K.; Stevens, J.; Rosso, V.; Simmons, E.; Xiao, Y.; Janey, J. High-throughput automation in chemical process development. Annu. Rev. Chem. Biomol. Eng. 2017, 8, 525−547.
(6) At the time of writing, we have carried out about 1500 solubility screens.
(7) Ashcroft, C. P.; Dunn, P. J.; Hayler, J. D.; Wells, A. S. Survey of solvent usage in papers published in Organic Process Research & Development 1997−2012. Org. Process Res. Dev. 2015, 19, 740−747.
(8) Prat, D.; Pardigon, O.; Flemming, H.-W.; Letestu, S.; Ducandas, V.; Isnard, P.; Guntrum, E.; Senac, T.; Ruisseau, S.; Cruciani, P.; Hosek, P. Sanofi’s solvent selection guide: a step toward more sustainable processes. Org. Process Res. Dev. 2013, 17, 1517−1525.
(9) Li, J.; Eastgate, M. Current complexity: a tool for assessing the complexity of organic molecules. Org. Biomol. Chem. 2015, 13, 7164−7176.
(10) Lipinski, C.; Lombardo, F.; Dominy, B.; Feeney, P. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Delivery Rev. 2001, 46, 3−26.
(11) Veber, D.; Johnson, S.; Cheng, H.; Smith, B.; Ward, K.; Kopple, K. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002, 45, 2615−2623.
(12) Ertl, P.; Rohde, B.; Selzer, P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714−3717.