
Solubility Correlations of Common Organic Solvents
Jun Qiu and Jacob Albrecht
Org. Process Res. Dev., Just Accepted Manuscript • DOI: 10.1021/acs.oprd.8b00117 • Publication Date (Web): 06 Jun 2018
Downloaded from http://pubs.acs.org on June 6, 2018


Organic Process Research & Development is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Organic Process Research & Development

Solubility Correlations of Common Organic Solvents

Jun Qiu* and Jacob Albrecht*
Chemical and Synthetic Development, Bristol-Myers Squibb Company, One Squibb Drive, New Brunswick, New Jersey, 08903, USA

1 ACS Paragon Plus Environment


Abstract

We describe general organic solvent solubility correlations derived from the analysis of 63240 automation-enabled solubility data points for pharmaceutically relevant compounds and synthetic intermediates. A total of 1125 solubility screening panels were collected empirically on 905 distinct solutes using an Unchained Labs (formerly Symyx and Freeslate) automated solubility workflow over the last 15 years. Mining and analyzing these results revealed statistically significant solubility correlations between many solvent pairs, and enabled hierarchical clustering of the most common organic solvents. This has made experimental solubility surveys more efficient by reducing the number of solvents in the experimental design, saving material and improving throughput.

Keywords

Organic solvent solubility; Automated solubility measurement; Solubility correlations; Solvent group; Hierarchical clustering of solvents


Introduction

Thermodynamic solubility data are critical inputs that inform and help define reaction conditions, work-ups, and isolations in organic process development.¹ Within a typical step of a synthetic sequence, every unit operation from reaction through crystallization may be built upon knowledge of the solubility of the most relevant process components (e.g. starting materials, reagents, products, by-products, and key impurities) in pertinent solvent systems.²,³ The choice of an empirical solubility measurement workflow is determined by technical factors such as data quality, experimental space, and material consumption, as well as economic factors such as speed, FTE cost, and capital equipment requirements.

For the needs of process development, solubility data quality is a primary requirement, as erroneous measurements can often lead to wasted experimental effort and poor choices of both reaction conditions and isolation strategies. For early phase process development, our experience is that process development teams typically demand solubility datasets with no more than 10% inaccuracy; however, since it is practically impossible to determine the absolute accuracy of measurements within an initial solubility screen, experiments are usually carried out in replicates, and the imprecision calculated from these replicates is used as an internal quality control.

Numerous methods exist to measure thermodynamic solubility in organic and aqueous solvents, and they can be broadly separated into two camps: “excess solid” and “excess solvent” methods.⁴ The “excess solid” approach can be further subdivided into two categories: with or without filtration of the excess solids prior to quantitation. In general, many arguments support the notion that the “excess solid” method with filtration, a.k.a. the shake flask method, produces the highest quality data.⁵

Fig 1. The Unchained Labs automated implementation of the shake flask method.

Besides data quality, experimental space and material consumption are also important considerations. In general, producing more high quality solubility data with extensive coverage of experimental space is desirable for informing development decisions. Unfortunately, both material and time are often very limited at the early stages of process development, when the need for solubility knowledge happens to be the greatest.


Miniaturization of the shake flask method allows more experimental space to be covered with the same amount of research material; however, as the scale of the experiment is reduced, data quality may also be compromised due to a number of technical challenges, namely: solvent evaporation, temperature fluctuations, filtration effectiveness, and general liquid handling and sampling inaccuracies. Thus, the most appropriate solubility measurement workflow is one that balances the need for data quality and experimental space with speed and minimal material.

Fig 2. The Unchained Labs Filter Plate Assembly

Through extensive evaluations of commercial products and applying experience from internal development efforts, we determined that the Symyx (later Freeslate, and currently Unchained Labs) automated solubility workflow was optimal in terms of the aforementioned criteria, primarily due to its high performance 96-well filter assembly, as well as its associated automation platform for liquid handling, mixing, and precise temperature control. We therefore acquired these early generation systems in the early 2000s. Since then, we have carried out thousands of solubility screens (footnote a) using this system as well as subsequent generations of this technology. The typical screening design includes most of the common solvents, as well as some frequently used binary solvent mixtures.⁶,⁷ Customized designs are available without much limitation on the solvents: any solvent or solvent mixture with normal physical properties (e.g. boiling point, viscosity) can be accommodated. If solubility needs to be measured at different temperatures, the screens are carried out sequentially at each temperature, as all vials in a plate, liquid handlers, and filter assemblies must be held at the same temperature on the instrument deck.

In our initial forays into this automated platform, our standard design filled all positions on a 96-well plate, with 36 common solvents, 36 solvent mixtures, a few special solutions, and replicates to fill the remainder for quality control purposes. To every vial was added 50 mg of compound and 0.5 ml of solvent or solvent mixture, thus consuming about 5 g of material in each screen. The solubility of a fully dissolved vial would be reported as “> X mg/ml”, with X being the actual filtrate concentration quantitated by HPLC analysis, which was always close to 100 when nothing unusual had happened. The lowest solubility that could be measured consistently was around 0.1 mg/ml solution. In the sub 0.1 mg/ml range, while solubility data may sometimes still be obtained as a numeric value, their imprecision would often be much larger than 10%; at other times, the anticipated HPLC peak may not be observed at all and the data point would be reported as “below detection limit”. From a practical perspective in process development, any value less than 0.1 mg/ml can be considered equivalent to one below the detection limit.

| Solubility range | Half of measurements had imprecision less than | 3/4 of measurements had imprecision less than |
| --- | --- | --- |
| < 0.1 mg/ml | Equivalent to below detection limit | |
| 0.1 - 1 mg/ml | 20% | 40% |
| 1 - 10 mg/ml | 10% | 20% |
| 10 - 100 mg/ml | 6% | 10% |
| > 100 mg/ml | Equivalent to fully dissolved | |

Table 1. Distribution of data imprecision from historical datasets

During solubility screens the compound has the potential to form solvates: in these situations the solute may initially fully dissolve, but later precipitate as a solvated form. Thus, for larger scale, medium throughput screens, powder X-ray diffraction (PXRD) is used to determine if the residual solids have undergone conversion to a solvate form. However, for our high throughput screens, the additional resources required to prepare and analyze all of the samples by PXRD would significantly diminish the overall throughput and lengthen the cycle time of the workflow. As these screens were usually conducted early in development, we opted to omit PXRD analysis with the understanding that likely follow-up studies will interrogate additional solubility and form properties in a small subset of desirable process solvents.

Over the last decade API complexity has increased, and with it has come a commensurate increase in the number of steps and synthetic strategies to be evaluated.⁸ As a result, we noted a marked rise in the overall number of isolated compounds, especially intermediates, submitted for solubility screening against a backdrop of compressed timelines and lower availability of materials. Hence, it became impractical to require ~5 g of material over a few weeks to generate the requisite solubility datasets. To address this acute challenge of having less material and time at our disposal, we pursued three orthogonal approaches to reduce overall material consumption, decrease cycle time, and provide the needed design flexibility, all whilst meeting the data quality requirements:

1. Further scale down the experiments. We collaborated with Unchained Labs to conduct the same screen in as little as 0.25 ml of total solvent per vial;⁵ however, this miniaturization led to a notable deterioration in data precision, and hence required more replicates to be included to control data quality, which largely cancelled out the anticipated material savings. We eventually determined that a total volume of 0.4 ml per vial was the smallest scale at which replicates may be omitted without fundamentally compromising data quality, thus reducing material usage to about 3 g for each comprehensive screen.

2. Reduce the amount of material charged into each vial. For example, if 20 mg instead of 40 mg of the compound was charged with 0.4 ml of solvent, any fully dissolved vials would be reported as “> 50 mg/ml” rather than “> 100 mg/ml”. This proposed change was not considered an acceptable compromise by our collaborators, as the majority strongly believed that it was important to generate solubility values between 50 and 100 mg/ml.

3. Decrease the number of solvents included in each screen. After options 1 and 2 had been exhaustively explored, we concentrated our efforts on rational, data driven approaches to reducing the number of solvents included in these screens.

In the course of executing automated solubility screens and compiling summary reports for over ten years, certain correlations among solvents were empirically observed and noted. For example, EtOAc, i-PrOAc, and n-BuOAc were all part of the standard screen design, and it appeared that the solubility of a given compound in EtOAc was, on average, marginally higher than in i-PrOAc, which was in turn slightly higher than in n-BuOAc. If the solubilities in these solvents could be shown to correlate with each other with statistical significance, we would have the opportunity to omit two of these three solvents, use only one in the screen, and calculate the solubility in the other two by extrapolation. Considering the amount of material that could be saved, we were inspired by this idea and started analyzing the datasets that we had collected to date in order to apply it across multiple solvent classes.
Data Analysis Methodology

Since we had observed that the solubilities of solutes in certain solvent pairs appear to have a constant ratio relationship, we decided to regress the log of measured solubility (in mg/ml) between solvent pairs to quantify the magnitude of these ratios. Before regression took place, non-numeric solubility data from the reports were pre-processed according to the following rules:

• All fully dissolved data points (i.e. reported as “> X mg/ml”) were excluded
• All numerical solubility data less than 0.1 mg/ml and data points reported as “below detection limit” were converted to 0.1 mg/ml

In order to aggregate the solubility reports and analyze the data, a custom Python script was used to read in and clean the raw data as the first step. In total, 1125 solubility reports of 905 distinct compounds were collected, and 63240 pieces of data were mined. Plots and statistical analyses were generated using the numpy, scipy, pandas, and matplotlib Python libraries. The vast majority (87%) of solubility data in our collection were at room temperature, so all the correlation work that we performed was on RT data only, as the absolute number of data points at other temperatures was too small to draw statistically meaningful conclusions.
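The two pre-processing rules can be sketched in a few lines of Python. This is only an illustration and not the authors' script: the function name `clean_solubility`, the constant `FLOOR_MG_PER_ML`, and the string formats of the raw values are assumptions, since the report format is not described in the text.

```python
FLOOR_MG_PER_ML = 0.1  # lowest consistently measurable solubility stated in the text


def clean_solubility(reported):
    """Return a solubility in mg/ml, or None if the data point is excluded.

    Hypothetical helper: the raw string formats handled here are assumed.
    """
    text = str(reported).strip().lower()
    if text.startswith(">"):
        return None  # fully dissolved ("> X mg/ml"): excluded from regression
    if text == "below detection limit":
        return FLOOR_MG_PER_ML  # converted to 0.1 mg/ml
    value = float(text.replace("mg/ml", "").strip())
    return max(value, FLOOR_MG_PER_ML)  # values below 0.1 mg/ml are raised to 0.1


# Made-up example values, one per rule plus an ordinary measurement.
raw_values = ["> 98.7 mg/ml", "12.5 mg/ml", "below detection limit", "0.04 mg/ml"]
cleaned = [v for v in map(clean_solubility, raw_values) if v is not None]
print(cleaned)  # [12.5, 0.1, 0.1]
```

Flooring at 0.1 mg/ml keeps low-solubility points in the regression at a defined value, whereas fully dissolved points carry only a lower bound and are therefore dropped.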


We first attempted a straightforward regression of the entire dataset as a whole, and then attempted to include different solute properties in the regression to improve the correlations. Among the tested properties, only one was found to improve the correlations: whether the solute is charged (i.e. a salt) or non-charged. When the solutes were separated into these two categories, the solubility correlations within each category were found to be stronger than when all solutes were pooled together and regressed as a single set. All the other common solute properties evaluated, such as molecular weight⁹,¹⁰ and topological polar surface area,¹⁰,¹¹ failed to show any significant effect on the correlations from regression. Of the 905 distinct solutes, 701 were non-charged, and 204 were charged:

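| Solute | Temperature | Solvent | Number of Measurements |
| --- | --- | --- | --- |
| Non-charged | > 25 ºC | Mixed | 2924 |
| Non-charged | > 25 ºC | Pure | 1950 |
| Non-charged | < 20 ºC | Mixed | 1106 |
| Non-charged | < 20 ºC | Pure | 945 |
| Non-charged | RT (20 – 25 ºC) | Mixed | 22925 |
| Non-charged | RT (20 – 25 ºC) | Pure | 19951 |
| Charged | > 25 ºC | Mixed | 377 |
| Charged | > 25 ºC | Pure | 523 |
| Charged | < 20 ºC | Mixed | 127 |
| Charged | < 20 ºC | Pure | 217 |
| Charged | RT (20 – 25 ºC) | Mixed | 6420 |
| Charged | RT (20 – 25 ºC) | Pure | 5775 |
| Total | | | 63240 |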

Table 2. Number of solubility data by solute, temperature, and solvent condition

Exploratory Analysis of Solubility Data

Solubilities in certain obvious solvent pairs (e.g. i-BuOAc vs. n-BuOAc) were quickly found to be very highly correlated, while exploratory analysis of other pairs (e.g. dichloromethane vs. MeOH) showed no correlation at all. Four representative graphs are shown below to provide a visual cue on what high, medium-high, medium, and low/no correlations look like (footnote b). In these graphs, each dot represents the solubility of a certain solute in Solvent X vs. Solvent Y (log scale). The slope and intercept are those of the line of best fit calculated from log (Solubility in Solvent Y) vs. log (Solubility in Solvent X).
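The line of best fit described above can be sketched with `scipy.stats.linregress`, one of several SciPy routines that could serve here; the text names the libraries used but not the specific function, so this choice is an assumption. The paired solubility values below are invented purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical solubilities (mg/ml) of the same five solutes in two solvents.
sol_x = np.array([0.5, 2.0, 8.0, 20.0, 60.0])   # Solvent X
sol_y = np.array([0.4, 1.5, 6.5, 15.0, 48.0])   # Solvent Y

# Regress log10(Solubility in Solvent Y) on log10(Solubility in Solvent X).
fit = stats.linregress(np.log10(sol_x), np.log10(sol_y))
slope = fit.slope
intercept = fit.intercept
r_squared = fit.rvalue ** 2  # coefficient of determination of the log-log fit

print(f"slope={slope:.2f}  intercept={intercept:.2f}  R2={r_squared:.3f}")
```

Because the regression is performed on logarithms, a slope near 1 corresponds to a roughly constant solubility ratio between the two solvents, with the ratio given by 10 raised to the intercept.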


Fig 3. Solubility correlation between i-BuOAc and n-BuOAc at RT (log scale, R² = 0.958, high correlation)

Fig 4. Solubility correlation between 2-MeTHF and THF at RT (log scale, R² = 0.775, medium-high correlation)


Fig 5. Solubility correlation between toluene and EtOAc at RT (log scale, R² = 0.507, medium correlation)


Fig 6. Solubility correlation between dichloromethane and MeOH at RT (log scale, R² = 0.019, low/no correlation)

Results and Discussion

Large scale regression analysis of many solvent pairs of interest was then carried out to quantitate their correlations, and the results are tabulated below. It should be noted that these correlations were established with pharmaceutically relevant compounds such as Active Pharmaceutical Ingredients (APIs), intermediates, impurities, and key reagents, typical of small molecule development programs at Bristol-Myers Squibb over the past 10 years.

Non-charged solutes

Table 3. Slope, intercept, and R² value of solubility (log scale) from solvent pairs with non-charged solutes at RT, rank ordered by R²:

| Solvent Y | Solvent X | Slope | Intercept | R² |
| --- | --- | --- | --- | --- |
| i-BuOAc | n-BuOAc | 1.00 | -0.11 | 0.958 |
| n-BuOAc | n-PrOAc | 0.96 | -0.06 | 0.946 |
| i-PrOAc | n-PrOAc | 1.02 | -0.15 | 0.943 |
| 1-Propanol | Ethanol | 0.93 | 0.03 | 0.941 |
| n-PrOAc | EtOAc | 1.00 | -0.13 | 0.934 |
| TAME | MTBE | 0.95 | -0.09 | 0.934 |
| MIBK | EtOAc | 0.98 | -0.02 | 0.931 |
| n-BuOAc | i-PrOAc | 0.93 | 0.08 | 0.929 |
| n-Butanol | 1-Propanol | 1.00 | -0.04 | 0.926 |
| 2-Propanol | 1-Propanol | 0.99 | -0.19 | 0.916 |
| Diethoxymethane | MTBE | 0.98 | 0.03 | 0.903 |
| CPME | MTBE | 0.97 | 0.23 | 0.894 |
| DMAc | DMF | 1.00 | 0.07 | 0.868 |
| Chlorobenzene | Toluene | 0.99 | 0.21 | 0.860 |
| EtOAc | MeOAc | 0.91 | 0.00 | 0.856 |
| Trifluorotoluene | Toluene | 0.88 | -0.18 | 0.845 |
| Ethanol | Methanol | 0.94 | -0.12 | 0.841 |
| Heptane | Cyclohexane | 0.81 | -0.25 | 0.831 |
| t-Amyl alcohol | 2-Propanol | 0.93 | 0.16 | 0.826 |
| 1,2-DCE | DCM | 0.83 | -0.18 | 0.820 |
| 1,2-DME | THF | 0.89 | -0.16 | 0.808 |
| MEK | Acetone | 0.92 | 0.04 | 0.798 |
| Heptane | isopar G | 0.89 | -0.02 | 0.794 |
| MIBK | MEK | 0.96 | -0.33 | 0.793 |
| NMP | DMF | 0.99 | 0.26 | 0.784 |
| 2-MeTHF | THF | 0.90 | -0.40 | 0.775 |
| Chlorobenzene | DCM | 1.01 | -0.45 | 0.773 |
| MTBE | EtOAc | 0.84 | -0.47 | 0.714 |
| EtOAc | THF | 0.80 | -0.48 | 0.702 |
| EtOAc | Acetone | 0.86 | -0.22 | 0.690 |
| MeCN | Acetone | 0.86 | -0.31 | 0.686 |
| Toluene | DCM | 0.92 | -0.55 | 0.627 |
| Trifluorotoluene | DCM | 0.91 | -0.67 | 0.573 |
| Acetone | THF | 0.72 | -0.03 | 0.569 |
| MTBE | Toluene | 0.69 | 0.19 | 0.549 |
| DMSO | DMF | 0.77 | 0.14 | 0.523 |
| Toluene | EtOAc | 0.75 | -0.55 | 0.507 |
| MeCN | THF | 0.67 | -0.39 | 0.424 |
| DMF | THF | 0.49 | 0.87 | 0.413 |
| MTBE | THF | 0.58 | -0.78 | 0.412 |
| MTBE | Acetone | 0.65 | -0.59 | 0.397 |
| MTBE | 2-Propanol | 0.68 | -0.15 | 0.395 |
| MeCN | MeOH | 0.63 | 0.01 | 0.318 |
| DCM | THF | 0.64 | -0.29 | 0.296 |
| Heptane | Toluene | 0.27 | -0.79 | 0.263 |
| Toluene | THF | 0.48 | -0.80 | 0.262 |
| Heptane | MTBE | 0.28 | -0.81 | 0.243 |
| Acetone | MeOH | 0.49 | 0.48 | 0.232 |
| Toluene | Acetone | 0.52 | -0.60 | 0.226 |
| EtOAc | MeOH | 0.47 | 0.28 | 0.177 |
| DMF | MeOH | 0.32 | 1.07 | 0.175 |
| MTBE | MeOH | 0.45 | -0.21 | 0.161 |
| Heptane | i-PrOAc | 0.24 | -0.87 | 0.145 |
| Toluene | 2-Propanol | 0.42 | -0.16 | 0.145 |
| Heptane | 2-Propanol | 0.32 | -0.81 | 0.133 |
| Water | THF | -0.32 | 0.00 | 0.072 |
| MeOH | THF | 0.27 | 0.34 | 0.070 |
| Water | MeOH | 0.25 | -0.78 | 0.064 |
| Water | DMAc | -0.32 | -0.15 | 0.053 |
| Heptane | MEK | 0.14 | -0.92 | 0.048 |
| Water | NMP | -0.30 | -0.13 | 0.043 |
| Toluene | MeOH | 0.24 | -0.18 | 0.042 |
| Water | 1-Propanol | 0.19 | -0.64 | 0.035 |
| Heptane | Acetone | 0.11 | -0.90 | 0.032 |
| Water | Ethanol | 0.16 | -0.63 | 0.025 |
| Water | 2-Propanol | 0.14 | -0.55 | 0.020 |
| DCM | MeOH | 0.16 | 0.35 | 0.019 |
| Water | Acetone | -0.06 | -0.37 | 0.003 |
| Water | MeCN | -0.01 | -0.45 | 0.000 |
| Water | DMF | -0.03 | -0.51 | 0.000 |

Solvent abbreviations (TAME, MIBK, CPME, DMAc, 1,2-DCE, 1,2-DME, MEK) are defined in footnote c.

Since the slope and intercept were obtained from linear regression of log (Solubility in Solvent Y) vs. log (Solubility in Solvent X), the solubility data themselves can be written as follows:

$$\log(\text{Solubility in Solvent Y}) = \text{Slope} \times \log(\text{Solubility in Solvent X}) + \text{Intercept}$$

or

$$\text{Solubility in Solvent Y} = 10^{\text{Intercept}} \times (\text{Solubility in Solvent X})^{\text{Slope}}$$

The unit of solubility in the equation is mg/ml solution. R² values indicate how strong the correlations are, i.e. the fraction of solubility variance in one solvent that can be explained by the measurement in another. Two observations are most noteworthy:

1. Strongly correlated solvent pairs are most likely from the same solvent class, with the single exception of the MIBK/ester pairs.
2. Strongly correlated solvent pairs also tend to have slope values that are close to 1.

When the slope is 1 or about 1, solubility values in the two solvents have a linear or near linear relationship. So, approximately, for strongly correlated solvents:

$$\text{Solubility in Solvent Y} = 10^{\text{Intercept}} \times (\text{Solubility in Solvent X})$$

This leads to straightforward extrapolation from one solvent to another. For example, using the intercept value from the table:

$$\text{Solubility in i-BuOAc} = 10^{-0.11} \times (\text{Solubility in n-BuOAc}), \quad R^2 = 0.958$$

Based on the high R² value, we should be able to remove i-BuOAc from the screening set, and use n-BuOAc solubility data to extrapolate i-BuOAc values with high confidence. The same principle can be applied to many other solvent pairs in the table; however, as the R² value between pairs decreases down the table, our confidence in the extrapolated values also decreases. Where to draw the line is subject to debate, but we suggest a minimum R² value of 0.75 as the cut-off point for carrying out solubility extrapolation that does not introduce unacceptable errors into the dataset (Table 4).
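As a worked illustration of the extrapolation formula, the short Python sketch below applies the i-BuOAc vs. n-BuOAc regression values from Table 3 (slope 1.00, intercept -0.11). The helper name `extrapolate` and the 20 mg/ml input are hypothetical and chosen only to show the arithmetic.

```python
def extrapolate(solubility_x, slope, intercept):
    """Estimate the solubility in Solvent Y (mg/ml) from a measurement in Solvent X.

    Implements Solubility_Y = 10**Intercept * Solubility_X**Slope.
    """
    return 10 ** intercept * solubility_x ** slope


# Table 3 row: Solvent Y = i-BuOAc, Solvent X = n-BuOAc.
measured_in_n_buoac = 20.0  # hypothetical measurement, mg/ml
predicted_in_i_buoac = extrapolate(measured_in_n_buoac, slope=1.00, intercept=-0.11)
print(round(predicted_in_i_buoac, 1))  # about 15.5 mg/ml
```

Since 10^(-0.11) is roughly 0.78, the regression predicts that solubility in i-BuOAc is on average a little over three quarters of that in n-BuOAc.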


| Solvent Group | Correlation | Minimum number of solvents to include in screen |
| --- | --- | --- |
| Water | None with other solvents | 1 |
| DMSO | None with other solvents | 1 |
| MeCN | None with other solvents | 1 |
| DMF, DMAc, NMP | High within group | 1 from group |
| Heptane, cyclohexane, isopar G | High within group | 1 from group |
| Dichloromethane, 1,2-dichloroethane, chlorobenzene | High within group | 1 from group |
| Toluene, trifluorotoluene, chlorobenzene | High within group | 1 from group |
| MeOH, EtOH, n-PrOH, IPA, n-BuOH, t-Amyl alcohol | Neighbors are more correlated than other pairs | 2 (not neighbors) from group |
| MTBE, t-amyl methyl ether, CPME, diethoxymethane | High within group | 1 from group |
| THF, 2-MeTHF, 1,2-dimethoxyethane | High within group | 1 from group |
| Acetone, MEK, MIBK | High within group | 1 from group |
| MeOAc, EtOAc, n-PrOAc, i-PrOAc, n-BuOAc, i-BuOAc, MIBK | High within group | 1 from group |

Table 4. Summary of solubility correlation with non-charged solutes

Three solvents (water, DMSO, MeCN) do not correlate with other solvents, and they should be included in a comprehensive solubility screen. On the other hand, two solvents (chlorobenzene and MIBK) each have strong correlations with two different groups of solvents. Alcohols exhibit interesting correlations: among the six common alcohols, neighboring ones are highly correlated but those far apart are not, and as a result, two disparate alcohols are recommended as the minimum to be included in a screen. In comparison, all six ester solvents have strong correlations with each other, so only one needs to be included in a screen when material is limited. Finally, it is notable that ethereal solvents are clearly separated into two groups, where the correlation is strong within each group but weak between the groups.

To summarize, with non-charged solutes, the number of solvents that can be extrapolated from Table 3 is 24. So when material is limited, we take advantage of the correlations and can omit as many as 24 solvents from the screens (Table 4). This not only enables a material sparing design with faster throughput, but also allows for more design flexibility per plate when material is not an issue. For example, we are able to screen critical solvent mixtures in the newly available positions on the screening plate.
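The selection of solvent pairs suitable for extrapolation can be sketched as a simple filter on R². The snippet below uses four rows copied from Table 3 and the suggested cut-off of 0.75; the variable names are illustrative, and the grouping in Table 4 involved additional judgment that this sketch does not attempt to reproduce.

```python
R2_CUTOFF = 0.75  # minimum R² suggested in the text for extrapolation

# (Solvent Y, Solvent X, slope, intercept, R²): a few rows copied from Table 3.
pairs = [
    ("i-BuOAc", "n-BuOAc", 1.00, -0.11, 0.958),
    ("2-MeTHF", "THF", 0.90, -0.40, 0.775),
    ("Toluene", "EtOAc", 0.75, -0.55, 0.507),
    ("DCM", "MeOH", 0.16, 0.35, 0.019),
]

# Keep only pairs whose correlation is strong enough to extrapolate Y from X.
extrapolatable = [(y, x) for y, x, _slope, _intercept, r2 in pairs if r2 >= R2_CUTOFF]
print(extrapolatable)  # [('i-BuOAc', 'n-BuOAc'), ('2-MeTHF', 'THF')]
```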

Charged solutes


It should be noted that the solubility dataset with charged solutes is much smaller than the dataset with non-charged solutes, and the R² values, overall, are also lower.

Table 5. Slope, intercept, and R² value of solubility (log scale) from solvent pairs with charged solutes at RT, rank ordered by R²:

| Solvent Y | Solvent X | Slope | Intercept | R² |
| --- | --- | --- | --- | --- |
| i-PrOAc | n-PrOAc | 0.91 | -0.13 | 0.881 |
| n-BuOAc | n-PrOAc | 0.89 | -0.13 | 0.865 |
| i-BuOAc | n-BuOAc | 0.96 | -0.08 | 0.862 |
| n-BuOAc | i-PrOAc | 0.92 | -0.01 | 0.837 |
| DMAc | DMF | 0.71 | 0.31 | 0.822 |
| MIBK | EtOAc | 0.93 | -0.04 | 0.816 |
| n-PrOAc | EtOAc | 0.87 | -0.16 | 0.792 |
| Heptane | Cyclohexane | 0.82 | -0.19 | 0.766 |
| Diethoxymethane | MTBE | 0.90 | -0.08 | 0.765 |
| EtOAc | MeOAc | 0.81 | -0.04 | 0.738 |
| t-Amyl alcohol | 2-Propanol | 0.97 | -0.31 | 0.725 |
| CPME | MTBE | 0.85 | 0.04 | 0.694 |
| MIBK | MEK | 0.73 | -0.27 | 0.688 |
| TAME | MTBE | 0.79 | -0.12 | 0.675 |
| NMP | DMF | 0.59 | 0.57 | 0.672 |
| 2-MeTHF | THF | 0.71 | -0.34 | 0.615 |
| MEK | Acetone | 0.76 | -0.13 | 0.604 |
| Ethanol | Methanol | 0.91 | -0.46 | 0.588 |
| DMSO | DMF | 0.55 | 0.78 | 0.563 |
| EtOAc | THF | 0.59 | -0.34 | 0.557 |
| Acetone | THF | 0.62 | 0.06 | 0.526 |
| MTBE | Toluene | 0.68 | -0.09 | 0.522 |
| 1,2-DCE | DCM | 0.69 | -0.18 | 0.517 |
| Chlorobenzene | Toluene | 0.92 | 0.05 | 0.513 |
| MeCN | Acetone | 0.73 | -0.11 | 0.508 |
| Trifluorotoluene | Toluene | 0.77 | -0.16 | 0.501 |
| EtOAc | Acetone | 0.61 | -0.33 | 0.473 |
| Chlorobenzene | DCM | 0.51 | -0.45 | 0.460 |
| MTBE | EtOAc | 0.53 | -0.42 | 0.453 |
| n-Butanol | 1-Propanol | 0.53 | 0.25 | 0.449 |
| MeCN | THF | 0.57 | -0.11 | 0.432 |
| Toluene | DCM | 0.42 | -0.57 | 0.419 |
| Trifluorotoluene | DCM | 0.44 | -0.57 | 0.405 |
| 2-Propanol | 1-Propanol | 0.46 | 0.15 | 0.384 |
| 1-Propanol | Ethanol | 0.77 | -0.11 | 0.382 |
| Toluene | EtOAc | 0.48 | -0.49 | 0.381 |
| Heptane | isopar G | 0.82 | -0.12 | 0.368 |
| 1,2-DME | THF | 0.51 | 0.03 | 0.337 |
| DMF | THF | 0.68 | 0.72 | 0.317 |
| MTBE | THF | 0.33 | -0.62 | 0.294 |
| DCM | THF | 0.51 | -0.18 | 0.273 |
| Toluene | THF | 0.28 | -0.66 | 0.233 |
| MTBE | 2-Propanol | 0.41 | -0.66 | 0.207 |
| MTBE | Acetone | 0.32 | -0.58 | 0.205 |
| Toluene | Acetone | 0.31 | -0.60 | 0.171 |
| Toluene | 2-Propanol | 0.33 | -0.71 | 0.150 |
| Water | 1-Propanol | 0.45 | -0.05 | 0.124 |
| Heptane | Toluene | 0.17 | -0.76 | 0.114 |
| EtOAc | MeOH | 0.42 | -0.65 | 0.108 |
| Acetone | MeOH | 0.45 | -0.38 | 0.107 |
| MeCN | MeOH | 0.38 | -0.50 | 0.080 |
| Heptane | MTBE | 0.15 | -0.78 | 0.079 |
| Water | Ethanol | 0.32 | -0.07 | 0.069 |
| Toluene | MeOH | 0.19 | -0.83 | 0.059 |
| Water | MeOH | 0.30 | -0.25 | 0.053 |
| Heptane | 2-Propanol | 0.09 | -0.92 | 0.046 |
| MeOH | THF | 0.14 | 1.12 | 0.045 |
| DCM | MeOH | 0.30 | -0.47 | 0.043 |
| Heptane | i-PrOAc | 0.09 | -0.83 | 0.038 |
| MTBE | MeOH | 0.16 | -0.73 | 0.032 |
| DMF | MeOH | 0.27 | 0.47 | 0.031 |
| Water | 2-Propanol | 0.24 | 0.17 | 0.029 |
| Water | THF | -0.16 | 0.41 | 0.022 |
| Water | DMF | 0.18 | -0.28 | 0.014 |
| Water | NMP | -0.19 | 0.45 | 0.012 |
| Heptane | Acetone | 0.04 | -0.85 | 0.010 |
| Heptane | MEK | 0.04 | -0.85 | 0.010 |
| Water | MeCN | 0.11 | 0.30 | 0.008 |
| Water | DMAc | -0.15 | 0.35 | 0.008 |
| Water | Acetone | 0.05 | 0.29 | 0.001 |

When we adopt the same minimum R² value of 0.75 as the cut-off for carrying out the extrapolation, there are far fewer well correlated solvent pairs for these charged solutes.

| Solvent Group | Correlation | Minimum number of solvents to include in screen |
| --- | --- | --- |
| DMF, DMAc | High within group | 1 from group |
| Heptane, cyclohexane | High within group | 1 from group |
| MTBE, diethoxymethane | High within group | 1 from group |
| EtOAc, n-PrOAc, i-PrOAc, n-BuOAc, i-BuOAc, MIBK | High within group | 1 from group |
| Other solvents | None | All |

Table 6. Summary of solubility correlation with charged solutes

Note that MIBK is still highly correlated with esters, as was the case for non-charged solutes; however, it is no longer highly correlated with other ketones. In total, with charged solutes, the number of solvents that can be extrapolated and omitted from screening is eight.

Hierarchical clustering of solvents

To further characterize similarities between solvents, hierarchical clustering was used to group solvents by similarity (Figures 7 and 8). Hierarchical agglomerative clustering scores the similarity between different solvents based on a complete set of solubility measurements on individual solutes. The resulting similarity score is a measure of distance between the solubility profiles of the pure solvents.
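A minimal sketch of hierarchical agglomerative clustering with SciPy is given below. The text does not state which distance metric or linkage method was used, so the correlation distance and average linkage here are assumptions, and the small matrix of log solubilities is invented for illustration (a real dataset would also need a strategy for solutes that were not measured in every solvent).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

solvents = ["EtOAc", "i-PrOAc", "MeOH", "EtOH"]

# Rows: solvents; columns: four hypothetical solutes; values: log10 solubility (mg/ml).
log_solubility = np.array([
    [1.2, 0.3, 1.8, -0.5],
    [1.1, 0.2, 1.6, -0.6],
    [0.4, 1.5, 0.9, 1.0],
    [0.3, 1.4, 1.0, 0.8],
])

# Assumed choices: distance = 1 - Pearson correlation between solubility profiles,
# merged bottom-up with average linkage.
distances = pdist(log_solubility, metric="correlation")
tree = linkage(distances, method="average")

# Cut the tree into two clusters and list the members of each.
labels = fcluster(tree, t=2, criterion="maxclust")
groups = {}
for solvent, label in zip(solvents, labels):
    groups.setdefault(int(label), []).append(solvent)
print(sorted(groups.values()))
```

With these made-up profiles the two esters fall into one cluster and the two alcohols into the other; `scipy.cluster.hierarchy.dendrogram(tree, labels=solvents)` would draw the corresponding tree.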


Fig 7. Clustering of solvents for non-charged solutes


Fig 8. Clustering of solvents for charged solutes

Conclusion

A total of 63240 pieces of solubility data from 905 pharmaceutically relevant compounds were analyzed to establish statistically significant correlations between solvent pairs. When material is limited, we propose that 24 and 8 common solvents be omitted from empirical screens for non-charged and charged solutes, respectively, with extrapolation from other highly correlated solvents providing the relevant solubility information.
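As a rough illustration of this proposal, the sketch below fits a straight line to historical solubilities in a measured solvent and an omitted solvent, and extrapolates only when R² meets the 0.75 cut-off used in this work. The data, the solvent labels, the function names and the use of a linear fit on log10 solubility are all assumptions made for the example; they are not taken from the paper.

```python
R2_CUTOFF = 0.75  # minimum R² used in the text as the extrapolation cut-off

def fit_line(x, y):
    """Ordinary least-squares slope, intercept and R² for y versus x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def extrapolate(measured, x_hist, y_hist, cutoff=R2_CUTOFF):
    """Estimate solubility in the omitted solvent from the measured one.

    Returns None when the historical pair correlation is below the cut-off,
    meaning the omitted solvent should be screened empirically instead.
    """
    slope, intercept, r2 = fit_line(x_hist, y_hist)
    if r2 < cutoff:
        return None
    return slope * measured + intercept

# Hypothetical historical log10 solubilities for five solutes (invented values).
etoac = [0.2, 0.8, 1.1, 1.5, 1.9]       # measured solvent
iproac = [0.0, 0.7, 1.1, 1.3, 1.8]      # well-correlated partner
solvent_x = [1.0, -0.5, 0.9, -0.2, 0.3]  # poorly correlated partner

predicted = extrapolate(1.0, etoac, iproac)    # a numeric estimate
rejected = extrapolate(1.0, etoac, solvent_x)  # None: pair fails the cut-off
print(predicted, rejected)
```

Returning None rather than a low-confidence number keeps the decision explicit: a pair that fails the cut-off falls back to an empirical measurement.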


In addition, we have been putting the solubility correlations between common solvents to use ever since they were established. For example, when empirical data deviated too much from the expected relationship between a given solvent pair, we would either repeat the measurements or investigate whether a previously unknown solvate had formed. Finally, other data mining and analysis efforts are ongoing, and their results will be published in subsequent papers.

Author Information

*Jun Qiu: Tel. 732-227-6230; Email: [email protected]
*Jacob Albrecht: Tel. 732-227-6330; Email: [email protected]

Acknowledgement

The authors would like to thank Jacob Janey and Srinivas Tummala for their kind support and frequent discussions.

Footnotes

a. At the time of writing, we have carried out about 1500 solubility screens.

b. High correlation: R² about 0.9; medium-high correlation: R² about 0.8; medium correlation: R² about 0.5; low/no correlation: R² much lower than 0.5.

c. 1,2-DCE: 1,2-dichloroethane; 1,2-DME: 1,2-dimethoxyethane; CPME: cyclopentyl methyl ether; DMAc: N,N-dimethylacetamide; MEK: methyl ethyl ketone; MIBK: methyl isobutyl ketone; TAME: t-amyl methyl ether.

Supporting Information

a. Solvent pair correlation tables (Excel)
b. Solubility extrapolation guide (PDF)
c. Solvent pair correlation graphs (PDF)

References

1. Alsenz J, Kansy M. 2007. High throughput solubility measurement in drug discovery and development. Advanced Drug Delivery Reviews 59: 546-67
2. Hsieh D, Marchut A, Wei C, Zheng B, Wang S, Kiang S. 2009. Model-based solvent selection during conceptual process design of a new drug manufacturing process. Organic Process Research & Development 13: 690-697
3. Diorazio L, Hose D, Adlington N. 2016. Toward a more holistic framework for solvent selection. Organic Process Research & Development 20: 760-773
4. Black S, Dang L, Liu C, Wei H. 2013. On the measurement of solubility. Organic Process Research & Development 17: 486-9
5. Selekman JA, Qiu J, Tran K, Stevens J, Rosso V, Simmons E, Xiao Y, Janey J. 2017. High-throughput automation in chemical process development. Annu Rev Chem Biomol Eng 8: 525-547
6. Ashcroft CP, Dunn PJ, Hayler JD, Wells AS. 2015. Survey of solvent usage in papers published in Organic Process Research & Development 1997–2012. Organic Process Research & Development 19: 740-47
7. Prat D, Pardigon O, Flemming H-W, Letestu S, Ducandas V. 2013. Sanofi’s solvent selection guide: a step toward more sustainable processes. Organic Process Research & Development 17: 1517-25
8. Li J, Eastgate M. 2015. Current complexity: a tool for assessing the complexity of organic molecules. Org. Biomol. Chem. 13: 7164-7176
9. Lipinski C, Lombardo F, Dominy B, Feeney P. 2001. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 46: 3-26
10. Veber D, Johnson S, Cheng H, Smith B, Ward K, Kopple K. 2002. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 45: 2615-2623
11. Ertl P, Rohde B, Selzer P. 2000. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 43: 3714-3717
