
Full Paper

Solvent selection methods and tool
Patrick M. Piccione, Julia Baumeister, Thomas Salvesen, Christophe Grosjean, Yannick Flores, Eliane Groelly, Vikrant Murudi, Ashok S. Shyadligeri, Olga Lobanova, and Christian Lothschuetz
Org. Process Res. Dev., Just Accepted Manuscript • Publication Date (Web): 17 Apr 2019
Downloaded from http://pubs.acs.org on April 17, 2019



Organic Process Research & Development

Title Solvent selection methods and tool

Authors Patrick M. Piccione1,*, Julia Baumeister1, Thomas Salvesen2, Christophe Grosjean1, Yannick Flores1, Eliane Groelly1, Vikrant Murudi4, Ashok Shyadligeri4, Olga Lobanova3, Christian Lothschütz1,†

Affiliations
1 Syngenta, 4333 Münchwilen, Switzerland; 2 Syngenta, 1870 Monthey, Switzerland; 3 Syngenta, Bracknell, RG42 6EY, UK; 4 Syngenta, Corlim, Goa, 403110, India

* Current address: F. Hoffmann-La Roche AG, 4070 Basel, Switzerland [email protected]
† Current address: DSM, 4334 Sisseln, Switzerland

ACS Paragon Plus Environment


Table of Contents Graphic


Abstract

An interactive tool with a browser-type interface has been developed for solvent selection using the software R. Two main classes of considerations can be taken into account: technical suitability for the intended duties, and practical considerations including costs and health, safety and environment (HSE) impact. The tool builds on quantitative analyses of properties selected by the user for the application at hand. The underlying philosophy is to assist the thought processes of the tool’s users, rather than to prescribe set answers. The tool is a stepping stone towards Design of Experiment in chemical process development, enabling parameter space exploration without specialized software licenses, and grouping properties to assist the users. Six examples of use are given to illustrate various methodologies. In building the tool, scientific software development was found to be more intrinsically iterative than originally expected, with mock-ups and sharing of user stories proving more agile than lengthy user requirement specifications. Technical improvements for the future were identified, such as the automation of regressions and Hansen parameter calculations, a more extensive chemical knowledge formalism, and the addition of electron descriptions.

Keywords solvent selection, computer-aided molecular design, software tool, physical properties, chemical properties


1 Introduction

1.1 Importance and complexity of solvent selection

Chemical manufacturing processes rely on solvents for many duties, e.g., facilitating reactions, enabling separations and work-up, lowering viscosity for transport, and cleaning of equipment. Where possible, water is typically used, but many phenomena require different properties, better provided by organic solvents. Two types of characteristics enter into solvent selection: technical suitability for the duties requiring solvation, and practical considerations such as technico-commercial aspects and health, safety and environment (HSE) impact. The performance effect of solvents on reaction rates was already known in the 19th century1 and more recent information has been summarized by Wicaksono et al.2 HSE considerations are also very important, leading to the concepts of green chemistry3 and sustainability, in particular to limit waste generation and reduce accident potential.4 Indeed, in the pharmaceutical industry, solvents have been reported to represent over ¾ of non-aqueous materials.5 Astute solvent selection is thus desirable in chemical process development to maximize process performance whilst minimizing the undesirable consequences of solvent use, whether cost or sustainability related.6 The multitude of possible functional groups and combinations thereof leads to even more numerous property permutations, and thus corresponding solvent choices. Because process chemists tend to be conservative in solvent selection, it is important to generate multiple alternatives in an initial, broad list.7 Many methods have been put forward to optimize solvent selection, from high-throughput screening to predictive models. The latter typically rely on physicochemical descriptors and can be made more accessible by embodiment in software tools.7-8 To extract the most value from such predictive tools, the corresponding theories and assumptions must be understood.

To this end, and to achieve both prediction and insight, a new in-house tool was built at Syngenta and is described here. This contribution starts with a review of the literature in Section 2, covering both solvent selection and physicochemical descriptions of solvent space. Section 3 explains the philosophy underlying tool development together with intended use. The need to combine physical and data science is thus introduced, leading to the development methods of Section 4. The resulting tool is described in Section 5 together with six examples and a commentary.

2 Background

2.1 Solvent selection

Extensive literature discussions are available on solvent selection, from both chemists and chemical engineers. Sophistication in the recent solvent selection literature increases steadily from descriptive guides to database lookups and finally predictive techniques. A frequent approach is to translate process requirements into physical property and/or phase equilibria requirements. Chemical reactivity requirements, however, are more difficult to translate unequivocally into quantitative terms; a typical approach is to “include” or “exclude” various chemical classes.

In terms of industrial guides, it is the pharmaceutical sector which has published the most substantial body of work. A continuous stream of articles on solvent selection methods and tools is available.7-15 Solvent selection guides typically rely on physicochemical properties together with HSE parameters, and are in good agreement across different pharmaceutical companies.7 GSK’s solvent selection guides combining physical properties with sustainability principles8-10 are rich in data and heuristics but do not automate data processing. The screenshots in ref 10 are clearly of a spreadsheet, so similarity searching would require manual operations, which are usually restricted to one property at a time. The 2016 update8 concentrates only on improving the sustainability and life cycle metrics. By contrast, Diorazio et al. at AstraZeneca7 processed data mathematically within a solvent selection tool, a version of which is available online11 through the American Chemical Society’s Green Chemistry Institute Pharmaceutical Roundtable. This tool, well worth exploring, has attractive graphics and features principal component analysis (PCA), although the functionalities are limited: only single properties, or one of three pre-calculated principal components, can be used. In addition, the formulae for the principal components are not given. PCA, starting from limited datasets such as 8 properties for 82 solvents,12 was later extended to far bigger data sets,13 and has also been used in the context of design-of-experiment (DoE) approaches.14

In parallel, academic research has sought to advance selection tools by incorporating the kinetic and thermodynamic impacts of solvents on processes.
Predictive approaches to generate solvent candidates form a subset of computer-aided molecular design (CAMD). Progress in the field has been summarized by Wicaksono.2 The combination of physical property constraint settings (to satisfy process requirements) with predictions is documented by Gani et al.,4, 6 leading to ProCAMD, part of the ICAS software suite developed by the KT consortium of Technical U. of Denmark.16 Physical property prediction methods, especially those building on group contributions, are summarized by Hukkerikar et al.17 in the context of sustainable process development. CAMD methods rely on describing the solvents in quantitative ways, e.g., by measurable physical properties such as phase change temperatures. Alone these do not capture chemical effects sufficiently for really valuable decision support. Physico-chemical descriptors have thus been popularized by several authors, including for instance parameters representing the intuitive concepts of acidity, basicity, polarizability. Examples to highlight include Kamlet-Taft18 linear solvation energy relationships (for logarithms of rate and equilibrium constants, and spectral positions and intensities), and Abraham descriptors.19 Extensive descriptor tabulations are available.18-21 The correlations based on these parameters became known as solvatochromic equations, and have been used explicitly for solvent design.22


New research directions are constantly applied to solvent selection and design. Struebing et al.23-24 combined quantum mechanical calculations with solvatochromic equations to predict rate constants for solvent design. Peters et al.25 used COSMO-RS, an approach later stated to improve by the incorporation of experimental data.26 To reduce numbers of solvents in experimental design, Qiu and Albrecht employed solubility correlations.27 NRTL-SAC models have enabled the design of mixed solvent systems at pharmaceutical companies.28-29 Zhou et al.30 applied genetic algorithm ideas to physico-chemical phenomena leading to yet another computer-aided molecular design approach. Datta et al. further combined genetic algorithms with PCA to optimize reactants and solvents simultaneously.31 Samudra & Sahinidis applied a new optimization framework, which includes graph-theoretical methods, to refrigerant and solvent design.32

2.2 Physicochemical descriptions of solvent space

In several cases, combinations of the properties and/or parameters above are meant to be used together to provide an estimate of another property. Many quantitative structure-property relationships (QSPR) have been proposed to correlate the effect of solvents, e.g., on reaction rates.22,30 Of particular interest are relationships based on one of the solvatochromic parameter sets, and the Hansen solubility parameters.

2.2.1 Solvatochromic equations

2.2.1.1 Kamlet-Taft description

The generalized linear complexation energy relationship18 using solvatochromic parameters is given in Table 1. As a simplification of the above, the Kamlet-Taft equation18 can be used to correlate properties with a relatively simple description (three parameters) possessing intuitive meaning. This parameter set can be visualized in three dimensions, and is hence still accessible to unaugmented human beings. By definition, in order to use the equation, it is necessary to collect the values of the desired property in at least four different solvents, ideally more (at least seven is recommended). For maximum predictive power, it is further recommended that a selection of solvents covering a maximal range of α, β, π* be chosen. The coefficients P0, a, b, and s must then be obtained by linear regression.

2.2.1.2 Imperial College / University College variations: Abraham and Folić descriptions

Further models based on the generic linear energy relationship above have been developed to estimate physical properties related to mobility of chemicals (Abraham’s equation19) or kinetic rates (Folić’s version of the solvatochromic equation22, 24); see Table 1. As mentioned in the case of the Kamlet-Taft model, to cover a wide solvent space it is recommended that solvents with different physical properties be chosen for the regression. For both approaches, measurements in at least six solvents are required, with Folić et al. recommending eight.22


Kamlet-Taft description

Solvatochromic equation, for a reaction rate or equilibrium constant, or a position or intensity of spectral absorption, XYZ:
$XYZ = XYZ_0 + s(\pi^* + d\delta) + a\alpha + b\beta + h\delta_H + e\xi$
The s, d, a, b, h, and e coefficients measure the relative susceptibilities of XYZ to the indicated solvent property scales and are obtained by regression.
Parameters:
o $XYZ_0$ is a constant
o π* is an index of solvent dipolarity/polarizability
o δ is a "polarizability correction term" equal to 0.0 for nonchlorinated aliphatic solvents, 0.5 for polychlorinated aliphatics, and 1.0 for aromatic solvents
o α is an index of solvent hydrogen bond donor (acidity)
o β is an index of solvent hydrogen bond acceptor (basicity)
o δH is the Hildebrand solubility parameter
o ξ is a coordinate covalency measure

Simplified Kamlet-Taft description

The Kamlet-Taft equation for any property P is:
$P = P_0 + a\alpha + b\beta + s\pi^*$
Parameters:
o $P_0$ is a constant
o π* is an index of solvent dipolarity/polarizability
o α is an index of solvent hydrogen bond donor (acidity)
o β is an index of solvent hydrogen bond acceptor (basicity)

Abraham’s descriptors

For a solvation related property SP:
$\log SP = c + eE + sS + aA + bB + lL$ (1)
$\log SP = c + eE + sS + aA + bB + vV$ (2)
Equation (1) applies for transfer from the gas phase to a condensed phase; equation (2) applies for transfer between different condensed phases. The reaction-specific coefficients c, e, s, a, b, and l/v are obtained by linear regression (in the case of solubility, using equation (2)).
Parameters:
o A is the hydrogen bond acidity
o B is the hydrogen bond basicity
o S is the polarizability/dipolarity parameter
o E is the excess molar refraction
o L is the partitioning coefficient between the gaseous phase and hexadecane
o V is the McGowan volume

Folić’s solvatochromic equation

Folić regresses rate constants k according to:
$\log k = c_0 + c_A A + c_B B + c_S S + c_\delta \delta + c_H \delta_H^2$
The reaction-specific coefficients $c_0$, $c_A$, $c_B$, $c_S$, $c_\delta$, and $c_H$ are obtained by linear regression.
Parameters:
o A is the hydrogen bond acidity
o B is the hydrogen bond basicity
o S is the polarizability/dipolarity parameter
o δ is the chemical class indicator
o $\delta_H^2$ is the cohesive energy density (δH being the Hildebrand solubility parameter)

Table 1: Solvatochromic equations.
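To illustrate the regression step behind the simplified Kamlet-Taft equation, the following Python sketch fits P0, a, b, and s by ordinary least squares via the normal equations. It is not the tool's implementation (the tool was written in R). The function names are hypothetical, and the descriptor values and responses are synthetic placeholders generated from known coefficients, not measured solvent data.

```python
# Minimal sketch: least squares for P = P0 + a*alpha + b*beta + s*pi_star.
# All numbers below are synthetic placeholders, not measured solvent data.

def solve_linear(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: solvents do not span the descriptor space")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_kamlet_taft(descriptors, responses):
    """descriptors: list of (alpha, beta, pi_star) tuples; returns [P0, a, b, s]."""
    X = [[1.0, al, be, pi] for al, be, pi in descriptors]  # design matrix with intercept
    k = 4
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * p for row, p in zip(X, responses)) for i in range(k)]
    return solve_linear(XtX, Xty)  # normal equations (X^T X) c = X^T y

# Seven hypothetical solvents spanning a range of alpha, beta and pi*.
descriptors = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.6), (0.9, 0.8, 0.5), (0.2, 0.4, 0.8),
               (0.0, 0.1, 0.5), (1.1, 0.2, 1.0), (0.5, 0.9, 0.3)]
true_coeffs = (1.0, -0.5, 2.0, 0.75)  # P0, a, b, s used to generate the synthetic responses
responses = [true_coeffs[0] + true_coeffs[1] * al + true_coeffs[2] * be + true_coeffs[3] * pi
             for al, be, pi in descriptors]

P0, a, b, s = fit_kamlet_taft(descriptors, responses)
print(round(P0, 6), round(a, 6), round(b, 6), round(s, 6))
```

Because the synthetic responses are noise-free, the fit should recover the generating coefficients. With real measurements the residuals would indicate how well the three-parameter description suits the property. The singularity check reflects the recommendation above that the chosen solvents cover a wide range of α, β, and π*.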


2.2.2 Hansen solubility parameters

The Hansen solubility parameters were developed for polymeric solutes.33 While not discussed further here, it is worth noting that the use of Hansen solubility parameters generally requires some adjustment for smaller solutes, as described briefly in the ‘HSP Science/Small solutes’ section of ref 33 and in more detail in ref 34. Therefore, while comparison of Ra values (see below) is still valuable for early solvent screening, it is recommended to use the Hansen solubility parameters with care.

The Hansen equation is:
$R_a^2 = 4(\delta_{d1} - \delta_{d2})^2 + (\delta_{p1} - \delta_{p2})^2 + (\delta_{h1} - \delta_{h2})^2$
The three Hansen parameters are defined as follows: dispersive δd, polar δp, and hydrogen bonding δh. The subscripts 1 and 2 refer to the solvent and the solute respectively. The relative solubility is estimated by calculating the relative energy difference of the system (RED):
$RED = R_a / R_o$
Here Ro is the solubility radius of the solute. The user needs to input δd2, δp2, δh2, and Ro. The solubility is then estimated based on the value of RED:
• RED < 1: likelihood of dissolution
• RED = 1: partial dissolution
• RED > 1: no dissolution

To compare solvents for a given solute, Ro (which is a property of the solute only) is not absolutely needed; the Ra values can be compared rather than the RED values. Solute solubility parameters can either be looked up or estimated, e.g., with ICAS’s ProPred module (available from Technical U. Denmark’s KT Consortium). Hansen’s equation is a non-linear combination of parameters requiring extra user inputs (the solute Hansen parameters). It would thus require special features to be interactive. These were not yet implemented at the time of writing, though the Hansen parameters themselves can be used in the same way as all other parameters, as exemplified for ibuprofen below. Given their theoretical background, the Hansen parameters are intended always to be used as a set.
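The Hansen distance and RED calculation above is simple enough to sketch in a few lines of Python. This calculation was not part of the tool at the time of writing. The function names, solvent labels, and parameter values below are hypothetical placeholders chosen for easy arithmetic, not tabulated Hansen parameters.

```python
import math

def hansen_distance(solvent, solute):
    """Ra between a solvent and a solute, each given as (delta_d, delta_p, delta_h)."""
    d1, p1, h1 = solvent
    d2, p2, h2 = solute
    return math.sqrt(4.0 * (d1 - d2) ** 2 + (p1 - p2) ** 2 + (h1 - h2) ** 2)

def relative_energy_difference(solvent, solute, r0):
    """RED = Ra / Ro, where Ro is the solubility radius of the solute."""
    if r0 <= 0:
        raise ValueError("Ro must be positive")
    return hansen_distance(solvent, solute) / r0

def rank_by_ra(solvents, solute):
    """Solvent names sorted by increasing Ra; Ro is not needed for a ranking."""
    return sorted(solvents, key=lambda name: hansen_distance(solvents[name], solute))

# Hypothetical Hansen parameters (same units for all entries).
solute = (18.0, 6.0, 8.0)
solvents = {
    "solvent_A": (17.0, 6.0, 8.0),
    "solvent_B": (18.0, 10.0, 11.0),
    "solvent_C": (15.0, 0.0, 0.0),
}

ranking = rank_by_ra(solvents, solute)
red_values = {name: relative_energy_difference(hsp, solute, r0=5.0)
              for name, hsp in solvents.items()}
print(ranking)
print(red_values)
```

With a hypothetical Ro of 5.0, solvent_A falls inside the solubility sphere (RED < 1), solvent_B sits on its surface (RED = 1), and solvent_C lies outside it (RED > 1). The ranking by Ra alone gives the same order, since Ro depends only on the solute.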

3 Philosophy

Although quantitative statistical tools have long been available, they are not universally used in chemical process development. In particular, physical scientists can find it challenging to implement Statistical Design of Experiment. To maximize uptake and impact of the work, careful consideration thus had to be given to development philosophy.


3.1 An empowering tool for iterative development

The principle underlying this work was to build a tool empowering trained chemists to explore the process space: not a prescriptive tool (with definite decisions), but a suggestive one. The main intended use is the computer-aided generation of a set of solvents to consider trying in the laboratory. Within an iterative workflow, a first set of solvents is explored, not in the expectation of uniform success but rather to maximize information generation. The wide range of performance for this first set can be analyzed in various ways to increase process understanding, in turn leading to a second set of solvents, this time targeted for best performance. The second step can of course be split further into refinement and validation for particularly difficult problems.

To support this iterative experimental philosophy, the tool was designed to offer wide ranges of potential alternatives, i.e., a greater variety of solvents than “the usual suspects”. It is then necessary, and expected, that trained chemists will discount several suggestions due to reactivity incompatibilities or other issues. The tool must therefore be interactive, to allow each user to use it in a personalized way in real time. It must also be visually attractive and easy to use, to be consistent with users’ experience with other software such as mobile apps in their private lives. Development occurred entirely within Syngenta, for two reasons: 1) to make further development possible, including into originally-unforeseen functionalities; 2) to upskill the project team in concepts and theories of solvent selection.

3.2 A combination of physical science and data science

Statistical formalisms can seem quite remote from the intuitive chemical description of electrons, named reactions etc. The fundamental axiom of this work is that physicochemical parameters can be used as the numerical description of the chemical world, effectively acting as a Rosetta Stone between chemistry and mathematics.

To be thought-provoking, let us imagine a process where diethyl ether as the solvent leads to better-than-needed selectivity but insufficient yield, and where 1,4-dioxane (also an ether) as the solvent leads to better-than-needed yield but insufficient selectivity. A reasonable next step might then be to look for a solvent “at the midpoint” of these two solvents. The key question is how to make the midpoint concept meaningful. Typical solvent listings are alphabetical; the half-way point might then be diiodomethane, clearly an absurd choice on the basis of chemical class alone. How can one then approach this? If two or more parameters described the characteristics of the solvents believed to be important for the process, the problem of exploring alternatives would be intuitively far more tractable: one would look for solvents close to the midpoint between a set of points.

Principal component analysis allows the generation of 2-D and 3-D maps for any number of parameters. The assumption that the first two, or three, principal components are a good, concise set of descriptors of all process requirements allows an interpolation to be carried out graphically. Identifying the solvent candidates in the table is then simply a matter of a straight lookup. Importantly, there is no “universal” PCA: the choice of the parameters used is critical to such an analysis, which is where subject matter expertise expresses itself. Process chemists will know from the reactions, their literature descriptions including mechanism, and their own experience, what solvent properties are likely to matter most for their process. The same holds true for process engineers designing physical unit operations.
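The midpoint idea can also be made concrete numerically, without PCA, by working directly in a scaled property space. The Python sketch below is an assumption-laden illustration rather than the tool's method: it standardizes each user-chosen property, takes the midpoint of two reference solvents, and ranks the remaining solvents by Euclidean distance to that midpoint. The solvent labels, the two unnamed properties, and their values are hypothetical.

```python
import math

def standardize(table):
    """Center each property on its mean and scale by its sample standard deviation."""
    names = list(table)
    n_props = len(table[names[0]])
    means = [sum(table[n][j] for n in names) / len(names) for j in range(n_props)]
    sds = [math.sqrt(sum((table[n][j] - means[j]) ** 2 for n in names) / (len(names) - 1))
           for j in range(n_props)]
    sds = [sd if sd > 0 else 1.0 for sd in sds]  # a constant property contributes nothing
    return {n: [(table[n][j] - means[j]) / sds[j] for j in range(n_props)] for n in names}

def nearest_to_midpoint(table, first, second, count=1):
    """Names of the `count` solvents closest to the midpoint of `first` and `second`."""
    z = standardize(table)
    mid = [(u + v) / 2.0 for u, v in zip(z[first], z[second])]
    candidates = [n for n in z if n not in (first, second)]
    return sorted(candidates, key=lambda n: math.dist(z[n], mid))[:count]

# Hypothetical solvents described by two properties judged relevant by the user.
table = {
    "S1": (0.0, 0.0),
    "S2": (4.0, 4.0),
    "S3": (2.1, 1.9),
    "S4": (10.0, -3.0),
    "S5": (0.5, 3.5),
}

print(nearest_to_midpoint(table, "S1", "S2", count=2))
```

Scaling matters here: without it, a property with a large numerical range would dominate the distance. The choice of which properties enter the table remains the expert judgment described above.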

4 Methods

4.1 Programme management

The original starting point was a data table. The key innovation was the development of an interactive tool (relying on the data table as the primary data repository), so as to make various calculations and theoretical frameworks accessible without each user manually producing calculations, manipulations, and graphs based on their own data compilations and in inevitably different ways. Information technology skills are required to produce robust, visually attractive, and sustainable tools. A multi-functional project team was thus assembled (see Table 2).

Process chemists*: definition of usage criteria; data for repository
Process engineers: functionalities and workflows; theories for property/phenomena correlations
Statisticians: fitting techniques
Data scientists: IT implementation

Table 2. Experts’ roles on the project team (* process chemists were the intended main user community)

4.2 Data table

A filterable and macro-enabled spreadsheet was set up to capture properties, where each row is a solvent and each column a property. This was considered a good data repository, available to all Syngenta employees. The filters are not required for the Intranet tool, but enable searches if using the spreadsheet alone. The data table lists the properties and parameters in Table 3, divided into classes, for 209 solvents. The references for the property values, where available, are listed in the “References” tab (worksheet) of the data table workbook.


Identifiers: Name; Classification; Formula; NMR spectrum link; CAS number; Identifier code

Measured physical or physico-chemical parameters: Molecular mass MW; Density at 20 °C ρ; Reichardt ET(30); Acity A; Basity B; Eluotropic series ε0 (silica); Solubility in water at 20 °C; Flash point (closed) Fp; Vapor pressure at 20 °C or melting point (p°); Boiling point bp; Dipole moment μ; Dielectric constant ε; Enthalpy of vaporisation ΔHvap; n-Octanol/water partition coefficient Kow; Index of refraction nD; Surface tension σ; Viscosity η; Thermal conductivity κ; Solubility of water in solvent; Heat capacity Cp

Calculated parameters:
o Kamlet-Taft parameters: hydrogen bond donor acidity α; hydrogen bond acceptor basicity β; intrinsic polarity-polarizability π*
o Hansen parameters: dispersive δD; polar δP; hydrogen bonding δH
o Abraham's parameters: excess molar refractivity E; polarizability/dipolarity S; hydrogen bond acidity A; hydrogen bond basicity B; McGowan volume V; gas/hexadecane partition coefficient L
o Polarizability correction term δ
o Hildebrand solubility (calculated from the Hansen parameters)

Assessed parameters:
o GSK HSE assessments: Waste; Environmental impact; Health; Flammability and explosion; Stability; Overall sum
o GHS assessments: Explosive; Flammable; Oxidizing; Compressed gas; Corrosive; Toxic; Harmful; Health hazard; Environmental hazard
o CMR classification
o Price category class
o Water solubility class

Table 3: Parameters considered, by type


4.3 Statistics and data science

4.3.1 Overview of analyses

The analyses were implemented using the statistical and programming software R35 and some of the packages that exist in the R ecosystem,36-49 namely, for PCA: pcaMethods;36 for plotting: ggplot2,37 plotly,40 networkD3,47 and extrafont;48 and for data organization and analysis: stringi,38 magrittr,39 tidyr,41 readxl,42 boot,43-44 DT,45 lazyeval,46 and dplyr.49 Overviews of the analyses and the algorithms are given below. The web interface was designed and implemented using the Shiny R package to give users access.50 When the interface is launched, the solvent data are loaded and some initial data sorting operations are carried out; the data are stored in a simple spreadsheet rather than a more native R format in order to enable simple future updating. The following analyses are available to the users: 1) similarity and difference searching using cluster analysis; 2) multi-criteria filtering in property value space; 3) principal component analysis (PCA) for visualization of the solvent properties as maps. They are illustrated in Figure 1.

Figure 1. Types of analyses performed by the Syngenta solvent selection tool
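Of the three analyses listed above, multi-criteria filtering is the simplest to illustrate: it keeps the solvents whose selected property values all fall within user-chosen ranges. The Python sketch below shows the logic only and is not the tool's R/Shiny code. The function name, property keys, solvent labels, and values are hypothetical, and discarding solvents with missing values is an assumption made here for simplicity.

```python
def filter_solvents(table, limits):
    """Keep solvents whose values lie within [low, high] for every limited property.

    table:  {solvent name: {property name: value or None}}
    limits: {property name: (low, high)}
    """
    kept = []
    for name, props in table.items():
        ok = True
        for prop, (low, high) in limits.items():
            value = props.get(prop)
            # Assumption of this sketch: a missing value fails the filter.
            if value is None or not (low <= value <= high):
                ok = False
                break
        if ok:
            kept.append(name)
    return kept

# Hypothetical data table excerpt and user-chosen limits.
solvents = {
    "solvent_A": {"boiling_point_C": 65.0, "flash_point_C": 11.0},
    "solvent_B": {"boiling_point_C": 111.0, "flash_point_C": 4.0},
    "solvent_C": {"boiling_point_C": 153.0, "flash_point_C": 58.0},
    "solvent_D": {"boiling_point_C": 101.0, "flash_point_C": None},
}
limits = {"boiling_point_C": (80.0, 160.0), "flash_point_C": (0.0, 60.0)}

selected = filter_solvents(solvents, limits)
print(selected)
```

In an interactive setting, each (low, high) pair would come from a range control initialized to the minimum and maximum of that property in the data table.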

4.3.2 Cluster analysis The motivation of cluster analysis is to partition the data into the collection of groups of data points that are closest to each other in multivariate space, given a particular number of groups.51-53 The approach taken in this work was to allow the user to select a set of properties that were of interest, and to calculate the Euclidean distance between the solvents from these properties. This distance matrix was then clustered using the ward.D2 algorithm from the hclust R function.35 This algorithm was chosen as it generally provides clusters that relate readily to the regions of points that are commonly found in PCA analysis of the same data, thus providing 13 ACS Paragon Plus Environment

Organic Process Research & Development 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

internal consistency between the various modules of the tool. Details on this relation to PCA and how the Ward method works can be found at 54 . Two main types of analyses can then be conducted, both of which require the user to specify the number of alternative solvents to return, p. In similarity searching, the p closest solvents in n-dimensional space are returned. Here, this serves to identify the p solvents closest to the target solvent with respect to the properties deemed important by the user. For difference searching, the data are clustered to produce p+1 clusters, p of which do not contain the target solvent of interest. From each of the p clusters, the algorithm then returns the solvent with the largest distance from the target solvent. This identifies a collection of p solvents which differ significantly from the target solvent and from each other, so that the next set of experimental solvents will maximize coverage of the n-dimensional property space. While other approaches to finding solvents that differ from the target solvent would be possible, this was deemed to be a pragmatic approach to the problem which would return a wide array of solvents and would be unlikely to suggest solvents that were all distant from the target solvent but were very similar to each other. 4.3.3 Filtering Filtering is accomplished by implementing a slider for each of the parameters chosen by the user. The sliders cover the data range of each parameter and enables the user to place their preferred limits on the minimum and maximum values for each of these parameter. These limits are then used in filtering out solvents which have values outside of these ranges, returning a refined set of solvents. The filtering is accomplished using the indexing capabilities of R. 4.3.4 Principal component analysis (PCA) Principal component analysis aims to assist visualization and analysis of an ndimensional parameter space by dimensionality reduction. 
The technique constructs new, orthogonal parameters as linear combinations of the original n parameters. The initial component is constructed so as to maximize the amount of data variability it describes, and further components are defined so that they maximize the residual variability described, within the limitation of being orthogonal to previous components. PCA has a long history, being initially described by Pearson in 1901;55 an introduction to the subject is available in ref 56. Retaining only the first few principal components maximizes information content while minimizing the number of parameters. In practice, and for explorations of relatively small datasets, only two or three principal components are often chosen, as in ref 14, since this allows a single graph or a set of three graphs to represent the spread of observations. To give all variables equal weighting, it is best to scale and center the parameters (by subtracting the mean and dividing the centered variable by the standard deviation of the variable). For visualization purposes, PCA maps are then often color-coded to look at the spread either versus a classification (here often solvent class) or a property of interest. It is also possible, based on PCA, to regress application-specific quantities, such as reaction rate constants in different solvents.

A clear restriction in color coding is the number of colors that can be readily differentiated by a human observer. A palette of ten colors that are distinguishable by normal vision and under computer evaluation of deficient color vision57 was devised and put in place. However, some of the parameters in the data have more than 10 categories, and these remain problematic as it is difficult to find palettes that will cope with that many categories given common color vision capabilities.

The package pcaMethods36 was used for the PCA implementation since it is well-validated and offers distance-to-model calculations, which are used in model evaluation. The basic presentation of the PCA output is a scores plot, which presents the projection of the original data onto the newly formed orthogonal components. This is commonly used to highlight areas of points in the principal component space which can be considered to be similar. In addition to this PCA map, it is instructive to look at the loadings plot, which shows the contributions to the transformed principal components from the original variables. This helps to envisage how the original variables are correlated to each other and to the new principal components.

An evaluation of the model is important, since it is possible to obtain a PCA model which does not describe a reasonable amount of the original data’s variation and is therefore useless in any modelling sense. The R² and cross-validated Q² for the model are calculated to indicate how well the model describes and predicts the data, respectively.36 The scores plot is equipped with an oval which demarcates the distance, within the plane of the model, beyond which a point might be considered an outlier. Moderate outliers can also be assessed using the ‘distance-to-model’ measure.
This compares the distance of a point orthogonal to the model with a critical distance that can be calculated using a number of different methods.58-61 These methods all tend to give slightly different results so a critical zone has been added to the distance-to-model graphic which covers all of the critical distances calculated. Eriksson et al.58 state that mild outliers should be seen as points that are outside the critical distance while more serious outliers are at greater than twice the critical distance. The approach was thus taken that a critical zone is an appropriate way to assess the distance from the model.
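The distance-to-model assessment described above can be sketched as a simple classification rule. The snippet below is a minimal illustration in Python, not the R code of the tool. The function name, the example critical distances, the label used for points inside the critical zone, and the choice to measure "twice the critical distance" from the upper edge of the zone are all assumptions made for illustration.

```python
def classify_distance_to_model(distance, critical_distances):
    """Classify one observation from its distance to the PCA model.

    critical_distances holds the critical values obtained from several
    calculation methods; their min..max span plays the role of the
    'critical zone' (hypothetical helper, not part of pcaMethods).
    """
    lower = min(critical_distances)
    upper = max(critical_distances)
    if distance <= lower:
        return "within model"
    if distance <= upper:
        # The different methods disagree here, so no verdict is forced.
        return "inside critical zone"
    if distance <= 2 * upper:
        return "mild outlier"
    return "serious outlier"


# Hypothetical critical distances from three methods (illustrative numbers only)
zone = [1.8, 2.0, 2.3]
for d in (1.0, 2.1, 3.5, 5.0):
    print(d, classify_distance_to_model(d, zone))
```

Reporting the zone rather than a single threshold keeps the disagreement between the calculation methods visible instead of hiding it behind one arbitrarily chosen value.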

5 Results and discussion
The general implementation of the data table and tool is explained first, followed by several examples of usage.

5.1 Data table
The data table was made available online internally, but editing was restricted to avoid corruption. Its general appearance is shown in Figure 2.

| Name | Classification | Formula | CAS | Molecular Mass [g/mol] | Density [g/ml] at 20°C | Reichardt ET(30) [kcal/mol] | Flash point [°C] closed | Melting Point [°C] | Boiling Point [°C] | Dipole moment [D] |
|---|---|---|---|---|---|---|---|---|---|---|
| Water | Water | H2O | 7732-18-5 | 18.01 | 1 | 63.1 | | 0 | 100 | 1.85 |
| Acetonitrile | Polar/Aprotic | C2H3N | 75-05-8 | 41.05 | 0.78 | 45.6 | 2 | -45 | 82 | 3.2 |
| Propanenitrile | Polar/Aprotic | C3H5N | 107-12-0 | 55.08 | 0.78 | 43.6 | 2 | -93 | 97 | 4.05 |
| Formamide | Polar/Aprotic | CH3NO | 75-12-7 | 45.04 | 1.13 | 55.8 | 175 | 3 | 220 | 3.73 |
| Dimethyl formamide | Polar/Aprotic | C3H7NO | 68-12-2 | 73.09 | 0.95 | 43.2 | 58 | -61 | 153 | 3.8 |
| N-Methylformamide | Polar/Aprotic | C2H5NO | 123-39-7 | 59.07 | 1.003 | 54.1 | 111 | -4 | 200 | 3.83 |
| Dimethyl sulphoxide | Polar/Aprotic | C2H6OS | 67-68-5 | 78.14 | 1.1 | 45.1 | 88 | 19 | 189 | 3.96 |
| N-Methyl pyrrolidone (NMP) | Polar/Aprotic | C5H9NO | 872-50-4 | 99.13 | 1.03 | 42.2 | 86 | -24 | 202 | 0.57 |
| Dimethyl acetamide | Polar/Aprotic | C4H9NO | 127-19-5 | 87.12 | 0.94 | 42.9 | 66 | -20 | 165 | 3.8 |
| Dimethylpropylene urea | Polar/Aprotic | C6H12N2O | 7226-23-5 | 128.18 | 1.064 | | 121 | -23 | 247 | 4.17 |
| 1,3-Dimethyl-2-imidazolidinone | Polar/Aprotic | C5H10N2O | 80-73-9 | 114.15 | 1.06 | 42.5 | 114 | 8.2 | 225 | 4.09 |
| Nitromethane | Other | CH3NO2 | 75-52-5 | 61.04 | 1.14 | 46.3 | 36 | -29 | 101 | 3.46 |

Figure 2. Appearance of the data table

In terms of the tabulated properties, it must be emphasized that these are not fully linearly independent, since different authors use different descriptors for similar (or related) concepts. For instance, both Abraham’s hydrogen bond acidity A and Kamlet-Taft’s hydrogen bond donor acidity α relate to the concept of “acidity”, so their correlation is quite high at R² = 0.85, as shown in Figure 3. Since the tool philosophy was to allow the users to select their own favorite properties for analyses, this was not considered a problem.

Figure 3: Correlation between Kamlet-Taft α and Abraham’s A parameters (x-axis: Kamlet-Taft α; y-axis: Abraham's A; R² = 0.85).

5.2 Interactive tool
The R-Shiny tool appears as a typical browser interface, with multiple tabs. All major parameter and visualization choices are selected in a “start” tab, leading to real-time calculations in the various results tabs. Access is provided by sign-in with Windows credentials.

5.2.1 General layout
The tabs available are shown in Figure 4. Sequentially, they are the control panel (START), followed by various analyses, an azeotrope lookup function, the definitions of the properties, the PCA plots, and the PCA diagnostics (for interpretation of significance of the plots).

Figure 4. Tabs available in the interactive tool.

5.2.2 Control panel: “Start” tab
The layout of the control tab is color-coded in Figure 5.
1. First the user must select the properties which will form the basis of the analysis.
   a. The user can choose properties individually in the left-most portion of the screen, outlined in red. These are tick boxes, any number of which can be selected.
   b. Alternatively, a group of properties can be chosen together (orange section). This is recommended when wanting to use parameters believed to form a descriptive set together, such as the solvatochromic equation parameters (as explained above under “Physicochemical descriptions of solvent space”). At the time of writing, the groupings of Table 4 are supported. The corresponding sets of properties are thus selectable together, for convenient access by the occasional user.
2. For the cluster analysis, the user can select whether to find solvents that are similar or dissimilar to a target solvent on the basis of the properties previously selected. The maximum number of solvents returned is also chosen in the green block. Furthermore, a filter function allows specific solvent classes or solvents to be deselected, for instance on the basis of known chemical incompatibilities.
3. For principal component analyses, additional controls are given in the purple block. The variables selected in the left part of the screen are auto-populated for convenience, and further editing is possible. Color-coding by variable (often solvent class) and labelling of the points can be assigned to any variable. Centered and scaled variables are the default for the PCA, since this is the most common way of performing a PCA, but this can be overridden. Finally, the number of principal components to use can be chosen by the user. The most common choice is 2 or 3. Although the PCA module functions are controlled from the Start tab, the output is then visible in the “PCA plots” and “PCA diagnostics” tabs.
4. The state of any search can be saved as a bookmark using the button highlighted in blue.
5. It is also possible to match numerical values of properties directly without having to make reference to a solvent, or to perform a multi-dimensional filter search. To do this, the properties of interest are still selected in the control panel in the same way described above, but the actual setting of the numerical values occurs in the “Match properties” and “Solvent filter” tabs, respectively.

Figure 5. Layout of control panel. Labelled blocks: Choose all possible properties individually; Choose group(s) of properties; Compare with known solvent; Filter; Principal Component Analysis (PCA); Share or save current status of search.

Table 4. Selectable pre-defined groupings of properties

Name (in table order): Solvency - Kamlet-Taft only; Solvency for reactions-Solvatochromic equation; Solvency for gas-liquid reactions-Abrahams-L; Solvency for condensed phase reactions-Abrahams-V; Solvency - Kamlet-Taft & other general parameters; Polarity descriptors; Solubility Parameters (gE models, regular solution theory); General physical properties; GSK HSE descriptors

Parameters (in table order): Kamlet–Taft hydrogen bond donor; Kamlet–Taft hydrogen bond acceptor; Kamlet–Taft dipolarity/polarizability; Abrahams polarizability dipolarity S; Abrahams hydrogen bond acidity A; Abrahams hydrogen bond basicity B; Polarizability correction term delta; Hildebrand Solubility deltaH_sq; Abrahams excessive molar refractivity E; Abrahams polarizability dipolarity S; Abrahams hydrogen bond acidity A; Abrahams hydrogen bond basicity B; Abrahams gas hexadecane partition coeff L; Abrahams excessive molar refractivity E; Abrahams polarizability dipolarity S; Abrahams hydrogen bond acidity A; Abrahams hydrogen bond basicity B; Abrahams McGowan volume V; Dimroth-Reichardt Et Parameter; Acity; Basity; Kamlet–Taft hydrogen bond donor; Kamlet–Taft hydrogen bond acceptor; Kamlet–Taft dipolarity/polarizability; Elutropic series; Dipole moment; Dielectric constant; Hansen par. – dispersive (delta_D); Hansen par. – polar (delta_P); Hansen par. – hydrogen bonding (delta_H); Dipole moment; Dielectric constant; Hansen par. – dispersive (delta_D); Hansen par. – polar (delta_P); Hansen par. – hydrogen bonding (delta_H); Hildebrand Solubility deltaH_sq; Hildebrand Solubility - calculated; Molecular Mass; Density; Flash point; Vapor pressure; Melting point; Boiling point; Water solubility; Dipole moment; Dielectric constant; Vaporisation enthalpy; GSK Guide: Waste; GSK Guide: Environmental impact; GSK Guide: Health; GSK Guide: Flammability and explosion; GSK Guide: Stability

5.3 Examples
Six examples are described below: similarity searching, direct matching of properties, multidimensional filtering, matching of solute properties, difference by clustering, and PCA mapping. A further example on multidimensional filtering to find an extraction solvent is provided in the Supporting Information.

5.3.1 Find a Similar Solvent (diethyl ether alternatives)
In process development there are many cases where an efficient solvent for the chemical reaction is already known, but is not favored for other reasons such as toxicity, process safety, cost, environmental impact or the interface with other steps in the synthesis. The functionality Similarities/Differences can be used to discover other solvents that are similar to the original solvent with regard to properties chosen by the user. An example of a solvent that is regularly found in publications and at research lab scale is diethyl ether. In process development diethyl ether is strongly disfavored due to its tendency to form explosive peroxides. In order to find a replacement for diethyl ether with similar polarity properties, but better performance with respect to process safety, the Solvent Selection Tool can perform a similarity analysis. In this case the Kamlet-Taft parameters were used, since acidity, basicity and polarity were believed to be important characteristics for many reactions.62 Figure 6 shows the control tab and Figure 7 the results returned. The tool proposes solvents with similar Kamlet-Taft parameters and also tabulates various process safety parameters such as flash point and the GSK stability rating. In this case the tool proposes other ethers, but some of them can also form peroxides. t-Butyl methyl ether and cyclopentyl methyl ether are interesting alternatives, as is butyl acetate, a promising solvent with similar Kamlet-Taft parameters and a good GSK HSE rating.
The result lists of subsequent examples have been converted to tables for maximal legibility due to the limitations of screen captures, but their typical appearance is similar to Figure 7. The interested reader will find them in the Supporting Information, together with the full table version of Figure 7. A functionality to highlight is that any column can be used for sorting (increasing or decreasing), using the small grey arrows next to the column label.
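The similarity search used in this example can be sketched as a nearest-neighbour ranking. The snippet below is a minimal Python illustration, not the tool's R code. The Kamlet-Taft values for diethyl ether are not printed in this article, so t-butyl methyl ether is used as the target instead, with values transcribed from Table 5 (a small subset of the data table). The function name and the use of an unscaled Euclidean distance are assumptions.

```python
import math

# Kamlet-Taft (donor, acceptor, dipolarity/polarizability), from Table 5
KT = {
    "t-Butyl methyl ether":   (0.00, 0.45, 0.25),
    "iso-Amyl alcohol":       (0.84, 0.86, 0.40),
    "3-Pentanone":            (0.00, 0.45, 0.72),
    "Dichloromethane":        (0.13, 0.10, 0.82),
    "Methyl isobutyl ketone": (0.02, 0.48, 0.65),
    "Isopropyl acetate":      (0.00, 0.49, 0.48),
    "Benzyl alcohol":         (0.60, 0.52, 0.98),
    "Ethyl propionate":       (0.00, 0.42, 0.47),
}


def similar_solvents(table, target, p):
    """Return the p solvents closest to `target` in property space."""
    reference = table[target]
    ranked = sorted(
        (math.dist(reference, values), name)
        for name, values in table.items()
        if name != target
    )
    return [name for _, name in ranked[:p]]


print(similar_solvents(KT, "t-Butyl methyl ether", 3))
```

With this subset the two esters rank first, followed by methyl isobutyl ketone, which mirrors the observation above that an ether target can return esters as close neighbours.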


Figure 6: Input parameters for similarity searching.


Figure 7. Screenshots of similarities results in the tool for diethyl ether, based on the Kamlet-Taft parameters.


5.3.2 Match Properties (2-MeTHF alternatives with lower solubility)
The Match Properties functionality works in a similar way, but does not need a start solvent to compare with. Here the user chooses the properties first, and then modifies the target values. In this example a solvent is needed that has similar polarity, and hence a similar influence on the reaction, as 2-methyltetrahydrofuran (2-MeTHF), but a lower solubility of water in the solvent. The latter condition is useful to simplify the recycling of the solvent. The chosen parameters, as shown in Figure 8, are again the Kamlet-Taft parameters, this time augmented by the water solubility. Hydrogen bond donor, hydrogen bond acceptor and polarizability are set to the values of 2-MeTHF, whereas the solubility is adjusted manually. The tool uses the clusters to find a list of solvents (see Table 5) that are similar to the given values. iso-Amyl alcohol, tert-butyl methyl ether and 3-pentanone are proposed as the most similar solvents. This is a reasonable result, although alcohols and ketones might not always be compatible with reaction conditions. All values can be manually adjusted to change the search parameters.

Figure 8. Screenshot of the chosen filters (match properties).


Table 5. Results (match properties).

| Name | Classification | Formula | Kamlet-Taft hydrogen bond donor | Kamlet-Taft hydrogen bond acceptor | Kamlet-Taft dipolarity / polarizability | Water solubility |
|---|---|---|---|---|---|---|
| iso-Amyl alcohol | Alcohol | C5H12O | 0.84 | 0.86 | 0.4 | 30 |
| t-Butyl methyl ether | Ether | C5H12O | 0 | 0.45 | 0.25 | 26 |
| 3-Pentanone | Ketone | C5H10O | 0 | 0.45 | 0.72 | 35 |
| 1-Pentanol | Alcohol | C5H12O | 0.84 | 0.86 | 0.4 | 21 |
| Dichloromethane | Halogenated | CH2Cl2 | 0.13 | 0.1 | 0.82 | 20 |
| Methyl isobutyl ketone | Ketone | C6H12O | 0.02 | 0.48 | 0.65 | 19 |
| Isopropyl acetate | Ester | C5H10O2 | 0 | 0.49 | 0.48 | 18.9 |
| Propyl acetate | Ether | C5H10O2 | 0 | 0.4 | 0 | 18.9 |
| Benzyl alcohol | Alcohol | C7H8O | 0.6 | 0.52 | 0.98 | 40 |
| Ethyl propionate | Ester | C5H10O2 | 0 | 0.42 | 0.47 | 17 |
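Matching against user-set target values, rather than against a named solvent, can be sketched as follows. This is a minimal Python illustration and not the tool's implementation: it ranks the Table 5 rows by plain distance to a target vector and does not reproduce the cluster-based step mentioned in the text. Autoscaling is applied because water solubility is on a much larger numerical scale than the Kamlet-Taft parameters; whether the tool scales at this step is an assumption. The target values are hypothetical and are not those of 2-MeTHF.

```python
import math
from statistics import mean, stdev

# (donor, acceptor, dipolarity/polarizability, water solubility), from Table 5
TABLE5 = {
    "iso-Amyl alcohol":       (0.84, 0.86, 0.40, 30.0),
    "t-Butyl methyl ether":   (0.00, 0.45, 0.25, 26.0),
    "3-Pentanone":            (0.00, 0.45, 0.72, 35.0),
    "1-Pentanol":             (0.84, 0.86, 0.40, 21.0),
    "Dichloromethane":        (0.13, 0.10, 0.82, 20.0),
    "Methyl isobutyl ketone": (0.02, 0.48, 0.65, 19.0),
    "Isopropyl acetate":      (0.00, 0.49, 0.48, 18.9),
    "Propyl acetate":         (0.00, 0.40, 0.00, 18.9),
    "Benzyl alcohol":         (0.60, 0.52, 0.98, 40.0),
    "Ethyl propionate":       (0.00, 0.42, 0.47, 17.0),
}


def match_properties(table, target, p):
    """Rank solvents by distance to `target` after autoscaling each column."""
    columns = list(zip(*table.values()))
    centers = [mean(column) for column in columns]
    spreads = [stdev(column) or 1.0 for column in columns]  # guard zero spread

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, centers, spreads)]

    scaled_target = scale(target)
    ranked = sorted(
        (math.dist(scale(row), scaled_target), name)
        for name, row in table.items()
    )
    return [name for _, name in ranked[:p]]


# Hypothetical target: aprotic, moderate basicity and polarity, low water solubility
print(match_properties(TABLE5, (0.0, 0.5, 0.5, 15.0), 3))
```

Without the scaling step, the water solubility column would dominate the distance and the Kamlet-Taft parameters would have almost no influence on the ranking.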


5.3.3 Multidimensional filter with HSE constraint: high flash point solvents
The Solvent Filter allows the user to filter the existing data table directly, without clustering. This functionality is employed here to find a solvent with a high flash point. The general solvent parameters are chosen as a melting point below -20 °C and a boiling point between 60 and 210 °C, in order to have a solvent that is liquid at most operating conditions and can be recycled by distillation. The third filter is the flash point, which is set to be greater than 60 °C, as shown in Figure 9. Table 6 shows the results. Four solvents were found that have a flash point greater than 60 °C. They can be considered as non-flammable at ambient temperature.63

Figure 9. Screenshot of the chosen filters (high flash point solvents).

| Name | Classification | Formula | Flash point [°C] | Melting point [°C] | Boiling point [°C] |
|---|---|---|---|---|---|
| N-methyl pyrrolidinone | Polar/Aprotic | C5H9NO | 86 | -24 | 202 |
| Ethyl methyl pyridine | Base | C8H11N | 74 | -70 | 178 |
| 1,2-propanediol | Alcohol | C3H8O2 | 101 | -60 | 188 |
| 2-Ethyl hexanol | Alcohol | C8H18O | 73 | -76 | 185 |

Table 6. Results (multidimensional filtering: high flash point solvents).

In this case the number of results is limited. To obtain more options, the given ranges can be relaxed. If, for example, the melting point of the solvent can be higher than -20 °C (e.g., up to 0 °C), ten solvents are found with a flash point greater than 60 °C (see Table 7).


| Name | Classification | Formula | Flash point [°C] | Melting point [°C] | Boiling point [°C] |
|---|---|---|---|---|---|
| N-methylformamide | Polar/Aprotic | C2H5NO | 111 | -4 | 200 |
| N-methyl pyrrolidinone | Polar/Aprotic | C5H9NO | 86 | -24 | 202 |
| Dimethyl acetamide | Polar/Aprotic | C4H9NO | 66 | -20 | 165 |
| 1,2-Dichlorobenzene | Halogenated | C6H4Cl2 | 66 | -17 | 180 |
| Ethyl methyl pyridine | Base | C8H11N | 74 | -70 | 178 |
| 1,2-propanediol | Alcohol | C3H8O2 | 101 | -60 | 188 |
| Benzyl alcohol | Alcohol | C7H8O | 94 | -15 | 205 |
| Ethylene glycol | Alcohol | C2H6O2 | 111 | -13 | 197 |
| 2-Ethyl hexanol | Alcohol | C8H18O | 73 | -76 | 185 |
| Benzonitrile | Aromatic | C7H5N | 70 | -13 | 190 |

Table 7. Results with relaxed constraints (multidimensional filtering: high flash point solvents).
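The multidimensional filter amounts to a conjunction of range tests, one per slider. The snippet below is a minimal Python sketch (the tool itself uses R indexing). It uses only the ten rows of Table 7 as input, since the full data table is not reproduced in this article; the function and parameter names are hypothetical.

```python
# (name, flash point, melting point, boiling point), all in °C, from Table 7
SOLVENTS = [
    ("N-methylformamide", 111, -4, 200),
    ("N-methyl pyrrolidinone", 86, -24, 202),
    ("Dimethyl acetamide", 66, -20, 165),
    ("1,2-Dichlorobenzene", 66, -17, 180),
    ("Ethyl methyl pyridine", 74, -70, 178),
    ("1,2-propanediol", 101, -60, 188),
    ("Benzyl alcohol", 94, -15, 205),
    ("Ethylene glycol", 111, -13, 197),
    ("2-Ethyl hexanol", 73, -76, 185),
    ("Benzonitrile", 70, -13, 190),
]


def filter_solvents(rows, mp_max, bp_min=60, bp_max=210, fp_min=60):
    """Keep solvents with melting point below mp_max, boiling point inside
    [bp_min, bp_max] and flash point above fp_min."""
    return [
        name
        for name, flash, melting, boiling in rows
        if melting < mp_max and bp_min <= boiling <= bp_max and flash > fp_min
    ]


strict = filter_solvents(SOLVENTS, mp_max=-20)   # original melting point limit
relaxed = filter_solvents(SOLVENTS, mp_max=0)    # relaxed melting point limit
print(strict)
print(len(relaxed))
```

Applied to these rows, the strict melting point limit leaves the four solvents of Table 6, and relaxing it to 0 °C keeps all ten rows, as expected given that the input is already the relaxed result set.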

5.3.4 Matching solute properties: ibuprofen solvents
It is also possible to Match Properties of a solute. A typical application might be to use the Hansen parameters to estimate solubility, while ensuring convenience by controlling the phase change temperatures. Solute physical properties can either be looked up or estimated from predictive methods. As an example, consider the generation of alternative solvents for ibuprofen, the Hansen parameters for which are 16.4 (dispersive), 6.4 (polar) and 8.9 (hydrogen bonding).64 A range of Hansen parameters differing by no more than 3 units in each was used to set the filters. To ensure a comfortable liquid range at normal processing temperatures, one can further restrict solvents to have boiling points below 100 °C and melting points below -21 °C (see Figure 10).


Figure 10. Input to “Solvent properties” tab to identify convenient ibuprofen solvents

A relatively small set of seven solvents is returned: dichloromethane, trichloroacetonitrile, tetrahydrofuran, methyl acetate, ethyl acetate, ethyl formate and isopropyl acetate. Indeed the solubility of ibuprofen in dichloromethane and ethyl acetate is known to be high.65 Alcohols are not found despite the high solubility of ibuprofen in this chemical class. The accuracy of results, of course, can depend on applicability of the Hansen description to the problem at hand, and accuracy of the values.
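The Hansen window used in this example can be sketched as a per-parameter tolerance test. The snippet below is a minimal Python illustration. The ibuprofen values are those quoted in the text; the three solvent entries are approximate values from commonly tabulated Hansen parameters and are an assumption, not values taken from the article's data table. The boiling and melting point filters are omitted for brevity.

```python
# Hansen parameters (dispersive, polar, hydrogen bonding) of the solute, from the text
IBUPROFEN = (16.4, 6.4, 8.9)

# Approximate literature values, assumed to share the units of the solute values;
# NOT taken from the article's data table
CANDIDATES = {
    "Ethyl acetate":   (15.8, 5.3, 7.2),
    "Tetrahydrofuran": (16.8, 5.7, 8.0),
    "Ethanol":         (15.8, 8.8, 19.4),
}


def within_window(solvent, solute, tolerance=3.0):
    """True if every Hansen parameter lies within `tolerance` of the solute's."""
    return all(abs(s - t) <= tolerance for s, t in zip(solvent, solute))


matches = [name for name, hansen in CANDIDATES.items()
           if within_window(hansen, IBUPROFEN)]
print(matches)
```

Under these assumed values, ethanol is rejected on its hydrogen bonding parameter alone, which illustrates how a fixed window can exclude alcohols even though ibuprofen dissolves well in them.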

5.3.5 Dissimilarity: Finding solvents for an early solubility screen
The reverse functionality of the cluster algorithm is to identify a set of solvents dissimilar to each other and to a “seed” solvent. Imagine a crystallization for which water is a poor solvent, yet little is known about the exact requirements on the solvent or about the performance of organic solvents. For a first screen, a dozen solvents that maximize information generation can be selected by difference searching, using the Abraham parameters as a set of properties describing a range of solvent behavior and phenomena. The cluster algorithm returns the following list, based on the tree of Figure 11: N-methylformamide, dimethyl sulphoxide, isooctane, cis-decalin, bis(2-methoxyethyl) ether, diphenyl ether, butyl acetate, pyridine, tributylamine, p-xylene, glycerol, 2-ethylhexanol. These solvents can be visualized as the solvents with blue labels in Figure 11, where water is indicated in red (see arrow). Inspection of the properties (Table 8) confirms the chemical intuition that a wide variety of solvents have indeed been proposed, and that they differ from each other as well as from water.

Figure 11: Cluster structure based on “Solvency for condensed phase reactions-Abrahams-V” grouping of properties.

| Name | Classification | Formula | Abrahams excessive molar refractivity E | Abrahams polarizability / dipolarity S | Abrahams hydrogen bond acidity A | Abrahams hydrogen bond basicity B | Abrahams McGowan volume V |
|---|---|---|---|---|---|---|---|
| N-methylformamide | Polar/Aprotic | C2H5NO | 0.405 | 1.36 | 0.4 | 0.55 | 0.5059 |
| Dimethyl sulphoxide | Polar/Aprotic | C2H6OS | 0.522 | 1.72 | 0 | 0.97 | 0.6126 |
| Isooctane | Hydrocarbon | C8H18 | 0 | 0 | 0 | 0 | 1.236 |
| cis-Decalin | Hydrocarbon | C10H18 | 0.55 | 0.25 | 0 | 0 | 1.3004 |
| Bis(2-methoxyethyl) ether | Ether | C6H14O3 | 0.113 | 0.76 | 0 | 1.17 | 1.1301 |
| Diphenyl ether | Ether | C12H10O | 1.216 | 1.08 | 0 | 0.2 | 1.3829 |
| Butyl acetate | Ester | C6H12O2 | 0.071 | 0.6 | 0 | 0.45 | 1.0284 |
| Pyridine | Base | C5H5N | 0.631 | 0.84 | 0 | 0.52 | 0.6753 |
| Tributylamine | Base | C12H27N | 0.051 | 0.15 | 0 | 0.79 | 1.8992 |
| p-Xylene | Aromatic | C8H10 | 0.613 | 0.52 | 0 | 0.16 | 3.839 |
| Glycerol | Alcohol | C3H8O3 | 0.512 | 0.76 | 0.47 | 1.43 | 0.7074 |
| 2-Ethyl hexanol | Alcohol | C8H18O | 0.209 | 0.39 | 0.37 | 0.48 | 1.2945 |
| Water | Water | H2O | 0 | 0.45 | 0.82 | 0.35 | 0.1673 |

Table 8. Properties of solvents proposed for the difference analysis from water.
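The difference search (cluster into p+1 groups, then pick from each group without the target the member farthest from the target) can be sketched as follows. This is a minimal Python illustration, not the tool's R code. It applies a naive agglomerative clustering with Ward's merge criterion to a subset of the Table 8 rows, which are themselves the output of the real search and serve here only as sample data. Function names and the use of unscaled descriptors are assumptions.

```python
import math

# Abraham descriptors (E, S, A, B, V), transcribed from Table 8 (subset)
ABRAHAM = {
    "Water":               (0.000, 0.45, 0.82, 0.35, 0.1673),
    "N-methylformamide":   (0.405, 1.36, 0.40, 0.55, 0.5059),
    "Dimethyl sulphoxide": (0.522, 1.72, 0.00, 0.97, 0.6126),
    "Isooctane":           (0.000, 0.00, 0.00, 0.00, 1.2360),
    "cis-Decalin":         (0.550, 0.25, 0.00, 0.00, 1.3004),
    "Butyl acetate":       (0.071, 0.60, 0.00, 0.45, 1.0284),
    "Pyridine":            (0.631, 0.84, 0.00, 0.52, 0.6753),
    "Glycerol":            (0.512, 0.76, 0.47, 1.43, 0.7074),
    "2-Ethyl hexanol":     (0.209, 0.39, 0.37, 0.48, 1.2945),
}


def centroid(points):
    return [sum(axis) / len(points) for axis in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    gap = math.dist(centroid(a), centroid(b)) ** 2
    return len(a) * len(b) / (len(a) + len(b)) * gap


def ward_clusters(table, k):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [[name] for name in table]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: ward_cost(
            [table[n] for n in clusters[ij[0]]],
            [table[n] for n in clusters[ij[1]]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def difference_search(table, target, p):
    """From each of the p clusters without the target, return the member
    farthest from the target."""
    picks = []
    for members in ward_clusters(table, p + 1):
        if target in members:
            continue
        picks.append(max(members,
                         key=lambda n: math.dist(table[n], table[target])))
    return picks


print(difference_search(ABRAHAM, "Water", 3))
```

Picking one solvent per cluster is what prevents the result from collapsing into several near-identical solvents that all happen to be far from the target.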


5.3.6 Principal Component Analysis
PCA can be used in at least three ways: to explore trends in a solvent set using a particular description (set of parameters); to look for alternatives for a given solvent; and to interpolate between solvents. To illustrate these uses, consider a gas-liquid reaction, which can be described using the solvency for gas-liquid reactions-Abrahams-L parameter set. The 5 parameters thereof are available for 106 solvents, and the distance to model plot suggests good predictive power (Figure 12).

Figure 12. Distance to model plot for a PCA based on the gas-liquid reactions-Abrahams-L parameter set.
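The PCA steps described in Section 4.3.4 (autoscaling, successive orthogonal components, scores and loadings) can be sketched in a few lines. The snippet below is a minimal pure-Python illustration, not the pcaMethods implementation used in the tool. It finds the leading components approximately by power iteration with deflation, a technique swapped in here for simplicity. The sample rows are Abraham descriptors (E, S, A, B, V) from Table 8, used only because the Abrahams-L values are not printed in this article. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Abraham descriptors (E, S, A, B, V), transcribed from Table 8 (subset)
ROWS = [
    (0.000, 0.45, 0.82, 0.35, 0.1673),  # Water
    (0.405, 1.36, 0.40, 0.55, 0.5059),  # N-methylformamide
    (0.522, 1.72, 0.00, 0.97, 0.6126),  # Dimethyl sulphoxide
    (0.000, 0.00, 0.00, 0.00, 1.2360),  # Isooctane
    (0.550, 0.25, 0.00, 0.00, 1.3004),  # cis-Decalin
    (0.071, 0.60, 0.00, 0.45, 1.0284),  # Butyl acetate
    (0.631, 0.84, 0.00, 0.52, 0.6753),  # Pyridine
    (0.512, 0.76, 0.47, 1.43, 0.7074),  # Glycerol
    (0.209, 0.39, 0.37, 0.48, 1.2945),  # 2-Ethyl hexanol
]


def autoscale(rows):
    """Center each column on its mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    spreads = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, centers, spreads)]
            for row in rows]


def covariance(rows):
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(matrix, iterations=500):
    """Power iteration: approximate dominant eigenvector and eigenvalue."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return vec, value


def pca(rows, n_components=2):
    scaled = autoscale(rows)
    cov = covariance(scaled)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    loadings, explained = [], []
    for _ in range(n_components):
        vec, value = leading_component(cov)
        loadings.append(vec)
        explained.append(value / total)
        # Deflation: remove the variance captured by this component
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * w for x, w in zip(row, vec)) for vec in loadings]
              for row in scaled]
    return scores, loadings, explained


scores, loadings, explained = pca(ROWS, n_components=2)
print([round(e, 3) for e in explained])
```

The scores are the coordinates plotted in a PCA map, the loadings show how each original descriptor contributes to a component, and the explained fractions give a first indication of how much of the data's variation the retained components describe.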

The overall shape of the PCA shows strong groupings as lines of functional classes, consistent with expectation since Abraham’s description is heavily based on observable effects on chemical species (Figure 13).


Figure 13. Full PCA map of the first two components using the gas-liquid reactions-Abrahams-L parameter set.

The use of the third principal component leads to greater spread across functional groups (Figure 14), but there are still visible colour “islands” – again, as expected. One cannot easily change Figures 11 and 12 in such a way that the reader can identify all the solvents. A zoom function was therefore added in the R implementation.


Figure 14. Full PCA map using the gas-liquid reactions-Abrahams-L parameter set (top: PC-3 vs. PC-1; bottom: PC-3 vs PC-2).


For the second use, alternatives to the peroxidizable tetrahydrofuran (THF) can be identified by looking radially away from THF on the PC-1 vs PC-2 map. This “target” solvent has been identified by a manually added arrow in the figures below to maximize readability in article format. Relatively close to THF, but with different functional groups, are butyl acetate, diethyl carbonate, other esters, and methyl isobutyl ketone (and other ketones); a bit further away are xylene and dichloroethane (see Figure 15). While the exact nature of the reaction would determine which of these alternatives are more worthwhile, the PCA has allowed for a visual identification of alternatives, consistent with the exploratory philosophy above. The plot of the third principal component versus the first (see Figure 16) allows either for an expansion of possibilities or a refinement to a smaller subset, depending on how long a list was generated from the plot of the first two principal components.

Figure 15. Alternatives to tetrahydrofuran for a gas-liquid reaction: First two principal components.


Figure 16. Alternatives to tetrahydrofuran for a gas-liquid reaction: PC-3 vs PC-1.

The last use relates to finding a solvent with properties intermediate between those of other solvents. Of course, similarities across homologous series are easily spotted on a PCA map due to its highly visual nature. Finding solvents of intermediate properties across a more diverse set is more challenging. As an example, consider a gas-liquid reaction studied extensively in ethers and for which intermediate properties between 1,4-dioxane and diethyl ether are desired. Using the Abraham description, the map from the first two principal components, enlarged in the dioxane-diethyl ether area (see Figure 17), suggests esters, C-5 ketones, but also propanenitrile as possible alternatives. Here a line connecting the two original solvents has been added manually to maximize readability in article format. Other solvents in the general area of solvent space are isoamyl alcohol, triethylamine, and nitromethane, though this last option has an unfavourable HSE profile. Here too, additional alternatives or a smaller set of options can be obtained by looking at further principal components, starting from the third one (Figures 18 and 19). Upon doing so, the esters and ketones remain close candidates, as well as isoamyl alcohol; propanenitrile, nitromethane and triethylamine, by contrast, are farther in space once the third principal component is examined.


Figure 17. Solvents intermediate to 1,4-dioxane and diethyl ether. Main map: First two principal components.

Figure 18. Solvents intermediate to 1,4-dioxane and diethyl ether: PC-3 vs PC-1 for refinement.


Figure 19. Solvents intermediate to 1,4-dioxane and diethyl ether: PC-3 vs PC-2 for refinement.

5.4 Commentary

The examples shown above demonstrate the versatility of the solvent selection tool developed at Syngenta. It can be used: 1) to generate a list of solvents similar to one that has already been identified; 2) to generate a collection of solvents all dissimilar to one another and to a target solvent, or without reference to any single specific solvent; 3) to interpolate visually between two or more solvents; or finally 4) to match a selection of properties, such as those of a solute. The large set of parameters makes it possible to include technical performance, practical, and environmental considerations, thus leading to truly holistic techno-economic assessments. The examples are not meant to be exhaustive: the imagination of the process technologist should be the limit rather than the uses to date.

The original philosophy of empowering exploration was embodied in the variety of parameters and descriptions tabulated in the data table, and thus accessible to the users. Furthermore, the variety of algorithms and visualizations (tables, clusters, and maps) ensured that the preferences of users were met so as to maximize uptake. The use of the free software R, with an established open-source community, allowed for reuse of well-validated modules for many tasks. The use of R also obviated the need for statistical software licensing. The browser interface, with several pre-set analyses as well as property groupings, supported seamless calculations “on the fly” without requiring users to learn a new program. The tool has been rolled out across several departments with live demonstrations, and is now available to every Syngenta employee.

The tool offers different capabilities from the ACS tool [11] with respect to cluster analysis, and far greater interactivity in the parameters the user selects depending on process requirements, though the ACS tool has a larger database. Upon request by users, solvent names in PCA maps are easily accessible and printable in the Syngenta tool, rather than requiring hovering. Indeed, to remain meaningful to physical scientists, it was important to label graphs and points more extensively than in ref 11. The grouping of properties to facilitate their selection based on theoretical descriptions is another new feature introduced here.
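As a rough illustration of uses 1) and 2) listed in this section, the following Python sketch ranks solvents by Euclidean distance in standardized property space and then greedily picks a mutually dissimilar subset. It is not the Syngenta implementation, which is written in R and relies on cluster analysis and PCA; the greedy farthest-point rule is swapped in here for brevity, and the solvent names, property values, and function names are all hypothetical placeholders.

```python
import math

# Hypothetical property table: names and numbers are illustrative placeholders,
# not values taken from the Syngenta data table.
PROPERTIES = {
    "solvent_A": [1.0, 10.0, 0.20],
    "solvent_B": [1.1, 11.0, 0.25],
    "solvent_C": [3.0, 30.0, 0.90],
    "solvent_D": [2.0, 5.0, 0.50],
}


def standardize(table):
    """Scale each property column to zero mean and unit (population) standard deviation."""
    names = list(table)
    columns = list(zip(*(table[n] for n in names)))
    means = [sum(c) / len(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return {
        n: [(x - m) / s for x, m, s in zip(table[n], means, stds)]
        for n in names
    }


def rank_by_similarity(table, target):
    """Return (name, distance) pairs for all other solvents, nearest to the target first."""
    scaled = standardize(table)
    ref = scaled[target]
    dists = {n: math.dist(v, ref) for n, v in scaled.items() if n != target}
    return sorted(dists.items(), key=lambda kv: kv[1])


def pick_dissimilar(table, k, start):
    """Greedy farthest-point selection: repeatedly add the solvent whose nearest
    already-chosen neighbour is farthest away, beginning from `start`."""
    scaled = standardize(table)
    chosen = [start]
    while len(chosen) < min(k, len(scaled)):
        best = max(
            (n for n in scaled if n not in chosen),
            key=lambda n: min(math.dist(scaled[n], scaled[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    print(rank_by_similarity(PROPERTIES, "solvent_A"))
    print(pick_dissimilar(PROPERTIES, 2, "solvent_A"))
```

Standardizing first keeps a property with a large numerical range from dominating the distance; which properties enter the table remains the user's choice, in line with the exploratory philosophy described in this section.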

6 Conclusions

An interactive tool with a browser-type interface was developed at Syngenta to assist with solvent selection, based on physical properties and other parameters. The tool automates a variety of quantitative analyses of properties selected by the user for the application at hand. Its underlying philosophy has thus been to assist the thought processes of its users, rather than to prescribe set answers. The tool is very easy to use, and accessible to anyone at Syngenta; it has been rolled out at various sites and functions. The tool is a stepping stone towards Design of Experiments in chemical process development, in that it enables parameter space exploration without specialized software licenses, and with groupings of properties pre-defined to assist the users.

In building the tool, insights into the process of scientific software development were also obtained. Different scientific specialisms lead to different languages and approaches. Tool development was found to be more intrinsically iterative than originally expected. While physical properties are a Rosetta stone for the meeting of chemists, engineers, and statisticians, progress was accelerated when real test cases were brought to the table. Finding a way to unequivocally explain what functionalities were desired proved trickier than expected. The solution found here was to discuss mock-ups and share user stories rather than to write requirement specifications. However, the definition of “user stories” will likely be different for a different team.

Possible technical improvements were continuously identified and captured. An important direction in the mathematics is to automate regressions and the resulting optimizations, e.g., of the logarithms of rate constants and solubilities. In terms of basic chemistry, the “chemical knowledge” of the data table needs improving by allowing the description of molecules as consisting of multiple functional groups, or belonging to multiple chemical classes; this will also enhance the filtering capability of the R tool. Finally, in terms of parametric descriptions, there are two aspects: the incorporation of a full estimate of solubility using the Hansen equation rather than only matching the Hansen parameters manually; and the addition of electronic descriptions (e.g., sigma profiles) [66].
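One common way to go beyond manual matching of Hansen parameters is to compute the Hansen distance Ra and the relative energy difference RED = Ra/R0 of each solvent with respect to a solute. The Python sketch below assumes that this is the kind of calculation meant by the improvement described above; the parameter triples, the interaction radius R0, and all names are illustrative placeholders, not measured data.

```python
import math


def hansen_distance(a, b):
    """Hansen distance Ra between two (dD, dP, dH) triples in MPa^0.5.

    The dispersion term carries the conventional factor of 4.
    """
    (d1, p1, h1), (d2, p2, h2) = a, b
    return math.sqrt(4.0 * (d1 - d2) ** 2 + (p1 - p2) ** 2 + (h1 - h2) ** 2)


def relative_energy_difference(solute, solvent, r0):
    """RED = Ra / R0, where R0 is the interaction radius of the solute."""
    return hansen_distance(solute, solvent) / r0


def screen(solute, r0, solvents):
    """Return (name, RED) pairs sorted from lowest to highest RED."""
    reds = {
        name: relative_energy_difference(solute, hsp, r0)
        for name, hsp in solvents.items()
    }
    return sorted(reds.items(), key=lambda kv: kv[1])


# Placeholder inputs (assumptions for illustration only, not literature values).
SOLUTE = (18.0, 6.0, 8.0)
R0 = 5.0
CANDIDATES = {
    "candidate_1": (17.0, 6.0, 8.0),
    "candidate_2": (18.0, 9.0, 12.0),
    "candidate_3": (15.0, 6.0, 8.0),
}

if __name__ == "__main__":
    for name, red in screen(SOLUTE, R0, CANDIDATES):
        print(f"{name}: RED = {red:.2f}")
```

In Hansen's convention a RED below 1 places the solvent inside the solute's solubility sphere and suggests good compatibility, while values above 1 suggest progressively poorer solvents. Ranking by RED therefore replaces the visual comparison of three parameters with a single sortable number.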


Associated content

The Supporting Information is available free of charge on the ACS Publications website. It includes screenshots of the native output from the interactive tool, as well as the full table contents from the first example (finding alternatives for diethyl ether by similarity searching). An additional example is also described: multi-dimensional filtering for liquid-liquid extraction.

Acknowledgments

The authors acknowledge Syngenta for supporting this work. P.M.P. and E.G. thank Dr. Alay Arya (Technical University of Denmark) for help with locating physical property values. P.M.P. further thanks Professor Georgios Kontogeorgis (Technical University of Denmark) for comments on the manuscript draft. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

7 References

[1] Menschutkin, N. Über die Affinitätskoeffizienten der Alkylhaloide und der Amine. Zeitschrift für Physikalische Chemie 1890, 6, 41-57.
[2] Wicaksono, D.S.; Mhamdi, A.; Marquardt, W. Computer-aided screening of solvents for optimal reaction rates. Chem. Eng. Sci. 2014, 115, 167-176.
[3] Anastas, P.T.; Warner, J.C. Green Chemistry: Theory and Practice. Oxford University Press: Oxford, 1998.
[4] Gani, R.; Jiménez-González, C.; Constable, D.J.C. Method for selection of solvents for promotion of organic reactions. Comp. Chem. Eng. 2005, 29, 1661-1676.
[5] Constable, D.J.C.; Jiménez-González, C.; Henderson, R.K. Perspective on Solvent Use in the Pharmaceutical Industry. Org. Process Res. Dev. 2007, 11, 133–137.
[6] Gani, R.; Gomez, P.A.; Folić, M.; Jiménez-González, C.; Constable, D.J.C. Solvents in organic synthesis: Replacement and multi-step reaction systems. Comp. Chem. Eng. 2008, 32, 2420-2444.
[7] Diorazio, L.J.; Hose, D.R.J.; Adlington, N.K. Toward a More Holistic Framework for Solvent Selection. Org. Process Res. Dev. 2016, 20, 760–773.
[8] Alder, C.M.; Hayler, J.D.; Henderson, R.K.; Redman, A.M.; Shukla, L.; Shuster, L.E.; Sneddon, H.F. Updating and further expanding GSK’s solvent sustainability guide. Green Chem. 2016, 18, 3879-3890.
[9] Curzons, A.D.; Constable, D.J.C.; Mortimer, D.N.; Cunningham, V.L. So you think your process is green, how do you know?—Using principles of sustainability to determine what is green–a corporate perspective. Green Chem. 2001, 3, 1–6.


[10] Henderson, R.K.; Jiménez-González, C.; Constable, D.J.C.; Alston, S.; Inglis, G.G.A.; Fisher, G.; Sherwood, J.; Binks, S.P.; Curzons, A.D. Expanding GSK’s solvent selection guide – embedding sustainability into solvent selection starting at medicinal chemistry. Green Chem. 2011, 13, 854-862.
[11] https://www.acs.org/content/acs/en/greenchemistry/research-innovation/tools-for-green-chemistry/solvent-selection-tool.html
[12] Carlson, R.; Lundstedt, T.; Albano, C. Screening of suitable solvents in organic synthesis: Strategies for solvent selection. Acta Chem. Scand. 1985, B39, 79-91.
[13] Katritzky, A.R.; Fara, D.C.; Kuanar, M.; Hur, E.; Karelson, M. The classification of solvents by combining classical QSPR methodology with principal component analysis. J. Phys. Chem. A 2005, 109, 10323-10341.
[14] Murray, P.M.; Bellany, F.; Benhamou, L.; Bučar, D.-K.; Tabor, A.; Sheppard, T.D. The application of design of experiments (DoE) reaction optimisation and solvent selection in the development of new synthetic chemistry. Org. Biomol. Chem. 2016, 14, 2373-2384.
[15] Prat, D.; Pardigon, O.; Flemming, H.-W.; Letestu, S.; Ducandas, V.; Isnard, P.; Guntrum, E.; Senac, T.; Ruisseau, S.; Cruciani, P.; Hosek, P. Sanofi’s Solvent Selection Guide: A Step Toward More Sustainable Processes. Org. Process Res. Dev. 2013, 17, 1517-1525.
[16] Gani, R. Case Studies in Chemical Product Design – Use of CAMD Techniques. In Chemical Product Design: Toward a Perspective through Case Studies; Ng, K.M.; Gani, R.; Dam-Johansen, K., Eds.; Elsevier: Amsterdam, 2007.
[17] Hukkerikar, A.S.; Sarup, B.; ten Kate, A.; Abildskov, J.; Sin, G.; Gani, R. Group-contribution+ (GC+) based estimation of properties of pure components: Improved property estimation and uncertainty analysis. Fluid Phase Equilib. 2012, 321, 25-43.
[18] Kamlet, M.J.; Abboud, J.-L.M.; Abraham, M.H.; Taft, R.W. Linear Solvation Energy Relationships. 23. A Comprehensive Collection of the Solvatochromic Parameters, π*, α and β, and Some Methods for Simplifying the Generalized Solvatochromic Equation. J. Org. Chem. 1983, 48, 2877-2887.
[19] Abraham, M.H. Scales of Solute Hydrogen-bonding: Their Construction and Application to Physicochemical and Biochemical Processes. Chem. Soc. Rev. 1993, 73-83.
[20] Marcus, Y. The Properties of Organic Liquids that are Relevant to their Use as Solvating Solvents. Chem. Soc. Rev. 1993, 409-416.
[21] Marcus, Y. The Properties of Solvents. Wiley: London, 1998.
[22] Folić, M.; Adjiman, C.S.; Pistikopoulos, E.N. The Design of Solvents for Optimal Reaction Rates. 14th European symposium on computer-aided process engineering proceedings 2004, 175-180.
[23] Struebing, H.; Obermeier, S.; Siougkrou, E.; Adjiman, C.S.; Galindo, A. A QM-CAMD approach to solvent design for optimal reaction rates. Chem. Eng. Sci. 2017, 159, 69-83.


[24] Struebing, H.; Ganase, Z.; Karamertzanis, P.G.; Siougkrou, E.; Haycock, P.; Piccione, P.M.; Armstrong, A.; Galindo, A.; Adjiman, C.S. Computer-aided molecular design of solvents for accelerated reaction kinetics. Nature Chem. 2013, 5, 952-957.
[25] Peters, M.; Greiner, L.; Leonhard, K. Illustrating computational solvent screening: Prediction of standard Gibbs energies of reaction in solution. AIChE J. 2008, 54, 2729–2734.
[26] Lapkin, A.A.; Peters, M.; Greiner, L.; Chemat, S.; Leonhard, K.; Liauw, M.A.; Leitner, W. Screening of new solvents for artemisinin extraction process using ab initio methodology. Green Chem. 2010, 12, 241–251.
[27] Qiu, J.; Albrecht, J. Solubility Correlations of Common Organic Solvents. Org. Process Res. Dev. 2018, 22, 829-835.
[28] Hsieh, D.; Marchut, A.J.; Wei, C.; Zheng, B.; Wang, S.S.Y.; Kiang, S. Model-Based Solvent Selection during Conceptual Process Design of a New Drug Manufacturing Process. Org. Process Res. Dev. 2009, 13, 690-697.
[29] Kokitkar, P.B.; Plocharczyk, E.; Chen, C.-C. Modeling Drug Molecule Solubility to Identify Optimal Solvent Systems for Crystallization. Org. Process Res. Dev. 2008, 12, 249-256.
[30] Zhou, T.; Wang, J.; McBride, K.; Sundmacher, K. Optimal Design of Solvents for Extractive Reaction Processes. AIChE J. 2016, 62, 3238-3249.
[31] Datta, S.; Dev, V.A.; Eden, M.R. Hybrid genetic algorithm-decision tree approach for rate constant prediction using structures of reactants and solvent for Diels-Alder reaction. Comp. Chem. Eng. 2017, 106, 690-698.
[32] Samudra, A.P.; Sahinidis, N.V. Optimization-Based Framework for Computer-Aided Molecular Design. AIChE J. 2013, 10, 3686-3701.
[33] For more information on Hansen parameters visit: https://www.hansensolubility.com
[34] Louwerse, M.J.; Maldonado, A.; Rousseau, S.; Moreau-Masselon, C.; Roux, B.; Rothenberg, G. ChemPhysChem 2017, 18, 2999–3006.
[35] R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
[36] Stacklies, W.; Redestig, H.; Scholz, M.; Walther, D.; Selbig, J. pcaMethods -- a Bioconductor package providing PCA methods for incomplete data. Bioinformatics 2007, 23, 1164-1167.
[37] Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag: New York, 2016.
[38] Gagolewski, M.; Tartanus, B. and others (2018). R package stringi: Character string processing facilities. http://www.gagolewski.com/software/stringi/. DOI:10.5281/zenodo.32557
[39] Bache, S.M.; Wickham, H. (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr
[40] Sievert, C. (2018). plotly for R. https://plotly-book.cpsievert.me


[41] Wickham, H.; Henry, L. (2018). tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. R package version 0.8.2. https://CRAN.R-project.org/package=tidyr
[42] Wickham, H.; Bryan, J. (2018). readxl: Read Excel Files. R package version 1.1.0. https://CRAN.R-project.org/package=readxl
[43] Canty, A.; Ripley, B. (2017). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-20.
[44] Davison, A.C.; Hinkley, D.V. (1997). Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge. ISBN 0-521-57391-2
[45] Xie, Y.; Cheng, J.; Tan, X. (2018). DT: A Wrapper of the JavaScript Library 'DataTables'. R package version 0.5. https://CRAN.R-project.org/package=DT
[46] Wickham, H. (2017). lazyeval: Lazy (Non-Standard) Evaluation. R package version 0.2.1. https://CRAN.R-project.org/package=lazyeval
[47] Gandrud, C.; Allaire, J.J.; Russell, K. (2016). networkD3: D3 JavaScript Network Graphs from R. R package version 0.2.13. https://CRAN.R-project.org/package=networkD3
[48] Chang, W. (2014). extrafont: Tools for using fonts. R package version 0.17. https://CRAN.R-project.org/package=extrafont
[49] Wickham, H.; François, R.; Henry, L.; Müller, K. (2018). dplyr: A Grammar of Data Manipulation. R package version 0.7.8. https://CRAN.R-project.org/package=dplyr
[50] Chang, W.; Cheng, J.; Allaire, J.J.; Xie, Y.; McPherson, J. (2018). shiny: Web Application Framework for R. R package version 1.2.0. https://CRAN.R-project.org/package=shiny
[51] Everitt, B.S.; Landau, S.; Leese, M.; Stahl, D. Cluster Analysis, 5th Edition. John Wiley and Sons, 2011.
[52] Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley and Sons, 2005.
[53] Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Ed. Springer, 2009.
[54] Murtagh, F.; Legendre, P. Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification 2014, 31, 274–295.
[55] Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 1901, 2, 559–572.
[56] https://en.wikipedia.org/wiki/Principal_component_analysis
[57] https://projects.susielu.com/viz-palette. Accessed 24 January 2019.
[58] Eriksson, L.; Johansson, E.; Kettaneh-Wold, N.; Wold, S. Multi and Megavariate Data Analysis. Umetrics Academy: Umeå, 2001.
[59] Nomikos, P.; MacGregor, J.F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37, 41-59.


[60] Maesschalck, R.D.; Candolfi, A.; Massart, D.L.; Heuerding, S. Decision criteria for soft independent modelling of class analogy applied to near infrared data. Chemometrics and Intelligent Laboratory Systems 1999, 47, 65-77.
[61] Filzmoser, P.; Varmuza, K. (2017). chemometrics: Multivariate Statistical Analysis in Chemometrics. R package version 1.4.2. https://CRAN.R-project.org/package=chemometrics
[62] Hill, J.S.; Isaacs, N.S. Mechanism of α-substitution reactions of acrylic derivatives. J. Phys. Org. Chem. 1990, 3, 285-288.
[63] Regulation (EC) No 1272/2008 of the European Parliament and of the Council of 16 December 2008 on classification, labelling and packaging of substances and mixtures, amending and repealing Directives 67/548/EEC and 1999/45/EC, and amending Regulation (EC) No 1907/2006.
[64] Jouyban, A. Handbook of solubility of Pharmaceuticals. CRC Press: Boca Raton, 2010.
[65] Acree, Jr., W.E. IUPAC-NIST Solubility Data Series. 102. Solubility of Nonsteroidal Anti-inflammatory Drugs (NSAIDs) in Neat Organic Solvents and Organic Solvent Mixtures. J. Phys. Chem. Ref. Data 2014, 43, 023102—1-276.
[66] Mullins, E.; Oldland, R.; Liu, Y.A.; Wang, S.; Sandler, S.I.; Chen, C.-C.; Zwolak, M.; Seavey, K.C. Sigma-Profile Database for Using COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res. 2006, 45, 4389-4415.
