
Review

A review on property estimation methods and computational schemes for rational solvent design: A focus on pharmaceuticals
Harini Madakashira, Jhumpa Adhikari, and K. Yamuna Rani
Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/ie301329y • Publication Date (Web): 26 Apr 2013


Industrial & Engineering Chemistry Research

A review on property estimation methods and computational schemes for rational solvent design: A focus on pharmaceuticals

M. Harini, Jhumpa Adhikari, K. Yamuna Rani*

Department of Chemical Engineering, Indian Institute of Technology Bombay, Mumbai-400076
Chemical and Energy Engineering Division, Indian Institute of Chemical Technology, Hyderabad-500607

Abstract

This paper provides a review of the available literature on computational schemes for rational solvent design, with a focus on solvent extraction and crystallization (the two most common unit operations) in the pharmaceutical industry. Computer aided design of solvents is important as a cost effective tool, especially given the regular development of new pharmaceutical molecules. There is also a need to minimize the amount and the number of solvents used, in view of environmental, health and toxicological concerns. This review covers the properties of interest and the predictive methods for estimating these properties in solvent design, including group contribution based methods, quantitative structure property relationship methods and molecular modeling methods. The various optimization approaches for rational solvent design, such as outer approximation, branch and bound, simulated annealing and genetic algorithm, are also discussed.

Keywords: product design, property prediction, optimization, solvent, pharmaceuticals, crystallization, extraction.

*Author to whom correspondence should be addressed; electronic mail: [email protected]; [email protected]


Contents

Introduction
Property Estimation Methods
    Group contribution based methods
    QSPR methods
        Utility of various topological indices
    Molecular Modeling and Simulation
Properties of Interest for Solvent Selection
    Solvent Extraction
        Infinite dilution activity coefficient
    Crystallization
        Hydrogen bonding solubility parameter
        n-octanol/water partition coefficient
        Solubility parameter
        Toxicity
Rational Solvent Design Approaches
    Constraints
        Property Constraints
        Structural Constraints
        Practicality Constraints
    Optimization Approaches
        Deterministic Approaches
            Branch and bound
            Outer approximation
        Stochastic Approaches
            Simulated annealing
            Genetic algorithm
Solvent Design for Pharmaceuticals
    Solvent extraction
    Crystallization
Summary
Acknowledgement
References
Appendix


Introduction

The design and development of new chemical based products with applications such as drugs, cosmetics, pesticides, food, etc., is important and requires validation. Over the last few years, process engineering and process design have been slowly evolving into the newer area of product engineering and product design, whose goals differ markedly from those of the former. According to Moggridge and Cussler,1 chemical product design is a procedure defining “what we need, generating ideas to meet this need, screening and selecting the best of the ideas and finally deciding what the product should look like and how it should be manufactured”. An alternate definition given by Hill2 is that product design is a general procedure for structured products consisting of six steps: “consumer need identification, conceptual product design, identification of active ingredient, incorporation of active ingredient into a physical prototype, assessing it against relevant criteria and experimental refinement in prototype based on measured results”.

In chemical product design, with knowledge of the desired behavior and properties, we attempt to identify the final product. Occasionally, the product may also be manipulated in order to obtain the desired product behavior. Products of this type are commonly known as formulations, where an additive, when added to a chemical or non-chemical product, enhances its properties. The problem is therefore to find the appropriate chemical that will exhibit the desired behavior, for example enhanced profit, increased operational efficiency, positive environmental impact, low toxicity, etc. Since millions of compounds exist, it is difficult to find an appropriate chemical that meets the specific needs by direct experimentation alone. In addition, as experimental measurements are often time consuming and expensive, predictive methods can replace measurements if the estimates are sufficiently good. Over the years, many researchers have proposed methodologies that have contributed significantly to reducing the time and cost of the experimental effort.

Development of systematic methodologies for the design of chemical based products has been attempted by various researchers.3-8 A multi-step and multi-level approach consisting of problem formulation (pre-design step), compound identification (design step) and result analysis (post-design step) has been proposed for computer aided product design by Harper and Gani, along with a description of the roles of the different steps and the tools needed in each step. Later, Charpentier4 proposed a ‘3PE’ approach, ‘the triplet molecular product process engineering’, for successful product development involving complex, multidisciplinary, non-linear and non-equilibrium phenomena occurring at different length and time scales, in order to understand how physical and bio-chemical phenomena at a smaller length scale relate to properties and behavior at a longer length scale.

A computer aided molecular design (CAMD) methodology for the design of optimal solvents and solvent mixtures using a sub-problem approach has been illustrated through case studies by Karunanithi et al.5 This methodology makes use of a decomposition based solution strategy, where the number of feasible molecules is systematically reduced in subsequent levels by partitioning the constraints. Later, for a period, the focus was on the methodology of integrated product and process design. Smith and Ierapepritou6 have presented a review on the need for integrative product design in view of the current challenges in the chemical process industry. Bommareddy et al.7 have presented an algorithm based on an algebraic approach for the simultaneous solution of product and process design problems. In this approach, the primary problem identifies property targets corresponding to the desired process performance and the secondary problem discovers the molecular structures that match the property targets identified. Recently, a systematic methodology with an integrated three stage approach has been proposed for product design and verification of liquid formulations, such that stage-1 generates a list of feasible product candidates, stage-2 deals with planning and execution of experiments and stage-3 involves product validation.8

There is a wealth of literature available on physical and chemical property prediction of compounds based on their structure. Target compounds reported in the literature include solvents,9 refrigerants,10 polymers11 and ionic liquids.12 Each class of targeted compounds is associated with a specific target property and its target value. The target property and its range vary with the size and complexity of the chemical product. For example, the target properties for solvent design, which involves relatively small molecules, relate to macroscopic scales, whereas those for drug design, which involves large molecules, relate to microscopic and mesoscopic scales. Gani13 has illustrated a methodology to solve chemical product design problems using computer aided methods and tools, and proposed an approach to predict a wide range of physical and thermodynamic properties with the help of a molecular description of the targeted compounds. Further, the need for thermodynamic modeling in chemical product design for complex chemical products such as detergents, paints and polymers has been outlined.14 Recently, O’Connell et al.15 have provided a comprehensive review and perspective on thermodynamic property modeling which can be applied to product engineering.

Solvents have been widely used for centuries in various industries such as chemicals, petrochemicals, pharmaceuticals, leather, cosmetics, food and beverages, etc., to carry out reaction, formulation, separation and cleaning of equipment. In process industries, solvents are used in various steps such as separation (of gas, liquid and/or solid), reaction (as reaction medium, reactant, and carrier) and washing. Apart from these, solvents are used in paints, textiles, rubber, adhesives, cleaning reagents (dry cleaning, washing), etc., for product formulations.

The pharmaceutical industry is one of the largest users of organic solvents per unit of the final product. Solvents constitute 56% of the mass in the manufacture of an active pharmaceutical ingredient (API).16 The presence of a solvent can alter a wide variety of important factors in a reaction, such as controlling the reaction temperature, altering the chemical kinetics and even affecting the chemical equilibrium. Pharmaceutical molecules, because of their high polarizability, conformational flexibility and multiple functional groups, are very different from the common petrochemicals and therefore require special attention. It is essential to produce pharmaceutical products of high purity and consistent quality at high yield. Solvent selection plays a vital role in meeting these demands but is generally the factor least considered by pharmaceutical chemists. An article by Nicponski and Ramachandran17 on the role of solvent selection at different stages in the pharmaceutical industry discusses the factors to be considered in solvent selection, such as cost, ease of recoverability, recyclability, inherent reactivity, environmental effects, etc., and the effects of solvent disposal in the current era, thus affirming the need for greener solvent design.

Recently, Henderson et al.16 have highlighted the issues that one encounters in solvent selection and reported a database of solvents used in pharmaceuticals along with their properties, such as toxicity, vapor pressure, boiling point, melting point, flammability, environmental impact, waste and recycling ability. A review on organic solvents used in drug research by Grodowska and Parczewski18 reports the division of solvents into various classes based on their toxicity and environmental hazard. The article also discusses the problems encountered while using organic solvents in each step (reaction, formulation, separation) of the synthesis of the API. The different methods employed for the removal of undesirable and toxic residual solvents, so as to maintain their concentrations within acceptable limits, are discussed in detail. They have also discussed ways to avoid organic solvents by choosing new alternatives such as supercritical fluids and ionic liquids. In the last few years, research on solvent design for pharmaceuticals has been carried out by various researchers.19-23 Some have investigated the therapeutic effect of the drug.22,23 Others have explored the physical property aspects and the control of the manufacturing process of the drug (i.e. improving the yield).18-21

There is a need to identify greener solvents with similar or enhanced performance when compared to the existing solvents in the chemical and pharmaceutical industries. In this work, a review is presented of the literature on the methodologies available for the prediction of product (solvent) properties from the molecular structure, along with the various optimization approaches used to solve the solvent design problem. In the literature on molecular design, the property of interest of the molecule is estimated with good accuracy by three methods, namely group contribution (GC), quantitative structure property relationship (QSPR) and molecular simulations. The advantages and disadvantages of these methods are discussed. Among these methods, the GC based Marrero-Gani (MG) method is widely used by various researchers for property estimation. Both deterministic and stochastic optimization approaches have been widely employed, with slight modifications, to attain the global solution. The structure and property constraints employed to formulate the optimization problem are discussed. Further, the properties of interest that are considered in the literature for solvent selection are discussed in detail. Finally, the literature available on the application of the solvent design methods to pharmaceuticals is reviewed. The last section summarizes the literature reviewed in this paper. A roadmap representing the outline of this article is shown in Figure 1.


[Figure 1. Flowchart / Roadmap of CAMD framework. Recoverable panel labels: reverse design (need, property, structure) for refrigerants, polymers, solvents; selection of basic groups (UNIFAC groups, Marrero-Gani groups, base groups for CI); structural feasibility constraints (UNIFAC groups based, octet rule based, adjacency matrix based); property constraints; property estimation methods (GC based methods: Joback GC, Constantinou-Gani, Marrero-Gani, position GC, GIC; QSPR: CI, GC-CI, MG-CI, signature descriptors; molecular simulation; topological indices); optimization (deterministic methods: outer approximation, branch and bound, interval analysis; stochastic methods: simulated annealing, genetic algorithm, tabu search); list of feasible molecules; case studies on solvent extraction and crystallization using CAMD.]


Property Estimation Methods

Based on its definition, chemical product design can be described as “reverse property prediction”, since the molecular structure is predicted for the desired property. The problem type can be molecular and mixture design,24 process design synthesis and evaluation,25 process and product design,26 process solvent design,27 etc. Once the problem is formulated, the inputs and the constraints (structure and property) are set based on the process objectives. New molecules are designed and then checked for whether they can be synthesized, for the availability of raw materials and for their environmental impact, and the desired property of interest is validated. In chemical product design, the molecular design problem is transformed into a CAMD problem, which incorporates optimization techniques along with molecular structure-property relationships. A book on molecular systems engineering edited by Adjiman and Galindo28 presents, in brief, the structure-property correlations and optimization based approaches to CAMD along with solvent design for reactions.
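The reverse-design idea described above (generate candidate structures, then keep those that pass structural and property checks) can be illustrated with a minimal enumerate-and-screen sketch. Everything specific in it is an assumption made for illustration: the three building blocks, their made-up property contributions, the linear additive property model, and the use of the common acyclic valence rule (sum of 2 minus valence equals 2) as the structural feasibility check. It is not the procedure of any of the cited works.

```python
from itertools import combinations_with_replacement

# Hypothetical building blocks: name -> (free valence, made-up contribution).
GROUPS = {"CH3": (1, 1.0), "CH2": (2, 0.5), "OH": (1, 3.0)}


def is_acyclic_feasible(groups):
    """Valence check for an acyclic molecule: sum(2 - valence) must equal 2."""
    return sum(2 - GROUPS[g][0] for g in groups) == 2


def estimate(groups):
    """Placeholder linear additive property model (illustrative numbers only)."""
    return sum(GROUPS[g][1] for g in groups)


def screen(max_groups, lower, upper):
    """Enumerate group multisets; keep those that are feasible and within bounds."""
    hits = []
    for size in range(2, max_groups + 1):
        for combo in combinations_with_replacement(sorted(GROUPS), size):
            if is_acyclic_feasible(combo) and lower <= estimate(combo) <= upper:
                hits.append(combo)
    return hits


# Candidates whose estimated property lies in the (hypothetical) target window.
candidates = screen(max_groups=4, lower=4.0, upper=5.0)
```

Exhaustive enumeration is only workable for very small group sets; the optimization approaches reviewed later in the article exist precisely because the search space grows combinatorially.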

In the literature, various structure property prediction methods are reported for the design of new molecules. They can be broadly categorized as GC based methods, QSPR based methods and their combinations, and molecular simulations. Different GC based methods, namely the general GC method, the Constantinou-Gani method, the group interaction contribution (GIC) method, the MG method and the position group contribution method, are discussed first. This is followed by a discussion of QSPR methods: the connectivity indices (CI) method and the molecular signature descriptors method, with a focus on the various topological indices that are reported in the literature and are of interest in the context of this article. Later, the methods based on the combined approach, namely the GC-CI and MG-CI methods, are also described. The flowchart in Figure 1 lists the various structure property prediction methods that are discussed in this article.

Group contribution based methods

The GC method is a well established technique in the literature, developed by Lydersen in 1955 to estimate pure component critical properties from molecular structure. Since an infinite number of chemical compounds exist but only a limited number of functional groups, it is convenient to estimate functional group parameters from existing data and then predict the properties of new compounds. Deal and Derr29 were the first to review the use of molecular structure for making quantitative estimates of activities in mixtures of simple organic compounds. The estimates were found by interpolating and extrapolating from one system to another using the idea of characteristic structural group contributions. Research on GC based formulations helped overcome the dearth of experimental data and provided fast and intuitive tools for the prediction of pure component properties.30 The simplest form of GC determines the physical property by summing, over the structural groups in the molecule, the product of each group's contribution and the number of times that group appears (i.e. assuming linear additive dependence), as shown in the appendix of this article. This simple approach was first used in the Joback method for the prediction of thermo-physical and transport properties such as critical state data, heat capacity, viscosity, etc., and it worked well for a ‘limited range’ of components.31 The first order GCs have been developed with 40 groups for organic compounds containing halogens, oxygen, nitrogen and sulfur. The UNIQUAC functional-group activity coefficients (UNIFAC) method developed by Fredenslund et al.32 uses the GC method to estimate activity coefficients which, along with other properties such as vapor pressures in the modified Raoult’s law, are used to predict vapor-liquid equilibrium in mixtures. However, this method predicts the same result for all isomers and does not differentiate among these molecules. Moreover, reliable GC models are available only for a limited number of thermodynamic properties and it is not possible to represent all atomic arrangements. Thus, it is difficult to accurately predict the properties of complex molecules (e.g. heterocyclics). To address a few of these disadvantages of GC, a general methodology for CAMD using the GC approach, which can handle molecules of various degrees of complexity and size, has been proposed by Constantinou et al.33 This approach categorizes groups into first and second order, where the second order groups have the first order groups as building blocks, and is represented as follows:

$$ f(p) = \sum_i N_i C_i + W \sum_j M_j D_j \tag{1} $$

In equation 1, $C_i$ and $D_j$ are the first order and second order GCs, respectively, and $N_i$ and $M_j$ are the numbers of first order and second order occurrences in the compound. The constant $W$ is set equal to unity if the second order term is to be used, and to zero otherwise. This two level approach consists of 63 first order groups and 40 second order groups. The five case studies33 in which this approach has been successfully applied include solvent design for liquid-liquid extraction and azeotropic distillation, design of polymers, solute design for dehydration by supercritical extraction, and design of low cost solvent blends for coatings. Although the property estimates for a class of molecules improved after the addition of second order groups, this method also suffers from the same limitations as observed in GC. An example of a melting point estimate using this method is shown in the Appendix along with the average absolute error (AAE) values. The AAE is defined as:

$$ \mathrm{AAE} = \frac{1}{N} \sum_i \left| \theta_i^{\mathrm{est}} - \theta_i^{\mathrm{expt}} \right| \tag{2} $$

where $\theta_i^{\mathrm{est}}$ is the value of the property $\theta$ estimated by regression for compound $i$, $\theta_i^{\mathrm{expt}}$ is the experimental value of the property $\theta$ for compound $i$, and $N$ is the number of data points.
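As a rough illustration of equations 1 and 2, the sketch below evaluates the two level right-hand side for lists of (occurrences, contribution) pairs and computes the AAE for a set of estimates. The function names and all numerical values are hypothetical; they are not fitted contributions from the Constantinou-Gani tables. Note that equation 1 yields $f(p)$, so a real calculation would still have to invert $f$ to obtain the property itself.

```python
def gc_estimate(first_order, second_order=None, use_second_order=True):
    """Right-hand side of equation 1: sum(N_i * C_i) + W * sum(M_j * D_j).

    first_order and second_order are lists of (occurrences, contribution) pairs.
    W is 1 when the second order term is requested and available, else 0.
    """
    w = 1.0 if (use_second_order and second_order) else 0.0
    level1 = sum(n * c for n, c in first_order)
    level2 = sum(m * d for m, d in (second_order or []))
    return level1 + w * level2


def average_absolute_error(estimated, experimental):
    """Equation 2: mean of |theta_est - theta_expt| over N data points."""
    if len(estimated) != len(experimental) or not estimated:
        raise ValueError("need two equally sized, non-empty sequences")
    return sum(abs(e - x) for e, x in zip(estimated, experimental)) / len(estimated)


# Made-up numbers purely for illustration (not fitted group contributions).
first = [(2, 1.5), (1, 4.0)]   # (N_i, C_i)
second = [(1, -0.5)]           # (M_j, D_j)

f_first_only = gc_estimate(first, second, use_second_order=False)
f_two_level = gc_estimate(first, second)
aae = average_absolute_error([300.0, 310.0], [302.0, 307.0])
```

With these inputs the first order sum is 7.0, the second order correction lowers it to 6.5, and the AAE of the two hypothetical estimates is 2.5.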

Numerous attempts have been made to overcome the limitations of GC methods. Eladio and Raman developed a model applicable to mixtures, where the property is determined by GIC, which considers the contributions of the different interactions between bonding groups present in a given molecule instead of the contributions of structural groups.24,34 The models based on GIC were observed to give better estimates than GC and can be used to distinguish between isomers in mixtures, but they require a large number of model parameters. Their results also show that the estimates of activity coefficients obtained by GIC are very close to those obtained by the UNIFAC method, and that GIC can be used for isomers. The accuracy of the prediction depends on how accurately the model parameters are determined.

Subsequently, a modified GC based estimation method for pure organic compounds, named the MG method, has been developed to increase the accuracy and applicability of the prediction by accounting for more complex heterocyclic and large polyfunctional alicyclic compounds.35 A data set of more than 2000 compounds ranging from 3 to 60 carbon atoms, including large and complex polycyclic compounds, has been used to develop the correlations. The property estimation model has the form of the following equation:

$$ f(X) = \sum_i N_i C_i + w \sum_j M_j D_j + z \sum_k O_k E_k \tag{3} $$

where $C_i$, $D_j$ and $E_k$ are the contributions of the first, second and third order groups of type $i$, $j$ and $k$ that occur $N_i$, $M_j$ and $O_k$ times, respectively. In the first level of estimation, the constants $w$ and $z$ are both set to zero because only first order groups are employed. In the second level, $w$ is set to unity and $z$ to zero because only first and second order groups are involved, while in the third level both $w$ and $z$ are set to unity. The left-hand side of equation 3 is a simple function $f(X)$ of the target property $X$ (such as an exponential function of the ratio of the property value to an adjustable parameter, or just the difference between the property value and the adjustable parameter). Three levels of molecular groups have been identified, termed first order, second order and third order groups. The first level corresponds to simple and mono-functional compounds; the second level to polyfunctional compounds and to aromatic and aliphatic compounds with one ring; and the third level to large, complex polycyclic compounds. The proposed method was found to estimate properties with increased accuracy, low AAE and a wide range of applicability for chemical, biochemical and environmental compounds. Their results also indicate that for smaller, complex molecules it is better to have a smaller set of second order groups, and that for large polyfunctional compounds, larger third order groups give better results. Marrero and Gani35 have reported the contributions for 182 first order groups, 122 second order groups and 66 third order groups for the following properties: normal boiling point, critical temperature, critical pressure, critical volume, standard enthalpy of formation, standard enthalpy of vaporization, standard Gibbs energy, normal melting point and standard enthalpy of fusion. The determination of the representative example property, melting point, using this method is reported in the Appendix.
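The three estimation levels of equation 3 amount to switching $w$ and $z$ on in turn. The sketch below shows this, together with one possible inversion of $f(X)$ for the exponential form mentioned in the text, $f(X) = \exp(X / X_0)$. The function names, the adjustable parameter `x0` and every number are assumptions for illustration only; they are not values from the Marrero-Gani tables.

```python
import math


def mg_rhs(first, second=(), third=(), level=3):
    """Right-hand side of equation 3 for estimation level 1, 2 or 3.

    Each argument is a sequence of (occurrences, contribution) pairs.
    Level 1: w = z = 0; level 2: w = 1, z = 0; level 3: w = z = 1.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    w = 1 if level >= 2 else 0
    z = 1 if level == 3 else 0
    total = sum(n * c for n, c in first)
    total += w * sum(m * d for m, d in second)
    total += z * sum(o * e for o, e in third)
    return total


def invert_exponential(rhs, x0):
    """Recover X assuming f(X) = exp(X / x0); requires a positive rhs."""
    if rhs <= 0:
        raise ValueError("rhs must be positive for the exponential form")
    return x0 * math.log(rhs)


# Hypothetical occurrences and contributions (illustrative only).
first = [(3, 2.0), (1, 5.0)]   # (N_i, C_i)
second = [(2, 0.25)]           # (M_j, D_j)
third = [(1, -1.0)]            # (O_k, E_k)

levels = [mg_rhs(first, second, third, level=k) for k in (1, 2, 3)]
x_est = invert_exponential(levels[2], x0=100.0)
```

Keeping the level as an explicit argument makes it easy to compare how much each higher order correction changes the estimate for a given molecule.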

17

Recently, a new GC method named the ‘position group contribution method’ has been

18

proposed for estimation of critical properties, boiling point and melting point for organic

19

compounds.36 This method distinguishes isomers including cis and trans structures and takes into

20

account the ortho, meta, para corrections in benzene ring and pyridines which were not taken

21

into account by MG and the other methods discussed above. A total of 730 compounds

22

containing carbon, hydrogen, oxygen, nitrogen, chlorine, bromine and sulphur were used for the

23

determination of group contributions for melting point. An example calculation has been shown

24

in Appendix. This method was found to perform better and showed less deviation in prediction

25

when compared with Joback and Constantinou-Gani method. Table 1 presents the AAE for

26

various properties when the above discussed structure property prediction methods were used.

27

The expression for the GC is as follows:

28

N   + a exp(1 N ) f ( x ) = ∑ Ai N i + ∑ A j tanh  j  + ∑ Ak Pk + a1 exp 1  2 N M w  i j   k

(4)

29

where $A_i$ or $A_j$ represents the $i$ or $j$ group contribution, determined through minimization of the residual error by regression. $N_i$ represents the number of groups in which a carbon atom forms the center of the group, $N_j$ represents the number of groups in which a non-carbon atom forms the center, $N$ is the total number of groups ($N = \sum_i N_i + \sum_j N_j$), $P_k$ characterizes the position factor, which accounts for ortho, meta and para positions, $M_w$ denotes the molecular weight, $a_1$ and $a_2$ are the coefficients of correlation, and $f(X)$ is a simple function of the target property $X$.
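As a rough illustration of how equation 4 could be evaluated once the contributions are known, the sketch below reads the expression as f(X) = Σ A_i N_i + Σ A_j tanh(N_j/N) + Σ A_k P_k + a1 exp(1/Mw) + a2 exp(1/N), which is the form implied by the symbol definitions above. The function name, the argument layout and every numerical value are hypothetical placeholders and are not parameters taken from ref 36.

```python
import math


def position_gc_function(carbon_groups, noncarbon_groups, position_terms,
                         mol_weight, a1, a2):
    """Evaluate f(X) of equation 4 (as read above) for one molecule.

    carbon_groups:    list of (A_i, N_i) pairs, carbon-centred groups
    noncarbon_groups: list of (A_j, N_j) pairs, non-carbon-centred groups
    position_terms:   list of (A_k, P_k) pairs, position corrections
    """
    # N is the total number of groups of both kinds.
    n_total = (sum(n for _, n in carbon_groups)
               + sum(n for _, n in noncarbon_groups))
    if n_total == 0:
        raise ValueError("molecule must contain at least one group")
    f = sum(a * n for a, n in carbon_groups)
    f += sum(a * math.tanh(n / n_total) for a, n in noncarbon_groups)
    f += sum(a * p for a, p in position_terms)
    f += a1 * math.exp(1.0 / mol_weight) + a2 * math.exp(1.0 / n_total)
    return f


# Hypothetical contributions, for illustration only.
f_value = position_gc_function(
    carbon_groups=[(0.02, 6), (0.01, 2)],
    noncarbon_groups=[(0.05, 1)],
    position_terms=[(0.003, 1)],
    mol_weight=150.0, a1=0.1, a2=0.2,
)
```

The target property X is then recovered by inverting whatever simple function f was chosen for that property.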

QSPR methods

The QSPR approach to property prediction represents the effect on properties of interactions among different molecular groups, based on their connectivities. It is a technique for quantitatively correlating properties with molecular structure and is widely used for the prediction of physico-chemical properties.37 The approach rests on the assumption that variations in the properties of compounds can be correlated with changes in their molecular features, characterized by so-called “molecular descriptors”. The molecular structure is characterized by a number of topological, geometrical and quantum chemical descriptors, which are used to estimate the property of interest by multi-linear regression. Among the various descriptors, topological indices (TIs) are widely used because they offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, centricity and complexity. The molecular CIs are the most commonly used topological descriptors and provide a quantitative characterization of skeletal variation in a molecule. These descriptors are based on substructure features in the molecular graph, such as bonds, clusters and rings.

Graph theory is a branch of mathematics that deals with connected objects. The objects in a graph are called vertices, and the lines that connect them are called edges. The analogy with a chemical system is as follows: the vertices represent atoms, molecules or molecular groups, and the edges represent the bonds and interactions between them. For simplicity, molecular graphs are generally represented as hydrogen-suppressed graphs.38 Representing a molecule as a graph is the first step in the development of any topological index. The central problem in QSPR is to convert chemical structures into molecular descriptors that are relevant to a given physico-chemical property. Many physico-chemical properties can be satisfactorily correlated with topostructural or topochemical features. In principle, these descriptors are mathematical objects without a precise physical meaning. Topological descriptors are calculated using information based on the connectivity of atoms/groups within a molecule. Consequently, these descriptors contain information about the constitution, size, shape and branching, whereas bond lengths, bond angles and torsion angles are neglected.

In 1975, Milan Randic proposed an algorithm to characterize bond contributions to a molecular branching index. A molecular CI is calculated by counting all bonded atoms other than hydrogen in the molecular structure and assigning a “δ” value (cardinal number) to each atom. The “δ” value of an atom equals the number of adjacent non-hydrogen atoms. The “δ” values of the two atoms forming a bond pair define a bond value, and the bond values are then summed over all the bonds (single, double, triple) in the chemical structure to calculate ${}^{p}\chi$, the pth order Randic index,39 as shown in equation 5.

$$ {}^{p}\chi = \sum_{\text{edges } ij} (\delta_i \delta_j)^{-1/2} \qquad (5) $$

where $i$ and $j$ are adjacent atoms forming a bond pair in the structure, and $\delta_i$ and $\delta_j$ are the atom connectivities in the molecular graph.
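The bond-wise sum in equation 5 can be sketched in a few lines. The example below is a minimal illustration that assumes the molecule is supplied as a list of bonds between labelled non-hydrogen atoms (a hydrogen-suppressed graph); the function name and the integer atom labels are choices made here for illustration. Summing over single edges in this way gives the first order index.

```python
from math import sqrt


def randic_index(bonds):
    """Sum (delta_i * delta_j)^(-1/2) over the bonds of a hydrogen-suppressed graph.

    bonds: iterable of (i, j) pairs of non-hydrogen atom labels.
    """
    bonds = list(bonds)
    # delta of an atom = number of adjacent non-hydrogen atoms.
    delta = {}
    for i, j in bonds:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    return sum(1.0 / sqrt(delta[i] * delta[j]) for i, j in bonds)


# Carbon skeletons of two C4H10 isomers.
n_butane = [(1, 2), (2, 3), (3, 4)]
isobutane = [(1, 2), (2, 3), (2, 4)]

chi_n_butane = randic_index(n_butane)    # sqrt(2) + 1/2
chi_isobutane = randic_index(isobutane)  # sqrt(3)
```

The branched isomer gives the smaller value, which is the sense in which the index measures branching.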

The Randic index was later modified by Kier and Hall40 by decomposing the graph into subgraphs, each of which may consist of a single atom, a single edge or a set of connected edges (path) in which no vertex is included twice. The index is then summed over the subgraphs. Subsequently, the valence CI is obtained by summing over all type ‘t’ subgraphs with ‘m’ edges using the valence δ ($\delta^v$) values, which take into account the presence of multiple bonds and heteroatoms. The expression for the valence connectivity index is as follows:

$$ {}^{m}\chi^{v}_{t} = \sum_{i=1}^{N} \prod_{k=1}^{m+1} (\delta_k^{v})^{-1/2} \qquad (6) $$

where

$$ \delta_k^{v} = \frac{Z_k^{v} - H_k}{Z_k - Z_k^{v} - 1} \qquad (7) $$


In equations 6 and 7, $Z_k$ is the total number of electrons in the kth atom, $Z_k^v$ is the number of valence electrons in the kth atom, and $H_k$ is the number of hydrogen atoms directly attached to the kth non-hydrogen atom; m = 0 for atomic valence connectivity indices, m = 1 for one-bond path valence connectivity indices, m = 2 for two-bond fragment valence connectivity indices, and m = 3 for three contiguous bond fragment valence connectivity indices.
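To make equation 7 concrete, the sketch below computes valence δ values from the electron and hydrogen counts and combines them into a one-bond path (m = 1) valence connectivity index, taking the contribution of each bond as the product of the $(\delta^v)^{-1/2}$ values of its two atoms. The data layout (a dictionary of atoms and a list of bonds) and the function names are assumptions made for this illustration; ethanol is used as a simple test molecule.

```python
from math import sqrt


def valence_delta(z_total, z_valence, n_hydrogens):
    """Equation 7: valence delta of a non-hydrogen atom."""
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


def first_order_valence_chi(atoms, bonds):
    """One-bond path (m = 1) valence connectivity index.

    atoms: {label: (Z, Zv, H)} for every non-hydrogen atom
    bonds: list of (i, j) label pairs
    """
    dv = {label: valence_delta(*counts) for label, counts in atoms.items()}
    return sum(1.0 / sqrt(dv[i] * dv[j]) for i, j in bonds)


# Ethanol, CH3-CH2-OH: (total electrons, valence electrons, attached hydrogens)
ethanol_atoms = {"C1": (6, 4, 3), "C2": (6, 4, 2), "O": (8, 6, 1)}
ethanol_bonds = [("C1", "C2"), ("C2", "O")]

chi_v_ethanol = first_order_valence_chi(ethanol_atoms, ethanol_bonds)
```

Here the methyl carbon has a valence δ of 1, the methylene carbon 2 and the hydroxyl oxygen 5, so the heteroatom is distinguished from a carbon with the same number of neighbours.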

These indices revealed that the hydrogen atom count is actually included in the calculation of indices developed from the graph. This realization also gave additional support to the use of the term hydrogen-implied rather than hydrogen-suppressed graph. The general expression for property prediction using these connectivity indices is as follows:

$$ P = \frac{1}{n}\left( {}^{0}C\,{}^{0}\chi + {}^{0}C^{v}\,{}^{0}\chi^{v} + {}^{1}C\,{}^{1}\chi + {}^{1}C^{v}\,{}^{1}\chi^{v} + \text{constant} \right) \qquad (8) $$

where $P$ is the property of interest, $C$ represents the regression coefficient for each term, ${}^{0}\chi$ and ${}^{1}\chi$ are the zeroth and first order molecular connectivity indices, and $n$ represents the total number of groups present in the molecule. The structure–property correlations developed using CIs are more rigorous than those based on the GC method. The contribution made by each individual group in a molecule to the estimated property in the GC method is comparable to the two zeroth order molecular connectivity indices (${}^{0}\chi$, ${}^{0}\chi^{v}$), which depend on the identities of the various basic groups present in the molecule. However, the advantage of the CI method over the GC method becomes apparent when the first order CIs (${}^{1}\chi$, ${}^{1}\chi^{v}$), which depend on the bonding characteristics of the various basic groups in the molecule, are employed. In other words, the first order CIs can distinguish isomers better than the zeroth order CIs. Furthermore, when CAMD is performed using this method, a complete molecular structure is obtained, rather than the list of groups obtained when GC is used, which needs to be further arranged in the optimization step. Although this method offers advantages over the GC method, its main limitation is that it requires knowledge of the correlation coefficients, which are regressed from a database of compounds. In addition, several other indices apart from CIs have been reported in the literature; these are based on factors such as the mass, electronegativity, volume and dipole moment of each group, are used in accurate property estimation, and are discussed below. However, it should be noted that many such relationships can be very specific to certain classes of molecules, which typically limits their application (Table 2). Additionally, the selection of a set of indices for estimating a property remains a subject of research. Representative calculations for estimating the melting point using this CI method are given in the Appendix.
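The remark that first order CIs separate isomers better than zeroth order CIs can be checked on a small case. The sketch below uses the simple (non-valence) indices, with the zeroth order index taken as the sum of $\delta^{-1/2}$ over atoms and the first order index as the sum of $(\delta_i\delta_j)^{-1/2}$ over bonds; the function names and the choice of 2-methylpentane and 3-methylpentane as test molecules are made here for illustration.

```python
from math import sqrt


def atom_deltas(bonds):
    """Number of non-hydrogen neighbours of each atom in the skeleton."""
    delta = {}
    for i, j in bonds:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    return delta


def chi0(bonds):
    """Zeroth order index: sum over atoms of delta^(-1/2)."""
    return sum(1.0 / sqrt(d) for d in atom_deltas(bonds).values())


def chi1(bonds):
    """First order index: sum over bonds of (delta_i * delta_j)^(-1/2)."""
    delta = atom_deltas(bonds)
    return sum(1.0 / sqrt(delta[i] * delta[j]) for i, j in bonds)


# Carbon skeletons: atoms 1-5 form the main chain, atom 6 is the methyl branch.
two_methylpentane = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
three_methylpentane = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]

same_chi0 = abs(chi0(two_methylpentane) - chi0(three_methylpentane)) < 1e-12
chi1_gap = chi1(three_methylpentane) - chi1(two_methylpentane)
```

Both skeletons contain three atoms with δ = 1, two with δ = 2 and one with δ = 3, so the zeroth order index cannot tell them apart, whereas the first order index depends on which atoms are bonded to each other and differs between the two.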

Hu et al.41 have reported a detailed review of twenty-six TIs, nine different matrices expressing molecular structure and eight methods to deal with different molecular graphs. They have also proposed a new ‘variable index’ for molecules containing heteroatoms. In attempts to predict the physical and thermodynamic properties of polymers, a more universally applicable QSPR correlation was developed by Bicerano42 using experimental data for more than 400 polymers and involving various descriptors. In general, if two or more chemical graphs have the same TI, then the TI is said to be degenerate. The Wiener index was the first TI based on the topological distance matrix. Balaban43 based his ideas on first, second and third generation TIs, which transmitted information on properties but not on structures. Among the first generation TIs (e.g. Wiener, Hosoya, Centric index), the Wiener index has high degeneracy. Balaban observed that, for the inverse problem, such indices would lead to a combinatorial explosion of solutions. In comparison, the second generation (e.g. Randic index, Kier-Hall index, Balaban’s J index) and third generation indices (e.g. triplet indices, BCUT indices) have low degeneracy and are uniquely associated with chemical structures, but it is difficult to solve the inverse problem in a reasonable amount of time.

Recently, Katritzky and co-workers44 reviewed QSPR studies correlating physical and chemical properties with chemical structures. The review focuses mainly on structural descriptors derived from chemical structures for the correlation and prediction of various physical and chemical properties. QSPR approaches with various modeling procedures, both linear and nonlinear, such as multi-linear regression, principal component regression, artificial neural networks, genetic algorithms and support vector machines, were discussed in detail. In addition, estimation methods for various important physical and chemical properties, such as boiling point, heat of fusion, viscosities, refractive index, densities, solubilities, rate constants, stability and dissolution constants and glass transition temperatures, were reported along with the correlations obtained using different approaches. Numerous TIs (around 400) can be found in the literature41 to correlate molecular structure and properties. However, only a few of them have found wide application. The “connectivity index” is the most widely used TI. Table 2 lists various indices with low degeneracy that have recently been introduced in the literature for organic compounds, for a range of properties of interest, and that can be utilized to design new molecules.
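The degeneracy of the Wiener index mentioned above can be seen on a small example. The sketch below computes the index as the sum of topological (shortest-path) distances over all pairs of atoms in the hydrogen-suppressed graph, using a breadth-first search; the function name and the choice of two C7H16 isomers are assumptions made for this illustration.

```python
from collections import deque


def wiener_index(bonds):
    """Sum of shortest-path distances over all atom pairs of a connected graph."""
    adjacency = {}
    for i, j in bonds:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
    return total // 2  # every pair was counted from both ends


# Carbon skeletons of two heptane isomers.
dimethylpentane_2_4 = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (4, 7)]
ethylpentane_3 = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6), (6, 7)]

w_a = wiener_index(dimethylpentane_2_4)
w_b = wiener_index(ethylpentane_3)
```

The two isomers have different skeletons but the same Wiener index, so a target value of this index alone cannot identify a unique structure in the inverse problem.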

Utility of various topological indices:

It has been established that different physical properties depend in distinct ways on the inherent structural features of a molecule. Many factors affect the prediction of a physical property of a molecule; the most apparent among them are molecular size, shape, polarity, electronegativity and hydrogen bonding. Tetko et al. have developed multi-platform software,45 called the virtual computational chemistry laboratory, which includes an index generation program that computes more than 1600 molecular descriptors that can be used to evaluate structure-property relationships. For properties such as solubility and the octanol/water partition coefficient, multiple prediction tools are available; however, the uncertainty and variability between predictions can be substantial. The software package CODESSA PRO (COmprehensive DEscriptors for Structural and Statistical Analysis), developed by Katritzky and his co-workers,44 enables the calculation of numerous (~1000) constitutional, topological, geometrical, thermodynamic, semi-empirical, quantum chemical and electrostatic descriptors solely on the basis of molecular structural information, and is widely used in the prediction of properties of various classes of molecules.

With ionic liquids (ILs) emerging as new solvents, two indices that accurately predict the densities of ILs have appeared in the literature. The volumetric connectivity index (σ) was used to predict the density of 142 ILs, including imidazolium, pyridinium, pyrrolidinium, piperidinium, quaternary ammonium and quaternary phosphonium ILs, at room temperature. This index, when combined with the mass connectivity index (λ), can be used to predict the densities of ILs accurately at different temperatures. Detailed calculations for estimating these indices are reported by Xiong et al.46 The bond-based indices were found to be statistically significant when compared with other molecular descriptors for the estimation of physical, chemical and biological properties such as boiling point, partition coefficient and antibacterial activity. Several promising results have been achieved in computational drug discovery with the use of these indices.47

The electronegativity descriptor has played a key role in the estimation of various properties such as melting point, partial charges, boiling point and molar refraction, with low degeneracy. The Lu index48 (shown in Table 2), which is a modification of the well-established Wiener index,49 makes use of both the relative electronegativity and the relative bond length of vertices to correlate the normal boiling points and molar refractions of aldehydes and ketones. Apart from hydrocarbons, it is also applicable to organic compounds containing heteroatoms and multiple bonds. Index F50 (shown in Table 2) utilizes the electronegativity between the groups to obtain a good QSPR model for heteroatom-containing organic and inorganic molecules. The reciprocal distance matrix of Wiener has been modified into multiple matrices that represent the topological properties of vertices, bonds, edges and the interaction of vertices in a molecular graph. QSPR models were obtained for properties such as the aqueous solubility and octanol/water partition coefficient of benzene halides with the use of the electronegativity descriptor.51 Thus, it is evident from the above discussion that numerous indices exist for predicting different properties of various classes of molecules. Software packages such as CODESSA-PRO44 and DRAGON45 can assist in selecting, based on the class/type of the molecule and the property of interest, the descriptors that lead to a better correlation with experimental data.

To overcome the common bottleneck of predicting properties when group contributions are not available, Gani et al.9 proposed an approach in which the contribution of the missing group is predicted using zeroth and first order CIs. This method is named the “combined group contribution-connectivity indices (GC-CI) method” and is found to predict the property much closer to the experimental value when compared with the MG method without any missing group contributions. However, in some cases the addition of the missing GC introduces large errors in property prediction, and the combined GC-CI method does not improve the accuracy of the original GC method (as shown in the Appendix). The AAE values of various properties are reported in Table 1. The uncertainties in using this method for property estimation have been quantified by maximum likelihood estimation of the GC and CI methods by Hukkerikar and his coworkers.52

Recently, a new method based on QSPRs that uses molecular signature descriptors to identify potentially new molecules has been proposed by Weis and Visco.53 The molecular signature was found to be a powerful tool for encoding the local neighborhood of an atom in a molecule, where a user-specified parameter called the signature height, ‘h’, determines the size of the local neighborhood. A subgraph centered at a specific root atom, including all atoms/bonds extending out to the predefined height h without tracking the path backwards, is defined as an atomic signature of height ‘h’. It is denoted ${}^{h}\sigma_G(x)$ for the root atom $x$ of the 2D graph $G = (V, E)$, where $V$ and $E$ refer to the vertex (atom) set and edge (bond) set, respectively. The developed algorithm can unite a variety of property estimation models based on GC and TI-based QSPRs to track different property targets in molecular design. Additionally, the method can be employed for TIs of different signature heights. The applicability of this method has been demonstrated with a case study on the design of an alkyl substituent for a fungicide by Chemmangattuvalappil et al.,54 where affinity, mobility and retention properties were correlated using first order CIs and toxicity using GC. The accuracy of the predictions obtained with this method depends on the correctness of the parameters of the GC and QSPR methods. It also depends on the range of the training set data, since the diversity of the inverse structures obtained would be severely restricted if the test set does not fall within the bounds of the training set. Some of the structure-property prediction methods are illustrated in the Appendix. The property chosen to illustrate the methods is the melting point and the example compound is 2,5-dimethyl benzoic acid.
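The idea of an atomic signature of height h described above can be sketched as a recursive walk outwards from the root atom that never steps straight back to the atom it just came from. The code below is a deliberately simplified illustration and not the algorithm of ref 53: it ignores bond orders and ring-closure bookkeeping, and the string encoding, function name and data layout are assumptions made here.

```python
def atomic_signature(graph, labels, root, height, parent=None):
    """Canonical string for the neighbourhood of `root` out to `height` bonds.

    graph:  {atom: list of bonded atoms} for the hydrogen-suppressed graph
    labels: {atom: element symbol}
    parent: atom the walk arrived from (not revisited on the next step)
    """
    if height == 0:
        return labels[root]
    # Sorting the child strings makes the result independent of atom ordering.
    children = sorted(
        atomic_signature(graph, labels, neighbour, height - 1, root)
        for neighbour in graph[root]
        if neighbour != parent
    )
    return labels[root] + "".join("(" + child + ")" for child in children)


# Ethanol skeleton: C1 - C2 - O
graph = {"C1": ["C2"], "C2": ["C1", "O"], "O": ["C2"]}
labels = {"C1": "C", "C2": "C", "O": "O"}

sig_c2_h1 = atomic_signature(graph, labels, "C2", 1)
sig_c1_h2 = atomic_signature(graph, labels, "C1", 2)
```

Increasing the height enlarges the encoded neighbourhood, so atoms that look identical at h = 0 (the two carbons here) become distinguishable at h = 1.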

Recently, Katritzky et al.,44 in their review on the utility of structure-property correlations, summarized the QSPR models developed for various properties, such as boiling point, critical pressure, heat of vaporization, heat of formation, aqueous solubility, partition coefficient and flash point, according to the type of compound (various classes of organic compounds), number of components, molecular descriptors used, correlation coefficient and standard deviation. In this article, we have summarized the AAE, correlation coefficient and number of components for each property for various property estimation methods, as shown in Table 1. Among the various property prediction methods, we recommend the Marrero-Gani method, as it can be easily applied to a wide variety of molecules to predict various properties and at the same time offers better accuracy.

Molecular Modeling and Simulation

Molecular simulations are computer experiments that allow us to predict macroscopic properties by studying the behavior of a large number of particles.55, 56 They provide insight into the interactions and local structures of the molecules. These simulations start from the microscopic structure and molecular interactions of the system and derive thermodynamic, transport or other properties based on the principles of statistical mechanics. In general, molecular simulations are widely used to predict the properties of materials; to understand the underlying molecular aspects of phenomena, which may lead to new experiments and the development of new theories; and to test approximate theories for the system of interest. There are two approaches to performing these simulations: stochastic and deterministic. The stochastic approach, called Monte Carlo (MC), is based on the random generation of configurations that are associated with a known probability. The deterministic approach, called molecular dynamics (MD), simulates the time evolution of the molecular system and provides the actual trajectory of the system. Each technique has distinct advantages for certain classes of molecules. For a given molecular system, MC methods require, in iterations, less computing time than MD. Compared to qualitative approaches like the correlations of group contributions, molecular modeling is more reliable, given an accurate force field to describe inter- and intra-particle interactions.
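A toy example may help to show what "random generation of configurations associated with a known probability" means in practice. The sketch below applies Metropolis sampling, one common MC scheme chosen here for illustration, to a single particle in the one-dimensional potential U(x) = x²/2 in reduced units; the potential, step size, number of steps and function name are all assumptions and bear no relation to the molecular systems discussed in this review.

```python
import math
import random


def metropolis_mean_x2(n_steps=50000, step=2.0, kT=1.0, seed=0):
    """Estimate <x^2> for U(x) = x^2 / 2 by Metropolis sampling."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        delta_u = 0.5 * trial * trial - 0.5 * x * x
        # Downhill moves are always accepted; uphill moves are accepted
        # with the Boltzmann probability exp(-delta_u / kT).
        if delta_u <= 0.0 or rng.random() < math.exp(-delta_u / kT):
            x = trial
        total += x * x
    return total / n_steps


estimate = metropolis_mean_x2()
```

For this potential the exact Boltzmann average of x² equals kT, so the estimate should approach 1 in reduced units as the number of steps grows; no time trajectory is produced, which is the key difference from MD.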

Molecular MC and MD simulations are used to generate pseudo-experimental data over wide ranges of pressure and temperature and to understand the macroscopic behavior of mixtures that include expensive novel products or toxic compounds, in place of costly experimental investigations. These simulations have been used to predict macroscopic properties in the oil and gas, cosmetics and pharmaceutical industries57 and, more generally, in the chemical industry,14 where MC methods are typically used to equilibrate solute-solvent systems. A broad spectrum of problems in chemical and materials science can presently be addressed by molecular modeling based on force field information and ab initio parameters, such as vapor–liquid equilibrium,55, 56 vapor–liquid–liquid equilibrium and solid–liquid equilibrium;58 supercritical solutions and ionic liquid properties;59 crystal growth and crystal orientation;21 determination of activity coefficients;60 and transport properties such as viscosity, diffusivity and thermal conductivity.55, 56

Meniai and Newsham60 were the first to employ molecular modeling along with GC for the evaluation of design parameters in solvent selection for liquid-liquid extraction. The molecular graphics system was used to avoid unusual combinations during the assembly of shortlisted groups (i.e. ensuring intermolecular stability) and in the estimation of unknown UNIQUAC (UNIversal QUAsi Chemical) interaction parameters. The evaluation of properties using this method was significantly complex, which limited the size of the search space that could be explored with conventional computational resources. Harper et al.61 modified this approach and proposed a CAMD methodology that combines molecular modeling with GC. This method includes a structure generation algorithm, a large collection of property estimation methods and a link to molecular modeling tools. In contrast to conventional CAMD, which gives a list of possible candidates that need further investigation, this methodology, through its link to molecular modeling, delivers the result as molecules in a ready-to-use form. Stanescu and Achenie62 proposed a two-step CAMD method in which candidate solvents are generated based on constraints on their physical properties, followed by density functional theory (a quantum mechanical modelling method) solvation calculations to estimate the reaction rate and product yield in the candidate solvents. A multi-scale model-based approach for predicting the physical properties of a polymer repeat unit, combining a CAMD technique based on the GC plus method with atomistic simulations, has also been put to use.63 The molecular simulations were capable of providing the physical properties of the polymer as a function of size (number of repeat units) and operational variables such as temperature and pressure.

Properties of Interest for Solvent Selection

This section deals with the properties of interest for solvent extraction and crystallization operations, and their estimation using the methodologies described in the previous section. The book by Poling et al.30 provides an abundance of thermodynamic and physical properties (such as viscosity, thermal conductivity, surface tension and diffusivity) for pure components and mixtures of various organic compounds. A detailed list of models developed for predicting the properties of interest for solvent selection, along with the descriptors considered for diverse organic compounds, has been reported in a recent review by Katritzky et al.44 Gu et al.64 carried out a statistical analysis of 96 pure solvents using eight solvent parameters (hydrogen bond acceptor propensity, hydrogen bond donor propensity, polarity/dipolarity, dipole moment, dielectric constant, viscosity, surface tension and cohesive energy density) to separate out the basic groups for the discovery of new polymorphs by crystallization. Table 1 reports the AAE of the properties of interest for various property estimation methods. Table 3 lists the properties of interest that are relevant to the context of this article.

Solvent Extraction

The physical properties of interest are generally classified into three categories: primary, secondary and mixture properties. For liquid-liquid extraction, the primary properties (such as boiling point, melting point, liquid density, viscosity and critical properties) can be predicted using GC, Constantinou-Gani, MG and CI methods.31, 33, 35, 40 For practical liquid-liquid extraction, the ratio of the densities at the operating temperature must be at least 1.05, the melting point should be below the operating temperature, and the boiling point should be far above it. Secondary properties (such as vapour pressure and heat of fusion), which are predicted using analytical expressions and are functions of primary properties, can also be determined by GC, MG and CI methods and their combinations.65-67 Mixture properties, such as the activity coefficients involved in phase equilibrium, are estimated using the UNIFAC method.32, 68

Infinite dilution activity coefficient

An important mixture property relevant to solvent extraction is the infinite dilution activity coefficient (γ∞). This property measures the interactions between solute and solvent molecules in the absence of solute-solute interactions, and gives an estimate of how the solvent medium differs from the pure solute. The activity coefficient also accounts for the non-ideal behaviour of a mixture and can be predicted with UNIFAC/modified UNIFAC using vapour-liquid equilibrium / liquid-liquid equilibrium (VLE/LLE) interaction parameters. For pharmaceutical molecules, the activity coefficient can be estimated using regular solution theory,57 UNIFAC, modified UNIFAC and COSMO-RS,69 and can in turn be used to predict solubility using equation 9, which states that the solid solubility ($x_i$) is a function of the activity coefficient ($\gamma_i$) and the pure component properties of the solute (the melting temperature $T_m$ and heat of fusion $\Delta H_f$).

$$ \ln x_i = \frac{\Delta H_f}{R T_m}\left(1 - \frac{T_m}{T}\right) - \ln \gamma_i \qquad (9) $$

where R is the gas constant and T is the system temperature. The use of constitutive models in the phase equilibria of pharmaceutical product-process design for pure components and mixtures has recently been discussed in a review article by O’Connell et al.15 Various case studies have been discussed with detailed model relations for the VLE of oleum, formaldehyde and water; the VLE of carbon dioxide and hydrogen sulphide in aqueous amine solutions; and the LLE of carboxylic acid in aqueous solution. Further, various thermodynamic models such as perturbed-chain statistical associating fluid theory (PC-SAFT), non-random two-liquid segment activity coefficient (NRTL-SAC), conductor-like screening model segment activity coefficient (COSMO-SAC) and UNIFAC-GC can be used to predict γ∞ values. There are, however, many gaps in the UNIFAC parameter tables due to the lack of the necessary experimental data. Recently, a GC+ approach for predicting mixture properties, called UNIFAC-CI, which combines the UNIFAC-GC based activity coefficient model with valence CIs, has been developed.70 The properties that are estimated from the ratio of activity coefficients at infinite dilution in two phases are the solvent loss, which must be as low as possible, and the separation factor, solvent capacity and selectivity, which must be as high as possible. Pretel et al.71 have described solvent selection for extraction in detail and listed the dominant properties of separation problems as the solute distribution coefficient, solvent loss, solvent power and solvent selectivity.
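Equation 9 is straightforward to evaluate once the activity coefficient and the pure-solute properties are available. The sketch below is a minimal illustration in SI units; the function name and the numerical inputs (heat of fusion, melting temperature and activity coefficient) are hypothetical values chosen for the example, not data for any particular pharmaceutical.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def solid_solubility(delta_h_fus, t_melt, temperature, gamma=1.0):
    """Mole-fraction solubility from equation 9.

    delta_h_fus: heat of fusion of the solute, J/mol
    t_melt:      melting temperature of the solute, K
    temperature: system temperature, K
    gamma:       activity coefficient of the solute in the solvent
                 (gamma = 1 gives the ideal solubility)
    """
    ln_x = (delta_h_fus / (R * t_melt)) * (1.0 - t_melt / temperature) - math.log(gamma)
    return math.exp(ln_x)


# Hypothetical solute: heat of fusion 25 kJ/mol, melting point 400 K.
x_ideal = solid_solubility(25000.0, 400.0, 298.15)
x_real = solid_solubility(25000.0, 400.0, 298.15, gamma=2.0)
```

Because γ enters only through the −ln γ term, doubling the activity coefficient halves the predicted mole-fraction solubility at a fixed temperature, which is why the accuracy of the activity coefficient model dominates the quality of the solubility estimate.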

Crystallization

The properties of interest for crystallization include the melting point (the temperature at which the solid and liquid phases exist in equilibrium), which in turn is used for the prediction of solubility, viscosity and heat of fusion. The boiling point indicates the volatility of a compound and is one of the important properties for characterizing a solvent. Other properties, such as critical temperature, flash point and enthalpy of vaporization, can be estimated from the boiling point. The dielectric constant, which measures the ability of a liquid to solvate a charged molecular species, is used to characterize the polarity of organic solvents. Stanescu and Achenie62 found that, in solvents with a high dielectric constant, the yield is limited by the reversibility of the reaction.

Apart from the basic physical properties, the complex physical properties involving solute/solvent and solvent/solvent interactions include the hydrogen bonding interaction parameter, the octanol-water partition coefficient, solubility (Hildebrand solubility parameter, Hansen solubility parameter), donor/acceptor numbers, solvatochromic parameters and other environment-related parameters.

Hydrogen bonding solubility parameter (δH)

This property quantifies the hydrogen bonding ability of a compound. Highly polar solvents such as methanol have high δH values, while non-polar solvents like hexane have very low δH values. As the polarity of the solvent molecules increases, the hydrogen bonding tendency between solute molecules decreases, which may affect the induction time of nucleation or the polymorphic structure of the solutes. It is evaluated using GC,72 MG and MG-CI52 and is a function of molar volume.

n-octanol/water partition coefficient (P)

This property is the ratio of the concentration of a compound in n-octanol to that in water at equilibrium. The logarithm of this partition coefficient (log P) is used in calculating numerous physical properties such as membrane transport and water solubility. GC, MG, CI, MG-CI and structural analog approaches52, 73-75 are used to estimate log P values. Soskic and Plavsik76 have put forward a modeling approach for predicting log P from molecular CIs in which empirically determined, optimized weights are used to characterize the skeletal atoms in a molecule instead of the valence delta values; this approach gave better estimates of the property.
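The definition above can be sketched in a few lines of Python. The function name `log_p` and the example concentrations are assumptions made for illustration; the estimation methods cited in the text (GC, MG, CI) are not reproduced here.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log10(P), with P = C_octanol / C_water at equilibrium.

    Both concentrations must be positive and in the same units.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical measurement: 100 times more solute in the octanol phase.
example = log_p(100.0, 1.0)
```

A positive log P indicates preferential partitioning into n-octanol; a negative value indicates preferential partitioning into water.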

Solubility parameter (δ)

The usefulness of δ lies in the ease with which relative solubility comparisons can be made. The smaller the difference in the δ values of two chemicals, the greater is their mutual solubility. This is, therefore, a comparative approach and not an absolute measurement. Solubility being the prime property for solvent selection in crystallization, many research articles have been published over the last five years on the prediction of the solubilities of organic compounds in various solvents using different methods. Hildebrand and Hansen solubility parameter estimations are widely used to predict the solubility of solutes in solvents.58
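Because δ supports relative comparisons only, a natural use is to rank candidate solvents by how close their δ is to that of the solute. The sketch below assumes a single (Hildebrand-type) parameter per compound; the solvent names and numerical values are placeholders chosen for illustration, not literature data.

```python
def rank_by_delta(solute_delta, solvent_deltas):
    """Sort solvent names by |delta_solvent - delta_solute|, smallest first."""
    return sorted(
        solvent_deltas,
        key=lambda name: abs(solvent_deltas[name] - solute_delta),
    )


# Placeholder solubility parameters (MPa^0.5), for illustration only.
solvents = {
    "solvent_A": 15.0,
    "solvent_B": 20.0,
    "solvent_C": 26.0,
    "solvent_D": 48.0,
}
ranking = rank_by_delta(21.0, solvents)
```

The ranking is comparative: it orders the candidates but does not predict an absolute solubility.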

The prediction of pharmaceutical solubility using two thermodynamic models, NRTL-SAC and COSMO-SAC, which predict solubility from ab initio calculations, has been carried out for four compounds, namely lovastatin, simvastatin, rofecoxib and etoricoxib. NRTL-SAC was found to perform better than COSMO-SAC in rapidly screening solvents for the crystallization process.77 A review by Modaressi et al.78 on various models for the prediction of solid solubility in different organic solvents gives insight into developments in the GC approach for predicting the three Hansen solubility parameters. To support solvent selection, the databases available in the literature and the methods used to determine solubility are discussed in detail. Among the property estimation methods, the GC, CI, UNIFAC-CI, NRTL-SAC and COSMO methods are discussed with regard to the prediction of phase equilibrium.

Tsivintzelis et al.58 have modeled the solid-liquid equilibrium of pharmaceutical-solvent systems, accounting for their complex hydrogen bonding behavior, and fitted them to the Hansen solubility parameter. Their methodology was applied to model the solubility of three pharmaceuticals, namely acetanilide, phenacetin and paracetamol, using the non-random hydrogen bonding equation of state. Their solubility predictions were satisfactory and compared well with those of the COSMO-RS method. Ruether and Sadowski79 applied a thermodynamic model based on PC-SAFT to correlate and predict the solubility of different drugs in pure solvents and solvent mixtures for the design of a crystallization process. Using this approach with very few pure component properties as input, the solubility predictions for various drug intermediates such as paracetamol, ibuprofen, sulfadiazine, p-hydroxyphenylacetic acid and p-aminophenylacetic acid matched the experimental results well. The solubilities of the active pharmaceutical ingredients aspirin, paracetamol and ibuprofen in various solvents were predicted using GC methods (UNIFAC and modified UNIFAC (Dortmund)) and a quantum chemical approach, the COSMO-RS (real solvents) method.69 When compared with the experimental results, the modified UNIFAC method gave the lowest root mean square deviations for temperature and solubility among the three methods and was able to accurately predict the solvent that shows high solubility, followed by UNIFAC and COSMO-RS.

Toxicity

Toxicity is the most important property when dealing with pharmaceuticals. As many organic solvents are toxic, it is necessary to ensure that their end effects after use are considered. Until recently, GC was the only structure-property method widely used in the literature to predict the fathead minnow 96-h LC50 (the concentration lethal to 50% of the test population).80 However, Hukkerikar and co-workers recently illustrated, through an application example, the use of property models based on the MG and MG-CI methods to estimate environment-related properties and the uncertainties of the estimated values. A total of 809 data points were used to estimate LC50.81 Apart from this, other properties related to environmental safety and hazard, such as flash point, corrosion and reactivity, need to be considered wherever solvents are used.

Table 3 lists the properties of interest for extraction and crystallization operations and the structure-property relations available to estimate them. The properties stated above are of general interest for the selection of organic (aliphatic or aromatic) solvents. For inorganic solvents and ionic liquids, the properties of interest differ, depending on their nature and interactions, even for extraction and crystallization operations.12, 82 For example, hydrogen bond basicity, hydrogen bond acidity and hydrophobicity are important considerations when an ionic liquid is evaluated as an alternative solvent.83 Hence, there is a need for better structure-property correlations for the individual properties of interest, and this requires further exploration.

Rational Solvent Design Approaches

Gani and his co-workers proposed a GC based exhaustive search approach for molecular design.13 Their methodology starts by selecting structural groups from the basis set (the total number of groups considered for designing a molecule) and generating a large number of combinations of the selected groups. This is followed by a constraint check.

Constraints

Three classes of constraints are considered in the literature for the solvent design problem, namely property constraints, structural feasibility constraints and industrial practicality constraints. The first two are the most commonly employed.

Property Constraints

Property constraints are inequality constraints in which bounds on the properties of interest are specified with regard to the process conditions. The properties of interest and their estimation methods have been discussed in the previous sections; at this point it suffices to state that bounds are imposed on the estimated values of the properties of interest, expressed in terms of the selected set of groups.

Structural Constraints

The structural constraints are employed to generate a structurally viable molecule. Two different approaches to structural constraints are used in the literature. One is based on the octet rule, which ensures that the molecule as a whole does not have any free attachments, and is given by the following expressions:84

$$\sum_{i}\sum_{j} (2 - v_j)\,U_{ij} = 2m \qquad (10)$$

$$\sum_{i}\sum_{j} U_{ij} = N_{max} \qquad (11)$$

$$\sum_{j} U_{ij} = 1 \qquad (12)$$

where v_j is the valence of group j, U_ij is a binary variable in which i represents the position in the molecule and j the group in the basis set, N_max is the maximum number of positions in the molecule, and m = 1, 0 and −1 for acyclic, monocyclic and bicyclic molecules, respectively. Note that this octet rule does not allow aromatic groups and most cyclic compounds, and cannot handle isomers. Eljack et al. have proposed a constraint based on the free bond number (FBN), which differentiates between aromatic, acyclic and cyclic groups and is given by85

$$\sum_{g=1}^{N_g} n_g\,\mathrm{FBN}_g = 2\left(\sum_{g=1}^{N_g} n_g - 1\right) + 2N_r \qquad (13)$$

where N_r is the number of rings in the final molecule, n_g is the number of groups of kind g, FBN_g is the number of free bonds in each such group and N_g is the total number of basis groups chosen for the study.

Vaidyanathan et al.86 have developed a set of structural constraints for the case where UNIFAC groups are used as base groups (the groups chosen to form a molecule); these groups are classified into four classes and three types. The division of groups into classes is based on the valency of the group. The type of a group specifies whether its attachments belong to one or more atoms. For example, type 1 groups are those in which all the attachments belong to a single atom. The following structural feasibility constraints are employed for UNIFAC groups:

1. The sum of the number of univalent groups and the number of trivalent groups is an even number.

2. The sum of the valencies of all the groups in the molecule is greater than or equal to twice the maximal valency.

3. The sum of the valencies of all the groups in the molecule is greater than or equal to twice the total number of groups less 2.

4. The sum of the number of ternary groups and twice the number of quaternary groups is greater than or equal to the number of univalent groups less 2.

5. The product of the number of odd-valent groups less 2 and the number of divalent groups of type 3 is greater than or equal to zero.

6. The product of the number of odd-valent groups less 3 and the number of trivalent groups of type 3 is greater than or equal to zero.
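As a rough illustration, rules 1 to 4 above depend only on the valencies of the groups and can be checked with a few lines of Python. This is a minimal sketch, not the formulation of Vaidyanathan et al.: the function name is hypothetical, rule 3 is read here as "sum of valencies ≥ 2n − 2" (an assumption about the wording), and rules 5 and 6 are omitted because they need the group type.

```python
from collections import Counter


def passes_valency_rules(valences):
    """Check rules 1-4 for a non-empty list of group valencies (1 to 4).

    Rules 5 and 6 require the group "type" and are not covered.
    """
    count = Counter(valences)
    n_groups = len(valences)
    total = sum(valences)
    rule1 = (count[1] + count[3]) % 2 == 0           # odd-valent groups pair up
    rule2 = total >= 2 * max(valences)               # twice the maximal valency
    rule3 = total >= 2 * n_groups - 2                # assumed reading: 2n - 2
    rule4 = count[3] + 2 * count[4] >= count[1] - 2  # enough branching points
    return rule1 and rule2 and rule3 and rule4


# Hypothetical examples: CH3-CH2-OH has valencies [1, 2, 1].
ethanol_ok = passes_valency_rules([1, 2, 1])
three_end_groups_ok = passes_valency_rules([1, 1, 1])
```

Passing these checks is necessary but not sufficient for a viable molecule; the type-based rules and the stability considerations still apply.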

The other structural constraint approach is based on the adjacency matrix, a symmetric matrix that gives information about the connectivity of the groups in the molecule. The constraints are:23

1. The sum of the valences of the groups present in the molecule is equal to the sum of the elements of the adjacency matrix.

$$w_i\,\mathrm{val}_i = \sum_{j=1}^{i-1} a_{ji} + \sum_{j=i+1}^{N} a_{ij} \qquad (14)$$

2. The sum of the upper triangular elements of the adjacency matrix is equal to the total number of groups present in the molecule minus 1, plus the number of rings (if the number of rings is 0, an acyclic molecule is obtained).

$$\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} a_{ij} = n - 1 + N_r \qquad (15)$$

3. The number of groups in the molecule is

$$n = \sum_{i=1}^{N} w_i \qquad (16)$$

where w_i is a binary variable which indicates the presence or absence of the ith group, val_i is the valency of the ith group and a_ij is the element of the adjacency matrix in the ith row and jth column. N is the total number of groups in the basis set, n is the number of groups in the molecule and N_r is the number of rings.
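The three adjacency-matrix constraints can be verified directly for a candidate structure. The sketch below is an assumption-laden illustration: the function name and the ethanol-like example (CH3, CH2, OH with valencies 1, 2, 1) are hypothetical, and the check does not test connectivity beyond what equations 14 to 16 imply.

```python
def check_adjacency(w, val, a, n_rings=0):
    """Return True if equations 14-16 hold.

    w       : list of 0/1 flags, presence of each basis group
    val     : list of group valencies
    a       : symmetric N x N adjacency matrix (list of lists)
    n_rings : number of rings N_r
    """
    size = len(w)
    # Equation 14: the bonds attached to group i equal w_i * val_i.
    for i in range(size):
        bonds = sum(a[j][i] for j in range(i))
        bonds += sum(a[i][j] for j in range(i + 1, size))
        if bonds != w[i] * val[i]:
            return False
    n = sum(w)  # Equation 16: number of groups in the molecule
    upper = sum(a[i][j] for i in range(size - 1) for j in range(i + 1, size))
    return upper == n - 1 + n_rings  # Equation 15


# Hypothetical example: CH3-CH2-OH
ethanol = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
ethanol_ok = check_adjacency([1, 1, 1], [1, 2, 1], ethanol)
```

A structure with a missing bond fails equation 14 because one group keeps a free attachment.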

Apart from the general structural constraints, molecules are in general unstable if two heteroatoms are bonded to the same carbon atom and at least one of the heteroatoms is also bonded to a hydrogen atom. If neither of the two heteroatoms is bonded to a hydrogen atom, the combination could lead to a stable molecule. Some combinations, such as peroxides (HO-OH), are physically viable but should be avoided because they are highly reactive and are not generally considered as solvents.

Practicality Constraints

For the industrial practicality of the solvent, two main constraints are reported in the literature.8 One is that the molecule should be easy to synthesize and the other is that it should be stable. The more functional groups a molecule contains, the more difficult it is to synthesize. Therefore, a limit on the number of kinds of functional groups should be one of the constraints. Molecules that cannot be synthesized easily (for example, molecules containing both aromatic and cycloalkyl groups) and are relatively expensive have to be eliminated with a constraint. The stability of the solvent is ensured by excluding unstable groups such as aldehyde, ethenyl and acetenyl, and by excluding more than three carbonaceous substitutions on a five- or six-membered ring. The properties of the structurally feasible molecules are then predicted using GC methods.
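The sequence described so far (generate group combinations, check structural and practicality constraints, then estimate properties and apply bounds) can be sketched as a small generate-and-test loop. Everything specific in this sketch is an assumption made for illustration: the four-group basis set, the additive contribution values (placeholders, not fitted GC parameters), the base value of 200, and the rule of at most one heteroatom group, which is only a crude stand-in for the stability rules. Structural feasibility uses equation 13 with N_r = 0.

```python
from itertools import combinations_with_replacement

# Hypothetical basis set: group -> (free bonds, additive contribution).
# Contributions are illustrative placeholders, not fitted GC parameters.
BASIS = {"CH3": (1, 20.0), "CH2": (2, 25.0), "OH": (1, 90.0), "CHO": (1, 70.0)}
UNSTABLE = {"CHO"}       # groups excluded for stability (aldehyde, per the text)
HETERO = {"OH", "CHO"}   # heteroatom-bearing groups
MAX_HETERO_GROUPS = 1    # crude stand-in for the stability rules


def is_acyclic_feasible(groups):
    """Equation 13 with N_r = 0: free bonds sum to 2 * (number of groups - 1)."""
    return sum(BASIS[g][0] for g in groups) == 2 * (len(groups) - 1)


def estimate_property(groups, base=200.0):
    """Additive group-contribution estimate (illustrative form and numbers)."""
    return base + sum(BASIS[g][1] for g in groups)


def generate_and_test(max_groups, lower, upper):
    """Enumerate group multisets and keep those passing every constraint."""
    allowed = sorted(g for g in BASIS if g not in UNSTABLE)
    hits = []
    for size in range(2, max_groups + 1):
        for groups in combinations_with_replacement(allowed, size):
            if not is_acyclic_feasible(groups):
                continue  # structural constraint violated
            if sum(g in HETERO for g in groups) > MAX_HETERO_GROUPS:
                continue  # stability / practicality constraint violated
            value = estimate_property(groups)
            if lower <= value <= upper:  # property constraint
                hits.append((groups, value))
    return hits


candidates = generate_and_test(3, 300.0, 400.0)
```

The number of multisets grows rapidly with the size of the basis set and the maximum number of groups, which is why exhaustive enumeration of this kind becomes impractical for larger problems.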

Optimization Approaches

The CAMD methodology can be classified into four categories: generate and test, mathematical optimization, a priori methods and combinatorial optimization approaches. A knowledge based generate and test approach is composed of a set of rules for selecting groups, generating feasible molecules and rating them. This method does not guarantee that the solvents generated are optimal. In addition, it is time consuming because of the combinatorial explosion that accompanies an increase in the number of functional groups. The mathematical optimization approaches, including mixed integer non-linear programming (MINLP) and mixed integer linear programming (MILP), have difficulty in framing the exact expression of the structure-property relationships. A priori methods such as COSMO-RS (conductor-like screening model for real solvents) are based on quantum chemical calculations and offer alternatives to the more commonly used GC methods. Combinatorial optimization makes use of a stochastic approach and can handle the combinatorial complexity of molecular design.

CAMD methodology is based on different property estimation methods coupled with an optimization technique. Deterministic and stochastic optimization procedures are used to generate a list of globally best molecules (based on the basic groups chosen) which render the expression of a specific property. For non-linear (MINLP) problems, because of the high level of complexity, neither class of method guarantees a global solution: there are many local minima along with the global minimum, and the search is likely to get trapped in one of them. In special cases these methods can guarantee a global solution, but such cases occur rarely in molecular design. These techniques may also encounter problems with combinatorial explosion or a discontinuous search space when solving an MINLP. One strategy to address this issue is to linearize the non-linear components and reformulate the original MINLP problem as an MILP problem. Another approach is to reduce the search space by exploiting the problem structure. When the problem is linearized (MILP), the global optimal solution can be found through a deterministic approach. The stochastic methods, which generate and use random variables for optimization, can handle combinatorial explosion and a discontinuous search space because they are essentially combinatorial in nature, but they never guarantee convergence.

The types of problems that are encountered in optimization with discrete variables include mixed integer programming, binary integer programming, MILP and MINLP. The most general case is the mixed integer programming problem, which is represented as follows:87

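$$
\begin{aligned}
\text{Minimize: } & f(x, y) \\
\text{Subject to: } & h_i(x, y) = b_i, \quad i = 1, 2, \ldots, m \\
& g_j(x, y) \le c_j, \quad j = 1, 2, \ldots, r \\
& x = [x_1\ x_2\ \cdots\ x_n]^T, \quad y \in Y \text{ integer}
\end{aligned}
\qquad (17)
$$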

where f(x, y) is the objective function, and h_i(x, y) and g_j(x, y) are the equality and inequality constraints, respectively, which can be either structural feasibility constraints or property constraints. x is a vector of continuous variables and y is a vector of integer variables. The functions f, g and h are assumed to be convex in the continuous variables and once continuously differentiable. However, depending on the structure of the optimization problem, there are ways to transform a non-convex problem into a convex problem.

A special case of integer programming in which all the variables in y are either 0 or 1 is called binary integer programming and is widely used to solve product design problems. Many mixed integer programming problems are linear in the objective function and constraints and are termed MILP problems. When the objective function and/or the constraints are non-linear, the problem is termed an MINLP problem.

Deterministic Approaches

Deterministic methods for the solution of MILP and MINLP problems include outer approximation (OA), branch and bound (BB), interval analysis and generalized Benders decomposition. Of these, BB and OA are the most commonly employed and are discussed below.

Branch & Bound

Branch and bound is a very effective method for solving MILP and MINLP problems. Consider a general MILP problem as represented by equation 17. The simplest way to solve such an integer optimization problem is to enumerate all integer points, discard the infeasible ones and then identify the point that has the best objective function value among the feasible integer points. However, this approach is computationally expensive even for moderately sized problems. BB can be considered a refined enumeration method in which most of the non-promising integer points are discarded without even being tested.

In the BB method, the continuous problem obtained by relaxing the integer restrictions on the variables (i.e. y can be real) is solved first. If the solution happens to be integer, it is the optimal solution. Otherwise (i.e. if some y_k is non-integer), the problem is split into two subproblems, one with the lower bound constraint y_k ≥ [y_k] + 1 and the other with the upper bound constraint y_k ≤ [y_k], where [y_k] denotes the integer part of y_k. This split removes part of the continuous space that is infeasible for the integer problem while ensuring that none of the feasible integer solutions is eliminated. Branching continues until the optimal solution is found. The method can become computationally expensive when the number of branches grows large.
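The relax, branch and prune cycle can be illustrated on a small 0-1 knapsack problem, chosen here only because its continuous relaxation can be solved by a simple greedy rule without an LP solver. This is a sketch under stated assumptions, not a CAMD formulation: the problem is written as a maximization (equivalent to minimizing the negative objective), weights are assumed positive, and all function names are hypothetical.

```python
def solve_relaxation(values, weights, capacity, fixed):
    """Solve the relaxation 0 <= y_i <= 1 with some variables fixed to 0 or 1.

    Returns (objective, y), or None if the fixed choices exceed the capacity.
    """
    y = [0.0] * len(values)
    remaining, total = capacity, 0.0
    for i, choice in fixed.items():
        if choice == 1:
            y[i] = 1.0
            remaining -= weights[i]
            total += values[i]
    if remaining < 0:
        return None
    free = [i for i in range(len(values)) if i not in fixed]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if remaining <= 0:
            break
        take = min(1.0, remaining / weights[i])  # fractional fill allowed
        y[i] = take
        total += take * values[i]
        remaining -= take * weights[i]
    return total, y


def branch_and_bound(values, weights, capacity):
    """Maximize sum(values[i] * y[i]) s.t. sum(weights[i] * y[i]) <= capacity."""
    best_value, best_y = float("-inf"), None
    stack = [{}]  # each node is a dict of fixed variables
    while stack:
        fixed = stack.pop()
        relaxed = solve_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue  # infeasible node
        bound, y = relaxed
        if bound <= best_value:
            continue  # prune: relaxation cannot beat the incumbent
        fractional = [i for i, yi in enumerate(y) if 0.0 < yi < 1.0]
        if not fractional:
            best_value, best_y = bound, [int(yi) for yi in y]
            continue  # integer solution becomes the new incumbent
        k = fractional[0]
        stack.append({**fixed, k: 0})  # branch y_k <= [y_k]
        stack.append({**fixed, k: 1})  # branch y_k >= [y_k] + 1
    return best_value, best_y


best_value, best_y = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

In this small instance the relaxation bound lets the search discard two of the nodes it visits without branching on them further, which is the sense in which BB avoids testing non-promising integer points.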

BB uses the same procedure to solve MINLP problems in which the continuous variables appear non-linearly and the binary or integer variables appear linearly; the problem is represented by equation 18:87

$$
\begin{aligned}
\text{Minimize: } & z = f(x) + c^T y \\
\text{Subject to: } & h_i(x) = 0, \quad i = 1, 2, \ldots, m \\
& g_i(x) + By \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y \in Y \text{ integer}
\end{aligned}
\qquad (18)
$$

where x is a vector of continuous variables, y is a vector of integer (usually binary) variables, B is a matrix and X and Y are sets.

Outer Approximation

The outer approximation (OA) algorithm was developed by Duran and Grossmann in 1986.88 The basic idea of the OA algorithm is to decompose the MINLP model (equation 17) into an NLP primal problem and an MILP master problem. Each iteration of OA involves solving these two subproblems. First, the problem is solved as an NLP(y^k) by fixing the integer variables y at some set of values y^k and optimizing over the continuous variables x between their bounds.

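NLP problem: Equation (18) is modified as:87

$$
\begin{aligned}
\text{Minimize: } & f(x) + c^T y^k \\
\text{Subject to: } & h_i(x) = 0, \quad i = 1, 2, \ldots, m \\
& g_i(x) + By^k \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y^k \in Y \text{ integer}
\end{aligned}
\qquad (19)
$$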

Then a linearization is carried out around the optimal solution, and the resulting constraints are added to the linear constraints that are already present. This new linear model is referred to as the master MILP problem.

MILP subproblem: The new variable w is introduced to make the objective function linear.

$$
\begin{aligned}
\text{Minimize: } & w + c^T y \\
\text{Subject to: } & w \ge f(x^i) + \nabla f^T(x^i)\,(x - x^i), \quad i = 1, 2, \ldots, m \\
& h_i(x^i) + \nabla h^T(x^i)\,(x - x^i) = 0, \quad i = 1, 2, \ldots, m \\
& g_i(x^i) + \nabla g^T(x^i)\,(x - x^i) + By \le 0, \quad i = 1, 2, \ldots, m \\
& x \in X, \quad y \in Y \text{ integer}
\end{aligned}
\qquad (20)
$$

For the minimization problem, the NLP primal problem provides an upper bound on the objective function, while the MILP master problem provides a lower bound. The primal and master problems are solved alternately until convergence. This method cannot guarantee a global optimal solution when the NLP subproblem is non-convex. A common strategy is to try several initial guesses to see whether a consistent solution is obtained.

A mathematical programming approach to solve the CAMD problem for the design of solvents for extractive fermentation, employing the OA algorithm with equality relaxation and augmented penalty, has been proposed by Wang and Achenie.89 Initially the problem is reformulated such that all the binary variables appear linearly, while the continuous variables can appear linearly or non-linearly. As the first step, a non-linear programming problem is solved; if the solution obtained is integer, the program terminates successfully. Otherwise, the MILP problem is solved by linearizing the non-linear constraints using slack variables (a slack variable is added to an inequality constraint to transform it into an equality constraint) and penalty weights.

Sinha et al.90 used a reduced space BB strategy for solvent design problems and introduced the idea of splitting functions, which results in a smaller number of branching nodes. A reduced dimension BB algorithm, in which branching is done only for a set of branching functions instead of all search variables, has been proposed.91 This methodology was applied to a case study in which an optimal solvent had to be designed to serve as a cleaning agent for the printing industry. The problem, with 120 non-linear variables, was solved with just four splitting variables. It was also observed that the computation time increases linearly with the problem size. Sinha et al. developed an interval based global optimization tool called LIBRA to solve the CAMD problem of designing solvent blends for blanket wash and arrive at a global optimal solution.92

A recent review by Floudas and Gounaris discusses the research progress in deterministic global optimization over a decade.93 The design of a biocompatible solvent for an extractive fermentation and extractive distillation process to yield water-free ethanol was formulated as an MINLP problem by Cheng and Wang.94 A two-phase computational scheme was introduced, with the relative volatility of the solvent as one of the prime property constraints. The mixed-integer hybrid differential evolution (MIHDE) algorithm was first applied to obtain a feasible solution. Then, to confirm whether the optimal design had been achieved, the feasible solution obtained from MIHDE was used as the starting point for a mixed-integer sequential quadratic programming solver. The optimal solvent obtained by this approach was found to be 3-octanone.

The identification of the three dimensional molecular (crystal) structure that best fits X-ray diffraction measurements using a discrete and non-linear optimization approach is reported by Sahinidis.95 The model developed possesses multiple local minima because of non-convexities in the objective function. As the model does not impose atomicity constraints, its solutions do not guarantee that two atoms will not coincide. The addition of these constraints completed the problem formulation and led to the correct crystal structures.

Stochastic Approaches

Stochastic optimization techniques include simulated annealing (SA), genetic algorithm (GA) and Tabu search. These heuristic search methods can be applied to certain types of combinatorial problems when BB and OA are difficult to apply or converge too slowly. In Tabu search, random moves are performed while already visited solutions are avoided by keeping track of previous moves with a short term memory function. A long term memory function determines the new starting point (which has not been explored previously) and when to restart the entire procedure. This diversifies the search so that it spans the entire search space. The GA framework, being a multiple point search technique, offers a number of advantages. It examines a set of solutions and not just one solution, and the stochastic nature of the algorithm helps the search to escape local minima. In addition, it is easy to apply because it is not a derivative based technique. A detailed review of these three leading optimization techniques, along with their advantages and disadvantages, has been reported by Fouskakis and Draper.96 SA and GA are the stochastic optimization techniques most widely used in the literature on product design; hence these two methods are described in some detail in the following sections.

Simulated Annealing

SA is a combinatorial optimization technique for solving unconstrained and bound-constrained optimization problems based on random estimates of the objective function f and the evaluation of the constraints (equation 17).87 This method usually requires a large number of function evaluations to find the optimal solution. However, it can attain the global optimal solution even for ill-conditioned functions with multiple local minima, provided the temperature is lowered slowly enough. Also, the quality of the final solution is largely unaffected by the initial guess. A new point is randomly generated in each iteration. The distance between the new point and the current point is based on a probability distribution. The global solution is sought through two steps. The first is the so-called "Metropolis algorithm", which helps in spanning the space of solutions with the following probability criterion:

$$P(\Delta f) = e^{-\Delta D / kT} > R(0,1) \qquad (21)$$

where ∆D is the change of distance implied by the move, k is the scaling factor called Boltzmann’s constant, T is the “synthetic temperature”, R(0,1) is a random number in the interval [0,1], ∆f = f(X_{i+1}) − f(X_i) and P is the probability. D is called the cost function or performance objective function and corresponds to the free energy in the case of annealing a metal. The second step, by analogy with the annealing of a metal, is to lower the temperature. This notion of slow cooling is implemented in the SA algorithm as a slow decrease in the probability of accepting worse solutions as the algorithm explores the solution space.

The algorithm accepts all new points that lower the objective function value, along with a few points that raise it, with a certain probability. By accepting points that raise the objective function, the algorithm avoids being trapped in local minima.
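A minimal sketch of the two steps (Metropolis acceptance and slow cooling) is given below. Several choices are assumptions made for illustration rather than part of the method as cited: the scaling factor k is absorbed into the temperature, cooling is geometric, the change in the objective function plays the role of ∆D, and the test function `bumpy` and the move `step` are hypothetical.

```python
import math
import random


def simulated_annealing(f, neighbour, x0, t0=10.0, cooling=0.95, steps=2000, seed=0):
    """Minimize f by simulated annealing; returns (best point, best value)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(x, rng)
        f_candidate = f(candidate)
        delta = f_candidate - fx
        # Metropolis criterion: always accept an improvement; accept a worse
        # point when exp(-delta / T) exceeds a uniform random number.
        if delta <= 0 or math.exp(-delta / temperature) > rng.random():
            x, fx = candidate, f_candidate
            if fx < best_f:
                best_x, best_f = x, fx
        # Slow (geometric) cooling, floored to avoid division by zero.
        temperature = max(temperature * cooling, 1e-12)
    return best_x, best_f


def bumpy(x):
    """Hypothetical integer test function with several local minima."""
    return (x - 7) ** 2 + 6 * (x % 3)


def step(x, rng):
    """Move one unit left or right, staying inside [0, 20]."""
    return min(20, max(0, x + rng.choice((-1, 1))))


best_x, best_f = simulated_annealing(bumpy, step, x0=20)
```

At high temperature almost every move is accepted, so the search wanders across the local minima; as the temperature falls, uphill moves become increasingly unlikely and the search settles.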

Genetic Algorithm

Tabu search and SA operate by transforming a single solution at a given step, whereas GA works with a set of solutions called a population.87 The GA begins with a population of ‘n’ chromosomes, which are random strings (of 0s and 1s) representing the decision variables. Each string is associated with a fitness value derived from the objective function, which is used in successive genetic operations. At each step, the GA randomly selects individuals from the current population, and these are made to go through the process of evolution using three operators, namely selection, crossover and mutation, to create a new population. By using mutation and crossover functions, GA handles the linear and bound constraints and generates only feasible new points. The new population is then tested for termination; until the termination criterion is met, the population is iteratively operated on by the above three operators. One cycle of these three operations and the subsequent evaluation procedure is known as a generation. GA is the most widely used stochastic optimization technique in product design problems.
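The generation cycle of selection, crossover and mutation can be sketched for binary strings as follows. The operator choices (binary tournament selection, one-point crossover, bit-flip mutation), the parameter values and the function names are assumptions made for illustration; constraint handling is not included, and the fitness is maximized.

```python
import random


def tournament(pop, fitness, rng):
    """Binary tournament selection: the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize fitness over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # one pass of this loop is one generation
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied independently to each position.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion  # keep the best string seen so far
    return best, fitness(best)


# Hypothetical objective: number of 1s in the string.
best_string, best_score = genetic_algorithm(sum, 12)
```

In a molecular design setting each bit string would encode the presence or count of structural groups, and the fitness would come from the estimated properties.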

All the methods discussed above, and their accuracies, are limited by the availability of parameters for the GC methods. For single objective problems, the solution is unique because only one particular property is minimized or maximized in the solvent design problem. In a multi-objective optimization framework, however, the solution is in general not a single output but a Pareto set, which is obtained from the potential compromises among the objectives. Hence, the solvent design problem with a multi-objective function involves a non-convex set (a set of points in which not all segments connecting points of the set lie entirely in the set), and there are many local minima apart from the global minimum.

7
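
To make the notion of a Pareto set concrete, the following sketch filters a small list of candidates down to the non-dominated ones. The candidate names and objective values are hypothetical, and both objectives are assumed to be minimized (a quantity to be maximized, such as recovery, is entered with a negative sign).

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_set(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj)
                       for other_name, other in candidates.items()
                       if other_name != name)]

# Hypothetical solvents: (negative recovery, toxicity index), both minimized.
candidates = {
    "S1": (-0.90, 3.0),
    "S2": (-0.80, 2.0),
    "S3": (-0.70, 2.5),  # worse than S2 in both objectives
    "S4": (-0.60, 1.0),
}
front = pareto_set(candidates)
```

Here `S3` is removed because `S2` is better in both objectives, while the remaining candidates represent different compromises between recovery and toxicity.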

There is ample literature on the application of stochastic optimization techniques to the CAMD problem over the past decade. Venkatasubramanian et al.97 were among the first to employ a stochastic optimization technique, GA, for a polymer design problem in which properties are estimated using a GC method. Van Dyk and Nieuwoudt98 developed a GA-based CAMD tool called ‘SolvGen’ to design solvents and solvent mixtures for extractive distillation, azeotropic distillation, liquid extraction and liquid chromatography. The study revealed that solvent blends performed better than pure solvents for extractive distillation systems, and a few new solvents other than the classical ones were listed for azeotropic distillation systems. A number of these predictions were verified through experiments and were found to hold. Lehmann and Maranas99 examined the combination of quantum chemical methods with multi-objective optimization for the design of solvents for liquid-liquid extraction, for example in the benzene-cyclohexane system. GA was chosen as the optimization technique and was tuned using GC methods; the tuned GA then calls the quantum chemical subroutine to evaluate the properties of the generated molecules. Lin et al.100 developed a CAMD approach using a tabu search algorithm, based on the CI property estimation method and implemented with novel neighbour-generating operators such as swap and move, to design transition metal catalysts. The tabu lists helped to diversify the search so that it covers the entire search space and locates the final solution precisely. The algorithm is also able to locate a large number of near-optimal solutions within a short time.
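
The role of the tabu list and of swap/move neighbourhoods can be illustrated with a small, generic tabu search. This is a sketch under assumptions and not the algorithm of Lin et al.: a candidate is represented as a tuple of group counts, "move" transfers one group from one position to another, "swap" exchanges the counts of two positions, and the cost function with its contributions and target is a hypothetical stand-in for a property model.

```python
from collections import deque
from itertools import permutations

def neighbours(sol):
    """All distinct solutions reachable by one 'move' or one 'swap'."""
    out = set()
    for i, j in permutations(range(len(sol)), 2):
        if sol[i] > 0:
            moved = list(sol)          # move: shift one group from i to j
            moved[i] -= 1
            moved[j] += 1
            out.add(tuple(moved))
        swapped = list(sol)            # swap: exchange the counts at i and j
        swapped[i], swapped[j] = swapped[j], swapped[i]
        out.add(tuple(swapped))
    out.discard(tuple(sol))
    return sorted(out)                 # sorted for deterministic tie-breaking

def tabu_search(cost, start, iterations=50, tabu_size=10):
    current = best = tuple(start)
    tabu = deque([current], maxlen=tabu_size)  # recently visited solutions
    for _ in range(iterations):
        allowed = [s for s in neighbours(current) if s not in tabu]
        if not allowed:
            break
        # Take the best admissible neighbour even if it is worse than the
        # current solution; the tabu list prevents cycling straight back.
        current = min(allowed, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Hypothetical additive property model and target value.
contrib = (1.0, 2.5, 4.0)
target = 13.0

def cost(sol):
    return abs(sum(n * c for n, c in zip(sol, contrib)) - target)

best = tabu_search(cost, start=(4, 0, 0))
```

Because both operators conserve the total number of groups, the search stays within molecules of fixed size; a practical implementation would add operators that change the size as well.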

Wu et al.101 proposed an improved genetic algorithm technique for CAMD that includes cross-generational elitist selection, dislocation crossover and mutation operators. The results obtained with this method for a known system agreed well with those reported in the literature. Song and Song102 presented a review on the design of environmentally friendly solvents for separation processes using a CAMD approach based on the SA technique and the modified UNIFAC GC method. The proposed methodology greatly shortened the computing time, and a few case studies were illustrated for industrial problems with a single objective function. Serrato et al.103 proposed a new design strategy for CAMD composed of a sequential GA combined with quantum chemical calculations. This strategy was applied to the design of solvents for the extraction of lactic and acetic acids, and aldehyde and acid groups were found to be the most important groups for the design.

For more than a decade, Diwekar’s group has done extensive work on deterministic and stochastic approaches for CAMD. Kim and Diwekar104 used the Hammersley stochastic annealing algorithm for CAMD to design solvents for the extraction of acetic acid from water. They used the UNIFAC model to estimate the infinite dilution activity coefficient and Hansen’s solubility parameter model for solubility prediction. Xu and Diwekar105 reported that the Hammersley stochastic GA performed better than the stochastic SA technique for the above-mentioned case study. A multi-objective efficient GA called MOEGA was developed by the same group for solvent selection and solvent recycling.106 The algorithm uses the weighting method, with the weights generated by the Hammersley sequence sampling technique. A larger number of Pareto-optimal solutions was obtained than with the SA technique. A more recent review article covers developments in product and process design for environmental considerations. Unlike traditional design, such process design problems involve multiple objectives, which increases the complexity of the problem. The new approaches and algorithms that have been continuously sought over the past few years have been reviewed.107
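
The combination of the weighting method with Hammersley sequence sampling can be sketched as follows. This is an illustration only, not the MOEGA implementation: the mapping of Hammersley points to weight vectors by simple normalization, the use of a plain enumeration instead of a GA, and the candidate data are all assumptions made for the example.

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, frac = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * frac
        i //= base
        frac /= base
    return inv

PRIMES = (2, 3, 5, 7, 11)

def hammersley(n_points, dim):
    """n_points Hammersley points in [0, 1]^dim; first coordinate is i/n."""
    return [[i / n_points] + [radical_inverse(i, PRIMES[d]) for d in range(dim - 1)]
            for i in range(1, n_points + 1)]

def weight_vectors(n_points, n_objectives):
    """Normalize each Hammersley point so that its weights sum to one."""
    return [[x / sum(p) for x in p] for p in hammersley(n_points, n_objectives)]

def weighted_sum_choice(candidates, weights):
    """Name of the candidate minimizing sum_k w_k * f_k."""
    return min(candidates,
               key=lambda name: sum(w * f for w, f in zip(weights, candidates[name])))

# Hypothetical candidates: (negative recovery, toxicity index), both minimized.
candidates = {"S1": (-0.90, 3.0), "S2": (-0.80, 2.0), "S4": (-0.60, 1.0)}
weights = weight_vectors(8, 2)
winners = {weighted_sum_choice(candidates, w) for w in weights}
```

Each weight vector turns the multi-objective problem into a single-objective one; collecting the winners over many weight vectors approximates the Pareto set, although a weighted sum can only recover points on the convex part of the front.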

A hybrid optimization approach that uses both deterministic and stochastic techniques (OA and SA) to solve a solvent design problem has also been reported;108 however, the approach has not been proven to give a globally optimal solution. Folic et al.109 proposed a systematic deterministic and stochastic approach for solvent design that maximizes product formation by enhancing the rate constant of the main reaction while suppressing by-product formation. This led to a MINLP problem formulation, which is linear in the binary variables and is solved using the OA algorithm. The solvent design methodology is adapted to the problem definition by building a reaction model from the rate constant data of the chosen molecules before optimizing the objective function. On the other hand, redefining the objective function as the logarithm of the reaction rate constant led to a MILP problem formulation, and a case study showed that the solution of this design problem can lead to solvents that potentially suppress undesired reactions. The solvent candidates obtained from the deterministic and stochastic optimization approaches overlapped, indicating that the design of solvents is relatively insensitive to the approach employed.
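
Why taking the logarithm of the rate constant can turn the problem into a MILP may be seen from a short worked form. The symbols below are assumptions introduced for illustration and are not the notation of the cited work: the reaction model is assumed to be linear in a set of solvent properties P_p with regressed coefficients c_p, and each property is assumed to be given by a linear group contribution expression in the integer group counts n_j with contributions a_pj.

```latex
\begin{align}
\log k &= c_0 + \sum_{p} c_p P_p , \qquad P_p = \sum_{j} a_{pj}\, n_j , \\
\log k &= c_0 + \sum_{j} \Big( \sum_{p} c_p\, a_{pj} \Big) n_j .
\end{align}
```

Under these assumptions log k is linear in the group counts, so maximizing log k subject to linear structural and property constraints is a MILP, whereas maximizing k itself, an exponential of the same expression, is nonlinear. Because the logarithm is monotonic, both objectives share the same maximizer. If any property model is nonlinear in n_j, the problem remains a MINLP.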

Solvent Design for Pharmaceuticals

From the discussion in the earlier sections, it is evident that the need for solvent design for pharmaceuticals is immense and that various structure-property prediction methods and optimization techniques exist for designing a solvent with bounds on specific properties. This section summarizes the literature on solvent design for pharmaceuticals. Liquid-liquid extraction and crystallization are the most common and vital operations that occur in all pharmaceutical manufacturing processes; the selection of the solvent for these operations is important, as solvents influence the nature or quality of the finished product. More articles have been published on crystallization than on extraction because, for most pharmaceuticals, crystallization is the final process operation, where all the impurities accumulated along the process flow are present, which leads to greater complexity. Table 4 gives a summary of the various case studies reported in the literature on solvent design, with the properties of interest, the property estimation method and the optimization approach used for extraction and crystallization operations.

Solvent extraction

Barwick110 reported a review of strategies for solvent selection and high pressure liquid chromatography mobile phase optimization, aimed at selecting alternative solvents that meet the performance requirements of toxicity, flammability and cost for solvent extraction and liquid chromatography. The solvents were classified according to their polarity and selectivity, using the Hildebrand and Rohrschneider polarity schemes as the basis for solvent selection. Cismondi and Brignole111 proposed an efficient search algorithm for the design of branched molecules, which used a GC method based on the electronegativities of each group to predict mixture and pure component properties. The algorithm used for the study was the knowledge-based generate-and-test approach, which generates chemicals by means of expert rules but cannot guarantee that the solvents are optimal. Gani et al.9 proposed a method for the selection of green solvents for the promotion of organic reactions occurring in the liquid phase, based on reaction solvent properties such as reactivity of the solvent, phase split, solubility, selectivity and toxicity, with four illustrative examples. This methodology employs the estimation of thermodynamic properties to generate a knowledge base of reactions using reaction indices and the MG method with the ProCAMD database. The constraints are based on solvent- and environment-related properties that directly or indirectly influence the rate and/or conversion of a given reaction. One of the case studies concerned the replacement of dichloromethane as the solvent in oxidation reactions, where the solvent needs to dissolve 3-octanol and 3-octanone, have a density close to that of water and be liquid at room temperature. ProCAMD generated 2-pentanone as the optimal solvent.

Crystallization

Crystallization, a key and complex operation in pharmaceutical processes, is a good example to illustrate how process knowledge can be used in the selection of solvents. The crystal morphology affects the dissolution characteristics, bioavailability, solubility and the ease with which the crystals can be compressed into tablets. As pharmaceutical molecules are organic in nature, studies are restricted to crystallization from organic solvents, since the solubility of organic solutes in aqueous solvents is poor. It is well known that different classes of molecules give rise to different crystal morphologies in a single solvent. The crystal morphology of many organic solutes is strongly influenced by the solvent used for crystallization and by its polarity: highly polar solvents tend to produce crystals with a low aspect ratio, and non-polar solvents crystals with a high aspect ratio.19 Further, Gernaey and Gani112 presented a systematic model-based approach for pharmaceutical product design and analysis consisting of a modeling tool, a knowledge base, computer-aided methods and tools, and a user interface. The application of the framework was demonstrated with examples of crystallization and fermentation processes.

Frank et al.113 presented a review of the methods used to estimate the solubility of organic solids in a wide variety of solvents for the crystallization process. The screening of solvents was performed using Hildebrand and Hansen solubility parameters. In one example, the Hansen solubility parameters for aspirin were regressed using four different solvents (acetone, chloroform, ethanol and cyclohexane). Once the parameters have been identified, a quick estimate of the solubility of aspirin in any solvent can be obtained, provided the solubility parameters of that solvent are also available. It was found that neither the Hansen model nor the UNIFAC model provided adequate quantitative results, because of the multifunctional nature of pharmaceutical molecules. Kolar et al.57 addressed the problem of solvent selection for pharmaceuticals by investigating the solubility of pharmaceutical compounds in a wide range of solvents of varying polarity and hydrogen bonding tendency. The estimation of solubilities by the regular solution theory approach, GC methods, QSPR methods and molecular simulation methods was summarized. Their study focused on small to medium-sized aromatic and heterocyclic compounds. Abildskov and O’Connell114 suggested an approach that uses a GC method for predicting the solubility of sparingly soluble pharmaceutical compounds for solvent design problems. To minimize the number of adjustable parameters and reduce the uncertainty, an optimal reference solvent procedure was reported, in which the difference in solubility at infinite dilution between the solvent of interest and the optimal reference solvent is computed. This methodology was applied to predict the solubility of various compounds such as ephedrine, hydrocortisone, salicylic acid, niflumic acid, diuron and monuron. The optimal solvents identified by this approach were in good agreement with the experimental measurements. In addition, this method was found to effectively eliminate errors in pure-solute properties and many binary interaction parameters.
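
Screening with Hansen solubility parameters is commonly done by ranking solvents according to the Hansen distance Ra between solute and solvent, with Ra² = 4(ΔδD)² + (ΔδP)² + (ΔδH)². The sketch below applies this formula to the four solvents named above. The solvent parameter triples are rounded values of the kind found in standard tabulations and should be treated as illustrative, and the solute triple is purely hypothetical (it is not the regressed aspirin result of the cited review).

```python
import math

def hansen_distance(a, b):
    """Hansen distance Ra between two (dD, dP, dH) triples, in MPa^0.5."""
    return math.sqrt(4.0 * (a[0] - b[0]) ** 2
                     + (a[1] - b[1]) ** 2
                     + (a[2] - b[2]) ** 2)

# Illustrative (dD, dP, dH) values in MPa^0.5; check a current tabulation
# before using them for real screening.
solvents = {
    "acetone":     (15.5, 10.4, 7.0),
    "chloroform":  (17.8, 3.1, 5.7),
    "ethanol":     (15.8, 8.8, 19.4),
    "cyclohexane": (16.8, 0.0, 0.2),
}

# Hypothetical solute parameters, chosen only to exercise the function.
solute = (18.0, 8.0, 9.0)

# Smaller Ra suggests better solute-solvent affinity.
ranking = sorted(solvents, key=lambda s: hansen_distance(solute, solvents[s]))
```

Such a ranking is only a qualitative screen; as noted above, the Hansen model did not give adequate quantitative solubilities for multifunctional pharmaceutical molecules.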

Further, Savova et al.115 used a GC method with Hansen solubility parameters to estimate the difference in the solubility ranges of polyphenols as the polarity of the solvent (an ethanol-water mixture) is varied. The calculated δ values, when compared with the experimental solubility profile, were found to be in good agreement for ideal mixtures, but showed some shortcomings, attributed to kinetic effects, for non-ideal mixtures, multi-component systems and diffusion-dominated processes.

Hydrogen bonding plays a critical role in solvent selection for crystallization, and the δH of solvents can be correlated with crystal morphology.72 Many publications have focused on ibuprofen as the model pharmaceutical compound. Karunanithi et al.19 proposed a CAMD framework to design solvents with δH as a key property for the desired ibuprofen crystal morphology. The methodology was illustrated by designing an optimal solvent for a cooling crystallization process. The CAMD problem was formulated as a MINLP model with two different objectives: one maximizing potential recovery as the performance objective, and the other minimizing toxicity. Structural constraints based on the octet rule were employed. Further, properties such as solubility, flash point, toxicity, viscosity, normal boiling point and melting point were posed as property constraints. The solubility was estimated using the regular solution theory approach (equation 9), as discussed in a previous section of this article. All the properties were estimated using GC-based methods. The decomposition-based approach was used to decompose the MINLP model into sub-problems for the generation of optimal solvent molecules. It was observed that ibuprofen crystals obtained from solvents with high hydrogen bonding ability were plate-like, with a low aspect ratio and large size. On the other hand, ibuprofen crystallized from solvents with low hydrogen bonding ability formed needle-like crystals with a high aspect ratio. The optimal solvent for the cooling crystallization process was found to be methoxymethyl 2-ethoxy acetate for maximizing potential recovery and methoxy (2-methoxyethoxy) methane for minimizing toxicity. The performance of the solvents was verified qualitatively through SLE diagrams and, more recently, through experiments.116
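
A solubility estimate of the regular solution type, and its use in a recovery objective for cooling crystallization, can be sketched as follows. The code uses the textbook Scatchard-Hildebrand form with the solvent volume fraction set to one, which may differ in detail from equation 9 of this article. The definition of potential recovery in terms of solute-to-solvent mole ratios is likewise an assumption, and all numerical inputs are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def solubility(T, Tm, dH_fus, V_solute, delta_solute, delta_solvent):
    """Mole-fraction solubility from ideal solubility plus a regular-solution
    activity coefficient (solvent volume fraction approximated as 1).

    T, Tm in K; dH_fus in J/mol; V_solute in cm^3/mol; delta in MPa^0.5,
    so that V_solute * delta^2 is in J/mol.
    """
    ln_ideal = -(dH_fus / R) * (1.0 / T - 1.0 / Tm)
    ln_gamma = V_solute * (delta_solvent - delta_solute) ** 2 / (R * T)
    return math.exp(ln_ideal - ln_gamma)

def potential_recovery(x_high, x_low):
    """Percentage of dissolved solute recovered on cooling (assumed definition,
    based on moles of solute per mole of solvent at the two temperatures)."""
    def ratio(x):
        return x / (1.0 - x)
    return 100.0 * (1.0 - ratio(x_low) / ratio(x_high))

# Hypothetical solute and solvent data.
solute = dict(Tm=350.0, dH_fus=25000.0, V_solute=200.0, delta_solute=20.0)
x_hot = solubility(320.0, delta_solvent=19.0, **solute)
x_cold = solubility(280.0, delta_solvent=19.0, **solute)
pr = potential_recovery(x_hot, x_cold)
```

In a CAMD formulation, the solvent solubility parameter would itself come from a GC estimate, so that `pr` becomes a function of the group counts being optimized.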

The relation between crystal size and the hydrogen bonding ability of the solute with the solvent has been debated in the literature.20,21,117 The prevailing view is that hydrogen bonding is a more important parameter than the polarity of the solvent. In addition, nucleation and growth towards the formation of a specific polymorph are affected by hydrogen bonding. Mirmehrabi and Rohani117 proposed a method based on atomic electronegativity that calculates the partial charge distribution in the solute and solvent molecules in order to develop correlations and predict the hydrogen bonding ability of the solute and/or solvent molecule. A case study using this approach was reported in which solvents for the crystallization of the drug ranitidine hydrochloride were screened by database search. It was found that solvents with higher dipole moments influence the hydrogen bonding between solute molecules.

Further, the effect of the solvent on the shape of the crystal has been studied, with experimental validation of the proposed CAMD framework using the decomposition-based approach.116 Through a combined approach of experiments, database search and CAMD, it was found and verified that ibuprofen crystals formed from 2-ethoxy ethyl acetate as solvent are significantly larger and have a lower aspect ratio than crystals formed from n-hexane. In addition, a combined approach of experiments and CAMD for the design of solvents for the crystallization of carboxylic acids showed that different solute molecules of the same class (carboxylic acids) exhibit different crystal morphologies with a single solvent component.20 Experimental studies showed that the aspect ratio of the crystals does not always have an inverse relationship with δH. Moreover, solvents exhibiting strong intermolecular hydrogen bonding tend to be viscous, which leads to the conclusion that δH alone should not be taken as a criterion to shortlist solvent molecules. The database search approach yielded monohydric alcohols as the optimal solvents for the crystallization of sebacic acid, with potential recovery as the objective function and melting point, boiling point, solubility, δH, viscosity, flash point and toxicity as property constraints. Acquah et al.118 identified the “acceptance number” as an index that substantially captures the ibuprofen-solvent hydrogen bonding interactions, apart from dipole moment and solubility, by fitting linear regression models to the experimental data and the solvent properties. The predictions of the model were found to be in good agreement with experimentally determined aspect ratio data from the literature. A flowchart has been proposed for various solvent categories and subcategories, based on acceptance number and intermolecular interaction, respectively.

The use of mixed solvents for crystallization has been of interest to quite a few authors in the literature.19,77,119,120 Its application in improving the physical and chemical properties of the crystal and the solvent, and its ability to dissolve certain substances, make its uses manifold. Winn and Doherty presented a review on the modeling of crystal shapes of organic materials grown from solution.119 It has been observed that crystals grown from a mixture of solvents have different characteristics from crystals grown from a pure solvent, and this effect is significant if the solute has very different solubilities in the individual solvents. On the other hand, the study by Zilnik et al.120 on the solubility of diclofenac in a mixture of dichloromethane and dimethyl sulfoxide resulted in decolourization of the solution, suggesting instability of the solute in the mixed solvent. In continuation of the earlier discussion on the design of solvents for the desired ibuprofen crystal morphology by Karunanithi et al.,19 a case study on the design of a solvent/anti-solvent mixture for the drowning-out technique has been carried out. The results indicate that 3-(ethoxymethoxy) propanal and butane-1,2,4 triol were the optimal solvent and anti-solvent pair when potential recovery was maximized.

Recently, a case study on solvent design using CAMD for improving the crystal morphology of 2,6-dihydroxybenzoic acid (DHB) has been reported.21 As a preliminary step, a database search was performed to select solvents with different properties such as polarity, hydrogen bonding ability and aromaticity. MD simulations were then carried out to understand the interactions at the solvent-crystal interfaces, and a modified attachment energy model was used to estimate the aspect ratio of the crystals in various solvents. With this approach, it was found that a diethyl ether/toluene mixture with a mole ratio of 1:4 reduced the aspect ratio of the DHB crystals, and this was confirmed experimentally. It can therefore be concluded that the choice of solvent plays a significant role in the final crystal morphology and that the CAMD approach has the potential to aid in the rational selection of solvents for improving the morphology of the crystal. However, it should be noted that this is not the only approach to designing crystallization processes for a desired crystal morphology. Other factors that tailor the crystal morphology are pH, temperature, supersaturation, mixing intensity and seed crystals,121,122 and these have to be taken into account while designing the process.

Summary

This article presents an extensive literature survey on chemical product design, property prediction methods and the optimization approaches employed for product design. Amongst the methods for predicting properties from molecular structure, the GC method is the oldest and most widely used. However, its usage is limited by the availability of reliable group contributions. The GIC method gives better estimates than GC but requires more model parameters for property estimation. The CI method, which makes use of “molecular descriptors”, has better prediction accuracy than GC and GIC, but knowledge of the regression coefficients is essential for property prediction. The MG method, with the inclusion of higher-order group contributions, has been widely used in recent literature for property prediction because of its low AAE. The combined GC-CI, UNIFAC-CI and MG-CI methods, in which the contribution of a missing group is predicted by CI, can sometimes lead to high AAE in property estimation. A brief insight into the use of molecular simulation with quantum mechanical calculations for the prediction of properties is also given. Although the molecules obtained by this method are more reliable and can be put to use directly, the method is computationally expensive. The scope for estimating properties from chemical structure using more accurate thermodynamic models, ab initio calculation methods, well-correlated topological indices, etc., is extensive and remains a great challenge for future research.
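
As a reminder of how a first-order GC estimate works, and of why a missing group contribution blocks the estimate altogether, the following sketch evaluates a normal boiling point in the additive form used by Joback and Reid (reference 31), Tb = 198.2 K + Σ n_i Tb_i. The three group values are quoted from commonly reproduced versions of that table and should be checked against the original before use; the function and dictionary names are hypothetical.

```python
# Boiling-point contributions in K (illustrative subset; verify against ref 31).
TB_CONTRIBUTIONS = {
    "CH3": 23.58,        # methyl
    "CH2": 22.88,        # non-ring methylene
    "C=O": 76.75,        # non-ring ketone
}

def estimate_tb(groups):
    """First-order GC estimate of the normal boiling point in K.

    `groups` maps group names to occurrence counts. A KeyError signals a
    group whose contribution is not available, which is the practical
    limitation of GC methods noted in the text.
    """
    return 198.2 + sum(n * TB_CONTRIBUTIONS[g] for g, n in groups.items())

# Acetone: two methyl groups and one ketone group.
tb_acetone = estimate_tb({"CH3": 2, "C=O": 1})
```

The hybrid GC-CI methods mentioned above address exactly the `KeyError` case, by predicting the missing contribution from connectivity indices instead of failing.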

The properties important for solvent design, as reported in the literature, include solubility and the hydrogen bonding interaction parameter for the crystallization process, and the activity coefficient involved in phase equilibrium for extraction. Ionic liquids tend to find wide application in pharmaceuticals and, hence, can also be considered in the design of new solvent molecules. The properties of interest in choosing the optimal ionic liquid may differ from those of organic solvents, and there is a need for further investigation.

Further, the literature on the numerous optimization techniques used to solve the solvent design problem has been discussed. The problem formulation, incorporating definitions of the different constraints such as property constraints and structural feasibility constraints, is presented in detail. Both stochastic and deterministic approaches have been widely used in the literature. Many developments have been made, and are still underway, to broaden the search direction so as to guarantee a globally optimal solution.
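
A structural feasibility constraint of the octet-rule type can be checked with a one-line sum. The sketch below uses a form that is common in the CAMD literature, Σ_j (2 − v_j) n_j = 2m, where v_j is the number of free attachments of group j, n_j its count, and m = 1 for acyclic molecules. The group set and valences listed are a small illustrative subset, and the function name is hypothetical.

```python
# Number of free attachments (valence) of each group; illustrative subset.
VALENCE = {"CH3": 1, "CH2": 2, "CH": 3, "C": 4, "OH": 1}

def satisfies_octet_rule(counts, m=1):
    """Check sum_j (2 - v_j) * n_j == 2 * m for a dict of group counts.

    m = 1 corresponds to an acyclic molecule. This is a necessary condition
    only; it does not by itself guarantee a connected, chemically sensible
    structure.
    """
    return sum((2 - VALENCE[g]) * n for g, n in counts.items()) == 2 * m

ethanol_ok = satisfies_octet_rule({"CH3": 1, "CH2": 1, "OH": 1})
isobutane_ok = satisfies_octet_rule({"CH3": 3, "CH": 1})
dangling = satisfies_octet_rule({"CH3": 1, "CH2": 2})  # one free bond left over
```

Because the condition is linear in the group counts, it can be included directly as an equality constraint in the MILP or MINLP formulations discussed in this article.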

Rational solvent design can be further extended to the process design problem by also choosing the operating conditions, such as temperature and pressure, and by accounting for the steps that follow in the process. This sequential decision making can lead to better performance, as it encompasses the process design space and incorporates the intrinsic links between molecules and process.

Acknowledgement

The author M. Harini is grateful to the Council of Scientific and Industrial Research (CSIR), New Delhi, for financial support.

References

1. Moggridge, G. D.; Cussler, E. L., An introduction to chemical product design. Chem. Engg. Res. Des. 2000, 78, 5.

2. Hill, M., Product and process design for structured products. AICHE J. 2004, 50, 1656.

3. Harper, P. M.; Gani, R., A multi-step and multi-level approach for computer aided molecular design. Comp. Chem. Engg. 2000, 24, 677.

4. Charpentier, J. C., The triplet "molecular processes-product-process" engineering: the future of chemical engineering? Chem. Engg. Sci. 2002, 57, 4667.

5. Karunanithi, A. T.; Achenie, L. E. K.; Gani, R., A new decomposition-based computer-aided molecular/mixture design methodology for the design of optimal solvents and solvent mixtures. Ind. Engg. Chem. Res. 2005, 44, 4785.

6. Smith, B. V.; Ierapepritou, M. G., Integrative chemical product design strategies: Reflecting industry trends and challenges. Comp. Chem. Engg. 2010, 34, 857.

7. Bommareddy, S.; Chemmangattuvalappil, N. G.; Solvason, C. C.; Eden, M. R., Simultaneous solution of process and molecular design problems using an algebraic approach. Comp. Chem. Engg. 2010, 34, 1481.

8. Conte, E.; Gani, R.; Ka Ming, N., Design of Formulated Products: A Systematic Methodology. AICHE J. 2011, 57, 2431.

9. Gani, R.; Jimenez-Gonzalez, C.; Constable, D. J. C., Method for selection of solvents for promotion of organic reactions. Comp. Chem. Engg. 2005, 29, 1661.

10. Sahinidis, N. V.; Tawarmalani, M.; Yu, M. R., Design of alternative refrigerants via global optimization. AICHE J. 2003, 49, 1761.

11. Camarda, K. V.; Maranas, C. D., Optimization in polymer design using connectivity indices. Ind. Engg. Chem. Res. 1999, 38, 1884.

12. McLeese, S. E.; Eslick, J. C.; Hoffmann, N. J.; Scurto, A. M.; Camarda, K. V., Design of ionic liquids via computational molecular design. Comp. Chem. Engg. 2010, 34, 1476.

13. Gani, R., Computer-aided methods and tools for chemical product design. Chem. Engg. Res. Des. 2004, 82, 1494.

14. Abildskov, J.; Kontogeorgis, G. M., Chemical product design - A new challenge of applied thermodynamics. Chem. Engg. Res. Des. 2004, 82, 1505.

15. O'Connell, J. P.; Gani, R.; Mathias, P. M.; Maurer, G.; Olson, J. D.; Crafts, P. A., Thermodynamic Property Modeling for Chemical Process and Product Engineering: Some Perspectives. Ind. Engg. Chem. Res. 2009, 48, 4619.

16. Henderson, R. K.; Jimenez-Gonzalez, C.; Constable, D. J. C.; Alston, S. R.; Inglis, G. G. A.; Fisher, G.; Sherwood, J.; Binks, S. P.; Curzons, A. D., Expanding GSK's solvent selection guide - embedding sustainability into solvent selection starting at medicinal chemistry. Green Chem. 2011, 13, 854.

17. Nicponski, D. R.; Ramachandran, P. V., The role of solvent selection at exploratory and production stages in the pharmaceutical industry. Future Med. Chem. 2011, 3, 1469.

18. Grodowska, K.; Parczewski, A., Organic solvents in the pharmaceutical industry. Acta Poloniae Pharmaceutica 2010, 67, 3.

19. Karunanithi, A. T.; Achenie, L. E. K.; Gani, R., A computer-aided molecular design framework for crystallization solvent design. Chem. Engg. Sci. 2006, 61, 1247.

20. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K.; Sithambaram, S.; Suib, S. L., Solvent design for crystallization of carboxylic acids. Comp. Chem. Engg. 2009, 33, 1014.

21. Chen, J.; Trout, B. L., Computer-Aided Solvent Selection for Improving the Morphology of Needle-like Crystals: A Case Study of 2,6-Dihydroxybenzoic Acid. Crys. Growth & Des. 2010, 10, 4379.

22. Chow, K.; Tong, H. H. Y.; Lum, S.; Chow, A. H. L., Engineering of pharmaceutical materials: An industrial perspective. J. Pharma. Sci. 2008, 97, 2855.

23. Siddhaye, S.; Camarda, K.; Southard, M.; Topp, E., Pharmaceutical product design using combinatorial optimization. Comp. Chem. Engg. 2004, 28, 425.

24. Eladio, P. F., Using the Group-Interaction Contribution Approach (GIC) in mixtures - 1. Prediction of azeotropic parameters. Chem. Engg. Comm. 1998, 169, 1.

25. Bek-Pedersen, E.; Gani, R., Design and synthesis of distillation systems using a driving-force-based approach. Chem. Engg. Proc. 2003, 43, 251.

26. Eden, M. R.; Jorgensen, S. B.; Gani, R.; El-Halwagi, M. M., A novel framework for simultaneous separation process and product design. Chem. Engg. Proc. 2004, 43, 595.

27. Papadopoulos, A. I.; Linke, P., Integrated solvent and process selection for separation and reactive separation systems. Chem. Engg. Proc. 2009, 48, 1047.

28. Adjiman, C.; Galindo, A., Molecular systems engineering. Wiley-VCH: 2010; Vol. 6.

29. Deal, C. H.; Derr, E. L., Group Contribution in mixtures. Ind. Engg. Chem. 1968, 60, 28.

30.

Poling B.E.; Prausnitz J.M.; O'Connell J.P., The properties of gases and liquids. 5th ed.;

2

McGraw-Hill: New York, 2001.

3

31.

4

contributions. Chem. Engg. Comm. 1987, 57, 233.

5

32.

6

Computerized design of multicomponent distillation-columns using unifac group contribution

7

method for calculation of activity-coefficients. Ind. Engg. Chem. Proc. Des. Dev. 1977, 16, 450.

8

33.

9

product design: Problem formulations, methodology and applications. Comp. Chem. Engg. 1996,

Joback, K. G.; Reid, R. C., Estimation of pure component properties from group

Fredenslund, A.; Gmehling, J.; Michelsen, M. L.; Rasmussen, P.; Prausnitz, J. M.,

Constantinou, L.; Bagherpour, K.; Gani, R.; Klein, J. A.; Wu, D. T., Computer aided

10

20, 685.

11

34.

12

for the estimation of physico-chemical properties of branched isomers. Chem. Engg. Comm.

13

1998, 163, 245.

14

35.

15

Fluid Phase Equilib. 2001, 183, 183.

16

36.

17

Melting Point of Organic Compounds. Chin. J. Chem. Engg. 2009, 17, 468.

18

37.

19

Academic press: New York, 1976.

20

38.

21

numbers and graph valence-shells in trees. Chem. Phys. Lett. 2002, 354, 417.

22

39.

23

of molecular connectivity. J. Mol. Graphics & Modeling 2001, 20, 4.

24

40.

25

accessibility model. Croatica Chemica Acta 2002, 75, 371.

26

41.

27

Atomic Attribute of Molecular Topological Structure. J. Data Sci. 2003, 1, 361.

28

42.

29

York 2002.

30

43.

31

structures? J. Comp. Aided Mol. Des. 2005, 19, 651.

Eladio, P. F.; Ramon, G. R., A group-interaction contribution approach. A new strategy

Marrero, J.; Gani, R., Group-contribution based estimation of pure component properties.

Wang, Q.; Ma, P.; Neng, S., Position Group Contribution Method for Estimation of

Kier, L. B.; Hall, L. H., Molecular connectivity in chemistry and drug research.

Lukovits, I.; Nikolic, S.; Trinajstic, N., On relationships between vertex-degrees, path-

Hall, L. H.; Kier, L. B., Issues in representation of molecular structure - The development

Kier, L. B.; Hall, L. H., The meaning of molecular connectivity: A bimolecular

Hu, Q.-N.; Liang, Y.-Z.; Fang, K.-T., The Matrix Expression, Topological Index and

Bicerano, J., Prediction of polymer properties. 3rd Ed. ed.; Marcel Dekker: New

Balaban, A. T., Can topological indices transmit information on properties but not on

44 ACS Paragon Plus Environment

44. Katritzky, A. R.; Kuanar, M.; Slavov, S.; Hall, C. D.; Karelson, M.; Kahn, I.; Dobchev, D. A., Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chem. Rev. 2010, 110, 5714.

45. Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V.; Radchenko, E.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V., Virtual computational chemistry laboratory - design and description. J. Comp. Aided Mol. Des. 2005, 19, 453.

46. Xiong, Y.; Ding, J.; Yu, D. H.; Peng, C. J.; Liu, H. L.; Hu, Y., Volumetric Connectivity Index: A New Approach for Estimation of Density of Ionic Liquids. Ind. Engg. Chem. Res. 2011, 50, 14155.

47. Marrero-Ponce, Y.; Martinez-Albelo, E. R.; Casanola-Martin, G. M.; Castillo-Garit, J. A.; Echeveria-Diaz, Y.; Zaldivar, V. R.; Tygat, J.; Borges, J. E. R.; Garcia-Domenech, R.; Torrens, F.; Perez-Gimenez, F., Bond-based linear indices of the non-stochastic and stochastic edge-adjacency matrix. 1. Theory and modeling of ChemPhys properties of organic molecules. Mol. Diversity 2010, 14, 731.

48. Lu, C. H.; Guo, W. M.; Hu, X. F.; Wang, Y.; Yin, C. S., A novel Lu index to QSPR studies of aldehydes and ketones. J. Math. Chem. 2006, 40, 379.

49. Wiener, H., Correlations of heats of isomerization, and differences in heats of vaporization of isomers among paraffin hydrocarbons. J. Am. Chem. Soc. 1947, 69, 2636.

50. Yang, F.; Wang, Z. D.; Huang, Y. P.; Zhu, H. L., Novel topological index F based on incidence matrix. J. Comput. Chem. 2003, 24, 1812.

51. Yang, F.; Wang, Z. D.; Huang, Y. P., Modification of the Wiener index 4. J. Comput. Chem. 2004, 25, 881.

52. Hukkerikar, A. S.; Sarup, B.; Kate, A. T.; Abildskov, J.; Sin, G.; Gani, R., Group-contribution+ (GC+) based estimation of properties of pure components: Improved property estimation and uncertainty analysis. Fluid Phase Equilib. 2012, 321, 25.

53. Weis, D. C.; Visco, D. P., Computer-aided molecular design using the Signature molecular descriptor: Application to solvent selection. Comp. Chem. Engg. 2010, 34, 1018.

54. Chemmangattuvalappil, N. G.; Solvason, C. C.; Bommareddy, S.; Eden, M. R., Reverse problem formulation approach to molecular design using property operators based on signature descriptors. Comp. Chem. Engg. 2010, 34, 2062.

55. Allen, M. P.; Tildesley, D. J., Computer simulations of liquids. Clarendon Press: Oxford, 1987.

56. Frenkel, D.; Smit, B., Understanding molecular simulation. Academic Press: San Diego, 2002.

57. Kolar, P.; Shen, J. W.; Tsuboi, A.; Ishikawa, T., Solvent selection for pharmaceuticals. Fluid Phase Equilib. 2002, 194, 771.

58. Tsivintzelis, I.; Economou, I. G.; Kontogeorgis, G. M., Modeling the Solid-Liquid Equilibrium in Pharmaceutical-Solvent Mixtures: Systems with Complex Hydrogen Bonding Behavior. AICHE J. 2009, 55, 756.

59. Izgorodina, E. I., Towards large-scale, fully ab initio calculations of ionic liquids. Phys. Chem. Chem. Phys. 2011, 13, 4189.

60. Meniai, A. H.; Newsham, D. M. T., The selection of solvents for liquid-liquid extraction. Chem. Engg. Res. Des. 1992, 70, 78.

61. Harper, P. M.; Gani, R.; Kolar, P.; Ishikawa, T., Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilib. 1999, 158, 337.

62. Stanescu, I.; Achenie, L. E. K., A theoretical study of solvent effects on Kolbe-Schmitt reaction kinetics. Chem. Engg. Sci. 2006, 61, 6199.

63. Satyanarayana, K. C.; Abildskov, J.; Gani, R.; Tsolou, G.; Mavrantzas, V. G., Computer aided polymer design using multi-scale modelling. Brazilian J. Chem. Engg. 2010, 27, 369.

64. Gu, C. H.; Li, H.; Gandhi, R. B.; Raghavan, K., Grouping solvents by statistical analysis of solvent property parameters: implication to polymorph screening. Int. J. Pharmaceutics 2004, 283, 117.

65. Mustaffa, A. A.; Kontogeorgis, G. M.; Gani, R., Analysis and application of GC(Plus) models for property prediction of organic chemical systems. Fluid Phase Equilib. 2011, 302, 274.

66. Gani, R.; Harper, P. M.; Hostrup, M., Automatic creation of missing groups through connectivity index for pure-component property prediction. Ind. Engg. Chem. Res. 2005, 44, 7262.

67. Conte, E.; Martinho, A.; Matos, H. A.; Gani, R., Combined Group-Contribution and Atom Connectivity Index-Based Methods for Estimation of Surface Tension and Viscosity. Ind. Engg. Chem. Res. 2008, 47, 7940.

68. Gmehling, J.; Li, J. D.; Schiller, M., A modified UNIFAC model. 2. Present parameter matrix and results for different thermodynamic properties. Ind. Engg. Chem. Res. 1993, 32, 178.

69. Hahnenkamp, I.; Graubner, G.; Gmehling, J., Measurement and prediction of solubilities of active pharmaceutical ingredients. Int. J. Pharmaceutics 2010, 388, 73.

70. Gonzalez, H. E.; Abildskov, J.; Gani, R.; Rousseaux, P.; Le Bert, B., A method for prediction of UNIFAC group interaction parameters. AICHE J. 2007, 53, 1620.

71. Pretel, E. J.; Lopez, P. A.; Bottini, S. B.; Brignole, E. A., Computer-aided molecular design for solvents for separation processes. AICHE J. 1994, 40, 1349.

72. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K., Tuning the morphology of pharmaceutical compounds via model based solvent selection. Chin. J. Chem. Engg. 2008, 16, 465.

73. Marrero, J.; Gani, R., Group-contribution-based estimation of octanol/water partition coefficient and aqueous solubility. Ind. Engg. Chem. Res. 2002, 41, 6623.

74. Derawi, S. O.; Kontogeorgis, G. M.; Stenby, E. H., Application of group contribution models to the calculation of the octanol-water partition coefficient. Ind. Engg. Chem. Res. 2001, 40, 434.

75. Sedykh, A. Y.; Klopman, G., A structural analogue approach to the prediction of the octanol-water partition coefficient. J. Chem. Inf. Modeling 2006, 46, 1598.

76. Soskic, M.; Plavsic, D., Modeling the octanol-water partition coefficients by an optimized molecular connectivity index. J. Chem. Inf. Modeling 2005, 45, 930.

77. Tung, H.-H.; Tabora, J.; Variankaval, N.; Bakken, D.; Chen, C.-C., Pharmaceutics, preformulation and drug delivery. Prediction of Pharmaceutical Solubility Via NRTL-SAC and COSMO-SAC. J. Pharm. Sci. 2008, 97, 1813.

78. Modarresi, H.; Conte, E.; Abildskov, J.; Gani, R.; Crafts, P., Model-based calculation of solid solubility for solvent selection - A review. Ind. Engg. Chem. Res. 2008, 47, 5234.

79. Ruether, F.; Sadowski, G., Modeling the Solubility of Pharmaceuticals in Pure Solvents and Solvent Mixtures for Drug Process Design. J. Pharm. Sci. 2009, 98, 4205.

80. Martin, T. M.; Young, D. M., Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem. Res. Toxicol. 2001, 14, 1378.

81. Hukkerikar, A. S.; Kalakul, S.; Sarup, B.; Young, D. M.; Sin, G.; Gani, R., Estimation of Environment-Related Properties of Chemicals for Design of Sustainable Processes: Development of Group-Contribution(+) (GC(+)) Property Models and Uncertainty Analysis. J. Chem. Inf. Modeling 2012, 52, 2823.

82. Zhang, S.; Lu, X.; Zhang, Y.; Zhou, Q.; Sun, J.; Han, L.; Yue, G.; Liu, X.; Cheng, W.; Li, S., Ionic Liquids and Relative Process Design. Mol. Thermo. Complex Systems 2009, 131, 143.

83. Abraham, M. H.; Acree, W. E., Hydrogen bond descriptors and other properties of ion pairs. New J. Chem. 2011, 35, 1740.

84. Odele, O.; Macchietto, S., Computer-aided molecular design - A novel method for optimal solvent selection. Fluid Phase Equilib. 1993, 82, 47.

85. Eljack, F. T.; Eden, M. R.; Kazantzi, V.; Qin, X.; El-Halwagi, M. A., Simultaneous process and molecular design - A property based approach. AICHE J. 2007, 53, 1232.

86. Vaidyanathan, R.; El-Halwagi, M. M., Computer-aided synthesis of polymers and blends with target properties. Ind. Engg. Chem. Res. 1996, 35, 627.

87. Edgar, F. T.; Himmelblau, D. M.; Lasdon, L. S., Optimization of chemical processes. 2nd ed.; McGraw-Hill: 2001.

88. Duran, M. A.; Grossmann, I. E., An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming 1986, 36, 307.

89. Wang, Y. P.; Achenie, L. E. K., Computer aided solvent design for extractive fermentation. Fluid Phase Equilib. 2002, 201, 1.

90. Sinha, M.; Achenie, L. E. K.; Ostrovsky, G. M., Environmentally benign solvent design by global optimization. Comp. Chem. Engg. 1999, 23, 1381.

91. Ostrovsky, G. M.; Achenie, L. E. K.; Sinha, M., A reduced dimension branch-and-bound algorithm for molecular design. Comp. Chem. Engg. 2003, 27, 551.

92. Sinha, M.; Achenie, L. E. K.; Gani, R., Blanket wash solvent blend design using interval analysis. Ind. Engg. Chem. Res. 2003, 42, 516.

93. Floudas, C. A.; Gounaris, C. E., A review of recent advances in global optimization. J. Global Optim. 2009, 45, 3.

94. Cheng, H. C.; Wang, F. S., Computer-aided biocompatible solvent design for an integrated extractive fermentation-separation process. Chem. Engg. J. 2010, 162, 809.

95. Sahinidis, N. V., Optimization techniques in molecular structure and function elucidation. Comp. Chem. Engg. 2009, 33, 2055.

96. Fouskakis, D.; Draper, D., Stochastic optimization: a review. Int. Statist. Rev. 2002, 70, 315.

97. Venkatasubramanian, V.; Chan, K.; Caruthers, J. M., Computer-aided molecular design using genetic algorithms. Comp. Chem. Engg. 1994, 18, 833.

98. van Dyk, B.; Nieuwoudt, I., Design of solvents for extractive distillation. Ind. Engg. Chem. Res. 2000, 39, 1423.

99. Lehmann, A.; Maranas, C. D., Molecular design using quantum chemical calculations for property estimation. Ind. Engg. Chem. Res. 2004, 43, 3419.

100. Lin, B.; Chavali, S.; Camarda, K.; Miller, D. C., Computer-aided molecular design using Tabu search. Comp. Chem. Engg. 2005, 29, 337.

101. Wu, L. L.; Chang, W. X.; Guan, G. F., Extractants design based on an improved genetic algorithm. Ind. Engg. Chem. Res. 2007, 46, 1254.

102. Song, J.; Song, H. H., Computer-aided molecular design of environmentally friendly solvents for separation processes. Chem. Engg. Tech. 2008, 31, 177.

103. Serrato, B. J. C.; Gómez, P. J.; Caicedo, A. L. M., Sequential Evolutionary Design-Quantum Calculations For Selecting Extractants. 20th European Symposium on Computer Aided Process Engineering – ESCAPE20 2010.

104. Kim, K. J.; Diwekar, U. M., Efficient combinatorial optimization under uncertainty. 2. Application to stochastic solvent selection. Ind. Engg. Chem. Res. 2002, 41, 1285.

105. Xu, W.; Diwekar, U. M., Improved genetic algorithms for deterministic optimization and optimization under uncertainty. Part II. Solvent selection under uncertainty. Ind. Engg. Chem. Res. 2005, 44, 7138.

106. Xu, W. Y.; Diwekar, U. M., Multi-objective integrated solvent selection and solvent recycling under uncertainty using a new genetic algorithm. Int. J. Environ. Pollution 2007, 29, 70.

107. Diwekar, U.; Shastri, Y., Design for environment: a state-of-the-art review. Clean Techn. Environ. Policy 2011, 13, 227.

108. Wang, Y. P.; Achenie, L. E. K., A hybrid global optimization approach for solvent design. Comp. Chem. Engg. 2002, 26, 1415.

109. Folic, M.; Adjiman, C. S.; Pistikopoulos, E. N., Computer-aided solvent design for reactions: Maximizing product formation. Ind. Engg. Chem. Res. 2008, 47, 5190.

110. Barwick, V. J., Strategies for solvent selection - A literature review. Trac-Trends in Anal. Chem. 1997, 16, 293.

111. Cismondi, M.; Brignole, E. A., Molecular design of solvents: An efficient search algorithm for branched molecules. Ind. Engg. Chem. Res. 2004, 43, 784.

112. Gernaey, K. V.; Gani, R., A model-based systems approach to pharmaceutical product-process design and analysis. Chem. Engg. Sci. 2010, 65, 5757.

113. Frank, T. C.; Downey, J. R.; Gupta, S. K., Quickly screen solvents for organic solids. Chem. Engg. Prog. 1999, 95, 41.

114. Abildskov, J.; O'Connell, J. P., Predicting the solubilities of complex chemicals I. Solutes in different solvents. Ind. Engg. Chem. Res. 2003, 42, 5622.

115. Savova, M.; Kolusheva, T.; Stourza, A.; Seikova, I., The use of group contribution method for predicting the solubility of seed polyphenols of Vitis Vinifera L. with a wide polarity range in solvent mixtures. J. Univ. Chem. Tech. Metallurgy 2007, 42, 295.

116. Karunanithi, A. T.; Acquah, C.; Achenie, L. E. K.; Sithambaram, S.; Suib, S. L.; Gani, R., An experimental verification of morphology of ibuprofen crystals from CAMD designed solvent. Chem. Engg. Sci. 2007, 62, 3276.

117. Mirmehrabi, M.; Rohani, S., An approach to solvent screening for crystallization of polymorphic pharmaceuticals and fine chemicals. J. Pharm. Sci. 2005, 94, 1560.

118. Acquah, C.; Karunanithi, A. T.; Cagnetta, M.; Achenie, L. E. K.; Suib, S. L., Linear models for prediction of ibuprofen crystal morphology based on hydrogen bonding propensities. Fluid Phase Equilib. 2009, 277, 73.

119. Winn, D.; Doherty, M. F., Modeling crystal shapes of organic materials grown from solution. AICHE J. 2000, 46, 1348.

120. Zilnik, L. F.; Jazbinsek, A.; Hvala, A.; Vrecer, F.; Klamt, A., Solubility of sodium diclofenac in different solvents. Fluid Phase Equilib. 2007, 261, 140.

121. Jones, H. P.; Davey, R. J.; Cox, B. G., Crystallization of a salt of a weak organic acid and base: Solubility relations, supersaturation control and polymorphic behavior. J. Phys. Chem. B 2005, 109, 5273.

122. Ma, D. L.; Tafti, D. K.; Braatz, R. D., Optimal control and simulation of multidimensional crystallization processes. Comp. Chem. Engg. 2002, 26, 1103.

123. Birch, S. F.; Dean, R. A.; Fidler, F. A.; Lowry, R. A., The preparation of the c(10) monocyclic aromatic hydrocarbons. J. Am. Chem. Soc. 1949, 71, 1362.

124. Wang, Q.; Ma, P. S.; Wang, C.; Xia, S. Q., Position Group Contribution Method for Predicting the Normal Boiling Point of Organic Compounds. Chin. J. Chem. Engg. 2009, 17, 254.

125. Stefanis, E.; Constantinou, L.; Panayiotou, C., A group-contribution method for predicting pure component properties of biochemical and safety interest. Ind. Engg. Chem. Res. 2004, 43, 6253.

126. Wang, Q.; Jia, Q.; Ma, P., Position group contribution method for the prediction of critical pressure of organic compounds. J. Chem. Engg. Data 2008, 53, 1877.

127. Sheldon, T. J.; Adjiman, C. S.; Cordiner, J. L., Pure component properties from group contribution: Hydrogen-bond basicity, hydrogen-bond acidity, Hildebrand solubility parameter, macroscopic surface tension, dipole moment, refractive index and dielectric constant. Fluid Phase Equilib. 2005, 231, 27.

128. Jia, Q. Z.; Wang, Q. A.; Ma, P. S., Prediction of the Enthalpy of Vaporization of Organic Compounds at Their Normal Boiling Point with the Positional Distributive Contribution Method. J. Chem. Engg. Data 2010, 55, 5614.

129. Valderrama, J. O.; Rojas, R. E., Mass connectivity index, a new molecular parameter for the estimation of ionic liquid properties. Fluid Phase Equilib. 2010, 297, 107.

130. Aziz, S.; John, P. E.; Khadikar, P. V., Use of structure codes (counts) for computing Sadhana (Sd) index of phenylenes and its hexagonal squeezes for QSAR studies. J. Ind. Chem. Soc. 2010, 87, 1449.

131. Souza, E. S.; Zaramello, L.; Kuhnen, C. A.; Junkes, B. D.; Yunes, R. A.; Heinzen, V. E. F., Estimating the Octanol/Water Partition Coefficient for Aliphatic Organic Compounds Using Semi-Empirical Electrotopological Index. Int. J. Mol. Sci. 2011, 12, 7250.

132. Trofimov, M. I.; Smolenskii, E. A., Application of the electronegativity indices of organic molecules to tasks of chemical informatics. Russ. Chem. Bul. 2005, 54, 2235.

133. Shamsipur, M.; Hemmateenejad, B.; Ghavami, R.; Sharghi, H., Highly correlating distance-connectivity-based topological indices. 4: Stepwise factor selection-based PCR models for QSPR study of 14 properties of monoalkenes. Pol. J. Chem. 2007, 81, 269.

134. Gupta, M.; Gupta, S.; Dureja, H.; Madan, A. K., Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR. Chem. Biol. Drug Des. 2012, 79, 38.

135. Junkes, B. D.; Arruda, A. C. S.; Yunes, R. A.; Porto, L. C.; Heinzen, V. E. F., Semi-empirical topological index: a tool for QSPR/QSAR studies. J. Mol. Modeling 2005, 11, 128.

136. Torrens, F., Valence topological charge-transfer indices for dipole moments. Theochem. J. Mol. Struct. 2003, 621, 37.

137. Ren, B. Y., Application of novel atom-type AI topological indices in the structure-property correlations. Theochem. J. Mol. Struct. 2002, 586, 137.

138. Li, X. H., The extended Wiener index. Chem. Phys. Lett. 2002, 365, 135.

139. Patel, S. J.; Ng, D.; Mannan, M. S., QSPR Flash Point Prediction of Solvents Using Topological Indices for Application in Computer Aided Molecular Design. Ind. Engg. Chem. Res. 2009, 48, 7378.

140. Stefanis, E.; Panayiotou, C., Prediction of Hansen solubility parameters with a new group-contribution method. Int. J. Thermophysics 2008, 29, 568.

141. Pretel, E. J.; Lopez, P. A.; Bottini, S. B.; Brignole, E. A., Computer-aided molecular design of solvents for separation processes. AICHE J. 1994, 40, 1349.

142. Marcoulaki, E. C.; Kokossis, A. C., On the development of novel chemicals using a systematic optimisation approach. Part II. Solvent design. Chem. Engg. Sci. 2000, 55, 2547.

143. Papadopoulos, A. I.; Linke, P., Efficient integration of optimal solvent and process design using molecular clustering. Chem. Engg. Sci. 2006, 61, 6316.

144. Chemmangattuvalappil, N. G.; Solvason, C. C.; Bommareddy, S.; Eden, M. R., Combined property clustering and GC(+) techniques for process and product design. Comp. Chem. Engg. 2010, 34, 582.

Appendix

Illustrations of normal melting point estimation using various structure-property prediction methods. The experimentally determined melting point of 2,5-dimethylbenzoic acid reported in the literature (ref 123) is around 405.15 K.

Estimation of normal melting point of 2,5-dimethylbenzoic acid

Compound: 2,5-dimethylbenzoic acid
Molecular structure: [structure drawing]
CAS No.: 611-72-0
Molecular formula: C9H10O2
Molecular weight: 150.177

Group contribution method

Groups          Occurrences (Ni)    Contribution (Tmi)
CH3             2                   -5.1
COOH            1                   155.5
=CH- (ring)     3                   8.13
=C< (ring)      3                   37.02

$$T_m = 122.5 + \sum_i N_i T_{mi}$$

Tm = 122.5 - 5.1 × 2 + 155.5 × 1 + 8.13 × 3 + 37.02 × 3 = 403.25 K
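The arithmetic of the group contribution estimate above can be checked with a short script. This is only a sketch: the group counts and contributions are copied from the table, while the names `GROUPS`, `BASE_K`, and `melting_point_gc` are illustrative and not part of the original work.

```python
# Group data copied from the appendix table: name -> (occurrences N_i, contribution T_mi in K).
GROUPS = {
    "CH3": (2, -5.1),
    "COOH": (1, 155.5),
    "=CH- (ring)": (3, 8.13),
    "=C< (ring)": (3, 37.02),
}
BASE_K = 122.5  # constant term of the correlation, in K


def melting_point_gc(groups, base=BASE_K):
    """Return T_m = base + sum_i N_i * T_mi, in kelvin."""
    return base + sum(n * t for n, t in groups.values())


tm_gc = melting_point_gc(GROUPS)
print(f"Group contribution estimate: {tm_gc:.2f} K")
print(f"Deviation from the reported 405.15 K: {tm_gc - 405.15:+.2f} K")
```

Keeping the table as a dictionary makes it easy to swap in the group counts of another molecule without touching the summation.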

Constantinou-Gani method

First order groups     Occurrences (Ni)    Contribution (Tm1i)
aC-CH3                 2                   1.8635
aCH                    3                   1.4669
aC                     1                   0.2098
COOH                   1                   11.563

Second order groups    Occurrences (Mj)    Contribution (Tm2j)
aCCOOH                 1                   28.4324
COH                    1                   0.3189

Tmo = 102.425

$$\exp\left(\frac{T_m}{T_{mo}}\right) = \sum_i N_i T_{m1i} + \sum_j M_j T_{m2j}$$

exp(Tm/102.425) = (1.8635 × 2 + 1.4669 × 3 + 0.2098 × 1 + 11.563 × 1) + (28.4324 × 1 + 0.3189 × 1) = 48.6518

Tm = ln(48.6518) × 102.425 = 397.88 K
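The Constantinou-Gani calculation above inverts an exponential relation, so the estimate is Tmo times the natural logarithm of the summed contributions. A minimal sketch follows, with all numbers copied from the tables above; the helper names (`weighted_sum`, `melting_point_cg`) are assumptions made for illustration.

```python
import math

TM0_K = 102.425  # T_mo from the appendix, in K

# name -> (occurrences, contribution), copied from the appendix tables.
FIRST_ORDER = {
    "aC-CH3": (2, 1.8635),
    "aCH": (3, 1.4669),
    "aC": (1, 0.2098),
    "COOH": (1, 11.563),
}
SECOND_ORDER = {
    "aCCOOH": (1, 28.4324),
    "COH": (1, 0.3189),
}


def weighted_sum(groups):
    """Sum of occurrences times contributions for one group level."""
    return sum(n * c for n, c in groups.values())


def melting_point_cg(first, second, tm0=TM0_K):
    """Solve exp(T_m / T_mo) = sum_i N_i T_m1i + sum_j M_j T_m2j for T_m."""
    rhs = weighted_sum(first) + weighted_sum(second)
    return tm0 * math.log(rhs), rhs


tm_cg, rhs_cg = melting_point_cg(FIRST_ORDER, SECOND_ORDER)
print(f"Sum of contributions: {rhs_cg:.4f}")
print(f"Constantinou-Gani estimate: {tm_cg:.2f} K")
```

Carrying full floating-point precision can change the last printed digit relative to the hand calculation, which is why the check below uses a tolerance.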

Marrero-Gani method

First order groups     Occurrences (Ni)    Contribution (Tm1i)
aC-COOH                1                   12.4296
aC-CH3                 2                   1.0068
aCH                    3                   0.5860

Second order groups    Occurrences (Mj)    Contribution (Tm2j)
C-OH                   1                   0.3695

* No third order groups are involved.

Tmo = 147.45

$$\exp\left(\frac{T_m}{T_{mo}}\right) = \sum_i N_i T_{m1i} + \sum_j M_j T_{m2j} + \sum_k O_k T_{m3k}$$

exp(Tm/147.45) = (12.4296 × 1 + 1.0068 × 2 + 0.5860 × 3) + (0.3695 × 1) = 16.5707

Tm = ln(16.5707) × 147.45 = 413.985 K
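The Marrero-Gani estimate above has the same logarithmic form but allows three levels of groups. The sketch below treats the levels as a list so that the empty third-order level is explicit; the values are copied from the tables above, and `LEVELS` and `melting_point_mg` are illustrative names rather than anything defined in the original work.

```python
import math

TM0_K = 147.45  # T_mo from the appendix, in K

# One dictionary per level: name -> (occurrences, contribution).
LEVELS = [
    {"aC-COOH": (1, 12.4296), "aC-CH3": (2, 1.0068), "aCH": (3, 0.5860)},  # first order
    {"C-OH": (1, 0.3695)},                                                  # second order
    {},                                                                     # third order: none for this molecule
]


def melting_point_mg(levels, tm0=TM0_K):
    """Solve exp(T_m / T_mo) = sum over all levels of occurrences * contributions."""
    rhs = sum(n * c for level in levels for n, c in level.values())
    return tm0 * math.log(rhs), rhs


tm_mg, rhs_mg = melting_point_mg(LEVELS)
print(f"Sum of contributions: {rhs_mg:.4f}")
print(f"Marrero-Gani estimate: {tm_mg:.3f} K")
```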

Position group contribution method

Group           Occurrences (N)     Contribution (A)
CO-(CH)(O)      1                   34.246
Cb-(H)          3                   6.224
Cb-(C)          2                   -331.51
C-(Cb)(H)3      2                   49.96
O-(CO)(H)       1                   369.423
Cb-(COOH)       1                   1181.043

Group              Occurrences (Pk)    Contribution (Ak)
Ortho correction   1                   0.777
Meta correction    2                   -7.374

Tmo = 5963.486, N = 10, a1 = -5758.997, a2 = 51.127

$$T_m = T_{mo} + \sum_i A_i N_i + \sum_j A_j \tanh\left(\frac{N_j}{N}\right) + \sum_k A_k P_k + a_1 \exp\left(\frac{1}{M_w}\right) + a_2 \exp\left(\frac{1}{N}\right)$$

Tm = 5963.486 + 49.96 × 2 + 34.246 × tanh(1/10) + 6.224 × tanh(3/10) - 331.51 × tanh(2/10) + 369.423 × tanh(1/10) + 1181.043 × tanh(1/10) + 1 × 0.777 - 2 × 7.374 - 5758.997 × exp(1/150.177) + 51.127 × exp(1/10) = 402.79 K
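Because the position group contribution estimate above is a small difference between large terms, it is worth recomputing numerically. The sketch below follows the worked substitution: C-(Cb)(H)3 enters linearly, the other groups enter through tanh(Nj/N), and the ortho and meta corrections enter as Ak × Pk. That split is read off the numeric line rather than from a general rule, and every identifier in the code is an illustrative assumption.

```python
import math

TM0 = 5963.486               # T_mo from the appendix
N_TOTAL = 10                 # N as given in the appendix
A1, A2 = -5758.997, 51.127   # coefficients a1 and a2
MOLAR_MASS = 150.177         # M_w in g/mol

# Group entering linearly as A_i * N_i: name -> (N_i, A_i).
LINEAR = {"C-(Cb)(H)3": (2, 49.96)}
# Groups entering as A_j * tanh(N_j / N): name -> (N_j, A_j).
SATURATING = {
    "CO-(CH)(O)": (1, 34.246),
    "Cb-(H)": (3, 6.224),
    "Cb-(C)": (2, -331.51),
    "O-(CO)(H)": (1, 369.423),
    "Cb-(COOH)": (1, 1181.043),
}
# Position corrections entering as A_k * P_k: name -> (P_k, A_k).
CORRECTIONS = {"ortho": (1, 0.777), "meta": (2, -7.374)}


def melting_point_pgc():
    """Evaluate the position group contribution expression term by term."""
    linear = sum(n * a for n, a in LINEAR.values())
    saturating = sum(a * math.tanh(n / N_TOTAL) for n, a in SATURATING.values())
    corrections = sum(p * a for p, a in CORRECTIONS.values())
    size_terms = A1 * math.exp(1 / MOLAR_MASS) + A2 * math.exp(1 / N_TOTAL)
    return TM0 + linear + saturating + corrections + size_terms


tm_pgc = melting_point_pgc()
print(f"Position group contribution estimate: {tm_pgc:.2f} K")
```

Since the constant Tmo and the a1 term nearly cancel, rounding any coefficient to fewer digits than shown would shift the result noticeably.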


Connectivity index method

Group

δ

δv

Occurrence

-CH3

1

1

2

=CH-

2

3

3

=C